Re: [tdwg-content] delimiter characters for concatenated IDs
Thanks, Gregor! I must say I really like this idea for our short-term solution! I will discuss this with my team.
Best, Gabi
From: Gregor Hagedorn [mailto:gregor.hagedorn@mfn-berlin.de]
Sent: Tuesday, 6 May 2014 13:57
To: Dröge, Gabriele
Cc: tdwg-content@lists.tdwg.org
Subject: Re: [tdwg-content] delimiter characters for concatenated IDs
On the original question: my preference when creating an identifier for the triplet would be to be transparent and create the identifier with the dwc terms embedded,
i.e. use institutionCode=BGBM&collectionCode=xxx&catalogNumber=289742874239872
as the "self-coined" identifier string. While "&" may not be unique, the delimiter &collectionCode= is very likely to be. It is also transparent and self-documenting about what is going on (and I do not mean that people need to parse it, which they could in either case).
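A minimal sketch of how such a self-documenting identifier string could be assembled. The helper name `make_triplet_id` and the fixed field order are assumptions for illustration, not part of Darwin Core or of the proposal above.

```python
# Darwin Core terms embedded in the identifier, in the order used in the example above.
DWC_TERMS = ("institutionCode", "collectionCode", "catalogNumber")


def make_triplet_id(institution_code, collection_code, catalog_number):
    """Build a 'term=value' string joined by '&' (hypothetical helper)."""
    values = (institution_code, collection_code, catalog_number)
    return "&".join(f"{term}={value}" for term, value in zip(DWC_TERMS, values))


triplet_id = make_triplet_id("BGBM", "xxx", "289742874239872")
print(triplet_id)

# The long delimiter "&collectionCode=" is what makes the split points unlikely
# to collide with characters inside the values themselves.
head, _, tail = triplet_id.partition("&collectionCode=")
print(head)
print(tail)
```

Note that this sketch does not escape the values, so a value that itself contained "&collectionCode=" would still be ambiguous.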
Gregor
On 5 May 2014 15:24, "Dröge, Gabriele" <g.droege@bgbm.org> wrote:
Hi everyone,
I guess there might have been some discussions about proper delimiter characters in the past that I have missed.
In several projects, first of all in GGBN (Global Genome Biodiversity Network, http://www.ggbn.org), there is a need for making a decision now. We need to reference between different records and databases and within Darwin Core we want to use the relatedResourceID to do so.
During our GGBN workshop at TDWG last year we agreed on concatenating the traditional triple ID (Catalogue Number, Collection Code, Institution Code) and adding further parameters if required (e.g. GUID, access point). We have checked those parameters and definitely cannot use a single character as delimiter.
So my question to you is whether there are already some suggestions on using two characters together as delimiters. It would be great if we could find a solution that more than one community could agree on.
Otherwise I would like to open the discussion and suggest "\", "||", "|", "§|", "§§", or "\§".
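The concern about single-character delimiters can be sketched as a round-trip check: a delimiter is only usable if splitting the concatenated string gives back exactly the original parts. The function name `join_ids` and the sample values are hypothetical.

```python
def join_ids(parts, delimiter):
    """Concatenate parts, refusing any combination that cannot be split back unambiguously."""
    joined = delimiter.join(parts)
    if joined.split(delimiter) != list(parts):
        raise ValueError(f"delimiter {delimiter!r} is ambiguous for {parts!r}")
    return joined


# Hypothetical triple (catalogue number, collection code, institution code);
# the catalogue number contains a single "|".
parts = ["B|100", "Herbarium", "BGBM"]

joined = join_ids(parts, "||")
print(joined)

try:
    join_ids(parts, "|")
except ValueError as error:
    print(error)
```

Even a two-character delimiter is not automatically safe: with "||", a part that ends in "|" makes the split ambiguous, which is why the sketch checks the round trip instead of only searching each part for the delimiter.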
Best wishes,
Gabi
-----------------------------------------------------------------
Gabriele Droege
Coordinator - DNA Bank Network
Global Genome Biodiversity Network (GGBN)
Berlin-Dahlem DNA Bank
Women's Officer ZE BGBM
Botanic Garden and Botanical Museum Berlin-Dahlem
Freie Universität Berlin
Koenigin-Luise-Str. 6-8
14195 Berlin
Germany
+49 30 838 50 139
www.dnabank-network.org
www.ggbn.org
www.bgbm.org
_______________________________________________
tdwg-content mailing list
tdwg-content@lists.tdwg.org
http://lists.tdwg.org/mailman/listinfo/tdwg-content
--
---------------------------
Dr. Gregor Hagedorn
Head of Digital World and Information Science
Museum für Naturkunde Berlin
Leibniz-Institut für Evolutions- und Biodiversitätsforschung
Invalidenstrasse 43, 10115 Berlin
+49 (0)30 2093 8576 (work)
+49-(0)30-831 5785 (private)
gregor.hagedorn@mfn-berlin.de
http://www.naturkundemuseum-berlin.de
http://linkedin.com/in/gregorhagedorn
This communication, together with any attachments, is intended only for the person(s) to whom it is addressed. Redistributing or publishing it without permission may be a violation of copyright or privacy rights.
If you're going to go this route, you may also want to think about URI component encoding (JavaScript: encodeURIComponent) of the individual identifier components. Then you can use a generic query string parser library to get the components back out of the string, and you can also efficiently embed the identifier as part of the query string in a URL.
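The suggestion can be sketched with the Python standard library, where `urllib.parse.quote` with `safe=""` is roughly the counterpart of JavaScript's `encodeURIComponent` (the two differ slightly in which punctuation they leave unescaped). The function names and the sample values are assumptions for illustration.

```python
from urllib.parse import parse_qs, quote, urlencode


def encode_triplet_id(institution_code, collection_code, catalog_number):
    """Percent-encode each component, then join them as a query string (hypothetical helper)."""
    fields = [
        ("institutionCode", institution_code),
        ("collectionCode", collection_code),
        ("catalogNumber", catalog_number),
    ]
    # quote_via=quote with safe="" escapes "&", "=", "/" and spaces inside the values.
    return urlencode(fields, quote_via=quote, safe="")


def decode_triplet_id(identifier):
    """Recover the components with a generic query string parser."""
    parsed = parse_qs(identifier, keep_blank_values=True)
    return {key: values[0] for key, values in parsed.items()}


# Hypothetical values containing characters that would otherwise break parsing.
identifier = encode_triplet_id("BGBM", "Herb&Algae", "B 10/0001")
print(identifier)

decoded = decode_triplet_id(identifier)
print(decoded)
```

Because the values are escaped, a literal "&" or "=" inside a component can no longer be mistaken for a delimiter.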
- Alex
Hi Gabi,
It’s been raised before, but I’d also like to discourage anyone from assembling record identifiers algorithmically.
Should a collection code change over time, one of two things happens, neither of which is desirable:
a) the record ID changes and all links and caches are broken
b) the record ID stays the same, but is confusing to any observer as it contradicts the record content
We see this happen very regularly as GBIF indexes datasets, and it results in the kind of blog posts Rod writes regularly. I would encourage using well-curated primary keys, and then offering search services such as http://gabi.de/specimen?catalogNumber=XXX&collecctionCode=YYY to aid discovery.
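A minimal sketch of the difference between a curated primary key and an algorithmically assembled ID. The `Specimen` class, the use of a UUID as the opaque key, the "||" delimiter, and the in-memory `search` function are all assumptions for illustration, not a description of any existing system.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Specimen:
    institution_code: str
    collection_code: str
    catalog_number: str
    # Opaque key assigned once; it is never derived from the fields above.
    record_id: str = field(default_factory=lambda: str(uuid.uuid4()))


def derived_id(specimen):
    """An ID assembled from the triplet, which changes whenever a field changes."""
    return "||".join(
        [specimen.institution_code, specimen.collection_code, specimen.catalog_number]
    )


def search(records, **criteria):
    """Stand-in for a search service: match records on any of their fields."""
    return [
        record
        for record in records
        if all(getattr(record, name) == value for name, value in criteria.items())
    ]


specimen = Specimen("BGBM", "OLD", "289742874239872")
key_before, derived_before = specimen.record_id, derived_id(specimen)

# The collection code changes over time.
specimen.collection_code = "NEW"

print(specimen.record_id == key_before)       # the curated key is unaffected
print(derived_id(specimen) == derived_before)  # the assembled ID no longer matches
print(search([specimen], collection_code="NEW", catalog_number="289742874239872"))
```

The triplet remains available for discovery through the search function, while links and caches can rely on the stable key.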
Cheers, Tim
Oh my gosh, Tim, you made my day! Have you tried the link you've created? Scroll down below "Kontakt"... :)
From: Tim Robertson [GBIF] [mailto:trobertson@gbif.org]
Sent: Tuesday, 6 May 2014 16:26
To: Dröge, Gabriele
Cc: Gregor Hagedorn; tdwg-content@lists.tdwg.org
Subject: Re: [tdwg-content] delimiter characters for concatenated IDs
participants (3)
- "Dröge, Gabriele"
- Alex Thompson
- Tim Robertson [GBIF]