[tdwg-content] delimiter characters for concatenated IDs

"Dröge, Gabriele" g.droege at BGBM.ORG
Mon May 5 17:21:29 CEST 2014


Hi everyone,

Thanks for your responses. It seems to be a hot topic ;)
I agree a global web service would be fantastic, but if hosted at GBIF it would initially work only for GBIF records. Triple IDs change quite often and are not unique worldwide. The occurrenceID should be unique, but only a few providers use it at all, and fewer still use it as a truly unique identifier.

If you ask me, I am fine with using the triple ID together with the access point (BioCASE URL or DwC-A URL), but unfortunately the relatedResource class in Darwin Core does not allow this. It only contains relatedResourceID. So we have no choice but to use a concatenated version of the triple ID.

We need to refer from DNA to Tissue to Specimen to whatever else. Each of these objects/records can be located in a different institution or database, and only the Specimen data are provided to GBIF.

The pipe (|) occurs in several Catalogue Numbers, so it can’t be used as a delimiter. To me, the fact that § is not found on English keyboards is an advantage, because then it is also less likely to appear in the triple ID.

So I think it would be great if we could discuss this at TDWG this year, but we (GGBN) need a solution now. Either we build our own or we find a more generic one very soon. I agree that we are quite close to a solution and need a suitable roadmap to realize it.

So I guess we should try to propose another workshop at TDWG if there are still free slots available.

Best,
Gabi


From: Chuck Miller [mailto:Chuck.Miller at mobot.org]
Sent: Monday, 5 May 2014 16:58
To: Robert Guralnick
Cc: Markus Döring; Dröge, Gabriele; tdwg-content at lists.tdwg.org; John Deck; tomc at cs.uoregon.edu; Nico Cellinese
Subject: RE: [tdwg-content] delimiter characters for concatenated IDs

Rob,
The question/debate about the “best” GUID is a complex one that appears unending after about 7 years and counting. Is there any aspect of this question that does not have two (or three) sides, with proponents (some strong and vocal) on each side? We still don’t have a community plurality on any “best” approach, much less a majority. We have a few voices, but we need a chorus.

Chuck

From: robgur at gmail.com On Behalf Of Robert Guralnick
Sent: Monday, May 05, 2014 9:23 AM
To: Chuck Miller
Cc: Markus Döring; Dröge, Gabriele; tdwg-content at lists.tdwg.org; John Deck; tomc at cs.uoregon.edu; Nico Cellinese
Subject: Re: [tdwg-content] delimiter characters for concatenated IDs


We've been examining the use (and misuse) of the DwC triplet, and how that propagates out of local portals and platforms into other ones. The end message from this work (and I am happy to share the manuscript and all the datasets we have compiled and examined) is that it is a _terrible_ choice for a globally unique identifier.

There are so many better choices that don't rely on delimiters or on what is ultimately a non-globally-unique, non-persistent, non-resolvable choice for a (permanent, resolvable, globally unique) identifier. Instead of having this conversation, I wonder why we aren't having one about ALL the other, more rational choices...

Best, Rob


On Mon, May 5, 2014 at 8:14 AM, Chuck Miller <Chuck.Miller at mobot.org> wrote:
Markus,
Didn’t we reach a general consensus within the last couple of years that the vertical pipe (|) was the preferred concatenation symbol?

Chuck

From: tdwg-content-bounces at lists.tdwg.org On Behalf Of Markus Döring
Sent: Monday, May 05, 2014 8:49 AM
To: "Dröge, Gabriele"
Cc: tdwg-content at lists.tdwg.org
Subject: Re: [tdwg-content] delimiter characters for concatenated IDs

Hi Gabi,
can you explain a little more what you are trying to do, maybe giving an example?

It appears to me you are creating (globally) unique identifiers on the basis of various existing fields, which is fine. But when you use these identifiers to create resource relations, they should be considered opaque and you should not need to parse out the underlying pieces again. So in that scenario the character used to concatenate the triplet does not really matter for the end user, as long as the identifier is unique and points to some existing resource, indicated by the occurrenceID in the case of occurrences or the materialSampleID for samples.
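
As a minimal sketch of the opaque-identifier idea above (not anything prescribed by Darwin Core), the Python below follows relatedResourceID links by exact string match only and never splits an identifier on its delimiter. The record layout, the identifier strings, and the helper name follow_chain are all hypothetical.

```python
# Hypothetical index: opaque identifier -> record. The identifier strings are
# invented for illustration; whatever delimiter they contain is irrelevant here.
RECORDS = {
    "XYZ§§DNA§§42": {"kind": "DNA", "relatedResourceID": "XYZ||Tissue||7"},
    "XYZ||Tissue||7": {"kind": "Tissue", "relatedResourceID": "urn:example:specimen:1"},
    "urn:example:specimen:1": {"kind": "Specimen"},
}


def follow_chain(start_id, index):
    """Return the record kinds reached from start_id via relatedResourceID.

    Identifiers are compared as whole strings and never parsed. The walk
    stops at a record without a link, at an unknown identifier, or at a cycle.
    """
    kinds = []
    seen = set()
    current = start_id
    while current in index and current not in seen:
        seen.add(current)
        record = index[current]
        kinds.append(record["kind"])
        current = record.get("relatedResourceID")
    return kinds


print(follow_chain("XYZ§§DNA§§42", RECORDS))
```

In this sketch the choice of delimiter never enters the lookup, which is the point: only whoever mints the identifier needs to know how it was built.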

Best,
Markus



On 05 May 2014, at 15:24, Dröge, Gabriele <g.droege at BGBM.ORG> wrote:

Hi everyone,

I guess there might have been some discussions about proper delimiter characters in the past that I have missed.

In several projects, first of all in GGBN (Global Genome Biodiversity Network, http://www.ggbn.org), a decision is needed now. We need to reference between different records and databases, and within Darwin Core we want to use the relatedResourceID to do so.

During our GGBN workshop at TDWG last year we agreed on concatenating the traditional triple ID (Catalogue Number, Collection Code, Institution Code) and adding further parameters if required (e.g. GUID, access point). We have checked those parameters and definitely cannot use a single character as a delimiter.

So my question to you is whether there are already any suggestions for using two characters together as a delimiter. It would be great if we could find a solution more than one community could agree on.

Otherwise I would like to open the discussion and suggest "\\", "||", "\|", "§|", "§§", or "\§".
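
One way to compare these candidates is to test them against the actual field values. The Python below is a rough sketch under assumptions: the sample records are invented, the field order simply follows the one listed above (Catalogue Number, Collection Code, Institution Code), and safe_delimiters and concatenate are hypothetical helper names.

```python
# The six two-character candidates suggested above.
CANDIDATE_DELIMITERS = ["\\\\", "||", "\\|", "§|", "§§", "\\§"]

# Invented sample records: (catalogue number, collection code, institution code).
SAMPLE_RECORDS = [
    ("B 10 0000001", "Herbarium", "B"),
    ("DB|42", "DNA Bank", "XYZ"),
    ("7||a", "Tissue", "XYZ"),
]


def safe_delimiters(records, candidates=CANDIDATE_DELIMITERS):
    """Return the candidates that occur in no field value of any record."""
    values = [value for record in records for value in record]
    return [d for d in candidates if not any(d in v for v in values)]


def concatenate(record, delimiter):
    """Join the fields of one record into a single identifier string.

    Raises ValueError if the delimiter already occurs inside a field,
    because the result could then no longer be split unambiguously.
    """
    if any(delimiter in value for value in record):
        raise ValueError(f"delimiter {delimiter!r} occurs in record {record!r}")
    return delimiter.join(record)


usable = safe_delimiters(SAMPLE_RECORDS)
print(usable)
print(concatenate(SAMPLE_RECORDS[0], usable[0]))
```

A candidate that passes this check today can still collide with values added later, so the check would need to run whenever new records are concatenated.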

Best wishes,
Gabi
-----------------------------------------------------------------
Gabriele Droege
Coordinator - DNA Bank Network
Global Genome Biodiversity Network (GGBN)
Berlin-Dahlem DNA Bank
Women's Officer ZE BGBM

Botanic Garden and Botanical Museum Berlin-Dahlem
Freie Universität Berlin
Koenigin-Luise-Str. 6-8
14195 Berlin
Germany

+49 30 838 50 139

www.dnabank-network.org
www.ggbn.org
www.bgbm.org
_______________________________________________
tdwg-content mailing list
tdwg-content at lists.tdwg.org
http://lists.tdwg.org/mailman/listinfo/tdwg-content


