[tdwg-content] delimiter characters for concatenated IDs

Bob Morris morris.bob at gmail.com
Mon May 5 18:46:06 CEST 2014


Chuck

Hilmar is not proposing a service for management of all identifiers;
he is proposing discovery of existing, preferably resolvable and
dereferenceable, identifiers based on queries for specimen record
metadata such as DwC triplets, together with minting of resolvable
ones when none is discoverable. Except on performance grounds (and
possibly not even then), this does not even require that all the
discoverable identifiers be held on the same machine that hosts the
proposed service, nor even on a single machine at all.
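A minimal sketch of the lookup-or-mint behavior described above, assuming an in-memory dictionary stands in for whatever (possibly distributed) index a real service would query. The `Triplet`, `TripletResolver`, `register` and `resolve` names and the base URL are hypothetical. Minting uses random UUIDs, so new identifiers stay opaque instead of encoding the triplet.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class Triplet:
    """Darwin Core triplet used only as a lookup key, never as the identifier."""
    institution_code: str
    collection_code: str
    catalog_number: str


class TripletResolver:
    """Hypothetical discovery-or-mint service backed by an in-memory dict."""

    def __init__(self, base_url: str) -> None:
        self.base_url = base_url.rstrip("/")
        self._index: dict[Triplet, str] = {}

    def register(self, triplet: Triplet, identifier: str) -> None:
        # Record an identifier that already exists elsewhere (e.g. found by
        # harvesting); an earlier registration for the same triplet wins.
        self._index.setdefault(triplet, identifier)

    def resolve(self, triplet: Triplet) -> str:
        # Return a known identifier if one has been discovered...
        found = self._index.get(triplet)
        if found is not None:
            return found
        # ...otherwise mint an opaque one under the service's own base URL
        # and remember it, so repeated queries return the same identifier.
        minted = f"{self.base_url}/{uuid.uuid4()}"
        self._index[triplet] = minted
        return minted
```

Because the triplet is only a query key, the service does not depend on any choice of delimiter for concatenating it.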

Hilmar's proposal, which I agree is useful and simple to accomplish,
is independent of the quality, syntax, specification or utility of the
returned identifiers, all of which have been argued at length in this
thread and on this list since the beginning of time. Producing such a
service is not beyond the skills required for an assignment in an
undergraduate software engineering course, and it could certainly be
accomplished in a few days' hackathon such as Hilmar proposes. As with
any discovery service, its ultimate utility depends on the minters
promoting the underlying discoverability of the identifiers themselves.
But that too is fairly trivial and well understood, e.g. by listing the
identifiers in the resolvers' sitemaps, published in the standard way
so that the major web crawlers can find and index them. An example is [1].
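As a sketch of that sitemap listing, assuming each identifier is an HTTP(S) URL that the resolver itself serves, the following builds one sitemap document in the sitemaps.org XML format with only Python's standard library. `build_sitemap` is a hypothetical helper; the 50,000-URL cap per file comes from the sitemaps.org protocol, and larger collections would be split across several files referenced from a sitemap index.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000  # per-file limit in the sitemaps.org protocol


def build_sitemap(identifier_urls: list[str]) -> str:
    """Return a sitemap <urlset> document listing each identifier URL in a <loc>."""
    if len(identifier_urls) > MAX_URLS_PER_SITEMAP:
        raise ValueError("too many URLs: split them across files listed in a sitemap index")
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for url in identifier_urls:
        entry = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        # ElementTree escapes characters such as '&' when serializing.
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = url
    # Serialize with the sitemap namespace as the default xmlns (no prefixes).
    body = ET.tostring(urlset, encoding="unicode", default_namespace=SITEMAP_NS)
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

The resulting file would then be placed on the resolver's host and announced in the ways described in [1].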

[1] Sitemap Formats and Guidelines
https://support.google.com/webmasters/answer/183668?hl=en

On Mon, May 5, 2014 at 10:54 AM, Chuck Miller <Chuck.Miller at mobot.org> wrote:
> Hilmar,
>
> A “global” resolver that manages globally unique resolvable identifiers for
> every single specimen record in the world (billions?) as a web-service
> should be operated by a hosting facility with a global charter and globally
> funded resources.  That is the definition of GBIF to my understanding.  What
> other specimen/observation repository has greater critical mass to “mint”
> and maintain GUIDs for all the world?
>
>
>
> Chuck
>
>
>
> From: hilmar.lapp at gmail.com [mailto:hilmar.lapp at gmail.com] On Behalf Of
> Hilmar Lapp
> Sent: Monday, May 05, 2014 9:47 AM
> To: Robert Guralnick
> Cc: Chuck Miller; tdwg-content at lists.tdwg.org; John Deck;
> tomc at cs.uoregon.edu
>
>
> Subject: Re: [tdwg-content] delimiter characters for concatenated IDs
>
>
>
> I couldn't agree more.
>
>
>
> I would also ask why there still isn't a global resolver as a web-service
> that takes specimen metadata as input (such as the DwC triplet) and returns
> globally unique resolvable identifiers, minting them if necessary. If the
> technologically savvy people of this community came together, this could be
> built at least as a prototype in a couple of days. As I've suggested to
> iDigBio before, they could hold a hackathon on this, commit to hosting and
> further developing the outcome, and the problem would be solved once and for
> all. It would arguably be fully within their mandate.
>
>
>
> If, instead of the many workshops that have been held to talk about the
> problem, we as a community finally willed ourselves to actually solve
> it, that part really isn't so difficult.
>
>
>
>   -hilmar
>
>
>
> On Mon, May 5, 2014 at 10:23 AM, Robert Guralnick
> <Robert.Guralnick at colorado.edu> wrote:
>
>
>
>   We've been examining the use (and misuse) of the DwC triplet, and how that
> propagates out of local portals and platforms into other ones. The end
> message from this work (and I am happy to share the manuscript and all the
> datasets we have compiled and examined) is that it is a _terrible_ choice
> for a globally unique identifier.
>
>
>
>    There are so many better choices that don't rely on delimiters or on
> what is ultimately a non-globally-unique, non-persistent, non-resolvable
> choice for a (permanent, resolvable, globally unique) identifier. Rather
> than having this conversation, I wonder why we aren't having one about
> ALL the other more rational choices...
>
>
>
> Best, Rob
>
> On Mon, May 5, 2014 at 8:14 AM, Chuck Miller <Chuck.Miller at mobot.org> wrote:
>
> Markus,
>
> Didn’t we reach a general consensus within the last couple of years that the
> vertical pipe (|) was the preferred concatenation symbol?
>
>
>
> Chuck
>
>
>
> From: tdwg-content-bounces at lists.tdwg.org
> [mailto:tdwg-content-bounces at lists.tdwg.org] On Behalf Of Markus Döring
> Sent: Monday, May 05, 2014 8:49 AM
> To: "Dröge, Gabriele"
> Cc: tdwg-content at lists.tdwg.org
> Subject: Re: [tdwg-content] delimiter characters for concatenated IDs
>
>
>
> Hi Gabi,
>
> can you explain a little more what you are trying to do, perhaps with an
> example?
>
>
>
> It appears to me you are creating (globally) unique identifiers on the basis
> of various existing fields which is fine. But when you use the identifier to
> create resource relations they should be considered opaque and you should
> not need to parse out the underlying pieces again. So in that scenario the
> character used to concatenate the triplet does not really matter for the end
> user as long as it's unique and points to some existing resource, indicated
> by the occurrenceID in the case of occurrences or the materialSampleID for
> samples.
>
>
>
> Best,
>
> Markus
>
> On 05 May 2014, at 15:24, Dröge, Gabriele <g.droege at BGBM.ORG> wrote:
>
>
>
> Hi everyone,
>
>
>
> I guess there might have been some discussions about proper delimiter
> characters in the past that I have missed.
>
>
>
> In several projects, above all GGBN (Global Genome Biodiversity
> Network, http://www.ggbn.org), we need to make a decision now. We
> need to create references between different records and databases, and
> within Darwin Core we want to use relatedResourceID to do so.
>
>
>
> During our GGBN workshop at TDWG last year we agreed on concatenating the
> traditional triple ID (Catalogue Number, Collection Code, Institution Code)
> and adding further parameters if required (e.g. GUID, access point). We
> have checked those parameters and definitely cannot use a single character
> as a delimiter.
>
>
>
> So my question to you is whether there are already any suggestions for
> using two characters together as a delimiter. It would be great if we could
> find a solution that more than one community could agree on.
>
>
>
> Otherwise I would like to open the discussion and suggest "\\", "||", "\|",
> "§|", "§§", or "\§".
>
>
>
> Best wishes,
>
> Gabi
>
> -----------------------------------------------------------------
>
> Gabriele Droege
>
> Coordinator - DNA Bank Network
>
> Global Genome Biodiversity Network (GGBN)
>
> Berlin-Dahlem DNA Bank
>
> Women's Officer ZE BGBM
>
>
>
> Botanic Garden and Botanical Museum Berlin-Dahlem
>
> Freie Universität Berlin
>
> Koenigin-Luise-Str. 6-8
>
> 14195 Berlin
>
> Germany
>
>
>
> +49 30 838 50 139
>
>
>
> www.dnabank-network.org
>
> www.ggbn.org
>
> www.bgbm.org
>
> _______________________________________________
> tdwg-content mailing list
> tdwg-content at lists.tdwg.org
> http://lists.tdwg.org/mailman/listinfo/tdwg-content
>
>
>
>
>
>
>
>
>
> --
>
> Hilmar Lapp -:- informatics.nescent.org/wiki -:- lappland.io
>
>
>
>
>
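On the delimiter question that started this thread: when a concatenated identifier must be split back into its parts (which Markus argues should not be necessary if identifiers are treated as opaque), a two-character delimiter such as "||" only makes collisions rarer; a field that happens to contain the delimiter still breaks naive splitting. A minimal sketch, assuming backslash escaping on top of the single vertical pipe Chuck mentions (an illustrative choice, not an agreed TDWG or GGBN convention); `join_escaped` and `split_escaped` are hypothetical helpers.

```python
ESCAPE = "\\"  # a single backslash


def _check_sep(sep: str) -> None:
    # This sketch only handles one-character separators distinct from the escape.
    if len(sep) != 1 or sep == ESCAPE:
        raise ValueError("separator must be a single character other than backslash")


def join_escaped(parts: list[str], sep: str = "|") -> str:
    """Concatenate fields, escaping backslashes first and then the separator."""
    _check_sep(sep)

    def esc(s: str) -> str:
        return s.replace(ESCAPE, ESCAPE * 2).replace(sep, ESCAPE + sep)

    return sep.join(esc(p) for p in parts)


def split_escaped(joined: str, sep: str = "|") -> list[str]:
    """Inverse of join_escaped: unescaped separators end a field."""
    _check_sep(sep)
    parts: list[str] = []
    current: list[str] = []
    i = 0
    while i < len(joined):
        ch = joined[i]
        if ch == ESCAPE and i + 1 < len(joined):
            current.append(joined[i + 1])  # keep the escaped character literally
            i += 2
        elif ch == sep:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    parts.append("".join(current))
    return parts
```

For instance, `"||".join(["INST", "COLL||X", "0001"]).split("||")` yields four parts instead of three, whereas `split_escaped(join_escaped(fields))` returns the original fields even when they contain the separator or backslashes.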



-- 
Robert A. Morris

Emeritus Professor  of Computer Science
UMASS-Boston
100 Morrissey Blvd
Boston, MA 02125-3390


Filtered Push Project
Harvard University Herbaria
Harvard University

email: morris.bob at gmail.com
web: http://efg.cs.umb.edu/
web: http://wiki.filteredpush.org
http://www.cs.umb.edu/~ram
===
The content of this communication is made entirely on my
own behalf and in no way should be deemed to express
official positions of The University of Massachusetts at Boston or
Harvard University.

