[tdwg-content] delimiter characters for concatenated IDs

Mon May 5 19:29:26 CEST 2014

Some quick points.

1. Unless I’m mistaken, there seems to be some conflation of two separate questions, namely:

(a) what is the best delimiter to use when concatenating strings to make an identifier (e.g., the Darwin Core Triplet)

(b) what is the best delimiter to use when putting multiple values in the same field (which is what I think Chuck is referring to when he recommends the pipe symbol “|”)

I think Gabriele is asking (a).
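To make the distinction concrete, here is a minimal sketch that keeps the two delimiters apart. The helper names `make_triplet` and `split_multivalue` are hypothetical, and the choice of ":" for (a) and "|" for (b) is simply an assumption that follows the conventions mentioned in this thread, not a recommendation.

```python
# Hypothetical helpers; the delimiter choices are assumptions taken from the thread.
TRIPLET_DELIMITER = ":"      # question (a): joining parts into one identifier
MULTIVALUE_DELIMITER = "|"   # question (b): several values in a single field


def make_triplet(institution_code, collection_code, catalog_number):
    """Concatenate the three parts into a Darwin Core Triplet string."""
    parts = [institution_code, collection_code, catalog_number]
    for part in parts:
        # A part that already contains the delimiter could not be split back out.
        if TRIPLET_DELIMITER in part:
            raise ValueError(f"{part!r} contains the delimiter {TRIPLET_DELIMITER!r}")
    return TRIPLET_DELIMITER.join(parts)


def split_multivalue(field):
    """Split a field holding several values, dropping surrounding whitespace."""
    return [v.strip() for v in field.split(MULTIVALUE_DELIMITER) if v.strip()]


triplet = make_triplet("YPM", "MAM", "140180")
values = split_multivalue("YPM:MAM:140180 | YPM:MAM:140181")
```

The two delimiters have to differ, otherwise a field holding several triplets could not be split unambiguously.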

Gabriele wants a solution “now”. If it’s simply a case of a convention to create an identifier string that may or may not have any meaning or persistence, then any solution will do. You could follow what the NCBI have done, for example, and use Darwin Core Triplets such as YPM:MAM:140180 (which can be resolved, at present at least, at http://peabody.research.yale.edu/cgi-bin/Query.Ledger?LE=mam&SU=0&ID=14080). This solves things for “now”, but what about tomorrow?

Contrary to Hilmar, there is more to this than simply a quick hackathon. Yes, a service that takes metadata and returns one or more identifiers is a good idea and easy to create (there will often be more than one because museum codes are not unique). But who maintains this service? Who maintains the identifiers? Who do I complain to if they break? How do we ensure that they persist when, say, a museum closes down, moves its collection, or changes its web technology? Who provides the tools that add value to the identifiers? (There’s no point having them if they are not useful.)
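As a rough illustration of why such a lookup can return more than one identifier, here is a toy sketch. Everything in it is hypothetical: the registry contents, the codes, the example.org URLs, and the function name `lookup` are invented for illustration and do not describe any real service.

```python
# Hypothetical in-memory registry. All codes and identifiers below are invented.
# Two different institutions happen to share the institution code "XYZ".
REGISTRY = [
    {"institutionCode": "XYZ", "collectionCode": "MAM", "catalogNumber": "1",
     "id": "https://example.org/museum-a/specimen/1"},
    {"institutionCode": "XYZ", "collectionCode": "MAM", "catalogNumber": "1",
     "id": "https://example.org/museum-b/specimen/1"},
    {"institutionCode": "ABC", "collectionCode": "MAM", "catalogNumber": "1",
     "id": "https://example.org/museum-c/specimen/1"},
]


def lookup(institution_code, collection_code, catalog_number):
    """Return every identifier whose metadata matches the three codes."""
    key = (institution_code, collection_code, catalog_number)
    return [
        record["id"]
        for record in REGISTRY
        if (record["institutionCode"], record["collectionCode"],
            record["catalogNumber"]) == key
    ]


ambiguous = lookup("XYZ", "MAM", "1")   # two candidates, the caller must choose
unique = lookup("ABC", "MAM", "1")      # exactly one candidate
```

The lookup itself is the easy part; the sketch says nothing about who keeps the registry accurate over time.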

We have an obvious role model for how to do this stuff well, and that is CrossRef. Regardless of what you think of DOIs, CrossRef is a model of exactly the sort of thing we need. There’s more infrastructure here than simply a look-up service. As an aside, the idea of a metadata-based lookup is equivalent to OpenURL in the bibliographic world, which was all the rage for a while until people wanted something better, and now we have DOIs. We seem determined to reinvent the painful steps others have taken, rather than learn from others and actually solve the problem. It really is painful to watch.

Yes, GBIF is the obvious place to centralise a lot of this, but it would require that GBIF can maintain stable ids for specimens. So far it can’t do this, principally because it relies on metadata provided by museums and herbaria to recognise whether a record is new or an existing one, and, guess what, the metadata keeps changing :(

If you want GBIF to do this, are you happy that every specimen gets a GBIF URL? If not, what are you going to suggest? Perhaps every institution mints its own URL, say like Peabody has above. Anyone want to place a bet on how long http://peabody.research.yale.edu/cgi-bin/Query.Ledger?LE=mam&SU=0&ID=14080 is going to survive as a resolvable URL? Anybody know how I can get machine readable data from that URL? Some botanical institutions are minting fairly clean looking URLs, but how do we discover these? How do we find URLs for every collection?

My prediction is that eventually we will learn from the experience of academic publishers, who went through pretty much all of this hurt a decade ago, facing pretty much exactly the same issues (although in their case even more pressing because actual money was at stake), and who came up with a solution that has clearly worked (in their case DOIs plus CrossRef services). We will finally realise that this requires resources, and that it requires thinking strategically about what we want (what’s the bigger picture?) rather than relying on local, small-scale, half-baked solutions. Until then, we can’t have nice shiny things.

Regards

Rod