[tdwg-content] most GUIDs/URIs for names/taxon stuff not ready for prime time

Roderic Page r.page at bio.gla.ac.uk
Thu Jan 6 09:28:09 CET 2011

Dear Steve,

Yep, the Emperor has no clothes.

I don't think the failure of LSIDs is completely down to LSID-specific  
issues. HTTP URIs in the linked data sense have their own issues, and  
if you follow the technical discussions every so often people get very  
agitated about the (in)famous HTTP 303 redirect solution to  
information versus non-information resources. DOIs are also  
technically non-trivial to implement.
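
For readers unfamiliar with the 303 pattern mentioned above, here is a minimal sketch of the idea. The /id/ and /doc/ URI layout and the respond() function are assumptions made up for illustration; they are not taken from any of the services discussed in this thread. A URI naming the thing itself answers with a 303 redirect to a document about the thing, and the Accept header picks which document.

```python
# Hypothetical URI layout (an assumption for this sketch):
#   /id/<key>        names the thing itself (a "non-information resource")
#   /doc/<key>.rdf   RDF document describing the thing
#   /doc/<key>.html  HTML document describing the thing

def respond(path: str, accept: str = "*/*") -> tuple[int, dict[str, str]]:
    """Return (status, headers) for a GET request, following the 303 pattern."""
    if path.startswith("/id/"):
        key = path[len("/id/"):]
        # Crude content negotiation: ignores q-values and wildcards.
        ext = "rdf" if "application/rdf+xml" in accept else "html"
        # 303 See Other: the thing cannot be sent, so point at a document about it.
        return 303, {"Location": f"/doc/{key}.{ext}", "Vary": "Accept"}
    if path.startswith("/doc/"):
        ctype = "application/rdf+xml" if path.endswith(".rdf") else "text/html"
        return 200, {"Content-Type": ctype}
    return 404, {}
```

A semantic web client asking for application/rdf+xml is sent to the .rdf document, while a browser ends up at the .html page. Every lookup costs an extra round trip, which is one of the things people argue about.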

I suspect the failure of LSIDs is due to a combination of issues:

1. Most biodiversity data providers are small, poorly resourced  
operations that can't guarantee 24/7 operations

2. Nobody provides central support/monitoring of LSIDs (i.e., there is  
no CrossRef equivalent where you can report a broken LSID and have a  
reasonable expectation that someone will fix it).

3. The community as a whole doesn't value unique identifiers (if they  
did, we wouldn't be in this mess)

4. There are no LSID consumers. The scientific publishing industry's
linking infrastructure depends on DOIs, and publishers use CrossRef
services as part of their publishing workflow. There is no equivalent
for our field (another way of restating #3).

These issues won't go away simply by replacing LSIDs with HTTP URIs.  
Some of the linked data sources go belly up fairly regularly (notably http://bio2rdf.org).

Regarding specifics, Catalogue of Life LSIDs have been broken for over
a year (see http://iphylo.blogspot.com/2010/01/thoughts-on-international-year-of.html).
Nobody seems to care (see #3 above).

IPNI supports versioning of LSIDs, but the LSID without the version is
also valid (it resolves to the latest version). I think this is a
perfectly valid thing to do. Versioning matters to some, but not
everyone cares about it; IPNI supports both views. If you want to cite
a specific version, do so; if not, don't.
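
The versioned/unversioned distinction can be sketched in a few lines, assuming the standard LSID layout urn:lsid:<authority>:<namespace>:<object>[:<revision>]. The object identifier and revision in the usage example are made up for illustration, and the proxy URL form (proxy base followed by the LSID) is an assumption about how the proxies mentioned in this thread are addressed.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None  # absent means "latest version"

    def __str__(self) -> str:
        parts = ["urn", "lsid", self.authority, self.namespace, self.object_id]
        if self.revision is not None:
            parts.append(self.revision)
        return ":".join(parts)


def parse_lsid(text: str) -> LSID:
    """Split an LSID string into its components; the revision is optional."""
    parts = text.strip().split(":")
    prefix_ok = [p.lower() for p in parts[:2]] == ["urn", "lsid"]
    if len(parts) not in (5, 6) or not prefix_ok or not all(parts[2:5]):
        raise ValueError(f"not an LSID: {text!r}")
    return LSID(*parts[2:])


def unversioned(lsid: LSID) -> LSID:
    """Drop the revision, giving the form that refers to the latest version."""
    return lsid._replace(revision=None)


def proxy_url(lsid: LSID, proxy: str = "http://lsid.tdwg.org/") -> str:
    """Build an HTTP proxy URL (assumed form: proxy base + LSID)."""
    return proxy.rstrip("/") + "/" + str(lsid)
```

For example, parse_lsid("urn:lsid:ipni.org:names:12345-1:1.1") keeps the revision "1.1" for anyone who wants to cite that exact version, while unversioned() of the same value gives the identifier to use if you only care about the name.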

Index Fungorum's web services are broken (see #1 above).

The data/metadata distinction is a complete red herring and has
sidetracked more LSID conversations than I care to remember. The
distinction stems from LSIDs originally being conceived as identifiers
for large data objects (e.g., sequences, images, binary streams) that
people wanted accurately versioned so that they could reproduce digital
experiments. In this scenario, metadata can change because it's not
what mattered for this reproducibility. Some people think of data
about taxonomic names as "metadata", and then we're off on a wild
goose chase about whether LSIDs should change when metadata changes. I
wouldn't lose any sleep over this (see discussion of IPNI above).

uBio is one of the few success stories of biodiversity informatics  
(props to Dave Remsen and David Patterson for setting this up, and  
people like Anthony Goddard for keeping it running). They've had LSIDs  
running since about 2005, and pretty much the only reason I keep my  
LSID server running is because they use some vocabulary terms I coined  
way back when.

Regarding DOIs for books, these are uncommon, although there are moves  
to express ISBNs within the DOI framework, so there may be more of  
these. But for a DOI to exist someone has to claim ownership of a  
resource and register the DOI with CrossRef. For a lot of older  
literature there won't be a publisher around to do this, but that's  
where BHL comes in.

In summary, we're in a mess, and I don't think this is really down to  
technology. It's a failure of our community to create the appropriate  
resources (e.g., centralised, curated resources of identifiers and  
associated metadata for names, publications, and specimens).



On 6 Jan 2011, at 05:32, Steve Baskauf wrote:

> Well, I have continued my quest for resolvable, RDF-producing GUIDs for
> taxon/name-related stuff.  I have gotten a lot of good information from
> reading Rod Page's BMC Bioinformatics paper
> (http://dx.doi.org/10.1186/1471-2105-10-S14-S5) and from investigating
> his http://bioguid.info/ site.
> From the standpoint of the "sec./sensu" part of a TNU/taxon concept,
> based on the recent discussion, it sounds like the DOI solution for
> publications is a good direction to go IF resolution services producing
> RDF come into existence and IF it becomes possible to actually search
> for the DOIs of more obscure publications.  I tried using Rod's site to
> look up a journal article using the ISSN, volume, and page and the web
> interface found the DOI and generated RDF just fine.  However, an
> attempt to use the web to find the DOI of Gleason and Cronquist's Manual
> of vascular plants of Northeastern United States and adjacent Canada
> failed despite a half hour of effort (I found the UPC, the LOC call
> number, and the ISBN, but no DOI).  Maybe there just isn't a DOI for it
> but there should be a way for me to know that.  So DOIs for books and
> old journal articles are not really ready for prime time.
> From the standpoint of the "scientific name" part of a TNU/taxon
> concept, I had better luck (sort of).  Rod's "Status of biodiversity
> services" page (http://www.bioguid.info/status/) was really cool.  I saw
> resources I hadn't known about before.  I tried out several of the
> services that claimed to issue LSIDs.
> Catalog of Life's LSIDs didn't work with either the
> http://www.bioguid.info/ or http://lsid.tdwg.org/ proxies with either a
> web browser or the OpenLink RDF browser.  I only got an empty RDF
> element in response.
> Index Fungorum was down.
> IPNI seemed to work.  However, I was somewhat appalled to observe that
> they seem to change the revision identifier any time that they change
> any part of the metadata.  That renders the LSID useless as a permanent
> GUID for the name and I believe is inconsistent with the design, where
> the revision is only supposed to change if the underlying data
> itself (NOT metadata) changes.  (Catalog of Life says that they change
> the revision identifier EACH YEAR for all of their records!  That's even
> worse!)  If I'm remembering the TDWG LSID recommendations, it is not
> even recommended to use the revision part of an LSID at all in the
> biodiversity informatics context.
> ubio.org's LSIDs seemed to work properly.
> [sorry - didn't try zoobank since I was looking for plants]
> I don't know which (if any) of the Web sites listed on Rod's status page
> use generic HTTP URI guids (rather than LSIDs) to refer to taxon names.
> I tried out the Global Names index that Pete was mentioning.  The URI
> version of the UUIDs (e.g.
> http://gni.globalnames.org/name_strings/3a70f04d-fd29-5570-ba91-52dae0c3d07f)
> do resolve under content negotiation, but the only useful information
> that the RDF representation seems to provide is the actual name string
> that was used to generate the UUID.  Until some other useful linked
> information is added to the RDF, there doesn't seem to be much advantage
> in pointing a semantic client to the URI over just using a string
> literal for the name.
> So the bottom line is that of the LSID services for names that I've
> tried so far, only ubio.org seems to have LSIDs for names that are
> unchanging, can work as a proxied URI,  and that produce actual useful
> RDF.  That's pretty disappointing given the apparently huge amount of
> work that's been put into building these various systems.
> Steve
> -- 
> Steven J. Baskauf, Ph.D., Senior Lecturer
> Vanderbilt University Dept. of Biological Sciences
> postal mail address:
> VU Station B 351634
> Nashville, TN  37235-1634,  U.S.A.
> delivery address:
> 2125 Stevenson Center
> 1161 21st Ave., S.
> Nashville, TN 37235
> office: 2128 Stevenson Center
> phone: (615) 343-4582,  fax: (615) 343-6707
> http://bioimages.vanderbilt.edu

Roderic Page
Professor of Taxonomy
Institute of Biodiversity, Animal Health and Comparative Medicine
College of Medical, Veterinary and Life Sciences
Graham Kerr Building
University of Glasgow
Glasgow G12 8QQ, UK

Email: r.page at bio.gla.ac.uk
Tel: +44 141 330 4778
Fax: +44 141 330 2792
AIM: rodpage1962 at aim.com
Facebook: http://www.facebook.com/profile.php?id=1112517192
Twitter: http://twitter.com/rdmpage
Blog: http://iphylo.blogspot.com
Home page: http://taxonomy.zoology.gla.ac.uk/rod/rod.html
