[tdwg-content] most GUIDs/URIs for names/taxon stuff not ready for prime time

joel sachs jsachs at csee.umbc.edu
Thu Jan 6 16:31:32 CET 2011

On Thu, 6 Jan 2011, Steve Baskauf wrote:

> A quick comment about ubio.org's LSID resolution:  I took a look at the 
> complete RDF source and noticed that the namespace declarations for "ubi:" 
> and "gla:" don't start with "http://".  I'm pretty sure that isn't kosher 
> RDF,

It is kosher. Linked Data best practice is to use only http URIs, but RDF 
resources can be identified by any type of URI reference [1]. In my wild 
idea talk in Montpellier, I advocated incorporating non-http URIs 
(specifically LSIDs) into Linked Data. Tim Finin and I later fleshed this 
idea out slightly [2]. Backtracking, I'm not sure LSIDs *should* be 
part of Linked Data, but I'm certain that they *can* be.


1. http://www.w3.org/TR/rdf-schema/
2. http://ebiquity.umbc.edu/paper/html/id/485/What-Does-it-Mean-for-a-URI-to-Resolve-
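
To make the point concrete, here is a minimal sketch (not ubio.org's actual 
RDF) of an RDF/XML fragment whose property namespace is a urn:lsid URI rather 
than an http one. The "ex:" prefix, the example.org authority, and the 
canonicalName property are all made up for illustration. A namespace-aware 
XML parser accepts it, and the predicate URI is simply the namespace URI 
concatenated with the local name. The triples() helper only handles flat 
rdf:Description blocks with literal values; it is not a general RDF parser.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical RDF/XML: the "ex:" namespace URI and the LSID are invented.
RDF_XML = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:ex="urn:lsid:example.org:predicates:">
  <rdf:Description rdf:about="urn:lsid:example.org:names:12345">
    <ex:canonicalName>Quercus alba</ex:canonicalName>
  </rdf:Description>
</rdf:RDF>"""


def triples(rdf_xml):
    """Return (subject, predicate, literal) tuples from flat rdf:Description blocks."""
    root = ET.fromstring(rdf_xml)
    out = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            # ElementTree stores a qualified name as "{namespace}local";
            # RDF/XML forms the predicate URI by concatenating the two parts.
            namespace, local = prop.tag[1:].split("}")
            out.append((subject, namespace + local, prop.text))
    return out


for triple in triples(RDF_XML):
    print(triple)
```

Nothing in the parsing step cares whether the namespace starts with 
"http://"; that requirement comes from Linked Data practice, not from RDF.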

> although neither of the RDF validators complained about it (doesn't give 
> me much confidence in RDF validators).  So much for being overly optimistic 
> about there being one source that worked completely. :-)
> I should say that I'm not an advocate of LSIDs.  I actually don't like them 
> at all.  I simply investigated them in this context as something that might 
> work in a Linked Data context.  I should also say that I'm not necessarily an 
> advocate of Linked Data - I'm in the "wait and see" camp.  I'd like to give 
> it a chance but I'm not betting the store on it.  I AM an advocate of having 
> identifiers that are persistent and globally unique and it appears that both 
> Catalog of Life and IPNI (in my view) flunk the persistence test if you just 
> look at the URI (LSID with version) as a string (which you SHOULD be able to 
> do) that can be an unchanging identifier.  We desperately need persistent and 
> globally unique identifiers for a lot of things.  If they resolve to RDF/XML 
> then that's a bonus.  We need an Emperor, clothing is somewhat optional.
> A few specific responses:
> Roderic Page wrote:
>> These issues won't go away simply by replacing LSIDs with HTTP URIs.  Some 
>> of the linked data sources go belly up fairly regularly (notably 
>> http://bio2rdf.org ).
> The reason I see HTTP URIs [capable of producing either HTML or RDF/XML 
> through content negotiation] as something of greater value than LSIDs is that 
> people can at a minimum get a web page out of it.  That's the cake.  If some 
> people can also use them to get RDF for the Linked Data dream, that's the 
> icing.  The fact that HTTP URI GUIDs sometimes fail to "work" is no different 
> from the fact that regular web URLs sometimes fail to work.  That doesn't stop 
> people from using the web and it shouldn't stop well-designed URIs (see 
> http://www.w3.org/TR/cooluris/ and http://www.w3.org/Provider/Style/URI) 
> from being used as GUIDs.
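
The "cake and icing" arrangement described above is ordinary HTTP content 
negotiation. As a rough sketch, assuming a server that offers exactly two 
representations (text/html and application/rdf+xml), the choice could be made 
from the client's Accept header like this. The function names are 
hypothetical, and the parsing is simplified: it honours q-values and "*/*" 
but ignores partial wildcards such as "text/*".

```python
OFFERED = ("text/html", "application/rdf+xml")


def parse_accept(header):
    """Parse an Accept header into (media_type, q) pairs, highest q first."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # a missing q parameter means full preference
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs.append((media_type, q))
    # sorted() is stable, so ties keep the order the client listed them in
    return sorted(prefs, key=lambda pref: pref[1], reverse=True)


def choose_representation(accept_header, default="text/html"):
    """Pick the representation to serve; fall back to the web page."""
    for media_type, q in parse_accept(accept_header):
        if q <= 0:
            continue
        if media_type in OFFERED:
            return media_type
        if media_type == "*/*":
            return default
    return default


# A semantic client asks for RDF; a browser-style header gets the web page.
print(choose_representation("application/rdf+xml"))
print(choose_representation("text/html,application/xhtml+xml,*/*;q=0.8"))
```

Defaulting to HTML matches the argument in the quoted paragraph: anyone who 
follows the URI gets at least a web page.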
>> IPNI supports versioning of LSIDs, but the LSID without the version is 
>> also valid (it resolves to the latest version). I think this is a 
>> perfectly valid thing to do. Versioning matters to some, but not everyone 
>> cares about it; IPNI supports both views. If you want to cite a specific 
>> version, do so; if not, don't.
> OK, I guess as long as people can just leave the version off if they want, 
> they'll get unchanging strings for their identifiers and so I guess that 
> moves them into the "usable" column.  But as far as I'm concerned, putting 
> the versions on adds to the confusion.  If I can say this without launching 
> an unnecessary thread about the opacity of GUIDs, nobody is supposed to be 
> looking at any GUID itself to infer meaning about the identified object. 
> That "no-no" is exactly what Catalog of Life and IPNI appear to be 
> encouraging with the LSID version numbers they are tacking on. 
> Get rid of them.
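
For anyone who wants the unchanging string, dropping the version is 
mechanical. An LSID has the form urn:lsid:authority:namespace:object with an 
optional trailing :revision. The sketch below splits on colons and discards 
the revision if present. The helper names and the example.org LSIDs are 
hypothetical, and the sketch does not validate the individual components 
beyond counting them.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text):
    """Split an LSID string into its components; raise ValueError otherwise."""
    parts = text.split(":")
    if (len(parts) not in (5, 6)
            or parts[0].lower() != "urn"
            or parts[1].lower() != "lsid"):
        raise ValueError(f"not an LSID: {text!r}")
    # With five parts the revision field keeps its default of None.
    return LSID(*parts[2:])


def without_revision(text):
    """Return the LSID with any revision removed (prefix lower-cased)."""
    lsid = parse_lsid(text)
    return ":".join(("urn", "lsid", lsid.authority, lsid.namespace, lsid.object_id))


# Made-up LSID with a revision; the unversioned form is the stable string.
print(without_revision("urn:lsid:example.org:names:12345:1.1"))
```

Whether the unversioned form actually resolves is up to each provider; the 
thread only states that IPNI's does.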
>> The data/metadata distinction is a complete red herring and has 
>> side-tracked more LSID conversations than I care to remember. The distinction 
>> stems from LSIDs originally being conceived as identifiers for large data 
>> objects (e.g., sequences, images, binary streams) that people want 
>> accurately versioned so that they could reproduce digital experiments. In this 
>> scenario, metadata can change because it's not what mattered for this 
>> reproducibility. Some people think of data about taxonomic names as 
>> "metadata", and then we're off on a wild goose chase about whether LSIDs should 
>> change when metadata changes. I wouldn't lose any sleep over this (see 
>> discussion of IPNI above).
> Agreed.  The important thing is that the identifier for a certain "thing" 
> (however the "thing" is defined) should not change.  I have previously 
> expressed the opinion that it won't really work to have any URI (sensu Linked 
> Data) point to "data" and that all URIs serving as GUIDs should be considered 
> to reference non-information resources (which by definition have only 
> "metadata").  Under that scenario, all resource properties are "metadata" and 
> subject to change.  The data vs. metadata argument then becomes irrelevant.
>> Regarding DOIs for books, these are uncommon, although there are moves to 
>> express ISBNs within the DOI framework, so there may be more of these. But 
>> for a DOI to exist someone has to claim ownership of a resource and 
>> register the DOI with CrossRef. For a lot of older literature there won't 
>> be a publisher around to do this, but that's where BHL comes in.
> Then PLEASE BHL, create simple, unchanging URIs [suitable for use in the 
> Linked Data world] to identify your resources.  If you don't want to provide 
> RDF/XML, that's fine for now.  You can wait and see if that is necessary 
> later.  But at least create URIs that COULD be used for Linked Data in the 
> future and that are not long and loaded with "?" and "&" characters.  I don't 
> know how to do that because I'm not a server dude.  But it apparently isn't 
> that hard to do with mod_rewrite.  I know there are at least three people 
> (probably more) on this list who do that routinely - Rod does it with his 
> bioguid.info site (e.g. the "cool" URI http://bioguid.info/doi:10.1093/bib/bbn022 
> gets rewritten into the "ugly" URL 
> http://bioguid.info/openurl.php?id=doi:10.1093/bib/bbn022 but nobody cares as 
> long as the "cool" URI doesn't change). 
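
The rewrite in that example is a one-line pattern substitution. The sketch 
below is not Rod's actual server configuration, which is not shown in this 
thread; it only mimics in Python the mapping between the two URLs quoted 
above. The pattern (any path beginning with "/doi:") and the function name 
are assumptions.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Assumed pattern: a path of the form /doi:<anything>
COOL_PATH = re.compile(r"^/(doi:.+)$")


def rewrite(cool_uri):
    """Map a public "cool" URI to the internal script URL, or return it unchanged."""
    parts = urlsplit(cool_uri)
    match = COOL_PATH.match(parts.path)
    if match is None or parts.query:
        return cool_uri
    # Keep scheme and host; move the identifier into the query string.
    return urlunsplit(
        (parts.scheme, parts.netloc, "/openurl.php", "id=" + match.group(1), "")
    )


print(rewrite("http://bioguid.info/doi:10.1093/bib/bbn022"))
```

The client only ever sees the first form, so the script behind it can be 
renamed or replaced without breaking anyone's stored identifiers.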
> My two cents worth...
> Steve
>> In summary, we're in a mess, and I don't think this is really down to 
>> technology. It's a failure of our community to create the appropriate 
>> resources (e.g., centralised, curated resources of identifiers and 
>> associated metadata for names, publications, and specimens).
>> Regards
>> Rod
>> On 6 Jan 2011, at 05:32, Steve Baskauf wrote:
>>> Well, I have continued my quest for resolvable, RDF-producing GUIDs for
>>> taxon/name-related stuff.  I have gotten a lot of good information from
>>> reading Rod Page's BMC Bioinformatics paper
>>> (http://dx.doi.org/10.1186/1471-2105-10-S14-S5) and from investigating
>>> his http://bioguid.info/ site.
>>>
>>> From the standpoint of the "sec./sensu" part of a TNU/taxon concept,
>>> based on the recent discussion, it sounds like the DOI solution for
>>> publications is a good direction to go IF resolution services producing
>>> RDF come into existence and IF it becomes possible to actually search
>>> for the DOIs of more obscure publications.  I tried using Rod's site to
>>> look up a journal article using the ISSN, volume, and page and the web
>>> interface found the DOI and generated RDF just fine.  However, an
>>> attempt to use the web to find the DOI of Gleason and Cronquist's Manual
>>> of vascular plants of Northeastern United States and adjacent Canada
>>> failed despite a half hour of effort (I found the UPC, the LOC call
>>> number, and the ISBN, but no DOI).  Maybe there just isn't a DOI for it
>>> but there should be a way for me to know that.  So DOIs for books and
>>> old journal articles are not really ready for prime time.
>>>
>>> From the standpoint of the "scientific name" part of a TNU/taxon
>>> concept, I had better luck (sort of).  Rod's "Status of biodiversity
>>> services" page (http://www.bioguid.info/status/) was really cool.  I saw
>>> resources I hadn't known about before.  I tried out several of the
>>> services that claimed to issue LSIDs.
>>>
>>> Catalog of Life's LSIDs didn't work with either the
>>> http://www.bioguid.info/ or http://lsid.tdwg.org/ proxies with either a
>>> web browser or the OpenLink RDF browser.  I only got an empty RDF
>>> element in response.
>>>
>>> Index Fungorum was down.
>>>
>>> IPNI seemed to work.  However, I was somewhat appalled to observe that
>>> they seem to change the revision identifier any time that they change
>>> any part of the metadata.  That renders the LSID useless as a permanent
>>> GUID for the name and, I believe, is inconsistent with the design of LSIDs
>>> where the revision is only supposed to change if the underlying data
>>> itself (NOT metadata) changes.  (Catalog of Life says that they change
>>> the revision identifier EACH YEAR for all of their records!  That's even
>>> worse!)  If I'm remembering the TDWG LSID recommendations correctly, it is
>>> not even recommended to use the revision part of an LSID at all in the
>>> biodiversity informatics context.
>>>
>>> ubio.org's LSIDs seemed to work properly.
>>>
>>> [sorry - didn't try zoobank since I was looking for plants]
>>>
>>> I don't know which (if any) of the Web sites listed on Rod's status page
>>> use generic HTTP URI GUIDs (rather than LSIDs) to refer to taxon names.
>>> I tried out the Global Names Index that Pete was mentioning.  The URI
>>> versions of the UUIDs (e.g.
>>> http://gni.globalnames.org/name_strings/3a70f04d-fd29-5570-ba91-52dae0c3d07f)
>>> do resolve under content negotiation, but the only useful information
>>> that the RDF representation seems to provide is the actual name string
>>> that was used to generate the UUID.  Until some other useful linked
>>> information is added to the RDF, there doesn't seem to be much advantage
>>> in pointing a semantic client to the URI over just using a string
>>> literal for the name.
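
A note on why that UUID carries nothing beyond the name string. The UUID in 
the example URI is a version 5 UUID (the "5" that starts its third group), 
which is the name-based SHA-1 kind: the same input string always yields the 
same UUID. The sketch below shows the mechanism with Python's uuid module. 
The namespace GNI actually uses is not stated in this thread, so NAMESPACE 
is a placeholder derived from example.org, and name_uuid() is a hypothetical 
helper that will not reproduce GNI's values.

```python
import uuid

# The UUID from the GNI URI quoted above.
GNI_EXAMPLE = uuid.UUID("3a70f04d-fd29-5570-ba91-52dae0c3d07f")

# Placeholder namespace (assumption); GNI's real namespace is not given here.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "example.org")


def name_uuid(name_string):
    """Derive a deterministic version 5 UUID from a name string."""
    return uuid.uuid5(NAMESPACE, name_string)


print(GNI_EXAMPLE.version)
# Same string in, same UUID out; a different string gives a different UUID.
print(name_uuid("Quercus alba") == name_uuid("Quercus alba"))
print(name_uuid("Quercus alba") == name_uuid("Quercus alba L."))
```

Because the identifier is a pure function of the string, two spellings of 
the same name get unrelated UUIDs unless the RDF links them.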
>>>
>>> So the bottom line is that of the LSID services for names that I've
>>> tried so far, only ubio.org seems to have LSIDs for names that are
>>> unchanging, can work as a proxied URI, and that produce actual useful
>>> RDF.  That's pretty disappointing given the apparently huge amount of
>>> work that's been put into building these various systems.
>>> Steve
>>> -- 
>>> Steven J. Baskauf, Ph.D., Senior Lecturer
>>> Vanderbilt University Dept. of Biological Sciences
>>> postal mail address:
>>> VU Station B 351634
>>> Nashville, TN  37235-1634,  U.S.A.
>>> delivery address:
>>> 2125 Stevenson Center
>>> 1161 21st Ave., S.
>>> Nashville, TN 37235
>>> office: 2128 Stevenson Center
>>> phone: (615) 343-4582,  fax: (615) 343-6707
>>> http://bioimages.vanderbilt.edu
>> ---------------------------------------------------------
>> Roderic Page
>> Professor of Taxonomy
>> Institute of Biodiversity, Animal Health and Comparative Medicine
>> College of Medical, Veterinary and Life Sciences
>> Graham Kerr Building
>> University of Glasgow
>> Glasgow G12 8QQ, UK
>> Email: r.page at bio.gla.ac.uk
>> Tel: +44 141 330 4778
>> Fax: +44 141 330 2792
>> AIM: rodpage1962 at aim.com
>> Facebook: http://www.facebook.com/profile.php?id=1112517192
>> Twitter: http://twitter.com/rdmpage
>> Blog: http://iphylo.blogspot.com
>> Home page: http://taxonomy.zoology.gla.ac.uk/rod/rod.html
