[tdwg-content] most GUIDs/URIs for names/taxon stuff not ready for prime time

Steve Baskauf steve.baskauf at vanderbilt.edu
Thu Jan 6 17:35:56 CET 2011


Rod and Joel,
I stand corrected on all points.  Thanks for the clarification.
Steve

Roderic Page wrote:
> Dear Steve,
>
> On 6 Jan 2011, at 15:10, Steve Baskauf wrote:
>
>> A quick comment about ubio.org's LSID resolution:  I took a look at 
>> the complete RDF source and noticed that the namespace declarations 
>> for "ubi:" and "gla:" don't start with "http://".  I'm pretty sure 
>> that isn't kosher RDF, although neither of the RDF validators 
>> complained about it (doesn't give me much confidence in RDF 
>> validators).  So much for being overly optimistic about there being 
>> one source that worked completely. :-)
>
> I don't think RDF namespaces need to be HTTP URIs. There is an 
> expectation by most RDF clients that namespaces are resolvable via 
> HTTP URIs and will return RDF, but this need not be true (some well 
> known namespaces such as http://search.yahoo.com/mrss/ 
> and http://www.georss.org/georss/ don't resolve to RDF). So, AFAIK 
> LSIDs are perfectly valid, if "unfriendly".
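
A minimal sketch of Rod's point, using only the Python standard library. The `urn:example:names#` namespace and the example LSID are hypothetical stand-ins, not ubio.org's actual declarations. An RDF/XML parser forms each predicate URI by concatenating the namespace string and the local name, and nothing in that step requires the namespace to start with "http://".

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Hypothetical non-HTTP namespace, standing in for declarations like "ubi:".
EX_NS = "urn:example:names#"

RDF_XML = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:ex="{EX_NS}">
  <rdf:Description rdf:about="urn:lsid:example.org:names:123">
    <ex:canonicalName>Quercus alba</ex:canonicalName>
  </rdf:Description>
</rdf:RDF>"""


def predicates(rdf_xml: str) -> list[str]:
    """Return the full predicate URI of every property element."""
    root = ET.fromstring(rdf_xml)
    uris = []
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        for prop in description:
            # ElementTree spells qualified names as "{namespace}local".
            namespace, local = prop.tag[1:].split("}")
            uris.append(namespace + local)
    return uris


print(predicates(RDF_XML))
```

The document parses and yields a usable predicate URI. Whether that URI can be dereferenced is a separate question.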
>
>>
>> I should say that I'm not an advocate of LSIDs.  I actually don't 
>> like them at all.  I simply investigated them in this context as 
>> something that might work in a Linked Data context.  I should also 
>> say that I'm not necessarily an advocate of Linked Data - I'm in the 
>> "wait and see" camp.  I'd like to give it a chance but I'm not 
>> betting the store on it.  I AM an advocate of having identifiers that 
>> are persistent and globally unique and it appears that both Catalog 
>> of Life and IPNI (in my view) flunk the persistence test if you just 
>> look at the URI (LSID with version) as a string (which you SHOULD be 
>> able to do) that can be an unchanging identifier.  We desperately 
>> need persistent and globally unique identifiers for a lot of things.  
>> If they resolve to RDF/XML then that's a bonus.  We need an Emperor, 
>> clothing is somewhat optional.
>>
>> A few specific responses:
>>
>> Roderic Page wrote:
>>> These issues won't go away simply by replacing LSIDs with HTTP URIs.  
>>> Some of the linked data sources go belly up fairly regularly (notably
>>> http://bio2rdf.org).
>>>   
>> The reason I see HTTP URIs [capable of producing either HTML or 
>> RDF/XML through content negotiation] as something of greater value 
>> than LSIDs is that people can at a minimum get a web page out of it.  
>> That's the cake.  If some people can also use them to get RDF for the 
>> Linked Data dream, that's the icing.  The fact that HTTP URI GUIDs 
>> fail to "work" is similar to the fact that regular web URLs fail to 
>> work.  That doesn't stop people from using the web and it shouldn't 
>> stop well designed (i.e. http://www.w3.org/TR/cooluris/ and 
>> http://www.w3.org/Provider/Style/URI) URIs from being used as GUIDs.
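
A rough sketch of the content negotiation Steve describes, assuming a server that honours the `Accept` header. The `http://example.org/id/123` URI and the helper name `negotiated_request` are hypothetical. The same URI is requested twice and only the `Accept` header differs.

```python
import urllib.request


def negotiated_request(uri: str, want_rdf: bool) -> urllib.request.Request:
    """Build a GET request asking for RDF/XML or HTML at the same URI."""
    accept = "application/rdf+xml" if want_rdf else "text/html"
    return urllib.request.Request(uri, headers={"Accept": accept})


# Hypothetical identifier URI; no network traffic happens until urlopen().
rdf_req = negotiated_request("http://example.org/id/123", want_rdf=True)
html_req = negotiated_request("http://example.org/id/123", want_rdf=False)

print(rdf_req.get_header("Accept"))
print(html_req.get_header("Accept"))
# To dereference for real: urllib.request.urlopen(rdf_req).read()
```

A browser sends the HTML-style request by default, which is the "cake". A semantic client sends the RDF one, which is the "icing".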
>
> I'm not disagreeing at all,  simply saying that replacing LSIDs with 
> HTTP URIs (or anything else) doesn't magic away the issue of keeping 
> the identifiers "live". 
>
>>> IPNI supports versioning of LSIDs, but the LSID without the version is  
>>> also valid (it resolves to the latest version). I think this is a  
>>> perfectly valid thing to do. Versioning matters to some, but not  
>>> everyone cares about it, IPNI supports both views. If you want to cite  
>>> a specific version, do so, if not, don't.
>>>   
>> OK, I guess as long as people can just leave the version off if they 
>> want, they'll get unchanging strings for their identifiers and so I 
>> guess that moves them into the "usable" column.  But as far as I'm 
>> concerned, putting the versions on adds to the confusion.  If I can 
>> say this without launching an unnecessary thread about the opacity of 
>> GUIDs, nobody is supposed to be looking at any GUID itself to infer 
>> meaning about the identified object.  That "no-no" seems to be 
>> exactly what Catalog of Life and IPNI seem to be suggesting that 
>> people do with the LSID version numbers they are tacking on.  Get rid 
>> of them.
>
> Version numbers are optional and are part of the LSID spec. Given that 
> the unversioned ones work I don't see a huge problem here, especially 
> as the first thing people will say regarding any identifier is "what 
> happens if my data changes?" Versioning is there if you need it, if 
> you don't need it, don't use it. 
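
To make the "leave the version off" option concrete, here is a small sketch that splits an LSID into its parts and drops the optional revision. It assumes the usual `urn:lsid:authority:namespace:object[:revision]` layout. The example LSID and the helper names are made up for illustration.

```python
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]


def parse_lsid(lsid: str) -> Lsid:
    """Split an LSID into its components; the revision may be absent."""
    parts = lsid.split(":")
    if len(parts) not in (5, 6) or [p.lower() for p in parts[:2]] != ["urn", "lsid"]:
        raise ValueError(f"not an LSID: {lsid!r}")
    authority, namespace, object_id = parts[2:5]
    revision = parts[5] if len(parts) == 6 else None
    return Lsid(authority, namespace, object_id, revision)


def unversioned(lsid: str) -> str:
    """Return the LSID without its revision, for use as a stable string."""
    p = parse_lsid(lsid)
    return f"urn:lsid:{p.authority}:{p.namespace}:{p.object_id}"


# Hypothetical LSID with a revision suffix.
example = "urn:lsid:example.org:names:12345-1:1.4"
print(parse_lsid(example).revision)
print(unversioned(example))
```

Two records that differ only in revision compare equal once the revision is stripped, which is the unchanging-string behaviour Steve is asking for.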
>
> Opacity is another red herring. Does anybody know any truly opaque 
> identifiers, ones about which I can't infer anything, not even that 
> it's an identifier? In point of fact, most real world identifiers are 
> laden with embedded semantics (including checksums, etc.). 
> Furthermore, any recent discussion of good URL structure pretty much 
> converges on making them clean, human-readable, and hackable. 
>
>>> The data/metadata distinction is a complete red herring and has side- 
>>> tracked more LSID conversations than I care to remember. The  
>>> distinction stems from LSIDs originally being conceived as identifiers  
>>> for large data objects (e.g., sequences, images, binary streams) that  
>>> people want accurately versioned so that they could reproduce digital
>>> experiments. In this scenario, metadata can change because it's not
>>> what mattered for this reproducibility. Some people think of data
>>> about taxonomic names as "metadata", and then we're off on a wild
>>> goose chase about whether LSIDs should change when metadata changes. I
>>> wouldn't lose any sleep over this (see discussion of IPNI above).
>>>   
>> Agreed.  The important thing is that the identifier for a certain 
>> "thing" (however the "thing" is defined) should not change.  I have 
>> previously expressed the opinion that it won't really work to have 
>> any URI (sensu Linked Data) point to "data" and that all URIs serving 
>> as GUIDs should be considered to reference non-information resources 
>> (which by definition have only "metadata").  Under that scenario, all 
>> resource properties are "metadata" and subject to change.  The data 
>> vs. metadata argument then becomes irrelevant.
>>> Regarding DOIs for books, these are uncommon, although there are moves  
>>> to express ISBNs within the DOI framework, so there may be more of  
>>> these. But for a DOI to exist someone has to claim ownership of a  
>>> resource and register the DOI with CrossRef. For a lot of older  
>>> literature there won't be a publisher around to do this, but that's  
>>> where BHL comes in.
>>>   
>> Then PLEASE BHL, create simple, unchanging URIs [suitable for use in 
>> the Linked Data world] to identify your resources.  If you don't want 
>> to provide RDF/XML, that's fine for now.  You can wait and see if 
>> that is necessary later.  But at least create URIs that COULD be used 
>> for Linked Data in the future and that are not long and loaded with 
>> "?" and "&" characters.  I don't know how to do that because I'm not 
>> a server dude.  But it apparently isn't that hard to do with 
>> mod_rewrite.  I know there are at least three people (probably more) on 
>> this list who do that routinely - Rod does it with his bioguid.info 
>> site (i.e. "cool" URI http://bioguid.info/doi:10.1093/bib/bbn022 gets 
>> written into "ugly" URL 
>> http://bioguid.info/openurl.php?id=doi:10.1093/bib/bbn022 but nobody 
>> cares as long as the "cool" URI doesn't change). 
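
The mapping in the bioguid.info example can be sketched in a few lines. This is only an illustration of the idea in Python, not the site's actual rewrite rule. The pattern and the function name `internal_url` are assumptions.

```python
import re
from typing import Optional
from urllib.parse import urlsplit

# Assumed pattern: the "cool" path is simply "/doi:<identifier>".
COOL_PATH = re.compile(r"^/(doi:.+)$")


def internal_url(cool_uri: str) -> Optional[str]:
    """Map a clean public URI to the script URL that would serve it."""
    parts = urlsplit(cool_uri)
    match = COOL_PATH.match(parts.path)
    if match is None:
        return None  # not a URI this rule handles
    return f"{parts.scheme}://{parts.netloc}/openurl.php?id={match.group(1)}"


print(internal_url("http://bioguid.info/doi:10.1093/bib/bbn022"))
```

The public URI stays fixed while the script behind it can be renamed or replaced. That is why nobody needs to care what the "ugly" URL looks like.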
>
> BHL does have simple, clean, stable URIs, such 
> as http://www.biodiversitylibrary.org/page/26246005 for a 
> page, http://www.biodiversitylibrary.org/item/84644 for an item, and 
> so on. 
>
> Regards
>
> Rod
>
>
>>
>> My two cents worth...
>> Steve
>>> In summary, we're in a mess, and I don't think this is really down to  
>>> technology. It's a failure of our community to create the appropriate  
>>> resources (e.g., centralised, curated resources of identifiers and  
>>> associated metadata for names, publications, and specimens).
>>>
>>> Regards
>>>
>>> Rod
>>>
>>>
>>>
>>> On 6 Jan 2011, at 05:32, Steve Baskauf wrote:
>>>
>>>   
>>>> Well, I have continued my quest for resolvable, RDF-producing GUIDs
>>>> for taxon/name-related stuff.  I have gotten a lot of good
>>>> information from reading Rod Page's BMC Bioinformatics paper
>>>> (http://dx.doi.org/10.1186/1471-2105-10-S14-S5) and from investigating
>>>> his http://bioguid.info/ site.
>>>>
>>>> From the standpoint of the "sec./sensu" part of a TNU/taxon concept,
>>>> based on the recent discussion, it sounds like the DOI solution for
>>>> publications is a good direction to go IF resolution services
>>>> producing RDF come into existence and IF it becomes possible to
>>>> actually search for the DOIs of more obscure publications.  I tried
>>>> using Rod's site to look up a journal article using the ISSN, volume,
>>>> and page and the web interface found the DOI and generated RDF just
>>>> fine.  However, an attempt to use the web to find the DOI of Gleason
>>>> and Cronquist's Manual of vascular plants of Northeastern United
>>>> States and adjacent Canada failed despite a half hour of effort (I
>>>> found the UPC, the LOC call number, and the ISBN, but no DOI).  Maybe
>>>> there just isn't a DOI for it but there should be a way for me to
>>>> know that.  So DOIs for books and old journal articles are not really
>>>> ready for prime time.
>>>>
>>>> From the standpoint of the "scientific name" part of a TNU/taxon
>>>> concept, I had better luck (sort of).  Rod's "Status of biodiversity
>>>> services" page (http://www.bioguid.info/status/) was really cool.  I
>>>> saw resources I hadn't known about before.  I tried out several of
>>>> the services that claimed to issue LSIDs.
>>>>
>>>> Catalog of Life's LSIDs didn't work with either the
>>>> http://www.bioguid.info/ or http://lsid.tdwg.org/ proxies with
>>>> either a web browser or the OpenLink RDF browser.  I only got an
>>>> empty RDF element in response.
>>>>
>>>> Index Fungorum was down.
>>>>
>>>> IPNI seemed to work.  However, I was somewhat appalled to observe that
>>>> they seem to change the revision identifier any time that they change
>>>> any part of the metadata.  That renders the LSID useless as a
>>>> permanent GUID for the name and I believe is inconsistent with the
>>>> design of LSIDs where the revision is only supposed to change if the
>>>> underlying data itself (NOT metadata) changes.  (Catalog of Life says
>>>> that they change the revision identifier EACH YEAR for all of their
>>>> records!  That's even worse!)  If I'm remembering the TDWG LSID
>>>> recommendations correctly, it is not even recommended to use the
>>>> revision part of an LSID at all in the biodiversity informatics
>>>> context.
>>>>
>>>> ubio.org's LSIDs seemed to work properly.
>>>>
>>>> [sorry - didn't try zoobank since I was looking for plants]
>>>>
>>>> I don't know which (if any) of the Web sites listed on Rod's status
>>>> page use generic HTTP URI GUIDs (rather than LSIDs) to refer to taxon
>>>> names.  I tried out the Global Names index that Pete was mentioning.
>>>> The URI versions of the UUIDs (e.g.
>>>> http://gni.globalnames.org/name_strings/3a70f04d-fd29-5570-ba91-52dae0c3d07f)
>>>> do resolve under content negotiation, but the only useful information
>>>> that the RDF representation seems to provide is the actual name string
>>>> that was used to generate the UUID.  Until some other useful linked
>>>> information is added to the RDF, there doesn't seem to be much
>>>> advantage in pointing a semantic client to the URI over just using a
>>>> string literal for the name.
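
As a sketch of why such an identifier adds little beyond the string itself: the UUID quoted above carries version number 5, which marks a name-based (SHA-1) UUID computed deterministically from an input string. The namespace used below is an assumption for illustration, as are the helper name `name_uuid` and the sample name. The result is not claimed to reproduce any identifier the service actually issues.

```python
import uuid

# Assumption: a namespace UUID derived from the "globalnames.org" domain.
ASSUMED_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "globalnames.org")


def name_uuid(name_string: str) -> uuid.UUID:
    """Derive a version-5 (SHA-1, name-based) UUID from a name string."""
    return uuid.uuid5(ASSUMED_NAMESPACE, name_string)


quoted = uuid.UUID("3a70f04d-fd29-5570-ba91-52dae0c3d07f")
print(quoted.version)  # 5: a name-based UUID

# The same string always yields the same UUID; a different string does not.
print(name_uuid("Quercus alba") == name_uuid("Quercus alba"))
print(name_uuid("Quercus alba") == name_uuid("Quercus rubra"))
```

Since the identifier is a pure function of the name string, any client holding the string can recompute it. The RDF becomes worth dereferencing only once it links to something more.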
>>>>
>>>> So the bottom line is that of the LSID services for names that I've
>>>> tried so far, only ubio.org seems to have LSIDs for names that are
>>>> unchanging, can work as a proxied URI, and produce actual useful
>>>> RDF.  That's pretty disappointing given the apparently huge amount of
>>>> work that's been put into building these various systems.
>>>>
>>>> Steve
>>>>
>>>> -- 
>>>> Steven J. Baskauf, Ph.D., Senior Lecturer
>>>> Vanderbilt University Dept. of Biological Sciences
>>>>
>>>> postal mail address:
>>>> VU Station B 351634
>>>> Nashville, TN  37235-1634,  U.S.A.
>>>>
>>>> delivery address:
>>>> 2125 Stevenson Center
>>>> 1161 21st Ave., S.
>>>> Nashville, TN 37235
>>>>
>>>> office: 2128 Stevenson Center
>>>> phone: (615) 343-4582,  fax: (615) 343-6707
>>>> http://bioimages.vanderbilt.edu
>>>>
>>>> _______________________________________________
>>>> tdwg-content mailing list
>>>> tdwg-content at lists.tdwg.org
>>>> http://lists.tdwg.org/mailman/listinfo/tdwg-content
>>>>
>>>>     
>>>
>>> ---------------------------------------------------------
>>> Roderic Page
>>> Professor of Taxonomy
>>> Institute of Biodiversity, Animal Health and Comparative Medicine
>>> College of Medical, Veterinary and Life Sciences
>>> Graham Kerr Building
>>> University of Glasgow
>>> Glasgow G12 8QQ, UK
>>>
>>> Email: r.page at bio.gla.ac.uk
>>> Tel: +44 141 330 4778
>>> Fax: +44 141 330 2792
>>> AIM: rodpage1962 at aim.com
>>> Facebook: http://www.facebook.com/profile.php?id=1112517192
>>> Twitter: http://twitter.com/rdmpage
>>> Blog: http://iphylo.blogspot.com
>>> Home page: http://taxonomy.zoology.gla.ac.uk/rod/rod.html

