[tdwg-content] most GUIDs/URIs for names/taxon stuff not ready for prime time

Roderic Page r.page at bio.gla.ac.uk
Thu Jan 6 22:48:23 CET 2011


Dear Pete,

Replication is a good point, although it does depend on the source  
being available for at least part of the time (not always a given,  
sadly), plus the copies need to be easily discoverable (ideally  
transparently as far as the user is concerned -- they shouldn't need  
to go hunting). Phil Cryer wrote a nice post about this, proposing a  
solution based on CouchDB, which makes replication pretty trivial. See http://fak3r.com/2009/04/29/resolving-lsids-wit-url-resolvers-and-couchdb/
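
A minimal sketch of what "transparent" discovery of copies could look like on the client side, assuming a known list of mirrors. The mirror URLs, the fake_fetch stand-in and the function name are all hypothetical; this is not how CouchDB or any existing resolver does it, just the general shape of the fallback.

```python
from typing import Callable, Optional, Sequence, Tuple

# A fetcher takes (mirror, identifier) and returns the document text,
# returns None if the mirror has no copy, or raises OSError if it is down.
Fetch = Callable[[str, str], Optional[str]]


def resolve_with_fallback(
    identifier: str, mirrors: Sequence[str], fetch: Fetch
) -> Optional[Tuple[str, str]]:
    """Try each mirror in order; return (mirror, document) for the first hit."""
    for mirror in mirrors:
        try:
            document = fetch(mirror, identifier)
        except OSError:
            # Source is unavailable right now: move on to the next copy.
            continue
        if document is not None:
            return mirror, document
    return None  # no copy was reachable


# Stand-in fetcher for illustration only (no network access).
def fake_fetch(mirror: str, identifier: str) -> Optional[str]:
    if mirror == "http://primary.example.org":
        raise OSError("primary is down")
    return f"<rdf for {identifier} from {mirror}>"


MIRRORS = ["http://primary.example.org", "http://mirror.example.org"]
result = resolve_with_fallback("urn:lsid:example.org:names:1", MIRRORS, fake_fetch)
```

The user only ever calls resolve_with_fallback; which copy answered is an implementation detail, which is the point about not having to go hunting.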

Regards

Rod

On 6 Jan 2011, at 20:49, Peter DeVries wrote:

> There are too many ideas to respond to at once but there is one  
> issue that is being overblown.
>
> That is this issue of uptime.
>
> If these resources are exposed properly at any given time there  
> should be several copies of any resource available somewhere else in  
> the cloud.
>
> For instance, if EUNIS or Bioimages is down, the data is still  
> available on my SPARQL endpoint.
>
> In the case of EUNIS it should also be available at:
>
> http://linkeddata.uriburner.com/fct/
>
> http://sindice.com/
>
> And probably many other places.
>
> Yes, we all strive for 99.99% uptime, but if I had my choice I would
> rather that these sites spend what little developmental resources
> they have on properly exposing real semantic web identifiers than on
> a site that has 99.99% uptime but exposes LSIDs.
>
> The Linked Data system has built-in redundancy, so we might as well
> take advantage of it.
>
> Respectfully,
>
> - Pete
>
> On Thu, Jan 6, 2011 at 10:11 AM, Roderic Page <r.page at bio.gla.ac.uk>  
> wrote:
> Dear Steve,
>
> On 6 Jan 2011, at 15:10, Steve Baskauf wrote:
>
>> A quick comment about ubio.org's LSID resolution:  I took a look at  
>> the complete RDF source and noticed that the namespace declarations  
>> for "ubi:" and "gla:" don't start with "http://".  I'm pretty sure  
>> that isn't kosher RDF, although neither of the RDF validators  
>> complained about it (doesn't give me much confidence in RDF  
>> validators).  So much for being overly optimistic about there being  
>> one source that worked completely. :-)
>
> I don't think RDF namespaces need to be HTTP URIs. There is an  
> expectation by most RDF clients that namespaces are resolvable via  
> HTTP URIs and will return RDF, but this need not be true (some well  
> known namespaces such as http://search.yahoo.com/mrss/ and http://www.georss.org/georss/ 
>  don't resolve to RDF). So, AFAIK LSIDs are perfectly valid, if  
> "unfriendly".
>
>>
>> I should say that I'm not an advocate of LSIDs.  I actually don't  
>> like them at all.  I simply investigated them in this context as  
>> something that might work in a Linked Data context.  I should also  
>> say that I'm not necessarily an advocate of Linked Data - I'm in  
>> the "wait and see" camp.  I'd like to give it a chance but I'm not  
>> betting the store on it.  I AM an advocate of having identifiers  
>> that are persistent and globally unique and it appears that both  
>> Catalog of Life and IPNI (in my view) flunk the persistence test if  
>> you just look at the URI (LSID with version) as a string (which you  
>> SHOULD be able to do) that can be an unchanging identifier.  We  
>> desperately need persistent and globally unique identifiers for a  
>> lot of things.  If they resolve to RDF/XML then that's a bonus.  We  
>> need an Emperor, clothing is somewhat optional.
>>
>> A few specific responses:
>>
>> Roderic Page wrote:
>>> These issues won't go away simply by replacing LSIDs with HTTP URIs.
>>> Some of the linked data sources go belly up fairly regularly  
>>> (notably http://bio2rdf.org).
>>>
>> The reason I see HTTP URIs [capable of producing either HTML or RDF/ 
>> XML through content negotiation] as something of greater value than  
>> LSIDs is that people can at a minimum get a web page out of it.   
>> That's the cake.  If some people can also use them to get RDF for  
>> the Linked Data dream, that's the icing.  The fact that HTTP URI  
>> GUIDs fail to "work" is similar to the fact that regular web URLs  
>> fail to work.  That doesn't stop people from using the web and it  
>> shouldn't stop well designed (i.e. http://www.w3.org/TR/cooluris/  
>> and http://www.w3.org/Provider/Style/URI) URIs from being used as  
>> GUIDs.
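
A small sketch of the content negotiation being described: the same HTTP URI is requested with a different Accept header depending on whether a person or a Linked Data client is asking. The URI below is a placeholder, and the helper name is made up; no request is actually sent here.

```python
import urllib.request


def build_request(uri: str, want_rdf: bool) -> urllib.request.Request:
    """Build a GET request for one URI, asking for RDF/XML or for HTML."""
    accept = "application/rdf+xml" if want_rdf else "text/html"
    return urllib.request.Request(uri, headers={"Accept": accept})


# Same identifier, two representations (placeholder URI).
browser_request = build_request("http://example.org/id/123", want_rdf=False)
client_request = build_request("http://example.org/id/123", want_rdf=True)
```

Passing either request to urllib.request.urlopen would perform the actual fetch; whether the server honours the Accept header is up to the provider.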
>
> I'm not disagreeing at all,  simply saying that replacing LSIDs with  
> HTTP URIs (or anything else) doesn't magic away the issue of keeping  
> the identifiers "live".
>
>>> IPNI supports versioning of LSIDs, but the LSID without the version
>>> is also valid (it resolves to the latest version). I think this is
>>> a perfectly valid thing to do. Versioning matters to some, but not
>>> everyone cares about it; IPNI supports both views. If you want to
>>> cite a specific version, do so; if not, don't.
>>>
>> OK, I guess as long as people can just leave the version off if  
>> they want, they'll get unchanging strings for their identifiers and  
>> so I guess that moves them into the "usable" column.  But as far as  
>> I'm concerned, putting the versions on adds to the confusion.  If I  
>> can say this without launching an unnecessary thread about the  
>> opacity of GUIDs, nobody is supposed to be looking at any GUID  
>> itself to infer meaning about the identified object.  That "no-no"  
>> seems to be exactly what Catalog of Life and IPNI seem to be  
>> suggesting that people do with the LSID version numbers they are  
>> tacking on.  Get rid of them.
>
> Version numbers are optional and are part of the LSID spec. Given  
> that the unversioned ones work I don't see a huge problem here,  
> especially as the first thing people will say regarding any  
> identifier is "what happens if my data changes?" Versioning is there  
> if you need it, if you don't need it, don't use it.
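
To make the "optional version" point concrete, here is a rough sketch that splits an LSID into its colon-separated parts (urn:lsid:authority:namespace:object, plus an optional revision) and drops the revision. The example identifier uses a placeholder authority, and the parsing is deliberately simplistic rather than a full implementation of the LSID spec.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]  # None when the LSID is unversioned


def parse_lsid(text: str) -> LSID:
    """Split 'urn:lsid:authority:namespace:object[:revision]' into parts."""
    parts = text.split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {text!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    revision = parts[5] if len(parts) == 6 and parts[5] else None
    return LSID(parts[2], parts[3], parts[4], revision)


def without_revision(text: str) -> str:
    """Return the unversioned form, usable as an unchanging string."""
    lsid = parse_lsid(text)
    return ":".join(["urn", "lsid", lsid.authority, lsid.namespace, lsid.object_id])


# Placeholder identifier, not a real record.
versioned = "urn:lsid:example.org:names:12345-1:1.1"
stable = without_revision(versioned)
```

Someone who wants an unchanging string stores the unversioned form; someone who needs to cite a specific state of the record keeps the revision.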
>
> Opacity is another red herring. Does anybody know any truly opaque  
> identifiers, ones about which I can't infer anything, not even that  
> it's an identifier? In point of fact, most real world identifiers are  
> laden with embedded semantics (including checksums, etc.).  
> Furthermore, any recent discussion of good URL structure pretty much  
> converges on making them clean, human-readable, and hackable.
>
>>> The data/metadata distinction is a complete red herring and has
>>> side-tracked more LSID conversations than I care to remember. The
>>> distinction stems from LSIDs originally being conceived as
>>> identifiers for large data objects (e.g., sequences, images, binary
>>> streams) that people want accurately versioned so that they could
>>> reproduce digital experiments. In this scenario, metadata can change
>>> because it's not what mattered for this reproducibility. Some people
>>> think of data about taxonomic names as "metadata", and then we're
>>> off on a wild goose chase about whether LSIDs should change when
>>> metadata changes. I wouldn't lose any sleep over this (see
>>> discussion of IPNI above).
>>>
>> Agreed.  The important thing is that the identifier for a certain  
>> "thing" (however the "thing" is defined) should not change.  I have  
>> previously expressed the opinion that it won't really work to have  
>> any URI (sensu Linked Data) point to "data" and that all URIs  
>> serving as GUIDs should be considered to reference non-information  
>> resources (which by definition have only "metadata").  Under that  
>> scenario, all resource properties are "metadata" and subject to  
>> change.  The data vs. metadata argument then becomes irrelevant.
>>> Regarding DOIs for books, these are uncommon, although there are  
>>> moves
>>> to express ISBNs within the DOI framework, so there may be more of
>>> these. But for a DOI to exist someone has to claim ownership of a
>>> resource and register the DOI with CrossRef. For a lot of older
>>> literature there won't be a publisher around to do this, but that's
>>> where BHL comes in.
>>>
>> Then PLEASE BHL, create simple, unchanging URIs [suitable for use  
>> in the Linked Data world] to identify your resources.  If you don't  
>> want to provide RDF/XML, that's fine for now.  You can wait and see  
>> if that is necessary later.  But at least create URIs that COULD be  
>> used for Linked Data in the future and that are not long and loaded  
>> with "?" and "&" characters.  I don't know how to do that because  
>> I'm not a server dude.  But it apparently isn't that hard to do  
>> with a mod_rewrite rule.  I know there are at least three people  
>> (probably more) on this list who do that routinely - Rod does it  
>> with his bioguid.info site (i.e. "cool" URI http://bioguid.info/doi:10.1093/bib/bbn022 
>>  gets rewritten into "ugly" URL http://bioguid.info/openurl.php?id=doi:10.1093/bib/bbn022 
>>  but nobody cares as long as the "cool" URI doesn't change).
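
The rewrite described here can be sketched in a few lines. This is only an illustration of the mapping in the bioguid.info example above, written in Python rather than as a server rewrite rule; the pattern (anything starting with "doi:") and the function name are assumptions, not the site's actual configuration.

```python
import re
from urllib.parse import urlsplit

# Assumed pattern: a path consisting of "doi:" followed by the DOI itself.
COOL_PATH = re.compile(r"^/(doi:.+)$")


def rewrite(cool_uri: str) -> str:
    """Map a 'cool' URI onto the script URL that actually serves it."""
    parts = urlsplit(cool_uri)
    match = COOL_PATH.match(parts.path)
    if match is None:
        return cool_uri  # not an identifier path: leave untouched
    return f"{parts.scheme}://{parts.netloc}/openurl.php?id={match.group(1)}"


ugly = rewrite("http://bioguid.info/doi:10.1093/bib/bbn022")
```

The published identifier stays the same even if the script behind it is later renamed or replaced; only the rule changes.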
>
> BHL does have simple, clean, stable URIs, such as http://www.biodiversitylibrary.org/page/26246005 
>  for a page, http://www.biodiversitylibrary.org/item/84644 for an  
> item, and so on.
>
> Regards
>
> Rod
>
>
>>
>> My two cents worth...
>> Steve
>>> In summary, we're in a mess, and I don't think this is really down  
>>> to
>>> technology. It's a failure of our community to create the  
>>> appropriate
>>> resources (e.g., centralised, curated resources of identifiers and
>>> associated metadata for names, publications, and specimens).
>>>
>>> Regards
>>>
>>> Rod
>>>
>>>
>>>
>>> On 6 Jan 2011, at 05:32, Steve Baskauf wrote:
>>>
>>>
>>>> Well, I have continued my quest for resolvable, RDF-producing GUIDs
>>>> for
>>>> taxon/name-related stuff.  I have gotten a lot of good information
>>>> from
>>>> reading Rod Page's BMC Bioinformatics paper
>>>> (http://dx.doi.org/10.1186/1471-2105-10-S14-S5) and from  
>>>> investigating
>>>> his http://bioguid.info/ site.
>>>>
>>>> From the standpoint of the "sec./sensu" part of a TNU/taxon
>>>> concept, based on the recent discussion, it sounds like the DOI
>>>> solution for publications is a good direction to go IF resolution
>>>> services producing RDF come into existence and IF it becomes
>>>> possible to actually search for the DOIs of more obscure
>>>> publications.  I tried using Rod's site to look up a journal
>>>> article using the ISSN, volume, and page and the web interface
>>>> found the DOI and generated RDF just fine.  However, an attempt to
>>>> use the web to find the DOI of Gleason and Cronquist's Manual of
>>>> vascular plants of Northeastern United States and adjacent Canada
>>>> failed despite a half hour of effort (I found the UPC, the LOC call
>>>> number, and the ISBN, but no DOI).  Maybe there just isn't a DOI
>>>> for it but there should be a way for me to know that.  So DOIs for
>>>> books and old journal articles are not really ready for prime time.
>>>>
>>>> From the standpoint of the "scientific name" part of a TNU/taxon
>>>> concept, I had better luck (sort of).  Rod's "Status of
>>>> biodiversity services" page (http://www.bioguid.info/status/) was
>>>> really cool.  I saw resources I hadn't known about before.  I tried
>>>> out several of the services that claimed to issue LSIDs.
>>>>
>>>> Catalog of Life's LSIDs didn't work with either the
>>>> http://www.bioguid.info/ or http://lsid.tdwg.org/ proxies with
>>>> either a
>>>> web browser or  the OpenLink RDF browser.  I only got an empty RDF
>>>> element in response.
>>>>
>>>> Index Fungorum was down.
>>>>
>>>> IPNI seemed to work.  However, I was somewhat appalled to observe  
>>>> that
>>>> they seem to change the revision identifier any time that they  
>>>> change
>>>> any part of the metadata.  That renders the LSID useless as a
>>>> permanent
>>>> GUID for the name and I believe is inconsistent with the design of
>>>> LSIDs
>>>> where the revision is only supposed to change if the underlying  
>>>> data
>>>> itself (NOT metadata) changes.  (Catalog of Life says that they  
>>>> change
>>>> the revision identifier EACH YEAR for all of their records!  That's
>>>> even
>>>> worse!)  If I'm remembering the TDWG LSID recommendations, it is  
>>>> not
>>>> even recommended to use the revision part of an LSID at all in the
>>>> biodiversity informatics context.
>>>>
>>>> ubio.org's LSIDs seemed to work properly.
>>>>
>>>> [sorry - didn't try zoobank since I was looking for plants]
>>>>
>>>> I don't know which (if any) of the Web sites listed on Rod's status
>>>> page
>>>> use generic HTTP URI guids (rather than LSIDs) to refer to taxon
>>>> names.
>>>> I tried out the Global Names index that Pete was mentioning.  The  
>>>> URI
>>>> version of the UUIDs (e.g.
>>>> http://gni.globalnames.org/name_strings/3a70f04d-fd29-5570-ba91-52dae0c3d07f)
>>>> do resolve under content negotiation, but the only useful  
>>>> information
>>>> that the RDF representation seems to provide is the actual name  
>>>> string
>>>> that was used to generate the UUID.  Until some other useful linked
>>>> information is added to the RDF, there doesn't seem to be much
>>>> advantage
>>>> in pointing a semantic client to the URI over just using a string
>>>> literal for the name.
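
For readers unfamiliar with identifiers generated from a string: the version digit in the example UUID above is 5, which is the name-based (SHA-1) variant. A hedged sketch of that technique follows; the namespace used here is invented for illustration and is an assumption, so the output will not match the Global Names identifiers.

```python
import uuid

# Hypothetical namespace, derived only so the example is self-contained.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "names.example.org")


def name_uuid(name_string: str) -> uuid.UUID:
    """Derive a version 5 UUID deterministically from a name string."""
    return uuid.uuid5(NAMESPACE, name_string)


first = name_uuid("Quercus alba")
second = name_uuid("Quercus alba")
other = name_uuid("Quercus rubra")
```

Because the identifier is a pure function of the string, it carries no information beyond the string itself, which is consistent with the observation that the RDF adds little until other linked data is attached.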
>>>>
>>>> So the bottom line is that of the LSID services for names that I've
>>>> tried so far, only ubio.org seems to have LSIDs for names that are
>>>> unchanging, can work as a proxied URI,  and that produce actual  
>>>> useful
>>>> RDF.  That's pretty disappointing given the apparently huge  
>>>> amount of
>>>> work that's been put into building these various systems.
>>>>
>>>> Steve
>>>>
>>>> -- 
>>>> Steven J. Baskauf, Ph.D., Senior Lecturer
>>>> Vanderbilt University Dept. of Biological Sciences
>>>>
>>>> postal mail address:
>>>> VU Station B 351634
>>>> Nashville, TN  37235-1634,  U.S.A.
>>>>
>>>> delivery address:
>>>> 2125 Stevenson Center
>>>> 1161 21st Ave., S.
>>>> Nashville, TN 37235
>>>>
>>>> office: 2128 Stevenson Center
>>>> phone: (615) 343-4582,  fax: (615) 343-6707
>>>> http://bioimages.vanderbilt.edu
>>>>
>>>> _______________________________________________
>>>> tdwg-content mailing list
>>>> tdwg-content at lists.tdwg.org
>>>> http://lists.tdwg.org/mailman/listinfo/tdwg-content
>>>>
>>>>
>>> ---------------------------------------------------------
>>> Roderic Page
>>> Professor of Taxonomy
>>> Institute of Biodiversity, Animal Health and Comparative Medicine
>>> College of Medical, Veterinary and Life Sciences
>>> Graham Kerr Building
>>> University of Glasgow
>>> Glasgow G12 8QQ, UK
>>>
>>> Email: r.page at bio.gla.ac.uk
>>> Tel: +44 141 330 4778
>>> Fax: +44 141 330 2792
>>> AIM: rodpage1962 at aim.com
>>> Facebook: http://www.facebook.com/profile.php?id=1112517192
>>> Twitter: http://twitter.com/rdmpage
>>> Blog: http://iphylo.blogspot.com
>>> Home page: http://taxonomy.zoology.gla.ac.uk/rod/rod.html
>>
>
>
> -- 
> ---------------------------------------------------------------
> Pete DeVries
> Department of Entomology
> University of Wisconsin - Madison
> 445 Russell Laboratories
> 1630 Linden Drive
> Madison, WI 53706
> TaxonConcept Knowledge Base / GeoSpecies Knowledge Base
> About the GeoSpecies Knowledge Base
> ------------------------------------------------------------







