Re: [tdwg-tag] SourceForge LSID project websites broken - role for TDWG?

9 Apr 2009

In a rare attempt at being constructive, here are a few thoughts.

LSIDs and linked data
=================

If adoption of LSIDs proceeds, then we should make efforts to see that
they play nicely with Linked Data efforts. For example, an HTTP
resolver would need to support 303 redirects and content negotiation.
This would help us avoid creating our own ghetto, while still
exploiting whatever advantages LSIDs have.
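
To make the 303 and content negotiation point concrete, here is a
minimal sketch of the decision an HTTP proxy for LSIDs might make. It
is only an illustration: the base URL, the ".rdf"/".html" suffixes and
the function names are assumptions of mine, not how the TDWG, BioCol
or bioGUID resolvers actually work.

```python
from typing import List, Tuple

# Hypothetical URL layout: the base URL and suffixes are assumptions.
RDF_SUFFIX = ".rdf"
HTML_SUFFIX = ".html"


def parse_accept(header: str) -> List[str]:
    """Return media types from an Accept header, highest q-value first."""
    items = []
    for position, part in enumerate(header.split(",")):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0
        for field in fields[1:]:
            if field.lower().startswith("q="):
                try:
                    q = float(field[2:])
                except ValueError:
                    q = 0.0
        if q > 0:
            # Sort by descending q, then by original position in the header.
            items.append((-q, position, media_type))
    return [media_type for _, _, media_type in sorted(items)]


def redirect_for(lsid: str, accept: str,
                 base: str = "http://example.org/") -> Tuple[int, str]:
    """Choose the 303 redirect target for a proxied LSID."""
    for media_type in parse_accept(accept):
        if media_type == "application/rdf+xml":
            return 303, base + lsid + RDF_SUFFIX
        if media_type in ("text/html", "application/xhtml+xml"):
            return 303, base + lsid + HTML_SUFFIX
    # Fall back to HTML for clients that send */* or no Accept header.
    return 303, base + lsid + HTML_SUFFIX
```

The idea is that the URI for the thing itself never returns a document
directly: a client asking for RDF is sent (via 303 See Other) to the
RDF description, and a browser is sent to the HTML one.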

Roger set up something along these lines to handle Biological
Collections Index (BioCol) LSIDs. There is a nice tool at
http://validator.linkeddata.org/ to check whether a URI behaves as
Linked Data tools expect. Sadly the proxied BioCol LSIDs (e.g.,
http://biocol.org/urn:lsid:biocol.org:col:15670) don't validate, but
this should be easy to fix. The TDWG resolver similarly fails.

I've implemented a simple resolver at bioGUID that returns either raw
RDF or a clumsily formatted HTML version of the XML, but which passes
the http://validator.linkeddata.org/ tests. An example URI is
http://bioguid.info/urn:lsid:indexfungorum.org:names:213649, which
validates (see http://tinyurl.com/cgje5n).

So, my first recommendation is to ensure that a TDWG HTTP proxy passes
http://validator.linkeddata.org/. This means we can play with the
Semantic Web crowd with LSIDs.

Note that getting HTTP URIs to play with Linked Data is not trivial,
so whatever technology we adopt we'll need clear guidelines as to how
to use it. As an aside, bioGUID can make DOIs play nicely as well
(they don't by default), and Kingsley Idehen
(http://www.openlinksw.com/blog/~kidehen/) of OpenLink Software is
supporting LSIDs in the Linked Data tools he's developing.

Ontology
=======

As part of my experiment to wikify taxonomic names, literature, etc.,  
I've been playing with the TDWG vocabularies. I've a few grizzles, but  
in general they've been really useful, and I think these will be key  
(as Donald and Lee have emphasised).

Service
======

Ironically, one of the examples Lee listed when defending TDWG's
resolver (urn:lsid:gdb.org:GenomicSegment:GDB132938) seems to have
disappeared (I think TDWG has a cached copy). This raises the ongoing
problem of service availability. TDWG's resolver could help here, in
that it could be used to generate reports on service quality and
notify providers when something's wrong. Whatever GUID technology is
adopted, this will be an issue, and the challenge is to build tools
and mechanisms to manage this.
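
As a rough sketch of what such a service-quality report could look
like: group a sample of LSIDs by authority, record how many resolve,
and flag the authorities that fall below some threshold. The function
names, the 0.9 threshold and the `resolves` callback are all
assumptions for illustration; a real check would plug in an actual
LSID or HTTP lookup where the callback is.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def lsid_authority(lsid: str) -> str:
    """Extract the authority from urn:lsid:<authority>:<namespace>:<object>."""
    parts = lsid.split(":")
    if len(parts) < 5 or parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {lsid!r}")
    return parts[2].lower()


def availability_report(lsids: Iterable[str],
                        resolves: Callable[[str], bool]) -> Dict[str, float]:
    """Return the fraction of sampled LSIDs that resolve, per authority."""
    ok: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for lsid in lsids:
        authority = lsid_authority(lsid)
        total[authority] += 1
        try:
            if resolves(lsid):
                ok[authority] += 1
        except Exception:
            # A failing lookup counts as "did not resolve".
            pass
    return {authority: ok[authority] / total[authority] for authority in total}


def providers_to_notify(report: Dict[str, float],
                        threshold: float = 0.9) -> List[str]:
    """List authorities whose availability is below the threshold."""
    return sorted(a for a, rate in report.items() if rate < threshold)
```

Run periodically over the LSIDs the resolver has already seen, this
would give both a history of service quality and a list of providers
to contact.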

Funding
======

I've nothing useful to say here, other than to suggest that clearly
the sales pitch for integrating biodiversity data hasn't (yet?)
succeeded. I think us techies get it, but we've not made that vision
real or compelling. If we had, I think we'd have institutions falling
over themselves to ensure the infrastructure exists and persists.
Naive, I know, but we could ask why we haven't managed to convince
those with the purse strings that this stuff matters.

One quick and dirty way that might help is if the TDWG LSID resolver
stored all the metadata for the LSIDs it resolves in a triple store
and exposed a SPARQL query interface to that metadata. We could then
start to look for interesting links between data.
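
To illustrate the kind of link-finding I have in mind, here is a toy
in-memory stand-in for a triple store. It is plain Python, not a real
triple store and not SPARQL, and the class and method names are
invented for this sketch. It simply shows how, once metadata from
different providers sits in one place as triples, records that share
a value become easy to find.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]


class TinyTripleStore:
    """Toy stand-in for a triple store (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def add(self, s: str, p: str, o: str) -> None:
        self._triples.add((s, p, o))

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> List[Triple]:
        """Return triples matching the pattern; None acts as a wildcard."""
        return sorted(
            t for t in self._triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        )

    def shared_objects(self, predicate: str) -> Dict[str, List[str]]:
        """Map each value of `predicate` to the subjects sharing it (2 or more)."""
        by_object: Dict[str, Set[str]] = defaultdict(set)
        for s, p, o in self._triples:
            if p == predicate:
                by_object[o].add(s)
        return {o: sorted(subjects)
                for o, subjects in by_object.items() if len(subjects) > 1}
```

In a real deployment the `shared_objects` step would be a SPARQL query
against the store, for example one that groups records from different
authorities by a shared publication or author.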

Regards

Rod

---------------------------------------------------------
Roderic Page
Professor of Taxonomy
DEEB, FBLS
Graham Kerr Building
University of Glasgow
Glasgow G12 8QQ, UK

Email: r.page@bio.gla.ac.uk
Tel: +44 141 330 4778
Fax: +44 141 330 2792
AIM: rodpage1962@aim.com
Facebook: http://www.facebook.com/profile.php?id=1112517192
Twitter: http://twitter.com/rdmpage
Blog: http://iphylo.blogspot.com
Home page: http://taxonomy.zoology.gla.ac.uk/rod/rod.html