[tdwg-tag] SourceForge LSID project websites broken - role for TDWG? [SEC=UNCLASSIFIED]

Wed Apr 15 17:16:41 CEST 2009

Perhaps it would help (me at least) if we separated two meanings of
"multiple identifiers":

1. Multiple identifiers for the same object, e.g. "urn:lsid:zoobank.org:pub:2C6BD020-B54A-4119-9693-3231C9FCEFA6" and "doi:10.3897/zookeys.7.111"

2. Multiple forms of the same identifier, e.g. "urn:lsid:zoobank.org:pub:2C6BD020-B54A-4119-9693-3231C9FCEFA6" and "http://bioguid.info/urn:lsid:zoobank.org:pub:2C6BD020-B54A-4119-9693-3231C9FCEFA6"

I assume that you mean #2, multiple forms of the same identifier?

Personally I would argue that the RDF should always contain a  
canonical, un-proxied version of an identifier (whether LSID or DOI),  
because:

1. having only the proxied version assumes that there is only one  
suitable proxy (there may be multiple ones)

2. it assumes that the specified proxy will always exist (our track  
record in durable HTTP services is poor)

3. it assumes the specified proxy will always conform to current
standards

4. it imposes an overhead on clients that want the canonical  
identifier (i.e., they have to strip away the proxy)
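To make point 4 concrete, here is a minimal sketch of the stripping step a client has to perform when only the proxied form is available. It assumes a simplified LSID shape (urn:lsid:authority:namespace:object, with an optional :revision) and that the LSID appears unencoded in the proxy's path; a percent-encoded LSID, or one using characters outside the simplified pattern, would not be recognised. The function name canonical_lsid is hypothetical.

```python
import re
from typing import Optional

# Simplified LSID shape: urn:lsid:<authority>:<namespace>:<object>[:<revision>]
# Each part is taken to be a run of characters other than ':', '/', '?', '#' or whitespace.
LSID_PATTERN = re.compile(
    r"urn:lsid:[^:/?#\s]+:[^:/?#\s]+:[^:/?#\s]+(?::[^:/?#\s]+)?",
    re.IGNORECASE,
)


def canonical_lsid(identifier: str) -> Optional[str]:
    """Return the bare LSID inside a proxied HTTP URI, or the LSID itself.

    Returns None when nothing LSID-shaped is found.
    """
    match = LSID_PATTERN.search(identifier)
    return match.group(0) if match else None


print(canonical_lsid(
    "http://bioguid.info/urn:lsid:zoobank.org:pub:2C6BD020-B54A-4119-9693-3231C9FCEFA6"
))
# prints urn:lsid:zoobank.org:pub:2C6BD020-B54A-4119-9693-3231C9FCEFA6
```

The pattern matching only works because the proxy happens to embed the LSID verbatim; a proxy with a different URL scheme would need its own rule, which is part of the overhead described above.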

I predict that for any meaningful, successful (i.e., widely adopted)
identifier there will be multiple services that will be capable of
consuming that identifier, not just HTTP proxies. DOIs can be proxied
(by several servers, including http://dx.doi.org/ and http://hdl.handle.net ),
resolved using OpenURL resolvers, etc.
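Going the other way is trivial when the canonical form is what gets stored. A minimal sketch, assuming only the two proxy base URLs named above (the list is illustrative, not exhaustive); proxied_forms and DOI_PROXIES are hypothetical names:

```python
from typing import List, Sequence
from urllib.parse import quote

# Proxy base URLs mentioned in the discussion; illustrative, not exhaustive.
DOI_PROXIES = ("http://dx.doi.org/", "http://hdl.handle.net/")


def proxied_forms(doi: str, proxies: Sequence[str] = DOI_PROXIES) -> List[str]:
    """Derive one HTTP URI per proxy from a canonical DOI."""
    bare = doi.strip()
    if bare.lower().startswith("doi:"):
        bare = bare[len("doi:"):].strip()
    # Percent-encode characters that are unsafe in a URL path, keeping '/'.
    return [base + quote(bare, safe="/") for base in proxies]


for uri in proxied_forms("doi:10.3897/zookeys.7.111"):
    print(uri)
```

Adding another proxy is a one-line change to the list, whereas recovering the DOI from an arbitrary proxied URI requires knowing every proxy's URL layout in advance.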

In order to play ball with Linked Data, there are several ways forward:

1. Always refer to LSIDs in their proxied form (see above for reasons  
why this might not be a good idea)

2. Ensure that at least one proxy exists which can resolve LSIDs in a  
linked data friendly way (see http://bioguid.info as an example)

3. Use/develop linked data clients that understand LSIDs (e.g., http://linkeddata.uriburner.com/ ;
see http://linkeddata.uriburner.com/about/html/urn:lsid:zoobank.org:pub:2C6BD020-B54A-4119-9693-3231C9FCEFA6 )

2 and 3 already exist, so I'm not so keen on 1.

For me this is the biggest hurdle faced by HTTP URIs: I have to
choose one. As an analogy, I can identify a book using an ISBN (say,
0226644677). How do I represent this in RDF? Well, I could use an HTTP
URI, say http://www.amazon.com/Tangled-Trees-Phylogeny-Cospeciation-Coevolution/dp/0226644677/ ,
or maybe http://www.worldcat.org/isbn/0226644677 . There are many,
many URIs I could choose from (see http://en.wikipedia.org/wiki/Special:BookSources/0226644677 ).
However, so long as I know that the ISBN is 0226644677, I'm free
to use whatever URI best suits my needs.
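One way to keep that freedom in RDF is to use whichever HTTP URI suits as the subject while always recording the canonical identifier as a literal. The sketch below assumes Dublin Core Terms' identifier property (http://purl.org/dc/terms/identifier) as the carrier and builds URIs from two of the book sources mentioned above; the template names and ntriple_for_isbn are hypothetical, and the choice of property is an assumption rather than a TDWG recommendation.

```python
# Dublin Core Terms "identifier" property, used here to carry the canonical ISBN.
DCTERMS_IDENTIFIER = "http://purl.org/dc/terms/identifier"

# URI templates based on the examples above; any other book source would do.
ISBN_URI_TEMPLATES = {
    "worldcat": "http://www.worldcat.org/isbn/{isbn}",
    "booksources": "http://en.wikipedia.org/wiki/Special:BookSources/{isbn}",
}


def ntriple_for_isbn(isbn: str, source: str = "worldcat") -> str:
    """Emit one N-Triples statement: chosen HTTP URI -> canonical ISBN literal."""
    subject = ISBN_URI_TEMPLATES[source].format(isbn=isbn)
    # Escape backslashes and quotes so the literal is valid N-Triples.
    literal = isbn.replace("\\", "\\\\").replace('"', '\\"')
    return f'<{subject}> <{DCTERMS_IDENTIFIER}> "{literal}" .'


print(ntriple_for_isbn("0226644677"))
```

Whichever source is chosen for the subject, a consumer can still read off "0226644677" directly and build its own preferred URI.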

Imagine, for example, a publisher such as PLoS or Magnolia Press
(publisher of Zootaxa). They might want to display LSIDs linked to
their own LSID resolver that embellishes the metadata with information
they have (e.g., they might wish to highlight links to other content
they host). In a sense this is much the same idea as supported by
OpenURL COinS (http://ocoins.info/), where OpenURL-format metadata is
embedded in an HTML document and users choose which resolver to use
to resolve the links. Having LSIDs prefixed with an HTTP proxy makes
this task a little harder.
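For comparison, a COinS span carries the identifier and metadata without committing to any resolver; the reader's own OpenURL tool decides where the link goes. A minimal sketch, assuming the canonical LSID is acceptable as the rft_id and using the journal KEV format; coins_span and the title value are hypothetical:

```python
import html
from urllib.parse import urlencode


def coins_span(identifier: str, article_title: str) -> str:
    """Return an empty COinS <span> carrying an OpenURL ContextObject in KEV form."""
    context_object = urlencode([
        ("ctx_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),
        ("rft_id", identifier),  # canonical, un-proxied identifier
        ("rft.atitle", article_title),
    ])
    # The KEV string contains '&', so escape it for use in an HTML attribute.
    return f'<span class="Z3988" title="{html.escape(context_object, quote=True)}"></span>'


print(coins_span(
    "urn:lsid:zoobank.org:pub:2C6BD020-B54A-4119-9693-3231C9FCEFA6",
    "Example article",
))
```

Because the span holds the bare LSID, a publisher's or reader's resolver can be substituted freely; an identifier already wrapped in one proxy's URL would first have to be unwrapped.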

Rod

On 15 Apr 2009, at 15:00, greg whitbread wrote:

> I like this compromise Rod.  But I think that it is time to give up on
> the concept of multiple identifiers.
>
> The existing TDWG recommendation that
> '5. All references to LSIDs within RDF documents should use the
> proxified form' basically states that LSIDs will never appear in any
> way other than bundled into an http URI, if we are also to publish
> data as RDF.
>
> That sounds as if it means that those wanting to use LSID resolution
> will first have to extract the LSID part from the http URI which will
> now appear everywhere we would expect to find our unique identifier.
>
> Donald has presented a strong case for unique identifiers conforming
> to the LSID specification, but we now have an equally strong case that
> in its http form our identifier must behave as a dereferenceable URI
> per W3C linked data recommendations.
>
> Acceptance of this requirement will affect existing services
> expecting RDF back from the proxy service without content negotiation,
> and will require an update of TDWG GUID policy and recommendations, so
> it is important that we try to get this sorted here as soon as possible.
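Under content negotiation, a client that wants RDF has to say so explicitly. A minimal sketch using only the Python standard library; rdf_request is a hypothetical name, and the example URI is the bioGUID one quoted later in this thread:

```python
import urllib.request

RDF_XML = "application/rdf+xml"


def rdf_request(uri: str) -> urllib.request.Request:
    """Build a GET request that asks for RDF/XML via content negotiation."""
    return urllib.request.Request(uri, headers={"Accept": RDF_XML})


req = rdf_request("http://bioguid.info/urn:lsid:indexfungorum.org:names:213649")
print(req.get_method(), req.get_header("Accept"))
# urllib.request.urlopen(req) would send the request and follow a 303 redirect.
```

A service that omits the Accept header gets whatever default the proxy picks (often HTML), which is why existing clients that assume RDF would need updating.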
>
> greg
>
> 2009/4/10 Roderic Page <r.page at bio.gla.ac.uk>:
>> In a rare attempt at being constructive, here are a few thoughts.
>>
>> LSIDs and linked data
>> =================
>>
>> If adoption of LSIDs proceeds, then we should make efforts to see that
>> they play nicely with Linked Data efforts. For example, an HTTP
>> resolver would need to support 303 redirects and content negotiation.
>> This helps us avoid creating our own ghetto, while still exploiting
>> whatever advantages LSIDs have.
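A resolver with the 303-plus-content-negotiation behaviour described above might look roughly like this minimal WSGI sketch. The base URLs http://example.org/rdf/ and http://example.org/html/ and the name make_lsid_proxy are hypothetical; checking for a substring of the Accept header is a deliberate simplification (proper negotiation would parse q-values), and the app only issues redirects, leaving the RDF and HTML documents to be served elsewhere.

```python
def make_lsid_proxy(rdf_base: str, html_base: str):
    """Return a WSGI app mapping GET /urn:lsid:... to a 303 See Other redirect."""

    def app(environ, start_response):
        lsid = environ.get("PATH_INFO", "").lstrip("/")
        if not lsid.lower().startswith("urn:lsid:"):
            start_response("404 Not Found", [("Content-Type", "text/plain")])
            return [b"Not an LSID\n"]
        # Crude content negotiation: RDF only if the client explicitly asks for it.
        wants_rdf = "application/rdf+xml" in environ.get("HTTP_ACCEPT", "")
        target = (rdf_base if wants_rdf else html_base) + lsid
        # Vary tells caches that the redirect target depends on the Accept header.
        start_response("303 See Other", [("Location", target), ("Vary", "Accept")])
        return [b""]

    return app
```

To try it locally, wsgiref.simple_server.make_server("", 8000, app).serve_forever() would serve an app built with make_lsid_proxy on port 8000.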
>>
>> Roger set up something along these lines to handle Biological
>> Collections Index (BioCol) LSIDs. There is a nice tool at
>> http://validator.linkeddata.org/ to check whether a URI behaves as
>> Linked Data tools expect. Sadly the proxied BioCol LSIDs (e.g.,
>> http://biocol.org/urn:lsid:biocol.org:col:15670 ) don't validate, but
>> this should be easy to fix. The TDWG resolver similarly fails.
>>
>> I've implemented a simple resolver at bioGUID that returns either raw
>> RDF or a clumsily formatted HTML version of the XML, and which passes
>> the http://validator.linkeddata.org tests. An example URI is
>> http://bioguid.info/urn:lsid:indexfungorum.org:names:213649 , which
>> validates (see http://tinyurl.com/cgje5n ).
>>
>> So, my first recommendation is to ensure that a TDWG HTTP proxy
>> passes the http://validator.linkeddata.org/ tests. This means we can
>> play with the Semantic Web crowd using LSIDs.
>>
>> Note that getting HTTP URIs to play with Linked Data is not trivial,
>> so whatever technology we adopt we'll need clear guidelines on how
>> to use it. As an aside, bioGUID can make DOIs play nicely as well (they
>> don't by default), and Kingsley Idehen
>> ( http://www.openlinksw.com/blog/~kidehen/ ) of OpenLink Software is
>> supporting LSIDs in the Linked Data tools he's developing.
>>
>> Ontology
>> =======
>>
>> As part of my experiment to wikify taxonomic names, literature, etc.,
>> I've been playing with the TDWG vocabularies. I've a few grizzles, but
>> in general they've been really useful, and I think these will be key
>> (as Donald and Lee have emphasised).
>>
>> Service
>> ======
>>
>> Ironically, one of the examples Lee listed when defending TDWG's
>> resolver (urn:lsid:gdb.org:GenomicSegment:GDB132938) seems to have
>> disappeared (I think TDWG has a cached copy). This raises the ongoing
>> problem of service availability. TDWG's resolver could help here, in
>> that it could be used to generate reports on service quality and
>> notify providers when something's wrong. Whatever GUID technology is
>> adopted, this will be an issue, and the challenge is to build tools
>> and mechanisms to manage this.
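A service-quality report of the kind suggested above could start as something as simple as this sketch: probe one URI per provider and summarise the result. Provider names and URIs are hypothetical inputs, LSIDs are assumed to be probed through an HTTP proxy URI, and the notification step (e.g., emailing providers) is left out. The check function is injectable so the report logic can be exercised without network access.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict


def http_status(uri: str, timeout: float = 10.0) -> int:
    """Return the final HTTP status of a GET (redirects followed), or 0 if unreachable."""
    try:
        with urllib.request.urlopen(uri, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code
    except OSError:  # URLError, DNS failures and timeouts all derive from OSError
        return 0


def availability_report(
    services: Dict[str, str],
    check: Callable[[str], int] = http_status,
) -> Dict[str, str]:
    """Map each provider to 'ok' or a short description of the failure."""
    report = {}
    for provider, uri in services.items():
        status = check(uri)
        if 200 <= status < 400:
            report[provider] = "ok"
        elif status == 0:
            report[provider] = "unreachable"
        else:
            report[provider] = f"failing (HTTP {status})"
    return report
```

Run periodically, the history of these reports would show which providers are drifting before their identifiers disappear altogether.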
>>
>> Funding
>> ======
>>
>> I've nothing useful to say here, other than to suggest that clearly
>> the "integration of biodiversity data" sales pitch hasn't (yet?)
>> succeeded. I think we techies get it, but we've not made that vision
>> real or compelling. If we had, I think we'd have institutions falling
>> over themselves to ensure the infrastructure exists and persists.
>> Naive, I know, but we could ask why we haven't managed to convince
>> those with the purse strings that this stuff matters.
>>
>> One quick and dirty approach that might help would be for the TDWG
>> LSID resolver to store all the metadata for the LSIDs it resolves in a
>> triple store and expose a SPARQL query interface to that metadata. We
>> could then start to look for interesting links between data.
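The kind of query such a store would enable can be illustrated with a toy in-memory stand-in. This is not SPARQL and not a real triple store; it only shows single-pattern matching over harvested metadata, where None plays the role of a SPARQL variable. TripleStore and subjects_sharing are hypothetical names, and a real deployment would use an actual triple store with a SPARQL endpoint.

```python
from typing import Iterator, Optional, Set, Tuple

Triple = Tuple[str, str, str]


class TripleStore:
    """Toy in-memory triple store standing in for a real store with SPARQL."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self._triples.add((subject, predicate, obj))

    def match(
        self,
        subject: Optional[str] = None,
        predicate: Optional[str] = None,
        obj: Optional[str] = None,
    ) -> Iterator[Triple]:
        """Yield triples matching a single pattern; None behaves like a variable."""
        pattern = (subject, predicate, obj)
        for triple in self._triples:
            if all(want is None or want == got for want, got in zip(pattern, triple)):
                yield triple


def subjects_sharing(store: TripleStore, obj: str) -> Set[str]:
    """Roughly SELECT DISTINCT ?s WHERE { ?s ?p <obj> }: which records point at obj?"""
    return {s for s, _, _ in store.match(obj=obj)}
```

Once metadata from different LSID authorities sits in one store, a query like subjects_sharing("doi:10.3897/zookeys.7.111") would surface records from separate providers that cite the same publication.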
>>
>> Regards
>>
>> Rod
>>
>>
>> ---------------------------------------------------------
>> Roderic Page
>> Professor of Taxonomy
>> DEEB, FBLS
>> Graham Kerr Building
>> University of Glasgow
>> Glasgow G12 8QQ, UK
>>
>> Email: r.page at bio.gla.ac.uk
>> Tel: +44 141 330 4778
>> Fax: +44 141 330 2792
>> AIM: rodpage1962 at aim.com
>> Facebook: http://www.facebook.com/profile.php?id=1112517192
>> Twitter: http://twitter.com/rdmpage
>> Blog: http://iphylo.blogspot.com
>> Home page: http://taxonomy.zoology.gla.ac.uk/rod/rod.html
>
> -- 
> Greg Whitbread
> Australian National Botanic Gardens
> Australian National Herbarium
> +61 2 62509482
> ghw at anbg.gov.au
>
