[Tdwg-guid] Zoobank LSID Service

Fri Sep 22 22:58:16 CEST 2006

Hi Peter,

peter.hollas at thomson.com wrote:
> 2. Is there any standard scheme for LSID discovery? i.e. Would it be a
> good/bad idea to extend the LSID service to allow machine queries of
> LSIDs by taxon name rather than discovering them through the web
> interface?
>
> Any comments and suggestions are very welcome!
>
> Regards, Peter.
>
>   
In my view, LSID is primarily a naming scheme.  While it does allow for 
the resolution of data objects, it was not designed to support other 
common data access tasks such as discovery, search, query, or harvest.  
To support these additional data access tasks I feel we ought to look to 
other standard or well-established protocols.  Some protocols, like the 
Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), are 
well suited to harvesting.  Others, like the W3C's SPARQL protocol, are 
well suited to search and query.  Unfortunately there's no single 
protocol for working with RDF data that does everything we need.  The 
problem for data providers is that each additional protocol that they 
are asked to support adds an additional burden on them.  At the same 
time, the easier a provider makes it to access their data, the more 
their data will be used.  However, many data providing organizations 
wish to devote their resources to the creation and curation of data, 
rather than to the implementation of data access protocols.
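
To make the "naming scheme" point concrete, here is a minimal sketch 
(in Python, for brevity) of what an LSID gives you on its own: a 
structured name of the form urn:lsid:authority:namespace:object, with 
an optional revision, and nothing that supports search or harvest.  
The example identifier and the names Lsid and parse_lsid are made up 
for illustration; they are not part of any LSID library.

```python
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    """The parts of an LSID (illustrative type, not a library class)."""
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text: str) -> Lsid:
    """Split 'urn:lsid:authority:namespace:object[:revision]' into parts."""
    parts = text.strip().split(":")
    # The 'urn:lsid' prefix is matched case-insensitively.
    if len(parts) not in (5, 6) or [p.lower() for p in parts[:2]] != ["urn", "lsid"]:
        raise ValueError("not an LSID: %r" % text)
    # Authority, namespace, and object are all required and non-empty.
    if any(not p for p in parts[2:5]):
        raise ValueError("LSID has an empty component: %r" % text)
    revision = parts[5] if len(parts) == 6 and parts[5] else None
    return Lsid(parts[2], parts[3], parts[4], revision)


if __name__ == "__main__":
    # Hypothetical identifier, used only to show the structure.
    print(parse_lsid("urn:lsid:example.org:names:12345"))
```

Nothing in the name itself says which names exist for a given taxon, 
which is why discovery needs a separate protocol.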

We're hoping to provide an end-to-end solution for this problem with the 
wasabi project (formerly known as DiGIR2).  While the rest of this 
message is a bit of a plug for wasabi, it also serves to illustrate the 
fact that different stakeholders in a data network have different 
requirements for data access.

The central idea behind wasabi is that data providing organizations 
generate RDF data objects, assign LSIDs to them, and drop them into a 
locally-installed wasabi server.  The wasabi server makes data objects 
available through a variety of standard data access protocols.  It 
supports LSID to metadata resolution through a simple HTTP-get protocol 
and through a plugin to the IBM LSID resolver.  The wasabi server also 
allows data aggregators to use OAI-PMH to efficiently fetch data objects 
in bulk so that they can be indexed for search.  One benefit of the OAI 
protocol, especially the implementation in the wasabi server, is that it 
allows for incremental harvesting.  Because it only sends the data 
objects that have changed since the last harvest, OAI-PMH decreases the 
load on a provider's server and allows for fast indexing of their data.  
Finally, the wasabi server provides direct SPARQL query access to data 
objects.  Any time a wasabi server is queried (through OAI, SPARQL, or 
LSID resolution), the access is logged.  This helps data providers keep 
track of who is using their data.
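
As a rough illustration of incremental harvesting, the sketch below 
builds an OAI-PMH ListRecords request that asks only for records 
changed since the previous harvest, using the protocol's standard 
"from" argument.  The base URL is hypothetical, and "oai_dc" is used 
only because every OAI-PMH repository must support it; the metadata 
prefix a wasabi server would use for RDF is an assumption left open 
here.  Timestamps with seconds granularity are optional in OAI-PMH, so 
a real harvester would first check what the repository supports.

```python
from datetime import datetime, timezone
from typing import Optional
from urllib.parse import urlencode


def list_records_url(base_url: str,
                     metadata_prefix: str,
                     last_harvest: Optional[datetime] = None) -> str:
    """Build a ListRecords URL; pass a timezone-aware last_harvest
    to request only records changed since that moment."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if last_harvest is not None:
        # OAI-PMH datestamps are expressed in UTC.
        utc = last_harvest.astimezone(timezone.utc)
        params["from"] = utc.strftime("%Y-%m-%dT%H:%M:%SZ")
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical provider endpoint, for illustration only.
    since = datetime(2006, 9, 1, tzinfo=timezone.utc)
    print(list_records_url("http://provider.example.org/oai", "oai_dc", since))
```

The first harvest omits "from" and fetches everything; each later 
harvest passes the time of the previous one.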

If a provider doesn't want to write a custom program that generates 
RDF/XML, they can use the wasabi server's synchronizer program, along 
with a concept mapping configuration file, to periodically connect to 
their database, transform its contents into RDF data objects, and load 
them into the wasabi server.  The wasabi server will keep track of which 
objects are newly added, deleted, or updated.
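
One simple way to picture that bookkeeping is to compare a fingerprint 
of each object between two synchronizer runs.  This is only a sketch 
of the general idea, not a description of how the wasabi server does 
it; the function names and the choice of SHA-256 are assumptions made 
for the example.

```python
import hashlib
from typing import Dict, List


def fingerprint(rdf_xml: str) -> str:
    """Hash an object's serialized form so changes can be detected."""
    return hashlib.sha256(rdf_xml.encode("utf-8")).hexdigest()


def diff_snapshots(previous: Dict[str, str],
                   current: Dict[str, str]) -> Dict[str, List[str]]:
    """Compare two {lsid: fingerprint} maps from consecutive runs."""
    added = sorted(k for k in current if k not in previous)
    deleted = sorted(k for k in previous if k not in current)
    updated = sorted(k for k in current
                     if k in previous and current[k] != previous[k])
    return {"added": added, "deleted": deleted, "updated": updated}


if __name__ == "__main__":
    # Hypothetical LSIDs and payloads, for illustration only.
    before = {"urn:lsid:example.org:names:1": fingerprint("<a/>"),
              "urn:lsid:example.org:names:2": fingerprint("<b/>")}
    after = {"urn:lsid:example.org:names:2": fingerprint("<b2/>"),
             "urn:lsid:example.org:names:3": fingerprint("<c/>")}
    print(diff_snapshots(before, after))
```

The resulting lists are exactly what an incremental harvest needs to 
report to an aggregator.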

The RDF server is only one component of the wasabi project.  It also 
includes a library that implements the client side of the supported 
data access protocols.  This makes it easy for people to write custom 
software to grab data from wasabi servers.  Researchers who want to 
gather large amounts of data for analysis can use the client library to 
simplify the task.

Another important part of the wasabi project is the indexer.  The 
indexer supports harvesting from multiple distributed wasabi servers.  
Harvested data are then pushed into the indexer, which can generate 
indices that are designed to support various types of queries.  For 
example, the indexer can use a Google-style inverted index that is well 
suited to full-text queries, a database with geospatial extensions to 
support geographical queries, or even a triple store.  The wasabi 
indexer could be used by large data aggregators like GBIF or by custom 
software developers.  One example of the latter is the developers of 
collections management software, who might want to index and cache data 
from TCS providers so that their users can associate specimens with 
taxon concepts through an easy-to-use desktop software package.
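
For readers unfamiliar with the term, an inverted index maps each word 
to the set of objects that contain it, so a full-text query becomes a 
set intersection.  The toy sketch below shows the idea only; it is not 
the wasabi indexer's code, and the identifiers and sample texts are 
invented.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each term to the set of document ids that contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Return ids of documents that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


if __name__ == "__main__":
    # Hypothetical harvested objects, keyed by made-up LSIDs.
    docs = {"urn:lsid:example.org:names:1": "Puma concolor (Linnaeus, 1771)",
            "urn:lsid:example.org:names:2": "Puma yagouaroundi"}
    print(search(build_index(docs), "puma concolor"))
```

A geospatial database or a triple store would replace this structure 
for the other kinds of queries mentioned above.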

Both the client library and the indexer allow computers to access and 
work with data.  The wasabi project also provides an extensible web 
portal.  Built over the client library and an index of data objects 
harvested from wasabi servers, the portal component allows data to be 
accessed by people through a customizable interface.  It supports 
browsing, searching, and downloading of data objects.  One use for the 
portal component might be as the public face of a thematic data network 
like FishNet2, MaNIS, OrNIS, or HerpNet.

The wasabi project is free and open source.  It is implemented in Java.  
The server, client library, and indexer portions of the project are now 
in beta and we plan to release them by the end of the year.  The portal 
is still under active development.

I'll be presenting wasabi at the TDWG meeting in St. Louis next month.

-Steve