Hi Rod,
Questioning the value of taxonomic databases while on a TDWG list is a separate discussion...
I think we have to accept that at present there is no unified, curated, up-to-date taxonomic treatment for all life, meaning that in order to retrieve taxonomic information about "any" taxon, we (either as a human client or a remote app) may well need to query more than one taxonomic DB to locate relevant content. So I guess the essence of my question is: can we simplify / standardise things so that such resources can be queried in a standardised way (with only the destination / resource name changing) and, having done so, receive consistently structured responses (whether TCS, DwC, or other)? The answer at present appears to be "no", which raises the question of what incentives there are or are not to do so, and thence whether TDWG, as the "biodiversity standards" body, has a reason to exist in this space.
The reasons most obvious to me are (1) querying multiple taxonomic data sources in order to build a more complete picture than any one of them can currently supply on its own; (2) comparing different viewpoints or current treatments of a particular taxon between sources of "expertise", bearing in mind that these may differ and between them provide more insight than a single "received view"; (3) providing access to ancillary information / "taxon pages" specific to the data source in question, which may for example provide attribute, distribution, and literature information associated with the taxa in addition to just the names; and (4) treating the remote information as an expert source which can be queried remotely on demand rather than having to host all the same information locally - in the same way as querying any other remote data source, maintained by relevant experts, may have a place in system design as opposed to hosting everything internally - think Google Maps or whatever - and just returning the subset of information relevant to a particular query at a particular time. In other words we outsource the data collation and ongoing management to someone whose mission (and hopefully resourcing) it is to do this, and concentrate on what we can do with the data once received.
I would have thought that none of the above is rocket science, and that it has indeed already been achieved in other domains, for example the OGC web mapping services already mentioned, the data standards required by OBIS and GBIF for participation in their data aggregating networks, and so on. What we have here is a parallel "taxonomic information aggregating" activity which similarly would ideally need standards for data interchange if the poor consumer is not to deal with a multiplicity of uncontrolled local data structures and query/response syntaxes. Indeed the parallel with OGC standards is not completely theoretical, in that OGC WFS (web feature service) can be adapted to map to taxonomic information (just without the spatial component) without difficulty if only the community could agree on a relevant schema - in other words tools exist already (GeoServer, DeeGree) which could handle the requests/responses I believe, but they have no defined standards to work with unless you roll your own...
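To make the "only the destination changes" idea concrete, here is a minimal sketch in Python of a WFS-style GetFeature request sent unchanged to more than one source. It only builds URLs and contacts nothing. The endpoint URLs and the feature type name taxon:name_usage are hypothetical placeholders, since no agreed schema exists. cql_filter is a GeoServer vendor parameter rather than part of core WFS, and the species name is just an illustrative value.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoints: neither URL is a real service.
SOURCES = {
    "source_a": "https://taxa-a.example.org/wfs",
    "source_b": "https://taxa-b.example.org/wfs",
}


def build_taxon_query(base_url, scientific_name, type_name="taxon:name_usage"):
    """Return a WFS-style GetFeature URL asking one source about one name."""
    safe_name = scientific_name.replace("'", "''")  # escape quotes for CQL
    params = {
        "service": "WFS",
        "version": "1.1.0",
        "request": "GetFeature",
        "typeName": type_name,  # hypothetical community-agreed feature type
        # cql_filter is a GeoServer vendor extension, assumed here for brevity
        "cql_filter": f"scientificName = '{safe_name}'",
    }
    return f"{base_url}?{urlencode(params)}"


def build_all(scientific_name):
    """Build the same query for every configured source."""
    return {label: build_taxon_query(url, scientific_name)
            for label, url in SOURCES.items()}


def query_of(url):
    """Parse a URL's query string back into a dict, for comparison."""
    return parse_qs(urlsplit(url).query)


if __name__ == "__main__":
    for label, url in build_all("Thunnus maccoyii").items():
        print(label, url)
```

The point of the sketch is that the query part is identical for every source; only the host differs. What each source sends back would still need an agreed response schema (TCS, DwC or other) before a client could treat the answers uniformly.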
Just my 2 cents of course... I imagine the "global names" folks and their associates would have more to say on this matter of standardising access to distributed taxonomic data sources.
Regards - Tony
-----Original Message-----
From: Roderic Page [mailto:r.page@bio.gla.ac.uk]
Sent: Saturday, 3 November 2012 4:58 PM
To: Rees, Tony (CMAR, Hobart)
Cc: J.Kennedy@napier.ac.uk; mdoering@gbif.org; deepreef@bishopmuseum.org; pmurray@anbg.gov.au; eotuama@gbif.org; tdwg-tag@lists.tdwg.org; Pigot, Simon (CMAR, Hobart)
Subject: Re: [tdwg-tag] Any TCS users with experiences to report?
Playing devil's advocate I think there are several issues here:
- The example you gave of an OGC query illustrates what for me is a major limitation of existing approaches (such as DiGIR and TAPIR): they focus on standardising queries, not identifiers. Hence we can query databases in a consistent (if cumbersome) way, but have no easy way to refer to the things (taxa, specimens, etc.) we retrieve. Having stable, reusable, resolvable identifiers would be a step forward.
- Taxonomic concepts aren't much use unless connected to data. Arguably the most widely used taxonomic database in biodiversity is the NCBI taxonomy database, which has stable identifiers, an API, and taxa that are connected to data (sequences and publications). The GBIF backbone classification is also connected to data (specimens and observations) although its taxon identifiers (like its occurrence ids) aren't terribly stable.
- I think the standards-first approach tends to put the cart before the horse. I'm not sure it's the lack of standards that is the problem; it's the lack of usable information in taxonomic databases. Apart from NCBI and GBIF, what science can I do with taxonomic databases? What questions do they allow me to ask?
Regards
Rod
On 3 Nov 2012, at 03:41, Tony.Rees@csiro.au wrote:
Hi Jessie, also others who have responded thus far,
You said:
I think it would be great if the major databases that describe taxa (not just list names) described their data as concepts and allowed people to link to their databases when identifying specimens and when sequencing etc, this would be the start of a really useful biodiversity network.
Agreed! And also the databases that "just list names" are dealing with concepts as we know, comprising a valid name plus all listed synonyms in these cases...
My feeling is that the reason there is not yet any standardization in this area is that every data resource does its own thing, in the main using its own home-grown schema (that is, presuming a web service is even offered), and the "standards group" (TDWG) has not pushed a model of any sort of standard client which expects to be able to access distributed taxonomic information in a standard way, so there is no incentive for the sources to provide this. Sort of like a fax machine with no-one on the other end wishing to communicate with it. By contrast (for example) the OGC has defined standards for geospatial web services which, once adhered to, allow one's own data to be accessed by standards-compliant remote client apps in a standard way, so if I publish a layer (map) from my geoserver here (http://www.cmar.csiro.au/geoserver/) as layer name = bioreg:CAAB37020002 then any remote client can access it via standard syntax which will retrieve it in a specified format, for example:
http://www.cmar.csiro.au/geoserver/wms?service=WMS&version=1.1.0&req... t=GetMap&layers=bioreg:CAAB37020002&styles=&bbox=109.0,-44.5,156.5,-8.5&width=512&height=388&srs=EPSG:4326&format=image/gif
So maybe for TCS, DwC and so on, a missing part of the task is to define the syntax for such calls (plus the relevant expected responses) for taxonomic data, and then to create some example publishing and retrieving (client) software to exercise this - provided there is an interest in doing so of course!
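For anyone who wants to see how little a standards-compliant client needs to know, here is a rough sketch in Python that assembles a WMS GetMap call like the one above from its parts. The function name build_getmap_url is made up for illustration. The parameter names (service, version, request, layers, styles, bbox, width, height, srs, format) are the standard WMS 1.1 key-value parameters and the values are the ones quoted in the example. The sketch only builds the URL and does not contact the server.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_getmap_url(base_url, layer, bbox, width, height,
                     srs="EPSG:4326", image_format="image/gif",
                     version="1.1.0"):
    """Assemble a WMS GetMap URL from standard key-value parameters."""
    params = {
        "service": "WMS",
        "version": version,
        "request": "GetMap",
        "layers": layer,
        "styles": "",  # empty value: use the layer's default style
        "bbox": ",".join(str(v) for v in bbox),  # minx,miny,maxx,maxy
        "width": width,
        "height": height,
        "srs": srs,
        "format": image_format,
    }
    # Keep ':', ',' and '/' unescaped so the URL stays readable.
    return f"{base_url}?{urlencode(params, safe=':,/')}"


if __name__ == "__main__":
    url = build_getmap_url(
        "http://www.cmar.csiro.au/geoserver/wms",
        "bioreg:CAAB37020002",
        (109.0, -44.5, 156.5, -8.5),
        512,
        388,
    )
    print(url)
    # Parse it back to show the individual parameters a server would see.
    print(parse_qs(urlsplit(url).query, keep_blank_values=True))
```

A taxonomic equivalent would need the same two ingredients: an agreed set of request parameters, and an agreed response format that the client can rely on.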
More soon,