Re: [tdwg-tag] Any TCS users with experiences to report?
Playing devil's advocate I think there are several issues here:
1. The example you gave of an OGC query illustrates what for me is a major limitation of existing approaches (such as DiGIR and TAPIR): they focus on standardising queries, not identifiers. Hence we can query databases in a consistent (if cumbersome) way, but have no easy way to refer to the things (taxa, specimens, etc.) we retrieve. Having stable, reusable, resolvable identifiers would be a step forward.
2. Taxonomic concepts aren't much use unless connected to data. Arguably the most widely used taxonomic database in biodiversity is the NCBI taxonomy database, which has stable identifiers, an API, and taxa that are connected to data (sequences and publications). The GBIF backbone classification is also connected to data (specimens and observations) although its taxon identifiers (like its occurrence ids) aren't terribly stable.
3. I think the standards-first approach tends to put the cart before the horse. I'm not sure the lack of standards is the problem; rather, it's the lack of usable information in taxonomic databases. Apart from NCBI and GBIF, what science can I do with taxonomic databases? What questions do they allow me to ask?
Regards
Rod
On 3 Nov 2012, at 03:41, Tony.Rees@csiro.au wrote:
Hi Jessie, also others who have responded thus far,
You said:
I think it would be great if the major databases that describe taxa (not just list names) described their data as concepts and allowed people to link to their databases when identifying specimens and when sequencing etc, this would be the start of a really useful biodiversity network.
Agreed! And also the databases that "just list names" are dealing with concepts as we know, comprising a valid name plus all listed synonyms in these cases...
My feeling is that there is not yet any standardization in this area (every data resource does its own thing, in the main using its own home-grown schema, presuming a web service is even offered) because the "standards group" (TDWG) has not pushed a model of any sort of standard client which expects to be able to access distributed taxonomic information in a standard way, so there is no incentive for the sources to provide this. It is sort of like a fax machine with no-one on the other end wishing to communicate with it. By contrast (for example) the OGC has defined standards for geospatial web services which, once adhered to, allow one's own data to be accessed by standards-compliant remote client apps in a standard way. So if I publish a layer (map) from my geoserver here (http://www.cmar.csiro.au/geoserver/ ) as layer name = bioreg:CAAB37020002, then any remote client can access it via standard syntax which will retrieve it in a specified format, for example:
http://www.cmar.csiro.au/geoserver/wms?service=WMS&version=1.1.0&req... t=GetMap&layers=bioreg:CAAB37020002&styles=&bbox=109.0,-44.5,156.5,-8.5&width=512&height=388&srs=EPSG:4326&format=image/gif
So maybe, for TCS, DwC and so on, a missing part of the task is to define the syntax for such calls (plus the relevant expected responses) for taxonomic data, and then to create some example publishing and retrieving (client) software to exercise this, provided there is an interest in doing so of course!
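As an illustration of what "standard syntax" buys a client, here is a minimal Python sketch that assembles a WMS GetMap URL from the same pieces as the example above (base URL, layer name, bounding box, image size). The function name wms_getmap_url is made up for this sketch, and the request=GetMap parameter is taken from the WMS specification rather than recovered from the truncated link.

```python
from urllib.parse import urlencode


def wms_getmap_url(base_url, layer, bbox, width, height,
                   srs="EPSG:4326", image_format="image/gif", version="1.1.0"):
    """Build a WMS GetMap URL from key-value parameters (hypothetical helper)."""
    params = {
        "service": "WMS",
        "version": version,
        "request": "GetMap",  # name per the WMS spec, not read from the truncated URL
        "layers": layer,
        "styles": "",         # empty value asks the server for its default style
        "bbox": ",".join(str(v) for v in bbox),  # minx,miny,maxx,maxy
        "width": width,
        "height": height,
        "srs": srs,
        "format": image_format,
    }
    # Keep ':' ',' and '/' readable instead of percent-encoding them.
    return base_url + "?" + urlencode(params, safe=":,/")


url = wms_getmap_url(
    "http://www.cmar.csiro.au/geoserver/wms",
    "bioreg:CAAB37020002",
    (109.0, -44.5, 156.5, -8.5),
    512,
    388,
)
print(url)
```

Only the base URL and the layer name are specific to one server; everything else would be the same for any compliant WMS endpoint, which is the property being asked for in the taxonomic case.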
More soon,
Hi Rod,
Questioning the value of taxonomic databases while on a TDWG list is a separate discussion...
I think we have to accept that at present there is no unified, curated, up-to-date taxonomic treatment for all life, meaning that in order to retrieve taxonomic information about "any" taxon, we (either as a human client or a remote app) may well need to query more than one taxonomic DB to locate relevant content. So I guess the essence of my question is: can we simplify / standardise things so that such resources can be queried in a standardised way (with only the destination / resource name changing) and, having done so, receive consistently structured responses (whether TCS, DwC, or other)? The answer at present appears to be "no", which raises the question of what incentives there are or are not to do so, and thence whether TDWG, as the "biodiversity standards" body, has a reason to exist in this space.
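To make "only the destination changes, and the response is consistently structured" concrete, here is a small Python sketch. Everything in it is hypothetical: the two sources are in-memory stand-ins rather than real services, the record fields are not taken from TCS or DwC, and "Aus bus" / "Aus cus" are placeholder names.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TaxonRecord:
    """Common response shape; field names are illustrative only."""
    source: str
    identifier: str
    accepted_name: str
    synonyms: Tuple[str, ...]


# Stand-ins for two remote services with different native response shapes.
def source_a_lookup(name):
    return {"id": "A-001", "valid": name, "syn": ["Aus cus"]}


def source_b_lookup(name):
    return {"taxonKey": 42, "acceptedName": name, "synonymy": "Aus cus|Aus dus"}


# One adapter per source maps the native shape onto TaxonRecord.
def adapt_a(raw):
    return TaxonRecord("source_a", raw["id"], raw["valid"], tuple(raw["syn"]))


def adapt_b(raw):
    synonyms = tuple(s for s in raw["synonymy"].split("|") if s)
    return TaxonRecord("source_b", str(raw["taxonKey"]), raw["acceptedName"], synonyms)


SOURCES = {
    "source_a": (source_a_lookup, adapt_a),
    "source_b": (source_b_lookup, adapt_b),
}


def query(source, name):
    """Same call for every source; only the source label changes."""
    lookup, adapt = SOURCES[source]
    return adapt(lookup(name))


records = [query(source, "Aus bus") for source in SOURCES]
for record in records:
    print(record)
```

In this sketch the adapters live on the client side. An agreed standard would in effect move them to the providers, so that every source returned the common shape directly.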
The reasons most obvious to me are (1) querying multiple taxonomic data sources in order to build a more complete picture than any one of them can currently supply on its own; (2) comparing different viewpoints or current treatments of a particular taxon between sources of "expertise", bearing in mind that these may differ and between them provide more insight than a single "received view"; (3) providing access to ancillary information / "taxon pages" specific to the data source in question, which may for example provide attribute, distribution and literature information associated with the taxa in addition to just the names; and (4) treating the remote information as an expert source which can be queried remotely on demand, rather than having to host all the same information locally, and which returns just the subset of information relevant to a particular query at a particular time. Querying any other remote data source maintained by relevant experts may have a place in system design in the same way, as opposed to hosting everything internally (think Google Maps or whatever). In other words we outsource the data collation and ongoing management to someone whose mission (and hopefully resourcing) it is to do this, and concentrate on what we can do with the data once received.
I would have thought that none of the above is rocket science, and it has indeed already been achieved in other domains, for example the OGC web mapping services already mentioned, the data standards required by OBIS and GBIF for participation in their data aggregating networks, and so on. What we have here is a parallel "taxonomic information aggregating" activity which similarly would ideally need standards for data interchange if the poor consumer is not to deal with a multiplicity of uncontrolled local data structures and query/response syntaxes. Indeed the parallel with OGC standards is not completely theoretical, in that OGC WFS (web feature service) can be adapted to map to taxonomic information (just without the spatial component) without difficulty, if only the community could agree on a relevant schema. In other words, tools exist already (GeoServer, DeeGree) which could handle the requests/responses, I believe, but they have no defined standards to work with unless you roll your own...
Just my 2 cents of course... I imagine the "global names" folks and their associates would have more to say on this matter of standardising access to distributed taxonomic data sources.
Regards - Tony
-----Original Message----- From: Roderic Page [mailto:r.page@bio.gla.ac.uk] Sent: Saturday, 3 November 2012 4:58 PM To: Rees, Tony (CMAR, Hobart) Cc: J.Kennedy@napier.ac.uk; mdoering@gbif.org; deepreef@bishopmuseum.org; pmurray@anbg.gov.au; eotuama@gbif.org; tdwg-tag@lists.tdwg.org; Pigot, Simon (CMAR, Hobart) Subject: Re: [tdwg-tag] Any TCS users with experiences to report?
[...]
Hi Tony,
A few quick comments.
Querying multiple sources on the fly ("federation") seems to me to be doomed to fail. I tried it in 2005 with the now defunct "Taxonomic Search Engine", and the performance hit of multiple HTTP requests, multiple changeable interfaces and variable uptime of the source databases made it hard work. I think at the scale we operate at, centralisation is the way forward. The arguments against centralisation tend to boil down to the interests of the data providers outweighing those of the users, which is a bad thing.
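The failure modes listed above (slow responses, errors, sources that are down) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a reconstruction of the Taxonomic Search Engine: the three sources are simulated local functions, the names and timings are invented, and the cancel_futures argument requires Python 3.9 or later.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def federated_search(name, sources, timeout_s):
    """Query every source in parallel; report failures instead of raising."""
    executor = ThreadPoolExecutor(max_workers=max(1, len(sources)))
    futures = {executor.submit(fn, name): label for label, fn in sources.items()}
    done, not_done = wait(futures, timeout=timeout_s)

    results, problems = {}, {}
    for future in done:
        label = futures[future]
        error = future.exception()
        if error is None:
            results[label] = future.result()
        else:
            problems[label] = f"error: {error}"
    for future in not_done:
        problems[futures[future]] = "timed out"

    # Do not block on stragglers; threads that are already running still finish.
    executor.shutdown(wait=False, cancel_futures=True)
    return results, problems


# Simulated sources (hypothetical): one healthy, one failing, one too slow.
def fast_source(name):
    return [name]


def flaky_source(name):
    raise ConnectionError("service unavailable")


def slow_source(name):
    time.sleep(1.0)
    return [name]


results, problems = federated_search(
    "Aus bus",
    {"fast": fast_source, "flaky": flaky_source, "slow": slow_source},
    timeout_s=0.2,
)
print(results)
print(problems)
```

The answer a user gets is only as complete as the sources that happened to respond within the deadline, which is the practical objection to federation made above.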
We have a very large, centralised taxonomy, namely the GBIF classification (it's easily the biggest around), itself based on an aggregation of lots of taxonomies. Why not focus on making that the best documented classification we can? There are mechanisms (such as GitHub) that we could use to enable people to download it, improve it, fork it if they wish, and so on. GBIF has names connected to actual data, and data that arguably is useful outside taxonomy, so it would seem a sensible place to focus resources. If not GBIF, then who?
There is, however, one major problem with GBIF, and indeed most other classifications. They bear little relationship to evolutionary history, especially at deeper levels (it doesn't help that there isn't a "tree of life"). In one sense this is fine, as I think we need to keep phylogeny and classification separated otherwise we conflate two rather different things. But we do need to integrate evolutionary information. The NCBI classification will continue to grow and be central to organising genomic information, therefore we need a mapping between GBIF and NCBI. Much of this will be done via names, but a lot won't, and will rely on other links, such as specimens. We also need to integrate phylogenies themselves, which is a different challenge. Unless we deal with genomics and phylogenetics the taxonomic database community risks being even more marginalised.
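A toy Python sketch of the name-based part of such a mapping, and of why names alone leave a remainder. The identifiers and the placeholder names ("Aus bus", "Xus bus") are invented and stand for neither GBIF nor NCBI records; the normalisation is deliberately naive.

```python
def normalise(name):
    """Lower-case the name and collapse runs of whitespace."""
    return " ".join(name.lower().split())


def map_by_name(left, right):
    """Map left identifiers to right identifiers where normalised names agree."""
    right_index = {normalise(name): rid for rid, name in right.items()}
    matched, unmatched = {}, []
    for lid, name in left.items():
        rid = right_index.get(normalise(name))
        if rid is None:
            unmatched.append(lid)
        else:
            matched[lid] = rid
    return matched, unmatched


# Hypothetical classifications keyed by made-up identifiers.
classification_a = {"a1": "Aus bus", "a2": "Aus  cus", "a3": "Xus bus"}
classification_b = {"b7": "aus bus", "b9": "Aus cus"}

matched, unmatched = map_by_name(classification_a, classification_b)
print(matched)
print(unmatched)
```

The entry left in unmatched stands for the cases that need other evidence, such as synonymy or links via specimens, before the two classifications can be connected.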
My own feeling is that we've spent a lot of time fussing with standards, etc., without working out what would be the best landscape for the people who use taxonomic information. IMHO we should be building a Google for biodiversity information. Until we do, we're basically just mucking about.
Regards
Rod
On 5 Nov 2012, at 00:33, Tony.Rees@csiro.au wrote:
[...]
--------------------------------------------------------- Roderic Page Professor of Taxonomy Institute of Biodiversity, Animal Health and Comparative Medicine College of Medical, Veterinary and Life Sciences Graham Kerr Building University of Glasgow Glasgow G12 8QQ, UK
Email: r.page@bio.gla.ac.uk Tel: +44 141 330 4778 Fax: +44 141 330 2792 Skype: rdmpage Facebook: http://www.facebook.com/rdmpage Twitter: http://twitter.com/rdmpage Blog: http://iphylo.blogspot.com Home page: http://taxonomy.zoology.gla.ac.uk/rod/rod.html Citations: http://scholar.google.co.uk/citations?hl=en&user=4Z5WABAAAAAJ
Hi Rod,
Really it is up to the major players / DBs who currently provide web services to respond to this, but I will add a few comments in passing:
[Rod P.:]
Querying multiple sources on the fly ("federation") seems to me to be doomed to fail. I tried it in 2005 with the now defunct "Taxonomic Search Engine", and the performance hit of multiple HTTP requests, multiple changeable interfaces and variable uptime of the source databases made it hard work. I think at the scale we operate at, centralisation is the way forward. The arguments against centralisation tend to boil down to the interests of the data providers outweighing those of the users, which is a bad thing.
[Tony:] I encountered similar problems trying to run real-time distributed data queries within the OBIS system in the early 2000s. On the other hand, a number of the data contributors were small(-ish) players, often in museums, without a very well resourced or robust infrastructure for data publishing. The same should hopefully not apply to the "major players" being addressed here, such as Catalogue of Life, ITIS, NCBI, GBIF, Australian NSLs and so on.
Interestingly the web mapping clients I mentioned earlier maintain an entirely distributed query model (since no one portal has the capacity to host all the data locally) which in the main, seems to work fairly well as most contributors perhaps take their data publishing obligations a bit more seriously - e.g. test that they work, and commit to maintaining "up" services as far as reasonably possible.
[Rod P.:]
We have a very large, centralised taxonomy, namely the GBIF classification (it's easily the biggest around), itself based on an aggregation of lots of taxonomies. Why not focus on making that the best documented classification we can? There are mechanisms (such as GitHub) that we could use to enable people to download it, improve it, fork it if they wish, and so on. GBIF has names connected to actual data, and data that arguably is useful outside taxonomy, so it would seem a sensible place to focus resources. If not GBIF, then who?
[Tony:] Well, there is some territory/overlap to deal with here, since as well as GBIF, other players would presumably like to claim the high ground in this space, most notably Catalogue of Life, GNA and others (NCBI, wikispecies, The Plant List/IPNI for plants...); at present there is no obvious "one stop shop". Also, the Open Tree of Life project (OTTOL) is building its own "master taxonomy" using some of the same sources as GBIF, as I understand it, possibly also with a view to being user editable (?)
Ultimately, if as you contend there is no market for unifying or even providing web services from distributed systems, why would services such as the following exist (presumably with efforts to maintain them and continue to develop them):
http://www.itis.gov/web_service.html
http://webservice.catalogueoflife.org/
https://www.anbg.gov.au/confluence/display/bdv/NSL+Services
http://www.marinespecies.org/aphia.php?p=webservice
and so on?
- Tony (waiting for other relevant persons to chime in here, maybe...)
From: Roderic Page [mailto:r.page@bio.gla.ac.uk] Sent: Monday, 5 November 2012 8:17 PM To: Rees, Tony (CMAR, Hobart) Cc: J.Kennedy@napier.ac.uk; mdoering@gbif.org; deepreef@bishopmuseum.org; pmurray@anbg.gov.au; eotuama@gbif.org; tdwg-tag@lists.tdwg.org; Pigot, Simon (CMAR, Hobart) Subject: Re: [tdwg-tag] Any TCS users with experiences to report?
Hi Tony,
A few quick comments.
Querying multiple sources on the fly ("federation") seems to me to be doomed to fail. I tried it in 2005 with the now defunct "Taxonomic Search Engine" and the performance hit of multiple HTTP requests, multiple, changeable interfaces and variable up time of the source databases made it hard work. I think at the scale we operation centralisation is the way forward. The arguments against centralisation tend to boil down to the interests of the data providers outweighing those of the users, which is a bad thing.
We have a very large, centralised taxonomy, namely the GBIF classification (it's easily the biggest around), itself based on an aggregation of lots of taxonomies. Why not focus on making that the best documented classification we can? There are mechanisms (such as GitHub) that we could use to enable people to download it, improve it, fork it if they wish, and so on. GBIF has names connected to actual data, and data that arguably is useful outside taxonomy, so it would seem a sensible place to focus resources. If not GBIF, then who?
There is, however, one major problem with GBIF, and indeed most other classifications. They bear little relationship to evolutionary history, especially at deeper levels (it doesn't help that there isn't a "tree of life"). In one sense this is fine, as I think we need to keep phylogeny and classification separated otherwise we conflate two rather different things. But we do need to integrate evolutionary information. The NCBI classification will continue to grow and be central to organising genomic information, therefore we need a mapping between GBIF and NCBI. Much of this will be done via names, but a lot won't, and will rely on other links, such as specimens. We also need to integrate phylogenies themselves, which is a different challenge. Unless we deal with genomics and phylogenetics the taxonomic database community risks being even more marginalised.
My own feeling is that we've spent a lot of time fussing with standards, etc., without working out what would be the best landscape for the people who use taxonomic information. IMHO we should be building a Google for biodiversity information. Until we do, we're basically just mucking about.
Regards
Rod
On 5 Nov 2012, at 00:33, <Tony.Rees@csiro.aumailto:Tony.Rees@csiro.au> <Tony.Rees@csiro.aumailto:Tony.Rees@csiro.au> wrote:
Hi Rod,
Questioning the value of taxonomic databases while on a TDWG list is a separate discussion...
I think we have to accept that at present there is no unified, curated, up-to-date taxonomic treatment for all life: meaning that in order to retrieve taxonomic information about "any" taxon, we (either as a human client or a remote app) may well need to query more than one taxonomic DB to locate relevant content. So I guess the essence of my question is, can we simplify / standardise things so that such resources can be queried in a standardised way (with only the destination / resource name changing) and, having done so, receive consistently structured responses (whether TCS, DwC, or other). The answer at present appears to be "no" which begs the question of what incentives there are or are not to do so, and thence whether TDWG as the "biodiversity standards" body, has a reason to exist in this space.
The reasons most obvious to me are (1) querying multiple taxonomic data sources in order to build a more complete picture than any one of them can currently supply on its own; (2) comparing different viewpoints or current treatments of a particular taxon between sources of "expertise", bearing in mind that these may differ and between them provide more insight than a single "received view"; (3) providing access to ancillary information / "taxon pages" specific to the data source in question which may for example provide attribute, distribution, literature information associated with the taxa in addition to just the names; and (4) treating the remote information as an expert source which can be queried remotely on demand trather than having to host all the same information locally - in the same way as quering any other remote data source, maintained by relevant experts, may have a place in system design as opposed to hosting everything internally - think Google Maps or whatever - and just returning the subset of information relevant to a particular query at a particular time. In other words we outsource the data collation and ongoing management to someone whose mission (and hopefully resourcing) it is to do this and concentrate on what we can do with the data once received.
I would have thought that none of the above is rocket science and has indeed already been achieved in other domains for example the OGC web mapping services already mentioned, the data standards required by OBIS and GBIF for participation in their data aggregating networks, and so on. What we have here is a parallel "taxonomic information aggregating" activity which similarly would ideally need standards for data interchange if the poor consumer is not to deal with a multiplicity of uncontrolled local data structures and query/response syntaxes. Indeed the parallel with OGC standards is not completely theoretical in that OGC WFS (web feature service) can be adapted to map to taxonomic information (just qwithout the spatial component) without difficulty if only the community could agree on a relevant schema - in other words tools exist already (GeoServer, DeeGree) which could handle the requests/responses I believe, but they have no defined standards to work with unless you roll-your-own...
Just my 2 cents of course... I amagine the "global names" folks and their associates would have more to say on this matter of standardising access to distributed taxonomic data sources.
Regards - Tony
-----Original Message----- From: Roderic Page [mailto:r.page@bio.gla.ac.uk] Sent: Saturday, 3 November 2012 4:58 PM To: Rees, Tony (CMAR, Hobart) Cc: <J.Kennedy@napier.ac.ukmailto:J.Kennedy@napier.ac.uk>; <mdoering@gbif.orgmailto:mdoering@gbif.org>; <deepreef@bishopmuseum.orgmailto:deepreef@bishopmuseum.org>; pmurray@anbg.gov.aumailto:pmurray@anbg.gov.au; eotuama@gbif.orgmailto:eotuama@gbif.org; tdwg-tag@lists.tdwg.orgmailto:tdwg-tag@lists.tdwg.org; Pigot, Simon (CMAR, Hobart) Subject: Re: [tdwg-tag] Any TCS users with experiences to report?
Playing devil's advocate I think there are several issues here:
1. The example you gave of an OGC query illustrates what for me is a major limitation of existing approaches (such as DiGiR and TAPIR), they focus on standardising queries not identifiers. Hence we can query databases in a consistent (if cumbersome) way, but have no easy way to refer to the things (taxa, specimens, etc.) we retrieve. Having stable, reusable, resolvable identifiers would be a step forward.
2. Taxonomic concepts aren't much use unless connected to data. Arguably the most widely used taxonomic database in biodiversity is the NCBI taxonomy database, which has stable identifiers, an API, and taxa that are connected to data (sequences and publications). The GBIF backbone classification is also connected to data (specimens and observations) although its taxon identifiers (like its occurrence ids) aren't terribly stable.
3. I think the standards-first approach tends to put the cart before the horse. I'm not sure it's the lack of standards that is the problem, it's the lack of usable information in taxonomic databases. Apart from NCBI and GBIF, what science can I do with taxonomic databases? What questions do they allow me to ask?
Regards
Rod
Sent from my iPhone
On 3 Nov 2012, at 03:41, <Tony.Rees@csiro.aumailto:Tony.Rees@csiro.au> wrote:
Hi Jessie, also others who have responded thus far,
You said:
I think it would be great if the major databases that describe taxa (not just list names) described their data as concepts and allowed people to link to their databases when identifying specimens and when sequencing etc, this would be the start of a really useful biodiversity network.
Agreed! And also the databases that "just list names" are dealing with concepts as we know, comprising a valid name plus all listed synonyms in these cases...
My feeling is that the reason there is not yet any standardization in this area is this: every data resource does its own thing, mainly using its own home-grown schema (that is, presuming a web service is even offered), and the "standards group" (TDWG) has not pushed a model of any sort of standard client which expects to be able to access distributed taxonomic information in a standard way, so there is no incentive for the sources to provide this. Sort of like a fax machine with no-one on the other end wishing to communicate with it. By contrast (for example) the OGC has defined standards for geospatial web services which, once adhered to, allow one's own data to be accessed by standards-compliant remote client apps in a standard way. So if I publish a layer (map) from my geoserver here (http://www.cmar.csiro.au/geoserver/) as layer name = bioreg:CAAB37020002, then any remote client can access it via standard syntax which will retrieve it in a specified format, for example
http://www.cmar.csiro.au/geoserver/wms?service=WMS&version=1.1.0&req... t=GetMap&layers=bioreg:CAAB37020002&styles=&bbox=109.0,-44.5,156.5,-8.5&width=512&height=388&srs=EPSG:4326&format=image/gif
So maybe for TCS, DwC and so on, a missing part of the task is to define the syntax for such calls (plus the relevant expected responses) for taxonomic data, and then to create some example software for both publishing and retrieving (client) to exercise this - provided there is an interest in doing so of course!
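To make the "standard syntax" point concrete, here is a minimal Python sketch of how a client might assemble a GetMap call like the one quoted above. It is an illustration only: the function name build_getmap_url is hypothetical, and the parameter name "request" is taken from the OGC WMS convention, since the URL as archived is truncated at that point.

```python
from urllib.parse import urlencode


def build_getmap_url(base_url, layer, bbox, width, height,
                     srs="EPSG:4326", image_format="image/gif",
                     version="1.1.0"):
    """Assemble a WMS GetMap URL from its parts (hypothetical helper)."""
    params = {
        "service": "WMS",
        "version": version,
        # Standard WMS parameter name; assumed here because the archived
        # URL is truncated at this parameter.
        "request": "GetMap",
        "layers": layer,
        "styles": "",  # empty value asks the server for the default style
        "bbox": ",".join(str(v) for v in bbox),  # minx,miny,maxx,maxy
        "width": width,
        "height": height,
        "srs": srs,
        "format": image_format,
    }
    # Keep ':', ',' and '/' unescaped so the result reads like the example.
    return base_url + "?" + urlencode(params, safe=":,/")


url = build_getmap_url(
    "http://www.cmar.csiro.au/geoserver/wms",
    "bioreg:CAAB37020002",
    (109.0, -44.5, 156.5, -8.5),
    512,
    388,
)
print(url)
```

The point of the sketch is that a client needs only the server address and the layer name; everything else follows from the standard, which is the property that taxonomic services currently lack.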
More soon,
--------------------------------------------------------- Roderic Page Professor of Taxonomy Institute of Biodiversity, Animal Health and Comparative Medicine College of Medical, Veterinary and Life Sciences Graham Kerr Building University of Glasgow Glasgow G12 8QQ, UK
Email: r.page@bio.gla.ac.uk Tel: +44 141 330 4778 Fax: +44 141 330 2792 Skype: rdmpage Facebook: http://www.facebook.com/rdmpage Twitter: http://twitter.com/rdmpage Blog: http://iphylo.blogspot.com Home page: http://taxonomy.zoology.gla.ac.uk/rod/rod.html Citations: http://scholar.google.co.uk/citations?hl=en&user=4Z5WABAAAAAJ
Hi Tony,
Quick thoughts:
1. I've not seen much correlation between "major player" status and quality of infrastructure (although I take your general point). Catalogue of Life LSIDs are broken, and routinely so. That nobody seems to care says something about efforts to adopt globally unique identifiers (and/or the fate of LSIDs).
2. "some territory/overlap to deal with here" Yup, that's part of our problem. Not only do we have lots of smaller taxonomic databases, we seem intent on multiplying those that seek global coverage. Pity the poor user confronted by this.
3. The Open Tree of Life project seems to be partly reinventing the wheel, given the overlap in sources with those used by GBIF (see http://opentreeoflife.org/2012/10/11/is-it-a-plant-or-is-it-a-monkey/#commen... )
4. That individual databases have their own web services doesn't negate anything I've said. That we have so many, and so varied in their form and output simply emphasises the problem.
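For readers unfamiliar with LSIDs (point 1), the identifier itself is just a URN of the form urn:lsid:authority:namespace:object, with an optional revision. The Python sketch below only checks that syntax; it does not attempt resolution, which is where the breakage described above would show up. The names LSID and parse_lsid, and the example.org identifier, are hypothetical and not taken from any of the databases discussed.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text: str) -> LSID:
    """Split an LSID into its components, or raise ValueError."""
    parts = text.strip().split(":")
    # Expect urn:lsid:<authority>:<namespace>:<object>[:<revision>]
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {text!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    authority, namespace, object_id = parts[2:5]
    if not (authority and namespace and object_id):
        raise ValueError(f"LSID has an empty component: {text!r}")
    revision = parts[5] if len(parts) == 6 and parts[5] else None
    return LSID(authority, namespace, object_id, revision)


# Made-up identifier, for illustration only.
example = parse_lsid("urn:lsid:example.org:taxa:12345")
print(example.authority, example.namespace, example.object_id)
```

A well-formed LSID is only half the story: whether the authority actually resolves it is a separate matter, and that is the part that depends on infrastructure being maintained.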
I realise that there are all sorts of deep-seated reasons for the situation we are in, much of it to do with funding issues, project politics, the need for people to build systems that solve pressing "local" problems, and the lack of a constituency for a global solution. But I marvel at our ability to keep generating new taxonomic databases, with the promise of linking it all together sometime in the future.
Regards
Rod
In an effort to prevent the loss of information presented in the recent thread on TCS as it relates to RDF, I have created a summary at http://code.google.com/p/tdwg-rdf/wiki/TCSthread with links to the archived messages. I left out posts focused specifically on XML schemas which I consider to be outside the scope of the TDWG RDF group (that is somebody else's problem). I will continue to add to this page as additional relevant posts occur and can also post more "further information" links if anybody wants to send them to me.
Steve
participants (3)
- Roderic Page
- Steve Baskauf
- Tony.Rees@csiro.au