Re: SEEK Project and TDWG-SDD
After reading this thread, I'd like to return to the key issue raised by Jim in his initial post:
Bryan described to us how your schema is intended to include all of the data and metadata associated with a diagnostic character set for a group of taxa. My understanding is thus incomplete and based only on our single discussion, but it struck me that TDWG-SDD has an opportunity to have much broader acceptance and support if your schema was not designed as a single data object--to contain both the metadata about the package (or work or whatever you refer to it as) *and* the descriptive data that describe the individual concepts.
If the taxa/concepts had their own schemas and were linked to the package metadata with a GUID, maybe a DOI or some other globally unique identifier, then the XML concept data sets could be used for other systems like concept based classification or database management systems. This would in theory, and in my view, give the work of your group much more leverage, exposure and relevance to a broader group of scientists and users of names and concepts.
I know that the SDD team have already been considering these issues (in conjunction with the ABCD team and others), but I would like to make the following points reflecting my priorities here in GBIF.
1. It will certainly be beneficial to model the data elements from SDD in such a way that they can be reused in other documents. I would think that this should ideally allow at least for the Terminology, Entities, Descriptions and Keys elements to form independent schema elements which could be used in other schemas. We do not want the use of these elements to be restricted just because they have been tightly bound to a specific document schema. The intention with the ABCD schema was in part to start the development of a library of reusable XML biodiversity data types (not just a fixed document structure). I see much of the current work of TDWG (and of GBIF) to be developing just such a library, from which we can seek to compose a wide variety of top level document structures (specimen/observation data, character tables, diagnostic keys, taxonomic revisions, markup of legacy taxonomic literature like the Biologia Centrali-Americana).
2. On the other hand, it does seem sensible to provide for the metadata to be transferred with each data set so that ownership information, usage restrictions, known limitations, etc. are not lost. I would therefore like to put effort into adopting or developing a top-level document envelope suitable for all classes of biodiversity data exchange. This should include information on origin and ownership, data transformation history, taxonomic, geographic and temporal coverage (as appropriate) and any metadata necessary to allow processors to identify the schema(s) in use for the actual data within the document. The real content should be separable for re-use in other contexts, but such a metadata wrapper standard would bring us closer to automating the manipulation of a wide range of content. I have in mind here something like the ABCD structure, with a top level DataSets wrapper containing a number of different DataSet objects, each of which is made up of a set of Units. In effect the Units element would be a container for data elements from SDD, ABCD, TDWG-Names, etc. The DataSet-level elements would provide a common metadata model for all of these documents.
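As a rough illustration of the envelope idea in point 2, here is a minimal Python sketch. It is not a proposal for the actual schema: the element names (DataSets, DataSet, Metadata, Units, Unit), the "schema" attribute, and the sample values are all assumptions made up for this example. It shows a common metadata wrapper around content from different schemas, and the content being pulled back out without the wrapper.

```python
import xml.etree.ElementTree as ET


def build_envelope(datasets):
    """Wrap content fragments in a hypothetical DataSets/DataSet/Units envelope."""
    root = ET.Element("DataSets")
    for ds in datasets:
        dataset = ET.SubElement(root, "DataSet")
        # Common metadata model shared by every kind of content (names assumed).
        meta = ET.SubElement(dataset, "Metadata")
        for key in ("Owner", "UsageRestrictions", "Coverage"):
            if key in ds:
                ET.SubElement(meta, key).text = ds[key]
        units = ET.SubElement(dataset, "Units")
        for schema, fragment in ds["units"]:
            # The attribute tells a processor which schema the content follows.
            unit = ET.SubElement(units, "Unit", {"schema": schema})
            unit.append(fragment)
    return root


def extract_units(root, schema):
    """Return the content elements for one schema, without the envelope metadata."""
    return [unit[0] for unit in root.iter("Unit") if unit.get("schema") == schema]


# Hypothetical content fragments standing in for SDD and ABCD data.
description = ET.fromstring(
    "<Description><Character name='leaf shape'>ovate</Character></Description>"
)
specimen = ET.fromstring("<Specimen><CatalogNumber>EX-0001</CatalogNumber></Specimen>")

envelope = build_envelope([{
    "Owner": "Example Herbarium",
    "UsageRestrictions": "Attribution required",
    "units": [("SDD", description), ("ABCD", specimen)],
}])
sdd_units = extract_units(envelope, "SDD")
```

The point of the sketch is only that the Description element never needs to know about the wrapper, so it could equally be embedded in a different top-level document.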
I feel that we need a grand vision for how we will unify all of these different schemas into an overall information model. Consider the task of developing a full taxonomic revision for a group using XML documents. This would naturally include references to specimens underlying a concept (external references to ABCD documents?), character data (SDD), and nomenclatural data (TDWG-Names). The goal should be for a processor to be able to take such a document and treat it as an element within a comprehensive electronic library of biodiversity data. We will need registries of the locations of different documents (with some form of GUID for each document) and mechanisms for managing the cross-references (taxon names, catalog numbers for specimens, author names, character definitions, etc.). Such an infrastructure would allow us to populate our taxon concepts with all of the relevant information.
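The registry and cross-reference mechanism could look something like the following Python sketch. Everything here is hypothetical: the Registry class, the reference kinds ("taxon_name", "catalog_number"), and the example.org locations are assumptions chosen for illustration, and a real infrastructure would need persistent identifiers rather than freshly generated UUIDs.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Registry:
    """Toy registry: maps document GUIDs to locations and indexes cross-references."""
    locations: dict = field(default_factory=dict)    # GUID -> document location
    references: dict = field(default_factory=dict)   # (kind, key) -> set of GUIDs

    def register(self, location, refs):
        """Assign a GUID to a document and index the identifiers it mentions."""
        guid = str(uuid.uuid4())
        self.locations[guid] = location
        for kind, key in refs:
            self.references.setdefault((kind, key), set()).add(guid)
        return guid

    def documents_for(self, kind, key):
        """Return the locations of all documents that mention a given identifier."""
        guids = self.references.get((kind, key), set())
        return sorted(self.locations[g] for g in guids)


registry = Registry()
registry.register(
    "http://example.org/abcd/specimens.xml",
    [("taxon_name", "Quercus robur"), ("catalog_number", "EX-0001")],
)
registry.register(
    "http://example.org/sdd/characters.xml",
    [("taxon_name", "Quercus robur")],
)
hits = registry.documents_for("taxon_name", "Quercus robur")
```

Looking up a taxon name then returns both the specimen document and the character document, which is the kind of lookup needed to assemble everything known about a taxon concept.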
Donald
---------------------------------------------------------------
Donald Hobern (dhobern@gbif.org)
Programme Officer for Data Access and Database Interoperability
Global Biodiversity Information Facility Secretariat
Universitetsparken 15, DK-2100 Copenhagen, Denmark
Tel: +45-35321483 Mobile: +45-28751483 Fax: +45-35321480
---------------------------------------------------------------