After reading this thread, I'd like to return to the key issue raised by
Jim in his initial post:

> Bryan described to us how your schema is intended in include all of
> the data and metadata associated with a diagnostic character set for
> a group of taxa. My understanding is thus incomplete and based only
> on our single discussion, but it struck me that TDWG-SDD has an
> opportunity to have much broader acceptance and support if your
> schema was not designed as a single data object--to contain both the
> metadata about the package (or work or whatever you refer to it as)
> *and* the descriptive data that describe the individual concepts.
>
> If the taxa/concepts had their own schemas and were linked to the
> package metadata with a GUID, maybe a DOI or some other globally
> unique identifier, then the XML concept data sets could be used for
> other systems like concept based classification or database
> management systems. This would in theory, and in my view, give the
> work of your group much more leverage, exposure and relevance to a
> broader group of scientists and users of names and concepts.

I know that the SDD team have already been considering these issues (in
conjunction with the ABCD team and others), but I would like to make the
following points reflecting my priorities here in GBIF.
1. It will certainly be beneficial to model the data elements from SDD
in such a way that they can be reused in other documents. Ideally, at
least the Terminology, Entities, Descriptions and Keys elements should
form independent schema components that other schemas can use. We do
not want the use of these elements to be restricted just because they
have been tightly bound to a specific document schema. The intention
with the ABCD schema was in part to start the development of a library
of reusable XML biodiversity data types (not just a fixed document
structure). I see much of the current work of TDWG (and of GBIF) as
developing just such a library, from which we can compose a wide
variety of top-level document structures (specimen/observation data,
character tables, diagnostic keys, taxonomic revisions, markup of
legacy taxonomic literature like the Biologia Centrali-Americana).
2. On the other hand, it does seem sensible to provide for the metadata
to be transferred with each data set so that ownership information,
usage restrictions, known limitations, etc. are not lost. I would
therefore like to put effort into adopting or developing a top-level
document envelope suitable for all classes of biodiversity data
exchange. This should include information on origin and ownership, data
transformation history, taxonomic, geographic and temporal coverage (as
appropriate) and any metadata necessary to allow processors to identify
the schema(s) in use for the actual data within the document. The real
content should be separable for re-use in other contexts, but such a
metadata wrapper standard would bring us closer to automating the
manipulation of a wide range of content. I have in mind here something
like the ABCD structure, with a top-level DataSets wrapper containing a
number of different DataSet objects, each of which is made up of a set
of Units. In effect the Units element would be a container for data
elements from SDD, ABCD, TDWG-Names, etc. The DataSet-level elements
would provide a common metadata model for all of these documents.
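
To make the envelope idea in point 2 concrete, here is a minimal
sketch in Python using the standard xml.etree.ElementTree module. Only
the DataSets, DataSet and Units names come from the description above;
the Metadata, Owner, UsageRestrictions and ContentSchema elements, the
two function names and all sample values are assumptions made for
illustration, not part of ABCD, SDD or any agreed schema.

```python
import xml.etree.ElementTree as ET


def build_envelope(datasets):
    """Wrap payload elements in a DataSets/DataSet/Units envelope.

    Element names other than DataSets, DataSet and Units are
    hypothetical placeholders for the metadata discussed in the text.
    """
    root = ET.Element("DataSets")
    for ds in datasets:
        ds_el = ET.SubElement(root, "DataSet")
        meta = ET.SubElement(ds_el, "Metadata")
        ET.SubElement(meta, "Owner").text = ds["owner"]
        ET.SubElement(meta, "UsageRestrictions").text = ds["restrictions"]
        # Tells a processor which schema governs the payload in Units.
        ET.SubElement(meta, "ContentSchema").text = ds["schema"]
        units = ET.SubElement(ds_el, "Units")
        units.extend(ds["units"])
    return root


def extract_units(envelope, schema):
    """Return the payload elements of every DataSet declaring `schema`."""
    found = []
    for ds_el in envelope.findall("DataSet"):
        if ds_el.findtext("Metadata/ContentSchema") == schema:
            found.extend(list(ds_el.find("Units")))
    return found


# Hypothetical sample content: one SDD-style and one ABCD-style payload.
envelope = build_envelope([
    {"owner": "Example Herbarium", "restrictions": "none stated",
     "schema": "SDD",
     "units": [ET.Element("Character", {"name": "leaf shape"})]},
    {"owner": "Example Museum", "restrictions": "none stated",
     "schema": "ABCD",
     "units": [ET.Element("Unit", {"catalogNumber": "X-1"})]},
])
print(ET.tostring(envelope, encoding="unicode"))
```

The point of the sketch is that extract_units can pull the real
content out for re-use without knowing anything about it beyond the
schema named in the metadata, while the ownership and usage
information travels with each DataSet.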

I feel that we need a grand vision for how we will unify all of these
different schemas into an overall information model. Consider the task
of developing a full taxonomic revision for a group using XML documents.
This would naturally include references to specimens underlying a
concept (external references to ABCD documents?), character data (SDD),
and nomenclatural data (TDWG-Names). The goal should be for a processor
to be able to take such a document and treat it as an element within a
comprehensive electronic library of biodiversity data. We will need
registries of the locations of different documents (with some form of
GUID for each document) and mechanisms for managing the cross-references
(taxon names, catalog numbers for specimens, author names, character
definitions, etc.). Such an infrastructure would allow us to populate
our taxon concepts with all of the relevant information.
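
As a rough illustration of the registry and cross-reference idea, the
following Python sketch keeps an in-memory index of documents keyed by
GUID and by the identifiers they mention. Everything here is an
assumption for the sake of example: the class and method names, the
use of random UUIDs as GUIDs, the reference kinds and the example.org
locations. A real infrastructure would need persistent, distributed
services rather than a dictionary.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class DocumentRecord:
    guid: str      # globally unique identifier assigned at registration
    location: str  # where the document can be retrieved
    schema: str    # e.g. "ABCD", "SDD", "TDWG-Names"


@dataclass
class Registry:
    """Hypothetical in-memory registry of documents and cross-references."""
    documents: dict = field(default_factory=dict)
    references: dict = field(default_factory=dict)  # (kind, value) -> guids

    def register(self, location, schema, keys):
        """Record a document and index it under each (kind, value) key."""
        guid = str(uuid.uuid4())
        self.documents[guid] = DocumentRecord(guid, location, schema)
        for kind, value in keys:
            self.references.setdefault((kind, value), set()).add(guid)
        return guid

    def resolve(self, kind, value):
        """Return every registered document that mentions the given key."""
        guids = self.references.get((kind, value), set())
        return [self.documents[g] for g in sorted(guids)]


registry = Registry()
registry.register("https://example.org/specimens.xml", "ABCD",
                  [("taxon_name", "Aus bus"), ("catalog_number", "X-1")])
registry.register("https://example.org/characters.xml", "SDD",
                  [("taxon_name", "Aus bus")])

# Collect everything known about one taxon name across schemas.
for doc in registry.resolve("taxon_name", "Aus bus"):
    print(doc.schema, doc.location)
```

Resolving a taxon name here returns both the specimen document and the
character document, which is the kind of lookup a processor would need
in order to populate a taxon concept from several sources.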
Donald
---------------------------------------------------------------
Donald Hobern (dhobern(a)gbif.org)
Programme Officer for Data Access and Database Interoperability
Global Biodiversity Information Facility Secretariat
Universitetsparken 15, DK-2100 Copenhagen, Denmark
Tel: +45-35321483 Mobile: +45-28751483 Fax: +45-35321480
---------------------------------------------------------------