Re: [tdwg-tag] darwin core terms inside tdwg ontology

28 Apr 2009

Please note, the most current draft of the DarwinCore is here:
  http://code.google.com/p/darwincore/ and
  http://rs.tdwg.org/dwc/index.htm,
not here:
  http://wiki.tdwg.org/twiki/bin/view/DarwinCore/DarwinCoreDraftStandard

My primary concern with the latest draft is the absence of an explicit class
identifier (and an implicit class definition) that indicates what kind of
data object the sender is transmitting.  If an indexer/aggregator is indexing
multiple kinds of resources, as GBIF is, and a publisher provides a record
with these elements [ScientificName, ScientificNameAuthorship,
NamePublishedIn, and Country], how should the indexer interpret this record?
Is it an organism occurrence record, an authoritative taxonomic record (the
country name indicating the entire known range of the taxon), or part of a
taxonomic checklist for that country?  The term/element [BasisOfRecord] is
the first step in narrowing the possible meanings, but it's the only step and
it appears not to be a required step.  (I interpret the "Status: recommended"
to mean that it's optional.)   At a minimum, BasisOfRecord should be
required.  It would still be possible to publish garbage (or at least
hard-to-interpret records) because our tools don't constrain structure, and there
isn't (yet) any guidance controlling the structures for different classes of
objects. 
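
As a minimal sketch of the check an indexer could apply if BasisOfRecord were
required: the Python below flags a record like the one described above as
uninterpretable until that term is supplied.  The function name, the
REQUIRED_TERMS set, and the sample values are illustrative assumptions; none
of them comes from the DarwinCore draft, which only supplies the element names.

```python
# Hypothetical sketch: treats BasisOfRecord as required, which the draft
# itself does not do (its status there is "recommended").
REQUIRED_TERMS = {"BasisOfRecord"}


def missing_required_terms(record):
    """Return the sorted list of required terms that are absent or blank."""
    return sorted(
        term
        for term in REQUIRED_TERMS
        if not str(record.get(term, "")).strip()
    )


# Illustrative record carrying only the four elements named in the message.
ambiguous_record = {
    "ScientificName": "Puma concolor",
    "ScientificNameAuthorship": "(Linnaeus, 1771)",
    "NamePublishedIn": "(citation omitted)",
    "Country": "Argentina",
}

# The same record with an explicit BasisOfRecord value (sample value only).
occurrence_record = dict(ambiguous_record, BasisOfRecord="PreservedSpecimen")

if __name__ == "__main__":
    print(missing_required_terms(ambiguous_record))
    print(missing_required_terms(occurrence_record))
```

A check like this only establishes that the class identifier is present; it
says nothing about whether the remaining elements make sense for that class.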

The new GBIF Internet Publishing Toolkit (IPT) supports one-to-many
relationships among a series of flat tables and looks like it's going to make
it easier to transmit more complicated data than we were doing with DiGIR and
TAPIR.  In conjunction with this new expanded bag of elements we could see a
lot more complex data get published.  If I may use a skiing metaphor, we've
been on the bunny slopes (for beginners) up until now.  The new DarwinCore
looks like a nicely groomed black diamond slope, but we haven't had any
lessons or even watched anyone else do what we're going to attempt.  I think
we're going to end up in a heap at the bottom of the hill, and the
aggregators are going to have to sort it out.  ( Woohoo! Extreme biodiversity
informatics! )

Finally, I did not mean to say that the Ontology should not be ratified, as
in not supported; what I meant was that it should not be a standard because
we will want to change its contents without versioning the ontology as a
whole.  (Also, it's not an application schema, so it can't be used directly
unless we venture into somewhat uncharted territory.)  Its role is to help us
keep our application schemas coordinated.  

The earlier versions of the DarwinCore (or our protocols, or the way we used
them together) were too limiting (see Greg's comments); this version allows
terms to be combined in nonsensical ways.  It could make life very difficult
for data integrators.  

I think the bottom line is that we URGENTLY need a similar concerted effort
to advance the TDWG (biodiversity informatics) ontology, and a companion set
of application schemas coming forward from the collections, marine biology,
paleo, observation, taxon-name-concept groups as soon as possible.  

-Stan

	-----Original Message-----
	From: Chuck Miller [mailto:Chuck.Miller@mobot.org] 
	Sent: Monday, April 27, 2009 7:22 AM
	To: Kevin Richards; Blum, Stan; Technical Architecture Group mailinglist; exec@tdwg.org
	Subject: RE: [tdwg-tag] darwin core terms inside tdwg ontology

	Kevin,

	I agree with you and Stan that the ontology is useful to all schemas.
It seems to me that a "TDWG Ontology" is a totally new and different kind of
thing than all the data exchange standards of the prior 10 years - DwC, SDD,
TCS, etc.  But, it is a very useful and important new kind of thing that
should be part of the TDWG standards architecture. It challenges prior
thinking about the nature of TDWG standards to grasp what standardizing on an
ontology means.  But, I think it's what is needed.

	If TDWG standardized on one Ontology, then the vocabulary of all data
exchange could be standardized on it.  Then all TDWG standards could be
revised over time to comply with that vocabulary standard, including DwC.  

	Stan said: "I'd like to hear the rationale for combining taxonomic
name/concept with organism occurrence." An occurrence record generally has an
organism's name associated with it in the real world. It is necessary and
inevitable that vocabulary about organism names will be used in an occurrence
data exchange schema like DwC. We have been stymied with this idea for years.
A standard Ontology/vocabulary for the elements of name information needed to
be associated with an occurrence, or a description, or a taxon concept would
go a long way toward solving this duality.  The "standard vocabulary" would
not be standardized within DwC but it would be used in DwC.

	Of course there is the problem of the hundreds of installations of
DiGIR that use DwC "classic" and are no doubt not going to change for a long
time.  I think they just have to be accepted and worked around going forward.
It's impractical to think of anything else.  But, the past should not
roadblock the future and we need to get moving toward that future.

	Stan thinks that the Ontology is not appropriate for TDWG
ratification.  Why not?  Change has to start somewhere. Yes, other standards
would probably be in conflict if the Ontology were ratified, but I think we
want to ultimately have consistency across all the standards and that means
there has to be change going forward.  I think a ratified TDWG Ontology would
provide the foundation upon which to start building those changes.

	Chuck
