[tdwg-tag] darwin core terms inside tdwg ontology

Peter DeVries pete.devries at gmail.com
Wed Apr 29 21:45:00 CEST 2009


Hi Markus,
This is very cool. :-)

I had some ideas, suggestions:

Add:
TaxonConceptID => unique identifier of taxon concept, some sort of
resolvable GUID

Change:
TaxonID to TaxonNameID since it seems to link to something like a uBio
namebankid

Question:
higherTaxonID, higherTaxon  are these the accepted names or the names on the
collection label?

If accepted, change to higherTaxonNameID, higherTaxonName

Add:
HigherTaxonConceptID, this would link to the taxon concept for the next
highest group with a more stable identifier than the family name / LSID.

Question:
scientificNameAuthorship  does this include the year? Also, I replace @ with
"and" to avoid encoding and decoding, e.g. "Smith and Jones".

Question:
taxonAccordingTo  is this a DOI? or linked identifier?

Comment:
namePublishedIn  should this be a subproperty of
dcterms:bibliographicCitation?

Change:
acceptedTaxonID => acceptedTaxonNameID since it appears this would be a uBio
NameBank ID-type identifier
acceptedTaxon    => acceptedTaxonName since this is the current accepted
name.

These changes should not affect any of your goals, but they would help
disambiguate a species concept (i.e. this is a real species) from the current
taxonomic hypothesis that this species belongs in this genus and family.

The recognition that something is a real species changes very little
over time, but the taxonomic hypothesis implicit in the genus and species
names seems to change fairly often.

It also would have all the synonyms point to a relatively stable GUID for
the species concept, not the latest accepted taxonomic hypothesis, i.e. the
"name".

The rationale is that this makes it easier to find all records of the same
species that happen to be labeled either *Aedes triseriatus* or
*Ochlerotatus triseriatus* via names and the LSIDs for those names.
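
To make the point concrete, here is a minimal Python sketch, assuming each
occurrence record carries both a name-level ID and the proposed
TaxonConceptID. The record layout, the ID values and the function name are
invented for illustration; none of them come from the draft.

```python
from collections import defaultdict

# Hypothetical occurrence records. The identifiers are invented placeholders,
# not real uBio NameBank IDs or LSIDs.
records = [
    {"occurrenceID": "occ-1", "scientificName": "Aedes triseriatus",
     "taxonNameID": "name:1001", "taxonConceptID": "concept:42"},
    {"occurrenceID": "occ-2", "scientificName": "Ochlerotatus triseriatus",
     "taxonNameID": "name:2002", "taxonConceptID": "concept:42"},
    {"occurrenceID": "occ-3", "scientificName": "Aedes vexans",
     "taxonNameID": "name:3003", "taxonConceptID": "concept:77"},
]


def group_by_concept(rows):
    """Group occurrence IDs by the stable concept ID rather than by name."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["taxonConceptID"]].append(row["occurrenceID"])
    return dict(groups)


by_concept = group_by_concept(records)
print(by_concept["concept:42"])
```

Both triseriatus records come back under the one concept ID, whichever genus
is on the label, without having to enumerate the name IDs of every synonym.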

[Caution Red Herring]

I will also throw this idea out there, as a red herring. Is there any
advantage to differentiating between different taxon levels for instance,
species, family, order etc?

For instance having

SpeciesTaxon
  speciesTaxonNameID
  speciesScientificName ...
  speciesInFamilyTaxonID
  speciesInFamily

FamilyTaxon
  familyTaxonNameID
  familyScientificName
  familyInOrderTaxonID
  familyInOrderName

...

This would make it clearer as to what attributes apply to which records. For
instance, an order would have a kingdom, phylum, class, but not a family.
Another way to achieve this is to assume that a Taxon is a species or a
paraspecies entity such as a subspecies or variety, and not one of the
higher-level groupings.
For example, a Taxon is an OTU operating at or near the species level. I
think most of the DarwinCore observation records will be of species or
paraspecies anyway.
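
A rough Python sketch of the rank-specific idea, assuming one record type per
rank. The field names mirror the list above, written in consistent camelCase;
the types, the optional defaults and the example values are assumptions made
only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FamilyTaxon:
    # A family record knows its order, but has no family-level parent fields.
    familyTaxonNameID: str
    familyScientificName: str
    familyInOrderTaxonID: Optional[str] = None
    familyInOrderName: Optional[str] = None


@dataclass
class SpeciesTaxon:
    # A species record points at the family it is placed in.
    speciesTaxonNameID: str
    speciesScientificName: str
    speciesInFamilyTaxonID: Optional[str] = None
    speciesInFamily: Optional[str] = None


# Invented IDs; only the names are real.
family = FamilyTaxon("name:culicidae", "Culicidae", familyInOrderName="Diptera")
species = SpeciesTaxon(
    "name:1001",
    "Aedes triseriatus",
    speciesInFamilyTaxonID=family.familyTaxonNameID,
    speciesInFamily=family.familyScientificName,
)
```

With separate types, asking a family record for `speciesInFamily` is simply an
error, which is the "clearer as to what attributes apply" effect described
above.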

These are just some things to think about,

Respectfully,

- Pete

On Tue, Apr 28, 2009 at 12:15 AM, "Markus Döring (GBIF)"
<mdoering at gbif.org> wrote:

> I'm trying my best to catch up with all the mails in this thread
> lately, but surely missed something.
> I've ended up with a rather long post, please excuse me, but I think the
> new dwc draft needs some explanation...
>
>
>
> The idea of darwin core is to provide a simple list of terms which are
> useful in different contexts - just like dublin core does. We have
> been thinking about giving the terms a "domain", the dublin core
> terminology for binding a property to a class. Although we did declare
> a term's domain in the description, we finally decided that this is no
> more than a grouping of terms. Assigning a domain is a rather
> controversial task, as properties can belong to several classes and the
> granularity of class definitions depends strongly on your views.
> I believe this is also likely the major obstacle to getting an agreed
> ontology on the road.
>
> But in order to express the context of a collection of darwin core
> terms we also defined class terms representing a taxon/name, an
> occurrence/specimen, a site, a collecting event and more. Actually
> there are 2 ways of expressing the type of a thing in the new terms -
> using a class term (e.g. as the parent xml element instead of just
> record or inside the rowType attribute of the text archive meta file)
> or using the basis of record when using the simple/classic darwin core
> with a meaning-free DarwinRecord parent element.
>
> During last TDWG I got very attracted to the KISS idea that was
> present all over, but especially in Chuck's talk. Why has darwin core
> been so much more successful than other tdwg formats? Why can't we use
> the same approach to share taxonomic or nomenclatural data? In the end
> the core properties for a taxon or name are rather limited and if you
> take TCS apart there is not much more you can do with it than what the
> taxonomic dwc terms provide - leaving concept relations aside. We also
> decided not to separate a name from a taxon. That distinction comes from
> the context and from the presence of a taxonomic status or taxonomic
> classification. In order to save you from looking up the latest draft,
> here is the list of the taxonomic terms with a few quick annotations
> of my own:
>
> [ taken from http://darwincore.googlecode.com/svn/trunk/terms/index.htm ]
>
> taxonID  # the taxon/name ID, preferably a GUID, but local IDs are
> permitted too. Any ID
> scientificName  # the full name incl authorship
> higherTaxonID  # ID pointing to the next higher taxon
> higherTaxon  #  full scientific name of the next higher taxon. Can be
> used alternatively or in addition to the ID above
> kingdom  # the classic dwc higher ranks to simplify transfer in many
> cases and useful to disambiguate homonyms
> phylum
> class
> order
> family
> genus
> subgenus
> specificEpithet  # atomised epitheton part as you would guess
> taxonRank  # any rank string, preferably taken from a controlled list
> such as the TCS one
> infraspecificEpithet
> scientificNameAuthorship  # the authorship alone. Similar to the other
> atomised parts this might not be needed as the complete sciname string
> is good enough, but it seemed desired by many
> nomenclaturalCode # the code to disambiguate homonyms across the codes
> taxonAccordingTo # the taxon concept sec reference
> namePublishedIn # the nomenclatural reference, original publication of
> the name
> taxonomicStatus  # a status such as accepted, (heterotypic) synonym,
> misapplied name. Ideally also a vocabulary such as the ontology one
> for nomenclatural note types
> nomenclaturalStatus  # similar to above, but only nomenclaturally
> relevant statuses such as nom. illeg.
> acceptedTaxonID  # pointer to accepted taxon in case of synonyms or
> misapplied names
> acceptedTaxon  # explicit string version of above
> basionymID  #  pointer to the basionym
> basionym # explicit string version of above
>
>
> So frankly I think there is a lot you can do with these terms. Every
> synonym will be a name on its own and link to the accepted taxon,
> having its own taxonomic status that can provide you with details
> about the reason for it being a synonym or misapplied name. If
> combined with 1 to many extensions, you could even exchange pretty
> detailed species information in a *very* simple model.
> Imagine you have data based on extensions for species such as
> distribution (multiple distribution status per area per species),
> descriptions (just a simple title, body, and description type ala SPM
> classes would be great), multimedia, alternative links/guids, types
> data. As an exercise I have also translated parts of the GISIN schema
> into extensions based around species described by darwin core terms
> and it works fine.
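
As a rough illustration of the synonym linkage described above, here is a
minimal Python sketch that follows acceptedTaxonID from a synonym to its
accepted taxon. The term names are the ones in the list; the records, IDs,
placeholder names and the function itself are invented for the example.

```python
# Hypothetical name records keyed by taxonID. "Aus bus" and "Xus bus" are
# placeholder names, and the IDs are local, made-up values.
taxa = {
    "t1": {"taxonID": "t1", "scientificName": "Aus bus Smith, 1900",
           "taxonomicStatus": "accepted", "acceptedTaxonID": None},
    "t2": {"taxonID": "t2", "scientificName": "Xus bus (Smith, 1900)",
           "taxonomicStatus": "synonym", "acceptedTaxonID": "t1"},
}


def resolve_accepted(taxon_id, taxa_by_id):
    """Follow acceptedTaxonID pointers until a record without one is reached."""
    seen = set()
    current = taxa_by_id[taxon_id]
    while current.get("acceptedTaxonID"):
        if current["taxonID"] in seen:
            # Guard against records that point at each other.
            raise ValueError("cycle in acceptedTaxonID pointers")
        seen.add(current["taxonID"])
        current = taxa_by_id[current["acceptedTaxonID"]]
    return current


accepted = resolve_accepted("t2", taxa)
print(accepted["scientificName"])
```

Each synonym stays a record of its own, with its own taxonomicStatus, and the
pointer is all that is needed to reach the accepted taxon.
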
>
> Surely darwin core terms can leave you in doubt about the exact
> context, just as dublin core does. But it immediately allows you to
> share data in a simple way that people are comfortable with and it is
> not tied to a technology per se. I see this as the biggest advantage
> of all. It plays nice in the world of xml, rdf, plain text, xhtml,
> microformats - just anything as it is based on just a list of terms
> with an associated globally unique URI. And by decoupling the
> recommendations on how to use the dwc terms in the context of
> different technologies, we can keep the term definitions stable no
> matter what comes next.
>
>
> Also I wanted to avoid confusion about IPT and the dwc text archives.
> The text archive, extensions and vocabularies idea implemented in the
> IPT are not IPT specific. It's just a very simple exchange format
> based on the dwc recommendations for how to use darwin core in the
> context of text/csv files. For those interested, I am currently writing
> client code, so there will be a Java archive reader in a few days
> that allows you to easily iterate over the core dwc records in an
> archive and pulls out the related extension records together with the
> core record.
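
The core-plus-extension iteration described above can be sketched in a few
lines of Python. This is not the Java reader mentioned, only an illustration
under assumptions: two CSV tables joined on a shared taxonID column, with
invented column names and values. A real archive would describe its files in
the meta file rather than rely on hard-coded names.

```python
import csv
import io
from collections import defaultdict

# Hypothetical core file: one row per taxon.
CORE_CSV = """taxonID,scientificName
t1,Aus bus
t2,Aus cus
"""

# Hypothetical distribution extension: zero or more rows per taxon.
DISTRIBUTION_CSV = """taxonID,area,status
t1,Area A,native
t1,Area B,introduced
"""


def iter_core_with_extensions(core_text, ext_text, key="taxonID"):
    """Yield (core_row, extension_rows) pairs joined on the shared key."""
    by_key = defaultdict(list)
    for row in csv.DictReader(io.StringIO(ext_text)):
        by_key[row[key]].append(row)
    for core in csv.DictReader(io.StringIO(core_text)):
        yield core, by_key.get(core[key], [])


joined = list(iter_core_with_extensions(CORE_CSV, DISTRIBUTION_CSV))
for core, rows in joined:
    print(core["scientificName"], [r["area"] for r in rows])
```

The extension is indexed once and each core record then picks up its related
rows, so a taxon with no extension rows still appears, with an empty list.
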
>
>
> I am glad to see this list being so alive again!
> Markus
>
>
>
> On Apr 28, 2009, at 3:22, Blum, Stan wrote:
>
> > Please note, the most current draft of the DarwinCore is:
> >   here http://code.google.com/p/darwincore/ and
> >   here http://rs.tdwg.org/dwc/index.htm
> >   (not here http://wiki.tdwg.org/twiki/bin/view/DarwinCore/DarwinCoreDraftStandard).
> >
> > My primary concern with the latest draft is the absence of an
> > explicit class identifier (and an implicit class definition) that
> > indicates what kind of data object the sender is transmitting.  If
> > an indexer/aggregator is indexing multiple kinds of resources, as
> > GBIF is, and a publisher provides a record with these elements
> > [ScientificName, ScientificNameAuthorship, NamePublishedIn, and
> > Country], how should the indexer interpret this record?  Is it an
> > organism occurrence record, an authoritative taxonomic record (the
> > country name indicating the entire known range of the taxon), or
> > part of a taxonomic checklist for that country?  The term/element
> > [BasisOfRecord] is the first step in narrowing the possible
> > meanings, but it's the only step and it appears not to be a required
> > step.  (I interpret the "Status: recommended" to mean that it's
> > optional.)   At a minimum, BasisOfRecord should be required.  It
> > would still be possible to publish garbage (at least hard to
> > interpret records) because our tools don't constrain structure, and
> > there isn't (yet) any guidance controlling the structures for
> > different classes of objects.
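
A small Python sketch of what requiring BasisOfRecord could look like on the
indexer side, assuming records arrive as simple key/value maps. The element
names are the ones quoted above; the values, the function name and the
decision to merely sort records into two piles are illustrative assumptions.

```python
# Hypothetical incoming records; all values are invented.
records = [
    {"ScientificName": "Aus bus", "ScientificNameAuthorship": "Smith, 1900",
     "NamePublishedIn": "J. Example 1: 1-10", "Country": "Examplia"},
    {"ScientificName": "Aus bus", "Country": "Examplia",
     "BasisOfRecord": "PreservedSpecimen"},
]


def split_by_basis_of_record(rows):
    """Separate rows that state a BasisOfRecord from rows that do not."""
    typed, untyped = [], []
    for row in rows:
        value = (row.get("BasisOfRecord") or "").strip()
        (typed if value else untyped).append(row)
    return typed, untyped


typed, untyped = split_by_basis_of_record(records)
print(len(typed), len(untyped))
```

The first record is exactly the ambiguous case described above: nothing in it
says whether it is an occurrence, a taxonomic record or a checklist entry, so
it lands in the untyped pile.
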
> >
> > The new GBIF Internet Publishing Toolkit (IPT) supports one-to-many
> > relationships among a series of flat tables and looks like it's
> > going to make it easier to transmit more complicated data than we
> > were doing with DiGIR and TAPIR.  In conjunction with this new
> > expanded bag of elements we could see a lot more complex data get
> > published.  If I may use a skiing metaphor, we've been on the bunny
> > slopes (for beginners) up until now.  The new DarwinCore looks like
> > a nicely groomed black diamond slope, but we haven't had any lessons
> > or even watched anyone else do what we're going to attempt.  I think
> > we're going to end up in a heap at the bottom of the hill, and the
> > aggregators are going to have to sort it out.  ( Woohoo! Extreme
> > biodiversity informatics! )
> >
> >
> > Finally, I did not mean to say that the Ontology should not be
> > ratified, as in not supported; what I meant was that it should not
> > be a standard because we will want to change its contents without
> > versioning the ontology as a whole.  (Also, it's not an application
> > schema, so it can't be used directly unless we venture into somewhat
> > uncharted territory.)  Its role is to help us keep our application
> > schemas coordinated.
> >
> > The earlier versions of the DarwinCore (or our protocols, or the way
> > we used them together) were too limiting (see Greg's comments); this
> > version allows terms to be combined in nonsensical ways.  It could
> > make life very difficult for data integrators.
> >
> > I think the bottom line is that we URGENTLY need a similar concerted
> > effort to advance the TDWG (biodiversity informatics) ontology, and
> > a companion set of application schemas coming forward from the
> > collections, marine biology, paleo, observation, taxon-name-concept
> > groups as soon as possible.
> >
> > -Stan
> >
> > -----Original Message-----
> > From: Chuck Miller [mailto:Chuck.Miller at mobot.org]
> > Sent: Monday, April 27, 2009 7:22 AM
> > To: Kevin Richards; Blum, Stan; Technical Architecture Group
> > mailinglist; exec at tdwg.org
> > Subject: RE: [tdwg-tag] darwin core terms inside tdwg ontology
> >
> > Kevin,
> >
> > I agree with you and Stan that the ontology is useful to all
> > schemas.  It seems to me that a “TDWG Ontology” is a totally new and
> > different kind of thing than all the data exchange standards of the
> > prior 10 years – DwC, SDD, TCS, etc.  But, it is a very useful and
> > important new kind of thing that should be part of the TDWG
> > standards architecture. It challenges prior thinking about the
> > nature of TDWG standards to grasp what standardizing on an ontology
> > means.  But, I think it’s what is needed.
> >
> >
> >
> > If TDWG standardized on one Ontology, then the vocabulary of all
> > data exchange could be standardized on it.  Then all TDWG standards
> > could be revised over time to comply with that vocabulary standard,
> > including DwC.
> >
> >
> >
> > Stan said: “ I'd like to hear the rationale for combining taxonomic
> > name/concept with organism occurrence.” An occurrence record
> > generally has an organism’s name associated with it in the real
> > world. It is necessary and inevitable that vocabulary about organism
> > names will be used in an occurrence data exchange schema like DwC.
> > We have been stymied with this idea for years. A standard Ontology/
> > vocabulary for the elements of name information needed to be
> > associated with an occurrence, or a description, or a taxon concept
> > would go a long way toward solving this duality.  The “standard
> > vocabulary” would not be standardized within DwC but it would be
> > used in DwC.
> >
> >
> >
> > Of course there is the problem of the hundreds of installations of
> > DiGIR that use DwC “classic” and are no doubt not going to change
> > for a long time.  I think they just have to be accepted and worked
> > around going forward.  It’s impractical to think of anything else.
> > But, the past should not roadblock the future and we need to get
> > moving toward that future.
> >
> >
> >
> > Stan thinks that the Ontology is not appropriate for TDWG
> > ratification.  Why not?  Change has to start somewhere. Yes, other
> > standards would probably be in conflict if the Ontology were
> > ratified, but I think we want to ultimately have consistency across
> > all the standards and that means there has to be change going
> > forward.  I think a ratified TDWG Ontology would provide the
> > foundation upon which to start building those changes.
> >
> >
> >
> > Chuck
> >
> >
> >
> > _______________________________________________
> > tdwg-tag mailing list
> > tdwg-tag at lists.tdwg.org
> > http://lists.tdwg.org/mailman/listinfo/tdwg-tag



-- 
---------------------------------------------------------------
Pete DeVries
Department of Entomology
University of Wisconsin - Madison
445 Russell Laboratories
1630 Linden Drive
Madison, WI 53706
------------------------------------------------------------