[tdwg-content] A plea around basisOfRecord (Was: Proposed new Darwin Core terms - abundance, abundanceAsPercent)
Steve Baskauf
steve.baskauf at vanderbilt.edu
Mon Oct 14 16:34:15 CEST 2013
Rod,
http://code.google.com/p/tdwg-rdf/wiki/BiodiversityOntologies which has
been online since last March.
Steve
Roderic Page wrote:
...
> What might help is a way to visualise the TDWG LSID ontology in terms
> of the interconnections between the different classes. I'm not aware
> of such a visualisation (nor of an equivalent one for the Darwin Core
> classes).
>
> In any event, it seems odd to have two distinct ontologies that are
> both in use, and which overlap so significantly.
>
> Regards
>
> Rod
> On 13 Oct 2013, at 16:12, Donald Hobern [GBIF] wrote:
>
>> It’s been a couple of weeks but I said I’d try to write something
>> about a more general concern I have around the way we use
>> basisOfRecord and dcterms:type to hold values like occurrence, event
>> and materialSample. This is something that has concerned me for
>> years and that, I worry, is making everything we all do much messier
>> than it need be.
>>
>> I believe that the way we have come to use Darwin Core basisOfRecord
>> is confused and unhelpful. I really wish we used Darwin Core like this:
>>
>> 1. basisOfRecord should be used ONLY to indicate the type of
>> evidence that lies behind a record – a key aspect of whether the
>> record is likely to be useful for different purposes
>> 2. basisOfRecord values should be taken from a hierarchical
>> vocabulary with three main branches:
>> a. “specimens” (i.e. biological material that can be reviewed),
>> with a hierarchy of subordinate values such as “pinnedSpecimen”,
>> “herbariumSheet”, etc.
>> b. derived, non-biological evidence (not sure what name), with a
>> hierarchy of subordinate values such as “dnaSequence”,
>> “soundRecording”, “stillImage”, etc.
>> c. asserted observations with no revisitable evidence other
>> than the authority of the observer
>> 3. TDWG should deliver a basic ontology in the form of a graph
>> of key relationships between the most significant conceptual entities
>> in our world (TaxonName, TaxonConcept, Identification, Collection,
>> Specimen, Locality, Agent, …)
>> 4. This ontology should not attempt to map all the complexity
>> of biodiversity-related data – just provide the high-level map and
>> key relationships (TaxonConcept hasName TaxonName, Specimen heldIn
>> Collection, etc.) – it should leave definition of other properties as
>> a separate, open-ended activity for the community
>> 5. This ontology should be reviewed at regular intervals and
>> versioned as necessary to address critical gaps – provided that
>> backwards compatibility is maintained (splitting a class into
>> multiple constituent classes probably won’t break anything, so start
>> simple)
>> 6. The Darwin Core vocabulary should be published as a flat,
>> open-ended list of terms with clear definitions that can be freely
>> combined as columns in denormalised records
>> 7. Every Darwin Core term should be documented to be tightly
>> associated with a single, fixed class in the ontology (e.g.
>> scientificName and specificEpithet are ALWAYS considered to be
>> properties of a TaxonName whether or not that TaxonName object is
>> clearly referenced or separated out)
>> 8. Every data publisher should be encouraged to share all
>> relevant data elements in their source data in the most convenient
>> normalised or denormalised form, provided they use the recognised
>> Darwin Core properties for elements that match the definition for
>> those terms, and provided they give some metadata for other
>> elements. Possible forms include:
>> a. A completely hierarchical, ABCD-like, XML representation
>> b. A completely flat denormalised, simple-DwC-like, CSV
>> representation, if the data includes no elements with higher cardinality
>> c. A set of flat, relational, CSV representations, as with
>> Darwin Core Archive star schemas, but with freedom to have more
>> complex graphed relationships as needed
>> 9. Each table of CSV data in 8b and 8c is a view that
>> corresponds to a linear subgraph of the TDWG ontology, identified by
>> the classes of the DwC properties used – this allows us to infer the
>> “shape” of the data in terms of the ontology
>> 10. If we do this, we do not need to worry about whether a record
>> is a checklist record, an event, an occurrence, a material sample or
>> whatever else, although we could use the dcterms:type property, or
>> some new property, to hold this detail as a further clue to intent
>> and possible use for the record
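
To make point 2 concrete, here is a minimal sketch of a hierarchical basisOfRecord vocabulary. The leaf values are the ones named in the proposal; the branch names "derivedEvidence" and "assertedObservation" are placeholders of mine (the proposal leaves the second branch unnamed), and the function names are hypothetical rather than part of any TDWG or GBIF API.

```python
# Hypothetical parent links for a hierarchical basisOfRecord vocabulary.
# Leaf values come from the proposal above; "derivedEvidence" and
# "assertedObservation" are placeholder branch names, not agreed terms.
PARENT = {
    "specimens": None,
    "pinnedSpecimen": "specimens",
    "herbariumSheet": "specimens",
    "derivedEvidence": None,
    "dnaSequence": "derivedEvidence",
    "soundRecording": "derivedEvidence",
    "stillImage": "derivedEvidence",
    "assertedObservation": None,
}


def branch_of(value):
    """Walk up the parent links and return the top-level branch."""
    if value not in PARENT:
        raise KeyError(f"unknown basisOfRecord value: {value}")
    while PARENT[value] is not None:
        value = PARENT[value]
    return value


def has_revisitable_evidence(value):
    """True unless the record rests only on the observer's authority."""
    return branch_of(value) != "assertedObservation"
```

A consumer could then filter on the branch alone, for example keeping every record where `branch_of(value) == "specimens"`, without knowing each subordinate value in advance.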
>>
>> Here is an example. In today’s terms, what sort of DwC record is
>> this? Do I really have to replace “recordId” with “eventId”,
>> “occurrenceId” or similar? And which should I choose?
>>
>> *recordId, decimalLatitude, decimalLongitude, coordinatePrecision,
>> eventDate, scientificName, individualCount*
>>
>> I think it is clear that this record tells us that there was a
>> recording event at a particular time and place where someone or some
>> process recorded a given number of individual organisms which were
>> identified as representatives of a taxon concept with a name
>> corresponding to the supplied scientific name. In other words this
>> gives us some properties from a subgraph that might include, say,
>> instances of TDWG Event, Locality, Date, Occurrence, Identification,
>> TaxonConcept and TaxonName classes. None of these is specifically
>> referenced but we can unambiguously fold the flat record onto the
>> ontology. We can moreover then use the combination of supplied
>> elements to decide whether this record would be of interest to GBIF,
>> a national information facility, a tool cataloguing uses of
>> scientific names, etc. The same will also apply if multiple CVS
>> tables are provided as in 8c.
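
The folding step described for the example record can be sketched roughly as follows. Only the scientificName to TaxonName association is stated in the proposal (point 7); the other term-to-class assignments, the ordering of classes in `CHAIN`, and the function name are assumptions made for illustration, not a published TDWG ontology.

```python
# Assumed term-to-class assignments. Only scientificName -> TaxonName is
# taken from the proposal; the rest are illustrative guesses.
TERM_CLASS = {
    "decimalLatitude": "Locality",
    "decimalLongitude": "Locality",
    "coordinatePrecision": "Locality",
    "eventDate": "Event",
    "individualCount": "Occurrence",
    "scientificName": "TaxonName",
}

# Assumed linear path through the ontology that links these classes.
CHAIN = [
    "Locality",
    "Event",
    "Occurrence",
    "Identification",
    "TaxonConcept",
    "TaxonName",
]


def infer_shape(columns):
    """Return the segment of CHAIN spanned by the classes of the columns.

    Columns with no known class (such as a generic recordId) are ignored.
    Classes lying between two referenced classes are included even when
    no column refers to them directly.
    """
    positions = [CHAIN.index(TERM_CLASS[c]) for c in columns if c in TERM_CLASS]
    if not positions:
        return []
    return CHAIN[min(positions):max(positions) + 1]


example_columns = [
    "recordId", "decimalLatitude", "decimalLongitude",
    "coordinatePrecision", "eventDate", "scientificName", "individualCount",
]
shape = infer_shape(example_columns)
```

Under these assumptions the example record spans the whole chain, so Identification and TaxonConcept are inferred even though no column names them, and the generic recordId never needs renaming.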
>>
>> I have thought about this for a long time and cannot yet think of an
>> area in which this would not work efficiently – and unambiguously –
>> for all concerned. There are some cases where multiple instances of
>> the same ontology class would be referenced within a single record,
>> which may mean more care is needed by the publisher (e.g. if an
>> insect specimen record includes a reference to a host plant). There
>> may be cases where automated review of the data indicates that there
>> are impossible combinations or ambiguities that the publisher must
>> resolve. However I believe we could use this approach to generalise
>> all mobilisation and consumption of biodiversity data (including all
>> the things we have addressed under ABCD, SDD, TCS, Plinian Core,
>> etc.) and to make it genuinely possible for any data holder to share
>> all the data they have in a form that makes sense to them, while
>> allowing others to consume these data intelligently.
>>
>> Right now, I think our confused use of basisOfRecord is almost the
>> only thing that stops us from exploring this. We have blurred the
>> question of the evidence for a record, with the question of the
>> “shape” of the record as a subgraph. These are different things.
>> Separating them will allow us to get away from some of our
>> unresolvable debates and open up the doors to much simpler data
>> sharing and reuse.
>>
>> Thanks,
>>
>> Donald
>>
>> ----------------------------------------------------------------------
>> Donald Hobern - GBIF Director - dhobern at gbif.org
>> Global Biodiversity Information Facility http://www.gbif.org/
>> GBIF Secretariat, Universitetsparken 15, DK-2100 Copenhagen Ø, Denmark
>> Tel: +45 3532 1471 Mob: +45 2875 1471 Fax: +45 2875 1480
>> ----------------------------------------------------------------------
>
> ---------------------------------------------------------
> Roderic Page
> Professor of Taxonomy
> Institute of Biodiversity, Animal Health and Comparative Medicine
> College of Medical, Veterinary and Life Sciences
> Graham Kerr Building
> University of Glasgow
> Glasgow G12 8QQ, UK
>
> Email: r.page at bio.gla.ac.uk
> Tel: +44 141 330 4778
> Fax: +44 141 330 2792
> Skype: rdmpage
> Facebook: http://www.facebook.com/rdmpage
> LinkedIn: http://uk.linkedin.com/in/rdmpage
> Twitter: http://twitter.com/rdmpage
> Blog: http://iphylo.blogspot.com
> Home page: http://taxonomy.zoology.gla.ac.uk/rod/rod.html
> Wikipedia: http://en.wikipedia.org/wiki/Roderic_D._M._Page
> Citations:
> http://scholar.google.co.uk/citations?hl=en&user=4Z5WABAAAAAJ
> ORCID: http://orcid.org/0000-0002-7101-9767
>
--
Steven J. Baskauf, Ph.D., Senior Lecturer
Vanderbilt University Dept. of Biological Sciences
postal mail address:
PMB 351634
Nashville, TN 37235-1634, U.S.A.
delivery address:
2125 Stevenson Center
1161 21st Ave., S.
Nashville, TN 37235
office: 2128 Stevenson Center
phone: (615) 343-4582, fax: (615) 322-4942
If you fax, please phone or email so that I will know to look for it.
http://bioimages.vanderbilt.edu