[tdwg-content] A plea around basisOfRecord (Was: Proposed new Darwin Core terms - abundance, abundanceAsPercent)

Mon Oct 14 16:34:15 CEST 2013

Rod,
See http://code.google.com/p/tdwg-rdf/wiki/BiodiversityOntologies which has 
been online since last March.
Steve

Roderic Page wrote:
...
> What might help is a way to visualise the TDWG LSID ontology in terms 
> of the interconnections between the different classes. I'm not aware 
> of such a visualisation (nor of an equivalent one for the Darwin Core 
> classes). 
>
> In any event, it seems odd to have two distinct ontologies that are 
> both in use, and which overlap so significantly.
>
> Regards
>
> Rod
> On 13 Oct 2013, at 16:12, Donald Hobern [GBIF] wrote:
>
>> It’s been a couple of weeks but I said I’d try to write something 
>> about a more general concern I have around the way we use 
>> basisOfRecord and dcterms:type to hold values like occurrence, event 
>> and materialSample.  This is something that has concerned me for 
>> years and that, I worry, is making everything we all do much messier 
>> than it need be.
>>  
>> I believe that the way we have come to use Darwin Core basisOfRecord 
>> is confused and unhelpful.  I really wish we used Darwin Core like this:
>>  
>> 1.       basisOfRecord should be used ONLY to indicate the type of 
>> evidence that lies behind a record – a key aspect of whether the 
>> record is likely to be useful for different purposes
>> 2.       basisOfRecord values should be taken from a hierarchical 
>> vocabulary with three main branches:
>> a.       “specimens” (i.e. biological material that can be reviewed), 
>> with a hierarchy of subordinate values such as “pinnedSpecimen”, 
>> “herbariumSheet”, etc.
>> b.      derived, non-biological evidence (not sure what name), with a 
>> hierarchy of subordinate values such as “dnaSequence”, 
>> “soundRecording”, “stillImage”, etc.
>> c.       asserted observations with no revisitable evidence other 
>> than the authority of the observer
>> 3.       TDWG should deliver a basic ontology in the form of a graph 
>> of key relationships between the most significant conceptual entities 
>> in our world (TaxonName, TaxonConcept, Identification, Collection, 
>> Specimen, Locality, Agent, …)
>> 4.       This ontology should not attempt to map all the complexity 
>> of biodiversity-related data – just provide the high-level map and 
>> key relationships (TaxonConcept hasName TaxonName, Specimen heldIn 
>> Collection, etc.) – it should leave definition of other properties as 
>> a separate, open-ended activity for the community
>> 5.       This ontology should be reviewed at regular intervals and 
>> versioned as necessary to address critical gaps – provided that 
>> backwards compatibility is maintained (splitting a class into 
>> multiple constituent classes probably won’t break anything, so start 
>> simple)
>> 6.       The Darwin Core vocabulary should be published as a flat, 
>> open-ended list of terms with clear definitions that can be freely 
>> combined as columns in denormalised records
>> 7.       Every Darwin Core term should be documented to be tightly 
>> associated with a single, fixed class in the ontology (e.g. 
>> scientificName and specificEpithet are ALWAYS considered to be 
>> properties of a TaxonName whether or not that TaxonName object is 
>> clearly referenced or separated out)
>> 8.       Every data publisher should be encouraged to share all 
>> relevant data elements in their source data in the most convenient 
>> normalised or denormalised form, provided they use the recognised 
>> Darwin Core properties for elements that match the definition for 
>> those terms, and provided they give some metadata for other 
>> elements.  Possible forms include:
>> a.       A completely hierarchical, ABCD-like, XML representation
>> b.      A completely flat denormalised, simple-DwC-like, CSV 
>> representation, if the data includes no elements with higher cardinality
>> c.       A set of flat, relational, CSV representations, as with 
>> Darwin Core Archive star schemas, but with freedom to have more 
>> complex graphed relationships as needed
>> 9.       Each table of CSV data in 8b and 8c is a view that 
>> corresponds to a linear subgraph of the TDWG ontology, identified by 
>> the classes of the DwC properties used – this allows us to infer the 
>> “shape” of the data in terms of the ontology
>> 10.   If we do this, we do not need to worry about whether a record 
>> is a checklist record, an event, an occurrence, a material sample or 
>> whatever else, although we could use the dcterms:type property, or 
>> some new property, to hold this detail as a further clue to intent 
>> and possible use for the record
>>  
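
A rough sketch of the three-branch basisOfRecord vocabulary from point 2, written as a parent map in Python. Only the quoted value names ("specimens", "pinnedSpecimen", "herbariumSheet", "dnaSequence", "soundRecording", "stillImage") come from the message. The branch names "derivedEvidence" and "assertedObservation" are placeholders, since the message leaves those branches unnamed, and the helper functions are hypothetical.

```python
# Parent map for a hierarchical basisOfRecord vocabulary (illustrative only).
# "derivedEvidence" and "assertedObservation" are placeholder branch names,
# not existing Darwin Core values.
BASIS_OF_RECORD_PARENT = {
    "specimens": None,
    "pinnedSpecimen": "specimens",
    "herbariumSheet": "specimens",
    "derivedEvidence": None,
    "dnaSequence": "derivedEvidence",
    "soundRecording": "derivedEvidence",
    "stillImage": "derivedEvidence",
    "assertedObservation": None,
}


def branch_of(value):
    """Follow parent links upward and return the top-level branch."""
    if value not in BASIS_OF_RECORD_PARENT:
        raise KeyError(f"unknown basisOfRecord value: {value!r}")
    while BASIS_OF_RECORD_PARENT[value] is not None:
        value = BASIS_OF_RECORD_PARENT[value]
    return value


def has_revisitable_evidence(value):
    """True when the record is backed by something that can be re-examined."""
    return branch_of(value) != "assertedObservation"
```

With a structure like this, a consumer could filter on the broad branch (for example, anything with revisitable evidence) without having to know every subordinate value in advance.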
>> Here is an example.  In today’s terms, what sort of DwC record is 
>> this?  Do I really have to replace “recordId” with “eventId”, 
>> “occurrenceId” or similar? And which should I choose?
>>  
>> *recordId, decimalLatitude, decimalLongitude, coordinatePrecision, 
>> eventDate, scientificName, individualCount*
>>  
>> I think it is clear that this record tells us that there was a 
>> recording event at a particular time and place where someone or some 
>> process recorded a given number of individual organisms which were 
>> identified as representatives of a taxon concept with a name 
>> corresponding to the supplied scientific name.  In other words this 
>> gives us some properties from a subgraph that might include, say, 
>> instances of TDWG Event, Locality, Date, Occurrence, Identification, 
>> TaxonConcept and TaxonName classes. None of these is specifically 
>> referenced but we can unambiguously fold the flat record onto the 
>> ontology.  We can moreover then use the combination of supplied 
>> elements to decide whether this record would be of interest to GBIF, 
>> a national information facility, a tool cataloguing uses of 
>> scientific names, etc.  The same will also apply if multiple CSV 
>> tables are provided as in 8c.
>>  
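
The idea of folding a flat record onto the ontology might be sketched as follows, assuming each Darwin Core term is tied to exactly one class as in point 7. The term-to-class assignments below are guesses made for illustration, not an official mapping, and the record values are made up.

```python
from collections import defaultdict

# Assumed class for each term (illustrative, not an agreed TDWG mapping).
TERM_CLASS = {
    "decimalLatitude": "Locality",
    "decimalLongitude": "Locality",
    "coordinatePrecision": "Locality",
    "eventDate": "Event",
    "scientificName": "TaxonName",
    "individualCount": "Occurrence",
}


def fold(record):
    """Group a flat record's values by the ontology class of each term.

    Terms with no class assignment (such as a generic recordId) are
    returned separately rather than guessed at.
    """
    by_class = defaultdict(dict)
    unassigned = {}
    for term, value in record.items():
        cls = TERM_CLASS.get(term)
        if cls is None:
            unassigned[term] = value
        else:
            by_class[cls][term] = value
    return dict(by_class), unassigned


# Made-up values for the example columns listed in the message.
record = {
    "recordId": "r1",
    "decimalLatitude": "55.87",
    "decimalLongitude": "-4.29",
    "coordinatePrecision": "0.01",
    "eventDate": "2013-10-13",
    "scientificName": "Parus major",
    "individualCount": "3",
}

shape, unassigned = fold(record)
print(sorted(shape))  # ['Event', 'Locality', 'Occurrence', 'TaxonName']
```

The set of classes touched by a record is its "shape"; a consumer could test that set against its own requirements instead of relying on a record-type label.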
>> I have thought about this for a long time and cannot yet think of an 
>> area in which this would not work efficiently – and unambiguously – 
>> for all concerned.  There are some cases where multiple instances of 
>> the same ontology class would be referenced within a single record, 
>> which may mean more care is needed by the publisher (e.g. if an 
>> insect specimen record includes a reference to a host plant). There 
>> may be cases where automated review of the data indicates that there 
>> are impossible combinations or ambiguities that the publisher must 
>> resolve.  However I believe we could use this approach to generalise 
>> all mobilisation and consumption of biodiversity data (including all 
>> the things we have addressed under ABCD, SDD, TCS, Plinian Core, 
>> etc.) and to make it genuinely possible for any data holder to share 
>> all the data they have in a form that makes sense to them, while 
>> allowing others to consume these data intelligently.
>>  
>> Right now, I think our confused use of basisOfRecord is almost the 
>> only thing that stops us from exploring this.  We have blurred the 
>> question of the evidence for a record, with the question of the 
>> “shape” of the record as a subgraph.  These are different things.  
>> Separating them will allow us to get away from some of our 
>> unresolvable debates and open up the doors to much simpler data 
>> sharing and reuse.
>>  
>> Thanks,
>>  
>> Donald
>>  
>> ----------------------------------------------------------------------
>> Donald Hobern - GBIF Director - dhobern at gbif.org 
>> <mailto:dhobern at gbif.org>
>> Global Biodiversity Information Facility http://www.gbif.org/
>> GBIF Secretariat, Universitetsparken 15, DK-2100 Copenhagen Ø, Denmark
>> Tel: +45 3532 1471  Mob: +45 2875 1471  Fax: +45 2875 1480
>> ----------------------------------------------------------------------
>
> ---------------------------------------------------------
> Roderic Page
> Professor of Taxonomy
> Institute of Biodiversity, Animal Health and Comparative Medicine
> College of Medical, Veterinary and Life Sciences
> Graham Kerr Building
> University of Glasgow
> Glasgow G12 8QQ, UK
>
> Email:  r.page at bio.gla.ac.uk <mailto:r.page at bio.gla.ac.uk>
> Tel:  +44 141 330 4778
> Fax:  +44 141 330 2792
> Skype:  rdmpage
> Facebook:  http://www.facebook.com/rdmpage
> LinkedIn:  http://uk.linkedin.com/in/rdmpage
> Twitter:  http://twitter.com/rdmpage
> Blog:  http://iphylo.blogspot.com
> Home page:  http://taxonomy.zoology.gla.ac.uk/rod/rod.html
> Wikipedia:  http://en.wikipedia.org/wiki/Roderic_D._M._Page
> Citations:  http://scholar.google.co.uk/citations?hl=en&user=4Z5WABAAAAAJ
> ORCID:  http://orcid.org/0000-0002-7101-9767
>

-- 
Steven J. Baskauf, Ph.D., Senior Lecturer
Vanderbilt University Dept. of Biological Sciences

postal mail address:
PMB 351634
Nashville, TN  37235-1634,  U.S.A.

delivery address:
2125 Stevenson Center
1161 21st Ave., S.
Nashville, TN 37235

office: 2128 Stevenson Center
phone: (615) 343-4582,  fax: (615) 322-4942
If you fax, please phone or email so that I will know to look for it.
http://bioimages.vanderbilt.edu
