[tdwg-content] A plea around basisOfRecord (Was: Proposed new Darwin Core terms - abundance, abundanceAsPercent)

Robert Guralnick Robert.Guralnick at colorado.edu
Sun Oct 13 22:10:48 CEST 2013


Rod --- There are a couple of different conceptions of the interrelationships
between Darwin Core "classes", including the Darwin Core Semantic Web
effort led by Steve Baskauf and Cam Webb, and the BiSciCol project.  Darwin
Core SW is here: https://code.google.com/p/darwin-sw/ and the BiSciCol
"take" is here:  http://biscicol.blogspot.com/2013_03_01_archive.html.  The
Darwin Core SW version includes new classes not in Darwin Core, while
BiSciCol uses only existing class terms and a very simple set of
predicates.

I think in many people's view, including that of the authors of the above
(although I hate speaking for them), neither DwC-SW nor the BiSciCol approach
may really be able to handle the current needs for linking resources together
effectively.  There has been a major effort to refocus away from
jury-rigging Darwin Core to serve in a more semantic framework and
to push towards other solutions that align biodiversity standards more with
the OBO Foundry (http://www.obofoundry.org/).  The Biocollections Ontology
(BCO; https://code.google.com/p/bco/) represents what I hope is a clear
rethinking of the challenge that does connect back to the Darwin Core.

Best, Rob



On Sun, Oct 13, 2013 at 1:52 PM, Roderic Page <r.page at bio.gla.ac.uk> wrote:

> I've always been somewhat puzzled by the disconnect between the TDWG LSID
> ontology (e.g., http://rs.tdwg.org/ontology/voc/TaxonConcept ) which has
> a rich set of classes and links between those classes, and Darwin Core
> (e.g., http://rs.tdwg.org/dwc/terms/type-vocabulary/index.htm ) which
> overlaps with this vocabulary and, in my opinion, does a worse job in some
> areas, notably taxon names and concepts. Maybe the LSID vocabulary suffered
> from the limited uptake of LSIDs (apart from the nomenclators and Catalogue
> of Life) or from the complexity of dealing with RDF, but it seems that much
> of the essential work was done when Roger Hyam created that ontology.
>
> What might help is a way to visualise the TDWG LSID ontology in terms of
> the interconnections between the different classes. I'm not aware of such a
> visualisation (nor of an equivalent one for the Darwin Core classes).
>
> In any event, it seems odd to have two distinct ontologies that are both
> in use, and which overlap so significantly.
>
> Regards
>
> Rod
>
> On 13 Oct 2013, at 16:12, Donald Hobern [GBIF] wrote:
>
> It’s been a couple of weeks but I said I’d try to write something about a
> more general concern I have around the way we use basisOfRecord and
> dcterms:type to hold values like occurrence, event and materialSample.
> This is something that has concerned me for years and that, I worry, is
> making everything we all do much messier than it need be.
>
> I believe that the way we have come to use Darwin Core basisOfRecord is
> confused and unhelpful.  I really wish we used Darwin Core like this:
>
> 1. basisOfRecord should be used ONLY to indicate the type of
>    evidence that lies behind a record – a key aspect of whether the record is
>    likely to be useful for different purposes
> 2. basisOfRecord values should be taken from a hierarchical
>    vocabulary with three main branches:
>    a. “specimens” (i.e. biological material that can be reviewed),
>       with a hierarchy of subordinate values such as “pinnedSpecimen”,
>       “herbariumSheet”, etc.
>    b. derived, non-biological evidence (not sure what name), with a
>       hierarchy of subordinate values such as “dnaSequence”, “soundRecording”,
>       “stillImage”, etc.
>    c. asserted observations with no revisitable evidence other than
>       the authority of the observer
> 3. TDWG should deliver a basic ontology in the form of a graph of
>    key relationships between the most significant conceptual entities in our
>    world (TaxonName, TaxonConcept, Identification, Collection, Specimen,
>    Locality, Agent, …)
> 4. This ontology should not attempt to map all the complexity of
>    biodiversity-related data – just provide the high-level map and key
>    relationships (TaxonConcept hasName TaxonName, Specimen heldIn Collection,
>    etc.) – it should leave definition of other properties as a separate,
>    open-ended activity for the community
> 5. This ontology should be reviewed at regular intervals and
>    versioned as necessary to address critical gaps – provided that backwards
>    compatibility is maintained (splitting a class into multiple constituent
>    classes probably won’t break anything, so start simple)
> 6. The Darwin Core vocabulary should be published as a flat,
>    open-ended list of terms with clear definitions that can be freely combined
>    as columns in denormalised records
> 7. Every Darwin Core term should be documented to be tightly
>    associated with a single, fixed class in the ontology (e.g. scientificName
>    and specificEpithet are ALWAYS considered to be properties of a TaxonName
>    whether or not that TaxonName object is clearly referenced or separated out)
> 8. Every data publisher should be encouraged to share all relevant
>    data elements in their source data in the most convenient normalised or
>    denormalised form, provided they use the recognised Darwin Core properties
>    for elements that match the definition for those terms, and provided they
>    give some metadata for other elements.  Possible forms include:
>    a. A completely hierarchical, ABCD-like, XML representation
>    b. A completely flat denormalised, simple-DwC-like, CSV
>       representation, if the data includes no elements with higher cardinality
>    c. A set of flat, relational, CSV representations, as with Darwin
>       Core Archive star schemas, but with freedom to have more complex graphed
>       relationships as needed
> 9. Each table of CSV data in 8b and 8c is a view that corresponds
>    to a linear subgraph of the TDWG ontology, identified by the classes of the
>    DwC properties used – this allows us to infer the “shape” of the data in
>    terms of the ontology
> 10. If we do this, we do not need to worry about whether a record is a
>    checklist record, an event, an occurrence, a material sample or whatever
>    else, although we could use the dcterms:type property, or some new
>    property, to hold this detail as a further clue to intent and possible use
>    for the record
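
One way to picture points 1 and 2 above is as a small child-to-parent lookup. The Python below is only a rough sketch under stated assumptions: the branch names "specimen", "derivedEvidence" and "assertedObservation" are placeholders I made up (Donald explicitly leaves the second branch unnamed), and the helper functions are hypothetical, not part of Darwin Core or any TDWG tool.

```python
# Hypothetical hierarchical basisOfRecord vocabulary, stored as child -> parent.
# The three top-level branch names are illustrative assumptions; the
# subordinate values are the examples given in the message.
PARENT = {
    "specimen": None,
    "derivedEvidence": None,
    "assertedObservation": None,
    "pinnedSpecimen": "specimen",
    "herbariumSheet": "specimen",
    "dnaSequence": "derivedEvidence",
    "soundRecording": "derivedEvidence",
    "stillImage": "derivedEvidence",
}


def evidence_branch(value):
    """Walk up the hierarchy and return the top-level branch for a value."""
    if value not in PARENT:
        raise KeyError(f"unknown basisOfRecord value: {value!r}")
    while PARENT[value] is not None:
        value = PARENT[value]
    return value


def is_revisitable(value):
    """True if evidence exists that can be re-examined (branches a and b)."""
    return evidence_branch(value) != "assertedObservation"
```

With a structure like this, a consumer could filter on the branch alone (for example, keep only records whose evidence can be revisited) without caring whether the record is "shaped" as an event, an occurrence or a checklist entry.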
>
> Here is an example.  In today’s terms, what sort of DwC record is this?
> Do I really have to replace “recordId” with “eventId”, “occurrenceId” or
> similar? And which should I choose?
>
>     recordId, decimalLatitude, decimalLongitude, coordinatePrecision,
>     eventDate, scientificName, individualCount
>
> I think it is clear that this record tells us that there was a recording
> event at a particular time and place where someone or some process recorded
> a given number of individual organisms which were identified as
> representatives of a taxon concept with a name corresponding to the
> supplied scientific name.  In other words this gives us some properties
> from a subgraph that might include, say, instances of TDWG Event, Locality,
> Date, Occurrence, Identification, TaxonConcept and TaxonName classes. None
> of these is specifically referenced but we can unambiguously fold the flat
> record onto the ontology.  We can moreover then use the combination of
> supplied elements to decide whether this record would be of interest to
> GBIF, a national information facility, a tool cataloguing uses of
> scientific names, etc.  The same will also apply if multiple CSV tables are
> provided as in 8c.
>
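
The "folding" described here can be sketched in a few lines of Python. Everything specific in it is an assumption for illustration: the term-to-class assignments are my guesses at what a fixed mapping (point 7) might say, and the consumer names and their requirements are invented, not GBIF's actual criteria.

```python
# Hypothetical fixed mapping from Darwin Core terms to ontology classes.
# These assignments are illustrative guesses, not an agreed TDWG mapping.
TERM_CLASS = {
    "decimalLatitude": "Locality",
    "decimalLongitude": "Locality",
    "coordinatePrecision": "Locality",
    "eventDate": "Event",
    "scientificName": "TaxonName",
    "specificEpithet": "TaxonName",
    "individualCount": "Occurrence",
}

# Invented examples of what different consumers might require of a record.
CONSUMER_NEEDS = {
    "occurrence_index": {"Locality", "Event", "TaxonName"},
    "name_catalogue": {"TaxonName"},
}


def record_shape(columns):
    """Return the set of ontology classes touched by the column names.

    Columns with no mapped class (such as a generic recordId) are ignored.
    """
    return {TERM_CLASS[c] for c in columns if c in TERM_CLASS}


def interested_consumers(columns):
    """List consumers whose required classes are all present in the record."""
    shape = record_shape(columns)
    return sorted(name for name, needs in CONSUMER_NEEDS.items() if needs <= shape)


columns = [
    "recordId", "decimalLatitude", "decimalLongitude", "coordinatePrecision",
    "eventDate", "scientificName", "individualCount",
]
print(sorted(record_shape(columns)))   # ['Event', 'Locality', 'Occurrence', 'TaxonName']
print(interested_consumers(columns))   # ['name_catalogue', 'occurrence_index']
```

Note that nothing here depends on the identifier column being called eventId or occurrenceId; the shape is inferred entirely from which terms are present.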
> I have thought about this for a long time and cannot yet think of an area
> in which this would not work efficiently – and unambiguously – for all
> concerned.  There are some cases where multiple instances of the same
> ontology class would be referenced within a single record, which may mean
> more care is needed by the publisher (e.g. if an insect specimen record
> includes a reference to a host plant). There may be cases where automated
> review of the data indicates that there are impossible combinations or
> ambiguities that the publisher must resolve.  However I believe we could
> use this approach to generalise all mobilisation and consumption of
> biodiversity data (including all the things we have addressed under ABCD,
> SDD, TCS, Plinian Core, etc.) and to make it genuinely possible for any
> data holder to share all the data they have in a form that makes sense to
> them, while allowing others to consume these data intelligently.
>
> Right now, I think our confused use of basisOfRecord is almost the only
> thing that stops us from exploring this.  We have blurred the question of
> the evidence for a record with the question of the “shape” of the record
> as a subgraph.  These are different things.  Separating them will allow us
> to get away from some of our unresolvable debates and open up the doors to
> much simpler data sharing and reuse.
>
> Thanks,
>
> Donald
>
> ----------------------------------------------------------------------
> Donald Hobern - GBIF Director - dhobern at gbif.org
> Global Biodiversity Information Facility http://www.gbif.org/
> GBIF Secretariat, Universitetsparken 15, DK-2100 Copenhagen Ø, Denmark
> Tel: +45 3532 1471  Mob: +45 2875 1471  Fax: +45 2875 1480
> ----------------------------------------------------------------------
> _______________________________________________
> tdwg-content mailing list
> tdwg-content at lists.tdwg.org
> http://lists.tdwg.org/mailman/listinfo/tdwg-content
>
>
> ---------------------------------------------------------
> Roderic Page
> Professor of Taxonomy
> Institute of Biodiversity, Animal Health and Comparative Medicine
> College of Medical, Veterinary and Life Sciences
> Graham Kerr Building
> University of Glasgow
> Glasgow G12 8QQ, UK
>
> Email:  r.page at bio.gla.ac.uk
> Tel:  +44 141 330 4778
> Fax:  +44 141 330 2792
> Skype:  rdmpage
> Facebook:  http://www.facebook.com/rdmpage
> LinkedIn:  http://uk.linkedin.com/in/rdmpage
> Twitter:  http://twitter.com/rdmpage
> Blog:  http://iphylo.blogspot.com
> Home page:  http://taxonomy.zoology.gla.ac.uk/rod/rod.html
> Wikipedia:  http://en.wikipedia.org/wiki/Roderic_D._M._Page
> Citations:  http://scholar.google.co.uk/citations?hl=en&user=4Z5WABAAAAAJ
> ORCID:  http://orcid.org/0000-0002-7101-9767
>
>