Re: [tdwg-content] A plea around basisOfRecord (Was: Proposed new Darwin Core terms - abundance, abundanceAsPercent)

13 Oct 2013

Sorry, I don't agree at all.

The core Darwin-SW classes include only Darwin Core classes and the two 
proposed DwC classes (Organism and CollectionObject, a.k.a. 
dsw:IndividualOrganism and dsw:Evidence), which underwent a 30-day public 
comment period [1] and were submitted to the Executive, which recommended 
further consideration by the RDF Task Group and the community at large.  
The Documenting Darwin Core sessions at the TDWG meeting will pick up 
these and other open issues for further discussion and hopefully move 
them towards closure one way or the other.  If the two proposed classes 
are at some point accepted for inclusion in DwC, Darwin-SW will use the 
new classes and deprecate dsw:IndividualOrganism and dsw:Evidence, 
leaving only Darwin Core classes as the core classes in Darwin-SW. 

It is NOT my view that Darwin-SW is unable to handle current needs for 
linking resources effectively.  If anyone wants to know why I say that, 
come to our talk in the Friday 9AM session on Ontologies and Formal 
Models at the meeting.  We will show how real SPARQL queries on 
Darwin-SW-based data can address important competency questions 
involving diverse linked resources.  Or find me at any time earlier in the 
week and I'll be happy to give you a personal demonstration that isn't 
limited to 9 minutes. 

Steve

[1] http://lists.tdwg.org/pipermail/tdwg-content/2011-September/002727.html
See also open issue https://code.google.com/p/darwincore/issues/detail?id=69

Robert Guralnick wrote:
...
Rod --- There are a couple of different conceptions of 
interrelationships between Darwin Core "classes", including the Darwin 
Core Semantic Web effort led by Steve Baskauf and Cam Webb, and the 
BiSciCol project.  Darwin Core SW is 
here: https://code.google.com/p/darwin-sw/ and the BiSciCol "take" is 
here:  http://biscicol.blogspot.com/2013_03_01_archive.html.  The 
Darwin Core SW version includes new classes not in Darwin Core, while 
BiSciCol uses only existing class terms and a very simple set of 
predicates.
I think in the view of many people, including the authors of the 
above (although I hate speaking for them), neither DW-SW nor 
DW-BiSciCol may really be able to handle the current needs for linking 
resources together effectively.  There has been a major effort to 
refocus away from jury-rigging Darwin Core to serve in a more 
semantic framework and towards other solutions that align 
biodiversity standards more closely with the OBO Foundry 
(http://www.obofoundry.org/).  The Biocollections Ontology 
(BCO; https://code.google.com/p/bco/) represents what I hope is a 
clear rethinking of the challenge, one that does connect back to Darwin 
Core.
Best, Rob
On Sun, Oct 13, 2013 at 1:52 PM, Roderic Page <r.page@bio.gla.ac.uk> wrote:
I've always been somewhat puzzled by the disconnect between the
    TDWG LSID ontology
    (e.g., http://rs.tdwg.org/ontology/voc/TaxonConcept ) which has a
    rich set of classes and links between those classes, and Darwin
    Core
    (e.g., http://rs.tdwg.org/dwc/terms/type-vocabulary/index.htm )
    which overlaps with this vocabulary and, in my opinion, does a
    worse job in some areas, notably taxon names and concepts. Maybe
    the LSID vocabulary suffered from the limited uptake of LSIDs
    (apart from the nomenclators and Catalogue of Life) or from the
    complexity of dealing with RDF, but it seems that much of the
    essential work was done when Roger Hyam created that ontology.
What might help is a way to visualise the TDWG LSID ontology in
    terms of the interconnections between the different classes. I'm
    not aware of such a visualisation (nor of an equivalent one for
    the Darwin Core classes).
In any event, it seems odd to have two distinct ontologies that
    are both in use, and which overlap so significantly.
Regards
Rod
    On 13 Oct 2013, at 16:12, Donald Hobern [GBIF] wrote:
...
It’s been a couple of weeks, but I said I’d try to write something
    about a more general concern I have about the way we use
    basisOfRecord and dcterms:type to hold values like occurrence,
    event and materialSample.  This is something that has concerned
    me for years and that, I worry, is making everything we all do
    much messier than it need be.
I believe that the way we have come to use Darwin Core
    basisOfRecord is confused and unhelpful.  I really wish we used
    Darwin Core like this:
1.       basisOfRecord should be used ONLY to indicate the type
    of evidence that lies behind a record – a key aspect of whether
    the record is likely to be useful for different purposes
    2.       basisOfRecord values should be taken from a hierarchical
    vocabulary with three main branches:
    a.       “specimens” (i.e. biological material that can be
    reviewed), with a hierarchy of subordinate values such as
    “pinnedSpecimen”, “herbariumSheet”, etc.
    b.      derived, non-biological evidence (I am not sure what to call it),
    with a hierarchy of subordinate values such as “dnaSequence”,
    “soundRecording”, “stillImage”, etc.
    c.       asserted observations with no revisitable evidence other
    than the authority of the observer
    3.       TDWG should deliver a basic ontology in the form of a
    graph of key relationships between the most significant
    conceptual entities in our world (TaxonName, TaxonConcept,
    Identification, Collection, Specimen, Locality, Agent, …)
    4.       This ontology should not attempt to map all the
    complexity of biodiversity-related data – just provide the
    high-level map and key relationships (TaxonConcept hasName
    TaxonName, Specimen heldIn Collection, etc.) – it should leave
    definition of other properties as a separate, open-ended activity
    for the community
    5.       This ontology should be reviewed at regular intervals
    and versioned as necessary to address critical gaps – provided
    that backwards compatibility is maintained (splitting a class
    into multiple constituent classes probably won’t break anything,
    so start simple)
    6.       The Darwin Core vocabulary should be published as a
    flat, open-ended list of terms with clear definitions that can be
    freely combined as columns in denormalised records
    7.       Every Darwin Core term should be documented as being
    tightly associated with a single, fixed class in the ontology
    (e.g. scientificName and specificEpithet are ALWAYS considered to
    be properties of a TaxonName whether or not that TaxonName object
    is clearly referenced or separated out)
    8.       Every data publisher should be encouraged to share all
    relevant data elements in their source data in the most
    convenient normalised or denormalised form, provided they use the
    recognised Darwin Core properties for elements that match the
    definition for those terms, and provided they give some metadata
    for other elements.  Possible forms include:
    a.       A completely hierarchical, ABCD-like, XML representation
    b.      A completely flat, denormalised, simple-DwC-like, CSV
    representation, if the data includes no elements with higher
    cardinality
    c.       A set of flat, relational, CSV representations, as with
    Darwin Core Archive star schemas, but with freedom to have more
    complex graphed relationships as needed
    9.       Each table of CSV data in 8b and 8c is a view that
    corresponds to a linear subgraph of the TDWG ontology, identified
    by the classes of the DwC properties used – this allows us to
    infer the “shape” of the data in terms of the ontology
    10.   If we do this, we do not need to worry about whether a
    record is a checklist record, an event, an occurrence, a material
    sample or whatever else, although we could use the dcterms:type
    property, or some new property, to hold this detail as a further
    clue to intent and possible use for the record
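The hierarchical vocabulary in points 1 and 2 could be held as a simple parent map, so that a consumer can ask what kind of evidence lies behind a record independently of the record's shape. This is a minimal Python sketch under assumptions: the branch labels derivedEvidence and assertedObservation are hypothetical (the proposal does not name them), and branch_of and has_revisitable_evidence are illustrative helpers, not part of any TDWG vocabulary.

```python
# Minimal sketch of the hierarchical basisOfRecord vocabulary (points 1-2).
# ASSUMPTION: "derivedEvidence" and "assertedObservation" are hypothetical
# branch labels; only the subordinate example values come from the proposal.
from typing import Dict, Optional

# Each value maps to its parent; top-level branches have no parent (None).
PARENT: Dict[str, Optional[str]] = {
    # (a) biological material that can be reviewed
    "specimens": None,
    "pinnedSpecimen": "specimens",
    "herbariumSheet": "specimens",
    # (b) derived, non-biological evidence
    "derivedEvidence": None,
    "dnaSequence": "derivedEvidence",
    "soundRecording": "derivedEvidence",
    "stillImage": "derivedEvidence",
    # (c) asserted observations backed only by the observer's authority
    "assertedObservation": None,
}


def branch_of(value: str) -> str:
    """Walk up the hierarchy and return the top-level branch of a value."""
    if value not in PARENT:
        raise ValueError(f"unknown basisOfRecord value: {value!r}")
    parent = PARENT[value]
    while parent is not None:
        value, parent = parent, PARENT[parent]
    return value


def has_revisitable_evidence(value: str) -> bool:
    """True unless the record rests only on the authority of the observer."""
    return branch_of(value) != "assertedObservation"


if __name__ == "__main__":
    print(branch_of("herbariumSheet"))                      # specimens
    print(has_revisitable_evidence("assertedObservation"))  # False
```

Because consumers reason about the top-level branch, new subordinate values can be added to the map without changing code that only cares whether the evidence can be revisited.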
Here is an example.  In today’s terms, what sort of DwC record is
    this?  Do I really have to replace “recordId” with “eventId”,
    “occurrenceId” or similar? And which should I choose?
*recordId, decimalLatitude, decimalLongitude,
    coordinatePrecision, eventDate, scientificName, individualCount*
I think it is clear that this record tells us that there was a
    recording event at a particular time and place where someone or
    some process recorded a given number of individual organisms
    which were identified as representatives of a taxon concept with
    a name corresponding to the supplied scientific name.  In other
    words this gives us some properties from a subgraph that might
    include, say, instances of TDWG Event, Locality, Date,
    Occurrence, Identification, TaxonConcept and TaxonName classes.
    None of these is specifically referenced but we can unambiguously
    fold the flat record onto the ontology.  We can moreover then use
    the combination of supplied elements to decide whether this
    record would be of interest to GBIF, a national information
    facility, a tool cataloguing uses of scientific names, etc.  The
    same will also apply if multiple CSV tables are provided as in 8c.
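The folding step can be sketched in Python under stated assumptions: TERM_CLASS ties each term to a single class (point 7), EDGES is a hypothetical graph of high-level relationships (point 4), and fold returns the classes on the paths connecting the directly referenced classes, i.e. the subgraph of point 9. The class assignments, the edges and all function names are illustrative, not a published TDWG ontology, and Date is folded into Event for brevity.

```python
# Minimal sketch of folding a flat record onto a class graph (points 7 and 9).
# ASSUMPTION: TERM_CLASS and EDGES are illustrative, not TDWG-published.
from collections import deque
from itertools import combinations
from typing import Dict, List, Set, Tuple

# Point 7: each term is always a property of exactly one class.
TERM_CLASS: Dict[str, str] = {
    "decimalLatitude": "Locality",
    "decimalLongitude": "Locality",
    "coordinatePrecision": "Locality",
    "eventDate": "Event",          # Date folded into Event for brevity
    "individualCount": "Occurrence",
    "scientificName": "TaxonName",
    "specificEpithet": "TaxonName",
}

# Hypothetical high-level relationships, stored as undirected links.
EDGES: Dict[str, List[str]] = {
    "Locality": ["Event"],
    "Event": ["Locality", "Occurrence"],
    "Occurrence": ["Event", "Identification"],
    "Identification": ["Occurrence", "TaxonConcept"],
    "TaxonConcept": ["Identification", "TaxonName"],
    "TaxonName": ["TaxonConcept"],
}


def shortest_path(start: str, goal: str) -> List[str]:
    """Breadth-first search over EDGES; returns [] if goal is unreachable."""
    prev: Dict[str, str] = {start: start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            break
        for nxt in EDGES.get(node, []):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    if goal not in prev:
        return []
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]


def fold(columns: List[str]) -> Tuple[Set[str], List[str]]:
    """Return the classes of the record's subgraph, plus any columns with no
    known class (these would need publisher metadata, as in point 8)."""
    direct = {TERM_CLASS[c] for c in columns if c in TERM_CLASS}
    unmapped = [c for c in columns if c not in TERM_CLASS]
    nodes = set(direct)
    # Add every class lying on a path between two directly referenced classes.
    for a, b in combinations(sorted(direct), 2):
        nodes.update(shortest_path(a, b))
    return nodes, unmapped


if __name__ == "__main__":
    record = ["recordId", "decimalLatitude", "decimalLongitude",
              "coordinatePrecision", "eventDate", "scientificName",
              "individualCount"]
    classes, unmapped = fold(record)
    print(sorted(classes))
    print(unmapped)
```

Run on the example record, this yields Event, Identification, Locality, Occurrence, TaxonConcept and TaxonName, with recordId reported as unmapped, which matches the reading given above apart from the omitted Date class.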
I have thought about this for a long time and cannot yet think of
    an area in which this would not work efficiently – and
    unambiguously – for all concerned.  There are some cases where
    multiple instances of the same ontology class would be referenced
    within a single record, which may mean more care is needed by the
    publisher (e.g. if an insect specimen record includes a reference
    to a host plant). There may be cases where automated review of
    the data indicates that there are impossible combinations or
    ambiguities that the publisher must resolve.  However I believe
    we could use this approach to generalise all mobilisation and
    consumption of biodiversity data (including all the things we
    have addressed under ABCD, SDD, TCS, Plinian Core, etc.) and to
    make it genuinely possible for any data holder to share all the
    data they have in a form that makes sense to them, while allowing
    others to consume these data intelligently.
Right now, I think our confused use of basisOfRecord is almost
    the only thing that stops us from exploring this.  We have
    blurred the question of the evidence for a record with the
    question of the “shape” of the record as a subgraph.  These are
    different things.  Separating them will allow us to get away from
    some of our unresolvable debates and open up the doors to much
    simpler data sharing and reuse.
Thanks,
Donald
----------------------------------------------------------------------
    Donald Hobern - GBIF Director - dhobern@gbif.org
    Global Biodiversity Information Facility http://www.gbif.org/
    GBIF Secretariat, Universitetsparken 15, DK-2100 Copenhagen Ø,
    Denmark
    Tel: +45 3532 1471  Mob: +45 2875 1471  Fax: +45 2875 1480
    ----------------------------------------------------------------------
    _______________________________________________
    tdwg-content mailing list
    tdwg-content@lists.tdwg.org
    http://lists.tdwg.org/mailman/listinfo/tdwg-content
---------------------------------------------------------
    Roderic Page
    Professor of Taxonomy
    Institute of Biodiversity, Animal Health and Comparative Medicine
    College of Medical, Veterinary and Life Sciences
    Graham Kerr Building
    University of Glasgow
    Glasgow G12 8QQ, UK
Email:  r.page@bio.gla.ac.uk
    Tel:  +44 141 330 4778
    Fax:  +44 141 330 2792
    Skype:  rdmpage
    Facebook:  http://www.facebook.com/rdmpage
    LinkedIn:  http://uk.linkedin.com/in/rdmpage
    Twitter:  http://twitter.com/rdmpage
    Blog:  http://iphylo.blogspot.com
    Home page:  http://taxonomy.zoology.gla.ac.uk/rod/rod.html
    Wikipedia:  http://en.wikipedia.org/wiki/Roderic_D._M._Page
    Citations: 
    http://scholar.google.co.uk/citations?hl=en&user=4Z5WABAAAAAJ
    ORCID:  http://orcid.org/0000-0002-7101-9767
-- 
Steven J. Baskauf, Ph.D., Senior Lecturer
Vanderbilt University Dept. of Biological Sciences

postal mail address:
PMB 351634
Nashville, TN  37235-1634,  U.S.A.

delivery address:
2125 Stevenson Center
1161 21st Ave., S.
Nashville, TN 37235

office: 2128 Stevenson Center
phone: (615) 343-4582,  fax: (615) 322-4942
If you fax, please phone or email so that I will know to look for it.
http://bioimages.vanderbilt.edu