Can all DwC be Simple DwC?
It may be worth teasing out the boundary of what can and cannot be represented via simple (i.e. flat) Darwin Core. Auxiliary terms (which comprise the classes "MeasurementOrFact" and "ResourceRelationship") are currently left out of Simple Darwin Core. The DwC Terms Quick Reference Guide [1] states: "The terms found under the Auxiliary Terms section can only be meaningfully implemented in an application that supports relational structures".
The Simple Darwin Core reference [2] explains "Every field in the Simple Darwin Core may appear either once or not at all in a single record - otherwise how could you distinguish one 'scientificName' field from another one?"
Couldn't this be addressed using subscripts? E.g. MeasurementID_1="tail length" MeasurementID_2="brain volume" MeasurementUnit_1="cm" MeasurementUnit_2="cubic cm" MeasurementValue_1="9" MeasurementValue_2="8"
(There is precedent for subscripting RDF properties: http://www.w3.org/TR/rdf-schema/#ch_containermembershipproperty)
Allowing this would increase the types of data that can be exchanged via spreadsheet. It would also have the virtue of permitting a single DwC standard, which would help to lessen confusion among potential adopters. (The existence of a "Simple Darwin Core" seems to imply that "regular Darwin Core" is not simple.) Are there reasons not to do this?
Thanks - Joel.
1. http://rs.tdwg.org/dwc/terms/ 2. http://rs.tdwg.org/dwc/terms/simple/index.htm
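To make the subscript proposal above concrete, here is a minimal Python sketch of how a consumer might regroup subscripted fields into one set of fields per measurement. The "_<n>" suffix convention, the function name `split_subscripted`, and the `scientificName` value are assumptions made for illustration; nothing here is defined by Darwin Core.

```python
import re
from collections import defaultdict

# Assumed convention (not part of any DwC standard): a trailing "_<n>"
# marks the n-th repetition of a group of measurement fields.
SUBSCRIPTED = re.compile(r"^(?P<term>.+)_(?P<index>\d+)$")


def split_subscripted(record):
    """Return (core_fields, measurements) for one flat, subscripted record."""
    core = {}
    groups = defaultdict(dict)
    for name, value in record.items():
        match = SUBSCRIPTED.match(name)
        if match:
            groups[int(match.group("index"))][match.group("term")] = value
        else:
            core[name] = value
    # Order the measurement groups by their subscript.
    return core, [groups[index] for index in sorted(groups)]


flat_record = {
    "scientificName": "Example species",  # placeholder value, not real data
    "MeasurementID_1": "tail length",
    "MeasurementID_2": "brain volume",
    "MeasurementUnit_1": "cm",
    "MeasurementUnit_2": "cubic cm",
    "MeasurementValue_1": "9",
    "MeasurementValue_2": "8",
}

core, measurements = split_subscripted(flat_record)
for measurement in measurements:
    print(measurement)
```

The record stays a single spreadsheet row, and the subscript alone carries the grouping that a relational extension would otherwise express with a separate table.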
Couldn't this be addressed using subscripts? E.g. MeasurementID_1="tail length" MeasurementID_2="brain volume" MeasurementUnit_1="cm" MeasurementUnit_2="cubic cm" MeasurementValue_1="9" MeasurementValue_2="8"
I have no problems with pragmatism, but why are the characters not RDF concepts themselves? I believe they should be RDF vocabularies for those who want to reason on them.
One more problem: the states can be polymorphic, as in
MeasurementID_1="flower color" MeasurementValue_1="red" MeasurementValue_1="orange"
Gregor
Hi Joel and Gregor,
This is exactly the type of issue being addressed by observation models in the SONet and TDWG Observations groups. Existing observation models (e.g., OBOE, O+M, etc.) let you use terms from OWL vocabularies for the entity being measured and the measured characteristics, as well as for units, value space domains, and a variety of other salient components of the observations. I think there's a lot of potential for crosswalking these with characters described in EQ and SDD. If you're interested in the observations work, you might want to look at the slides and background material that we prepared about the ways in which different observational data models work and how they serialize observational data. They are listed on the agenda from the last TDWG Observations meeting: https://sonet.ecoinformatics.org/workshops/tdwg-2010-meeting
These models address exactly the case described, but they don't try to put complex, multidimensional data relations into a single flat structure as is proposed using the subscripts shown here.
Observational data modeling work is discussed on the open mailing list 'obs@ecoinformatics.org' if you want to join in the discussion (subscribe at http://lists.nceas.ucsb.edu/ecoinformatics/mailman/listinfo/obs).
Cheers, Matt
Hi Matt -
Thanks for the invitation - I've joined the list. One of the posts that caught my attention during October's tdwg-content avalanche was Cam Webb's discussion of some of the differences in the various approaches (OBOE, EQ, SDD, etc.) to representing character data [1]. Cam used the following example to show how observation ontologies could be worked into occurrence records:
----
@prefix oboe: <http://ecoinformatics.org/oboe/oboe.1.0/oboe-core.owl#> .
@prefix dwc: <http://rs.tdwg.org/dwc/terms/> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix sernec: <http://bioimages.vanderbilt.edu/rdf/terms#> .
@prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
@prefix : <#> .

<http://phylodiversity.net/xmalesia/indiv/9>
  a sernec:Individual ;
  sernec:derivativeOccurrence _:blank1 .

_:blank1
  # The Occurrence of the Individual...
  a dwc:Occurrence ;
  # ... at a position in space-time...
  dcterms:created "2008-01-01" ;
  dcterms:spatial [
    a dcterms:Location ;
    geo:lon "109.95371" ;
    geo:lat "-1.25530" ;
    dwc:coordinateUncertaintyInMeters "100" ;
  ] ;
  # is a recordable OBOE Entity
  a oboe:Entity ;
  # as recorded by a human
  dcterms:creator "Cam Webb" ;
  dwc:basisOfRecord "HumanObservation" .

# The details of the observation:
[] a oboe:Observation ;
  oboe:ofEntity [
    # The observed entity is actually *part of* the occurrence
    # of the Individual at a particular Space and Time
    a :Fruit ;
    :partOf _:blank1 ;
  ] ;
  oboe:hasMeasurement [
    oboe:ofCharacteristic :Color ;
    oboe:hasValue :Green ;
  ] .

:Fruit a oboe:Entity .
:Color a oboe:Characteristic .
:Green a oboe:Entity .
---
I was interested at the time (and still am) in exploring the capabilities and limits of Darwin Core, so I represented the above in DwC as:
---
dcterms:creator Cam Webb
dcterms:created 2008-01-01
geo:lon 109.95371
geo:lat -1.25530
dwc:coordinateUncertaintyInMeters 100
dwc:individualID http://phylodiversity.net/xmalesia/indiv/9
dwc:basisOfRecord HumanObservation

dwc:measurementType fruit colour
dwc:measurementValue green
---
In other words, Darwin Core, as it exists, can be used to represent EQ modeling. It would seem easy to extend it to accommodate OBOE modeling as well, e.g.

dwc:measurementEntity fruit
dwc:measurementCharacteristic colour
dwc:measurementValue green
As Cam pointed out, either of these approaches would make it easy to incorporate OBO ontologies such as PATO into Darwin Core records. It seems to me that how the model is serialized (flat, nested, etc.) doesn't really alter its semantics.
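As a rough illustration of the serialization point, the following Python sketch regroups the flat OBOE-style fields into a nested observation and back again. The terms dwc:measurementEntity and dwc:measurementCharacteristic are the extension proposed in this message, not existing Darwin Core terms, and the nested key names (`ofEntity`, `hasMeasurement`, and so on) are borrowed loosely from the OBOE example purely for illustration.

```python
def to_observation(flat):
    """Regroup flat measurement fields into a nested, OBOE-like structure."""
    return {
        "ofEntity": flat["dwc:measurementEntity"],
        "hasMeasurement": {
            "ofCharacteristic": flat["dwc:measurementCharacteristic"],
            "hasValue": flat["dwc:measurementValue"],
        },
    }


def to_flat(observation):
    """Inverse of to_observation: recover the flat field/value pairs."""
    measurement = observation["hasMeasurement"]
    return {
        "dwc:measurementEntity": observation["ofEntity"],
        "dwc:measurementCharacteristic": measurement["ofCharacteristic"],
        "dwc:measurementValue": measurement["hasValue"],
    }


flat = {
    "dwc:measurementEntity": "fruit",
    "dwc:measurementCharacteristic": "colour",
    "dwc:measurementValue": "green",
}

nested = to_observation(flat)
print(nested)
print(to_flat(nested) == flat)
```

For a single measurement the round trip loses nothing. Whether that still holds once a record carries several measurements or relations between them is the harder question raised earlier in the thread.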
Looking back at the obs archives, I see that, conversely, Darwin Core occurrence records can be expressed via OBOE.
What do you think of the idea of using this year's TDWG field event (if there is one) to explore capturing and reasoning over observational data?
All the best - Joel. ---
1. http://lists.tdwg.org/pipermail/tdwg-content/2010-October/001736.html
On Wed, 5 Jan 2011, Gregor Hagedorn wrote:
Couldn't this be addressed using subscripts? E.g. MeasurementID_1="tail length" MeasurementID_2="brain volume" MeasurementUnit_1="cm" MeasurementUnit_2="cubic cm" MeasurementValue_1="9" MeasurementValue_2="8"
I have no problems with pragmatism, but why are the characters not RDF concepts themselves? I believe they should be RDF vocabularies for those who want to reason on them.
First, please let me correct myself. Everywhere I wrote "MeasurementID", I should have written "measurementType". To your point ...
Recommended best practice for dwc:measurementType is to use a controlled vocabulary. The controlled vocabulary can be an ontology. I was using literals only for simplicity of illustration. (We should probably start collecting controlled vocabularies/ontologies to point to from the documentation.)
One more problem: the states can be polymorphic, as in
MeasurementID_1="flower color" MeasurementValue_1="red" MeasurementValue_1="orange"
Yes, although I don't see it as a problem. I'm not sure why the following restriction (from the Simple Darwin Core reference) is necessary: "Every field in the Simple Darwin Core may appear either once or not at all in a single record." For example, we considered the bioblitz records to be Simple Darwin Core, even though there were multiple taxonConceptID columns (for the same scientificName).
Joel.
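For what it's worth, repeated columns are not hard to read from a spreadsheet export. The sketch below uses Python's standard csv module and keeps every value of a repeated column as a list. The function name `read_rows_with_repeats` and the sample values are made up for illustration and are not the actual bioblitz data.

```python
import csv
import io
from collections import defaultdict


def read_rows_with_repeats(handle):
    """Yield one dict per row, mapping each column name to a list of values."""
    reader = csv.reader(handle)
    header = next(reader)
    for row in reader:
        record = defaultdict(list)
        for name, value in zip(header, row):
            if value != "":  # skip empty cells
                record[name].append(value)
        yield dict(record)


# Placeholder data: two taxonConceptID columns for one scientificName.
sample = io.StringIO(
    "scientificName,taxonConceptID,taxonConceptID\n"
    "Example species,concept-A,concept-B\n"
)

rows = list(read_rows_with_repeats(sample))
print(rows[0]["taxonConceptID"])
```

Reading by position with csv.reader is deliberate here: csv.DictReader builds one dictionary entry per column name, so it would keep only one value for a repeated column.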
Participants (3): Gregor Hagedorn, Joel Sachs, Matt Jones