Re: [tdwg-content] A plea around basisOfRecord
There is a lot of content here, but let me try to address a few topics.
From Donald's original email:
2. basisOfRecord values should be taken from a hierarchical vocabulary with three main branches:
a. "specimens" (i.e. biological material that can be reviewed), with a hierarchy of subordinate values such as "pinnedSpecimen", "herbariumSheet", etc.
b. derived, non-biological evidence (not sure what name), with a hierarchy of subordinate values such as "dnaSequence", "soundRecording", "stillImage", etc.
c. asserted observations with no revisitable evidence other than the authority of the observer
I agree with the idea of having basisOfRecord organized as a hierarchical vocabulary (preferably as an ontology), but I disagree with the proposed hierarchy, mostly because branch b groups fundamentally different types of entities. The BCO team is working on just such a hierarchy, and it could easily include any terms needed for basisOfRecord. If the exact subClassOf hierarchy of the BCO is not ideal for Darwin Core, syntactic classes that group terms based on other criteria can be used for DwC.
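To make the idea of a hierarchical basisOfRecord vocabulary concrete, here is a minimal Python sketch, assuming Donald's three branches are stored as a simple child-to-parent table. The leaf values come from his email; the branch names "derivedEvidence" and "assertedObservation" are hypothetical placeholders, since he leaves the name of branch b open. A consumer can then filter at whatever level of the hierarchy suits its purpose.

```python
# Hypothetical child -> parent table for the proposed basisOfRecord hierarchy.
# Leaf values follow the proposal; "derivedEvidence" and "assertedObservation"
# are placeholder names for branches b and c.
PARENT = {
    "specimen": None,
    "pinnedSpecimen": "specimen",
    "herbariumSheet": "specimen",
    "derivedEvidence": None,
    "dnaSequence": "derivedEvidence",
    "soundRecording": "derivedEvidence",
    "stillImage": "derivedEvidence",
    "assertedObservation": None,
}


def ancestors(value):
    """Return the value followed by each broader value up to its branch root."""
    if value not in PARENT:
        raise KeyError(f"unknown basisOfRecord value: {value}")
    chain = []
    while value is not None:
        chain.append(value)
        value = PARENT[value]
    return chain


def is_a(value, broader):
    """True if `value` is `broader` or sits anywhere beneath it."""
    return broader in ancestors(value)


records = [("r1", "herbariumSheet"), ("r2", "soundRecording"), ("r3", "assertedObservation")]
# Records backed by biological material that could be re-examined.
reviewable = [rid for rid, basis in records if is_a(basis, "specimen")]
print(reviewable)
```

The sketch encodes Donald's grouping as-is; if branch b really groups fundamentally different kinds of entity, only the PARENT table would need to change, not the consumer code.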
3. TDWG should deliver a basic ontology in the form of a graph of key relationships between the most significant conceptual entities in our world (TaxonName, TaxonConcept, Identification, Collection, Specimen, Locality, Agent, ...)
The BCO could serve as this basic ontology for TDWG. Some additional classes would be needed, and maybe those could come from the TDWG ontology (but I would need to look into that). The BCO is intended to serve a broader need than just linking Darwin Core archives, but it is a simple matter to make a subset that would serve the specific needs of TDWG. The advantage of using BCO is that it is designed to be compatible with other kinds of life science ontologies and data, which makes it more flexible and more likely that it will fulfill unforeseen future needs. I (along with co-authors) plan to give some more background on this approach in a talk at TDWG in a few weeks. I am not ruling out DSW as a potential solution, either. My main concerns with DSW are 1) it is tightly coupled to the Darwin Core and therefore inherits some of the limitations of the Darwin Core; 2) many of the classes and relations are application specific, and therefore not interpretable outside the context of the application. I am looking forward to learning more about DSW at the upcoming TDWG meeting, where I expect there will be lengthy discussions of the relative merits of DSW versus BCO. Both ontologies will be presented in the same session.
4. This ontology should not attempt to map all the complexity of biodiversity-related data. It should just provide the high-level map and key relationships (TaxonConcept hasName TaxonName, Specimen heldIn Collection, etc.) and leave definition of other properties as a separate, open-ended activity for the community
The BCO is more of an open-ended effort that attempts to model biodiversity, but, as I mentioned above, subsets can be created for more specific needs. In contrast, creating small, application-specific ontologies for every application does little to overcome the problem of data silos. When everyone makes their own ontology, we are not any better off than when there are no ontologies.
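Point 4 asks only for a high-level map of key relationships between classes. A minimal Python sketch of such a map, assuming it is held as (class, relationship, class) edges: hasName and heldIn come from Donald's email, while the other relationship names are hypothetical. Rendering the edges as Graphviz DOT text is one cheap way to get the kind of class-interconnection picture Rod asks for later in the thread.

```python
# The "basic ontology" of points 3 and 4 as (subject class, relationship, object class).
# hasName and heldIn come from the proposal; the rest are hypothetical placeholders.
KEY_RELATIONSHIPS = [
    ("TaxonConcept", "hasName", "TaxonName"),
    ("Specimen", "heldIn", "Collection"),
    ("Identification", "toTaxonConcept", "TaxonConcept"),  # hypothetical
    ("Identification", "identifiedBy", "Agent"),           # hypothetical
    ("Specimen", "collectedAt", "Locality"),               # hypothetical
]


def to_dot(edges):
    """Render class-level relationships as Graphviz DOT source text."""
    lines = ["digraph tdwg_core {"]
    for subj, rel, obj in edges:
        lines.append(f'  "{subj}" -> "{obj}" [label="{rel}"];')
    lines.append("}")
    return "\n".join(lines)


def neighbours(edges, cls):
    """Classes directly reachable from `cls` via one key relationship."""
    return sorted({obj for subj, _, obj in edges if subj == cls})


print(to_dot(KEY_RELATIONSHIPS))
print(neighbours(KEY_RELATIONSHIPS, "Specimen"))
```

Other properties stay outside this table, in line with point 4: the map only fixes which classes exist and how they connect.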
5. This ontology should be reviewed at regular intervals and versioned as necessary to address critical gaps, provided that backwards compatibility is maintained (splitting a class into multiple constituent classes probably won't break anything, so start simple)
An essential part of any ontology is maintenance.
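The claim in point 5 that splitting a class "probably won't break anything" holds when consumers query with subclass reasoning. A minimal Python sketch, assuming a hypothetical version 2 in which Specimen is split into PreservedSpecimen and LivingSpecimen: a query written against version 1 returns the same instances on version 2 data. A consumer that matched exact class names without consulting the subclass table would miss them, which is the case the "probably" has to cover.

```python
# Version 2 of a hypothetical ontology splits "Specimen" into two subclasses.
SUBCLASS_OF_V2 = {
    "PreservedSpecimen": "Specimen",
    "LivingSpecimen": "Specimen",
}


def superclasses(cls, subclass_of):
    """Return cls plus all of its transitive superclasses (assumes no cycles)."""
    chain = [cls]
    while cls in subclass_of:
        cls = subclass_of[cls]
        chain.append(cls)
    return chain


def instances_of(target, typed_instances, subclass_of):
    """Instances whose asserted class is `target` or any subclass of it."""
    return [i for i, cls in typed_instances if target in superclasses(cls, subclass_of)]


v1_data = [("s1", "Specimen"), ("s2", "Specimen")]
v2_data = [("s1", "PreservedSpecimen"), ("s2", "LivingSpecimen")]

# The same version-1 query gives the same answer on version-2 data...
print(instances_of("Specimen", v1_data, {}), instances_of("Specimen", v2_data, SUBCLASS_OF_V2))
# ...but an exact-match consumer that ignores the subclass table finds nothing.
print([i for i, cls in v2_data if cls == "Specimen"])
```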
7. Every Darwin Core term should be documented to be tightly associated with a single, fixed class in the ontology (e.g. scientificName and specificEpithet are ALWAYS considered to be properties of a TaxonName whether or not that TaxonName object is clearly referenced or separated out)
What I see as one of the fundamental problems with DwC is that the bulk of the terms are properties. That means that when data suppliers enter data in a spreadsheet, they are entering literals or URIs that are the objects of these properties. Having hundreds of properties that are unique to DwC seriously limits the interoperability of DwC tagged data sets (as I mentioned in response to point 4). I think a better solution would be to create ontology classes for most of the DwC terms (along with a few properties), and then create data annotations that are instances of those classes. That is a much more common way of organizing ontology-annotated data and would allow reasoners to work over the data. However, this would be a fundamental change to the nature of the Darwin Core.
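The difference between the two styles shows up when one spreadsheet row is turned into triples both ways. A minimal Python sketch: the dwc: property names are real Darwin Core terms, but the ex: classes and the generic ex:hasPart and ex:hasValue properties are hypothetical stand-ins for whatever an ontology such as BCO would supply. In the property style each column becomes its own predicate; in the class-instance style the terms move into classes (which a reasoner can organise with subclass axioms) and only a handful of generic predicates remain.

```python
def property_style(record_id, row):
    """Current Darwin Core style: each column is a property whose object is the cell value."""
    return [(record_id, f"dwc:{col}", value) for col, value in row.items()]


def class_style(record_id, row):
    """Class-instance style: each cell becomes an instance of a (hypothetical) class."""
    triples = []
    for col, value in row.items():
        node = f"{record_id}/{col}"               # hypothetical identifier for the instance
        cls = f"ex:{col[:1].upper()}{col[1:]}"    # e.g. scientificName -> ex:ScientificName
        triples += [
            (node, "rdf:type", cls),
            (node, "ex:hasValue", value),         # hypothetical generic property
            (record_id, "ex:hasPart", node),      # hypothetical generic property
        ]
    return triples


def predicates(triples):
    """Distinct predicates used, i.e. how many properties a consumer must understand."""
    return sorted({p for _, p, _ in triples})


row = {"scientificName": "Quercus alba", "eventDate": "2013-10-13", "decimalLatitude": "36.14"}
print(predicates(property_style("ex:occ1", row)))
print(predicates(class_style("ex:occ1", row)))
```

Adding a column grows the predicate list in the first style but only adds a class in the second.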
8. Every data publisher should be encouraged to share all relevant data elements in their source data in the most convenient normalised or denormalised form, provided they use the recognised Darwin Core properties for elements that match the definition for those terms, and provided they give some metadata for other elements. Possible forms include:
a. A completely hierarchical, ABCD-like, XML representation
b. A completely flat denormalised, simple-DwC-like, CSV representation, if the data includes no elements with higher cardinality
c. A set of flat, relational, CSV representations, as with Darwin Core Archive star schemas, but with freedom to have more complex graphed relationships as needed
It is important to limit the set of acceptable formats, and develop tools for interconversion among those formats.
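Interconversion between the flat form (8b) and the star-schema form (8c) is mechanical once the key column is known. A minimal Python sketch, assuming a hypothetical export with an event-level core and an occurrence-level extension keyed by eventID; the columns and data are invented for illustration, and a real Darwin Core Archive would also need its meta.xml descriptor, which is omitted here.

```python
import csv
import io

# Hypothetical flat export in which each row repeats the event-level columns.
FLAT = """eventID,eventDate,decimalLatitude,decimalLongitude,scientificName,individualCount
e1,2013-10-01,55.87,-4.29,Quercus robur,3
e1,2013-10-01,55.87,-4.29,Fagus sylvatica,1
e2,2013-10-02,55.90,-4.30,Quercus robur,5
"""

CORE_COLUMNS = ["eventID", "eventDate", "decimalLatitude", "decimalLongitude"]
EXTENSION_COLUMNS = ["eventID", "scientificName", "individualCount"]


def flat_to_star(flat_text):
    """Split a denormalised table into a core table (one row per eventID)
    and an extension table keyed back to the core by eventID."""
    core, extension = {}, []
    for row in csv.DictReader(io.StringIO(flat_text)):
        core_row = {c: row[c] for c in CORE_COLUMNS}
        previous = core.setdefault(row["eventID"], core_row)
        if previous != core_row:
            raise ValueError(f"conflicting core values for {row['eventID']}")
        extension.append({c: row[c] for c in EXTENSION_COLUMNS})
    return list(core.values()), extension


def star_to_flat(core, extension):
    """Inverse operation: join each extension row back onto its core row."""
    by_id = {r["eventID"]: r for r in core}
    return [{**by_id[e["eventID"]], **e} for e in extension]


core, ext = flat_to_star(FLAT)
print(len(core), len(ext))
```

Raising on conflicting core values, rather than silently keeping the first row, is a deliberate choice: it surfaces the kind of ambiguity a publisher would have to resolve.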
9. Each table of CSV data in 8b and 8c is a view that corresponds to a linear subgraph of the TDWG ontology, identified by the classes of the DwC properties used; this allows us to infer the "shape" of the data in terms of the ontology
If all DwC terms were classes or properties (depending on which was more relevant) in an ontology, the graph could be inferred automatically. There are already tools for converting tabular ontology-tagged data into an ontology graph.
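As a rough illustration of what such tools do, here is a minimal Python sketch that folds one flat, DwC-tagged row into a small graph. It assumes a hypothetical term-to-class table in the spirit of point 7 (the class assignments are illustrative, not taken from any published ontology). Links between the resulting nodes would have to come from the ontology's relationships and are not shown. Two limitations are deliberate: columns with no mapping are skipped rather than guessed, and minting one node per class per record cannot represent a record that mentions two instances of the same class, such as the insect-and-host-plant case Donald raises.

```python
# Hypothetical term -> class table; each Darwin Core term maps to exactly one class.
TERM_CLASS = {
    "decimalLatitude": "Locality",
    "decimalLongitude": "Locality",
    "coordinatePrecision": "Locality",
    "eventDate": "Event",
    "individualCount": "Occurrence",
    "scientificName": "TaxonName",
}


def row_to_triples(record_id, row):
    """Fold one flat row into triples: one node per class present in the row,
    with every mapped cell attached to the node of its class."""
    triples = set()
    for term, value in row.items():
        cls = TERM_CLASS.get(term)
        if cls is None:
            continue  # unmapped column: needs publisher-supplied metadata
        node = f"{record_id}#{cls}"
        triples.add((node, "rdf:type", cls))
        triples.add((node, f"dwc:{term}", value))
    return sorted(triples)


row = {"decimalLatitude": "55.87", "decimalLongitude": "-4.29",
       "scientificName": "Quercus robur", "localNotes": "by the gate"}
for triple in row_to_triples("r1", row):
    print(triple)
```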
Ramona ------------------------------------------------------ Ramona L. Walls, Ph.D. Scientific Analyst, The iPlant Collaborative, University of Arizona Laboratory Research Associate, New York Botanical Garden
On Mon, Oct 14, 2013 at 3:00 AM, tdwg-content-request@lists.tdwg.org wrote:
Send tdwg-content mailing list submissions to tdwg-content@lists.tdwg.org
To subscribe or unsubscribe via the World Wide Web, visit http://lists.tdwg.org/mailman/listinfo/tdwg-content or, via email, send a message with subject or body 'help' to tdwg-content-request@lists.tdwg.org
You can reach the person managing the list at tdwg-content-owner@lists.tdwg.org
When replying, please edit your Subject line so it is more specific than "Re: Contents of tdwg-content digest..."
Today's Topics:
- Re: A plea around basisOfRecord (Was: Proposed new Darwin Core terms - abundance, abundanceAsPercent) (Steve Baskauf)
- Re: A plea around basisOfRecord (Was: Proposed new Darwin Core terms - abundance, abundanceAsPercent) (Steve Baskauf)
Message: 1 Date: Sun, 13 Oct 2013 17:13:58 -0500 From: Steve Baskauf steve.baskauf@vanderbilt.edu Subject: Re: [tdwg-content] A plea around basisOfRecord (Was: Proposed new Darwin Core terms - abundance, abundanceAsPercent) To: Robert Guralnick Robert.Guralnick@colorado.edu Cc: TDWG Content Mailing List tdwg-content@lists.tdwg.org Message-ID: 525B1B26.1010806@vanderbilt.edu Content-Type: text/plain; charset="windows-1252"
Sorry, I don't agree at all.
The core Darwin-SW classes include only Darwin Core classes and the two proposed DwC classes (Organism and CollectionObject a.k.a. dsw:IndividualOrganism and dsw:Evidence) which underwent 30 day public comment period [1] and were submitted to the Executive which recommended further consideration by the RDF Task Group and the community at large. The Documenting Darwin Core sessions at the TDWG meeting will pick up these and other open issues for further discussion and hopefully move them towards closure one way or the other. If the two proposed classes are at some point accepted for inclusion in DwC, Darwin-SW will use the new classes and deprecate dsw:IndividualOrganism and dsw:Evidence, leaving only Darwin Core classes as the core classes in Darwin-SW.
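If the proposed classes are accepted, existing Darwin-SW data could be migrated mechanically. A minimal Python sketch, assuming the Darwin-SW namespace http://purl.org/dsw/ and assuming the proposed classes would be minted in the Darwin Core terms namespace as dwc:Organism and dwc:CollectionObject (the final names and URIs were still open at the time). Publishing owl:equivalentClass links from the deprecated classes to their replacements would be a non-destructive alternative to rewriting the data.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DSW = "http://purl.org/dsw/"                # assumed Darwin-SW namespace
DWC = "http://rs.tdwg.org/dwc/terms/"       # Darwin Core terms namespace

# Assumed replacements, following the "a.k.a." pairing in the message.
REPLACED_BY = {
    DSW + "IndividualOrganism": DWC + "Organism",
    DSW + "Evidence": DWC + "CollectionObject",
}


def migrate(triples):
    """Rewrite rdf:type objects that use a deprecated class; leave everything else untouched."""
    return [
        (s, p, REPLACED_BY.get(o, o) if p == RDF_TYPE else o)
        for s, p, o in triples
    ]


data = [
    ("http://example.org/org1", RDF_TYPE, DSW + "IndividualOrganism"),
    ("http://example.org/org1", "http://purl.org/dc/terms/description", "tagged tree"),
]
print(migrate(data))
```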
It is NOT my view that Darwin-SW is unable to handle current needs for linking resources effectively. If anyone wants to know why I say that, come to our talk in the Friday 9AM session on Ontologies and Formal Models at the meeting. We will show how real SPARQL queries on Darwin-SW-based data can address important competency questions involving diverse linked resources. Or see me any time during the meeting earlier in the week and I'll be happy to give you a personal demonstration not limited to 9 minutes.
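For readers unfamiliar with how such competency questions are answered: a SPARQL query is essentially a set of triple patterns that must all match with consistent variable bindings. The minimal Python sketch below evaluates such a basic graph pattern by naive backtracking over an in-memory list of triples; it is not a SPARQL engine. The property names are modelled on Darwin-SW's naming style but should be treated as illustrative (for brevity the name is attached directly to the identification), and the data are invented. The question asked is: which images document organisms identified as Quercus alba?

```python
def match(patterns, triples, binding=None):
    """Yield every variable binding under which all triple patterns match.
    Terms starting with "?" are variables; anything else must match exactly."""
    binding = {} if binding is None else binding
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        if all(_unify(term, value, candidate) for term, value in zip(first, triple)):
            yield from match(rest, triples, candidate)


def _unify(term, value, binding):
    """Bind or check a variable, or compare a constant (mutates binding)."""
    if term.startswith("?"):
        return binding.setdefault(term, value) == value
    return term == value


DATA = [
    ("ex:img1", "dsw:evidenceFor", "ex:occ1"),
    ("ex:occ1", "dsw:occurrenceOf", "ex:org1"),
    ("ex:org1", "dsw:hasIdentification", "ex:id1"),
    ("ex:id1", "dwc:scientificName", "Quercus alba"),
    ("ex:img2", "dsw:evidenceFor", "ex:occ2"),
    ("ex:occ2", "dsw:occurrenceOf", "ex:org2"),
    ("ex:org2", "dsw:hasIdentification", "ex:id2"),
    ("ex:id2", "dwc:scientificName", "Acer rubrum"),
]
QUERY = [
    ("?image", "dsw:evidenceFor", "?occ"),
    ("?occ", "dsw:occurrenceOf", "?org"),
    ("?org", "dsw:hasIdentification", "?id"),
    ("?id", "dwc:scientificName", "Quercus alba"),
]
images = sorted(b["?image"] for b in match(QUERY, DATA))
print(images)
```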
Steve
[1] http://lists.tdwg.org/pipermail/tdwg-content/2011-September/002727.html see also open issue https://code.google.com/p/darwincore/issues/detail?id=69
Robert Guralnick wrote:
Rod --- There are a couple of different conceptions of interrelationships between Darwin Core "classes", including the Darwin Core Semantic Web effort led by Steve Baskauf and Cam Webb, and the BiSciCol project. Darwin Core SW is here: https://code.google.com/p/darwin-sw/ and the BiSciCol "take" is here: http://biscicol.blogspot.com/2013_03_01_archive.html. The Darwin Core SW version includes new classes not in Darwin Core, while BiSciCol uses only existing class terms and a very simple set of predicates.
I think in many people's view, including those of the authors of the above (although I hate speaking for them), neither DW-SW nor DW-BiSciCol may really be able to handle the current needs for linking resources together effectively. There has been a major effort to refocus away from jury-rigging Darwin Core to try to serve in a more semantic framework and pushing towards other solutions that align biodiversity standards more with the OBO Foundry (http://www.obofoundry.org/). The Biocollections Ontology (BCO; https://code.google.com/p/bco/) represents what I hope is a clear rethinking of the challenge that does connect back to the Darwin Core.
Best, Rob
On Sun, Oct 13, 2013 at 1:52 PM, Roderic Page <r.page@bio.gla.ac.uk> wrote:
I've always been somewhat puzzled by the disconnect between the TDWG LSID ontology (e.g., http://rs.tdwg.org/ontology/voc/TaxonConcept ) which has a rich set of classes and links between those classes, and Darwin Core (e.g., http://rs.tdwg.org/dwc/terms/type-vocabulary/index.htm ) which overlaps with this vocabulary and, in my opinion, does a worse job in some areas, notably taxon names and concepts. Maybe the LSID vocabulary suffered from the limited uptake of LSIDs (apart from the nomenclators and Catalogue of Life) or from the complexity of dealing with RDF, but it seems that much of the essential work was done when Roger Hyam created that ontology. What might help is a way to visualise the TDWG LSID ontology in terms of the interconnections between the different classes. I'm not aware of such a visualisation (nor of an equivalent one for the Darwin Core classes). In any event, it seems odd to have two distinct ontologies that are both in use, and which overlap so significantly.
Regards
Rod
On 13 Oct 2013, at 16:12, Donald Hobern [GBIF] wrote:
It's been a couple of weeks, but I said I'd try to write something about a more general concern I have around the way we use basisOfRecord and dcterms:type to hold values like occurrence, event and materialSample. This is something that has concerned me for years and that, I worry, is making everything we all do much messier than it need be.
I believe that the way we have come to use Darwin Core basisOfRecord is confused and unhelpful. I really wish we used Darwin Core like this:
1. basisOfRecord should be used ONLY to indicate the type of evidence that lies behind a record, a key aspect of whether the record is likely to be useful for different purposes
2. basisOfRecord values should be taken from a hierarchical vocabulary with three main branches:
a. "specimens" (i.e. biological material that can be reviewed), with a hierarchy of subordinate values such as "pinnedSpecimen", "herbariumSheet", etc.
b. derived, non-biological evidence (not sure what name), with a hierarchy of subordinate values such as "dnaSequence", "soundRecording", "stillImage", etc.
c. asserted observations with no revisitable evidence other than the authority of the observer
3. TDWG should deliver a basic ontology in the form of a graph of key relationships between the most significant conceptual entities in our world (TaxonName, TaxonConcept, Identification, Collection, Specimen, Locality, Agent, ...)
4. This ontology should not attempt to map all the complexity of biodiversity-related data. It should just provide the high-level map and key relationships (TaxonConcept hasName TaxonName, Specimen heldIn Collection, etc.) and leave definition of other properties as a separate, open-ended activity for the community
5. This ontology should be reviewed at regular intervals and versioned as necessary to address critical gaps, provided that backwards compatibility is maintained (splitting a class into multiple constituent classes probably won't break anything, so start simple)
6. The Darwin Core vocabulary should be published as a flat, open-ended list of terms with clear definitions that can be freely combined as columns in denormalised records
7. Every Darwin Core term should be documented to be tightly associated with a single, fixed class in the ontology (e.g. scientificName and specificEpithet are ALWAYS considered to be properties of a TaxonName whether or not that TaxonName object is clearly referenced or separated out)
8. Every data publisher should be encouraged to share all relevant data elements in their source data in the most convenient normalised or denormalised form, provided they use the recognised Darwin Core properties for elements that match the definition for those terms, and provided they give some metadata for other elements. Possible forms include:
a. A completely hierarchical, ABCD-like, XML representation
b. A completely flat denormalised, simple-DwC-like, CSV representation, if the data includes no elements with higher cardinality
c. A set of flat, relational, CSV representations, as with Darwin Core Archive star schemas, but with freedom to have more complex graphed relationships as needed
9. Each table of CSV data in 8b and 8c is a view that corresponds to a linear subgraph of the TDWG ontology, identified by the classes of the DwC properties used; this allows us to infer the "shape" of the data in terms of the ontology
10. If we do this, we do not need to worry about whether a record is a checklist record, an event, an occurrence, a material sample or whatever else, although we could use the dcterms:type property, or some new property, to hold this detail as a further clue to intent and possible use for the record
Here is an example. In today's terms, what sort of DwC record is this? Do I really have to replace "recordId" with "eventId", "occurrenceId" or similar? And which should I choose?
*recordId, decimalLatitude, decimalLongitude, coordinatePrecision, eventDate, scientificName, individualCount*
I think it is clear that this record tells us that there was a recording event at a particular time and place where someone or some process recorded a given number of individual organisms which were identified as representatives of a taxon concept with a name corresponding to the supplied scientific name. In other words, this gives us some properties from a subgraph that might include, say, instances of TDWG Event, Locality, Date, Occurrence, Identification, TaxonConcept and TaxonName classes. None of these is specifically referenced, but we can unambiguously fold the flat record onto the ontology. Moreover, we can then use the combination of supplied elements to decide whether this record would be of interest to GBIF, a national information facility, a tool cataloguing uses of scientific names, etc. The same will also apply if multiple CSV tables are provided as in 8c.
I have thought about this for a long time and cannot yet think of an area in which this would not work efficiently (and unambiguously) for all concerned. There are some cases where multiple instances of the same ontology class would be referenced within a single record, which may mean more care is needed by the publisher (e.g. if an insect specimen record includes a reference to a host plant). There may be cases where automated review of the data indicates that there are impossible combinations or ambiguities that the publisher must resolve. However, I believe we could use this approach to generalise all mobilisation and consumption of biodiversity data (including all the things we have addressed under ABCD, SDD, TCS, Plinian Core, etc.) and to make it genuinely possible for any data holder to share all the data they have in a form that makes sense to them, while allowing others to consume these data intelligently.
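The claim that the example record can be folded onto the ontology, and then routed to interested consumers, can be sketched in a few lines of Python. Everything here is hypothetical: the term-to-class table uses class names from Donald's list, and the consumer profiles are invented. Only classes whose properties actually appear are detected; intermediate classes such as Identification and TaxonConcept would have to be inferred by walking the ontology's relationships, which this sketch does not do. recordId is deliberately left unmapped, since its class is exactly what the record does not say.

```python
# Hypothetical mapping from the example's columns to classes from Donald's list.
TERM_CLASS = {
    "decimalLatitude": "Locality",
    "decimalLongitude": "Locality",
    "coordinatePrecision": "Locality",
    "eventDate": "Event",
    "individualCount": "Occurrence",
    "scientificName": "TaxonName",
}

# Hypothetical consumer profiles: classes a consumer needs before a record is useful.
CONSUMERS = {
    "occurrence aggregator": {"Locality", "Event", "TaxonName"},
    "name-usage catalogue": {"TaxonName"},
    "abundance study": {"Occurrence", "Event", "TaxonName"},
}


def shape(columns):
    """Classes implied by the columns present (unmapped columns are ignored)."""
    return {TERM_CLASS[c] for c in columns if c in TERM_CLASS}


def interested_consumers(columns):
    """Consumers whose required classes are all present in the record's shape."""
    present = shape(columns)
    return sorted(name for name, needs in CONSUMERS.items() if needs <= present)


header = ["recordId", "decimalLatitude", "decimalLongitude", "coordinatePrecision",
          "eventDate", "scientificName", "individualCount"]
print(sorted(shape(header)))
print(interested_consumers(header))
```

Note that nothing in this routing consults basisOfRecord: the shape comes entirely from which terms are present, which is the separation the email argues for.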
Right now, I think our confused use of basisOfRecord is almost the only thing that stops us from exploring this. We have blurred the question of the evidence for a record with the question of the "shape" of the record as a subgraph. These are different things. Separating them will allow us to get away from some of our unresolvable debates and open up the doors to much simpler data sharing and reuse. Thanks, Donald
Donald Hobern - GBIF Director - dhobern@gbif.org Global Biodiversity Information Facility http://www.gbif.org/ GBIF Secretariat, Universitetsparken 15, DK-2100 Copenhagen Ø, Denmark Tel: +45 3532 1471 Mob: +45 2875 1471 Fax: +45 2875 1480
_______________________________________________ tdwg-content mailing list tdwg-content@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-content
--------------------------------------------------------- Roderic Page Professor of Taxonomy Institute of Biodiversity, Animal Health and Comparative Medicine College of Medical, Veterinary and Life Sciences Graham Kerr Building University of Glasgow Glasgow G12 8QQ, UK Email: r.page@bio.gla.ac.uk Tel: +44 141 330 4778 Fax: +44 141 330 2792 Skype: rdmpage Facebook: http://www.facebook.com/rdmpage LinkedIn: http://uk.linkedin.com/in/rdmpage Twitter: http://twitter.com/rdmpage Blog: http://iphylo.blogspot.com Home page: http://taxonomy.zoology.gla.ac.uk/rod/rod.html Wikipedia: http://en.wikipedia.org/wiki/Roderic_D._M._Page Citations: http://scholar.google.co.uk/citations?hl=en&user=4Z5WABAAAAAJ ORCID: http://orcid.org/0000-0002-7101-9767
-- Steven J. Baskauf, Ph.D., Senior Lecturer Vanderbilt University Dept. of Biological Sciences
postal mail address: PMB 351634 Nashville, TN 37235-1634, U.S.A.
delivery address: 2125 Stevenson Center 1161 21st Ave., S. Nashville, TN 37235
office: 2128 Stevenson Center phone: (615) 343-4582, fax: (615) 322-4942 If you fax, please phone or email so that I will know to look for it. http://bioimages.vanderbilt.edu
Some responses inline
Ramona Walls wrote: ... If the
3. TDWG should deliver a basic ontology in the form of a graph of key relationships between the most significant conceptual entities in our world (TaxonName, TaxonConcept, Identification, Collection, Specimen, Locality, Agent, ...)
The BCO could serve as this basic ontology for TDWG. Some additional classes would be needed, and maybe those could come from the TDWG ontology (but I would need to look into that).
The status of the TDWG ontology is probably too uncertain for this. The original intention was that parts of the ontology would become bona fide TDWG standards, but this never happened and the TDWG ontology is not being actively maintained. The Vocabulary Management Group report [1] discusses the status of the TDWG ontology and makes some recommendations. We'll see whether anything follows from those recommendations.
The BCO is intended to serve a broader need than just linking Darwin Core archives, but it is a simple matter to make a subset that would serve the specific needs of TDWG. The advantage of using BCO is that it is designed to be compatible with other kinds of life science ontologies and data, which makes it more flexible and more likely that it will fulfill unforeseen future needs. I (along with co-authors) plan to give some more background on this approach in a talk at TDWG in a few weeks. I am not ruling out DSW as a potential solution, either. My main concerns with DSW are 1) it is tightly coupled to the Darwin Core and therefore inherits some of the limitations of the Darwin Core; 2) many of the classes and relations are application specific, and therefore not interpretable outside the context of the application. I am looking forward to learning more about DSW at the upcoming TDWG meeting, where I expect there will be lengthy discussions of the relative merits of DSW versus BCO. Both ontologies will be presented in the same session.
It is my belief that the design of BCO and DSW allows them to accomplish different kinds of things. I also hope there is some time to talk about this, although the session is scheduled on the last day of the conference, so I'm not sure how much time there will be left for a discussion to take place.
7. Every Darwin Core term should be documented to be tightly associated with a single, fixed class in the ontology (e.g. scientificName and specificEpithet are ALWAYS considered to be properties of a TaxonName whether or not that TaxonName object is clearly referenced or separated out)
What I see as one of the fundamental problems with DwC is that the bulk of the terms are properties. That means that when data suppliers enter data in a spreadsheet, they are entering literals or URIs that are the objects of these properties. Having hundreds of properties that are unique to DwC
about 167 unique to DwC, not hundreds
seriously limits the interoperability of DwC tagged data sets (as I mentioned in response to point 4). I think a better solution would be to create ontology classes for most of the DwC terms (along with a few properties), and then create data annotations that are instances of those classes. That is a much more common way of organizing ontology-annotated data and would allow reasoners to work over the data. However, this would be a fundamental change to the nature of the Darwin Core.
When you say the "many class" approach is more common, it depends on what circles you are circulating in. It is certainly more common in the OBO ontology world. But I'm not sure that is true for TDWG in general in which this is an old discussion (see Roger Hyam/Bob Morris's commentary at [2], also possibly [3] and the thread that follows).
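The "tagging" alternative described in [2] is easy to contrast with subclassing in a minimal Python sketch. Both encodings below say the same thing about a hypothetical record ex:r1; ex:PreservedSpecimenOccurrence is a hypothetical subclass, while PreservedSpecimen is an existing basisOfRecord value. The tagged form can be queried by plain pattern matching, whereas the subclassed form is only found when the consumer also has, and applies, the subclass axiom. That is roughly the trade-off between easy mapping into many technologies and support for OWL inference.

```python
# "Tagging": one generic class, with the kind recorded as a property value.
tagged = [
    ("ex:r1", "rdf:type", "dwc:Occurrence"),
    ("ex:r1", "dwc:basisOfRecord", "PreservedSpecimen"),
]
# "Subclassing": the kind is encoded in a (hypothetical) class of its own.
subclassed = [
    ("ex:r1", "rdf:type", "ex:PreservedSpecimenOccurrence"),
]
SUBCLASS_OF = {"ex:PreservedSpecimenOccurrence": "dwc:Occurrence"}


def occurrences_by_tag(triples):
    """Plain pattern match: no schema or reasoning needed."""
    return sorted(s for s, p, o in triples if p == "rdf:type" and o == "dwc:Occurrence")


def occurrences_by_subclass(triples, subclass_of):
    """Walks subclass links, so it depends on the consumer having the schema."""
    def is_occurrence(cls):
        while cls is not None:
            if cls == "dwc:Occurrence":
                return True
            cls = subclass_of.get(cls)
        return False
    return sorted(s for s, p, o in triples if p == "rdf:type" and is_occurrence(o))


print(occurrences_by_tag(tagged))
print(occurrences_by_subclass(subclassed, SUBCLASS_OF))
print(occurrences_by_subclass(subclassed, {}))  # without the schema, the record is invisible
```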
Steve
[1] http://www.gbif.org/resources/2246
[2] http://wiki.tdwg.org/twiki/bin/view/TAG/SubclassOrNot Note particularly the comment "...The TDWG ontology's principal role is not modeling the entire domain to permit inference but allowing the mark up of data so that it will flow between applications as freely as possible. It has to be something that is easy to map into multiple technologies and something that people can agree on rapidly. This strongly suggests that the tagging approach should be taken wherever possible. First agree on the basic semantic units and model the rest of the semantics with tagging. Only subclass when absolutely necessary. ..." Also note comments about OWL inference.
[3] http://lists.tdwg.org/pipermail/tdwg-content/2011-May/002393.html in which I get educated by Bob Morris and Hilmar Lapp (typical)
Ramona
Ramona L. Walls, Ph.D. Scientific Analyst, The iPlant Collaborative, University of Arizona Laboratory Research Associate, New York Botanical Garden
On Mon, Oct 14, 2013 at 3:00 AM, <tdwg-content-request@lists.tdwg.org mailto:tdwg-content-request@lists.tdwg.org> wrote:
Send tdwg-content mailing list submissions to tdwg-content@lists.tdwg.org <mailto:tdwg-content@lists.tdwg.org> To subscribe or unsubscribe via the World Wide Web, visit http://lists.tdwg.org/mailman/listinfo/tdwg-content or, via email, send a message with subject or body 'help' to tdwg-content-request@lists.tdwg.org <mailto:tdwg-content-request@lists.tdwg.org> You can reach the person managing the list at tdwg-content-owner@lists.tdwg.org <mailto:tdwg-content-owner@lists.tdwg.org> When replying, please edit your Subject line so it is more specific than "Re: Contents of tdwg-content digest..." Today's Topics: 1. Re: A plea around basisOfRecord (Was: Proposed new Darwin Core terms - abundance, abundanceAsPercent) (Steve Baskauf) 2. Re: A plea around basisOfRecord (Was: Proposed new Darwin Core terms - abundance, abundanceAsPercent) (Steve Baskauf) ---------------------------------------------------------------------- Message: 1 Date: Sun, 13 Oct 2013 17:13:58 -0500 From: Steve Baskauf <steve.baskauf@vanderbilt.edu <mailto:steve.baskauf@vanderbilt.edu>> Subject: Re: [tdwg-content] A plea around basisOfRecord (Was: Proposed new Darwin Core terms - abundance, abundanceAsPercent) To: Robert Guralnick <Robert.Guralnick@colorado.edu <mailto:Robert.Guralnick@colorado.edu>> Cc: TDWG Content Mailing List <tdwg-content@lists.tdwg.org <mailto:tdwg-content@lists.tdwg.org>> Message-ID: <525B1B26.1010806@vanderbilt.edu <mailto:525B1B26.1010806@vanderbilt.edu>> Content-Type: text/plain; charset="windows-1252" Sorry, I don't agree at all. The core Darwin-SW classes include only Darwin Core classes and the two proposed DwC classes (Organism and CollectionObject a.k.a. dsw:IndividualOrganism and dsw:Evidence) which underwent 30 day public comment period [1] and were submitted to the Executive which recommended further consideration by the RDF Task Group and the community at large. 
The Documenting Darwin Core sessions at the TDWG meeting will pick up these and other open issues for further discussion and hopefully move them towards closure one way or the other. If the two proposed classes are at some point accepted for inclusion in DwC, Darwin-SW will use the new classes and deprecate dsw:IndividualOrganism and dsw:Evidence, leaving only Darwin Core classes as the core classes in Darwin-SW. It is NOT my view that Darwin-SW is unable to handle current needs for linking resources effectively. If anyone wants to know why I say that, come to our talk in the Friday 9AM session on Ontologies and Formal Models at the meeting. We will show how real SPARQL queries on Darwin-SW-based data can address important competency questions involving diverse linked resources. Or see me any time during the meeting earlier in the week and I'll be happy to give you a personal demonstration not limited to 9 minutes. Steve [1] http://lists.tdwg.org/pipermail/tdwg-content/2011-September/002727.html see also open issue https://code.google.com/p/darwincore/issues/detail?id=69 Robert Guralnick wrote: > > Rod --- There are a couple different conceptions of > interrelationships between Darwin Core "classes", including the Darwin > Core Semantic Web effort led by Steve Baskauf and Cam Web, and the > BiSciCol project. Darwin Core SW is > here: https://code.google.com/p/darwin-sw/ and the BiSciCol "take" is > here: http://biscicol.blogspot.com/2013_03_01_archive.html. The > Darwin Core SW version includes new classes not in Darwin Core, while > BiSciCol uses only existing class terms and a very simple set of > predicates. > > I think in many people's view, including those of the authors of the > above (although I hate speaking for them), neither DW-SW or > DW-BiSciCol may be really able to handle the current needs for linking > resources together effectively. 
There has been a major effort to > refocus away from jury-rigging Darwin Core to try to serve in a more > semantic framework and pushing towards other solutions that align > biodiversity standards more with the OBO Foundry > (http://www.obofoundry.org/). The Biocollections Ontology > (BCO; https://code.google.com/p/bco/) represents (what I hope) is a > clear rethinking of the challenge that does connect back to the Darwin > Core. > > Best, Rob > > > > On Sun, Oct 13, 2013 at 1:52 PM, Roderic Page <r.page@bio.gla.ac.uk <mailto:r.page@bio.gla.ac.uk> > <mailto:r.page@bio.gla.ac.uk <mailto:r.page@bio.gla.ac.uk>>> wrote: > > I've always been somewhat puzzled by the disconnect between the > TDWG LSID ontology > (e.g., http://rs.tdwg.org/ontology/voc/TaxonConcept ) which has a > rich set of classes and links between those classes, and Darwin > Core > (e.g., http://rs.tdwg.org/dwc/terms/type-vocabulary/index.htm ) > which overlaps with this vocabulary and, in my opinion, does a > worse job in some areas, notably taxon names and concepts. Maybe > the LSID vocabulary suffered from the limited uptake of LSIDs > (apart from the nomenclators and Catalogue of Life) or from the > complexity of dealing with RDF, but it seems that much of the > essential work was done when Roger Hyam created that ontology. > > What might help is a way to visualise the TDWG LSID ontology in > terms of the interconnections between the different classes. I'm > not aware of such a visualisation (nor of an equivalent one for > the Darwin Core classes). > > In any event, it seems odd to have two distinct ontologies that > are both in use, and which overlap so significantly. > > Regards > > Rod > On 13 Oct 2013, at 16:12, Donald Hobern [GBIF] wrote: > >> It?s been a couple of weeks but I said I?d try to write something >> about a more general concern I have around the way we use >> basisOfRecord and dcterms:type to hold values like occurrence, >> event and materialSample. 
This is something that has concerned >> me for years and that, I worry, is making everything we all do >> much messier than it need be. >> >> I believe that the way we have come to use Darwin Core >> basisOfRecord is confused and unhelpful. I really wish we used >> Darwin Core like this: >> >> 1. basisOfRecord should be used ONLY to indicate the type >> of evidence that lies behind a record ? a key aspect of whether >> the record is likely to be useful for different purposes >> 2. basisOfRecord values should be taken from a hierarchical >> vocabulary with three main branches: >> a. ?specimens? (i.e. biological material that can be >> reviewed), with a hierarchy of subordinate values such as >> ?pinnedSpecimen?, ?herbariumSheet?, etc. >> b. derived, non-biological evidence (not sure what name), >> with a hierarchy of subordinate values such as ?dnaSequence?, >> ?soundRecording?, ?stillImage?, etc. >> c. asserted observations with no revisitable evidence other >> than the authority of the observer >> 3. TDWG should deliver a basic ontology in the form of a >> graph of key relationships between the most significant >> conceptual entities in our world (TaxonName, TaxonConcept, >> Identification, Collection, Specimen, Locality, Agent, ?) >> 4. This ontology should not attempt to map all the >> complexity of biodiversity-related data ? just provide the >> high-level map and key relationships (TaxonConcept hasName >> TaxonName, Specimen heldIn Collection, etc.) ? it should leave >> definition of other properties as a separate, open-ended activity >> for the community >> 5. This ontology should be reviewed at regular intervals >> and versioned as necessary to address critical gaps ? provided >> that backwards compatibility is maintained (splitting a class >> into multiple consitituent classes probably won?t break anything, >> so start simple) >> 6. 
The Darwin Core vocabulary should be published as a >> flat, open-ended list of terms with clear definitions that can be >> freely combined as columns in denormalised records >> 7. Every Darwin Core term should be documented to be >> tightly associated with a single, fixed class in the ontology >> (e.g. scientificName and specificEpithet are ALWAYS considered to >> be properties of a TaxonName whether or not that TaxonName object >> is clearly referenced or separated out) >> 8. Every data publisher should be encouraged to share all >> relevant data elements in their source data in the most >> convenient normalised or denormalised form, provided they use the >> recognised Darwin Core properties for elements that match the >> definition for those terms, and provided they give some metadata >> for other elements. Possible forms include: >> a. A completely hierarchical, ABCD-like, XML representation >> b. A completely flat denormalised, simple-DwC-like, CVS >> representation, if the data includes no elements with higher >> cardinality >> c. A set of flat, relational, CVS representations, as with >> Darwin Core Archive star schemas, but with freedom to have more >> complex graphed relationships as needed >> 9. Each table of CVS data in 8b and 8c is a view that >> corresponds to a linear subgraph of the TDWG ontology, identified >> by the classes of the DwC properties used ? this allows us to >> infer the ?shape? of the data in terms of the ontology >> 10. If we do this, we do not need to worry about whether a >> record is a checklist record, an event, an occurrence, a material >> sample or whatever else, although we could use the dcterms: type >> property, or some new property, to hold this detail as a further >> clue to intent and possible use for the record >> >> Here is an example. In today?s terms, what sort of DwC record is >> this? Do I really have to replace ?recordId? with ?eventId?, >> ?occurrenceId? or similar? And which should I choose? 
>>
>> *recordId, decimalLatitude, decimalLongitude,
>> coordinatePrecision, eventDate, scientificName, individualCount*
>>
>> I think it is clear that this record tells us that there was a
>> recording event at a particular time and place where someone or
>> some process recorded a given number of individual organisms
>> which were identified as representatives of a taxon concept with
>> a name corresponding to the supplied scientific name. In other
>> words, this gives us some properties from a subgraph that might
>> include, say, instances of TDWG Event, Locality, Date,
>> Occurrence, Identification, TaxonConcept and TaxonName classes.
>> None of these is specifically referenced, but we can unambiguously
>> fold the flat record onto the ontology. Moreover, we can then use
>> the combination of supplied elements to decide whether this
>> record would be of interest to GBIF, a national information
>> facility, a tool cataloguing uses of scientific names, etc. The
>> same will also apply if multiple CSV tables are provided as in 8c.
>>
>> I have thought about this for a long time and cannot yet think of
>> an area in which this would not work efficiently, and
>> unambiguously, for all concerned. There are some cases where
>> multiple instances of the same ontology class would be referenced
>> within a single record, which may mean more care is needed by the
>> publisher (e.g. if an insect specimen record includes a reference
>> to a host plant). There may be cases where automated review of
>> the data indicates that there are impossible combinations or
>> ambiguities that the publisher must resolve. However, I believe
>> we could use this approach to generalise all mobilisation and
>> consumption of biodiversity data (including all the things we
>> have addressed under ABCD, SDD, TCS, Plinian Core, etc.) and to
>> make it genuinely possible for any data holder to share all the
>> data they have in a form that makes sense to them, while allowing
>> others to consume these data intelligently.
>>
>> Right now, I think our confused use of basisOfRecord is almost
>> the only thing that stops us from exploring this. We have
>> blurred the question of the evidence for a record with the
>> question of the "shape" of the record as a subgraph. These are
>> different things. Separating them will allow us to get away from
>> some of our unresolvable debates and open up the doors to much
>> simpler data sharing and reuse.
>>
>> Thanks,
>>
>> Donald
>>
>> ----------------------------------------------------------------------
>> Donald Hobern - GBIF Director - dhobern@gbif.org
>> Global Biodiversity Information Facility http://www.gbif.org/
>> GBIF Secretariat, Universitetsparken 15, DK-2100 Copenhagen Ø,
>> Denmark
>> Tel: +45 3532 1471 Mob: +45 2875 1471 Fax: +45 2875 1480
>> ----------------------------------------------------------------------
>> _______________________________________________
>> tdwg-content mailing list
>> tdwg-content@lists.tdwg.org
>> http://lists.tdwg.org/mailman/listinfo/tdwg-content
>
> ---------------------------------------------------------
> Roderic Page
> Professor of Taxonomy
> Institute of Biodiversity, Animal Health and Comparative Medicine
> College of Medical, Veterinary and Life Sciences
> Graham Kerr Building
> University of Glasgow
> Glasgow G12 8QQ, UK
>
> Email: r.page@bio.gla.ac.uk
> Tel: +44 141 330 4778
> Fax: +44 141 330 2792
> Skype: rdmpage
> Facebook: http://www.facebook.com/rdmpage
> LinkedIn: http://uk.linkedin.com/in/rdmpage
> Twitter: http://twitter.com/rdmpage
> Blog: http://iphylo.blogspot.com
> Home page: http://taxonomy.zoology.gla.ac.uk/rod/rod.html
> Wikipedia: http://en.wikipedia.org/wiki/Roderic_D._M._Page
> Citations: http://scholar.google.co.uk/citations?hl=en&user=4Z5WABAAAAAJ
> ORCID: http://orcid.org/0000-0002-7101-9767

-- 
Steven J. Baskauf, Ph.D., Senior Lecturer
Vanderbilt University Dept. of Biological Sciences

postal mail address:
PMB 351634
Nashville, TN 37235-1634, U.S.A.

delivery address:
2125 Stevenson Center
1161 21st Ave., S.
Nashville, TN 37235

office: 2128 Stevenson Center
phone: (615) 343-4582, fax: (615) 322-4942
If you fax, please phone or email so that I will know to look for it.

http://bioimages.vanderbilt.edu
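Points 2, 7 and 9 of Donald's quoted proposal, together with his example record, can be made concrete with a small sketch. The Python below is illustrative only, under stated assumptions: the term-to-class assignments in TERM_CLASS, the relationship names in EDGES (atLocality, recordedDuring, identifiedBy, toTaxon), and the branch name derivedEvidence (Donald was "not sure what name") are hypothetical placeholders, not part of any published TDWG vocabulary or ontology, and eventDate is treated as a property of Event rather than of a separate Date class. The sketch walks a basisOfRecord value up to its top-level evidence branch, then "folds" a flat record onto the graph by collecting the class of each recognised column and adding the intermediate classes needed to connect them.

```python
from collections import deque
from itertools import combinations

# Hypothetical three-branch evidence vocabulary (point 2), child -> parent.
# "derivedEvidence" is a placeholder name for branch b.
BASIS_PARENT = {
    "specimen": None,
    "derivedEvidence": None,
    "assertedObservation": None,
    "pinnedSpecimen": "specimen",
    "herbariumSheet": "specimen",
    "dnaSequence": "derivedEvidence",
    "soundRecording": "derivedEvidence",
    "stillImage": "derivedEvidence",
}

# Hypothetical assignment of each DwC term to one fixed class (point 7).
TERM_CLASS = {
    "decimalLatitude": "Locality",
    "decimalLongitude": "Locality",
    "coordinatePrecision": "Locality",
    "eventDate": "Event",
    "individualCount": "Occurrence",
    "scientificName": "TaxonName",
    "specificEpithet": "TaxonName",
}

# Hypothetical high-level relationships (points 3 and 4) as triples.
EDGES = [
    ("Event", "atLocality", "Locality"),
    ("Occurrence", "recordedDuring", "Event"),
    ("Occurrence", "identifiedBy", "Identification"),
    ("Identification", "toTaxon", "TaxonConcept"),
    ("TaxonConcept", "hasName", "TaxonName"),
    ("Specimen", "heldIn", "Collection"),
]


def evidence_branch(term):
    """Walk up the vocabulary until reaching a top-level branch."""
    while BASIS_PARENT[term] is not None:
        term = BASIS_PARENT[term]
    return term


def neighbours(edges):
    """Build an undirected adjacency map from (subject, property, object)."""
    adj = {}
    for subj, _, obj in edges:
        adj.setdefault(subj, set()).add(obj)
        adj.setdefault(obj, set()).add(subj)
    return adj


def shortest_path(adj, start, goal):
    """Breadth-first search returning the classes from start to goal."""
    prev = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in adj.get(node, ()):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    raise ValueError(f"no path between {start} and {goal}")


def record_shape(columns):
    """Return (classes named by columns, connected subgraph containing them)."""
    # Columns with no class mapping (e.g. recordId) are ignored.
    referenced = {TERM_CLASS[c] for c in columns if c in TERM_CLASS}
    adj = neighbours(EDGES)
    shape = set(referenced)
    # Add the intermediate classes that link each pair of referenced classes.
    for a, b in combinations(sorted(referenced), 2):
        shape.update(shortest_path(adj, a, b))
    return referenced, shape


if __name__ == "__main__":
    columns = ["recordId", "decimalLatitude", "decimalLongitude",
               "coordinatePrecision", "eventDate", "scientificName",
               "individualCount"]
    referenced, shape = record_shape(columns)
    print(sorted(referenced))
    print(sorted(shape))
    print(evidence_branch("herbariumSheet"))
```

For the example record, the columns name only Locality, Event, Occurrence and TaxonName; connecting them through this toy graph adds Identification and TaxonConcept, which matches the subgraph described in the message apart from the separate Date class. Note that the basisOfRecord lookup and the shape inference never touch each other, which is the separation the message argues for.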
participants (2): Ramona Walls, Steve Baskauf