August 2016 - tdwg-content

Re: [tdwg-content] If you need something for referring to a population, then it is probably best to do it as a related class
by Nico Franz 19 Nov '25


Hello all: I have a few inserted comments, hopefully with some clarifying effect: On 5/4/2011 10:54 AM, Peter DeVries wrote: > Hi Steve, > > I may have overloaded the term specimen to make the explanation easier > to follow. > > A specimen could be an individual or it could be part of an individual. > > To some extent you need to think about how these models will be used. > > If you subscribe to the model that a species is whatever a taxonomists > says it is then it is difficult to make statements like. > > X% of the world's species will be extinct by 2050. > > If you mean a species as defined by the concept documented at this URI > which is supported by these specimens, images, and DNA then you are on > firmer ground. > > Species in the natural world do a pretty good job recognizing those > individuals that are appropriate mates. In other words members of > their own species. > > Are we modeling species or variations in human conceptualizations of > species? > >> Assuming that there was only one individual organism identified >> there is really one one species (or hybrid). > Aaaaaaaaaaaaaah! Please Nico C., don't take this one up! > > > I stick with this. Assuming you don't have a hybrid individual. That > individual is one species. The fact that human may disagree on what > species it is a human issue. > > Again, Are we modeling species or variations in human > conceptualizations of species? NMF: I believe there is a history of biodiversity database engineering that actually led some people, initially in Australia (Greg Whitbread?) and Europe (Berendsohn's group) to place more emphasis on modeling human conceptualizations of taxa. I understand this happened in the late 80s / early 90s. The motivation was, apparently, that no long-term consensus seemed available to model a single and sufficiently widely accepted taxonomy of, say, Central European vascular plant species or mosses. 
I believe that projects such as the Euro + Med PlantBase (http://www.emplantbase.org/home.html) are examples of the persistence (success?) of this 20+ year old practice. I think this is the kind of background that Steve is referring to.

> Which of these is of primary importance to decision makers and non-taxonomist biologists?
>
> Part of the problem with various publications relating to ontologies and taxonomy is that their species models entail a specific phylogenetic hypothesis.
>
> In the real world taxa are not as clean as some would like to make them out to be.
>
> Each individual is a unique combination of thousands of separate gene lineages which often do not follow clean monophyletic paths.
>
> I would argue that most of those who work with species related data see them as useful typological constructs which in general follow the biological species model.

NMF: Any solution for properly modeling things such as species in RDF, OWL (etc.) should ideally accommodate all "species concepts" (BSC, ESC, PSC, GSC; there are dozens), i.e. it should be neutral towards this issue. The examples below are too simple: they ignore, for example, cases such as pro parte synonymy, or a concept ("species") that appears unique and monophyletic to one author but (vastly) polyphyletic to another. To do an accurate mapping in that latter case (which is not super-common but it's real), I think one would have to come down on saying we're modeling human conceptualizations. They are not just that, so really it's not a strict dichotomy, meaning that by modeling conceptualizations we do not have to say "the species only exist in our heads", or "well, then anything goes". I agree (sheepishly) that it does come down to what the input data look like, and what the expectations are towards just accommodating the user community versus pushing them in a particular direction. Some of this is also on p.
47 of this PDF: https://journals.ku.edu/index.php/jbi/article/view/3927/3852

> /Aedes triseriatus/ owl:sameAs /Ochlerotatus triseriatus/
>
> Others seem to see them as phylogenetic end nodes which entail a specific phylogenetic history.
>
> /Aedes triseriatus/ distinctFrom /Ochlerotatus triseriatus/
>
> If you are primarily interested in understanding issues of ecology, disease, diversity and conservation the former model is more appropriate than the latter.
>
> Respectfully,
>
> - Pete

Regards,
Nico

Nico M. Franz, Ph.D.
Assistant Professor
Director, UPRM Invertebrate Collection
Department of Biology
University of Puerto Rico
Call Box 9000
Mayagüez, PR 00681-9000
Phone: (787) 832-4040, ext. 3005
Fax: (787) 834-3673
E-mail: nico.franz(a)upr.edu
Laboratory website: http://academic.uprm.edu/~franz/
UPRM-INVCOL: http://uprm-invcol-project.tumblr.com/


Abstracts due 6 Sept for TDWG2016
by Kampmeier, Gail E 31 Aug '16


Hola!

Time to get serious about writing up your abstract for the upcoming TDWG 2016 meeting in Costa Rica this December: http://www.tdwg.org/conference2016/ The due date is 6 September for invited speakers in symposia as well as for contributed oral and poster presentations. The body of the abstract should be 400 words or fewer; see https://mbgserv18.mobot.org/ocs/index.php/tdwg/tdwg2016/schedConf/cfp for more information.

Currently 13 symposia (three symposia combined recently under the banner of Semantics for Biodiversity Science - An Interoperability Frontier), 7 workshops, and 10 Interest/Task Groups are planning to meet at TDWG this year. For information about registration, see http://www.tdwg.org/conference2016/#c1807

And don’t forget about the pre-conference Software Carpentry Workshop, which will be taught by Matthew Collins, Deb Paul, and Dimitrios Koureas 3-4 December: http://www.tdwg.org/conference2016/#c1862 At $100 ($50 for students), this is not to be missed. Space is limited, so sign up soon!

The conference website also features information about hotels (http://www.tdwg.org/conference2016/hotels/) with special rates negotiated for TDWG2016, and about post-conference excursions or BioCursos (http://www.tdwg.org/conference2016/excursions/) through OTS (Organization for Tropical Studies).

Hasta pronto en Costa Rica!

Gail & Erick

P.S. Please pass the word to other interested networks.

Gail Kampmeier
Erick Mata
Program co-chairs TDWG 2016
conference(a)tdwg.org


CFP: TDWG 2016 Workshop on the use of wikis in biodiversity informatics
by joel sachs 28 Aug '16


Everyone,

The submission deadline for the below is soon, but the bar is low - a paragraph or two describing what you're up to, and any challenges that you're facing. Our main goal is to get a sense of who's doing what in this space, and to discuss prospects for helping each other. If you'd like to be involved in the conversation, but won't be able to make it to Costa Rica, please let me know.

Best,
Joel.

----

The Use of Wikis in Biodiversity Informatics
Workshop to be held at TDWG 2016 (Dec. 5-9) in Santa Clara de San Carlos, Costa Rica

Instructions for submitting abstracts are at https://mbgserv18.mobot.org/ocs/index.php/tdwg/tdwg2016/schedConf/cfp
Abstracts due: Sept. 6, 2016

Wiki technologies, in particular those built on top of Semantic MediaWiki and Wikidata, are being used to store, curate, query, integrate, and reason over a range of biodiversity-related data. This workshop will comprise a selection of talks describing some of these uses, followed by a discussion of gaps in the wiki ecology, and opportunities that might exist to fill those gaps through coordinated research and development.


Re: [tdwg-content] Implementing Darwin Core in RDF
by Steve Baskauf 28 Aug '16


I looked at the tdwg-content archives to try to figure out which of the last two emails that I wrote were actually sent out to the list. This one below does not show up in the archives because "An HTML attachment was scrubbed...", apparently the actual email. I'll try sending it again in hopes that the message actually survives. The other message that contained comments about Douglas' example files (and which actually had attachments) never came to me, but shows up in the archives at http://lists.tdwg.org/pipermail/tdwg-content/2016-August/003642.html

I'm not sure what is going on with the listserv, my spam filter, or both.

Steve

-------- Original Message --------
Subject: Re: [tdwg-content] Implementing Darwin Core in RDF
Date: Fri, 26 Aug 2016 10:44:45 -0500
From: Steve Baskauf <steve.baskauf(a)vanderbilt.edu>
Organization: Vanderbilt University Dept. of Biological Sciences
To: Douglas Campbell <Douglas.Campbell(a)tepapa.govt.nz>, "tdwg-content(a)lists.tdwg.org" <tdwg-content(a)lists.tdwg.org>

Douglas (and the list),

I took some time to look at your Turtle file, but will have to come back to it again later when I have more time before giving you specific feedback. However, I wanted to make a few general comments to express how I think about the use of RDF in the biodiversity informatics context.

Before embarking down the path of exposing metadata as RDF, I think that one needs to ask the question: "Why do I want to expose RDF?" There are several things that you get with RDF:

1. It's machine-readable.
2. It's a W3C standard that enables linking resources described by different data providers as a graph.
3. RDF triples from different sources can easily be aggregated in a triplestore and be queried using SPARQL.
4. It is compatible with machine reasoning.

If all you care about is item #1, then RDF is probably more trouble than it is worth.
There are more conventional means for providing access to data without requiring human intervention, with vanilla JSON being a prime example.

If your primary concern is breaking down data silos (i.e. you like the idea of Linked Data), then item #4 is probably more trouble than it's worth. Machine reasoning is possible, but carrying out effective, non-trivial reasoning requires a clear idea of what kind of reasoning you want to undertake and careful thought about how to structure the RDF.

RDF isn't the only way to achieve item #2. Although JSON-LD can be valid RDF, it doesn't have to be. There are also other popular non-RDF graph database options that may be less troublesome if all you care about is linking data as a graph.

It seems to me that item #3 is the sweet spot where the potential benefits of using RDF could outweigh its hassles. IF multiple data providers exposed their data using a consistent graph topology, a machine could harvest triples from the various providers, throw them into a standard triplestore, and other machines could query the aggregated data using SPARQL to ask interesting questions. All of this could be done in an automated fashion, with little or no human intervention. (There is the additional problem of providers disciplining themselves to use consensus URIs to identify resources that would be linked across data silos rather than minting their own new ones, but that's a different issue that I won't get into in this email.)

However, if the multiple providers each use their own graph topology, then there are problems. Simply merging the triples into a single graph would be no problem. However, constructing a query that would work with every possible graph model that providers dream up would be virtually impossible. So if one aspires to merge RDF triples from multiple providers, the burden of enabling querying that will work with all data can fall at three places in the workflow:

1.
The burden could fall on *providers* to discipline themselves to follow a consensus graph model. This requires them to express their triples as a graph that might not correspond to their native data model, but once that is done, no further action would be required down the pipeline.
2. The burden could fall on *aggregators* to "clean up" triples as they are ingested into the triplestore. For example, it's not difficult to create SPARQL CONSTRUCT queries that would add nodes that aren't present in the provider's data but that are needed for consistency with the graph model used by the aggregator. However, that would depend on the aggregator creating a sort of "mapping" for each provider that uses a different graph model. It's likely that would require at least some human intervention, which is at odds with the goal of automated machine discovery and ingestion.
3. The burden could fall on the *users* who want to query the aggregated triples. The "dirty" triples from multiple providers following differing graph models could just be thrown together in a triplestore. Those who want to query the data would have to create complex queries that were designed to catch all of the variations in graph topologies that providers might reasonably dream up, or else risk missing some results that might be important. It seems likely that this approach would be doomed to failure. Who would want to waste their time doing all that work when there would be a high probability of missing some results anyway?

Although the second option would probably be feasible, in order to achieve the real promise of RDF for our community, it seems to me like option #1 would be the way to go.
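
As a rough illustration of the aggregator option (#2), the sketch below uses plain Python tuples in place of a triplestore and a small function in place of a SPARQL CONSTRUCT query. It is an assumption-laden toy, not Darwin-SW tooling: the `ex:` identifiers and both function names are hypothetical, and only the two property names (dsw:evidenceFor, dsw:hasOccurrence) are taken from the Darwin-SW terms discussed in this thread. One provider links a specimen straight to an occurrence without an organism node; the "mapping" step adds a blank-node organism so that a query written for the consensus shape finds both records.

```python
from itertools import count

EVIDENCE_FOR = "dsw:evidenceFor"      # evidence (specimen, image) -> occurrence
HAS_OCCURRENCE = "dsw:hasOccurrence"  # organism -> occurrence


def add_missing_organisms(triples):
    """Return the triples plus a blank-node organism for each occurrence lacking one."""
    triples = set(triples)
    occurrences = {o for (_, p, o) in triples if p == EVIDENCE_FOR}
    with_organism = {o for (_, p, o) in triples if p == HAS_OCCURRENCE}
    labels = count(1)
    for occ in sorted(occurrences - with_organism):
        triples.add((f"_:organism{next(labels)}", HAS_OCCURRENCE, occ))
    return triples


def organisms_with_evidence(triples):
    """Query written against the consensus shape: organism -> occurrence <- evidence."""
    triples = set(triples)
    return sorted(
        (org, ev)
        for (org, p1, occ) in triples if p1 == HAS_OCCURRENCE
        for (ev, p2, occ2) in triples if p2 == EVIDENCE_FOR and occ2 == occ
    )


# Provider A omits the organism node; provider B follows the consensus shape.
provider_a = {("ex:specimen1", EVIDENCE_FOR, "ex:occ1")}
provider_b = {("ex:org2", HAS_OCCURRENCE, "ex:occ2"),
              ("ex:photo2", EVIDENCE_FOR, "ex:occ2")}

merged = provider_a | provider_b
print(organisms_with_evidence(merged))                         # misses provider A
print(organisms_with_evidence(add_missing_organisms(merged)))  # finds both
```

The per-provider knowledge lives in `add_missing_organisms`, which is the part that would likely need human attention for each new graph model.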
Bringing this back to the specific example of Douglas' data and Darwin-SW: Darwin-SW was designed with the intention of facilitating aggregation of RDF metadata from diverse sources: traditional museum specimens, event-oriented occurrences (bioblitzes, trawls), mark-recapture, repeated samples, collection of tissues/DNA without collecting the organism, camera traps, observations without collection (e.g. birder observations), etc. Each of these categories of data provider is going to have some part of the Darwin-SW model that it doesn't care about. Museums may care about specimens but not organisms. Birders may care about organisms but not specimens. Bioblitzers may care passionately about events with many occurrences, while mark-recapture people may think that events are useless since every occurrence happens at a separate event.

In order for Darwin-SW to work as intended, each kind of provider is going to just have to bite the bullet and create nodes for every kind of resource in the Darwin-SW model whether they like it or not. This is very much NOT in the wild-west spirit of RDF where Anybody can say Anything about Any resource. But if one cares about standardization, one will follow the template even if it doesn't fit perfectly with the provider's situation.

In writing Darwin-SW, we rigged it (some would say poisoned it) with semantics that cause a graph to become inconsistent if the Darwin-SW object properties are used to join classes in a way that's inconsistent with the Darwin-SW model. I won't go into the details here - you can read about them in section 3.2 of the paper [1]. We've received some criticism for doing that, and perhaps it was a bad idea. But that's the way things stand at present. It's a sort of veiled threat that says, "there's nothing to stop you from using the Darwin-SW object properties inappropriately, but if you do, you will create a graph that a reasoner will flag as inconsistent".
If a provider is bothered by that, it's better to mint its own object properties and not use the Darwin-SW ones. I'll try to provide a more direct response about Douglas' sample data later on. One thing that I might say now is that I'm not sure that I understand what Douglas means when he says "it would be impossible to create an Organism entity. And in fact we will not create Organism entities as we don’t need them in our context". If the problem is a reluctance to mint and maintain URIs for resources that don't play an important role in the data, it's quite possible to just let them be blank nodes without URIs. If the issue is that the Organism doesn't have an identification, that's no problem. An Organism can have zero to many identifications at any particular time. New identifications can be added at any point in the future. Steve [1] http://www.semantic-web-journal.net/content/darwin-sw-darwin-core-based-ter… Douglas Campbell wrote: > > Thank you all for your feedback. > > > > I am hopeful there will eventually be a ‘standard’ vocabulary for the > DwC associations. In the meantime it looks like DSW will suit us the > best. dwcFP is quite loose (and having version numbers in the IRI > means the namespace has broken once already). BCO is quite > bewildering – I like the solidity of underlying OBO framework, but BCO > has a steep learning curve and doesn’t seem to tie into well-known LD > ontologies. > > > > The downside of DSW for us is the extra abstract Occurrence and > Organism entities. We basically hold specimens from field collection > events that we identify – that is the data in our collection > management system, so the other entities would just be fabricated for > data exchange (not because we specifically model or store them). > > > > I have mocked up a herbaria record starting from the specimen point of > view (the core things that we hold in our collections). 
> This is attached as Turtle and also our original JSON-LD (converted at http://json-ld.org/playground/ ).
>
> * I had to go to Occurrence directly from the Specimen/Token using dsw:evidenceFor (rather than via Organism and dsw:hasOccurrence) as it may not have been identified - meaning it would be impossible to create an Organism entity. And in fact we will not create Organism entities as we don’t need them in our context.
>
> * The Occurrence is just a placeholder (since we don’t have data for it), which means I have associated the Agent to the Event rather than the Occurrence – it so happens that this is valid as the domain isn’t specified in dwciri:recordedBy.
>
> * I have included properties of linked-to entities because we need them populated as part of the web API for searching, etc. This is similar to the DwC concept of convenience properties.
>
> * There are a few other messy parts, like what IRIs we will create for the Occurrence and Identification entities.
>
> I’d be interested in any comments about whether this is on the right track and/or would be considered conformant DwC RDF :)
>
> Cheers,
> Douglas
>
> *From:* tdwg-content [mailto:tdwg-content-bounces@lists.tdwg.org] *On Behalf Of* John Deck
> *Sent:* Thursday, 18 August 2016 4:02 a.m.
> *To:* tdwg-content(a)lists.tdwg.org
> *Subject:* Re: [tdwg-content] Implementing Darwin Core in RDF
>
> Another angle to consider here is using the Biological Collections Ontology (https://github.com/BiodiversityOntologies/bco), which is part of the OBO Foundry framework (http://www.obofoundry.org/) and for which development and work has been done in integrating with Darwin Core and Audubon Core (e.g. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4511409/). The OBO community has a longer history with the biomedical community but is gaining traction in biodiversity. Since these are all ontologies, there should be less confusion on the meaning of classes.
For example, > the notion of Observation in DwC is broken down into at least some of > the following classes: "Observing process", "Material target of > observation", "Material target of observation role". Relations are > well specified as well and drawn from the Relations Ontology > (http://www.obofoundry.org/ontology/ro.html) BTW, along with the PPO > (plant phenology ontology > -- https://github.com/PlantPhenoOntology/PPO) team, i'm working on > annotating instance data with BCO and PPO now, an ongoing project and > we will have more information on this effort at the upcoming TDWG > meeting. Happy to answer additional questions you have. > > > > John Deck > > > > On Wed, Aug 17, 2016 at 7:59 AM, Paul J. Morris <mole(a)morris.net > <mailto:mole@morris.net>> wrote: > > While commenting on the RDF guide, I wrote implementations of delivery > of flat Darwin Core in RDF into Symbiota and the Harvard University > Herbaria web search using content negotiation. Symbiota, when an > occurrence (or agent record) is requested with an accept header of > text/turtle; will deliver the occurrence record in a turtle > serialization (an example is below), when requested with an accept > header of application/rdf+xml will return flat Darwin Core in RDF/XML > > For structured relations beyond flat Darwin Core, there is another > alternative to darwin-sw (which is also in use in the wild), dwcFP: > http://filteredpush.org/ontologies/FP/2.0/dwcFP.owl Bob Morris can > probably comment further, but one of the design goals of dwcFP was > fewer added inferences than darwin-sw (object properties have ranges > specified, but not domains). 
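
The content negotiation described above (asking for an occurrence record with an Accept header of text/turtle, falling back to application/rdf+xml) can be sketched on the client side with the Python standard library. This is only an illustration under stated assumptions: the URL is a hypothetical placeholder rather than a real Symbiota address, the function name is made up, and the request is built but not sent.

```python
import urllib.request


def turtle_request(url):
    """Build (but do not send) a GET request that prefers Turtle over RDF/XML."""
    accept = "text/turtle;q=1.0, application/rdf+xml;q=0.9"
    return urllib.request.Request(url, headers={"Accept": accept})


# Hypothetical record URL; substitute a real occurrence page.
req = turtle_request("http://example.org/occurrence/123")
print(req.get_method(), req.full_url)
print(req.get_header("Accept"))

# To actually fetch the record:
#   body = urllib.request.urlopen(req).read().decode("utf-8")
```

Whether a given server honours the header, and which serialization it returns, depends on that server's implementation.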
>
> dwcFP defines a set of object properties for making relations between core Darwin Core classes that we've been using in annotations:
>
> owl:ObjectProperty rdf:about="dwcFP:hasIdentification"
> owl:ObjectProperty rdf:about="dwcFP:usesTaxon"
> owl:ObjectProperty rdf:about="dwcFP:hasCollectingEvent"
> owl:ObjectProperty rdf:about="dwcFP:hasLocality"
> owl:ObjectProperty rdf:about="dwcFP:hasGeologicalContext"
> owl:ObjectProperty rdf:about="dwcFP:hasGeoreference"
> owl:ObjectProperty rdf:about="dwcFP:hasAssociatedMedia"
>
> One object property we added specifically to simplify SPARQL queries on taxonomic trees:
>
> owl:ObjectProperty rdf:about="dwcFP:descendantTaxonOf"
>
> And additional object properties that cover more possible relations:
>
> owl:ObjectProperty rdf:about="dwcFP:relationProperty"
> owl:ObjectProperty rdf:about="dwcFP:hasAcceptedNameUsage"
> owl:ObjectProperty rdf:about="dwcFP:namePublishedIn"
> owl:ObjectProperty rdf:about="dwcFP:hasOriginalNameUsage"
> owl:ObjectProperty rdf:about="dwcFP:hasParentNameUsage"
> owl:ObjectProperty rdf:about="dwcFP:hasScientificName"
> owl:ObjectProperty rdf:about="dwcFP:hasTaxonConcept"
> owl:ObjectProperty rdf:about="dwcFP:taxonomicAuthorityFor"
> owl:ObjectProperty rdf:about="dwcFP:nomenclaturalCode"
> owl:ObjectProperty rdf:about="dwcFP:nomenclaturalStatus"
>
> Attached is an example RDF document (which is a commented extract from the body of an annotation in a new occurrence annotation document) that describes an occurrence. There were about 350,000 such occurrence records wrapped in annotations generated in the NEVP project. This representation preceded the Darwin Core RDF guide, and I've added a couple of comments about things we would do differently now.
>
> In using dwcFP, we encountered the same kind of questions you are raising about where to place datatype properties that might go in more than one place.
> We ended up hanging dwc:scientificName, dwc:scientificNameAuthorship, etc., off of the Identification, and then a Taxon (through dwcFP:usesTaxon) off the Identification, with only a guid hung off the Taxon. Bob can probably comment further, but I think our motivation for this was that the new occurrence annotation documents are intended to describe transcription of data off of specimens (and container (herbarium sheet and folder)), and that conceptually these scientific names were properties of transcriptions of identifications. If I recall correctly, we are doing the opposite in new identification annotations (hanging dwc:scientificName off of a Taxon instance).
>
> -Paul
>
> --
>
> Example flat Darwin Core occurrence record as RDF produced by Symbiota:
>
> http://symbiota4.acis.ufl.edu/scan/portal/collections/individual/index.php?…
> with HTTP Accept (through, for example, the Modify Headers Firefox plugin):
> text/turtle;q=1.0,text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
>
> Delivers:
>
> @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
> @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
> @prefix owl: <http://www.w3.org/2002/07/owl#> .
> @prefix foaf: <http://xmlns.com/foaf/0.1/> .
> @prefix dwc: <http://rs.tdwg.org/dwc/terms/> .
> @prefix dwciri: <http://rs.tdwg.org/dwc/iri/> .
> @prefix dc: <http://purl.org/dc/elements/1.1/> .
> @prefix dcterms: <http://purl.org/dc/terms/> .
> @prefix dcmitype: <http://purl.org/dc/dcmitype/> .
> <urn:uuid:00032bdf-8862-4ca1-b8a6-ba35ee59f56a> > a dwc:Occurrence ; > dwc:institutionCode "ASNP" ; > dwc:collectionCode "ENT" ; > dwciri:inCollection > <urn:uuid:af7140c3-4aa2-41ac-b3e9-4c7415b3ce90> ; a > dcmitype:PhysicalObject ; dwc:basisOfRecord "PreservedSpecimen" ; > dwc:catalogNumber "735" ; > dwc:kingdom "Animalia" ; > dwc:phylum "Arthropoda" ; > dwc:class "Insecta" ; > dwc:order "Hemiptera" ; > dwc:family "Miridae" ; > dwc:scientificName "Phytocoris" ; > dwc:scientificNameAuthorship "Fallén, 1814" ; > dwc:genus "Phytocoris" ; > dwc:identifiedBy "G. W. Cowper 2008" ; > dwc:recordedBy "G. W. Cowper" ; > dwc:eventDate "2008-10-18" ; > dwc:year "2008" ; > dwc:month "10" ; > dwc:day "18" ; > dwc:startDayOfYear "292" ; > dwc:fieldNumber "-" ; > dwc:individualCount "1" ; > dwc:preparations "Academy Drawer" ; > dwc:country "United States" ; > dwc:stateProvince "New Jersey" ; > dwc:county "Burlington" ; > dwc:municipality "Woodland" ; > dwc:locality "SW of Chatsworth Beat Pinus along sandy road in > upland pine & oak woodland.; 1 individual(s) collected; Way Point > [-]" ; dwc:decimalLatitude "39.806067" ; dwc:decimalLongitude > "-74.545333" ; dwc:verbatimCoordinates "N+39º48.364' W-074º32.720'" ; > dwc:disposition "ANSP" ; > dcterms:modified "2015-04-25 17:13:22" ; > dcterms:license <http://creativecommons.org/licenses/by-nc/3.0/> ; > dcterms:rightsHolder <Academy of Natural Sciences> ; > dcterms:accessRights "CC BY-NC (Attribution-Non-Commercial)" ; > dcterms:references > "http://symbiota4.acis.ufl.edu/scan/portal/collections/individual/index.php?…". > > On Wed, 17 Aug 2016 06:02:01 +0000 > Douglas Campbell <Douglas.Campbell(a)tepapa.govt.nz > <mailto:Douglas.Campbell@tepapa.govt.nz>> wrote: > > > Thanks for taking the time to bring me up to speed, Steve. > > > > I'm familiar with the complexities of preparing specifications and > > realise I've come in mid-way. I'll spend some time reading up on the > > containers and occurrences history. 
>> But I'm unclear whether it is better to use DSW terms in anticipation of them having longevity, or just to mint our own in the meantime?
>>
>> For the taxon convenience fields...
>>
>> I thought I read in the DwC RDF schema that properties like taxonRank were in the Taxon class (so using them in Identification conflicts with their definition), but looking again I see that the spec uses "dwcattributes:organizedInClass" which specifically does not imply domain or range. So now I'm at peace with that. :)
>>
>> However, I am pointing to our own RDF versions of taxon classification terms that we use, and using DwC properties to define these taxon terms, plus I am combining these all together in a single JSON-LD API result. So at this point I don't think I need to repeat the convenience properties in Identification (as they are available directly in the Taxon object). While this seems to fit our purpose I can see that this may be sub-optimal for others who download the data and use it separately - I'll need to contemplate that scenario some more.
>>
>> Cheers,
>> Douglas
>>
>> From: Steve Baskauf [mailto:steve.baskauf@Vanderbilt.Edu]
>> Sent: Wednesday, 17 August 2016 6:32 a.m.
>> To: Douglas Campbell
>> Cc: tdwg-content(a)lists.tdwg.org
>> Subject: Re: [tdwg-content] Implementing Darwin Core in RDF
>>
>> Douglas,
>> I was the lead author on the DwC RDF Guide, so I can try to answer your questions about it. The TDWG RDF Task Group is still in operation, although it hasn't been very active for the past several years. The RDF TG has an online "home" at the TDWG Github site.[1] However, the content didn't survive the migration from Google Code very well, so it takes some effort at this point to sort through it. The TG also has an email list [2] but there has been little traffic on it recently.
> > > > *Dereferencing of the DwC IRI namespace* - Unfortunately, the dwciri: > > namespace terms don't dereference at the present time. This needs to > > be corrected. I've created a Turtle serialization [3] of how I think > > the RDF should be written for the dwciri: terms, but it isn't served > > when one attempts to dereference the terms and hasn't been > > incorporated into the official DwC repository. Part of the problem > > here is that the guidelines for documenting terms in machine-readable > > form are still going through the adoption process.[4] I'm hopeful > > that when the Documentation Specification is ratified, we can make > > sure that all existing DwC terms dereference in a consistent manner. > > > > *Best practice for connecting containers together* - By this, I'm > > assuming you mean linking instances of the various Darwin Core > > classes, or in RDF terms, linking nodes. The RDF Guide is silent on > > how to do this. That's not great from the standpoint of actually > > turning Darwin Core records into RDF, but it was a way to complete > > the guide in a finite amount of time. What is missing is a consensus > > domain model that would lay out how instances of the Darwin Core > > classes would be linked. Such a model should be developed, but that > > has not yet happened. Again, there is a draft standard submitted for > > review [5], which if adopted will specify (in Section 4) a process > > for developing such a model. When we wrote the RDF Guide, we > > provided ancillary documents [6], which included examples that > > followed the RDF Guide and linked instances using various proposed > > models. There are links to web pages containing examples using > > TaxonConcept, BiSciCol, and Darwin-SW object properties to link class > > instances. I am not sure whether there is any RDF "in the wild" for > > the first two examples. I'm more familiar with Darwin-SW, as I was > > involved in its development [7]. 
>> There is a Semantic Web Journal article about Darwin-SW [8], so I won't go into detail about it here, except to say that its data model was developed following an extensive discussion on the tdwg-content email list [9] about how members of the community understood the Darwin Core classes. The relationship between the Darwin-SW model and the historical 1993 ACS Model can be viewed at [10]. There are a bit over a million triples of data "in the wild" modeled on Darwin-SW in accordance with the DwC RDF guide, accessible at a SPARQL endpoint [11]. Some examples showing how to play around with SPARQL queries of these data are at [12].
>>
>> *The overlapping scope of Occurrence and Specimen types* - There is a long history behind the meaning of "Occurrence". There is an out-of-date summary of some of the discussion around this topic in the Darwin-SW documentation [13]. I think that at the time when Darwin Core was originally adopted, an Occurrence was considered a sort of superclass of the Specimen and Observation classes. However, after a lot of discussion, the meaning of dwc:Occurrence was clarified by changing it to its current definition: "An existence of an Organism (sensu http://rs.tdwg.org/dwc/terms/Organism) at a particular place at a particular time." In this view, an Occurrence isn't a concrete thing like a Specimen - it's more like a database join between an Event instance (time and place) and an Organism, which allows for a one-to-many relationship between an Organism and Occurrences, and a one-to-many relationship between an Event and Occurrences. It also allows for a single occurrence of an organism at a time and place to be documented by one to many forms of evidence, which could include PreservedSpecimens, HumanObservation data, or images of various sorts.
> > In RDF terms, an Occurrence could be thought of as a node that is linked to Event, Organism, and evidence instance nodes. You can see this represented graphically at [7], where "dsw:Token" refers to a generic class for evidence. In any case, separating Occurrence (as a node linking Events to Organisms) from Specimen allows an Occurrence to be documented by one or more instances of any kind of evidence, or even multiple kinds of evidence. For example, an Occurrence could be documented by a PhysicalSpecimen as well as several images. Here is an example of an organism with two Occurrences:
> > http://bioimages.vanderbilt.edu/org-jorgem/rec13_0004
> > The first occurrence on 2013-07-24 was documented by 42 camera trap images, and the second occurrence on 2013-07-25 was documented by 21 camera trap images. You can see how this was represented in RDF at [14]. In most cases, specimen records will be much simpler than this, with one organism, documented at one occurrence, with evidence of one PreservedSpecimen. Such a simpler case could be represented with a simpler model. But the more complex model allows specimen-derived occurrence records to be merged with other kinds of occurrence records, such as the camera trap example I gave, mark-recapture bird banding observations, iNaturalist occurrences documented by photos of the organism, etc.
> >
> > *Conflicting usage of Taxon fields in the Identification object* - In order to explain the rationale behind why what seem to be taxon-related properties are assigned to Identification instances, I must refer to the idea of "convenience terms" as expressed in Section 2.7 of the RDF Guide.[15] In a perfect world, we would have the following:
> >
> > a collection item linked by dwciri:inCollection to an IRI-identified collection
> > an identification instance linked by dwciri:toTaxon to an IRI-identified taxon (a.k.a. taxon concept)
> > a location instance linked by dwciri:inDescribedPlace to an IRI-identified geographic place (a.k.a. "feature")
> >
> > If the linked IRI-identified object resources were described by RDF, it would not be necessary to include any of the Darwin Core "convenience" properties included in Table 3.5 [16]. The information contained in the values of those properties could be discovered by dereferencing the object IRIs and traversing subsequent links from that RDF. However, if those IRIs don't exist, then the convenience properties provide a string-based mechanism to relate the subject resource to other resources that should be linked to the same (unidentified) object resource. So for example, if we say a specimen has the convenience properties and values
> >
> > dwc:collectionCode="Mamm"
> > dwc:institutionCode="MVZ"
> >
> > we are not saying that "Mamm" is the collection code of the specimen and that "MVZ" is the institution code of the specimen. Rather, we mean that the specimen should be linked to a collection (with unknown IRI) whose code is "Mamm" and whose owning institution has the code "MVZ". Similarly, if we say that an identification has the convenience properties and values
> >
> > dwc:genus="Hersiliiadae"
> > dwc:specificEpithet="yaeyamaensis"
> >
> > we are not saying that "yaeyamaensis" is the specific epithet of the identification and that "Hersiliiadae" is the genus of the identification. Rather, we mean that the identification should be linked to a taxon (with unknown IRI) for which the specificEpithet part of its name string is "yaeyamaensis", which is included in the genus "Hersiliiadae". This may seem odd, particularly if you are used to thinking of genus and specific epithet as properties of a taxon.
> > But the sets of DwC convenience properties are intended to be a temporary, string-based way to describe an unidentified resource to which the subject resource should be linked. At some future time, if IRIs can be discovered, those sets of convenience properties might be dropped if dereferencing the IRIs provides the same information. In these examples, one might replace them with:
> >
> > a collection item linked by dwciri:inCollection to http://grbio.org/cool/0rht-pj95
> > an identification instance linked to http://zoobank.org/75C9EA16-72B1-44C9-AD40-3C3D41323AB9
> >
> > though I don't think either of these IRIs currently dereferences to meaningful machine-readable RDF (they do have human-readable web pages).
> >
> > I hope that this has provided you with some answers, or at least a starting point for additional exploration or questions. Please feel free to reply if there were parts of what I wrote that weren't clear.
> >
> > Steve Baskauf
> >
> > [1] https://github.com/tdwg/rdf
> > [2] http://groups.google.com/group/tdwg-rdf
> > [3] https://github.com/tdwg/vocab/blob/master/code-examples/darwin-core/dwciri.…
> > [4] https://github.com/tdwg/vocab/blob/master/documentation-specification.md
> > [5] https://github.com/tdwg/vocab/blob/master/maintenance-specification.md
> > [6] https://github.com/tdwg/rdf/blob/master/DwCAncillary.md
> > [7] https://github.com/darwin-sw/dsw
> > [8] http://www.semantic-web-journal.net/content/darwin-sw-darwin-core-based-ter…
> > [9] https://github.com/darwin-sw/dsw/wiki/TdwgContentEmailSummary
> > [10] https://github.com/darwin-sw/dsw/blob/master/img/acs-dsw-poster.pptx
> > [11] http://rdf.library.vanderbilt.edu/sparql?view
> > [12] https://github.com/HeardLibrary/semantic-web/blob/master/learning-sparql/le…
> > [13] https://github.com/darwin-sw/dsw/wiki/ClassOccurrence
> > [14] http://bioimages.vanderbilt.edu/org-jorgem/rec13_0004.rdf
> > [15] http://rs.tdwg.org/dwc/terms/guides/rdf/index.htm#2.7_Darwin_Core_convenien…
> > [16] http://rs.tdwg.org/dwc/terms/guides/rdf/index.htm#3.5_Darwin_Core_convenien…
> >
> > Douglas Campbell wrote:
> > Hi all,
> >
> > I am implementing Darwin Core in RDF as part of our API at Te Papa (Museum of New Zealand). My aim is to map our specimen metadata to rich Darwin Core RDF using JSON-LD, then 'dumb down' to Simple Darwin Core to contribute to virtual herbariums. I have mocked up some records; however, there are some areas where I'm not quite sure how to interpret the Darwin Core RDF Guide.
> >
> > The areas of confusion I have include:
> > * Best practice for connecting containers together
> > * Dereferencing of the DwC IRI namespace
> > * The overlapping scope of the Occurrence and Specimen types
> > * Conflicting usage of Taxon fields in the Identification object.
> >
> > I'm hoping for suggestions:
> > 1. Are there any implementations of DwC RDF data online that I could look at as examples to follow?
> > 2. What is the best way, or who is the best person, to ask specific questions about DwC RDF?
> >
> > At this stage our API prototype is only available internally but there is some documentation available publicly at:
> > https://github.com/te-papa/collections-api/wiki
> >
> > Thanks in advance,
> > Douglas
> >
> > Douglas Campbell
> > Business Analyst
> > Collections Information Services
> > Museum of New Zealand Te Papa Tongarewa
> >
> > ________________________________
> >
> > Visit the Te Papa website http://www.tepapa.govt.nz
> > The email message together with the accompanying attachments may be CONFIDENTIAL. If you have received this message in error, please notify https://www.tepapa.govt.nz/about/contact-us/general-enquiries immediately and delete the original message. The views expressed in this message are those of the individual sender, except where the sender specifically states them to be views of Te Papa.
> > Te Papa employs strict virus checking measures and accepts no liability for any loss caused either directly or indirectly by a virus arising from the use of this message or any attached file.
> >
> > ________________________________
> > This email has been filtered by SMX. For more information visit smxemail.com
> >
> > --
> > Steven J. Baskauf, Ph.D., Senior Lecturer
> > Vanderbilt University Dept. of Biological Sciences
> >
> > postal mail address:
> > PMB 351634
> > Nashville, TN 37235-1634, U.S.A.
> >
> > delivery address:
> > 2125 Stevenson Center
> > 1161 21st Ave., S.
> > Nashville, TN 37235
> >
> > office: 2128 Stevenson Center
> > phone: (615) 343-4582, fax: (615) 322-4942
> > If you fax, please phone or email so that I will know to look for it.
> > http://bioimages.vanderbilt.edu
> > http://vanderbilt.edu/trees
>
> --
> Paul J. Morris
> Biodiversity Informatics Manager
> Museum of Comparative Zoölogy, Harvard University
> mole(a)morris.net AA3SD PGP public key available
>
> _______________________________________________
> tdwg-content mailing list
> tdwg-content(a)lists.tdwg.org
> http://lists.tdwg.org/mailman/listinfo/tdwg-content
>
> --
> John Deck
> (541) 914-4739

--
Steven J. Baskauf, Ph.D., Senior Lecturer
Vanderbilt University Dept. of Biological Sciences
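The convenience-term idea discussed in the message above (string properties standing in for a link to a collection whose IRI is not yet known) might be sketched like this. It is a minimal illustration, not anything prescribed by the RDF Guide: the lookup table, the function name, and the catalog numbers are hypothetical, and the only values taken from the message are the "MVZ"/"Mamm" codes and the grbio.org IRI it offers as a possible replacement. Dropping the strings once a link exists also assumes that dereferencing the IRI would return equivalent information.

```python
# Hypothetical registry: (institutionCode, collectionCode) -> collection IRI.
COLLECTION_IRIS = {
    ("MVZ", "Mamm"): "http://grbio.org/cool/0rht-pj95",
}

# The convenience properties that describe the (unidentified) collection.
CONVENIENCE_KEYS = ("dwc:institutionCode", "dwc:collectionCode")


def link_collection(record, registry=COLLECTION_IRIS):
    """Return a copy of `record` in which the collection convenience strings
    are replaced by a dwciri:inCollection link, if an IRI is known."""
    key = (record.get("dwc:institutionCode"), record.get("dwc:collectionCode"))
    iri = registry.get(key)
    if iri is None:
        # No IRI discovered yet: keep the string-based description as-is.
        return dict(record)
    linked = {k: v for k, v in record.items() if k not in CONVENIENCE_KEYS}
    linked["dwciri:inCollection"] = iri
    return linked


# Illustrative records; catalog numbers and the "XYZ"/"Herp" codes are made up.
specimen = {
    "dwc:catalogNumber": "12345",
    "dwc:institutionCode": "MVZ",
    "dwc:collectionCode": "Mamm",
}
unresolved = {
    "dwc:catalogNumber": "67890",
    "dwc:institutionCode": "XYZ",
    "dwc:collectionCode": "Herp",
}
```

Calling `link_collection(specimen)` yields a record with a single `dwciri:inCollection` value, while `link_collection(unresolved)` leaves the two code strings in place so that the record can still be matched against other records describing the same collection.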


Implementing Darwin Core in RDF
by Douglas Campbell 28 Aug '16

Hi all,

I am implementing Darwin Core in RDF as part of our API at Te Papa (Museum of New Zealand). My aim is to map our specimen metadata to rich Darwin Core RDF using JSON-LD, then 'dumb down' to Simple Darwin Core to contribute to virtual herbariums. I have mocked up some records; however, there are some areas where I'm not quite sure how to interpret the Darwin Core RDF Guide.

The areas of confusion I have include:
* Best practice for connecting containers together
* Dereferencing of the DwC IRI namespace
* The overlapping scope of the Occurrence and Specimen types
* Conflicting usage of Taxon fields in the Identification object.

I'm hoping for suggestions:
1. Are there any implementations of DwC RDF data online that I could look at as examples to follow?
2. What is the best way, or who is the best person, to ask specific questions about DwC RDF?

At this stage our API prototype is only available internally but there is some documentation available publicly at:
https://github.com/te-papa/collections-api/wiki

Thanks in advance,
Douglas

Douglas Campbell
Business Analyst
Collections Information Services
Museum of New Zealand Te Papa Tongarewa
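The 'dumb down' step mentioned in the message, going from a linked, JSON-LD-style record to a flat Simple Darwin Core row, could look roughly like the sketch below. Everything here is an assumption made for illustration: the nested record layout, the `ex:` linking properties, and the sample values are invented, and this is not the Te Papa API or an official DwC procedure. Linked nodes are assumed to be nested dicts (lists of nodes are not handled).

```python
def simplify(node, flat=None):
    """Collect every dwc: literal found in a nested record into one flat row.

    Keys lose their 'dwc:' prefix; if a term appears more than once,
    the first value encountered wins.
    """
    if flat is None:
        flat = {}
    for key, value in node.items():
        if isinstance(value, dict):
            # A linked node (event, location, identification, ...): descend.
            simplify(value, flat)
        elif key.startswith("dwc:"):
            flat.setdefault(key[len("dwc:"):], value)
        # Anything else (e.g. "@id", "@type") is dropped from the flat row.
    return flat


# Hypothetical rich record; property names under "ex:" are placeholders.
rich_record = {
    "@id": "http://example.org/specimen/1",
    "dwc:catalogNumber": "1",
    "ex:atEvent": {
        "dwc:eventDate": "2016-08-28",
        "ex:locatedAt": {"dwc:country": "New Zealand"},
    },
    "ex:hasIdentification": {"dwc:scientificName": "Metrosideros excelsa"},
}

simple_row = simplify(rich_record)
```

The flattening is lossy by design: the links between nodes disappear, which is why a record with several occurrences or identifications would need a rule for choosing which values reach the flat row.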
