Comments inline.
On Tue, Aug 25, 2009 at 6:20 PM, Donald Hobern <dhobern@gmail.com> wrote:
Thanks, John.
I must still be missing something on the definitions of Event, etc. Where are the definitions you quote below?
They are modifications of the most recently published definitions as an attempt to address the issues you raised. I wanted to see if the new ones were adequate.
http://rs.tdwg.org/dwc/terms/type-vocabulary/index.htm has definitions like the rather circular "A resource describing an occurrence" and the cryptic "A non-persistent, time-based occurrence" - which raises a whole new problem because (at least as I understand it) "occurrence" in this definition means something more general than "occurrence" as used for the Occurrence resource type.
The definition "A resource describing an occurrence" is a direct copy of the definition of dcterms:Event (Dublin Core). Originally I did nothing to change definitions of terms adopted from Dublin Core, but as discussion has progressed, and precisely because Dublin Core is purposely vague, I decided it would be useful to add "For Darwin Core, ..." to the definitions to describe how they are intended to be used in biodiversity contexts. Though the new definitions are more explicit, I don't believe they warrant new terms refined from the original Dublin Core terms. So again, the proposed definitions are:
dcterms:Event - "A non-persistent, time-based occurrence. For Darwin Core, a resource describing an instance of the Location class."
Event (class) - "The category of information pertaining to an event (an action that occurs at a place and during a period of time)."
dwctypes:Occurrence - "A resource describing an instance of the Occurrence class."
Occurrence (class) - "The category of information pertaining to evidence of an occurrence in nature, in a collection, or in a dataset (specimen, observation, etc.)."
As for the question whether examples or lengthier definitions would be good, I tend to think that in Dublin Core world, examples would only be a sample of possible uses within the range intended by a definition and that they may therefore help to steer people towards appropriate use but cannot introduce any constraints which could help us to tighten up cross-resource data integration.
Exactly, and that's what I was trying to do with Darwin Core terms - to constrain their semantics only as far as possible given the range of uses to which they may be put.
The scientific name example is a clear case. I have a feeling that the semantics for this element as it stands right now (spanning Occurrence and Checklist uses) imply a very general definition like "the scientific name which is connected with this item" with the nature of that connection unstated or at best covered by an implicit "it should be obvious in each case what we expect to see in this field". I admit that abuses of this term to do things we don't expect seem unlikely - someone giving ScientificName="Homo sapiens" because that is the scientific name for the collector, or something. However the laudable wish not to over-restrict the use of these terms does have the corollary that consumers of data always have to be aware that a provider may use the terms in semantically incompatible ways. If we knew that Occurrence->ScientificName was defined to mean something like "the scientific name used for the most refined taxon concept to which the specimen or observation has been identified", and that Occurrence->DecimalLatitude and Occurrence->DecimalLongitude related to the site at which the observation was made or the specimen was collected, we would more or less have a contract that Occurrence records containing a ScientificName, Latitude and Longitude were explicit assertions that an individual of the given taxon concept had indeed been found at those coordinates. If we have no assurance that the ScientificName relates to the identification of the specimen/observation, or that the coordinates relate to the observing/collecting site (rather than say the coordinates for the collection building), we always have some latent uncertainty about what is asserted.
Agreed. My original naive attempt to solve the problem was to assign terms to domains (classes), but as Hilmar pointed out, that would have inappropriate inference consequences. It also doesn't solve the problem when a term could describe (be a property of) more than one class. The other attempt to clarify the semantics was the recommendation of the use of the dcterms:type term to define which class (Occurrence, Location, etc.) a record represents. So, if the dcterms:type was Occurrence, you could infer that the scientificName in the record was for an Identification of an Occurrence because of the natural relationship between Taxon, Identification, and Occurrence. Those relationships can be made explicit with the TDWG Ontology, but it wasn't Darwin Core's place, in my mind, to finish the Ontology work. I think Darwin Core is safe without doing so, and will not be adversely affected when the Ontology work is done as it will only really affect the RDF.
Nevertheless, you are right, the uncertainty is there. But I think we can avoid most problems simply by describing what a term would mean in a given context. That's what I tried to do in the Definition for scientificName:
"The taxon name (with date and authorship information if applicable). When forming part of an Identification, this should be the name in lowest level taxonomic rank that can be determined. This term should not contain identification qualifications, which should instead be supplied in the IdentificationQualifier term."
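The dcterms:type recommendation discussed above could be applied on the consumer side roughly as follows. This is only a sketch under stated assumptions: the record is a flat dictionary, the key names (scientificName, decimalLatitude, decimalLongitude) and the sample values are illustrative, and the helper reads_as_found_at is hypothetical and not part of Darwin Core.

```python
# Hypothetical consumer-side check, not part of Darwin Core itself.
# A record is read as "this taxon was found at these coordinates" only
# when it declares dcterms:type = Occurrence and carries all three fields.

REQUIRED = ("scientificName", "decimalLatitude", "decimalLongitude")


def reads_as_found_at(record):
    """Return (name, lat, lon) if the record supports that reading, else None."""
    if record.get("dcterms:type") != "Occurrence":
        return None  # e.g. a checklist entry: the name is merely associated
    if any(record.get(key) in (None, "") for key in REQUIRED):
        return None  # incomplete record: nothing can be asserted
    return (
        record["scientificName"],
        float(record["decimalLatitude"]),
        float(record["decimalLongitude"]),
    )


# Illustrative records (invented values).
occurrence = {
    "dcterms:type": "Occurrence",
    "scientificName": "Puma concolor",
    "decimalLatitude": "37.87",
    "decimalLongitude": "-122.26",
}
checklist_entry = dict(occurrence, **{"dcterms:type": "Taxon"})

found = reads_as_found_at(occurrence)
not_found = reads_as_found_at(checklist_entry)
```

The check does not remove the latent uncertainty described above. It only makes the consumer's assumption explicit, so that records of other types are not silently treated as occurrence assertions.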
Very best wishes,
Donald
The same! Thanks for the commentary.
John
-----Original Message----- From: gtuco.btuco@gmail.com [mailto:gtuco.btuco@gmail.com] On Behalf Of John R. WIECZOREK Sent: Wednesday, 26 August 2009 10:52 AM To: Donald Hobern Cc: tdwg-content@lists.tdwg.org Subject: Re: [tdwg-content] FW: [STDTRK] Request for a Decision for Public Review of DarwinCore Draft Standard
Finally getting around to some older messages in an effort to finish up the public review of Darwin Core. Comments inline.
On Sun, Jul 26, 2009 at 6:03 PM, Donald Hobern <dhobern@gmail.com> wrote:
At Gail's request, I'm forwarding some discussion between Renato and myself on the Darwin Core draft
Thanks,
Donald
Renato,
You are quite right - domains and ranges may cause us more problems. At the very least it may be sensible for these to be things which get asserted within other OWL files used within specific projects to govern their own application models and inference rules.
Domains have been removed for all terms. A new attribute "organizedInClass" is used for term organizational maintenance.
In the end my concerns are really around the need for more clarity on the way that the dcTerms:type values are to be used and how this relates to past use of Darwin Core. I'm not sure I ultimately disagree with any of the decisions made. However I still cannot find any actual definition for the Occurrence and Event cases to explain what situations they are intended to cover. Unless we take the time to define the intended scope for all our terms and property values, it is hard to predict whether data from multiple sources can be expected to be suitable for combination.
I agree that a great challenge for the new Darwin Core will be to make sure that publishers and consumers do not mix apples with oranges, unless their particular use case warrants doing so.
What is missing from the explanations at http://rs.tdwg.org/dwc/terms/type-vocabulary/index.htm? Or from the definitions:
Event - The category of information pertaining to an event (an action that occurs at a place and during a period of time).
Occurrence - The category of information pertaining to evidence of an occurrence in nature, in a collection, or in a dataset (specimen, observation, etc.).
dcmitype:Event - A non-persistent, time-based occurrence. For Darwin Core, a resource describing an instance of the Location class.
dwctype:Occurrence - A resource describing an instance of the Occurrence class.
The scientific name case is one example. I would like an explicit statement that it means nothing more than "the name of a taxon (somehow) associated with this record" rather like a Dublin Core subject. If, e.g. in the case of an Occurrence record, it is meant to be a statement that a taxon was actually recorded at the location on the given date, we may need to be more explicit. I'm still not comfortable with leaving these things unstated.
Would clarity be best served by examples? Or by lengthier definitions? I have adopted the Dublin Core tendency toward brevity wherever possible, not pretending to know the scope of usage of a term in the long run. We have the supplementary wiki at our disposal to clarify as much as necessary. It has only just begun to be filled with useful material that I happened to feel qualified to provide based on my experience. I think we could do a lot more with it, and not affect the standard while doing so.
I must however emphasise that I am very happy to see how much work has gone into this revision and the level of forethought in addressing many important issues.
Much appreciated, sincerely.
John
Donald
-----Original Message----- From: renato@cria.org.br [mailto:renato@cria.org.br] Sent: Friday, 24 July 2009 7:04 AM Subject: RE: [STDTRK] Request for a Decision for Public Review of DarwinCore Draft Standard
Hi Donald,
Scientific name is precisely the kind of term that I feel should be generic. There's an ancient search interface at CRIA that illustrates the use case "give me everything you have related with this scientific name":
http://names.cria.org.br/index?lang=en (check all checkboxes at the bottom of the page)
In SPARQL I think the query would simply look like:
SELECT ?x WHERE { ?x <http://rs.tdwg.org/ont/scientificName> "some name" }
instead of repeating the same condition for every possible combination of domain#property.
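The contrast between one generic property and a property per domain can be sketched in Python with a hand-rolled list of triples standing in for an RDF store. Only the http://rs.tdwg.org/ont/scientificName IRI comes from the query above. The record identifiers, the sample names, and the example.org per-domain property IRIs are hypothetical and exist purely for illustration.

```python
# Toy triples as (subject, predicate, object); this is not a real RDF store.
GENERIC = "http://rs.tdwg.org/ont/scientificName"

# Hypothetical per-domain properties, one per kind of record.
PER_DOMAIN = [
    "http://example.org/occurrence#scientificName",
    "http://example.org/checklist#scientificName",
]

generic_triples = [
    ("occ:1", GENERIC, "Puma concolor"),
    ("chk:7", GENERIC, "Puma concolor"),
    ("occ:2", GENERIC, "Lynx rufus"),
]

domain_triples = [
    ("occ:1", PER_DOMAIN[0], "Puma concolor"),
    ("chk:7", PER_DOMAIN[1], "Puma concolor"),
    ("occ:2", PER_DOMAIN[0], "Lynx rufus"),
]


def subjects_with(triples, predicates, name):
    """Return sorted subjects that have any of `predicates` with value `name`."""
    wanted = set(predicates)
    return sorted(s for s, p, o in triples if p in wanted and o == name)


# Generic property: a single condition finds every related record.
generic_hits = subjects_with(generic_triples, [GENERIC], "Puma concolor")

# Per-domain properties: the caller must enumerate every property,
# and forgetting one silently drops records.
domain_hits = subjects_with(domain_triples, PER_DOMAIN, "Puma concolor")
partial_hits = subjects_with(domain_triples, PER_DOMAIN[:1], "Puma concolor")
```

With the generic property the "everything related to this name" use case needs one condition. With per-domain properties the same result requires knowing the full list of properties in advance.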
Most id properties (collectionID, locationID, etc.) should also probably be "domainless" since they can appear in objects from many different classes.
Best Regards,
Renato
Looking at what is in the DwC document, I think my concerns are with plans to use DwC for checklist data rather than the DwC proposal itself, but the problem may be in there somewhere. Here are some comments I sent earlier:
I need to take some time and provide some comments on the use of Darwin Core for non-occurrence data. In general I believe we need to be moving towards simple class properties with tightly defined explanations of the expected content and format. This use of DwC seems to me to be a significant dilution of the semantic content of these properties. If DwC is an object property just for a taxon occurrence, the explanation of dwc:ScientificName would be something like "The scientific name assigned to the taxon to which the recorded organism was identified". If we extend it to cover taxon occurrences, checklist entries and all the other things that people seem to have in mind, the explanation would reduce to "The scientific name which is associated with this record". In practice few people will stumble over this, but I really don't like it. It would be so easy just to have chk:ScientificName as well as dwc:ScientificName and to keep the semantics explicit. This becomes particularly problematic when we play with RDFS and OWL. We could choose to define the "dwc:ScientificName" property to have a domain restricted to TaxonOccurrence, allowing a reasoner to infer that objects with this property can be treated as TaxonOccurrence records. With the diluted dwc:ScientificName all we can infer is that the object is a ThingWithSomethingToDoWithTheBiologicalDomain.
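The domain-based inference described above can be sketched in a few lines of Python. This is a toy version of the RDFS domain rule only, not a real reasoner. The record identifiers and sample name are invented, and the function infer_types is a hypothetical helper.

```python
RDF_TYPE = "rdf:type"


def infer_types(triples, domains):
    """Toy RDFS domain rule: if (s, p, o) is stated and p has domain C,
    infer (s, rdf:type, C). `domains` maps property -> class."""
    inferred = set()
    for subject, predicate, _ in triples:
        if predicate in domains:
            inferred.add((subject, RDF_TYPE, domains[predicate]))
    return inferred


# Illustrative data: rec:2 is imagined to be a checklist entry that
# reuses the same property.
triples = [
    ("rec:1", "dwc:ScientificName", "Puma concolor"),
    ("rec:2", "dwc:ScientificName", "Puma concolor"),
]

# With a declared domain, every subject using the property is typed.
with_domain = infer_types(triples, {"dwc:ScientificName": "TaxonOccurrence"})

# With no domain declared, the rule yields nothing.
without_domain = infer_types(triples, {})
```

Note that the rule types rec:2 as a TaxonOccurrence as well, simply because it uses the property. That is the benefit when the property is used only for occurrences, and the unintended consequence when it is shared with checklist data.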
Donald