[tdwg-content] Conflict between DarwinCore and DublinCore usage of dcterms:type / basisOfRecord

Steve Baskauf steve.baskauf at vanderbilt.edu
Sat Oct 24 09:16:27 CEST 2009


My apologies to John Wieczorek for the panicked tone of my previous 
email to the list.  It appeared to me (apparently incorrectly) that the 
issue was being closed without addressing the concern that I raised when 
I initially brought this up in my post to the MRTG wiki 
(http://www.keytonature.eu/wiki/MRTGv08_Type_term_inconsistent_with_DwC).   
I have the following general concerns about what is being proposed.  I 
will follow with a particular use-case comment in a separate email. 

*Hierarchy clarification needed.*  If I am understanding the proposal as 
John has summarized it there would be three terms that could apply to a 
resource with metadata subject to DwC.  A new DwC term /recordClass/ has 
controlled values corresponding to the TDWG classes: "Occurrence", 
"Event", "Location", "Taxon".  /dcterms:type/ is another term which 
could theoretically have DCMI Type values of: "Collection", "Dataset", 
"Event", "Image", "InteractiveResource", "MovingImage", 
"PhysicalObject", "Service", "Software", "Sound", "StillImage", or 
"Text" (although not all may be appropriate in the DwC context).   It is 
not clear to me in John's proposal whether the assignment of 
/dcterms:type/ is intended to be independent of the value of 
/recordClass/, or if /dcterms:type/ is intended to be a subclass of 
/recordClass/ Occurrence (i.e. applicable only to resources that 
represent things that can document Occurrences such as Images and 
PhysicalObjects). In Gregor's email initiating this discussion he 
indicated that he felt that they should be independent.  But in John's 
proposal, the third term: /basisOfRecord/ is clearly intended to be a 
subclass of /dcterms:type/ with possible values of "StillImage", 
"MovingImage", "Sound", "PreservedSpecimen", "FossilSpecimen", 
"LivingSpecimen", "HumanObservation", "MachineObservation".  Since all of 
these /basisOfRecord/ objects are the bases for documenting Occurrences, 
that implies that their parent /dcterms:type/ terms should fall under 
/recordClass/ Occurrence. 

*Problems with calling /basisOfRecord/ a subclass of /dcterms:type/.*  
PreservedSpecimen, FossilSpecimen, and LivingSpecimen can clearly fall 
under PhysicalObject, but what do we do with the rest?  The /basisOfRecord/ 
terms StillImage and MovingImage could be subtypes of /dcterms:type/ 
Image, but what about a Sound?  Is /basisOfRecord/ Sound a subtype of 
/dcterms:type/ Sound?  What about a 35mm slide picturing an organism?  
It is an Image, but is also a PhysicalObject.  Gregor suggested that 
/basisOfRecord/ HumanObservation and /basisOfRecord/ MachineObservation 
could be subtypes of /dcterms:type/ Event, but I disagree.  Just like 
observations, StillImages and PreservedSpecimens have Event information 
associated with them (the time and location of their creation) but we 
don't classify them as Events.  An Event is a conceptually different 
thing from the resource that is created at that Event. 

*How does this change facilitate machine processing?*  In John's request 
for comments, he quoted "3.3. Semantic changes in Darwin Core terms" 
which mentioned "if ... such changes of meaning are likely to have 
substantial impact on either machine processing of Darwin Core terms..."  
It is not at all clear to me how the proposed reorganization of these 
three terms (/recordClass/, /dcterms:type/, and /basisOfRecord/) will 
facilitate machine processing, in particular because of the problems in 
associating particular /basisOfRecord/ terms with /dcterms:type/ terms 
as I discussed in the previous paragraph.  From a machine-processing 
standpoint, it makes a lot more sense to subclass /recordClass/ 
Occurrence as follows:
*PhysicalObject* (including /basisOfRecord/ "PreservedSpecimen", 
"FossilSpecimen", "LivingSpecimen", and any other relevant material 
objects such as Seeds, FilmImage, etc.)
*DigitalObject* (including /basisOfRecord/ "StillImage", "MovingImage", 
"Sound", and any other relevant file-representable objects such as 
DnaSequences)
*NoObject* (including /basisOfRecord/ "HumanObservation", 
"MachineObservation")
The consuming application that receives the metadata can then know, for 
a record that involves an Occurrence, what to expect from the record's 
subclass:
- PhysicalObject: there is an object somewhere that is not 
deliverable through the Internet but which could be visited in an 
herbarium or museum.
- DigitalObject: there is a representational file that should be 
retrieved and presented to the user of the application in an appropriate 
way.
- NoObject: there will only be metadata, including measurements, but 
nothing for the user to see, hear, etc.
Alternatively, rather than creating the three subclasses, simply create 
a property for resources of the class Occurrence called "objectType" and 
allow it to have values of PhysicalObject, DigitalObject, or NoObject.  
This would serve the same purpose of informing the consuming application 
of the nature of the resource without having to create another 
hierarchical layer.  From a machine-processing standpoint, I don't see a 
great benefit to presenting a biodiversity-related consuming application 
with both /dcterms:type/ and /basisOfRecord/.
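
To make the consuming-application logic above concrete, here is a rough 
sketch in Python.  It is only an illustration under stated assumptions: the 
names OBJECT_TYPE, object_type and handle are hypothetical, records are 
treated as plain dictionaries, and the grouping simply restates the three 
object types suggested in this message rather than anything defined by DwC.

```python
# Hypothetical mapping from basisOfRecord values to the three object types
# suggested in this message. It is an illustration, not part of Darwin Core.
OBJECT_TYPE = {
    "PreservedSpecimen": "PhysicalObject",
    "FossilSpecimen": "PhysicalObject",
    "LivingSpecimen": "PhysicalObject",
    "StillImage": "DigitalObject",
    "MovingImage": "DigitalObject",
    "Sound": "DigitalObject",
    "HumanObservation": "NoObject",
    "MachineObservation": "NoObject",
}


def object_type(record):
    """Return the object type of an Occurrence record, or None otherwise."""
    if record.get("recordClass") != "Occurrence":
        return None
    # If the provider supplied an explicit objectType property, use it;
    # otherwise fall back to deriving it from basisOfRecord.
    explicit = record.get("objectType")
    if explicit is not None:
        return explicit
    return OBJECT_TYPE.get(record.get("basisOfRecord"))


def handle(record):
    """Decide what a consuming application should do with a record."""
    kind = object_type(record)
    if kind == "PhysicalObject":
        return "point the user to the holding herbarium or museum"
    if kind == "DigitalObject":
        return "retrieve the file and present it"
    if kind == "NoObject":
        return "show metadata and measurements only"
    return "not an Occurrence, or object type unknown"
```

The application needs only one piece of information to choose its 
behavior, whether that arrives as a subclass or as an objectType value.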

*An alternative.*  To me, trying to merge the /dcterms:type/ and its 
DCMI Type values together with the new DwC /recordClass/ and 
/basisOfRecord/ is like trying to fit a square peg in a round hole.  
DCMI wasn't created with biodiversity records in mind and its vocabulary 
really doesn't mesh very well with DwC.  I fully support the idea of 
creating the new DwC term /recordClass/ to contain what DwC formerly put 
in /dcterms:type/.  However, I think it would just be better to leave 
the actual /dcterms:type/ and its DCMI Type values as an independent 
thing without trying to make DwC /basisOfRecord/ its subtype.  
Biodiversity media providers who need their databases to mesh with 
non-biodiversity records should assign /dcterms:type/ values to their 
records; non-media providers can if they want.  Biodiversity data 
providers (including those who provide media) should assign 
/recordClass/ and /basisOfRecord/ values to their records. 
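
As a minimal sketch of what treating the terms independently could look 
like, the Python below checks each term against its own vocabulary and 
never against another term.  The vocabularies are the ones listed earlier 
in this message; the function name problems, the dictionary record layout, 
and the choice to treat /dcterms:type/ as optional are assumptions made 
for illustration.

```python
# Vocabularies as listed earlier in this message.
DCMI_TYPES = {
    "Collection", "Dataset", "Event", "Image", "InteractiveResource",
    "MovingImage", "PhysicalObject", "Service", "Software", "Sound",
    "StillImage", "Text",
}
RECORD_CLASSES = {"Occurrence", "Event", "Location", "Taxon"}
BASIS_OF_RECORD = {
    "StillImage", "MovingImage", "Sound", "PreservedSpecimen",
    "FossilSpecimen", "LivingSpecimen", "HumanObservation",
    "MachineObservation",
}


def problems(record):
    """Return the names of terms whose values fall outside their vocabulary.

    Each term is checked on its own; no term constrains another.
    """
    found = []
    if record.get("recordClass") not in RECORD_CLASSES:
        found.append("recordClass")
    if record.get("basisOfRecord") not in BASIS_OF_RECORD:
        found.append("basisOfRecord")
    # Assumption: dcterms:type is optional, and when present it is checked
    # only against the DCMI Type vocabulary.
    dc_type = record.get("dcterms:type")
    if dc_type is not None and dc_type not in DCMI_TYPES:
        found.append("dcterms:type")
    return found
```

Under this arrangement a 35mm slide can carry /basisOfRecord/ StillImage 
and /dcterms:type/ PhysicalObject at the same time, because neither value 
has to be a subtype of the other.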

Steve Baskauf

John R. WIECZOREK wrote:
> Gregor,
>
> That sounds like a good solution to all of the problems. I would
> propose that the basisOfRecord IS the same thing as your proposed
> dwc:subtype, so we should keep basisOfRecord.
>
> Net solution:
>
> 1) keep dcterms:type
> 2) use DCType vocabulary to control dcterms:type (so, StillImage,
> PhysicalObject, Event, etc.)
> 3) keep basisOfRecord
> 4) use our DwC-specific subtypes (PreservedSpecimen, FossilSpecimen,
> HumanObservation, etc.) as the controlled vocabulary for basisOfRecord
> without a formal type vocabulary (very close to how it is now, just
> some of the terms would go to dcterms:type).
> 5) add a recordClass term
> 6) use the DwCType vocabulary to control the recordClass term instead
> of the dcterms:type term.
>
> This solution fixes the Dublin Core - Darwin Core controlled
> vocabulary problem, retains all existing terms, isolates the
> controlled vocabulary that is specific to our domain, making it very
> easy to expand without changes to the standard.
>
> Any objections?
>
> John
>
> On Fri, Oct 23, 2009 at 12:33 PM, Gregor Hagedorn
> <g.m.hagedorn at gmail.com> wrote:
>   
>>> How about we retain basisOfRecord, but have it refine dcterms:type,
>>> drop dcterms:type and add a "recordClass" term in its place that is
>>> governed exactly as dcterms:type is currently being used in the
>>> recently ratified version of the Core?
>>>       
>> recordClass for Taxon/Occurrence/Event sounds good.
>>
>> I am less sure about keeping the "perspective-dependent"
>> basisOfRecord-term in a place where dcterms:type. The dcterms:type
>> vocabulary is, in principle, extensible, and meant to be extended.
>> Except, of course, some specific xml-implementation of dublin core
>> prevent this... To avoid problems with this one might desire to have
>> only the strict resource type vocabulary in dcterms:type. Then this
>> could be PhysicalObject/Event and a dwc:subtype added to express
>> PreservedSpecimen/MachineObservation etc. Essentially, MRTG intends to
>> use such a mrtg:subtype as well to differentiate different StillImage
>> or Text subtypes:
>>  http://www.keytonature.eu/wiki/MRTG_Schema_v0.8#Subtype
>>
>> This would then mean, DarwinCore might support:
>>  dwc:recordClass
>>  dcterms:type
>>  dwc:subtype
>>
>> Gregor
>>
>>     
> .
>
>   

-- 
Steven J. Baskauf, Ph.D., Senior Lecturer
Vanderbilt University Dept. of Biological Sciences

postal mail address:
VU Station B 351634
Nashville, TN  37235-1634,  U.S.A.

delivery address:
2125 Stevenson Center
1161 21st Ave., S.
Nashville, TN 37235

office: 2128 Stevenson Center
phone: (615) 343-4582,  fax: (615) 343-6707
http://bioimages.vanderbilt.edu
