[tdwg-content] What is dwc:basisOfRecord for?

Wed Oct 27 21:15:18 CEST 2010

OK, I think I understand your question better.

The person or organization that creates data should type their records as narrowly in the type scheme as they can.  An easy example is an herbarium sheet, which I would type as a PreservedSpecimen.  A more difficult example is an image of a tiger taken by a camera trap.  It could be typed as a StillImage and/or a MachineObservation.  It obviously has properties of both, so the type should be determined by context.  If I am managing that data, I should know that it is both an image and a record of a natural occurrence of an organism in space and time.  If someone asks me for images of tigers, I want to include those in my response.  If someone is modeling the distribution of tigers based on all kinds of natural occurrences, I would want to include data from camera traps, scat samples, etc.
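As a rough illustration of "type determined by context", here is a minimal Python sketch in which a record carries more than one type and each request selects on the types it cares about.  The Record class, the sample records, and the two query functions are all hypothetical; only the type names come from the vocabularies discussed in this thread, and which of them count as "images" is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """Hypothetical local record that may carry several types at once."""
    record_id: str
    scientific_name: str
    types: set = field(default_factory=set)


# Types treated as evidence of a natural occurrence (values named in this thread).
OCCURRENCE_EVIDENCE = {
    "PreservedSpecimen", "FossilSpecimen", "LivingSpecimen",
    "HumanObservation", "MachineObservation",
}
# Types treated as images (an assumption made for this sketch).
IMAGE_TYPES = {"Image", "StillImage", "MovingImage"}

# Made-up sample data.
RECORDS = [
    Record("sheet-1", "Quercus alba", {"PreservedSpecimen"}),
    Record("trap-7", "Panthera tigris", {"StillImage", "MachineObservation"}),
    Record("obs-3", "Panthera tigris", {"HumanObservation"}),
]


def select(records, name, wanted_types):
    """Return ids of records for `name` that carry at least one wanted type."""
    return [
        r.record_id
        for r in records
        if r.scientific_name == name and r.types & wanted_types
    ]


def images_of(name, records=RECORDS):
    return select(records, name, IMAGE_TYPES)


def occurrences_of(name, records=RECORDS):
    return select(records, name, OCCURRENCE_EVIDENCE)
```

The camera-trap record is returned by both queries because it carries both types, while the human observation only answers the occurrence query.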

In general, I think it’s appropriate to separate local management of data from the provision of data.  In local data management, you meet your own needs, and TDWG doesn’t have much to say about that.  In data provision, the provider can’t presume to know the consumer’s purpose, and TDWG exists to help us understand each other’s needs and design solutions.   Our challenge is to develop systems that support discovery, assessment, and reuse of data for purposes that weren’t anticipated.  I think that entails having the ability to recast or re-type our data.

And by the way, I do now see an inconsistency between the typing scheme I laid out and the precise definitions that Hilmar was asking us to create and use.  In my interpretation of the value enumeration, I had imposed a hierarchy (with namespace prefixes added):

dc:Collection
dc:Dataset
dc:Event
dc:Image
dc:InteractiveResource
dc:MovingImage
dc:PhysicalObject
dc:Service
dc:Software
dc:Sound
dc:StillImage
dc:Text
dwc:Location
dwc:Taxon
dwc:Occurrence
    dwc:PreservedSpecimen
    dwc:FossilSpecimen
    dwc:LivingSpecimen
    dwc:HumanObservation
    dwc:MachineObservation
    dwc:NomenclaturalChecklist

If a PreservedSpecimen is a kind of Occurrence, and an individual can become a PreservedSpecimen, then it would seem reasonable to conclude that an individual is a kind of Occurrence.  Oops, we have a problem.  Obviously I have to be more careful with typing schemes and logical assertions.  In the information modeling / database design world, super/sub-typing is about data structure.  I know nothing about machine reasoning, so I should tread carefully as I venture into the RDF world.
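To make the problem concrete, here is a minimal Python sketch of what subclass reasoning would do with the hierarchy above.  It assumes the indentation is read as rdfs:subClassOf-style subtyping, where anything typed as a subclass is also an instance of every superclass.  The SUBCLASS_OF table and the inferred_types function are hypothetical, and "Individual" stands in for the class under discussion; nothing here is a published Darwin Core ontology.

```python
# Child -> parent, taken from the indented part of the hierarchy above.
SUBCLASS_OF = {
    "PreservedSpecimen": "Occurrence",
    "FossilSpecimen": "Occurrence",
    "LivingSpecimen": "Occurrence",
    "HumanObservation": "Occurrence",
    "MachineObservation": "Occurrence",
    "NomenclaturalChecklist": "Occurrence",
}


def inferred_types(asserted):
    """Return the asserted types plus every supertype reachable from them."""
    result = set()
    for t in asserted:
        # Walk up the chain until we run out of parents or meet a known type.
        while t is not None and t not in result:
            result.add(t)
            t = SUBCLASS_OF.get(t)
    return result


# A resource that started as an individual and later became a specimen.
tiger = inferred_types({"Individual", "PreservedSpecimen"})
```

Under these assumptions the same resource ends up typed as an Individual and as an Occurrence, which is the unintended conclusion described above.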

-Stan

On 10/27/10 10:34 AM, "Steve Baskauf" <steve.baskauf at vanderbilt.edu> wrote:

Term                            Value
occurrenceID                    http://herbarium.org/12345
recordedBy                      Joe Curator
other Occurrence terms          ...
preparations                    pressed and dried
other specimen-related terms    ...
basisOfRecord                   ???

OK, above is an example of a database record that I would consider typical.  The manager has flattened the general model we have been discussing to merge the Occurrence resource with the token resource, since in his/her database no occurrence has any token other than a single specimen.  So what is the value for basisOfRecord: Occurrence or PreservedSpecimen?  What I'm getting at here is that there seems to be ambiguity as to whether we intend for basisOfRecord to represent the type of the record (which in this case I would say is Occurrence) or the type of the token on which the record is based (which in this case is PreservedSpecimen).  In the lengthy discussion last October, there was talk of having the proposed recordClass represent the type of the overall record (Occurrence, Event, Location, etc.) and basisOfRecord represent the type of the token or evidence on which an Occurrence record is based (PreservedSpecimen).  I'm not sure what is intended now.

I see this lack of clarity as being a consequence of the general blurring of the distinction between the Occurrence and the "token".  I feel like there is a general consensus that we need to tighten up our definitions regarding Occurrences and their evidence even if some people will prefer to continue using the "flattened" approach to Occurrences and their tokens illustrated above (which they are entitled to do).  I don't have a problem with the various types that you've listed below.  The problem is that I don't think the definition of basisOfRecord makes clear how it (basisOfRecord) should apply - the definition just says "the specific nature of the data record" and that the controlled vocabulary of the Darwin Core Type Vocabulary should be used.  Since the Type Vocabulary includes all of the classes (Event, Location, Occurrence, etc.) in addition to the types of "tokens", I believe that some users might think that making a statement like
basisOfRecord="Event"
is correct.  If we intend for basisOfRecord to apply ONLY to Occurrences, and to have as valid values ONLY the types that apply to tokens (e.g. PreservedSpecimen, LivingSpecimen, HumanObservation), then we should say so explicitly in the definition.  If we do not say this, then we end up with the kind of ambiguity that I illustrated above:  basisOfRecord gets "overloaded" to represent both general classes of records and the types of tokens.
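If the definition were tightened along these lines, a check on incoming values could look like the following minimal Python sketch.  It assumes the restrictive reading, which is not what the current definition says.  The two sets and the check_basis_of_record function are hypothetical, with values taken from the types named in this thread.

```python
# Assumption: only token/evidence types are valid under the restrictive reading.
TOKEN_TYPES = {
    "PreservedSpecimen", "FossilSpecimen", "LivingSpecimen",
    "HumanObservation", "MachineObservation",
}
# Record-level classes that also appear in the Type Vocabulary.
RECORD_CLASSES = {"Event", "Location", "Occurrence", "Taxon"}


def check_basis_of_record(value):
    """Return (is_valid, message) under the restrictive reading."""
    if value in TOKEN_TYPES:
        return True, "ok"
    if value in RECORD_CLASSES:
        return False, f"{value!r} is a record class, not a token type"
    return False, f"{value!r} is not in the vocabulary sketched here"
```

With such a rule, basisOfRecord="Event" is rejected with a message that says why, instead of being silently accepted because the value happens to be in the Type Vocabulary.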

In addition, I'm not entirely convinced that basisOfRecord actually has any use at all.  It seems to be intended for situations such as the one I've illustrated above, where a database table has rows that could contain occurrences documented with different types of tokens.  Ostensibly, basisOfRecord is needed to tell us the type of the token.  But realistically, any such table that includes multiple token types isn't going to work very well.  For example, if the table includes Occurrences that are documented by specimens, by images, and by no token at all other than memory (i.e. HumanObservations), then it's going to have a bunch of columns that are empty for any particular record.  The specimen rows won't have anything in the image term columns, the image rows won't have anything in the specimen term columns, and the human observation rows won't have anything in either the specimen or image term columns.  I think most database managers would just include the terms that apply to all Occurrences in the Occurrence table and then use identifiers to link separate tables for the metadata terms that are specific to the various types of tokens.  basisOfRecord also has no use for Occurrences that have several tokens.  Which of the several tokens are we saying is the "basis" of the record?
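The linked-table layout described above can be sketched with Python's built-in sqlite3 module.  The table names, column names, image URI, and the token_types helper are all hypothetical; the specimen values reuse the example record from earlier in this message.  This is only one way such a database might be laid out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE occurrence (
    occurrence_id TEXT PRIMARY KEY,
    recorded_by   TEXT
);
-- One table per token type, each linked back to its occurrence.
CREATE TABLE specimen_token (
    token_id      TEXT PRIMARY KEY,
    occurrence_id TEXT NOT NULL REFERENCES occurrence(occurrence_id),
    preparations  TEXT
);
CREATE TABLE image_token (
    token_id      TEXT PRIMARY KEY,
    occurrence_id TEXT NOT NULL REFERENCES occurrence(occurrence_id),
    image_uri     TEXT
);
""")

# An occurrence documented by both a specimen and an image.
conn.execute("INSERT INTO occurrence VALUES (?, ?)",
             ("http://herbarium.org/12345", "Joe Curator"))
conn.execute("INSERT INTO specimen_token VALUES (?, ?, ?)",
             ("spec-1", "http://herbarium.org/12345", "pressed and dried"))
conn.execute("INSERT INTO image_token VALUES (?, ?, ?)",
             ("img-1", "http://herbarium.org/12345", "http://example.org/img-1.jpg"))
# An occurrence with no token at all (a human observation).
conn.execute("INSERT INTO occurrence VALUES (?, ?)", ("obs-1", "Joe Curator"))


def token_types(conn, occurrence_id):
    """Derive the token types of an occurrence from the linked tables."""
    rows = conn.execute(
        """
        SELECT 'PreservedSpecimen' FROM specimen_token WHERE occurrence_id = ?
        UNION
        SELECT 'StillImage' FROM image_token WHERE occurrence_id = ?
        """,
        (occurrence_id, occurrence_id),
    ).fetchall()
    return sorted(r[0] for r in rows)
```

In this layout the token types fall out of which linked tables have rows, so no single "basis" value has to be chosen for an occurrence with several tokens, and an occurrence with no tokens simply yields an empty list.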

So again, my basic question is not so much about the various types and subtypes to which we can refer, but specifically how the basisOfRecord term is to be used.

Steve
