[tdwg-content] What is dwc:basisOfRecord for?

John Wieczorek tuco at berkeley.edu
Tue Nov 2 14:59:06 CET 2010


I'm not saying the subClassing is "right", I'm saying that is how it "is" at
the moment. Encoded there is the view at the time the standard was ratified
that specimens and observations "are" Occurrences. That was sufficient, at
the time, to "do" what the community of reviewers wanted Darwin Core to
"do." Up until now we have not needed to distinguish an Occurrence from a
"token." Nor did we have applications requiring the semantic consistency and
specificity being discussed now.

Though some reviewers wanted Darwin Core to be a full-blown ontology, most
of the community could not wait for that. Based on the last several months
of tdwg-content discussions, I believe it was the right choice to not wait,
as Darwin Core definitely already satisfies a variety of needs. Yet, these
discussions are making good progress toward an ontology - not for the sake
of having one, but for the purpose of "doing" something with it. So, I think
we have a very good starting point with the Darwin Core as

"...a glossary of terms (in other contexts these might be called properties,
elements, fields, columns, attributes, or concepts) intended to facilitate
the sharing of information about biological diversity by providing reference
definitions, examples, and commentaries. The Darwin Core is primarily based
on taxa, their occurrence in nature as documented by observations,
specimens, and samples, and related information."

I think we have a lot of work to do to take the next step, to get the
semantics right, and the choices will have to be considered more carefully
than they have been in the past. Thankfully, it seems we have reached a
critical mass to try to meet the challenge. The effort is much appreciated.

Comments inline...

On Mon, Nov 1, 2010 at 8:52 PM, Steve Baskauf
<steve.baskauf at vanderbilt.edu> wrote:

>  I'm having problems with this in several ways, but I think the main one at
> the moment is saying that PreservedSpecimen is a rdfs:subClassOf
> Occurrence.  If we do that, then we say that all instances of
> PreservedSpecimen are also instances of Occurrence.
>

Yes, that's what Darwin Core says.


> Based on the discussion that took place in the thread about what is an
> Occurrence, I thought the consensus was that the Occurrence was a separate
> resource from the token that documents it.  If that is true, then it does
> not make sense to say that a PreservedSpecimen is an Occurrence.
>

True, but Darwin Core doesn't know about tokens, only the people following
the discussions here do.


> Rather, in RDF I would say:
>
> <rdf:Description rdf:about="http://something.org/12345#occurrence">
>     <rdf:type rdf:resource="http://rs.tdwg.org/dwc/dwctype/Occurrence"/>
>     ... other properties of the Occurrence such as dwc:recordedBy ...
> </rdf:Description>
> <rdf:Description rdf:about="http://something.org/12345#specimen">
>     <rdf:type rdf:resource="http://rs.tdwg.org/dwc/dwctype/PreservedSpecimen"/>
>     ... other properties of the specimen such as dwc:preparations ...
> </rdf:Description>
>
> If it is NOT true that an Occurrence is a separate resource from the token,
> then do we want to say:
>
> <rdf:Description rdf:about="http://something.org/12345">
>     <rdf:type rdf:resource="http://rs.tdwg.org/dwc/dwctype/Occurrence"/>
>     <rdf:type rdf:resource="http://rs.tdwg.org/dwc/dwctype/PreservedSpecimen"/>
>     ... other properties of the Occurrence such as dwc:recordedBy ...
>     ... other properties of the specimen such as dwc:preparations ...
> </rdf:Description>
>
> ?
>

No. We don't want to say that, even the way Darwin Core is right now,
because specimens don't have properties separate from Occurrences - they ARE
Occurrences.


> If we go by this approach, (as you seem to suggest by
>
> Why not? Nothing in Darwin Core says that you can't create a schema in
> which an Occurrence is supported by multiple "tokens", each with its own
> basisOfRecord. Of course, Simple Darwin Core doesn't support that, and maybe
> that's where these perceptions of inadequacy come from.
> ) then that suggests to me that we could say that an Occurrence which is
> documented by both a specimen and an image would be described as:
>
> <rdf:Description rdf:about="http://something.org/12345">
>     <rdf:type rdf:resource="http://rs.tdwg.org/dwc/dwctype/Occurrence"/>
>     <rdf:type rdf:resource="http://rs.tdwg.org/dwc/dwctype/PreservedSpecimen"/>
>     <rdf:type rdf:resource="http://purl.org/dc/dcmitype/StillImage"/>
>     ... other properties of the Occurrence such as dwc:recordedBy ...
>     ... other properties of the specimen such as dwc:preparations ...
>     ... other properties of the image such as dcterms:rights, mrtg:caption,
>     etc. ...
> </rdf:Description>
>
> and you get odd statements like saying that occurrences have copyrights,
> images have preparations, etc.
>

Encoded in the way your example shows, no, this doesn't make sense. You
don't have the predicates you need in Darwin Core to support the assertions
you would like to make. http://something.org/12345 isn't all of those things
at once, it is related to them. In XML schema, where there is only
structural, not semantic scrutiny, one has the freedom to express that the
Occurrence identified by http://something.org/12345 is supported by the
evidence embodied by a PreservedSpecimen record and a StillImage record, but
it won't support any reasoning.
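
To make "related to them" concrete, here is a minimal sketch in Python, not
anything Darwin Core defines. It keeps the Occurrence, the specimen, and the
image as separate subjects and links them with a hypothetical predicate,
ex:hasEvidence. That predicate, the #image URI, and the rights string are
assumptions made up for illustration only.

```python
# Triples are (subject, predicate, object) tuples.
# "ex:hasEvidence" is a HYPOTHETICAL predicate; Darwin Core does not define it.
OCC = "http://something.org/12345#occurrence"
SPEC = "http://something.org/12345#specimen"
IMG = "http://something.org/12345#image"  # assumed URI, not from the thread

triples = [
    (OCC, "rdf:type", "dwctype:Occurrence"),
    (OCC, "dwc:recordedBy", "Joe Curator"),
    (OCC, "ex:hasEvidence", SPEC),
    (OCC, "ex:hasEvidence", IMG),
    (SPEC, "rdf:type", "dwctype:PreservedSpecimen"),
    (SPEC, "dwc:preparations", "pressed and dried"),
    (IMG, "rdf:type", "dcmitype:StillImage"),
    (IMG, "dcterms:rights", "placeholder rights statement"),
]


def properties_of(subject, graph):
    """Return the set of predicates asserted directly about one subject."""
    return {p for s, p, _ in graph if s == subject}


def evidence_types(occurrence, graph):
    """Return the sorted rdf:type values of every resource linked as evidence."""
    evidence = {o for s, p, o in graph
                if s == occurrence and p == "ex:hasEvidence"}
    return sorted(o for s, p, o in graph
                  if s in evidence and p == "rdf:type")
```

With the resources kept apart, the Occurrence carries no dcterms:rights and
the image carries no dwc:preparations, so the odd statements do not arise.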


> Ugh.  It gets even worse.  I just looked at
> http://darwincore.googlecode.com/svn/trunk/rdf/dwctype.rdf which I guess
> is the official RDF description of the dwctypes.
>

It is.


> According to it, not only is PreservedSpecimen a subclass of Occurrence,
> but Occurrence is a subclass of http://purl.org/dc/dcmitype/Event .  That
> means that not only is an instance of PreservedSpecimen an instance of
> Occurrence, but it is also by inference an instance of Event, (even if we
> don't explicitly state that using rdf:type).
>

Yes. What this says is that Darwin Core, as it currently stands, doesn't
care to distinguish between these classes. I agree that this is a
mistake. It makes me wonder if it is EVER appropriate to subClass.
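
The inference at issue can be sketched in a few lines of Python. This is only
an illustration of following rdfs:subClassOf links transitively, assuming the
two subclass statements described above for dwctype.rdf; the prefixed names
and the function name are shorthand chosen here, not part of any standard
tool.

```python
# Subclass statements as described in this thread for dwctype.rdf.
# Prefixed names are shorthand for the full URIs.
SUBCLASS_OF = {
    "dwctype:PreservedSpecimen": {"dwctype:Occurrence"},
    "dwctype:Occurrence": {"dcmitype:Event"},
}


def inferred_types(asserted_type, subclass_of=SUBCLASS_OF):
    """Return the asserted class plus every superclass reachable from it."""
    seen = set()
    stack = [asserted_type]
    while stack:
        cls = stack.pop()
        if cls not in seen:
            seen.add(cls)
            # Follow rdfs:subClassOf links upward from this class.
            stack.extend(subclass_of.get(cls, ()))
    return seen
```

Typing a resource as PreservedSpecimen therefore also types it as an
Occurrence and as an Event, whether or not anyone states that with rdf:type.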

> Thus the resource I've represented in RDF above is all at once an event, an
> occurrence, a specimen, and an image.  That really mixes up entities that we
> seemed to be considering separate things in that thread.  I guess if this
> kind of subclassing is in the RDF, that's the way it is.  But I don't like
> it.
>

I like that you don't like it.


> This was all just starting to make sense to me when I had decided that it
> was best to go against what I said in the Biodiversity Informatics paper and
> separate Occurrences from their tokens.  Maybe this separation only matters
> if one is trying to clearly define what is a property of what as in RDF and
> doesn't matter in databases where you don't really have to be clear about
> the subject of properties.
>

Agreed.


> But it seems to me like it's overloading the dwctypes to say that they
> should both serve as the way to define the type of a thing (i.e. the class
> to which the thing belongs) and to indicate the relationship that some thing
> has to something else (i.e. hasASpecimen, hasAnImage, hasAnObservation,
> etc.) as one is apparently doing with basisOfRecord.
>

In the absence of the semantics, the basisOfRecord is all we have other than
the content. In RDF, it may indeed be true that the basisOfRecord is
unnecessary.
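
As a rough sketch of why, assuming a flat record whose basisOfRecord value is
a term name from the type vocabulary, the same information could be carried
by an rdf:type statement instead. The function name and the mapping onto the
dwctype namespace are assumptions for illustration; the field values are the
ones from the example record quoted further down.

```python
# Namespace taken from the dwctype URIs used in the RDF examples in this thread.
DWCTYPE = "http://rs.tdwg.org/dwc/dwctype/"


def type_triple(record):
    """Turn a flat record's basisOfRecord into an rdf:type triple.

    Assumes basisOfRecord holds a bare term name such as "PreservedSpecimen".
    """
    return (record["occurrenceID"], "rdf:type",
            DWCTYPE + record["basisOfRecord"])


flat_record = {
    "occurrenceID": "http://herbarium.org/12345",
    "recordedBy": "Joe Curator",
    "preparations": "pressed and dried",
    "basisOfRecord": "PreservedSpecimen",
}
```

Once the type is stated this way, a consumer reads the class from rdf:type
and has no further need for the basisOfRecord column.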


> I guess if one objects to this kind of overloading, one could just use the
> URIs of the DwC classes themselves as values for rdf:type rather than using
> the dwctype URIs.  The RDF there doesn't seem to have any kind of
> subclassing (but there is the problem of typing PhysicalSpecimen which is
> not a generic DwC class).
>

And a new class would have to be created every time a new token was
conceived, along with new predicates for every relationship of every new
token to every other Class it might relate to. It doesn't seem scalable to
try to define the whole biodiversity universe in this way, so what is the
"sweet spot" solution? Create only what you need to solve the specific
problem at hand? Don't try to standardize at this level?


> By the way, we don't have one DwCType for every DwC (or borrowed Dublin
> Core) Class as you said.  There are no types for Identification, Event, and
> GeologicalContext
>

There is a type for Event (
http://rs.tdwg.org/dwc/terms/type-vocabulary/index.htm#Event), but not for
the other two. When I said "We have one DwCType for every DwC (or borrowed
Dublin Core) Class" I should have added "for which it makes sense to have a
record."

> Thanks for the explanation - I'll keep trying to understand.
>

No worries. The critical assessment is necessary. I wish it had been so
carefully considered during review, though of course we still wouldn't have
a standard to work with in that case. ;-)


> Steve
>
> John Wieczorek wrote: [some pieces cut out to paste above]
>
> Sorry for the delay. Just back from Tanzania.
>
>  The basisOfRecord is meant to classify the content of the resource
> (record) as specifically as possible, in the sense of how the contributor of
> the resource intended for it to be used. The value of the basisOfRecord
> should be based on the most specific Class (a DwCtype) of information the
> resource (record) represents. Thus, if an Occurrence record was supported by
> a PreservedSpecimen as evidence, then the record containing the specimen
> information should have PreservedSpecimen as the basisOfRecord. It's still
> an Occurrence, because PreservedSpecimen is a formal refinement of an
> Occurrence in the RDF sense of
> <rdfs:subClassOf rdf:resource="http://rs.tdwg.org/dwc/dwctype/Occurrence"/>.
> So, your example record should have PreservedSpecimen as its basisOfRecord,
> and that means it is an Occurrence record, because all PreservedSpecimen
> records are.
>
>  Why have the Occurrence type? For consistency. We have one DwCType for
> every DwC (or borrowed Dublin Core) Class. Yet, we are much more interested
> in the peculiarities of Occurrences than we are in different types of, say,
> Events, because of the implications for their use. So we have gone to the
> effort to create several subclasses of Occurrence based on demonstrated
> requirements.
>
>  Why have a basisOfRecord at all? To assist the data consumer to know the
> ways in which the resource (record) might be used. Without the basisOfRecord
> term, the detailed content of the records would have to be assessed to
> assert what the record was about.
>
>  Steve said,
> "But realistically, any such table that includes multiple token types
> isn't going to work very well."
>
>  and,
>
>  "basisOfRecord also has no use for Occurrences that have several tokens.
> Which of the several tokens are we saying is the "basis" of the record?"
>
>  Why not? Nothing in Darwin Core says that you can't create a schema in
> which an Occurrence is supported by multiple "tokens", each with its own
> basisOfRecord. Of course, Simple Darwin Core doesn't support that, and maybe
> that's where these perceptions of inadequacy come from.
>
>
> On Wed, Oct 27, 2010 at 10:34 AM, Steve Baskauf <
> steve.baskauf at vanderbilt.edu> wrote:
>
>>  occurrenceID                  http://herbarium.org/12345
>>  recordedBy                    Joe Curator
>>  other Occurrence terms        ...
>>  preparations                  pressed and dried
>>  other specimen-related terms  ...
>>  basisOfRecord                 ???
>>
>> OK, above is an example of a database record that I would consider
>> typical.  The manager has flattened the general model we have been
>> discussing to merge the Occurrence resource with the token resource since in
>> his/her database no occurrence has any token other than a single specimen.
>> So what is the value for basisOfRecord: Occurrence or PreservedSpecimen?
>> What I'm getting at here is that there seems to be ambiguity as to whether
>> we intend for basisOfRecord to represent the type of the record (which in
>> this case I would say is Occurrence) or the type of the token on which the
>> record is based (which in this case is PreservedSpecimen).  In that lengthy
>> discussion that happened last October, there was discussion of having the
>> proposed recordClass represent the type of the overall record (Occurrence,
>> Event, Location, etc.) and basisOfRecord representing the type of the token
>> or evidence on which an Occurrence record is based (PreservedSpecimen).  I'm
>> not sure what is intended now.
>>
>> I see this lack of clarity as being a consequence of the general blurring
>> of the distinction between the Occurrence and the "token".  I feel like
>> there is a general consensus that we need to tighten up our definitions
>> regarding Occurrences and their evidence even if some people will prefer to
>> continue using the "flattened" approach to Occurrences and their tokens
>> illustrated above (which they are entitled to do).  I don't have a problem
>> with the various types that you've listed below.  The problem is that I
>> don't think the definition of basisOfRecord makes clear how it
>> (basisOfRecord) should apply - the definition just says "the specific nature
>> of the data record" and that the controlled vocabulary of the Darwin Core
>> Type Vocabulary should be used.  Since the Type Vocabulary includes all of
>> the classes (Event, Location, Occurrence, etc.) in addition to the types of
>> "tokens", I believe that some users might think that making a statement like
>> basisOfRecord="Event"
>> is correct.  If we intend for basisOfRecord to ONLY apply to Occurrences,
>> and for basisOfRecord to ONLY have as valid values the types that apply to
>> tokens (i.e. PreservedSpecimen, LivingSpecimen, HumanObservation), then we
>> should say so explicitly in the definition.  If we do not say this, then we
>> end up with the kind of ambiguity that I illustrated above.  basisOfRecord
>> ends up getting "overloaded" to represent general classes of records and
>> also to represent the types of tokens.
>>
>> In addition, I'm not entirely convinced that basisOfRecord actually has
>> any use at all.  It seems to be intended for situations such as I've
>> illustrated above where a database table has rows that could contain
>> occurrences documented with different types of tokens.  Ostensibly,
>> basisOfRecord is needed to tell us the type of the token.  But
>> realistically, any such table that includes multiple token types isn't going
>> to work very well.  For example, if the table includes Occurrences that are
>> documented by specimens, images, and no token/memories (i.e.
>> HumanObservations), then it's going to have a bunch of columns that are
>> empty for any particular record.  The specimens rows won't have anything in
>> the image term columns, the images won't have anything in the specimen term
>> columns, and the human observations won't have anything in either the
>> specimen or image term columns.  I think most database managers would just
>> include the terms that apply to all Occurrences in the Occurrence table and
>> then use identifiers to link separate tables for the metadata terms that are
>> specific to the various types of tokens.  basisOfRecord also has no use for
>> Occurrences that have several tokens.  Which of the several tokens are we
>> saying is the "basis" of the record?
>>
>> So again, my basic question is not so much about the various types and
>> subtypes to which we can refer, but specifically how the basisOfRecord term
>> is to be used.
>>
>> Steve
>>
>> Blum, Stan wrote:
>>
>>  Steve,
>>
>> I wasn't involved in those final discussions of dwc:basisOfRecord and the
>> type vocabulary, but I don't see a difficulty.  The Dublin Core type
>> vocabulary includes the following:
>>
>>     Collection
>>     Dataset
>>     Event
>>     Image
>>     InteractiveResource
>>     MovingImage
>>     PhysicalObject
>>     Service
>>     Software
>>     Sound
>>     StillImage
>>     Text
>>
>> The Darwin Core types extend that with the three additional types, and with
>> Occurrence being further subtyped with those different kinds of Occurrences.
>>
>>     Location
>>     Taxon
>>     Occurrence
>>         PreservedSpecimen
>>         FossilSpecimen
>>         LivingSpecimen
>>         HumanObservation
>>         MachineObservation
>>         NomenclaturalChecklist
>>
>> Data publishers/providers should categorize their Occurrence data as one of
>> those subtypes (or perhaps another subtype that wasn't included in that
>> list).  It's up to the consumer to decide which kind of Occurrence subtypes
>> are appropriate for a particular use.
>>
>> Could you give examples of the inconsistent use of values in basisOfRecord?
>>
>> Also note, John W. is traveling at the moment.  Markus might be able to
>> provide additional thoughts.
>>
>> -Stan
>>
>>
>>
>>
>> On 10/25/10 10:34 PM, "Steve Baskauf" <steve.baskauf at vanderbilt.edu> wrote:
>>
>>
>>
>>  OK, I know that this sounds like a stupid question, but I really want
>> somebody who was involved in the development and maintenance of the
>> current DwC standard to tell me how the term dwc:basisOfRecord is
>> supposed to be used (not what it IS - I've seen the definition at
>> http://rs.tdwg.org/dwc/terms/index.htm#basisOfRecord)?  I would like for
>> the answer to this question to be separated from the issue of what the
>> Darwin Core type vocabulary
>> (http://rs.tdwg.org/dwc/terms/type-vocabulary/index.htm) is for.
>>
>> I re-read the lengthy thread starting with
>> http://lists.tdwg.org/pipermail/tdwg-content/2009-October/000301.html
>> which talked a lot about basisOfRecord and its relationship to other
>> ways of typing things.  I don't want to re-plough that ground again, but
>> I couldn't find the post that stated what the final decision was. I
>> remember that there was a decision to NOT create the recordClass term
>> which was the subject of much discussion.
>>
>> I guess my confusion at this point is with the inclusion of both
>> "Occurrence" and "PreservedSpecimen" in the same list.  Let's say that I
>> have a flat database where I include metadata about the Occurrence (such
>> as dwc:recordedBy) and the specimen (such as dwc:preparations) in the
>> same line.  What is the basisOfRecord for that line?  I would guess that
>> the "basis of the record" was the specimen.  But the line in the record
>> also represents an Occurrence.  It seems like there is a lack of clarity
>> as to whether basisOfRecord is supposed to indicate the type of the
>> record (which would be an Occurrence record) or whether it's supposed to
>> indicate the kind of evidence on which the record is based (which would
>> be PreservedSpecimen).  There have been various times where I've seen a
>> database record that includes basisOfRecord and it seems to be
>> inconsistently applied.
>>
>> I can see how the Darwin Core type vocabulary could be useful - it
>> pretty much lays out useful values that one could give for rdf:type.
>> But basisOfRecord as a term is confusing me.
>>
>> Steve
>>
>>
>>
>> _______________________________________________
>> tdwg-content mailing list
>> tdwg-content at lists.tdwg.org
>> http://lists.tdwg.org/mailman/listinfo/tdwg-content
>>
>>
>>
>>
>> --
>> Steven J. Baskauf, Ph.D., Senior Lecturer
>> Vanderbilt University Dept. of Biological Sciences
>>
>> postal mail address:
>> VU Station B 351634
>> Nashville, TN  37235-1634,  U.S.A.
>>
>> delivery address:
>> 2125 Stevenson Center
>> 1161 21st Ave., S.
>> Nashville, TN 37235
>>
>> office: 2128 Stevenson Center
>> phone: (615) 343-4582,  fax: (615) 343-6707
>> http://bioimages.vanderbilt.edu
>>
>>
>>
>>
>
>