What is dwc:basisOfRecord for?

Steve Baskauf

26 Oct 2010

05:34

OK, I know that this sounds like a stupid question, but I really want somebody who was involved in the development and maintenance of the current DwC standard to tell me how the term dwc:basisOfRecord is supposed to be used (not what it IS - I've seen the definition at http://rs.tdwg.org/dwc/terms/index.htm#basisOfRecord). I would like the answer to this question to be kept separate from the issue of what the Darwin Core type vocabulary (http://rs.tdwg.org/dwc/terms/type-vocabulary/index.htm) is for.

I re-read the lengthy thread starting with http://lists.tdwg.org/pipermail/tdwg-content/2009-October/000301.html which talked a lot about basisOfRecord and its relationship to other ways of typing things. I don't want to re-plough that ground, but I couldn't find the post that stated what the final decision was. I remember that there was a decision to NOT create the recordClass term which was the subject of much discussion.

I guess my confusion at this point is with the inclusion of both "Occurrence" and "PreservedSpecimen" in the same list. Let's say that I have a flat database where I include metadata about the Occurrence (such as dwc:recordedBy) and the specimen (such as dwc:preparations) in the same line. What is the basisOfRecord for that line? I would guess that the "basis of the record" was the specimen. But the line in the record also represents an Occurrence. It seems like there is a lack of clarity as to whether basisOfRecord is supposed to indicate the type of the record (which would be an Occurrence record) or the kind of evidence on which the record is based (which would be PreservedSpecimen). At various times I've seen database records that include basisOfRecord, and it seems to be inconsistently applied.

I can see how the Darwin Core type vocabulary could be useful - it pretty much lays out useful values that one could give for rdf:type. But basisOfRecord as a term is confusing me.

Steve

--
Steven J. Baskauf, Ph.D., Senior Lecturer
Vanderbilt University Dept. of Biological Sciences

postal mail address: VU Station B 351634 Nashville, TN 37235-1634, U.S.A.

delivery address: 2125 Stevenson Center 1161 21st Ave., S. Nashville, TN 37235

office: 2128 Stevenson Center
phone: (615) 343-4582, fax: (615) 343-6707
http://bioimages.vanderbilt.edu


Blum, Stan

27 Oct

08:02

New subject: [tdwg-content] What is dwc:basisOfRecord for?

Steve,

I wasn't involved in those final discussions of dwc:basisOfRecord and the type vocabulary, but I don't see a difficulty. The Dublin Core type vocabulary includes the following:

Collection Dataset Event Image InteractiveResource MovingImage PhysicalObject Service Software Sound StillImage Text

The Darwin Core types extend that with three additional types, with Occurrence being further subtyped into different kinds of Occurrences:

Location Taxon Occurrence PreservedSpecimen FossilSpecimen LivingSpecimen HumanObservation MachineObservation NomenclaturalChecklist

Data publishers/providers should categorize their Occurrence data as one of those subtypes (or perhaps another subtype that wasn't included in that list). It's up to the consumer to decide which kinds of Occurrence subtypes are appropriate for a particular use.

Could you give examples of the inconsistent use of values in basisOfRecord?

Also note, John W. is traveling at the moment. Markus might be able to provide additional thoughts.

-Stan

Steve Baskauf

17:34


occurrenceID               | recordedBy  | other Occurrence terms | preparations      | other specimen-related terms | basisOfRecord
http://herbarium.org/12345 | Joe Curator | ...                    | pressed and dried | ...                          | ???

OK, above is an example of a database record that I would consider typical. The manager has flattened the general model we have been discussing to merge the Occurrence resource with the token resource, since in his/her database no occurrence has any token other than a single specimen. So what is the value for basisOfRecord: Occurrence or PreservedSpecimen? What I'm getting at here is that there seems to be ambiguity as to whether we intend for basisOfRecord to represent the type of the record (which in this case I would say is Occurrence) or the type of the token on which the record is based (which in this case is PreservedSpecimen). In that lengthy discussion last October, the idea was that the proposed recordClass would represent the type of the overall record (Occurrence, Event, Location, etc.) and basisOfRecord would represent the type of the token or evidence on which an Occurrence record is based (PreservedSpecimen). I'm not sure what is intended now.

I see this lack of clarity as a consequence of the general blurring of the distinction between the Occurrence and the "token". I feel like there is a general consensus that we need to tighten up our definitions regarding Occurrences and their evidence, even if some people will prefer to continue using the "flattened" approach to Occurrences and their tokens illustrated above (which they are entitled to do). I don't have a problem with the various types that you've listed in your message. The problem is that I don't think the definition of basisOfRecord makes clear how it (basisOfRecord) should apply - the definition just says "the specific nature of the data record" and that the controlled vocabulary of the Darwin Core Type Vocabulary should be used. Since the Type Vocabulary includes all of the classes (Event, Location, Occurrence, etc.) in addition to the types of "tokens", I believe that some users might think that making a statement like basisOfRecord="Event" is correct. If we intend for basisOfRecord to ONLY apply to Occurrences, and for basisOfRecord to ONLY have as valid values the types that apply to tokens (e.g. PreservedSpecimen, LivingSpecimen, HumanObservation), then we should say so explicitly in the definition. If we do not say this, then we end up with the kind of ambiguity that I illustrated above: basisOfRecord ends up getting "overloaded" to represent general classes of records and also to represent the types of tokens.

In addition, I'm not entirely convinced that basisOfRecord actually has any use at all. It seems to be intended for situations such as the one I've illustrated above, where a database table has rows that could contain occurrences documented with different types of tokens. Ostensibly, basisOfRecord is needed to tell us the type of the token. But realistically, any such table that includes multiple token types isn't going to work very well. For example, if the table includes Occurrences that are documented by specimens, images, and no token/memories (i.e. HumanObservations), then it's going to have a bunch of columns that are empty for any particular record. The specimen rows won't have anything in the image term columns, the image rows won't have anything in the specimen term columns, and the human observation rows won't have anything in either the specimen or image term columns. I think most database managers would just include the terms that apply to all Occurrences in the Occurrence table and then use identifiers to link separate tables for the metadata terms that are specific to the various types of tokens. basisOfRecord also has no use for Occurrences that have several tokens. Which of the several tokens are we saying is the "basis" of the record?

So again, my basic question is not so much about the various types and subtypes to which we can refer, but specifically how the basisOfRecord term is to be used.

Steve
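The alternative layout mentioned in the message above (shared Occurrence terms in one table, token-specific terms in separate tables linked by identifier) could be sketched roughly as follows. This is only an illustration: the term groupings, the `split_flat_row` helper, the `tokenID` key, and the `#specimen` suffix are assumptions made for the example, not anything Darwin Core prescribes.

```python
# Hypothetical grouping of terms; Darwin Core itself does not prescribe this split.
OCCURRENCE_TERMS = {"occurrenceID", "recordedBy"}
SPECIMEN_TERMS = {"preparations"}


def split_flat_row(row):
    """Split one flattened row into an Occurrence row and a specimen (token) row."""
    occurrence = {k: v for k, v in row.items() if k in OCCURRENCE_TERMS}
    specimen = {k: v for k, v in row.items() if k in SPECIMEN_TERMS}
    # Link the token row back to its Occurrence by identifier.
    specimen["occurrenceID"] = row["occurrenceID"]
    # Illustrative identifier for the token itself (an assumption).
    specimen["tokenID"] = row["occurrenceID"] + "#specimen"
    return occurrence, specimen


# The flattened record from the message, minus the elided columns.
flat_row = {
    "occurrenceID": "http://herbarium.org/12345",
    "recordedBy": "Joe Curator",
    "preparations": "pressed and dried",
}
occurrence, specimen = split_flat_row(flat_row)
```

With this split, the question of which value belongs in basisOfRecord no longer applies to a merged row: the Occurrence row and the specimen row can each be typed separately.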

Blum, Stan

19:15


OK, I think I understand your question better.

The person/organization who creates data should characterize their records as narrowly in the type scheme as they can. An easy example might be an herbarium sheet, which I would type as a PreservedSpecimen. A more difficult example would be an image of a tiger taken by a camera-trap. It could be typed as a still image and/or a MachineObservation. It obviously has properties of both, so the type should be determined by context. If I am managing that data, I should know that it is both an image and represents a natural occurrence of an organism in space and time. If someone asks me for images of tigers, I want to include those in my response. If someone is modeling the distributions of tigers based on all kinds of natural occurrences, I would want to include data from the camera traps, scat samples, etc.

In general, I think it’s appropriate to separate local management of data from the provision of data. In local data management, you meet your own needs, and TDWG doesn’t have much to say about that. In data provision, the provider can’t presume to know the consumer’s purpose, and TDWG exists to help us understand each other’s needs and design solutions. Our challenge is to develop systems that support discovery, assessment, and reuse of data for purposes that weren’t anticipated. I think that entails having the ability to recast or re-type our data.

And by the way, I do now see an inconsistency between the typing scheme I laid out and the precise definitions that Hilmar was asking us to create and use. In my interpretation of the value enumeration, I had imposed a hierarchy (with namespace prefixes added):

dc:Collection dc:Dataset dc:Event dc:Image dc:InteractiveResource dc:MovingImage dc:PhysicalObject dc:Service dc:Software dc:Sound dc:StillImage dc:Text

dwc:Location dwc:Taxon dwc:Occurrence dwc:PreservedSpecimen dwc:FossilSpecimen dwc:LivingSpecimen dwc:HumanObservation dwc:MachineObservation dwc:NomenclaturalChecklist

If a PreservedSpecimen is a kind of Occurrence, and an individual can become a PreservedSpecimen, then it would seem reasonable that an individual is a kind of Occurrence. Oops, we have a problem. Obviously I have to be more careful with typing schemes and logical assertions. In the information modeling / database design world, super/sub-typing is about data structure. I know nothing about machine reasoning, so I should be more careful as I venture into the RDF world.

-Stan
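The camera-trap case, where one record is legitimately both an image and a machine observation, might be handled on the provider side along these lines. This is a rough sketch only: the record layout, the `types` key, the identifiers, and the `select` helper are hypothetical and are not part of Darwin Core or of any TDWG recommendation.

```python
# Hypothetical records; each carries every type its manager considers applicable.
records = [
    {"id": "trap-001", "subject": "tiger", "types": {"StillImage", "MachineObservation"}},
    {"id": "sheet-042", "subject": "herbarium sheet", "types": {"PreservedSpecimen"}},
]


def select(records, wanted_type):
    """Return the ids of records typed (among other things) as wanted_type."""
    return [r["id"] for r in records if wanted_type in r["types"]]


# A request for images and a request for machine observations both find the
# camera-trap record; neither finds the herbarium sheet.
images = select(records, "StillImage")
observations = select(records, "MachineObservation")
```

Keeping a set of types per record, rather than a single value, is one way to leave the choice of context to the consumer.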

John Wieczorek

2 Nov

00:12


Sorry for the delay. Just back from Tanzania.

The basisOfRecord is meant to classify the content of the resource (record) as specifically as possible, in the sense of how the contributor of the resource intended for it to be used. The value of the basisOfRecord should be based on the most specific Class (a DwCType) of information the resource (record) represents.

Thus, if an Occurrence record was supported by a PreservedSpecimen as evidence, then the record containing the specimen information should have PreservedSpecimen as the basisOfRecord. It's still an Occurrence, because PreservedSpecimen is a formal refinement of an Occurrence in the RDF sense of <rdfs:subClassOf rdf:resource="http://rs.tdwg.org/dwc/dwctype/Occurrence"/>. So, your example record should have PreservedSpecimen as its basisOfRecord, and that means it is an Occurrence record, because all PreservedSpecimen records are.

Why have the Occurrence type? For consistency. We have one DwCType for every DwC (or borrowed Dublin Core) Class. Yet, we are much more interested in the peculiarities of Occurrences than we are in different types of, say, Events, because of the implications for their use. So we have gone to the effort to create several subclasses of Occurrence based on demonstrated requirements.

Why have a basisOfRecord at all? To help the data consumer know the ways in which the resource (record) might be used. Without the basisOfRecord term, the detailed content of the records would have to be assessed to assert what the record was about.

Steve said, "But realistically, any such table that includes multiple token types isn't going to work very well." and, "basisOfRecord also has no use for Occurrences that have several tokens. Which of the several tokens are we saying is the 'basis' of the record?"

Why not? Nothing in Darwin Core says that you can't create a schema in which an Occurrence is supported by multiple "tokens", each with its own basisOfRecord. Of course, Simple Darwin Core doesn't support that, and maybe that's where these perceptions of inadequacy come from.
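One way to picture a schema in which an Occurrence is supported by several "tokens", each with its own basisOfRecord, is the small sketch below. The `Occurrence` and `Token` classes, their field names, and the example.org identifiers are hypothetical; nothing here is defined by Darwin Core, and Simple Darwin Core in particular has no such structure.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    """A piece of evidence for an Occurrence (hypothetical structure)."""
    token_id: str
    basis_of_record: str  # e.g. "PreservedSpecimen" or "StillImage"


@dataclass
class Occurrence:
    """An Occurrence with zero or more supporting tokens (hypothetical structure)."""
    occurrence_id: str
    recorded_by: str
    tokens: list = field(default_factory=list)

    def bases(self):
        """Return the set of basisOfRecord values across all tokens."""
        return {t.basis_of_record for t in self.tokens}


occ = Occurrence(
    "http://example.org/occ/1",
    "Joe Curator",
    [
        Token("http://example.org/occ/1#specimen", "PreservedSpecimen"),
        Token("http://example.org/occ/1#image", "StillImage"),
    ],
)
```

Here each basisOfRecord value describes one token rather than the Occurrence as a whole, so an Occurrence with several tokens raises no question about which one is "the" basis.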

Steve Baskauf

03:52


I'm having problems with this in several ways, but I think the main one at the moment is saying that PreservedSpecimen is a rdfs:subClassOf Occurrence. If we do that, then we say that all instances of PreservedSpecimen are also instances of Occurrence.

Based on the discussion that took place in the thread about what is an Occurrence, I thought the consensus was that the Occurrence was a separate resource from the token that documents it. If that is true, then it does not make sense to say that a PreservedSpecimen is an Occurrence. Rather, in RDF I would say:

<rdf:Description rdf:about="http://something.org/12345#occurrence">
  <rdf:type rdf:resource="http://rs.tdwg.org/dwc/dwctype/Occurrence"/>
  ... other properties of the Occurrence such as dwc:recordedBy ...
</rdf:Description>
<rdf:Description rdf:about="http://something.org/12345#specimen">
  <rdf:type rdf:resource="http://rs.tdwg.org/dwc/dwctype/PreservedSpecimen"/>
  ... other properties of the specimen such as dwc:preparations ...
</rdf:Description>

If it is NOT true that an Occurrence is a separate resource from the token, then do we want to say:

<rdf:Description rdf:about="http://something.org/12345">
  <rdf:type rdf:resource="http://rs.tdwg.org/dwc/dwctype/Occurrence"/>
  <rdf:type rdf:resource="http://rs.tdwg.org/dwc/dwctype/PreservedSpecimen"/>
  ... other properties of the Occurrence such as dwc:recordedBy ...
  ... other properties of the specimen such as dwc:preparations ...
</rdf:Description>

?

If we go by this approach (as you seem to suggest by

    Why not? Nothing in Darwin Core says that you can't create a schema in which an Occurrence is supported by multiple "tokens", each with its own basisOfRecord. Of course, Simple Darwin Core doesn't support that, and maybe that's where these perceptions of inadequacy come from.

) then that suggests to me that we could say that an Occurrence which is documented by both a specimen and an image would be described as:

<rdf:Description rdf:about="http://something.org/12345">
  <rdf:type rdf:resource="http://rs.tdwg.org/dwc/dwctype/Occurrence"/>
  <rdf:type rdf:resource="http://rs.tdwg.org/dwc/dwctype/PreservedSpecimen"/>
  <rdf:type rdf:resource="http://purl.org/dc/dcmitype/StillImage"/>
  ... other properties of the Occurrence such as dwc:recordedBy ...
  ... other properties of the specimen such as dwc:preparations ...
  ... other properties of the image such as dcterms:rights, mrtg:caption, etc. ...
</rdf:Description>

and you get odd statements like saying that occurrences have copyrights, images have preparations, etc.

Ugh. It gets even worse. I just looked at http://darwincore.googlecode.com/svn/trunk/rdf/dwctype.rdf which I guess is the official RDF description of the dwctypes. According to it, not only is PreservedSpecimen a subclass of Occurrence, but Occurrence is a subclass of http://purl.org/dc/dcmitype/Event . That means that not only is an instance of PreservedSpecimen an instance of Occurrence, but it is also by inference an instance of Event (even if we don't explicitly state that using rdf:type). Thus the resource I've represented in RDF above is all at once an event, an occurrence, a specimen, and an image. That really mixes up entities that we seemed to be considering separate things in that thread. I guess if this kind of subclassing is in the RDF, that's the way it is. But I don't like it.

This was all just starting to make sense to me when I had decided that it was best to go against what I said in the Biodiversity Informatics paper and separate Occurrences from their tokens. Maybe this separation only matters if one is trying to clearly define what is a property of what as in RDF and doesn't matter in databases where you don't really have to be clear about the subject of properties.
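The inference chain described above can be sketched in a few lines of Python. This is only an illustration, not part of any Darwin Core tooling: the two subclass assertions are the ones reported in this message for dwctype.rdf, the short prefixed names stand in for the full URIs, and the function name `inferred_types` is made up for the example.

```python
# Subclass assertions as reported in this thread for dwctype.rdf.
# Prefixed names are shorthand for the full URIs (an assumption for brevity).
SUBCLASS_OF = {
    "dwctype:PreservedSpecimen": "dwctype:Occurrence",
    "dwctype:Occurrence": "dcmitype:Event",
}


def inferred_types(asserted_type):
    """Return the asserted type followed by every superclass reachable
    by following rdfs:subClassOf transitively."""
    types = [asserted_type]
    while types[-1] in SUBCLASS_OF:
        types.append(SUBCLASS_OF[types[-1]])
    return types


# Asserting only PreservedSpecimen still yields Occurrence and Event.
print(inferred_types("dwctype:PreservedSpecimen"))
```

A reasoner that honors rdfs:subClassOf would reach the same conclusion without any explicit rdf:type statement for Occurrence or Event, which is the source of the mixing described above.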
But it seems to me like it's overloading the dwctypes to say that they should both serve as the way to define the type of a thing (i.e. the class to which the thing belongs) and to indicate the relationship that some thing has to something else (i.e. hasASpecimen, hasAnImage, hasAnObservation, etc.) as one is apparently doing with basisOfRecord. I guess if one objects to this kind of overloading, one could just use the URIs of the DwC classes themselves as values for rdf:type rather than using the dwctype URIs. The RDF there doesn't seem to have any kind of subclassing (but there is the problem of typing PhysicalSpecimen which is not a generic DwC class).

By the way, we don't have one DwCType for every DwC (or borrowed Dublin Core) Class as you said. There are no types for Identification, Event, and GeologicalContext.

Thanks for the explanation - I'll keep trying to understand.

Steve

John Wieczorek wrote: [some pieces cut out to paste above]

...

Sorry for the delay. Just back from Tanzania.

The basisOfRecord is meant to classify the content of the resource (record) as specifically as possible, in the sense of how the contributor of the resource intended for it to be used. The value of the basisOfRecord should be based on the most specific Class (a DwCtype) of information the resource (record) represents. Thus, if an Occurrence record was supported by a PreservedSpecimen as evidence, then the record containing the specimen information should have PreservedSpecimen as the basisOfRecord. It's still an Occurrence, because PreservedSpecimen is a formal refinement of an Occurrence in the RDF sense of <rdfs:subClassOf rdf:resource="http://rs.tdwg.org/dwc/dwctype/Occurrence"/>. So, your example record should have PreservedSpecimen as its basisOfRecord, and that means it is an Occurrence record, because all PreservedSpecimen records are.

Why have the Occurrence type? For consistency. We have one DwCType for every DwC (or borrowed Dublin Core) Class. Yet, we are much more interested in the peculiarities of Occurrences than we are in different types of, say, Events, because of the implications for their use. So we have gone to the effort to create several subclasses of Occurrence based on demonstrated requirements.

Why have a basisOfRecord at all? To assist the data consumer to know the ways in which the resource (record) might be used. Without the basisOfRecord term, the detailed content of the records would have to be assessed to assert what the record was about.
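As a rough illustration of the consumer-side use described above, the following Python sketch selects records by basisOfRecord without inspecting any other field. The set of Occurrence subtypes follows the type vocabulary quoted elsewhere in this thread; the record dictionaries, the example.org identifiers, and the function names are hypothetical.

```python
# Occurrence subtypes as listed in the Darwin Core type vocabulary quoted
# in this thread; treating exactly these five as subtypes is an assumption.
OCCURRENCE_SUBTYPES = {
    "PreservedSpecimen",
    "FossilSpecimen",
    "LivingSpecimen",
    "HumanObservation",
    "MachineObservation",
}


def is_occurrence_record(record):
    """True if basisOfRecord names Occurrence or one of its subtypes."""
    basis = record.get("basisOfRecord")
    return basis == "Occurrence" or basis in OCCURRENCE_SUBTYPES


def select_by_basis(records, wanted):
    """Keep only the records whose basisOfRecord is in the wanted set."""
    return [r for r in records if r.get("basisOfRecord") in wanted]


# Hypothetical records; only the first identifier appears in the thread.
records = [
    {"occurrenceID": "http://herbarium.org/12345", "basisOfRecord": "PreservedSpecimen"},
    {"occurrenceID": "http://example.org/obs/1", "basisOfRecord": "HumanObservation"},
    {"occurrenceID": "http://example.org/loc/1", "basisOfRecord": "Location"},
]

specimens = select_by_basis(records, {"PreservedSpecimen"})
print([r["occurrenceID"] for r in specimens])
print([is_occurrence_record(r) for r in records])
```

The consumer decides which subtypes suit a given use; the publisher only has to supply the most specific value.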

Steve said, "But realistically, any such table that includes multiple token types isn't going to work very well."

and,

"basisOfRecord also has no use for Occurrences that have several tokens. Which of the several tokens are we saying is the "basis" of the record?"

Why not? Nothing in Darwin Core says that you can't create a schema in which an Occurrence is supported by multiple "tokens", each with its own basisOfRecord. Of course, Simple Darwin Core doesn't support that, and maybe that's where these perceptions of inadequacy come from.

On Wed, Oct 27, 2010 at 10:34 AM, Steve Baskauf <steve.baskauf@vanderbilt.edu> wrote:

occurrenceID | recordedBy | other Occurrence terms | preparations | other specimen-related terms | basisOfRecord
http://herbarium.org/12345 | Joe Curator | ... | pressed and dried | ... | ???

OK, above is an example of a database record that I would consider typical. The manager has flattened the general model we have been discussing to merge the Occurrence resource with the token resource since in his/her database no occurrence has any token other than a single specimen. So what is the value for basisOfRecord: Occurrence or PreservedSpecimen? What I'm getting at here is that there seems to be ambiguity as to whether we intend for basisOfRecord to represent the type of the record (which in this case I would say is Occurrence) or the type of the token on which the record is based (which in this case is PreservedSpecimen). In that lengthy discussion that happened last October, there was discussion of having the proposed recordClass represent the type of the overall record (Occurrence, Event, Location, etc.) and basisOfRecord representing the type of the token or evidence on which an Occurrence record is based (PreservedSpecimen). I'm not sure what is intended now.

I see this lack of clarity as being a consequence of the general blurring of the distinction between the Occurrence and the "token". I feel like there is a general consensus that we need to tighten up our definitions regarding Occurrences and their evidence even if some people will prefer to continue using the "flattened" approach to Occurrences and their tokens illustrated above (which they are entitled to do). I don't have a problem with the various types that you've listed below. The problem is that I don't think the definition of basisOfRecord makes clear how it (basisOfRecord) should apply - the definition just says "the specific nature of the data record" and that the controlled vocabulary of the Darwin Core Type Vocabulary should be used. Since the Type Vocabulary includes all of the classes (Event, Location, Occurrence, etc.) in addition to the types of "tokens", I believe that some users might think that making a statement like basisOfRecord="Event" is correct. If we intend for basisOfRecord to ONLY apply to Occurrences, and for basisOfRecord to ONLY have as valid values the types that apply to tokens (i.e. PreservedSpecimen, LivingSpecimen, HumanObservation), then we should say so explicitly in the definition. If we do not say this, then we end up with the kind of ambiguity that I illustrated above. basisOfRecord ends up getting "overloaded" to represent general classes of records and also to represent the types of tokens.
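The restriction proposed in the paragraph above could be expressed as a simple check. This is a sketch of one possible rule, not anything the Darwin Core definition currently states: the split into evidence types and general classes is an assumption drawn from the lists in this thread, and `check_basis_of_record` is a hypothetical name.

```python
# Values that name a kind of evidence ("token") for an Occurrence.
# The exact membership of both sets is an assumption for illustration.
EVIDENCE_TYPES = {
    "PreservedSpecimen",
    "FossilSpecimen",
    "LivingSpecimen",
    "HumanObservation",
    "MachineObservation",
}

# Values that name a general class of record rather than a kind of evidence.
GENERAL_CLASSES = {"Occurrence", "Event", "Location", "Taxon"}


def check_basis_of_record(value):
    """Classify a basisOfRecord value under the proposed restriction."""
    if value in EVIDENCE_TYPES:
        return "ok"
    if value in GENERAL_CLASSES:
        return "ambiguous"  # names a record class, not the evidence behind it
    return "unknown"


for value in ("PreservedSpecimen", "Event", "Painting"):
    print(value, check_basis_of_record(value))
```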

In addition, I'm not entirely convinced that basisOfRecord actually has any use at all. It seems to be intended for situations such as I've illustrated above where a database table has rows that could contain occurrences documented with different types of tokens. Ostensibly, basisOfRecord is needed to tell us the type of the token. But realistically, any such table that includes multiple token types isn't going to work very well. For example, if the table includes Occurrences that are documented by specimens, images, and no token/memories (i.e. HumanObservations), then it's going to have a bunch of columns that are empty for any particular record. The specimens rows won't have anything in the image term columns, the images won't have anything in the specimen term columns, and the human observations won't have anything in either the specimen or image term columns. I think most database managers would just include the terms that apply to all Occurrences in the Occurrence table and then use identifiers to link separate tables for the metadata terms that are specific to the various types of tokens. basisOfRecord also has no use for Occurrences that have several tokens. Which of the several tokens are we saying is the "basis" of the record?
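The linked-table layout described above might look like the following Python sketch. It is illustrative only: the table contents, the example.org identifier, the `caption` field, and the `tokens_for` helper are all invented for the example, with in-memory dictionaries and lists standing in for database tables.

```python
# Occurrence table: only the terms that apply to every Occurrence.
occurrences = {
    "http://herbarium.org/12345": {"recordedBy": "Joe Curator"},
    "http://example.org/occ/2": {"recordedBy": "Joe Curator"},
}

# One table per token type, each row linked back by occurrenceID.
specimens = [
    {"occurrenceID": "http://herbarium.org/12345", "preparations": "pressed and dried"},
]
images = [
    {"occurrenceID": "http://herbarium.org/12345", "caption": "whole plant"},
]


def tokens_for(occurrence_id):
    """List the token types that document one Occurrence."""
    found = []
    for kind, table in (("PreservedSpecimen", specimens), ("StillImage", images)):
        for row in table:
            if row["occurrenceID"] == occurrence_id:
                found.append(kind)
    return found


print(tokens_for("http://herbarium.org/12345"))
print(tokens_for("http://example.org/occ/2"))
```

In this layout no row carries empty columns for token types it lacks, and an Occurrence with several tokens has no single "basis" to report, which is the difficulty raised above.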

So again, my basic question is not so much about the various types and subtypes to which we can refer, but specifically how the basisOfRecord term is to be used.

Steve

Blum, Stan wrote:

...
Steve,

I wasn't involved in those final discussions of dwc:basisOfRecord and the type vocabulary, but I don't see a difficulty. The Dublin Core type vocabulary includes the following:

Collection Dataset Event Image InteractiveResource MovingImage PhysicalObject Service Software Sound StillImage Text

The Darwin Core types extend that with the three additional types, and with Occurrence being further subtyped with those different kinds of Occurrences.

Location Taxon Occurrence PreservedSpecimen FossilSpecimen LivingSpecimen HumanObservation MachineObservation NomenclaturalChecklist

Data publishers/providers should categorize their Occurrence data as one of those subtypes (or perhaps another subtype that wasn't included in that list). It's up to the consumer to decide which kind of Occurrence subtypes are appropriate for a particular use.

Could you give examples of the inconsistent use of values in basisOfRecord ?

Also note, John W. is traveling at the moment. Markus might be able to provide additional thoughts.

-Stan

On 10/25/10 10:34 PM, "Steve Baskauf" <steve.baskauf@vanderbilt.edu> wrote:

...
OK, I know that this sounds like a stupid question, but I really want somebody who was involved in the development and maintenance of the current DwC standard to tell me how the term dwc:basisOfRecord is supposed to be used (not what it IS - I've seen the definition at http://rs.tdwg.org/dwc/terms/index.htm#basisOfRecord)? I would like for the answer of this question to be separated from the issue of what the Darwin Core type vocabulary (http://rs.tdwg.org/dwc/terms/type-vocabulary/index.htm) is for.

I re-read the lengthy thread starting with http://lists.tdwg.org/pipermail/tdwg-content/2009-October/000301.html which talked a lot about basisOfRecord and its relationship to other ways of typing things. I don't want to re-plough that ground again, but I couldn't find the post that stated what the final decision was. I remember that there was a decision to NOT create the recordClass term which was the subject of much discussion.

I guess my confusion at this point is with the inclusion of both "Occurrence" and "PreservedSpecimen" in the same list. Let's say that I have a flat database where I include metadata about the Occurrence (such as dwc:recordedBy) and the specimen (such as dwc:preparations) in the same line. What is the basisOfRecord for that line? I would guess that the "basis of the record" was the specimen. But the line in the record also represents an Occurrence. It seems like there is a lack of clarity as to whether basisOfRecord is supposed to indicate the type of the record (which would be an Occurrence record) or whether it's supposed to indicate the kind of evidence on which the record is based (which would be PreservedSpecimen). There have been various times where I've seen a database record that includes basisOfRecord and it seems to be inconsistently applied.

I can see how the Darwin Core type vocabulary could be useful - it pretty much lays out useful values that one could give for rdfs:type. But basisOfRecord as a term is confusing me.

Steve


-- Steven J. Baskauf, Ph.D., Senior Lecturer Vanderbilt University Dept. of Biological Sciences

postal mail address: VU Station B 351634 Nashville, TN 37235-1634, U.S.A.

delivery address: 2125 Stevenson Center 1161 21st Ave., S. Nashville, TN 37235

office: 2128 Stevenson Center phone: (615) 343-4582, fax: (615) 343-6707 http://bioimages.vanderbilt.edu


Hilmar Lapp

04:05

New subject: [tdwg-content] What is dwc:basisOfRecord for?

On Nov 1, 2010, at 11:52 PM, Steve Baskauf wrote:

...

not only is PreservedSpecimen a subclass of Occurrence, but Occurrence is a subclass of http://purl.org/dc/dcmitype/Event .

Both of these must be mistakes? I guess this all works well for data exchange. But for anyone trying any kind of reasoning this will throw everything off.

-hilmar

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- informatics.nescent.org :
===========================================================

John Wieczorek

13:59

New subject: [tdwg-content] What is dwc:basisOfRecord for?

I'm not saying the subClassing is "right", I'm saying that is how it "is" at the moment. Encoded there is the view at the time the standard was ratified that specimens and observations "are" Occurrences. That was sufficient, at the time, to "do" what the community of reviewers wanted Darwin Core to "do." Up until now we have not needed to distinguish an Occurrence from a "token." Nor did we have applications requiring the semantic consistency and specificity being discussed now.

Though some reviewers wanted Darwin Core to be a full-blown ontology, most of the community could not wait for that. Based on the last several months of tdwg-content discussions, I believe it was the right choice to not wait, as Darwin Core definitely already satisfies a variety of needs. Yet, these discussions are making good progress toward an ontology - not for the sake of having one, but for the purpose of "doing" something with it. So, I think we have a very good starting point with the Darwin Core as "...a glossary of terms (in other contexts these might be called properties, elements, fields, columns, attributes, or concepts) intended to facilitate the sharing of information about biological diversity by providing reference definitions, examples, and commentaries. The Darwin Core is primarily based on taxa, their occurrence in nature as documented by observations, specimens, and samples, and related information."

I think we have a lot of work to go the next step, to get the semantics right, for which choices have to be considered more carefully than has been done in the past. Thankfully, it seems we have reached a critical mass to try to meet the challenge. The effort is much appreciated.

Comments inline...

On Mon, Nov 1, 2010 at 8:52 PM, Steve Baskauf <steve.baskauf@vanderbilt.edu> wrote:

...

I'm having problems with this in several ways, but I think the main one at the moment is saying that PreservedSpecimen is a rdfs:subClassOf Occurrence. If we do that, then we say that all instances of PreservedSpecimen are also instances of Occurrence.

Yes, that's what Darwin Core says.

...

Based on the discussion that took place in the thread about what is an Occurrence, I thought the consensus was that the Occurrence was a separate resource from the token that documents it. If that is true, then it does not make sense to say that a PreservedSpecimen is an Occurrence.

True, but Darwin Core doesn't know about tokens, only the people following the discussions here do.

...

Rather, in RDF I would say:

<rdf:Description rdf:about="http://something.org/12345#occurrence">
  <rdf:type rdf:resource="http://rs.tdwg.org/dwc/dwctype/Occurrence"/>
  ... other properties of the Occurrence such as dwc:recordedBy ...
</rdf:Description>
<rdf:Description rdf:about="http://something.org/12345#specimen">
  <rdf:type rdf:resource="http://rs.tdwg.org/dwc/dwctype/PreservedSpecimen"/>
  ... other properties of the specimen such as dwc:preparations ...
</rdf:Description>

If it is NOT true that an Occurrence is a separate resource from the token, then do we want to say:

<rdf:Description rdf:about="http://something.org/12345">
  <rdf:type rdf:resource="http://rs.tdwg.org/dwc/dwctype/Occurrence"/>
  <rdf:type rdf:resource="http://rs.tdwg.org/dwc/dwctype/PreservedSpecimen"/>
  ... other properties of the Occurrence such as dwc:recordedBy ...
  ... other properties of the specimen such as dwc:preparations ...
</rdf:Description>

?

No. We don't want to say that, even the way Darwin Core is right now, because specimens don't have properties separate from Occurrences - they ARE Occurrences.

...

If we go by this approach, (as you seem to suggest by

Why not? Nothing in Darwin Core says that you can't create a schema in which an Occurrence is supported by multiple "tokens", each with its own basisOfRecord. Of course, Simple Darwin Core doesn't support that, and maybe that's where these perceptions of inadequacy come from. ) then that suggests to me that we could say that an Occurrence which is documented by both a specimen and an image would be described as:

<rdf:Description rdf:about="http://something.org/12345">
  <rdf:type rdf:resource="http://rs.tdwg.org/dwc/dwctype/Occurrence"/>
  <rdf:type rdf:resource="http://rs.tdwg.org/dwc/dwctype/PreservedSpecimen"/>
  <rdf:type rdf:resource="http://purl.org/dc/dcmitype/StillImage"/>
  ... other properties of the Occurrence such as dwc:recordedBy ...
  ... other properties of the specimen such as dwc:preparations ...
  ... other properties of the image such as dcterms:rights, mrtg:caption, etc. ...
</rdf:Description>

and you get odd statements like saying that occurrences have copyrights, images have preparations, etc.

Encoded in the way your example shows, no, this doesn't make sense. You don't have the predicates you need in Darwin Core to support the assertions you would like to make. http://something.org/12345 isn't all of those things at once, it is related to them. In XML schema, where there is only structural, not semantic scrutiny, one has the freedom to express that the Occurrence identified by http://something.org/12345 is supported by the evidence embodied by a PreservedSpecimen record and a StillImage record, but it won't support any reasoning.
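One way to picture "related to them" rather than "all of those things at once" is to give each thing its own subject and link them. The Python sketch below uses plain tuples as triples. The linking predicate `ex:hasEvidence` is hypothetical (as noted above, Darwin Core does not supply such a predicate), and the `#image` fragment, the literal values, and the helper name are likewise assumptions made for the example.

```python
# Hypothetical predicate; Darwin Core itself defines no such term.
HAS_EVIDENCE = "ex:hasEvidence"

OCC = "http://something.org/12345#occurrence"
SPEC = "http://something.org/12345#specimen"
IMG = "http://something.org/12345#image"  # assumed identifier for the image

# (subject, predicate, object) triples; prefixed names abbreviate full URIs.
triples = [
    (OCC, "rdf:type", "dwctype:Occurrence"),
    (OCC, "dwc:recordedBy", "Joe Curator"),
    (OCC, HAS_EVIDENCE, SPEC),
    (OCC, HAS_EVIDENCE, IMG),
    (SPEC, "rdf:type", "dwctype:PreservedSpecimen"),
    (SPEC, "dwc:preparations", "pressed and dried"),
    (IMG, "rdf:type", "dcmitype:StillImage"),
    (IMG, "dcterms:rights", "placeholder rights statement"),
]


def predicates_of(subject):
    """Return the sorted set of predicates asserted about one subject."""
    return sorted({p for s, p, _ in triples if s == subject})


# Rights attach to the image only; preparations attach to the specimen only.
print(predicates_of(OCC))
print(predicates_of(SPEC))
print(predicates_of(IMG))
```

With separate subjects, the Occurrence no longer appears to hold a copyright, and the image no longer appears to have preparations.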

...

Ugh. It gets even worse. I just looked at http://darwincore.googlecode.com/svn/trunk/rdf/dwctype.rdf which I guess is the official RDF description of the dwctypes.

It is.

...

According to it, not only is PreservedSpecimen a subclass of Occurrence, but Occurrence is a subclass of http://purl.org/dc/dcmitype/Event . That means that not only is an instance of PreservedSpecimen an instance of Occurrence, but it is also by inference an instance of Event, (even if we don't explicitly state that using rdf:type).

Yes. What this says is that Darwin Core, as it currently stands, doesn't care to distinguish between these classes. I agree that this is a mistake. It makes me wonder if it is EVER appropriate to subClass.

...

Thus the resource I've represented in RDF above is all at once an event, an occurrence, a specimen, and an image. That really mixes up entities that we seemed to be considering separate things in that thread. I guess if this kind of subclassing is in the RDF, that's the way it is. But I don't like it.

I like that you don't like it.

...

This was all just starting to make sense to me when I had decided that it was best to go against what I said in the Biodiversity Informatics paper and separate Occurrences from their tokens. Maybe this separation only matters if one is trying to clearly define what is a property of what as in RDF and doesn't matter in databases where you don't really have to be clear about the subject of properties.

Agreed.

...

But it seems to me like it's overloading the dwctypes to say that they should both serve as the way to define they type of a thing (i.e. the class to which the thing belongs) and to indicate the relationship that some thing has to something else (i.e. hasASpecimen, hasAnImage, hasAnObservation, etc.) as one is apparently doing with basisOfRecord.

In the absence of the semantics, the basisOfRecord is all we have other than the content. In rdf, it may indeed be true that the basisOfRecord is unnecessary.

...

I guess if one objects to this kind of overloading, one could just use the URIs of the DwC classes themselves as values for rdf:type rather than using the dwctype URIs. The RDF there doesn't seem to have any kind of subclassing (but there is the problem of typing PhysicalSpecimen which is not a generic DwC class).

And a new class would have to be created every time a new token was conceived, and new predicates for every relationship of every new token to every other Class they might relate to. Doesn't seem scalable to try to define the whole biodiversity universe in this way, so what is the "sweet spot" solution? Create only what you need to create to solve the specific problem at hand? Don't try to standardize at this level?

...

By the way, we don't have one DwCType for every DwC (or borrowed Dublin Core) Class as you said. There are no types for Identification, Event, and GeologicalContext

There is a type for Event (http://rs.tdwg.org/dwc/terms/type-vocabulary/index.htm#Event), but not for the other two. When I said "We have one DwCType for every DwC (or borrowed Dublin Core) Class" I should have added "for which it makes sense to have a record."

...

Thanks for the explanation - I'll keep trying to understand.

No worries. The critical assessment is necessary. I wish it had been so carefully considered during review, though of course we still wouldn't have a standard to work with in that case. ;-)

...

Steve

John Wieczorek wrote: [some pieces cut out to paste above]

Sorry for the delay. Just back from Tanzania.

The basisOfRecord is meant to classify the content of the resource (record) as specifically as possible, in the sense of how the contributor of the resource intended for it to be used. The value of the basisOfRecord should be based on the most specific Class (a DwCtype) of information the resource (record) represents. Thus, if an Occurrence record was supported by a PreservedSpecimen as evidence, then the record containing the specimen information should have PreservedSpecimen as the basisOfRecord. It's still an Occurrence, because PreservedSpecimen is a formal refinement of an Occurrence in the RDF sense of <rdfs:subClassOf rdf:resource=" http://rs.tdwg.org/dwc/dwctype/Occurrence"/>. So, your example record should have PreservedSpecimen as its basisOfRecord, and that means it is an Occurrence record, because all PreservedSpecimen records are.

Why have the Occurrence type? For consistency. We have one DwCType for every DwC (or borrowed Dublin Core) Class. Yet, we are much more interested in the peculiarities of Occurrences than we are in different types of, say, Events, because of the implications for their use. So we have gone to the effort to create several subclasses of Occurrence based on demonstrated requirements.

Why have a basisOfRecord at all? To assist the data consumer to know the ways in which the resource (record) might be used. Without the basisOfRecord term, the detailed content of the records would have to be assessed to assert what the record was about.

Steve said, "But realistically, any such table that includes multiple token types isn't going to work very well."

and,

"basisOfRecord also has no use for Occurrences that have several tokens. Which of the several tokens are we saying is the "basis" of the record?"

Why not? Nothing in Darwin Core says that you can't create a schema in which an Occurrence is supported by multiple "tokens", each with its own basisOfRecord. Of course, Simple Darwin Core doesn't support that, and maybe that's where these perceptions of inadequacy come from.

On Wed, Oct 27, 2010 at 10:34 AM, Steve Baskauf < steve.baskauf@vanderbilt.edu> wrote:

...
occurrenceID recordedBy other Occurrence terms preparations other specimen-related terms basisOfRecord http://herbarium.org/12345 Joe Curator ... pressed and dried ... ???

OK, above is an example of a database record that I would consider typical. The manager has flattened the general model we have been discussing to merge the Occurrence resource with the token resource since in his/her database no occurrence has any token other than a single specimen. So what is the value for basisOfRecord: Occurrence or PreservedSpecimen? What I'm getting at here is that there seems to be ambiguity as to whether we intend for basisOfRecord to represent the type of the record (which in this case I would say is Occurrence) or the type of the token on which the record is based (which in this case is PreservedSpecimen). In that lengthy discussion that happened last October, there was discussion of having the proposed recordClass represent the type of the overall record (Occurrence, Event, Location, etc.) and basisOfRecord representing the type of the token or evidence on which an Occurrence record is based (PreservedSpecimen). I'm not sure what is intended now.

I see this lack of clarity as being a consequence of the general blurring of the distinction between the Occurrence and the "token". I feel like there is a general consensus that we need to tighten up our definitions regarding Occurrences and their evidence even if some people will prefer to continue using the "flattened" approach to Occurrences and their tokens illustrated above (which they are entitled to do). I don't have a problem with the various types that you've listed below. The problem is that I don't think the definition of basisOfRecord makes clear how it (basisOfRecord) should apply - the definition just says "the specific nature of the data record" and that the controlled vocabulary of the Darwin Core Type Vocabulary should be used. Since the Type Vocabulary includes all of the classes (Event, Location, Occurrence, etc.) in addition to the types of "tokens", I believe that some users might think that making a statement like basisOfRecord="Event" is correct. If we intend for basisOfRecord to ONLY apply to Occurrences, and for basisOfRecord to ONLY have as valid values the types that apply to tokens (i.e. PreservedSpecimen, LivingSpecimen, HumanObservation), then we should say so explicitly in the definition. If we do not say this, then we end up with the kind of ambiguity that I illustrated above. basisOfRecord ends up getting "overloaded" to represent general classes of records and also to represent the types of tokens.

In addition, I'm not entirely convinced that basisOfRecord actually has any use at all. It seems to be intended for situations such as I've illustrated above where a database table has rows that could contain occurrences documented with different types of tokens. Ostensibly, basisOfRecord is needed to tell us the type of the token. But realistically, any such table that includes multiple token types isn't going to work very well. For example, if the table includes Occurrences that are documented by specimens, images, and no token/memories (i.e. HumanObservations), then it's going to have a bunch of columns that are empty for any particular record. The specimens rows won't have anything in the image term columns, the images won't have anything in the specimen term columns, and the human observations won't have anything in either the specimen or image term columns. I think most database managers would just include the terms that apply to all Occurrences in the Occurrence table and then use identifiers to link separate tables for the metadata terms that are specific to the various types of tokens. basisOfRecord also has no use for Occurrences that have several tokens. Which of the several tokens are we saying is the "basis" of the record?

So again, my basic question is not so much about the various types and subtypes to which we can refer, but specifically how the basisOfRecord term is to be used.

Steve

Blum, Stan wrote:

Steve,

I'm wasn't involved in those final discussions of dwc:basisOfRecord and the type vocabulary, but I don't see a difficulty. The Dublin Core type vocabulary includes the following:

Collection Dataset Event Image InteractiveResource MovingImage PhysicalObject Service Software Sound StillImage Text

The Darwin Core types extend that with the three additional types, and with Occurrence being further subtyped with those different kinds of Occurrences.

Location Taxon Occurrence PreservedSpecimen FossilSpecimen LivingSpecimen HumanObservation MachineObservation NomenclaturalChecklist

Data publishers/providers should categorize their Occurrence data as one of those subtypes (or perhaps another subtype that wasn't included in that list). It's up to the consumer to decide which kind of Occurrence subtypes are appropriate for a particular use.

Could you give examples of the inconsistent use of values in basisOfRecord ?

Also note, John W. is traveling at the moment. Markus might be able to provide additional thoughts.

-Stan

On 10/25/10 10:34 PM, "Steve Baskauf" <steve.baskauf@vanderbilt.edu> <steve.baskauf@vanderbilt.edu> wrote:

OK, I know that this sounds like a stupid question, but I really want somebody who was involved in the development and maintenance of the current DwC standard to tell me how the term dwc:basisOfRecord is supposed to be used (not what it IS - I've seen the definition athttp://rs.tdwg.org/dwc/terms/index.htm#basisOfRecord)? I would like for the answer of this question to be separated from the issue of what the Darwin Core type vocabulary (http://rs.tdwg.org/dwc/terms/type-vocabulary/index.htm) is for.

I re-read the lengthy thread starting withhttp://lists.tdwg.org/pipermail/tdwg-content/2009-October/000301.html which talked a lot about basisOfRecord and its relationship to other ways of typing things. I don't want to re-plough that ground again, but I couldn't find the post that stated what the final decision was. I remember that there was a decision to NOT create the recordClass term which was the subject of much discussion.

I guess my confusion at this point is with the inclusion of both "Occurrence" and "PreservedSpecimen" in the same list. Let's say that I have a flat database where I include metadata about the Occurrence (such as dwc:recordedBy) and the specimen (such as dwc:preparations) in the same line. What is the basisOfRecord for that line? I would guess that the "basis of the record" was the specimen. But the line in the record also represents an Occurrence. It seems like there is a lack of clarity as to whether basisOfRecord is supposed to indicate the type of the record (which would be an Occurrence record) or whether it's supposed to indicate the kind of evidence on which the record is based (which would be PreservedSpecimen). There have been various times where I've seen a database record that includes basisOfRecord and it seems to be inconsistently applied.

I can see how the Darwin Core type vocabulary could be useful - it pretty much lays out useful values that one could give for rdfs:type. But basisOfRecord as a term is confusing me.

Steve

_______________________________________________ tdwg-content mailing list tdwg-content@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-content


-- Steven J. Baskauf, Ph.D., Senior Lecturer Vanderbilt University Dept. of Biological Sciences

postal mail address: VU Station B 351634 Nashville, TN 37235-1634, U.S.A.

delivery address: 2125 Stevenson Center 1161 21st Ave., S. Nashville, TN 37235

office: 2128 Stevenson Center phone: (615) 343-4582, fax: (615) 343-6707 http://bioimages.vanderbilt.edu


Steve Baskauf

3 Nov 3 Nov

17:02

New subject: [tdwg-content] Overloading of basisOfRecord and the DwC type vocabulary (includes DigitalStillImage term addition proposal)

...

...

I think we have a lot of work to go the next step, to get the semantics right, for which choices have to be considered more carefully than has been done in the past. Thankfully, it seems we have reached a critical mass to try to meet the challenge. The effort is much appreciated.

Yes that is encouraging!

...

No. We don't want to say that, even the way Darwin Core is right now, because specimens don't have properties separate from Occurrences - they ARE Occurrences.

Well, I'm glad to hear you say that because I was starting to think that I was crazy when I treated PreservedSpecimens, images, sounds, etc. AS Occurrences in the Biodiversity Informatics paper. I think I got that notion from reading your posts on the list last (northern hemisphere) fall. ...

Yes. What this says is that Darwin Core, as it currently stands, doesn't care to distinguish between these classes. I agree that this is a mistake. It makes me wonder if it is EVER appropriate to subClass. OK, how hard would it be to get rid of the subClassOf properties in the RDF term definitions? Is that something that the TAG could do by fiat or would it have to go through the official DwC change process?

...

I guess if one objects to this kind of overloading, one could just use the URIs of the DwC classes themselves as values for rdf:type rather than using the dwctype URIs. The RDF there doesn't seem to have any kind of subclassing (but there is the problem of typing PhysicalSpecimen which is not a generic DwC class).

And a new class would have to be created every time a new token was conceived, and new predicates for every relationship of every new token to every other Class they might relate to. Doesn't seem scalable to try to define the whole biodiversity universe in this way, so what is the "sweet spot" solution. Create only what you need to create to solve the specific problem at hand? Don't try to standardize at this level?

Well, upon further reflection, I'm thinking that it isn't necessary to create a new class for every kind of token. I have been mentally equating Darwin Core classes with rdfs:Class'es. They could be the same

John reminded me that I hadn't responded to his last post in the thread about my call for action on my proposal (http://code.google.com/p/darwincore/issues/detail?id=68) for adding DigitalStillImage to the DwC type vocabulary (http://rs.tdwg.org/dwc/terms/type-vocabulary/index.htm) to serve as a controlled value for basisOfRecord (http://rs.tdwg.org/dwc/terms/index.htm#basisOfRecord). After he sent the reply I actually lay in bed awake pondering what it was that disturbed me so much about basisOfRecord. After mulling it over for a few days I have concluded that the problem is that basisOfRecord and the DwC type vocabulary are overloaded too much. I will try to briefly state why I think this and then say what I think the implications are for my DigitalStillImage proposal. Apologies in advance to Bob for sloppy use of technical terms.

THE PURPOSES

It seems like there are basically three separate purposes that basisOfRecord and the dwctype vocabulary can serve/do serve/are intended to serve:

1. To define the types of resources, e.g. to serve as an object for the predicate rdf:type. Semantic reasoning based on assigning an rdf:type property to a resource would indicate that the resource was an instance of an rdfs:Class (see http://www.w3.org/TR/rdf-schema/#ch_type).

2. To serve as a property of an instance of dwc:Occurrence to allow a user to evaluate its fitness of use, e.g. an Occurrence having basisOfRecord=PreservedSpecimen is an Occurrence having supporting evidence (or a "token") that is a preserved specimen.

3. To formally define the ontological relationship among DwC classes through the assignment of an rdfs:subClassOf property to a member of the DwC type vocabulary (see http://www.w3.org/TR/rdf-schema/#ch_subclassof).
As noted below, the current term definitions state that dwctype:PreservedSpecimen is a subclass of dwctype:Occurrence which is a subclass of dcmitype:Event.

I think that at the time of the adoption of Darwin Core as a standard, this overloading may have made sense, but now that people are considering using Darwin Core terms in RDF, this overloading is a hindrance, not a help. If I state that a resource has rdf:type=dwctype:PreservedSpecimen in an attempt to describe what the resource is, I end up causing unintended inferences to be drawn by semantic reasoners. If I state that a resource has dwc:basisOfRecord=dwctype:PreservedSpecimen, it is not clear whether I'm talking about the Occurrence or the physical specimen itself.

SUGGESTIONS

I would like to suggest that the way forward here is to separate these three functions, at least for the present. It seems clear from the recent discussion on the email list that the kind of subclassing that currently exists (#3 above) does not represent the way that many people in the Darwin Core constituency are looking at PreservedSpecimens, Occurrences, and Events. I would recommend removing the rdfs:subClassOf properties from all of the terms in the Darwin Core vocabulary until more substantial work is completed (with community consensus) on the TDWG ontology.

To facilitate purpose #1, I would also recommend that all current Darwin Core classes be represented in the dwctype vocabulary. As I noted, the dwc:Identification class and some others are missing. Based on what John said below, I guess this was intentional, but if dwctype was intended for use #1, then all of the classes should be there. I actually believe that it would be beneficial to have a PhysicalSpecimen class and to move some Occurrence terms to it (similar issues with Taxon class), but that is a hornet's nest that I don't want to kick at the moment. But anyway it would be relatively straightforward to just include all of the existing classes as dwctypes.
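To make the "unintended inferences" point concrete, here is a minimal sketch in Python; it is not from the original post. It hand-rolls the rdfs:subClassOf entailment (the rule that an instance of a subclass is also an instance of every superclass) over the subclass chain described above. The resource name `ex:specimen123` and the function name `inferred_types` are hypothetical, and prefixed names are treated as plain strings rather than resolved URIs.

```python
# Subclass assertions as described in the post, written as
# child -> parent (prefixed names are used as plain strings).
SUBCLASS_OF = {
    "dwctype:PreservedSpecimen": "dwctype:Occurrence",
    "dwctype:Occurrence": "dcmitype:Event",
}


def inferred_types(asserted_type, subclass_of=SUBCLASS_OF):
    """Return every class that rdfs:subClassOf entailment would assign
    to a resource whose only asserted rdf:type is `asserted_type`."""
    types = [asserted_type]
    seen = {asserted_type}
    current = asserted_type
    while current in subclass_of:
        current = subclass_of[current]
        if current in seen:  # guard against an accidental cycle
            break
        types.append(current)
        seen.add(current)
    return types


# Hypothetical resource: a physical specimen typed only as PreservedSpecimen.
specimen_types = inferred_types("dwctype:PreservedSpecimen")
print("ex:specimen123 rdf:type ->", specimen_types)
```

Under these assumptions, typing the physical specimen as dwctype:PreservedSpecimen also makes it a dwctype:Occurrence and a dcmitype:Event, which is the kind of conclusion the post argues many people do not intend.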
Purpose #2 is the one that seems the most problematic to me. It seems to me that if we want to refine Occurrence (as seems to be the purpose of making PreservedSpecimen, HumanObservation, etc. subclasses of Occurrence), then it would be better to create separate terms for those refined Occurrences to be used as the object of basisOfRecord rather than using the dwctypes. Thus we would have something like basisOfRecord="OccurrenceWithEvidencePreservedSpecimen", etc. rather than basisOfRecord="PreservedSpecimen". It would then be clear that the subject is the Occurrence, not the evidence. With this approach, there would be no problem with making the statement [OccurrenceWithEvidencePreservedSpecimen] rdfs:subClassOf [dwctype:Occurrence] in contrast with saying [PreservedSpecimen] rdfs:subClassOf [dwctype:Occurrence]. I'm not saying that this is a good thing to do - in fact I don't like it at all based on the philosophy that Roger laid out in http://wiki.tdwg.org/twiki/bin/view/TAG/SubclassOrNot. But it is the approach that would get us out of making statements that don't make sense due to our lack of separation between the desire to refine Occurrence and to define rdf:type's.

ABOUT THE DigitalStillImage PROPOSAL

Having said that, it seems like DigitalStillImage is not needed to serve purpose #1. For that we already have http://purl.org/dc/dcmitype/StillImage . However, if people want to do #2, then DigitalStillImage should be there as a value for basisOfRecord. As I said in the paragraph above, "OccurrenceWithEvidenceDigitalStillImage" would probably be better than "DigitalStillImage" for this purpose. But I don't really care anymore because I have no intention of doing #2 since I would prefer not to subclass Occurrence. Instead I would create a new term (desperately needed, I think) that is a property of Occurrence and call it "hasEvidence", "hasToken", "evidenceID", or "tokenID", and that connects an Occurrence to an evidence URI.
I would then type the evidence using rdf:type. But from what John says, it seems that there may be a lot of people who want to subclass Occurrence and for their benefit perhaps DigitalStillImage should be there. From my point of view, I would be happy with either letting the proposal go up for a vote as is and let the TAG accept it or shoot it down, or to "freeze" the proposal (something John at some point hinted might be possible) until some future time when the ontology and RDF development have reached a point where there is a consensus view (expressed in the DwC standard itself, not just in the email list) about what an Occurrence is and what its properties (including basisOfRecord) should be and mean. I just think that I'm ready to be done with worrying about it.

Steve

P.S. I'm going to stick in a few inline comments below as appropriate. Especially take note of the comment regarding the creation of new classes and typing.

John Wieczorek wrote:

thing, but wouldn't have to be. What is important here is that we have a way to meet the (somewhat vague) requirement in the GUID Applicability statement (http://www.tdwg.org/stdtrack/article/download/150/51) that resources in the biodiversity informatics world should be typed using a well-known vocabulary. If there are already well-known classes or types for the thing we need to describe (like StillImage in Dublin Core or Person in FOAF) then we can use them. The issue comes to a head when other vocabularies don't have terms for the things we need to type, e.g. Individuals and PreservedSpecimens. That's where DwC needs to create either classes or dwctype terms (unencumbered with subClass properties!). The problem here is that there needs to be a consensus in the community about appropriate values for rdf:type in RDF provided for GUIDs.
Can a "semantic reasoner" program know that http://rs.tdwg.org/dwc/dwctype/Occurrence is the same kind of thing as http://rs.tdwg.org/dwc/terms/Occurrence and as the TDWG ontology URI for Occurrence (which I can't remember at the moment)? I think not unless we assert some kind of sameAs relationship, which seems dangerous to me. The GUID applicability statement specifically mentions the TDWG ontology, but I think the GUID implementation is happening too fast to wait for all of the required "cat herding" that would be needed before a TDWG Ontology is done enough to serve this purpose. I think dwctype is the best answer for things that aren't already typed in another vocabulary. But I'm not going to use it while the subclass problems are still there.
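The alternative floated earlier in this message (a property that links an Occurrence to its evidence, with the evidence typed separately via rdf:type) could look roughly like the following. This is only a sketch under stated assumptions: `ex:hasEvidence` uses one of the candidate names mentioned above and is not an existing Darwin Core term, every `ex:` resource is made up for illustration, and the dwctype terms are assumed to carry no rdfs:subClassOf properties.

```python
# Triples as (subject, predicate, object) tuples.
# "ex:hasEvidence" is a hypothetical property, not a Darwin Core term;
# the ex: resources are invented for illustration only.
TRIPLES = [
    ("ex:occ1", "rdf:type", "dwctype:Occurrence"),
    ("ex:occ1", "ex:hasEvidence", "ex:specimen123"),
    ("ex:occ1", "ex:hasEvidence", "ex:image456"),
    ("ex:specimen123", "rdf:type", "dwctype:PreservedSpecimen"),
    ("ex:image456", "rdf:type", "dcmitype:StillImage"),
]


def objects(triples, subject, predicate):
    """Return all objects of triples matching the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def evidence_types(triples, occurrence):
    """Map each evidence resource linked to `occurrence` to its rdf:type values."""
    return {
        evidence: objects(triples, evidence, "rdf:type")
        for evidence in objects(triples, occurrence, "ex:hasEvidence")
    }


evidence = evidence_types(TRIPLES, "ex:occ1")
for uri, types in evidence.items():
    print(uri, "->", types)
```

In this arrangement the Occurrence keeps a single type, and a consumer judging fitness for use looks at the types of the linked evidence instead of at a refined Occurrence subclass or an overloaded basisOfRecord value.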

...

By the way, we don't have one DwCType for every DwC (or borrowed Dublin Core) Class as you said. There are no types for Identification, Event, and GeologicalContext

There is a type for Event (http://rs.tdwg.org/dwc/terms/type-vocabulary/index.htm#Event), but not for the other two. When I said "We have one DwCType for every DwC (or borrowed Dublin Core) Class" I should have added "for which it makes sense to have a record."

As I implied above, I think it makes sense to have a dwctype for an instance of any DwC class for which one might reasonably create a separate GUID (which is probably all of the classes).

...


8 comments

4 participants

participants (4)

Blum, Stan
Hilmar Lapp
John Wieczorek
Steve Baskauf
