clarification of the use of dwc:recordedBy
John et al.,
I have been pondering the difference of the use of the terms dwc:recordedBy and dcterms:creator (http://purl.org/dc/terms/creator). dcterms:creator is defined as: "An entity primarily responsible for making the resource." dwc:recordedBy is defined as: A list (concatenated and separated) of names of people, groups, or organizations responsible for recording the original Occurrence. The primary collector or observer, especially one who applies a personal identifier (recordNumber), should be listed first.
Except for the one word "original" in dwc:recordedBy, I would believe that these two terms would mean the same thing in the case of Occurrence resources. In some cases, such as the collection of a physical specimen or photographing a live organism, I think that they are the same thing. The entity that creates the resource (specimen or image) is the same entity that has recorded the Occurrence. However, in the situation where a specimen is imaged, the resulting image resource would have a dcterms:creator that was the person or institution that did the specimen imaging, while according to the way that I read the definition, dwc:recordedBy for the specimen image would have a value that specified the collector of the specimen (not the photographer).
If I am correct in this interpretation, this distinction would be useful in the case of images because it would allow for a simple mechanism to distinguish between images that directly record the appearance of individual organisms and images that are simply digital representations of some other thing that records the appearance of an individual organism. That is, if dcterms:creator == dwc:recordedBy then the resource was collected directly from an organism, and if dcterms:creator != dwc:recordedBy then the resource might represent some other resource that was collected directly from an organism.
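The comparison proposed above can be sketched in a few lines of Python. This is only an illustration of the idea, not part of Darwin Core: the function name classify_resource, the return labels, and the case-insensitive string comparison are all assumptions made for the example.

```python
from typing import Optional


def classify_resource(creator: Optional[str], recorded_by: Optional[str]) -> str:
    """Apply the proposed rule to one resource's metadata.

    Hypothetical helper: compares the dcterms:creator value with the
    dwc:recordedBy value after trimming whitespace and ignoring case.
    """
    if not creator or not recorded_by:
        # One of the two values is missing, so the rule cannot be applied.
        return "unknown"
    if creator.strip().lower() == recorded_by.strip().lower():
        # Same agent made the resource and recorded the Occurrence.
        return "direct"
    # Different agents: the resource may represent some other resource.
    return "possibly derived"


# A field photo: photographer and recorder are the same person.
print(classify_resource("Joe Taxonomist", "Joe Taxonomist"))
# An image of a specimen: the photographer is not the collector.
print(classify_resource("Fred Photographer", "Joe Taxonomist"))
```

A plain string match is the weakest possible test of "same agent"; comparing identifiers for the people involved would be more robust than comparing names.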
I am not sure how this would apply in situations other than images. For example, if a spider were collected and assigned a persistent identifier, then later for a character documentation project the body parts were separated and considered separate specimens with their own identifiers, would the metadata for a leg specimen resource have dcterms:creator as the person who made the leg prep and dwc:recordedBy be the person who collected the spider?
Basically, I would like to know the intention of the use of the word "original" in the definition of dwc:recordedBy.
Steve Baskauf
Steve,
It sounds like you have a lack of IDs.
You always need at least two IDs to express anything. One is the ID of the thing (the Occurrence); the other is the ID of the metadata for the occurrence (the occurrence record?). You can then say whether the dcterms:creator is describing the metadata (record) or the thing itself (event).
If you have a photo of the thing then you need at least one more ID to show that the dcterms:creator is the person who took the photo, not the person who created the specimen or the person who created the metadata.
In a flat record you can't express all this without getting in a real twist over which field applies to which object. I guess (and this is a guess) that is why dwc:recordedBy exists. It is really the creator of the record (in the sense of the biologist with a pencil) and not the creator of the digital record (as in the system that is publishing this thing on the web).
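As a rough illustration of the point about IDs, here is a minimal Python sketch in which each statement names its own subject. The ex: identifiers, the ex:describes and ex:depicts predicates, and the agent names are all hypothetical; only dcterms:creator and dwc:recordedBy come from the discussion.

```python
from collections import defaultdict

# Each statement is (subject ID, property, value). Because every statement
# names its subject, the same property can be used for several things
# without ambiguity.
TRIPLES = [
    ("ex:occurrence/1", "dwc:recordedBy", "field biologist"),
    ("ex:record/1", "dcterms:creator", "publishing system"),
    ("ex:record/1", "ex:describes", "ex:occurrence/1"),
    ("ex:photo/1", "dcterms:creator", "photographer"),
    ("ex:photo/1", "ex:depicts", "ex:occurrence/1"),
]


def creators_by_subject(triples):
    """Group dcterms:creator values by the ID they are stated about."""
    result = defaultdict(list)
    for subject, prop, value in triples:
        if prop == "dcterms:creator":
            result[subject].append(value)
    return dict(result)


for subject, creators in creators_by_subject(TRIPLES).items():
    print(subject, creators)
```

In a flat record there is only one implicit subject, so the two dcterms:creator values above would have to share a single field.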
As the semantics get more complex one has to give up and go to regular RDF or risk adding more and more fields.
Just my two pence worth. John may have a more practical explanation.
All the best,
Roger
On 5 Dec 2009, at 13:08, Steve Baskauf wrote:
[snip]
-- Steven J. Baskauf, Ph.D., Senior Lecturer Vanderbilt University Dept. of Biological Sciences
postal mail address: VU Station B 351634 Nashville, TN 37235-1634, U.S.A.
delivery address: 2125 Stevenson Center 1161 21st Ave., S. Nashville, TN 37235
office: 2128 Stevenson Center phone: (615) 343-4582, fax: (615) 343-6707 http://bioimages.vanderbilt.edu
tdwg-content mailing list tdwg-content@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-content
Roger,
Thanks for the comments. At this point, I'm thinking about this from the standpoint of GUID (HTTP URI/LSID) resolution rather than flat databases. I am, of course, hampered by my lack of understanding of exactly how the LSID resolution process works. According to the draft LSID applicability statement, where LSIDs are used there would be a single identifier (GUID) for something like a digital image of a specimen. The metadata associated with the image would be the metadata in the RDF file returned in response to a getmetadata() call. The actual digital image would be the data returned in response to a getdata() call. If I'm understanding this correctly, the image and its metadata are not things that would be assigned separate persistent identifiers.
It's not so clear to me how this would work in the case of a generic HTTP URI identifier that was not a proxy for an LSID. The GUID applicability statement makes the distinction between data and metadata in section 4 ("Machine and human clients that retrieve the metadata associated with a GUID will use the associated typing information to decide how to process the metadata and any associated data.") and specifies that the default format for metadata should be RDF. However, as far as I can tell, the document is silent on the mechanism by which access information for the actual data (an actual information resource itself as opposed to its metadata) would be specified to a semantic-web enabled client. The working examples I've seen so far where HTTP URIs are used as persistent identifiers have all been for non-information resources (i.e. no data, only metadata).
I suppose the actual answer from an RDF perspective would be to do something similar to http://biocol.org/collection/rdf/id/35115 and http://lod.geospecies.org/ses/4XSQO.rdf where the metadata for the identified resource and the metadata for the metadata are contained in two separate XML wrappers with separate rdf:about attributes. Then there could be separate dcterms:creator elements within each of the two containers. The value of the rdf:about attribute for the resource would be the LSID or persistent HTTP URI of the object itself, and the value of the rdf:about attribute for the metadata could be the URI of the RDF file. The URI for the RDF file would effectively be a separate (non-GUID) identifier for the metadata, as you suggested must exist.
The question of the meaning of dwc:recordedBy is still open, I guess.
How about this? Contents of file http://myherbarium.org/image/rdf/12345.rdf containing the metadata for an image identified as urn:lsid:myherbarium.org:image:12345 of a specimen identified as urn:lsid:myherbarium.org:vascular:12345
<rdf:RDF>
  <dwc:occurrenceID rdf:about="urn:lsid:myherbarium.org:image:12345">
    <dcterms:creator rdf:resource="http://myherbarium.org/people/fred_photographer/foaf.rdf"/>
    <dcterms:created>2008-09-08T10:22:15-0600</dcterms:created>
    <dcterms:type rdf:resource="http://purl.org/dc/dcmitype/StillImage"/>
    <dwc:recordedBy rdf:resource="http://myherbarium.org/people/joe_taxonomist/foaf.rdf"/>
    <dcterms:description>Image of herbarium specimen urn:lsid:myherbarium.org:vascular:12345</dcterms:description>
    ... more metadata about the image...
  </dwc:occurrenceID>
  <rdf:Description rdf:about="http://myherbarium.org/image/rdf/12345.rdf">
    <dcterms:creator>Joe Shmoe Memorial Herbarium</dcterms:creator>
    <dcterms:created>2008-09-08T12:01:33-0600</dcterms:created>
    <dcterms:modified>2009-12-08T13:32:33-0600</dcterms:modified>
    <dcterms:description>RDF formatted description of the image urn:lsid:myherbarium.org:image:12345</dcterms:description>
    <dcterms:language>en</dcterms:language>
    ... more metadata about the metadata ...
  </rdf:Description>
</rdf:RDF>
Contents of file http://myherbarium.org/vascular/rdf/12345.rdf containing the metadata for a specimen identified as urn:lsid:myherbarium.org:vascular:12345
<rdf:RDF>
  <dwc:occurrenceID rdf:about="urn:lsid:myherbarium.org:vascular:12345">
    <dcterms:creator rdf:resource="http://myherbarium.org/people/joe_taxonomist/foaf.rdf"/>
    <dcterms:created>2008-07-02</dcterms:created>
    <dcterms:type rdf:resource="http://purl.org/dc/dcmitype/PhysicalObject"/>
    <dwc:recordedBy rdf:resource="http://myherbarium.org/people/joe_taxonomist/foaf.rdf"/>
    <dcterms:description>Herbarium specimen of Toxicodendron radicans</dcterms:description>
    <dwc:associatedMedia rdf:resource="http://resolver.org/urn:lsid:myherbarium.org:image:12345"/>
    ... more metadata about the specimen...
  </dwc:occurrenceID>
  <rdf:Description rdf:about="http://myherbarium.org/vascular/rdf/12345.rdf">
    <dcterms:creator>Joe Shmoe Memorial Herbarium</dcterms:creator>
    <dcterms:created>2008-09-08T12:01:30-0600</dcterms:created>
    <dcterms:modified>2009-10-07T09:14:08-0600</dcterms:modified>
    <dcterms:description>RDF formatted description of the specimen urn:lsid:myherbarium.org:vascular:12345</dcterms:description>
    <dcterms:language>en</dcterms:language>
    ... more metadata about the metadata ...
  </rdf:Description>
</rdf:RDF>
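To show how a client might tell the two dcterms:creator values apart, here is a minimal Python sketch that reads a trimmed copy of the image file above with the standard library's xml.etree.ElementTree. Some things here are assumptions rather than part of the proposal: the namespace declarations (added so that the XML parses), the function name creators_from_rdf, and the use of a plain XML parser instead of a real RDF library.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for the prefixes used in the example (added for parsing).
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dcterms": "http://purl.org/dc/terms/",
    "dwc": "http://rs.tdwg.org/dwc/terms/",
}

# Trimmed copy of the image metadata file, with namespaces declared.
RDF_XML = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:dwc="http://rs.tdwg.org/dwc/terms/">
  <dwc:occurrenceID rdf:about="urn:lsid:myherbarium.org:image:12345">
    <dcterms:creator rdf:resource="http://myherbarium.org/people/fred_photographer/foaf.rdf"/>
    <dwc:recordedBy rdf:resource="http://myherbarium.org/people/joe_taxonomist/foaf.rdf"/>
  </dwc:occurrenceID>
  <rdf:Description rdf:about="http://myherbarium.org/image/rdf/12345.rdf">
    <dcterms:creator>Joe Shmoe Memorial Herbarium</dcterms:creator>
  </rdf:Description>
</rdf:RDF>"""

RDF_ABOUT = "{%s}about" % NS["rdf"]
RDF_RESOURCE = "{%s}resource" % NS["rdf"]


def creators_from_rdf(xml_text):
    """Map each rdf:about subject to the dcterms:creator values inside it."""
    root = ET.fromstring(xml_text)
    creators = {}
    for node in root:
        subject = node.get(RDF_ABOUT)
        values = []
        for element in node.findall("dcterms:creator", NS):
            # A creator is either a link (rdf:resource) or literal text.
            values.append(element.get(RDF_RESOURCE) or (element.text or "").strip())
        creators[subject] = values
    return creators


for subject, values in creators_from_rdf(RDF_XML).items():
    print(subject, "->", values)
```

The image GUID and the URI of the RDF file each get their own creator, which is the separation the two rdf:about wrappers are meant to provide.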
Steve
Hi Steve,
Sorry about the delay. Not answering messages before the ink dries is one of the consequences of field work out of electronic contact. Feels good for the first week or two...
To answer your question by way of an example, we've been actively capturing tucos, taking notes, and photos, and tissue samples (in a behavior study such as this, taking specimens would have deleterious consequences for the overall goal). Some of these tucos are young of the year, some are animals we caught as young in years past. We're keen to publish all of the data we have as well as we can, not presuming to know in advance how they might be useful to someone else in the future.
Wow, that was a complete failure to communicate. A word to the wise: don't try to send a Gmail draft that you finished offline while the offline feature is synchronizing, even if your internet connection in the high Andes makes you desperate to send your message while packets actually seem to be moving. You'll lose the finished message in favor of the one last saved while online. Dang, and I was so careful writing the response that didn't make it through.
Not enough time to reconstruct it now, but I'll try to do it justice for the next trip to town on the 4th or so. Now that I know that connectivity doesn't become usable until after 8pm...
On Tue, Dec 29, 2009 at 5:35 AM, John R. WIECZOREK tuco@berkeley.edu wrote:
[snip]
OK. Connectivity seems to be sufficient this week to send the intended response to Steve's questions...
On Sat, Dec 5, 2009 at 5:08 AM, Steve Baskauf steve.baskauf@vanderbilt.edu wrote:
John et al.,
I have been pondering the difference of the use of the terms dwc:recordedBy and dcterms:creator (http://purl.org/dc/terms/creator). dcterms:creator is defined as: "An entity primarily responsible for making the resource." dwc:recordedBy is defined as: A list (concatenated and separated) of names of people, groups, or organizations responsible for recording the original Occurrence. The primary collector or observer, especially one who applies a personal identifier (recordNumber), should be listed first.
Except for the one word "original" in dwc:recordedBy, I would believe that these two terms would mean the same thing in the case of Occurrence resources. In some cases, such as the collection of a physical specimen or photographing a live organism, I think that they are the same thing. The entity that creates the resource (specimen or image) is the same entity that has recorded the Occurrence. However, in the situation where a specimen is imaged, the resulting image resource would have a dcterms:creator that was the person or institution that did the specimen imaging, while according to the way that I read the definition, dwc:recordedBy for the specimen image would have a value that specified the collector of the specimen (not the photographer). If I am correct in this interpretation, this distinction would be useful in the case of images because it would allow for a simple mechanism to distinguish between images that directly record the appearance of individual organisms and images that are simply digital representations of some other thing that records the appearance of an individual organism. That is, if dcterms:creator == dwc:recordedBy then the resource was collected directly from an organism, and if dcterms:creator != dwc:recordedBy then the resource might represent some other resource that was collected directly from an organism.
Your interpretation is essentially correct, but there are situations in which the dcterms:creator of a specimen might not be the same as the dwc:recordedBy, largely because dcterms:creator is vague enough to be interpreted in many ways. That's partly why recordedBy was maintained from previous incarnations of Darwin Core, where it was "Collector". One example would be a specimen preparator, the person who actually makes the collected individual into a specimen. This isn't necessarily the same as the collector, and Darwin Core has no term to capture that additional specimen information.
I am not sure how this would apply in situations other than images. For example, if a spider were collected and assigned a persistent identifier, then later for a character documentation project the body parts were separated and considered separate specimens with their own identifiers, would the metadata for a leg specimen resource have dcterms:creator as the person who made the leg prep and dwc:recordedBy be the person who collected the spider?
I would say "yes". I would also say that the derivative resources might benefit from having an expanded controlled vocabulary for the basisOfRecord - something like a "PreservedSpecimenPart."
Basically, I would like to know the intention of the use of the word "original" in the definition of dwc:recordedBy.
The intention is to be very specific about who was responsible for getting the information at the source (in the field) rather than anything derivative. For example, it could be the person for whom you might expect to find field notes describing the event from which the Occurrence was recorded.
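The spider leg case can be sketched in Python as follows. Everything specific in it is hypothetical: the function name derive_part_record, the derivedFrom key, the identifiers, and the people's names are made up for illustration, and "PreservedSpecimenPart" is only the value floated above, not an existing basisOfRecord term.

```python
def derive_part_record(parent: dict, part_id: str, preparator: str) -> dict:
    """Build metadata for a part prepared from an existing specimen.

    dwc:recordedBy is copied from the parent, because it names whoever
    recorded the original Occurrence in the field. dcterms:creator names
    whoever made this particular derivative resource.
    """
    return {
        "occurrenceID": part_id,
        "basisOfRecord": "PreservedSpecimenPart",  # hypothetical vocabulary value
        "dcterms:creator": preparator,
        "dwc:recordedBy": parent["dwc:recordedBy"],
        "derivedFrom": parent["occurrenceID"],  # hypothetical linking key
    }


spider = {
    "occurrenceID": "ex:spider/1",
    "basisOfRecord": "PreservedSpecimen",
    "dcterms:creator": "Field Collector",
    "dwc:recordedBy": "Field Collector",
}

leg = derive_part_record(spider, "ex:spider/1/leg/3", "Leg Preparator")
print(leg["dcterms:creator"], "|", leg["dwc:recordedBy"])
```

Under this reading, dwc:recordedBy stays fixed however many derivative resources are made, while dcterms:creator changes with each one.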
Hope that is a satisfactory explanation.
John
participants (3)
- John R. WIECZOREK
- Roger Hyam
- Steve Baskauf