Hi Ann,
Thanks for the clear example and explanation. I agree that my attempt to solve the problem of propagating data sets (each adding record identifiers) by referring to the "original" record runs into trouble in this "custody case". I think the best identifiers to use in institutionCode, collectionID, collectionCode, and catalogNumber would be those of the repository institution: the one in which the specimens are actually curated, which you describe as having custody. If I take the references to "original" out of the definitions, then the choice of what information to put there will be at the discretion of the data publisher. That's probably fine.
Specifically, I propose the following new definitions:
institutionCode: "The name (or acronym) in use by the institution having custody of the object(s) or information referred to in the record."
collectionCode: "The name (or acronym) identifying the collection or data set from which the record was derived."
collectionID: "A unique identifier for the original collection or dataset from which the record was derived. For physical specimens, the recommended best practice is to use the identifier in a collections registry such as the Biodiversity Collections Index (http://www.biodiversitycollectionsindex.org/)."
datasetID: "An identifier for the data set. May be a global unique identifier or an identifier specific to a collection or institution." (unchanged)
catalogNumber: An identifier (preferably unique) for the record within the data set or collection.
There exist many relationships that institutions can have with a specimen (or data set), and I don't think it is necessary, or even appropriate, to support all of these possibilities (including ownership, which can be very controversial) in the primary data served via Simple Darwin Core. The mechanism exists to relate records between data sets explicitly using the ResourceRelationship class as well as through the otherCatalogNumbers and associatedOccurrences terms. I posit that these are sufficient to cover the cases presented thus far.
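To make the proposal concrete, here is a minimal Python sketch of how a repository might fill in these terms for the custody case. Only the SUM:Mammal:1234 values come from Ann's example; the NPS number and the helper name occurrence_triplet are hypothetical, and the dictionary is just an illustration, not a prescribed serialization.

```python
# Illustrative Simple Darwin Core record as the repository (SUM) might publish it.
# "SUM", "Mammal" and "1234" come from the example in this thread;
# the NPS number below is a made-up placeholder.
repository_record = {
    "institutionCode": "SUM",            # institution having custody
    "collectionCode": "Mammal",
    "catalogNumber": "1234",
    "otherCatalogNumbers": "NPS-0000",   # hypothetical cross-reference to the NPS record
}


def occurrence_triplet(record):
    """Join the three identifying terms into an institution:collection:catalog string."""
    keys = ("institutionCode", "collectionCode", "catalogNumber")
    return ":".join(record[key] for key in keys)


print(occurrence_triplet(repository_record))
```

The point of the sketch is only that the custodian's identifiers occupy the primary terms, while the other institution's number is carried as a cross-reference.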
John
On Mon, Aug 17, 2009 at 6:40 AM, Ann_Hitchcock@nps.gov wrote:
John et al.:
I am responding regarding the definitions outlined in your note below pertaining to institutionCode, collectionCode, collectionID, datasetID, and catalogNumber.
The definitions that you provided work from the standpoint of the originating institution, but may not work well from the standpoint of a repository when the originating institution remains involved in the management of the specimens and records while the specimens may be on loan to the repository. I can best describe the problem by giving a typical example from the National Park Service.
Example: NPS permits the collection of specimens in a park. The specimens remain federal property and are cataloged in the NPS system. The specimens go on loan to the State University Museum, which has agreed to serve as the repository. The State University Museum (SUM) creates a new catalog record for the specimen using its catalog system, while cross-referencing the NPS catalog number. The State University Museum periodically reports to NPS on research use of the specimen, third party loans (which NPS authorizes the State University Museum to make), annotations, condition, etc. NPS updates its catalog record (the original record) with this information. As a condition of the permit, the permitted researcher submits copies of his/her field notes, maps, photos (a.k.a. associated resource management records) to NPS and conveys the copyright. NPS catalogs the resource management records in its archival system. NPS provides copies of the field notes to the State University Museum for use in managing the specimens and to provide researcher access to the notes.
If this example follows the definitions outlined below, the NPS, as the institution administering the original record (and specimens), would be identified in the InstitutionCode, CollectionCode, and CollectionID, while the repository institution could show its information in AssociatedOccurrence (for example, "same as SUM:Mammal:1234") and OtherCatalogNumber fields. Is the repository going to be satisfied? After all, the repository is where researchers go to see the specimens. Is the repository getting adequate recognition for its role?
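As a rough illustration of the scenario just described, the following Python sketch shows the record as NPS might publish it under those definitions. Only the "same as SUM:Mammal:1234" string comes from the example; the NPS codes and numbers, the value placed in otherCatalogNumbers, and the helper name repository_reference are assumptions made for illustration.

```python
# Hypothetical record published by NPS as the administrator of the original record.
nps_record = {
    "institutionCode": "NPS",       # assumed acronym
    "collectionCode": "PARK",       # hypothetical placeholder
    "catalogNumber": "0000",        # hypothetical placeholder
    "associatedOccurrences": "same as SUM:Mammal:1234",  # from the example
    "otherCatalogNumbers": "SUM:Mammal:1234",            # assumed formatting
}


def repository_reference(record):
    """Return the repository's identifiers from a 'same as A:B:C' value, or None."""
    value = record.get("associatedOccurrences", "")
    prefix = "same as "
    if not value.startswith(prefix):
        return None
    parts = value[len(prefix):].split(":", 2)
    if len(parts) != 3:
        return None
    institution, collection, catalog = parts
    return {
        "institutionCode": institution,
        "collectionCode": collection,
        "catalogNumber": catalog,
    }
```

In this arrangement the repository appears only inside secondary fields, which is the recognition concern raised above.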
An alternative, as previously proposed, could provide for parallel data sets for the owning institution and institution of custody. See Issues 35 and 36 at http://code.google.com/p/darwincore/issues/list.
Thanks to all for your consideration of this issue.
Ann
Ann Hitchcock
National Park Service
1849 C Street, NW (2301)
Washington, DC 20240-0001
202-354-2271
Fax: 202-371-2422
From: "John R. WIECZOREK" <tuco@berkeley.edu>
Sent by: tdwg-content-bounces@lists.tdwg.org
Date: 08/15/2009 08:14 AM
Please respond to: tuco@berkeley.edu
To: Wouter Addink wouter@eti.uva.nl, Neil Thomson n.thomson@nhm.ac.uk, TDWG Content Mailing List tdwg-content@lists.tdwg.org
cc: Lee Belbin lee@tdwg.org, Gail Kampmeier gkamp@uiuc.edu, "donald.hobern" donald.hobern@csiro.au
Subject: [tdwg-content] NCD and DwC
Hi Wouter and Neil (and others),
I hope both of you are well. I know this may be a busy time (or, better, vacation time), but I hope that you have had a chance to consider recent discussions on tdwg-content about the relationships between DwC, NCD and the TDWG Ontology. In addition to those public discussions, I'm adding a few questions and comments I have had during the progression of the Darwin Core Review. I'm cc'ing those having a clear vested interest in resolution on both sides. I would urge you to look at the relevant tdwg-content commentary as well as my concerns from the messages below so that we can quickly come to a consensus on a joint plan. I say quickly because I am eager that the DwC review not undergo further unnecessary delays.
In case it's a bit much to go through all of the "literature" relevant to the proposal I'm making, and in hopes of facilitating quick solutions, I'll summarize.
- Dublin Core recommends the use of the dcterms rather than their antiquated dc counterparts. Shouldn't NCD follow suit? Specific example: instead of http://purl.org/dc/elements/1.1/source, use http://purl.org/dc/terms/source.
- NCD is using terms from the TDWG ontology, which is to date an unfinished academic exercise without any review. This dependency seems to me to guarantee that NCD will require revision when the ontology is revised. This wouldn't necessarily be required if NCD took the reins and defined terms that aren't already in another standard (the Ontology does not fit into this category) within its own domain. Specifically, abandon http://rs.tdwg.org/ontology/voc/ in favor of http://rs.tdwg.org/ncd/terms/.
- Reword some of the NCD term definitions so that NCD can be used more generally for data sets (data collections), and not just for object collections.
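The first point can be sketched in a few lines of Python. This is only an illustration of the namespace swap, assuming the local name stays the same in both namespaces (as it does for source); the function name to_dcterms is hypothetical.

```python
DC_LEGACY = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"


def to_dcterms(uri):
    """Rewrite a legacy dc element URI into the dcterms namespace; leave other URIs alone."""
    if uri.startswith(DC_LEGACY):
        return DCTERMS + uri[len(DC_LEGACY):]
    return uri


print(to_dcterms("http://purl.org/dc/elements/1.1/source"))
```

The move from http://rs.tdwg.org/ontology/voc/ to http://rs.tdwg.org/ncd/terms/ is not a simple prefix swap, since the local names would have to be decided by NCD, so it is deliberately left out of the sketch.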
With these commitments, DwC could safely move forward reusing NCD terms. Without the last two, DwC will have to redefine terms such as collectionID.
Following are relevant message excerpts from previous tdwg-content postings:
from: John R. WIECZOREK tuco@berkeley.edu
reply-to: tuco@berkeley.edu
to: TDWG Content Mailing List tdwg-content@lists.tdwg.org
date: Thu, Jul 23, 2009 at 6:20 PM
subject: Darwin Core Collection-related terms
I have taken the content of the Darwin Core Issues 32 and 33 to post here as they both require discussion before an unambiguous recommendation can be made.
From http://code.google.com/p/darwincore/issues/detail?id=32
Reported by ren...@cria.org.br
Term Name: collectionID
Recommendation: Reuse the term which is already defined in NCD (on the other hand, the NCD term defined in the corresponding RDF file should probably not be restricted to a specific domain).
Submitter: Renato De Giovanni
Comment 1 by gtuco.btuco
This is indeed intended to be the same term. Can you provide the URI to the term in NCD?
Status: Accepted
Labels: Milestone-Release1.0 Priority-Critical
Comment 2 by ren...@cria.org.br
Currently the URI is:
http://rs.tdwg.org/ontology/voc/Collection#collectionId
But I think that relationship terms like this one should probably not be bound to a domain since they can be used by objects from many different classes. I'm not sure if it's possible to change NCD and if the NCD creators would agree with this change. Perhaps a better URI for this term would be:
http://rs.tdwg.org/ontology/voc/collectionId
From http://code.google.com/p/darwincore/issues/detail?id=33
Reported by ren...@cria.org.br
Term Name: collectionCode
Recommendation: Reuse the existing term from NCD, but I would probably also suggest changing the NCD term from http://rs.tdwg.org/ontology/voc/Collection#acronymOrCoden to http://rs.tdwg.org/ontology/voc/collectionCode (without a domain). It would be nice to know Markus' or Roger's opinion about this, since they participated in the NCD group.
Submitter: Renato De Giovanni
from: Tim Robertson trobertson@gbif.org
to: tuco@berkeley.edu
cc: TDWG Content Mailing List tdwg-content@lists.tdwg.org
date: Fri, Jul 24, 2009 at 1:12 AM
subject: Re: [tdwg-content] Darwin Core Collection-related terms
Hi John, Renato
Thinking aloud, some possible options I see might be:
a) omit it from the DwC terms altogether
b) reuse the existing URI if the NCD term domain was derestricted
c) keep a duplicate term in the DwC NS
d) ? keep a duplicate term in the DwC NS and add some kind of "is equivalent of" to the NCD acronymOrCoden
e) ? keep a duplicate term in the DwC NS and have NCD acronymOrCoden do some "refinement" of dwc:collectionCode
My preference is for c), or if possible e), for clear boundaries of DwC and also for maintainability reasons.
To me, DwC fits nicely as a set of commonly used terms which are unrestricted to domain classes, and extend the terms offered by the DublinCore Metadata Terms. Using these terms we can assemble models/schemas etc. To say DwC now also includes terms from other namespaces (which are currently restricted to domains), I think might become more difficult to grasp and maintain. I also wonder if going down the route of b) or d) for one term could open the floodgates for a lot of other terms (http://rs.tdwg.org/dwc/terms/index.htm#genus -> http://rs.tdwg.org/ontology/voc/TaxonName#genusPart) and effectively move towards being an "index of data and object properties in the TDWG ontology".
Just some thoughts,
Tim
from: renato@cria.org.br
to: TDWG Content Mailing List tdwg-content@lists.tdwg.org
date: Fri, Jul 24, 2009 at 6:52 AM
subject: Re: [tdwg-content] Darwin Core Collection-related terms
Hi Tim,
Nice summary. My preference is for b. Considering that NCD follows the same principles of this new DarwinCore version, I see no reason for duplicating the same term. No matter how much we try to keep boundaries clear between standards, there will always be some kind of semantic overlap between them. Having the same terms defined under different namespaces can be very confusing for users. I think TDWG should try to make things as reusable as possible.
To be more specific, I would suggest the following changes to NCD:
- Remove the domain from collectionId and institutionId and rename them to "Id" so that the URIs become:
http://rs.tdwg.org/ontology/voc/Collection#Id
http://rs.tdwg.org/ontology/voc/Institution#Id
- Remove the domain from #acronymOrCoden (Collection) and rename it to "Code" so that the URI becomes:
http://rs.tdwg.org/ontology/voc/Collection#Code
- Add a Code property in Institution (without a domain) making it:
http://rs.tdwg.org/ontology/voc/Institution#Code
Then DarwinCore or any other standard can easily reuse these terms.
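The suggested changes can be summarized as a small lookup table. The Python sketch below is only an illustration: the current URIs for collectionId and acronymOrCoden are the ones quoted earlier in this thread, while the current URI for institutionId is an assumption, and the names PROPOSED_RENAMES, NEW_TERMS and rename_term are hypothetical.

```python
VOC = "http://rs.tdwg.org/ontology/voc/"

# Current NCD term URI -> URI after the suggested rename.
PROPOSED_RENAMES = {
    VOC + "Collection#collectionId": VOC + "Collection#Id",
    VOC + "Collection#acronymOrCoden": VOC + "Collection#Code",
    # Assumed current URI: the thread names institutionId but does not give its URI.
    VOC + "Institution#institutionId": VOC + "Institution#Id",
}

# Term that would be newly added rather than renamed.
NEW_TERMS = [VOC + "Institution#Code"]


def rename_term(uri):
    """Return the suggested replacement URI, or the input unchanged if none applies."""
    return PROPOSED_RENAMES.get(uri, uri)


print(rename_term(VOC + "Collection#acronymOrCoden"))
```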
Depending on how this gets solved, yes, I think we should open the floodgates...
Best Regards,
Renato
from: John R. WIECZOREK tuco@berkeley.edu
reply-to: tuco@berkeley.edu
to: Lynn Kutner Lynn_Kutner@natureserve.org
cc: TDWG Content Mailing List tdwg-content@lists.tdwg.org
date: Fri, Jul 31, 2009 at 12:57 PM
subject: Re: [tdwg-content] InstitutionCode Issue - ownership vs. custodianship
The codes are now meant for any data set (a collection of data), not just collections of objects. They were actually always meant to be that way, but the descriptions had their origins in the specimen collections realm. Specifically, the following terms can all be used to identify where data are coming from originally:
institutionCode
collectionCode
collectionID
datasetID
while the Dublin Core terms dc:rights and dc:rightsHolder can be used to describe the original or other vested interests.
To be clearer about what I meant about collection-related terms, I would propose changing the descriptions as follows:
institutionCode: "The name (or acronym) in use by the institution administering the original record."
collectionCode: "The name (or acronym) identifying the original collection or data set from which the record was derived."
collectionID: "A unique identifier for the original collection or dataset from which the record was derived. Recommended best practice is to use the identifier in a collections registry such as the Biodiversity Collections Index (http://www.biodiversitycollectionsindex.org/)."
datasetID: "An identifier for the data set. May be a global unique identifier or an identifier specific to a collection or institution." (unchanged)
catalogNumber: An identifier (preferably unique) for the record within the data set or collection.
_______________________________________________
tdwg-content mailing list
tdwg-content@lists.tdwg.org
http://lists.tdwg.org/mailman/listinfo/tdwg-content