[tdwg-content] Fwd: DWC occurrence identifier fields - confused

Tue Aug 18 19:54:32 CEST 2009

Forwarded, as agreed. The following exchange went offline temporarily
- just bringing it back to show outcomes. Thanks, Lynn.

---------- Forwarded message ----------
From: Lynn Kutner <Lynn_Kutner at natureserve.org>
Date: Tue, Aug 18, 2009 at 10:38 AM
Subject: RE: DWC occurrence identifier fields - confused
To: "tuco at berkeley.edu" <tuco at berkeley.edu>

John -

Thanks for the further comments / explanation.

I know there is a lot of push-pull among what various people &
organizations want. I personally like the elegant simplicity of DWC for
data sharing / discovery and it seems there are other standards (like
ABCD?) that can take care of the "everything and the kitchen sink"
approach. In addition, each organization will likely have their own
internal data models that have been designed to meet their unique data
management needs.

No problem copying this to tdwg-content.

Lynn

-----Original Message-----
From: gtuco.btuco at gmail.com [mailto:gtuco.btuco at gmail.com] On Behalf
Of John R. WIECZOREK
Sent: Tuesday, August 18, 2009 11:33 AM
To: Lynn Kutner
Subject: Re: DWC occurrence identifier fields - confused

Comments inline below. Do you mind if I copy this to tdwg-content?

On Mon, Aug 17, 2009 at 12:49 PM, Lynn
Kutner<Lynn_Kutner at natureserve.org> wrote:
> John -
>
> Thanks for the clarification / additional definitions.
>
> Now I wonder if all those ID fields are really needed in Darwin Core?
>
> Yes for internal data management needs, but I think of the DWC as being for data sharing through portals such as GBIF. For data discovery through a portal, is the level of detail of the original field collection "recordNumber" really as relevant?

Yes, definitely. This may be the only way that duplicates can be identified.
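
A minimal sketch of that kind of duplicate detection, under assumptions
not stated in this thread: records are plain dictionaries keyed by Darwin
Core term names, and the recordNumber is paired with the recordedBy term
(the collector) so that two collectors who both used number 4521 are not
confused. The function name and the sample records are hypothetical.

```python
from collections import defaultdict


def find_possible_duplicates(records):
    """Group occurrence records that share a collector and recordNumber.

    Returns only the groups with more than one record, i.e. candidate
    duplicates held under different catalog numbers or institutions.
    """
    groups = defaultdict(list)
    for rec in records:
        # Normalise lightly; real data would need far more careful cleaning.
        collector = (rec.get("recordedBy") or "").strip().lower()
        number = (rec.get("recordNumber") or "").strip().lower()
        if collector and number:
            groups[(collector, number)].append(rec)
    return {key: recs for key, recs in groups.items() if len(recs) > 1}


# Hypothetical sample data: the first two records are the same field
# collection accessioned at two different institutions.
records = [
    {"institutionCode": "AAA", "catalogNumber": "1001",
     "recordedBy": "J. Doe", "recordNumber": "4521"},
    {"institutionCode": "BBB", "catalogNumber": "77-b",
     "recordedBy": "J. Doe", "recordNumber": "4521"},
    {"institutionCode": "AAA", "catalogNumber": "1002",
     "recordedBy": "J. Doe", "recordNumber": "4522"},
]

duplicates = find_possible_duplicates(records)
for (collector, number), recs in duplicates.items():
    holders = [r["institutionCode"] for r in recs]
    print(collector, number, holders)
```

The catalog numbers differ between institutions, so without the
recordNumber there would be nothing in these records to tie them together.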

> Maybe I'm oversimplifying, but if I want to get data from institution X then I think primarily of the primary record ID that they are using in their database. That will enable them to link to / select the correct records. If I want / need additional levels of detail (such as any previous ID values) I'd be inclined to follow up with them directly for that data.

You might, but the person writing software to try to link things
together automatically would be out of a job.

> We share such a teeny amount of our data attributes through GBIF that we see it as a data discovery mechanism and not for data management. People definitely contact us directly when they need more detailed information.

Agreed. DwC used to be something like a least common denominator. Now
it is becoming more useful than that. Every term in the proposed
standard has been requested by at least two independent groups, or has
been in active use already.

> Thanks -
> Lynn

Thank you! Good to talk these things through. I've added some of the
explanatory notes to the Occurrence commentaries on the Darwin Core
wiki at http://code.google.com/p/darwincore/wiki/Occurrence.

> -----Original Message-----
> From: gtuco.btuco at gmail.com [mailto:gtuco.btuco at gmail.com] On Behalf Of John R. WIECZOREK
> Sent: Thursday, August 13, 2009 8:46 AM
> To: Lynn Kutner
> Subject: Re: DWC occurrence identifier fields - confused
>
> Hi Lynn,
>
> It's definitely not a dumb question. If you have it, so will others.
> I'm wracking my brain on how to have definitions that don't have a domain bias and yet help people to know what the term means within their domain. It's really tough. Happily, we have the extra wiki pages to work with to help explain things further and give more examples.
> Problem is, there is nothing there right now for these terms. Maybe together we can do something to amend the definitions, examples, and wiki explanations to at least clarify things for your domain.
>
> So, let me try to explain a bit further here.
>
> The occurrenceID is supposed to (globally) uniquely identify an occurrence record, whether it is a specimen-based occurrence, a one-time observation of a species at a location, or one of many occurrences of an individual who is being tracked, monitored, or recaptured. Making it globally unique is quite a trick, one for which we don't really have good solutions in place yet, but one which ontologists insist is essential. TDWG and GBIF seem to be pushing LSIDs as the solution for this, but there remains great unrest among experts about whether this is a wise move. Anyway, LSID or not, the term is there and waiting.
>
> catalogNumber is unfortunate for a name, because it suggests a catalog, which suggests a specimen. The definition tries to ameliorate the potential bias by saying that it is a number to identify an occurrence record within a data set or collection. So, it could be a specimen catalog number or it could be a unique identifier for a record within an observation or animal movement data set.
>
> individualID is meant for any records that need to identify individuals for whom there may be more than one record. Banded birds, marine mammal photos allowing individual identification, individual trees resampled over time, periodic biopsies on the same individuals, etc. could all use this term to group the records corresponding to individuals.
>
> recordNumber came directly from the idea of a collector's number - the one that the collector puts on a tag on a specimen in the field and writes in their notes. It isn't the same as the catalogNumber because that is usually only applied once the specimen gets accessioned into a collection. There may be corollaries to this for other disciplines, but I have no experience of them.
>
> Which ones you use is suggested by the rules for Simple Darwin Core (http://rs.tdwg.org/dwc/terms/simple/index.htm#rules), specifically rule 4: "Support (provide data in) as many fields as you can."
>
> I'm happy to get all of this right, so let me know if doubts remain, or if you have suggestions about how to clarify any of this within the definitions or supporting documentation.
>
> Sincerely,
>
> John
>
> On Wed, Aug 12, 2009 at 1:58 PM, Lynn Kutner<Lynn_Kutner at natureserve.org> wrote:
>> Hi John -
>>
>> This may be such an astonishingly dumb question that I didn't want to send it to the list ...
>>
>> When I read through the draft DWC and try to figure out how we'd fit our data to DWC, the fields that currently have me stumped are the occurrence identifiers:
>>
>> occurrenceID    A unique identifier for the Occurrence (as opposed to a particular digital record of the occurrence). In the absence of a persistent global unique identifier, construct one from a combination of identifiers in the record that will most closely make the occurrenceID globally unique. For a specimen, for example, use the form: "urn:catalog:[institutionCode]:[collectionCode]:[catalogNumber]". Examples: 1) "urn:lsid:nhm.ku.edu:Herps:32", 2) "urn:catalog:FMNH:Mammal:145732".
>>
>> catalogNumber   An identifier (preferably unique) for the Occurrence within the data set or collection. Examples: "2008.1334", "145732a", "145732".
>>
>> individualID    An identifier for an individual or named group of individual organisms represented in the Occurrence. Meant to accommodate resampling of the same individual or group for monitoring purposes. May be a global unique identifier or an identifier specific to a data set. Examples: "U.amer. 44", "Smedley", "Orca J 23".
>>
>> recordNumber    An identifier given to the Occurrence at the time it was recorded. Often serves as a link between field notes and an Occurrence record, such as a specimen collector's number.
>>
>>
>> It's really not clear to me how these fields relate to each other, how they differ, and which one(s) we'd use when we put our data on GBIF.
>>
>> Thanks for your help!
>> Lynn
>>
>> Lynn Kutner
>> NatureServe
>> phone: (703) 797-4804
>> email:  lynn_kutner at natureserve.org
>> http://www.natureserve.org/
>>
>>
>
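
The fallback form quoted in the draft occurrenceID definition above,
"urn:catalog:[institutionCode]:[collectionCode]:[catalogNumber]", can be
sketched roughly as follows. This is only an illustration, not an official
Darwin Core tool: the function name build_occurrence_id is hypothetical,
and rejecting empty parts is an assumption of this sketch rather than
something the draft requires.

```python
def build_occurrence_id(institution_code, collection_code, catalog_number):
    """Build a fallback occurrenceID in the 'urn:catalog:' form.

    Intended only for records that lack a persistent global identifier.
    """
    parts = [institution_code, collection_code, catalog_number]
    cleaned = ["" if p is None else str(p).strip() for p in parts]
    if not all(cleaned):
        # Assumption of this sketch: a missing part makes the result too
        # ambiguous to be useful, so refuse rather than emit "urn:catalog::".
        raise ValueError("institutionCode, collectionCode and "
                         "catalogNumber are all required")
    return "urn:catalog:" + ":".join(cleaned)


# Reproduces example 2 from the draft definition.
print(build_occurrence_id("FMNH", "Mammal", "145732"))
# prints urn:catalog:FMNH:Mammal:145732
```

Note that the result is only as unique as the combination of the three
codes; it does nothing to resolve two institutions sharing an
institutionCode.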