Yep, that reference point for aggregation can be really powerful. As a working example of how these identifiers would work, and how they can aggregate data elements, consider the following:
IndividualID = JohnDeck
MaterialSampleID = JohnDeckTissueSample1
OccurrenceID = JohnDeckOccurrence123
Taxon = "Homo sapiens"

IndividualID = JohnDeck
MaterialSampleID = JohnDeckGutSample1
OccurrenceID = JohnDeckOccurrence124
Taxon = "Bacteria500"

IndividualID = JohnDeck
MaterialSampleID = JohnDeckGutSample1
OccurrenceID = JohnDeckOccurrence125
Taxon = "Bacteria501"

JohnDeckTissueSample1 is representative of the Individual itself, while JohnDeckGutSample1 is still associated with the same Individual, but notice that the taxon has changed and that it is a new Occurrence as well.  This approach lets a flat file still make sense, if desired.  Giving OccurrenceIDs 124 and 125 a MaterialSample BoR provides further context.  Meanwhile, consider the implications for, say, habitat descriptions: for JohnDeckOccurrence123 maybe I'd put http://purl.obolibrary.org/obo/ENVO_01000193 ("temperate grassland biome"), but the distinct occurrence records for the gut samples could be listed as http://purl.obolibrary.org/obo/ENVO_01000162 ("organ").
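
To make the aggregation concrete, here is a minimal Python sketch that groups the three flat records above by IndividualID and by MaterialSampleID. The record values are the ones from the example; the `group_by` helper and the dict layout are just my assumptions for illustration, not part of any DwC tooling.

```python
from collections import defaultdict

# The three flat records from the example above, one dict per row.
records = [
    {"IndividualID": "JohnDeck", "MaterialSampleID": "JohnDeckTissueSample1",
     "OccurrenceID": "JohnDeckOccurrence123", "Taxon": "Homo sapiens"},
    {"IndividualID": "JohnDeck", "MaterialSampleID": "JohnDeckGutSample1",
     "OccurrenceID": "JohnDeckOccurrence124", "Taxon": "Bacteria500"},
    {"IndividualID": "JohnDeck", "MaterialSampleID": "JohnDeckGutSample1",
     "OccurrenceID": "JohnDeckOccurrence125", "Taxon": "Bacteria501"},
]


def group_by(rows, key):
    """Hypothetical helper: collect rows that share the same value for `key`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return dict(groups)


# All occurrences tied to one individual, regardless of taxon.
by_individual = group_by(records, "IndividualID")

# Occurrences that came out of the same physical sample.
by_sample = group_by(records, "MaterialSampleID")

for sample_id, rows in by_sample.items():
    print(sample_id, [row["Taxon"] for row in rows])
```

The same identifier column gives two different views: everything known about the individual, and everything found in one sample.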

Another use for the identifier MaterialSampleID: let's assume we've expressed an equivalent identifier for a GenBank sample using MIxS:source_mat_id, a term that references the same OBI:MaterialSample we're referencing.  If the identifiers are URIs, we can model this in RDF using the MaterialSampleIDs as either subjects or objects. This gets us a step closer to representing contextual information in GenBank and DwC without duplicating metadata across systems (GenBank for sequencing metadata; DwC for environmental context).
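
As a rough sketch of that idea, the snippet below treats RDF statements as plain (subject, predicate, object) tuples in Python. All the example.org URIs are made-up placeholders, the GenBank record name is hypothetical, and the prefixed predicate names are written as simple strings, so take this as an illustration of the shape of the graph rather than a real serialization.

```python
# Placeholder URIs (assumptions for illustration only).
SAMPLE = "http://example.org/sample/JohnDeckGutSample1"
DWC_OCCURRENCE = "http://example.org/occurrence/JohnDeckOccurrence124"
GENBANK_RECORD = "http://example.org/genbank/record1"  # hypothetical record

# Each statement is a (subject, predicate, object) tuple.
triples = [
    # The sample URI as an object: both systems point at the same sample.
    (DWC_OCCURRENCE, "dwc:materialSampleID", SAMPLE),
    (GENBANK_RECORD, "mixs:source_mat_id", SAMPLE),
    # The sample URI as a subject: a statement about the sample itself.
    (SAMPLE, "rdf:type", "obi:MaterialSample"),
]


def statements_about(uri, graph):
    """Return every triple in which `uri` is the subject or the object."""
    return [t for t in graph if uri in (t[0], t[2])]


def records_pointing_at(uri, graph):
    """Return the subjects of all triples whose object is `uri`."""
    return sorted(s for s, _, o in graph if o == uri)


print(statements_about(SAMPLE, triples))
print(records_pointing_at(SAMPLE, triples))
```

Because both records share the one sample URI, neither system has to copy the other's metadata to be linked.
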
There are some issues with this approach, of course. For example, if we provide a lat/lng for an occurrence that is a gut sample, are we giving the lat/lng where the gut sample was removed from the organism (which may differ from where the parent organism was isolated from nature)?  In this case, we need to assume that we're referring to where the parent organism was isolated from nature, to be consistent with DwC and the implementations in use.  However, the notion of habitat should vary with the occurrence of the actual organism (e.g. "organ" vs. "temperate grassland biome").  Thus, we can still aggregate properties around MaterialSample BoRs in useful ways, but we need to think carefully about what exactly the properties we assign to these things mean. This is no different from the issues we've encountered with other BoRs (Fossil, PreservedSpecimen, or Human/MachineObservation).

John


On Thu, May 30, 2013 at 11:48 PM, Richard Pyle <deepreef@bishopmuseum.org> wrote:

Yes, that’s a fair point!  In a sense, the ID has intrinsic value on its own if for no other reason than to represent a reference point for aggregation.

Nevertheless, I still maintain that if it fulfills that purpose, then it implies a “thing” (around which other “things” are aggregated), and I can’t imagine such a “thing” that we would care about for aggregating purposes, about which we would not associate other property values.

I say all this quite deliberately in reference to “dwc:individualID”, of course…. :-)

Aloha,

Rich

From: Markus Döring [mailto:m.doering@mac.com]
Sent: Thursday, May 30, 2013 7:56 PM
To: Jason Holmberg
Cc: Richard Pyle; TDWG Content Mailing List; Robert Whitton; John Deck; Ramona Walls
Subject: Re: [tdwg-content] New Darwin Core terms proposed relating to material samples

The id value is actually very useful and the only trustworthy way of grouping records, e.g. all occurrences of the same whale.

Markus 


_______________________________________________
tdwg-content mailing list
tdwg-content@lists.tdwg.org
http://lists.tdwg.org/mailman/listinfo/tdwg-content




--
John Deck
(541) 321-0689