[tdwg-content] tdwg-content Digest, Vol 20, Issue 17
Dean Pentcheff
pentcheff at gmail.com
Fri Nov 5 21:02:04 CET 2010
OK. I think this makes sense to me.
Seeing how my "jar of semi-identified scunge" example plays now:
Setting: A jar full of stuff collected from a coral reef. In it I can
see larval fish, sphaeromatid isopods, one "Diadema antillarum"
urchin, and a green alga.
"Individual"ization:
1. The record for the jarful of scunge is _not_ eligible to be an "Individual".
2. I can create four additional records, each of which is designated
as "partOf" the jar record, each of which can be an "Individual" with
a higher-level or species-level determination attached to it.
Two years from now, I sort and ID the sphaeromatid isopods and
determine that they are indeed all in the Family Sphaeromatidae, and
split them out to three separate jars, each of which is a different
genus. Each of those jars can be referred to as an "Individual" (one
or more objects from a single taxon collected at a single time/place).
Has the older family-level record now lost its "Individual"ness, now
that it's clear it's a composition of three lower-level taxa? Or does
it still record the correct fact that a group of independently
locomoting bugs from one collecting instance all belonged to the same
family, and hence is still an "Individual" record, indicating that
higher-level taxon group of bugs?
[As a side issue, note that I have IDed them to genus, not species, to
sidestep the debate on the specialness of the species taxon.]
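
A minimal sketch of the jar example in Python, assuming a hypothetical
Record type with "taxon" and "part_of" fields (illustrative names, not
Darwin Core terms). It assumes a record is eligible to be an "Individual"
when it carries a single determination at any rank, which is only one
reading of the definition under discussion. The genus names are
placeholders, since none are given here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    """Hypothetical record; field names are illustrative, not Darwin Core terms."""
    label: str
    taxon: Optional[str] = None         # one determination at any rank; None for a mixed lot
    part_of: Optional["Record"] = None  # parent record, e.g. the jar


def is_individual(record: Record) -> bool:
    # Assumed rule: eligible only if the record represents a single taxon.
    return record.taxon is not None


jar = Record("jar of scunge")  # mixed lot, so no single taxon

# "larval fish" and "green alga" stand in for whatever higher-level
# determination is eventually attached to those records.
fish = Record("larval fish", taxon="larval fish", part_of=jar)
isopods = Record("sphaeromatid isopods", taxon="Sphaeromatidae", part_of=jar)
urchin = Record("urchin", taxon="Diadema antillarum", part_of=jar)
alga = Record("green alga", taxon="green alga", part_of=jar)
parts = [fish, isopods, urchin, alga]

# Two years later: three genus-level jars split out of the family-level lot.
genus_jars = [
    Record(f"isopod jar {i}", taxon=f"genus {i} (placeholder)", part_of=isopods)
    for i in (1, 2, 3)
]

for record in [jar, *parts, *genus_jars]:
    print(record.label, is_individual(record))
```

Under that assumed rule the family-level record stays eligible after the
split, because it still carries one determination (Sphaeromatidae).
Whether that is the intended reading is the question above.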
-Dean
--
Dean Pentcheff
pentcheff at gmail.com
dpentche at nhm.org
On Fri, Nov 5, 2010 at 12:29 PM, Steve Baskauf
<steve.baskauf at vanderbilt.edu> wrote:
> For those of you who triage emails and don't read long emails, the bottom
> line is that although I agree with some of Rich's points, I think that the
> suggestion that parts of Individuals should be classified as Individuals
> does not fit the definition that is on the table for the proposed class
> dwc:Individual. I argue that allowing pieces of organisms to be called
> Individuals defeats the purpose of having the Individual class. I suggest
> an alternative approach that I think is the most straightforward method of
> separating tokens from the occurrences they document. The acceptance or
> rejection of Individual as a new class does not hinge on my suggested
> approach. Development of a system to handle more complicated resource
> relationships can take place independently of the proposal for the
> Individual class.
>
> Responses inline below:
>
> Richard Pyle wrote:
>
> ...
>
> previousIdentifications
>
>
> Hmm. I suppose yes, but better to just have
> another instance of Identification. Why not?
>
>
> When the data are structured that way at the source, yes. But a number of
> DwC terms exist because many content sources have not parsed/normalized all
> their data to the full extent of the DwC classes. Therefore, I think
> previousIdentifications should be kept, and if so, it should be part of the
> Individual class.
>
>
> Got it. It should go with Individual.
>
>
>
> associatedSequences
>
>
>
> I suppose you won't agree on this, but I don't see sequences
> as any different than other tokens/evidence types that I
> think we should allow to document Occurrences. I would like
> this term to eventually go away, at least for people using
> RDF who will explicitly create resources for tokens and then
> type them.
>
>
> OK, well I guess "Sequences" per se are functionally equivalent to images,
> in that they are not the organism themselves, but rather a representation of
> some aspect of the organisms (in this case, a representation of the
> molecular structure of the DNA molecules contained within the cells of the
> organism, rather than a representation of light waves reflected off the
> exterior of an organism in the case of an image, or of x-ray waves
> transmitted through an organism in the case of a radiograph image). I was
> thinking more in terms of tissue samples -- which I will much more
> stubbornly defend as being in the Individual class -- but I guess more in
> terms of "individualScope".
>
>
> OK, as usual you are warping my brain into thinking about things in a
> different way. I'm going to separate the issue of "dead" from the issue of
> "pieces" (for the moment I'm going to accept that it doesn't matter if a
> whole organism is dead or not). The advantage of letting pieces of the
> organism be considered as a type of Individual is that it allows us to avoid
> creating another class of things called "PreservedSpecimen" (although in a
> sense we already have it because of dwctype:PreservedSpecimen, which when
> used as a rdf:type would imply membership in some rdfs:Class called
> "PreservedSpecimen"). The pieces could share properties that one might want
> to also apply to the whole organism. One could differentiate between the two
> by the value of "individualScope".
>
> But after another long commute to think about this, I'm realizing that
> pieces of organisms really must not be Individuals. First of all, the
> definition that is under consideration is "The category of information
> pertaining to an individual organism or a group of individual organisms that
> can reliably be known to represent a single taxon." [the Google Code entry,
> with substitution of "taxon" for "species (or lower taxonomic rank if it
> exists)" as was discussed]. That definition as it stands applies to an
> organism or group of organisms, but does not include parts of organisms.
> Obviously the definition could be changed, but if you consider the comment,
> which describes the primary function of Individual: "Instances of this class
> can serve the purpose of connecting one or more instances of the Darwin Core
> class Occurrence to one or more instances of the Darwin Core class
> Identification" it becomes clear that making parts of organisms Individuals
> defeats this primary purpose for the term.
>
> The major selling point for having Individuals at all is to get out of the
> business of applying determinations to all of the pieces of evidence such as
> specimens, images, sounds, etc. that get collected from the same biological
> individual through multiple Occurrences. This has the benefit that if one
> applies an Identification to the Individual, all physical and information
> resources that are derived from the individual automatically get associated
> with the Identification and hence the taxonomic information referenced by
> the Identification. If we call preserved specimens that are pieces of
> an organism Individuals, with a value of individualScope="part", then do we do
> the same thing to them as we do with Individuals at higher levels, namely
> apply Identifications to them? If so, then we are back in the business of
> assigning Identifications to all of our derivative resources rather than the
> biological individuals from which they came. If we just say that we'll skip
> assigning separate Identifications to the derivative resources, then we have
> something that doesn't fit the functional role for which Individual was
> designed. In that case an "Individual" which is an organism part is such a
> different thing that one might as well call it something else (i.e. a
> PreservedSpecimen).
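
The argument in the paragraph above can be sketched in Python. This is
only an illustration under stated assumptions: the Resource type, the
identifiers, and the taxon labels are hypothetical, and storing
Identifications keyed by Individual is one possible layout, not
something the proposal prescribes.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    """Hypothetical token (specimen, image, sound); not a Darwin Core class."""
    resource_id: str
    kind: str
    individual_id: str  # link back to the biological individual


# Assumed layout: Identifications hang off the Individual, never off a token.
identifications = {"ind-1": ["taxon-A (placeholder)"]}

resources = [
    Resource("spec-1", "PreservedSpecimen", "ind-1"),
    Resource("img-1", "StillImage", "ind-1"),
    Resource("snd-1", "SoundRecording", "ind-1"),
]


def identifications_for(resource: Resource) -> list:
    # Every derived resource inherits the Individual's Identifications.
    return identifications.get(resource.individual_id, [])


# A later re-determination is recorded once, on the Individual...
identifications["ind-1"].append("taxon-B (placeholder)")

# ...and every token derived from that Individual sees it.
for resource in resources:
    print(resource.resource_id, identifications_for(resource))
```

If each part were itself an Individual with its own Identifications, the
single append would turn into one update per token, which is the
bookkeeping the class was meant to avoid.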
>
> The case of a whole organism (live as a LivingSpecimen or dead as a
> PreservedSpecimen) is different because in that case we would have a single
> resource serving as the evidence (the whole organism itself). By
> definition, there can't be many of those (there would just be one) and it
> would already have an Identification assigned to it, because it is the same
> Individual that it is providing evidence for. So there is no superfluous
> assignment of Identifications in that case.
>
> Here's one thing I'm not so certain about, though. An in-situ image of an
> organism is clearly a token of an Occurrence, because it is evidence of the
> organism at the place/time. An image of the preserved specimen in a Museum,
> or an x-ray, etc., is not really a token of an Occurrence, because it's not
> evidence of the organism at the place/time of its capture. Same goes for
> Sequences -- they are a token of the Individual organism, not of the
> occurrence of the organism at a place and time. This is why I have a hard
> time thinking of such things as tokens of an Occurrence, when they are
> really more tokens of the Individual.
>
>
> I think the solution to this is to not call it a "token of the Occurrence".
> Let's say that the token is derived from the Individual and that it MAY
> serve as evidence for an Occurrence.
>
> I think that the solution is something like you suggest: link the chain of
> derivation of tokens to the Individual and not to the Occurrence. Then have
> a reference in the Occurrence record to the particular token that was
> created or collected during the event of the Occurrence. See
> http://bioimages.vanderbilt.edu/pages/tree-branch.gif . I have had the
> tendency of thinking that the tokens supported the Occurrence, but there
> does not need to be just one purpose for the token. They also support the
> existence of the Individual. This should probably make you happy, because
> the pieces of the Individual (preserved specimens, tissue samples) would be
> derived from the Individual. The "provenance" if you want to call it that,
> traces the connection of the tokens to the Individual. The chain of
> derivation can be traced using the property that I've called "derivedFrom".
> The branch specimen is "derivedFrom" the Individual and the specimen image
> is "derivedFrom" the specimen. Your desire to differentiate between things
> that are physically derived from the Individual vs. things that aren't can
> be handled by the "isPartOf" property. The branch specimen "isPartOf" the
> Individual tree, but the image is not a part of the branch. A token could
> have both the isPartOf property and the derivedFrom property (if it's a
> piece of the Individual), or only the derivedFrom property (if it's not).
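
A minimal Python sketch of the tree-branch chain described above. The
Node type and its field names are hypothetical stand-ins for the
"derivedFrom" and "isPartOf" properties, not an existing vocabulary or
library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical resource carrying the two properties discussed above."""
    name: str
    rdf_type: str
    derived_from: Optional["Node"] = None  # chain of derivation
    is_part_of: Optional["Node"] = None    # set only for physical pieces


def source_individual(node: Node) -> Node:
    # Follow derivedFrom links until reaching a resource with no source.
    current = node
    while current.derived_from is not None:
        current = current.derived_from
    return current


tree = Node("tree", "Individual")
branch = Node("branch specimen", "PreservedSpecimen",
              derived_from=tree, is_part_of=tree)
image = Node("specimen image", "StillImage", derived_from=branch)

print(source_individual(image).name)  # the image traces back to the tree
print(image.is_part_of)               # None: an image is not a physical part
```

The branch carries both properties and the image carries only
derived_from, matching the distinction between physical pieces and
electronic representations.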
>
> In this diagram, the term "hasEvidence" is a property of the Occurrence. It
> has the branch specimen as its object, but not the image of the specimen
> because as you note, the event marking the creation of the image is not the
> same as the event documenting the Occurrence of the Individual (i.e. the
> collection of the branch specimen). Either of the "tokens" (the specimen or
> the image) could be used as evidence for the Identification (we could have a
> property of the Identification called "basedOn" that could have the
> specimen, the image, or both as its object - I did something similar to this
> in the Biodiversity Informatics paper).
>
> Please note that for each of the properties I've listed on the diagram,
> there could and probably should be inverse properties (not shown):
> hasDerivative for derivedFrom, hasPart for isPartOf, isEvidenceFor for
> hasEvidence, and usedIn for basedOn. All of the "tokens" and the Occurrence
> could have the property individualID which would relate the resource
> directly to the Individual and its Identifications.
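
The inverse pairs listed above could be generated mechanically rather
than asserted by hand. A small Python sketch, assuming statements are
held as plain (subject, property, object) tuples; the resource names are
placeholders and no RDF library is involved.

```python
# Property pairs as listed in the paragraph above.
INVERSE = {
    "derivedFrom": "hasDerivative",
    "isPartOf": "hasPart",
    "hasEvidence": "isEvidenceFor",
    "basedOn": "usedIn",
}


def with_inverses(triples):
    """Return the input statements plus one inverse statement for each."""
    result = set(triples)
    for subject, prop, obj in triples:
        if prop in INVERSE:
            result.add((obj, INVERSE[prop], subject))
    return result


triples = {
    ("branch", "derivedFrom", "tree"),
    ("branch", "isPartOf", "tree"),
    ("occurrence", "hasEvidence", "branch"),
    ("identification", "basedOn", "branch"),
}

expanded = with_inverses(triples)
for triple in sorted(expanded):
    print(triple)
```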
>
> I have created a number of similar charts showing how these relationships
> could apply to various types of tokens:
> http://bioimages.vanderbilt.edu/pages/tree-branch.gif (tree branch
> PreservedSpecimen)
> http://bioimages.vanderbilt.edu/pages/tree-image.gif (image of a live tree)
> http://bioimages.vanderbilt.edu/pages/whale-dna.gif (tissue sample and DNA
> sequence from a whale)
> http://bioimages.vanderbilt.edu/pages/bird-observation.gif (bird
> observation)
> http://bioimages.vanderbilt.edu/pages/wildebeest.gif (wildebeest calf
> captured and put in zoo)
> http://bioimages.vanderbilt.edu/pages/botanical-garden.gif (twig removed
> and turned into a living specimen in a botanical garden)
>
> Note that in every case, the "token" is typed based on the kind of thing
> that it is. We don't try to make it an Occurrence (my previous mistake) or
> an Individual (what I'm saying is Rich's mistake). Physical things that are
> a part of the Individual have the special status of "isPartOf", electronic
> representations never do. Only the token that was created during the event
> associated with the Occurrence record is connected to the Occurrence
> record. The token serving as evidence for the Occurrence can be anything -
> there is no special class called "token". In fact, the "token" can be the
> organism itself if the organism is curated (John's favorite wildebeest calf
> in the zoo or a whole dead fish in a jar). The token can be another
> individual such as a living specimen that originates as a clone (maybe also
> seed) from the Individual being documented in the Occurrence. We (DwC) only
> get into the business of creating types and properties of tokens if they
> don't already exist in other vocabularies. DwC needs to do that for
> specimens, but not for images that are already covered by MRTG. An
> observation may or may not have a token depending on whether there is some
> kind of evidence that can be referred to (see bird example).
>
> In these diagrams a single Occurrence and a single "line" of derived tokens
> is shown. But there can be many tokens per Occurrence and many tokens per
> Individual. There can also be many Occurrences per Individual. I didn't
> try to show this on the diagram because it would be too complicated.
> Obviously many users will want to make this "flatter" and less complicated.
> But I think this model allows for just about any kind of relationship among
> occurrence-documenting resources that people want to handle. It was the
> kind of thing I was trying to do in the Biodiversity Informatics paper (e.g.
> http://bioimages.vanderbilt.edu/pages/conceptual-scheme-insect.gif) but
> better because I'm letting the tokens be what they are rather than trying to
> force them all to be Occurrences.
>
>
>
> This also assumes that dwc:catalogNumber and dwc:otherCatalogNumbers be
> re-assigned to Record-level terms. Was there some reason this isn't
> appropriate?
>
> I think it is appropriate because they should be usable with at
> least two classes: Individual (for living specimens) and
> Occurrences (e.g. preserved specimens, images)
>
>
> Hmmm...on that basis, should individualCount and the various tokens also be
> Record-level terms -- on the basis that they can apply either to an
> Occurrence, or to an Individual? Actually, in the case of DNA Barcodes and
> such, isn't it possible to also represent a DNA Sequence as an attribute of
> a Taxon as well? If the purpose of Record-level terms is to aggregate terms
> that apply to more than one class, then perhaps that is the solution for a
> number of these things (including disposition, and maybe even preparation --
> depending on how broadly those things are defined)?
>
>
>
> individualCount would be metadata that results from an Occurrence, so I
> think that's the only place it belongs. Tokens aren't properties of
> anything, they are resources in their own right that are connected by some
> property term (e.g. hasDerivative/derivedFrom) to an Occurrence in which
> they were collected/recorded and to the Individual from which they derived
> (e.g. derivedFrom and hasDerivative). A DNA sequence is another resource
> that isn't an attribute of anything. It could be the object of numerous
> properties that could have a variety of subjects.
>
>
>
> I haven't said this before, but are we allowing Individuals
> to be dead?
>
>
> Errr....fossils? Preserved specimens? Are they not Individuals? I know you
> think of them in terms of tracking a living organism over time. But that's
> only one of the reasons why I support an Individual class (not even the main
> reason). To me, the main reason is that an "Individual" represents the
> actual organism(s), separate from an Occurrence, which represents the
> presence of an organism at a particular place and time.
>
>
> I think I am prepared to accept this as long as they are the whole thing and
> not pieces. There could be some issues with a fossil since in many cases
> the tissues of the organism are replaced by minerals. But there is still a
> one-to-one relationship, so the problem I described in the long paragraph
> above doesn't apply.
>
>
>
> If we put it in a jar of alcohol
> and cut it into many separately-cataloged pieces, are
> all of the pieces still some of the Individual?
>
>
> This is why we need two things for an Individual:
>
> individualScope (which can range anywhere from the aggregates of multiple
> individuals, all the way down to the smallest parts of individuals)
>
>
> See above.
>
> And, a mechanism to track series of "derived from" Individuals. The ASC
> model covered this, I think (right, Stan?)
>
>
>
> I didn't see it in the flow chart, but it could be there somewhere. I had
> something like this in sernec:derivativeOccurrence and sernec:derivedFrom
> (http://bioimages.vanderbilt.edu/rdf/terms.htm) when I was making every
> token an Occurrence. But it's better to do as you suggest here which is
> what I did in the examples above.
>
> I think Pete might have been suggesting modeling
> things that way with "partOf". What if we cut a
> branch from a tree, glue part of it to a page
> and turn part of it into a DNA sample that
> gets sequenced. Are those all a part of an
> Individual?
>
>
> It seems to me that each unit could represent a separate instance of
> Individual, but the "parts" need to be clearly aggregated around the
> single-organism parent Individual, which itself may be a part of another
> Individual instance that is an aggregate lot of specimens, which itself
> could be a subsampling of another Individual instance that represents a
> population in nature.
>
>
> Let the supporting evidence (tokens) be whatever type of thing they are
> rather than call them Individuals. Link them together through hierarchical
> relationships to the single-organism parent Individual.
>
> In my mind, we parse all of these things as separate instances of
> Individuals, but join them via a hierarchical (parent/child) relationship.
> If I'm not mistaken, this is how the ASC model managed instances of
> BiologicalObject (again....right, Stan?)
>
>
>
> I don't really want them to be, but maybe I must?
> Somehow we need to be able to handle road-kill,
> which will be dead when we make the
> observation/collection. If we cut a branch from
> a tree (an Individual), root it, and grow it in
> a botanical garden, do we call the resulting tree
> in the garden the same Individual? I would assign
> it a new identifier and call it a new Individual.
> I guess my point is that I would only apply the
> term Individual to dead stuff, pieces of dead stuff,
> and living pieces of things with extreme caution.
>
>
> Why extreme caution? What are the risks that we are cautioning ourselves
> against?
>
>
> The risk that we make the definition of Individual so broad that it can't
> perform any of the functions it was defined to serve. We've already lost
> one of them (the ability to infer duplicates) when I agreed to the broader
> definition, but that's the subject of another post.
>
> These are some principles that I always try to keep in mind when discussing
> these things:
>
> - DwC is a data exchange standard, not so much a physical data model.
> - There is a necessary balance between structuring DwC around how data
> actually exist in content-provider databases, and how data *should* be
> represented in a normalised world
> - When in doubt, DwC should be accommodating, rather than restrictive --
> especially when more restrictive needs can be met via associated data
> filtering
>
>
> There are other principles as well, but these are the ones I keep having to
> remind myself of.
>
>
> I think that what I have suggested above is very unrestrictive. We let
> evidence be the type of things that they are (PreservedSpecimens,
> Individuals, StillImages, SoundRecordings, DNA sequences, etc.). We don't
> determine their type by what we want to use them for. That was the mistake
> that I made in the Biodiversity Informatics paper. If we follow this
> approach, then a StillImage can fill any role that we want: evidence that an
> Occurrence happened, information to support an Identification, a character
> for a visual key, a logo, etc. We let it fulfill those roles by giving it an
> identifier and connecting it to other resources using appropriate terms
> (hasEvidence, derivedFrom, mrtg:attributionLogoURL, etc.).
>
>
>
>
> I think maybe so. Maybe the appropriate course
> of action here as well is to let people try
> different approaches out and if they turn out
> to work and be needed, then we talk about
> applying them to Darwin Core.
>
>
> Ultimately, I think people will use it in accordance with what terms are
> nested within it -- which is why I think it's important to have this
> conversation we're having now.
>
>
> As I indicated at an earlier time, I think that there are very few terms
> that should be properties of Individual since it is primarily a node that
> connects Occurrences to Identifications (and I guess now to derived
> tokens).
>
> Aloha,
> Rich
>
>
>
>
> Looking forward to responses! But I don't think development of these ideas
> should hold up the proposal for the class Individual, which can stand on its
> own with its current (revised) definition.
> Steve
>
>
>
>
> --
> Steven J. Baskauf, Ph.D., Senior Lecturer
> Vanderbilt University Dept. of Biological Sciences
>
> postal mail address:
> VU Station B 351634
> Nashville, TN 37235-1634, U.S.A.
>
> delivery address:
> 2125 Stevenson Center
> 1161 21st Ave., S.
> Nashville, TN 37235
>
> office: 2128 Stevenson Center
> phone: (615) 343-4582, fax: (615) 343-6707
> http://bioimages.vanderbilt.edu
>