[tdwg-content] practical details of recording a determination What is an Occurrence?

Steve Baskauf steve.baskauf at vanderbilt.edu
Mon Oct 18 21:38:46 CEST 2010

So are we saying that dwc:nameAccordingTo can be a property of a
dwc:Identification?  What's dwc:identificationReferences for?
I'm sorry if this is a dumb question but I can plead ignorance on this 

Peter DeVries wrote:
> Hi Markus,
> I feel your pain. :-)
> Maybe an example might help clarify this.
> I use the key* listed below to id my mosquitoes.
> So I should mark up my RDF for the identification with something like:
> <dwcterms:nameAccordingTo>Identification And Geographical Distribution 
> Of The Mosquitoes: Of North America, North Of Mexico By Richard F., 
> Jr. Darsie et al. 2004</dwcterms:nameAccordingTo>
> Rather than use some other term like "dwc:identificationReferences"
> Correct?
> - Pete
> *
> Identification And Geographical Distribution Of The Mosquitoes: Of 
> North America, North Of Mexico
> By Richard F., Jr. Darsie, RONALD A. WARD, Chien C. Chang, Taina Litwak
> University Press of Florida, 2004
> ISBN: 0813027845
> Cite: 13463
> =================================================================
> On Mon, Oct 18, 2010 at 12:19 PM, "Markus Döring (GBIF)"
> <mdoering at gbif.org> wrote:
>     I am sorry I don't have the time to follow this extensive thread,
>     but I can manage at least the first paragraphs ;)
>     A quick comment on tying identification sources to a scientific
>     name. As for other taxon concepts this is usually done with the
>     sec/sensu reference which should be recorded as dwc:nameAccordingTo:
>     http://rs.tdwg.org/dwc/terms/index.htm#nameAccordingTo
>     I am slightly irritated that we seem to have some term duplicates
>     for this use case.
>     Maybe dwc:identificationReferences is supposed to only list
>     additional references?
>     Markus
>     On Oct 18, 2010, at 18:49, Steve Baskauf wrote:
>     > I've fallen behind on systematically perusing the list
>     responses, but I would like to focus in on a point that seems to
>     be a consensus in the responses that have shown up recently.  The
>     consensus seems to be that documenting determinations (a.k.a.
>     instances of dwc:Identification class) that are applied to
>     Individuals (or Occurrences if you don't believe in Individuals)
>     is the way to go.  So in my usual graphical way of thinking about
>     this, I would draw a "relationship line" from the determination to
>     the Individual (or Occurrence) on one side and from the
>     determination to the species concept on the other.  I will leave
>     it up to the taxonomy people to decide what different things would
>     be connected to the species concept and how all of their lines
>     would be connected.  The determination would have any of the
>     properties that are terms listed in the dwc:Identification class
>     (identifiedBy, dateIdentified, identificationReferences,
>     identificationRemarks, identificationQualifier, and typeStatus).
>      Some properties like dateIdentified and identificationReferences
>     would be string literals and others (especially identifiedBy)
>     should probably be GUIDs but could be literals if they had to be.
>     >
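A minimal Python sketch of the relationship lines described above, assuming simple in-memory records. Only the property names (identifiedBy, dateIdentified, identificationReferences, identificationRemarks, identificationQualifier, typeStatus) come from the dwc:Identification class; the class layout, the individual_id and taxon_concept_id fields, and the urn:example identifiers are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Identification:
    """One determination: links an Individual (or Occurrence) to a taxon concept."""
    individual_id: str        # hypothetical GUID of the Individual being determined
    taxon_concept_id: str     # hypothetical GUID of the species concept applied
    identified_by: Optional[str] = None              # GUID preferred, literal allowed
    date_identified: Optional[str] = None            # string literal, e.g. ISO 8601
    identification_references: Optional[str] = None  # string literal
    identification_remarks: Optional[str] = None
    identification_qualifier: Optional[str] = None
    type_status: Optional[str] = None


def determinations_for(individual_id, identifications):
    """Return every determination applied to one Individual, oldest first."""
    matches = [i for i in identifications if i.individual_id == individual_id]
    # Undated determinations sort first because "" is less than any date string.
    return sorted(matches, key=lambda i: i.date_identified or "")


records = [
    Identification("urn:example:ind/1", "urn:example:concept/B",
                   date_identified="2010-10-01"),
    # Original label: determiner and date unknown, so both stay None.
    Identification("urn:example:ind/1", "urn:example:concept/A"),
    Identification("urn:example:ind/2", "urn:example:concept/A",
                   date_identified="2009-05-12"),
]
history = determinations_for("urn:example:ind/1", records)
```

Leaving identified_by and date_identified empty is one way to handle an unsigned original label without asserting that the collector made the determination.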
>     > That all seems pretty clear.  However, when I've started trying
>     to do this in real life, I immediately have questions.  Take a look at
>     > http://bioimages.vanderbilt.edu/rdf/examples/lsu000/0428.rdf
>     which should show up as a web page in your browser.
>     >
>     > 1. The original label identifies the species as Juncus
>     diffusissimus.  However, there is no indicator as to who
>     originally identified it or when.  My assumption is that it was
>     the collector (Glen N. Montz) but I don't really know that.  Do I
>     assume that, or list the original determiner as "unknown"?
>     > 2. Do we draw a distinction between the initial identification
>     and subsequent annotations?  I think the answer should be "no" and
>     that's why I refer to both generically as "determinations".
>     > 3. There is really no indication given on the annotation labels
>     as to many of the things that we would like to know, such as the
>     concept they had in mind, any source they used (if any), or the
>     reason why they did the annotation.  So how does one connect the
>     name that they applied to the determination when there is no
>     indication of the concept?  Is this just something we can't do for
>     old annotations and just something that we try to do from this
>     point forward?
>     > 4. The last question is one that I really want some opinions
>     about.  It seems to me that there are a number of reasons why one
>     would apply a determination.  One would be to correct an actual
>     error in identification.  One would be to increase the precision
>     of a previous determination (e.g. an insect identified to family
>     now is identified to species).  One would be to assert a
>     difference in opinion as to the correct way to group this
>     individual with others (i.e. as in a taxonomic revision).
>      Finally, a single determiner might apply several determinations
>     to one individual and indicate in each determination the concept
>     intended (i.e. if you subscribe to Cronquist, you'd call it X; if
>     you like Radford's book, you'd call it Y; if you like Weakley's
>     treatment, you'd call it Z).  Some of these four reasons may be
>     functionally equivalent, but how would you use Darwin Core to
>     indicate the reason why you applied the determination?  Please
>     don't say "identificationRemarks"!  From a machine-processing
>     standpoint, this is something we should know and there should be
>     some kind of controlled vocabulary to express it.  For instance if
>     an identification is "deprecated" because it was in error (perhaps
>     by the determiner him/herself), one would like the incorrect
>     determination to show up in the historical metadata, but I
>     wouldn't want it to be listed in a website index.  The same would
>     hold true if an annotator was able to pin the taxon down to a
>     lower taxonomic level than the original identifier.  If someone
>     goes to the trouble to connect an Individual/Occurrence to several
>     names under alternative concepts, there should be a way that a
>     machine would know this so that a software user could select the
>     concept they wanted to use and the name under that concept would
>     pop up.
>     >
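To make question 4 concrete, here is a rough Python sketch of what a controlled vocabulary for the reason behind a determination might allow a machine to do. Everything in it is hypothetical: Darwin Core defines no such term, and the DeterminationReason values, the supersedes field, and the according_to labels are invented for illustration only.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class DeterminationReason(Enum):
    """Hypothetical controlled values; not part of Darwin Core."""
    ERROR_CORRECTION = "errorCorrection"
    INCREASED_PRECISION = "increasedPrecision"
    TAXONOMIC_OPINION = "taxonomicOpinion"
    ALTERNATIVE_CONCEPT = "alternativeConcept"


@dataclass
class Determination:
    det_id: str
    name: str
    according_to: Optional[str] = None            # concept source, if stated
    reason: Optional[DeterminationReason] = None
    supersedes: Optional[str] = None              # det_id this determination replaces


def names_for_index(determinations):
    """Names to list in an index: skip determinations that were corrected or refined."""
    replaced = {
        d.supersedes for d in determinations
        if d.supersedes and d.reason in (
            DeterminationReason.ERROR_CORRECTION,
            DeterminationReason.INCREASED_PRECISION,
        )
    }
    return [d.name for d in determinations if d.det_id not in replaced]


def name_under_concept(determinations, according_to):
    """Return the name applied under the concept source the user selected, if any."""
    for d in determinations:
        if d.according_to == according_to:
            return d.name
    return None


dets = [
    Determination("d1", "Culicidae"),
    Determination("d2", "Aedes triseriatus", according_to="older key edition",
                  reason=DeterminationReason.INCREASED_PRECISION, supersedes="d1"),
    Determination("d3", "Ochlerotatus triseriatus", according_to="latest key edition",
                  reason=DeterminationReason.ALTERNATIVE_CONCEPT),
]
index_names = names_for_index(dets)
chosen = name_under_concept(dets, "latest key edition")
```

The family-level determination stays in the list as history but drops out of the index, while the two alternative-concept names both remain available for the user to choose between.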
>     > I don't really see any term under the current DwC that could be
>     used to do this last thing.  Am I missing something?  Do we need
>     several terms to explain the reason why we made the determination
>     because the reasons fall into different categories?
>     >
>     > The other comment that I'll throw out (since this is going out
>     to the bioblitz list as well as to tdwg-content) is that those of
>     you who are building apps to collect metadata in the field really
>     need to separate the process of entering (or acquiring) the
>     collection metadata from the determination process.  In at least
>     some apps, the user immediately has to commit to a taxon as they
>     enter the data at the time of collection.  It seems to me that it
>     would be a very common situation (especially in the case of
>     "citizen science") that the collector/observer/photographer would
>     have no idea what the taxonomic identity was at the time of
>     collection.  The process of determination (and the recording of
>     the various dwc:Identification class terms) is really a separate
>     process that should be able to happen at the time of collection OR
>     later.
>     >
>     > Steve
>     >
>     > Peter DeVries wrote:
>     >> Hi Steve,
>     >>
>     >> I would hypothesize that for the vast majority of identified
>     records the process is something like this:
>     >>
>     >> 1) An individual uses some sort of key to determine what
>     species (taxon concept) to assign to a given individual
>     >>    * They may have created some sort of mental key in which
>     once they recognize one individual mosquito they can then pretty
>     quickly sort
>     >>       a number of individuals into collections.
>     >>
>     >> 2) The actual name they assign to the specimen is usually based
>     on what their key says the name is. Often this does not specify
>     the authorship.
>     >>     Most of these human identifiers have not read the original
>     species descriptions for the species they are identifying.
>     >>     So the specimen is actually tied to a concept that is based
>     more on the "key" than the original description.
>     >>     * An exception would be where there is a key in the
>     original description and that was what was used.
>     >>
>     >> 3) So in a sense, the process of modeling this as if the
>     identifier actually asserted that the concept was the same as that
>     described by
>     >>     the original description or a subsequent revision is "fudging"
>     >>
>     >> Side effects of this process include:
>     >>
>     >> 1) A new key for North American Mosquitoes comes out that
>     incorporates recent changes in nomenclature. The major change
>     being the elevation of
>     >>     a subgenus to a genus. For most of the species described
>     the "key concept" is unchanged.
>     >>
>     >> Student identifier, Bob, in state X is using the latest key,
>     while student identifier, Joe, in state Z is using a slightly
>     older edition of the same key.
>     >>
>     >> Bob identifies the species as Ochlerotatus triseriatus, while
>     Joe identifies what should be the same species as Aedes triseriatus.
>     >>
>     >> These show up in GBIF on two different maps, they show up in
>     the EOL as two different pages.
>     >>
>     >> Various TDWG'ers continue to argue that the original
>     description and subsequent revisions were really important in
>     determining what these individuals
>     >> actually meant when they assigned a name to a specimen, and
>     that this is how we should model it in excruciating detail.
>     >>
>     >> I would argue this should be modeled as best as possible to
>     what actually happens.
>     >>
>     >> For example, how many of the species observed in the recent
>     BioBlitz were identified by referring to the original species
>     description or subsequent revisions?
>     >>
>     >> In your diagram, I would suggest that you show that a taxon
>     concept may have many names associated with it. Since it is not
>     clear what the identifier intended by his or her choice of a name,
>     it is often difficult to determine what taxon concept they
>     actually meant.
>     >>
>     >> This is why I advocate a move to a more taxon concept based
>     identifier to link these data sets together because this allows
>     the intent of the identifier
>     >> to be more accurately modeled.
>     >>
>     >> This would be done in the form of:
>     >>
>     >>  "I assert that this specimen (of what I call Aedes
>     triseriatus) was observed here. I also assert that it is an
>     instance of this species concept => URI"
>     >>
>     >>   Or I assert that this is an individual of the type
>     "Individual of species concept X" = > URI
>     >>
>     >>   All of these are instances of the class "Individual"
>     >>
>     >> So the resulting DarwinCore record would contain both the name
>     and an optional, but I think needed, asserted species concept.
>     >>
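A small Python sketch of why the asserted species concept helps link data sets, using the Bob and Joe example from above. The dictionary layout, the speciesConcept key, and the example.org URI are assumptions for illustration; scientificName and identifiedBy are the only names taken from Darwin Core.

```python
from collections import defaultdict

# Two records of what should be the same species, named from different key editions.
records = [
    {"scientificName": "Ochlerotatus triseriatus", "identifiedBy": "Bob",
     "speciesConcept": "http://example.org/concept/triseriatus"},  # hypothetical URI
    {"scientificName": "Aedes triseriatus", "identifiedBy": "Joe",
     "speciesConcept": "http://example.org/concept/triseriatus"},  # same concept
]


def group_by(rows, key):
    """Group records by the value stored under the given key."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return dict(groups)


by_name = group_by(records, "scientificName")     # two groups: two maps, two pages
by_concept = group_by(records, "speciesConcept")  # one group: records stay together
```

Grouping on the name alone splits the records the way the two maps do, while grouping on the asserted concept keeps them together and still preserves the name each identifier actually used.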
>     >> The species concept is a subclass of taxon concept, but is
>     fundamentally different than the higher clades.
>     >>
>     >> There are some guidelines as to what an entity needs to be
>     considered a species.
>     >>
>     >> While there are no real guidelines as to what clades should be
>     considered genera and what clades should be considered families etc.
>     >>
>     >> Assigning properties at the level of genera or family is also
>     problematic because it assumes that there will be inferencing and
>     it will require rechecking
>     >> that those properties are still valid if the species within
>     that genus change.
>     >>
>     >> So if there is some property that is common to all the species
>     in the genus, make that a property of each of the individual
>     species - not a property
>     >> of the genus.
>     >>
>     >> Respectfully,
>     >>
>     >> - Pete
>     >>
>     >> On Fri, Oct 15, 2010 at 10:45 AM, Steve Baskauf
>     <steve.baskauf at vanderbilt.edu> wrote:
>     >> As a background to this post, I want to reference a post by Bob
>     called "SubclassOrNot".  I discovered this page on an early foray
>     into the TDWG website labyrinth and it has been very influential
>     on my thinking since then.  The idea Bob discusses is central to
>     what I'm writing below so if you haven't read it you might want to
>     do so first.  You can probably skip the "OWL Inference" section
>     and still get the point which is described in the first two
>     sections of his post.  The URL for the page is
>     http://wiki.tdwg.org/twiki/bin/view/TAG/SubclassOrNot .
>     >>
>     >> To preface what I'm going to say below, I want to put Darwin
>     Core Occurrences in the context of what Bob wrote.  In my mind,
>     one of the hallmarks of the Darwin Core standard and one thing
>     that makes it a great improvement over previous versions is that
>     the decision was made to use what Bob called the "has a" approach
>     rather than the "is a" approach.  In particular, the Darwin Core
>     standard has a single class called dwc:Occurrence rather than
>     subclasses called "Specimen", "Observation", and other possible
>     things.  The way that we differentiate among different kinds of
>     Occurrences is by using the DwC types which are the controlled
>     values for the term dwc:basisOfRecord.  Thus we say an Occurrence
>     "has a" basisOfRecord=PreservedSpecimen rather than saying it "is
>     a" PreservedSpecimen.  We say an Occurrence "has a"
>     basisOfRecord=HumanObservation rather than saying it "is
>     a" HumanObservation.  This approach has greatly reduced the number
>     of different terms in the standard since we don't have to have
>     separate "ObservedBy" and "CollectedBy" terms, but rather can just
>     have a single "RecordedBy" term that applies to both specimens and
>     observations.  The same thing applies to many other things, like
>     eventDate rather than DateCollected and DateObserved, locality
>     rather than collectionLocality and observationLocality, etc.  With
>     the ratification of Darwin Core, this decision is now a fait
>     accompli and not a subject of discussion or something optional for
>     users of the standard.  It also seems to be clear that as
>     necessary new terms can be added to the DwC types which would then
>     be valid controlled values for basisOfRecord.
>     >>
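The "has a" approach described above can be sketched in Python as a single record type carrying a basisOfRecord tag, rather than one subclass per kind of evidence. This is only an illustration: the class layout and the example values are assumptions, and the set of allowed values lists just the three types named in this post, not the full DwC type vocabulary.

```python
from dataclasses import dataclass

# Only the values named in this post; the real controlled vocabulary is larger.
BASIS_OF_RECORD = {"PreservedSpecimen", "LivingSpecimen", "HumanObservation"}


@dataclass
class Occurrence:
    """One class for every kind of Occurrence; the kind is a property, not a subclass."""
    occurrence_id: str
    basis_of_record: str   # "has a" tag instead of "is a" subclass
    recorded_by: str       # replaces separate CollectedBy / ObservedBy terms
    event_date: str        # replaces DateCollected / DateObserved
    locality: str          # replaces collectionLocality / observationLocality

    def __post_init__(self):
        if self.basis_of_record not in BASIS_OF_RECORD:
            raise ValueError(f"unknown basisOfRecord: {self.basis_of_record}")


# Hypothetical example records: same fields regardless of the kind of evidence.
specimen = Occurrence("occ-1", "PreservedSpecimen", "A. Collector",
                      "2010-10-15", "Example County")
sighting = Occurrence("occ-2", "HumanObservation", "A. Observer",
                      "2010-10-16", "Example County")
```

Under this design, supporting a new kind of evidence means adding one value to the controlled list rather than adding a class and a parallel set of terms.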
>     >> Since the adoption of the DwC standard, the approach to
>     Occurrences has been what I would describe as "I know an
>     Occurrence when I see one".  I consider this as a pretty sloppy
>     practice and as I indicated in my post last night, I think there
>     is enough consensus about what an Occurrence is that we can come
>     up with a better definition than "an occurrence is the category of
>     information pertaining to evidence of an occurrence...".  Another
>     part of what I would characterize as sloppiness is the lack of a
>     clear definition of what exactly basisOfRecord means.  When I
>     wrote my attempt at summarizing consensus last night, I dodged the
>     question about what I called the "token".  This "thing" has been
>     called various names.  In the previous discussion on the list, it
>     was sometimes called "the evidence" of the occurrence.  In the
>     past I have called it "a representation" - however, I now think
>     the term "token" is better because "representation" has a
>     different technical meaning in the context of content negotiation.
>      When we type an Occurrence by saying it has a
>     basisOfRecord=PreservedSpecimen, we are saying that this
>     Occurrence has as supporting evidence, or as a "token" if you
>     prefer, all or part of the dead remains of the organism (i.e. what
>     I'm calling "the Individual") that was being documented by the
>     Occurrence.  When we type an Occurrence by saying it has a
>     basisOfRecord=LivingSpecimen, we are saying that this Occurrence
>     has as a "token" the entire organism that was being documented (or
>     some vegetative part of the live organism that was propagated).
>      When we type an Occurrence by saying it has a
>     basisOfRecord=HumanObservation, we are saying that the Occurrence
>     has no supporting evidence other than the reputation of the
>     observer to accurately record the metadata about the Occurrence.
>      In other words, we "tag" an instance of a core class (to use Bob's
>     words), Occurrence, by telling a metadata consumer what kind of
>     token we are using as evidence of the Occurrence.
>     >> A fundamental part of creating a clear definition of what an
>     Occurrence is, is to define exactly what we are including in the
>     concept of Occurrence.  One possibility is to (1) say that the two
>     boxes at the right side of the diagram at
>     http://bioimages.vanderbilt.edu/pages/occurrence-diagram.gif are
>     fused and that both the Occurrence metadata and its associated
>     token are what we consider to be "the Occurrence".  Another
>     approach (2) would be to say that the actual Occurrence as an
>     entity is only the metadata part and that the token is a separate
>     thing.  A third approach is to say (3) that everything with the
>     blue dotted lines is considered a part of the Occurrence (i.e. the
>     metadata, the token, the event, and the locality).  I don't think
>     in an absolute sense, any one of these approaches is "right".  The
>     problem is that these approaches are used inconsistently,
>     sometimes even by the same person, depending on the basisOfRecord.
>      Differences in ways of thinking about this issue are a part of why
>     people aren't understanding the way other people are approaching
>     the structuring of metadata.  I have tried to consistently take
>     the approach (1) that the two boxes on the right are fused, i.e.
>     that the Occurrence metadata and the token should both be
>     considered part of the entity that we call "an Occurrence".  I
>     think this is why Rich was confused in
>     http://lists.tdwg.org/pipermail/tdwg-content/2010-October/001666.html
>     when I said that it was "wrong" to assert that a scientific name
>     is a property of an Occurrence - obviously it is silly to say that
>     the token (photons on a film, sound patterns in a digital file)
>     has a scientific name.  Yet that is exactly what people do
>     routinely when the token is a branch cut off a tree and glued to a
>     piece of paper.  They say that they are "identifying a specimen".
>      What I am asking (actually demanding) is that the TDWG community
>     get its act together and come to some consistency on this.  If we
>     are going to take the approach (2), then we need to take specimens
>     off their pedestal and treat them like we do any other token that
>     we are using as evidence that an Occurrence happened.  If we are
>     going to do what was suggested for the BioBlitz in
>     http://lists.tdwg.org/pipermail/tdwg-content/2010-October/001603.html,
>     i.e. to call Occurrences "observations" and then link the tokens
>     to them by associatedMedia, ResourceRelationship, or some other
>     means (approach 2) then do it consistently for every kind of
>     token, including specimens, and don't single out media tokens for
>     punishment.
>     >> I have in a sense "thrown down the gauntlet" on this issue by
>     proposing that DigitalStillImage be added as a DwC type and as a
>     controlled value for basisOfRecord
>     (http://code.google.com/p/darwincore/issues/detail?id=68).  I know
>     what some people are going to say in response to this proposal.
>      "Why do you need to have 'DigitalStillImage' as a value for
>     basisOfRecord when you can just say that the resource's
>     dcterms:type=StillImage?"  The answer goes back to Bob's point.
>      If we are going to go the "has a" path (which we already have in
>     DwC for Occurrences) rather than subclassing everything, then we
>     need to provide an appropriate value for the "tag" for any type of
>     resource that a reasonable number of users will want to use as a
>     token.  I think it is clear from this and other Bioblitzes, my
>     work in Bioimages, the whale tracking project, and many other
>     examples, that there are plenty of people who are already using
>     DigitalStillImages as tokens and we all need a controlled value to
>     use for basisOfRecord.
>     >> The other thing that we accomplish when we type an Occurrence
>     by its basisOfRecord is to tell a consumer what kind of metadata
>     to expect to get about the token in addition to the generic
>     metadata that is provided for all Occurrences.  Thus for a
>     LivingSpecimen we expect to be told what zoo, botanical garden,
>     bacterial collection, etc. contains the specimen.  For a
>     PreservedSpecimen we expect to be told the preparation type, the
>     location of the repository, etc.  For a DigitalStillImage we
>     expect to be told the file type, accessURL, etc.  Simply providing
>     a value for dcterms:type=StillImage doesn't indicate whether the
>     image is a physical one (i.e. on film) or a digital one.  It is
>     also unreasonable to expect a client to have to be checking two
>     different terms (basisOfRecord and dcterms:type) to find out what
>     they could learn from one (basisOfRecord).  Of course it would be
>     advisable to provide a value for dcterms:type as well for clients
>     outside the biodiversity community who may not "understand" what
>     basisOfRecord means.
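A rough Python sketch of the idea that basisOfRecord tells a consumer what token metadata to expect. The mapping below is an assumption assembled from the examples in the paragraph above; the field names holdingInstitution, repositoryLocation, and fileType are hypothetical placeholders, and DigitalStillImage is the proposed value, not a ratified one.

```python
# Expected token metadata per basisOfRecord (illustrative, not normative).
EXPECTED_TOKEN_FIELDS = {
    "LivingSpecimen": {"holdingInstitution"},
    "PreservedSpecimen": {"preparations", "repositoryLocation"},
    "DigitalStillImage": {"fileType", "accessURL"},  # proposed type
    "HumanObservation": set(),                       # no token, nothing extra expected
}


def missing_token_fields(record):
    """List the token metadata a consumer would expect but the record lacks."""
    expected = EXPECTED_TOKEN_FIELDS.get(record["basisOfRecord"], set())
    return sorted(field for field in expected if not record.get(field))


image_record = {
    "basisOfRecord": "DigitalStillImage",
    "accessURL": "http://example.org/images/1.jpg",  # hypothetical URL
}
observation_record = {"basisOfRecord": "HumanObservation"}

image_gaps = missing_token_fields(image_record)
observation_gaps = missing_token_fields(observation_record)
```

A client only has to read one term to know which checks apply, which is the point being made about not forcing it to consult both basisOfRecord and dcterms:type.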
>     >> I hate to keep bringing my posts back to the RDF issue, but
>     thinking about how one would write RDF forces clear thinking about
>     how metadata should be structured.  If we intend to separate
>     tokens as entities from their associated Occurrence metadata, i.e.
>     approach (2), then we open up a whole other can of worms.  To
>     associate the occurrence resources (i.e. the metadata) with the
>     "different" resource (i.e. the token), we will probably have
>     to be able to create URIs for the tokens and separate RDF metadata
>     blocks which will have to be rdf:type'd.  What are we going to
>     use for that rdf:type - create another Darwin Core class?  I
>     simply don't think that is a complicated road that we want to
>     travel.  It would be far easier to just say that every Occurrence
>     has a one-to-one relationship with its token (which could be "the
>     empty set" for observations).  This would not work for people who
>     want to hang multiple tokens on a single observation event, but I
>     think that itself is a bad idea because it makes it even harder to
>     have "flat" occurrence datasets.  Just say that every time we
>     collect a different token (or make an observation that has no
>     token), it is a new Occurrence record.  Realistically, a single
>     collector can't actually take a picture of a plant at the same
>     time he or she collects it for a specimen anyway.  Those really
>     should be considered two different events because they happen at
>     different times.
>     >>
>     >> OK, enough said.  Consider this my defense of my proposal
>     "issue 68" to add DigitalStillImage.  I would urge the powers that
>     be to respond to the issues that I've raised here before having
>     any kind of "vote" (or whatever is ultimately going to happen when
>     there is an up or down decision about the proposal).
>     >>
>     >> Steve
>     >>
>     >> Steve Baskauf wrote:
>     >> After the flurry of emails recently, I had an opportunity to
>     carefully
>     >> read all the way through the threads again, followed by
>     enforced "think
>     >> time" during my long commute.  I was actually pretty cheerful
>     after that
>     >> because I think that in essence, most of the conversation about
>     what
>     >> constitutes an Occurrence really boils down to the same thing.
>      So I
>     >> have sat down and tried to summarize what seems to me to be a
>     consensus
>     >> about Occurrences.  To follow my points, please refer to the
>     diagram at:
>     >> http://bioimages.vanderbilt.edu/pages/occurrence-diagram.gif
>     >>
>     >> Consensus on relationships
>     >> 1. The fundamental definition of an Occurrence involves
>     evidence that a
>     >> representative of a taxon occurred at a place and time.
>     >> Note 1.A: For clarity, I have modified John's statement in his last
>     >> email by replacing "taxon" with "representative of a taxon".  I'm
>     >> considering a taxon to be an abstract concept that is applied to
>     >> individuals or groups of organisms.
>     >> Note 1.B. This definition is far more useful than the official
>     >> definition of the class Occurrence "The category of information
>     >> pertaining to evidence of an occurrence..." which is
>     essentially circular.
>     >> Note 1.C: This statement is extremely broad because the
>     evidence could
>     >> be of many sorts, the representative could range from a single
>     >> individual to all organisms on the earth, the taxon could be
>     anyone's
>     >> definition at any taxonomic level, the place could range from a GPS
>     >> point with uncertainty of less than 10 meters to the entire planet
>     >> earth, and the time could range from a shutter click of less
>     than one
>     >> second to 3.4 billion years.
>     >> 2. The diagram is an attempt to summarize in pictorial form
>     statements
>     >> and relationships that have been described in the thread.  The
>     taxon
>     >> representative is recorded as existing at a particular time and
>     place
>     >> (the arrow) and the result is an Occurrence record.  That
>     Occurrence
>     >> record exists as metadata which may be associated with a token
>     that can
>     >> be used to voucher the fact that the taxon representative
>     existed.  That
>     >> token may be the organism itself (or a living part of it as in
>     a twig
>     >> for grafting), all or part of the organism in preserved form, an
>     >> electronic representation such as an image or sound recording,
>     and other
>     >> kinds of things like tissue or DNA samples.  There may also be
>     no token
>     >> at all, in which case we call the Occurrence record an observation.
>     >> Based on direct observation of the taxon representative,
>     examination of
>     >> one or more tokens, or both, some determiner asserts that a taxon
>     >> concept applies to the taxon representative and as a result a
>     scientific
>     >> name can be used to "identify" the taxon representative.
>      (There may be
>     >> a lot of other complicated stuff above the Identification box,
>     but that
>     >> will have to be filled in by the taxonomists.)
>     >> Note 2.A: I have mapped onto this diagram the letters that John
>     used in
>     >> his last email to refer to entities that are involved in an
>     Occurrence
>     >> (T, E, L, O, and G).  I will beg the forgiveness of fossil people
>     >> because I don't really know how the geological context fits in.
>      I'm
>     >> assuming that it is a way of asserting time and location on a much
>     >> broader scale than we do for extant organisms.
>     >> Note 2.B: I have put a dotted line around the part of the
>     diagram that I
>     >> think includes all the things that people might consider part
>     of the
>     >> Occurrence itself.  I have left out "T" and the other parts
>     related to
>     >> identification because it seems to me that you can have an
>     occurrence
>     >> that you document which does not yet (and perhaps never will)
>     have an
>     >> identification.  The Occurrence still asserts that a taxon
>     >> representative existed at a time and place; we just don't yet
>     know what
>     >> the taxon is.
>     >> 3. The red lines indicate the relationships that connect the
>     various
>     >> entities (I'm going to go ahead and call them resources).
>      Consistent
>     >> with popular opinion, the Occurrence record is the center of the
>     >> universe and most things are connected to it.
>     >> Note 3.A: I am sticking to my guns and refuse to connect the
>     >> Identification directly to the Occurrence.  It is the taxon
>     >> representative that is being identified, not the occurrence.
>      One can
>     >> assert another sort of relationship between the identification
>     and the
>     >> occurrence if one wants to say that one consulted the occurrence
>     >> metadata and token in order to decide about the identification,
>     but it
>     >> is not correct to say that the Identification identifies either the
>     >> Occurrence metadata or the token (as Rich pointed out).
>     >>
>     >> OK, so that's step one - defining what is related to what.  If
>     anyone
>     >> disagrees with these relationships, please clarify or create
>     your own
>     >> diagram.
>     >>
>     >> Complicating circumstances/caveats
>     >> 1. It is noted and recognized that some users will not care to
>     include
>     >> all of these relationships in their models.  In the interest of
>     >> simplification or "flattening" the relationships, they may wish to
>     >> collapse some parts of this diagram (e.g. incorporate time and
>     location
>     >> metadata within the Occurrence metadata rather than considering
>     them
>     >> separate resources, applying scientific names directly to the taxon
>     >> representatives without defining a taxon concept or recording the
>     >> determination metadata, connecting identifications directly to the
>     >> occurrence, etc.).  This doesn't mean that the relationships don't
>     >> exist, it just means that some users don't care about them.
>     >> 2. It is recognized that different users will be interested in
>     or able
>     >> to specify the various resources to differing degrees of precision.
>     >> Examples: A photographer might record times to the nearest
>     second, a
>     >> collector may only be interested in noting the date on which a
>     specimen
>     >> was collected.  A location may be specified to the precision of
>     a GPS
>     >> reading or be defined as some geographic or political
>     subdivision.  The
>     >> taxon representative may be an individual organism, a flock or
>     clump, or
>     >> some larger aggregation of taxon representatives.
>     >>
>     >> That's step two.  If I've missed any complications, please
>     point them out.
>     >>
>     >> My opinions about the implications of this diagram
>     >> 1. The circle I've labeled as "taxon representative" is the
>     >> resource type that I'm proposing to be represented by the class
>     >> Individual.  You will note that in both the definition of
>     >> dwc:individualID ("An identifier for an individual or named group
>     >> of individual organisms...") and the proposed class definition
>     >> ("The category of information pertaining to an individual or
>     >> named group of individual organisms represented in an
>     >> Occurrence"), groups of individual organisms are included.  Thus
>     >> John's example of a fossil having myriad individuals, or
>     >> Richard's examples of thousands of plankton, a large school of
>     >> fish, herd of wildebeest, flock of birds, could all be
>     >> categorized as "Individual" under this definition if there is a
>     >> reasonable expectation that all of the individuals in the group
>     >> are members of the same taxon.  Perhaps there is a better name
>     >> for this resource, but since dwc:individualID was already extant,
>     >> I chose Individual as the class name for consistency with the
>     >> pattern established with other classes and their associated
>     >> xxxxID terms.
>     >> 2. Although in note 1.C. I have given the ranges of the various
>     >> resources to their logical extreme (as was done previously in the
>     >> thread), I think that as a practical matter we can adopt
>     >> guidelines to set reasonable values for the "normal" ranges of
>     >> the resources.  One such guideline might be that we suggest a
>     >> range that can accommodate about 95% of the user needs within the
>     >> community (this came from Rich's comment about satisfying 95% of
>     >> the user need with an establishmentMeans controlled vocabulary).
>     >> For example, it was suggested that the range for the location of
>     >> an Occurrence could span the entire planet Earth.  True enough,
>     >> but virtually nobody would find such a span useful.  95% of users
>     >> would probably find a range between a GPS reading with 10 meter
>     >> precision and the extent of a county or province useful for
>     >> recording the location of an Occurrence.  I can suggest similar
>     >> "useful" ranges: one second to one day for an event time
>     >> (excluding fossils), one individual organism to the number of
>     >> organisms that would fit within a 50 meter radius for an
>     >> "individual", and taxon identified to family for plants and maybe
>     >> mammals, genus for birds, and order for insects.  So framing the
>     >> definition of an Occurrence in these terms it would be something
>     >> like: "An occurrence involves evidence (consisting of a physical
>     >> token, electronic record, or personal observation) that a
>     >> representative (ranging from a single individual to the number
>     >> that would fit on a football field) of a taxon (hopefully
>     >> identified to some lower taxonomic level) occurred at a place
>     >> (determined to a precision between that of a GPS reading and the
>     >> size of a county/province) and time (spanning one second to one
>     >> day)."  A few people might object to this level of
>     >> restrictiveness, but I would guess that it would make 95% of us
>     >> happy.
>     >> 3. With the exception of the "missing" class Individual, every
>     >> resource type on this diagram except for the "token" and
>     >> Scientific name has a Darwin Core class.  Every resource type on
>     >> the diagram except for "token" has a dwc:xxxxID term that can be
>     >> used to refer to a GUID for the resource.  The implication of
>     >> this is that any resource on this diagram except for the token
>     >> and taxon representative (i.e. Individual) is ready to be
>     >> represented in RDF by Darwin Core terms in the sense that the
>     >> relationships (red lines) can be represented by the xxxxID terms
>     >> and that the resources can be rdf:type'd using Darwin Core
>     >> classes.  (Lacking a class for the scientific name doesn't seem
>     >> like a big deal to me since the scientific name can be a string
>     >> literal - but then I'm not a taxonomist.)
>     >> 4. OK, I've avoided it as long as I can, so I'm going to confess
>     >> now to the RDF-phobes.  The red lines and shapes are something
>     >> pretty close to an RDF graph.  What that means is that if the
>     >> community can agree that this diagram correctly represents the
>     >> relationships among the kinds of biodiversity resources that we
>     >> care about, then the matter of providing guidelines on how to
>     >> represent Darwin Core in RDF suddenly gets a lot simpler.  Just
>     >> convert the "picture" of the RDF graph into XML format and we
>     >> have a template.  Alright, that's an oversimplification, but I
>     >> think it is essentially true because the most difficult part of
>     >> achieving a consensus on RDF representations is to decide how we
>     >> connect the resource types, not on the literals that we hang onto
>     >> resources as properties.
>     >> 5. While I'm beating the RDF drum again, the importance of my
>     >> opinion number 2 can be extended into the GUID adoption process.
>     >> In my comments to Kevin about the Beginner's Guide to Persistent
>     >> Identifiers, I think I commented on the question of how one
>     >> decides whether a GUID needs to be assigned to something or not.
>     >> I believe that the answer to that question boils down to this: we
>     >> need a GUID for any resource that will be referenced by more than
>     >> one other resource.  Do we need to be able to assign a GUID to
>     >> Taxon concepts?  Yes, because it is likely that many
>     >> identifications will want to reference a particular taxon
>     >> concept.  Do we need to be able to assign a GUID to an Event?
>     >> Maybe or maybe not.  If every occurrence has its own separate
>     >> time recorded, then no GUID is needed because the time is just a
>     >> part of every separate occurrence record.  If the event is
>     >> defined to be a time range that represents a collecting trip,
>     >> then there may be many Occurrences that are associated with that
>     >> trip and all of them could reference the GUID for that event
>     >> rather than repeating the event information for every
>     >> Occurrence.  The point here is that every shape (class of
>     >> resources) on this diagram at least has the POTENTIAL to be a
>     >> node connecting multiple resources and therefore should have the
>     >> capability of being assigned a GUID, having its own RDF record,
>     >> and being appropriately typed (presumably by a DwC class).  So
>     >> this is a final technical argument for why we need to have the
>     >> DwC class Individual.  Whether people ultimately choose to assign
>     >> GUIDs to particular resource types is their own choice, but they
>     >> need to at least be ABLE to if they need that resource to serve
>     >> as a node given the structure of their metadata.
>     >>
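
The rule of thumb in point 5 (a GUID is needed for any resource that is referenced by more than one other resource) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the identifiers (occ:1, event:trip-A, and so on) and the helper name needs_guid are made up, and only the dwc:xxxxID term names come from Darwin Core.

```python
from collections import defaultdict

# Hypothetical links of the form (referring resource, xxxxID term, target).
# The identifier values are invented for illustration only.
links = [
    ("occ:1", "dwc:eventID", "event:trip-A"),
    ("occ:2", "dwc:eventID", "event:trip-A"),
    ("occ:1", "dwc:individualID", "ind:oak-7"),
    ("occ:2", "dwc:individualID", "ind:oak-7"),
    ("ident:1", "dwc:taxonConceptID", "tc:quercus-alba"),
    ("occ:3", "dwc:eventID", "event:single"),
]


def needs_guid(links):
    """Return the targets referenced by more than one distinct resource."""
    referrers = defaultdict(set)
    for subject, _term, target in links:
        referrers[target].add(subject)
    return sorted(t for t, subjects in referrers.items() if len(subjects) > 1)


print(needs_guid(links))
```

In this made-up data set the collecting-trip event and the individual are each shared by two occurrences, so they act as nodes; the event used by a single occurrence could simply be folded into that occurrence record.
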
>     >> We need to clarify how the "token" thing fits in, but I'm
>     >> stopping there for now.  I would very much appreciate responses
>     >> indicating that:
>     >>
>     >> A. you agree with the diagram and connections (and consider this
>     >> definition and diagram a consensus)
>     >> B. you disagree with the diagram (and articulate why)
>     >> C. you provide an alternative diagram or explanation of the
>     >> relationships among the classes related to Occurrences.
>     >>
>     >> Thanks for your patience with another tome.
>     >> Steve
>     >>
>     >> --
>     >> Steven J. Baskauf, Ph.D., Senior Lecturer
>     >> Vanderbilt University Dept. of Biological Sciences
>     >>
>     >> postal mail address:
>     >> VU Station B 351634
>     >> Nashville, TN  37235-1634,  U.S.A.
>     >>
>     >> delivery address:
>     >> 2125 Stevenson Center
>     >> 1161 21st Ave., S.
>     >> Nashville, TN 37235
>     >>
>     >> office: 2128 Stevenson Center
>     >> phone: (615) 343-4582,  fax: (615) 343-6707
>     >> http://bioimages.vanderbilt.edu
>     >>
>     >> _______________________________________________
>     >> tdwg-content mailing list
>     >> tdwg-content at lists.tdwg.org <mailto:tdwg-content at lists.tdwg.org>
>     >> http://lists.tdwg.org/mailman/listinfo/tdwg-content
>     >> .
>     >>
>     >>
>     >>
>     >> --
>     >> ----------------------------------------------------------------
>     >> Pete DeVries
>     >> Department of Entomology
>     >> University of Wisconsin - Madison
>     >> 445 Russell Laboratories
>     >> 1630 Linden Drive
>     >> Madison, WI 53706
>     >> TaxonConcept Knowledge Base / GeoSpecies Knowledge Base
>     >> About the GeoSpecies Knowledge Base
>     >> ------------------------------------------------------------
>     >
> -- 
> ----------------------------------------------------------------
> Pete DeVries
> Department of Entomology
> University of Wisconsin - Madison
> 445 Russell Laboratories
> 1630 Linden Drive
> Madison, WI 53706
> TaxonConcept Knowledge Base <http://www.taxonconcept.org/> / 
> GeoSpecies Knowledge Base <http://lod.geospecies.org/>
> About the GeoSpecies Knowledge Base <http://about.geospecies.org/>
> ------------------------------------------------------------

Steven J. Baskauf, Ph.D., Senior Lecturer
Vanderbilt University Dept. of Biological Sciences

postal mail address:
VU Station B 351634
Nashville, TN  37235-1634,  U.S.A.

delivery address:
2125 Stevenson Center
1161 21st Ave., S.
Nashville, TN 37235

office: 2128 Stevenson Center
phone: (615) 343-4582,  fax: (615) 343-6707
