[tdwg-content] practical details of recording a determination What is an Occurrence?

Peter DeVries pete.devries at gmail.com
Mon Oct 18 19:57:15 CEST 2010

Hi Markus,

I feel your pain. :-)

Maybe an example might help clarify this.

I use the key* listed below to id my mosquitoes.

So I should mark up my RDF for the identification with something like:

<dwcterms:nameAccordingTo>Identification And Geographical Distribution Of
The Mosquitoes: Of North America, North Of Mexico By Richard F., Jr. Darsie
et al. 2004</dwcterms:nameAccordingTo>

Rather than use some other term like "dwc:identificationReferences"?
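
A minimal Python sketch of how that markup could be generated with the standard library. The subject URI and the function name are made up for illustration. The namespace is the Darwin Core terms namespace, bound here to the dwcterms prefix used above. Passing a different term name lets the two candidate terms be compared side by side.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DWC_NS = "http://rs.tdwg.org/dwc/terms/"

# Bind readable prefixes so the output resembles the snippet above.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dwcterms", DWC_NS)


def identification_rdf(subject_uri, citation, term="nameAccordingTo"):
    """Serialize one Darwin Core property of an identification as RDF/XML."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    description = ET.SubElement(
        root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": subject_uri}
    )
    prop = ET.SubElement(description, f"{{{DWC_NS}}}{term}")
    prop.text = citation
    return ET.tostring(root, encoding="unicode")


KEY = (
    "Identification And Geographical Distribution Of The Mosquitoes: "
    "Of North America, North Of Mexico By Richard F., Jr. Darsie et al. 2004"
)

# The subject URI is hypothetical; any identifier for the identification would do.
xml_text = identification_rdf("http://example.org/identification/1", KEY)
print(xml_text)
```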


- Pete

Identification And Geographical Distribution Of The Mosquitoes: Of North
America, North Of Mexico
By Richard F., Jr. Darsie, RONALD A. WARD, Chien C. Chang, Taina Litwak
University Press of Florida, 2004
ISBN: 0813027845
Cite: 13463


On Mon, Oct 18, 2010 at 12:19 PM, "Markus Döring (GBIF)"
<mdoering at gbif.org> wrote:

> I am sorry I don't have the time to follow this extensive thread, but I can
> manage at least the first paragraphs ;)
> A quick comment on tying identification sources to a scientific name. As
> for other taxon concepts this is usually done with the sec/sensu reference
> which should be recorded as dwc:nameAccordingTo:
> http://rs.tdwg.org/dwc/terms/index.htm#nameAccordingTo
> I am slightly irritated that we seem to have some term duplicates for this
> use case.
> Maybe dwc:identificationReferences is supposed to only list additional
> references?
> Markus
> On Oct 18, 2010, at 18:49, Steve Baskauf wrote:
> > I've fallen behind on systematically perusing the list responses, but I
> would like to focus in on a point that seems to be a consensus in the
> responses that have shown up recently.  The consensus seems to be that
> documenting determinations (a.k.a. instances of dwc:Identification class)
> that are applied to Individuals (or Occurrences if you don't believe in
> Individuals) is the way to go.  So in my usual graphical way of thinking
> about this, I would draw a "relationship line" from the determination to the
> Individual (or Occurrence) on one side and from the determination to the
> species concept on the other.  I will leave it up to the taxonomy people to
> decide which different things would be connected to the species concept and how all of
> their lines would be connected.  The determination would have any of the
> properties that are terms listed in the dwc:Identification class
> (identifiedBy, dateIdentified, identificationReferences,
> identificationRemarks, identificationQualifier, and typeStatus).  Some properties like
> dateIdentified and identificationReferences would be string literals and
> others (especially identifiedBy) should probably be GUIDs but could be
> literals if they had to be.
> >
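
The relationship described above (a determination linked to an Individual on one side and a species concept on the other) could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the class layout, the snake_case field names, and the example identifier are invented here, and only the six property names come from the dwc:Identification class.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Identification:
    """One determination: links an Individual to a taxon concept."""
    individual_id: str                      # GUID of the Individual (or Occurrence)
    scientific_name: str
    taxon_concept_id: Optional[str] = None  # GUID of the concept, if it is known
    identified_by: Optional[str] = None     # ideally a GUID, else a literal
    date_identified: Optional[str] = None   # string literal
    identification_references: Optional[str] = None
    identification_remarks: Optional[str] = None
    identification_qualifier: Optional[str] = None
    type_status: Optional[str] = None


def determinations_for(individual_id, identifications):
    """Return every determination applied to one Individual."""
    return [d for d in identifications if d.individual_id == individual_id]


# Hypothetical identifier; the name comes from the specimen label discussed
# below, where the determiner and date are not recorded, so they stay None.
label_determination = Identification(
    individual_id="urn:example:individual:1",
    scientific_name="Juncus diffusissimus",
)
print(determinations_for("urn:example:individual:1", [label_determination]))
```

Leaving the unknown fields as None keeps "not recorded" distinct from a guessed value such as the collector's name.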
> > That all seems pretty clear.  However, when I've started trying to do
> this in real life, I immediately have questions.  Take a look at
> > http://bioimages.vanderbilt.edu/rdf/examples/lsu000/0428.rdf which
> should show up as a web page in your browser.
> >
> > 1. The original label identifies the species as Juncus diffusissimus.
>  However, there is no indicator as to who originally identified it or when.
>  My assumption is that it was the collector (Glen N. Montz) but I don't
> really know that.  Do I assume that, or list the original determiner as
> "unknown"?
> > 2. Do we draw a distinction between the initial identification and
> subsequent annotations?  I think the answer should be "no" and that's why I
> refer to both generically as "determinations".
> > 3. There is really no indication given on the annotation labels as to
> many of the things that we would like to know, such as the concept they had
> in mind, any source they used (if any), or the reason why they did the
> annotation.  So how does one connect the name that they applied to the
> determination when there is no indication of the concept?  Is this just
> something we can't do for old annotations and just something that we try to
> do from this point forward?
> > 4. The last question is one that I really want some opinions about.
>  It seems to me that there are a number of reasons why one would apply a
> determination.  One would be to correct an actual error in identification.
>  One would be to increase the precision of a previous determination (e.g. an
> insect identified to family now is identified to species).  One would be to
> assert a difference in opinion as to the correct way to group this
> individual with others (i.e. as in a taxonomic revision).  Finally, a single
> determiner might apply several determinations to one individual and indicate
> in each determination the concept intended (i.e. if you subscribe to
> Cronquist, you'd call it X; if you like Radford's book, you'd call it Y; if
> you like Weakley's treatment, you'd call it Z).  Some of these four reasons
> may be functionally equivalent, but how would you use Darwin Core to
> indicate the reason why you applied the determination?  Please don't say
> "identificationRemarks"!  From a machine-processing standpoint, this is
> something we should know and there should be some kind of controlled
> vocabulary to express it.  For instance if an identification is "deprecated"
> because it was in error (perhaps by the determiner him/herself), one would
> like the incorrect determination to show up in the historical metadata, but
> I wouldn't want it to be listed in a website index.  The same would hold
> true if an annotator was able to pin the taxon down to a lower taxonomic
> level than the original identifier.  If someone goes to the trouble to
> connect an Individual/Occurrence to several names under alternative
> concepts, there should be a way that a machine would know this so that a
> software user could select the concept they wanted to use and the name under
> that concept would pop up.
> >
> > I don't really see any term under the current DwC that could be used to
> do this last thing.  Am I missing something?  Do we need several terms to
> explain the reason why we made the determination because the reasons fall
> into different categories?
> >
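
To make the question concrete, here is a Python sketch of what a controlled vocabulary for the reason behind a determination might allow a machine to do. Everything in it is an assumption: Darwin Core has no such term, and the vocabulary values, class names, and functions are invented for illustration. The placeholder names X, Y, and Z follow the example above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class DeterminationReason(Enum):
    # Hypothetical controlled values; not part of Darwin Core.
    INITIAL = "initial"
    ERROR_CORRECTION = "errorCorrection"
    INCREASED_PRECISION = "increasedPrecision"
    TAXONOMIC_REVISION = "taxonomicRevision"
    ALTERNATIVE_CONCEPT = "alternativeConcept"


# Reasons that push earlier determinations out of an index (but not out of history).
SUPERSEDING = {
    DeterminationReason.ERROR_CORRECTION,
    DeterminationReason.INCREASED_PRECISION,
}


@dataclass
class Determination:
    name: str
    reason: DeterminationReason
    concept: Optional[str] = None  # e.g. the treatment the determiner followed


def names_for_index(determinations):
    """Names to list in an index, given determinations in chronological order."""
    visible = []
    for det in determinations:
        if det.reason in SUPERSEDING:
            visible = []  # earlier names remain in the history list only
        visible.append(det.name)
    return visible


def name_under_concept(determinations, concept):
    """Most recent name applied under the concept a user selected, if any."""
    for det in reversed(determinations):
        if det.concept == concept:
            return det.name
    return None


history = [
    Determination("Culicidae", DeterminationReason.INITIAL),
    Determination("Aedes triseriatus", DeterminationReason.INCREASED_PRECISION),
]
alternatives = [
    Determination("X", DeterminationReason.ALTERNATIVE_CONCEPT, "Cronquist"),
    Determination("Y", DeterminationReason.ALTERNATIVE_CONCEPT, "Radford"),
    Determination("Z", DeterminationReason.ALTERNATIVE_CONCEPT, "Weakley"),
]
print(names_for_index(history))
print(name_under_concept(alternatives, "Radford"))
```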
> > The other comment that I'll throw out (since this is going out to the
> bioblitz list as well as to tdwg-content) is that those of you who are
> building apps to collect metadata in the field really need to separate the
> process of entering (or acquiring) the collection metadata from the
> determination process.  In at least some apps, the user immediately has to
> commit to a taxon as they enter the data at the time of collection.  It
> seems to me that it would be a very common situation (especially in the case
> of "citizen science") that the collector/observer/photographer would have no
> idea what the taxonomic identity was at the time of collection.  The process
> of determination (and the recording of the various dwc:Identification class
> terms) is really a separate process that should be able to happen at the
> time of collection OR later.
> >
> > Steve
> >
> > Peter DeVries wrote:
> >> Hi Steve,
> >>
> >> I would hypothesize that for the vast majority of identified records the
> process is something like this:
> >>
> >> 1) An individual uses some sort of key to determine what species (taxon
> concept) to assign to a given individual
> >>    * They may have created some sort of mental key in which once they
> recognize one individual mosquito they can then pretty quickly sort
> >>       a number of individuals into collections.
> >>
> >> 2) The actual name they assign to the specimen is usually based on what
> their key says the name is. Often this does not specify the authorship.
> >>     Most of these human identifiers have not read the original species
> descriptions for the species they are identifying.
> >>     So the specimen is actually tied to a concept that is based more on
> the "key" than the original description.
> >>     * An exception would be where there is a key in the original
> description and that was what was used.
> >>
> >> 3) So in a sense, the process of modeling this as if the
> identifier actually asserted that the concept was the same as that described
> by
> >>     the original description or a subsequent revision is "fudging"
> >>
> >> Side effects of this process include:
> >>
> >> 1) A new key for North American Mosquitoes comes out that incorporates
> recent changes in nomenclature. The major change being the elevation of
> >>     a subgenus to a genus. For most of the species described the "key
> concept" is unchanged.
> >>
> >> Student identifier, Bob, in state X is using the latest key, while
> student identifier, Joe, in state Z is using a slightly older edition of the
> same key.
> >>
> >> Bob identifies the species as Ochlerotatus triseriatus, while Joe
> identifies what should be the same species as Aedes triseriatus.
> >>
> >> These show up in GBIF on two different maps, they show up in the EOL as
> two different pages.
> >>
> >> Various TDWG'ers continue to argue that the original description and
> subsequent revisions were really important in determining what these
> individuals
> >> actually meant when they assigned a name to a specimen, and that this is
> how we should model it in excruciating detail.
> >>
> >> I would argue this should be modeled as best as possible to what
> actually happens.
> >>
> >> For example, how many of the species observed in the recent BioBlitz
> were identified by referring to the original species description or
> subsequent revisions?
> >>
> >> In your diagram, I would suggest that you show that a taxon concept may
> have many names associated with it. Since it is not clear what the
> identifier intended by his or her choice of a name, it is often difficult to
> determine what taxon concept they actually meant.
> >>
> >> This is why I advocate a move to a more taxon concept based identifier
> to link these data sets together because this allows the intent of the
> identifier
> >> to be more accurately modeled.
> >>
> >> This would be done in the form of:
> >>
> >>  "I assert that this specimen (of what I call Aedes triseriatus) was
> observed here. I also assert that it is an instance of this species
> concept => URI"
> >>
> >>   Or I assert that this is an individual of the type "Individual of
> species concept X" => URI
> >>
> >>   All of these are instances of the class "Individual"
> >>
> >> So the resulting DarwinCore record would contain both the name and
> an optional, but I think needed, asserted species concept.
> >>
> >> The species concept is a subclass of taxon concept, but is fundamentally
> different than the higher clades.
> >>
> >> There are some guidelines as to what an entity needs to be considered a
> species.
> >>
> >> While there are no real guidelines as to what clades should be
> considered genera and what clades should be considered families etc.
> >>
> >> Assigning properties at the level of genera or family is also
> problematic because it assumes that there will be inferencing and it will
> require rechecking
> >> that those properties are still valid if the species within that genus
> change.
> >>
> >> So if there is some property that is common to all the species in the
> genus, make that a property of each of the individual species - not a
> property
> >> of the genus.
> >>
> >> Respectfully,
> >>
> >> - Pete
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >> On Fri, Oct 15, 2010 at 10:45 AM, Steve Baskauf <
> steve.baskauf at vanderbilt.edu> wrote:
> >> As a background to this post, I want to reference a post by Bob called
> "SubclassOrNot".  I discovered this page on an early foray into the TDWG
> website labyrinth and it has been very influential on my thinking since
> then.  The idea Bob discusses is central to what I'm writing below so if you
> haven't read it you might want to do so first.  You can probably skip the
> "OWL Inference" section and still get the point which is described in the
> first two sections of his post.  The URL for the page is
> http://wiki.tdwg.org/twiki/bin/view/TAG/SubclassOrNot .
> >>
> >> To preface what I'm going to say below, I want to put Darwin Core
> Occurrences in the context of what Bob wrote.  In my mind, one of the
> hallmarks of the Darwin Core standard and one thing that makes it a great
> improvement over previous versions is that the decision was made to use what
> Bob called the "has a" approach rather than the "is a" approach.  In
> particular, the Darwin Core standard has a single class called
> dwc:Occurrence rather than subclasses called "Specimen", "Observation", and
> other possible things.  The way that we differentiate among different kinds
> of Occurrences is by using the DwC types which are the controlled values for
> the term dwc:basisOfRecord.  Thus we say an Occurrence "has a"
> basisOfRecord=PreservedSpecimen rather than saying it "is a"
> PreservedSpecimen.  We say an Occurrence "has a"
> basisOfRecord=HumanObservation rather than saying it "is a"
> HumanObservation.  This approach has greatly reduced the number of
> different terms in the standard since we don't have to have separate
> "ObservedBy" and "CollectedBy" terms, but rather can just have a single
> "RecordedBy" term that applies to both specimens and observations.  The same
> thing applies to many other things, like eventDate rather than DateCollected
> and DateObserved, locality rather than collectionLocality and
> observationLocality, etc.  With the ratification of Darwin Core, this
> decision is now a fait accompli and not a subject of discussion or something
> optional for users of the standard.  It also seems to be clear that as
> necessary new terms can be added to the DwC types which would then be valid
> controlled values for basisOfRecord.
> >>
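
The "has a" approach described above can be sketched in Python as a single record type whose kind is carried by a controlled basisOfRecord value. This is an illustration only: the class, the validation rule, and the sample field values are invented here, and the value list is limited to values named in this thread.

```python
from dataclasses import dataclass

# Partial list: only controlled values mentioned in this thread.
BASIS_OF_RECORD_VALUES = {"PreservedSpecimen", "LivingSpecimen", "HumanObservation"}


@dataclass
class Occurrence:
    """One class for every kind of Occurrence; the kind is a property."""
    occurrence_id: str
    basis_of_record: str
    recorded_by: str   # replaces separate CollectedBy / ObservedBy terms
    event_date: str    # replaces DateCollected / DateObserved
    locality: str

    def __post_init__(self):
        if self.basis_of_record not in BASIS_OF_RECORD_VALUES:
            raise ValueError(f"unknown basisOfRecord: {self.basis_of_record!r}")


# Made-up sample records; both kinds use exactly the same terms.
specimen = Occurrence("occ-1", "PreservedSpecimen", "A. Collector", "2010-10-15", "Site A")
sighting = Occurrence("occ-2", "HumanObservation", "A. Observer", "2010-10-15", "Site A")
print(specimen.basis_of_record, sighting.basis_of_record)
```

Adding a new kind of Occurrence then means adding one value to the controlled list rather than defining a new class with its own set of terms.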
> >> Since the adoption of the DwC standard, the approach to Occurrences has
> been what I would describe as "I know an Occurrence when I see one".  I
> consider this as a pretty sloppy practice and as I indicated in my post last
> night, I think there is enough consensus about what an Occurrence is that we
> can come up with a better definition than "an occurrence is the category of
> information pertaining to evidence of an occurrence...".  Another part of
> what I would characterize as sloppiness is the lack of a clear definition of
> what exactly basisOfRecord means.  When I wrote my attempt at summarizing
> consensus last night, I dodged the question about what I called the "token".
>  This "thing" has been called various names.  In the previous discussion on
> the list, it was sometimes called "the evidence" of the occurrence.  In the
> past I have called it "a representation" - however, I now think the term
> "token" is better because "representation" has a different technical meaning
> in the context of content negotiation.  When we type an Occurrence by saying
> it has a basisOfRecord=PreservedSpecimen, we are saying that this Occurrence
> has as supporting evidence, or as a "token" if you prefer, all or part of
> the dead remains of the organism (i.e. what I'm calling "the Individual")
> that was being documented by the Occurrence.  When we type an Occurrence by
> saying it has a basisOfRecord=LivingSpecimen, we are saying that this
> Occurrence has as a "token" the entire organism that was being documented
> (or some vegetative part of the live organism that was propagated).  When we
> type an Occurrence by saying it has a basisOfRecord=HumanObservation, we are
> saying that the Occurrence has no supporting evidence other than the
> reputation of the observer to accurately record the metadata about the
> Occurrence.  In other words, we "tag" an instance of a core class (to use
> Bob's words), Occurrence, by telling a metadata consumer what kind of token
> we are using as evidence of the Occurrence.
> >> A fundamental part of creating a clear definition of what an Occurrence
> is, is to define exactly what we are including in the concept of Occurrence.
>  One possibility is to (1) say that the two boxes at the right side of the
> diagram at http://bioimages.vanderbilt.edu/pages/occurrence-diagram.gif are fused and that both the Occurrence metadata and its associated token are
> what we consider to be "the Occurrence".  Another approach (2) would be to
> say that the actual Occurrence as an entity is only the metadata part and
> that the token is a separate thing.  A third approach is to say (3) that
> everything with the blue dotted lines is considered a part of the Occurrence
> (i.e. the metadata, the token, the event, and the locality).  I don't think
> in an absolute sense, any one of these approaches is "right".  The problem
> is that these approaches are used inconsistently, sometimes even by the same
> person, depending on the basisOfRecord.  Differences in ways of thinking
> about this issue are part of why people aren't understanding the way other
> people are approaching the structuring of metadata.  I have tried to
> consistently take the approach (1) that the two boxes on the right are
> fused, i.e. that the Occurrence metadata and the token should both be
> considered part of the entity that we call "an Occurrence".  I think this is
> why Rich was confused in
> http://lists.tdwg.org/pipermail/tdwg-content/2010-October/001666.html when
> I said that it was "wrong" to assert that a scientific name is a property of
> an Occurrence - obviously it is silly to say that the token (photons on a
> film, sound patterns in a digital file) has a scientific name.  Yet that is
> exactly what people do routinely when the token is a branch cut off a tree
> and glued to a piece of paper.  They say that they are "identifying a
> specimen".  What I am asking (actually demanding) is that the TDWG community
> get its act together and come to some consistency on this.  If we are going
> to take the approach (2), then we need to take specimens off their pedestal
> and treat them like we do any other token that we are using as evidence that
> an Occurrence happened.  If we are going to do what was suggested for the
> BioBlitz in
> http://lists.tdwg.org/pipermail/tdwg-content/2010-October/001603.html,
> i.e. to call Occurrences "observations" and then link the tokens to them by
> associatedMedia, ResourceRelationship, or some other means (approach 2) then
> do it consistently for every kind of token, including specimens, and don't
> single out media tokens for punishment.
> >> I have in a sense "thrown down the gauntlet" on this issue by proposing
> that DigitalStillImage be added as a DwC type and as a controlled value for
> basisOfRecord (http://code.google.com/p/darwincore/issues/detail?id=68).
>  I know what some people are going to say in response to this proposal.
>  "Why do you need to have 'DigitalStillImage' as a value for basisOfRecord
> when you can just say that the resource's dcterms:type=StillImage?"  The
> answer goes back to Bob's point.  If we are going to go the "has a" path
> (which we already have in DwC for Occurrences) rather than subclassing
> everything, then we need to provide an appropriate value for the "tag" for
> any type of resource that a reasonable number of users will want to use as a
> token.  I think it is clear from this and other Bioblitzes, my work in
> Bioimages, the whale tracking project, and many other examples, that there
> are plenty of people who are already using DigitalStillImages as tokens and
> we all need a controlled value to use for basisOfRecord.
> >> The other thing that we accomplish when we type an Occurrence by its
> basisOfRecord is to tell a consumer what kind of metadata to expect to get
> about the token in addition to the generic metadata that is provided for all
> Occurrences.  Thus for a LivingSpecimen we expect to be told what zoo,
> botanical garden, bacterial collection, etc. contains the specimen.  For a
> PreservedSpecimen we expect to be told the preparation type, the location of
> the repository, etc.  For a DigitalStillImage we expect to be told the file
> type, accessURL, etc.  Simply providing a value for dcterms:type=StillImage
> doesn't indicate whether the image is a physical one (i.e. on film) or a
> digital one.  It is also unreasonable to expect a client to have to be
> checking two different terms (basisOfRecord and dcterms:type) to find out
> what they could learn from one (basisOfRecord).  Of course it would be
> advisable to provide a value for dcterms:type as well for clients outside
> the biodiversity community who may not "understand" what basisOfRecord
> means.
> >> I hate to keep bringing my posts back to the RDF issue, but thinking
> about how one would write RDF forces clear thinking about how metadata
> should be structured.  If we intend to separate tokens as entities from
> their associated Occurrence metadata, i.e. approach (2), then we open up a
> whole other can of worms.  To associate the occurrence resources (i.e. the
> metadata) with the "different" resource (i.e. the token), we will
> probably have to be able to create URIs for the tokens and separate RDF
> metadata blocks which will have to be rdf:type'd.  What are we going to use
> for that rdf:type - create another Darwin Core class?  I simply don't think
> that is a complicated road that we want to travel.  It would be far easier
> to just say that every Occurrence has a one-to-one relationship with its
> token (which could be "the empty set" for observations).  This would not
> work for people who want to hang multiple tokens on a single observation
> event, but I think that itself is a bad idea because it makes it even harder
> to have "flat" occurrence datasets.  Just say that every time we collect a
> different token (or make an observation that has no token), it is a new
> Occurrence record.  Realistically, a single collector can't actually take a
> picture of a plant at the same time he or she collects it for a specimen
> anyway.  Those really should be considered two different events because they
> happen at different times.
> >>
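
The one-to-one rule proposed above (each token, or each token-less observation, becomes its own Occurrence record) could be sketched in Python like this. The function, the dictionary layout, and the sample values are hypothetical, and DigitalStillImage is used as a basisOfRecord value only because it is the value being proposed, not because it is already in the standard.

```python
def flatten_to_occurrences(shared, tokens):
    """Return one flat Occurrence record per token.

    `shared` holds fields common to all records (recorder, locality, ...).
    Each token is a dict with its own basisOfRecord, token reference, and
    eventDate, since tokens are collected at different times.
    With no tokens, a single HumanObservation record is returned.
    """
    if not tokens:
        return [{**shared, "basisOfRecord": "HumanObservation", "token": None}]
    return [{**shared, **token} for token in tokens]


# Made-up sample data.
shared = {"recordedBy": "A. Collector", "locality": "Site A"}
tokens = [
    {"basisOfRecord": "DigitalStillImage", "token": "image-1", "eventDate": "2010-10-15T10:00"},
    {"basisOfRecord": "PreservedSpecimen", "token": "sheet-1", "eventDate": "2010-10-15T10:05"},
]

records = flatten_to_occurrences(shared, tokens)
for record in records:
    print(record)
```

Each resulting record is a flat row with a single token, which is the property that keeps such datasets easy to exchange as tables.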
> >> OK, enough said.  Consider this my defense of my proposal "issue 68" to
> add DigitalStillImage.  I would urge the powers that be to respond to the
> issues that I've raised here before having any kind of "vote" (or whatever
> is ultimately going to happen when there is an up or down decision about the
> proposal).
> >>
> >> Steve
> >>
> >> Steve Baskauf wrote:
> >> After the flurry of emails recently, I had an opportunity to carefully
> >> read all the way through the threads again, followed by enforced "think
> >> time" during my long commute.  I was actually pretty cheerful after that
> >> because I think that in essence, most of the conversation about what
> >> constitutes an Occurrence really boils down to the same thing.  So I
> >> have sat down and tried to summarize what seems to me to be a consensus
> >> about Occurrences.  To follow my points, please refer to the diagram at:
> >> http://bioimages.vanderbilt.edu/pages/occurrence-diagram.gif
> >>
> >> Consensus on relationships
> >> 1. The fundamental definition of an Occurrence involves evidence that a
> >> representative of a taxon occurred at a place and time.
> >> Note 1.A: For clarity, I have modified John's statement in his last
> >> email by replacing "taxon" with "representative of a taxon".  I'm
> >> considering a taxon to be an abstract concept that is applied to
> >> individuals or groups of organisms.
> >> Note 1.B. This definition is far more useful than the official
> >> definition of the class Occurrence "The category of information
> >> pertaining to evidence of an occurrence..." which is essentially
> circular.
> >> Note 1.C: This statement is extremely broad because the evidence could
> >> be of many sorts, the representative could range from a single
> >> individual to all organisms on the earth, the taxon could be anyone's
> >> definition at any taxonomic level, the place could range from a GPS
> >> point with uncertainty of less than 10 meters to the entire planet
> >> earth, and the time could range from a shutter click of less than one
> >> second to 3.4 billion years.
> >> 2. The diagram is an attempt to summarize in pictorial form statements
> >> and relationships that have been described in the thread.  The taxon
> >> representative is recorded as existing at a particular time and place
> >> (the arrow) and the result is an Occurrence record.  That Occurrence
> >> record exists as metadata which may be associated with a token that can
> >> be used to voucher the fact that the taxon representative existed.  That
> >> token may be the organism itself (or a living part of it as in a twig
> >> for grafting), all or part of the organism in preserved form, an
> >> electronic representation such as an image or sound recording, and other
> >> kinds of things like tissue or DNA samples.  There may also be no token
> >> at all, in which case we call the Occurrence record an observation.
> >> Based on direct observation of the taxon representative, examination of
> >> one or more tokens, or both, some determiner asserts that a taxon
> >> concept applies to the taxon representative and as a result a scientific
> >> name can be used to "identify" the taxon representative.  (There may be
> >> a lot of other complicated stuff above the Identification box, but that
> >> will have to be filled in by the taxonomists.)
> >> Note 2.A: I have mapped onto this diagram the letters that John used in
> >> his last email to refer to entities that are involved in an Occurrence
> >> (T, E, L, O, and G).  I will beg the forgiveness of fossil people
> >> because I don't really know how the geological context fits in.  I'm
> >> assuming that it is a way of asserting time and location on a much
> >> broader scale than we do for extant organisms.
> >> Note 2.B: I have put a dotted line around the part of the diagram that I
> >> think includes all the things that people might consider part of the
> >> Occurrence itself.  I have left out "T" and the other parts related to
> >> identification because it seems to me that you can have an occurrence
> >> that you document which does not yet (and perhaps never will) have an
> >> identification.  The Occurrence still asserts that a taxon
> >> representative existed at a time and place; we just don't yet know what
> >> the taxon is.
> >> 3. The red lines indicate the relationships that connect the various
> >> entities (I'm going to go ahead and call them resources).  Consistent
> >> with popular opinion, the Occurrence record is the center of the
> >> universe and most things are connected to it.
> >> Note 3.A: I am sticking to my guns and refuse to connect the
> >> Identification directly to the Occurrence.  It is the taxon
> >> representative that is being identified, not the occurrence.  One can
> >> assert another sort of relationship between the identification and the
> >> occurrence if one wants to say that one consulted the occurrence
> >> metadata and token in order to decide about the identification, but it
> >> is not correct to say that the Identification identifies either the
> >> Occurrence metadata or the token (as Rich pointed out).
> >>
> >> OK, so that's step one - defining what is related to what.  If anyone
> >> disagrees with these relationships, please clarify or create your own
> >> diagram.
> >>
> >> Complicating circumstances/caveats
> >> 1. It is noted and recognized that some users will not care to include
> >> all of these relationships in their models.  In the interest of
> >> simplification or "flattening" the relationships, they may wish to
> >> collapse some parts of this diagram (e.g. incorporate time and location
> >> metadata within the Occurrence metadata rather than considering them
> >> separate resources, applying scientific names directly to the taxon
> >> representatives without defining a taxon concept or recording the
> >> determination metadata, connecting identifications directly to the
> >> occurrence, etc.).  This doesn't mean that the relationships don't
> >> exist, it just means that some users don't care about them.
> >> 2. It is recognized that different users will be interested in or able
> >> to specify the various resources to differing degrees of precision.
> >> Examples: A photographer might record times to the nearest second, a
> >> collector may only be interested in noting the date on which a specimen
> >> was collected.  A location may be specified to the precision of a GPS
> >> reading or be defined as some geographic or political subdivision.  The
> >> taxon representative may be an individual organism, a flock or clump, or
> >> some larger aggregation of taxon representatives.
> >>
> >> That's step two.  If I've missed any complications, please point them
> out.
> >>
> >> My opinions about the implications of this diagram
> >> 1. The circle I've labeled as "taxon representative" is the resource
> >> type that I'm proposing to be represented by the class Individual.  You
> >> will note that in both the definition of dwc:individualID ("An
> >> identifier for an individual or named group of individual organisms...")
> >> and the proposed class definition ("The category of information
> >> pertaining to an individual or named group of individual organisms
> >> represented in an Occurrence"), groups of individual organisms are
> >> included.  Thus John's example of a fossil having myriad individuals, or
> >> Richard's examples of thousands of plankton, a large school of fish,
> >> herd of wildebeest, flock of
> >> birds, could all be categorized as "Individual" under this definition if
> >> there is a reasonable expectation that all of the individuals in the
> >> group are members of the same taxon.  Perhaps there is a better name for
> >> this resource, but since dwc:individualID was already extant, I chose
> >> Individual as the class name for consistency with the pattern
> >> established with other classes and their associated xxxxID terms.
> >> 2. Although in note 1.C. I have given the ranges of the various
> >> resources to their logical extreme (as was done previously in the
> >> thread), I think that as a practical matter we can adopt guidelines to
> >> set reasonable values for the "normal" ranges of the resources.  One
> >> such guideline might be that we suggest a range that can accommodate
> >> about 95% of the user needs within the community (this came from Rich's
> >> comment about satisfying 95% of the user need with an establishmentMeans
> >> controlled vocabulary).  For example, it was suggested that the range
> >> for the location of an Occurrence could span the entire planet Earth.
> >> True enough, but virtually nobody would find such a span useful.  95% of
> >> users would probably find a range between a GPS reading with 10 meter
> >> precision and the extent of a county or province useful for recording
> >> the location of an Occurrence.  I can suggest similar "useful" ranges:
> >> one second to one day for an event time (excluding fossils), one
> >> individual organism to the number of organisms that would fit within a
> >> 50 meter radius for an "individual", and taxon identified to family for
> >> plants and maybe mammals, genus for birds, and order for insects.  So
> >> framing the definition of an Occurrence in these terms it would be
> >> something like: "An occurrence involves evidence (consisting of a
> >> physical token, electronic record, or personal observation) that a
> >> representative (ranging from a single individual to the number that
> >> would fit on a football field) of a taxon (hopefully identified to some
> >> lower taxonomic level) occurred at a place (determined to a precision
> >> between that of a GPS reading and the size of a county/province) and
> >> time (spanning one second to one day)."  A few people might object to
> >> this level of restrictiveness, but I would guess that it would make 95%
> >> of us happy.
> >> 3. With the exception of the "missing" class Individual, every resource
> >> type on this diagram except for the "token" and Scientific name has a
> >> Darwin Core class. Every resource type on the diagram except for "token"
> >> has a dwc:xxxxID term that can be used to refer to a GUID for the
> >> resource.  The implication of this is that any resource on this diagram
> >> except for the token and taxon representative (i.e. Individual) is ready
> >> to be represented in RDF by Darwin Core terms in the sense that the
> >> relationships (red lines) can be represented by the xxxxID terms and
> >> that the resources can be rdf:type'd using Darwin Core classes.
> >> (Lacking a class for the scientific name doesn't seem like a big deal to
> >> me since the scientific name can be a string literal - but then I'm not
> >> a taxonomist.)
> >> 4. OK, I've avoided it as long as I can, so I'm going to confess now to
> >> the RDF-phobes.  The red lines and shapes are something pretty close to
> >> an RDF graph.  What that means is that if the community can agree that
> >> this diagram correctly represents the relationships among the kinds of
> >> biodiversity resources that we care about, then the matter of providing
> >> guidelines on how to represent Darwin Core in RDF suddenly gets a lot
> >> simpler.  Just convert the "picture" of the RDF graph into XML format
> >> and we have a template.  Alright, that's an oversimplification, but I
> >> think it is essentially true because the most difficult part of
> >> achieving a consensus on RDF representations is to decide how we connect
> >> the resource types, not on the literals that we hang onto resources as
> >> properties.
> >> 5. While I'm beating the RDF drum again, the importance of my opinion
> >> number 2 can be extended into the GUID adoption process.  In my comments
> >> to Kevin about the Beginner's Guide to Persistent Identifiers, I think I
> >> commented on the question of how one decides whether a GUID needs to be
> >> assigned to something or not.  I believe that the answer to that
> >> question boils down to this: we need a GUID for any resource that will
> >> be referenced by more than one other resource.  Do we need to be able to
> >> assign a GUID to Taxon concepts?  Yes, because it is likely that many
> >> identifications will want to reference a particular taxon concept.  Do
> >> we need to be able to assign a GUID to an Event?  Maybe or maybe not.
> >> If every occurrence has its own separate time recorded, then no GUID is
> >> needed because the time is just a part of every separate occurrence
> >> record.  If the event is defined to be a time range that represents a
> >> collecting trip, then there may be many Occurrences that are associated
> >> with that trip and all of them could reference the GUID for that event
> >> rather than repeating the event information for every Occurrence.  The
> >> point here is that every shape (class of resources) on this diagram at
> >> least has the POTENTIAL to be a node connecting multiple resources and
> >> therefore should have the capability of being assigned a GUID, having
> >> its own RDF record, and being appropriately typed (presumably by a DwC
> >> class).  So this is a final technical argument for why we need to have
> >> the DwC class Individual.  Whether people ultimately choose to
> >> assign GUIDs to particular resource types is their own choice,
> >> but they need to at least be ABLE to if they need that resource to serve
> >> as a node given the structure of their metadata.
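
The GUID rule of thumb in point 5 (a resource needs a GUID once more than one other resource references it) can be sketched in a few lines of Python. This is only an illustration: the resource names and the (referrer, target) pair layout are hypothetical and are not part of Darwin Core.

```python
from collections import defaultdict


def resources_needing_guid(references):
    """Return the targets referenced by more than one distinct other resource.

    `references` is an iterable of (referrer, target) pairs. The pair layout
    is an assumption made for this sketch, not a Darwin Core structure.
    """
    referrers = defaultdict(set)
    for source, target in references:
        if source != target:  # a self-reference does not count
            referrers[target].add(source)
    return {target for target, sources in referrers.items() if len(sources) > 1}


# Hypothetical data: two Occurrences share a collecting-trip Event, one
# Occurrence has its own Event, and two Identifications share a Taxon concept.
references = [
    ("occurrence-1", "event-trip-A"),
    ("occurrence-2", "event-trip-A"),
    ("occurrence-3", "event-solo"),
    ("identification-1", "taxon-concept-X"),
    ("identification-2", "taxon-concept-X"),
]

print(sorted(resources_needing_guid(references)))
# → ['event-trip-A', 'taxon-concept-X']
```

The event used by a single Occurrence drops out, which matches the "maybe or maybe not" answer for Events: its details can simply live in that one Occurrence record.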
> >>
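Points 3 and 4 describe typing each resource with a Darwin Core class and linking resources through the dwc:xxxxID terms. A minimal sketch of that idea follows, using plain Python tuples rather than an RDF library. The example.org identifiers are hypothetical, and the particular links chosen here are an assumption for illustration, not a transcription of the diagram.

```python
DWC = "http://rs.tdwg.org/dwc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/"  # hypothetical namespace for the example GUIDs


def typed(subject, dwc_class):
    """Triple stating that a resource is an instance of a Darwin Core class."""
    return (BASE + subject, RDF_TYPE, DWC + dwc_class)


def link(subject, id_term, obj):
    """Triple connecting two resources through a dwc:xxxxID term."""
    return (BASE + subject, DWC + id_term, BASE + obj)


# Assumed connections: an Occurrence points at its Event, and an
# Identification points at both the Occurrence and a Taxon.
triples = [
    typed("occ/1", "Occurrence"),
    typed("event/7", "Event"),
    typed("ident/3", "Identification"),
    typed("taxon/42", "Taxon"),
    link("occ/1", "eventID", "event/7"),
    link("ident/3", "occurrenceID", "occ/1"),
    link("ident/3", "taxonID", "taxon/42"),
]


def to_ntriples(triples):
    """Serialize (subject, predicate, object) URI triples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


print(to_ntriples(triples))
```

Literal properties such as dates or coordinates are left out on purpose, since the argument above is that agreeing on the connections is the hard part.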
> >> We need to clarify how the "token" thing fits in, but I'm stopping there
> >> for now.  I would very much appreciate responses indicating that:
> >>
> >> A. you agree with the diagram and connections (and consider this
> >> definition and diagram a consensus)
> >> B. you disagree with the diagram (and articulate why)
> >> C. you provide an alternative diagram or explanation of the
> >> relationships among the classes related to Occurrences.
> >>
> >> Thanks for your patience with another tome.
> >> Steve
> >>
> >> --
> >> Steven J. Baskauf, Ph.D., Senior Lecturer
> >> Vanderbilt University Dept. of Biological Sciences
> >>
> >> postal mail address:
> >> VU Station B 351634
> >> Nashville, TN  37235-1634,  U.S.A.
> >>
> >> delivery address:
> >> 2125 Stevenson Center
> >> 1161 21st Ave., S.
> >> Nashville, TN 37235
> >>
> >> office: 2128 Stevenson Center
> >> phone: (615) 343-4582,  fax: (615) 343-6707
> >> http://bioimages.vanderbilt.edu
> >>
> >> _______________________________________________
> >> tdwg-content mailing list
> >> tdwg-content at lists.tdwg.org
> >> http://lists.tdwg.org/mailman/listinfo/tdwg-content
> >>
> >> --
> >> ----------------------------------------------------------------
> >> Pete DeVries
> >> Department of Entomology
> >> University of Wisconsin - Madison
> >> 445 Russell Laboratories
> >> 1630 Linden Drive
> >> Madison, WI 53706
> >> TaxonConcept Knowledge Base / GeoSpecies Knowledge Base
> >> About the GeoSpecies Knowledge Base
> >> ------------------------------------------------------------
> >

Pete DeVries
Department of Entomology
University of Wisconsin - Madison
445 Russell Laboratories
1630 Linden Drive
Madison, WI 53706
TaxonConcept Knowledge Base <http://www.taxonconcept.org/> / GeoSpecies
Knowledge Base <http://lod.geospecies.org/>
About the GeoSpecies Knowledge Base <http://about.geospecies.org/>