[tdwg-content] practical details of recording a determination What is an Occurrence?

Peter DeVries pete.devries at gmail.com
Mon Oct 18 20:27:17 CEST 2010


Hi Steve,

You need to fix this in two ways (independent of the vocab, which I did not
check)

1) It should show up correctly in URIburner.

http://linkeddata.uriburner.com/about/html/http/bioimages.vanderbilt.edu/rdf/examples/lsu000/0428.rdf

2) In the description of the RDF itself (in your example it is at the bottom),
you need to make a foaf:topic link between that element and each of the
entities that start with "rdf:about". This will allow you to find the
actual RDF page that describes these. To get the link back from the entity
to the page, add a "foaf:page" that points back to the RDF.

Remember that in the cloud or in your triple store, entities like
<http://www.cyberfloralouisiana.com/specimens/lsu000/0428#ind> are not tied
to the RDF that contains statements about them without some link to and
from the page
<http://bioimages.vanderbilt.edu/rdf/examples/lsu000/0428.rdf>

* You could get the same result by using "dcterms:references" and its
inverse "dcterms:isReferencedBy", but let me run that past someone to see if
it is equally accepted.

Here is an abbreviated version of what this might look like:

<rdf:Description rdf:about="http://bioimages.vanderbilt.edu/rdf/examples/lsu000/0428.rdf">
  <dcterms:description>RDF formatted description of the preserved specimen http://www.cyberfloralouisiana.com/specimens/lsu000/0428</dcterms:description>
  <dcterms:modified>2010-09-25T06:35:58</dcterms:modified>
  <xmp:MetadataDate>2010-09-25T06:35:58</xmp:MetadataDate>
  <foaf:topic rdf:resource="http://www.cyberfloralouisiana.com/specimens/lsu000/0428#ind"/>
  <foaf:topic rdf:resource="http://www.cyberfloralouisiana.com/specimens/lsu000/0428#39265b"/>
  <foaf:topic rdf:resource="http://www.cyberfloralouisiana.com/specimens/lsu000/0428#39231b"/>
  <foaf:topic rdf:resource="http://www.cyberfloralouisiana.com/specimens/lsu000/0428#39231a"/>
  <foaf:topic rdf:resource="http://www.cyberfloralouisiana.com/specimens/lsu000/0428#39265a"/>
  <foaf:topic rdf:resource="http://www.cyberfloralouisiana.com/specimens/lsu000/0428"/>
  <foaf:topic rdf:resource="http://www.cyberfloralouisiana.com/specimens/lsu000/0428#img"/>
  <foaf:topic rdf:resource="http://www.cyberfloralouisiana.com/specimens/lsu000/0428#bq"/>
  <foaf:topic rdf:resource="http://www.cyberfloralouisiana.com/specimens/lsu000/0428#tn"/>
  <foaf:topic rdf:resource="http://www.cyberfloralouisiana.com/specimens/lsu000/0428#lq"/>
  <foaf:topic rdf:resource="http://www.cyberfloralouisiana.com/specimens/lsu000/0428#gq"/>
</rdf:Description>

<rdf:Description rdf:about="http://www.cyberfloralouisiana.com/specimens/lsu000/0428#ind">
  <foaf:page rdf:resource="http://bioimages.vanderbilt.edu/rdf/examples/lsu000/0428.rdf"/>
</rdf:Description>

Following this pattern, your RDF will be browsable as in this example:

http://linkeddata.uriburner.com/about/html/http/lod.taxonconcept.org/rdf/area_example.rdf

Note how you can click back and forth between the location and the RDF that
describes it.
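
The reciprocal links described above could also be generated mechanically
rather than typed by hand. Here is a minimal sketch in Python, assuming you
already have the list of entity URIs. The function name
link_page_and_topics is made up for illustration, and it writes N-Triples
rather than RDF/XML just to keep the example short:

```python
FOAF = "http://xmlns.com/foaf/0.1/"


def link_page_and_topics(page_uri, entity_uris):
    """Return N-Triples lines linking an RDF document to the entities it describes.

    For each entity, emit  page --foaf:topic--> entity  and the link back,
    entity --foaf:page--> page.  Repeated entity URIs are emitted only once.
    (Hypothetical helper, not part of any library.)
    """
    lines = []
    seen = set()
    for entity in entity_uris:
        if entity in seen:
            continue
        seen.add(entity)
        lines.append(f"<{page_uri}> <{FOAF}topic> <{entity}> .")
        lines.append(f"<{entity}> <{FOAF}page> <{page_uri}> .")
    return lines


if __name__ == "__main__":
    page = "http://bioimages.vanderbilt.edu/rdf/examples/lsu000/0428.rdf"
    base = "http://www.cyberfloralouisiana.com/specimens/lsu000/0428"
    for line in link_page_and_topics(page, [base, base + "#ind", base + "#img"]):
        print(line)
```

Each entity gets exactly one topic link from the page and one page link
back, which is what lets a browser hop in both directions.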

- Pete

On Mon, Oct 18, 2010 at 11:49 AM, Steve Baskauf <
steve.baskauf at vanderbilt.edu> wrote:

>  I've fallen behind on systematically perusing the list responses, but I
> would like to focus in on a point that seems to be a consensus in the
> responses that have shown up recently.  The consensus seems to be that
> documenting determinations (a.k.a. instances of dwc:Identification class)
> that are applied to Individuals (or Occurrences if you don't believe in
> Individuals) is the way to go.  So in my usual graphical way of thinking
> about this, I would draw a "relationship line" from the determination to the
> Individual (or Occurrence) on one side and from the determination to the
> species concept on the other.  I will leave it up to the taxonomy people to
> decide which things would be connected to the species concept and how all of
> their lines would be connected.  The determination would have any of the
> properties that are terms listed in the dwc:Identification class
> (identifiedBy, dateIdentified, identificationReferences,
> identificationRemarks, identificationQualifier, and typeStatus).  Some properties like
> dateIdentified and identificationReferences would be string literals and
> others (especially identifiedBy) should probably be GUIDs but could be
> literals if they had to be.
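
To make the "relationship lines" above concrete, here is a rough sketch in
Python of a determination as a record sitting between an Individual (or
Occurrence) and a taxon concept. The property fields mirror the
dwc:Identification terms listed above. The two link fields
(individual_id, taxon_concept_id), the helper function, and the example
concept identifier are assumptions made for illustration only:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Identification:
    """One determination: links an Individual (or Occurrence) to a taxon concept."""
    individual_id: str                       # identifier of the Individual/Occurrence (assumed field)
    taxon_concept_id: str                    # identifier of the species concept (assumed field)
    identified_by: Optional[str] = None      # ideally a GUID, a literal if need be
    date_identified: Optional[str] = None
    identification_references: Optional[str] = None
    identification_remarks: Optional[str] = None
    identification_qualifier: Optional[str] = None
    type_status: Optional[str] = None


def determinations_for(individual_id, identifications):
    """Return every determination applied to one individual, in the order given."""
    return [i for i in identifications if i.individual_id == individual_id]


if __name__ == "__main__":
    ind = "http://www.cyberfloralouisiana.com/specimens/lsu000/0428#ind"
    # Placeholder concept identifier; identified_by is left unset because
    # the label does not say who made the original identification.
    original = Identification(ind, "urn:example:concept:juncus-diffusissimus")
    print(determinations_for(ind, [original]))
```

Nothing here distinguishes the first identification from later annotations.
They are all just Identification records pointing at the same individual.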
>
> That all seems pretty clear.  However, when I've started trying to do this
> in real life, I immediately have questions.  Take a look at
> http://bioimages.vanderbilt.edu/rdf/examples/lsu000/0428.rdf which should
> show up as a web page in your browser.
>
> 1. The original label identifies the species as Juncus diffusissimus.
> However, there is no indicator as to who originally identified it or when.
> My assumption is that it was the collector (Glen N. Montz) but I don't
> really know that.  Do I assume that, or list the original determiner as
> "unknown"?
> 2. Do we draw a distinction between the initial identification and
> subsequent annotations?  I think the answer should be "no" and that's why I
> refer to both generically as "determinations".
> 3. There is really no indication given on the annotation labels as to many
> of the things that we would like to know, such as the concept they had in
> mind, any source they used (if any), or the reason why they did the
> annotation.  So how does one connect the name that they applied to the
> determination when there is no indication of the concept?  Is this just
> something we can't do for old annotations and just something that we try to
> do from this point forward?
> 4. The last question is one that I really want some opinions about.  It
> seems to me that there are a number of reasons why one would apply a
> determination.  One would be to correct an actual error in identification.
> One would be to increase the precision of a previous determination (e.g. an
> insect identified to family now is identified to species).  One would be to
> assert a difference in opinion as to the correct way to group this
> individual with others (i.e. as in a taxonomic revision).  Finally, a single
> determiner might apply several determinations to one individual and indicate
> in each determination the concept intended (i.e. if you subscribe to
> Cronquist, you'd call it X; if you like Radford's book, you'd call it Y; if
> you like Weakley's treatment, you'd call it Z).  Some of these four reasons
> may be functionally equivalent, but how would you use Darwin Core to
> indicate the reason why you applied the determination?  Please don't say
> "identificationRemarks"!  From a machine-processing standpoint, this is
> something we should know and there should be some kind of controlled
> vocabulary to express it.  For instance if an identification is "deprecated"
> because it was in error (perhaps by the determiner him/herself), one would
> like the incorrect determination to show up in the historical metadata, but
> I wouldn't want it to be listed in a website index.  The same would hold
> true if an annotator was able to pin the taxon down to a lower taxonomic
> level than the original identifier.  If someone goes to the trouble to
> connect an Individual/Occurrence to several names under alternative
> concepts, there should be a way that a machine would know this so that a
> software user could select the concept they wanted to use and the name under
> that concept would pop up.
>
> I don't really see any term under the current DwC that could be used to do
> this last thing.  Am I missing something?  Do we need several terms to
> explain the reason why we made the determination because the reasons fall
> into different categories?
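
As a thought experiment only, this is roughly what the machine processing
asked for above might look like if such a controlled vocabulary existed.
Everything in this Python sketch is hypothetical: DeterminationReason and
its values, the Determination fields, and both functions are invented here
and are not Darwin Core terms:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class DeterminationReason(Enum):
    """Hypothetical vocabulary drawn from the four reasons listed in the message."""
    CORRECTS_ERROR = "correctsError"
    INCREASES_PRECISION = "increasesPrecision"
    TAXONOMIC_OPINION = "taxonomicOpinion"
    ALTERNATIVE_CONCEPT = "alternativeConcept"


@dataclass
class Determination:
    name: str
    reason: Optional[DeterminationReason] = None  # None when no reason was recorded
    concept_source: str = ""                      # treatment followed, if stated
    superseded: bool = False                      # True once a later determination replaces it


def names_for_index(determinations):
    """Names to show in a website index: every determination not superseded."""
    return [d.name for d in determinations if not d.superseded]


def name_under_concept(determinations, concept_source):
    """Name applied under the treatment the user selected, or None if there is none."""
    for d in determinations:
        if (d.reason is DeterminationReason.ALTERNATIVE_CONCEPT
                and d.concept_source == concept_source):
            return d.name
    return None
```

A superseded determination stays in the list, so it remains in the
historical metadata, but names_for_index leaves it out.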
>
> The other comment that I'll throw out (since this is going out to the
> bioblitz list as well as to tdwg-content) is that those of you who are
> building apps to collect metadata in the field really need to separate the
> process of entering (or acquiring) the collection metadata from the
> determination process.  In at least some apps, the user immediately has to
> commit to a taxon as they enter the data at the time of collection.  It
> seems to me that it would be a very common situation (especially in the case
> of "citizen science") that the collector/observer/photographer would have no
> idea what the taxonomic identity was at the time of collection.  The process
> of determination (and the recording of the various dwc:Identification class
> terms) is really a separate process that should be able to happen at the
> time of collection OR later.
>
> Steve
>
> Peter DeVries wrote:
>
> Hi Steve,
>
>  I would hypothesize that for the vast majority of identified records the
> process is something like this:
>
>  1) An individual uses some sort of key to determine what species (taxon
> concept) to assign to a given individual
>    * They may have created some sort of mental key in which once they
> recognize one individual mosquito they can then pretty quickly sort
>       a number of individuals into collections.
>
>  2) The actual name they assign to the specimen is usually based on what
> their key says the name is. Often this does not specify the authorship.
>     Most of these human identifiers have not read the original species
> descriptions for the species they are identifying.
>     So the specimen is actually tied to a concept that is based more on the
> "key" than the original description.
>     * An exception would be where there is a key in the original
> description and that was what was used.
>
>  3) So in a sense, modeling this as if the identifier actually asserted
> that the concept was the same as that described by the original
> description or a subsequent revision is "fudging".
>
>  Side effects of this process include:
>
>  1) A new key for North American Mosquitoes comes out that incorporates
> recent changes in nomenclature. The major change being the elevation of
>     a subgenus to a genus. For most of the species described the "key
> concept" is unchanged.
>
>  Student identifier, Bob, in state X is using the latest key, while
> student identifier, Joe, in state Z is using a slightly older edition of the
> same key.
>
>  Bob identifies the species as *Ochlerotatus triseriatus*, while Joe
> identifies what should be the same species as *Aedes triseriatus*.
>
>  These show up in GBIF on two different maps, they show up in the EOL as
> two different pages.
>
>  Various TDWG'ers continue to argue that the original description and
> subsequent revisions were really important in determining what these
> individuals
> actually meant when they assigned a name to a specimen, and that this is
> how we should model it in excruciating detail.
>
>  I would argue this should be modeled as best as possible to what actually
> happens.
>
>  For example, how many of the species observed in the recent BioBlitz were
> identified by referring to the original species description or subsequent
> revisions?
>
>  In your diagram, I would suggest that you show that a taxon concept may
> have many names associated with it. Since it is not clear what the
> identifier intended by his or her choice of a name, it is often difficult to
> determine what taxon concept they actually meant.
>
>  This is why I advocate a move to a more taxon-concept-based identifier to
> link these data sets together, because this allows the intent of the
> human identifier to be more accurately modeled.
>
>  This would be done in the form of:
>
>   "I assert that this specimen (of what I call *Aedes triseriatus*) was
> observed here. I also assert that it is an instance of this species
> concept => URI"
>
>    Or I assert that this is an individual of the type "Individual of
> species concept X" = > URI
>
>    All of these are instances of the class "Individual"
>
>  So the resulting DarwinCore record would contain both the name and an
> optional, but I think needed, asserted species concept.
>
>  The species concept is a subclass of taxon concept, but is fundamentally
> different than the higher clades.
>
>  There are some guidelines as to what an entity needs to be considered a
> species.
>
>  While there are no real guidelines as to what clades should be considered
> genera and what clades should be considered families etc.
>
>  Assigning properties at the level of genera or family is also problematic
> because it assumes that there will be inferencing and it will require
> rechecking
> that those properties are still valid if the species within that genus
> change.
>
>  So if there is some property that is common to all the species in the
> genus, make that a property of each of the individual species - not a
> property
> of the genus.
>
>  Respectfully,
>
>  - Pete
>
>
>
>
>
>
>
> On Fri, Oct 15, 2010 at 10:45 AM, Steve Baskauf <
> steve.baskauf at vanderbilt.edu> wrote:
>
>> As a background to this post, I want to reference a post by Bob called
>> "SubclassOrNot".  I discovered this page on an early foray into the TDWG
>> website labyrinth and it has been very influential on my thinking since
>> then.  The idea Bob discusses is central to what I'm writing below so if you
>> haven't read it you might want to do so first.  You can probably skip the
>> "OWL Inference" section and still get the point which is described in the
>> first two sections of his post.  The URL for the page is
>> http://wiki.tdwg.org/twiki/bin/view/TAG/SubclassOrNot .
>>
>> To preface what I'm going to say below, I want to put Darwin Core
>> Occurrences in the context of what Bob wrote.  In my mind, one of the
>> hallmarks of the Darwin Core standard and one thing that makes it a great
>> improvement over previous versions is that the decision was made to use what
>> Bob called the "has a" approach rather than the "is a" approach.  In
>> particular, the Darwin Core standard has a single class called
>> dwc:Occurrence rather than subclasses called "Specimen", "Observation", and
>> other possible things.  The way that we differentiate among different kinds
>> of Occurrences is by using the DwC types which are the controlled values for
>> the term dwc:basisOfRecord.  Thus we say an Occurrence "has a"
>> basisOfRecord=PreservedSpecimen rather than saying it "is a"
>> PreservedSpecimen.  We say an Occurrence "has a"
>> basisOfRecord=HumanObservation rather than saying it "is a"
>> HumanObservation.  This approach has greatly reduced the number of
>> different terms in the standard since we don't have to have separate
>> "ObservedBy" and "CollectedBy" terms, but rather can just have a single
>> "RecordedBy" term that applies to both specimens and observations.  The same
>> thing applies to many other things, like eventDate rather than DateCollected
>> and DateObserved, locality rather than collectionLocality and
>> observationLocality, etc.  With the ratification of Darwin Core, this
>> decision is now a fait accompli and not a subject of discussion or something
>> optional for users of the standard.  It also seems to be clear that as
>> necessary new terms can be added to the DwC types which would then be valid
>> controlled values for basisOfRecord.
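
The "has a" approach described above can be sketched in a few lines of
Python: one Occurrence type carrying a basis_of_record tag, instead of a
subclass per kind of record. The class layout, the field names, and the
short value list (only the values named in this thread) are assumptions
for illustration. This is not the Darwin Core type vocabulary itself:

```python
from dataclasses import dataclass

# Only the values named in this thread. The set can grow as new types are
# added, without touching the Occurrence class.
BASIS_OF_RECORD_VALUES = {"PreservedSpecimen", "LivingSpecimen", "HumanObservation"}


@dataclass
class Occurrence:
    """A single class for every kind of record ("has a" basisOfRecord)."""
    basis_of_record: str
    recorded_by: str   # one term instead of separate CollectedBy / ObservedBy
    event_date: str    # one term instead of DateCollected / DateObserved
    locality: str

    def __post_init__(self):
        # Reject tags outside the controlled list.
        if self.basis_of_record not in BASIS_OF_RECORD_VALUES:
            raise ValueError(f"unknown basisOfRecord: {self.basis_of_record}")


if __name__ == "__main__":
    specimen = Occurrence("PreservedSpecimen", "A. Collector", "2010-10-15", "Example locality")
    sighting = Occurrence("HumanObservation", "A. Observer", "2010-10-15", "Example locality")
    print(specimen.recorded_by, sighting.recorded_by)
```

Both records share the same fields, so code that reads recorded_by or
event_date never needs to know which kind of Occurrence it has.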
>>
>> Since the adoption of the DwC standard, the approach to Occurrences has
>> been what I would describe as "I know an Occurrence when I see one".  I
>> consider this as a pretty sloppy practice and as I indicated in my post last
>> night, I think there is enough consensus about what an Occurrence is that we
>> can come up with a better definition than "an occurrence is the category of
>> information pertaining to evidence of an occurrence...".  Another part of
>> what I would characterize as sloppiness is the lack of a clear definition of
>> what exactly basisOfRecord means.  When I wrote my attempt at summarizing
>> consensus last night, I dodged the question about what I called the "token".
>>  This "thing" has been called various names.  In the previous discussion on
>> the list, it was sometimes called "the evidence" of the occurrence.  In the
>> past I have called it "a representation" - however, I now think the term
>> "token" is better because "representation" has a different technical meaning
>> in the context of content negotiation.  When we type an Occurrence by saying
>> it has a basisOfRecord=PreservedSpecimen, we are saying that this Occurrence
>> has as supporting evidence, or as a "token" if you prefer, all or part of
>> the dead remains of the organism (i.e. what I'm calling "the Individual")
>> that was being documented by the Occurrence.  When we type an Occurrence by
>> saying it has a basisOfRecord=LivingSpecimen, we are saying that this
>> Occurrence has as a "token" the entire organism that was being documented
>> (or some vegetative part of the live organism that was propagated).  When we
>> type an Occurrence by saying it has a basisOfRecord=HumanObservation, we are
>> saying that the Occurrence has no supporting evidence other than the
>> reputation of the observer to accurately record the metadata about the
>> Occurrence.  In other words, we "tag" an instance of a core class (to use
>> Bob's words), Occurrence, by telling a metadata consumer what kind of token
>> we are using as evidence of the Occurrence.
>> A fundamental part of creating a clear definition of what an Occurrence
>> is, is to define exactly what we are including in the concept of Occurrence.
>>  One possibility is to (1) say that the two boxes at the right side of the
>> diagram at http://bioimages.vanderbilt.edu/pages/occurrence-diagram.gif
>> are fused and that both the Occurrence metadata and its associated token are
>> what we consider to be "the Occurrence".  Another approach (2) would be to
>> say that the actual Occurrence as an entity is only the metadata part and
>> that the token is a separate thing.  A third approach is to say (3) that
>> everything with the blue dotted lines is considered a part of the Occurrence
>> (i.e. the metadata, the token, the event, and the locality).  I don't think
>> in an absolute sense, any one of these approaches is "right".  The problem
>> is that these approaches are used inconsistently, sometimes even by the same
>> person, depending on the basisOfRecord.  Differences in ways of thinking
>> about this issue are part of why people aren't understanding the way other
>> people are approaching the structuring of metadata.  I have tried to
>> consistently take the approach (1) that the two boxes on the right are
>> fused, i.e. that the Occurrence metadata and the token should both be
>> considered part of the entity that we call "an Occurrence".  I think this is
>> why Rich was confused in
>> http://lists.tdwg.org/pipermail/tdwg-content/2010-October/001666.html
>> when I said that it was "wrong" to assert that a scientific name is a
>> property of an Occurrence - obviously it is silly to say that the token
>> (photons on a film, sound patterns in a digital file) has a scientific name.
>>  Yet that is exactly what people do routinely when the token is a branch cut
>> off a tree and glued to a piece of paper.  They say that they are
>> "identifying a specimen".  What I am asking (actually demanding) is that the
>> TDWG community get its act together and come to some consistency on this.
>>  If we are going to take the approach (2), then we need to take specimens
>> off their pedestal and treat them like we do any other token that we are
>> using as evidence that an Occurrence happened.  If we are going to do what
>> was suggested for the BioBlitz in
>> http://lists.tdwg.org/pipermail/tdwg-content/2010-October/001603.html,
>> i.e. to call Occurrences "observations" and then link the tokens to them by
>> associatedMedia, ResourceRelationship, or some other means (approach 2) then
>> do it consistently for every kind of token, including specimens, and don't
>> single out media tokens for punishment.
>> I have in a sense "thrown down the gauntlet" on this issue by proposing
>> that DigitalStillImage be added as a DwC type and as a controlled value for
>> basisOfRecord (http://code.google.com/p/darwincore/issues/detail?id=68).
>>  I know what some people are going to say in response to this proposal.
>>  "Why do you need to have 'DigitalStillImage' as a value for basisOfRecord
>> when you can just say that the resource's dcterms:type=StillImage?"  The
>> answer goes back to Bob's point.  If we are going to go the "has a" path
>> (which we already have in DwC for Occurrences) rather than subclassing
>> everything, then we need to provide an appropriate value for the "tag" for
>> any type of resource that a reasonable number of users will want to use as a
>> token.  I think it is clear from this and other Bioblitzes, my work in
>> Bioimages, the whale tracking project, and many other examples, that there
>> are plenty of people who are already using DigitalStillImages as tokens and
>> we all need a controlled value to use for basisOfRecord.
>> The other thing that we accomplish when we type an Occurrence by its
>> basisOfRecord is to tell a consumer what kind of metadata to expect to get
>> about the token in addition to the generic metadata that is provided for all
>> Occurrences.  Thus for a LivingSpecimen we expect to be told what zoo,
>> botanical garden, bacterial collection, etc. contains the specimen.  For a
>> PreservedSpecimen we expect to be told the preparation type, the location of
>> the repository, etc.  For a DigitalStillImage we expect to be told the file
>> type, accessURL, etc.  Simply providing a value for dcterms:type=StillImage
>> doesn't indicate whether the image is a physical one (i.e. on film) or a
>> digital one.  It is also unreasonable to expect a client to have to be
>> checking two different terms (basisOfRecord and dcterms:type) to find out
>> what they could learn from one (basisOfRecord).  Of course it would be
>> advisable to provide a value for dcterms:type as well for clients outside
>> the biodiversity community who may not "understand" what basisOfRecord
>> means.
>> I hate to keep bringing my posts back to the RDF issue, but thinking about
>> how one would write RDF forces clear thinking about how metadata should be
>> structured.  If we intend to separate tokens as entities from their
>> associated Occurrence metadata, i.e. approach (2), then we open up a whole
>> other can of worms.  To associate the occurrence resources (i.e. the
>> metadata) with the "different" resource (i.e. the token), we will
>> probably have to be able to create URIs for the tokens and separate RDF
>> metadata blocks which will have to be rdf:type'd.  What are we going to use
>> for that rdf:type - create another Darwin Core class?  I simply don't think
>> that is a complicated road that we want to travel.  It would be far easier
>> to just say that every Occurrence has a one-to-one relationship with its
>> token (which could be "the empty set" for observations).  This would not
>> work for people who want to hang multiple tokens on a single observation
>> event, but I think that itself is a bad idea because it makes it even harder
>> to have "flat" occurrence datasets.  Just say that every time we collect a
>> different token (or make an observation that has no token), it is a new
>> Occurrence record.  Realistically, a single collector can't actually take a
>> picture of a plant at the same time he or she collects it for a specimen
>> anyway.  Those really should be considered two different events because they
>> happen at different times.
>>
>> OK, enough said.  Consider this my defense of my proposal "issue 68" to
>> add DigitalStillImage.  I would urge the powers that be to respond to the
>> issues that I've raised here before having any kind of "vote" (or whatever
>> is ultimately going to happen when there is an up or down decision about the
>> proposal).
>>
>> Steve
>>
>> Steve Baskauf wrote:
>>
>>> After the flurry of emails recently, I had an opportunity to carefully
>>> read all the way through the threads again, followed by enforced "think
>>> time" during my long commute.  I was actually pretty cheerful after that
>>> because I think that in essence, most of the conversation about what
>>> constitutes an Occurrence really boils down to the same thing.  So I
>>> have sat down and tried to summarize what seems to me to be a consensus
>>> about Occurrences.  To follow my points, please refer to the diagram at:
>>> http://bioimages.vanderbilt.edu/pages/occurrence-diagram.gif
>>>
>>> Consensus on relationships
>>> 1. The fundamental definition of an Occurrence involves evidence that a
>>> representative of a taxon occurred at a place and time.
>>> Note 1.A: For clarity, I have modified John's statement in his last
>>> email by replacing "taxon" with "representative of a taxon".  I'm
>>> considering a taxon to be an abstract concept that is applied to
>>> individuals or groups of organisms.
>>> Note 1.B. This definition is far more useful than the official
>>> definition of the class Occurrence "The category of information
>>> pertaining to evidence of an occurrence..." which is essentially
>>> circular.
>>> Note 1.C: This statement is extremely broad because the evidence could
>>> be of many sorts, the representative could range from a single
>>> individual to all organisms on the earth, the taxon could be anyone's
>>> definition at any taxonomic level, the place could range from a GPS
>>> point with uncertainty of less than 10 meters to the entire planet
>>> earth, and the time could range from a shutter click of less than one
>>> second to 3.4 billion years.
>>> 2. The diagram is an attempt to summarize in pictorial form statements
>>> and relationships that have been described in the thread.  The taxon
>>> representative is recorded as existing at a particular time and place
>>> (the arrow) and the result is an Occurrence record.  That Occurrence
>>> record exists as metadata which may be associated with a token that can
>>> be used to voucher the fact that the taxon representative existed.  That
>>> token may be the organism itself (or a living part of it as in a twig
>>> for grafting), all or part of the organism in preserved form, an
>>> electronic representation such as an image or sound recording, and other
>>> kinds of things like tissue or DNA samples.  There may also be no token
>>> at all, in which case we call the Occurrence record an observation.
>>> Based on direct observation of the taxon representative, examination of
>>> one or more tokens, or both, some determiner asserts that a taxon
>>> concept applies to the taxon representative and as a result a scientific
>>> name can be used to "identify" the taxon representative.  (There may be
>>> a lot of other complicated stuff above the Identification box, but that
>>> will have to be filled in by the taxonomists.)
>>> Note 2.A: I have mapped onto this diagram the letters that John used in
>>> his last email to refer to entities that are involved in an Occurrence
>>> (T, E, L, O, and G).  I will beg the forgiveness of fossil people
>>> because I don't really know how the geological context fits in.  I'm
>>> assuming that it is a way of asserting time and location on a much
>>> broader scale than we do for extant organisms.
>>> Note 2.B: I have put a dotted line around the part of the diagram that I
>>> think includes all the things that people might consider part of the
>>> Occurrence itself.  I have left out "T" and the other parts related to
>>> identification because it seems to me that you can have an occurrence
>>> that you document which does not yet (and perhaps never will) have an
>>> identification.  The Occurrence still asserts that a taxon
>>> representative existed at a time and place; we just don't yet know what
>>> the taxon is.
>>> 3. The red lines indicate the relationships that connect the various
>>> entities (I'm going to go ahead and call them resources).  Consistent
>>> with popular opinion, the Occurrence record is the center of the
>>> universe and most things are connected to it.
>>> Note 3.A: I am sticking to my guns and refuse to connect the
>>> Identification directly to the Occurrence.  It is the taxon
>>> representative that is being identified, not the occurrence.  One can
>>> assert another sort of relationship between the identification and the
>>> occurrence if one wants to say that one consulted the occurrence
>>> metadata and token in order to decide about the identification, but it
>>> is not correct to say that the Identification identifies either the
>>> Occurrence metadata or the token (as Rich pointed out).
>>>
>>> OK, so that's step one - defining what is related to what.  If anyone
>>> disagrees with these relationships, please clarify or create your own
>>> diagram.
>>>
>>> Complicating circumstances/caveats
>>> 1. It is noted and recognized that some users will not care to include
>>> all of these relationships in their models.  In the interest of
>>> simplification or "flattening" the relationships, they may wish to
>>> collapse some parts of this diagram (e.g. incorporate time and location
>>> metadata within the Occurrence metadata rather than considering them
>>> separate resources, applying scientific names directly to the taxon
>>> representatives without defining a taxon concept or recording the
>>> determination metadata, connecting identifications directly to the
>>> occurrence, etc.).  This doesn't mean that the relationships don't
>>> exist, it just means that some users don't care about them.
>>> 2. It is recognized that different users will be interested in or able
>>> to specify the various resources to differing degrees of precision.
>>> Examples: A photographer might record times to the nearest second, a
>>> collector may only be interested in noting the date on which a specimen
>>> was collected.  A location may be specified to the precision of a GPS
>>> reading or be defined as some geographic or political subdivision.  The
>>> taxon representative may be an individual organism, a flock or clump, or
>>> some larger aggregation of taxon representatives.
>>>
>>> That's step two.  If I've missed any complications, please point them
>>> out.
>>>
>>> My opinions about the implications of this diagram
>>> 1. The circle I've labeled as "taxon representative" is the resource
>>> type that I'm proposing to be represented by the class Individual.  You
>>> will note that in both the definition of dwc:individualID ("An
>>> identifier for an individual or named group of individual organisms...")
>>> and the proposed class definition ("The category of information
>>> pertaining to an individual or named group of individual organisms
>>> represented in an Occurrence"), groups of individual organisms are
>>> included.  Thus John's example of a fossil having myriad individuals, or
>>> Richard's examples of thousands of plankton, a large school of fish, a
>>> herd of wildebeest, or a flock of birds, could all be categorized as
>>> "Individual" under this definition if
>>> there is a reasonable expectation that all of the individuals in the
>>> group are members of the same taxon.  Perhaps there is a better name for
>>> this resource, but since dwc:individualID was already extant, I chose
>>> Individual as the class name for consistency with the pattern
>>> established with other classes and their associated xxxxID terms.
>>> 2. Although in note 1.C. I have given the ranges of the various
>>> resources to their logical extreme (as was done previously in the
>>> thread), I think that as a practical matter we can adopt guidelines to
>>> set reasonable values for the "normal" ranges of the resources.  One
>>> such guideline might be that we suggest a range that can accommodate
>>> about 95% of the user needs within the community (this came from Rich's
>>> comment about satisfying 95% of the user need with an establishmentMeans
>>> controlled vocabulary).  For example, it was suggested that the range
>>> for the location of an Occurrence could span the entire planet Earth.
>>> True enough, but virtually nobody would find such a span useful.  95% of
>>> users would probably find a range between a GPS reading with 10 meter
>>> precision and the extent of a county or province useful for recording
>>> the location of an Occurrence.  I can suggest similar "useful" ranges:
>>> one second to one day for an event time (excluding fossils), one
>>> individual organism to the number of organisms that would fit within a
>>> 50 meter radius for an "individual", and taxon identified to family for
>>> plants and maybe mammals, genus for birds, and order for insects.  So
>>> framing the definition of an Occurrence in these terms it would be
>>> something like: "An occurrence involves evidence (consisting of a
>>> physical token, electronic record, or personal observation) that a
>>> representative (ranging from a single individual to the number that
>>> would fit on a football field) of a taxon (hopefully identified to some
>>> lower taxonomic level) occurred at a place (determined to a precision
>>> between that of a GPS reading and the size of a county/province) and
>>> time (spanning one second to one day)."  A few people might object to
>>> this level of restrictiveness, but I would guess that it would make 95%
>>> of us happy.
>>> 3. With the exception of the "missing" class Individual, every resource
>>> type on this diagram except for the "token" and Scientific name has a
>>> Darwin Core class. Every resource type on the diagram except for "token"
>>> has a dwc:xxxxID term that can be used to refer to a GUID for the
>>> resource.  The implication of this is that any resource on this diagram
>>> except for the token and taxon representative (i.e. Individual) is ready
>>> to be represented in RDF by Darwin Core terms in the sense that the
>>> relationships (red lines) can be represented by the xxxxID terms and
>>> that the resources can be typed (via rdf:type) using Darwin Core classes.
>>> (Lacking a class for the scientific name doesn't seem like a big deal to
>>> me since the scientific name can be a string literal - but then I'm not
>>> a taxonomist.)
>>> 4. OK, I've avoided it as long as I can, so I'm going to confess now to
>>> the RDF-phobes.  The red lines and shapes are something pretty close to
>>> an RDF graph.  What that means is that if the community can agree that
>>> this diagram correctly represents the relationships among the kinds of
>>> biodiversity resources that we care about, then the matter of providing
>>> guidelines on how to represent Darwin Core in RDF suddenly gets a lot
>>> simpler.  Just convert the "picture" of the RDF graph into XML format
>>> and we have a template.  Alright, that's an oversimplification, but I
>>> think it is essentially true because the most difficult part of
>>> achieving a consensus on RDF representations is to decide how we connect
>>> the resource types, not on the literals that we hang onto resources as
>>> properties.
>>> 5. While I'm beating the RDF drum again, the importance of my opinion
>>> number 2 can be extended into the GUID adoption process.  In my comments
>>> to Kevin about the Beginner's Guide to Persistent Identifiers, I think I
>>> commented on the question of how one decides whether a GUID needs to be
>>> assigned to something or not.  I believe that the answer to that
>>> question boils down to this: we need a GUID for any resource that will
>>> be referenced by more than one other resource.  Do we need to be able to
>>> assign a GUID to Taxon concepts?  Yes, because it is likely that many
>>> identifications will want to reference a particular taxon concept.  Do
>>> we need to be able to assign a GUID to an Event?  Maybe or maybe not.
>>> If every occurrence has its own separate time recorded, then no GUID is
>>> needed because the time is just a part of every separate occurrence
>>> record.  If the event is defined to be a time range that represents a
>>> collecting trip, then there may be many Occurrences that are associated
>>> with that trip and all of them could reference the GUID for that event
>>> rather than repeating the event information for every Occurrence.  The
>>> point here is that every shape (class of resources) on this diagram at
>>> least has the POTENTIAL to be a node connecting multiple resources and
>>> therefore should have the capability of being assigned a GUID, having
>>> its own RDF record, and being appropriately typed (presumably by a DwC
>>> class).  So this is a final technical argument for why we need to have
>>> the DwC class Individual.  Whether people ultimately choose to
>>> assign GUIDs to particular resource types is their own choice,
>>> but they need to at least be ABLE to if they need that resource to serve
>>> as a node given the structure of their metadata.
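
A minimal sketch of what points 3 to 5 above might look like once the
graph is written out as RDF/XML. The diagram itself is not reproduced in
this message, so the connections used here (Occurrence to Individual and
Event, Event to Location, Identification to Individual and Taxon) are an
assumption, as are all of the example.org URIs. The class URI for
Individual is the proposed one, not an existing Darwin Core term, and
using the xxxxID terms with rdf:resource is the usage suggested above
rather than established practice.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dwc="http://rs.tdwg.org/dwc/terms/">

  <!-- Two Occurrences recorded on the same collecting trip.
       Both point at one shared Event instead of repeating its details. -->
  <rdf:Description rdf:about="http://example.org/occurrence/1">
    <rdf:type rdf:resource="http://rs.tdwg.org/dwc/terms/Occurrence"/>
    <dwc:individualID rdf:resource="http://example.org/individual/1"/>
    <dwc:eventID rdf:resource="http://example.org/event/trip-1"/>
  </rdf:Description>

  <rdf:Description rdf:about="http://example.org/occurrence/2">
    <rdf:type rdf:resource="http://rs.tdwg.org/dwc/terms/Occurrence"/>
    <dwc:individualID rdf:resource="http://example.org/individual/2"/>
    <dwc:eventID rdf:resource="http://example.org/event/trip-1"/>
  </rdf:Description>

  <!-- The Event is referenced by more than one resource,
       so under the rule in point 5 it needs its own identifier. -->
  <rdf:Description rdf:about="http://example.org/event/trip-1">
    <rdf:type rdf:resource="http://rs.tdwg.org/dwc/terms/Event"/>
    <dwc:eventDate>2010-09-25</dwc:eventDate>
    <dwc:locationID rdf:resource="http://example.org/location/1"/>
  </rdf:Description>

  <!-- Hypothetical: dwc:Individual is the class being proposed in this thread. -->
  <rdf:Description rdf:about="http://example.org/individual/1">
    <rdf:type rdf:resource="http://rs.tdwg.org/dwc/terms/Individual"/>
  </rdf:Description>

  <!-- An Identification links the taxon representative to a Taxon. -->
  <rdf:Description rdf:about="http://example.org/identification/1">
    <rdf:type rdf:resource="http://rs.tdwg.org/dwc/terms/Identification"/>
    <dwc:individualID rdf:resource="http://example.org/individual/1"/>
    <dwc:taxonID rdf:resource="http://example.org/taxon/1"/>
  </rdf:Description>

</rdf:RDF>
```

If every Occurrence instead carried its own time, the Event node could
be dropped and dwc:eventDate attached to each Occurrence as a literal,
which is the "maybe or maybe not" case described in point 5.
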
>>>
>>> We need to clarify how the "token" thing fits in, but I'm stopping there
>>> for now.  I would very much appreciate responses indicating that:
>>>
>>> A. you agree with the diagram and connections (and consider this
>>> definition and diagram a consensus)
>>> B. you disagree with the diagram (and articulate why)
>>> C. you provide an alternative diagram or explanation of the
>>> relationships among the classes related to Occurrences.
>>>
>>> Thanks for your patience with another tome.
>>> Steve
>>>
>>> --
>>> Steven J. Baskauf, Ph.D., Senior Lecturer
>>> Vanderbilt University Dept. of Biological Sciences
>>>
>>> postal mail address:
>>> VU Station B 351634
>>> Nashville, TN  37235-1634,  U.S.A.
>>>
>>> delivery address:
>>> 2125 Stevenson Center
>>> 1161 21st Ave., S.
>>> Nashville, TN 37235
>>>
>>> office: 2128 Stevenson Center
>>> phone: (615) 343-4582,  fax: (615) 343-6707
>>> http://bioimages.vanderbilt.edu
>>>
>>> _______________________________________________
>>> tdwg-content mailing list
>>> tdwg-content at lists.tdwg.org
>>> http://lists.tdwg.org/mailman/listinfo/tdwg-content
>>> .
>>>
>>>
>>>
>>
>
>
> --
> ----------------------------------------------------------------
> Pete DeVries
> Department of Entomology
> University of Wisconsin - Madison
> 445 Russell Laboratories
> 1630 Linden Drive
> Madison, WI 53706
> TaxonConcept Knowledge Base <http://www.taxonconcept.org/> / GeoSpecies
> Knowledge Base <http://lod.geospecies.org/>
> About the GeoSpecies Knowledge Base <http://about.geospecies.org/>
> ------------------------------------------------------------
>
>


-- 
----------------------------------------------------------------
Pete DeVries
Department of Entomology
University of Wisconsin - Madison
445 Russell Laboratories
1630 Linden Drive
Madison, WI 53706
TaxonConcept Knowledge Base <http://www.taxonconcept.org/> / GeoSpecies
Knowledge Base <http://lod.geospecies.org/>
About the GeoSpecies Knowledge Base <http://about.geospecies.org/>
------------------------------------------------------------