[tdwg-content] practical details of recording a determination What is an Occurrence?
Steve Baskauf
steve.baskauf at vanderbilt.edu
Mon Oct 18 21:38:46 CEST 2010
So are we saying that dwc:nameAccordingTo can be a property of an
dwc:Identification? What's dwc:identificationReferences for?
I'm sorry if this is a dumb question but I can plead ignorance on this
topic.
Steve
Peter DeVries wrote:
> Hi Markus,
>
> I feel your pain. :-)
>
> Maybe an example might help clarify this.
>
> I use the key* listed below to id my mosquitoes.
>
> So I should mark up my RDF for the identification with something like:
>
> <dwcterms:nameAccordingTo>Identification And Geographical Distribution
> Of The Mosquitoes: Of North America, North Of Mexico By Richard F.,
> Jr. Darsie et al. 2004</dwcterms:nameAccordingTo>
>
> Rather than use some other term like "dwc:identificationReferences"
>
> Correct?
>
> - Pete
>
> *
> Identification And Geographical Distribution Of The Mosquitoes: Of
> North America, North Of Mexico
> By Richard F., Jr. Darsie, RONALD A. WARD, Chien C. Chang, Taina Litwak
> University Press of Florida, 2004
> ISBN: 0813027845
> Cite: 13463
>
> =================================================================
>
> On Mon, Oct 18, 2010 at 12:19 PM, "Markus Döring (GBIF)"
> <mdoering at gbif.org> wrote:
>
> I am sorry I don't have the time to follow this extensive thread,
> but I can manage at least the first paragraphs ;)
> A quick comment on tying identification sources to a scientific
> name. As for other taxon concepts this is usually done with the
> sec/sensu reference which should be recorded as dwc:nameAccordingTo:
>
> http://rs.tdwg.org/dwc/terms/index.htm#nameAccordingTo
>
> I am slightly irritated that we seem to have some term duplicates
> for this use case.
> Maybe dwc:identificationReferences is supposed to only list
> additional references?
>
> Markus
>
>
> On Oct 18, 2010, at 18:49, Steve Baskauf wrote:
>
> > I've fallen behind on systematically perusing the list
> responses, but I would like to focus in on a point that seems to
> be a consensus in the responses that have shown up recently. The
> consensus seems to be that documenting determinations (a.k.a.
> instances of dwc:Identification class) that are applied to
> Individuals (or Occurrences if you don't believe in Individuals)
> is the way to go. So in my usual graphical way of thinking about
> this, I would draw a "relationship line" from the determination to
> the Individual (or Occurrence) on one side and from the
> determination to the species concept on the other. I will leave
> it up to the taxonomy people to decide what other things would be
> connected to the species concept and how all of their lines would be
> connected. The determination would have any of the properties
> that are terms listed in the dwc:Identification class
> (identifiedBy, dateIdentified, identificationReferences,
> identificationRemarks, identificationQualifier, and typeStatus).
> Some properties like dateIdentified and identificationReferences
> would be string literals and others (especially identifiedBy)
> should probably be GUIDs but could be literals if they had to be.
> >
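To make the "relationship lines" concrete, here is a minimal sketch in Python of a determination as a record that points at an Individual on one side and a taxon concept on the other. The field names mirror the dwc:Identification terms listed above; the class name, the two linking fields, and all of the identifiers are assumptions made up for illustration, not anything defined by Darwin Core.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Identification:
    """One determination; property names follow the dwc:Identification terms."""
    individual_id: str                  # hypothetical link to the Individual (or Occurrence)
    taxon_concept_id: Optional[str]     # hypothetical link to the concept; None if unknown
    identified_by: Optional[str] = None             # ideally a GUID, else a literal
    date_identified: Optional[str] = None           # string literal
    identification_references: Optional[str] = None # string literal
    identification_remarks: Optional[str] = None
    identification_qualifier: Optional[str] = None
    type_status: Optional[str] = None


# Made-up identifiers, for illustration only.
INDIVIDUAL = "urn:example:individual:1"

first = Identification(individual_id=INDIVIDUAL, taxon_concept_id=None)
second = Identification(
    individual_id=INDIVIDUAL,
    taxon_concept_id="urn:example:concept:1",
    identified_by="urn:example:agent:42",
    date_identified="2010-10-18",
)

# All determinations applied to one Individual, whatever their origin.
determinations = [d for d in (first, second) if d.individual_id == INDIVIDUAL]
```

Nothing here distinguishes an initial identification from a later annotation; both are simply instances of the same record type attached to the same Individual.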
> > That all seems pretty clear. However, when I've started trying
> to do this in real life, I immediately have questions. Take a look at
> > http://bioimages.vanderbilt.edu/rdf/examples/lsu000/0428.rdf
> which should show up as a web page in your browser.
> >
> > 1. The original label identifies the species as Juncus
> diffusissimus. However, there is no indicator as to who
> originally identified it or when. My assumption is that it was
> the collector (Glen N. Montz) but I don't really know that. Do I
> assume that, or list the original determiner as "unknown"?
> > 2. Do we draw a distinction between the initial identification
> and subsequent annotations? I think the answer should be "no" and
> that's why I refer to both generically as "determinations".
> > 3. There is really no indication given on the annotation labels
> as to many of the things that we would like to know, such as the
> concept they had in mind, any source they used (if any), or the
> reason why they did the annotation. So how does one connect the
> name that they applied to the determination when there is no
> indication of the concept? Is this just something we can't do for
> old annotations and just something that we try to do from this
> point forward?
> > 4. The last question is one that I really want some opinions
> about. It seems to me that there are a number of reasons why one
> would apply a determination. One would be to correct an actual
> error in identification. One would be to increase the precision
> of a previous determination (e.g. an insect identified to family
> now is identified to species). One would be to assert a
> difference in opinion as to the correct way to group this
> individual with others (i.e. as in a taxonomic revision).
> Finally, a single determiner might apply several determinations
> to one individual and indicate in each determination the concept
> intended (i.e. if you subscribe to Cronquist, you'd call it X; if
> you like Radford's book, you'd call it Y; if you like Weakley's
> treatment, you'd call it Z). Some of these four reasons may be
> functionally equivalent, but how would you use Darwin Core to
> indicate the reason why you applied the determination? Please
> don't say "identificationRemarks"! From a machine-processing
> standpoint, this is something we should know and there should be
> some kind of controlled vocabulary to express it. For instance if
> an identification is "deprecated" because it was in error (perhaps
> by the determiner him/herself), one would like the incorrect
> determination to show up in the historical metadata, but I
> wouldn't want it to be listed in a website index. The same would
> hold true if an annotator was able to pin the taxon down to a
> lower taxonomic level than the original identifier. If someone
> goes to the trouble to connect an Individual/Occurrence to several
> names under alternative concepts, there should be a way that a
> machine would know this so that a software user could select the
> concept they wanted to use and the name under that concept would
> pop up.
> >
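As a rough sketch of what a machine could do if the reason were recorded, the Python below uses a made-up controlled vocabulary and a made-up "superseded" flag. None of these names or values exist in Darwin Core; they are assumptions for illustration only. The names X, Y, and Z and the three sources are taken from the example in point 4.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class DeterminationReason(Enum):
    # Hypothetical controlled values; not part of Darwin Core.
    INITIAL = "initial"
    CORRECTION = "correction"              # fixes an actual error
    REFINEMENT = "refinement"              # increases precision
    REVISION = "revision"                  # difference of taxonomic opinion
    ALTERNATIVE_CONCEPT = "alternative"    # same determiner, another concept


@dataclass
class Determination:
    name: str
    reason: DeterminationReason
    according_to: Optional[str] = None     # source of the concept, if stated
    superseded: bool = False               # kept as history, hidden from indexes


def current_names(dets: List[Determination],
                  preferred_source: Optional[str] = None) -> List[str]:
    """Names to show in an index, optionally limited to one concept source."""
    current = [d for d in dets if not d.superseded]
    if preferred_source is not None:
        current = [d for d in current if d.according_to == preferred_source]
    return [d.name for d in current]


dets = [
    Determination("W", DeterminationReason.INITIAL, superseded=True),
    Determination("X", DeterminationReason.ALTERNATIVE_CONCEPT, "Cronquist"),
    Determination("Y", DeterminationReason.ALTERNATIVE_CONCEPT, "Radford"),
    Determination("Z", DeterminationReason.ALTERNATIVE_CONCEPT, "Weakley"),
]

all_current = current_names(dets)
weakley_only = current_names(dets, preferred_source="Weakley")
```

The superseded determination "W" stays in the data as historical metadata but never reaches the index, and a user who prefers Weakley's treatment gets only the name applied under that concept.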
> > I don't really see any term under the current DwC that could be
> used to do this last thing. Am I missing something? Do we need
> several terms to explain the reason why we made the determination
> because the reasons fall into different categories?
> >
> > The other comment that I'll throw out (since this is going out
> to the bioblitz list as well as to tdwg-content) is that those of
> you who are building apps to collect metadata in the field really
> need to separate the process of entering (or acquiring) the
> collection metadata from the determination process. In at least
> some apps, the user immediately has to commit to a taxon as they
> enter the data at the time of collection. It seems to me that it
> would be a very common situation (especially in the case of
> "citizen science") that the collector/observer/photographer would
> have no idea what the taxonomic identity was at the time of
> collection. The process of determination (and the recording of
> the various dwc:Identification class terms) is really a separate
> process that should be able to happen at the time of collection OR
> later.
> >
> > Steve
> >
> > Peter DeVries wrote:
> >> Hi Steve,
> >>
> >> I would hypothesize that for the vast majority of identified
> records the process is something like this:
> >>
> >> 1) An individual uses some sort of key to determine what
> species (taxon concept) to assign to a given individual
> >> * They may have created some sort of mental key in which
> once they recognize one individual mosquito they can then pretty
> quickly sort
> >> a number of individuals into collections.
> >>
> >> 2) The actual name they assign to the specimen is usually based
> on what their key says the name is. Often this does not specify
> the authorship.
> >> Most of these human identifiers have not read the original
> species descriptions for the species they are identifying.
> >> So the specimen is actually tied to a concept that is based
> more on the "key" than the original description.
> >> * An exception would be where there is a key in the
> original description and that was what was used.
> >>
> >> 3) So in a sense, the process of modeling this as if the
> identifier actually asserted that the concept was the same as that
> described by
> >> the original description or a subsequent revision is "fudging".
> >>
> >> Side effects of this process include:
> >>
> >> 1) A new key for North American Mosquitoes comes out that
> incorporates recent changes in nomenclature. The major change
> being the elevation of
> >> a subgenus to a genus. For most of the species described
> the "key concept" is unchanged.
> >>
> >> Student identifier, Bob, in state X is using the latest key,
> while student identifier, Joe, in state Z is using a slightly
> older edition of the same key.
> >>
> >> Bob identifies the species as Ochlerotatus triseriatus, while
> Joe identifies what should be the same species as Aedes triseriatus.
> >>
> >> These show up in GBIF on two different maps, they show up in
> the EOL as two different pages.
> >>
> >> Various TDWG'ers continue to argue that the original
> description and subsequent revisions were really important in
> determining what these individuals
> >> actually meant when they assigned a name to a specimen, and
> that this is how we should model it in excruciating detail.
> >>
> >> I would argue this should be modeled as best as possible to
> what actually happens.
> >>
> >> For example, how many of the species observed in the recent
> BioBlitz were identified by referring to the original species
> description or subsequent revisions?
> >>
> >> In your diagram, I would suggest that you show that a taxon
> concept may have many names associated with it. Since it is not
> clear what the identifier intended by his or her choice of a name,
> it is often difficult to determine what taxon concept they
> actually meant.
> >>
> >> This is why I advocate a move to a more taxon concept based
> identifier to link these data sets together because this allows
> the intent of the identifier
> >> to be more accurately modeled.
> >>
> >> This would be done in the form of:
> >>
> >> "I assert that this specimen (of what I call Aedes
> triseriatus) was observed here. I also assert that it is an
> instance of this species concept => URI"
> >>
> >> Or I assert that this is an individual of the type
> "Individual of species concept X" = > URI
> >>
> >> All of these are instances of the class "Individual"
> >>
> >> So the resulting DarwinCore record would contain both the name
> and an optional, but I think needed, asserted species concept.
> >>
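A small Python sketch of why the extra assertion helps, using the Bob and Joe example from above. The record keys borrow Darwin Core term names, but the concept URI is invented for illustration and the grouping function is only an assumption about how an aggregator might behave.

```python
from collections import defaultdict

# Made-up concept identifier, for illustration only.
CONCEPT = "urn:example:concept:triseriatus"

records = [
    {"identifiedBy": "Bob",
     "scientificName": "Ochlerotatus triseriatus",
     "taxonConceptID": CONCEPT},
    {"identifiedBy": "Joe",
     "scientificName": "Aedes triseriatus",
     "taxonConceptID": CONCEPT},
]


def group_by(recs, key):
    """Group identifier names by the value each record has for `key`."""
    groups = defaultdict(list)
    for rec in recs:
        groups[rec[key]].append(rec["identifiedBy"])
    return dict(groups)


by_name = group_by(records, "scientificName")      # two groups: two maps, two pages
by_concept = group_by(records, "taxonConceptID")   # one group: one map, one page
```

Grouping on the name string splits the two students' records, while grouping on the asserted concept keeps them together even though the names differ.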
> >> The species concept is a subclass of taxon concept, but is
> fundamentally different than the higher clades.
> >>
> >> There are some guidelines as to what an entity needs to be
> considered a species.
> >>
> >> While there are no real guidelines as to what clades should be
> considered genera and what clades should be considered families etc.
> >>
> >> Assigning properties at the level of genera or family is also
> problematic because it assumes that there will be inferencing and
> it will require rechecking
> >> that those properties are still valid if the species within
> that genus change.
> >>
> >> So if there is some property that is common to all the species
> in the genus, make that a property of each of the individual
> species - not a property
> >> of the genus.
> >>
> >> Respectfully,
> >>
> >> - Pete
> >>
> >>
> >> On Fri, Oct 15, 2010 at 10:45 AM, Steve Baskauf
> <steve.baskauf at vanderbilt.edu> wrote:
> >> As a background to this post, I want to reference a post by Bob
> called "SubclassOrNot". I discovered this page on an early foray
> into the TDWG website labyrinth and it has been very influential
> on my thinking since then. The idea Bob discusses is central to
> what I'm writing below so if you haven't read it you might want to
> do so first. You can probably skip the "OWL Inference" section
> and still get the point which is described in the first two
> sections of his post. The URL for the page is
> http://wiki.tdwg.org/twiki/bin/view/TAG/SubclassOrNot .
> >>
> >> To preface what I'm going to say below, I want to put Darwin
> Core Occurrences in the context of what Bob wrote. In my mind,
> one of the hallmarks of the Darwin Core standard and one thing
> that makes it a great improvement over previous versions is that
> the decision was made to use what Bob called the "has a" approach
> rather than the "is a" approach. In particular, the Darwin Core
> standard has a single class called dwc:Occurrence rather than
> subclasses called "Specimen", "Observation", and other possible
> things. The way that we differentiate among different kinds of
> Occurrences is by using the DwC types which are the controlled
> values for the term dwc:basisOfRecord. Thus we say an Occurrence
> "has a" basisOfRecord=PreservedSpecimen rather than saying it "is
> a" PreservedSpecimen. We say an Occurrence "has a"
> basisOfRecord=HumanObservation rather than saying it "is
> a" HumanObservation. This approach has greatly reduced the number
> of different terms in the standard since we don't have to have
> separate "ObservedBy" and "CollectedBy" terms, but rather can just
> have a single "RecordedBy" term that applies to both specimens and
> observations. The same thing applies to many other things, like
> eventDate rather than DateCollected and DateObserved, locality
> rather than collectionLocality and observationLocality, etc. With
> the ratification of Darwin Core, this decision is now a fait
> accompli and not a subject of discussion or something optional for
> users of the standard. It also seems to be clear that as
> necessary new terms can be added to the DwC types which would then
> be valid controlled values for basisOfRecord.
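The "has a" approach can be sketched in a few lines of Python. This is only an illustration under assumptions: the class below is not an official binding of Darwin Core, the set of allowed values contains just the three types named in this message rather than the full DwC type vocabulary, and the sample data is invented.

```python
from dataclasses import dataclass

# Only the values mentioned in this message; the real DwC type
# vocabulary is longer.
BASIS_OF_RECORD = {"PreservedSpecimen", "LivingSpecimen", "HumanObservation"}


@dataclass
class Occurrence:
    """A single class; the kind of Occurrence is a value, not a subclass."""
    basis_of_record: str
    recorded_by: str     # replaces separate CollectedBy / ObservedBy terms
    event_date: str      # replaces DateCollected / DateObserved
    locality: str        # replaces collectionLocality / observationLocality

    def __post_init__(self):
        if self.basis_of_record not in BASIS_OF_RECORD:
            raise ValueError(f"unknown basisOfRecord: {self.basis_of_record}")


# Invented sample data.
specimen = Occurrence("PreservedSpecimen", "A. Collector", "2010-10-15", "Site A")
sighting = Occurrence("HumanObservation", "B. Observer", "2010-10-15", "Site A")

# The same property works for both kinds of Occurrence.
recorders = [o.recorded_by for o in (specimen, sighting)]
```

Adding a new kind of Occurrence under this design means adding one value to the controlled set, not a new class with its own parallel set of terms.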
> >>
> >> Since the adoption of the DwC standard, the approach to
> Occurrences has been what I would describe as "I know an
> Occurrence when I see one". I consider this as a pretty sloppy
> practice and as I indicated in my post last night, I think there
> is enough consensus about what an Occurrence is that we can come
> up with a better definition than "an occurrence is the category of
> information pertaining to evidence of an occurrence...". Another
> part of what I would characterize as sloppiness is the lack of a
> clear definition of what exactly basisOfRecord means. When I
> wrote my attempt at summarizing consensus last night, I dodged the
> question about what I called the "token". This "thing" has been
> called various names. In the previous discussion on the list, it
> was sometimes called "the evidence" of the occurrence. In the
> past I have called it "a representation" - however, I now think
> the term "token" is better because "representation" has a
> different technical meaning in the context of content negotiation.
> When we type an Occurrence by saying it has a
> basisOfRecord=PreservedSpecimen, we are saying that this
> Occurrence has as supporting evidence, or as a "token" if you
> prefer, all or part of the dead remains of the organism (i.e. what
> I'm calling "the Individual") that was being documented by the
> Occurrence. When we type an Occurrence by saying it has a
> basisOfRecord=LivingSpecimen, we are saying that this Occurrence
> has as a "token" the entire organism that was being documented (or
> some vegetative part of the live organism that was propagated).
> When we type an Occurrence by saying it has a
> basisOfRecord=HumanObservation, we are saying that the Occurrence
> has no supporting evidence other than the reputation of the
> observer to accurately record the metadata about the Occurrence.
> In other words, we "tag" an instance of a core class (to use Bob's
> words), Occurrence, by telling a metadata consumer what kind of
> token we are using as evidence of the Occurrence.
> >> A fundamental part of creating a clear definition of what an
> Occurrence is, is to define exactly what we are including in the
> concept of Occurrence. One possibility is to (1) say that the two
> boxes at the right side of the diagram at
> http://bioimages.vanderbilt.edu/pages/occurrence-diagram.gif are
> fused and that both the Occurrence metadata and its associated
> token are what we consider to be "the Occurrence". Another
> approach (2) would be to say that the actual Occurrence as an
> entity is only the metadata part and that the token is a separate
> thing. A third approach is to say (3) that everything with the
> blue dotted lines is considered a part of the Occurrence (i.e. the
> metadata, the token, the event, and the locality). I don't think
> in an absolute sense, any one of these approaches is "right". The
> problem is that these approaches are used inconsistently,
> sometimes even by the same person, depending on the basisOfRecord.
> Differences in ways of thinking about this issue are a part of why
> people aren't understanding the way other people are approaching
> the structuring of metadata. I have tried to consistently take
> the approach (1) that the two boxes on the right are fused, i.e.
> that the Occurrence metadata and the token should both be
> considered part of the entity that we call "an Occurrence". I
> think this is why Rich was confused in
> http://lists.tdwg.org/pipermail/tdwg-content/2010-October/001666.html
> when I said that it was "wrong" to assert that a scientific name
> is a property of an Occurrence - obviously it is silly to say that
> the token (photons on a film, sound patterns in a digital file)
> has a scientific name. Yet that is exactly what people do
> routinely when the token is a branch cut off a tree and glued to a
> piece of paper. They say that they are "identifying a specimen".
> What I am asking (actually demanding) is that the TDWG community
> get its act together and come to some consistency on this. If we
> are going to take the approach (2), then we need to take specimens
> off their pedestal and treat them like we do any other token that
> we are using as evidence that an Occurrence happened. If we are
> going to do what was suggested for the BioBlitz in
> http://lists.tdwg.org/pipermail/tdwg-content/2010-October/001603.html,
> i.e. to call Occurrences "observations" and then link the tokens
> to them by associatedMedia, ResourceRelationship, or some other
> means (approach 2) then do it consistently for every kind of
> token, including specimens, and don't single out media tokens for
> punishment.
> >> I have in a sense "thrown down the gauntlet" on this issue by
> proposing that DigitalStillImage be added as a DwC type and as a
> controlled value for basisOfRecord
> (http://code.google.com/p/darwincore/issues/detail?id=68). I know
> what some people are going to say in response to this proposal.
> "Why do you need to have 'DigitalStillImage' as a value for
> basisOfRecord when you can just say that the resource's
> dcterms:type=StillImage?" The answer goes back to Bob's point.
> If we are going to go the "has a" path (which we already have in
> DwC for Occurrences) rather than subclassing everything, then we
> need to provide an appropriate value for the "tag" for any type of
> resource that a reasonable number of users will want to use as a
> token. I think it is clear from this and other Bioblitzes, my
> work in Bioimages, the whale tracking project, and many other
> examples, that there are plenty of people who are already using
> DigitalStillImages as tokens and we all need a controlled value to
> use for basisOfRecord.
> >> The other thing that we accomplish when we type an Occurrence
> by its basisOfRecord is to tell a consumer what kind of metadata
> to expect to get about the token in addition to the generic
> metadata that is provided for all Occurrences. Thus for a
> LivingSpecimen we expect to be told what zoo, botanical garden,
> bacterial collection, etc. contains the specimen. For a
> PreservedSpecimen we expect to be told the preparation type, the
> location of the repository, etc. For a DigitalStillImage we
> expect to be told the file type, accessURL, etc. Simply providing
> a value for dcterms:type=StillImage doesn't indicate whether the
> image is a physical one (i.e. on film) or a digital one. It is
> also unreasonable to expect a client to have to be checking two
> different terms (basisOfRecord and dcterms:type) to find out what
> they could learn from one (basisOfRecord). Of course it would be
> advisable to provide a value for dcterms:type as well for clients
> outside the biodiversity community who may not "understand" what
> basisOfRecord means.
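Here is a minimal Python sketch of a client that uses basisOfRecord alone to decide what token metadata to expect. The mapping is my own assumption drawn from the examples in the paragraph above. The keys holdingInstitution, preparationType, repositoryLocation, and fileType are hypothetical names invented for this sketch, and DigitalStillImage is the proposed value, not a ratified one.

```python
# Hypothetical mapping from basisOfRecord to the token metadata a
# consumer might expect; key names are illustrative, not DwC terms.
EXPECTED_TOKEN_METADATA = {
    "LivingSpecimen": ("holdingInstitution",),
    "PreservedSpecimen": ("preparationType", "repositoryLocation"),
    "DigitalStillImage": ("fileType", "accessURL"),   # proposed value
    "HumanObservation": (),                            # no token at all
}


def missing_token_metadata(record):
    """Return the expected token metadata keys the record does not supply."""
    expected = EXPECTED_TOKEN_METADATA.get(record.get("basisOfRecord"), ())
    return [key for key in expected if not record.get(key)]


image_record = {"basisOfRecord": "DigitalStillImage", "fileType": "image/jpeg"}
observation = {"basisOfRecord": "HumanObservation"}

image_gaps = missing_token_metadata(image_record)
observation_gaps = missing_token_metadata(observation)
```

The client checks one term and learns both what kind of token exists and which further properties to look for.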
> >> I hate to keep bringing my posts back to the RDF issue, but
> thinking about how one would write RDF forces clear thinking about
> how metadata should be structured. If we intend to separate
> tokens as entities from their associated Occurrence metadata, i.e.
> approach (2), then we open up a whole other can of worms. To
> associate the occurrence resources (i.e. the metadata) with the
> "different" resource (i.e. the token), we will probably have
> to be able to create URIs for the tokens and separate RDF metadata
> blocks which will have to be rdf:type'd. What are we going to
> use for that rdf:type - create another Darwin Core class? I
> simply don't think that is a complicated road that we want to
> travel. It would be far easier to just say that every Occurrence
> has a one-to-one relationship with its token (which could be "the
> empty set" for observations). This would not work for people who
> want to hang multiple tokens on a single observation event, but I
> think that itself is a bad idea because it makes it even harder to
> have "flat" occurrence datasets. Just say that every time we
> collect a different token (or make an observation that has no
> token), it is a new Occurrence record. Realistically, a single
> collector can't actually take a picture of a plant at the same
> time he or she collects it for a specimen anyway. Those really
> should be considered two different events because they happen at
> different times.
> >>
> >> OK, enough said. Consider this my defense of my proposal
> "issue 68" to add DigitalStillImage. I would urge the powers that
> be to respond to the issues that I've raised here before having
> any kind of "vote" (or whatever is ultimately going to happen when
> there is an up or down decision about the proposal).
> >>
> >> Steve
> >>
> >> Steve Baskauf wrote:
> >> After the flurry of emails recently, I had an opportunity to
> carefully
> >> read all the way through the threads again, followed by
> enforced "think
> >> time" during my long commute. I was actually pretty cheerful
> after that
> >> because I think that in essence, most of the conversation about
> what
> >> constitutes an Occurrence really boils down to the same thing.
> So I
> >> have sat down and tried to summarize what seems to me to be a
> consensus
> >> about Occurrences. To follow my points, please refer to the
> diagram at:
> >> http://bioimages.vanderbilt.edu/pages/occurrence-diagram.gif
> >>
> >> Consensus on relationships
> >> 1. The fundamental definition of an Occurrence involves
> evidence that a
> >> representative of a taxon occurred at a place and time.
> >> Note 1.A: For clarity, I have modified John's statement in his last
> >> email by replacing "taxon" with "representative of a taxon". I'm
> >> considering a taxon to be an abstract concept that is applied to
> >> individuals or groups of organisms.
> >> Note 1.B. This definition is far more useful than the official
> >> definition of the class Occurrence "The category of information
> >> pertaining to evidence of an occurrence..." which is
> essentially circular.
> >> Note 1.C: This statement is extremely broad because the
> evidence could
> >> be of many sorts, the representative could range from a single
> >> individual to all organisms on the earth, the taxon could be
> anyone's
> >> definition at any taxonomic level, the place could range from a GPS
> >> point with uncertainty of less than 10 meters to the entire planet
> >> earth, and the time could range from a shutter click of less
> than one
> >> second to 3.4 billion years.
> >> 2. The diagram is an attempt to summarize in pictorial form
> statements
> >> and relationships that have been described in the thread. The
> taxon
> >> representative is recorded as existing at a particular time and
> place
> >> (the arrow) and the result is an Occurrence record. That
> Occurrence
> >> record exists as metadata which may be associated with a token
> that can
> >> be used to voucher the fact that the taxon representative
> existed. That
> >> token may be the organism itself (or a living part of it as in
> a twig
> >> for grafting), all or part of the organism in preserved form, an
> >> electronic representation such as an image or sound recording,
> and other
> >> kinds of things like tissue or DNA samples. There may also be
> no token
> >> at all, in which case we call the Occurrence record an observation.
> >> Based on direct observation of the taxon representative,
> examination of
> >> one or more tokens, or both, some determiner asserts that a taxon
> >> concept applies to the taxon representative and as a result a
> scientific
> >> name can be used to "identify" the taxon representative.
> (There may be
> >> a lot of other complicated stuff above the Identification box,
> but that
> >> will have to be filled in by the taxonomists.)
> >> Note 2.A: I have mapped onto this diagram the letters that John
> used in
> >> his last email to refer to entities that are involved in an
> Occurrence
> >> (T, E, L, O, and G). I will beg the forgiveness of fossil people
> >> because I don't really know how the geological context fits in.
> I'm
> >> assuming that it is a way of asserting time and location on a much
> >> broader scale than we do for extant organisms.
> >> Note 2.B: I have put a dotted line around the part of the
> diagram that I
> >> think includes all the things that people might consider part
> of the
> >> Occurrence itself. I have left out "T" and the other parts
> related to
> >> identification because it seems to me that you can have an
> occurrence
> >> that you document which does not yet (and perhaps never will)
> have an
> >> identification. The Occurrence still asserts that a taxon
> >> representative existed at a time and place; we just don't yet
> know what
> >> the taxon is.
> >> 3. The red lines indicate the relationships that connect the
> various
> >> entities (I'm going to go ahead and call them resources).
> Consistent
> >> with popular opinion, the Occurrence record is the center of the
> >> universe and most things are connected to it.
> >> Note 3.A: I am sticking to my guns and refuse to connect the
> >> Identification directly to the Occurrence. It is the taxon
> >> representative that is being identified, not the occurrence.
> One can
> >> assert another sort of relationship between the identification
> and the
> >> occurrence if one wants to say that one consulted the occurrence
> >> metadata and token in order to decide about the identification,
> but it
> >> is not correct to say that the Identification identifies either the
> >> Occurrence metadata or the token (as Rich pointed out).
> >>
> >> OK, so that's step one - defining what is related to what. If
> anyone
> >> disagrees with these relationships, please clarify or create
> your own
> >> diagram.
> >>
> >> Complicating circumstances/caveats
> >> 1. It is noted and recognized that some users will not care to
> include
> >> all of these relationships in their models. In the interest of
> >> simplification or "flattening" the relationships, they may wish to
> >> collapse some parts of this diagram (e.g. incorporate time and
> location
> >> metadata within the Occurrence metadata rather than considering
> them
> >> separate resources, applying scientific names directly to the taxon
> >> representatives without defining a taxon concept or recording the
> >> determination metadata, connecting identifications directly to the
> >> occurrence, etc.). This doesn't mean that the relationships don't
> >> exist, it just means that some users don't care about them.
> >> 2. It is recognized that different users will be interested in
> or able
> >> to specify the various resources to differing degrees of precision.
> >> Examples: A photographer might record times to the nearest
> second, a
> >> collector may only be interested in noting the date on which a
> specimen
> >> was collected. A location may be specified to the precision of
> a GPS
> >> reading or be defined as some geographic or political
> subdivision. The
> >> taxon representative may be an individual organism, a flock or
> clump, or
> >> some larger aggregation of taxon representatives.
> >>
> >> That's step two. If I've missed any complications, please
> point them out.
> >>
> >> My opinions about the implications of this diagram
> >> 1. The circle I've labeled as "taxon representative" is the
> resource
> >> type that I'm proposing to be represented by the class
> Individual. You
> >> will note that in both the definition of dwc:individualID ("An
> >> identifier for an individual or named group of individual
> organisms...")
> >> and the proposed class definition ("The category of information
> >> pertaining to an individual or named group of individual organisms
> >> represented in an Occurrence"), groups of individual organisms are
> >> included. Thus John's example of a fossil having myriad
> individuals, or
> >> Richard's examples of thousands of plankton, a large school of
> fish,
> >> herd of wildebeest, flock of
> >> birds, could all be categorized as "Individual" under this
> definition if
> >> there is a reasonable expectation that all of the individuals
> in the
> >> group are members of the same taxon. Perhaps there is a better
> name for
> >> this resource, but since dwc:individualID was already extant, I
> chose
> >> Individual as the class name for consistency with the pattern
> >> established with other classes and their associated xxxxID terms.
> >> 2. Although in note 1.C. I have given the ranges of the various
> >> resources to their logical extreme (as was done previously in the
> >> thread), I think that as a practical matter we can adopt
> guidelines to
> >> set reasonable values for the "normal" ranges of the resources.
> One
> >> such guideline might be that we suggest a range that can
> accommodate
> >> about 95% of the user needs within the community (this came
> from Rich's
> >> comment about satisfying 95% of the user need with an
> establishmentMeans
> >> controlled vocabulary). For example, it was suggested that
> the range
> >> for the location of an Occurrence could span the entire planet
> Earth.
> >> True enough, but virtually nobody would find such a span
> useful. 95% of
> >> users would probably find a range between a GPS reading with 10
> meter
> >> precision and the extent of a county or province useful for
> recording
> >> the location of an Occurrence. I can suggest similar "useful"
> ranges:
> >> one second to one day for an event time (excluding fossils), one
> >> individual organism to the number of organisms that would fit
> within a
> >> 50 meter radius for an "individual", and taxon identified to
> family for
> >> plants and maybe mammals, genus for birds, and order for
> insects. So
> >> framing the definition of an Occurrence in these terms it would be
> >> something like: "An occurrence involves evidence (consisting of a
> >> physical token, electronic record, or personal observation) that a
> >> representative (ranging from a single individual to the number that
> >> would fit on a football field) of a taxon (hopefully identified
> to some
> >> lower taxonomic level) occurred at a place (determined to a
> precision
> >> between that of a GPS reading and the size of a
> county/province) and
> >> time (spanning one second to one day)." A few people might
> object to
> >> this level of restrictiveness, but I would guess that it would
> make 95%
> >> of us happy.
> >> 3. With the exception of the "missing" class Individual, every
> >> resource type on this diagram except for the "token" and Scientific
> >> name has a Darwin Core class. Every resource type on the diagram
> >> except for "token" has a dwc:xxxxID term that can be used to refer
> >> to a GUID for the resource. The implication of this is that any
> >> resource on this diagram except for the token and taxon
> >> representative (i.e. Individual) is ready to be represented in RDF
> >> by Darwin Core terms in the sense that the relationships (red lines)
> >> can be represented by the xxxxID terms and that the resources can be
> >> rdf:type'd using Darwin Core classes. (Lacking a class for the
> >> scientific name doesn't seem like a big deal to me since the
> >> scientific name can be a string literal - but then I'm not a
> >> taxonomist.)
> >> 4. OK, I've avoided it as long as I can, so I'm going to confess now
> >> to the RDF-phobes. The red lines and shapes are something pretty
> >> close to an RDF graph. What that means is that if the community can
> >> agree that this diagram correctly represents the relationships among
> >> the kinds of biodiversity resources that we care about, then the
> >> matter of providing guidelines on how to represent Darwin Core in
> >> RDF suddenly gets a lot simpler. Just convert the "picture" of the
> >> RDF graph into XML format and we have a template. Alright, that's an
> >> oversimplification, but I think it is essentially true because the
> >> most difficult part of achieving a consensus on RDF representations
> >> is to decide how we connect the resource types, not on the literals
> >> that we hang onto resources as properties.
> >> 5. While I'm beating the RDF drum again, the importance of my
> >> opinion number 2 can be extended into the GUID adoption process. In
> >> my comments to Kevin about the Beginner's Guide to Persistent
> >> Identifiers, I think I commented on the question of how one decides
> >> whether a GUID needs to be assigned to something or not. I believe
> >> that the answer to that question boils down to this: we need a GUID
> >> for any resource that will be referenced by more than one other
> >> resource. Do we need to be able to assign a GUID to Taxon concepts?
> >> Yes, because it is likely that many identifications will want to
> >> reference a particular taxon concept. Do we need to be able to
> >> assign a GUID to an Event? Maybe or maybe not. If every occurrence
> >> has its own separate time recorded, then no GUID is needed because
> >> the time is just a part of every separate occurrence record. If the
> >> event is defined to be a time range that represents a collecting
> >> trip, then there may be many Occurrences that are associated with
> >> that trip and all of them could reference the GUID for that event
> >> rather than repeating the event information for every Occurrence.
> >> The point here is that every shape (class of resources) on this
> >> diagram at least has the POTENTIAL to be a node connecting multiple
> >> resources and therefore should have the capability of being assigned
> >> a GUID, having its own RDF record, and being appropriately typed
> >> (presumably by a DwC class). So this is a final technical argument
> >> for why we need to have the DwC class Individual. Whether people
> >> ultimately choose to assign GUIDs to particular resource types is
> >> their own choice, but they need to at least be ABLE to if they need
> >> that resource to serve as a node given the structure of their
> >> metadata.
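
The rule of thumb in point 5 above (a resource needs a GUID when more than one other resource references it) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the identifiers (occ:1, event:trip-A, ind:42 and so on) are made up, the helper name resources_needing_guid is hypothetical, and the triples stand in loosely for the red lines of the diagram rather than for any agreed RDF serialization.

```python
# Each triple is (subject, predicate, object), standing in for one
# connection between two resources. All identifiers are hypothetical.
TRIPLES = [
    ("occ:1", "dwc:eventID", "event:trip-A"),
    ("occ:2", "dwc:eventID", "event:trip-A"),   # same collecting trip
    ("occ:3", "dwc:eventID", "event:solo"),     # event used only once
    ("occ:1", "dwc:individualID", "ind:42"),
    ("occ:2", "dwc:individualID", "ind:42"),    # same individual seen twice
    ("id:1", "dwc:taxonID", "taxon:X"),
    ("id:2", "dwc:taxonID", "taxon:X"),         # two identifications, one concept
]


def resources_needing_guid(triples):
    """Return the objects referenced by more than one distinct subject."""
    referrers = {}
    for subject, _predicate, obj in triples:
        referrers.setdefault(obj, set()).add(subject)
    return sorted(obj for obj, subjects in referrers.items() if len(subjects) > 1)


if __name__ == "__main__":
    for resource in resources_needing_guid(TRIPLES):
        print(resource, "is shared, so it would benefit from its own GUID")
```

In this toy data the collecting-trip event, the individual, and the taxon concept are each shared by two resources, while event:solo is referenced once and could simply be recorded as part of its occurrence.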
> >>
> >> We need to clarify how the "token" thing fits in, but I'm stopping
> >> there for now. I would very much appreciate responses indicating
> >> that:
> >>
> >> A. you agree with the diagram and connections (and consider this
> >> definition and diagram a consensus)
> >> B. you disagree with the diagram (and articulate why)
> >> C. you provide an alternative diagram or explanation of the
> >> relationships among the classes related to Occurrences.
> >>
> >> Thanks for your patience with another tome.
> >> Steve
> >>
> >> --
> >> Steven J. Baskauf, Ph.D., Senior Lecturer
> >> Vanderbilt University Dept. of Biological Sciences
> >>
> >> postal mail address:
> >> VU Station B 351634
> >> Nashville, TN 37235-1634, U.S.A.
> >>
> >> delivery address:
> >> 2125 Stevenson Center
> >> 1161 21st Ave., S.
> >> Nashville, TN 37235
> >>
> >> office: 2128 Stevenson Center
> >> phone: (615) 343-4582, fax: (615) 343-6707
> >> http://bioimages.vanderbilt.edu
> >>
> >> _______________________________________________
> >> tdwg-content mailing list
> >> tdwg-content at lists.tdwg.org <mailto:tdwg-content at lists.tdwg.org>
> >> http://lists.tdwg.org/mailman/listinfo/tdwg-content
> --
> ----------------------------------------------------------------
> Pete DeVries
> Department of Entomology
> University of Wisconsin - Madison
> 445 Russell Laboratories
> 1630 Linden Drive
> Madison, WI 53706
> TaxonConcept Knowledge Base <http://www.taxonconcept.org/> /
> GeoSpecies Knowledge Base <http://lod.geospecies.org/>
> About the GeoSpecies Knowledge Base <http://about.geospecies.org/>
> ------------------------------------------------------------
--
Steven J. Baskauf, Ph.D., Senior Lecturer
Vanderbilt University Dept. of Biological Sciences
postal mail address:
VU Station B 351634
Nashville, TN 37235-1634, U.S.A.
delivery address:
2125 Stevenson Center
1161 21st Ave., S.
Nashville, TN 37235
office: 2128 Stevenson Center
phone: (615) 343-4582, fax: (615) 343-6707
http://bioimages.vanderbilt.edu