[tdwg-content] "Wrong" RDF, was Re: What I learned at the TechnoBioBlitz

Thu Oct 14 11:13:07 CEST 2010

I think the problem is much simpler and less sinister than this.  I think
the problem is that we have many people who understand biodiversity data
very well, but go glassy-eyed on technical discussions about RDF (e.g., me).
And there are also people who understand RDF (and other related protocols
and technologies), who go glassy-eyed on discussions about subtle
distinctions between taxon names and taxon concepts (and other such
details).

There are only a very few people who seem to have a foot firmly in both
camps; and half the time those people will go over the heads of both other
groups simultaneously (and hence not be understood by either).

And then you have extreme examples of people who understand taxon names and
concepts very well (like me), but are put in the awkward position of trying
to develop core web services (like ZooBank).

Like I said in an earlier post, a little knowledge is a dangerous thing.
Sometimes a VERY dangerous thing.

Aloha,
Rich

  _____  

From: tdwg-content-bounces at lists.tdwg.org
[mailto:tdwg-content-bounces at lists.tdwg.org] On Behalf Of Peter DeVries
Sent: Wednesday, October 13, 2010 10:54 PM
To: Steve Baskauf
Cc: tdwg-content at lists.tdwg.org; Roger Hyam; tdwg-bioblitz at googlegroups.com;
Blum, Stan; Jerry Cooper
Subject: Re: [tdwg-content] "Wrong" RDF, was Re: What I learned at the
TechnoBioBlitz

Hi Steve, 

It is not as if there are no examples of things that seem to work in this
space; it is that those alternatives that could be incorporated into the
DarwinCore are largely ignored unless they come from the "right
people". 

How these "right people" are defined is not quite clear to me but what seems
strange is that many of the issues that are being rehashed over and over
again are ones I already have live working examples of. 

If there are any flaws, or features lacking, it is because I have spent far
too much time, as you have, trying to get people on the list to see some of
the problems with the system they have proposed.

I believe that you are right and that we need to represent individuals in
the RDF version, but I have implemented them in a slightly different way.

<http://lod.taxonconcept.org/ses/iuCXz#Species>
txn:speciesConceptHasSpeciesIndividualTag
<http://lod.taxonconcept.org/ses/iuCXz#Individual>

What this creates is a usable type for an "individual of that species
concept." It is of type txn:SpeciesIndividualTag

Now you can easily query for all the "individuals" or all the "individuals
that are of a particular species concept".
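
As a rough illustration of the two queries described above, the sketch below
keeps a handful of triples as plain Python tuples and filters them.  It is one
reading of the model, not Pete's implementation: only the iuCXz URIs and the
txn: terms come from the message.  The expansion of the txn: prefix is not
given here, so prefixed names are kept as opaque strings, and the example.org
resources and the function names are hypothetical.

```python
# Minimal in-memory triple store: each triple is (subject, predicate, object).
RDF_TYPE = "rdf:type"
HAS_TAG = "txn:speciesConceptHasSpeciesIndividualTag"
TAG_CLASS = "txn:SpeciesIndividualTag"

CONCEPT = "http://lod.taxonconcept.org/ses/iuCXz#Species"
INDIVIDUAL_TAG = "http://lod.taxonconcept.org/ses/iuCXz#Individual"
OTHER_TAG = "http://example.org/other#Individual"  # hypothetical second concept

triples = {
    (CONCEPT, HAS_TAG, INDIVIDUAL_TAG),
    (INDIVIDUAL_TAG, RDF_TYPE, TAG_CLASS),
    (OTHER_TAG, RDF_TYPE, TAG_CLASS),
    # Hypothetical organisms, each typed with a concept-specific tag.
    ("http://example.org/organism/1", RDF_TYPE, INDIVIDUAL_TAG),
    ("http://example.org/organism/2", RDF_TYPE, INDIVIDUAL_TAG),
    ("http://example.org/organism/3", RDF_TYPE, OTHER_TAG),
}


def all_individuals(store):
    """Every resource whose type is itself a txn:SpeciesIndividualTag."""
    tags = {s for (s, p, o) in store if p == RDF_TYPE and o == TAG_CLASS}
    return {s for (s, p, o) in store if p == RDF_TYPE and o in tags}


def individuals_of_concept(store, concept):
    """Every resource typed with the individual tag of one species concept."""
    tags = {o for (s, p, o) in store if s == concept and p == HAS_TAG}
    return {s for (s, p, o) in store if p == RDF_TYPE and o in tags}


print(sorted(all_individuals(triples)))
print(sorted(individuals_of_concept(triples, CONCEPT)))
```

In a real deployment the same two questions would normally be asked in SPARQL
against a triple store; the tuple filter only shows the shape of the query.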

We could be using the species concept URIs that I have set up but instead we
see a perfect example of the folly of LSIDs.

Those listed above do not resolve to anything that tells me what they mean. 

(I tried, not because I wanted to show the folly of LSID's but to see if the
LSID was for something that I had a URI for.

A URI that could be used in the interim.)

What I got was an LSID that despite being resolved through a proxy, returned
nothing.

I can always add the zoobank LSID to the metadata to the description for
that concept which would allow some tracking and use of LSID's.

Frankly, I don't know what is really going on, but there is something very
strange about how this entire process is operating.

Some of these issues are better handled by people who already understand RDF,
but instead are being rehashed here.

Why do I keep seeing this "pull" to create a specific subsection of the
semantic web or even informatics rather than benefit from all the related
work happening a few email lists away?

One thing that you might be missing is that in RDF something can have many
types, so you can have something that is both a depiction of a species
concept and a depiction of an individual.

Just like you can have a depiction of a "Firehouse" that is also a
depiction of the "West Washington Firehouse" 
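
A small sketch of the "many types" point, assuming nothing beyond the standard
library.  The class names (ex:SpeciesConceptDepiction,
ex:IndividualDepiction) and the photo URI are hypothetical placeholders, not
terms from any published vocabulary.

```python
from collections import defaultdict

# Map each resource to the set of classes it has been typed with.
types_of = defaultdict(set)


def add_type(resource, rdf_class):
    """Record one rdf:type statement; a resource may collect many."""
    types_of[resource].add(rdf_class)


def instances_of(rdf_class):
    """All resources that carry the given type, whatever else they are."""
    return {r for r, classes in types_of.items() if rdf_class in classes}


PHOTO = "http://example.org/photo/42"  # hypothetical image URI
add_type(PHOTO, "ex:SpeciesConceptDepiction")
add_type(PHOTO, "ex:IndividualDepiction")

# The same photo answers both queries.
print(PHOTO in instances_of("ex:SpeciesConceptDepiction"))
print(PHOTO in instances_of("ex:IndividualDepiction"))
```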

I have thought that there may need to be some additional "tag"-like
identifiers such as Image or Media, which has Image as a subclass.

Also, since the dwc is not really "live" and working like it should, we
can't really test it in the ways it needs to be tested.

In summary, I feel your pain.

Respectfully,

- Pete

On Wed, Oct 13, 2010 at 9:07 PM, Steve Baskauf
<steve.baskauf at vanderbilt.edu> wrote:

I was just ready to leave work when I wrote this and since then I'm feeling
like I should clarify just what I mean by "wrong" ways of using RDF.  I
recognize that TDWG encourages flexibility in the ways that standards such
as DwC are used.  As such, it doesn't usually define "right" and "wrong"
ways of using the standards.  What I mean by calling some uses "wrong" is
not intended to discourage the creative use of DwC terms in RDF.  What I
mean is that one must be careful to make sure that RDF statements mean what
is intended.  Here is an example.  The Dublin Core term dcterms:language
means "the language of the resource".  On multiple occasions, I've seen this
term used in RDF as a property of a resource whose metadata is written in a
certain language.  This is "wrong" because the subject of the statement is
the resource itself, not the resource's metadata.  The need for this kind of
clarity is apparent in the case of media.  For example, if we are providing
metadata in English that describes a nature film which has audio in German,
the correct statement is that [film] dcterms:language "de", NOT [film]
dcterms:language "en".  This problem is handled appropriately in the MRTG
schema by creating the (required) term mrtg:metadataLanguage.   The correct
statement would be [film] mrtg:metadataLanguage "en" .  (I'm using "[film]"
in lieu of a URI identifier for the film.)  If, however, we were writing RDF
to describe the metadata itself rather than the film, then it would be
appropriate to say [film's metadata] dcterms:language "en" .  In straight
XML, we might get away with semantic sloppiness if the senders and receivers
of the XML "understand" what the intended subject is of the term
dcterms:language.  But in RDF, we have to assume that the receiver of the
RDF is a "stupid" computer which only infers exactly what is said and not
what we MEANT to say.  
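
The film example can be written out as subject/predicate/object statements.
The sketch below is only an illustration in plain Python: the bracketed
placeholders stand in for URIs exactly as in the text above, and the lookup
function is a hypothetical stand-in for a consumer that reads statements
literally.

```python
# Placeholders used in lieu of real URIs, as in the message.
FILM = "[film]"
FILM_METADATA = "[film's metadata]"

statements = [
    (FILM, "dcterms:language", "de"),          # the audio is in German
    (FILM, "mrtg:metadataLanguage", "en"),     # its description is in English
    (FILM_METADATA, "dcterms:language", "en"), # the metadata as its own subject
]


def values(store, subject, predicate):
    """What a literal-minded consumer finds: exactly what was stated."""
    return [o for (s, p, o) in store if s == subject and p == predicate]


# Asking for the film's language returns German, never the metadata language.
print(values(statements, FILM, "dcterms:language"))
print(values(statements, FILM, "mrtg:metadataLanguage"))
```

Had the first statement been written with "en", nothing in the data would let
the consumer recover that the author meant the metadata rather than the film.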

I believe that this is a very important point that all parties need to keep
in mind before we happily march off creating RDF templates for the general
public to use.  In particular, I have some serious problems with the way
that people are associating properties with instances of the dwc:Occurrence
class.  I believe that these "wrong" ways originate with the historical
roots of Darwin Core as a means to describe specimens.  I will illustrate
what I mean.  In many cases, a specimen is created by killing an organism
and gluing it to a piece of paper (if it's a plant) or putting it in a jar
(if it's an animal).  It is natural to ask the question "what kind of
species is the specimen?".  We can look at the specimen and make a statement
like [specimen] dwc:scientificName "Drosophila melanogaster" and it pretty
much makes sense.  However, in the new Darwin Core standard, we have a
broader category of "things" (a.k.a. resources) that we call Occurrences
which include specimens but which also include observations and probably
all kinds of things like images, DNA samples, and a whole lot of other
things.  If we try to apply the same kind of statement to other kinds of
Occurrences besides specimens we immediately run into problems.  If we say
that [digital image] dwc:scientificName "Drosophila melanogaster" we are
making a nonsensical statement.  The digital image can have properties like
its photographer, its format, its pixel dimensions, etc. but the image
itself does not have a scientific name.  The scientific name is a property
of the thing that was photographed.  It makes even less sense if we are
talking about observations.  An observation is a situation where somebody
observes an organism.  The observation can have properties like the
observer, the location, etc.  However, if we say [observation]
dwc:scientificName "Drosophila melanogaster" we are saying that that act of
observing has a scientific name.  That is an incorrect statement.  So the
general statement [Occurrence] dwc:scientificName "Drosophila melanogaster"
does not make sense when applied to all possible types of Occurrences.
Rather, the organism that we are observing is the thing that has a
scientific name.  

In all of the examples above, the correct statement is [individual organism]
dwc:scientificName "Drosophila melanogaster".  The specimen is an occurrence
of the individual organism.  The image is an occurrence of the individual
organism.  The observation is an occurrence of the individual organism.
These statements may seem odd because we are used to thinking of an
Occurrence being an occurrence of the "species" but it's not really.  The
image is not an image of the Drosophila species concept nor is it an image
of the string "Drosophila melanogaster".  The image is an image of an
individual fruit fly.  The individual fruit fly is a representative of the
taxon, the image and the observation are not.  
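
One way to picture this arrangement is a pair of record types in which only
the individual carries the name.  This is a sketch under stated assumptions,
not a proposed schema: the class and field names and the example.org URIs are
hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Individual:
    uri: str
    scientific_name: str  # dwc:scientificName is a property of the organism


@dataclass(frozen=True)
class Occurrence:
    uri: str
    kind: str               # e.g. "specimen", "image", "observation"
    individual: Individual  # the organism this occurrence documents


def name_of_documented_organism(occurrence):
    """Follow the link to the individual; the occurrence itself has no name."""
    return occurrence.individual.scientific_name


fly = Individual("http://example.org/individual/fly-1", "Drosophila melanogaster")
image = Occurrence("http://example.org/image/1", "image", fly)
sighting = Occurrence("http://example.org/observation/1", "observation", fly)

print(name_of_documented_organism(image))
print(name_of_documented_organism(sighting))
```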

This point becomes more clear if we look at a situation where several types
of occurrence records are collected from the same individual.  Let's say
that we capture a bird, photograph it, collect a feather from it, collect a
DNA sample and band it and let it go.  Later somebody sees the band and
reports that as an observation.  How do we connect all of these things?  Do
we create an identifier for the specimen (the feather) and then say that the
image and the DNA sample came from it?  That would be wrong.  We could take
an image of the feather, but that would be a different thing from an image
of the bird.  We didn't get the DNA sample from the feather, we got it via a
blood sample from the bird.  The band observation is not an observation of
the feather, or the image or the DNA sample.  It's an observation of the
bird which was never any kind of specimen living or dead.  The bird is an
individual organism and that's what we need to call it.  Right now we don't
have anything in Darwin Core that can be used to rdf:type the bird, which
is why I proposed Individual as a Darwin Core class.  
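
The bird scenario can be sketched as a few statements that all point at the
individual.  The ex: identifiers and the two properties (ex:occurrenceOf,
ex:depicts) are hypothetical names chosen for this illustration; they are not
Darwin Core terms.

```python
BIRD = "ex:bird-1"

bird_triples = [
    ("ex:photo-1", "ex:occurrenceOf", BIRD),
    ("ex:feather-1", "ex:occurrenceOf", BIRD),
    ("ex:dna-1", "ex:occurrenceOf", BIRD),
    ("ex:band-sighting-1", "ex:occurrenceOf", BIRD),
    # A photo of the feather is about the feather, not about the bird.
    ("ex:feather-photo-1", "ex:depicts", "ex:feather-1"),
]


def occurrences_of(store, individual):
    """Everything recorded directly from one individual organism."""
    return sorted(
        s for (s, p, o) in store if p == "ex:occurrenceOf" and o == individual
    )


print(occurrences_of(bird_triples, BIRD))
```

Because every record hangs off the bird rather than off the feather, the later
band sighting joins the earlier records without any of them having to claim
descent from another.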

I could say these things more clearly in RDF, but because many members
of the audience of this message aren't familiar with RDF/XML they would
probably zone out and the point would be lost.  The point is that we need to
have identifiable classes of "resources" (the technical name for "things"
like physical artifacts, concepts, and electronic representations) for all
of the things that we need to describe and inter-relate in the Darwin
Core world.  Right now, we are missing one of the important pieces that we
need, which is a class for the Individual.  If we are satisfied with
creating an RDF model that only works for specimens and one-time
observations, then we probably don't need Individual as a Darwin Core class.
On the other hand, if TDWG and GBIF are really serious about creating a
system (Darwin Core and RDF based on it) that can handle other types of
Occurrences like multiple images of live organisms, observations of the same
organism over time, and multiple types of Occurrences collected from the
same organism, then this capability should be built into the system from the
start.  When I got back from the TDWG meeting, I was all excited about
trying to use Darwin Core Archives with my live plant image collection.
However, it quickly became evident that it could not work because
Occurrences were at the center of the diagram rather than Individuals.  So
unless something changes, we are already embarking on the process of locking
out these other Occurrence types.

I hate to sound like a broken record (do we have those any more?), but read
my paper on this subject.  It explains the rationale better than this email,
has nice diagrams, and gives RDF examples to illustrate everything
(https://journals.ku.edu/index.php/jbi/article/view/3664).  If somebody has
a better idea of how to develop an internally consistent system that can
handle the problems I've raised here that DOESN'T involve Individuals (i.e.
other "right"[=semantically accurate] ways to express properties and
relationships among Identifications, Taxa, diverse types of Occurrences,
etc.) I'd like to hear what it is.  Or perhaps as Stan has suggested, there
needs to be a task group that can hash out alternative views.  But let's
have the discussion before we post models and suggest people use them.

Steve

Steve Baskauf wrote: 

Stan,
Thanks for the clarification.  My concern here is that standard or not, if
examples are posted on the Darwin Core Google Code site, they will have an
implied "stamp of approval" and will be used by others as a template
(despite that site being labeled as "for discussion and development" not
everyone can post to it and that implies some authority).  In the case of
straight XML, that isn't really that big of an issue.  XML can mean whatever
one wants as long as there is an agreement between the sender and the
receiver (perhaps in the form of a formal XML schema) as to what the
elements represent.  I believe that RDF is a different beast.  When one
exposes RDF, the receiver is unknown.  Therefore, the RDF has to actually
"mean" something to the receiver without a pre-arranged agreement.  In a
generic XML document, the elements can simply be a list of string values of
terms with no implied "meaning" except what might be inferred by grouping
them in a container element.  In RDF, the elements represent properties of
particular resources.  I believe strongly that although there may be several
"right" ways to express properties of members of DwC classes, there are many
more "wrong" ways that should not be used.  By "right" I mean that they make
sense semantically in that the properties logically are ones that should
actually belong to the described resource.  I do not believe that the
discussion of these issues has progressed to the point where there is a
consensus on the "right" use of DwC terms for some types of resources and
therefore I am opposed to the posting of RDF examples on any official Darwin
Core sites without a lot more discussion UNLESS the examples are clearly
labeled as examples intended for discussion and not for use as templates.
If you want such examples, I can provide
http://bioimages.vanderbilt.edu/vanderbilt/4-145.rdf as an example for an
Individual
http://bioimages.vanderbilt.edu/baskauf/79695.rdf as an example of an
Occurrence that is a live plant image and
http://bioimages.vanderbilt.edu/rdf/examples/lsu000/0138.rdf as an example
of an Occurrence that is an herbarium specimen
I would be happy to discuss the reasons why I structured the RDF as I did
(although mostly those examples are already rationalized in
https://journals.ku.edu/index.php/jbi/article/view/3664), but I would not go
so far as to say that they are "right" without some discussion.

What I intended when I suggested that I might write some kind of guide for
Darwin Core represented in RDF/XML was really a document that explained to
beginners what the point was of RDF, the basics of how one can structure
properties in RDF using examples that are Darwin Core terms, and options for
creating URIs that refer to resources that are described in separate files
or within the same file.  I wasn't really suggesting that it be a full-blown
recommendation with specific guidelines for the use of particular terms or
structuring of files for particular classes of resources, although that
would be a good thing ultimately.  I guess I was seeing some kind of a
beginner's guide as a way to involve more people (who aren't up on RDF) in
the discussion.  I don't think that it should be necessary to complete a full
"standards" process before such a document is made available.  It would
probably be better to have some kind of road map where that document would
be the first segment but would later be followed by guidelines for specific
classes of resources with examples.  I think that such a modular approach
would be the most beneficial because pieces of it could actually get done in
a timely fashion rather than requiring the whole thing to be complete before
any of it would be accepted.  

I do think a task group for Darwin Core RDF would be a good idea.  If nobody
is in a huge hurry, I don't mind trying to charter such a group, although
I'd be just as happy if somebody else wanted to do it and I would just try
to be an active participant.  I will look at the links you suggested, thanks.

Steve

Blum, Stan wrote: 

Steve,

The TDWG process for creating standards is here:
http://www.tdwg.org/about-tdwg/process/   This is worth reading if you
haven’t done so  already.   

Another document worth reading is the standards format specification
http://www.tdwg.org/standards/147/    I never pushed this “standard” through
public review, but it still functions as a guideline for formatting and our
view of what is and isn’t within scope of a “standard”.  In other words, we are
doing our best to follow the basic ideas laid out there about the kinds of
specifications:

Type 1 -- normative specification, versioned; 
Type 2 -- versioned, supplementary documentation;
Type 3 -- uncontrolled supplementary documentation.    

The page of examples John and others have put up on the DarwinCore site is
non-normative, uncontrolled documentation.   

The thing you were proposing sounded like an applicability statement —
offering guidance about how another standard, RDF, should be used in
biodiversity informatics.  These can also be treated as standards, and get
TDWG ratification as a standard, but don’t create a de-novo standard.

Interest groups and task groups are explained in the Process.  If you want
to create an applicability statement for RDF and DarwinCore, you could
prepare a task group charter and submit it to the executive for approval.
Approval would make it a formal Task Group.  See other task group charters
for examples.

-Stan

On 10/13/10 6:33 AM, "Steve Baskauf" <steve.baskauf at vanderbilt.edu> wrote:

OK, because of a momentary heavy work load I'm still in the process of
getting caught up on this thread, but this is moving so fast I feel like I'm
being left in the dust.  Last week I offered to help facilitate creating
some guidelines and examples for RDF/XML in Darwin Core.  I was told that we
should follow the community process of forming an interest group, getting
participants, etc. and have been waiting for some guidelines on how that
process is supposed to work. Now we are surging ahead with examples and help
pages again.  Are we following a process or not and if so, what is it?
Steve

Tim Robertson (GBIF) wrote: 

I will also help with examples.  If we are doing XML / RDF formats, let's get
an example record conforming to the Text guidelines in there as well for
completeness (most useful when dealing with checklists). 

On Oct 12, 2010, at 10:31 PM, John Wieczorek wrote:

I am interested in helping with an examples page. The page could have XML
and RDF examples illustrating particular use cases, as you have recommended.
Create an "Examples" page on the Table of Contents and then have all of the
examples on one page with an index of links to specific examples at the top?
I made a straw man page to show what I am thinking at
http://code.google.com/p/darwincore/wiki/Examples.

On Tue, Oct 12, 2010 at 11:41 AM, "Markus Döring (GBIF)" <mdoering at gbif.org>
wrote:

Would we have the energy to compile example dwc records on how to use darwin
core for certain use cases?
The lack of guidance on how to use darwin core was mentioned earlier. An
additional example webpage for the dwc website would surely be really
helpful for not only newbies. A dwc record for bird watching, vegetation
plot surveys, insect specimen collection, herbarium sheets, zoological
garden visits, tissue sample, dna sequence, marine fishing net catches, etc

I'd volunteer to do the html page if I'm given example records with a short
use case description...

Markus

On Oct 12, 2010, at 13:14, Roger Hyam wrote:

> Wow - what a thread to come back to.
>
> I saw my name mentioned so I ought to chip in. I also think we are
conflating two distinct things under the name "occurrence".
>
> This point is largely just expanding on what Kevin just said. Going down
the road he was wise enough not to go down!
>
> The vocabulary I briefly presented at TDWG was aimed at occurrence of taxa
in regions but the general thrust of my talk was intended to pose the
questions: Why should we score taxa to regions at all? Shouldn't this always
be the results of a query on occurrence records? The answer will always
depend on the question asked.
>
> Take two examples.
>
> A tiger roaming "free" in London living off a diet of squirrels and
tourists. Occurrence records for this organism are just occurrence records.
Why the tiger is in London (climate change, introduction, invasion, escape)
is not a quality of it being there. They are value judgements added later.
>
> A tiger sitting in a cage at London Zoo is "managed" in that it is being
maintained there by a human effort. We are recording the fact that someone
has placed it there and held it in that position for our edification.
>
> As Kevin says, when I observe an individual (or flock of individuals) I do
not observe their "introducedness" or their "nativeness" this is something
that is derived from combining multiple observations of occurrence of
individuals.
>
> I would therefore advocate that we just have a flag on an occurrence
record that says "intended for distribution" i.e. this is not maintained
here in a garden/zoo/farm etc. To say any more on an occurrence record is
misleading and there are occasions when even this flag will be ignored in
analysis. I think we already have this field.
>
> There are of course grey areas (biology always has grey areas). A Scots
Pine growing in the highlands may be part of a 150 year old naturalistic
plantation. It is therefore native to the region, possibly of local genetic
stock but has been planted in that position. For some applications this
could be considered managed and for others not.
>
> The status of taxa in regions is a completely different thing. As soon as
we talk about aggregating multiple observations (or lack of them) then we
are talking about the results of analysis instead of primary observations.
Only at this point should we be talking about the status of the "occurrence"
in terms of native/invasive/naturalised etc. This may not even be based on
extant records. For example, a taxon can be invasive in an area without
actually occurring there, i.e. it used to be there but is presumed to be
eradicated.
>
> Does the problem occur because we are using the same term "occurrence" to
mean both a primary unit of data gathering and the result of an analysis
(possibly even just a hypothesis if it is the result of niche modelling)?
How could we differentiate between these two? The discussion probably comes
back to 'basisOfRecord' again and our fundamental classes of object.
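
The distinction drawn above, between primary occurrence records and a status
derived by querying them, can be sketched as follows.  This is an
illustration only: the record fields, the `managed` flag name, and the sample
records (including the lion) are hypothetical and are not Darwin Core terms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OccurrenceRecord:
    taxon: str
    region: str
    managed: bool  # True when maintained in place by human effort (zoo, garden)


records = [
    OccurrenceRecord("Panthera tigris", "London", managed=True),   # zoo cage
    OccurrenceRecord("Panthera tigris", "London", managed=False),  # roaming free
    OccurrenceRecord("Panthera leo", "London", managed=True),      # zoo only
]


def taxa_in_region(store, region, include_managed=False):
    """Derived answer: which taxa occur in a region, by querying the records."""
    return {
        r.taxon
        for r in store
        if r.region == region and (include_managed or not r.managed)
    }


print(sorted(taxa_in_region(records, "London")))
print(sorted(taxa_in_region(records, "London", include_managed=True)))
```

The records themselves say nothing about native or invasive status; the
answer changes with the question asked, which is the point being made.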
>
> Sorry to be long winded.
>
> Roger
>
>
> On 12 Oct 2010, at 09:36, Kevin Richards wrote:
>
>> I also have always felt that "nativeness" should apply more to an
occurrence than a taxon, but have swayed from one opinion to the other on a
regular basis.  So my conclusion is that "nativeness" is a property of both,
and requires both, in a way - and that these different perspectives are
actually the same thing.
>>
>> Eg, if we describe (in a basic way) :
>> Occurrence = Taxon at Location
>>
>> then if we say that Nativeness is a property of a Taxon that is
restricted by Location  (Jerry's view)
>> then this is equivalent to saying that Nativeness is a property of an
Occurrence ! (Rich's view)
>>
>> As Rich points out, it doesn't make a whole lot of sense to apply
Nativeness to a single occurrence, but I'm not sure this is what is meant by
stating that "this specimen of Poa anceps that I collected from Christchurch
is 'Native'" - but more that "I have found a specimen of Poa anceps in
Christchurch and from knowledge of other previously recorded occurrences, I
know that this occurrence/taxon is Native in this area"
>>
>> Also I tend to feel that a lot of biodiversity properties are properties
of occurrences  - EVEN taxon names are a property of an occurrence and not of
this 'concept' of a species - but I won't go down that road right now   :-)
>>
>> Also, we discussed this topic a while ago on the tdwg content list,
having worked out that "nativeness" or what we call "biostatus" is a fairly
complicated topic, involving taxon names, locations, time, and aspects like
'origin' and 'presence', ...
>>
>> Kevin
>>
>> ________________________________________
>> From: tdwg-content-bounces at lists.tdwg.org
[tdwg-content-bounces at lists.tdwg.org] On Behalf Of Richard Pyle
[deepreef at bishopmuseum.org]
>> Sent: Tuesday, 12 October 2010 5:41 p.m.
>> To: Jerry Cooper; tdwg-content at lists.tdwg.org;
tdwg-bioblitz at googlegroups.com
>> Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
>>
>> Hi Jerry,
>>
>> Before we agree to disagree, let me try to elaborate a bit more:
>>
>> I think we both agree that "Nativeness" (to borrow Dave's term) is a
>> property of a taxon at a geographic locality (it could also be a property
of
>> a taxon in a class of habitat, but few people actually frame it this
way).
>>
>> The reason I think that "Nativeness" is best represented as a property of
an
>> Occurrence, rather than of a taxon, is that a taxon is a circumscribed
set
>> of organisms, usually based on evolutionary relatedness or morphological
or
>> genetic similarity.  By contrast, an Occurrence is about the presence of
a
>> member or multiple members of a taxon concept in space and time (i.e., at
a
>> particular place and time).
>>
>> We often think of Occurrence records in terms of individual organisms
(e.g.,
>> specimens, or specific observed or photographed organisms), and I agree,
>> it's weird to think of "Nativeness" as it applies to an individual
organism.
>> However, my understanding is that Occurrence instances can also apply to
>> populations -- which is why terms such as establishmentMeans and
>> occurrenceStatus fit into this class.
>>
>> More generally, if we agree that "Nativeness" is a property of a taxon at
a
>> particular locality, the way that this intersection is usually manifest
in
>> DwC is via Occurrence and Event instances.
>>
>> How else would you represent "Nativeness" within DwC?
>>
>> Aloha,
>> Rich
>>
>>> -----Original Message-----
>>> From: tdwg-content-bounces at lists.tdwg.org
>>> [mailto:tdwg-content-bounces at lists.tdwg.org] On Behalf Of Jerry Cooper
>>> Sent: Monday, October 11, 2010 6:02 PM
>>> To: tdwg-content at lists.tdwg.org; tdwg-bioblitz at googlegroups.com
>>> Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
>>>
>>> We will have to agree to disagree.
>>>
>>> For me at least 'Native',  'Invasive' etc are clearly not
>>> properties associated with a collection event. They are
>>> collective statements, not necessarily about properties of
>>> the taxon as a whole, but about the properties of a taxon in
>>> some restricted sense - usually geographically restricted.
>>>
>>> GISIN, like our model here in  NZ, pulls together such items
>>> under a triplet of taxon/occurrence statement/geographical
>>> extent linked to a publication.
>>>
>>>
>>> Jerry
>>>
>>>
>>> -----Original Message-----
>>> From: Richard Pyle [mailto:deepreef at bishopmuseum.org]
>>> Sent: Tuesday, 12 October 2010 4:23 p.m.
>>> To: Jerry Cooper
>>> Cc: tdwg-content at lists.tdwg.org; tdwg-bioblitz at googlegroups.com
>>> Subject: RE: [tdwg-content] What I learned at the TechnoBioBlitz
>>>
>>> Hi Jerry,
>>>
>>> Yes, this is a road I've been down before.  Intuitively,
>>> these terms seem like they should apply to taxon concepts,
>>> but it turns out that's not the right way to do it.  Things
>>> like "native" and "invasive" are not properties of taxon
>>> concepts; they're the property of an occurrence (which, I
>>> suspect, is why establishmentMeans is included in the
>>> Occurrence class in DwC; e.g., see the examples at
>>> http://rs.tdwg.org/dwc/terms/index.htm#establishmentMeans ).
>>>
>>> Rich
>>>
>>> ________________________________
>>>
>>>        From: tdwg-content-bounces at lists.tdwg.org
>>> [mailto:tdwg-content-bounces at lists.tdwg.org] On Behalf Of Jerry Cooper
>>>        Sent: Monday, October 11, 2010 4:38 PM
>>>        Cc: tdwg-content at lists.tdwg.org;
>>> tdwg-bioblitz at googlegroups.com
>>>        Subject: Re: [tdwg-content] What I learned at the
>>> TechnoBioBlitz
>>>
>>>
>>>
>>>        Rich,
>>>
>>>
>>>
>>>        Let's not confuse those terms which are best applied
>>> to a taxon concept rather than a  specific
>>> collection/observation of a taxon at a location.
>>>
>>>
>>>
>>>         There are existing vocabularies for taxon-related
>>> provenance, like those in GISIN, or the vocabulary Roger
>>> mentioned in his PESI talk at TDWG.
>>>
>>>
>>>
>>>        However, against a specific collection you can only
>>> record what the recorder actually knows at that location for
>>> that specific collected taxon, and not infer a status like
>>> 'introduced' etc.
>>>
>>>
>>>
>>>        So, to me, the vocabulary reduces even further - and
>>> the obvious ones are 'in cultivation', 'in captivity',
>>> 'border intercept' . Our botanical collection management
>>> system would hold more data on provenance of a specific
>>> collection and linkages between events - from the wild at t=1,
>>> x=1 to cultivation in botanic garden Y at t=2, X=2 etc. But
>>> then we often have that data because we are generating it.
>>>
>>>
>>>
>>>        Jerry
>>>
>>>
>>>
>>>
>>>
>>>        From: tdwg-content-bounces at lists.tdwg.org
>>> [mailto:tdwg-content-bounces at lists.tdwg.org] On Behalf Of Richard Pyle
>>>        Sent: Tuesday, 12 October 2010 3:27 p.m.
>>>        To: Donald.Hobern at csiro.au; tuco at berkeley.edu
>>>        Cc: tdwg-content at lists.tdwg.org;
>>> tdwg-bioblitz at googlegroups.com
>>>        Subject: Re: [tdwg-content] What I learned at the
>>> TechnoBioBlitz
>>>
>>>
>>>
>>>        I certainly agree it's important!  I was just saying
>>> that a simple flag probably wouldn't be enough.  I like the
>>> idea of a controlled vocabulary (as you and John both allude
>>> to), and I can imagine about a half-dozen terms that our
>>> community will no-doubt adopt with almost no debate.....  :-)
>>>
>>>
>>>
>>>        In my mind, the broadest categories (and likely most
>>> useful) would be something like:
>>>
>>>
>>>
>>>        Native (was there without any assistance from humans)
>>>
>>>        Introduced (got there with the assistance of humans,
>>> but is inhabiting the natural environment)
>>>
>>>        Captive (brought by humans and still maintained in captivity)
>>>
>>>
>>>
>>>        You might also throw in "Cryptogenic", which is an
>>> assertion that we do not know which of these categories a
>>> particular organism falls into (not the same as null, which means
>>> we don't know whether or not we know)
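
A small sketch of such a controlled vocabulary, including the difference
between "Cryptogenic" and null.  The enum name, its string values, and the
filtering policy (drop only records explicitly marked captive) are assumptions
made for illustration; none of them is an agreed Darwin Core vocabulary.

```python
from enum import Enum
from typing import Optional


class Provenance(Enum):
    NATIVE = "native"            # there without any assistance from humans
    INTRODUCED = "introduced"    # arrived with human help, living in the wild
    CAPTIVE = "captive"          # brought by humans and still held in captivity
    CRYPTOGENIC = "cryptogenic"  # assessed, but the category is unknown


def is_assessed(status: Optional[Provenance]) -> bool:
    """True once someone has made an assertion, even 'we cannot tell'."""
    return status is not None


def keep_for_distribution(status: Optional[Provenance]) -> bool:
    """Assumed policy: weed out only records explicitly marked captive."""
    return status is not Provenance.CAPTIVE


print(is_assessed(Provenance.CRYPTOGENIC))  # an explicit "unknown"
print(is_assessed(None))                    # nobody has said anything
print(keep_for_distribution(Provenance.CAPTIVE))
```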
>>>
>>>
>>>
>>>        Of course, each of these can be further subdivided,
>>> but the more we subdivide, the greater the ratio of
>>> fuzzy:clean distinctions. I would say that the terms should
>>> be established in consultation with those most likely to use
>>> them (e.g., as you suggest, distribution analysis, niche modellers,
>>> etc.)  For example, it might be useful to distinguish between
>>> an organism that was itself introduced, compared to the
>>> progeny (or a well-established
>>> population) of an introduced organism. This information can be
>>> useful for separating things likely to become established in
>>> new localities, vs. things that do not seem to "take" in a
>>> novel environment.
>>>
>>>        Anyway...I didn't want to say a lot on this topic
>>> (too late?); I just wanted to steer more towards a controlled
>>> vocabulary than a simple flag field.
>>>
>>>
>>>
>>>        Aloha,
>>>
>>>        Rich
>>>
>>>
>>>
>>>                ________________________________
>>>
>>>                                From: Donald.Hobern at csiro.au
>>> [mailto:Donald.Hobern at csiro.au]
>>>                Sent: Monday, October 11, 2010 3:44 PM
>>>                To: Richard Pyle; tuco at berkeley.edu
>>>                Cc: tdwg-content at lists.tdwg.org;
>>> tdwg-bioblitz at googlegroups.com
>>>                Subject: RE: [tdwg-content] What I learned at
>>> the TechnoBioBlitz
>>>
>>>                Hi Rich.
>>>
>>>
>>>
>>>                I recognise this (and could probably define
>>> many different useful flags).  The bottom line is really
>>> whether or not the location is one which should be used for
>>> distribution analysis, niche modelling and similar
>>> activities.  There will certainly be many grey areas, but it
>>> would be good if software could weed out captive occurrences.
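As a rough illustration of how software could weed out captive occurrences, here is a Python sketch. It assumes each record is a dict keyed by Darwin Core term names and that providers put the literal value "captive" in establishmentMeans (the term discussed elsewhere in this thread). The function name, the sample records, and that literal value are all assumptions made for the example.

```python
from typing import Dict, Iterable, List

Record = Dict[str, str]

# Invented sample records, for illustration only.
SAMPLE: List[Record] = [
    {"scientificName": "Panthera leo", "establishmentMeans": "Captive"},
    {"scientificName": "Puma concolor", "establishmentMeans": "native"},
    {"scientificName": "Danaus plexippus"},  # provider supplied no value
]


def usable_for_distribution(records: Iterable[Record]) -> List[Record]:
    """Keep every record that is not explicitly marked captive.

    Records with no establishmentMeans value are kept. That is only one
    possible policy for the grey areas; a stricter analysis might drop them.
    """
    kept = []
    for record in records:
        value = record.get("establishmentMeans", "").strip().lower()
        if value != "captive":
            kept.append(record)
    return kept
```

Calling usable_for_distribution(SAMPLE) keeps the second and third records and drops the zoo lion.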
>>>
>>>
>>>
>>>                Donald
>>>
>>>                        Donald Hobern, Director, Atlas of
>>> Living Australia
>>>
>>>                CSIRO Ecosystem Sciences, GPO Box 1700,
>>> Canberra, ACT 2601
>>>
>>>                Phone: (02) 62464352 Mobile: 0437990208
>>>
>>>                Email: Donald.Hobern at csiro.au
>>> <mailto:Donald.Hobern at csiro.au>
>>>
>>>                Web: http://www.ala.org.au/
>>>
>>>                From: Richard Pyle [mailto:deepreef at bishopmuseum.org]
>>>                Sent: Tuesday, 12 October 2010 12:33 PM
>>>                To: Hobern, Donald (CES, Black Mountain);
>>> tuco at berkeley.edu
>>>                Cc: tdwg-content at lists.tdwg.org;
>>> tdwg-bioblitz at googlegroups.com
>>>                Subject: RE: [tdwg-content] What I learned at
>>> the TechnoBioBlitz
>>>
>>>
>>>
>>>                I'm not so sure a simple flag will do it.  We
>>> have examples ranging from animals in zoos, to escaped
>>> animals, to intentionally and unintentionally introduced
>>> populations, to naturalized populations -- and just about
>>> everything in-between.  Where on this spectrum would you draw
>>> the line for flagging something as "naturally occurring"?
>>>
>>>
>>>
>>>                Rich
>>>
>>>
>>>
>>>                        ________________________________
>>>
>>>                                                From:
>>> tdwg-content-bounces at lists.tdwg.org
>>> [mailto:tdwg-content-bounces at lists.tdwg.org] On Behalf Of
>>> Donald.Hobern at csiro.au
>>>                        Sent: Monday, October 11, 2010 2:59 PM
>>>                        To: tuco at berkeley.edu
>>>                        Cc: tdwg-content at lists.tdwg.org;
>>> tdwg-bioblitz at googlegroups.com
>>>                        Subject: Re: [tdwg-content] What I
>>> learned at the TechnoBioBlitz
>>>
>>>                        Thanks, John.
>>>
>>>
>>>
>>>                        This is useful, but completely
>>> uncontrolled - effectively a verbatimEstablishmentMeans.
>>> Having a more controlled version or a simple flag which could
>>> be machine-processible in those cases where providers can
>>> supply it would be useful.
>>>
>>>
>>>
>>>                        Donald
>>>
>>>                        From: gtuco.btuco at gmail.com
>>> [mailto:gtuco.btuco at gmail.com] On Behalf Of John Wieczorek
>>>                        Sent: Tuesday, 12 October 2010 11:34 AM
>>>                        To: Hobern, Donald (CES, Black Mountain)
>>>                        Cc: jsachs at csee.umbc.edu;
>>> tdwg-bioblitz at googlegroups.com; tdwg-content at lists.tdwg.org
>>>                        Subject: Re: [tdwg-content] What I
>>> learned at the TechnoBioBlitz
>>>
>>>
>>>
>>>                        Natural occurrence is meant to be
>>> captured through the term dwc:establishmentMeans
>>> (http://rs.tdwg.org/dwc/terms/index.htm#establishmentMeans).
>>>
>>>                        On Mon, Oct 11, 2010 at 5:16 PM,
>>> <Donald.Hobern at csiro.au> wrote:
>>>
>>>                        Thanks, Joel.
>>>
>>>                        Nice summary.  One addition which we
>>> do need to resolve (and which has been suggested in recent
>>> months) is to have a flag to indicate whether a record should
>>> be considered to show a "natural" occurrence (in distinction
>>> from cultivation, botanic gardens, zoos, etc.). This is not so
>>> much an issue in a BioBlitz, but is certainly a factor with
>>> citizen science recording in general - see the number of zoo
>>> animals in the Flickr EOL group.
>>>
>>>                        Donald
>>>
>>>                        -----Original Message-----
>>>                        From: tdwg-content-bounces at lists.tdwg.org
>>> [mailto:tdwg-content-bounces at lists.tdwg.org] On Behalf Of joel sachs
>>>                        Sent: Monday, 11 October 2010 10:47 PM
>>>                        To: tdwg-bioblitz at googlegroups.com;
>>> tdwg-content at lists.tdwg.org
>>>                        Subject: [tdwg-content] What I
>>> learned at the TechnoBioBlitz
>>>
>>>                        One of the goals of the recent
>>> bioblitz was to think about the suitability and
>>> appropriateness of TDWG standards for citizen science. Robert
>>> Stevenson has volunteered to take the lead on preparing a
>>> technobioblitz lessons learned document, and though the scope
>>> of this document is not yet determined, I think the audience
>>> will include bioblitz organizers, software developers, and
>>> TDWG as a whole. I hope no one is shy about sharing lessons
>>> they think they learned, or suggestions that they have. We
>>> can use the bioblitz google group for this discussion, and
>>> copy in tdwg-content when our discussion is standards-specific.
>>>
>>>                        Here are some of my immediate observations:
>>>
>>>                        1. Darwin Core is almost exactly
>>> right for citizen science. However, there is a desperate need
>>> for examples and templates of its use. To illustrate this
>>> need: one of the developers spoke of the design choice
>>> between "a simple csv file and a Darwin Core record". But a
>>> simple csv file is a legitimate representation of Darwin
>>> Core! To be fair to the developer, such a sentence might not
>>> have struck me as absurd a year ago, before Remsen said
>>> "let's use DwC for the bioblitz".
>>>
>>>                        We provided a couple of example DwC
>>> records (text and rdf) in the bioblitz data profile [1]. I
>>> think the lessons learned document should include an on-line
>>> catalog of cut-and-pasteable examples covering a variety of
>>> use cases, together with a dead simple description of DwC,
>>> something like "Darwin Core is a collection of terms,
>>> together with definitions."
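To make the point concrete, here is a sketch of a "simple csv file" whose column headers are Darwin Core term names. The term names are real DwC terms; the two rows are made-up illustration values, not bioblitz data, and the helper name to_dwc_csv is hypothetical.

```python
import csv
import io

# Column headers are Darwin Core term names.
FIELDS = ["scientificName", "eventDate", "decimalLatitude",
          "decimalLongitude", "basisOfRecord"]

# Invented example rows; a genus-level name is as acceptable as a species name.
ROWS = [
    {"scientificName": "Danaus plexippus", "eventDate": "2010-09-25",
     "decimalLatitude": "41.5", "decimalLongitude": "-70.7",
     "basisOfRecord": "HumanObservation"},
    {"scientificName": "Quercus", "eventDate": "2010-09-25",
     "decimalLatitude": "41.5", "decimalLongitude": "-70.7",
     "basisOfRecord": "HumanObservation"},
]


def to_dwc_csv(rows, fields=FIELDS):
    """Serialize dict records to CSV text with DwC term names as the header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Nothing beyond the header row distinguishes this from any other CSV file, which is the sense in which a plain table can already be a Darwin Core record set.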
>>>
>>>                        Here are areas where we augmented or
>>> diverged from DwC in the bioblitz:
>>>
>>>                        i. We added obs:observedBy [2], since
>>> there is no equivalent property in DwC, and it's important in
>>> Citizen Science (though often not available).
>>>
>>>                        ii. We used geo:lat and geo:long [3]
>>> instead of DwC terms for latitude and longitude. The geo
>>> namespace is a well used and supported standard, and records
>>> with geo coordinates are automatically mapped by several
>>> applications. Since everyone was using GPS  to retrieve their
>>> coordinates, we were able to assume WGS-84 as the datum.
>>>
>>>                        If someone had used another datum,
>>> say XYZ, we would have added columns to the Fusion table so
>>> that they could have expressed their coordinates in DwC, as, e.g.:
>>>                        DwC:decimalLatitude=41.5
>>>                        DwC:decimalLongitude=-70.7
>>>                        DwC:geodeticDatum=XYZ
>>>
>>>                        (I would argue that it should be
>>> kosher DwC to express the above as simply XYZ:lat and
>>> XYZ:long. DwC already incorporates terms from other
>>> namespaces, such as Dublin Core, so there is precedent for this.)
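A small sketch of the mapping described above: geo:lat and geo:long values, assumed to be WGS-84, rewritten as the three DwC coordinate terms. The function name geo_to_dwc, the dict-based record shape, and the spelling "WGS84" for the datum string are assumptions for illustration.

```python
def geo_to_dwc(record, datum="WGS84"):
    """Return a copy of record with geo:lat/geo:long replaced by DwC terms.

    The datum defaults to WGS84 because the W3C Basic Geo vocabulary is
    defined on it. All other keys are passed through unchanged.
    """
    out = {k: v for k, v in record.items() if k not in ("geo:lat", "geo:long")}
    out["dwc:decimalLatitude"] = float(record["geo:lat"])
    out["dwc:decimalLongitude"] = float(record["geo:long"])
    out["dwc:geodeticDatum"] = datum
    return out
```

For example, geo_to_dwc({"geo:lat": "41.5", "geo:long": "-70.7"}) yields a record carrying decimalLatitude 41.5, decimalLongitude -70.7 and an explicit geodeticDatum, so the assumption that was implicit in the geo namespace becomes visible in the data.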
>>>
>>>                        2. DwC:scientificName might be more
>>> user friendly than taxonomy:binomial and the other taxonomy
>>> machine tags EOL uses for flickr images.  If
>>> DwC:scientificName isn't self-explanatory enough, a user can
>>> look it up, and see that any scientific name is acceptable,
>>> at any taxonomic rank, or not having any rank. And once we
>>> have a scientific name, higher ranks can be inferred.
>>>
>>>                        3. Catalogue of Life was an important
>>> part of the workflow, but we had some problems with it.
>>> Future bioblitzes might consider using something like a CoL
>>> fork, as recently described by Rod Page [4].
>>>
>>>                        4. We didn't include "basisOfRecord"
>>> in the original data profile, and so it wasn't a column in
>>> the Fusion Table [5]. But when a transcriber felt it was
>>> necessary to include in order to capture data in a particular
>>> field sheet, she just added the column to the table. This
>>> flexibility of schema is important, and is in harmony with
>>> the semantic web.
>>>
>>>                        5. There seemed to be enthusiasm for
>>> another field event at next year's TDWG. This could be an
>>> opportunity to gather other types of data (e.g. character
>>> data) and thereby i) expose meeting participants to another
>>> set of everyday problems from the world of biodiversity
>>> workflows, and ii) try other TDWG technology on for size,
>>> e.g. the observation exchange format, annotation framework,
>>> etc.
>>>
>>>
>>>                        Happy Thanksgiving to all in Canada -
>>>                        Joel.
>>>                        ----
>>>
>>>
>>>                        1.
>>> http://groups.google.com/group/tdwg-bioblitz/web/tdwg-bioblitz-profile-v1-1
>>>                        2. Slightly bastardizing our old
>>> observation ontology -
>>> http://spire.umbc.edu/ontologies/Observation.owl
>>>                        3. http://www.w3.org/2003/01/geo/
>>>                        4.
>>> http://iphylo.blogspot.com/2010/10/replicating-and-forking-data-in-2010.html
>>>                        5.
>>> http://tables.googlelabs.com/DataSource?dsrcid=248798
>>>
>>>
>>> _______________________________________________
>>>                        tdwg-content mailing list
>>>                        tdwg-content at lists.tdwg.org
>>>
>>> http://lists.tdwg.org/mailman/listinfo/tdwg-content
>>>
>>> ________________________________
>>>
>>>        Please consider the environment before printing this email
>>>        Warning: This electronic message together with any
>>> attachments is confidential. If you receive it in error: (i)
>>> you must not read, use, disclose, copy or retain it; (ii)
>>> please contact the sender immediately by reply email and then
>>> delete the emails.
>>>        The views expressed in this email may not be those of
>>> Landcare Research New Zealand Limited.
>>> http://www.landcareresearch.co.nz
>>>

-- 

Steven J. Baskauf, Ph.D., Senior Lecturer

Vanderbilt University Dept. of Biological Sciences

postal mail address:

VU Station B 351634

Nashville, TN  37235-1634,  U.S.A.

delivery address:

2125 Stevenson Center

1161 21st Ave., S.

Nashville, TN 37235

office: 2128 Stevenson Center

phone: (615) 343-4582,  fax: (615) 343-6707

http://bioimages.vanderbilt.edu

-- 
----------------------------------------------------------------
Pete DeVries
Department of Entomology
University of Wisconsin - Madison
445 Russell Laboratories
1630 Linden Drive
Madison, WI 53706
TaxonConcept  <http://www.taxonconcept.org/> Knowledge Base / GeoSpecies
Knowledge Base <http://lod.geospecies.org/> 
About the GeoSpecies  <http://about.geospecies.org/> Knowledge Base
------------------------------------------------------------
