<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
</head>
<body bgcolor="#ffffff" text="#000000">
<p class="MsoNormal">I have been dreading writing this post, which I have
promised (or threatened, depending on whether you enjoyed or were
annoyed by the previous lengthy thread) for some time. I have dreaded it
because this is a complicated subject and not one that is amenable to
terse messages. However,
after the previous conversation with Rich et al., I feel for the first
time
that I have the questions (not answers!) clearly in my mind.<span
style=""> </span>So rather than starting off rambling about
LivingSpecimens and establishmentMeans as I had planned, I'm going to
start by
laying down several principles that have come into clarity in my mind
after the
previous conversation and the attempt to map things out in a diagram. I
apologize in advance for any failure to use the correct database or IT
terms when I'm in unfamiliar territory. Until there is a consensus about
how we deal with the "tokens" we use to document Occurrences, I'm not
sure that what I have to say about those other topics will make sense.<br>
</p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">PRINCIPLES (derived from earlier discussion)<br>
</p>
<p class="MsoNormal">1.<span style=""> </span>We have a number
of kinds of "things" (which I will henceforth refer to as "resources")
that are useful for describing and organizing metadata that we collect
in our
attempts to document biodiversity.<span style=""> </span>For
many of these types of resources, we have defined classes to categorize
the
terms that can be used to describe the properties of resources that are
instances of that class.<span style=""> </span>Describing the
class helps us to understand the type of resources that constitute
instances of
that class.</p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">2. A conscious decision was made to avoid formally
defining
rdfs:domain for Darwin Core terms.<span style=""> </span>This
decision was made to provide flexibility in the way the terms can be
used and
to avoid the situation where semantic clients would draw incorrect or
silly
conclusions about what kind of things resources are.<span style=""> </span>However,
this decision does not excuse us
from thinking carefully about whether a term can be appropriately
applied to a
resource that is a member of some class (e.g. should we say that a
digital
photograph has a scientific name?).<span style="">
</span>Placing a term within a class is a suggestion that the term
would appropriately
be applied as a property of an instance of a class.</p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">3. When users want to "flatten" and simplify their
databases, they tend to eliminate one-to-many (1:M) relationships in
favor of
one-to-one (1:1) relationships.<span style=""> </span>The
result is differences like those we saw in </p>
<p class="MsoNormal"><a class="moz-txt-link-freetext" href="http://bioimages.vanderbilt.edu/pages/rich-diagram1.gif">http://bioimages.vanderbilt.edu/pages/rich-diagram1.gif</a>
(which
allows 1:M relationships between Occurrences and Events and between
Events and
Locations) and</p>
<p class="MsoNormal"><a class="moz-txt-link-freetext" href="http://bioimages.vanderbilt.edu/pages/rich-diagram2.gif">http://bioimages.vanderbilt.edu/pages/rich-diagram2.gif</a>
(which "atomizes" every Occurrence by considering it to have its own
separate eventTime and Location information).<span style="">
</span></p>
<p class="MsoNormal">A. There is nothing intrinsically "right" or
"wrong" about either of these approaches, because they each have
their own advantages.<span style=""> </span>The 1:M approach
is more efficient, but results in a more complicated database, while
the 1:1 approach
results in a simpler database but may require repeating some or many
term values
in the records.<span style=""> </span></p>
<p class="MsoNormal">B. The choices that users make in these situations
are the
cause of much of the disagreement about whether a certain class should
exist or
not since the people taking the 1:1 approach "collapse" the
relationship diagram and eliminate classes they don't need while people
who
take the 1:M approach need instances of the class to act as nodes to
connect
their "many" resources to some other thing.<span style=""> </span></p>
<p class="MsoNormal">C. This collapsing of the diagram is also the
reason for
some disagreement about whether a term belongs in a certain class or
not.<span style=""> </span>In the example above, 1:1 people would say
that eventDate is a property of an Occurrence, while 1:M people would
say that
eventDate is a property of an Event.<span style=""> </span><span
style=""> </span></p>
<p class="MsoNormal">D. The choice of users on this issue influences
their
decision about whether or not to create resources that are instances of
classes
and hence to assign them identifiers.<span style=""> </span>If
users take the 1:M approach, they need identifiers for resources that
are
acting as connecting nodes so that they can make reference to that
resource in
the metadata of the many things they are connecting to it.<span style="">
</span>If users take the 1:1 approach, they will probably
skip creating explicit resources (and their corresponding
identifiers) for
resources of the class that they are "collapsing" out of the
diagram.<span style=""> </span></p>
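<p class="MsoNormal">To make the "collapsing" concrete, here is a rough
sketch in Python. The identifiers and values are invented for
illustration and do not come from any real dataset. It starts from the
1:M arrangement, where Events and Locations are separate resources with
their own identifiers, and flattens it into 1:1 rows in which eventDate
and locality look like properties of the Occurrence.</p>
<pre>
```python
# Hypothetical records; all identifiers and values are invented.
# 1:M model: Locations and Events are separate resources with identifiers.
locations = {"loc1": {"locality": "Example Creek"}}
events = {"ev1": {"eventDate": "2010-06-01", "locationID": "loc1"}}
occurrences = [
    {"occurrenceID": "occ1", "recordedBy": "A. Collector", "eventID": "ev1"},
    {"occurrenceID": "occ2", "recordedBy": "B. Collector", "eventID": "ev1"},
]


def flatten(occurrence):
    """Collapse Event and Location into a single 1:1 Occurrence row."""
    event = events[occurrence["eventID"]]
    location = locations[event["locationID"]]
    # Drop the link to the Event; its identifier is no longer needed.
    row = {k: v for k, v in occurrence.items() if k != "eventID"}
    row["eventDate"] = event["eventDate"]   # now reads as an Occurrence property
    row["locality"] = location["locality"]  # repeated in every flattened row
    return row


flat_rows = [flatten(o) for o in occurrences]
for row in flat_rows:
    print(row)
```
</pre>
<p class="MsoNormal">Both flattened rows repeat the same eventDate and
locality, and the Event and Location identifiers disappear, which is the
trade-off described in A and D above.</p>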
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">4.<span style=""> </span>I would propose
that the "right" relationship diagram is not necessarily one that
caters to a certain "right" philosophical point of view.<span style="">
</span>Rather, the "right" diagram is the
one that allows users to define the relationships that they need for
the
organization of their metadata in the simplest manner, and which
provides the
most clarity about what resources of various kinds are, and how they
are
connected.<span style=""> </span></p>
<p class="MsoNormal">A. "Right"
as I have defined it above depends on how broadly the relationship
diagram is intended to apply. An
individual person or organization with limited interests may have a
relationship diagram that is simpler than the diagram shown at
<a class="moz-txt-link-freetext" href="http://bioimages.vanderbilt.edu/pages/rich-diagram1.gif">http://bioimages.vanderbilt.edu/pages/rich-diagram1.gif</a>
or might choose to add classes for other things that are of personal
interest. An organization
focused on different issues or with broader interests might opt for
many more or
different classes that would be connected to those shown in the diagram.</p>
<p class="MsoNormal">B. Given what I just said in A, what is "right"
for Darwin Core is going to be defined by the needs of the Darwin Core
constituency.<span style=""> </span>At the TDWG meeting, John
Wieczorek made a statement which I will paraphrase as "in order for a
term
to make it into Darwin Core, at least two people had to want it".<span
style=""> </span>I'm not sure to what extent he was joking
about this, but it makes the point that one must consider community
needs
before saying that a certain part of the "diagram" is necessary.<span
style=""> </span>I think that the reason that Rich and I were
so quickly able to come to a consensus on the organization of the left
side of
the diagram is because he realized that there was a significant part of
the DwC
constituency that needed a way to group occurrences (i.e. needed
Individuals)
and I realized that there was a significant part of the constituency
that
needed to group multiple Events at a Locality and multiple Occurrences
at an
Event.<span style=""> </span>So in evaluating alternative conceptual
systems for organizing resources, we have to ask to what extent
an alternative allows broad segments of the DwC constituency to
organize
their metadata in an efficient and conceptually sensible way.<span
style=""> </span>If one alternative is more broadly applicable
and conceptually clear than another, then that alternative is better
regardless
of the philosophical underpinnings of the argument.<span style=""> </span></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">5. The last point is one that has run as an
undercurrent
through various TDWG threads but which may not have been explicitly
stated in
this particular thread.<span style=""> </span>That is that
there should be a separation between what a resource IS and what we
want to use
a resource FOR.<span style=""> </span>To use technical terms, we
need to separate the "type" of a resource from its fitness for
use.<span style=""> </span>A digital image IS a digital
image.<span style=""> </span>It might be used FOR documenting
that an organism was at a particular location at a particular time, but
it
could be used to illustrate a character, as a part of a visual key, as
media
for an educational presentation, as art, and probably many other things
that
aren't popping into my mind at the moment.<span style="">
</span>I believe that much of the confusion about "what is an
Occurrence"
comes from a failure to make this distinction.<span style="">
</span></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">THE ISSUE OF THE TOKEN</p>
<p class="MsoNormal">Earlier in the thread of "What is an Occurrence",
there was a general consensus that an Occurrence often had a "thing"
that was associated with it that served as evidence that a taxon
representative
(i.e. Individual) occurred at a particular Location at a particular
time.<span style=""> </span>In my Biodiversity Informatics paper, I
called this thing a "representation", but I now believe that
"token" is a better term and will use it hereafter.<span style=""> </span>There
also seemed to be a consensus that an
observation was simply an Occurrence that did not have an associated
token.<span style=""> </span>(This is with the understanding
that observation is being narrowly defined as a type of Occurrence,
with a
definable time and location, as opposed to what I called the
"checklist" definition which indicated that some undefined taxon
representative was present in some defined geographical area at an
indefinite
time.)<span style=""> </span>In one of my earlier posts, I
pleaded for somebody to tell me whether there was an assumption that
the token
was considered a part of the Occurrence or whether it was a separate
thing.<span style=""> </span>I did not get any responses,
which I'm construing to mean that people weren't sure about this.<span
style=""> </span>I now have a clearer idea of
the general principles I outlined above, and also have the "Rich"
diagram for modeling relationships, so I'm going to again pose this
question,
but in what I hope is a clearer way.<span style=""> </span>I
have re-made the earlier diagram as Rich suggested, using triangles
rather than
arrows.<span style=""> </span>The wide side of the triangle is
the "many" side of the relationship and the point is the
"one" side.<span style=""> </span>As before, I'm
deferring on the right side of the diagram (to the right of
Identification) to
the taxonomists for now, so let's keep that out of the discussion for
the
moment.<span style=""> </span>I have also clarified the
diagram by coloring in the actual DwC classes to distinguish them from
selected
terms that fall within those classes (non-colored boxes) and which can
be used
as properties of resources that are instances of the class.<span
style=""> </span>The two alternatives that I'm discussing are:</p>
<p class="MsoNormal"><a class="moz-txt-link-freetext" href="http://bioimages.vanderbilt.edu/pages/token-assumed.gif">http://bioimages.vanderbilt.edu/pages/token-assumed.gif</a>
which
I will refer to as the "assumed token" model and </p>
<p class="MsoNormal"><a class="moz-txt-link-freetext" href="http://bioimages.vanderbilt.edu/pages/token-explicit.gif">http://bioimages.vanderbilt.edu/pages/token-explicit.gif</a>
which I will refer to as the "explicit token" model.<span style=""> </span></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">I believe that historically the assumed token
model has been
the one which most people have had in mind.<span style="">
</span>Before the new DwC standard, we had specimens and we had
observations.<span style=""> </span>In order to avoid
redundancies in terms for those two types of "things", a combined
"thing" called "Occurrence" was created.<span style=""> </span>An
Occurrence that was an observation didn't
have a token and an Occurrence that was a specimen had a physical or
living
specimen as its token.<span style=""> </span>That's all pretty
simple and sensible, and we see evidence of this kind of thinking in the
descriptions given at <a class="moz-txt-link-freetext" href="http://rs.tdwg.org/dwc/terms/index.htm">http://rs.tdwg.org/dwc/terms/index.htm</a>.<span
style=""> </span>A record for an Occurrence has a thing called
its dwc:basisOfRecord that presumably describes the kind of token (if
any).<span style=""> </span>So if the token were a preserved
specimen, we would say that [Occurrence] basisOfRecord
[PreservedSpecimen].<span style=""> </span>If there were no
token we would say [Occurrence] basisOfRecord [HumanObservation] or
[Occurrence] basisOfRecord [MachineObservation].<span style=""> </span>Referring
back to the assumed token diagram,
in the case of a specimen there is no explicit reference to the
specimen as a separate
entity.<span style=""> </span>The terms related to the
specimen, such as preparations and disposition are just plopped into
the
Occurrence class which implies that they are properties of the
Occurrence
itself.<span style=""> </span></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">There seems to be a general consensus that other
kinds of
tokens can be used to document an Occurrence.<span style="">
</span>However, the way that the current Darwin Core terms are designed
and
placed within classes is very inconsistent in how it handles types
of
tokens other than specimens.<span style=""> </span>According
to the instructions at the top of
<a class="moz-txt-link-freetext" href="http://rs.tdwg.org/dwc/terms/index.htm">http://rs.tdwg.org/dwc/terms/index.htm</a>, a
camera trap bird sighting should have [Occurrence] basisOfRecord
[MachineObservation].<span style=""> </span>It is not clear
how one is supposed to handle the actual metadata for the image that
serves
as the token.<span style=""> </span>Unlike specimens where the
token's metadata terms are placed in the Occurrence class, I guess in
the case
of an image one is supposed to use associatedMedia to link the
so-called
MachineObservation to the image record.<span style=""> </span>If
DNA were extracted, one would link the sequence to the Occurrence using
associatedSequences (although it's not clear to me what the
basisOfRecord for
that would be - "TookATissueSample"?).<span style=""> </span>But what
does one do for other kinds of
tokens, like seeds or tissue samples - create terms like associatedSeed
and
associatedTissueSample?<span style=""> </span>I think that the
ResourceRelationship terms were supposed to handle this problem, but I
have yet
to see an example of exactly how this was supposed to work.<span
style=""> </span></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">As an attempt to resolve this confusion in my
mind, I wrote
the Biodiversity Informatics paper that I've promoted frequently on
this list (<a class="moz-txt-link-freetext" href="https://journals.ku.edu/index.php/jbi/article/view/3664">https://journals.ku.edu/index.php/jbi/article/view/3664</a>).
<span style=""> </span>In that paper, I take the basic assumed
token model and broaden it in an attempt to make the assumed token
model work
for all kinds of tokens.<span style=""> </span>Because I
assumed that each occurrence has a single token, I "collapsed the
diagram" and connected the properties of the token directly to the
Occurrence resource (as was modeled when specimen properties were
placed within
the Occurrence class).<span style=""> </span>If there were
several tokens for a given Individual, I "flattened" the records by
creating a separate Occurrence resource for each token.<span style="">
</span>The model was generalized further by allowing
secondary Occurrence records where the token was not derived directly
from the
organism but rather derived from a primary Occurrence record.<span
style=""> </span>This allows for complicated circumstances such as those
found in a botanical garden, where a seed or cutting might be collected
from a
tree and grown into a LivingSpecimen, which might in turn have a
PreservedSpecimen collected from it and a DigitalStillImage taken of
the
preserved specimen.<span style=""> </span>You can see examples
of the complex types of situations I tried to handle at </p>
<p class="MsoNormal"><a class="moz-txt-link-freetext" href="http://bioimages.vanderbilt.edu/pages/conceptual-scheme-insect.gif">http://bioimages.vanderbilt.edu/pages/conceptual-scheme-insect.gif</a>
and</p>
<p class="MsoNormal"><a class="moz-txt-link-freetext" href="http://bioimages.vanderbilt.edu/pages/conceptual-scheme-botanical.gif">http://bioimages.vanderbilt.edu/pages/conceptual-scheme-botanical.gif</a></p>
<p class="MsoNormal">I created my own terms (like
sernec:derivativeOccurrence and
sernec:derivedFrom) to describe the connections among the individual
and the
various layers of Occurrences.<span style=""> </span></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Does this system work?<span style="">
</span>Yes, but there are a number of problems associated with it.<span
style=""> </span>The first problem is related to Principle 4
above.<span style=""> </span>In order for this system to work,
there needs to be a consensus in the DwC community about several things.<span
style=""> </span>One is that each Occurrence must have only
one token.<span style=""> </span>If we are going to
"type" Occurrences by their basisOfRecord (and the acceptable values
for basisOfRecord are officially DwC types, see
<a class="moz-txt-link-freetext" href="http://rs.tdwg.org/dwc/terms/type-vocabulary/index.htm">http://rs.tdwg.org/dwc/terms/type-vocabulary/index.htm</a>),
then an Occurrence can't have two values for basisOfRecord.<span
style=""> </span>It is clear from the discussion we've had
that people would like to consider a single Occurrence to be able to
have
multiple tokens as documentation.<span style=""> </span>The
second problem is that there needs to be a consensus that a secondary
Occurrence can exist at all (i.e. can you call the image of a specimen
"an
Occurrence"?).<span style=""> </span>It is clear to me from
the discussion that when people are thinking about what an Occurrence
means, they
have in mind the documentation of the time and place of the Individual
in its
environment.<span style=""> </span>In a previous communication,
John Wieczorek clarified that terms describing Occurrences like
recordedBy and
eventDate should only apply to primary occurrences and that it would
not be
appropriate to use them as properties of what I'm calling a secondary
occurrence (such as the image of a specimen).<span style="">
</span>So I dealt with this by creating a distinction between
Occurrences that
document the distribution of a taxon (using the term
sernec:documentsDistribution) and those that don't.<span style=""> </span>This
is something like the old
validDistributionFlag, but I defined documentsDistribution specifically
as
having a value of "true" only for Occurrences that were derived
directly from the Individual (gray arrows in the two diagrams from the
paper).<span style=""> </span></p>
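<p class="MsoNormal">As a rough illustration of the primary/secondary
distinction, here is a Python sketch. The record identifiers are
invented, and the dictionary keys only loosely mirror the
sernec:derivedFrom and sernec:documentsDistribution terms; this is not
how the paper's RDF is actually structured. An Occurrence documents
distribution only when it is derived directly from the Individual.</p>
<pre>
```python
# Hypothetical records; identifiers are invented for illustration.
# "derivedFrom" points at either an Individual or another Occurrence.
INDIVIDUALS = {"ind1"}

occurrences = {
    "occ1": {"basisOfRecord": "PreservedSpecimen", "derivedFrom": "ind1"},
    "occ2": {"basisOfRecord": "DigitalStillImage", "derivedFrom": "occ1"},
}


def documents_distribution(occurrence_id):
    """True only when the Occurrence was derived directly from an Individual."""
    return occurrences[occurrence_id]["derivedFrom"] in INDIVIDUALS


def trace_to_individual(occurrence_id):
    """Follow derivedFrom links back to the Individual and return the chain."""
    chain = [occurrence_id]
    current = occurrences[occurrence_id]["derivedFrom"]
    while current not in INDIVIDUALS:
        chain.append(current)
        current = occurrences[current]["derivedFrom"]
    chain.append(current)
    return chain


for occ_id in occurrences:
    print(occ_id, documents_distribution(occ_id), trace_to_individual(occ_id))
```
</pre>
<p class="MsoNormal">Here the image of the specimen (occ2) is a secondary
Occurrence, so it is flagged as not documenting distribution even though
it can still be traced back to the Individual.</p>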
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">But I think that the worst "crime" of the system I
suggested is violation of Principle 5 above.<span style="">
</span>By asserting an unvarying 1:1 relationship between the
Occurrence and
its token and by collapsing my relationship diagram to not explicitly
include a
resource that is the token itself, I am confusing the USE of an
Occurrence (to
demonstrate that a representative of a taxon was present at a
particular
Location at a particular time) with what the token IS (a dead organism
in a
jar or glued to paper, an electronic representation of photon patterns,
a
series of characters representing a nucleotide sequence).<span style="">
</span>So I'm charging myself with this
"crime", pleading guilty, and accepting my sentence, which is to
admit that the system I suggested in the Biodiversity Informatics paper
is
"wrong" based on the principles I outlined above.<span style=""> </span>What
this amounts to is an acceptance of the "rightness"
of the explicit token model (in the sense that I defined "right" in
Principle 4 above).<span style=""> </span></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">However, if I'm going to make this admission, I
demand that the
other guilty parties also confess, namely people who want to assert
that
Occurrences have properties that actually are properties of specimens.<span
style=""> </span>If we are going to have a system that
actually works, we can't straddle the fence and say that the assumed
token
model is correct for specimens and that the explicit token model is
correct for
every other kind of token.<span style=""> </span>If we accept
the explicit token model, then the specimen will have to come off of its
throne
and be a token like all of the other ways that we provide evidence that
an Occurrence
happened.<span style=""> </span>If we accept the explicit
token model, then as a biodiversity informatics resource type
"observation"
will have to disappear into a puff of nothingness just like the
"luminiferous ether", "centrifugal force", and other kinds
of things that we thought we needed to have to explain things but which
turned
out to be unnecessary when we figured out more basic explanations.<span
style=""> </span>A human observation will simply be an
Occurrence that doesn't have a token (which is what I've heard some
people say
all along).<span style=""> </span>If we allow the Occurrence/token
relationship to be a one-to-many relationship rather than one-to-one,
then HumanObservation
is just the one-to-zero case of the more general one-to-many.<span
style=""> </span>For those of you who like the idea of a
"machine observation", that is just an Occurrence with a token that
is whatever type of resource that the machine produces (electronic data
file,
image of the organism, image of a graph, or whatever).<span style=""> </span></p>
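<p class="MsoNormal">A minimal sketch of this idea in Python, with class
and field names that I have made up for the purpose (they are not Darwin
Core terms): an Occurrence holds a list of tokens, and a human
observation is simply the case where that list is empty.</p>
<pre>
```python
from dataclasses import dataclass, field


@dataclass
class Token:
    token_id: str
    token_type: str  # e.g. "PreservedSpecimen" or "StillImage"


@dataclass
class Occurrence:
    occurrence_id: str
    recorded_by: str
    # One-to-many: an Occurrence may have zero, one, or several tokens.
    tokens: list = field(default_factory=list)

    def is_human_observation(self):
        """The one-to-zero case: an Occurrence with no token at all."""
        return len(self.tokens) == 0


sighting = Occurrence("occ1", "Joe Curator")
collected = Occurrence(
    "occ2",
    "Joe Curator",
    [Token("tok1", "PreservedSpecimen"), Token("tok2", "StillImage")],
)

for occ in (sighting, collected):
    print(occ.occurrence_id, occ.is_human_observation(),
          [t.token_type for t in occ.tokens])
```
</pre>
<p class="MsoNormal">Nothing in this sketch needs a separate "observation"
type; the absence of tokens carries that information.</p>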
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">ADVANTAGES OF RECOGNIZING TOKENS EXPLICITLY</p>
<p class="MsoNormal">If we accept the explicit token model over the
assumed token
model, a number of problems get solved.<span style="">
</span>Just as was the case with Events, people who want to flatten
things out
by having only one token per Occurrence can do so.<span style=""> </span>For
example, if I want to atomize things by
defining my Occurrence to have taken place during an Event that lasted
only the
one second within which my camera shutter clicked, I can do that and
have only
a single token associated with that Occurrence.<span style="">
</span>On the other hand, if others want to define their Occurrence as
taking
place over the time over which they photographed, collected a leaf
tissue sample,
and then collected a branch of a tree for an herbarium specimen, then
they can
do that and associate all of those tokens (one or more images, the
tissue
sample, and the preserved specimen) with the single Occurrence.<span
style=""> </span></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Another important benefit will come down the line
when we
actually try to develop RDF templates.<span style="">
</span>Right now it is not exactly clear (at least to me) how
properties should
be divided up among resources that are being described in the RDF.<span
style=""> </span>Based on the assumed token model, I have been
including the metadata for the token within the container element for
the Occurrence.<span style=""> </span>This leads to some of the kind
of odd
assertions that people have been objecting to, such as </p>
<p class="MsoNormal">[Occurrence] dcterms:rights ["(c) 2002 Steven J.
Baskauf"] or </p>
<p class="MsoNormal">[Occurrence] preparations ["skin"].<span style="">
</span></p>
<p class="MsoNormal">In the explicit token model, dividing metadata up
appropriately among separate Occurrence and token resources makes more
sense,
e.g. </p>
<p class="MsoNormal">[Occurrence] recordedBy ["Joe Curator"]</p>
<p class="MsoNormal">[image] dcterms:rights ["(c) 2002 Steven J.
Baskauf"]</p>
<p class="MsoNormal">[specimen] preparations ["skin"]</p>
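<p class="MsoNormal">Here is a rough Python sketch of that division. The
mapping of terms to token kinds is my own judgment call for the sake of
the example, not something Darwin Core defines. It takes a flat record
and assigns each term to the Occurrence unless the term describes a
token.</p>
<pre>
```python
# Assumption for illustration: this mapping of terms to kinds of token is a
# judgment call made for the sketch, not something Darwin Core defines.
TERM_SUBJECT = {
    "dcterms:rights": "image",
    "preparations": "specimen",
}

# A flat record of the kind produced under the assumed token model.
flat_row = {
    "recordedBy": "Joe Curator",
    "dcterms:rights": "(c) 2002 Steven J. Baskauf",
    "preparations": "skin",
}


def split_row(row):
    """Assign each term to the Occurrence unless it describes a token."""
    return [
        (TERM_SUBJECT.get(term, "Occurrence"), term, value)
        for term, value in row.items()
    ]


for subject, term, value in split_row(flat_row):
    print(f'[{subject}] {term} ["{value}"]')
```
</pre>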
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">If we wanted to be really explicit about this, we
probably
should have a separate class for PhysicalSpecimens and separate the
terms that
describe specimens from those that describe Occurrences in general.<span
style=""> </span>There might be some difficulty in doing this
because there are some terms that might be hard to decide about, like
catalogNumber.<span style=""> </span>I don't really think the
catalogNumber is a property of the Occurrence, because it makes more
sense to
me to say </p>
<p class="MsoNormal">[specimen] catalogNumber ["12345"] than</p>
<p class="MsoNormal">[Occurrence] catalogNumber ["12345"]</p>
<p class="MsoNormal">Realistically, I can't see this kind of separation
ever
happening, given the amount of trouble it's been just to get a few
people to admit
that Individuals exist.<span style=""> </span>It is just too
hard to get motion to happen in the TDWG community.<span style=""> </span>As
a practical matter, people who
"compress" the system (which we admit happens and make concession to in
Principle 3) by having record tables where a single row contains the
metadata
for both the Occurrence and the token (i.e. treat it as a 1:1
relationship)
will simply have a column heading for catalogNumber and not care
whether the
catalogNumber applies to the Occurrence or the token.<span style=""> </span>It's
the people who want to do the more
complicated stuff like simultaneously keep track of multiple tokens per
Occurrence (like several images, a sound recording, and a specimen),
people who
want to write RDF, or people who want to merge databases containing
many types
of tokens who will have to pay attention to this distinction.<span
style=""> </span>Physical specimens would really be the only
kind of class we would have to create because there already is a rich
vocabulary for media items that is separate from DwC (i.e. the MRTG
schema) and
there are probably also vocabularies for stuff like tissue samples and
DNA
sequences (although I'm not familiar with them).<span style=""> </span></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">TYPING</p>
<p class="MsoNormal">Bob has warned us about the dangers of asserting
that a term
always applies to a certain type of resource by asserting that the term
has an rdfs:domain
.<span style=""> </span>However, we should not avoid
attempting to assert that a resource is itself of a certain type.<span
style=""> </span>Describing the "type" of a resource
is an important part of letting potential users assess the possible
fitness for
use of that resource. For example, you
can collect DNA from a preserved specimen but not from an image.
You can include an image in a print journal article
but not a sound recording. You can
build a range map from Occurrences, but not from DNA samples.
In RDF, one of the basic properties that
should be described about every resource is its rdf:type.
In the generic Linked Data world, you can
pretty much use anything that you want as an rdf:type.<span style="">
</span>If you decide to use something obscure, then
the danger is that nobody else will have any idea what kind of thing
you are
describing.<span style=""> </span>The Draft TDWG GUID
Applicability Statement recommendation 11 says that "Objects in the
biodiversity informatics domain that are identified by a GUID should be
typed
using the TDWG ontology or other well-known vocabularies in accordance
with the
TDWG common architecture."<span style=""> </span>So in
our community, we can't just type resources any way we want.<span
style=""> </span>But exactly how we SHOULD type things isn't
clear.<span style=""> </span>There isn't any functioning TDWG
ontology at the moment.<span style=""> </span>I have found it
useful to use the DwC class as the rdf:type in my attempts to write
RDF. That works pretty well for things that have
DwC classes. But if we follow the
explicit token model, we need to have some consensus on what we will
use as the
rdf:type for the tokens.<span style=""> </span>At this point
it looks to me like it would make sense to have the convention that for
tokens one
uses either a dcterms:type or a Darwin Core type (i.e. one of the types
listed
at <a class="moz-txt-link-freetext" href="http://rs.tdwg.org/dwc/terms/type-vocabulary/index.htm">http://rs.tdwg.org/dwc/terms/type-vocabulary/index.htm</a>, although as
I
already noted, there is no need for HumanObservation in the case of
describing
a token because human observations don't have tokens).<span style=""> </span>There
isn't any sort of "collision"
here of the sort that happened right after the adoption of the Darwin
Core
Standard when we tried to merge the Dublin and Darwin Core types (see
<a class="moz-txt-link-freetext" href="http://www.keytonature.eu/wiki/MRTGv08_Type_term_inconsistent_with_DwC">http://www.keytonature.eu/wiki/MRTGv08_Type_term_inconsistent_with_DwC</a>
and </p>
<p class="MsoNormal"><a class="moz-txt-link-freetext" href="http://lists.tdwg.org/pipermail/tdwg-content/2009-October/000301.html">http://lists.tdwg.org/pipermail/tdwg-content/2009-October/000301.html</a>
with many following responses for the gruesome details) since rdf:type
doesn't
demand any particular type vocabulary.<span style=""> </span>I'm
not entirely happy with this approach because for digital still images
the
logical type would be dctype:StillImage, which doesn't give any
indication as
to whether the image is film or digital, but I guess at this point in
the 21<sup>st</sup>
century most consuming applications will probably just assume digital
anyway.<span style=""> </span></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">So (assuming that Individuals become a DwC class)
I guess I
don't really see that there is any problem in using the current Darwin
Core
classes to indicate the rdf:type of every kind of resource that we
would be
reasonably likely to assign GUIDs to EXCEPT for tokens.<span style="">
</span>Typing of tokens could be done using a
combination of Darwin Core and Dublin Core types.<span style=""> </span>What
I'm left scratching my head about is
basisOfRecord.<span style=""> </span>When I subscribed to the
assumed token model (i.e. when I wrote the Biodiversity Informatics
paper), I
thought I knew what basisOfRecord meant.<span style="">
</span>It meant the kind of token that backed up an Occurrence.<span
style=""> </span>So when I wrote RDF for a specimen (as in
<a class="moz-txt-link-freetext" href="http://bioimages.vanderbilt.edu/rdf/examples/lsu000/0428.rdf">http://bioimages.vanderbilt.edu/rdf/examples/lsu000/0428.rdf</a>)
I used the "hand grenade" approach to typing.<span style=""> </span>I
lobbed every kind of "typing"
that I knew of at the Occurrence record for a specimen:</p>
<p class="MsoNormal">[Occurrence] rdf:type [dwc:Occurrence]</p>
<p class="MsoNormal">[Occurrence] dwc:basisOfRecord
[dwctype:PreservedSpecimen]</p>
<p class="MsoNormal">and</p>
<p class="MsoNormal">[Occurrence] dcterms:type [dctype:PhysicalObject]</p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Under the explicit token model, I would just use</p>
<p class="MsoNormal">[Occurrence] rdf:type [dwc:Occurrence]</p>
<p class="MsoNormal">for the Occurrence and</p>
<p class="MsoNormal">[specimen] rdf:type [dwctype:PreservedSpecimen]</p>
<p class="MsoNormal">for the specimen itself. If I also took an image
at the same time and wanted to use it as part of the same Occurrence as
the specimen, I would type the image with</p>
<p class="MsoNormal">[image] rdf:type [dctype:StillImage]</p>
<p class="MsoNormal"><o:p> </o:p></p>
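<p class="MsoNormal">To make the contrast concrete, here is a minimal
sketch of the two typing styles as plain (subject, predicate, object)
tuples. It is only an illustration, not the RDF I actually published: the
identifiers "occ:0428", "spec:0428" and "img:0428" and the function names
are made up, and the typing predicate is written as rdf:type, the
standard RDF typing property.</p>
<pre>
```python
# Triples are plain (subject, predicate, object) tuples.
# All identifiers and function names here are invented for illustration.
RDF_TYPE = "rdf:type"


def assumed_token_triples(occurrence):
    """'Hand grenade' typing: every typing statement lands on the Occurrence."""
    return [
        (occurrence, RDF_TYPE, "dwc:Occurrence"),
        (occurrence, "dwc:basisOfRecord", "dwctype:PreservedSpecimen"),
        (occurrence, "dcterms:type", "dctype:PhysicalObject"),
    ]


def explicit_token_triples(occurrence, specimen, image=None):
    """Explicit token model: each resource carries only its own type."""
    triples = [
        (occurrence, RDF_TYPE, "dwc:Occurrence"),
        (specimen, RDF_TYPE, "dwctype:PreservedSpecimen"),
    ]
    if image is not None:
        triples.append((image, RDF_TYPE, "dctype:StillImage"))
    return triples


def types_of(triples, subject):
    """Collect every rdf:type object asserted for one subject."""
    return sorted(o for s, p, o in triples if s == subject and p == RDF_TYPE)


assumed = assumed_token_triples("occ:0428")
explicit = explicit_token_triples("occ:0428", "spec:0428", "img:0428")
```
</pre>
<p class="MsoNormal">In the assumed version all three statements share
one subject, so nothing distinguishes the Occurrence from its token. In
the explicit version the Occurrence, the specimen, and the image are
three separate subjects, each with a single type.</p>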
<p class="MsoNormal">Under the explicit token model, I really can't see
any use for dwc:basisOfRecord. Although the "train wreck" involving
dcterms:type that we narrowly avoided after the adoption of Darwin Core
has been resolved, the definition of basisOfRecord still says
"the specific nature of the data record - a subtype of the
dcterms:type." I think this is clearly wrong, because we established in
the discussion that I referenced above that it is NOT a subtype of
dcterms:type. So what is basisOfRecord??? What is "the data record"
whose nature we are describing? If it's the Occurrence, then the
consensus that I'm hearing in the discussion is that an Occurrence data
record shouldn't have as its type any of the dwctype terms except for
dwctype:Occurrence. So what are all of the other terms like
PreservedSpecimen for???</p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Under the explicit token model, what we really
need is NOT basisOfRecord. What we need is some term like
"dwc:tokenID" if you like the Darwin Core IDREF style, or
"dwc:hasToken" if you prefer the style of the Linked Data community.
In both cases, the object of the term would be an identifier for a
token that's associated with the subject Occurrence. This term could
be applied to an Occurrence anywhere from zero times (for observations)
to many times. People who want to flatten everything out will just
ignore this term and cram all their metadata for the Occurrence, token,
Event, and Location onto one line in their metadata table. People who
are going to use any kind of one-to-many relationships at all will have
to figure out how to handle that anyway and won't be daunted by having
more than one dwc:tokenID per Occurrence. In the spirit of the
complicated resource relationship diagrams from my paper, one could
also use dwc:tokenID to link primary tokens (like specimens) to
secondary tokens (like specimen images). Any kind of token (primary,
secondary, tertiary, ad infinitum) could be linked to the Occurrence
that it supports with dwc:occurrenceID.</p>
<p class="MsoNormal"><o:p> </o:p></p>
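<p class="MsoNormal">A rough sketch of how such links might look, again
as plain tuples. Keep in mind that "dwc:tokenID" is only the term I am
proposing here and is not an existing Darwin Core term; the identifiers
and the helper names link_tokens and tokens_of are hypothetical as
well.</p>
<pre>
```python
# "dwc:tokenID" is the term proposed in this post, not an existing
# Darwin Core term. Identifiers and helper names are invented.
TOKEN_ID = "dwc:tokenID"
OCCURRENCE_ID = "dwc:occurrenceID"


def link_tokens(parent, tokens, occurrence):
    """Link parent to each token with dwc:tokenID, and each token back to
    the Occurrence it supports with dwc:occurrenceID."""
    triples = []
    for token in tokens:
        triples.append((parent, TOKEN_ID, token))
        triples.append((token, OCCURRENCE_ID, occurrence))
    return triples


def tokens_of(triples, subject):
    """List the tokens attached directly to one subject."""
    return [o for s, p, o in triples if s == subject and p == TOKEN_ID]


graph = []
# An observation: the term is applied zero times.
graph += link_tokens("occ:sighting", [], "occ:sighting")
# A collected specimen plus a field image: the term is applied twice.
graph += link_tokens("occ:0428", ["spec:0428", "img:field"], "occ:0428")
# A secondary token: an image of the specimen, hung off the specimen.
graph += link_tokens("spec:0428", ["img:sheet"], "occ:0428")
```
</pre>
<p class="MsoNormal">Here tokens_of(graph, "occ:0428") lists the two
primary tokens, while the specimen image hangs off the specimen and
still points back to the same Occurrence through dwc:occurrenceID.</p>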
<p class="MsoNormal">WHAT DOES THIS DEMAND OF US?</p>
<p class="MsoNormal">OK, I've now gone on for eight pages of text
explaining the rationale behind the question. So I'll return to the
basic question: is the consensus for modeling the relationship between
an Occurrence and associated token(s) the assumed token model:</p>
<p class="MsoNormal"><a class="moz-txt-link-freetext" href="http://bioimages.vanderbilt.edu/pages/token-assumed.gif">http://bioimages.vanderbilt.edu/pages/token-assumed.gif</a></p>
<p class="MsoNormal">or the explicit token model:</p>
<p class="MsoNormal"><a class="moz-txt-link-freetext" href="http://bioimages.vanderbilt.edu/pages/token-explicit.gif">http://bioimages.vanderbilt.edu/pages/token-explicit.gif</a></p>
<p class="MsoNormal">?</p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">If we accept the assumed token model with all of
its warts, then for consistency's sake we must create dwctype terms for
each of the types of tokens that people would reasonably want to use as
evidence for Occurrences (and my proposal for adding DigitalStillImage
as a Darwin Core type stands). We must also resign ourselves to
assigning a separate Occurrence to each token that users want to use to
document the presence of a taxon at a time and place. We must also
accept having goofy-sounding statements like</p>
<p class="MsoNormal">[Occurrence] dcterms:rights ["(c) 2002 Steven J.
Baskauf"]</p>
<p class="MsoNormal"><o:p> </o:p></p>
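<p class="MsoNormal">As a sketch of what that would mean in practice,
the following builds one flat Occurrence record per token. The function
name, the "occ:N" identifier scheme, and the placeholder taxon and event
values are all assumptions for illustration; DigitalStillImage is the
type I proposed, not an adopted term.</p>
<pre>
```python
def assumed_model_records(taxon, event, tokens):
    """Assumed token model: every token gets its own Occurrence record."""
    records = []
    for n, token in enumerate(tokens, start=1):
        records.append({
            "occurrenceID": f"occ:{n}",          # hypothetical ID scheme
            "scientificName": taxon,
            "eventID": event,
            "basisOfRecord": token["type"],
            # The token's rights end up asserted about the Occurrence.
            "dcterms:rights": token.get("rights"),
        })
    return records


tokens = [
    {"type": "PreservedSpecimen"},
    {"type": "DigitalStillImage", "rights": "(c) 2002 Steven J. Baskauf"},
]
records = assumed_model_records("taxon:example", "event:1", tokens)
```
</pre>
<p class="MsoNormal">One organism at one time and place, documented by a
specimen and an image, comes out as two Occurrence records, and the
copyright statement for the image is attached to an Occurrence.</p>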
<p class="MsoNormal">If we accept the explicit token model, then we
need to either dump basisOfRecord or come up with some rational
explanation for what it actually means (and my proposal to add
DigitalStillImage as a Darwin Core type becomes irrelevant). We also
need to create some kind of term like dwc:tokenID that will allow
connections to be made between Occurrence records and their tokens.
People who want to flatten out their Occurrence records and put the
tokens together with the Occurrence (i.e. "compress the diagram" to get
rid of the token resource), and who feel some need to indicate the type
of the token that they are using, can use any appropriate term from the
Dublin Core or Darwin Core types as a value for rdf:type.</p>
<p class="MsoNormal"><o:p> </o:p></p>
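<p class="MsoNormal">"Compressing the diagram" could be sketched roughly
as follows: the token resource disappears and only its type survives on
the flattened row. The function name flatten, the dictionary keys, and
the identifiers are assumptions for illustration, and the typing key is
written as rdf:type, the standard RDF typing property.</p>
<pre>
```python
def flatten(occurrence, tokens):
    """Return one flat row per token, dropping the token as a resource.

    An Occurrence without tokens (an observation) yields one untyped row.
    """
    if not tokens:
        return [dict(occurrence)]
    return [{**occurrence, "rdf:type": token["rdf:type"]} for token in tokens]


occurrence = {
    "occurrenceID": "occ:0428",
    "eventID": "event:1",
    "locationID": "loc:1",
}
tokens = [
    {"tokenID": "spec:0428", "rdf:type": "dwctype:PreservedSpecimen"},
    {"tokenID": "img:0428", "rdf:type": "dctype:StillImage"},
]
rows = flatten(occurrence, tokens)
```
</pre>
<p class="MsoNormal">Each row keeps the Occurrence, Event, and Location
identifiers and picks up the type of one token; the token identifiers
themselves are gone, which is exactly the information that the flattened
view gives up.</p>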
<p class="MsoNormal">Until we make one of these choices or the other
and "fix" Darwin Core to work in a consistent way, we are just going to
continue to misunderstand each other because each person will just
"know an Occurrence when they see it".</p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">In the interest of space, I am going to defer
explaining my opinions about LivingSpecimen and establishmentMeans.
Those explanations are contingent on the conclusion that we reach on
this issue.</p>
<p class="MsoNormal"><o:p> </o:p></p>
<span style="font-size: 12pt; font-family: 'Times New Roman', serif;">Steve</span>
<pre class="moz-signature" cols="72">--
Steven J. Baskauf, Ph.D., Senior Lecturer
Vanderbilt University Dept. of Biological Sciences
postal mail address:
VU Station B 351634
Nashville, TN 37235-1634, U.S.A.
delivery address:
2125 Stevenson Center
1161 21st Ave., S.
Nashville, TN 37235
office: 2128 Stevenson Center
phone: (615) 343-4582, fax: (615) 343-6707
<a class="moz-txt-link-freetext" href="http://bioimages.vanderbilt.edu">http://bioimages.vanderbilt.edu</a>
</pre>
</body>
</html>