<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

<head>

</head>

<body bgcolor="#ffffff" text="#000000">

<p class="MsoNormal">I have been dreading trying to write this post

which I have

promised (or threatened, depending on whether you have enjoyed or been

annoyed by the

previous lengthy thread) for some time.<span style="">&nbsp;

</span>I have dreaded it because this is a complicated subject and not

one that

is amenable to terse messages.<span style="">&nbsp; </span>However,

after the previous conversation with Rich et al., I feel for the first

time

that I have the questions (not answers!) clearly in my mind.<span

 style="">&nbsp; </span>So rather than starting off rambling about

LivingSpecimens and establishmentMeans as I had planned, I'm going to

start by

laying down several principles that have come into clarity in my mind

after the

previous conversation and the attempt to map things out in a diagram.<span

 style="">&nbsp; </span>I will apologize in advance for failure to

use the correct database or IT technical terms when I'm in unfamiliar

territory.<span style=""> Until there is a consensus about how we deal

with the "tokens" we use to document Occurrences, I'm not sure that

what I have to say about those other topics will make sense.<br>

</span></p>

<p class="MsoNormal"><o:p>&nbsp;</o:p></p>

<p class="MsoNormal">PRINCIPLES (derived from earlier discussion)<br>

</p>

<p class="MsoNormal">1.<span style="">&nbsp; </span>We have a number

of kinds of "things" (which I will henceforth refer to as "resources")

that are useful for describing and organizing metadata that we collect

in our

attempts to document biodiversity.<span style="">&nbsp; </span>For

many of these types of resources, we have defined classes to categorize

the

terms that can be used to describe the properties of resources that are

instances of that class.<span style="">&nbsp; </span>Describing the

class helps us to understand the type of resources that constitute

instances of

that class.</p>

<p class="MsoNormal"><o:p>&nbsp;</o:p></p>

<p class="MsoNormal">2. A conscious decision was made to avoid formally

defining

rdfs:domain for Darwin Core terms.<span style="">&nbsp; </span>This

decision was made to provide flexibility in the way the terms can be

used and

to avoid the situation where semantic clients would draw incorrect or

silly

conclusions about what kind of things resources are.<span style="">&nbsp; </span>However,

this decision does not excuse us

from thinking carefully about whether a term can be appropriately

applied to a

resource that is a member of some class (e.g. should we say that a

digital

photograph has a scientific name?).<span style="">&nbsp;

</span>Placing a term within a class is a suggestion that the term

would appropriately

be applied as a property of an instance of a class.</p>

<p class="MsoNormal"><o:p>&nbsp;</o:p></p>

<p class="MsoNormal">3. When users want to "flatten" and simplify their

databases, they tend to eliminate one-to-many (1:M) relationships in

favor of

one-to-one (1:1) relationships.<span style="">&nbsp; </span>The

result is differences like those we saw in </p>

<p class="MsoNormal"><a class="moz-txt-link-freetext" href="http://bioimages.vanderbilt.edu/pages/rich-diagram1.gif">http://bioimages.vanderbilt.edu/pages/rich-diagram1.gif</a>

(which

allows 1:M relationships between Occurrences and Events and between

Events and

Locations) and</p>

<p class="MsoNormal"><a class="moz-txt-link-freetext" href="http://bioimages.vanderbilt.edu/pages/rich-diagram2.gif">http://bioimages.vanderbilt.edu/pages/rich-diagram2.gif</a>

(which "atomizes" every Occurrence by considering it to have its own

separate eventTime and Location information).<span style="">&nbsp;

</span></p>

<p class="MsoNormal">A. There is nothing intrinsically "right" or

"wrong" about either of these approaches, because they each have

their own advantages.<span style="">&nbsp; </span>The 1:M approach

is more efficient, but results in a more complicated database, while

the 1:1 approach

results in a simpler database but may require repeating some or many

term values

in the records.<span style="">&nbsp; </span></p>

<p class="MsoNormal">B. The choices that users make in these situations

are the

cause of much of the disagreement about whether a certain class should

exist or

not since the people taking the 1:1 approach "collapse" the

relationship diagram and eliminate classes they don't need while people

who

take the 1:M approach need instances of the class to act as nodes to

connect

their "many" resources to some other thing.<span style="">&nbsp; </span></p>

<p class="MsoNormal">C. This collapsing of the diagram is also the

reason for

some disagreement about whether a term belongs in a certain class or

not.<span style="">&nbsp; </span>In the example above, 1:1 people would say

that eventDate is a property of an Occurrence, while 1:M people would

say that

eventDate is a property of an Event.<span style="">&nbsp; </span><span

 style="">&nbsp;</span></p>

<p class="MsoNormal">D. The choice of users on this issue influences

their

decision about whether or not to create resources that are instances of

classes

and hence to assign them identifiers.<span style="">&nbsp; </span>If

users take the 1:M approach, they need identifiers for resources that

are

acting as connecting nodes so that they can make reference to that

resource in

the metadata of the many things they are connecting to it.<span style="">&nbsp;

</span>If users take the 1:1 approach, they probably

will skip creating explicit resources (and their corresponding

identifiers) for

resources of the class that they are "collapsing" out of the

diagram.<span style="">&nbsp; </span></p>
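<p class="MsoNormal">As a rough illustration of the "collapsing" described in Principle 3, here is a minimal Python sketch. The record identifiers, values, and the <code>flatten</code> helper are all invented for illustration and are not part of Darwin Core. It starts with Occurrences that point to a shared Event (the 1:M approach) and copies the Event properties onto each Occurrence row (the 1:1 approach).</p>

<pre>
```python
# Hypothetical 1:M data: several Occurrences refer to one Event by identifier.
# All identifiers and values below are invented for illustration.
events = {
    "event-1": {"eventDate": "2010-10-15", "locality": "Site A"},
}
occurrences = [
    {"occurrenceID": "occ-1", "eventID": "event-1", "recordedBy": "Collector X"},
    {"occurrenceID": "occ-2", "eventID": "event-1", "recordedBy": "Collector X"},
]


def flatten(occurrences, events):
    """Collapse the Event node by copying its properties onto each Occurrence."""
    rows = []
    for occ in occurrences:
        # Drop the reference to the Event, since the Event no longer exists
        # as a separate resource in the flattened (1:1) view.
        row = {key: value for key, value in occ.items() if key != "eventID"}
        row.update(events[occ["eventID"]])
        rows.append(row)
    return rows


flat_rows = flatten(occurrences, events)
for row in flat_rows:
    print(row)
```
</pre>

<p class="MsoNormal">In the flattened rows the Event identifier is gone and eventDate is repeated in every row, which is the trade-off between the two approaches noted in 3A.</p>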

<p class="MsoNormal"><o:p>&nbsp;</o:p></p>

<p class="MsoNormal">4.<span style="">&nbsp; </span>I would propose

that the "right" relationship diagram is not necessarily one that

caters to a certain "right" philosophical point of view.<span style="">&nbsp;

</span>Rather, the "right" diagram is the

one that allows users to define the relationships that they need for

the

organization of their metadata in the simplest manner, and which

provides the

most clarity about what resources of various kinds are, and how they

are

connected.<span style="">&nbsp; </span></p>

<p class="MsoNormal">A.<span style="">&nbsp; </span>"Right"

as I have defined it above depends on how broadly the relationship

diagram is intended to apply.<span style="">&nbsp; </span>An

individual person or organization with limited interests may have a

relationship diagram that is simpler than the diagram shown at

<a class="moz-txt-link-freetext" href="http://bioimages.vanderbilt.edu/pages/rich-diagram1.gif">http://bioimages.vanderbilt.edu/pages/rich-diagram1.gif</a>

or might choose to add classes for other things that are of personal

interest to them.<span style="">&nbsp; </span>An organization

focused on different issues or with broader interests might opt for

many more or

different classes that would be connected to those shown in the diagram.<span

 style="">&nbsp; </span></p>

<p class="MsoNormal">B. Given what I just said in A, what is "right"

for Darwin Core is going to be defined by the needs of the Darwin Core

constituency.<span style="">&nbsp; </span>At the TDWG meeting, John

Wieczorek made a statement which I will paraphrase as "in order for a

term

to make it into Darwin Core, at least two people had to want it".<span

 style="">&nbsp; </span>I'm not sure to what extent he was joking

about this, but it makes the point that one must consider community

needs

before saying that a certain part of the "diagram" is necessary.<span

 style="">&nbsp; </span>I think that the reason that Rich and I were

so quickly able to come to a consensus on the organization of the left

side of

the diagram is because he realized that there was a significant part of

the DwC

constituency that needed a way to group occurrences (i.e. needed

Individuals)

and I realized that there was a significant part of the constituency

that

needed to group multiple Events at a Locality and multiple Occurrences

at an

Event.<span style="">&nbsp; </span>So in evaluating alternative conceptual

systems for organizing resources, the question that has to be asked is the

extent to which an alternative allows broad segments of the DwC constituency to

organize

their metadata in an efficient and conceptually sensible way.<span

 style="">&nbsp; </span>If one alternative is more broadly applicable

and conceptually clear than another, then that alternative is better

regardless

of the philosophical underpinnings of the argument.<span style="">&nbsp; </span></p>

<p class="MsoNormal"><o:p>&nbsp;</o:p></p>

<p class="MsoNormal">5. The last point is one that has run as an

undercurrent

through various TDWG threads but which may not have been explicitly

stated in

this particular thread.<span style="">&nbsp; </span>That is that

there should be a separation between what a resource IS and what we

want to use

a resource FOR.<span style="">&nbsp; </span>To use technical terms, we

need to separate the "type" of a resource from its fitness of

use.<span style="">&nbsp; </span>A digital image IS a digital

image.<span style="">&nbsp; </span>It might be used FOR documenting

that an organism was at a particular location at a particular time, but

it

could be used to illustrate a character, as a part of a visual key, as

media

for an educational presentation, as art, and probably many other things

that

aren't popping into my mind at the moment.<span style="">&nbsp;

</span>I believe that much of the confusion about "what is an

Occurrence"

comes from a failure to make this distinction.<span style="">&nbsp;&nbsp;

</span></p>

<p class="MsoNormal"><o:p>&nbsp;</o:p></p>

<p class="MsoNormal">THE ISSUE OF THE TOKEN</p>

<p class="MsoNormal">Earlier in the thread of "What is an Occurrence",

there was a general consensus that an Occurrence often had a "thing"

that was associated with it that served as evidence that a taxon

representative

(i.e. Individual) occurred at a particular Location at a particular

time.<span style="">&nbsp; </span>In my Biodiversity Informatics paper, I

called this thing a "representation", but I now believe that

"token" is a better term and will use it hereafter.<span style="">&nbsp; </span>There

also seemed to be a consensus that an

observation was simply an Occurrence that did not have an associated

token.<span style="">&nbsp; </span>(This is with the understanding

that observation is being narrowly defined as a type of Occurrence,

with a

definable time and location, as opposed to what I called the

"checklist" definition which indicated that some undefined taxon

representative was present in some defined geographical area at an

indefinite

time.)<span style="">&nbsp; </span>In one of my earlier posts, I

pleaded for somebody to tell me whether there was an assumption that

the token

was considered a part of the Occurrence or whether it was a separate

thing.<span style="">&nbsp; </span>I did not get any responses,

which I'm construing to mean that people weren't sure about this.<span

 style="">&nbsp; </span>I now have a clearer idea of

the general principles I outlined above, and also have the "Rich"

diagram for modeling relationships, so I'm going to again pose this

question,

but in what I hope is a clearer way.<span style="">&nbsp; </span>I

have re-made the earlier diagram as Rich suggested, using triangles

rather than

arrows.<span style="">&nbsp; </span>The wide side of the triangle is

the "many" side of the relationship and the point is the

"one" side.<span style="">&nbsp; </span>As before, I'm

deferring on the right side of the diagram (to the right of

Identification) to

the taxonomists for now, so let's keep that out of the discussion for

the

moment.<span style="">&nbsp; </span>I have also clarified the

diagram by coloring in the actual DwC classes to distinguish them from

selected

terms that fall within those classes (non-colored boxes) and which can

be used

as properties of resources that are instances of the class.<span

 style="">&nbsp; </span>The two alternatives that I'm discussing are:</p>

<p class="MsoNormal"><a class="moz-txt-link-freetext" href="http://bioimages.vanderbilt.edu/pages/token-assumed.gif">http://bioimages.vanderbilt.edu/pages/token-assumed.gif</a>

which

I will refer to as the "assumed token" model and </p>

<p class="MsoNormal"><a class="moz-txt-link-freetext" href="http://bioimages.vanderbilt.edu/pages/token-explicit.gif">http://bioimages.vanderbilt.edu/pages/token-explicit.gif</a>

which I will refer to as the "explicit token" model.<span style="">&nbsp; </span></p>

<p class="MsoNormal"><o:p>&nbsp;</o:p></p>

<p class="MsoNormal">I believe that historically the assumed token

model has been

the one which most people have had in mind.<span style="">&nbsp;

</span>Before the new DwC standard, we had specimens and we had

observations.<span style="">&nbsp; </span>In order to avoid

redundancies in terms for those two types of "things", a combined

"thing" called "Occurrence" was created.<span style="">&nbsp; </span>An

Occurrence that was an observation didn't

have a token and an Occurrence that was a specimen had a physical or

living

specimen as its token.<span style="">&nbsp; </span>That's all pretty

simple and sensible and we see evidence of this kind of thinking in the

descriptions given at <a class="moz-txt-link-freetext" href="http://rs.tdwg.org/dwc/terms/index.htm">http://rs.tdwg.org/dwc/terms/index.htm</a>.<span

 style="">&nbsp; </span>A record for an Occurrence has a thing called

its dwc:basisOfRecord that presumably describes the kind of token (if

any).<span style="">&nbsp; </span>So if the token were a preserved

specimen, we would say that [Occurrence] basisOfRecord

[PreservedSpecimen].<span style="">&nbsp; </span>If there were no

token we would say [Occurrence] basisOfRecord [HumanObservation] or

[Occurrence] basisOfRecord [MachineObservation].<span style="">&nbsp; </span>Referring

back to the assumed token diagram,

in the case of a specimen there is no explicit reference to the

specimen as a separate

entity.<span style="">&nbsp; </span>The terms related to the

specimen, such as preparations and disposition are just plopped into

the

Occurrence class which implies that they are properties of the

Occurrence

itself.<span style="">&nbsp; </span></p>

<p class="MsoNormal"><o:p>&nbsp;</o:p></p>

<p class="MsoNormal">There seems to be a general consensus that other

kinds of

tokens can be used to document an Occurrence.<span style="">&nbsp;

</span>However, the way that the current Darwin Core terms are designed

and

placed within classes is very inconsistent in how it handles types

of

tokens other than specimens.<span style="">&nbsp; </span>According

to the instructions at the top of

<a class="moz-txt-link-freetext" href="http://rs.tdwg.org/dwc/terms/index.htm">http://rs.tdwg.org/dwc/terms/index.htm</a>, a

camera trap bird sighting should have [Occurrence] basisOfRecord

[MachineObservation].<span style="">&nbsp; </span>It is not clear

how one is supposed to handle the actual metadata for the image that

serves

as the token.<span style="">&nbsp; </span>Unlike specimens where the

token's metadata terms are placed in the Occurrence class, I guess in

the case

of an image one is supposed to use associatedMedia to link the

so-called

MachineObservation to the image record.<span style="">&nbsp; </span>If

DNA were extracted, one would link the sequence to the Occurrence using

associatedSequences (although it's not clear to me what the

basisOfRecord for

that would be - "TookATissueSample"?).<span style="">&nbsp; </span>But what

does one do for other kinds of

tokens, like seeds or tissue samples - create terms like associatedSeed

and

associatedTissueSample?<span style="">&nbsp; </span>I think that the

ResourceRelationship terms were supposed to handle this problem, but I

have yet

to see an example of exactly how this was supposed to work.<span

 style="">&nbsp; </span></p>

<p class="MsoNormal"><o:p>&nbsp;</o:p></p>

<p class="MsoNormal">As an attempt to resolve this confusion in my

mind, I wrote

the Biodiversity Informatics paper that I've promoted frequently on

this list (<a class="moz-txt-link-freetext" href="https://journals.ku.edu/index.php/jbi/article/view/3664">https://journals.ku.edu/index.php/jbi/article/view/3664</a>).

<span style="">&nbsp;</span>In that paper, I take the basic assumed

token model and broaden it in an attempt to make the assumed token

model work

for all kinds of tokens.<span style="">&nbsp; </span>Because I

assumed that each occurrence has a single token, I "collapsed the

diagram" and connected the properties of the token directly to the

Occurrence resource (as was modeled when specimen properties were

placed within

the Occurrence class).<span style="">&nbsp; </span>If there were

several tokens for a given Individual, I "flattened" the records by

creating a separate Occurrence resource for each token.<span style="">&nbsp;

</span>The model was generalized further by allowing

secondary Occurrence records where the token was not derived directly

from the

organism but rather derived from a primary Occurrence record.<span

 style="">&nbsp; </span>This was meant to cover complicated circumstances such as those

found in a botanical garden, where a seed or cutting might be collected from a

tree and grown into a LivingSpecimen, which might in turn have a

PreservedSpecimen collected from it and a DigitalStillImage taken of the

preserved specimen.<span style="">&nbsp; </span>You can see examples

of the complex types of situations I tried to handle at </p>

<p class="MsoNormal"><a class="moz-txt-link-freetext" href="http://bioimages.vanderbilt.edu/pages/conceptual-scheme-insect.gif">http://bioimages.vanderbilt.edu/pages/conceptual-scheme-insect.gif</a>

and</p>

<p class="MsoNormal"><a class="moz-txt-link-freetext" href="http://bioimages.vanderbilt.edu/pages/conceptual-scheme-botanical.gif">http://bioimages.vanderbilt.edu/pages/conceptual-scheme-botanical.gif</a></p>

<p class="MsoNormal">I created my own terms (like

sernec:derivativeOccurrence and

sernec:derivedFrom) to describe the connections among the individual

and the

various layers of Occurrences.<span style="">&nbsp; </span></p>

<p class="MsoNormal"><o:p>&nbsp;</o:p></p>

<p class="MsoNormal">Does this system work?<span style="">&nbsp;

</span>Yes, but there are a number of problems associated with it.<span

 style="">&nbsp; </span>The first problem is related to Principle 4

above.<span style="">&nbsp; </span>In order for this system to work,

there needs to be a consensus in the DwC community about several things.<span

 style="">&nbsp; </span>One is that each Occurrence must have only

one token.<span style="">&nbsp; </span>If we are going to

"type" Occurrences by their basisOfRecord (and the acceptable values

for basisOfRecord are officially DwC types, see

<a class="moz-txt-link-freetext" href="http://rs.tdwg.org/dwc/terms/type-vocabulary/index.htm">http://rs.tdwg.org/dwc/terms/type-vocabulary/index.htm</a>),

then an Occurrence can't have two values for basisOfRecord.<span

 style="">&nbsp; </span>It is clear from the discussion we've had

that people would like to consider a single Occurrence to be able to

have

multiple tokens as documentation.<span style="">&nbsp; </span>The

second problem is that there needs to be a consensus that a secondary

Occurrence can exist at all (i.e. can you call the image of a specimen

"an

Occurrence"?).<span style="">&nbsp; </span>It is clear to me from

the discussion that when people are thinking about what an Occurrence

means, they

have in mind the documentation of the time and place of the Individual

in its

environment.<span style="">&nbsp; </span>In a previous communication,

John Wieczorek clarified that terms describing Occurrences like

recordedBy and

eventDate should only apply to primary occurrences and that it would

not be

appropriate to use them as properties of what I'm calling a secondary

occurrence (such as the image of a specimen).<span style="">&nbsp;

</span>So I dealt with this by creating a distinction between

Occurrences that

document the distribution of a taxon (using the term

sernec:documentsDistribution) and those that don't.<span style="">&nbsp; </span>This

is something like the old

validDistributionFlag, but I defined documentsDistribution specifically

as

having a value of "true" only for Occurrences that were derived

directly from the Individual (gray arrows in the two diagrams from the

paper).<span style="">&nbsp; </span></p>

<p class="MsoNormal"><o:p>&nbsp;</o:p></p>

<p class="MsoNormal">But I think that the worst "crime" of the system I

suggested is violation of Principle 5 above.<span style="">&nbsp;

</span>By asserting an unvarying 1:1 relationship between the

Occurrence and

its token and by collapsing my relationship diagram to not explicitly

include a

resource that is the token itself, I am confusing the USE of an

Occurrence (to

demonstrate that a representative of a taxon was present at a

particular

Location at a particular time) with what the token IS (a dead organism

in a

jar or glued to paper, an electronic representation of photon patterns,

a

series of characters representing a nucleotide sequence).<span style="">&nbsp;

</span>So I'm charging myself with this

"crime", pleading guilty, and accepting my sentence, which is to

admit that the system I suggested in the Biodiversity Informatics paper

is

"wrong" based on the principles I outlined above.<span style="">&nbsp; </span>What

this amounts to is an acceptance of the "rightness"

of the explicit token model (in the sense that I defined "right" in

Principle 4 above).<span style="">&nbsp; </span></p>

<p class="MsoNormal"><o:p>&nbsp;</o:p></p>

<p class="MsoNormal">However, if I'm going to make this admission, I

demand that the

other guilty parties also confess, namely people who want to assert

that

Occurrences have properties that actually are properties of specimens.<span

 style="">&nbsp; </span>If we are going to have a system that

actually works, we can't straddle the fence and say that the assumed

token

model is correct for specimens and that the explicit token model is

correct for

every other kind of token.<span style="">&nbsp; </span>If we accept

the explicit token model, then the specimen will have to come off of its

throne

and be a token like all of the other ways that we provide evidence that

an Occurrence

happened.<span style="">&nbsp; </span>If we accept the explicit

token model, then as a biodiversity informatics resource type

"observation"

will have to disappear into a puff of nothingness just like the

"luminiferous ether", "centrifugal force", and other kinds

of things that we thought we needed to have to explain things but which

turned

out to be unnecessary when we figured out more basic explanations.<span

 style="">&nbsp; </span>A human observation will simply be an

Occurrence that doesn't have a token (which is what I've heard some

people say

all along).<span style="">&nbsp; </span>If we allow the Occurrence/token

relationship to be a one-to-many relationship rather than one-to-one,

then HumanObservation

is just the one-to-zero case of the more general one-to-many.<span

 style="">&nbsp; </span>For those of you who like the idea of a

"machine observation", that is just an Occurrence with a token that

is whatever type of resource that the machine produces (electronic data

file,

image of the organism, image of a graph, or whatever).<span style="">&nbsp; </span></p>
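<p class="MsoNormal">The one-to-many Occurrence/token relationship described above can be sketched as a small data structure. This is only an illustration under my own assumptions: the class names, field names, and identifiers are hypothetical and are not Darwin Core terms. The point is that an observation without evidence is simply an Occurrence whose list of tokens is empty.</p>

<pre>
```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Token:
    """A piece of evidence for an Occurrence (hypothetical structure)."""
    token_id: str
    token_type: str  # e.g. "PreservedSpecimen" or "StillImage"


@dataclass
class Occurrence:
    """An Occurrence documented by zero or more tokens (hypothetical structure)."""
    occurrence_id: str
    recorded_by: str
    tokens: List[Token] = field(default_factory=list)

    def has_no_token(self):
        # The "one-to-zero" case: what has been called a human observation.
        return len(self.tokens) == 0


# A sighting with no evidence kept, and a collecting visit with two tokens.
sighting = Occurrence("occ-1", "Observer A")
collected = Occurrence(
    "occ-2",
    "Collector B",
    [Token("img-1", "StillImage"), Token("spec-1", "PreservedSpecimen")],
)

print(sighting.has_no_token(), collected.has_no_token())
```
</pre>

<p class="MsoNormal">Nothing in this sketch needs a separate "observation" type; the number of tokens carries that information.</p>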

<p class="MsoNormal"><o:p>&nbsp;</o:p></p>

<p class="MsoNormal">ADVANTAGES OF RECOGNIZING TOKENS EXPLICITLY</p>

<p class="MsoNormal">If we accept the explicit token model over the

assumed token

model, a number of problems get solved.<span style="">&nbsp;

</span>Just as was the case with Events, people who want to flatten

things out

by having only one token per Occurrence can do so.<span style="">&nbsp; </span>For

example, if I want to atomize things by

defining my Occurrence to have taken place during an Event that lasted

only the

one second within which my camera shutter clicked, I can do that and

have only

a single token associated with that Occurrence.<span style="">&nbsp;

</span>On the other hand, if others want to define their Occurrence as

taking

place over the time over which they photographed, collected a leaf

tissue sample,

and then collected a branch of a tree for an herbarium specimen, then

they can

do that and associate all of those tokens (one or more images, the

tissue

sample, and the preserved specimen) with the single Occurrence.<span

 style="">&nbsp; </span></p>

<p class="MsoNormal"><o:p>&nbsp;</o:p></p>

<p class="MsoNormal">Another important benefit will come down the line

when we

actually try to develop RDF templates.<span style="">&nbsp;

</span>Right now it is not exactly clear (at least to me) how

properties should

be divided up among resources that are being described in the RDF.<span

 style="">&nbsp; </span>Based on the assumed token model, I have been

including the metadata for the token within the container element for

the Occurrence.<span style="">&nbsp; </span>This leads to some of the kind

of odd

assertions that people have been objecting to, such as </p>

<p class="MsoNormal">[Occurrence] dcterms:rights ["(c) 2002 Steven J.

Baskauf"] or </p>

<p class="MsoNormal">[Occurrence] preparations ["skin"].<span style="">&nbsp;

</span></p>

<p class="MsoNormal">In the explicit token model, dividing metadata up

appropriately among separate Occurrence and token resources makes more

sense,

e.g. </p>

<p class="MsoNormal">[Occurrence] recordedBy ["Joe Curator"]</p>

<p class="MsoNormal">[image] dcterms:rights ["(c) 2002 Steven J.

Baskauf"]</p>

<p class="MsoNormal">[specimen] preparations ["skin"]</p>
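<p class="MsoNormal">To make the division of properties concrete, here is a minimal Python sketch that stores the statements above as plain (subject, property, value) tuples rather than real RDF. The identifiers and the linking property <code>ex:hasToken</code> are hypothetical; I am not aware of an agreed Darwin Core term for connecting an Occurrence to its tokens.</p>

<pre>
```python
# Statements as (subject, property, value) tuples; not real RDF serialization.
# "ex:hasToken" and all identifiers are hypothetical, used only to link resources.
triples = [
    ("occ-1", "dwc:recordedBy", "Joe Curator"),
    ("occ-1", "ex:hasToken", "img-1"),
    ("occ-1", "ex:hasToken", "spec-1"),
    ("img-1", "dcterms:rights", "(c) 2002 Steven J. Baskauf"),
    ("spec-1", "dwc:preparations", "skin"),
]


def properties_of(subject, triples):
    """Return the (property, value) pairs asserted about one resource."""
    return [(prop, value) for subj, prop, value in triples if subj == subject]


for resource in ("occ-1", "img-1", "spec-1"):
    print(resource, properties_of(resource, triples))
```
</pre>

<p class="MsoNormal">With the token resources made explicit, the rights statement attaches to the image and the preparations statement attaches to the specimen, and neither is asserted about the Occurrence itself.</p>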

<p class="MsoNormal"><o:p>&nbsp;</o:p></p>

<p class="MsoNormal">If we wanted to be really explicit about this, we

probably

should have a separate class for PhysicalSpecimens and separate the

terms that

describe specimens from those that describe Occurrences in general.<span

 style="">&nbsp; </span>There might be some difficulty in doing this

because there are some terms that might be hard to decide about, like

catalogNumber.<span style="">&nbsp; </span>I don't really think the

catalogNumber is a property of the Occurrence, because it makes more

sense to

me to say </p>

<p class="MsoNormal">[specimen] catalogNumber ["12345"] than</p>

<p class="MsoNormal">[Occurrence] catalogNumber ["12345"]</p>

<p class="MsoNormal">Realistically, I can't see this kind of separation

ever

happening, given the amount of trouble it's been just to get a few

people to admit

that Individuals exist.<span style="">&nbsp; </span>It is just too

hard to get motion to happen in the TDWG community.<span style="">&nbsp; </span>As

a practical matter, people who

"compress" the system (which we admit happens and make concession to in

Principle 3) by having record tables where a single row contains the

metadata

for both the Occurrence and the token (i.e. treat it as a 1:1

relationship)

will simply have a column heading for catalogNumber and not care

whether the

catalogNumber applies to the Occurrence or the token.<span style="">&nbsp; </span>It's

the people who want to do the more

complicated stuff like simultaneously keep track of multiple tokens per

Occurrence (like several images, a sound recording, and a specimen),

people who

want to write RDF, or people who want to merge databases containing

many types

of tokens who will have to pay attention to this distinction.<span

 style="">&nbsp; </span>Physical specimens would really be the only

kind of class we would have to create because there already is a rich

vocabulary for media items that is separate from DwC (i.e. the MRTG

schema) and

there are probably also vocabularies for stuff like tissue samples and

DNA

sequences (although I'm not familiar with them).<span style="">&nbsp; </span></p>

<p class="MsoNormal"><o:p>&nbsp;</o:p></p>

<p class="MsoNormal">TYPING</p>

<p class="MsoNormal">Bob has warned us about the dangers of asserting

that a term

always applies to a certain type of resource by asserting that the term

has an rdfs:domain

.<span style="">&nbsp; </span>However, we should not avoid

attempting to assert that a resource is itself of a certain type.<span

 style="">&nbsp; </span>Describing the "type" of a resource

is an important part of letting potential users assess the possible

fitness of

use of that resource.<span style="">&nbsp; </span>For example, you

can collect DNA from a preserved specimen but not from an image.<span

 style="">&nbsp; </span>You can include an image in a print journal article

but not a sound recording.<span style="">&nbsp; </span>You can

build a range map from Occurrences, but not from DNA samples.<span

 style="">&nbsp; </span>In RDF, one of the basic properties that

should be described about every resource is its rdf:type .<span

 style="">&nbsp; </span>In the generic Linked Data world, you can

pretty much use anything that you want as an rdf:type .<span style="">&nbsp;

</span>If you decide to use something obscure, then

the danger is that nobody else will have any idea what kind of thing

you are

describing.<span style="">&nbsp; </span>The Draft TDWG GUID

Applicability Statement recommendation 11 says that "Objects in the

biodiversity informatics domain that are identified by a GUID should be

typed

using the TDWG ontology or other well-known vocabularies in accordance

with the

TDWG common architecture."<span style="">&nbsp; </span>So in

our community, we can't just type resources any way we want.<span

 style="">&nbsp; </span>But exactly how we SHOULD type things isn't

clear.<span style="">&nbsp; </span>There isn't any functioning TDWG

ontology at the moment.<span style="">&nbsp; </span>I have found it

useful to use the DwC class as the rdf:type in my attempts to write

RDF.<span style="">&nbsp; </span>That works pretty well for things that

have

DwC classes.<span style="">&nbsp; </span>But if we follow the

explicit token model, we need to have some consensus on what we will

use as the

rdf:type for the tokens.<span style="">&nbsp;&nbsp; </span>At this point

it looks to me like it would make sense to have the convention that for

tokens one

uses either a dcterms:type or a Darwin Core type (i.e. one of the types

listed

at <a class="moz-txt-link-freetext" href="http://rs.tdwg.org/dwc/terms/type-vocabulary/index.htm">http://rs.tdwg.org/dwc/terms/type-vocabulary/index.htm</a>, although as

I

already noted, there is no need for HumanObservation in the case of

describing

a token because human observations don't have tokens).<span style="">&nbsp; </span>There

isn't any sort of "collision"

here of the sort that happened right after the adoption of the Darwin

Core

Standard when we tried to merge the Dublin and Darwin Core types (see

<a class="moz-txt-link-freetext" href="http://www.keytonature.eu/wiki/MRTGv08_Type_term_inconsistent_with_DwC">http://www.keytonature.eu/wiki/MRTGv08_Type_term_inconsistent_with_DwC</a>

and </p>

<p class="MsoNormal"><a class="moz-txt-link-freetext" href="http://lists.tdwg.org/pipermail/tdwg-content/2009-October/000301.html">http://lists.tdwg.org/pipermail/tdwg-content/2009-October/000301.html</a>

with many following responses for the gruesome details) since rdf:type

doesn't

demand any particular type vocabulary.<span style="">&nbsp; </span>I'm

not entirely happy with this approach because for digital still images

the

logical type would be dctype:StillImage, which doesn't give any

indication as

to whether the image is film or digital, but I guess at this point in

the 21<sup>st</sup>

century most consuming applications will probably just assume digital

anyway.<span style="">&nbsp; </span></p>

<p class="MsoNormal"><o:p>&nbsp;</o:p></p>

<p class="MsoNormal">So (assuming that Individuals become a DwC class)

I guess I

don't really see that there is any problem in using the current Darwin

Core

classes to indicate the rdf:type of every kind of resource that we

would be

reasonably likely to assign GUIDs to EXCEPT for tokens.<span style="">&nbsp;

</span>Typing of tokens could be done using a

combination of Darwin Core and Dublin Core types.<span style="">&nbsp; </span>What

I'm left scratching my head about is

basisOfRecord.<span style="">&nbsp; </span>When I subscribed to the

assumed token model (i.e. when I wrote the Biodiversity Informatics

paper), I

thought I knew what basisOfRecord meant.<span style="">&nbsp;

</span>It meant the kind of token that backed up an Occurrence.<span

 style="">&nbsp; </span>So when I wrote RDF for a specimen (as in

<a class="moz-txt-link-freetext" href="http://bioimages.vanderbilt.edu/rdf/examples/lsu000/0428.rdf">http://bioimages.vanderbilt.edu/rdf/examples/lsu000/0428.rdf</a>)

I used the "hand grenade" approach to typing.<span style="">&nbsp; </span>I

lobbed every kind of "typing"

that I knew of at the Occurrence record for a specimen:</p>

<p class="MsoNormal">[Occurrence] rdf:type [dwc:Occurrence]</p>

<p class="MsoNormal">[Occurrence] dwc:basisOfRecord

[dwctype:PreservedSpecimen]</p>

<p class="MsoNormal">and</p>

<p class="MsoNormal">[Occurrence] dcterms:type [dctype:PhysicalObject]</p>

<p class="MsoNormal"><o:p>&nbsp;</o:p></p>

<p class="MsoNormal">Under the explicit token model, I would just use</p>

<p class="MsoNormal">[Occurrence] rdf:type [dwc:Occurrence]</p>

<p class="MsoNormal">for the Occurrence and</p>

<p class="MsoNormal">[specimen] rdf:type [dwctype:PreservedSpecimen]</p>

<p class="MsoNormal">for the specimen itself.<span style="">&nbsp;

</span>If I also took an image at the same time and wanted to say that

it was

part of the same Occurrence as the specimen, I would use</p>

<p class="MsoNormal">[image] rdf:type [dctype:StillImage]</p>
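<p class="MsoNormal">As a rough sketch of the three typing statements
above, the following Python builds them as plain (subject, predicate,
object) tuples.&nbsp; The ex: identifiers and the helper name
type_triples are hypothetical, no RDF library is assumed, and the
typing property is written as rdf:type, which is the typing property
that RDF itself defines.</p>

```python
# Sketch only: identifiers such as ex:occurrence1 are made up for illustration.
RDF_TYPE = "rdf:type"  # the typing property defined by RDF


def type_triples(resources):
    """Return one (subject, rdf:type, class) triple per (subject, class) pair."""
    return [(subject, RDF_TYPE, cls) for subject, cls in resources]


# Explicit token model: the Occurrence and each token are typed separately.
resources = [
    ("ex:occurrence1", "dwc:Occurrence"),
    ("ex:specimen1", "dwctype:PreservedSpecimen"),
    ("ex:image1", "dctype:StillImage"),
]

triples = type_triples(resources)
for s, p, o in triples:
    print(s, p, o, ".")
```

<p class="MsoNormal">Note that nothing here yet says the specimen and
the image belong to the Occurrence; the sketch covers typing only.</p>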

<p class="MsoNormal"><o:p>&nbsp;</o:p></p>

<p class="MsoNormal">Under the explicit token model, I really can't see

any use

for dwc:basisOfRecord.<span style="">&nbsp; </span>Despite the

resolution of the "train wreck" involving dcterms:type that we

narrowly avoided after the adoption of Darwin Core, the definition

still says

"the specific nature of the data record - a subtype of the

dcterms:type."<span style="">&nbsp; </span>I think this is

clearly wrong because we established that it was NOT a subtype

of

dcterms:type in that discussion that I referenced above.<span style="">&nbsp;

</span>So what is basisOfRecord???<span style="">&nbsp; </span>What is "the

data record" of which

we are describing the nature?<span style="">&nbsp; </span>If it's

the Occurrence, then I think the consensus that I'm hearing in the

discussion

is that an Occurrence data record shouldn't have as its type any of the

dwctype

terms except for dwctype:Occurrence.<span style="">&nbsp; </span>So

what are all of the other terms like PreservedSpecimen for???<span

 style="">&nbsp; </span></p>

<p class="MsoNormal"><o:p>&nbsp;</o:p></p>

<p class="MsoNormal">Under the explicit token model, what we really

need is NOT

basisOfRecord.<span style="">&nbsp; </span>What we need is some term

like "dwc:tokenID" if you like the Darwin Core IDREF style or if you

prefer the style of the Linked Data community "dwc:hasToken".<span

 style="">&nbsp; </span>In both cases, the object of the term would

be an identifier for the token that's associated with a subject

Occurrence.<span style="">&nbsp; </span>This term could be applied

from zero (for observations) to many times to an Occurrence.<span

 style="">&nbsp; </span>People who want to flatten everything out

will just ignore this term and cram all their metadata for the

Occurrence,

token, Event, and Location onto one line in their metadata table.<span

 style="">&nbsp; </span>People who are going to use any kind of

one-to-many relationships at all will have to figure out how to handle

that

anyway and won't be daunted by having more than one dwc:tokenID per

Occurrence.<span style="">&nbsp; </span>In the spirit of the complicated

resource

relationship diagrams from my paper, one could link primary tokens

(like

specimens) to secondary tokens (like specimen images) by using

dwc:tokenID as

well.<span style="">&nbsp; </span>Any kind of token (primary, secondary,

tertiary, ad infinitum) could be linked to the Occurrence that it

supports with

dwc:occurrenceID.</p>
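<p class="MsoNormal">A minimal sketch of how such links might be
traversed, assuming triples are held as plain Python tuples.&nbsp;
"dwc:tokenID" is only the term proposed above, not an existing Darwin
Core term, and every ex: identifier and function name is hypothetical.</p>

```python
# Sketch only. "dwc:tokenID" is the proposed term, not part of Darwin Core.
TOKEN_ID = "dwc:tokenID"
OCCURRENCE_ID = "dwc:occurrenceID"

triples = [
    # An Occurrence documented by two primary tokens.
    ("ex:occ1", TOKEN_ID, "ex:specimen1"),
    ("ex:occ1", TOKEN_ID, "ex:image1"),
    # A primary token (specimen) linked to a secondary token (its image).
    ("ex:specimen1", TOKEN_ID, "ex:specimenImage1"),
    # Every token, at any level, points back to the Occurrence it supports.
    ("ex:specimen1", OCCURRENCE_ID, "ex:occ1"),
    ("ex:image1", OCCURRENCE_ID, "ex:occ1"),
    ("ex:specimenImage1", OCCURRENCE_ID, "ex:occ1"),
]


def tokens_of(subject):
    """Tokens linked directly from a subject via dwc:tokenID (may be empty)."""
    return [o for s, p, o in triples if s == subject and p == TOKEN_ID]


def tokens_supporting(occurrence):
    """All tokens that point back to an Occurrence via dwc:occurrenceID."""
    return [s for s, p, o in triples if p == OCCURRENCE_ID and o == occurrence]


print(tokens_of("ex:occ1"))          # the two primary tokens
print(tokens_of("ex:observation1"))  # an observation has no tokens
print(tokens_supporting("ex:occ1"))  # primary and secondary tokens alike
```

<p class="MsoNormal">The one-to-many case needs no special handling
here: an Occurrence with zero tokens simply yields an empty list.</p>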

<p class="MsoNormal"><o:p>&nbsp;</o:p></p>

<p class="MsoNormal">WHAT DOES THIS DEMAND OF US?</p>

<p class="MsoNormal">OK, I've now gone on for eight pages of text

explaining the

rationale behind the question.<span style="">&nbsp; </span>So I'll

return to the basic question: is the consensus for modeling the

relationship

between an Occurrence and associated token(s) the assumed token model:</p>

<p class="MsoNormal"><a class="moz-txt-link-freetext" href="http://bioimages.vanderbilt.edu/pages/token-assumed.gif">http://bioimages.vanderbilt.edu/pages/token-assumed.gif</a></p>

<p class="MsoNormal">or the explicit token model:</p>

<p class="MsoNormal"><a class="moz-txt-link-freetext" href="http://bioimages.vanderbilt.edu/pages/token-explicit.gif">http://bioimages.vanderbilt.edu/pages/token-explicit.gif</a></p>

<p class="MsoNormal">?<span style="">&nbsp; </span></p>

<p class="MsoNormal"><o:p>&nbsp;</o:p></p>

<p class="MsoNormal">If we accept the assumed token model with all of

its warts,

then for consistency's sake, we must create dwctype terms for each of

the types

of tokens that people would reasonably want to use as evidence for

Occurrences

(and my proposal for adding DigitalStillImage as a Darwin Core type

stands).<span style="">&nbsp; </span>We must also resign ourselves

to assigning a separate occurrence to each token that users want to use

to

document the presence of a taxon at a time and place.<span style="">&nbsp; </span>We

also must accept having goofy-sounding

statements like </p>

<p class="MsoNormal">[Occurrence] dcterms:rights ["(c) 2002 Steven J.

Baskauf"]</p>

<p class="MsoNormal"><o:p>&nbsp;</o:p></p>

<p class="MsoNormal">If we accept the explicit token model, then we

need to

either dump basisOfRecord or come up with some rational explanation for

what it

actually means (and my proposal to add DigitalStillImage as a Darwin

Core type

becomes irrelevant).<span style="">&nbsp; </span>We also need to

create some kind of term like dwc:tokenID that will allow connections

to be

made between Occurrence records and their tokens.<span style="">&nbsp; </span>For

people who want to flatten out their

Occurrence records and put the tokens together with the Occurrence

(i.e. "compress

the diagram" to get rid of the token resource), and who feel some need

to

indicate the type of the token that they are using, let them use any

appropriate term from the Dublin Core or Darwin Core types as a value

for

rdf:type.</p>
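<p class="MsoNormal">To illustrate what "compressing the diagram"
might look like in practice, here is a small sketch that merges one
token record into its Occurrence row.&nbsp; The dictionary layout, the
"id" key, the flatten function, and the placeholder date are all
assumptions made for illustration; only the rights statement and the
type terms come from the discussion above.</p>

```python
# Sketch only: record layout and identifiers are hypothetical.
def flatten(occurrence, token):
    """Merge a token into its Occurrence row.

    The token's own identifier is dropped (the token resource disappears),
    and the token's rdf:type replaces the Occurrence's rdf:type, so the
    flattened row still says what kind of evidence backs it up.
    """
    row = dict(occurrence)
    row.update({key: value for key, value in token.items() if key != "id"})
    return row


occurrence = {
    "id": "ex:occ1",
    "rdf:type": "dwc:Occurrence",
    "dwc:eventDate": "2002",  # placeholder value
}
token = {
    "id": "ex:image1",
    "rdf:type": "dctype:StillImage",
    "dcterms:rights": "(c) 2002 Steven J. Baskauf",
}

row = flatten(occurrence, token)
for key, value in row.items():
    print(key, "=", value)
```

<p class="MsoNormal">The flattened row keeps the Occurrence identifier
but carries the token's type and rights, which is exactly the
information that is lost as a separate resource when the token is
compressed away.</p>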

<p class="MsoNormal"><o:p>&nbsp;</o:p></p>

<p class="MsoNormal">Until we make one of these choices or the other

and

"fix" Darwin Core to work in a consistent way, we are just going to

continue to misunderstand each other because each person will just

"know

an Occurrence when they see it".<span style="">&nbsp; </span></p>

<p class="MsoNormal"><o:p>&nbsp;</o:p></p>

<p class="MsoNormal">In the interest of space, I am going to defer

explaining

my opinions about LivingSpecimen and establishmentMeans.<span style="">&nbsp;

</span>Those explanations are contingent on the

conclusion that we reach on this issue.</p>

<p class="MsoNormal"><o:p>&nbsp;</o:p></p>

<span style="font-size: 12pt; font-family: &quot;Times New Roman&quot;,&quot;serif&quot;;">Steve</span>

<pre class="moz-signature" cols="72">-- 

Steven J. Baskauf, Ph.D., Senior Lecturer

Vanderbilt University Dept. of Biological Sciences

postal mail address:

VU Station B 351634

Nashville, TN  37235-1634,  U.S.A.

delivery address:

2125 Stevenson Center

1161 21st Ave., S.

Nashville, TN 37235

office: 2128 Stevenson Center

phone: (615) 343-4582,  fax: (615) 343-6707

<a class="moz-txt-link-freetext" href="http://bioimages.vanderbilt.edu">http://bioimages.vanderbilt.edu</a>

</pre>

</body>

</html>