By contrast, the core object in GNUB is a taxon name usage instance -- which is a purely abstract notion of the usage of a taxon name within some documentation source (like a publication). In this case, the text-string name is merely a property of the GUID-identified object, and would be an extremely BAD choice to use as a unique identifier.
It is possible that I'm not understanding what you are saying here, but if you are saying that the only name-related property of your GNUB taxon instances will be one which has a name string literal as its object,
Goodness, no!! The point I was making was that for GNI, the name-string *is* the object. For GNUB, the name-string is merely one (of MANY) properties of the object.
That will require any client using your taxon instance metadata to re-process the literal name string to cross reference it with lexical variants, parse it into its pieces, etc.
No -- that's definitely NOT the case. GNUB is highly normalized/atomized/parsed.
That should only need to be done once and then referenced via a GUID for the name (i.e. in the sense of tn:TaxonName).
Yes, but the name-string is only one of the properties. Other properties include most of the other elements in dwc:Taxon (and more).
This is why GNUB needs to generate a unique identifier to represent this core data object. The form that identifier takes (UUID, LSID, integer, DOI, whatever) from the perspective of the end user should be completely irrelevant, because the user should rarely (if ever) see it, and should certainly *never* be in a position to type it on a keyboard (we can discuss the appearance of ZooBank LSIDs on printed pages separately).
OK, again maybe I'm not understanding what you are saying here, but if you are saying that you don't intend to expose your unique GNUB identifiers to the public, then as far as I'm concerned you are setting up GNUB to be irrelevant from the start.
Let me clarify: Obviously, GNUB identifiers will be fully exposed to the "public", in the sense that anyone who WANTS to see them (developers, IT specialists, hard-core name nerds, etc.), will be able to see them. In fact, anyone who wants a replicate copy of the ENTIRE dataset, including all Identifiers, raw tables, etc., will be able to do so. The idea is that you can download a snapshot of the entire database (all tables in their native structure; not dumbed down or flattened), and then set up a simple replication service that allows your local copy to automatically stay in synch with the "master" copy/ies. So yes, anyone who wants access to the identifiers has full access to them.
The point I was making was that most end-users won't care what the identifier is, or what kind it is, or how beautiful or ugly it is, or whatever. A good analogy is DNS: All users ever see is "google.com". They never see "74.125.224.176" (which google.com maps to from my machine at this moment). But the "ugly" "74.125.224.176" is what actually identifies the server to which google.com takes you. Analogously, users should only ever see "Danaus plexippus (Linnaeus 1758)"; they should never need to see "A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523".
You mention a number of cool taxonomist-geek type things that you hope to accomplish with GNUB. But from my perspective as a non-taxonomist-geek, the main purpose I have for GNUB is as a place to anchor dwc:Identification instances so that I can indicate whether my identified resource is a representative of the same taxon that is being referred to by somebody else (or at least to make it possible for somebody to figure that out via computery cleverness, Semantic Web or otherwise).
Yes, exactly. But remember, GNUB is just an index to information. You will anchor your dwc:Identification instances to GNUB identifiers, which will give you a precise indication of the concept that was used for the Identification: for example, the field guide or taxonomic key that the taxonomist used to make the identification in the first place. No information on the field guide or key that was used when applying the identification to the occurrence? No problem -- generate a new TNU, "authored" by the person/entity making the identification, and voila! You're now plugged into the GNA matrix. What does that give you? Well...a few immediate options include:
- Access to the full literature citation and other nomenclatural details for the name;
- Access to all other usages of that name, including variant spellings, combinations, synonymy treatments, etc.;
- Access to all other resources of relevance that are also plugged into the GNA "matrix".
But more to your interests:
- Cross-linking to other usage instances in a way that allows you to figure out, via computery cleverness, whether you and someone else are referring to the same taxon concept. This little piece of magic can happen in two ways:
1) Implicitly. By comparing other usages in the contexts of their collective synonymies. For example, suppose RefA and RefB both treated "Aus bus" and "Aus xus" as two distinct species. RefC treated "Aus xus" as a heterotypic/junior synonym of "Aus bus". If your identification of a specimen as "Aus bus" links into the TNU associated with RefA, then implicitly we can say that it's (likely to be) congruent with the concept represented by the TNU for that name of RefB; but may or may not be comparable to the concept represented by RefC. This is an example of addressing the "many concepts for one name" problem. Conversely, suppose your specimen identification is linked to the TNU for RefC. In that case, we can infer that your concept of "Aus bus" could apply to representatives of either "Aus bus" *or* "Aus xus" as cited in RefA and RefB. This is addressing the "many names for one concept" problem. These are just two very simple examples (of many possible examples) of the sort of computery cleverness that can be used to infer implicit concept-mapping among TNUs. Obviously, there are assumptions and caveats and such -- but it's still better (a LOT better) than trying to make inferences based on the text-string name only.
2) Explicitly. In the same way that TNUs can serve as the "molecules" behind nomenclatural services (like ZooBank, Index Fungorum, and possibly IPNI/APNI/Tropicos, if/when they embrace GNA), these TNU molecules can also underpin taxon concept services, such as those represented in TCS RelationshipAssertions. In other words, there can be a structure/service that sits on top of GNUB that allows explicit declarations of the sort: TNU1 represents a concept circumscription that is congruent with TNU2; etc. These third-party assertions about concept-concept mappings could provide a very valuable service for making inferences involving both many-names-for-same-concept issues and many-concepts-with-one-name issues, presumably with greater precision and reliability than the implicit mappings.
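To make the implicit case (1) concrete, here is a minimal sketch of the RefA/RefB/RefC example. It assumes each TNU can be reduced to the set of names it includes (accepted name plus synonyms); the `TNU` class, the `included_names` field and the `compare` function are hypothetical illustrations, not GNUB's actual data model, and the returned labels are deliberately hedged.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TNU:
    """Hypothetical, simplified taxon name usage: a name as treated in one reference."""
    name: str
    reference: str
    included_names: frozenset  # accepted name plus any names treated as synonyms

def compare(a: TNU, b: TNU) -> str:
    """Very rough implicit concept comparison based only on synonymy."""
    if a.included_names == b.included_names:
        return "likely congruent"
    if a.included_names < b.included_names:
        return "possibly narrower"
    if a.included_names > b.included_names:
        return "possibly broader"
    if a.included_names & b.included_names:
        return "overlapping"
    return "disjoint"

# RefA and RefB treat "Aus bus" and "Aus xus" as distinct species.
bus_a = TNU("Aus bus", "RefA", frozenset({"Aus bus"}))
xus_a = TNU("Aus xus", "RefA", frozenset({"Aus xus"}))
bus_b = TNU("Aus bus", "RefB", frozenset({"Aus bus"}))
# RefC treats "Aus xus" as a junior synonym of "Aus bus".
bus_c = TNU("Aus bus", "RefC", frozenset({"Aus bus", "Aus xus"}))

print(compare(bus_a, bus_b))  # same name, same synonymy
print(compare(bus_a, bus_c))  # same name, different circumscription
print(compare(bus_c, xus_a))  # different names, overlapping concept
```

The explicit case (2) would replace the set comparison with stored third-party assertions between pairs of TNU identifiers.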
How am I going to do that if you don't provide me with a good (i.e. meeting the 8 criteria of my last email) GUID to use as the object of my dwc:Identification properties?
Have we cleared up that misunderstanding?
For over a year, I've heard you lament that the whole problem is that people make identifications and don't indicate the sensu/sec. reference for the names they use.
Yes, exactly! And that's the real problem with our information domain: one of the key pieces of information needed to apply computery cleverness to identifications of Occurrence instances is missing from the vast majority of datasets. That, unfortunately, means we're limited in our ability to make inferences about concept mappings -- not because an informatic structure doesn't exist to accommodate it, but because one of the key pieces of information is lost (i.e., what *concept* of this taxon were you thinking when you assigned this name to this occurrence instance?)
You are now creating a system that would allow people to unambiguously make it clear what taxon they mean but you aren't giving them any way to say what it is? Again, I may just be misunderstanding what you wrote here.
Indeed, it seems that you are. Please let me know if I have not cleared up the misunderstanding.
Yes. This "record based ID" can be anything you want. I really don't and shouldn't have to care about that. The "human friendly ID that allows people to discuss the same semantic thing" is precisely what the TDWG GUID Applicability Statement (a ratified TDWG standard, thanks to Kevin) is talking about.
Hmmm...my turn to worry that I'm misunderstanding something. I'm fairly certain that the TDWG GUID applicability statement applies primarily to what you are referring to as the "record based ID". I think (not sure) that what Kevin meant by the other thing ("human friendly ID that allows people to discuss the same semantic thing") was more of a human-friendly service that accepts the human-friendly form of an "identifier" (e.g., the text-string taxon name), and then converts that into the real GUID (our "record based ID") for actual embedded linking purposes. Sort of like how DNS converts "google.com" (human-friendly representation of a domain name) to "74.125.224.176" (actual "GUID" used to route to a specific server).
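A minimal sketch of that DNS-like arrangement, assuming a simple in-memory index: people type and read the label, while machines exchange the GUID. The `NAME_INDEX` table and both function names are hypothetical; a real service would query GNUB rather than a dictionary.

```python
import uuid

# Hypothetical lookup table standing in for a name-resolution service.
NAME_INDEX = {
    "Danaus plexippus (Linnaeus 1758)": uuid.UUID("A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523"),
}

def resolve_name(label: str) -> uuid.UUID:
    """Convert the human-friendly label into the record-based identifier."""
    return NAME_INDEX[label]

def display_label(identifier: uuid.UUID) -> str:
    """Convert the record-based identifier back into something a person can read."""
    for label, guid in NAME_INDEX.items():
        if guid == identifier:
            return label
    raise KeyError(identifier)

guid = resolve_name("Danaus plexippus (Linnaeus 1758)")
print(guid)                 # what gets embedded in links between systems
print(display_label(guid))  # what the end user sees
```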
As I read that standard, I don't see any requirement that a GUID be "human friendly", but I would consider "human friendliness" to be a desirable "best practice" (influenced somewhat by http://www.w3.org/Provider/Style/URI and http://www.w3.org/TR/cooluris/) - if we have a choice of creating externally exposed GUIDs that are either human-friendly or not human-friendly, and if either works equally well, why not choose ones that are human-friendly?
Here is where I completely disagree. I've said it before, and I'll keep saying it: GUIDs are (should be) intended and necessary for computer-computer communication; *NOT* for human-computer or human-human communication. Their beauty or ugliness should be determined by what's beautiful or ugly to a computer, not to a human. A consistent 128 bits is "beautiful" to a computer, but a UUID is ugly to a human; whereas "Danaus plexippus (Linnaeus 1758)" is beautiful to a human, but ugly to a computer (for reasons Dima already outlined).
More fundamentally, one lesson of history that seems to be perpetually repeated is the mistake of encoding human-interpretable information into what is intended to be a stable, permanent identifier. INEVITABLY, a system that uses human-interpretable information as identifiers will include some fraction of instances where the human-interpretable part is somehow "wrong" (e.g., a Cyrillic "а" was accidentally entered instead of a Latin "a", or a typographic error in a scientific name, or worst of all, the assignment of a text-string name to a homonym due to a mix-up in authorship). The temptation to "fix" those "wrong" values is enormous. And, of course, by "fixing" them, permanence is broken.
Almost by definition, then, a "beautiful" identifier for computer-computer communication should be "ugly" to a pair of human eyeballs.
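The Cyrillic/Latin case is easy to demonstrate. The sketch below shows two name-strings that look the same on screen but are different values to a computer, and then shows that when the record is keyed by an opaque UUID the label can be corrected without touching the identifier. The `records` dictionary is a hypothetical stand-in for a database table.

```python
import unicodedata
import uuid

latin = "Danaus plexippus"
mixed = "D\u0430naus plexippus"  # second character is a Cyrillic letter

# The two strings render alike but are not equal.
print(latin == mixed)
print(unicodedata.name(latin[1]))
print(unicodedata.name(mixed[1]))

# Keyed by an opaque identifier, the label is just a property that can be fixed.
record_id = uuid.UUID("A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523")
records = {record_id: {"label": mixed}}
records[record_id]["label"] = latin  # correct the typo; the key is unchanged

# Keyed by the name-string itself, the same fix would change the identifier.
by_name = {mixed: {"note": "original entry"}}
print(latin in by_name)
```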
It is interesting all this discussion of identifiers when in the end it doesn't matter too much what the identifier is, just that you have an identifier at all.
Yes and no. I guess it depends on what the word "is" means in your "what the identifier is" phrase (Channeling Bill Clinton here). If by "is" you include "is permanent", "is unique", or "is actionable", then it does matter what the identifier "is". If you mean "is a DOI" vs. "is an lsid", then it may matter (see Rod's post), or it may not -- depending on what you want the Identifier to be able to do.
The important thing is the semantics, the "are we talking about the same thing" question - so this is where I believe RDF/semantic web comes in - I might see if I can come up with some RDF/sem web example for TDWG that could demonstrate this, hmmm...
This is where the real problem in our community is. We are *WAY* too fast and loose with the definitions of what our "things" are. We think that by simply distinguishing "Taxon Names" from "Taxon Concepts" that we've removed ambiguity. Not even close. There are multiple flavors within each of those two "domains", and far too few people in our community (both on the IT side *and* the taxonomy side) have thought through the implications of defining the different flavors, let alone trying to establish a "sameAs" between two different flavors.
Better yet, read the TDWG GUID Applicability Statement http://www.tdwg.org/standards/150/ and
I think I helped write that one, so I'm pretty sure I've got a lot of that covered already (except the parts I vehemently disagree with... :-) )
That one I didn't know about, so thanks for the link. Of course, GUIDs (sensu lato) and "uris" are not necessarily the same thing. But that's another argument for another day.
When I say "GUID" I am not throwing around a colloquial term. I intend for it to have the exact technical meaning that it is given in the TDWG standard.
Fair enough -- I must have missed when you defined your use of "GUID" specifically in the context of the TDWG standard.
At this point in time (i.e. after we finally have a ratified standard on GUIDs),
Maybe I'm mistaken, but I don't think we do. I don't think that an "Applicability Statement" rises to the level of "ratified standard", in the sense that TCS 1.0 and DwC are "ratified standards". Someone with better knowledge of the TDWG process can clarify this.
nobody in our community has any business designing and exposing GUIDs without having read this document and completely understanding its requirements and recommendations.
I certainly would agree with that statement.
I should not have to be "explaining" any of this to anybody on the list.
*Sigh* I often feel the same way. Too often, in fact. I hope you realize that when I complimented you on your 8 points I was complimenting you on the way you "paraphrase out of [your] head".
It is explained clearly and concisely in the standard.
...err "applicability statement".
There has been a bit of a debate over the importance of embedding "actionability" into identifiers inherently (the Tim Berners-Lee perspective)
Wrong. "GUIDs should be resolvable" (direct quote of recommendation 7 from the GUID applicability statement).
No, *NOT* wrong! I will say it again to be perfectly clear: There has been (and continues to be) a bit of a debate over the importance of embedding "actionability" into identifiers inherently. This is and continues to be a true statement. The only extent to which that statement is "wrong" is that I understated it with the words "bit of a". I should have either eliminated those words, or replaced them with "robust".
Don't add more of them to the list. Recommendation 3: "Providers must assign at most one GUID to any particular object." Recommendation 4: "Only one globally unique identifier should be assigned to each object".
*Exactly*. That's why I think it's foolish to regard all of these different resolution mechanisms as distinct "identifiers". There is *ONE* GUID. It is: A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523. There are ten different ways to make it actionable. It therefore meets the recommendations of the applicability statement.
I draw your attention to p.7 of the "TDWG GUID Applicability Statement", under the heading "Uniqueness and Resolution", where it states the following:
============================
The global uniqueness of an identifier is often confused with the issue of resolution of the identifier. These two attributes of GUIDs can be distinguished and discussed separately. For example a Universally Unique Identifier (UUID) is a globally unique identifier, but there are no widely known and used protocols for resolving a UUID over the Internet (unlike HTTP URIs). This form of GUID is perfectly acceptable for uniquely identifying data objects within a dataset. Some identifiers therefore provide uniqueness, but not resolvability.
============================
The part that's not written there, but I think should have been written there (and that I argued strongly in favor of writing there when the document was drafted), is that GUIDs that are not self-resolving (i.e., not inherently actionable), can be *made* actionable when represented in the context of resolution metadata.
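To illustrate the distinction I am drawing between identification and resolution, here is a minimal sketch in which several "actionable" representations all reduce to the same single GUID. The `example.org` hosts, the LSID authority and the query-string form are hypothetical placeholders, not actual ZooBank endpoints; only the UUID itself comes from this thread.

```python
import re
import uuid

# Matches the canonical 8-4-4-4-12 hexadecimal UUID layout.
UUID_PATTERN = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)

def extract_guid(representation: str) -> uuid.UUID:
    """Strip the resolution metadata and return the embedded identifier."""
    match = UUID_PATTERN.search(representation)
    if match is None:
        raise ValueError(f"no UUID found in {representation!r}")
    return uuid.UUID(match.group(0))

# Hypothetical ways of wrapping one identifier in resolution metadata.
forms = [
    "A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523",
    "urn:lsid:example.org:act:A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523",
    "http://example.org/A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523",
    "http://example.org/resolve?guid=a9f435e0-8ed7-46dd-bab4-ea8e5bf41523",
]

identifiers = {extract_guid(form) for form in forms}
print(len(identifiers))  # number of distinct identifiers behind all the forms
```

Whether each wrapped form should itself count as a separate "identifier" is exactly the point under dispute; the sketch only shows that a client can recover one identifier from any of them.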
I would assert that what you "want" and what you have in your mind is at odds with the TDWG standard for GUIDs.
I would assert otherwise.
This may be your opinion, but it is at odds with the ratified standard which says.
Again, I don't agree with you on this assertion.
(recommendation 2) that "HTTP GET resolution must be provided for non-self-resolving GUIDs".
Yes, exactly -- and I trust you realize that this is exactly what ZooBank does. Note that the applicability statement does not say there must be *only one* HTTP proxy for the non-self-resolving GUID.
The problem here is caused by you when you create and expose so many different HTTP URI forms of your UUID. Stop doing that (recommendation 4).
And I disagree. ZooBank follows recommendation 4 *precisely*. There is only *ONE* globally unique *identifier* assigned to each object. In this case, that identifier is: A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523. Full stop. End of story. 'Nuff said.
The problem is not that I create and expose so many different HTTP URI forms of my UUID. The problem is when people conflate the function of *identification* with *resolution*. This is where I part company with (what I've been told is) the TB-L school of thought. And no, I don't think I'm smarter than TB-L. If I were the only one who disagreed on this point of conflating resolution metadata with unique and global identification, then I would assume that I'm an idiot and would stop complaining about this. But the more I think about it, and read about it, and understand about it, the more confident I am in standing my ground on this.
There is no need for this. Make a single HTTP URI version of your UUID and stick with it. Preferably one without the query string and use Mod rewrite (or whatever it's called) to transform the simple, clear, and permanent version of the URI into whatever flavor of temporary URL you are liking at the moment. Every application today understands HTTP GET. No need for a registry.
Of course every application understands HTTP GET. That's not the point (at all).
Go with the TCS standard and the TDWG ontology as it exists currently.
If you think that TCS has "the" answer to the "name" problem, then I don't think you fully appreciate the magnitude of the problem.
While it's nice to see the explicit representation of a "name" as an object, rather than a string; unfortunately that doesn't address the elephant in the room; that is, that different people have different notions of what "a single scientific biological name" is. I'm not talking subtly different shades of fundamentally the same thing; I'm talking about fundamentally different things with different implied sets of properties. This is one of the issues I continued to hammer on during the development of TCS, and the one that gave me the biggest qualms about TCS 1.0. My hope was that it would be resolved in TCS 2.0.
There ain't no TCS 2.0 . There is only TCS 1.2 . I'm sorry about it, but that's the ratified standard.
Please understand, I'm trying to illustrate where the existing standards fall short of what this community *needs* in order to move forward. Of course we have the standards, and if we allowed our hands to be tied to those standards, there wouldn't be any progress. TCS 1.2 DOES NOT MEET THE NEED. I want to move in the direction of something that DOES meet the need.
There have been any number of things that I would "like" to be the way I want. However, the point of standards is that they get hammered out in a form that satisfies the community in a general way.
Are you saying that the standards are written in stone, and we should be happy with them, and simply live with their limitations? If so, then you're operating in a world that I don't want any part of. I don't think you are, but frankly, the tone of this particular email exchange (by either of us) has not been especially helpful. OBVIOUSLY we should use the standards, as they exist, as much as possible WHEN THEY MEET THE NEEDS. What I was talking about (perhaps in an overly friendly, informal and loose way) is where we need to go to next. We clearly disagree on a few specific interpretations of the TDWG GUID applicability statement, but that's fine -- that's what we should be spending our time focused on.
But GNUB is not an "old system". It is being built from scratch and I would assert that where it comes to interfacing it with the outside world, it should follow standards such as they exist at the moment.
*Exactly*, and obviously it will, as much as feasible, practical and desirable, accommodate the existing standards -- and even the applicability statements -- within their inherent limitations. But speaking as someone who was a very active participant in the development of both the GUID applicability statements and TCS back when those were new and on the cutting edge, I have absolutely no interest in *limiting* what GNUB can do to what those standards articulate. We've moved along far enough now that it's time to start pushing to the next level -- time to start overcoming the limitations those existing standards imposed. Many of those limitations were recognized at the time those documents were drafted, and the drafters acknowledged that some of the improvements would need to wait for the next version. With the development of GNA/GNUB, it's time to move on to the next version. We obviously want the next level to be backward compatible with existing standards, and obviously every effort will be made to maintain backward compatibility.
At the moment, people are allowed to think about and describe names without reducing them solely to usage instances as you would like.
Yes -- which is why I keep emphasizing why GNI will remain an important component.
I spent about an hour yesterday composing a rant about how counterproductive it is for taxonomy and computer geeks to create tools and systems that won't ever actually be used by the people who need them. I decided that it wasn't helpful to actually post it, but now I'm thinking that maybe I should have...
Perhaps you should -- but keep in mind that a statement like "won't ever actually be used by the people who need them" is an awfully broad and bold assertion. Backing up such an assertion begs for an articulation of the full scope of all possible users, and a deep understanding of the function of the systems you are making such assertions about.
dwc:Taxon doesn't really have much of any useful definition, so I'm with you there.
tn:TaxonName is actually rather precisely defined, at least if you look at the RDF (http://code.google.com/p/tdwg-ontology/source/browse/trunk/ontology/voc/TaxonName.rdf)
Is this the definition you refer to:
"A scientific biological name. An object that represents a single scientific biological name that either is governed by or appears to be governed by one of the biological codes of nomenclature. These are not taxa. Taxa, whether accepted or not, are represented by TaxonConcept objects."
If so, by that definition, how many TaxonName instances are included in the following list?
Aus bus
Aus dus
Xus bus
Three? Four? Five? I can defend all three of those answers within the scope of the definition above. Assuming no homonyms or misspellings are involved, GNUB would establish four separate Protonyms, each of which can be thought of as a "name object", each with Code-specific properties. Additionally, if these fell under the botanical Code, there would be at least one, and as many as three additional nomenclaturally-relevant TNUs that would establish combination(s) other than the original as distinct "name objects" under the botanical Code.
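One way to see how the answer depends on what you count: the sketch below counts the three name-strings as whole combinations, and then counts the distinct genus-group and species-group elements that appear in them. Reading the "four separate Protonyms" as those four elements is an assumption made for this illustration, and the simple whitespace split is only adequate for plain binomials like these.

```python
name_strings = ["Aus bus", "Aus dus", "Xus bus"]

# Counting each distinct combination as one "name".
combinations = set(name_strings)

# Counting each distinct genus-group or species-group element as one "name"
# (assumed here to correspond to the Protonyms).
genus_names = {s.split()[0] for s in name_strings}
epithets = {s.split()[1] for s in name_strings}
elements = genus_names | epithets

print(len(combinations), sorted(combinations))
print(len(elements), sorted(elements))
```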
In my opinion, TCS (and by extension, the TDWG ontology) puts a rather restrictive collar and leash on taxon names.
Enthusiastically Agreed! :-)
I quote from the user guide page 9: "<TaxonName> elements do not represent taxa. They serve only as abstract nomenclatural data structures that encapsulate the core rules of the different nomenclatural codes. Their purpose is to prevent nomenclatural statements becoming confused with statements about the circumscription of, and relationships between, different taxon concepts. No taxonomic opinion can be expressed using <TaxonName> elements in TCS. As a rule of thumb if you are dealing with anything beyond a type specimen and references to it, you are talking about a TaxonConcept of some form." This does not seem like a broad and imprecise definition to me. One is allowed to describe the pieces of the name and that's about it.
Yes, I know -- I helped write that. Unfortunately, it's still not precise enough (as is documented on some wiki somewhere, as we were defining what was originally called "LinneanCore", which later was subsumed into what is now TCS).
When I look carefully at how the TDWG ontology deals with taxon names and taxon concepts, it seems very simple and "usable" to me.
I'll definitely concede that point to you -- it strikes a good balance between ideal and practical. One of the over-arching goals within GNA development is to nudge a bit further towards the "ideal" without compromising on the "simple" and "usable". Whether or not this is possible remains to be seen.
If one defines a Taxon to be composed of a name component and a sensu/sec. component as several people (including you, I think) on this list have done and as TCS has done (I think), then representing it in RDF becomes tractable.
OK, good -- now I'm getting my head back into this conversation. Yes, *my* intent was to keep TCS open-ended such that any "[Name] sec. [Reference]" (=TNU) could be represented through TCS. That is the intention of GNUB. This is where Jessie Kennedy and I had many long debates. From her perspective, only the subset of "[Name] sec. [Reference]" (=TNU) instances that rise to the level of a "taxon definition" should be represented in TCS. This comes down to the fuzzy distinction between an "Identification" and a "Concept Definition". In the latter, presumably one provides a suite of information to help define the boundaries of a taxon-concept circumscription (specimens, characters, synonymy, etc.). In the former, presumably one simply assigns a name-string to an occurrence (or similar) instance of an organism. The problem is that every imaginable version between these two endpoints exists in biodiversity-land, so there is no clear distinction between which instances rise to the level of a "Taxon" and thus are legitimately represented via TCS, and which do not. In my mind, the approach of GNUB should be to not try to establish a distinction, and just accommodate any "[Name] sec. [Reference]" (=TNU) instance.
One anchors the name part to a tn:TaxonName instance (properly collared and chained and wearing a GUID as a dog tag). How one anchors the sensu/sec. part is still a subject for discussion.
This is the essence of a TNU. Except in GNUB-speak, a "TaxonName" is represented by another TNU -- specifically, the TNU that established the name in the first place. So, for example, Linnaeus (1758) established the name "Aus bus". Smith (1990) defines a taxon concept for "Aus bus L.".
TNU1: Aus bus Linnaeus 1758 sec. Linnaeus 1758
TNU2: Aus bus Linnaeus 1758 sec. Smith 1990
The Protonym is TNU1. TNU2 links to TNU1 as the Protonym, and basically translates to "Smith's taxon concept definition labeled with the name 'Aus bus L.'"; or more simply: "Aus bus L. sec. Smith 1990".
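As a minimal sketch of that structure, assuming a TNU needs only a name, a reference and an optional link to its Protonym: the `TNU` class, its field names and the use of `None` to mark a Protonym are hypothetical choices for illustration, not the GNUB schema.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class TNU:
    """Hypothetical taxon name usage: '[Name] sec. [Reference]'."""
    name: str
    reference: str
    protonym: Optional["TNU"] = None  # None means this TNU is itself the Protonym
    guid: uuid.UUID = field(default_factory=uuid.uuid4)

    def label(self) -> str:
        return f"{self.name} sec. {self.reference}"

    def protonym_tnu(self) -> "TNU":
        """Return the TNU that established the name in the first place."""
        return self if self.protonym is None else self.protonym

# Linnaeus (1758) establishes the name; Smith (1990) uses it for a concept.
tnu1 = TNU("Aus bus Linnaeus 1758", "Linnaeus 1758")
tnu2 = TNU("Aus bus Linnaeus 1758", "Smith 1990", protonym=tnu1)

print(tnu2.label())
print(tnu2.protonym_tnu().label())
```

In this arrangement a "TaxonName" is not a separate kind of object; it is simply the TNU that other TNUs point back to.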
I have been thinking about the following approach. It is based on a Venn diagram that I have in my head which I created from your descriptions of TNUs on this list. The Venn diagram has a big rectangle labeled "nominal taxon".
If I correctly understand what you mean by the "Nominal Taxon", I think this equates in GNUB-speak to a Protonym.
Inside that is a smaller rectangle named "taxon name usage (TNU)". Inside that is an even smaller rectangle named "taxon concept".
Hmmmm...maybe. I need to digest this a bit.
In this view, Taxon concepts are well-described/circumscribed by a publication.
Yes.
TNUs (which include taxon concepts) are associated with a particular person's idea of what the taxon is, but which may or may not be described in a publication.
Yes, I think. I would state it this way: a subset of all TNUs are the TNUs that represent well-defined, published definitions of taxon concepts. That is, all taxon concepts are anchored to (born as?) a TNU, but not all TNUs rise to the level of Taxon Concepts.
Depending on how you distinguish "Publication" from non-publication, this may be somewhat of a distracting parameter. Generally, good taxon concept definitions exist within documentation sources that are what most of us would call "published"; but there's nothing inherent to "publication" that is necessary for "good taxon concept definition". Good taxon concept definitions can certainly exist in what many of us would describe as "unpublished" form; just as many published TNU's don't rise to the level of good taxon concept definition.
Nominal taxa are all instances of a scientific name use including those where we have no idea who applied the name or what set of organisms they intended to be included in the taxon.
Yes! In GNUB, this is represented by the fact that all the relevant TNUs are anchored to the same Protonym (e.g., Aus bus L. sec. Linnaeus 1758).
In terms of RDF metadata:
- Go ahead and let the rdf:type of the thing be tc:Taxon
Ok. But how does that map to dwc:Taxon?
- Make the object of tc:hasName be a GUID (i.e. as described by the TDWG GUID Applicability Statement, not some other kind of GUID)-identified resource, preferably from a well-known source like uBio.
Not sure. I don't see uBio as a source of "name objects" so much as "name-strings". I think a better GUID link would be to a GNUB TNU that is a Protonym. This is what is currently registered in ZooBank: Protonyms (the most common kind of Nomenclatural Act; that is, the TNU that represents the establishment of a new scientific name).
- If the sensu/sec. is described in a publication (in my mind a true taxon concept), then the object of tc:accordingTo is an HTTP proxied DOI, HTTP URI of a BHL-scanned publication, or if both of those fail, something non-resolvable but globally-unique like an ISBN or URL of a stable web page.
OK, yes, I think so. Translated into GNUB-speak, I would say that if the TNU (treatment of a taxon name within a documentation source, like a publication) includes a robust definition of a Taxon Concept, then the linked ReferenceID (GNUB-generated GUID) would ideally be cross-mapped to a content-rich rendering of the identified reference, such as a DOI (presumably resolving to a PDF), an HTTP URI to a set of BHL page-images, or a PLAZI Handle for an XML-marked-up taxon treatment (or any or all of the above).
- If the sensu/sec. is not described in a publication, but is associated with a particular person (in my mind a TNU that isn't a true taxon concept), then the object of tc:accordingTo could be the URI of a foaf:Person or foaf:Group.
Well, that's not exactly how GNUB would handle it -- but close. Basically, a "Reference" in GNUB represents some form of documentation of information that has been authored (e.g., foaf:Person), and is static as of some moment in time (e.g., publication date). Again, I don't think "publication" is the right parameter to distinguish "taxon concept" from non-taxon-concept. There are many, many TNUs appearing in published works that do not really rise to the level of taxon concept definition. In any case, whether it's published or not, and whether it represents a good taxon definition or not, are two different things that may be correlated, but not hard-linked. Also, regardless of whether it's published, any kind of documentation has the potential of authorship (attribution) and some point in time....in other words, a gnub:Reference instance. There's no reason to use the class of "thing" to which a TNU is linked (e.g., publication object vs. Agent object, as you seem to be suggesting) as the delimiter of what should be treated as a "Taxon Concept" and what should not.
- If the sensu/sec. is completely unknown, then the taxon is a nominal taxon that is not a TNU. I don't know whether it is better for the taxon to simply lack a tc:accordingTo property or to have a tc:accordingTo property that somehow says "we don't know anything about the sensu/sec.".
Agreed! In GNUB-speak, the ReferenceID would be null or (my preference from an implementation perspective) "0" (which translates to "we don't have any information about the specific implied usage, so treat it as a nominal taxon").
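To make the three cases above concrete, here is a minimal sketch in Python. It is not GNUB's actual schema or code: the class name, the field names, the cross-map dictionary, and the example identifiers are all hypothetical, and ReferenceID is assumed to be stored as a string so that the "0" sentinel can be compared directly.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Assumption: "0" is the sentinel meaning "no information about the implied
# usage", as preferred above; None (null) is treated the same way.
NOMINAL_REFERENCE_ID = "0"


@dataclass
class TaxonNameUsage:
    """Hypothetical stand-in for a TNU record (not the real GNUB model)."""
    name_string: str
    reference_id: Optional[str] = None


def is_nominal(tnu: TaxonNameUsage) -> bool:
    """True when the usage has no known sensu/sec. (null or "0")."""
    return tnu.reference_id in (None, NOMINAL_REFERENCE_ID)


def renderings_for(tnu: TaxonNameUsage,
                   cross_map: Dict[str, List[str]]) -> List[str]:
    """Return every content-rich rendering cross-mapped to the Reference.

    cross_map is a hypothetical lookup from a ReferenceID to identifiers
    such as a DOI, a BHL page-image URI, or a Plazi Handle. A nominal taxon
    has no Reference, so it yields an empty list.
    """
    if is_nominal(tnu):
        return []
    return cross_map.get(tnu.reference_id, [])


# Made-up identifiers, for illustration only.
cross_map = {
    "ref-001": ["doi:10.0000/example", "http://bhl.example.org/page/1"],
}
published = TaxonNameUsage("Aus bus", "ref-001")
unknown = TaxonNameUsage("Aus bus", "0")
```

A client that only wants the simple view can call is_nominal and stop there, while a richer service can walk the cross-map; both read the same normalized record.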
I realize that you probably aren't going to like this because it isn't as sophisticated and nuanced as you would like for your GNUB TNUs to be.
No, actually I think it's perfectly fine. The reason I like normalized back-end data structures is that they give you much greater flexibility in offering any range of services, from extremely simple to as complex as the back-end data model allows. Moreover, as you said:
However, there would be nothing that would prohibit you from creating and adding a myriad of clever properties to the tc:Taxon instance RDF to make it do all of the things you want.
Exactly.
The practice I have described would break down the act of defining a taxon into well-known, standardized pieces and it is a practice that could fairly easily be followed by people without sophisticated IT resources. It would allow for the transfer and comparison of taxa information and make it possible to reconcile at some central location (like GNUB) the taxa that are described in a distributed network of users. Doing something like this is, I believe, the entire reason for the existence of TCS, the TDWG ontology, old TDWG TAG roadmaps, etc.
We are in full agreement!
Please apply some self-discipline to follow the ratified standards or risk blowing us all back to 2005 where we would have to discuss all of the settled things again.
I guess this is where we differ. Besides the semantic issue of "ratified standard" vs. "applicability statement", and the fact that we seem to have somewhat different interpretations of what the GUID applicability statement is actually recommending, I have a somewhat opposite perspective from you on this. In my view, constraining ourselves to TCS 1.2 is forcing us to STAY back in 2005, which had a somewhat different biodiversity informatics landscape from today, and even more different from what (I *hope*) we see emerge over the next 2-3 years. As I said, we want to maintain backward compatibility with TCS 1.2, and we certainly want to adhere to the recommendations of the GUID applicability statement (which I believe I do, except for the specific known issues that are on the "to do" list), but also push forward to overcome the limitations of those technologies as a way to prototype the next generation of these equivalent standards & recommendations.
In some ways what I'm talking about here is really (as I understand it) the principle that underlies REST.
Yes! Ever since I had REST explained to me, I've been anxious to implement those kinds of services. Rob Whitton is already at work on ZooBank 2.0, which will be a complete ground-up re-write, and will be services-based.
Within your big GNUB kingdom and my little Bioimages kingdom, we are free to do whatever clever things we want, structure databases as we wish, do clever programming stuff or whatever. But when you and I talk, we follow commonly established rules, namely we talk using the HTTP protocol
Total agreement!
and identify the things that we want to talk about using HTTP URIs.
Errr..sort of. I say we identify things using GUIDs, and provide services that resolve those GUIDs via actionable HTTP URIs (or, if you prefer, embedding those GUIDs within a resolution metadata "wrapper"). Yes, I know it's all the rage to collapse the functions of actionability and globally unique identification into the same text-string URI (what I've been referring to as the TB-L perspective). But to be perfectly blunt, I see this as a mistake that will, in the long run, slow down our progress.
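A rough sketch of that separation, in Python, might look like the following. It assumes the GUID is a UUID (only one of the forms under discussion) and uses a made-up resolver base URL; neither is a statement about how GNUB actually does it.

```python
import uuid

# Hypothetical resolver endpoint; the identity lives in the GUID, not here.
RESOLVER_BASE = "http://resolver.example.org/guid/"


def to_actionable_uri(guid: str, base: str = RESOLVER_BASE) -> str:
    """Wrap a GUID in a resolver URI. Raises ValueError if not a UUID."""
    return base + str(uuid.UUID(guid))


def extract_guid(uri: str) -> str:
    """Recover the bare GUID from any resolver URI that ends with it."""
    candidate = uri.rstrip("/").rsplit("/", 1)[-1]
    return str(uuid.UUID(candidate))


guid = "123e4567-e89b-12d3-a456-426614174000"
uri = to_actionable_uri(guid)
```

The point of the design is that two different resolver wrappers around the same GUID compare equal once the GUID is extracted, so a resolver can move or be replicated without the identifier changing.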
Since we are talking specifically about biodiversity informatics, we should choose to follow more restrictive rules about the identifiers themselves (following the TDWG GUID applicability statement) and the nature of the RDF (following the GUID applicability statement, well-known vocabularies such as the TDWG ontology, FOAF, DCMI, Darwin Core, geo, etc.). If we fail to do that, then every interaction that I have with another entity requires me to establish in advance the rules of that interaction. The Web works well because people follow a defined set of rules about URLs and HTML. I would assert that we now (at last) have a similar model available to us in the biodiversity informatics community if organizations would just have the self-discipline to use it.
Agreed! I think when we distill this entire exchange, we'll find that we have slightly different interpretations about what the GUID applicability statement actually says & means, and a non-trivial amount of miscommunication, but otherwise (as was the case the last time we had such a voluminous exchange), we're actually more on the same page than not.
So I'm actually pretty optimistic about the whole venture assuming that we can get people and organizations to actually read and try to follow the standards that we have already agreed upon.
I think it's nice to end this email on a point of strong agreement!
Aloha, Rich
Richard L. Pyle, PhD
Database Coordinator for Natural Sciences
Associate Zoologist in Ichthyology
Dive Safety Officer
Department of Natural Sciences, Bishop Museum
1525 Bernice St., Honolulu, HI 96817
Ph: (808)848-4115, Fax: (808)847-8252
email: deepreef@bishopmuseum.org
http://hbs.bishopmuseum.org/staff/pylerichard.html