[tdwg-content] Heretics and illuminati, oh my! [SEC=UNCLASSIFIED]

Peter DeVries pete.devries at gmail.com
Tue May 10 04:37:25 CEST 2011


I have been explaining this since at least 2006.

I have made several proposals including the recent dwc_area.owl, which
incorporates both geo and some estimate of extent/error.

In order to do this correctly you need to use URIs that resolve correctly,
following semantic web standards.

It is still not clear to me and probably others where the authoritative DWC
vocabulary is.

I have made several trips to visit other groups, work some issues out, and
get these connected to related efforts like the GNI.

It was planned that Rich and I were going to try connecting these to his
TNUs once he has gotten his head above water (figuratively).

You did not ask about these specific issues until you had already completed
your vocabulary.

In other words, you started this without really understanding what exists or
existing semantic web practices.

As was mentioned earlier it is relatively easy to link via equivalent
property and equivalent class.
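
As a rough illustration of that kind of linking, here is a minimal sketch that
writes the equivalence statements as N-Triples. The owl: namespace is the
standard one; the two vocabularies (vocabA, vocabB) and their terms are
hypothetical placeholders under example.org, not the actual DwC or
TaxonConcept terms.

```python
# Sketch only: emit owl:equivalentClass / owl:equivalentProperty links
# between two vocabularies as N-Triples lines.
OWL = "http://www.w3.org/2002/07/owl#"


def triple(subject, predicate, obj):
    """Format one N-Triples statement whose three terms are all URIs."""
    return f"<{subject}> <{predicate}> <{obj}> ."


def equivalence_links(class_pairs, property_pairs):
    """Return N-Triples lines linking equivalent classes and properties."""
    lines = [triple(a, OWL + "equivalentClass", b) for a, b in class_pairs]
    lines += [triple(a, OWL + "equivalentProperty", b) for a, b in property_pairs]
    return lines


# Hypothetical vocabularies and terms, for illustration only.
links = equivalence_links(
    class_pairs=[("http://example.org/vocabA#Taxon",
                  "http://example.org/vocabB#Taxon")],
    property_pairs=[("http://example.org/vocabA#hasName",
                     "http://example.org/vocabB#name")],
)
print("\n".join(links))
```

Whether two terms really are equivalent is still a judgment someone has to
make; the statements themselves are the easy part.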

Based on the examples from Rich and an earlier bird example from David
Remsen, I would say that David's three birds and Rich's two fish
should get different concept URIs.

These concept URIs should be supported by photos, specimens, etc. that
give users some guidance as to what kinds of things are instances of that
concept and what things are not instances of that concept.

If Rich chose the supporting individuals and references then he would be
the editor of that concept, linked in a machine interpretable way to that

Centropyge fisheri    se:q72fd
  http://lod.taxonconcept.org/ses/q72fd#Species
  (the HTML version for the concept could be http://www.eol.org/pages/221698)
Centropyge flavicauda se:Mj6j4
  http://lod.taxonconcept.org/ses/Mj6j4#Species
  (the HTML version for the concept could be http://www.eol.org/pages/224676)
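
The short se: identifiers and the full URIs above appear to follow one
pattern. A minimal sketch of that expansion, assuming the pattern seen in
these two examples holds in general (the function name concept_uri is
made up for illustration):

```python
# Sketch only: expand a short "se:" identifier into a concept URI,
# following the pattern visible in the two Centropyge examples.
SES_BASE = "http://lod.taxonconcept.org/ses/"


def concept_uri(curie):
    """Turn e.g. 'se:q72fd' into the corresponding '#Species' URI."""
    prefix, _, local = curie.partition(":")
    if prefix != "se" or not local:
        raise ValueError(f"not an se: identifier: {curie!r}")
    return f"{SES_BASE}{local}#Species"


for name, curie in [("Centropyge fisheri", "se:q72fd"),
                    ("Centropyge flavicauda", "se:Mj6j4")]:
    print(name, concept_uri(curie))
```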

What is missing from these examples is a curated set of specimens and
photographs which are linked via their own URIs.

Ideally you want to try to keep these concepts from overlapping so when someone
tags their specimens or other data it should be clear which is the most
appropriate concept.

Now someone can review the candidates and make the assertion that the fish
in their hand is an instance of this concept using a relatively stable URI
that is simple to include in publications, is trackable and links to the
editor Rich.

Merging these later would be relatively simple, splitting further is
possible but more complicated.
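
To make the asymmetry concrete, here is a minimal sketch, assuming records are
simply tagged with a concept URI. The record names and concept URIs are
hypothetical. Merging is a mechanical remapping, while splitting needs a
decision for every affected record.

```python
# Sketch only: records tagged with concept URIs (all names hypothetical).
def merge_concepts(tags, old_uris, merged_uri):
    """Merging: every record tagged with any old URI gets the merged URI."""
    old = set(old_uris)
    return {rec: (merged_uri if uri in old else uri)
            for rec, uri in tags.items()}


def split_concept(tags, old_uri, decide):
    """Splitting: each affected record must be re-examined individually.

    `decide` stands in for a human review step that returns the new
    concept URI for one record.
    """
    return {rec: (decide(rec) if uri == old_uri else uri)
            for rec, uri in tags.items()}


A = "http://example.org/concept/A"
B = "http://example.org/concept/B"
AB = "http://example.org/concept/AB"

tags = {"specimen-1": A, "specimen-2": B, "photo-3": A}
merged = merge_concepts(tags, [A, B], AB)

# Undoing the merge requires a per-record answer from somewhere.
review = {"specimen-1": A, "specimen-2": B, "photo-3": A}
resplit = split_concept(merged, AB, review.__getitem__)
print(merged)
print(resplit)
```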

For the fish example you might want to link to a different concept on the
web that indicates (and makes findable) related concepts where the set of
specimens are different.

This was the area of modeling that I was planning on working out with Rich.

What I propose also allows tracking of multiple individuals, parts of
individuals and individuals over time.

I just have not marked up examples of these.

Note that the Plant List seems to do something like this but without URIs,
representative photos or linked specimens. In addition it is not open or
machine interpretable.


- Pete

On Mon, May 9, 2011 at 8:10 PM, Steve Baskauf
<steve.baskauf at vanderbilt.edu> wrote:

>  Pete,
> Comments inline.
> Peter DeVries wrote:
> This may surprise you but on the Linked Open Data Cloud
> TaxonConcept/GeoSpecies is a well known vocabulary.
>  It has been looked at very closely by many in that community and has been
> revised over time.
>  Other data sets including EUNIS - the European Union Environmental Agency
> is in part based on it.
> I am not knowledgeable enough to dispute this.
>  Among the LOD data sets TaxonConcept is one of the few biological
> datasets that correctly follows the standards.
> To which standards are you referring?  W3C?  Generic rules for using RDF
> and OWL?  LOD best-practices?  TDWG?  If TDWG standards, which ones?
>  What I don't understand is why your new vocabulary is seen as an official
> TDWG effort and mine is not.
> OK, I think that I said this before, but I'll say it again for clarity:
> DSW is NOT an official TDWG effort!  It is a Cam Webb/Steve Baskauf effort.
> It is a functioning demonstration of a possible approach to describing
> biodiversity resources in RDF, just as taxonconcept.org is.  However, it
> IS based on using terms from Darwin Core, which IS a ratified TDWG
> standard.  It also (for the moment) incorporates the sections of the TDWG
> Ontology which create terms in RDF to describe TaxonConcept instances in
> accordance with another ratified TDWG standard: the TCS schema.  So although
> DSW is NOT an official TDWG effort, it is composed of as many pieces of
> official TDWG efforts as we could find to build it from.
>  Also it is not clear to me how you define a taxon. Are *Felis concolor* and
> *Puma concolor* the same thing or different things?
> I have already confessed long ago that Cam and I dodged this question by
> not defining the Taxon class ourselves.  We used the relevant sections of
> the TDWG Ontology because it was based on a TDWG standard (TCS) and was
> consistent with other published references discussed on the page
> http://code.google.com/p/darwin-sw/wiki/ClassTaxon.  Ask your question to
> Roger Hyam (who I think wrote most of the TDWG Ontology) and Jessie Kennedy
> (who along with Roger wrote the TCS standard).
>  It is also not clear to me how it is determined what is "accepted" and
> "consensus" based on the few people that take the time to write to the list.
> Well, with regards to Darwin Core, we tried to pay particular attention to
> the comments of John Wieczorek and Markus Döring whose names appear on the
> DwC documentation, and to Rich Pyle who was apparently consulted about at
> least parts of DwC.  I confess that they probably qualify as Illuminati, but
> they did write it.  I would also note that in the documentation of DSW we
> tried to show how the structure of DSW class interrelationships was related
> to earlier models such as the ASC model (see
> http://code.google.com/p/darwin-sw/wiki/RelationshipToExistingModels).
> Given that the ASC model has been around since 1992 and formed the basis for
> several later models that were in general use, I think that it has some
> status as "accepted".  In my analysis of the many posts to the list over the
> past couple years (
> http://code.google.com/p/darwin-sw/wiki/TdwgContentEmailSummary), there
> was (in my opinion) general agreement about many of the more straightforward
> classes (such as Event and Identification).  In the cases where there was
> not uniform agreement, we tried to note this in the DSW documentation and
> provide a rationale for why we chose between the alternative viewpoints
> that were expressed.  I will grant that the people who post to the list are
> probably a minority of the people who are receiving the posts.  But that's
> all we have.
>  What do the lurkers think?
>  I think for the vast majority of data providers the current DarwinCore
> works for data submission.
>  What could happen in the future is that GBIF takes these records, cleans
> them and exposes some portion of their data in a more semantic markup to
> the LOD cloud.
> I tried to express in my last post why I felt that it was consistent with
> the past stated goals of TDWG to prioritize the needs of data submission
> over the needs of semantic interpretation when those two goals conflicted.
> I took a considerable amount of time to try to understand the
> taxonconcept.org ontology and in a previous message, to articulate the
> questions that I had about the structure of it.  Those questions were framed
> around the basic problem of the tradeoff between constructing an ontology
> that prioritized semantic querying (the subclassing cats by color
> approach) and constructing an ontology that prioritized simplicity in class
> structure (the single cat type with color properties approach).  I don't
> think that you ever really responded to those specific questions/criticisms
> other than to say that your ontology was great and everybody should use it,
> and to demonstrate SPARQL queries with it.
> Really, I can't speak for Cam, but from my perspective, I would guess that
> 5% or less of the time I spent working on DSW was involved in actually
> writing the RDF (that's excluding the time it took to learn how to use
> Protege and Subversion).  At least 95% of the time or more was spent
> puzzling over emails and papers, writing annoying questions to the list to
> try to get people to talk about how they understood things, and then writing
> up the documentation with references.  If you want people to accept the
> taxonconcept.org ontology as a means to markup metadata and type
> resources, then I would encourage you to write up some detailed
> documentation explaining how your approach is similar or different from
> earlier models (and if different, why).  Also, some detailed explanations of
> what you intend the classes to represent would be helpful.  For example, I
> had initially assumed that what you meant by "Occurrence" and
> "SpeciesIndividual" meant the same thing as we had described for
> "Occurrence" and "IndividualOrganism" in the DSW documentation.  However, in
> your email responses it was clear to me that you meant something different
> (although I'm not sure what that was).
> You suggested (if I'm understanding you correctly) in a previous email that
> you created taxonconcept.org so that its terms and classes could be used
> as a part of the TDWG infrastructure.  If that is your intention, then you
> need to describe in a detailed document how the mapping from an existing
> data markup standard (i.e. DwC) to taxonconcept.org should be
> accomplished.  As I pointed out in an earlier email, the structure of
> taxonconcept.org is rather complex ("reticulated") and potentially
> utilizes up to millions of classes, some of which have (to me) unclear
> connections to the DwC classes.  The correspondence between DSW classes and
> DwC classes is generally not an issue because DSW classes ARE DwC classes
> (mostly).
>  The goal of my species concepts is to create URIs for a species that
> have an RDF (machine interpretable) and HTML representation.   (the HTML
> representation is likely to change in the future to something much better)
>  That URI can then be used as a GUID that is relatively stable despite
> changes in nomenclature and classification hierarchies.
>  This allows searches for occurrence records and other data that are tied
> to the thing most call *Puma concolor* but is also known by many other
> names.
>  The alternative is for someone to search under all the potential name
> variants (assuming that they know them all)
> As I pointed out before, taxonconcept.org is built around taxa-related
> classes and as such it is good at doing the kinds of taxonomy-related
> queries that you describe above.  However, the TDWG community also includes
> people who are less interested in taxonomy and more interested in things
> like tracking individual whales over space, connecting different types of
> evidence that is located in different institutions, collecting data on one
> organism over time, etc.  That is why I questioned constructing a
> taxon-oriented ontology rather than an ontology containing a few general
> classes of things.
> Steve
>  The example queries are to show that the system seems to work in the way
> people expect.
>  To do this correctly and allow the functionality that people seem to want
> it will need to be a little complex.
>  Respectfully,
>  - Pete
> On Mon, May 9, 2011 at 3:53 PM, Steve Baskauf <
> steve.baskauf at vanderbilt.edu> wrote:
>>  Being either fearless or a fool (is there actually a difference?), I
>> shall tread into this subject area at which I am a mere novice.  So be
>> kind...
>> I think that there may be several "solutions" to this problem.  The one
>> that is "correct" probably depends on what one is trying to accomplish.  So
>> I will try to describe in the most succinct way what Cam and I were trying
>> to accomplish with DSW, and how that fits in with this thread.  Cam and I
>> basically wanted to do two things:
>> 1. Make it possible to use GUIDs RIGHT NOW (not five years from now).
>> 2. Create an extremely stripped down ontology that would be
>> non-controversial enough that people might actually use it, but which
>> wouldn't do anything so bad that it would inhibit future development in the
>> Semantic Web context (i.e. it could be extended in the future by clever
>> people to do clever Semantic Web stuff).
>> Amazingly, the GUID Applicability Statement has achieved the status of
>> Standard-hood! (http://www.tdwg.org/standards/150/)  Hooray!  I sort of
>> missed the announcement, but ran across the fact the other day when I was
>> surfing through the TDWG website.  Since the GUID A.S. is now a TDWG
>> Standard, I would say it would now officially be a best-practice to follow
>> it.  In particular, Recommendation 11 states "Objects in the biodiversity
>> domain that are identified by a GUID should be typed using the TDWG ontology
>> or other well-known vocabularies in accordance with the TDWG common
>> architecture."  This is somewhat problematic, given that the TDWG ontology
>> (with the possible exception of the Taxon/TaxonConcept part) is effectively
>> ("socially"?) deprecated.  What is the alternative "other well-known
>> vocabulary"?  There is none, at least none having any kind of official
>> status with TDWG.
>> I recently discovered (or maybe re-discovered) the Technical Architecture
>> Group (TAG) Technical Roadmaps from 2006-2008:
>> http://www.tdwg.org/uploads/media/TAG_Roadmap_01.doc
>> http://www.tdwg.org/fileadmin/subgroups/tag/TAG_Roadmap_2007_final.pdf
>> http://www.tdwg.org/fileadmin/subgroups/tag/TAG_Roadmap_2008.pdf
>> I might have seen them before, but if so it was at the point where I was
>> really not knowledgeable enough to comprehend them.  I found it very
>> instructive to read about what the TAG had in mind when it set out to create
>> the TDWG Ontology.  In particular (from the 2007 Roadmap):
>> "From the point of view of exchanging data - such as in the federation of
>> a number of natural history collections - there is no need for a standards
>> architecture.  The federation is a closed system where a single exchange
>> format can be agreed on. ... This model has worked well in the past but it
>> does not meet the primary use case that is emerging.  Biodiversity research
>> is typically carried out by combining data of different kinds from multiple
>> sources. The providers of data do not know who will use their data or how it
>> will be combined with data from other sources. The consumer needs some level
>> of commonality across all the data received so that it can be combined for
>> analysis without the need to write computer software for every new
>> combination." [This brings to mind the very "different kinds" of resources
>> Cam is documenting in Borneo and the "multiple sources" that will be
>> handling the metadata once those resources are sent off to herbaria, labs,
>> and arboreta.]
>> and from the 2008 Roadmap:
>> "If GUIDs are used to uniquely identify 'pieces' of data we need to have a
>> shared understanding of what we mean by a 'piece of data' i.e. what kind of
>> thing is it that a particular id applies to, a specimen, a person, an
>> observation, a complete data set. We also need to have a shared
>> understanding of at least some of the properties we use to describe these
>> things."
>> Having been barely aware of TDWG's existence in 2008, I am blissfully
>> ignorant of whatever disagreements may have occurred regarding LSIDs,
>> reification, or whatever, and really don't want to know about them.  All I
>> can say as an outside observer is that it appears that the failure of the
>> initial efforts to get GUIDs and the TDWG Ontology off the ground was
>> because the system envisioned was too complicated to maintain, too
>> complicated to gain a consensus, and too complicated to actually explain to
>> anybody.  Now that GUIDs seem to have a new lease on life, it seems like the
>> greatest chance of successfully implementing them is to start by keeping
>> things absolutely as simple as possible.  To Cam and me, Darwin Core seemed
>> to be the only candidate for something relatively simple and relatively
>> universally accepted on which one could base an ontology that could be used
>> to type GUIDs and to use to express "a shared understanding of at least some
>> of the properties used to describe" biodiversity resources.  Although I was
>> somewhat skeptical that there was a "community consensus" about what the DwC
>> classes meant and how they were related to each other, the exhaustive
>> discussion on this list in Oct/Nov convinced me that maybe there WAS a
>> consensus, or at least enough of a consensus to move forward.  Although some
>> people may at the present time be interested in figuring out how to do
>> things like "define 'Fish' as an owl class as well as as a Taxon object", I
>> would assert that is outside the core mission of TDWG as stated: to
>> "develop, adopt and promote standards and guidelines for the recording and
>> exchange of data about organisms evidenced by the historical record".  It is
>> fun to talk about, but to me not the primary consideration in designing a
>> community data exchange model.  This outlook explains to some extent why I
>> asked questions about the complexity of taxonconcept.org and its
>> orientation toward facilitating semantic queries.  There is nothing wrong
>> with that, but it doesn't seem to be the direction that TDWG has said it
>> wants to go.  Perhaps when we have "gotten there" (i.e. have a functioning
>> system using GUIDs for clearly typed resources), we might want to embark
>> further down the road to the Semantic Web.
>> Aside from just importing the DwC classes into the DSW ontology and
>> connecting them with object properties, Cam and I did a little nasty thing
>> with them.  It has been said that declaring ranges and domains for terms
>> doesn't prevent people from using the terms to express relationships among
>> the "wrong" types of things.  Rather, it simply asserts that those things
>> are instances of the classes used in the range and domain declarations for
>> the term.  That is sort of true, but by declaring many of the core DwC
>> classes to be disjoint, we actually ARE preventing people from using the
>> wrong object properties with instances of the wrong classes.  If  Joe
>> Curator rdf:type's a determination as a dwc:Identification, but then uses
>> dsw:atEvent (which has the domain dwc:Occurrence) as a property of the
>> determination, then a reasoner will infer that the determination is a type
>> dwc:Occurrence as well as the explicitly declared type dwc:Identification.
>> Because dwc:Identification and dwc:Occurrence are disjoint classes, the
>> reasoner will have a fit.  Cam and I are being Naughty (sensu Bob Morris)
>> because we are inhibiting the extensibility of dsw:atEvent, but Joe Curator
>> is being Naughty (sensu Baskauf and Webb) because Cam and I believe in the
>> statement from the 2007 Roadmap: "The consumer needs some level of
>> commonality across all the data received...".  Joe Curator is not being
>> consistent with the "shared understanding of at least some of the
>> properties" to the extent that DSW reflects the "shared understanding" of
>> the TDWG community.  We are basically trying to enforce a sort of orthodoxy
>> on the use of DwC classes as rdf:types and on the connections between the
>> dwc:classes so that people can have some reasonable expectation that they
>> are talking the same language as their partners whose data are also being
>> aggregated in the same federated database.
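
The Joe Curator scenario described above can be walked through with a minimal
sketch. This is not an OWL reasoner: it only mimics the two rules involved
(domain inference and a disjointness check). The domain of dsw:atEvent and the
disjointness of dwc:Identification and dwc:Occurrence are taken from the
paragraph above; the ex: instance names are hypothetical.

```python
# Sketch only: a toy stand-in for a reasoner, covering just domain
# inference and disjoint-class checking.
DOMAINS = {"dsw:atEvent": "dwc:Occurrence"}
DISJOINT = [frozenset({"dwc:Identification", "dwc:Occurrence"})]


def inferred_types(triples):
    """Collect declared rdf:types plus types implied by property domains."""
    types = {}
    for subject, predicate, obj in triples:
        if predicate == "rdf:type":
            types.setdefault(subject, set()).add(obj)
        elif predicate in DOMAINS:
            types.setdefault(subject, set()).add(DOMAINS[predicate])
    return types


def inconsistencies(triples):
    """List subjects that end up in two classes declared disjoint."""
    clashes = []
    for subject, classes in inferred_types(triples).items():
        for pair in DISJOINT:
            if pair <= classes:
                clashes.append((subject, tuple(sorted(pair))))
    return clashes


# Joe Curator types a determination, then uses dsw:atEvent on it.
data = [
    ("ex:det1", "rdf:type", "dwc:Identification"),
    ("ex:det1", "dsw:atEvent", "ex:event1"),
]
print(inconsistencies(data))
# [('ex:det1', ('dwc:Identification', 'dwc:Occurrence'))]
```

Without the disjointness entry the same data would simply give ex:det1 two
types and nothing would be flagged.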
>> It seems to me that this "enforcement of orthodoxy" may be very much at
>> odds with the free-wheeling spirit of the Semantic Web community where
>> Anybody Can Say Anything About Anything.  But when I look over those old TAG
>> roadmaps, I see little having to do with clever Semantic inferencing.  I see
>> a lot about providers and consumers understanding what each other are
>> talking about.  To some extent, Darwin Core can provide most of the
>> necessary commonality between providers and consumers.  There were (in our
>> opinion) three areas where it could not.  One was the lack of a class to
>> link repeated sampling events and determinations (dwc:IndividualOrganism or
>> TaxonomicallyHomogeneousEntity if you prefer) and another was a class that
>> allowed for the separation of evidence from the Occurrence documented by it
>> (called by us the dsw:Token class).  The other area was the dwc:Taxon class
>> which did not seem clear enough in its definition nor to possess enough
>> complexity to express the kinds of relationships commonly discussed on this
>> list.  dwc:Taxon needs to be "fixed" before it is Ready For Prime Time (i.e.
>> usable in rdf:type declarations)
>> So I guess having read the various responses to my query and thinking
>> about the history of the TDWG Ontology, I would say that it may not really
>> be important how dwc:Taxon could be tied to tc:Taxon because the two classes
>> probably don't need to be tied together anyway.  As it currently stands,
>> dwc:Taxon (outside of DSW) has no semantic meaning other than what people
>> want to believe that it means because it's not tied to any other classes by
>> object properties of its instances.   The mish-mash of terms describing
>> names and taxa listed under dwc:Taxon add to the confusion - since the DwC
>> vocabulary purposefully does not declare domains for the terms listed under
>> a class they really could be used as properties for an instance of any class
>> anywhere.  In contrast, tc:Taxon does have properties that are described
>> clearly in the TDWG Ontology.  The only reason that we declared the two
>> classes to be equivalent was to signal that we felt that some of the DwC
>> terms listed under dwc:Taxon in the DwC vocabulary could be used as data
>> properties for the things in the tc:Taxon class that people like Paul were
>> describing with properties from the TDWG Ontology.  Tying them together
>> doesn't (at the moment) mess up anything that anybody is doing with
>> dwc:Taxon because (outside of DSW) there isn't anything to actually DO with
>> dwc:Taxon in RDF.  However, the point is well taken that if someone in the
>> future did decide to define properties specifically intended for use with
>> dwc:Taxon, those properties would be hopelessly tangled with tc:Taxon
>> properties.
>> It seems to me like the real road forward (if one believes as I do that
>> DwC is the only practical alternative to use for typing GUIDs) would be to:
>> 1. decide that the TDWG Ontology in its dead form adequately describes
>> taxa, names, and their properties (use it as-is).  OR
>> 2. decide that although the TDWG Ontology doesn't do everything that
>> people want it to do at the present time, it could be resurrected and modified
>> to do what people want (use it and hope for the future).  OR
>> 3. decide to just create the additional classes, e.g. dwc:Name (or
>> dsw:Name if you prefer not to adulterate the "pure" Darwin Core), and object
>> properties for dwc:Taxon and dwc:Name that are needed to get the job done
>> (i.e. just dump the TDWG Ontology as unfixable and make up new stuff).
>> In any of these three alternatives, there isn't actually any reason to tie
>> the two classes together that I can see.  Of these three, I think the third
>> option would probably be preferable, although it might put Paul (and any
>> others currently using the TDWG Ontology to describe Taxon instances) in the
>> unpleasant position of having to redo their RDF.
>> Steve
>> Paul Murray wrote:
>>  On 09/05/2011, at 2:07 PM, Kevin Richards wrote:
>>  Paul
>> I had the same thought (ie the x is of type dwc:Taxon, y is of type tc:Taxon, we know dwc:Taxon and tc:Taxon are equivalent, so we can reasonably compare x and y).
>> And this is built into standard semantic web reasoners - which is a bonus.
>> But this was debated (taking into account Bob Morris' issue) with respect to DwC and it was decided the benefits weren't significantly better than having a "dwc:isInCategory" sort of property that could then be "equivalent to" another class property and therefore giving you a similar advantage (admittedly not as good, but similar).
>> Do you think this is reasonable or are we just losing too much semantic web benefits by not specifying the domain constraint?
>> A thing to watch out for is that in OWL DL, you cannot apply ordinary data and object properties to vocabulary objects (classes, predicates) - you can only apply annotation properties. If you apply an ordinary data property to a class, OWL DL treats this as what it calls "punning": it decides that there is a class named X and also a named individual named X, and that these have nothing to do with one another. The individual has properties, the class has members, and the annotation properties, well: whatever. Reasoners do not reason over annotation properties: indeed - that's the entire point. Attempting to put properties on properties and having classes being instances of classes results in things that are mathematically undecidable ("this statement cannot be proven to be true").
>> (another reason is that it allows you to put dc:comments and labels on classes, and even if you declare those classes to be equivalent nevertheless the comment only applies to the particular thing you put it on)
>> This all means that dwc:isInCategory, if you want to apply it to dwc:taxon or other classes, will never have any meaning that semweb "engines" can get at. The underlying thing is that dwc:isInCategory is actually a meta-syntactic construct: rather than using owl to define a vocabulary, you are effectively attempting to extend OWL itself.
>> But ... maybe that's ok. Maybe what is attempting to be done here only ever needs to be understood by humans.
>> Now ... if what you are trying to do is to define "Fish" as an owl class as well as as a Taxon object - that is do-able, even to the point of being able to get inheritance working, using reflexive properties.  At least ... I think it is. I should write a test case.
>> --
>> Steven J. Baskauf, Ph.D., Senior Lecturer
>> Vanderbilt University Dept. of Biological Sciences
>> postal mail address:
>> VU Station B 351634
>> Nashville, TN  37235-1634,  U.S.A.
>> delivery address:
>> 2125 Stevenson Center
>> 1161 21st Ave., S.
>> Nashville, TN 37235
>> office: 2128 Stevenson Center
>> phone: (615) 343-4582,  fax: (615) 343-6707
>> http://bioimages.vanderbilt.edu
>> _______________________________________________
>> tdwg-content mailing list
>> tdwg-content at lists.tdwg.org
>> http://lists.tdwg.org/mailman/listinfo/tdwg-content
> --
> ------------------------------------------------------------------------------------
> Pete DeVries
> Department of Entomology
> University of Wisconsin - Madison
> 445 Russell Laboratories
> 1630 Linden Drive
> Madison, WI 53706
> Email: pdevries at wisc.edu
> TaxonConcept <http://www.taxonconcept.org/>  &  GeoSpecies <http://about.geospecies.org/> Knowledge
> Bases
> A Semantic Web, Linked Open Data <http://linkeddata.org/>  Project
> --------------------------------------------------------------------------------------
