[tdwg-content] Heretics and illuminati, oh my! [SEC=UNCLASSIFIED]

Greg Whitbread ghw at anbg.gov.au
Tue May 10 05:10:39 CEST 2011


I'm not too keen on the hack-a-thon idea.  It will in all probability
just consume important contact time with activity that is best
integrated into real projects and reported in a forum such as this or at
the TDWG meeting. Much in the way that Steve and Cam and Pete and Paul
are doing now.  

We (you and I) have drafted a proposal to put to TDWG executive for a
2-3 day workshop prior to this year's meeting to establish context for
these issues within the framework of a TDWG Technical Architecture.  Do
we need to evolve a TDWG level understanding of the requirement for
semantic interoperability within our standards space? Would it be useful
to spend time and effort to formally model the TDWG domain? Is there a
role for TAG? Can we improve the process?

Hopefully, we can find an opportunity to get a small group together
between now and then to do a little planning around agenda, background
requirements, preparation workload and specialist inputs.


On Tue, 2011-05-10 at 07:26, Kevin Richards wrote:
> I wonder if it would be a good idea to have a session (hackathon?) at
> this year's TDWG meeting to look at / prove / experiment with, the
> various ways of working with semantic web data and ontologies we
> discuss here?
> This would soon show any benefits/disadvantages etc of the various
> approaches.
> Is anyone lined up / keen to promote such a session?
> Kevin
> From: tdwg-content-bounces at lists.tdwg.org
> [mailto:tdwg-content-bounces at lists.tdwg.org] On Behalf Of Steve
> Baskauf
> Sent: Tuesday, 10 May 2011 8:54 a.m.
> To: Paul Murray
> Cc: tdwg-content at lists.tdwg.org List
> Subject: Re: [tdwg-content] Heretics and illuminati, oh my!
> Being either fearless or a fool (is there actually a difference?), I
> shall tread into this subject area at which I am a mere novice.  So be
> kind...
> I think that there may be several "solutions" to this problem.  The
> one that is "correct" probably depends on what one is trying to
> accomplish.  So I will try to describe in the most succinct way what
> Cam and I were trying to accomplish with DSW, and how that fits in
> with this thread.  Cam and I basically wanted to do two things:
> 1. Make it possible to use GUIDs RIGHT NOW (not five years from now). 
> 2. Create an extremely stripped down ontology that would be
> non-controversial enough that people might actually use it, but which
> wouldn't do anything so bad that it would inhibit future development
> in the Semantic Web context (i.e. it could be extended in the future
> by clever people to do clever Semantic Web stuff).  
> Amazingly, the GUID Applicability Statement has achieved the status of
> Standard-hood! (http://www.tdwg.org/standards/150/)  Hooray!  I sort
> of missed the announcement, but ran across the fact the other day when
> I was surfing through the TDWG website.  Since the GUID A.S. is now a
> TDWG Standard, I would say it would now officially be a best-practice
> to follow it.  In particular, Recommendation 11 states "Objects in the
> biodiversity domain that are identified by a GUID should be typed
> using the TDWG ontology or other well-known vocabularies in accordance
> with the TDWG common architecture."  This is somewhat problematic,
> given that the TDWG ontology (with the possible exception of the
> Taxon/TaxonConcept part) is effectively ("socially"?) deprecated. 
> What is the alternative "other well-known vocabulary"?  There is none,
> at least none having any kind of official status with TDWG.  
> I recently discovered (or maybe re-discovered) the Technical
> Architecture Group (TAG) Technical Roadmaps from 2006-2008:
> http://www.tdwg.org/uploads/media/TAG_Roadmap_01.doc
> http://www.tdwg.org/fileadmin/subgroups/tag/TAG_Roadmap_2007_final.pdf
> http://www.tdwg.org/fileadmin/subgroups/tag/TAG_Roadmap_2008.pdf
> I might have seen them before, but if so it was at the point where I
> was really not knowledgeable enough to comprehend them.  I found it
> very instructive to read about what the TAG had in mind when it set
> out to create the TDWG Ontology.  In particular (from the 2007
> Roadmap):
> "From the point of view of exchanging data - such as in the federation
> of a number of natural history collections - there is no need for a
> standards architecture.  The federation is a closed system where a
> single exchange format can be agreed on. ... This model has worked
> well in the past but it does not meet the primary use case that is
> emerging.  Biodiversity research is typically carried out by combining
> data of different kinds from multiple sources. The providers of data
> do not know who will use their data or how it will be combined with
> data from other sources. The consumer needs some level of commonality
> across all the data received so that it can be combined for analysis
> without the need to write computer software for every new
> combination." [This brings to mind the very "different kinds" of
> resources Cam is documenting in Borneo and the "multiple sources" that
> will be handling the metadata once those resources are sent off to
> herbaria, labs, and arboreta.]
> and from the 2008 Roadmap:
> "If GUIDs are used to uniquely identify 'pieces' of data we need to
> have a shared understanding of what we mean by a 'piece of data' i.e.
> what kind of thing is it that a particular id applies to, a specimen,
> a person, an observation, a complete data set. We also need to have a
> shared understanding of at least some of the properties we use to
> describe these things."
> Having been barely aware of TDWG's existence in 2008, I am blissfully
> ignorant of whatever disagreements may have occurred regarding LSIDs,
> reification, or whatever, and really don't want to know about them. 
> All I can say as an outside observer is that it appears that the
> failure of the initial efforts to get GUIDs and the TDWG Ontology off
> the ground was because the system envisioned was too complicated to
> maintain, too complicated to gain a consensus, and too complicated to
> actually explain to anybody.  Now that GUIDs seem to have a new lease
> on life, it seems like the greatest chance of successfully
> implementing them is to start by keeping things absolutely as simple
> as possible.  To Cam and me, Darwin Core seemed to be the only
> candidate for something relatively simple and relatively universally
> accepted on which one could base an ontology that could be used to
> type GUIDs and to use to express "a shared understanding of at least
> some of the properties used to describe" biodiversity resources. 
> Although I was somewhat skeptical that there was a "community
> consensus" about what the DwC classes meant and how they were related
> to each other, the exhaustive discussion on this list in Oct/Nov
> convinced me that maybe there WAS a consensus, or at least enough of a
> consensus to move forward.  Although some people may at the present
> time be interested in figuring out how to do things like "define
> 'Fish' as an owl class as well as as a Taxon object", I would assert
> that is outside the core mission of TDWG as stated: to "develop, adopt
> and promote standards and guidelines for the recording and exchange of
> data about organisms evidenced by the historical record".  It is fun
> to talk about, but to me not the primary consideration in designing a
> community data exchange model.  This outlook explains to some extent
> why I asked questions about the complexity of taxonconcept.org and its
> orientation toward facilitating semantic queries.  There is nothing
> wrong with that, but it doesn't seem to be the direction that TDWG has
> said it wants to go.  Perhaps when we have "gotten there" (i.e. have a
> functioning system using GUIDs for clearly typed resources), we might
> want to embark further down the road to the Semantic Web.
> Aside from just importing the DwC classes into the DSW ontology and
> connecting them with object properties, Cam and I did a little nasty
> thing with them.  It has been said that declaring ranges and domains
> for terms doesn't prevent people from using the terms to express
> relationships among the "wrong" types of things.  Rather, it simply
> asserts that those things are instances of the classes used in the
> range and domain declarations for the term.  That is sort of true, but
> by declaring many of the core DwC classes to be disjoint, we actually
> ARE preventing people from using the wrong object properties with
> instances of the wrong classes.  If  Joe Curator rdf:type's a
> determination as a dwc:Identification, but then uses dsw:atEvent
> (which has the domain dwc:Occurrence) as a property of the
> determination, then a reasoner will infer that the determination is a
> of type dwc:Occurrence as well as the explicitly declared type
> dwc:Identification.  Because dwc:Identification and dwc:Occurrence are
> disjoint classes, the reasoner will have a fit.  Cam and I are being
> Naughty (sensu Bob Morris) because we are inhibiting the extensibility
> of dsw:atEvent, but Joe Curator is being Naughty (sensu Baskauf and
> Webb) because Cam and I believe in the statement from the 2007
> Roadmap: "The consumer needs some level of commonality across all the
> data received...".  Joe Curator is not being consistent with the
> "shared understanding of at least some of the properties" to the
> extent that DSW reflects the "shared understanding" of the TDWG
> community.  We are basically trying to enforce a sort of orthodoxy on
> the use of DwC classes as rdf:types and on the connections between the
> dwc:classes so that people can have some reasonable expectation that
> they are talking the same language as their partners whose data are
> also being aggregated in the same federated database.  
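
A minimal sketch of the inference described in the paragraph above, written as a toy checker in plain Python rather than with a real OWL reasoner. The term names (dsw:atEvent, dwc:Occurrence, dwc:Identification) are taken from the message; the instance identifiers ex:det1 and ex:event1 and the function names are hypothetical, and the sketch assumes only the single domain declaration and the single disjointness axiom mentioned there.

```python
# Illustrative toy only: mimics rdfs:domain inference and an owl:disjointWith
# check. It is not a real OWL reasoner and not the DSW implementation.

# rdfs:domain declarations (property -> class), as described in the message.
DOMAINS = {"dsw:atEvent": "dwc:Occurrence"}

# Pairs of classes declared disjoint, as described in the message.
DISJOINT = {frozenset({"dwc:Identification", "dwc:Occurrence"})}


def infer_types(triples):
    """Return {subject: set of classes}, adding classes implied by rdfs:domain."""
    types = {}
    for subject, predicate, obj in triples:
        if predicate == "rdf:type":
            types.setdefault(subject, set()).add(obj)
        elif predicate in DOMAINS:
            # Using a property asserts that its subject is in the property's domain.
            types.setdefault(subject, set()).add(DOMAINS[predicate])
    return types


def find_clashes(types):
    """List (subject, class pair) where a subject falls in two disjoint classes."""
    clashes = []
    for subject, classes in sorted(types.items()):
        for pair in DISJOINT:
            if pair <= classes:
                clashes.append((subject, tuple(sorted(pair))))
    return clashes


# Joe Curator types a determination as an Identification, then uses dsw:atEvent.
# ex:det1 and ex:event1 are hypothetical identifiers.
triples = [
    ("ex:det1", "rdf:type", "dwc:Identification"),
    ("ex:det1", "dsw:atEvent", "ex:event1"),
]
types = infer_types(triples)
clashes = find_clashes(types)
print(clashes)
```

Without the disjointness axiom the same data would simply give ex:det1 two types and no complaint, which is the difference between merely declaring a domain and "enforcing orthodoxy".
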
> It seems to me that this "enforcement of orthodoxy" may be very much
> at odds with the free-wheeling spirit of the Semantic Web community
> where Anybody Can Say Anything About Anything.  But when I look over
> those old TAG roadmaps, I see little having to do with clever Semantic
> inferencing.  I see a lot about providers and consumers understanding
> what each other are talking about.  To some extent, Darwin Core can
> provide most of the necessary commonality between providers and
> consumers.  There were (in our opinion) three areas where it could
> not.  One was the lack of a class to link repeated sampling events and
> determinations (dwc:IndividualOrganism or
> TaxonomicallyHomogeneousEntity if you prefer) and another was a class
> that allowed for the separation of evidence from the Occurrence
> documented by it (called by us the dsw:Token class).  The other area
> was the dwc:Taxon class which did not seem clear enough in its
> definition nor to possess enough complexity to express the kinds of
> relationships commonly discussed on this list.  dwc:Taxon needs to be
> "fixed" before it is Ready For Prime Time (i.e. usable in rdf:type
> declarations)
> So I guess having read the various responses to my query and thinking
> about the history of the TDWG Ontology, I would say that it may not
> really be important how dwc:Taxon could be tied to tc:Taxon because
> the two classes probably don't need to be tied together anyway.  As it
> currently stands, dwc:Taxon (outside of DSW) has no semantic meaning
> other than what people want to believe that it means because it's not
> tied to any other classes by object properties of its instances.   The
> mish-mash of terms describing names and taxa listed under dwc:Taxon
> add to the confusion - since the DwC vocabulary purposefully does not
> declare domains for the terms listed under a class they really could
> be used as properties for an instance of any class anywhere.  In
> contrast, tc:Taxon does have properties that are described clearly in
> the TDWG Ontology.  The only reason that we declared the two classes
> to be equivalent was to signal that we felt that some of the DwC terms
> listed under dwc:Taxon in the DwC vocabulary could be used as data
> properties for the things in the tc:Taxon class that people like Paul
> were describing with properties from the TDWG Ontology.  Tying them
> together doesn't (at the moment) mess up anything that anybody is
> doing with dwc:Taxon because (outside of DSW) there isn't anything to
> actually DO with dwc:Taxon in RDF.  However, the point is well taken
> that if someone in the future did decide to define properties
> specifically intended for use with dwc:Taxon, those properties would
> be hopelessly tangled with tc:Taxon properties.  
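
A rough sketch of why declaring the two classes equivalent tangles them, again as a toy in plain Python and not a real reasoner. The class names come from the message; the instances ex:x and ex:y and the function name are hypothetical. It assumes the usual reading of owl:equivalentClass, under which every instance of one class is also an instance of the other.

```python
# Illustrative toy only: expands asserted types using declared class equivalences.

# Equivalence declared between the two Taxon classes, per the message.
EQUIVALENT = [("dwc:Taxon", "tc:Taxon")]


def expand_types(asserted):
    """Given {instance: set of classes}, add every class equivalent to an asserted one."""
    expanded = {inst: set(classes) for inst, classes in asserted.items()}
    changed = True
    while changed:  # repeat until no new class membership is added
        changed = False
        for a, b in EQUIVALENT:
            for classes in expanded.values():
                for src, dst in ((a, b), (b, a)):
                    if src in classes and dst not in classes:
                        classes.add(dst)
                        changed = True
    return expanded


# ex:x and ex:y are hypothetical instances typed with different vocabularies.
asserted = {"ex:x": {"dwc:Taxon"}, "ex:y": {"tc:Taxon"}}
expanded = expand_types(asserted)
print(expanded["ex:x"] == expanded["ex:y"])
```

Because both instances end up in both classes, any property later given a domain of dwc:Taxon would apply equally to every tc:Taxon instance, and vice versa.
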
> It seems to me like the real road forward (if one believes as I do
> that DwC is the only practical alternative to use for typing GUIDs)
> would be to:
> 1. decide that the TDWG Ontology in its dead form adequately describes
> taxa, names, and their properties (use it as-is).  OR
> 2. decide that although the TDWG Ontology doesn't do everything that
> people want it to do at the present time, it could be resurrected and
> modified to do what people want (use it and hope for the future).  OR
> 3. decide to just create the additional classes, e.g. dwc:Name (or
> dsw:Name if you prefer not to adulterate the "pure" Darwin Core), and
> object properties for dwc:Taxon and dwc:Name that are needed to get
> the job done (i.e. just dump the TDWG Ontology as unfixable and make
> up new stuff).  
> In any of these three alternatives, there isn't actually any reason to
> tie the two classes together that I can see.  Of these three, I think
> the third option would probably be preferable, although it might put
> Paul (and any others currently using the TDWG Ontology to describe
> Taxon instances) in the unpleasant position of having to redo their
> RDF.
> Steve
> Paul Murray wrote: 
> On 09/05/2011, at 2:07 PM, Kevin Richards wrote:
>         Paul
>         I had the same thought (ie the x is of type dwc:Taxon, y is of type tc:Taxon, we know dwc:Taxon and tc:Taxon are equivalent, so we can reasonably compare x and y).
>         And this is built into standard semantic web reasoners - which is a bonus.
>         But this was debated (taking into account Bob Morris' issue) with respect to DwC and it was decided the benefits weren't significantly better than having a "dwc:isInCategory" sort of property that could then be "equivalent to" another class property and therefore giving you a similar advantage (admittedly not as good, but similar).
>         Do you think this is reasonable or are we just losing too much of the semantic web benefit by not specifying the domain constraint?
> A thing to watch out for is that in OWL DL, you cannot apply ordinary data and object properties to vocabulary objects (classes, predicates) - you can only apply annotation properties. If you apply an ordinary data property to a class, OWL DL treats this as what it calls "punning": it decides that there is a class named X and also a named individual named X, and that these have nothing to do with one another. The individual has properties, the class has members, and the annotation properties, well: whatever. Reasoners do not reason over annotation properties: indeed - that's the entire point. Attempting to put properties on properties and having classes being instances of classes results in things that are mathematically undecidable ("this statement cannot be proven to be true").
> (another reason is that it allows you to put dc:comments and labels on classes, and even if you declare those classes to be equivalent nevertheless the comment only applies to the particular thing you put it on)
> This all means that dwc:isInCategory, if you want to apply it to dwc:Taxon or other classes, will never have any meaning that semweb "engines" can get at. The underlying thing is that dwc:isInCategory is actually a meta-syntactic construct: rather than using owl to define a vocabulary, you are effectively attempting to extend OWL itself.
> But ... maybe that's ok. Maybe what is attempting to be done here only ever needs to be understood by humans.
> Now ... if what you are trying to do is to define "Fish" as an owl class as well as as a Taxon object - that is do-able, even to the point of being able to get inheritance working, using reflexive properties.  At least ... I think it is. I should write a test case.
> If you have received this transmission in error please notify us immediately by return e-mail and delete all copies. If this e-mail or any attachments have been sent to you in error, that error does not constitute waiver of any confidentiality, privilege or copyright in respect of information in the e-mail or attachments.
> Please consider the environment before printing this email.
> _______________________________________________
> tdwg-content mailing list
> tdwg-content at lists.tdwg.org
> http://lists.tdwg.org/mailman/listinfo/tdwg-content
> -- 
> Steven J. Baskauf, Ph.D., Senior Lecturer
> Vanderbilt University Dept. of Biological Sciences
> postal mail address:
> VU Station B 351634
> Nashville, TN  37235-1634,  U.S.A.
> delivery address:
> 2125 Stevenson Center
> 1161 21st Ave., S.
> Nashville, TN 37235
> office: 2128 Stevenson Center
> phone: (615) 343-4582,  fax: (615) 343-6707
> http://bioimages.vanderbilt.edu
> ______________________________________________________________________
> Please consider the environment before printing this email
> Warning: This electronic message together with any attachments is
> confidential. If you receive it in error: (i) you must not read, use,
> disclose, copy or retain it; (ii) please contact the sender
> immediately by reply email and then delete the emails.
> The views expressed in this email may not be those of Landcare
> Research New Zealand Limited. http://www.landcareresearch.co.nz
> ______________________________________________________________________

australian centre for plant bIodiversity research<------------------+
national            greg whitBread             voice: +61 2 62509 482
botanic Integrated Botanical Information System  fax: +61 2 62509 599
gardens                      S........ I.T. happens.. ghw at anbg.gov.au
+----------------------------------------->GPO Box 1777 Canberra 2601

