Re: [Tdwg-tag] Primary Objects as XML Structures or OWL Classes

22 Feb 2006


Yes to your understandings 1 and 2. As to 3, I am actually a big fan of both
XML Schema and RDF/S when appropriately used by the appropriate audience for
the appropriate problem. My big complaint in the current discussion is that
I think most biologists are not the appropriate audience for RDF/S in the
present state of its tools. For example, SDD is not about
characters/states/descriptions as its title might suggest. Rather it is
about how those things are \constrained/, and it is practicing biologists
who have to prepare the ontologies of taxonomic groups and controlled
vocabularies for describing them in various contexts. Doing that is not the
role of SDD, so if anything, the SDD team would be asked to serialize SDD in
RDFS so that biologists' serialization of controlled vocabularies in RDF (or
whatever else RDFS might be used to constrain) can be controlled. I don't even think
this is bad in principle, but rather way premature. By contrast, for Taxon
Concepts, Specimens, and a few other things of recent attention, it is
basically information specialists closely tied to TDWG activities who, with
the advice and consent of biologists (which includes most of the activists or
their close collaborators), will prepare those ontologies. These are much
more nearly static (except for their evolution) than something which
biologists do more or less routinely, even if more or less informally. I can
see a project (after a little more SDD polishing and updating of tools
produced before St. Petersburg) in which SDD is indeed specified in UML.
Indeed, Gregor is writing a dissertation on descriptive data in which
\much/ of his explanation and arguments are couched in UML diagrams.


On 2/20/06, Roger Hyam <roger@tdwg.org> wrote:
...
Hi Bob,
I'm rushing off to the GISIN meeting at AGADIR and might not have much
time to respond more before midweek, or maybe even until I get back next
week, but:
0.  I _wish_ this discussion were taking place in a wiki, with RSS or
email notification, so it is easier to follow if you cannot keep up with
the email.
The way I was planning on running the TAG discussions was to have
'discussions' on the mailing list and summarize them to the wiki. The
motivation behind this is to work towards the wiki being a readable document
for the uninitiated. It should not be necessary for someone coming new to a
field to have to read all the discussions that have taken place to reach a
conclusion. These discussions should be available but it is the job of an
editor/facilitator to create a readable narrative from possibly wandering
dialog.
The wiki is here: http://www.tdwg.hyam.net/twiki/bin/view/TAG
The URL will change at some point in the next few months but I will make
sure all URLs forward to the appropriate place on the new server. There is
no RSS feed on it at present. I'll see about setting one up either now or
when we move it to the main server.
The mailing list archive is here:
http://lists.tdwg.org/pipermail/tdwg-tag_lists.tdwg.org/ so any thread can
be followed and resurrected at any time.
I take on board what you are saying though and will try and create links
between the wiki and list archive.
1.  I don't think specifications of high level things like "objects"
should be done in a serialization constraint language such as RDF or XML
Schema. Instead, they should have something more general as the normative
definition and have _representation_ in one or more of such constraint
languages. This is the mechanism of W3C usually. Many (Most?) W3C standards
have a normative BNF definition, and one or more representations to allow
implementers to actually do business. OMG favors UML for this, etc. There
is nothing inherently normative about, say, RDF or XML Schema for, say,
TaxonConcepts. If you take the serialization language as the normative
language, then in the future you just end up having to support several
serialization languages when you find you want to extend your specification
with something for which the chosen one is insufficiently expressive. This,
in fact, is what is going on now with the cries for RDF over XML Schema. Put
another way, if you choose language L as the normative language, you are not
building a specification, but rather a set of constraints on applications
written in L. Such things do not have as long a life as actual
specifications do and mature standards bodies do not seem to use
serialization languages as the root specification language, as far as I can
tell.  My conclusion is that specifications should not be in anything like
RDF or XML Schema, but in something else---BNF is probably adequate for most
TDWG standards---with working subgroups responsible for publishing a
serialization definition implementing the standard in languages useful for
one or another purpose, e.g. LSID resolution.
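
To make the separation above concrete, here is a minimal sketch in Python (not part of the original thread). It assumes a hypothetical TaxonConcept with three illustrative fields as the language-neutral definition, and derives two serializations from it: XML and N-Triples-style lines. The class, the field names, the element names and the example.org term base are all assumptions made for illustration, not TDWG definitions.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class TaxonConcept:
    """Hypothetical abstract definition; the fields are illustrative only."""
    identifier: str
    name: str
    according_to: str


def to_xml(concept: TaxonConcept) -> str:
    """One possible XML serialization (element names are assumptions)."""
    root = ET.Element("TaxonConcept", attrib={"id": concept.identifier})
    ET.SubElement(root, "Name").text = concept.name
    ET.SubElement(root, "AccordingTo").text = concept.according_to
    return ET.tostring(root, encoding="unicode")


def to_ntriples(concept: TaxonConcept,
                base: str = "http://example.org/terms/") -> str:
    """One possible triple serialization. json.dumps is used only as a
    simple way to quote and escape the literal values."""
    subject = f"<{concept.identifier}>"
    properties = {"name": concept.name, "accordingTo": concept.according_to}
    return "\n".join(
        f"{subject} <{base}{prop}> {json.dumps(value)} ."
        for prop, value in properties.items()
    )


# The same abstract object, written out in two serialization languages.
concept = TaxonConcept("urn:lsid:example.org:concepts:1", "Aus bus", "Smith 2005")
print(to_xml(concept))
print(to_ntriples(concept))
```

Neither serializer is the definition here; adding a third would not touch the dataclass, which is roughly the property argued for above.
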
Yes I think you are right. We should be specifying our objects in a high
level 'language' like UML (not so sure about BNF but I am not so familiar
with it). There has been talk about OWL Lite as a subset of UML. This was
actually the next topic I was going to suggest and I'll kick off a thread on
it soon if no one else does.
Can I take it from your reply that you think:
1. There should be commonality between all TDWG 'objects' and that
   commonality should be their specification in UML/BNF/Other technology?
   (Yes to my question 1).
2. There should be alternative ways to serialize these objects. Some
   of the serializations may support different aspects of the objects (Yes to my
   question 2).
3. XML Schema or RDF/S are not appropriate ways to define such
   objects.
Have I read this correctly?
Roger
Bob
On 2/17/06, Roger Hyam <roger@tdwg.org> wrote:
...
Hi All,
In a previous post I suggested definitions for Resolving, Searching and
Querying from the point of view of the TAG. There has been a muted response
which I take as meaning there aren't any strong objections to these
definitions. We can come back to them later if need be. You can read the
post here if you missed it:
http://lists.tdwg.org/pipermail/tdwg-tag_lists.tdwg.org/2006-February/000009...
I'd like to look at the implications of the first two definitions:
1. *Resolving.* This means to convert a pointer into a data object.
   Examples would be to resolve an LSID and get back either data or
   metadata, or resolve a URL and get back a web page in HTML.
2. *Searching.* This means to select a set of objects (or their
   proxies) on the basis of the values of their properties. The
   objects are predefined (implicitly part of the call) and we are
   simply looking for them. An example would be finding pages on Google.
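
The two definitions above can be sketched in a few lines of Python (an illustration only, not part of the original post). The in-memory STORE, the identifiers and the property names are all hypothetical stand-ins for a real resolution service and its data.

```python
from typing import Callable, Dict, List

# Hypothetical in-memory store standing in for a real resolution service.
# Identifiers and property names are invented for this example.
STORE: Dict[str, dict] = {
    "urn:lsid:example.org:specimens:1": {"type": "Specimen", "collector": "A. Smith", "year": 1901},
    "urn:lsid:example.org:specimens:2": {"type": "Specimen", "collector": "B. Jones", "year": 1987},
}


def resolve(pointer: str) -> dict:
    """Resolving: convert a pointer into a data object."""
    if pointer not in STORE:
        raise LookupError(f"cannot resolve {pointer!r}")
    return STORE[pointer]


def search(predicate: Callable[[dict], bool]) -> List[str]:
    """Searching: select objects (here, their pointers as proxies)
    on the basis of the values of their properties."""
    return [pointer for pointer, obj in STORE.items() if predicate(obj)]


# Search returns proxies; each proxy can then be resolved to the object.
old_specimens = search(lambda obj: obj["year"] < 1950)
for pointer in old_specimens:
    print(pointer, resolve(pointer)["collector"])
```

Both functions only work because the caller already knows what a "Specimen" looks like, which is the implication drawn in the next paragraph.
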
Both these definitions imply the existence of data 'Objects' or
'Structures' that are understood by the clients when they are received. The
kinds of objects that jump to mind are Specimens, TaxonNames, TaxonConcepts,
NaturalCollections, Collectors, Publications, People, Expeditions, etc.
A piece of client software should be able to know what to do with an object
when it gets it: how to display it to the user, map it to a db, etc.
My two leading questions are:
1. *Should there be commonality to all the objects?* If yes - what
   should it be? XML Schema location or OWL Class or something else? If no -
   then how should clients handle new objects dynamically - or shouldn't they
   be doing that kind of thing?
2. *Should we have multiple ways of representing the SAME objects?*
   e.g. Should there be only one way to encode a Specimen or should
   it be possible to have several encodings running in parallel? If there is
   only one way, how do we handle upgrades (where we have to run two types of
   encoding together during the roll out of the new one) AND how do we reach
   consensus on the 'perfect' way of encoding each and every object in our
   domain?
The answers I have for my leading questions are:
1. Yes - We should have some commonality between objects or it
   will be really difficult to write client code - but what that commonality is
   has to be decided.
2. Yes - The architecture has to handle multiple versions/ways of
   encoding any particular object type because any one version is not likely to
   be ideal for everyone forever.
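
As a rough illustration of the second answer (a sketch under stated assumptions, not a proposed design), a client could keep a registry of decoders keyed by encoding name, so that an old and a new encoding of the same object type can run side by side during a roll out. The encoding names, element names and JSON keys below are hypothetical.

```python
import json
import xml.etree.ElementTree as ET
from typing import Callable, Dict

# Registry mapping an encoding name to a function that decodes a payload
# into one common in-memory form. All encoding names are hypothetical.
DECODERS: Dict[str, Callable[[str], dict]] = {}


def decoder(encoding: str):
    """Decorator that registers a decoder under the given encoding name."""
    def register(func: Callable[[str], dict]) -> Callable[[str], dict]:
        DECODERS[encoding] = func
        return func
    return register


@decoder("specimen-xml-v1")
def decode_xml_v1(payload: str) -> dict:
    root = ET.fromstring(payload)
    return {"id": root.get("id"), "collector": root.findtext("Collector")}


@decoder("specimen-json-v2")
def decode_json_v2(payload: str) -> dict:
    data = json.loads(payload)
    return {"id": data["id"], "collector": data["collectedBy"]}


def decode(encoding: str, payload: str) -> dict:
    """Dispatch to whichever decoder is registered for this encoding."""
    func = DECODERS.get(encoding)
    if func is None:
        raise ValueError(f"unsupported encoding: {encoding}")
    return func(payload)


# Two encodings of the SAME object decode to the same client-side form.
v1 = decode("specimen-xml-v1", '<Specimen id="s1"><Collector>A. Smith</Collector></Specimen>')
v2 = decode("specimen-json-v2", '{"id": "s1", "collectedBy": "A. Smith"}')
print(v1 == v2)
```

The common dictionary returned by every decoder plays the role of the "commonality" in the first answer; what that commonality should really be is the open question.
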
Are the two conclusions I come to here reasonable? Is this too high
level and not making any sense?
I'd be grateful for your thoughts on this,
Roger
--
-------------------------------------
 Roger Hyam
 Technical Architect
 Taxonomic Databases Working Group
-------------------------------------
http://www.tdwg.org
 roger@tdwg.org
 +44 1578 722782
-------------------------------------
_______________________________________________
Tdwg-tag mailing list
Tdwg-tag@lists.tdwg.org
http://lists.tdwg.org/mailman/listinfo/tdwg-tag_lists.tdwg.org

Bob Morris