[Tdwg-tag] Primary Objects as XML Structures or OWL Classes

17 Feb 2006

Hi All,

In a previous post I suggested definitions for Resolving, Searching and
Querying from the point of view of the TAG. There has been a muted
response, which I take to mean there are no strong objections to
these definitions. We can come back to them later if need be. You can
read the post here if you missed it:

http://lists.tdwg.org/pipermail/tdwg-tag_lists.tdwg.org/2006-February/000009...

I'd like to look at the implications of the first two definitions:

   1. *Resolving.* This means to convert a pointer into a data object.
      Examples would be to resolve an LSID and get back either data or
      metadata, or to resolve a URL and get back a web page in HTML.
   2. *Searching.* This means to select a set of objects (or their
      proxies) on the basis of the values of their properties. The
      objects are predefined (implicitly part of the call) and we are
      simply looking for them. An example would be finding pages on Google.
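
As a rough illustration of the difference between the two operations, here is
a minimal Python sketch. It is not part of any agreed TDWG interface: the
Store class, the DataObject fields and the example.org LSID-style strings are
all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DataObject:
    """Hypothetical data object; 'kind' and 'properties' are illustrative names."""
    kind: str
    properties: Dict[str, str] = field(default_factory=dict)


class Store:
    """Toy in-memory store mapping pointers (LSID-like strings) to objects."""

    def __init__(self) -> None:
        self._objects: Dict[str, DataObject] = {}

    def add(self, pointer: str, obj: DataObject) -> None:
        self._objects[pointer] = obj

    def resolve(self, pointer: str) -> DataObject:
        """Resolving: convert a pointer into a data object."""
        return self._objects[pointer]

    def search(self, kind: str, **criteria: str) -> List[str]:
        """Searching: select proxies (here, pointers) for objects of a
        predefined kind whose properties match the given values."""
        return [
            pointer
            for pointer, obj in self._objects.items()
            if obj.kind == kind
            and all(obj.properties.get(k) == v for k, v in criteria.items())
        ]


# Made-up example data.
store = Store()
store.add("urn:lsid:example.org:names:1",
          DataObject("TaxonName", {"genus": "Quercus", "epithet": "robur"}))
store.add("urn:lsid:example.org:names:2",
          DataObject("TaxonName", {"genus": "Quercus", "epithet": "petraea"}))
store.add("urn:lsid:example.org:specimens:1",
          DataObject("Specimen", {"genus": "Quercus"}))
```

Here store.resolve("urn:lsid:example.org:names:1") hands back one object,
while store.search("TaxonName", genus="Quercus") hands back the pointers of
every matching TaxonName, which the client can then resolve one by one.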

Both these definitions imply the existence of data 'Objects' or
'Structures' that are understood by the clients when they are received.
The kinds of objects that jump to mind are Specimens, TaxonNames,
TaxonConcepts, NaturalCollections, Collectors, Publications, People,
Expeditions, etc. A piece of client software should be able to know
what to do with an object when it gets one - how to display it to the
user, map it to a db, etc.

My two leading questions are:

   1. *Should there be commonality to all the objects?* If yes - what
      should it be? XML Schema location or OWL Class or something else?
      If no - then how should clients handle new objects dynamically -
      or shouldn't they be doing that kind of thing?
   2. *Should we have multiple ways of representing the SAME objects?*
      e.g. Should there be only one way to encode a Specimen, or should
      it be possible to have several encodings running in parallel? If
      there is only one way, how do we handle upgrades (where we have to
      run two types of encoding together during the roll out of the new
      one) AND how do we reach consensus on the 'perfect' way of
      encoding each and every object in our domain?

The answers I have for my leading questions are:

   1. Yes - We should have some commonality between objects or it will
      be really difficult to write client code - but what that
      commonality is has to be decided.
   2. Yes - The architecture has to handle multiple versions/ways of
      encoding any particular object type because any one version is not
      likely to be ideal for everyone forever.
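
To make these two answers a little more concrete, here is one possible shape
for them, sketched in Python. Everything in it is an assumption made for
illustration: the Envelope wrapper, the "v1"/"v2" encoding labels (which
stand in for whatever the commonality turns out to be, such as a schema
location or an OWL class) and the payload field names are all hypothetical.
The only point is that if every object declares its type and its encoding in
the same way, a client can dispatch on that pair and run several encodings
side by side.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class Envelope:
    """Hypothetical common wrapper: every object declares type and encoding."""
    object_type: str  # e.g. "Specimen"
    encoding: str     # e.g. "v1"; stands in for a schema location or OWL class
    payload: dict


Handler = Callable[[dict], str]
_handlers: Dict[Tuple[str, str], Handler] = {}


def register(object_type: str, encoding: str) -> Callable[[Handler], Handler]:
    """Register a handler for one (object type, encoding) pair."""
    def decorator(func: Handler) -> Handler:
        _handlers[(object_type, encoding)] = func
        return func
    return decorator


@register("Specimen", "v1")
def specimen_v1(payload: dict) -> str:
    # Older, made-up encoding of a Specimen.
    return f"Specimen {payload['catalog_number']}"


@register("Specimen", "v2")
def specimen_v2(payload: dict) -> str:
    # Newer, made-up encoding running in parallel with v1 during a roll out.
    return f"Specimen {payload['catalogNumber']} ({payload['institution']})"


def display(envelope: Envelope) -> str:
    """Dispatch on the common (type, encoding) pair; degrade gracefully
    when the client meets an object it has no handler for."""
    handler = _handlers.get((envelope.object_type, envelope.encoding))
    if handler is None:
        return f"Unrecognised {envelope.object_type} ({envelope.encoding})"
    return handler(envelope.payload)
```

A client built this way can be given a handler for a new encoding without
touching the old one, and it still has something sensible to do when it
receives an object type it has never seen.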

Are the two conclusions I come to here reasonable? Is this too high 
level and not making any sense?

I'd be grateful for your thoughts on this,

Roger

-- 

-------------------------------------
 Roger Hyam
 Technical Architect
 Taxonomic Databases Working Group
-------------------------------------
 http://www.tdwg.org
 roger@tdwg.org
 +44 1578 722782
-------------------------------------