<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

<head>

  <meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type">

</head>

<body bgcolor="#ffffff" text="#000000">

<br>

Hi Steve, <br>

<br>

Great posting. I agree just about 100%. The only point I disagree on is

whether it is possible to develop a metamodel that is expressive enough

to be useful but simple enough to be mapped into multiple languages.

I think it is possible, for two reasons:<br>

<br>

1) The metamodel does not have to define application logic; it is only

for supplying meaning. <br>

<br>

There is a split between denotation and connotation in formal logic.

The ontology is to enable people to denote that something is, for

example, a specific epithet. It does not have to define (connote) what

a specific epithet IS so that people can test things against the

ontology.<br>

<br>

The example I gave in an earlier posting to the TAG was of cardinality.

Everyone has a mother, so semantically the cardinality of the Person.hasMother property

should be 1. This is useless from an application logic point of view. No

Person instance could be exchanged without first defining another

instance to act as the person's mother. Every ontology containing

instances of Person would be invalid.&nbsp; (We could create a subclass of

Person called Eve that has a constraint on hasMother - but that is just

getting silly.) From an application perspective, hasMother should have a

cardinality of 0 or more - which is semantically nonsense because we

all have at least one mother. The Person class doesn't have to have a

mother but then we are talking about the semantics of the class not the

person... For this reason I would argue that cardinality should not be

in the metamodel. It adds nothing to the meaning of the property but is

very useful for application logic.<br>

<br>

The purpose of the network is to allow transfer of data between

heterogeneous applications which by definition have different

application logic and therefore different notions of validity.

Individual applications therefore have to have their own ontologies

that import the general shared ontology. Your application may say

hasMother has cardinality of 0 or 1 but it is not a general truism of

all applications. The social services application has 0 to many because

it handles birth mothers, adoptive mothers and foster mothers.<br>
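<br>
A minimal sketch of this split, assuming a hypothetical shared vocabulary that only names hasMother and two hypothetical applications that each add their own cardinality rule. SHARED_TERMS, CardinalityRule and both rule sets are illustrative names, not part of any TDWG ontology.<br>

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical shared vocabulary: it only says what each term denotes.
# No cardinality or validation rules live here.
SHARED_TERMS: Dict[str, str] = {
    "hasMother": "Relates a person to a mother of that person.",
}


@dataclass(frozen=True)
class CardinalityRule:
    """Application-level constraint on how many values a term may have."""
    minimum: int
    maximum: Optional[int]  # None means unbounded ("many")

    def allows(self, count: int) -> bool:
        if self.minimum > count:
            return False
        return self.maximum is None or self.maximum >= count


# Each application imports the shared terms and adds its own rules.
ZERO_OR_ONE_RULES = {"hasMother": CardinalityRule(0, 1)}
SOCIAL_SERVICES_RULES = {"hasMother": CardinalityRule(0, None)}


def is_valid(record: Dict[str, List[str]],
             rules: Dict[str, CardinalityRule]) -> bool:
    """Check a record against one application's rules.

    Terms outside the shared vocabulary are rejected; every local rule
    is checked against the number of values the record holds.
    """
    if any(term not in SHARED_TERMS for term in record):
        return False
    return all(rule.allows(len(record.get(term, [])))
               for term, rule in rules.items())


child = {"hasMother": ["birth mother", "adoptive mother", "foster mother"]}
print(is_valid(child, SOCIAL_SERVICES_RULES))  # True
print(is_valid(child, ZERO_OR_ONE_RULES))      # False
```

Both applications agree on what hasMother denotes; they disagree only on validity, which stays local to each of them.<br>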

<br>

This does not negate the need to produce shared application logic.

Herbaria may well need their own ontologies to constrain the data they

share but why should climate prediction models constrain data in

exactly the same way? A field recording application may allow the

specific epithet field to contain punctuation (such as a question mark)

but a taxonomic revision application may prevent it. If we have to

agree on whether punctuation is permitted in a specific epithet field

we will never benefit from the fact that both applications agree there

is such a thing as a specific epithet field.<br>

<br>

2) Not all mappings have to be totally expressive.<br>

<br>

If someone is going to come up with a way of tagging a span element

in HTML as being a specific epithet it would be convenient if they

used something that could be mapped back to the vocabulary used to

describe TaxonNames in LSID metadata. This does not mean that the

tagging system has to support hierarchies.<br>
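<br>
As a rough sketch of such a mapping, assuming the tags are carried as class names on span elements: the class names and the example.org URIs below are placeholders invented for illustration, not real TDWG or LSID identifiers. The parser is deliberately flat and ignores nesting, in line with a tagging system that has no hierarchies.<br>

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple

# Hypothetical mapping from class names used as tags to vocabulary terms.
# The URIs are placeholders, not real TDWG identifiers.
TAG_TO_TERM = {
    "genus": "http://example.org/TaxonName#genusPart",
    "specific-epithet": "http://example.org/TaxonName#specificEpithet",
}


class TaxonTagParser(HTMLParser):
    """Collect (vocabulary term, text) pairs from tagged span elements."""

    def __init__(self) -> None:
        super().__init__()
        self.found: List[Tuple[str, str]] = []
        self._current: Optional[str] = None  # term of the span being read

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        classes = (dict(attrs).get("class") or "").split()
        for name in classes:
            if name in TAG_TO_TERM:
                self._current = TAG_TO_TERM[name]
                break

    def handle_data(self, data):
        text = data.strip()
        if self._current is not None and text:
            self.found.append((self._current, text))

    def handle_endtag(self, tag):
        if tag == "span":
            self._current = None


def extract_terms(html: str) -> List[Tuple[str, str]]:
    """Return the tagged name parts found in an HTML fragment."""
    parser = TaxonTagParser()
    parser.feed(html)
    parser.close()
    return parser.found


page = ('<span class="genus">Bellis</span> '
        '<span class="specific-epithet">perennis</span>')
for term, text in extract_terms(page):
    print(term, text)
```

The tagging side only needs to know the class names; the link back to the richer vocabulary lives in the mapping table.<br>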

---<br>

<br>

I don't think the ontology should provide data transformation services.

I think primarily it will make it possible for providers to make data

available in multiple formats (using PyWrapper and Wasabi). Whether

this is worth the bother is up to the clients who use the data - and we

don't have enough of them to make a decision.<br>

<br>

People can use OWL or GML or TAPIR or ... but which do the <b>clients</b> actually

want to use? It would be a lot easier if we only used one. A few client

applications would certainly clarify things.<br>

<br>

If someone has an alternative approach I would certainly like to hear

it!<br>

<br>

All the best,<br>

<br>

Roger<br>

<br>

<br>

Steve Perry wrote:<br>

<blockquote cite="mid45196363.50006@ku.edu" type="cite">Hi Roger,

  <br>

  <br>

Supporting many representation formats would be really cool, but I have

doubts as to whether the benefit of such a system will outweigh the

costs. <br>

The initial goal behind modular schemata was that, if we had them, we

could build a network of data providers and consumers that could carry

any type of data (type independence).&nbsp; In essence we would build a data

network that would allow anyone to talk about anything.&nbsp; This by itself

is not an easy thing to do. <br>

Then the issue of representation language cropped up; first XML or RDF

and now different types of XML, different RDF ontology languages,

microformats, and semantic tags (why not JSON, SQL tables, serialized

Java objects, C structs, and any other representation people might

want).&nbsp; To resolve this issue without restricting representation

language requires a huge increase in the scope of work; not only type

independence, but independence of representation; building a data

network that allows anyone to talk about anything in any language.

  <br>

  <br>

</blockquote>

<blockquote cite="mid45196363.50006@ku.edu" type="cite">On the one hand

you're absolutely right that such a system, if we could build it, might

work as a bridge between different technologies.&nbsp; But I worry that it

will be a massively difficult and expensive undertaking that might not

ever work.&nbsp; I'll list a few of my concerns.

  <br>

  <br>

The first is whether or not it will support automatic translation:

  <br>

  <br>

1.) If the system does not do automatic translation between

representation languages, then it's more like a schema repository.&nbsp; In

my view, schema repositories don't help to integrate tools that use

different representation languages.&nbsp; Instead each representation

language becomes a silo.&nbsp; The schema repository helps to document what

has to be done when people need to write code that will cut across

silos for a one-time task, but it doesn't actually encourage people to

do so.

  <br>

  <br>

2.) If the system does automatic translation between representations

then it adds a layer of complexity and a large processing and transport

cost to each transaction on the network.&nbsp; Imagine that you want to do

some niche modeling.&nbsp; Assume you have some taxonomic group in mind.&nbsp;

First you'd have to find the names for this group, including synonyms.&nbsp;

Next you'd have to get specimens and observations for these names.&nbsp; So,

two large sets of transactions are necessary to acquire the data you

need.&nbsp; Each name and observation provider might be using a different

representation language.&nbsp; When you contact them you have to figure out

what representation they've given you and ship the data off to a

translation service before you can merge the results.&nbsp; This adds a

large (at best linear) cost to acquiring data.&nbsp; Additionally, someone

has to pay for the huge amount of bandwidth used by the translation

service.&nbsp; We can propose to use a local library instead of a remote

service to do the translation, but this adds a burden on the developers

of all software, requires that the library is updated often as new

types and representation languages are adopted, and requires that the

library exists or has bindings to many programming languages; in short

this is a software maintenance nightmare.

  <br>

  <br>

My second set of concerns is about the representations themselves:

  <br>

  <br>

3.) Each representation will require some effort to construct and

maintain.&nbsp; If the system will provide guidelines (rules expressed in

natural language) for how to translate each representation into other

representations, the cost (in effort, time, and money) will increase.&nbsp;

If the system will provide automatic translation, the cost will

increase further.&nbsp; However, not all representations will be used

equally.&nbsp; If there are only two people who want TCS in format X, then

is it worth the expense of providing it to them?&nbsp; Who decides whether

or not a particular representation format has enough demand to justify

the work involved in supporting it?

  <br>

  <br>

4.) If the goal is to provide guidelines or automatic services for

translation between representations of a given data type, then we have

to map X * (X-1) * Y possible translations, where X is the number of

allowed representations for a given data type and Y is the number of

data types.&nbsp; The TDWG biodiversity informatics ontology may end up with

30 classes.&nbsp; If we support 5 representations (maybe OWL, RDFS, semantic

tags, XML metadata, and GML Feature Types) that's 5 * 4 * 30 = 600

possible translation mappings to create and maintain.&nbsp; Each time we

have a new representation or a new data type we have to update the set

of translation mappings.

  <br>
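<br>
The count in point 4 can be checked with a few lines of Python. This is only a sketch: the function name is illustrative, and the formula is read as X * (X - 1) * Y, the grouping that gives the 600 quoted above, with each direction of a translation counted separately.<br>

```python
def pairwise_mappings(representations: int, data_types: int) -> int:
    """Directed translation mappings needed without a shared pivot:
    every ordered pair of distinct representations, for every data type."""
    return representations * (representations - 1) * data_types


# 30 data types, as in the example above, with a growing set of formats.
for formats in (5, 6, 7):
    print(formats, pairwise_mappings(formats, 30))
# 5 600
# 6 900
# 7 1260

# Adding one data type to the 5-format case adds 5 * 4 = 20 mappings.
print(pairwise_mappings(5, 31))  # 620
```

Adding a sixth representation raises the total by half again, while adding a data type adds a fixed 20 mappings in the five-format case.<br>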

  <br>

My final set of concerns regards knowledge representation, modeling,

and the expressive power of representation languages:

  <br>

  <br>

5.) Different representation languages have different language features

and expressive powers.&nbsp; For instance, there are things you can do with

OWL that you can't do with semantic tags.&nbsp; This is because OWL has

language features for representing inheritance, property-value

constraints, etc. that simply don't exist in the world of semantic

tagging.&nbsp; If we have to be able to represent the platonic ideal of our

data types (as defined in the TDWG ontology) in any representation

language and also have to be able to translate between representations,

we run into a dilemma.

  <br>

  <br>

If we use all the features of a particular representation language we

benefit from them when using that particular format.&nbsp; The software that

is constructed to natively consume that representation can use all of

the available language features to automate tasks on behalf of the

user.&nbsp; However, translation becomes very difficult.&nbsp; Imagine

translating OWL-style inheritance into microformats or XML-Schema data

type constraints into a system of semantic tags.&nbsp; It's simply not

possible.&nbsp; Translating between languages of differing expressive powers

can be problematic.&nbsp; The alternative approach is to use only those

language features that are common to all representation languages.&nbsp; In

practice this usually means using only those features that exist in the

most weakly-expressive language.&nbsp; If our bag of representation

languages includes both semantic tagging and OWL, then we're not really

using the power of OWL.&nbsp; In fact, if we have to use only the common

features of the two, we might as well implement our OWL ontology so

that there is only one type of class with a single property called

"tagvalue".

  <br>

  <br>

6.) Different representation languages enable different functionality

in the software that consumes them.&nbsp; For instance, client software that

consumes RDFS or OWL instances often expands searches to encompass

instances of superclasses.&nbsp; In other words, software designed to use

semantic web technologies can do some of the work a human user might

otherwise have to do by exploiting the features of semantic web

languages.&nbsp; Software designed to use semantic tags often doesn't do

much more than search and statistical correlation between tag

instances.&nbsp; This is quite powerful in its own way, but because

semantic tags were designed to indicate the context of a document, not

necessarily its contents, semantic tagging really only helps a user to

locate documents of interest.&nbsp; A document with tags is ultimately read

by a human, not a machine.&nbsp; Every representation language carries with

it assumptions about how "documents" that are instances of that

language will be used. <br>

  <br>
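The difference described in point 6 can be sketched as follows, assuming the expansion meant is the usual RDFS-style one in which a query for a class also returns instances of its subclasses. The class names and the hierarchy are hypothetical and are not taken from the TDWG ontology.<br>

```python
from typing import Dict, List, Optional, Set

# Hypothetical class hierarchy: class -> parent class (None for the root).
SUBCLASS_OF: Dict[str, Optional[str]] = {
    "TaxonName": None,
    "BotanicalName": "TaxonName",
    "ZoologicalName": "TaxonName",
}

# Hypothetical instances, each typed with (or tagged as) one class name.
INSTANCES: Dict[str, str] = {
    "Bellis perennis": "BotanicalName",
    "Parus major": "ZoologicalName",
    "Unplaced name": "TaxonName",
}


def with_subclasses(cls: str) -> Set[str]:
    """Return cls plus every class that descends from it."""
    result = {cls}
    changed = True
    while changed:
        changed = False
        for child, parent in SUBCLASS_OF.items():
            if parent in result and child not in result:
                result.add(child)
                changed = True
    return result


def hierarchy_search(cls: str) -> List[str]:
    """Search that uses the hierarchy, as an RDFS or OWL client might."""
    wanted = with_subclasses(cls)
    return sorted(n for n, typ in INSTANCES.items() if typ in wanted)


def tag_search(tag: str) -> List[str]:
    """Search that treats the class name as a flat tag: exact match only."""
    return sorted(n for n, typ in INSTANCES.items() if typ == tag)


print(hierarchy_search("TaxonName"))
# ['Bellis perennis', 'Parus major', 'Unplaced name']
print(tag_search("TaxonName"))
# ['Unplaced name']
```

The data are identical in both cases; only the hierarchy-aware client can use the subclass information without help from the user.<br>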

  <br>

To navigate you need a fixed point.&nbsp; To move the world you need a

fulcrum.&nbsp; Because representation languages provide different features

and make different assumptions about how their instances will be used,

it makes sense to use representation language as the fixed point of our

designs and leave data types and service interfaces free to vary.&nbsp; Some

have argued that the TDWG ontology is the fixed point in our

constellation of services, but I disagree.&nbsp; It is the umbrella under

which data integration will occur; there will always be extensions to

the core ontology and it too will change over time as it is expanded.

  <br>

  <br>

Overall I think it's a laudable goal to support as many representation

languages as possible, but there are so many headaches and compromises

involved that we may end up with an expensive solution that, because it

only supports the lowest common denominator of functionality, doesn't

really work right for anybody.&nbsp; A case in point is the current

discussion of namespaces.&nbsp; In order to make namespaces work across the

widest range of representation languages, it's been proposed that they

can no longer be used as packages to logically partition the larger

ontology.&nbsp; This makes it harder to manage extensions to the ontology

and makes it likely that we'll end up using

veryLongClassAndPropertyNamesToTryToAvoidNamespaceClashes.&nbsp; And you

still can't represent namespaces in semantic tags.

  <br>

  <br>

It's hard enough to write software that can cope with any data type and

I'd rather spend energy, time, and money on getting it right with only

one or two feature-rich representations.&nbsp; What I'd really like to see

is a network of heterogeneously typed, highly integrated data objects

and a rich set of services that operate on them.&nbsp; Once this is built,

the real fun can begin, creating software that uses these data to

answer important scientific questions.

  <br>

  <br>

-Steve

  <br>

  <br>

  <br>

  <br>

  <br>

  <br>

Roger Hyam wrote:

  <br>

  <blockquote type="cite"><br>

Thanks for forwarding this Sally.

    <br>

    <br>

What I am proposing at St Louis - though I seem to have been having to

propose it long before - is that we have an application for managing

the ontology that will expose the underlying semantics in multiple

'formats', i.e. as RDFS or OWL ontologies, as GML application schemas, as

custom XML Schemas, as OBO ontologies, etc. I see no other way of

integrating multiple technologies. (Suggested alternatives welcome).

    <br>

    <br>

One of the things on my list is microformats along with tagging. It

seems crazy to define a 'specificEpithet' in a TDWG ontology and then

not use exactly the same concept in a microformat or as a tag.

    <br>

    <br>

So this is timely. I just can't act on it very well before St Louis.

I'll add something to the wiki page to flag my/our interest.

    <br>

    <br>

Thanks,

    <br>

    <br>

Roger

    <br>

    <br>

    <br>

Sally Hinchcliffe wrote:

    <br>

    <blockquote type="cite">Hi all

      <br>

      <br>

This is probably on the wrong list (Maybe TAG?) but it strikes me that

what this guy needs is an ontology that he can use in his microformats

...

      <br>

      <br>

Possibly an example of a real world need for ontologies ?

      <br>

      <br>

Sally

      <br>

      <br>

------- Forwarded message follows -------

      <br>

Date sent:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Tue, 26 Sep 2006 09:34:04 -0000

      <br>

To:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <a class="moz-txt-link-rfc2396E" href="mailto:sh00kg@rbgkew.org.uk">&lt;sh00kg@rbgkew.org.uk&gt;</a>

      <br>

Subject:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Fwd: [TDWG] Announce: Proposal for "microformat"

for marking-up taxonomic names in HTML: comments and contributions

sought

      <br>

From:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <a class="moz-txt-link-rfc2396E" href="mailto:M.Jackson@kew.org">&lt;M.Jackson@kew.org&gt;</a>

      <br>

Send reply to:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <a class="moz-txt-link-abbreviated" href="mailto:M.Jackson@rbgkew.org.uk">M.Jackson@rbgkew.org.uk</a>

      <br>

      <br>

Sally,

      <br>

      <br>

Do you think you might respond to this? Just curious what you think.

      <br>

      <br>

Mark

      <br>

----

      <br>

Forwarded From: Andy Mabbett <a class="moz-txt-link-rfc2396E" href="mailto:andy@pigsonthewing.org.uk">&lt;andy@pigsonthewing.org.uk&gt;</a>

      <br>

      <br>

&nbsp;

      <br>

      <blockquote type="cite">Hello - my first post to this mailing

list.

        <br>

        <br>

I'm not a taxonomist, but I've been told by one that you might be

        <br>

interested in recent proposals for a formula (a "microformat"

        <br>

<a class="moz-txt-link-rfc2396E" href="http://microformats.org">&lt;http://microformats.org&gt;</a>) for marking-up, in HTML, the names of

species

        <br>

(and other ranks, varieties, hybrids, etc.).

        <br>

        <br>

Microformats are a way of adding additional, simple markup to

        <br>

human-readable data items on web pages, using common and open HTML

        <br>

standards, so that the information can be extracted by software and

        <br>

indexed, searched for, saved, cross-referenced or aggregated.

        <br>

Microformats are also open standards, freely available for anyone to

        <br>

use.

        <br>

        <br>

The proposed format respects all existing biological taxonomies, and is

        <br>

not intended to change or supplant any of them - it merely provides

        <br>

webmasters with a method of either:

        <br>

        <br>

&nbsp;&nbsp; 1)&nbsp;&nbsp; marking-up a taxonomical name (or taxon-common name pair) in

        <br>

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; such a way that its components can be recognised by computers

        <br>

        <br>

or

        <br>

        <br>

&nbsp;&nbsp; 2)&nbsp;&nbsp; marking up a common name, so as to associate with it a

        <br>

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; taxonomical name, in such a way that the latter's components

can

        <br>

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; be recognised by computers

        <br>

        <br>

For instance, if I mark up a list of common names on a page I maintain:

        <br>

        <br>

&nbsp;&nbsp;

<a class="moz-txt-link-rfc2396E" href="http://www.westmidlandbirdclub.com/staffs/tittesworth/latest.htm">&lt;http://www.westmidlandbirdclub.com/staffs/tittesworth/latest.htm&gt;</a>

        <br>

        <br>

using that microformat, a visitor might have a browser tool which lists

        <br>

all the species on the page, sorted into alphabetical order within

        <br>

taxonomic class, or in taxonomic order, and then creates links to, say

        <br>

(for Joe Public) their entries in Wikipedia, or the British Trust for

        <br>

Ornithology, or (for scientists) some academic database of the user's

        <br>

choosing.

        <br>

        <br>

Early thoughts on the format are on an editable "wiki", here:

        <br>

        <br>

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <a class="moz-txt-link-rfc2396E" href="http://microformats.org/wiki/species">&lt;http://microformats.org/wiki/species&gt;</a>

        <br>

        <br>

Please feel free to participate - the proposal needs both messages of

        <br>

support (particularly from people or organisations who have websites on

        <br>

which they might use them) and, especially, comments and constructive

        <br>

criticisms - does the proposal understand and use taxonomy correctly;

is

        <br>

the terminology right, are there any omissions or overlooked, unusual

        <br>

naming conventions?

        <br>

        <br>

You can use the above wiki, or the microformats mailing list:

        <br>

        <br>

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <a class="moz-txt-link-rfc2396E" href="http://microformats.org/wiki/mailing-lists">&lt;http://microformats.org/wiki/mailing-lists&gt;</a>

        <br>

        <br>

and/ or please feel free to pass this e-mail to other interested

        <br>

parties.

        <br>

        <br>

Thank you.

        <br>

        <br>

--&nbsp;<br>

Andy Mabbett

        <br>

Birmingham, England

        <br>

        <br>

_______________________________________________

        <br>

TDWG mailing list

        <br>

<a class="moz-txt-link-abbreviated" href="mailto:TDWG@mailman.nhm.ku.edu">TDWG@mailman.nhm.ku.edu</a>

        <br>

<a class="moz-txt-link-freetext" href="http://mailman.nhm.ku.edu/mailman/listinfo/tdwg">http://mailman.nhm.ku.edu/mailman/listinfo/tdwg</a>

        <br>

        <br>

&nbsp;&nbsp;&nbsp; </blockquote>

      <br>

      <br>

      <br>

&nbsp; </blockquote>

    <br>

    <br>

------------------------------------------------------------------------

    <br>

    <br>

_______________________________________________

    <br>

TDWG-GUID mailing list

    <br>

<a class="moz-txt-link-abbreviated" href="mailto:TDWG-GUID@mailman.nhm.ku.edu">TDWG-GUID@mailman.nhm.ku.edu</a>

    <br>

<a class="moz-txt-link-freetext" href="http://mailman.nhm.ku.edu/mailman/listinfo/tdwg-guid">http://mailman.nhm.ku.edu/mailman/listinfo/tdwg-guid</a>

    <br>

&nbsp; </blockquote>

  <br>

  <br>

</blockquote>

<br>

<br>

<pre class="moz-signature" cols="72">-- 

-------------------------------------

 Roger Hyam

 Technical Architect

 Taxonomic Databases Working Group

-------------------------------------

 <a class="moz-txt-link-freetext" href="http://www.tdwg.org">http://www.tdwg.org</a>

 <a class="moz-txt-link-abbreviated" href="mailto:roger@tdwg.org">roger@tdwg.org</a>

 +44 1578 722782

-------------------------------------

</pre>

</body>

</html>