[Tdwg-guid] (Fwd) Fwd: [TDWG] Announce: Proposal for "microformat" for mar
S.Hinchcliffe at kew.org
Thu Sep 28 10:05:57 CEST 2006
Has anyone asked/responded to Andy Mabbett (the guy who proposed the
microformat in the first place)?
He might have an opinion
> Hi Steve,
> Great posting. I agree just about 100%. The only point I disagree on is
> whether it is possible to develop a metamodel that is expressive enough
> to be useful but simple enough to be mapped into multiple languages.
> This is for two reasons:
> 1) The metamodel does not have to define application logic; it is only
> for supplying meaning.
> There is a split between denotation and connotation in formal logic. The
> ontology is to enable people to denote that something is, for example, a
> specific epithet. It does not have to define (connote) what a specific
> epithet IS so that people can test things against the ontology.
> The example I gave in an earlier posting to the TAG was of cardinality.
> Everyone has a mother so the semantic cardinality of Person.hasMother
> should be 1. This is useless from an application logic point of view. No
> Person instance could be exchanged without first defining another
> instance to act as the person's mother. Every ontology containing
> instances of Person would be invalid. (We could create a subclass of
> Person called Eve that has a constraint on hasMother - but that is just
> getting silly) From an application perspective hasMother should have a
> cardinality of 0 or more - which is semantically nonsense because we all
> have at least one mother. The Person class doesn't have to have a mother
> but then we are talking about the semantics of the class not the
> person... For this reason I would argue that cardinality should not be
> in the metamodel. It adds nothing to the meaning of the property but is
> very useful for application logic.
> The purpose of the network is to allow transfer of data between
> heterogeneous applications which by definition have different
> application logic and therefore different notions of validity.
> Individual applications therefore have to have their own ontologies that
> import the general shared ontology. Your application may say hasMother
> has cardinality of 0 or 1 but it is not a general truism of all
> applications. The social services application has 0 to many because it
> handles birth mothers, adoptive mothers and foster mothers.
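The cardinality argument above can be sketched in a few lines of Python. This is only an illustration under assumptions: the term name `hasMother` comes from the example in the message, but the `Cardinality` class, the two application rule sets and the `is_valid` helper are hypothetical names invented here. The shared vocabulary only names the property; each application layers its own notion of validity on top.

```python
from dataclasses import dataclass
from typing import Optional

# Shared vocabulary: it only names the property and carries no cardinality.
HAS_MOTHER = "hasMother"


@dataclass(frozen=True)
class Cardinality:
    """Hypothetical application-level limit on the number of values."""
    minimum: int
    maximum: Optional[int]  # None means "no upper bound"

    def allows(self, count: int) -> bool:
        if count < self.minimum:
            return False
        return self.maximum is None or count <= self.maximum


# Two hypothetical applications constrain the same shared term differently.
GENEALOGY_APP = {HAS_MOTHER: Cardinality(0, 1)}
SOCIAL_SERVICES_APP = {HAS_MOTHER: Cardinality(0, None)}


def is_valid(record, rules):
    """Check a record against one application's rules only."""
    return all(
        rule.allows(len(record.get(prop, [])))
        for prop, rule in rules.items()
    )


record = {HAS_MOTHER: ["birth mother", "adoptive mother"]}
print(is_valid(record, GENEALOGY_APP))        # False
print(is_valid(record, SOCIAL_SERVICES_APP))  # True
```

Both applications can still exchange the record, because they agree on what `hasMother` denotes even though they disagree on whether this particular record is valid.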
> This does not negate the need to produce shared application logic.
> Herbaria may well need their own ontologies to constrain the data they
> share but why should climate prediction models constrain data in exactly
> the same way? A field recording application may allow the specific
> epithet field to contain punctuation (such as a question mark) but a
> taxonomic revision application may prevent it. If we have to agree on
> whether punctuation is permitted in a specific epithet field we will
> never benefit from the fact that both applications agree there is such a
> thing as a specific epithet field.
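The punctuation example can be sketched the same way. The two validator functions and their regular expressions below are assumptions made for illustration, not rules taken from any TDWG standard; the point is only that both applications key their data on the same shared term while applying different checks.

```python
import re

# The shared term both applications agree exists.
SPECIFIC_EPITHET = "specificEpithet"


def field_recording_ok(value: str) -> bool:
    """Hypothetical lenient rule: a trailing question mark is allowed."""
    return re.fullmatch(r"[a-z-]+\??", value) is not None


def taxonomic_revision_ok(value: str) -> bool:
    """Hypothetical strict rule: letters and hyphens only."""
    return re.fullmatch(r"[a-z-]+", value) is not None


record = {SPECIFIC_EPITHET: "alba?"}
print(field_recording_ok(record[SPECIFIC_EPITHET]))     # True
print(taxonomic_revision_ok(record[SPECIFIC_EPITHET]))  # False
```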
> 2) Not all mappings have to be totally expressive.
> If someone is going to come up with a way of tagging a span element in
> HTML as being a specific epithet it would be convenient if they used
> something that could be mapped back to the vocabulary used to describe
> TaxonNames in LSID metadata. This does not mean that the tagging system
> has to support hierarchies.
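A rough sketch of what such a mapping might look like, using only Python's standard `html.parser`. The class names (`genus`, `specific-epithet`) and the `example.org` URIs are placeholders invented for this illustration; they are not the names used by the microformat proposal or by any LSID vocabulary. Note that the tagging is flat: nothing here needs hierarchies.

```python
from html.parser import HTMLParser

# Hypothetical mapping from HTML class names to vocabulary terms.
CLASS_TO_TERM = {
    "genus": "http://example.org/TaxonName#genusPart",
    "specific-epithet": "http://example.org/TaxonName#specificEpithet",
}


class TaggedSpanParser(HTMLParser):
    """Collect (term, text) pairs from <span> elements with known classes."""

    def __init__(self):
        super().__init__()
        self._open_terms = []  # one entry (term or None) per open <span>
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        classes = (dict(attrs).get("class") or "").split()
        term = next(
            (CLASS_TO_TERM[c] for c in classes if c in CLASS_TO_TERM), None
        )
        self._open_terms.append(term)

    def handle_endtag(self, tag):
        if tag == "span" and self._open_terms:
            self._open_terms.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and self._open_terms and self._open_terms[-1]:
            self.found.append((self._open_terms[-1], text))


def extract_terms(html: str):
    parser = TaggedSpanParser()
    parser.feed(html)
    parser.close()
    return parser.found


html = (
    '<p>Seen today: <span class="genus">Parus</span> '
    '<span class="specific-epithet">major</span>.</p>'
)
for term, text in extract_terms(html):
    print(term, text)
```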
> I don't think the ontology should provide data transformation services.
> I think primarily it will make it possible for providers to make data
> available in multiple formats (using PyWrapper and Wasabi). Whether this
> is worth the bother is up to the clients who use the data - and we don't
> have enough of them to make a decision.
> People can use OWL or GML or TAPIR or ... but which do the *clients*
> actually want to use? It would be a lot easier if we only used one. A
> few client applications would certainly clarify things.
> If someone has an alternative approach I would certainly like to hear it!
> All the best,
> Steve Perry wrote:
> > Hi Roger,
> > Supporting many representation formats would be really cool, but I
> > have doubts as to whether the benefit of such a system will outweigh
> > the costs.
> > The initial goal behind modular schemata was that, if we had them, we
> > could build a network of data providers and consumers that could carry
> > any type of data (type independence). In essence we would build a
> > data network that would allow anyone to talk about anything. This by
> > itself is not an easy thing to do.
> > Then the issue of representation language cropped up; first XML or RDF
> > and now different types of XML, different RDF ontology languages,
> > microformats, and semantic tags (why not JSON, SQL tables, serialized
> > Java objects, C structs, and any other representation people might
> > want). To resolve this issue without restricting representation
> > language requires a huge increase in the scope of work; not only type
> > independence, but independence of representation; building a data
> > network that allows anyone to talk about anything in any language.
> > On the one hand you're absolutely right that such a system, if we
> > could build it, might work as a bridge between different
> > technologies. But I worry that it will be a massively difficult and
> > expensive undertaking that might not ever work. I'll list a few of my
> > concerns.
> > The first is whether or not it will support automatic translation:
> > 1.) If the system does not do automatic translation between
> > representation languages, then it's more like a schema repository. In
> > my view, schema repositories don't help to integrate tools that use
> > different representation languages. Instead each representation
> > language becomes a silo. The schema repository helps to document what
> > has to be done when people need to write code that will cut across
> > silos for a one-time task, but it doesn't actually encourage people to
> > do so.
> > 2.) If the system does automatic translation between representations
> > then it adds a layer of complexity and a large processing and
> > transport cost to each transaction on the network. Imagine that you
> > want to do some niche modeling. Assume you have some taxonomic group
> > in mind. First you'd have to find the names for this group, including
> > synonyms. Next you'd have to get specimens and observations for these
> > names. So, two large sets of transactions are necessary to acquire
> > the data you need. Each name and observation provider might be using
> > a different representation language. When you contact them you have
> > to figure out what representation they've given you and ship the data
> > off to a translation service before you can merge the results. This
> > adds a large (at best linear) cost to acquiring data. Additionally,
> > someone has to pay for the huge amount of bandwidth used by the
> > translation service. We can propose to use a local library instead of
> > a remote service to do the translation, but this adds a burden on the
> > developers of all software, requires that the library is updated often
> > as new types and representation languages are adopted, and requires
> > that the library exists or has bindings to many programming languages;
> > in short this is a software maintenance nightmare.
> > My second set of concerns are about the representations themselves:
> > 3.) Each representation will require some effort to construct and
> > maintain. If the system will provide guidelines (rules expressed in
> > natural language) for how to translate each representation into other
> > representations, the cost (in effort, time, and money) will increase.
> > If the system will provide automatic translation, the cost will
> > increase further. However, not all representations will be used
> > equally. If there are only two people who want TCS in format X, then
> > is it worth the expense of providing it to them? Who decides whether
> > or not a particular representation format has enough demand to justify
> > the work involved in supporting it?
> > 4.) If the goal is to provide guidelines or automatic services for
> > translation between representations of a given data type, then we have
> > to map X * (X-1) * Y possible translations where X is the number of
> > allowed representations for a given data type and Y is the number of
> > data types. The TDWG biodiversity informatics ontology may end up
> > with 30 classes. If we support 5 representations (maybe OWL, RDFS,
> > semantic tags, XML metadata, and GML Feature Types) that's 5 * 4 * 30
> > = 600 possible translation mappings to create and maintain. Each time
> > we have a new representation or a new data type we have to update the
> > set of translation mappings.
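The arithmetic in point 4 can be checked with a short function. The function name is made up for this sketch; the formula is the one given in the message, counting one directed mapping for every ordered pair of distinct representations, per data type.

```python
def translation_mappings(representations: int, data_types: int) -> int:
    """Directed pairwise mappings: X * (X - 1) per data type, times Y types."""
    return representations * (representations - 1) * data_types


print(translation_mappings(5, 30))  # 600
print(translation_mappings(6, 30))  # 900
```

Adding a sixth representation to the same 30 classes raises the count from 600 to 900, which is the maintenance growth the message is worried about.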
> > My final set of concerns regards knowledge representation, modeling,
> > and the expressive power of representation languages:
> > 5.) Different representation languages have different language
> > features and expressive powers. For instance, there are things you
> > can do with OWL that you can't do with semantic tags. This is because
> > OWL has language features for representing inheritance, property-value
> > constraints, etc. that simply don't exist in the world of semantic
> > tagging. If we have to be able to represent the platonic ideal of our
> > data types (as defined in the TDWG ontology) in any representation
> > language and also have to be able to translate between
> > representations, we run into a dilemma.
> > If we use all the features of a particular representation language we
> > benefit from them when using that particular format. The software
> > that is constructed to natively consume that representation can use
> > all of the available language features to automate tasks on behalf of
> > the user. However, translation becomes very difficult. Imagine
> > translating OWL-style inheritance into microformats or XML-Schema data
> > type constraints into a system of semantic tags. It's simply not
> > possible. Translating between languages of differing expressive
> > powers can be problematic. The alternative approach is to use only
> > those language features that are common to all representation
> > languages. In practice this usually means using only those features
> > that exist in the most weakly-expressive language. If our bag of
> > representation languages includes both semantic tagging and OWL, then
> > we're not really using the power of OWL. In fact, if we have to use
> > only the common features of the two, we might as well implement our
> > OWL ontology so that there is only one type of class with a single
> > property called "tagvalue".
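The lowest-common-denominator effect in point 5 amounts to a set intersection. The feature inventory below is a deliberately crude assumption for illustration: the OWL entries are the ones named in the message, while "named terms" is a label invented here for the one thing both languages can express.

```python
# Illustrative, incomplete feature inventory (an assumption, not a survey).
FEATURES = {
    "OWL": {"named terms", "inheritance", "property-value constraints"},
    "semantic tags": {"named terms"},
}


def common_features(languages):
    """Features usable if every listed language must carry the same data."""
    return set.intersection(*(FEATURES[name] for name in languages))


print(common_features(["OWL"]))
print(common_features(["OWL", "semantic tags"]))  # {'named terms'}
```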
> > 6.) Different representation languages enable different functionality
> > in the software that consumes them. For instance, client software
> > that consumes RDFS or OWL instances often expands searches to encompass
> > instances of superclasses. In other words, software designed to use
> > semantic web technologies can do some of the work a human user might
> > otherwise have to do by exploiting the features of semantic web
> > languages. Software designed to use semantic tags often doesn't do
> > much more than search and statistical correlation between tag
> > instances. This is quite powerful in its own way, but because
> > semantic tags were designed to indicate the context of a document, not
> > necessarily its contents, semantic tagging really only helps a user to
> > locate documents of interest. A document with tags is ultimately read
> > by a human, not a machine. Every representation language carries with
> > it assumptions about how "documents" that are instances of that
> > language will be used.
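One way to picture the difference in point 6 is a search that follows the class hierarchy versus one that matches labels exactly. The sketch below takes the usual RDFS reading, where a query for a class also returns instances typed with any of its subclasses. The class names, the instance identifiers and both helper functions are hypothetical.

```python
# Hypothetical class hierarchy: child -> parent.
SUBCLASS_OF = {
    "Specimen": "Occurrence",
    "Observation": "Occurrence",
}

# Hypothetical instances as (identifier, asserted class).
INSTANCES = [
    ("specimen-1", "Specimen"),
    ("observation-1", "Observation"),
    ("name-1", "TaxonName"),
]


def is_a(cls, target):
    """Walk up the hierarchy from cls looking for target."""
    while cls is not None:
        if cls == target:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


def search(target, hierarchy_aware):
    """Return identifiers matching target, with or without inheritance."""
    if hierarchy_aware:
        return [ident for ident, cls in INSTANCES if is_a(cls, target)]
    return [ident for ident, cls in INSTANCES if cls == target]


print(search("Occurrence", hierarchy_aware=True))   # ['specimen-1', 'observation-1']
print(search("Occurrence", hierarchy_aware=False))  # []
```

The exact-match branch is roughly what a flat tag search can offer; the hierarchy-aware branch is the extra work the software can do when the representation language carries inheritance.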
> > To navigate you need a fixed point. To move the world you need a
> > fulcrum. Because representation languages provide different features
> > and make different assumptions about how their instances will be used,
> > it makes sense to use representation language as the fixed point of
> > our designs and leave data types and service interfaces free to vary.
> > Some have argued that the TDWG ontology is the fixed point in our
> > constellation of services, but I disagree. It is the umbrella under
> > which data integration will occur; there will always be extensions to
> > the core ontology and it too will change over time as it is expanded.
> > Overall I think it's a laudable goal to support as many representation
> > languages as possible, but there are so many headaches and compromises
> > involved that we may end up with an expensive solution that, because
> > it only supports the lowest common denominator of functionality,
> > doesn't really work right for anybody. A case in point is the current
> > discussion of namespaces. In order to make namespaces work across the
> > widest range of representation languages, it's been proposed that they
> > can no longer be used as packages to logically partition the larger
> > ontology. This makes it harder to manage extensions to the ontology
> > and makes it likely that we'll end up using
> > veryLongClassAndPropertyNamesToTryToAvoidNamespaceClashes. And you
> > still can't represent namespaces in semantic tags.
> > It's hard enough to write software that can cope with any data type
> > and I'd rather spend energy, time, and money on getting it right with
> > only one or two feature-rich representations. What I'd really like to
> > see is a network of heterogeneously typed, highly integrated data
> > objects and a rich set of services that operate on them. Once this is
> > built, the real fun can begin, creating software that uses these data
> > to answer important scientific questions.
> > -Steve
> > Roger Hyam wrote:
> >> Thanks for forwarding this Sally.
> >> What I am proposing at St Louis - though I seem to have been having to
> >> propose it long before - is that we have an application for managing
> >> the ontology that will expose the underlying semantics in multiple
> >> 'formats' i.e. as RDFS or OWL ontologies, as GML application schemas,
> >> as custom XML Schemas, as OBO ontologies etc etc. I see no other way
> >> of integrating multiple technologies. (Suggested alternatives welcome).
> >> One of the things on my list is micro formats along with tagging. It
> >> seems crazy to define a 'specificEpithet' in a TDWG ontology and then
> >> not use exactly the same concept in a micro format or as a tag.
> >> So this is timely. I just can't act on it very well before St Louis.
> >> I'll add something to the wiki page to flag my/our interest.
> >> Thanks,
> >> Roger
> >> Sally Hinchcliffe wrote:
> >>> Hi all
> >>> This is probably on the wrong list (Maybe TAG?) but it strikes me
> >>> that what this guy needs is an ontology that he can use in his
> >>> microformats ...
> >>> Possibly an example of a real-world need for ontologies?
> >>> Sally
> >>> ------- Forwarded message follows -------
> >>> Date sent: Tue, 26 Sep 2006 09:34:04 -0000
> >>> To: <sh00kg at rbgkew.org.uk>
> >>> Subject: Fwd: [TDWG] Announce: Proposal for "microformat"
> >>> for marking-up taxonomic names in HTML: comments and contributions
> >>> sought
> >>> From: <M.Jackson at kew.org>
> >>> Send reply to: M.Jackson at rbgkew.org.uk
> >>> Sally,
> >>> Do you think you might respond to this? Just curious what you think.
> >>> Mark
> >>> ----
> >>> Forwarded From: Andy Mabbett <andy at pigsonthewing.org.uk>
> >>>> Hello - my first post to this mailing list.
> >>>> I'm not a taxonomist, but I've been told by one that you might be
> >>>> interested in recent proposals for a formula (a "microformat"
> >>>> <http://microformats.org>) for marking-up, in HTML, the names of
> >>>> species (and other ranks, varieties, hybrids, etc.).
> >>>> Microformats are a way of adding additional, simple markup to
> >>>> human-readable data items on web pages, using common and open HTML
> >>>> standards, so that the information can be extracted by software and
> >>>> indexed, searched for, saved, cross-referenced or aggregated.
> >>>> Microformats are also open standards, freely available for anyone to
> >>>> use.
> >>>> The proposed format respects all existing biological taxonomies, and is
> >>>> not intended to change or supplant any of them - it merely provides
> >>>> webmasters with a method of either:
> >>>> 1) marking-up a taxonomical name (or taxon-common name pair) in
> >>>> such a way that its components can be recognised by computers
> >>>> or
> >>>> 2) marking up a common name, so as to associate with it a
> >>>> taxonomical name, in such a way that the latter's components can
> >>>> be recognised by computers
> >>>> For instance, if I mark up a list of common names on a page I maintain:
> >>>> <http://www.westmidlandbirdclub.com/staffs/tittesworth/latest.htm>
> >>>> using that microformat, a visitor might have a browser tool which lists
> >>>> all the species on the page, sorted into alphabetical order within
> >>>> taxonomic class, or in taxonomic order, and then creates links to, say
> >>>> (for Joe Public) their entries in Wikipedia, or the British Trust for
> >>>> Ornithology, or (for scientists) some academic database of the user's
> >>>> choosing.
> >>>> Early thoughts on the format are on an editable "wiki", here:
> >>>> <http://microformats.org/wiki/species>
> >>>> Please feel free to participate - the proposal needs both messages of
> >>>> support (particularly from people or organisations who have websites on
> >>>> which they might use them) and, especially, comments and constructive
> >>>> criticisms - does the proposal understand and use taxonomy correctly; is
> >>>> the terminology right, are there any omissions or overlooked, unusual
> >>>> naming conventions?
> >>>> You can use the above wiki, or the microformats mailing list:
> >>>> <http://microformats.org/wiki/mailing-lists>
> >>>> and/or please feel free to pass this e-mail to other interested parties.
> >>>> Thank you.
> >>>> --
> >>>> Andy Mabbett
> >>>> Birmingham, England
> Roger Hyam
> Technical Architect
> Taxonomic Databases Working Group
> roger at tdwg.org
> +44 1578 722782
*** Sally Hinchcliffe
*** Computer section, Royal Botanic Gardens, Kew
*** tel: +44 (0)20 8332 5708
*** S.Hinchcliffe at rbgkew.org.uk