[Tdwg-guid] (Fwd) Fwd: [TDWG] Announce: Proposal for "microformat" for mar

Sally Hinchcliffe S.Hinchcliffe at kew.org
Thu Sep 28 10:05:57 CEST 2006


Has anyone asked/responded to Andy Mabbett (the guy who proposed the 
microformat in the first place)? 
He might have an opinion.
Sally

> 
> Hi Steve,
> 
> Great posting. I agree just about 100%. The only point I disagree on is 
> whether it is possible to develop a metamodel that is expressive enough 
> to be useful but simple enough to be mapped into multiple languages. 
> This is for two reasons:
> 
> 1) The metamodel does not have to define application logic; it is only 
> for supplying meaning.
> 
> There is a split between denotation and connotation in formal logic. The 
> ontology is to enable people to denote that something is, for example, a 
> specific epithet. It does not have to define (connote) what a specific 
> epithet IS so that people can test things against the ontology.
> 
> The example I gave in an earlier posting to the TAG was of cardinality. 
> Everyone has a mother, so semantically the cardinality of the 
> Person.hasMother property should be 1. This is useless from an 
> application logic point of view. No 
> Person instance could be exchanged without first defining another 
> instance to act as the person's mother. Every ontology containing 
> instances of Person would be invalid.  (We could create a subclass of 
> Person called Eve that has a constraint on hasMother - but that is just 
> getting silly) From an application perspective hasMother should have a 
> cardinality of 0 or more - which is semantically nonsense because we all 
> have at least one mother. The Person class doesn't have to have a mother 
> but then we are talking about the semantics of the class not the 
> person... For this reason I would argue that cardinality should not be 
> in the metamodel. It adds nothing to the meaning of the property but is 
> very useful for application logic.
> 
> The purpose of the network is to allow transfer of data between 
> heterogeneous applications which by definition have different 
> application logic and therefore different notions of validity. 
> Individual applications therefore have to have their own ontologies that 
> import the general shared ontology. Your application may say hasMother 
> has cardinality of 0 or 1 but it is not a general truism of all 
> applications. The social services application has 0 to many because it 
> handles birth mothers, adoptive mothers and foster mothers.
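
A minimal sketch of the cardinality point above, in Python. The shared vocabulary only names the hasMother property, and each application layers its own cardinality rule on top. The names SHARED_PROPERTIES, CARDINALITY and is_valid, and the two application labels, are hypothetical and not part of any TDWG ontology.

```python
# Shared vocabulary: names the property, says nothing about cardinality.
SHARED_PROPERTIES = {"hasMother"}

# Hypothetical application-specific constraints: (min, max), None = unbounded.
CARDINALITY = {
    "genealogy": {"hasMother": (0, 1)},
    "social_services": {"hasMother": (0, None)},
}


def is_valid(app, record):
    """Check a record against one application's cardinality rules."""
    for prop, values in record.items():
        if prop not in SHARED_PROPERTIES:
            return False  # property unknown to the shared vocabulary
        low, high = CARDINALITY[app].get(prop, (0, None))
        if len(values) < low or (high is not None and len(values) > high):
            return False
    return True


record = {"hasMother": ["birth mother", "adoptive mother"]}
print(is_valid("genealogy", record))        # False
print(is_valid("social_services", record))  # True
```

Both applications understand what hasMother denotes; they only disagree about which records they accept.
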
> 
> This does not negate the need to produce shared application logic. 
> Herbaria may well need their own ontologies to constrain the data they 
> share but why should climate prediction models constrain data in exactly 
> the same way? A field recording application may allow the specific 
> epithet field to contain punctuation (such as a question mark) but a 
> taxonomic revision application may prevent it. If we have to agree on 
> whether punctuation is permitted in a specific epithet field we will 
> never benefit from the fact that both applications agree there is such a 
> thing as a specific epithet field.
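
As a rough illustration of the punctuation example above: two applications can share the specificEpithet field name while applying different validity rules to its value. The function names and the regular expressions are assumptions made for this sketch, not rules taken from any nomenclatural code.

```python
import re


def field_recording_accepts(epithet):
    # Hypothetical rule: field notes may flag doubt with a trailing "?".
    return re.fullmatch(r"[a-z-]+\??", epithet) is not None


def taxonomic_revision_accepts(epithet):
    # Hypothetical rule: a revision tool allows letters and hyphens only.
    return re.fullmatch(r"[a-z-]+", epithet) is not None


record = {"specificEpithet": "alba?"}
value = record["specificEpithet"]
print(field_recording_accepts(value))     # True
print(taxonomic_revision_accepts(value))  # False
```
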
> 
> 2) Not all mappings have to be totally expressive.
> 
> If someone is going to come up with a way of tagging a span element in  
> HTML as being a specific epithet it would be convenient if they used 
> something that could be mapped back to the vocabulary used to describe 
> TaxonNames in LSID metadata. This does not mean that the tagging system 
> has to support hierarchies.
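
A sketch of what such a mapping could look like, using Python's standard html.parser module. The class names ("genus", "specific-epithet") and the example.org term URIs are placeholders invented for illustration; they are not taken from the microformat proposal or from the LSID metadata vocabulary. The tagging stays flat: nested spans and hierarchies are deliberately not handled.

```python
from html.parser import HTMLParser

# Hypothetical mapping from HTML class names to vocabulary terms.
CLASS_TO_TERM = {
    "genus": "http://example.org/TaxonName#genusPart",
    "specific-epithet": "http://example.org/TaxonName#specificEpithet",
}


class TaxonSpanParser(HTMLParser):
    """Collect (vocabulary term, text) pairs from tagged span elements."""

    def __init__(self):
        super().__init__()
        self.current = None  # term for the span currently open, if mapped
        self.found = []      # (term, text) pairs in document order

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            self.current = CLASS_TO_TERM.get(dict(attrs).get("class"))

    def handle_data(self, data):
        if self.current:
            self.found.append((self.current, data.strip()))

    def handle_endtag(self, tag):
        if tag == "span":
            self.current = None


parser = TaxonSpanParser()
parser.feed('<i><span class="genus">Bellis</span> '
            '<span class="specific-epithet">perennis</span></i>')
for term, text in parser.found:
    print(term, text)
```
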
> ---
> 
> I don't think the ontology should provide data transformation services. 
> I think primarily it will make it possible for providers to make data 
> available in multiple formats (using PyWrapper and Wasabi). Whether this 
> is worth the bother is up to the clients who use the data - and we don't 
> have enough of them to make a decision.
> 
> People can use OWL or GML or TAPIR or ... but which do the *clients* 
> actually want to use? It would be a lot easier if we only used one. A 
> few client applications would certainly clarify things.
> 
> If someone has an alternative approach I would certainly like to hear it!
> 
> All the best,
> 
> Roger
> 
> 
> Steve Perry wrote:
> > Hi Roger,
> >
> > Supporting many representation formats would be really cool, but I 
> > have doubts as to whether the benefit of such a system will outweigh 
> > the costs.
> > The initial goal behind modular schemata was that, if we had them, we 
> > could build a network of data providers and consumers that could carry 
> > any type of data (type independence).  In essence we would build a 
> > data network that would allow anyone to talk about anything.  This by 
> > itself is not an easy thing to do.
> > Then the issue of representation language cropped up; first XML or RDF 
> > and now different types of XML, different RDF ontology languages, 
> > microformats, and semantic tags (why not JSON, SQL tables, serialized 
> > Java objects, C structs, and any other representation people might 
> > want).  To resolve this issue without restricting representation 
> > language requires a huge increase in the scope of work; not only type 
> > independence, but independence of representation; building a data 
> > network that allows anyone to talk about anything in any language.
> >
> > On the one hand you're absolutely right that such a system, if we 
> > could build it, might work as a bridge between different 
> > technologies.  But I worry that it will be a massively difficult and 
> > expensive undertaking that might not ever work.  I'll list a few of my 
> > concerns.
> >
> > The first is whether or not it will support automatic translation:
> >
> > 1.) If the system does not do automatic translation between 
> > representation languages, then it's more like a schema repository.  In 
> > my view, schema repositories don't help to integrate tools that use 
> > different representation languages.  Instead each representation 
> > language becomes a silo.  The schema repository helps to document what 
> > has to be done when people need to write code that will cut across 
> > silos for a one-time task, but it doesn't actually encourage people to 
> > do so.
> >
> > 2.) If the system does automatic translation between representations 
> > then it adds a layer of complexity and a large processing and 
> > transport cost to each transaction on the network.  Imagine that you 
> > want to do some niche modeling.  Assume you have some taxonomic group 
> > in mind.  First you'd have to find the names for this group, including 
> > synonyms.  Next you'd have to get specimens and observations for these 
> > names.  So, two large sets of transactions are necessary to acquire 
> > the data you need.  Each name and observation provider might be using 
> > a different representation language.  When you contact them you have 
> > to figure out what representation they've given you and ship the data 
> > off to a translation service before you can merge the results.  This 
> > adds a large (at best linear) cost to acquiring data.  Additionally, 
> > someone has to pay for the huge amount of bandwidth used by the 
> > translation service.  We can propose to use a local library instead of 
> > a remote service to do the translation, but this adds a burden on the 
> > developers of all software, requires that the library is updated often 
> > as new types and representation languages are adopted, and requires 
> > that the library exists or has bindings to many programming languages; 
> > in short this is a software maintenance nightmare.
> >
> > My second set of concerns is about the representations themselves:
> >
> > 3.) Each representation will require some effort to construct and 
> > maintain.  If the system will provide guidelines (rules expressed in 
> > natural language) for how to translate each representation into other 
> > representations, the cost (in effort, time, and money) will increase.  
> > If the system will provide automatic translation, the cost will 
> > increase further.  However, not all representations will be used 
> > equally.  If there are only two people who want TCS in format X, then 
> > is it worth the expense of providing it to them?  Who decides whether 
> > or not a particular representation format has enough demand to justify 
> > the work involved in supporting it?
> >
> > 4.) If the goal is to provide guidelines or automatic services for 
> > translation between representations of a given data type, then we have 
> > to map X * (X-1) * Y possible translations where X is the number of 
> > allowed representations for a given data type and Y is the number of 
> > data types.  The TDWG biodiversity informatics ontology may end up 
> > with 30 classes.  If we support 5 representations (maybe OWL, RDFS, 
> > semantic tags, XML metadata, and GML Feature Types) that's 5 * 4 * 30 
> > = 600 possible translation mappings to create and maintain.  Each time 
> > we have a new representation or a new data type we have to update the 
> > set of translation mappings.
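
The arithmetic above can be checked with a few lines of Python. The function name translation_mappings is made up for this sketch; it counts directed pairwise mappings, X * (X - 1) for each of Y data types.

```python
def translation_mappings(representations, data_types):
    """Directed pairwise mappings: X * (X - 1) per data type, times Y."""
    return representations * (representations - 1) * data_types


print(translation_mappings(5, 30))  # 600
print(translation_mappings(6, 30))  # 900
```

Adding a sixth representation to the same 30 classes adds 300 mappings, which is the maintenance growth the paragraph warns about.
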
> >
> > My final set of concerns regards knowledge representation, modeling, 
> > and the expressive power of representation languages:
> >
> > 5.) Different representation languages have different language 
> > features and expressive powers.  For instance, there are things you 
> > can do with OWL that you can't do with semantic tags.  This is because 
> > OWL has language features for representing inheritance, property-value 
> > constraints, etc. that simply don't exist in the world of semantic 
> > tagging.  If we have to be able to represent the platonic ideal of our 
> > data types (as defined in the TDWG ontology) in any representation 
> > language and also have to be able to translate between 
> > representations, we run into a dilemma.
> >
> > If we use all the features of a particular representation language we 
> > benefit from them when using that particular format.  The software 
> > that is constructed to natively consume that representation can use 
> > all of the available language features to automate tasks on behalf of 
> > the user.  However, translation becomes very difficult.  Imagine 
> > translating OWL-style inheritance into microformats or XML-Schema data 
> > type constraints into a system of semantic tags.  It's simply not 
> > possible.  Translating between languages of differing expressive 
> > powers can be problematic.  The alternative approach is to use only 
> > those language features that are common to all representation 
> > languages.  In practice this usually means using only those features 
> > that exist in the most weakly-expressive language.  If our bag of 
> > representation languages includes both semantic tagging and OWL, then 
> > we're not really using the power of OWL.  In fact, if we have to use 
> > only the common features of the two, we might as well implement our 
> > OWL ontology so that there is only one type of class with a single 
> > property called "tagvalue".
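
One way to picture the "common features only" option is as a set intersection. The feature lists below are deliberately simplified assumptions for illustration, not a complete description of any of these languages.

```python
# Illustrative, incomplete feature sets (assumptions for this sketch).
FEATURES = {
    "OWL": {"named terms", "inheritance",
            "property-value constraints", "namespaces"},
    "XML Schema": {"named terms", "data type constraints", "namespaces"},
    "semantic tags": {"named terms"},
}


def common_features(languages):
    """Features available in every one of the given languages."""
    return set.intersection(*(FEATURES[name] for name in languages))


print(sorted(common_features(["OWL", "XML Schema"])))
# ['named terms', 'namespaces']
print(sorted(common_features(["OWL", "semantic tags"])))
# ['named terms']
```

Once semantic tags are in the mix, only the bare term name survives, which is the "tagvalue" situation described above.
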
> >
> > 6.) Different representation languages enable different functionality 
> > in the software that consumes them.  For instance, client software 
> > that consumes RDFS or OWL instances often expands searches to encompass 
> > instances of superclasses.  In other words, software designed to use 
> > semantic web technologies can do some of the work a human user might 
> > otherwise have to do by exploiting the features of semantic web 
> > languages.  Software designed to use semantic tags often doesn't do 
> > much more than search and statistical correlation between tag 
> > instances.  This is quite powerful in its own way, but because 
> > semantic tags were designed to indicate the context of a document, not 
> > necessarily its contents, semantic tagging really only helps a user to 
> > locate documents of interest.  A document with tags is ultimately read 
> > by a human, not a machine.  Every representation language carries with 
> > it assumptions about how "documents" that are instances of that 
> > language will be used.
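
A small sketch of the kind of search expansion that class hierarchies make possible, under one common reading: a search for a class also returns instances typed with any of its subclasses. The class names, identifiers and helper functions are hypothetical. A tag-only system would have no SUBCLASS_OF table to consult and could only match the tag exactly.

```python
# Hypothetical class hierarchy: child -> parent.
SUBCLASS_OF = {"Specimen": "Occurrence", "Observation": "Occurrence"}

# Hypothetical instances: (identifier, asserted class).
INSTANCES = [
    ("K0001", "Specimen"),
    ("obs-17", "Observation"),
    ("n-3", "TaxonName"),
]


def is_a(cls, target):
    """True if cls is target or a (transitive) subclass of it."""
    while cls is not None:
        if cls == target:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


def search(target):
    """Identifiers of all instances of target, including subclasses."""
    return [ident for ident, cls in INSTANCES if is_a(cls, target)]


print(search("Occurrence"))  # ['K0001', 'obs-17']
print(search("Specimen"))    # ['K0001']
```
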
> >
> >
> > To navigate you need a fixed point.  To move the world you need a 
> > fulcrum.  Because representation languages provide different features 
> > and make different assumptions about how their instances will be used, 
> > it makes sense to use representation language as the fixed point of 
> > our designs and leave data types and service interfaces free to vary.  
> > Some have argued that the TDWG ontology is the fixed point in our 
> > constellation of services, but I disagree.  It is the umbrella under 
> > which data integration will occur; there will always be extensions to 
> > the core ontology and it too will change over time as it is expanded.
> >
> > Overall I think it's a laudable goal to support as many representation 
> > languages as possible, but there are so many headaches and compromises 
> > involved that we may end up with an expensive solution that, because 
> > it only supports the lowest common denominator of functionality, 
> > doesn't really work right for anybody.  A case in point is the current 
> > discussion of namespaces.  In order to make namespaces work across the 
> > widest range of representation languages, it's been proposed that they 
> > can no longer be used as packages to logically partition the larger 
> > ontology.  This makes it harder to manage extensions to the ontology 
> > and makes it likely that we'll end up using 
> > veryLongClassAndPropertyNamesToTryToAvoidNamespaceClashes.  And you 
> > still can't represent namespaces in semantic tags.
> >
> > It's hard enough to write software that can cope with any data type 
> > and I'd rather spend energy, time, and money on getting it right with 
> > only one or two feature-rich representations.  What I'd really like to 
> > see is a network of heterogeneously typed, highly integrated data 
> > objects and a rich set of services that operate on them.  Once this is 
> > built, the real fun can begin, creating software that uses these data 
> > to answer important scientific questions.
> >
> > -Steve
> >
> >
> >
> >
> >
> > Roger Hyam wrote:
> >>
> >> Thanks for forwarding this Sally.
> >>
> >> What I am proposing at St Louis - though I seem to have been having to 
> >> propose it long before - is that we have an application for managing 
> >> the ontology that will expose the underlying semantics in multiple 
> >> 'formats', i.e. as RDFS or OWL ontologies, as GML application schemas, 
> >> as custom XML Schemas, as OBO ontologies, etc. I see no other way 
> >> of integrating multiple technologies. (Suggested alternatives welcome).
> >>
> >> One of the things on my list is micro formats along with tagging. It 
> >> seems crazy to define a 'specificEpithet' in a TDWG ontology and then 
> >> not use exactly the same concept in a micro format or as a tag.
> >>
> >> So this is timely. I just can't act on it very well before St Louis. 
> >> I'll add something to the wiki page to flag my/our interest.
> >>
> >> Thanks,
> >>
> >> Roger
> >>
> >>
> >> Sally Hinchcliffe wrote:
> >>> Hi all
> >>>
> >>> This is probably on the wrong list (Maybe TAG?) but it strikes me 
> >>> that what this guy needs is an ontology that he can use in his 
> >>> microformats ...
> >>>
> >>> Possibly an example of a real world need for ontologies ?
> >>>
> >>> Sally
> >>>
> >>> ------- Forwarded message follows -------
> >>> Date sent:          Tue, 26 Sep 2006 09:34:04 -0000
> >>> To:                 <sh00kg at rbgkew.org.uk>
> >>> Subject:            Fwd: [TDWG] Announce: Proposal for "microformat" 
> >>> for marking-up taxonomic names in HTML: comments and contributions 
> >>> sought
> >>> From:               <M.Jackson at kew.org>
> >>> Send reply to:      M.Jackson at rbgkew.org.uk
> >>>
> >>> Sally,
> >>>
> >>> Do you think you might respond to this? Just curious what you think.
> >>>
> >>> Mark
> >>> ----
> >>> Forwarded From: Andy Mabbett <andy at pigsonthewing.org.uk>
> >>>
> >>>  
> >>>> Hello - my first post to this mailing list.
> >>>>
> >>>> I'm not a taxonomist, but I've been told by one that you might be
> >>>> interested in recent proposals for a formula (a "microformat"
> >>>> <http://microformats.org>) for marking-up, in HTML, the names of
> >>>> species (and other ranks, varieties, hybrids, etc.).
> >>>>
> >>>> Microformats are a way of adding additional, simple markup to
> >>>> human-readable data items on web pages, using common and open HTML
> >>>> standards, so that the information can be extracted by software and
> >>>> indexed, searched for, saved, cross-referenced or aggregated.
> >>>> Microformats are also open standards, freely available for anyone to
> >>>> use.
> >>>>
> >>>> The proposed format respects all existing biological taxonomies,
> >>>> and is not intended to change or supplant any of them - it merely provides
> >>>> webmasters with a method of either:
> >>>>
> >>>>    1)   marking-up a taxonomical name (or taxon-common name pair) in
> >>>>         such a way that its components can be recognised by computers
> >>>>
> >>>> or
> >>>>
> >>>>    2)   marking up a common name, so as to associate with it a
> >>>>         taxonomical name, in such a way that the latter's
> >>>>         components can be recognised by computers
> >>>>
> >>>> For instance, if I mark up a list of common names on a page I 
> >>>> maintain:
> >>>>
> >>>>    <http://www.westmidlandbirdclub.com/staffs/tittesworth/latest.htm>
> >>>>
> >>>> using that microformat, a visitor might have a browser tool which lists
> >>>> all the species on the page, sorted into alphabetical order within
> >>>> taxonomic class, or in taxonomic order, and then creates links to, say
> >>>> (for Joe Public) their entries in Wikipedia, or the British Trust for
> >>>> Ornithology, or (for scientists) some academic database of the user's
> >>>> choosing.
> >>>>
> >>>> Early thoughts on the format are on an editable "wiki", here:
> >>>>
> >>>>         <http://microformats.org/wiki/species>
> >>>>
> >>>> Please feel free to participate - the proposal needs both messages of
> >>>> support (particularly from people or organisations who have websites
> >>>> on which they might use them) and, especially, comments and
> >>>> constructive criticisms - does the proposal understand and use
> >>>> taxonomy correctly; is the terminology right, are there any omissions
> >>>> or overlooked, unusual naming conventions?
> >>>>
> >>>> You can use the above wiki, or the microformats mailing list:
> >>>>
> >>>>         <http://microformats.org/wiki/mailing-lists>
> >>>>
> >>>> and/ or please feel free to pass this e-mail to other interested
> >>>> parties.
> >>>>
> >>>> Thank you.
> >>>>
> >>>> -- 
> >>>> Andy Mabbett
> >>>> Birmingham, England
> >>>>
> >>>> _______________________________________________
> >>>> TDWG mailing list
> >>>> TDWG at mailman.nhm.ku.edu
> >>>> http://mailman.nhm.ku.edu/mailman/listinfo/tdwg
> >>>>
> >>>>     
> >>>
> >>>
> >>>
> >>>   
> >>
> >>
> >> ------------------------------------------------------------------------
> >>
> >> _______________________________________________
> >> TDWG-GUID mailing list
> >> TDWG-GUID at mailman.nhm.ku.edu
> >> http://mailman.nhm.ku.edu/mailman/listinfo/tdwg-guid
> >>   
> >
> >
> 
> 
> -- 
> 
> -------------------------------------
>  Roger Hyam
>  Technical Architect
>  Taxonomic Databases Working Group
> -------------------------------------
>  http://www.tdwg.org
>  roger at tdwg.org
>  +44 1578 722782
> -------------------------------------
> 
> 

*** Sally Hinchcliffe
*** Computer section, Royal Botanic Gardens, Kew
*** tel: +44 (0)20 8332 5708
*** S.Hinchcliffe at rbgkew.org.uk




