Re: [tdwg-tag] SPM Categories or Subclassing - again.

8 Oct 2007

Donald, Roger,
The problem we saw with TAPIR is that SPM defines only 2 new concepts:
category and value. If you do not want to map anything more than these in
TAPIR models, you can probably produce SPM RDF. But if you want to map only
the descriptions or only the distributions, you are in trouble.

Imagine a simple database table with 3 columns: taxon, category and value.
You could map them to 3 concepts namely SPM/taxon, SPM/category and
SPM/value. You can then create an output model similar to this one here:

<outputModel>
    <structure>
      <schema location="http://rs.tdwg.org/tapir/rs/SPM.xsd"/>
    </structure>
    <indexingElement path="/SpeciesProfileModel/hasInformation/InfoItem"/>
    <mapping>
        <node path="/SpeciesProfileModel/aboutTaxon/@rdf:resource">
            <concept id="SPM/taxon"/>
        </node>
        <node path="/SpeciesProfileModel/hasInformation/InfoItem/category/@rdf:resource">
            <concept id="SPM/category"/>
        </node>
        <node path="/SpeciesProfileModel/hasInformation/InfoItem/value">
            <concept id="SPM/value"/>
        </node>
    </mapping>
</outputModel>

With current provider implementations you will get a new SpeciesProfileModel
element for every record in your database, i.e. repeating taxa for every
InfoItem. This is valid SPM, and you can actually search on it with TAPIR
as Donald wanted to. So here you go. You would just need to know (and
understand) what categories this provider is using. You can get those
through an inventory instead of the capabilities, so this is not a real
problem.
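
As a rough sketch of the behaviour described above (not taken from any TAPIR
provider), the Python below emits one SpeciesProfileModel element per database
record and derives the category list a client would otherwise obtain through
an inventory. The record values, the example.org URIs and the function name
record_to_spm are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RESOURCE = f"{{{RDF_NS}}}resource"
ET.register_namespace("rdf", RDF_NS)


def record_to_spm(taxon, category, value):
    """Build one SpeciesProfileModel element for a single (taxon, category, value) row."""
    spm = ET.Element("SpeciesProfileModel")
    ET.SubElement(spm, "aboutTaxon").set(RESOURCE, taxon)
    info = ET.SubElement(ET.SubElement(spm, "hasInformation"), "InfoItem")
    ET.SubElement(info, "category").set(RESOURCE, category)
    ET.SubElement(info, "value").text = value
    return spm


# Hypothetical rows of the 3 column table: taxon, category, value.
records = [
    ("http://example.org/taxon/1", "http://example.org/category/Description", "Large felid."),
    ("http://example.org/taxon/1", "http://example.org/category/EconomicUse", "None recorded."),
    ("http://example.org/taxon/2", "http://example.org/category/Description", "Tall annual grass."),
]

# One document per record, so taxon 1 is repeated for each of its InfoItems.
documents = [record_to_spm(*record) for record in records]

# The distinct categories in use, i.e. what an inventory would tell a client.
inventory = sorted({category for _, category, _ in records})
```

Grouping the InfoItems of one taxon under a single SpeciesProfileModel would
need an extra pass over the records, which this sketch deliberately leaves out.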

If someone has a table with a taxon and a column for description, one for
economic use and one for something else, they would have to transform it into
a simple 3 column table.
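
A minimal sketch of that transformation, assuming the wide table is available
as a list of dictionaries. The column names, the sample rows and the
example.org category URIs are made up for illustration; a real provider would
more likely do this in a database view.

```python
# Hypothetical mapping from wide-table column names to category URIs.
CATEGORY_URIS = {
    "description": "http://example.org/category/Description",
    "economic_use": "http://example.org/category/EconomicUse",
}


def unpivot(rows, category_uris=CATEGORY_URIS):
    """Turn one wide row per taxon into (taxon, category, value) triples."""
    triples = []
    for row in rows:
        for column, uri in category_uris.items():
            value = row.get(column)
            if value:  # skip empty or missing cells
                triples.append((row["taxon"], uri, value))
    return triples


wide = [
    {"taxon": "Puma concolor", "description": "Large felid.", "economic_use": None},
    {"taxon": "Zea mays", "description": "Tall annual grass.", "economic_use": "Cereal crop."},
]

triples = unpivot(wide)
```

Each resulting triple lines up with the SPM/taxon, SPM/category and SPM/value
concepts of the output model.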

My understanding in the beginning was that an initial broad vocabulary should
be part of SPM. I still think this is very much needed; otherwise all the
"mashup" will be done by the human end user, who is presented with a nice long
list of facts, sorted alphabetically.

Personally I am undecided about which approach to take. The tagging seems more
flexible, so yes, let's go for it!

But I can't see why OWL classes are not suited for defining a reusable,
external vocabulary. Isn't this what they are made for and what GO and all
the other ontologies use them for rather than just defining data exchange
formats? At least the tagging approach would free us from imposing a certain
definition language. I guess some can use SKOS, some OWL and some others
just RDFS. And of course there will be many different semantic definitions
in all those languages.

Markus

"Donald Hobern" wrote on 08.10.2007 13:53:
...
Roger,
Your analysis (and the four points) seem perfectly correct to me.  I prefer a
plain tagging approach and am particularly dubious about the long term
benefits of subclasses which represent multiple inheritance because this would
seem to place a greater burden on consumers.
However I am not so sure that the tagging approach does prevent the use of
TAPIR.  Surely the user is expected to be searching for InfoItems and each
InfoItem will include a relative path "./category/@rdf:resource" which could
be searchable.  In other words you could search for InfoItems where the
concept identified by "./category/@rdf:resource" has the value
"some.resource.org/123". Won't that work?  Won't it return any InfoItem which
has any combination of category elements provided at least one is identified
to the given resource?
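
To illustrate the matching rule only (this is plain in-memory filtering, not a
TAPIR request), here is a small Python sketch that returns every InfoItem with
at least one category pointing at a given resource. The sample document, the
value elements and the function name items_in_category are assumptions.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RESOURCE = f"{{{RDF_NS}}}resource"

# Made-up document: the first InfoItem carries two categories.
DOC = f"""
<SpeciesProfileModel xmlns:rdf="{RDF_NS}">
  <hasInformation>
    <InfoItem>
      <category rdf:resource="some.resource.org/123"/>
      <category rdf:resource="some.resource.org/124"/>
      <value>first</value>
    </InfoItem>
  </hasInformation>
  <hasInformation>
    <InfoItem>
      <category rdf:resource="some.resource.org/125"/>
      <value>second</value>
    </InfoItem>
  </hasInformation>
</SpeciesProfileModel>
"""


def items_in_category(root, resource):
    """Return InfoItems where at least one ./category/@rdf:resource equals resource."""
    return [
        item
        for item in root.iter("InfoItem")
        if any(cat.get(RESOURCE) == resource for cat in item.findall("category"))
    ]


root = ET.fromstring(DOC)
matches = items_in_category(root, "some.resource.org/123")
```
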
Of course searching for InfoItems like this may prevent us from using search
elements outside the scope of the InfoItem - would this stop us from using a
ScientificName specified against the SpeciesProfileModel element?  In other
words, would it be better to move the aboutTaxon down to the InfoItem?
Donald
Roger Hyam wrote:
...
Hi All,
I'd like to re-ignite a debate we had over the summer in the light of what
was discussed at Bratislava and things I am thinking about now.
The original way the Species Profile Model was structured was something like
the attached UML diagram (tagging.png).
We didn't define a class "Category" but just left it as being any valid URI
in the RDF style of doing things.
Markus and several others pointed out that this would not work with TAPIR
very well as all the InfoItems will look the same. Something like this:
<SpeciesProfileModel>
    <hasInformation>
        <InfoItem>
            <category rdf:resource="some.resource.org/123" />
            <....>
        </InfoItem>
    </hasInformation>
    <hasInformation>
        <InfoItem>
            <category rdf:resource="some.resource.org/124" />
            <....>
        </InfoItem>
    </hasInformation>
    <hasInformation>
        <InfoItem>
            <category rdf:resource="some.resource.org/125" />
            <....>
        </InfoItem>
    </hasInformation>
</SpeciesProfileModel>
The xpaths to the data represented by <...> would all be the same.
They suggested something like the next diagram, where InfoItem is subclassed.
This allows RDF to be serialized like this:
<SpeciesProfileModel>
    <hasInformation>
        <Ecology>
            <....>
        </Ecology>
    </hasInformation>
    <hasInformation>
        <Behaviour>
            <....>
        </Behaviour>
    </hasInformation>
    <hasInformation>
        <BehaviouralEcology>
            <....>
        </BehaviouralEcology>
    </hasInformation>
</SpeciesProfileModel>
The xpaths to the <...> are all different and the model becomes TAPIR
friendly.
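
The difference in paths can be checked with a few lines of Python. This is
only a sketch: the value element stands in for the elided content, and the
two sample documents and the helper value_paths are assumptions.

```python
import xml.etree.ElementTree as ET

RDF_DECL = 'xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"'

# Tagging style: every item is an InfoItem distinguished only by its category.
TAGGED = f"""
<SpeciesProfileModel {RDF_DECL}>
  <hasInformation><InfoItem>
    <category rdf:resource="some.resource.org/123"/><value>a</value>
  </InfoItem></hasInformation>
  <hasInformation><InfoItem>
    <category rdf:resource="some.resource.org/124"/><value>b</value>
  </InfoItem></hasInformation>
  <hasInformation><InfoItem>
    <category rdf:resource="some.resource.org/125"/><value>c</value>
  </InfoItem></hasInformation>
</SpeciesProfileModel>
"""

# Subclassing style: the element name itself carries the category.
SUBCLASSED = """
<SpeciesProfileModel>
  <hasInformation><Ecology><value>a</value></Ecology></hasInformation>
  <hasInformation><Behaviour><value>b</value></Behaviour></hasInformation>
  <hasInformation><BehaviouralEcology><value>c</value></BehaviouralEcology></hasInformation>
</SpeciesProfileModel>
"""


def value_paths(element, prefix=""):
    """Collect the absolute element path of every <value> below element."""
    path = f"{prefix}/{element.tag}"
    if element.tag == "value":
        return [path]
    paths = []
    for child in element:
        paths.extend(value_paths(child, path))
    return paths


tagged_paths = set(value_paths(ET.fromstring(TAGGED)))
subclassed_paths = set(value_paths(ET.fromstring(SUBCLASSED)))
```

The tagged document yields a single distinct path for all three values, while
the subclassed one yields a separate path per item.
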
At the time I wasn't too bothered either way because I saw this as an
artifact of serialization. The same thing could be written:
<SpeciesProfileModel>
    <hasInformation>
        <InfoItem>
            <rdf:type rdf:resource="some.resource.org/Ecology" />
            <....>
        </InfoItem>
    </hasInformation>
    <hasInformation>
        <InfoItem>
            <rdf:type rdf:resource="some.resource.org/Behaviour" />
            <....>
        </InfoItem>
    </hasInformation>
    <hasInformation>
        <InfoItem>
            <rdf:type rdf:resource="some.resource.org/BehaviouralEcology" />
            <....>
        </InfoItem>
    </hasInformation>
</SpeciesProfileModel>
With more or less the same meaning and looking just like a tagging example.
This was a mistake on my part because of course it isn't the same meaning,
just a similar serialization. An RDF type really needs to point to a class
and can't point to any old vocabulary term, as it could if it were treated
like a tag.
I have just added the category property back into InfoItem for discussion
purposes.
I am concerned about going down the subclassing route for a few reasons.
1) Building a class hierarchy is difficult. The discussions we had in
Bratislava highlighted this. The examples put together in the current
vocabulary sparked a debate as to how things should be organized. There are
issues with the diamond problem in multiple inheritance: if contradictory
semantics occur in multiple inheritance routes to a class, which has priority?
If we don't have multiple inheritance, how do we have InfoItems that are about
both Ecology and Behaviour, for example, or any other two subjects such as
Dispersal and Asexual reproduction?
2) If a class hierarchy is required for analysis/inference it can be
superimposed on a tag based transfer protocol using OWL necessary and
sufficient properties. In fact this is arguably a better way of approaching
the situation than building a class hierarchy a priori.
3) The motivation for going down the subclassing route is largely so it will
work with the transport protocol - which sets alarm bells ringing for me.
4) It precludes us from using something like SKOS to build our categories.
"SKOS or Simple Knowledge Organisation System is a family of formal languages
designed for representation of thesauri, classification schemes, taxonomies,
subject-heading systems, or any other type of structured controlled
vocabulary. SKOS is built upon RDF and RDFS, and its main objective is to
enable easy publication of controlled structured vocabularies for the
Semantic Web. SKOS is currently developed within the W3C framework."
(http://en.wikipedia.org/wiki/SKOS)
I would really like to be able to say to "domain experts" that they "just"
need to build a SKOS vocabulary for the terms used in their domain and then
use the URIs of these terms for tagging information that is passed between
providers, rather than specify that they need to construct a more formal
ontology of classes. I appreciate that one could argue that SKOS terms are
like classes, but they form part of a structure that is specifically designed
for having the debates that TDWG needs to have around the vocabulary part of
the standards (rather than the exchange parts).
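
As a sketch of how a hierarchy could be superimposed on plain tags after the
fact (point 2) using broader-term links of the kind SKOS provides, consider
the Python below. It uses a plain dictionary rather than any SKOS or OWL
library, and the term names, the item names and both functions are
assumptions for illustration.

```python
# Broader-term links, in the spirit of skos:broader. A term may have several
# broader terms without any class inheritance being involved.
BROADER = {
    "BehaviouralEcology": {"Ecology", "Behaviour"},
    "Ecology": set(),
    "Behaviour": set(),
}


def with_ancestors(term, broader=BROADER):
    """Return the term together with everything reachable via broader links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(broader.get(current, ()))
    return seen


def items_about(items, term, broader=BROADER):
    """Names of items whose tags include the term or any narrower term."""
    return [
        name
        for name, tags in items.items()
        if any(term in with_ancestors(tag, broader) for tag in tags)
    ]


# Hypothetical InfoItems, each carrying only flat category tags.
ITEMS = {
    "item1": ["BehaviouralEcology"],
    "item2": ["Ecology"],
    "item3": ["Behaviour"],
}

ecology_items = items_about(ITEMS, "Ecology")
```

The transfer format stays a flat list of tags; the hierarchy lives in the
vocabulary and is applied only by consumers that want it.
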
I would be grateful for people's thoughts and criticisms.
All the best,
Roger


Döring, Markus