RE: [tdwg-tag] RE: tdwg-tag Digest, Vol 22, Issue 2

10 Oct 2007

Hi Guys

While I am far from understanding the technicalities, I do know that there
must be a clear path from SKOS (if you start there) to Semantic Web
technologies like OWL (lite or whatever). Maybe there is already.

Lee

Lee Belbin
Manager, TDWG Infrastructure Project
Email: lee@tdwg.org
Phone: +61(0)419 374 133 
-----Original Message-----
From: tdwg-tag-bounces@lists.tdwg.org
[mailto:tdwg-tag-bounces@lists.tdwg.org] On Behalf Of Eamonn O Tuama
Sent: Wednesday, 10 October 2007 2:25 AM
To: tdwg-tag@lists.tdwg.org
Subject: [tdwg-tag] RE: tdwg-tag Digest, Vol 22, Issue 2

Sorry, I only saw the latest additions to this thread now. 

The example database that Markus presents represents a common way that
people would be providing data. I think that if we can agree on a common flat
vocabulary of broad terms (as begun at the SPM workshop) we can at least
provide his alphabetically sorted list of categorised statements back to a
consumer (still pretty useful, I think). This could be expressed as a SKOS
vocabulary, as suggested by Roger, but we could eventually take it further, as
he also suggests, and develop an OWL ontology for reasoning over the
categories.
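
A minimal sketch of that flat-vocabulary idea, assuming provider records
arrive as (category, statement) pairs. The vocabulary terms, the records and
the "Uncategorised" bucket are illustrative placeholders, not terms agreed at
the SPM workshop:

```python
from collections import defaultdict

# Hypothetical flat vocabulary of broad terms (placeholders only).
VOCABULARY = {"Dispersal", "Reproduction", "Size"}

# Hypothetical provider records: (category, free-text statement).
records = [
    ("Size", "Adults reach 12 cm."),
    ("Dispersal", "Seeds are wind-dispersed."),
    ("Size", "Juveniles are under 3 cm."),
    ("Habitat", "Found in wet meadows."),  # category not in the vocabulary
]


def categorised_list(records, vocabulary):
    """Group statements by category and return them sorted by category name.

    Statements whose category is not in the vocabulary are collected under
    "Uncategorised" rather than dropped.
    """
    grouped = defaultdict(list)
    for category, text in records:
        key = category if category in vocabulary else "Uncategorised"
        grouped[key].append(text)
    return sorted(grouped.items())


for category, statements in categorised_list(records, VOCABULARY):
    print(category)
    for text in statements:
        print("  -", text)
```

Nothing here needs reasoning over the categories; a consumer only needs the
shared list of terms to get a sorted, grouped view.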

Gregor asked if TAPIR finds "it easier to search elements having specific
attribute values than elements having child elements with specific value"
like so:
<hasInformation>
 <InfoItem category="voc.x.org/DescriptiveConcepts/Size">
   <....>
  </InfoItem>

This could be important if we were to adopt a model where an InfoItem can
have multiple categories, like so:

<hasInformation>
 <InfoItem>
  <category="voc.x.org/DescriptiveConcepts/Dispersal"/>
  <category="voc.x.org/DescriptiveConcepts/Reproduction"/>
   <....>
  </InfoItem>
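
To make the two shapes concrete, here is a small sketch using Python's
standard xml.etree.ElementTree. It only shows that both forms can be selected
with a simple path expression in a generic XML library; it says nothing about
how TAPIR itself handles either form. Carrying the category URI as the text
of a <category> child element is an assumption made for this sketch:

```python
import xml.etree.ElementTree as ET

SIZE = "voc.x.org/DescriptiveConcepts/Size"
DISPERSAL = "voc.x.org/DescriptiveConcepts/Dispersal"
REPRODUCTION = "voc.x.org/DescriptiveConcepts/Reproduction"

# One category per InfoItem, carried as an attribute.
attribute_form = ET.fromstring(f"""
<hasInformation>
  <InfoItem category="{SIZE}"/>
  <InfoItem category="{DISPERSAL}"/>
</hasInformation>
""")

# Several categories per InfoItem, carried as child elements.
# Putting the URI in the element text is an assumption of this sketch.
child_form = ET.fromstring(f"""
<hasInformation>
  <InfoItem>
    <category>{DISPERSAL}</category>
    <category>{REPRODUCTION}</category>
  </InfoItem>
  <InfoItem>
    <category>{SIZE}</category>
  </InfoItem>
</hasInformation>
""")

# Select InfoItems by attribute value.
by_attribute = attribute_form.findall(f"InfoItem[@category='{SIZE}']")

# Select InfoItems that have any <category> child with the given text.
by_child = child_form.findall(f"InfoItem[category='{REPRODUCTION}']")

print(len(by_attribute), len(by_child))
```

The attribute form can hold only one category per InfoItem, whereas the
child-element predicate matches an item if any of its categories fits.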

Éamonn 

-----Original Message-----
From: tdwg-tag-bounces@lists.tdwg.org
[mailto:tdwg-tag-bounces@lists.tdwg.org] On Behalf Of
tdwg-tag-request@lists.tdwg.org
Sent: 09 October 2007 12:00
To: tdwg-tag@lists.tdwg.org
Subject: tdwg-tag Digest, Vol 22, Issue 2

Today's Topics:

   1. Re: SPM Categories or Subclassing - again. (Gregor Hagedorn)
   2. Re: SPM Categories or Subclassing - again. (Döring, Markus)

----------------------------------------------------------------------

Message: 1
Date: Mon, 8 Oct 2007 14:41:39 +0200
From: "Gregor Hagedorn" <g.m.hagedorn@gmail.com>
Subject: Re: [tdwg-tag] SPM Categories or Subclassing - again.
To: tdwg-tag@lists.tdwg.org
Message-ID:
	<5ebbead70710080541l3cba340cyf2ebced50b4b9598@mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

In SDD we used what is here called the "tagging approach", i.e. for
descriptive concepts and characters we use

<Text ref="some.resource.org/123"><Content>Free form text</Content></Text>

In SDD, all element/attribute names refer to classes/attributes in UML or
Java, whereas the xml values are corresponding values. Expressing concepts
and characters as classes was considered impractical if the aim is generic
software (classes are creatable at runtime, but at a price) because the
number of characters is often very large.
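
A rough sketch of the contrast, in Python rather than the UML/Java setting
described above. The names TextStatement, parse_text and make_concept_class
are hypothetical and are not part of SDD:

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class TextStatement:
    """One generic class for every character; the concept is just a value."""
    ref: str
    content: str


def parse_text(xml_fragment: str) -> TextStatement:
    """Read a <Text ref="..."><Content>...</Content></Text> fragment."""
    elem = ET.fromstring(xml_fragment)
    return TextStatement(ref=elem.get("ref"), content=elem.findtext("Content"))


# Tagging approach: the concept is data, so one class covers any number of
# characters.
statement = parse_text(
    '<Text ref="some.resource.org/123"><Content>Free form text</Content></Text>'
)
print(statement)


def make_concept_class(name: str) -> type:
    """Class approach: create one class per concept at runtime."""
    return type(name, (TextStatement,), {})


# Possible, but generic software then has to discover and handle each such
# class dynamically, once per character.
Size = make_concept_class("Size")  # hypothetical concept name
```

With hundreds of characters, the first form keeps the software unchanged as
the vocabulary grows; the second multiplies the classes it must cope with.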

My experience shows that the number of characters in identification sets
varies between 50 and over 1000. Some studies in manual data integration (in
LIAS) convinced me that the number of "integratable" characters is often
less than imagined. A similar case study may also be the FRIDA data sets,
which limit the number of common characters (common to all plant families)
to 200, and from there on use separate characters for each family.

Clearly, when limiting the approach to just "very high level concepts", both
class and value approaches are possible. The benefit of the SPM 0.2 approach
is that it enables reasoning and may also simplify TAPIR usage.
However, as presented in Bratislava, I am doubtful that this limit holds. In
my experience the limit between free form text and categorical data tends to
be diffuse rather than sharp, and my intuition is that the SPM mechanism,
being extensible, will be extended.

I would like to note that the current TAG strategy is to provide both RDF
and XML Schema. The schema approach to SPM, however, seems to become more and
more undesirable as the number of descriptive concepts grows.

I have a question for TAPIR:

Does TAPIR find it easier to search elements having specific attribute
values than elements having child elements with specific value:

<hasInformation>
 <InfoItem category="voc.x.org/DescriptiveConcepts/Size">
   <....>
  </InfoItem>

It might be useful to know. Although RDF allows this for literals, it seems
to require the rdf:resource="" step for categorical data, so this may or may
not help.

----------

PS: SDD calls the concepts "characters" based on tradition, and "character"
is clearly inappropriate here. However, "Category" to me seems to express
nothing other than the type of the value. All contextValue and Value in SPM
version 0.2 refer to categories. Can anyone suggest a better term? What is
the opposite of "value"?

I can think of "Class" (value = instance), or, though perhaps too limited
to descriptions, "Feature" (also in GML). Anything else?

Gregor

-----------------------------------------------------
Gregor Hagedorn (G.M.Hagedorn@gmail.com)
Institute for Plant Virology, Microbiology, and Biosafety
Federal Research Center for Agriculture and Forestry (BBA)
Königin-Luise-Str. 19      Tel: +49-30-8304-2220
14195 Berlin, Germany      Fax: +49-30-8304-2203