<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

<head>

  <meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type">

</head>

<body bgcolor="#ffffff" text="#000000">

Hi Everyone,<br>

<br>

I am cross-posting this to the TCS list and the TAG list because it is
relevant to both, but responses should fall neatly into things to do
with nomenclature (for the TCS list) and things to do with technology
(for the TAG list). The bit about avowed serializations of RDF below is
TAG relevant.<br>

<br>

The move towards using LSIDs and the implied use of RDF for metadata

has led to the question: "Can we do TCS in RDF?". I have put together

a package of files to encode the TaxonName part of TCS as an RDF

vocabulary. It is not 100% complete but could form the basis of a

solution.<br>

<br>

You can download it here:<br>

<a class="moz-txt-link-freetext" href="http://biodiv.hyam.net/schemas/TCS_RDF/tcs_rdf_examples.zip">http://biodiv.hyam.net/schemas/TCS_RDF/tcs_rdf_examples.zip</a><br>

<br>

For the impatient you can see a summary of the vocabulary here:

<a class="moz-txt-link-freetext" href="http://biodiv.hyam.net/schemas/TCS_RDF/TaxonNames.html">http://biodiv.hyam.net/schemas/TCS_RDF/TaxonNames.html</a><br>

<br>

and an example xml document here:

<a class="moz-txt-link-freetext" href="http://biodiv.hyam.net/schemas/TCS_RDF/instance.xml">http://biodiv.hyam.net/schemas/TCS_RDF/instance.xml</a><br>

<br>

It has actually been quite easy (though time consuming) to represent

the semantics in the TCS XML Schema as RDF. Generally elements within

the TaxonName element have become properties of the TaxonName class

with some minor name changes. Several other classes were needed to

represent NomenclaturalNotes and Typification events. The only

difficult part was with Typification. A nomenclatural type is both a

property of a name and, if it is a lectotype, a separate object that

merely references a type and a name. The result is a compromise: an
object that can be embedded as a property. I use instances for
controlled vocabularies, a choice that may or may not be controversial.<br>
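<br>
As an illustration only, the mapping from element to property could look something like the sketch below. It is not copied from TaxonNames.rdfs: the example.org namespace, the labels and the range are placeholder assumptions, and the real definitions are the ones in the package.<br>
<pre>&lt;rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xml:base="http://example.org/tcs"&gt;

  &lt;!-- The TCS TaxonName element becomes a class (placeholder namespace) --&gt;
  &lt;rdfs:Class rdf:ID="TaxonName"&gt;
    &lt;rdfs:label&gt;Taxon Name&lt;/rdfs:label&gt;
  &lt;/rdfs:Class&gt;

  &lt;!-- A child element of TaxonName becomes a property whose domain is that class --&gt;
  &lt;rdf:Property rdf:ID="genusEpithet"&gt;
    &lt;rdfs:label&gt;Genus Epithet&lt;/rdfs:label&gt;
    &lt;rdfs:domain rdf:resource="#TaxonName"/&gt;
    &lt;rdfs:range rdf:resource="http://www.w3.org/2000/01/rdf-schema#Literal"/&gt;
  &lt;/rdf:Property&gt;

&lt;/rdf:RDF&gt;</pre>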

<br>

What is lost in only using RDFS is control over validation. It is not

possible to specify that certain combinations of properties are

permissible and others are not. There are two approaches to adding more

'validation':<br>

<h3>OWL Ontologies</h3>

An OWL ontology could be built that makes assertions about the items in

the RDF ontology. It would be possible to use necessary and sufficient

properties to assert that instances of TaxonName are valid members of

an OWL class for BotanicalSubspeciesName for example. In fact far more

control could be introduced in this way than is present in the current

XML Schema. What is important to note is that any such OWL ontology

could be separate from the common vocabulary suggested here. Different

users could develop their own ontologies for their own purposes. This

is a good thing as it is probably impossible to come up with a single,

agreed ontology that handles the full complexity of the domain.<br>

<br>

I would argue strongly that we should not build a single central

ontology that summarizes all we know about nomenclature - we couldn't

do it within my lifetime :)<br>
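<br>
A rough sketch of what such a separate ontology might contain is given below. Everything in it is hypothetical: the my-ontology namespace, the rank and nomenclaturalCode property names and the Subspecies and Botanical instances are assumptions made for the example, not names taken from the vocabulary. It defines BotanicalSubspeciesName with necessary and sufficient conditions, so a reasoner could classify any TaxonName that has both values as a member.<br>
<pre>&lt;rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:owl="http://www.w3.org/2002/07/owl#"&gt;

  &lt;!-- Lives in a user's own ontology, separate from the shared vocabulary --&gt;
  &lt;owl:Class rdf:about="http://example.org/my-ontology#BotanicalSubspeciesName"&gt;
    &lt;owl:equivalentClass&gt;
      &lt;owl:Class&gt;
        &lt;owl:intersectionOf rdf:parseType="Collection"&gt;
          &lt;!-- Must be a TaxonName from the common vocabulary --&gt;
          &lt;owl:Class rdf:about="http://example.org/tcs#TaxonName"/&gt;
          &lt;!-- Hypothetical property and instance: governed by the botanical code --&gt;
          &lt;owl:Restriction&gt;
            &lt;owl:onProperty rdf:resource="http://example.org/tcs#nomenclaturalCode"/&gt;
            &lt;owl:hasValue rdf:resource="http://example.org/tcs#Botanical"/&gt;
          &lt;/owl:Restriction&gt;
          &lt;!-- Hypothetical property and instance: rank is subspecies --&gt;
          &lt;owl:Restriction&gt;
            &lt;owl:onProperty rdf:resource="http://example.org/tcs#rank"/&gt;
            &lt;owl:hasValue rdf:resource="http://example.org/tcs#Subspecies"/&gt;
          &lt;/owl:Restriction&gt;
        &lt;/owl:intersectionOf&gt;
      &lt;/owl:Class&gt;
    &lt;/owl:equivalentClass&gt;
  &lt;/owl:Class&gt;

&lt;/rdf:RDF&gt;</pre>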

<h3>Avowed Serializations</h3>

Because RDF can be serialized as XML it is possible for an XML document

to both validate against an XML Schema AND be valid RDF.&nbsp; This may be a

useful generic solution so I'll explain it here in an attempt to make

it accessible to those not familiar with the technology.<br>

<br>

The same RDF data can be serialized in XML in many ways and different

code libraries will do it differently though all code libraries can

read the serializations produced by others. It is possible to pick one

of the ways of serializing a particular set of RDF data and design an

XML Schema to validate the resulting structure. I am stuck for a way to

describe this so I am going to use the term 'avowed serialization'

(Avowed means 'openly declared') as opposed to 'arbitrary

serialization'. This is the approach taken by the <a

 href="http://www.prismstandard.org">prismstandard.org </a>group for

their standard and it gives a number of benefits as a bridging

technology:<br>

<ol>

  <li>Publishing applications that are not RDF aware (even simple

scripts) can produce regular XML Schema validated XML documents that

just happen to also be RDF compliant.</li>

  <li>Consuming applications can assume that all data is just RDF and

not worry about the particular XML Schema used. These are the

applications that are likely to have to merge different kinds of data

from different suppliers so they benefit most from treating it like RDF.</li>

  <li>Because it is regular structured XML it can be transformed using

XSLT into other document formats such as 'legacy' non-RDF compliant

structures - if required.</li>

</ol>
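To make the difference between an arbitrary and an avowed serialization concrete, here is a sketch of the same three RDF statements written in two ways. It assumes both fragments sit inside an rdf:RDF element that declares the rdf prefix and a tn prefix bound to the placeholder namespace http://example.org/tcs#; the LSID is invented for the example.<br>
<pre>&lt;!-- Form A: typed node element with child elements (easy to pin down with an XML Schema) --&gt;
&lt;tn:TaxonName rdf:about="urn:lsid:example.com:names:1234"&gt;
  &lt;tn:genusEpithet&gt;Bellis&lt;/tn:genusEpithet&gt;
  &lt;tn:specificEpithet&gt;perennis&lt;/tn:specificEpithet&gt;
&lt;/tn:TaxonName&gt;

&lt;!-- Form B: generic rdf:Description, explicit rdf:type, one literal written as an attribute --&gt;
&lt;rdf:Description rdf:about="urn:lsid:example.com:names:1234"
                 tn:genusEpithet="Bellis"&gt;
  &lt;rdf:type rdf:resource="http://example.org/tcs#TaxonName"/&gt;
  &lt;tn:specificEpithet&gt;perennis&lt;/tn:specificEpithet&gt;
&lt;/rdf:Description&gt;</pre>
An RDF parser yields the same triples from either form, but an XML Schema written for Form A would reject Form B. Declaring one form as the expected one is what is meant here by an avowed serialization.<br>
<br>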

There is one direction in which data would not flow without some effort.
The same data published in an arbitrary serialization rather than the
avowed one could be transformed, probably via several XSLT steps, into
the avowed serialization and therefore made available to legacy
applications using point 3 above. This may or may not be worth the
bother. Some of the code involved would be generic to all
transformations, so the effort may not be too great. It would certainly
be possible for restricted data sets.<br>

<br>

To demonstrate this, instance.xml is included in the package along with
avowed.xsd and two supporting files. instance.xml will validate against
avowed.xsd and parse correctly in the W3C RDF parser.<br>
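<br>
For readers who have not opened the package, the fragment below sketches the general shape such a schema could take. It is not the content of avowed.xsd: the target namespace, the choice of child elements and the rdf-attributes.xsd file (assumed to declare the rdf:about attribute) are all hypothetical.<br>
<pre>&lt;xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           targetNamespace="http://example.org/tcs#"
           elementFormDefault="qualified"&gt;

  &lt;!-- Hypothetical helper schema that declares the global rdf:about attribute --&gt;
  &lt;xs:import namespace="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             schemaLocation="rdf-attributes.xsd"/&gt;

  &lt;!-- Pins TaxonName to the typed node form with child elements in a fixed order.
       RDF itself does not care about the order; the schema does. --&gt;
  &lt;xs:element name="TaxonName"&gt;
    &lt;xs:complexType&gt;
      &lt;xs:sequence&gt;
        &lt;xs:element name="genusEpithet" type="xs:string" minOccurs="0"/&gt;
        &lt;xs:element name="specificEpithet" type="xs:string" minOccurs="0"/&gt;
      &lt;/xs:sequence&gt;
      &lt;xs:attribute ref="rdf:about" use="required"/&gt;
    &lt;/xs:complexType&gt;
  &lt;/xs:element&gt;

&lt;/xs:schema&gt;</pre>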

<br>

I have not provided XSLT to convert instance.xml to the TCS standard

format though I believe it could be done quite easily if required.

Converting arbitrary documents from the current TCS to the structure
represented in avowed.xsd would be trickier but feasible, and
certainly possible for the restricted uses of the schema that are
typical of individual data suppliers.<br>

<h3>Contents</h3>

This is what the files in this package are:<br>

<br>

README.txt = this file<br>

TaxonNames.rdfs = an RDF vocabulary that represents the TCS TaxonName
object.<br>

TaxonNames.html = Documentation from TaxonNames.rdfs - much more

readable.<br>

instance.xml = an example XML document that is both an RDF-compliant
use of the vocabulary and XML Schema compliant.<br>

avowed.xsd = XML Schema that instance.xml validates against.<br>

dc.xsd = XML Schema that is used by avowed.xsd.<br>

taxonnames.xsd = XML Schema that is used by avowed.xsd.<br>

rdf2html.css = the style formatting for TaxonNames.html<br>

rdfs2html.xsl = XSLT style sheet to generate docs from TaxonNames.rdfs<br>

tcs_1.01.xsd = the TCS XML Schema for reference.<br>

<h3>Needs for other Vocabularies</h3>

What is obvious looking at the vocabulary for TaxonNames here is that

we need vocabularies for people, teams of people, literature and

specimens as soon as possible.<br>

<h3>Need for conventions</h3>

In order for all exchanged objects to be discoverable in a reasonable

way we need to have conventions on the use of rdfs:label for Classes

and Properties and dc:title for instances.<br>
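<br>
One possible reading of such a convention is sketched below, assuming the same placeholder tn namespace (http://example.org/tcs#) and an invented LSID: the class carries an rdfs:label, and each instance carries a dc:title that a generic client can display without knowing anything about nomenclature.<br>
<pre>&lt;rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:tn="http://example.org/tcs#"&gt;

  &lt;!-- Classes and properties are named with rdfs:label --&gt;
  &lt;rdfs:Class rdf:about="http://example.org/tcs#TaxonName"&gt;
    &lt;rdfs:label&gt;Taxon Name&lt;/rdfs:label&gt;
  &lt;/rdfs:Class&gt;

  &lt;!-- Instances are named with dc:title --&gt;
  &lt;tn:TaxonName rdf:about="urn:lsid:example.com:names:1234"&gt;
    &lt;dc:title&gt;Bellis perennis&lt;/dc:title&gt;
    &lt;tn:genusEpithet&gt;Bellis&lt;/tn:genusEpithet&gt;
    &lt;tn:specificEpithet&gt;perennis&lt;/tn:specificEpithet&gt;
  &lt;/tn:TaxonName&gt;

&lt;/rdf:RDF&gt;</pre>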

<br>

The namespaces used in these examples are fantasy as we have not

finalized them yet. <br>

<h3>Minor changes in TCS</h3>

There are a few points where I have intentionally not followed TCS 1.01

(there are probably others where it is accidental). <br>

<ul>

  <li>basionym is a direct pointer to a TaxonName rather than a

NomenclaturalNote. I couldn't see why it was a nomenclatural note in

the 1.01 version as it is a simple pointer to a name.<br>

  </li>

  <li>The genus element is renamed to the genusEpithet property. The

contents of the element are not to be used alone and are not a genus

name in themselves (uninomial should be used in this case) so

genusEpithet is more appropriate - even if it is not common English

usage.<br>

  </li>

  <li>Addition of a referenceTo property. The vocabulary may be used to
mark up an occurrence of a name that is not the publication of a new name.
In these cases the thing being marked up is actually a pointer to
another object, either a TaxonName issued by a nomenclator or a
TaxonConcept, so we need a reference field. Here is
an example (assuming the namespace is declared):
<pre>&lt;TaxonName referenceTo="urn:lsid:example.com:myconcepts:1234"&gt;
  &lt;genusEpithet&gt;Bellis&lt;/genusEpithet&gt;
  &lt;specificEpithet&gt;perennis&lt;/specificEpithet&gt;
&lt;/TaxonName&gt;</pre>
This could appear in an XHTML document, for example.<br>

  </li>

</ul>

<h3>Comments Please</h3>

All this amounts to a complex suggestion of how things could be done,
i.e. we develop central vocabularies that go no further than RDFS but

permit exchange and validation of data using avowed serializations and

OWL ontologies.<br>

<br>

What do you think?<br>

<br>

Roger<br>

<br>

<br>

<pre class="moz-signature" cols="72">-- 

-------------------------------------

 Roger Hyam

 Technical Architect

 Taxonomic Databases Working Group

-------------------------------------

 <a class="moz-txt-link-freetext" href="http://www.tdwg.org">http://www.tdwg.org</a>

 <a class="moz-txt-link-abbreviated" href="mailto:roger@tdwg.org">roger@tdwg.org</a>

 +44 1578 722782

-------------------------------------

</pre>

</body>

</html>