[Tdwg-guid] Jena examples?

peter.hollas＠thomson.com

25 Sep 2006

08:02

Hi Everyone,

Does anyone have/or know of any good examples of using Jena to output RDF metadata with non-typical namespaces such as TCS/RDF?

Currently I have the following code for my metadata method (I know it's wrong!).

String TN_NS = "http://tdwg.org/2006/03/12/TaxonNames/";
ByteArrayOutputStream byteStream = new ByteArrayOutputStream();
Model model = ModelFactory.createDefaultModel();
Resource res = model.createResource("http://test");
res.addProperty(model.createProperty(TN_NS, "testProperty"), "test");
model.write(byteStream, "RDF/XML-ABBREV");
return new MetadataResponse(new ByteArrayInputStream(byteStream.toByteArray()), null, MetadataResponse.RDF_FORMAT);

...which gives me.

<rdf:RDF xmlns:j.0="http://tdwg.org/2006/03/12/TaxonNames/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="http://test">
    <j.0:testProperty>test</j.0:testProperty>
  </rdf:Description>
</rdf:RDF>

...whereas I could really do with something along the lines of the IPNI metadata.

<rdf:RDF xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:tn="http://tdwg.org/2006/03/12/TaxonNames/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <tn:TaxonName rdf:about="urn:lsid:ipni.org:names:30000959-2:1.1.2.1">
    <tn:nomenclaturalCode rdf:resource="&tn;#botanical" />
    <dc:title>Amaryllidaceae J.St.-Hil.</dc:title>
    <dcterms:created>2004-01-20 00:00:00.0</dcterms:created>
    <dcterms:modified>2005-06-23 15:45:33.0</dcterms:modified>
    <tn:rankString>fam.</tn:rankString>
    <tn:nameComplete>Amaryllidaceae</tn:nameComplete>
    <tn:uninomial>Amaryllidaceae</tn:uninomial>
    <tn:authorship>J.St.-Hil.</tn:authorship>
    <tn:publishedIn>Expos. Fam. Nat. 1: 134. 1805 [Feb-Apr 1805]</tn:publishedIn>
    <tn:year>1805</tn:year>
    <tn:typifiedBy>
      <tn:NomenclaturalType>
        <dc:title>Amaryllis Linnaeus, nom. cons.</dc:title>
      </tn:NomenclaturalType>
    </tn:typifiedBy>
  </tn:TaxonName>

I think the main problem I have is creating resources that aren't defined as vocabularies within Jena already. I can't seem to find a way to create that <tn:TaxonName root resource element.

I'm starting to think that just outputting vanilla text could be the best option here, or am I barking up the wrong tree?

Many thanks, Peter.

Peter Hollas MSc BSc(hons) (Peter.Hollas@thomson.com)
Software Engineer /Systems Administrator
Thomson Zoological Innovation Centre
York Science Park
Heslington
York YO10 5DG
Tel: 01904-435113 Fax: 01904-435114
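For comparison, the "vanilla text" route Peter mentions can be sketched with the JDK's own XMLStreamWriter instead of Jena. This is only an illustration of that alternative, not a recommendation: the class name PlainRdfXmlSketch and the write helper are made up for this example, and because nothing here builds an RDF model, nothing checks that the output is valid RDF. In RDF/XML an element such as tn:TaxonName is a typed node, that is, shorthand for a resource whose rdf:type is that class.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical helper: writes RDF/XML by hand, without any RDF library.
class PlainRdfXmlSketch {
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    static final String TN_NS = "http://tdwg.org/2006/03/12/TaxonNames/";

    /** Writes a single tn:TaxonName node carrying one literal-valued tn: property. */
    static String write(String about, String property, String value) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            w.writeStartElement("rdf", "RDF", RDF_NS);
            w.writeNamespace("rdf", RDF_NS);   // prefixes are chosen here, by the caller
            w.writeNamespace("tn", TN_NS);
            w.writeStartElement("tn", "TaxonName", TN_NS);
            w.writeAttribute("rdf", RDF_NS, "about", about);
            w.writeStartElement("tn", property, TN_NS);
            w.writeCharacters(value);          // escapes <, > and & in the literal
            w.writeEndElement();               // closes the property element
            w.writeEndElement();               // closes tn:TaxonName
            w.writeEndElement();               // closes rdf:RDF
            w.flush();
            w.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(write("http://test", "testProperty", "test"));
    }
}
```

The trade-off is that the prefixes and element names are entirely under the caller's control, but so is every mistake an RDF library would otherwise catch.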


Benjamin H Szekely

25 Sep

08:32

Hi Peter,

In RDF/XML-ABBREV mode, the serializer uses rdf:type values for element names. Since the resource in your example doesn't have an rdf:type, it falls back to the default rdf:Description element.

- Ben

Ben Szekely
IBM Software Engineer
Advanced Internet Technology, Cambridge, MA
bhszekel@us.ibm.com

_______________________________________________ TDWG-GUID mailing list TDWG-GUID@mailman.nhm.ku.edu http://mailman.nhm.ku.edu/mailman/listinfo/tdwg-guid

Steve Perry

12:53

Hi Peter,

Ben is right. To get a TaxonName element (instead of an rdf:Description), add the following line of code:

res.addProperty(model.createProperty("http://www.w3.org/1999/02/22-rdf-syntax-ns#type"), model.createResource(TN_NS + "TaxonName"));

Note that the type has to be added as a resource; passing it as a plain string would create a literal, which the serializer cannot use as an element name.

This should give you output like:

<rdf:RDF xmlns:j.0="http://tdwg.org/2006/03/12/TaxonNames/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <j.0:TaxonName rdf:about="http://test">
    <j.0:testProperty>test</j.0:testProperty>
  </j.0:TaxonName>
</rdf:RDF>

XML namespace prefixes are not part of the RDF model, so unless a prefix is registered on the model (for example with model.setNsPrefix("tn", TN_NS)) Jena will supply its own prefixes when serializing to one of the XML formats (hence the j.0 instead of tn).

By the way, the IPNI example you cite has an error:

<tn:nomenclaturalCode rdf:resource="&tn;#botanical" />

Many RDF/XML parsers will see &tn; as an entity which cannot be resolved. Since I don't have a copy of the ontology (and http://tdwg.org/2006/03/12/TaxonNames does not resolve), I can only take a guess that it should look something like:

<tn:nomenclaturalCode rdf:resource="tn:botanical" />

However, using XML namespace prefixes in resource references inside RDF/XML documents tends to cause problems because not all RDF/XML parsers are smart enough to dereference the namespace prefix and build a fully-qualified resource URI. A safer form of the above would be the fully qualified resource URI which looks like:

<tn:nomenclaturalCode rdf:resource="http://tdwg.org/2006/03/12/TaxonNames/botanical" />

-Steve

Sally Hinchcliffe

26 Sep

01:03

Hi Steve /all

We took that syntax straight from Roger's RDF/TCS examples. I think Roger was going to do more work on tidying up those sorts of loose ends. I have to admit that my knowledge of RDF and particularly RDFS is pretty superficial.

We can switch to either the shorter format or the safer fully qualified URI - what do people think would be better?

Sally


*** Sally Hinchcliffe *** Computer section, Royal Botanic Gardens, Kew *** tel: +44 (0)20 8332 5708 *** S.Hinchcliffe@rbgkew.org.uk

Roger Hyam

01:49

Hi Everyone,

In the not very comforting words of software vendors these are "known issues" and will be resolved in the next release ;)

But seriously. The namespaces in the TCS names vocabulary do not resolve because we didn't have a policy at that time. The use of entity references was 'borrowed' from an example (probably Protege output) and I wouldn't mind doing away with it.

I have been thinking long and hard about the namespace issues in the last few weeks and believe I have a solution that I will propose at TDWG St Louis. It would be good to have face to face discussions about it and make a decision there.

To briefly summarize: The issue is getting a namespace convention that will work across technologies. Suppose we want to serve data in a technology that "isn't very good at namespaces". As an example - if we were to have separate namespaces for TaxonNames, TaxonConcepts, Specimens, Metadata, GeospatialStuff, Collections and we wanted to validate a document using XML Schema that contained all these things it would require 6 independent schemas each with its own target namespace. If you have ever tried to debug something like this you will know what total madness it is. We can't just abandon XML Schema because it would rule out not only our existing technologies but GML and probably others... It may also be desirable to express our ontology in things that aren't even XML.

The only solution I can think of is that for the TDWG Ontology we should have a single formal namespace of: http://rs.tdwg.org/ontology/

Within the ontology we have a convention for concepts that goes like this.

A class would be: http://rs.tdwg.org/ontology/tdwg123_MyClass

where 123 is the internal id of the class and MyClass is the class name.

A property in MyClass would be: http://rs.tdwg.org/ontology/tdwg123_myProperty

where 123 is the *class* id not the property id and myProperty is the property name.

An instance would be: http://rs.tdwg.org/ontology/tdwg123_789

where 123 is the *class* id and 789 is the instance id (there is nothing stable about an instance that we could use as an identifier unless we force a label property and make it immutable).

In a way this is using the part before the _ as a pseudo namespace.

I think this hits the balance between something that is technology independent and something that will produce reasonably human readable documents.

It is radical which is why I thought it would be good to talk about it.

Anyone got an alternative?

We have to have a solution for this by the end of the St Louis meeting as it is critical path for ontology work.

Most grateful for your patience and any thoughts you have.

Roger

-- ------------------------------------- Roger Hyam Technical Architect Taxonomic Databases Working Group ------------------------------------- http://www.tdwg.org roger@tdwg.org +44 1578 722782 -------------------------------------
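The naming convention Roger describes can be written out as three small string builders. This is only a sketch of the proposal as stated in his message: the class and method names are invented here, and the ids 123 and 789 are his placeholders rather than real ontology identifiers.

```java
// Hypothetical illustration of the proposed single-namespace convention.
class TdwgOntologyUris {
    static final String NS = "http://rs.tdwg.org/ontology/";

    /** Class URI: the class id followed by the class name. */
    static String classUri(int classId, String className) {
        return NS + "tdwg" + classId + "_" + className;
    }

    /** Property URI: note that the id is that of the owning class, not of the property. */
    static String propertyUri(int classId, String propertyName) {
        return NS + "tdwg" + classId + "_" + propertyName;
    }

    /** Instance URI: the class id followed by the instance id. */
    static String instanceUri(int classId, int instanceId) {
        return NS + "tdwg" + classId + "_" + instanceId;
    }

    public static void main(String[] args) {
        System.out.println(classUri(123, "MyClass"));
        System.out.println(propertyUri(123, "myProperty"));
        System.out.println(instanceUri(123, 789));
    }
}
```

All three kinds of term share the "tdwg123_" stem, which is the pseudo namespace the message refers to.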

Sally Hinchcliffe

2 Oct

05:52

New subject: literal enumerated values in TCS-RDF

I'm trying to nail down a final spec for the IPNI LSID server if I can.

Steve Perry raised the point, over on the tdwg-guid mailing list, that the following bit of output from the IPNI server was an error:

<tn:nomenclaturalCode rdf:resource="&tn;#botanical" />

This syntax has been used in a number of places - for the value of the nomenclaturalCode, for the nomenclaturalNote noteType (e.g. basedOn) and for the typeOfType (e.g. holo) - basically wherever there was an enumeration in the original schema.

Roger's response is below - but I'm not quite clear how that will work in terms of replacing the syntax Steve found was failing. I'm not sure from Roger's email whether he really meant we should use numeric ids for all literal enumerated values, or whether those were merely placeholders - but if he did, it would seem to make the resulting RDF willfully unreadable.

Roger - can you explain with a more concrete example?

Sally


*** Sally Hinchcliffe *** Computer section, Royal Botanic Gardens, Kew *** tel: +44 (0)20 8332 5708 *** S.Hinchcliffe@rbgkew.org.uk

Roderic Page

06:21

New subject: literal enumerated values in TCS-RDF

From an RDF perspective, I would vote for referring to the botanical code (or any other code) using a URI, not a literal, and not a namespace reference (such as "tn:botanical") which, as Steve pointed out, RDF parsers can't handle. By referencing a URI, the semantics are clear (I can fetch whatever the URI points to and figure out what it means). In other words, something like

<tn:nomenclaturalCode rdf:resource="http://tdwg.org/2006/03/12/TaxonNames/botanical" />

as Steve proposed (so long as the URI is actually resolvable, which is another difference between XML schema and RDF --- in RDF the name spaces must be resolvable).

I haven't really followed what Roger proposes, but IMHO anything motivated by an attempt to salvage the use of XML schema is bound to end in tears. It's a hack (anything that exposes internal identifiers of a class has got to be a hack, surely).

Regards

Rod

_______________________________________________ tdwg-tag mailing list tdwg-tag@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-tag

----------------------------------------
Professor Roderic D. M. Page
Editor, Systematic Biology
DEEB, IBLS
Graham Kerr Building
University of Glasgow
Glasgow G12 8QP
United Kingdom

Phone: +44 141 330 4778
Fax: +44 141 330 2792
email: r.page@bio.gla.ac.uk
web: http://taxonomy.zoology.gla.ac.uk/rod/rod.html
iChat: aim://rodpage1962
reprints: http://taxonomy.zoology.gla.ac.uk/rod/pubs.html

Subscribe to Systematic Biology through the Society of Systematic Biologists Website: http://systematicbiology.org
Search for taxon names: http://darwin.zoology.gla.ac.uk/~rpage/portal/
Find out what we know about a species: http://ispecies.org
Rod's rants on phyloinformatics: http://iphylo.blogspot.com
Rod's rants on ants: http://semant.blogspot.com

Gregor Hagedorn

06:38

New subject: literal enumerated values in TCS-RDF

Steve writes:

However, using XML namespace prefixes in resource references inside RDF/XML documents tends to cause problems because not all RDF/XML parsers are smart enough to dereference the namespace prefix and build a fully-qualified resource URI. A safer form of the above would be the fully qualified resource URI which looks like: <tn:nomenclaturalCode rdf:resource="http://tdwg.org/2006/03/12/TaxonNames/botanical" />

It seems the discussion confuses QNames ("namespace-colon") and XML entities (ampersand-semicolon) - or am I confused???

An attempt to clarify: From my understanding, XML itself has no such thing as a namespace-colon in literals - XML Schema has introduced it as a convenient thing (QName). However, the use of XML entities is a requirement of XML 1.0 itself. I agree with Rod that URIs are the correct way for RDF:

rdf:resource="http://tdwg.org/2006/03/12/TaxonNames/botanical"

and under no circumstances (even with RDF-xml-Schema) can we use

rdf:resource="tn:/botanical"

because RDF does not use QNames. However, if abbreviation is an issue, we can use the following, provided we supply an entity definition for it, as the Protege examples do:

rdf:resource="&tn;/botanical"

If RDF parsers fail to deal with the latter, they are grossly non-interoperable with XML as a whole.

Gregor
----------------------------------------------------------
Gregor Hagedorn (G.Hagedorn@bba.de)
Institute for Plant Virology, Microbiology, and Biosafety
Federal Research Center for Agriculture and Forestry (BBA)
Königin-Luise-Str. 19 Tel: +49-30-8304-2220
14195 Berlin, Germany Fax: +49-30-8304-2203
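The QName point can be illustrated with nothing more than java.net.URI. RDF/XML treats the value of rdf:resource as a URI reference, so a value such as "tn:botanical" is not looked up against the xmlns:tn declaration; read as a URI, "tn" is simply the scheme. The class name below is made up for this sketch.

```java
import java.net.URI;

// Hypothetical demo: how a prefixed value looks when read as a URI reference.
class PrefixedReferenceDemo {
    /** Describes how a would-be rdf:resource value parses as a URI. */
    static String describe(String reference) {
        URI uri = URI.create(reference);
        if (uri.isAbsolute()) {
            // Anything before the first colon is taken as the URI scheme.
            return "absolute URI with scheme '" + uri.getScheme() + "'";
        }
        return "relative reference";
    }

    public static void main(String[] args) {
        System.out.println(describe("tn:botanical"));
        System.out.println(describe("tn:/botanical"));
        System.out.println(describe("http://tdwg.org/2006/03/12/TaxonNames/botanical"));
    }
}
```

Both prefixed forms come out as absolute URIs in a scheme called "tn", which is not the TaxonNames namespace URI that the prefix was meant to stand for.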

Steve Perry

20:32

New subject: literal enumerated values in TCS-RDF

I agree with Rod and Gregor that we ought to use full URIs to indicate nomenclatural codes. Gregor is right that we could use rdf:resource="&tn;/botanical" as an abbreviated reference to such a URI, so long as we provide an entity definition for tn in an embedded DTD. If this is done properly, RDF parsers can handle it (because the entity is expanded during XML parsing). However, it's easy to lose the DTD when copying and pasting, when creating your own RDF using someone else's as an example, or when using text processing routines on serialized RDF/XML. DTDs are defined at the document level while non-anonymous/non-blank RDF resources are global and the full description of a global resource can be split among multiple files. To cut down the likelihood of errors we should probably keep it simple and suggest the use of full URIs as a best practice. -Steve Gregor Hagedorn wrote:

_______________________________________________ tdwg-tag mailing list tdwg-tag@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-tag
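Editorial sketch of the point Gregor and Steve make above: the &tn; abbreviation is resolved by the XML parser, before any RDF layer sees the attribute, and it only works while the entity declaration travels with the document. The example uses only the JDK's built-in DOM parser, not an RDF toolkit. The class name, the method names and the entity value (chosen so that "&tn;/botanical" expands to the full URI used in the thread) are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class EntityExpansionDemo {
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    static final String TN_NS = "http://tdwg.org/2006/03/12/TaxonNames/";

    // Embedded DTD declaring the tn entity (assumed value, no trailing slash).
    static final String DOCTYPE =
        "<!DOCTYPE rdf:RDF [ <!ENTITY tn \"http://tdwg.org/2006/03/12/TaxonNames\"> ]>\n";

    // The same RDF/XML body, with or without the DOCTYPE in front of it.
    static final String BODY =
        "<rdf:RDF xmlns:rdf=\"" + RDF_NS + "\" xmlns:tn=\"" + TN_NS + "\">\n"
      + "  <tn:TaxonName rdf:about=\"urn:lsid:ipni.org:names:30000959-2:1.1.2.1\">\n"
      + "    <tn:nomenclaturalCode rdf:resource=\"&tn;/botanical\"/>\n"
      + "  </tn:TaxonName>\n"
      + "</rdf:RDF>\n";

    /** Returns the rdf:resource value exactly as an application sees it after XML parsing. */
    static String resourceOf(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            Element code = (Element) doc.getElementsByTagNameNS(TN_NS, "nomenclaturalCode").item(0);
            return code.getAttributeNS(RDF_NS, "resource");
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("document is not parseable: " + e.getMessage(), e);
        }
    }

    /** True if the document is well-formed XML as far as the parser is concerned. */
    static boolean parses(String xml) {
        try {
            resourceOf(xml);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // With the DTD, the entity is expanded to the full URI during parsing.
        System.out.println(resourceOf(DOCTYPE + BODY));
        // With the DTD lost (e.g. after copy and paste), the document no longer parses.
        System.out.println(parses(BODY));
    }
}
```

With the DOCTYPE in place the first call should return the full http://tdwg.org/2006/03/12/TaxonNames/botanical URI; with the DOCTYPE stripped, the undeclared entity makes the document ill-formed and parsing fails outright. That is the fragility Steve describes, and one reason for recommending full URIs.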

Sally Hinchcliffe

3 Oct

00:59

New subject: literal enumerated values in TCS-RDF

I'm happy with using

rdf:resource="http://tdwg.org/2006/03/12/TaxonNames/botanical"

and we'll update our example documents and our LSID server accordingly. We'll set up our templates so that if the path changes (e.g. the tdwg.org/2006/03/12/TaxonNames/ part) then we can easily update it.

Thanks

Sally

*** Sally Hinchcliffe *** Computer section, Royal Botanic Gardens, Kew *** tel: +44 (0)20 8332 5708 *** S.Hinchcliffe@rbgkew.org.uk

Steve Perry

26 Sep

07:15

Hi Sally,

No problem. The task was to create a prototype LSID resolver, not to solve all the KR issues surrounding taxon concepts. However, I do think it's time we start talking about these issues. I worry that the prototype resolvers we set up will become de facto reference implementations, that other people will start to construct services modeled on the prototypes without us ever having gone back to talk about what worked and what didn't.

I know there are several versions of TCS-in-RDF floating around. I think Roger's is an RDFS document. Rob Gales created an OWL-DL version for the GBIF demonstration project that Jessie and he worked on this summer. Early this year I created a partial implementation in OWL-Lite (that I've since discarded). While each one is "TCS", they're all substantially different in the way they represent TCS classes and properties, in part because the different representation languages (RDFS, OWL-Lite, OWL-DL) have different language features and expressive powers.

It would be nice if we could devise one standard RDF implementation of TCS. I don't care which one we use, but I would like to narrow the field to one so we can get the details sorted out. I'm talking about details like resolvable namespaces, typed versus non-typed literals, the use of anonymous resources, and serialization issues like the references-to-resources problem that cropped up in the IPNI example Peter Hollas is working from. These details are quite important because certain decisions taken here can affect the larger network of linked data providers.

Take the anonymous resources issue: If you look at the example Peter cites, the typifiedBy property refers to an anonymous NomenclaturalType that has a dc:title. Within a single data provider, this is no big deal because many different data objects can refer to this NomenclaturalType. However, the use of anonymous resources can cause big problems when you try to harvest and index the data from multiple providers. It also causes problems for the caching use case.

It would be nice to discuss some of these things, perhaps within TAG.

-Steve

Sally Hinchcliffe wrote:

...

Hi Steve /all

We took that syntax straight from Roger's RDF/TCS examples. I think Roger was going to do more work on tidying up those sorts of loose ends. I have to admit that my knowledge of RDF and particularly RDFS is pretty superficial.

We can switch to either the shorter format or the safer fully qualified URI - what do people think would be better?

Sally

...
By the way, the IPNI example you cite has an error:

<tn:nomenclaturalCode rdf:resource="&tn;#botanical" />

Many RDF/XML parsers will see &tn; as an entity which cannot be resolved. Since I don't have a copy of the ontology (and http://tdwg.org/2006/03/12/TaxonNames does not resolve), I can only take a guess that it should look something like:

<tn:nomenclaturalCode rdf:resource="tn:botanical" />

However, using XML namespace prefixes in resource references inside RDF/XML documents tends to cause problems because not all RDF/XML parsers are smart enough to dereference the namespace prefix and build a fully-qualified resource URI. A safer form of the above would be the fully qualified resource URI which looks like:

<tn:nomenclaturalCode rdf:resource="http://tdwg.org/2006/03/12/TaxonNames/botanical" />

-Steve

*** Sally Hinchcliffe *** Computer section, Royal Botanic Gardens, Kew *** tel: +44 (0)20 8332 5708 *** S.Hinchcliffe@rbgkew.org.uk
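Editorial sketch of Steve's concern about anonymous resources when harvesting from several providers. It models triples as plain three-element lists rather than using an RDF library. The provider names, the renaming scheme for blank nodes and the urn:lsid:example.org:types:1 identifier are hypothetical; only the property URIs, the name LSID and the dc:title value come from the IPNI example discussed in the thread.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class HarvestSketch {
    static final String DC_TITLE = "http://purl.org/dc/elements/1.1/title";
    static final String TYPIFIED_BY = "http://tdwg.org/2006/03/12/TaxonNames/typifiedBy";
    static final String NAME = "urn:lsid:ipni.org:names:30000959-2:1.1.2.1";
    static final String TITLE = "Amaryllis Linnaeus, nom. cons.";

    static List<String> triple(String s, String p, String o) {
        return Arrays.asList(s, p, o);
    }

    /** What one provider serves: the name points at a type node that carries a title. */
    static List<List<String>> describe(String typeNode) {
        return Arrays.asList(
            triple(NAME, TYPIFIED_BY, typeNode),
            triple(typeNode, DC_TITLE, TITLE));
    }

    /** Blank-node labels ("_:x") are only meaningful inside one document,
     *  so a harvester has to rename them per source before merging. */
    static String scope(String node, String source) {
        return node.startsWith("_:") ? "_:" + source + "-" + node.substring(2) : node;
    }

    /** Merges the same description as served by each of the given sources. */
    static Set<List<String>> harvest(List<String> sources, String typeNode) {
        Set<List<String>> merged = new LinkedHashSet<>();
        for (String source : sources) {
            for (List<String> t : describe(typeNode)) {
                merged.add(triple(scope(t.get(0), source), t.get(1), scope(t.get(2), source)));
            }
        }
        return merged;
    }

    /** Counts distinct nodes that carry a dc:title, i.e. distinct NomenclaturalType nodes. */
    static int typeNodes(Set<List<String>> graph) {
        Set<String> subjects = new LinkedHashSet<>();
        for (List<String> t : graph) {
            if (t.get(1).equals(DC_TITLE)) {
                subjects.add(t.get(0));
            }
        }
        return subjects.size();
    }

    public static void main(String[] args) {
        List<String> sources = Arrays.asList("providerA", "providerB");
        // Anonymous type node: each source contributes its own, unmergeable copy.
        System.out.println(typeNodes(harvest(sources, "_:type")));                       // prints 2
        // Named type node: both sources describe one and the same resource.
        System.out.println(typeNodes(harvest(sources, "urn:lsid:example.org:types:1"))); // prints 1
    }
}
```

An index built over the harvested data therefore ends up with one copy of the anonymous type per source and no identifier by which a cache could refresh or reference it, whereas the named type collapses into a single node.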

Sally Hinchcliffe

08:18

Hi Steve/all

I agree that we should start trying to nail down the TCS/RDF more firmly. I know that Roger wants to discuss this at TDWG but I think some firm proposals are needed to take to the meeting - and some robust examples as well, for those of us who struggle with RDFS, OWL et al and need to see actual implementations to properly engage with the proposals. Otherwise, as you say, we will end up with some de facto standards based on LSID implementations which were never intended to be anything more than proofs of concept.

Now is a good time to tackle this for us, because we're re-opening the LSID implementation that we did for IPNI and trying to bring it up to something that will be properly releasable and can be more widely publicised. As part of that work it will be easy to make any changes to the metadata that's produced. Once released, pressures of time and other projects will make it hard to re-open the work and change the metadata output.

Is TAG the best place to discuss this? I'm not on that mailing list but I can probably join if the discussion is going to be over there.

Let me know

Sally

Roger Hyam

09:39

I suggest we move the discussion to the TAG list as it is wider than LSIDs.

Please join the TAG list Sally - we would appreciate your thoughts

http://lists.tdwg.org/mailman/listinfo/tdwg-tag

I have set up a wiki page here:

http://wiki.tdwg.org/twiki/bin/view/TAG/EstablishingLsidVocabularies

where I am brainstorming the issues. Everyone is invited to add ideas/issues to this page.

I would like to just take a series of decisions so that we have a solution to these issues as soon as possible but I am aware that we must move forward as a group. This is slower but has to be done.

The situation with TCS/RDF isn't as bad as it might seem. Rob Gales's version is based very heavily on the RDFS version that I did earlier in the year for GUID-2 (which IPNI and IF use), so provided Steve keeps his version buried we really only have two versions in the wild and they are really two ports of the same thing.

If I could have answers to two questions I could probably get the TCS/RDF vocabularies set up in the right place pretty damn quickly.

* How do we divide the namespace? - I suggested an approach earlier.
* What language do we use for the RDF vocabularies in metadata responses? RDFS/OWL

I have run out of time here today so I will not have a chance to add anything more to the wiki but will look out the RDFS and OWL versions of the TCS Names part tomorrow and try to provide a commentary on any differences.

More later,

Roger

-------------------------------------
Roger Hyam
Technical Architect
Taxonomic Databases Working Group
-------------------------------------
http://www.tdwg.org
roger@tdwg.org
+44 1578 722782
-------------------------------------

_______________________________________________ TDWG-GUID mailing list TDWG-GUID@mailman.nhm.ku.edu http://mailman.nhm.ku.edu/mailman/listinfo/tdwg-guid

Sally Hinchcliffe

28 Sep

00:38

OK - have joined. Whoopee, more email ...

*** Sally Hinchcliffe *** Computer section, Royal Botanic Gardens, Kew *** tel: +44 (0)20 8332 5708 *** S.Hinchcliffe@rbgkew.org.uk

peter.hollas＠thomson.com

26 Sep

08:37

New subject: [Tdwg-guid] Which TCS/RDF?

Hi,

Could someone advise on which TCS/RDF ontology would be the best to implement for metadata coming from nomenclatural sources such as ZooBank? I've yet to come across anything other than a TCS XML Schema in public circulation.

I've decided to do away with using the Jena API altogether for returning metadata responses from ZooBank; it seems to be a rather heavy-handed approach to returning what is basically a simple structured text document. A much more flexible way to go is with a page templating system, especially when the schemata are in constant flux. A Spring Framework MVC/JSTL endpoint will allow for schemata changes to be implemented without recompilation. The LSID metadata class will just act as a façade/decorator to the templating system.

Many thanks, Peter.
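Editorial sketch of the templating idea Peter describes. It is not Spring MVC or JSTL: a plain-Java placeholder substitution stands in for the template engine, purely to show the shape of the approach. The template text would normally live in an external file so that schema changes need no recompilation; the class name, the ${...} placeholder syntax and the sample values are assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MetadataTemplate {
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{(\\w+)\\}");

    // Inlined here for brevity; in practice this would be loaded from a file.
    static final String TEMPLATE =
        "<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\"\n"
      + "         xmlns:dc=\"http://purl.org/dc/elements/1.1/\"\n"
      + "         xmlns:tn=\"http://tdwg.org/2006/03/12/TaxonNames/\">\n"
      + "  <tn:TaxonName rdf:about=\"${lsid}\">\n"
      + "    <dc:title>${title}</dc:title>\n"
      + "    <tn:nameComplete>${nameComplete}</tn:nameComplete>\n"
      + "  </tn:TaxonName>\n"
      + "</rdf:RDF>\n";

    /** Escapes the characters that are unsafe in XML text and attribute values. */
    static String escape(String value) {
        return value.replace("&", "&amp;")
                    .replace("<", "&lt;")
                    .replace(">", "&gt;")
                    .replace("\"", "&quot;");
    }

    /** Fills every ${key} in one pass; an unknown key is an error, not a silent gap. */
    static String render(String template, Map<String, String> values) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = values.get(m.group(1));
            if (value == null) {
                throw new IllegalArgumentException("no value for placeholder " + m.group(1));
            }
            m.appendReplacement(out, Matcher.quoteReplacement(escape(value)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> values = new LinkedHashMap<>();
        values.put("lsid", "urn:lsid:zoobank.org:taxon:123");   // illustrative LSID
        values.put("title", "Amaryllidaceae J.St.-Hil.");
        values.put("nameComplete", "Amaryllidaceae");
        System.out.print(render(TEMPLATE, values));
    }
}
```

The trade-off is the one Steve notes in his reply: a template is light and easy to change, but nothing checks that the result is valid RDF, so escaping and well-formedness become the template author's responsibility.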

Steve Perry

09:12

New subject: [Tdwg-guid] Which TCS/RDF?

Hi Peter,

I was just writing a response to your question about Jena, but I'll scrap it now.

Unfortunately there's no standard representation for TCS in RDF at this time. Hopefully there will be one in the near future.

Jena is well suited to consuming RDF metadata, but I agree with you that it's a bit heavyweight if all you want to do is produce many instances of a single class. Visualizing the problem as one of templating and using Spring MVC is a neat approach. I use Spring mostly for dependency injection and hadn't considered that it might be used to isolate this kind of software from changes in the schema.

-Steve

peter.hollas＠thomson.com

02:37

Hi Steve,

I can't seem to get your example to give me <tn:TaxonName rdf:about="lsid info">; it just seems to add a property.

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="urn:lsid:zoobank.org:taxon:123">
    <rdf:type>http://tdwg.org/2006/03/12/TaxonNames/TaxonName</rdf:type>
  </rdf:Description>
</rdf:RDF>

...

...
...
From my code...

ByteArrayOutputStream byteStream = new ByteArrayOutputStream();
Model model = ModelFactory.createDefaultModel();
Resource res = model.createResource(lsid.toString());
res.addProperty(model.createProperty("http://www.w3.org/1999/02/22-rdf-syntax-ns#type"), TN_NS + "TaxonName");
model.write(byteStream, "RDF/XML-ABBREV");
return new MetadataResponse(new ByteArrayInputStream(byteStream.toByteArray()), null, MetadataResponse.RDF_FORMAT);

Many thanks, Peter.

-----Original Message-----
From: Steve Perry [mailto:smperry@ku.edu]
Sent: 25 September 2006 20:54
To: Hollas, Peter (TS UK); tdwg-guid@mailman.nhm.ku.edu
Subject: Re: [Tdwg-guid] Jena examples?

Hi Peter,

Ben is right. To get a TaxonName element (instead of an rdf:Description), add the following line of code:

res.addProperty(model.createProperty("http://www.w3.org/1999/02/22-rdf-syntax-ns#type"), TN_NS + "TaxonName");

This should give you output like:

<rdf:RDF xmlns:j.0="http://tdwg.org/2006/03/12/TaxonNames"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <j.0:TaxonName rdf:about="http://test">
    <j.0:testProperty>test</j.0:testProperty>
  </j.0:TaxonName>
</rdf:RDF>

XML namespace prefixes are not preserved in RDF so Jena will supply its own prefixes when serializing to one of the XML formats (hence the j.0 instead of tn).

By the way, the IPNI example you cite has an error:

<tn:nomenclaturalCode rdf:resource="&tn;#botanical" />

Many RDF/XML parsers will see &tn; as an entity which cannot be resolved. Since I don't have a copy of the ontology (and http://tdwg.org/2006/03/12/TaxonNames does not resolve), I can only take a guess that it should look something like:

<tn:nomenclaturalCode rdf:resource="tn:botanical" />

However, using XML namespace prefixes in resource references inside RDF/XML documents tends to cause problems because not all RDF/XML parsers are smart enough to dereference the namespace prefix and build a fully-qualified resource URI. A safer form of the above would be the fully qualified resource URI which looks like:

<tn:nomenclaturalCode rdf:resource="http://tdwg.org/2006/03/12/TaxonNames/botanical" />

-Steve

Benjamin H Szekely wrote:

...

Hi Peter,

In RDF/XML-ABBREV mode, the serializer uses a resource's rdf:type as its element name. Since your example doesn't have an rdf:type, it uses the default element name (rdf:Description).

- Ben

Ben Szekely IBM Software Engineer Advanced Internet Technology, Cambridge, MA bhszekel@us.ibm.com

tdwg-guid-bounces@mailman.nhm.ku.edu wrote on 09/25/2006 11:02:30 AM:

...
Hi Everyone,

Does anyone have or know of any good examples of using Jena to output RDF metadata with non-typical namespaces such as TCS/RDF?

Currently I have the following code for my metadata method (I know it's wrong!).

String TN_NS = "http://tdwg.org/2006/03/12/TaxonNames/";
ByteArrayOutputStream byteStream = new ByteArrayOutputStream();
Model model = ModelFactory.createDefaultModel();
Resource res = model.createResource("http://test");
res.addProperty(model.createProperty(TN_NS, "testProperty"), "test");
model.write(byteStream, "RDF/XML-ABBREV");
return new MetadataResponse(new ByteArrayInputStream(byteStream.toByteArray()), null, MetadataResponse.RDF_FORMAT);

...which gives me.

<rdf:RDF
    xmlns:j.0="http://tdwg.org/2006/03/12/TaxonNames/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="http://test">
    <j.0:testProperty>test</j.0:testProperty>
  </rdf:Description>
</rdf:RDF>

...whereas I could really do with something along the lines of the IPNI metadata.

<rdf:RDF
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:tn="http://tdwg.org/2006/03/12/TaxonNames/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <tn:TaxonName rdf:about="urn:lsid:ipni.org:names:30000959-2:1.1.2.1">
    <tn:nomenclaturalCode rdf:resource="&tn;#botanical" />
    <dc:title>Amaryllidaceae J.St.-Hil.</dc:title>
    <dcterms:created>2004-01-20 00:00:00.0</dcterms:created>
    <dcterms:modified>2005-06-23 15:45:33.0</dcterms:modified>
    <tn:rankString>fam.</tn:rankString>
    <tn:nameComplete>Amaryllidaceae</tn:nameComplete>
    <tn:uninomial>Amaryllidaceae</tn:uninomial>
    <tn:authorship>J.St.-Hil.</tn:authorship>
    <tn:publishedIn>Expos. Fam. Nat. 1: 134. 1805 [Feb-Apr 1805]</tn:publishedIn>
    <tn:year>1805</tn:year>
    <tn:typifiedBy>
      <tn:NomenclaturalType>
        <dc:title>Amaryllis Linnaeus, nom. cons.</dc:title>
      </tn:NomenclaturalType>
    </tn:typifiedBy>
  </tn:TaxonName>

I think the main problem I have is creating resources that aren't defined as vocabularies within Jena already. I can't seem to find a way to create that <tn:TaxonName root resource element.

I'm starting to think that just outputting vanilla text could be the best option here, or am I barking up the wrong tree?

Many thanks, Peter.

Peter Hollas MSc BSc(hons) (Peter.Hollas@thomson.com) Software Engineer /Systems Administrator Thomson Zoological Innovation Centre York Science Park Heslington York YO10 5DG

Tel: 01904-435113 Fax: 01904-435114

_______________________________________________ TDWG-GUID mailing list TDWG-GUID@mailman.nhm.ku.edu http://mailman.nhm.ku.edu/mailman/listinfo/tdwg-guid


16 comments

participants (7)

Benjamin H Szekely
Gregor Hagedorn
peter.hollas＠thomson.com
Roderic Page
Roger Hyam
Sally Hinchcliffe
Steve Perry