[tdwg-content] Unintentionally introducing classes. [SEC=UNCLASSIFIED]

Tue Nov 2 02:44:12 CET 2010

On 01/11/2010, at 8:51 AM, Bob Morris wrote:

> There are surprises even in the simplest use of RDFS and formalisms about classes. I've previously whined about premature assignment of rdfs:domain while conceding (did I???) that it can sometimes make a designer's intention clearer to humans. Perhaps more startling is that type assignment automatically "creates" an rdfs:Class if one was not already available, due to the formal semantics of rdf:type [3]. Thus, in an earlier posting, Paul Murray has (unintentionally?) introduced a new class apni:TaxonName in 33407.rdf [2] via 
>     <rdf:type rdf:resource="http://biodiversity.org.au/voc/apni/APNI#TaxonName"/>

No, it was deliberate. We have done some work on our hosting at our end, and as of last Friday the BOA (biodiversity.org.au) vocabulary files are available. http://biodiversity.org.au/voc/apni/APNI redirects to http://biodiversity.org.au/voc/apni/APNI.rdf, which Protege (for instance) understands.

We have created class hierarchies for our various objects: an APNI name is a BOA name is a TDWG name. As you have noticed, our individual name objects are explicitly declared both as TDWG names and also APNI names. Although the TDWG type is implied, I include it explicitly so that people can ignore our vocabulary if they wish when looking at our data. We have created "de novo" properties and named individuals for things in our data for which we could not find a suitable equivalent in the TDWG vocabulary, and these are available too, eg: http://www.biodiversity.org.au/voc/apni/NomenclaturalQualifierTerm .
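As a rough sketch of why the TDWG type is "implied" by the hierarchy, here is the subclass entailment written out in plain Python. Only the APNI#TaxonName URI comes from this thread; the BOA and TDWG class URIs and the individual's URI are placeholders I made up for illustration, and a real reasoner does much more than this.

```python
# Minimal sketch of rdfs:subClassOf type inheritance over a set of triples.
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

APNI_NAME = "http://biodiversity.org.au/voc/apni/APNI#TaxonName"
BOA_NAME = "http://example.org/boa#Name"         # placeholder URI (assumption)
TDWG_NAME = "http://example.org/tdwg#TaxonName"  # placeholder URI (assumption)
INDIVIDUAL = "urn:example:name-1"                # placeholder individual

triples = {
    (APNI_NAME, SUBCLASS_OF, BOA_NAME),
    (BOA_NAME, SUBCLASS_OF, TDWG_NAME),
    (INDIVIDUAL, RDF_TYPE, APNI_NAME),
}


def superclasses(cls, triples):
    """Return every class reachable from cls via rdfs:subClassOf."""
    found = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for s, p, o in triples:
            if s == current and p == SUBCLASS_OF and o not in found:
                found.add(o)
                stack.append(o)
    return found


def inferred_types(resource, triples):
    """Asserted types plus the types implied by the subclass hierarchy."""
    asserted = {o for s, p, o in triples if s == resource and p == RDF_TYPE}
    implied = set()
    for cls in asserted:
        implied |= superclasses(cls, triples)
    return asserted | implied
```

A consumer that does no reasoning sees only the asserted type, which is why stating the TDWG type explicitly in the data is still useful.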

We have gone through a similar exercise for the XML vocabularies: see http://www.biodiversity.org.au/afd.name/468562.xml . An xsi:schemaLocation attribute is included, allowing XML validators to find our schema files.
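For anyone unfamiliar with xsi:schemaLocation: its value is a whitespace-separated list of namespace/location pairs. The sketch below pulls those pairs out with the Python standard library. The sample document, its namespace, and its schema URL are invented placeholders, not the content of the real 468562.xml.

```python
import xml.etree.ElementTree as ET

# The XML Schema instance namespace that the xsi: prefix is bound to.
XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Hypothetical document: namespace and schema URL are placeholders.
SAMPLE = """<name xmlns="http://example.org/ns/name"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://example.org/ns/name http://example.org/schema/name.xsd"/>"""


def schema_locations(xml_text):
    """Map each namespace URI to the schema location hinted for it."""
    root = ET.fromstring(xml_text)
    value = root.get("{%s}schemaLocation" % XSI, "")
    tokens = value.split()
    # Tokens alternate: namespace, location, namespace, location, ...
    return dict(zip(tokens[0::2], tokens[1::2]))
```

The attribute is only a hint; a validator is free to ignore it and use a schema obtained some other way.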

Of course, "deliberately" does not necessarily mean "a good idea" or "done correctly", but you have to start somewhere.

The nice thing about the semantic web is that you can in fact do this. All of our extra bits are identified with URIs, and the URIs all start with "http://biodiversity.org.au". At present, these extra bits mean nothing outside of the data here at BOA. A human could make sense of many of them, but many of the types, properties, and named individuals do not even contain titles and descriptions: as I am not a taxonomist myself, I have only the vaguest idea what the difference might be between "nom illeg" and "nom rej". Better to leave it blank. 

Of course, this is not a problem for the TDWG vocabularies, but that's because I am working the other way around: I was not trying to create a vocabulary that the general community could use, but to document an existing (albeit implied) one. Our properties declare explicit domains. For well-discussed reasons, the TDWG vocabularies do not. But I don't think that those reasons (unintentional type declarations made by people using your terms) apply. Indeed - the reverse is almost the whole point. I don't think we *want* other people using biodiversity.org.au terms: their meanings are potentially idiosyncratic to the systems here (perhaps subtly) because they don't have proper descriptions - descriptions I am not able to supply.
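To make the "unintentional type declarations" concern concrete, here is a small sketch of the RDFS domain rule: if a property declares an rdfs:domain, anything used as the subject of that property is inferred to be an instance of that class. The property name and the subject URI below are hypothetical; only the APNI namespace and the TaxonName class come from this thread.

```python
# Minimal sketch of the rdfs:domain entailment rule over plain tuples.
RDF_TYPE = "rdf:type"
RDFS_DOMAIN = "rdfs:domain"

APNI = "http://biodiversity.org.au/voc/apni/APNI#"
PROP = APNI + "hypotheticalProperty"          # invented property (assumption)
SUBJECT = "urn:example:someone-elses-thing"   # invented subject (assumption)

# Vocabulary: the property declares TaxonName as its domain.
schema = {(PROP, RDFS_DOMAIN, APNI + "TaxonName")}

# Data published by a third party that merely reuses the property.
data = {(SUBJECT, PROP, "some value")}


def apply_domain_rule(schema, data):
    """Return the rdf:type triples implied by rdfs:domain declarations."""
    domains = {(s, o) for s, p, o in schema if p == RDFS_DOMAIN}
    inferred = set()
    for subject, predicate, _ in data:
        for prop, cls in domains:
            if predicate == prop:
                inferred.add((subject, RDF_TYPE, cls))
    return inferred
```

Here the third party never said their thing was an APNI TaxonName, yet a reasoner concludes it is. For a vocabulary meant only for our own data, that inference is the intended behaviour.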

Once there are standards we can back-fit our data, just as everyone else will back-fit theirs. But in the meantime, the data is out there. (You can work with it, if you wish, using our splendid JSON interface. But it's subject to change, I'm afraid.) Again, the nice thing about the semantic web is that you can do this - gradually pulling together the strands of meaning using a common vocabulary as that vocabulary is developed. It might become a bit of a wild west in some areas, but those areas are explicitly fenced in with URI prefixes. The key is that our object identifiers - the URIs and LSIDs for the taxa and names - will remain persistent. Over time, we can clarify, enrich, and correct what we say *about the things that those identifiers identify*. 
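One practical consequence of the URI-prefix fencing is that a consumer can separate our idiosyncratic statements from the rest with a simple string test. A minimal sketch, assuming triples are plain tuples; the example predicates are invented, and I have listed both the bare and the "www." host forms since both appear in links in this message.

```python
# Prefixes that mark a predicate as belonging to the local (BOA) vocabulary.
BOA_PREFIXES = (
    "http://biodiversity.org.au/",
    "http://www.biodiversity.org.au/",
)


def split_by_vocabulary(triples, prefixes=BOA_PREFIXES):
    """Split triples into (local, other) by the predicate's URI prefix."""
    local, other = [], []
    for triple in triples:
        predicate = triple[1]
        # str.startswith accepts a tuple of candidate prefixes.
        if predicate.startswith(prefixes):
            local.append(triple)
        else:
            other.append(triple)
    return local, other


# Invented example data: neither predicate is taken from a real vocabulary file.
example = [
    ("urn:example:name-1", "http://biodiversity.org.au/voc/apni/APNI#hypotheticalProperty", "x"),
    ("urn:example:name-1", "http://example.org/tdwg#hypotheticalTerm", "y"),
]
local, other = split_by_vocabulary(example)
```

Someone who wants to ignore our vocabulary entirely can keep `other` and discard `local`.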

A serious problem that we are aware of is aggregators: systems holding copies of data and reasoning over vocabularies which we fix at a later stage. I don't know what to do about that. It seems to me that one of the problems in the semantic web is provenance and data ageing. How do you keep the whole thing from turning into mush? (Speaking of which: I would like to cryptographically sign our outgoing data with a certificate issued by TDWG, which indicates that we are indeed the TDWG-approved source of data coming from the biodiversity.org.au LSID authority. But that's a whole new area.) That said, we address many of these issues with OAI-PMH.

So: yes, we have a custom, idiosyncratic vocabulary, we declare and use nonstandard types and properties, we declare rdfs:domain - but I believe it's been properly done at the "machine" level. At the higher level, it's a work in progress. It helps to have something concrete to discuss, I think. When I was discussing using the DwC properties and types in our RDF, and putting it out on the web, I was thinking of a timeframe of weeks, not years.

