Paul,
This morning I have been reading through some of the posts that came by too fast for me to fully process during the work week.  This one particularly resonated with me.

Paul Murray wrote:
No, it was deliberate. We have done some work on our hosting here at our end, and as of last Friday the boa (biodiversity.org.au) vocabulary files are now available. http://biodiversity.org.au/voc/apni/APNI redirects to http://biodiversity.org.au/voc/apni/APNI.rdf, which Protege (for instance) understands.

We have created class hierarchies for our various objects: an APNI name is a BOA name is a TDWG name. As you have noticed, our individual name objects are explicitly declared both as TDWG names and also APNI names. Although the TDWG type is implied, I include it explicitly so that people can ignore our vocabulary if they wish when looking at our data. We have created "de novo" properties and named individuals for things in our data for which we could not find a suitable equivalent in the TDWG vocabulary, and these are available too, eg: http://www.biodiversity.org.au/voc/apni/NomenclaturalQualifierTerm .
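For readers unfamiliar with that pattern, here is a minimal sketch in RDF/XML of what such declarations might look like (the class and instance URIs are illustrative inventions, not the actual boa vocabulary; the TDWG TaxonName ontology namespace is assumed):

```xml
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">

  <!-- an APNI name is a TDWG name (hypothetical class URIs) -->
  <rdfs:Class rdf:about="http://biodiversity.org.au/voc/apni/APNI#TaxonName">
    <rdfs:subClassOf
        rdf:resource="http://rs.tdwg.org/ontology/voc/TaxonName#TaxonName"/>
  </rdfs:Class>

  <!-- an individual name, typed explicitly as both, so consumers can
       ignore the apni vocabulary and still see a TDWG name -->
  <rdf:Description rdf:about="http://biodiversity.org.au/apni.name/12345">
    <rdf:type rdf:resource="http://biodiversity.org.au/voc/apni/APNI#TaxonName"/>
    <rdf:type rdf:resource="http://rs.tdwg.org/ontology/voc/TaxonName#TaxonName"/>
  </rdf:Description>
</rdf:RDF>
```

The second rdf:type is redundant under RDFS entailment, but stating it explicitly means a consumer does not need to fetch the vocabulary file to learn that the resource is a TDWG name.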

We have gone through a similar exercise for the XML vocabularies: see http://www.biodiversity.org.au/afd.name/468562.xml . An xsi:schemaLocation attribute is included, allowing XML validators to find our schema files.
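A sketch of what that attribute looks like on a document element (the element name, namespace, and schema file name here are hypothetical):

```xml
<apni:taxonName xmlns:apni="http://biodiversity.org.au/voc/apni"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://biodiversity.org.au/voc/apni
                        http://biodiversity.org.au/voc/apni/apni.xsd">
  <!-- element content validated against apni.xsd -->
</apni:taxonName>
```

xsi:schemaLocation takes namespace/location pairs, so a validator can resolve each namespace used in the instance document to its schema file.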

Of course, "deliberately" does not necessarily mean "a good idea" or "done correctly", but you have to start somewhere.
Yes, you have to start somewhere.  That was the reason why I created the sernec: namespace (http://bioimages.vanderbilt.edu/rdf/terms#).  I've been trying out the terms there in my RDF.  I now know that some of them don't work very well and I'll create better ones.  But I wouldn't have known that if I hadn't tried to use them.

The nice thing about the semantic web is that you can in fact do this. All of our extra bits are identified with URIs, and the URIs all start with "http://biodiversity.org.au". At present, these extra bits mean nothing outside of the data here at BOA. A human could make sense of many of them, but many of the types, properties, and named individuals do not even contain titles and descriptions: as I am not a taxonomist myself, I have only the vaguest idea what the difference might be between "nom illeg" and "nom rej". Better to leave it blank. 

Of course, this is not a problem for the TDWG vocabularies, but that's because I am working the other way around: I was not trying to create a vocabulary that the general community could use, but to document an existing (albeit implied) one.  Our properties declare explicit domains. For well-discussed reasons, the TDWG vocabularies do not. But I don't think that those reasons (unintentional type declarations made by people using your terms) apply here. Indeed - the reverse is almost the whole point. I don't think we *want* other people using biodiversity.org.au terms: their meanings are potentially idiosyncratic to the systems here (perhaps subtly so), because they don't have proper descriptions - descriptions I am not able to supply.
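The "unintentional type declaration" hazard comes from RDFS entailment: if a property declares rdfs:domain C, then every use of that property entails that its subject is a C. A hypothetical illustration (the property and class URIs are invented):

```xml
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <!-- any subject of this property is entailed to be an apni TaxonName -->
  <rdf:Property
      rdf:about="http://biodiversity.org.au/voc/apni/nomenclaturalQualifier">
    <rdfs:domain
        rdf:resource="http://biodiversity.org.au/voc/apni/APNI#TaxonName"/>
  </rdf:Property>
</rdf:RDF>
```

For terms used only inside BOA's own data, that inference is safe. For a community vocabulary like Darwin Core, where outsiders attach terms to all sorts of subjects, a declared domain would silently retype their resources - which is why the TDWG vocabularies leave domains undeclared.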

Once there are standards we can back-fit our data, just as everyone else will back-fit theirs. But in the meantime, the data is out there. (You can work with it, if you wish, using our splendid JSON interface. But it's subject to change, I'm afraid.) Again, the nice thing about the semantic web is that you can do this - gradually pulling together the strands of meaning using a common vocabulary as that vocabulary is developed. It might become a bit of a wild west in some areas, but those areas are explicitly fenced in with URI prefixes. The key is that our object identifiers - the URIs and LSIDs for the taxa and names - will remain persistent. Over time, we can clarify, enrich, and correct what we say *about the things that those identifiers identify*.
Yes!  I don't intend for the URI for the image http://bioimages.vanderbilt.edu/baskauf/66921 to ever change, although its web page and the RDF representation may change (maybe a lot!).  That's OK.  As you say, the data is out there.  Hopefully a consensus will evolve about what terms (properties) mean to everyone in the community and then my RDF will mean something to somebody else.  That's what set me off down the road to trying to get the Individual class as a part of DwC. 

A serious problem that we are aware of is aggregators - systems holding copies of data and reasoning over vocabularies that we later fix. I don't know what to do about that - it seems to me that provenance and data ageing are among the open problems of the semantic web. How do you keep the whole thing from turning into mush? (Speaking of which: I would like to cryptographically sign our outgoing data with a certificate issued by TDWG, indicating that we are indeed the TDWG-approved source of data coming from the biodiversity.org.au LSID authority. But that's a whole new area.) We do address many of these issues with OAI-PMH, though.
You aren't kidding. Once, I tracked down information about myself using OpenLink and discovered that all kinds of bizarre inferences were being made about me in the LOD cloud.  Most of the wrong ones were made when someone tried to translate non-RDF documents into triples.  But there is still the problem of bad triples (e.g. errors) getting out into the cloud.  You can fix them in the RDF that you are serving, but how do you make them "go away" when they are being held and used for reasoning by somebody else?  Or what if somebody somewhere makes the assertion

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Person rdf:about="http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me">
    <foaf:name>Donald Duck</foaf:name>
  </foaf:Person>
</rdf:RDF>

Anybody can do that, so how do we certify metadata sources as "trusted" in our community?  The state of the LOD cloud at the moment reminds me of the early days of email and the Web, when it was reasonably "safe" to assume that users' intentions were good.  Then came viruses, trojans, phishing scams, etc.  If those kinds of things had been anticipated at the start of email and the Web and accounted for in their design, it would have been easier to prevent (or at least reduce) the evolution of nefarious uses.  Perhaps we should be thinking about that more now, while we are in the early stages of designing the "semantic web".

So: yes, we have a custom, idiosyncratic vocabulary, we declare and use nonstandard types and properties, we declare rdfs:domain - but I believe it's been properly done at the "machine" level. At the higher level, it's a work in progress. It helps to have something concrete to discuss, I think. When I was discussing using the DwC properties and types in our RDF and putting it out on the web, I was thinking of a timeframe of weeks, not years.
Several people have suggested that we need a second kind of Darwin Core: an RDF recommendation that will allow for deep semantic reasoning.  That might be nice, but given the amount of discussion that it's taken just to come to an agreement about what a dwc:Individual should be, I think "years" would be an accurate estimate for that task.  What I personally really want is a recommendation for a Darwin Core RDF "Lite".  That is, a quick (and possibly dirty for the time being) set of guidelines for using DwC terms AS THEY EXIST in RDF, with only the minimal number of changes or additions necessary to get the job done.  THAT is something that I could see happening in a timeframe of weeks, not years.

Steve
-- 
Steven J. Baskauf, Ph.D., Senior Lecturer
Vanderbilt University Dept. of Biological Sciences

postal mail address:
VU Station B 351634
Nashville, TN  37235-1634,  U.S.A.

delivery address:
2125 Stevenson Center
1161 21st Ave., S.
Nashville, TN 37235

office: 2128 Stevenson Center
phone: (615) 343-4582,  fax: (615) 343-6707
http://bioimages.vanderbilt.edu