[tdwg-content] Unintentionally introducing classes. [SEC=UNCLASSIFIED]

Sat Nov 6 14:53:01 CET 2010

Paul,
This morning I have been reading through some of the posts that came by 
too fast for me to fully process during the work week.  This one 
resonated particularly with me.

Paul Murray wrote:
> No, it was deliberate. We have done some work on our hosting here at 
> our end, and as of last Friday the boa (biodiversity.org.au) 
> vocabulary files are now available. 
> http://biodiversity.org.au/voc/apni/APNI redirects 
> to http://biodiversity.org.au/voc/apni/APNI.rdf, which Protege (for 
> instance) understands.
>
> We have created class hierarchies for our various objects: an APNI 
> name is a BOA name is a TDWG name. As you have noticed, our individual 
> name objects are explicitly declared both as TDWG names and also APNI 
> names. Although the TDWG type is implied, I include it explicitly so 
> that people can ignore our vocabulary if they wish when looking at our 
> data. We have created "de novo" properties and named individuals for 
> things in our data for which we could not find a suitable equivalent 
> in the TDWG vocabulary, and these are available too, eg: 
> http://www.biodiversity.org.au/voc/apni/NomenclaturalQualifierTerm .
>
> We have gone through a similar exercise for the XML vocabularies: 
> see http://www.biodiversity.org.au/afd.name/468562.xml . An 
> xsi:schemaLocation attribute is included, allowing XML validators to 
> find our schema files.
>
> Of course, "deliberately" does not necessarily mean "a good idea" or 
> "done correctly", but you have to start somewhere.
Yes, you have to start somewhere.  That was the reason why I created the 
sernec: namespace (http://bioimages.vanderbilt.edu/rdf/terms#).  I've 
been trying out the terms there in my RDF.  I now know that some of them 
don't work very well and I'll create better ones.  But I wouldn't have 
known that if I hadn't tried to use them.
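
The hierarchy Paul describes above (an APNI name is a BOA name is a TDWG 
name) can be tried out in a few lines. This is only a rough sketch in 
plain Python, not how BOA or any reasoner actually implements it. The 
prefixed names (apni:Name, boa:Name, tdwg:TaxonName, ex:name1) are 
hypothetical stand-ins for the real URIs. It shows why the TDWG type is 
"implied" and why declaring it explicitly still helps a consumer who 
ignores the local vocabulary.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

# Hypothetical names standing in for the real vocabulary and data URIs.
triples = {
    ("apni:Name", SUBCLASS_OF, "boa:Name"),
    ("boa:Name", SUBCLASS_OF, "tdwg:TaxonName"),
    ("ex:name1", RDF_TYPE, "apni:Name"),
}


def superclasses(cls, triples):
    """Return every class reachable from cls by following rdfs:subClassOf."""
    found, todo = set(), [cls]
    while todo:
        current = todo.pop()
        for s, p, o in triples:
            if s == current and p == SUBCLASS_OF and o not in found:
                found.add(o)
                todo.append(o)
    return found


def inferred_types(individual, triples):
    """Declared types of an individual plus all of their superclasses."""
    declared = {o for s, p, o in triples if s == individual and p == RDF_TYPE}
    implied = set()
    for cls in declared:
        implied |= superclasses(cls, triples)
    return declared | implied


# With the vocabulary loaded, the TDWG type follows from the APNI type.
with_vocabulary = inferred_types("ex:name1", triples)

# A consumer that ignores the local vocabulary never sees the subclass
# triples, so it only learns the TDWG type if the data states it explicitly.
data_only = {t for t in triples if t[1] != SUBCLASS_OF}
without_vocabulary = inferred_types("ex:name1", data_only)
```

Adding the triple ("ex:name1", "rdf:type", "tdwg:TaxonName") to the data 
is what lets the second kind of consumer recognize the name without 
loading anything from the local vocabulary.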
>
> The nice thing about the semantic web is that you can in fact do this. 
> All of our extra bits are identified with URIs, and the URIs all start 
> with "http://biodiversity.org.au". At present, these extra bits mean 
> nothing outside of the data here at BOA. A human could make sense of 
> many of them, but many of the types, properties, and named individuals 
> do not even contain titles and descriptions: as I am not a taxonomist 
> myself, I have only the vaguest idea what the difference might be 
> between "nom illeg" and "nom rej". Better to leave it blank. 
>
> Of course, this is not a problem for the TDWG vocabularies, but that's 
> because I am working the other way around: I was not trying to create 
> a vocabulary that the general community could use, but to document an 
> existing (albeit implied) one.  Our properties declare explicit 
> domains. For well-discussed reasons, the TDWG vocabularies do not. But 
> I don't think that those reasons (unintentional type declarations made 
> by people using your terms) apply. Indeed - the reverse is almost the 
> whole point. I don't think we *want* other people using 
> biodiversity.org.au terms: their meanings 
> potentially are idiosyncratic to the systems here (perhaps subtly) 
> because they don't have proper descriptions - descriptions I am not 
> able to supply.
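The "unintentional type declaration" that a declared domain can cause is 
easy to show concretely. The sketch below is plain Python and applies 
only the RDFS domain rule (if p has rdfs:domain C and "s p o" is 
asserted, then s is entailed to be a C). The property and class names 
(boa:nomenclaturalQualifier, boa:Name) and the ex: subject are 
hypothetical, chosen just for illustration.

```python
RDF_TYPE = "rdf:type"
RDFS_DOMAIN = "rdfs:domain"


def apply_domain_rule(triples):
    """Return the new rdf:type triples entailed by rdfs:domain declarations."""
    # Collect the declared domain classes for each property.
    domains = {}
    for s, p, o in triples:
        if p == RDFS_DOMAIN:
            domains.setdefault(s, set()).add(o)
    # Every subject that uses such a property is entailed to be in its domain.
    inferred = set()
    for s, p, o in triples:
        for cls in domains.get(p, ()):
            inferred.add((s, RDF_TYPE, cls))
    return inferred - set(triples)


# Hypothetical vocabulary: the property declares an explicit domain.
vocabulary = {("boa:nomenclaturalQualifier", RDFS_DOMAIN, "boa:Name")}

# Hypothetical third-party data that borrows the property for its own resource.
borrowed = {("ex:someResource", "boa:nomenclaturalQualifier", "nom. illeg.")}

entailed = apply_domain_rule(vocabulary | borrowed)
```

Here the borrower never said that ex:someResource is a boa:Name, but a 
reasoner that loads the vocabulary will conclude it anyway. A vocabulary 
without domain declarations produces no such entailment.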
>
> Once there are standards we can back-fit our data, just as everyone 
> else will back-fit theirs. But in the meantime, the data is out there. 
> (You can work with it, if you wish, using our splendid JSON interface. 
> But it's subject to change, I'm afraid.) Again, the nice thing about 
> the semantic web is that you can do this - gradually pulling together 
> the strands of meaning using a common vocabulary as that vocabulary is 
> developed. It might become a bit of a wild west in some areas, but 
> those areas are explicitly fenced in with URI prefixes. The key is that 
> our object identifiers - the URIs and LSIDs for the taxa and names - 
> will remain persistent. Over time, we can clarify, enrich, and correct 
> what we say *about the things that those identifiers identify*.
Yes!  I don't intend for the URI for the image 
http://bioimages.vanderbilt.edu/baskauf/66921 to ever change, although 
its web page and the RDF representation may change (maybe a lot!).  
That's OK.  As you say, the data is out there.  Hopefully a consensus 
will evolve about what terms (properties) mean to everyone in the 
community and then my RDF will mean something to somebody else.  That's 
what set me off down the road of trying to get the Individual class 
adopted as part of DwC. 
>
> A serious problem that we are aware of is aggregators - systems 
> holding copies of data and reasoning over vocabularies which we at a 
> later stage fix. I don't know what to do about that - it seems to me 
> that one of the problems in the semantic web is provenance and data 
> ageing. How do you keep the whole thing from turning into mush? 
> (Speaking of which: I would like to cryptographically sign our 
> outgoing data with a certificate issued by TDWG, which indicates that we 
> are indeed the TDWG-approved source of data coming from the 
> biodiversity.org.au LSID authority. But that's a whole new area.) 
> We do, however, address many of these issues with OAI-PMH.
You aren't kidding. Once I started tracking down information about 
myself using OpenLink, I discovered that all kinds of bizarre 
inferences were being made about me in the LOD cloud.  Most of the 
wrong ones were made when someone tried to take non-RDF documents 
and translate them into triples.  But there is still the problem of 
bad triples (e.g. errors) getting out into the cloud.  You can fix them in 
the RDF that you are serving, but how do you make them "go away" when 
they are being held and used for reasoning by somebody else?  Or what if 
somebody somewhere makes the assertion

<foaf:Person 
rdf:about="http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me">
    <foaf:name>Donald Duck</foaf:name>
</foaf:Person>

Anybody can do that, so how do we certify metadata sources as "trusted" 
in our community?  The state of the LOD cloud at the moment reminds me 
of the early days of email and the Web, when it was reasonably "safe" to 
assume that users' intentions were good.  Then came viruses, trojans, 
phishing scams, etc.  If those kinds of things had been anticipated in 
the design of email and the Web from the start, it would 
have been easier to prevent (or reduce) the evolution of nefarious uses 
of the Web.  Perhaps we should be thinking about that more now when we 
are in the early stages of designing for the "semantic web".
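
One partial answer to both problems (stale copies and the "Donald Duck" 
assertion) is for an aggregator to remember where each triple came from. 
The following is only a sketch in plain Python, under the assumption 
that the aggregator reliably knows the source document of every triple; 
it is not a substitute for signing. The QuadStore class and the 
example.org source are hypothetical.

```python
from collections import defaultdict


class QuadStore:
    """Holds each triple together with the source document it came from."""

    def __init__(self):
        self._by_source = defaultdict(set)

    def add(self, source, triple):
        self._by_source[source].add(triple)

    def replace_source(self, source, triples):
        """Drop everything harvested earlier from source; keep the fresh copy."""
        self._by_source[source] = set(triples)

    def triples(self, trusted=None):
        """Return all triples, or only those from the trusted sources."""
        result = set()
        for source, held in self._by_source.items():
            if trusted is None or source in trusted:
                result |= held
        return result


ME = "http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me"
MY_DOC = "http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf"
OTHER_DOC = "http://example.org/somebody-else.rdf"  # hypothetical source

store = QuadStore()
store.add(MY_DOC, (ME, "foaf:name", "Steven J. Baskauf"))
store.add(OTHER_DOC, (ME, "foaf:name", "Donald Duck"))

# Reason only over statements from sources the community has agreed to trust.
trusted_view = store.triples(trusted={MY_DOC})

# When a publisher corrects its RDF, re-harvesting replaces the old triples
# from that source instead of piling the corrections on top of the errors.
store.replace_source(OTHER_DOC, [])
everything_now = store.triples()
```

This leaves open the harder question of who decides which sources are 
trusted, but it at least makes a bad triple something that can be traced 
and withdrawn.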
>
> So: yes, we have a custom, idiosyncratic vocabulary, we declare and 
> use nonstandard types and properties, we declare rdfs:domain - but I 
> believe it's been properly done at the "machine" level. At the higher 
> level, it's a work-in-progress. It helps to have something concrete to 
> discuss, I think. When I was discussing using the DwC properties and 
> types in our RDF, of putting it out on the web, I was thinking of a 
> timeframe of weeks, not years.
Several people have suggested that we need a second kind of Darwin Core: 
an RDF recommendation that will allow for 
deep semantic reasoning.  That might be nice, but given the amount of 
discussion that it's taken just to come to an agreement about what a 
dwc:Individual should be, I think "years" would be an accurate estimate 
for that task.  What I personally really want is a recommendation for 
Darwin Core RDF "Lite".  That is, a quick (and possibly dirty for the 
time being) set of guidelines for using DwC terms AS THEY EXIST in RDF 
with only the minimal number of changes or additions necessary to get 
the job done.  THAT is something that I could see happening in a 
timeframe of weeks, not years.

Steve

-- 
Steven J. Baskauf, Ph.D., Senior Lecturer
Vanderbilt University Dept. of Biological Sciences

postal mail address:
VU Station B 351634
Nashville, TN  37235-1634,  U.S.A.

delivery address:
2125 Stevenson Center
1161 21st Ave., S.
Nashville, TN 37235

office: 2128 Stevenson Center
phone: (615) 343-4582,  fax: (615) 343-6707
http://bioimages.vanderbilt.edu
