Dear Roger,

On 30 Nov 2007, at 09:48, Roger Hyam wrote:


Sorry I have come into this thread late. It looks really exciting stuff.

The one thing I picked up on at the end was Rod offering to come up with a policy for handling Handles in the RDF returned by LSIDs ;)

:-O


I had previously presumed DOI and Handles could be treated as URIs and therefore just used as rdf resources like this:

<dc:identifier rdf:resource="doi:10.1007/BF02725185" />

or in our current example

<tdwg:parentPublication rdf:resource="doi:10.1007/BF02725185" />


My off-the-cuff comment was inspired by seeing a tag <doi> in the ontology. If you have DOIs, why not Handles, or any other GUID? Personally I would just use <dc:identifier>



Now if we can't treat DOI/Handle as URIs I guess we could treat them as strings and just use the dc identifier (possibly the worst case scenario - this could be an ISBN, local catalogue code or anything).

<dc:identifier>doi:10.1007/BF02725185</dc:identifier>

I'm leaning towards this (or a standard representation, see below) because if there are multiple DOI resolvers (and there are a couple out there) and people use different ones, we need to decide whether the identifier is the same. My feeling is that RDF populated by links to URIs, while being Semantic Web savvy, will actually break because many of these URIs will be fragile, or there may be multiple URIs that point to the same thing. I think we may be heading for an unholy mess of URIs, especially as we have lots of distributed databases serving related information.

Put another way, if I aggregate a bunch of RDF from multiple sources and want to find anything related to a paper, it would be much easier to find things that refer to the same DOI in the same way.
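
To make the aggregation problem concrete, here is a minimal Python sketch. The provider names and the record list are invented for illustration; the point is only that an index keyed on the identifier exactly as written splits one paper across several keys.

```python
from collections import defaultdict

# Hypothetical (source, identifier) pairs, as they might arrive from
# different providers that all mean the same paper.
records = [
    ("provider-a", "doi:10.1007/BF02725185"),
    ("provider-b", "http://dx.doi.org/10.1007/BF02725185"),
    ("provider-c", "info:doi/10.1007/BF02725185"),
    ("provider-d", "doi:10.1007/BF02725185"),
]


def index_by_identifier(pairs):
    """Group source names by the identifier string exactly as written."""
    index = defaultdict(list)
    for source, identifier in pairs:
        index[identifier].append(source)
    return dict(index)


index = index_by_identifier(records)
```

Here the same DOI ends up under three different keys, and only providers a and d are recognised as talking about the same thing.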


I presume support for DOI/Handle would have to be added to any semantic web clients to understand any of these.

Would it be better to take the approach we have with LSIDs where we always cite a proxied version?

What if the proxy goes away?

Is there an expectation that the proxied version will serve RDF?



What do other people do?


Connotea uses the INFO URI scheme, which is also used by OpenURL. It's a bit ugly, but at least is standardised. In this example the DOI would be

info:doi/10.1007/BF02725185

This scheme also supports Handles and SICIs, as well as GenBank sequences and a few other identifiers (see http://info-uri.info/registry/OAIHandler?verb=ListRecords&metadataPrefix=oai_dc).
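
As a rough illustration of what a standard representation buys us, the following Python sketch rewrites a few common spellings of a DOI into the info:doi/ form. The function name to_info_doi and the list of accepted prefixes are my own assumptions, not part of any specification, and the sketch leaves case and percent-encoding untouched.

```python
def to_info_doi(identifier):
    """Return the info:doi/ form of a DOI written in one of a few common ways.

    Returns None when the string does not look like a DOI.
    """
    # Assumed list of spellings seen in the wild; extend as needed.
    prefixes = ("info:doi/", "doi:", "http://dx.doi.org/")
    text = identifier.strip()
    for prefix in prefixes:
        if text.lower().startswith(prefix):
            text = text[len(prefix):]
            break
    # A DOI name starts with "10." and has a "/" between prefix and suffix.
    if not text.startswith("10.") or "/" not in text:
        return None
    return "info:doi/" + text
```

With this, "doi:10.1007/BF02725185" and "http://dx.doi.org/10.1007/BF02725185" both come out as info:doi/10.1007/BF02725185, while a string that is not a DOI (an ISBN, say) returns None and can be handled separately.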



We could add a global property so that any object could have one or hasHandle value.

Any solution other than making Handles behave like regular URIs means non-compliance with W3C recommendations and the need for specialist client software I think.

What do you think? Any ideas?


There seem to be at least two ways to approach this problem:

1. Write identifiers as strings in a canonical form likely to be shared by other databases, and leave it to client software to know how to handle them

2. Have a global proxy server for all classes of identifiers that we might have, and have this return RDF (and do 303 redirects, or whatever the Semantic Web community settle on). This was the motivation behind my experiments with http://bioguid.info .
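
For the second option, the client-side half might look something like this Python sketch. The proxy address http://proxy.example.org/ and the scheme/value URL layout are placeholders I have made up; this is not a description of how bioguid.info or any existing resolver is organised.

```python
# Hypothetical proxy base, used only for illustration; not a real service.
PROXY_BASE = "http://proxy.example.org/"


def proxied_uri(identifier, base=PROXY_BASE):
    """Turn a 'scheme:value' identifier into an HTTP URI under one proxy."""
    scheme, sep, value = identifier.partition(":")
    if not sep or not scheme or not value:
        raise ValueError("expected 'scheme:value', got %r" % (identifier,))
    # Assumed layout: <proxy>/<scheme>/<value>
    return base + scheme.lower() + "/" + value
```

So proxied_uri("doi:10.1007/BF02725185") would give http://proxy.example.org/doi/10.1007/BF02725185, which an ordinary Semantic Web client can dereference without knowing anything about DOIs. The cost is that every such URI depends on the proxy staying up.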

Hope this makes sense...

Regards

Rod





Roger



On 30 Nov 2007, at 08:10, Roderic Page wrote:

Dear Rich,

Pyle, R.L. 2002. Pomacanthidae. pp. 3266-3286. In: Carpenter, K.E. and V.E.
Niem (Eds.) Living marine resources of the western central Pacific.  Volume
5.  Bony fishes part 3 (Menidae to Pomacentridae). Food and Agriculture
Organization of the United Nations (FAO), Rome. i-iv+2791-3379.

...there are at least three "levels" of publication:

1) Pyle, R.L. 2002. Pomacanthidae. pp. 3266-3286.

2) Carpenter, K.E. and V.E. Niem (Eds.) Living marine resources of the
western central Pacific.  Volume 5.  Bony fishes part 3 (Menidae to
Pomacentridae). Food and Agriculture Organization of the United Nations
(FAO), Rome. i-iv+2791-3379.

3) Carpenter, K.E. and V.E. Niem (Eds.) FAO species identification guide for
fishery purposes: Living marine resources of the western central Pacific.
Vols. 1-6. Food and Agriculture Organization of the United Nations (FAO),
Rome. xl+4218 pp.

Granted, some might argue that number 3 is not really a separate citable
"unit", but given that it is a single page number series, I would argue that
it is.

So...if we wanted to cite specifically Pyle 2002, the parentCitationString
might simply be the contents of #2 above; or it might have two nested
parents (a parent, and a grandparent).

As I said before, I'm leaning towards the simpler solution.

Isn't this over-engineering things a little? Don't you just need a GUID for the chapter (1), and a GUID for the book (2)? For the latter we have an ISBN (9251043879), so there's already a GUID for that. I don't think we gain much from (3). Furthermore, if we use the ISBN as the GUID we know the items are linked because they share the same publisher code.


As for the ZooBank LSID resolver -- at this point in time conformance trumps
optimization (so we can all get off our collective arses and serve content)
-- so I'm just working with what's up there now.  If I'm resolving LSIDs, and
I'm doing so because of TDWG standards, then I ought to conform to existing
TDWG standards on vocabularies -- right or wrong.  What we need to do is
update the TDWG standards on this (which the St. Louis meeting was
attempting to accomplish), so we can conform *and* optimize!


The TDWG standard needs to be expanded to handle other kinds of GUIDs, notably Handles, which are being widely used in Digital Repositories.

Regards

Rod



Aloha,
Rich




----------------------------------------
Professor Roderic D. M. Page
Editor, Systematic Biology
DEEB, IBLS
Graham Kerr Building
University of Glasgow
Glasgow G12 8QP
United Kingdom

Phone: +44 141 330 4778
Fax: +44 141 330 2792
email: r.page@bio.gla.ac.uk
web: http://taxonomy.zoology.gla.ac.uk/rod/rod.html
iChat: aim://rodpage1962
reprints: http://taxonomy.zoology.gla.ac.uk/rod/pubs.html

Subscribe to Systematic Biology through the Society of Systematic
Biologists Website: http://systematicbiology.org
Search for taxon names: http://darwin.zoology.gla.ac.uk/~rpage/portal/
Find out what we know about a species: http://ispecies.org
Rod's rants on phyloinformatics: http://iphylo.blogspot.com
Rod's rants on ants: http://semant.blogspot.com



_______________________________________________
tdwg-guid mailing list
tdwg-guid@lists.tdwg.org
http://lists.tdwg.org/mailman/listinfo/tdwg-guid


