Sally wrote:
Even if the DOIs were working as advertised, what would it take to get to the point where we could supplement 'Syst. Bot. 28(2): 401 (2003).' with "doi:10.1600/0363-6445(2003)028[0387:PORLFR]2.0.CO;2" ?
This example is actually quite close to being doable programmatically. 'Syst. Bot.' is a standardised publication abbreviation in IPNI, and the collation is in a standard form too, so if we had a source of DOIs tied to articles in Syst. Bot. we could imagine writing a routine to programmatically import the relevant DOI into the IPNI record.
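As a rough illustration of why the standard form helps, here is a minimal Python sketch that splits a collation of the shape 'Syst. Bot. 28(2): 401 (2003).' into its parts. The names `Collation` and `parse_collation` are made up for this example, and the pattern only covers this one layout; it is not how IPNI actually stores or parses its data.

```python
import re
from typing import NamedTuple, Optional


class Collation(NamedTuple):
    journal: str
    volume: str
    issue: Optional[str]
    page: str
    year: str


# Assumed layout: '<journal> <volume>(<issue>): <page> (<year>).'
# The issue in parentheses is optional.
COLLATION_RE = re.compile(
    r"^(?P<journal>.+?)\s+"
    r"(?P<volume>\d+)"
    r"(?:\((?P<issue>[^)]+)\))?"
    r":\s*(?P<page>\d+)"
    r"\s*\((?P<year>\d{4})\)\.?\s*$"
)


def parse_collation(citation: str) -> Optional[Collation]:
    """Return the parsed parts, or None if the citation is not in the assumed form."""
    match = COLLATION_RE.match(citation.strip())
    if match is None:
        return None
    return Collation(**match.groupdict())


print(parse_collation("Syst. Bot. 28(2): 401 (2003)."))
print(parse_collation("Adansonia, ix. (1870) 295. "))
```

The first call yields the journal, volume, issue, page and year as separate fields; the second returns None, so non-standard citations can at least be detected and set aside for manual attention.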
CrossRef provide an interface to extract DOIs (see http://iphylo.blogspot.com/2006/05/crossrefs-openurl-resolver.html for an example). There's also a web form that you can use to play around with: http://www.crossref.org/guestquery/.
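To give a flavour of what a query might look like, here is a small Python sketch that only builds the URL and makes no network call. The endpoint path and the parameter names (`pid`, `title`, `volume`, `spage`, `date`, `redirect`) are as I recall them from the CrossRef OpenURL documentation, so treat them as assumptions and check them before relying on this. `crossref_query_url` is a made-up helper name, and the `pid` value is a placeholder for whatever account identifier CrossRef issues you.

```python
from urllib.parse import urlencode

# Assumed endpoint; verify against the current CrossRef documentation.
CROSSREF_OPENURL = "http://www.crossref.org/openurl"


def crossref_query_url(pid: str, title: str, volume: str, spage: str, year: str) -> str:
    """Build an OpenURL-style query string from basic bibliographic details."""
    params = {
        "pid": pid,           # account identifier (placeholder in the example below)
        "title": title,       # journal title
        "volume": volume,
        "spage": spage,       # first page of the article
        "date": year,
        "redirect": "false",  # assumed to ask for metadata rather than a redirect
    }
    return CROSSREF_OPENURL + "?" + urlencode(params)


print(crossref_query_url("someone@example.org", "Syst. Bot.", "28", "387", "2003"))
```

Note that `spage` is the first page of the article, whereas the IPNI collation gives the page on which the name appears (401 here). The DOI in Sally's example appears to encode 387 as the start page, so some extra matching on page ranges would probably be needed. Whether CrossRef matches on the abbreviation 'Syst. Bot.' or wants the full journal title is something to test.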
We're chugging away with standardisation in IPNI and we're doing updates at the moment that are cleaning up aspects of 10,000, 15,000, up to 50,000 records at a time. But there are 1.5 million records to do and we can still find (just from within Poissonia) citations like 'Adansonia, ix. (1870) 295. ' or 'Bol. Mus. Hist. Nat. Tucuman no. 6: 8. 1925 Hauman, in Kew Bull. 1925: 279. 1925 ' which look much less tractable.
Hard work. There is a small literature on citation matching which may be relevant, e.g., http://citeseer.ist.psu.edu/pasula02identity.html and http://citeseer.ist.psu.edu/lawrence99autonomous.html.
60% of publications are not standardised in IPNI, and of the standardised ones many of them predate DOIs (are there any moves among journals to retrofit DOIs to older series?)
Some are. I've noticed some very old articles hosted by Springer have DOIs (sadly, I can't give a specific example). Of course you don't need to solely rely on DOIs. Handles should also be catered for. For example, the American Museum of Natural History provide full-text of current and back issues of their publications: http://digitallibrary.amnh.org/dspace/. Of course, they don't do plants, but if other institutions adopt the AMNH's approach of using DSpace, then handles are part of the mix. Some repositories use URLs.
Plus I've noticed in the IPNI and other editors at Kew a strange reluctance to type in strings that look like '10.1600/0363-6445(2003)028[0387:PORLFR]2.0.CO;2'
- I don't know whether barcode reading technology could help here...
Yeah, but there's this cool invention called "cut and paste" ;-) This can all be automated. Look at Connotea - you can highlight a DOI in your browser, click on a bookmarklet and hey presto, it's added to Connotea. Also, much of this would be done by scripts, web crawling, etc.
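For instance, a script can pull DOIs out of a page of text without anybody typing them. The Python sketch below uses a deliberately loose regular expression of my own (not an official DOI grammar), and `find_dois` is just an illustrative name. It assumes a DOI starts with '10.', a numeric prefix and a slash, and runs until whitespace or a quote character.

```python
import re

# Loose, assumed pattern: '10.' + 4-9 digits + '/' + anything up to
# whitespace, a quote character or an angle bracket.
DOI_RE = re.compile(r"\b10\.\d{4,9}/[^\s\"'<>]+")


def find_dois(text: str) -> list:
    """Return the distinct DOI-like strings in text, in order of appearance."""
    found = []
    for match in DOI_RE.finditer(text):
        # Drop sentence punctuation that got swept up at the end.
        doi = match.group(0).rstrip(".,")
        if doi not in found:
            found.append(doi)
    return found


sample = 'supplement it with "doi:10.1600/0363-6445(2003)028[0387:PORLFR]2.0.CO;2" ?'
print(find_dois(sample))
```

Brackets, parentheses and semicolons are left alone because, as this example shows, they can be part of the DOI itself.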
I think that any solution to standardising citations of literature must inevitably include (if not be confined to) some use of the infrastructure that DOIs represent. But we will also have to come up with a plan that will help us deal with the vast weight of legacy data that databases like IPNI carry.
I think there are two things to be addressed here. The first is GUIDs for old literature that no publisher is ever going to assign DOIs to (the publisher may no longer exist, the work is in the public domain, etc.).
Also, several modern publishers have not adopted DOIs but instead use fairly standard URLs. These include a lot of journals relevant to us (e.g., American Journal of Botany and other HighWire Press journals).
I think major name databases such as IPNI would have to provide GUIDs for the literature they have. Then, map as much as possible to external GUIDs.
This raises the second issue, multiple GUIDs for publications. We have this already, with DOIs, URLs, JSTOR, and PubMed ids for the same papers.
It would be cool to have a GUID "broker" that, say, could take a DOI and say "well, IPNI calls this reference xxxxxx, and MOBOT have it as yyyyyy, and there's a handle hdl:zzzzz for it, etc." It could also mimic CrossRef by taking bibliographic data and attempting to match that to an existing GUID. In other words, it would be an OpenURL resolver.
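The lookup half of such a broker could be very simple. Here is a toy, in-memory Python sketch of the idea. The class name `GuidBroker`, the 'ipni:' and 'mobot:' prefixes, and the xxxxxx/yyyyyy/zzzzz values are all placeholders for illustration. A real service would need persistent storage and the bibliographic matching step, neither of which is shown.

```python
from collections import defaultdict


class GuidBroker:
    """Toy broker: groups identifiers that refer to the same publication."""

    def __init__(self):
        self._cluster_of = {}              # guid -> cluster id
        self._members = defaultdict(set)   # cluster id -> set of guids
        self._next_id = 0

    def link(self, *guids):
        """Record that all the given GUIDs name the same publication."""
        existing = {self._cluster_of[g] for g in guids if g in self._cluster_of}
        if existing:
            target = min(existing)
        else:
            target = self._next_id
            self._next_id += 1
        # Merge any other clusters these GUIDs already belonged to.
        for cluster_id in existing - {target}:
            for g in self._members.pop(cluster_id):
                self._cluster_of[g] = target
                self._members[target].add(g)
        for g in guids:
            self._cluster_of[g] = target
            self._members[target].add(g)

    def resolve(self, guid):
        """Return every GUID known for the same publication (empty set if unknown)."""
        cluster_id = self._cluster_of.get(guid)
        if cluster_id is None:
            return set()
        return set(self._members[cluster_id])


broker = GuidBroker()
doi = "doi:10.1600/0363-6445(2003)028[0387:PORLFR]2.0.CO;2"
broker.link(doi, "ipni:xxxxxx")
broker.link("hdl:zzzzz", "mobot:yyyyyy")
broker.link("ipni:xxxxxx", "hdl:zzzzz")   # joins the two groups
print(sorted(broker.resolve(doi)))
```

Linking is transitive here: once any two identifiers are declared equivalent, everything already attached to either of them ends up in the same group, so asking about the DOI returns the IPNI, MOBOT and handle identifiers as well.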
Regards
Rod
Sally

*** Sally Hinchcliffe
*** Computer section, Royal Botanic Gardens, Kew
*** tel: +44 (0)20 8332 5708
*** S.Hinchcliffe@rbgkew.org.uk

----------------------------------------
Professor Roderic D. M. Page
Editor, Systematic Biology
DEEB, IBLS
Graham Kerr Building
University of Glasgow
Glasgow G12 8QP
United Kingdom

Phone: +44 141 330 4778
Fax: +44 141 330 2792
email: r.page@bio.gla.ac.uk
web: http://taxonomy.zoology.gla.ac.uk/rod/rod.html
reprints: http://taxonomy.zoology.gla.ac.uk/rod/pubs.html

Subscribe to Systematic Biology through the Society of Systematic Biologists Website: http://systematicbiology.org
Search for taxon names: http://darwin.zoology.gla.ac.uk/~rpage/portal/
Find out what we know about a species: http://ispecies.org
Rod's rants on phyloinformatics: http://iphylo.blogspot.com