[Tdwg-guid] DOIs and persistence -- when DOIs go bad

Fri May 26 11:37:45 CEST 2006

Sally wrote

> Even if the DOIs were working as advertised, what would it take to
> get to the point where we could supplement 'Syst. Bot. 28(2): 401
> (2003).' with "doi:10.1600/0363-6445(2003)028[0387:PORLFR]2.0.CO;2" ?
>
> This example is actually quite close to being doable
> programmatically. 'Syst. Bot.' is a standardised publication
> abbreviation in IPNI, and the collation is in a standard form too, so
> if we had a source of DOIs tied to articles in Syst. Bot. we could
> imagine writing a routine to programmatically import the relevant DOI
> into the IPNI record.

CrossRef provide an interface to extract DOIs (see  
http://iphylo.blogspot.com/2006/05/crossrefs-openurl-resolver.html for  
an example). There's also a web form that you can use to play around
with: http://www.crossref.org/guestquery/.
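
As a rough illustration of the kind of routine Sally describes, here is a minimal Python sketch that splits a standardised collation such as 'Syst. Bot. 28(2): 401 (2003).' into fields and builds an OpenURL-style query string from them. The endpoint URL, the parameter names (pid, title, volume, issue, spage, date) and the placeholder pid are assumptions and should be checked against CrossRef's own documentation. The script only builds the URL and makes no network request. One caveat: the DOI in Sally's example embeds 0387, which suggests that the page in a nomenclatural citation (401) may be the page where the name appears rather than the article's first page, so a start-page lookup may need more work than this.

```python
import re
from urllib.parse import urlencode

# Assumed endpoint; check CrossRef's documentation before relying on it.
CROSSREF_OPENURL = "http://www.crossref.org/openurl"

# Matches collations of the form 'Syst. Bot. 28(2): 401 (2003).'
COLLATION = re.compile(
    r"^(?P<title>.+?)\s+(?P<volume>\d+)(?:\((?P<issue>\d+)\))?:\s*"
    r"(?P<page>\d+)\s*\((?P<year>\d{4})\)\.?\s*$"
)


def parse_collation(citation):
    """Split a standardised citation into its fields, or return None."""
    m = COLLATION.match(citation.strip())
    return m.groupdict() if m else None


def openurl_query(fields, pid="you@example.org"):
    """Build an OpenURL-style query URL (parameter names are assumptions)."""
    params = {
        "pid": pid,  # hypothetical account identifier
        "title": fields["title"],
        "volume": fields["volume"],
        "spage": fields["page"],
        "date": fields["year"],
    }
    if fields.get("issue"):
        params["issue"] = fields["issue"]
    return CROSSREF_OPENURL + "?" + urlencode(params)


fields = parse_collation("Syst. Bot. 28(2): 401 (2003).")
print(openurl_query(fields))
```

Citations that do not fit the pattern (for example 'Adansonia, ix. (1870) 295.') simply return None from parse_collation and would need separate handling.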

>
> We're chugging away with standardisation in IPNI and we're doing
> updates at the moment that are cleaning up aspects of 10,000, 15,000,
> up to 50,000 records at a time. But there are 1.5 million records to
> do and we can still find (just from within Poissonia) citations like
> 'Adansonia, ix. (1870) 295. ' or 'Bol. Mus. Hist. Nat. Tucuman no. 6:
> 8. 1925 Hauman, in Kew Bull. 1925: 279. 1925 ' which look much less
> tractable

Hard work. There is a small literature on citation matching which may
be relevant, e.g., http://citeseer.ist.psu.edu/pasula02identity.html and
http://citeseer.ist.psu.edu/lawrence99autonomous.html.
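
To give a flavour of the simplest end of citation matching, here is a small Python sketch that normalises a publication title and looks for the closest entry in a list of standard abbreviations using the standard library's difflib. It is not the method of either paper above, just plain string similarity. The list of titles and the 0.8 cutoff are hypothetical; a real list would come from IPNI's standardised publications.

```python
import difflib
import re

# Hypothetical list of standardised abbreviations for illustration only.
STANDARD_TITLES = [
    "Syst. Bot.",
    "Kew Bull.",
    "Adansonia",
    "Bol. Mus. Hist. Nat. Tucuman",
]


def normalise(title):
    """Lower-case a title and replace punctuation with single spaces."""
    return " ".join(re.sub(r"[^\w\s]", " ", title.lower()).split())


def match_title(raw, candidates=STANDARD_TITLES, cutoff=0.8):
    """Return the closest standard title to `raw`, or None if none is close."""
    lookup = {normalise(c): c for c in candidates}
    hits = difflib.get_close_matches(
        normalise(raw), list(lookup), n=1, cutoff=cutoff
    )
    return lookup[hits[0]] if hits else None


print(match_title("Kew Bull"))
print(match_title("Syst Botany"))
```

This only addresses the publication title. Volumes given as roman numerals, or several citations run together in one string, would need parsing before anything like this could be applied.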

>
> 60% of publications are not standardised in IPNI, and of the
> standardised ones many of them predate DOIs (are there any moves
> among journals to retrofit DOIs to older series?)

Some are. I've noticed some very old articles hosted by Springer have  
DOIs (sadly, I can't give a specific example). Of course you don't need
to rely solely on DOIs. Handles should also be catered for. For
example, the American Museum of Natural History provide full-text of  
current and back issues of their publications:  
http://digitallibrary.amnh.org/dspace/. Of course, they don't do  
plants, but if other institutions adopt the AMNH's approach of using  
DSpace, then handles are part of the mix. Some repositories use URLs.

> Plus I've noticed
> in the IPNI and other editors at Kew a strange reluctance to type in
> strings that look like
> '10.1600/0363-6445(2003)028[0387:PORLFR]2.0.CO;2'
> - I don't know whether barcode reading technology could help here...

Yeah, but there's this cool invention called "cut and paste" ;-)  This  
can all be automated. Look at Connotea - you can highlight a DOI in  
your browser, click on a bookmarklet and hey presto, it's added to  
Connotea. Also, much of this would be done by scripts, web crawling,  
etc.
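
As a sketch of how a script might do the "cut and paste" for you, the following Python pulls DOI-like strings out of a block of text with a regular expression. The pattern is a loose assumption rather than an official DOI grammar: it accepts '10.', a numeric prefix, a slash, and then anything up to whitespace, a quote or an angle bracket, which is enough to cope with the brackets and semicolon in the Syst. Bot. example.

```python
import re

# Loose pattern (an assumption, not an official grammar): '10.', a numeric
# prefix, '/', then a run of characters that are not whitespace, quotes
# or angle brackets.
DOI_PATTERN = re.compile(r"""10\.\d{4,9}/[^\s"'<>]+""")


def find_dois(text):
    """Return DOI-like strings found in text, in order, without duplicates."""
    found = []
    for match in DOI_PATTERN.finditer(text):
        doi = match.group(0).rstrip(".,")  # drop trailing sentence punctuation
        if doi not in found:
            found.append(doi)
    return found


sample = 'supplement it with "doi:10.1600/0363-6445(2003)028[0387:PORLFR]2.0.CO;2" ?'
print(find_dois(sample))
```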

>
> I think that any solution to standardising citations of literature
> must inevitably include (if not be confined to) some use of the
> infrastructure that DOIs represent. But we will also have to come up
> with a plan that will help us deal with the vast weight of legacy
> data that databases like IPNI carry

I think there are two things to be addressed here. The first is GUIDs  
for old literature that no publisher is ever going to assign DOIs to
(the publisher may no longer exist, the work is in the public domain,  
etc.).

Also, several modern publishers have not adopted DOIs but instead use  
fairly standard URLs. These include a lot of journals relevant to us  
(e.g., American Journal of Botany and other HighWire Press journals).

I think major name databases such as IPNI would have to provide GUIDs
for the literature they have, and then map as many of those as possible
to external GUIDs.

This raises the second issue, multiple GUIDs for publications. We have  
this already, with DOIs, URLs, JSTOR, and PubMed ids for the same  
papers.

It would be cool to have a GUID "broker" that, say, could take a DOI and
say "well, IPNI calls this reference xxxxxx, and MOBOT have it as
yyyyyy, and there's a handle hdl:zzzzz for it, etc."  It could also
mimic CrossRef by taking bibliographic data and attempting to match
that to an existing GUID. In other words, it would be an OpenURL
resolver.
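
To make the broker idea concrete, here is a toy in-memory sketch in Python: it records that several identifiers refer to the same publication and, given any one of them, returns the others. The class name, the method names and the "ipni:" and "mobot:" prefixes are all hypothetical, and the xxxxxx, yyyyyy and zzzzz values are just the placeholders used above. A real broker would need persistent storage and some way to handle conflicting claims.

```python
class GuidBroker:
    """Toy in-memory broker grouping identifiers for the same publication."""

    def __init__(self):
        # Maps each identifier to the shared set of its equivalents.
        self._group_of = {}

    def link(self, *guids):
        """Record that all the given identifiers refer to one publication."""
        merged = set(guids)
        for guid in guids:
            merged |= self._group_of.get(guid, set())
        for guid in merged:
            self._group_of[guid] = merged

    def resolve(self, guid):
        """Return the other identifiers known for the same publication."""
        return sorted(self._group_of.get(guid, set()) - {guid})


DOI = "doi:10.1600/0363-6445(2003)028[0387:PORLFR]2.0.CO;2"

broker = GuidBroker()
broker.link(DOI, "ipni:xxxxxx")
broker.link("ipni:xxxxxx", "mobot:yyyyyy", "hdl:zzzzz")
print(broker.resolve(DOI))
```

Because link() merges any groups that share an identifier, the DOI ends up connected to the MOBOT id and the handle even though they were never registered together directly.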

Regards

Rod

>
> Sally
>
> *** Sally Hinchcliffe
> *** Computer section, Royal Botanic Gardens, Kew
> *** tel: +44 (0)20 8332 5708
> *** S.Hinchcliffe at rbgkew.org.uk
>
>
>
------------------------------------------------------------------------
Professor Roderic D. M. Page
Editor, Systematic Biology
DEEB, IBLS
Graham Kerr Building
University of Glasgow
Glasgow G12 8QP
United Kingdom

Phone:    +44 141 330 4778
Fax:      +44 141 330 2792
email:    r.page at bio.gla.ac.uk
web:      http://taxonomy.zoology.gla.ac.uk/rod/rod.html
reprints: http://taxonomy.zoology.gla.ac.uk/rod/pubs.html

Subscribe to Systematic Biology through the Society of Systematic
Biologists Website:  http://systematicbiology.org
Search for taxon names: http://darwin.zoology.gla.ac.uk/~rpage/portal/
Find out what we know about a species: http://ispecies.org
Rod's rants on phyloinformatics: http://iphylo.blogspot.com