[Tdwg-guid] DOIs and persistence -- when DOIs go bad [ Scanned for viruses ]

Paul Kirk p.kirk at cabi.org
Fri May 26 11:23:46 CEST 2006


My guess is that almost all the legacy data, specifically the references and not digital images/xml documents, are already digitized by someone somewhere and the challenge is herding these cats into a collaborative environment (? EDIT). It needs either a big pile of cat food (a.k.a. cash) or a project (? BHL) which does it, and if you are not participating and/or collaborating you are left behind. *IF* the TDWG standard(s) for Serial Publications (BPH) and Books/Non-serials (TL2) were available electronically we would already have a 'set of numbers' for these sub-objects which could be spliced into existing databases - that would be a big step forward i.m.h.o.
 
In CABI 'mycology' we already have ...
c. 60000 macroreferences, mostly to the systematic literature
c. 180000 microreferences in IF (from the 400000 names - the balance without references), 41000 linked to macroreferences and 8500 linked to the page images of the protologue
c. 60000 page images including 30000 for 'the' two major thesauri of the mycology systematic literature (1753 - 1930) - these just waiting to be OCR'd and parsed (anyone interested?? ... or with systems to do this automatically??)
 
We just need some 'structures' to work to and it will all rather rapidly fall together ... ever the optimist.
 
Paul
 
Dr Paul M. Kirk 
Biosystematist 
CABI Bioscience 
Bakeham Lane 
Egham 
Surrey TW20 9TY 
UK 
tel. (+44) (0)1491 829023, fax (+44) (0)1491 829100, email p.kirk at cabi.org 
www.cabi-bioscience.org; www.indexfungorum.org
 
________________________________

From: tdwg-guid-bounces at mailman.nhm.ku.edu on behalf of Sally Hinchcliffe
Sent: Fri 26/05/2006 09:12
To: Roderic Page
Cc: neil Thomson; tdwg-guid at mailman.nhm.ku.edu
Subject: Re: [Tdwg-guid] DOIs and persistence -- when DOIs go bad [ Scanned for viruses ]



Rod wrote:
...
> The DOI example above is particularly annoying for me, because it 
> relates to IPNI's LSIDs. One of my favourite plant taxa is Poissonia 
> heterantha (lsidres:urn:lsid:ipni.org:names:20012728-1). The IPNI 
> metadata for this name include the following:
>
> <tn:publishedIn>Syst. Bot. 28(2): 401 (2003). </tn:publishedIn>
>
> Wouldn't it be nice to have something like:
>
> <tn:publishedIn 
> rdf:resource="doi:10.1600/0363-6445(2003)028[0387:PORLFR]2.0.CO;2" />
>
> Given a DOI, users can locate the article quickly and, via CrossRef, 
> extract metadata. The more taxonomic literature is associated with 
> GUIDs such as DOIs, Handles, and LSIDs, the better.

This somewhat relates to a cross conversation (sorry, fell off the
mailing list) we've been having with Neil Thomson about his BHL
proposal & it points up another social problem, this time with legacy
data.

Even if the DOIs were working as advertised, what would it take to
get to the point where we could supplement 'Syst. Bot. 28(2): 401
(2003).' with "doi:10.1600/0363-6445(2003)028[0387:PORLFR]2.0.CO;2" ?

This example is actually quite close to being doable
programmatically. 'Syst. Bot.' is a standardised publication
abbreviation in IPNI, and the collation is in a standard form too, so
if we had a source of DOIs tied to articles in Syst. Bot. we could
imagine writing a routine to programmatically import the relevant DOI
into the IPNI record.
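
A minimal sketch of such a routine, in Python, might look like the
following. The regular expression, the Collation and Article types, the
find_doi helper and the "largest first page not exceeding the cited
page" rule are all assumptions made for illustration, not anything IPNI
or CrossRef provides. The first page 387 is read off the '[0387:'
segment of the DOI string, and the second article in the table is
invented purely to exercise the lookup.

```python
import re
from typing import List, NamedTuple, Optional


class Collation(NamedTuple):
    journal: str
    volume: int
    issue: Optional[str]
    page: int
    year: int


class Article(NamedTuple):
    journal: str
    volume: int
    first_page: int
    doi: str


# Assumed standard form: "Syst. Bot. 28(2): 401 (2003)."
COLLATION_RE = re.compile(
    r"^(?P<journal>.+?)\s+(?P<volume>\d+)(?:\((?P<issue>[^)]+)\))?:\s*"
    r"(?P<page>\d+)\s*\((?P<year>\d{4})\)\.?\s*$"
)


def parse_collation(text: str) -> Optional[Collation]:
    """Split a standard-form citation into its parts, or return None."""
    m = COLLATION_RE.match(text.strip())
    if m is None:
        return None
    return Collation(m["journal"], int(m["volume"]), m["issue"],
                     int(m["page"]), int(m["year"]))


def find_doi(c: Collation, articles: List[Article]) -> Optional[str]:
    """Pick the article in the same journal and volume whose first page
    is the largest one not exceeding the cited page (a heuristic)."""
    candidates = [a for a in articles
                  if a.journal == c.journal
                  and a.volume == c.volume
                  and a.first_page <= c.page]
    if not candidates:
        return None
    return max(candidates, key=lambda a: a.first_page).doi


# Hypothetical DOI source; only the first entry comes from the thread.
ARTICLES = [
    Article("Syst. Bot.", 28, 387,
            "10.1600/0363-6445(2003)028[0387:PORLFR]2.0.CO;2"),
    Article("Syst. Bot.", 28, 410, "10.9999/made-up-later-article"),
]

citation = parse_collation("Syst. Bot. 28(2): 401 (2003). ")
doi = find_doi(citation, ARTICLES) if citation else None
print(doi)
```

With a real DOI source the table would come from a publisher or
CrossRef export rather than being typed in. A citation that does not
fit the pattern makes parse_collation return None, so it can be left
for manual handling.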

We're chugging away with standardisation in IPNI and we're doing
updates at the moment that are cleaning up aspects of 10,000, 15,000,
up to 50,000 records at a time. But there are 1.5 million records to
do and we can still find (just from within Poissonia) citations like
'Adansonia, ix. (1870) 295. ' or 'Bol. Mus. Hist. Nat. Tucuman no. 6:
8. 1925 Hauman, in Kew Bull. 1925: 279. 1925 ' which look much less
tractable.
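
One rough way to size the remaining work is to sort citations by
whether they already fit the standard form. The Python sketch below
does that for the three citations quoted in this message. The pattern
and the triage function are assumptions for illustration only, and a
real pass over IPNI would need a much more forgiving set of rules.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# Assumed pattern for the standard form "Journal Vol(Issue): Page (Year)."
STANDARD_FORM = re.compile(
    r"^.+?\s+\d+(?:\([^)]+\))?:\s*\d+\s*\(\d{4}\)\.?$"
)


def triage(citations: Iterable[str]) -> Dict[str, List[str]]:
    """Group citations into 'standard' and 'needs_review' buckets."""
    buckets: Dict[str, List[str]] = defaultdict(list)
    for raw in citations:
        text = raw.strip()
        key = "standard" if STANDARD_FORM.match(text) else "needs_review"
        buckets[key].append(text)
    return dict(buckets)


sample = [
    "Syst. Bot. 28(2): 401 (2003). ",
    "Adansonia, ix. (1870) 295. ",
    "Bol. Mus. Hist. Nat. Tucuman no. 6: 8. 1925 Hauman, "
    "in Kew Bull. 1925: 279. 1925 ",
]
result = triage(sample)
for bucket, items in result.items():
    print(bucket, len(items))
```

Only the Syst. Bot. citation lands in the 'standard' bucket; the other
two fall through to 'needs_review'.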

60% of publications are not standardised in IPNI, and many of the
standardised ones predate DOIs (are there any moves
among journals to retrofit DOIs to older series?). Plus I've noticed
among the IPNI and other editors at Kew a strange reluctance to type in
strings that look like
'10.1600/0363-6445(2003)028[0387:PORLFR]2.0.CO;2'
- I don't know whether barcode reading technology could help here...

I think that any solution to standardising citations of literature
must inevitably include (if not be confined to) some use of the
infrastructure that DOIs represent. But we will also have to come up
with a plan that will help us deal with the vast weight of legacy
data that databases like IPNI carry.

Sally

*** Sally Hinchcliffe
*** Computer section, Royal Botanic Gardens, Kew
*** tel: +44 (0)20 8332 5708
*** S.Hinchcliffe at rbgkew.org.uk


_______________________________________________
TDWG-GUID mailing list
TDWG-GUID at mailman.nhm.ku.edu
http://mailman.nhm.ku.edu/mailman/listinfo/tdwg-guid

