My guess is that almost all the legacy data, specifically the references and not digital images/xml documents, are already digitized by someone somewhere and the challenge is herding these cats into a collaborative environment (? EDIT). It needs either a big pile of cat food (a.k.a. cash) or a project (? BHL) which does it, and if you are not participating and/or collaborating you are left behind. *IF* the TDWG standard(s) for Serial Publications (BPH) and Books/Non-serials (TL2) were available electronically we would already have a 'set of numbers' for these sub-objects which could be spliced into existing databases - that would be a big step forward i.m.h.o.
In CABI 'mycology' we already have ...

- c. 60000 macroreferences, mostly to the systematic literature
- c. 180000 microreferences in IF (from the 400000 names - the balance without references), 41000 linked to macroreferences and 8500 linked to the page images of the protologue
- c. 60000 page images including 30000 for 'the' two major thesauri of the mycology systematic literature (1753 - 1930) - these just waiting to be OCR'd and parsed (anyone interested?? ... or with systems to do this automatically??)
We just need some 'structures' to work to and it will all rather rapidly fall together ... ever the optimist.
Paul
Dr Paul M. Kirk
Biosystematist
CABI Bioscience
Bakeham Lane
Egham
Surrey TW20 9TY
UK
tel. (+44) (0)1491 829023, fax (+44) (0)1491 829100, email p.kirk@cabi.org
www.cabi-bioscience.org; www.indexfungorum.org
________________________________
From: tdwg-guid-bounces@mailman.nhm.ku.edu on behalf of Sally Hinchcliffe
Sent: Fri 26/05/2006 09:12
To: Roderic Page
Cc: neil Thomson; tdwg-guid@mailman.nhm.ku.edu
Subject: Re: [Tdwg-guid] DOIs and persistence -- when DOIs go bad [ Scanned for viruses ]
Rod wrote: ...
The DOI example above is particularly annoying for me, because it relates to IPNI's LSIDs. One of my favourite plant taxa is Poissonia heterantha (lsidres:urn:lsid:ipni.org:names:20012728-1). The IPNI metadata for this name include the following:
<tn:publishedIn>Syst. Bot. 28(2): 401 (2003).</tn:publishedIn>
Wouldn't it be nice to have something like:
<tn:publishedIn rdf:resource="doi:10.1600/0363-6445(2003)028[0387:PORLFR]2.0.CO;2" />
Given a DOI, users can locate the article quickly and, via CrossRef, extract metadata. The more taxonomic literature is associated with GUIDs such as DOIs, Handles, and LSIDs, the better.
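As a rough illustration of "given a DOI, locate the article": the DOI quoted above contains parentheses, square brackets, a colon and a semicolon, so it has to be percent-encoded before it can go into a URL. The sketch below assumes the public proxy at https://doi.org/ is the resolver in use; the function name doi_to_url is made up for this example, and nothing here touches the network.

```python
from urllib.parse import quote

# Assumption: the public DOI proxy is the resolver; swap in another if needed.
DOI_RESOLVER = "https://doi.org/"


def doi_to_url(doi: str) -> str:
    """Build a resolver URL from a DOI, with or without a leading 'doi:'."""
    bare = doi[4:] if doi.lower().startswith("doi:") else doi
    # Keep '/' (it separates prefix and suffix); percent-encode ( ) [ ] : ;
    return DOI_RESOLVER + quote(bare, safe="/")


url = doi_to_url("doi:10.1600/0363-6445(2003)028[0387:PORLFR]2.0.CO;2")
print(url)
```

Whether the resulting link actually resolves is a separate matter, which is the point of this thread.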
This somewhat relates to a cross conversation (sorry, fell off the mailing list) we've been having with Neil Thomson about his BHL proposal, and it points up another social problem, this time with legacy data.
Even if the DOIs were working as advertised, what would it take to get to the point where we could supplement 'Syst. Bot. 28(2): 401 (2003).' with "doi:10.1600/0363-6445(2003)028[0387:PORLFR]2.0.CO;2" ?
This example is actually quite close to being doable programmatically. 'Syst. Bot.' is a standardised publication abbreviation in IPNI, and the collation is in a standard form too, so if we had a source of DOIs tied to articles in Syst. Bot. we could imagine writing a routine to programmatically import the relevant DOI into the IPNI record.
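A minimal sketch of what such a routine might look like, under several assumptions. The regular expression is only a guess at the "standard form" based on the single example 'Syst. Bot. 28(2): 401 (2003).'. DOI_INDEX is a hypothetical stand-in for the "source of DOIs" (no such table exists in IPNI as far as this thread says). Volume 28 and start page 387 are read off the DOI string itself, which appears to embed them. Because no end pages are available, the lookup simply picks the article in the same volume with the latest start page at or before the cited page.

```python
import re
from bisect import bisect_right
from typing import NamedTuple, Optional


class Collation(NamedTuple):
    journal: str
    volume: int
    issue: Optional[str]
    page: int
    year: int


# Assumed standard form: "<Abbrev.> <vol>(<issue>): <page> (<year>)."
STANDARD = re.compile(
    r"^(?P<journal>.+?)\s+(?P<volume>\d+)(?:\((?P<issue>[^)]+)\))?:\s*"
    r"(?P<page>\d+)\s*\((?P<year>\d{4})\)\.?\s*$"
)

# Hypothetical index: (journal abbreviation, volume) -> sorted [(start_page, doi)].
DOI_INDEX = {
    ("Syst. Bot.", 28): [
        (387, "10.1600/0363-6445(2003)028[0387:PORLFR]2.0.CO;2"),
    ],
}


def parse_collation(text: str) -> Optional[Collation]:
    """Return the parsed citation, or None if it is not in the assumed form."""
    m = STANDARD.match(text.strip())
    if m is None:
        return None
    return Collation(m["journal"], int(m["volume"]), m["issue"],
                     int(m["page"]), int(m["year"]))


def find_doi(c: Collation) -> Optional[str]:
    """Pick the article whose start page is the latest one <= the cited page."""
    articles = DOI_INDEX.get((c.journal, c.volume), [])
    starts = [start for start, _ in articles]
    i = bisect_right(starts, c.page)
    return articles[i - 1][1] if i else None


cited = parse_collation("Syst. Bot. 28(2): 401 (2003).")
print(cited, find_doi(cited) if cited else None)
```

A real index would want end pages too, since otherwise a page beyond the last article in a volume would be matched to that article by mistake. Citations that do not fit the assumed form come back as None and would need the manual standardisation described next.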
We're chugging away with standardisation in IPNI and we're doing updates at the moment that are cleaning up aspects of 10,000, 15,000, up to 50,000 records at a time. But there are 1.5 million records to do and we can still find (just from within Poissonia) citations like 'Adansonia, ix. (1870) 295. ' or 'Bol. Mus. Hist. Nat. Tucuman no. 6: 8. 1925 Hauman, in Kew Bull. 1925: 279. 1925 ' which look much less tractable.
60% of publications are not standardised in IPNI, and of the standardised ones many predate DOIs (are there any moves among journals to retrofit DOIs to older series?). Plus I've noticed among the IPNI and other editors at Kew a strange reluctance to type in strings that look like '10.1600/0363-6445(2003)028[0387:PORLFR]2.0.CO;2' - I don't know whether barcode reading technology could help here...
I think that any solution to standardising citations of literature must inevitably include (if not be confined to) some use of the infrastructure that DOIs represent. But we will also have to come up with a plan that will help us deal with the vast weight of legacy data that databases like IPNI carry.
Sally
*** Sally Hinchcliffe
*** Computer section, Royal Botanic Gardens, Kew
*** tel: +44 (0)20 8332 5708
*** S.Hinchcliffe@rbgkew.org.uk
_______________________________________________ TDWG-GUID mailing list TDWG-GUID@mailman.nhm.ku.edu http://mailman.nhm.ku.edu/mailman/listinfo/tdwg-guid