<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<META NAME="Generator" CONTENT="MS Exchange Server version 6.5.7226.0">
<TITLE>Re: [Tdwg-guid] DOIs and persistence -- when DOIs go bad [ Scanned for viruses ]</TITLE>
</HEAD>
<BODY>
<DIV id=idOWAReplyText23666 dir=ltr>
<DIV dir=ltr><FONT face=Arial color=#000000 size=2>My guess is that almost all
the legacy data, specifically the references and not digital images/xml
documents, are already digitized by someone somewhere and the challenge is
herding these cats into a collaborative environment (? EDIT). It needs either a
big pile of cat food (a.k.a. cash) or a project (? BHL) which does it, and
if you are not participating and/or collaborating you are left behind. *IF* the
TDWG standard(s) for Serial Publications (BPH) and Books/Non-serials (TL2) were
available electronically we would already have a 'set of numbers' for these
sub-objects which could be spliced into existing databases - that would be a big
step forward i.m.h.o.</FONT></DIV>
<DIV dir=ltr><FONT face=Arial color=#000000 size=2></FONT> </DIV>
<DIV dir=ltr><FONT face=Arial color=#000000 size=2>In CABI 'mycology' we already
have ...</FONT></DIV>
<DIV dir=ltr><FONT face=Arial size=2>c. 60000 macroreferences, mostly to the
systematic literature</FONT></DIV>
<DIV dir=ltr><FONT face=Arial size=2>c. 180000 microreferences in IF (from the
400000 names - the balance without references), 41000 linked to
macroreferences and 8500 linked to the page images of the
protologue</FONT></DIV>
<DIV dir=ltr><FONT face=Arial size=2>c. 60000 page images including
30000 for 'the' two major thesauri of the mycology systematic literature
(1753 - 1930) - these just waiting to be OCR'd and parsed (anyone interested??
... or with systems to do this automatically??)</FONT></DIV>
<DIV dir=ltr><FONT face=Arial size=2></FONT> </DIV>
<DIV dir=ltr><FONT face=Arial size=2>We just need some 'structures' to work to
and it will all rather rapidly fall together ... ever the optimist.</FONT></DIV>
<DIV dir=ltr><FONT face=Arial size=2></FONT> </DIV>
<DIV dir=ltr><FONT face=Arial size=2>Paul</FONT></DIV>
<DIV dir=ltr> </DIV></DIV>
<DIV id=idSignature96320 dir=ltr>
<DIV><FONT face=Arial color=#000000 size=2>
<DIV><SPAN lang=en-gb><FONT face=Tahoma size=2>Dr Paul M. Kirk</FONT></SPAN>
<BR><SPAN lang=en-gb><FONT face=Tahoma size=2>Biosystematist</FONT></SPAN>
<BR><SPAN lang=en-gb><FONT face=Tahoma size=2>CABI Bioscience</FONT></SPAN>
<BR><SPAN lang=en-gb><FONT face=Tahoma size=2>Bakeham Lane</FONT></SPAN>
<BR><SPAN lang=en-gb><FONT face=Tahoma size=2>Egham</FONT></SPAN> <BR><SPAN
lang=en-gb><FONT face=Tahoma size=2>Surrey TW20 9TY</FONT></SPAN> <BR><SPAN
lang=en-gb><FONT face=Tahoma size=2>UK</FONT></SPAN> </DIV>
<DIV><SPAN lang=en-gb><FONT face=Tahoma size=2>tel. (+44) (0)1491 829023, fax
(+44) (0)1491 829100, email p.kirk@cabi.org</FONT></SPAN> <BR><SPAN
lang=en-gb><FONT face=Tahoma size=2><A
href="http://www.cabi-bioscience.org">www.cabi-bioscience.org</A>; <A
href="http://www.indexfungorum.org">www.indexfungorum.org</A></FONT></SPAN></DIV>
<DIV></FONT> </DIV>
<DIV>
<DIV align=justify>
<HR tabIndex=-1>
</DIV>
<DIV align=justify><FONT face=Tahoma size=2><B>From:</B>
tdwg-guid-bounces@mailman.nhm.ku.edu on behalf of Sally
Hinchcliffe<BR><B>Sent:</B> Fri 26/05/2006 09:12<BR><B>To:</B> Roderic
Page<BR><B>Cc:</B> neil Thomson; tdwg-guid@mailman.nhm.ku.edu<BR><B>Subject:</B>
Re: [Tdwg-guid] DOIs and persistence -- when DOIs go bad [ Scanned for viruses
]<BR></FONT><BR></DIV></DIV></DIV></DIV>
<DIV>
<P><FONT size=2>Rod wrote:<BR>...<BR>> The DOI example above is particularly
annoying for me, because it <BR>> relates to IPNI's LSIDs. One of my
favourite plant taxa is Poissonia <BR>> heterantha
(lsidres:urn:lsid:ipni.org:names:20012728-1). The IPNI <BR>> metadata
for this name include the following:<BR>><BR>> &lt;tn:publishedIn&gt;Syst.
Bot. 28(2): 401 (2003). &lt;/tn:publishedIn&gt;<BR>><BR>> Wouldn't it be
nice to have something like:<BR>><BR>> &lt;tn:publishedIn <BR>>
rdf:resource="doi:10.1600/0363-6445(2003)028[0387:PORLFR]2.0.CO;2"
/&gt;<BR>><BR>> Given a DOI, users can locate the article quickly and, via
CrossRef, <BR>> extract metadata. The more taxonomic literature is
associated with <BR>> GUIDs such as DOIs, Handles, and LSIDs, the
better.<BR><BR>This somewhat relates to a cross conversation (sorry, fell off
the<BR>mailing list) we've been having with Neil Thomson about his
BHL<BR>proposal & it points up another social problem, this time with
legacy<BR>data.<BR><BR>Even if the DOIs were working as advertised, what would it
take to<BR>get to the point where we could supplement 'Syst. Bot. 28(2):
401<BR>(2003).' with "doi:10.1600/0363-6445(2003)028[0387:PORLFR]2.0.CO;2"
?<BR><BR>This example is actually quite close to being
doable<BR>programmatically. 'Syst. Bot.' is a standardised
publication<BR>abbreviation in IPNI, and the collation is in a standard form
too, so<BR>if we had a source of DOIs tied to articles in Syst. Bot. we
could<BR>imagine writing a routine to programmatically import the relevant
DOI<BR>into the IPNI record.<BR><BR>We're chugging away with standardisation in
IPNI and we're doing<BR>updates at the moment that are cleaning up aspects of
10,000, 15,000,<BR>up to 50,000 records at a time. But there are 1.5 million
records to<BR>do and we can still find (just from within Poissonia) citations
like<BR>'Adansonia, ix. (1870) 295. ' or 'Bol. Mus. Hist. Nat. Tucuman no.
6:<BR>8. 1925 Hauman, in Kew Bull. 1925: 279. 1925 ' which look much
less<BR>tractable<BR><BR>60% of publications are not standardised in IPNI, and
of the<BR>standardised ones many of them predate DOIs (are there any
moves<BR>among journals to retrofit DOIs to older series?) Plus I've
noticed<BR>in the IPNI and other editors at Kew a strange reluctance to type
in<BR>strings that look
like<BR>'10.1600/0363-6445(2003)028[0387:PORLFR]2.0.CO;2'<BR>- I don't know
whether barcode reading technology could help here...<BR><BR>I think that any
solution to standardising citations of literature<BR>must inevitably include (if
not be confined to) some use of the<BR>infrastructure that DOIs represent. But
we will also have to come up<BR>with a plan that will help us deal with the vast
weight of legacy<BR>data that databases like IPNI carry.<BR><BR>Sally<BR>*** Sally
Hinchcliffe<BR>*** Computer section, Royal Botanic Gardens, Kew<BR>*** tel: +44
(0)20 8332 5708<BR>***
S.Hinchcliffe@rbgkew.org.uk<BR></FONT></P></DIV>
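<DIV>
<P><FONT size=2>The routine imagined above for Syst. Bot. might look roughly like the Python sketch below. It assumes a lookup table of articles keyed by the standardised IPNI publication abbreviation; the table, the function name find_doi and the regular expression are illustrative assumptions, not part of IPNI or CrossRef. The DOI and its first page (0387) are taken from the thread, while the last page is an invented placeholder. Because a name is usually published somewhere inside an article, the match is on the page range rather than on the first page alone.</FONT></P>
<PRE>
```python
import re

# Hypothetical source of DOIs: abbreviation -> (volume, first page, last page, DOI).
# The DOI and first page (0387) come from the thread; the last page (414) is an
# invented placeholder, since the thread does not give it.
ARTICLES = {
    "Syst. Bot.": [
        (28, 387, 414, "10.1600/0363-6445(2003)028[0387:PORLFR]2.0.CO;2"),
    ],
}

# Assumed standard collation form: "Abbrev. volume(issue): page (year)."
COLLATION = re.compile(
    r"^(.+?)\s+(\d+)(?:\((\d+)\))?:\s*(\d+)\s*\((\d{4})\)\.?\s*$"
)


def find_doi(citation, articles=ARTICLES):
    """Return the DOI of the article containing the cited page, or None."""
    match = COLLATION.match(citation)
    if match is None:
        return None  # non-standard citation: leave for manual curation
    journal, volume, _issue, page, _year = match.groups()
    for vol, first, last, doi in articles.get(journal, []):
        # The cited page must fall inside the article's page range.
        if vol == int(volume) and int(page) in range(first, last + 1):
            return doi
    return None


print(find_doi("Syst. Bot. 28(2): 401 (2003). "))
```
</PRE>
<P><FONT size=2>Citations that do not fit the assumed collation form, such as 'Adansonia, ix. (1870) 295.', return None here and would still need manual curation.</FONT></P>
</DIV>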
</BODY>
</HTML>