[Tdwg-lit] PublicationBank - requirements evaluation
dhobern at gbif.org
Fri Mar 24 08:41:55 CET 2006
Thank you for making the connection between these two groups. I think it
would help if I explained (particularly for the TDWG-LIT group) what
questions are being addressed by the TDWG-GUID work under the general
heading of "PublicationBank".
During the first GUID workshop, we recognized that different classes of
information require (for want of a better term) different strengths of
GUIDs. It is a great help for us to be able to recognise that two
references are to the same piece of data because they use the same GUID to
reference it. Let me give some examples.
If I state that my taxon concept includes a specimen with LSID
urn:lsid:my.org:specimen:123 and someone else also includes the same LSID in
the list of specimens examined as part of their revision, it helps us to
make some firm deductions about shared material. It seems reasonable that
we will be able to associate identifiers with specimens in a way that
ensures that the vast majority of specimens can receive a single identifier,
meaning that all references to that identifier refer to that specimen and
that all references to that specimen use that identifier. This second part
is what I mean when I speak of a strong identifier.
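As a minimal sketch of the deduction described above, assume an LSID is a colon-separated string of the form urn:lsid:authority:namespace:object with an optional revision part. The names Lsid, parse_lsid and shared_specimens are hypothetical and only illustrate the idea; they are not part of any TDWG or GBIF tool.

```python
from typing import Iterable, NamedTuple, Optional, Set


class Lsid(NamedTuple):
    """Parts of an LSID, assuming the urn:lsid:authority:namespace:object[:revision] layout."""
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text: str) -> Lsid:
    """Split an LSID string into its parts (hypothetical helper, simplified validation)."""
    parts = text.strip().split(":")
    if len(parts) not in (5, 6) or parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return Lsid(parts[2], parts[3], parts[4], revision)


def shared_specimens(mine: Iterable[str], theirs: Iterable[str]) -> Set[Lsid]:
    """With strong identifiers, shared material is simply the intersection of the two lists."""
    return {parse_lsid(x) for x in mine} & {parse_lsid(x) for x in theirs}


if __name__ == "__main__":
    my_concept = ["urn:lsid:my.org:specimen:123", "urn:lsid:my.org:specimen:124"]
    their_revision = ["urn:lsid:my.org:specimen:123"]
    for lsid in shared_specimens(my_concept, their_revision):
        print("shared specimen:", lsid.object_id)
```

The intersection is only meaningful because each specimen is assumed to have exactly one identifier; with weaker identifiers, two different LSIDs could still denote the same specimen and the comparison would miss it.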
Now consider the situation with taxon names. Many people are going to wish
to refer to the same names (or nomenclatural acts). It will clearly be
really valuable if we can work towards having a single GUID for each validly
published name, so that we can maximize the interconnectedness of our data.
If I say that I refer to the name with the LSID urn:lsid:my.org:names:xyz and
that LSID has data or metadata indicating that it relates to Aus bus Jones,
2004, and you use the LSID urn:lsid:another.org:names.abc to refer to the
same Aus bus Jones, 2004, then we are still left with the same string
matching problems we have right now with names. It therefore seems sensible
to work with the nomenclators as the "preferred" issuers of LSIDs for taxon
names (recognising the gaps we have today for zoological names) and to
encourage a move to using those identifiers whenever we wish to provide a
secure reference to each name. (Of course this implies an urgent need for
tools and services to make this easy.)
Turning to taxon concepts, we had a long debate as to whether it was
plausible to try to enforce the same degree of preferred issuers for LSIDs
for taxon concepts. If I publish the first LSID-enabled revision of a
group, I may need to assign LSIDs to refer to many different taxon concepts.
Someone else databasing the taxonomy of the group will have a similar task.
Unless we maintain a central, easy-to-search registry where people can quickly
find out whether someone has already assigned an LSID to Aus bus Jones, 2004
sensu Smith, 2006, we will never be able to make any assumptions based on
the fact that I have used urn:lsid:my.org:concepts:123.1 and you have used
urn:lsid:my.org:concepts:abc.001. Even though the two identifiers are
different there is still a good chance that they may refer to the same
concept (expressed as name-according to-publication). It seems much more
reasonable instead to tackle the problem of getting really strong LSIDs for
names (through the nomenclators) and doing the same thing for the taxonomic
literature (through someone for whom we used the placeholder name
"PublicationBank"). Any concept LSID can resolve through its metadata to
two LSIDs, one for a name and one for a publication. Comparing concept
LSIDs can therefore be based on the comparisons between these two more
strongly identified objects (the name and the publication).
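The comparison of concepts through their name and publication can be sketched as follows. This is only an illustration under stated assumptions: the ConceptMetadata type, the in-memory METADATA table standing in for LSID metadata resolution, and the nomenclator.example and publicationbank.example identifiers are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ConceptMetadata:
    """What a concept LSID is assumed to resolve to: a name and a publication."""
    name_lsid: str
    publication_lsid: str


# Hypothetical stand-in for resolving concept LSIDs to their metadata.
METADATA: Dict[str, ConceptMetadata] = {
    "urn:lsid:my.org:concepts:123.1": ConceptMetadata(
        name_lsid="urn:lsid:nomenclator.example:names:aus-bus",
        publication_lsid="urn:lsid:publicationbank.example:pubs:smith-2006",
    ),
    "urn:lsid:my.org:concepts:abc.001": ConceptMetadata(
        name_lsid="urn:lsid:nomenclator.example:names:aus-bus",
        publication_lsid="urn:lsid:publicationbank.example:pubs:smith-2006",
    ),
}


def same_concept(concept_a: str, concept_b: str) -> bool:
    """Two concept LSIDs match if their name and publication LSIDs both match."""
    a, b = METADATA[concept_a], METADATA[concept_b]
    return a.name_lsid == b.name_lsid and a.publication_lsid == b.publication_lsid


if __name__ == "__main__":
    print(same_concept("urn:lsid:my.org:concepts:123.1",
                       "urn:lsid:my.org:concepts:abc.001"))
```

The two concept LSIDs differ as strings, yet they compare equal because the name and publication identifiers they resolve to are strong ones.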
So, from the standpoint of the GUID group, the requirement here is a very
specific one. We need to find a way to manage assigning LSIDs to the
publications that make up the taxonomic literature, so that we can all have
what would amount to a master list of relevant publications. From this
angle, "all" that is needed is a secure registry into which the
bibliographic data can be stored, cleaned and assigned identifiers. Of
course such a resource could also be an excellent place to register the
location of online digital versions of each publication. At that point it
becomes something even more valuable. On the other hand, considering it
this way suggests that it may already naturally be addressed as part of the
BHL or a similar effort, and part of what we would like to do is to identify
any existing initiatives which may serve as a part or all of what is
required for the LSID work.
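To make the registry requirement concrete, here is a rough sketch of storing bibliographic data, cleaning it and assigning identifiers, with optional registration of online locations. Everything in it is an assumption made for illustration: the PublicationRegistry class, the matching on author, year and title, the very simple cleaning rule, and the publicationbank.example authority. A real service would need far more careful matching.

```python
import re
from typing import Dict, List, Tuple


def _clean(text: str) -> str:
    """Very simple cleaning: collapse whitespace and ignore case."""
    return re.sub(r"\s+", " ", text).strip().lower()


class PublicationRegistry:
    """Hypothetical master list: one identifier per cleaned bibliographic record."""

    def __init__(self, authority: str = "publicationbank.example") -> None:
        self._authority = authority
        self._ids: Dict[Tuple[str, str, str], str] = {}
        self._locations: Dict[str, List[str]] = {}

    def register(self, author: str, year: str, title: str) -> str:
        """Return the existing LSID for this publication, or assign a new one."""
        key = (_clean(author), _clean(year), _clean(title))
        if key not in self._ids:
            self._ids[key] = f"urn:lsid:{self._authority}:pubs:{len(self._ids) + 1}"
        return self._ids[key]

    def add_location(self, lsid: str, url: str) -> None:
        """Record where an online digital version of the publication can be found."""
        self._locations.setdefault(lsid, []).append(url)

    def locations(self, lsid: str) -> List[str]:
        return list(self._locations.get(lsid, []))


if __name__ == "__main__":
    registry = PublicationRegistry()
    first = registry.register("Smith", "2006", "A revision of Aus")
    again = registry.register("  smith ", "2006", "A  Revision of AUS")
    print(first == again)  # both citations receive the same identifier
```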
As I see it, the TDWG-LIT work gives a framework for the exchange of these
bibliographic data, but we also need to understand the best way to get the
kind of integrated biodiversity bibliography we would like to have.
Does that all make sense?
Donald Hobern (dhobern at gbif.org)
Programme Officer for Data Access and Database Interoperability
Global Biodiversity Information Facility Secretariat
Universitetsparken 15, DK-2100 Copenhagen, Denmark
Tel: +45-35321483 Mobile: +45-28751483 Fax: +45-35321480
From: Taxonomic Databases Working Group GUID Project
[mailto:TDWG-GUID at LISTSERV.NHM.KU.EDU] On Behalf Of Anna Weitzman
Sent: 21 March 2006 20:10
To: TDWG-GUID at LISTSERV.NHM.KU.EDU
Subject: Re: PublicationBank - requirements evaluation
You may not be aware that TDWG has a list devoted to taxonomic literature
standards. It would be great if you (and anyone else interested) would join
in that discussion ( TDWG Literature standards mailing list
tdwg-lit at lists.tdwg.org; sign up at
http://lists.tdwg.org/mailman/listinfo/tdwg-lit_lists.tdwg.org/general ) and
add your expertise. The list has only been active since early February, and
the complete correspondence is in the archives (
Anna L. Weitzman, Ph.D.
Informatics Branch Chief, National Museum of Natural History
Smithsonian Institution, PO Box 37012
Natural History Building, Room W-623, MRC 136
Washington, DC 20013-7012 U.S.A.
phone: (202) 633-0846
fax: (202) 786-3180
email: weitzman at si.edu
INOTAXA - http://www.sil.si.edu/digitalcollections/bca/status.cfm
electronic Biologia Centrali-Americana -
>>> rhuber at WDC-MARE.ORG 21-Mar-2006 5:06:30 AM >>>
Below is a short 'survey' which hopefully can help to get an overview
on how bibliographic information currently is stored in your databases.
If you don't like to fill in such forms, any other info on your current
literature db is also welcome, just send it to me by email!
The list may be incomplete; if you think important questions are missing
there, just let me and the others know.
I will try to summarize the results on the wiki later.
best regards, Robert
1) How is your literature database/module organised?
- [ ]Database structure completely normalized
- [ ]Database structure not or incompletely normalized
2) How do you hold your bibliographic information?
- [ ]Complete set of Bib info (Author, Title, Source, Volume, Pages)
- [ ]Incomplete set of Bib info
- [ ]Abbreviations (e.g. Stafleu & Cowan)
- [ ]Bib Info and Abbreviations
- Specify which bibliographic fields you hold in your db:
--[ ]Source (Journal/Book)
--[ ]Source Editors
--[ ]Series Editors
3) How do you store author names:
- [ ]Abbreviations (e.g. Brummitt & Powell)
- [ ]Complete Name as String, one author per string
- [ ]Complete Name as String, all authors in one string
- [ ]Last Name, First Name separated
4) How do you store journal names/ other sources
- [ ]Complete Name
- [ ]Abbreviation
- [ ]Both
- [ ]If you hold abbreviations, according to which standard?
Dr. Robert Huber
WDC-MARE / PANGAEA - www.pangaea.de <http://www.pangaea.de/> ,
Stratigraphy.net - www.stratigraphy.net <http://www.stratigraphy.net/>
MARUM - Institute for Marine Environmental Sciences (location)
POP 330 440
Phone ++49 421 218-65593, Fax ++49 421 218-65505
e-mail rhuber at wdc-mare.org , robert.huber at stratigraphy.net