PublicationBank survey results and some things to discuss

Wed Apr 12 12:09:28 CEST 2006

Dear all,

Following the discussion on PublicationBank and after the results of
the survey, I must confess I still have some difficulties to see what
exactly PublicationBank will be and how it could be implemented.
Therefore I would just like to throw the survey results and some of
my thoughts/ideas here and on the WIKI. I hope this will stimulate
the discussion..

Current status of bibliographic data

Based on the survey I sent around, it can be said that there are many
ways how bib data is treated by participating dbs.  I received 7
responses to the survey.
For about 50%, bib modules are not or incomplete normalized, but the
majority stores the complete set of bibliographic information I asked
for (or even more). However, some institutions use abbreviations(e.g.
Brummit) only as bibliographic information.
About 50% store authors names as strings and 50% store last name,
first name separately. Journal and other sources are stored mostly as
both, abbreviation and full journal name.

The results give a good impression of the difficulties a operating
PublicationBank might expect. In some way PublicationBank will serve
GUIDs and to do this it will need to know e.g. Author, Year and
source/ journal to find the appropriate GUID.
So the question is can this information easily be extracted from the
existing dbs to formulate a query by participating institutions?
The other reason why I cretaed the survey was to find out whether
the existing dbs can be used as initial sources contributing their
entries to PublicationBank to have an initial set of citations.

The good point is that most institutions store the complete set of
bib info, however it might be a bit difficult to e.g. extract the
volume or issue information when the db is not normalized and the
info is stored as a string. Same difficulties may occur for author
names, this depends on how the string is stored (Last Name, Initial,
Last Name, Initial would be difficult..).
Important is that both, abbreviations as well as complete citations
are stored in the dbs.

In summary it can be said that it will probably not be easy to create
a initial citation pool for PublicationBank because the current
status of bib data is too heterogenous. It will also be difficult for
some of the dbs to extract the necessary info from their dbs to be
able to effectively use PublicationBank. But as I said this depends
on the detailled way their data is stored. Some might need to put
some additional efforts on their dbs, but most surely could rightaway
start to query PublicationBank.

-- What should PublicationBank be:

1) A service provider for existing taxonomic databases to ease the
        management of bibliographic data. It allows client databases to
        check   their existing bib entries against PublicationBank and to
        retrieve a GUID.
2) An external, common bibliographic management system for existing
        taxonomic databases. It allows client databases to concentrate
        on taxonomic information, thus to leave all bib related data
        management at PublicationBank.
        Client dbs would  internally use only GUIDs, no bibliographic
        module needed anymore.
3) to be _the_ bibliographic ressource for biodiversity relevant
        publications 1) and 2) included but major(curatorial/librarian)
        additional efforts are made to analyze, find and store relevant
        publications.

-- What is "relevant" bibliographic data

1) Gets relevant on demand -The literature/citations which is already
        stored (and will be stored in the future) in taxonomic database
        using PublicationBank see above scenarios 19 and 2)
2) Is "per se" relevant - Any literature which treats biodiversity/
        taxonomy etc. relevant topics

-- What gets an GUID?

1) Abbreviations (such as Brummit & Powell)
2) Copmplete citations
3) Both

-- Granularity of biblographic data (Roger Hyam):

1)LSID for the Journal/Book
2)LSID for the volume
3)LSID for the part
4)LSID for the article
5)LSID for the actual description on page 15.
I assume all except 5 are usually treated by bibliographic databases?

-- Where to get GUIDs from?
1) CrossRef / Journals (DOIs)
2) Citation Databases (PubMed.. ISI, Georef)
3) Library digitization efforts (Animal base, Google books, )
4) Assign our own PublicationBank LSID (e.g. as preliminary GUIDs)
...

-- Which ressources are already there?

1) CrossRef and other bibliographic GUID resolver services
2) IPNI and uBIOS 'author abbreviation resolver'
...know more?
some ressources could help but do not use GUIDs (AnimalBase etc),
we need to encourage these initiatives to use GUIDs

-- Are there initiatives/products which could help or take over the whole
thing?

1) Biodiversity Heritage Library Project (http://www.bhl.si.edu/)
2) Natures Connotea project (http://www.connotea.org/)

-- Which other people need to be involved?

1) Librarians

-- What will be the components of PublicationBank and how will they interact
todo

best regards and happy Eastern!

Robert

Dr. Robert Huber
WDC-MARE / PANGAEA - www.pangaea.de, www.wdc-mare.org
Stratigraphy.net - www.stratigraphy.net
_____________________________________________
MARUM - Institute for Marine Environmental Sciences (location)
University Bremen
Leobener Strasse
POP 330 440
28359 Bremen
Phone ++49 421 218-65593, Fax ++49 421 218-65505
e-mail rhuber@@wdc-mare.org, robert.huber at stratigraphy.net