Re: [Tdwg-guid] Demise of Phyloinformatics journal

28 Nov 2006

      I certainly agree with Dave that the technology exists, and I agree with
Roger that we seem to be on the right path. My comment about it being "a
much bigger issue than our community is able to solve" was more along the
lines of ensuring persistence for centuries or millennia.  The Library of
Congress was appropriated $100 million to deal with this issue
(http://www.digitalpreservation.gov/about/index.html), which is just a bit
more than we have access to. The real problem, of course, is that because
digital media have existed for only a few decades, we don't have an
established track record to say, with adequate confidence, that we "know"
how to preserve digital data for centuries or millennia (in the way that some
paper-based media have survived for such periods of time).  This is why a
system along the lines of what we're discussing can only really be thought
of as a "pilot project".

The fact of the matter is, we don't really need to have confidence that our
system is good enough to persevere for centuries.  We only have to be
confident that it will persevere until technology establishes a system that
*will* survive for centuries. If we're lucky, that probably will happen
within the next few decades.

So...to echo Roger, "the sun is shining and I feel we are heading in the
right direction" -- Rumsfeldisms notwithstanding.

Aloha,
Rich

Richard L. Pyle, PhD
Database Coordinator for Natural Sciences
  and Associate Zoologist in Ichthyology
Department of Natural Sciences, Bishop Museum
1525 Bernice St., Honolulu, HI 96817
Ph: (808)848-4115, Fax: (808)847-8252
email: deepreef@bishopmuseum.org
http://hbs.bishopmuseum.org/staff/pylerichard.html
...
-----Original Message-----
From: Dave Vieglais [mailto:vieglais@ku.edu] 
Sent: Monday, November 27, 2006 5:08 PM
To: Richard Pyle
Cc: "Döring, Markus"; tdwg-guid@mailman.nhm.ku.edu
Subject: Re: [Tdwg-guid] Demise of Phyloinformatics journal
I think such a system is quite well within the grasp of this 
community, even without any particularly novel 
developments.  We have a system for unique IDs (LSIDs) which 
can be assigned to each document (actually each combination 
of object + metadata).  Assuming the documents are stored in 
an environment exposed by a protocol such as OAI (Open 
Archives Initiative), a harvester could easily retrieve 
copies of documents (actually any objects with IDs).
There's nothing to stop the harvester cache being exposed by 
the same protocol.  With a group of these harvester + OAI 
servers, and no limits on subscriptions then each harvester 
would have a copy of everything, probably an undesirable outcome.
Harvester reach could be restricted by queries such as "all 
objects of type document" or "all objects published before 
1999" or any other query supported by the metadata.  Or, 
given the availability of one or more indexers, which index 
all the available OAI services, a query such as "all objects 
for which there are only 9 copies" could be executed.  The 
result would be a list of LSIDs that need to be retrieved by 
the cache.  Of course there will be time lags between index 
and harvester states, so there will likely end up being more 
than 10 copies of objects per cache, but is that really a problem?
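To make the "only 9 copies" query concrete, here is a minimal sketch of what an indexer might compute. It is an illustration only, not WASABI code: the cache names, the example LSIDs, and the functions build_index and under_replicated are all hypothetical, and the index is simply an in-memory dictionary rather than anything harvested over OAI.

```python
from collections import defaultdict


def build_index(cache_holdings):
    """Map each LSID to the set of caches that report holding a copy."""
    index = defaultdict(set)
    for cache_name, lsids in cache_holdings.items():
        for lsid in lsids:
            index[lsid].add(cache_name)
    return index


def under_replicated(index, min_copies, exclude_cache=None):
    """List LSIDs with fewer than min_copies copies.

    LSIDs already held by exclude_cache (the cache asking) are skipped,
    since fetching them again would not add a copy.
    """
    return sorted(
        lsid
        for lsid, holders in index.items()
        if len(holders) < min_copies and exclude_cache not in holders
    )


# Hypothetical holdings reported by three caches.
holdings = {
    "cache-a": {"urn:lsid:example.org:doc:1", "urn:lsid:example.org:doc:2"},
    "cache-b": {"urn:lsid:example.org:doc:1"},
    "cache-c": {"urn:lsid:example.org:doc:1", "urn:lsid:example.org:doc:3"},
}
index = build_index(holdings)

# cache-b asks: which objects have fewer than 3 copies and are not here yet?
to_fetch = under_replicated(index, min_copies=3, exclude_cache="cache-b")
print(to_fetch)
```

The returned list is what the cache would then go and retrieve. Because the index is only a snapshot, two caches asking at the same time could both fetch the same object, which is the overshoot mentioned above.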
All the pieces necessary for building such a system already 
exist in the WASABI framework - LSID assignment, OAI server, 
OAI harvester, indexer, cache.
The only real modification is to adapt the WASABI server to 
store objects along with their metadata, but this was kind of 
planned to support media objects.  I don't mean to preach 
WASABIsh here; this topic has been on my mind for a while 
(actually distributed object storage, not just documents).
TAPIR and other protocols would probably work just fine as 
well with some modifications.
It seems pretty simple, but perhaps I'm missing some important pieces?
cheers,
  Dave V.
Richard Pyle said the following on 28-11-2006 09:05:
...
Great article, Markus! Very similar to what I had in mind.  I've never
visited BitTorrent, but I gather that its structure and function are
not altogether different from the original Napster.  Your description
of a system that monitors available copies of any digital document and
automatically ensures that a minimum number of copies are extant is
*exactly* what I was thinking.  In my view, there wouldn't be only one
"hall monitor" server, but dozens or hundreds (likely correlated with
major institutions or hard-core individuals with ample available
storage space).
And I would probably draw the line for minimum number of copies at
closer to 100 or so, and also include algorithms to ensure they are
adequately distributed on geographic scales. Obviously, GUIDs would be
a critical component of such a system.
It's a much bigger issue than our community is able to solve, I think
-- but certainly we could implement some pilot projects along these
lines for our own data needs, to see how such a system might work
within our context.
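As a rough illustration of the check a "hall monitor" might run, the sketch below reports how far one document falls short of a minimum copy count and a minimum geographic spread. Everything in it is assumed for the example: the function name replication_gaps, the holder names, the region labels, and the idea of measuring spread by counting distinct regions.

```python
def replication_gaps(copies, min_copies, min_regions):
    """Report the shortfall in copy count and in geographic spread.

    copies is a list of (holder, region) pairs for a single document.
    """
    regions = {region for _, region in copies}
    return {
        "missing_copies": max(0, min_copies - len(copies)),
        "missing_regions": max(0, min_regions - len(regions)),
    }


# Hypothetical known copies of one document.
copies = [
    ("museum-1", "Oceania"),
    ("museum-2", "Europe"),
    ("library-1", "Europe"),
]
gaps = replication_gaps(copies, min_copies=5, min_regions=3)
print(gaps)
```

A monitor would request new copies until both numbers reach zero, preferring holders in regions that are not yet represented.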
Aloha,
Rich
...
-----Original Message-----
From: tdwg-guid-bounces@mailman.nhm.ku.edu
[mailto:tdwg-guid-bounces@mailman.nhm.ku.edu] On Behalf Of "Döring, Markus"
Sent: Monday, November 27, 2006 5:56 AM
To: tdwg-guid@mailman.nhm.ku.edu
Subject: Re: [Tdwg-guid] Demise of Phyloinformatics journal
Richard's post and the Napster keyword reminded me of a vague idea I had
for some time to use P2P networks like BitTorrent as a persistent
storage space. You can read about it a bit more closely here:
http://www.pywrapper.com/markus/blog/2006/using-bittorrent-as-a-persistent-storage-space/
Don't take it as a real proposal, but I like the general idea of it.
It might even have been done already within the GRID community. But
it conveys the original internet idea of distributing resources and
minimizing impact if a node gets lost.
A quite nice discussion by the way.
Markus
--
 Markus Döring
 Botanic Garden and Botanical Museum Berlin Dahlem
 Dept. of Biodiversity Informatics
 Königin-Luise-Str. 6-8, D-14191 Berlin
 Phone: +49 30 83850-284
 Email: m.doering@bgbm.org
 URL: http://www.bgbm.org/BioDivInf/
...
-----Original Message-----
From: tdwg-guid-bounces@mailman.nhm.ku.edu
[mailto:tdwg-guid-bounces@mailman.nhm.ku.edu] On Behalf Of Richard Pyle
Sent: Sunday, 26 November 2006 20:42
To: 'P. Bryan Heidorn'; tdwg-guid@mailman.nhm.ku.edu; 'Taxacom'
Subject: Re: [Tdwg-guid] Demise of Phyloinformatics journal
I only just now read Bryan Heidorn's excellent post on this topic
(below). One thing I would add is that the nature of the internet and
electronic information allow us opportunities to ensure permanence and
access that were either impossible or prohibitively expensive even a
decade ago.  Imagine, for example, an internet protocol that allowed
both institutions and individuals to "plug in" and expose their
digital catalogs of stored electronic publications (and other
resources) such that the whereabouts of literally thousands of copies
of every electronic publication could be known to anyone.  The system I
envision is somewhat of a cross between existing protocols for
interlibrary loan and the original Napster.  Certainly all sorts of
copyright issues need to be sorted out, but these are short-term
problems (less than a century), compared to the long-term
(multi-millennia?) issue of information persistence. The point is,
knowing the whereabouts of extant copies of digital documents,
coupled with the amazing ease and low cost of duplication and global
dissemination (not to mention plummeting costs of electronic storage
media), would virtually guarantee the long-term persistence of digital
information.
Any system is, of course, vulnerable to the collapse (or major
perturbation) of human civilization.  And the electronic translator
problem I alluded to in an earlier post cannot be ignored.  But to
pretend that the potential doesn't exist or shouldn't be actively
pursued is pure folly, in my opinion.
Aloha,
Rich
...
-----Original Message-----
From: tdwg-guid-bounces@mailman.nhm.ku.edu
[mailto:tdwg-guid-bounces@mailman.nhm.ku.edu] On Behalf Of P. Bryan Heidorn
Sent: Friday, November 24, 2006 8:22 AM
To: tdwg-guid@mailman.nhm.ku.edu; Taxacom
Subject: Re: [Tdwg-guid] Demise of Phyloinformatics journal
The problem and solution have less to do with the Internet and more
to do with institutional longevity.
The permanence of paper has less to do with acid-free paper and more
to do with the relative permanence of the institutions that house
it. Most paper documents over a hundred years old have been lost
forever because there were no permanent institutions to hold them
until the advent of public and academic libraries. Papers in
individual scientists' collections are discarded when they die. War
and economic upheavals left paper in rain and fire. It is foolhardy
to assume that what is on paper is safe.
We know that dissemination of information in electronic form is much
more economical than paper dissemination. The issue is development
of proper institutions with adequate stable funding to develop and
maintain copies into "perpetuity".
Commercial publishers are clearly not the answer for preservation.
Corporations and publishers go out of business all the time. It is
only because libraries kept paper copies that we still have a
record.
Digital preservation and access problems exist for all sciences and
government documents, so there is no need for the biodiversity
community to go it alone on this. We are just at the beginning of
the digital publishing history and have not yet established adequate
preservation mechanisms within libraries to handle data curation,
preservation and access in all the situations where it is necessary.
There are projects underway worldwide to address this issue.
In the United States, the Library of Congress National Digital
Information Infrastructure and Preservation Program
http://www.digitalpreservation.gov/ is one example. The U.S.
Government agency the Institute of Museum and Library Services
(IMLS) http://www.imls.gov/ began grant programs to train
librarians and museum curators in digital librarianship and, most
recently, in digital data curation
http://www.imls.gov/applicants/grants/21centuryLibrarian.shtm
and is addressing the education issues.
The University of North Carolina
http://www.ils.unc.edu/digccurr2007/papers.html and the University
of Illinois http://sci.lis.uiuc.edu/DCEP/ have begun working on best
practices and education.
This week saw the successful Data Curation Conference (DCC) in Glasgow,
Scotland http://www.dcc.ac.uk/events/dcc-2006/. DCC will be running
"Long-term Curation and Preservation of Journals" on
31 January 2007. (As an aside, at the DCC conference I saw results of a
survey in "Attitudes and aspirations in a diverse world: the Project
StORe perspective on scientific repositories", Graham Pryor,
University of Edinburgh
http://www.dcc.ac.uk/events/dcc-2006/programme/presentations/g-pryor.ppt
that more scientists trusted
publishers to save their digital documents than their home
institutions and libraries! It is clear that scientists are
generally not trained in economics and that the information
technology management of many institutions must be abysmal!)
We need something like the 5 institution rule for distribution to
apply to digital documents. Digital documents need to be replicated
as well, for both access and preservation.
Institutions like the Internet Archive help with some of the current
problems.
Institutional Repositories (IR) are another. Many universities and
libraries worldwide are beginning these. It is the authors'
responsibility to deposit their publications in these institutions
and to support their creation. JSTOR and other institutions also
exist. They all have their weaknesses, and additional research,
development and funding are needed to adequately address the issues.
Also, all journals need to be managed using good data curation
principles, but all too often the publishers, in spite of best
intentions, are not educated in such issues.
Digital publishing of taxonomic literature is not the full answer
for the current poor dissemination of taxonomic literature. The deposit
of a published name in five institutions is a preservation rule, not
a dissemination rule.  We hurt science and human health if we do not
at the same time address the information access issue.  We need to
aspire to better dissemination and preservation. Electronic
publishing will help, but only if appropriate institutions are in place.
On the smaller issue, DOIs for publications, electronic or paper, are
a no-brainer. URLs were never designed to be permanent. URLs were
designed to be reused and be flexible.
With DOIs we can place the same paper in multiple digital or
physical locations and reliably find copies.
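The indirection described here can be sketched in a few lines. This is only an illustration of the idea of one persistent identifier pointing at several locations; it is not how the DOI resolver is actually implemented, and the identifier, the URLs, and the function find_copy are all made up for the example.

```python
def find_copy(identifier, registry, is_available):
    """Return the first reachable location registered for identifier, or None."""
    for location in registry.get(identifier, []):
        if is_available(location):
            return location
    return None


# Hypothetical registry: one identifier, three registered locations.
registry = {
    "doi:10.9999/example.1": [
        "http://journal.example.org/articles/1.pdf",
        "http://repository.example.edu/handle/1",
        "http://archive.example.net/mirror/1.pdf",
    ],
}

# Pretend the journal's own site has gone away.
offline = {"http://journal.example.org/articles/1.pdf"}
copy = find_copy(
    "doi:10.9999/example.1", registry, lambda url: url not in offline
)
print(copy)
```

A citation that uses the identifier keeps working when the first location disappears, whereas a citation that used the journal URL directly would not.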
Bryan Heidorn
--

P. Bryan Heidorn
Graduate School of Library and Information Science
University of Illinois at Urbana-Champaign MC-493
501 East Daniel St., Champaign, IL  61820-6212
pheidorn@uiuc.edu
(V)217/ 244-7792
(F)217/ 244-3302
https://netfiles.uiuc.edu/pheidorn/www
On Nov 24, 2006, at 9:54 AM, Renato De Giovanni wrote:
...
Rod,
Thanks for sharing the information with us. I already imagined that
things like that could happen, but it's always better to argue with
real examples.
Anyway, just in case someone reading the story decides to blame URLs,
I just wanted to say that in my opinion the main issue here is not
the technology or the GUID format being used. It's the business model
and the management strategy.
I can easily imagine similar things happening to DOIs, LSIDs or other
kinds of issued GUIDs if the institution(s) behind them simply
disappear.
Best Regards,
Renato
--
IT Researcher
CRIA - Reference Center on Environmental Information 
http://www.cria.org.br/
On 24 Nov 2006 at 13:37, Roderic Page wrote:
> The Open Access web-only journal "Phyloinformatics" seems to have
> disappeared, with the Internet address
> http://www.phyloinformatics.org now up for sale. This means the
> articles have just disappeared!
>
> There weren't many papers published, but some were interesting and
> have been cited in the mainstream literature.
>
> This also illustrates the problems with linking to digital resources
> using URLs, as opposed to identifiers such as DOIs. With the loss of
> the domain name, this journal has effectively died.
>
> A sobering lesson...
>
> Regards
>
> Rod
_______________________________________________
TDWG-GUID mailing list
TDWG-GUID@mailman.nhm.ku.edu
http://mailman.nhm.ku.edu/mailman/listinfo/tdwg-guid