Roger that we seem to be on the right path. My
comment about it being "a
lines of ensuring persistence for centuries or
millennia. The Library
of
more than we have access to. The real problem, of
course, is that because
how to preserve digital data for centuries or
millennia (in the way that some
paper-based media have survived for such periods of
time). This is why the
of as a "pilot project".
system is good enough to persevere for
centuries. We only have to be
confident that it will persevere until technology
establishes a system that
*will* survive for centuries. If we're lucky, that
probably will happen
within the next few decades.
right direction" -- Rumsfeldisms
notwithstanding.
Richard L. Pyle, PhD
-----Original Message-----
Sent: Monday, November 27, 2006 5:08 PM
To: Richard Pyle
Subject: Re: [Tdwg-guid] Demise of
Phyloinformatics journal
I think such a system is quite well within the
grasp of this
community- even without any particularly novel
new
developments. We have a system for unique IDs
(LSIDs) which
can be assigned to each document (actually each
combination
of object + metadata). Assuming the documents are
stored in
an environment exposed by a protocol such as OAI
(Open
Archives Initiative), a harvester could easily
retrieve
copies of documents (actually any objects with
IDs).
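A minimal sketch of the harvesting step, assuming the documents sit behind an OAI-PMH endpoint. The base URL `http://example.org/oai` and the set name `document` are hypothetical; `ListIdentifiers`, `metadataPrefix`, `set` and `until` are standard OAI-PMH request parameters. The function only builds the request URL and does not contact any server.

```python
from urllib.parse import urlencode


def list_identifiers_url(base_url, metadata_prefix="oai_dc", set_spec=None, until=None):
    """Build an OAI-PMH ListIdentifiers request URL for a selective harvest."""
    params = {"verb": "ListIdentifiers", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        # Restrict the harvest to one repository-defined set (hypothetical name below).
        params["set"] = set_spec
    if until is not None:
        # Upper bound on the record datestamp kept by the repository.
        params["until"] = until
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint and set name, for illustration only.
url = list_identifiers_url("http://example.org/oai", set_spec="document")
print(url)
```

Note that `until` filters on the datestamp the repository keeps for each record, which is not necessarily the publication date, so a restriction by publication year would probably need a set or an external index instead.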
There's nothing to stop the harvester cache being
exposed by
the same protocol. With a group of these harvester
+ OAI
servers, and no limits on subscriptions then each
harvester
would have a copy of everything, probably an
undesirable outcome.
Harvester reach could be restricted by queries
such as "all
objects of type document" or "all objects
published before
1999" or any other query supported by the
metadata. Or,
given the availability of one or more indexers,
which index
all the available OAI services, a query such as
"all objects
for which there are only 9 copies" could be
executed. The
result would be a list of LSIDs that need to be
retrieved by
the cache. Of course there will be time
lags between index
and harvester states, so there will likely end up
being more
than 10 copies of objects per cache, but is that
really a problem?
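The "too few copies" query could be sketched roughly as follows, assuming an indexer that can report which LSIDs each cache currently holds. The `holdings` structure, the cache names and the example LSIDs are hypothetical and not part of WASABI or OAI.

```python
from collections import Counter


def under_replicated(holdings, min_copies=10):
    """Return LSIDs held by fewer than min_copies caches, rarest first.

    holdings maps a cache name to the LSIDs that cache reports holding.
    Objects that no cache holds never appear in the index and so cannot be listed.
    """
    counts = Counter()
    for lsids in holdings.values():
        # Count each cache at most once per LSID.
        counts.update(set(lsids))
    needy = [lsid for lsid, n in counts.items() if n < min_copies]
    return sorted(needy, key=lambda lsid: (counts[lsid], lsid))


# Hypothetical index snapshot with three caches.
holdings = {
    "cache-a": ["urn:lsid:example.org:doc:1", "urn:lsid:example.org:doc:2"],
    "cache-b": ["urn:lsid:example.org:doc:1"],
    "cache-c": ["urn:lsid:example.org:doc:1"],
}
to_fetch = under_replicated(holdings, min_copies=3)
print(to_fetch)
```

Because the index snapshot lags behind the caches, several harvesters may act on the same list at once, which is how an object can end up with more copies than the target.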
All the pieces necessary for building such a
system already
exist in the WASABI framework - LSID assignment,
OAI server,
OAI harvester, indexer, cache.
The only real modification is to adapt the WASABI
server to
store objects along with their metadata, but this
was kind of
planned to support media objects. I don't mean to preach
WASABIsh here; such a topic has been on my mind
for a while
(actually distributed object storage, not just
documents).
TAPIR and other protocols would probably work
just fine as
well with some modifications.
It seems pretty simple, but perhaps I'm missing
some important pieces?
cheers,
Dave V.
Richard Pyle said the following on 28-11-2006
09:05:
Great article, Markus! Very similar to what I
had in mind.
I've never
visited BitTorrent, but I gather that its
structure and
function are
not altogether different from the original
Napster. Your
description
of a system that monitors available copies of
any digital
document and
automatically ensures that a minimum number of
copies are extant is
*exactly* what I was thinking. In my view, there
wouldn't
be only one
"hall monitor" server, but dozens or hundreds
(likely
correlated with
major institutions or hard-core individuals
with ample
available storage space).
And I would probably draw the line for minimum
number of copies at
closer to 100 or so, and also include
algorithms to ensure they are
adequately distributed on geographic scales.
Obviously,
GUIDs would be
a critical component of such a system.
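A rough sketch of what one "hall monitor" might check, assuming it knows a region label for every known copy of each GUID. The region labels, the `min_regions` threshold and the example GUIDs are assumptions for illustration; only the target of about 100 copies comes from the message above.

```python
def replication_gaps(locations, min_copies=100, min_regions=3):
    """Report GUIDs that have too few copies or too little geographic spread.

    locations maps a GUID to a list of region labels, one entry per known copy.
    Returns {guid: (missing_copies, missing_regions)} for GUIDs below either target.
    """
    gaps = {}
    for guid, regions in locations.items():
        missing_copies = max(0, min_copies - len(regions))
        missing_regions = max(0, min_regions - len(set(regions)))
        if missing_copies or missing_regions:
            gaps[guid] = (missing_copies, missing_regions)
    return gaps


# Hypothetical GUIDs and regions; small thresholds keep the example readable.
locations = {
    "guid-1": ["EU", "EU", "NA"],
    "guid-2": ["EU", "NA", "OC"],
}
gaps = replication_gaps(locations, min_copies=3, min_regions=3)
print(gaps)
```

Here `guid-1` has enough copies but sits in only two regions, so a monitor would ask a site in a third region to fetch it.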
It's a much bigger issue than our community is
able to
solve, I think
-- but certainly we could implement some pilot
projects along these
lines for our own data needs, to see how such a
system
might work within our context.
Aloha,
Rich
-----Original Message-----
"Döring,
Markus"
Sent: Monday, November 27, 2006 5:56 AM
Subject: Re: [Tdwg-guid] Demise of
Phyloinformatics journal
Richard's post and the Napster keyword reminded me
of a vague
idea I had
for some time to use P2P networks like
BitTorrent as a persistent
storage space. You can read about it a bit
more closely here:
a-persistent-storage-space/
Don't take it as a real proposal, but I like
the general
idea of it.
It might even have been done already within
the GRID
community. But
it conveys the original internet idea of
distributing
resources and
minimizing impact if a node gets lost.
A quite nice discussion by the way.
Markus
--
Markus Döring
Botanic Garden and Botanical
Museum Berlin Dahlem,
Dept. of
Biodiversity Informatics Königin-Luise-Str. 6-8,
D-14191 Berlin
Phone: +49 30 83850-284
-----Original Message-----
Of Richard
Pyle
Sent: Sunday, 26 November 2006
20:42
Subject: Re: [Tdwg-guid] Demise of
Phyloinformatics journal
I only just now read Bryan Heidorn's
excellent post on this topic
(below). One thing I would add is that the
nature of the
internet and
electronic information allow us
opportunities to ensure
permanence and
access that were either impossible, or
prohibitively
expensive even a
decade ago. Imagine, for example, an
internet protocol
that allowed
both institutions and individuals to "plug
in" and expose their
digital catalogs of stored electronic
publications (and other
resources) such that the whereabouts of
literally thousands
of copies
of every electronic publication could be
known to anyone.
The system I
envision is somewhat of a cross between
existing protocols for
interlibrary loan, and the original
Napster. Certainly
all
sorts of
copyright issues need to be sorted out, but
these are short-term
problems (less than a century), compared to
the long-term
(multi-millennia?) issue of information
persistence. The point is,
knowing the whereabouts of extant copies
of digital documents,
coupled with the amazing ease and low cost
of duplication
and global
dissemination (not to mention plummeting
costs of
electronic storage
media), would virtually guarantee the
long-term persistence
of digital
information.
Any system is, of course, vulnerable to the
collapse (or major
perturbation) of human civilization. And the electronic
translator
problem I alluded to in an earlier post
cannot be
ignored.
But to
pretend that the potential doesn't exist or
shouldn't be actively
pursued is pure folly, in my opinion.
Aloha,
Rich
Richard L. Pyle, PhD
Database Coordinator for Natural
Sciences
and Associate Zoologist in Ichthyology Department of
Natural
Sciences, Bishop Museum
1525 Bernice St., Honolulu, HI 96817
Ph: (808)848-4115, Fax: (808)847-8252
-----Original Message-----
Bryan Heidorn
Sent: Friday, November 24, 2006 8:22
AM
Subject: Re: [Tdwg-guid] Demise of
Phyloinformatics journal
The problem and solution have less to do
with the Internet
and more
to do with institutional longevity.
The permanence of paper has less to do
with acid free
paper and more
to do with the relative permanence of the
institutions
that house
them. Most paper documents over a hundred
years old have
been lost
forever because there were no permanent
institutions to
hold them
until the advent of public and academic
libraries. Papers in
individual scientists' collections are
discarded when they
die. War
and economic upheavals left paper in rain
and fire. It is
foolhardy
to assume that what is on paper is
safe.
We know that dissemination of information
in electronic
form is much
more economical than paper dissemination.
The issue is
development
of proper institutions with adequate
stable funding to
develop and
maintain copies into "perpetuity".
Commercial publishers are clearly not
the answer for
preservation.
Corporations and publishers go out of
business all the
time. It is
only because libraries kept paper copies
that we still have a
record.
Digital preservation and access problems
exist for all
sciences and
government documents so there is no need
for the biodiversity
community to go it alone on this. We are
just in the
beginning of
the digital publishing history and have
not yet
established adequate
preservation mechanisms within libraries
to handle data
curation,
preservation and access in all the
situations where it is
necessary.
There are projects underway world wide to
address this issue.
In the United States, the Library of
Congress's National Digital
Information Infrastructure and
Preservation Program,
http://www.digitalpreservation.gov/, is one
example. The U.S.
Government agency the Institute of Museum
and Library Services
librarians and museum curators in digital
librarianship and most
recently in digital data curation
21centuryLibrarian.shtm is addressing the
education issues.
The University of North
Carolina
papers.html and the University of
Illinois
DCEP/ have begun working on best
practices and education.
This week
saw the successful Data Curation
Conference (DCC) in Glasgow,
be running
"Long-term Curation and Preservation of
Journals"
31 January 2007. (as an aside, at DCC
conference I saw
results of a
survey in "Attitudes and aspirations in a
diverse world:
the Project
StORe perspective on scientific
repositories" Graham Pryor,
programme/presentations/g-pryor.ppt that
more scientists trusted
publishers to save their digital
documents than their home
institutions and libraries! It is clear
that scientists are
generally not trained in economics and
that the information
technology management of many
institutions must be abysmal!)
We need something like the 5-institution
rule for distribution to
apply to digital documents. Digital
documents need to be
replicated
as well for both access and
preservation.
Institutions like the Internet Archive
help with some of
the current
problems.
Institutional Repositories (IR) are
another. Many
universities and
libraries world wide are beginning these.
It is authors'
responsibility to deposit their
publications in these
institutions
and to support their creation. JSTOR and
other institutions also
exist. They all have their weaknesses and
additional research,
development and funding is needed to
adequately address
the issues.
Also, all journals need to be managed
using good data curation
principles, but all too often the
publishers in spite of best
intentions are not educated in such
issues.
Digital publishing of taxonomic
literature is not the
full answer
for the current poor dissemination of
taxonomic literature.
The deposit
of a published name in five institutions
is a
preservation rule, not
a dissemination rule. We hurt science and
human health
if we do not
at the same time address the information
access issue.
We need to
aspire to better dissemination and
preservation. Electronic
publishing will help but only if
appropriate institutions
are in place.
On the smaller issue, DOIs for
publications, electronic
or paper, are
a no-brainer. URLs were never designed to
be permanent.
URLs were
designed to be reused and be
flexible.
With DOIs we can place the same paper in
multiple digital or
physical locations and reliably find
copies.
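As a toy illustration of one identifier pointing at several copies, here is a minimal in-memory sketch. The identifier `10.9999/example.1`, the location URLs and the `is_available` check are all hypothetical; this is not how the DOI system itself is implemented.

```python
def find_copies(registry, identifier, is_available):
    """Return the registered locations of identifier that are still available."""
    return [loc for loc in registry.get(identifier, []) if is_available(loc)]


# Hypothetical registry: one identifier, three registered locations.
registry = {
    "10.9999/example.1": [
        "http://journal.example.org/paper1.pdf",
        "http://library-a.example.org/copies/paper1.pdf",
        "http://library-b.example.org/copies/paper1.pdf",
    ],
}

# Pretend the original journal site has gone offline.
offline = {"http://journal.example.org/paper1.pdf"}
copies = find_copies(registry, "10.9999/example.1", lambda loc: loc not in offline)
print(copies)
```

The citation keeps using the same identifier throughout; only the list of locations behind it changes when a site disappears.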
Bryan Heidorn
--
--------------------------------------------------------------------
P. Bryan Heidorn
Graduate School of Library and Information Science
Urbana-Champaign MC-493
(V)217/244-7792
501 East Daniel St., Champaign, IL 61820-6212
On Nov 24, 2006, at 9:54 AM, Renato De
Giovanni wrote:
Rod,
Thanks for sharing with us the
information. I already
imagined that
things like that could happen, but it's
always better to
argue having
real examples.
Anyway, just in case someone reading
the story decides to
blame URLs,
I just wanted to say that in my opinion
the main issue
here is not
the technology or the GUID format being
used. It's the
business model
and the management strategy.
I can easily imagine similar things
happening to DOIs,
LSIDs or other
kinds of issued GUIDs if the
institution(s) behind them simply
disappear.
Best Regards,
Renato
--
IT Researcher
CRIA - Reference Center on
Environmental Information
On 24 Nov 2006 at 13:37, Roderic Page
wrote:
The Open Access web-only journal
"Phyloinformatics"
seems to have
disappeared, with the Internet
address http://www.phyloinformatics.org
now up for
sale. This means the
articles
have just disappeared!
There weren't many papers published,
but some were
interesting and
have
been cited in the mainstream
literature.
This also illustrates the problems
with linking to digital
resources
using URLs, as opposed to identifiers
such as DOIs. With
the loss of
the domain name, this journal has
effectively died.
A sobering lesson...
Regards
Rod
_______________________________________________
TDWG-GUID mailing list