On Nov 28, 2006, at 1:40 PM, Richard Pyle wrote:

I certainly agree with Dave that the technology exists, and I agree with

Roger that we seem to be on the right path. My comment about it being "a

much bigger issue than our community is able to solve" was more along the

lines of ensuring persistence for centuries or millennia. The Library of

Congress was appropriated $100 million to deal with this issue

(http://www.digitalpreservation.gov/about/index.html), which is just a bit

more than we have access to. The real problem, of course, is that because

digital media have existed for only a few decades, we don't have an

established track record to say, with adequate confidence, that we "know"

how to preserve digital data for centuries or millennia (in the way that some

paper-based media have survived for such periods of time). This is why the

system along the lines of what we're discussing can only really be thought

of as a "pilot project".

The fact of the matter is, we don't really need to have confidence that our

system is good enough to persevere for centuries. We only have to be

confident that it will persevere until technology establishes a system that

*will* survive for centuries. If we're lucky, that probably will happen

within the next few decades.

So...to echo Roger, "the sun is shining and I feel we are heading in the

right direction" -- Rumsfeldisms notwithstanding.

Aloha,

Rich

Richard L. Pyle, PhD

Database Coordinator for Natural Sciences

and Associate Zoologist in Ichthyology

Department of Natural Sciences, Bishop Museum

1525 Bernice St., Honolulu, HI 96817

Ph: (808)848-4115, Fax: (808)847-8252

email: deepreef@bishopmuseum.org

http://hbs.bishopmuseum.org/staff/pylerichard.html

-----Original Message-----

From: Dave Vieglais [mailto:vieglais@ku.edu]

Sent: Monday, November 27, 2006 5:08 PM

To: Richard Pyle

Cc: "Döring, Markus"; tdwg-guid@mailman.nhm.ku.edu

Subject: Re: [Tdwg-guid] Demise of Phyloinformatics journal

I think such a system is quite well within the grasp of this

community, even without any particularly novel

developments. We have a system for unique IDs (LSIDs) which

can be assigned to each document (actually each combination

of object + metadata). Assuming the documents are stored in

an environment exposed by a protocol such as OAI (Open

Archives Initiative), a harvester could easily retrieve

copies of documents (actually any objects with IDs).

There's nothing to stop the harvester cache being exposed by

the same protocol. With a group of these harvester + OAI

servers and no limits on subscriptions, each harvester

would have a copy of everything, probably an undesirable outcome.

Harvester reach could be restricted by queries such as "all

objects of type document" or "all objects published before

1999" or any other query supported by the metadata. Or,

given the availability of one or more indexers, which index

all the available OAI services, a query such as "all objects

for which there are only 9 copies" could be executed. The

result would be a list of LSIDs that need to be retrieved by

the cache. Of course there will be time lags between index

and harvester states, so there will likely end up being more

than 10 copies of objects per cache, but is that really a problem?
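
As a rough illustration of the "objects with too few copies" query described above, here is a minimal Python sketch. It assumes a hypothetical indexer that reports, for each LSID, the set of caches known to hold a copy. The names lsids_to_harvest, index, cache_id and min_copies, and the example LSIDs, are invented for the example and are not part of WASABI or OAI.

```python
from typing import Dict, List, Set


def lsids_to_harvest(
    index: Dict[str, Set[str]],
    cache_id: str,
    min_copies: int = 10,
) -> List[str]:
    """Return the LSIDs this cache should fetch to help reach min_copies.

    index maps each LSID to the set of cache identifiers that the
    (hypothetical) indexer believes hold a copy of that object.
    """
    wanted = []
    for lsid, holders in index.items():
        # Skip objects this cache already holds.
        if cache_id in holders:
            continue
        # Under-replicated: fewer known copies than the target.
        if len(holders) < min_copies:
            wanted.append(lsid)
    return sorted(wanted)


# Hypothetical index snapshot; the LSIDs and cache names are made up.
index = {
    "urn:lsid:example.org:docs:1": {"cache-a", "cache-b"},
    "urn:lsid:example.org:docs:2": {"cache-a", "cache-b", "cache-c"},
    "urn:lsid:example.org:docs:3": {"cache-c"},
}

print(lsids_to_harvest(index, "cache-c", min_copies=3))
```

Because each cache works from a possibly stale index snapshot, several caches may pick up the same object at once, so the copy count can overshoot the target. That matches the time lag mentioned above and is probably harmless.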

All the pieces necessary for building such a system already

exist in the WASABI framework - LSID assignment, OAI server,

OAI harvester, indexer, cache.

The only real modification is to adapt the WASABI server to

store objects along with their metadata, but this was kind of

planned to support media objects. I don't mean to preach

WASABIsh here; such a topic has been on my mind for a while

(actually distributed object storage, not just documents).

TAPIR and other protocols would probably work just fine as

well with some modifications.

It seems pretty simple, but perhaps I'm missing some important pieces?

cheers,

Dave V.

Richard Pyle said the following on 28-11-2006 09:05:

Great article, Markus! Very similar to what I had in mind.

I've never

visited BitTorrent, but I gather that its structure and

function are

not altogether different from the original Napster. Your

description

of a system that monitors available copies of any digital

document and

automatically ensures that a minimum number of copies are extant is

*exactly* what I was thinking. In my view, there wouldn't

be only one

"hall monitor" server, but dozens or hundreds (likely

correlated with

major institutions or hard-core individuals with ample

available storage space).

And I would probably draw the line for minimum number of copies at

closer to 100 or so, and also include algorithms to ensure they are

adequately distributed on geographic scales. Obviously,

GUIDs would be

a critical component of such a system.
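
To make the "minimum number of copies, adequately distributed" idea concrete, here is a small Python sketch. It assumes each known copy of a document carries a coarse region label. The function name replication_gaps, the region labels, and the min_regions threshold are illustrative assumptions, since the message above only asks for "adequate" geographic distribution.

```python
from typing import Dict, List


def replication_gaps(
    copies: Dict[str, List[str]],
    min_copies: int = 100,
    min_regions: int = 3,
) -> Dict[str, str]:
    """Report documents that miss the copy-count or region-spread target.

    copies maps a document GUID to the region label of each known copy,
    one list entry per copy.
    """
    gaps = {}
    for guid, regions in copies.items():
        if len(regions) < min_copies:
            gaps[guid] = "too few copies"
        elif len(set(regions)) < min_regions:
            # Enough copies, but clustered in too few regions.
            gaps[guid] = "too concentrated"
    return gaps


# Hypothetical monitor snapshot; GUIDs and region labels are made up.
copies = {
    "guid-1": ["eu", "eu", "na", "asia"],
    "guid-2": ["eu", "eu", "eu", "eu"],
    "guid-3": ["na", "asia"],
}

print(replication_gaps(copies, min_copies=3, min_regions=2))
```

A "hall monitor" server could run a check like this periodically and ask other participants to copy whatever it reports.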

It's a much bigger issue than our community is able to

solve, I think

-- but certainly we could implement some pilot projects along these

lines for our own data needs, to see how such a system

might work within our context.

Aloha,

Rich

-----Original Message-----

From: tdwg-guid-bounces@mailman.nhm.ku.edu

[mailto:tdwg-guid-bounces@mailman.nhm.ku.edu] On Behalf Of "Döring, Markus"

Sent: Monday, November 27, 2006 5:56 AM

To: tdwg-guid@mailman.nhm.ku.edu

Subject: Re: [Tdwg-guid] Demise of Phyloinformatics journal

Richard's post and the Napster keyword reminded me of a vague

idea I had

for some time to use P2P networks like BitTorrent as a persistent

storage space. You can read about it a bit more closely here:

http://www.pywrapper.com/markus/blog/2006/using-bittorrent-as-a-persistent-storage-space/

Don't take it as a real proposal, but I like the general

idea of it.

It might even have been done already within the GRID

community. But

it conveys the original internet idea of distributing

resources and

minimizing impact if a node gets lost.

A quite nice discussion by the way.

Markus

--

Markus Döring

Botanic Garden and Botanical Museum Berlin Dahlem, Dept. of

Biodiversity Informatics Königin-Luise-Str. 6-8, D-14191 Berlin

Phone: +49 30 83850-284

Email: m.doering@bgbm.org

URL: http://www.bgbm.org/BioDivInf/

-----Original Message-----

From: tdwg-guid-bounces@mailman.nhm.ku.edu

[mailto:tdwg-guid-bounces@mailman.nhm.ku.edu] On Behalf Of Richard Pyle

Sent: Sunday, 26 November 2006 20:42

To: 'P. Bryan Heidorn'; tdwg-guid@mailman.nhm.ku.edu; 'Taxacom'

Subject: Re: [Tdwg-guid] Demise of Phyloinformatics journal

I only just now read Bryan Heidorn's excellent post on this topic

(below). One thing I would add is that the nature of the

internet and

electronic information allow us opportunities to ensure

permanence and

access that were either impossible, or prohibitively

expensive even a

decade ago. Imagine, for example, an internet protocol

that allowed

both institutions and individuals to "plug in" and expose their

digital catalogs of stored electronic publications (and other

resources) such that the whereabouts of literally thousands

of copies

of every electronic publication could be known to anyone.

The system I

envision is somewhat of a cross between existing protocols for

interlibrary loan, and the original Napster. Certainly all

sorts of

copyright issues need to be sorted out, but these are short-term

problems (less than a century), compared to the long-term

(multi-millennia?) issue of information persistence. The point is,

knowing the whereabouts of extant copies of digital documents,

coupled with the amazing ease and low cost of duplication

and global

dissemination (not to mention plummeting costs of

electronic storage

media), would virtually guarantee the long-term persistence

of digital

information.

Any system is, of course, vulnerable to the collapse (or major

perturbation) of human civilization. And the electronic

translator

problem I alluded to in an earlier post cannot be

ignored. But to

pretend that the potential doesn't exist or shouldn't be actively

pursued is pure folly, in my opinion.

Aloha,

Rich

Richard L. Pyle, PhD

Database Coordinator for Natural Sciences

and Associate Zoologist in Ichthyology Department of Natural

Sciences, Bishop Museum

1525 Bernice St., Honolulu, HI 96817

Ph: (808)848-4115, Fax: (808)847-8252

email: deepreef@bishopmuseum.org

http://hbs.bishopmuseum.org/staff/pylerichard.html

-----Original Message-----

From: tdwg-guid-bounces@mailman.nhm.ku.edu

[mailto:tdwg-guid-bounces@mailman.nhm.ku.edu] On Behalf Of P.

Bryan Heidorn

Sent: Friday, November 24, 2006 8:22 AM

To: tdwg-guid@mailman.nhm.ku.edu; Taxacom

Subject: Re: [Tdwg-guid] Demise of Phyloinformatics journal

The problem and solution have less to do with the Internet

and more

to do with institutional longevity.

The permanence of paper has less to do with acid free

paper and more

to do with the relative permanence of the institutions

that house

them. Most paper documents over a hundred years old have

been lost

forever because there were no permanent institutions to

hold them

until the advent of public and academic libraries. Papers in

individual scientists' collections are discarded when they

die. War

and economic upheavals left paper in rain and fire. It is

foolhardy

to assume that what is on paper is safe.

We know that dissemination of information in electronic

form is much

more economical than paper dissemination. The issue is

development

of proper institutions with adequate stable funding to

develop and

maintain copies into "perpetuity".

Commercial publishers are clearly not the answer for

preservation.

Corporations and publishers go out of business all the

time. It is

only because libraries kept paper copies that we still have a

record.

Digital preservation and access problems exist for all

sciences and

government documents so there is no need for the biodiversity

community to go it alone on this. We are just in the

beginning of

the digital publishing history and have not yet

established adequate

preservation mechanisms within libraries to handle data

curation,

preservation and access in all the situations where it is

necessary.

There are projects underway world wide to address this issue.

In the United States, the Library of Congress's National Digital

Information Infrastructure and Preservation Program

(http://www.digitalpreservation.gov/) is one example. The U.S.

Government agency the Institute of Museum and Library Services (IMLS,

http://www.imls.gov/) began grant programs to train librarians and

museum curators in digital librarianship and, most recently, in digital

data curation (http://www.imls.gov/applicants/grants/21centuryLibrarian.shtm)

to address the education issues. The University of North Carolina

(http://www.ils.unc.edu/digccurr2007/papers.html) and the University of

Illinois (http://sci.lis.uiuc.edu/DCEP/) have begun working on best

practices and education.

This week saw the successful Data Curation Conference (DCC) in Glasgow,

Scotland (http://www.dcc.ac.uk/events/dcc-2006/). DCC will be running

"Long-term Curation and Preservation of Journals" on 31 January 2007.

(As an aside, at the DCC conference I saw the results of a survey,

presented in "Attitudes and aspirations in a diverse world: the Project

StORe perspective on scientific repositories" by Graham Pryor,

University of Edinburgh

(http://www.dcc.ac.uk/events/dcc-2006/programme/presentations/g-pryor.ppt),

showing that more scientists trusted publishers to save their digital

documents than their home institutions and libraries! It is clear that

scientists are generally not trained in economics and that the

information technology management of many institutions must be abysmal!)

We need something like the five-institution rule for distribution to

apply to digital documents. Digital documents need to be

replicated

as well for both access and preservation.

Institutions like the Internet Archive help with some of

the current

problems.

Institutional Repositories (IR) are another. Many

universities and

libraries world wide are beginning these. It is authors'

responsibility to deposit their publications in these

institutions

and to support their creation. JSTOR and other institutions also

exist. They all have their weaknesses and additional research,

development and funding are needed to adequately address

the issues.

Also, all journals need to be managed using good data curation

principles, but all too often the publishers, in spite of best

intentions, are not educated in such issues.

Digital publishing of taxonomic literature is not the

full answer

to the current poor dissemination of taxonomic literature.

The deposit

of a published name in five institutions is a

preservation rule, not

a dissemination rule. We hurt science and human health

if we do not

at the same time address the information access issue.

We need to

aspire to better dissemination and preservation. Electronic

publishing will help but only if appropriate institutions

are in place.

On the smaller issue, DOIs for publications, electronic

or paper, are

a no-brainer. URLs were never designed to be permanent.

URLs were

designed to be reused and be flexible.

With DOIs we can place the same paper in multiple digital or

physical locations and reliably find copies.

Bryan Heidorn

--

--------------------------------------------------------------------

P. Bryan Heidorn Graduate School of Library and Information Science

pheidorn@uiuc.edu University of Illinois at Urbana-Champaign MC-493

(V)217/ 244-7792 501 East Daniel St., Champaign, IL 61820-6212

(F)217/ 244-3302 https://netfiles.uiuc.edu/pheidorn/www

On Nov 24, 2006, at 9:54 AM, Renato De Giovanni wrote:

Rod,

Thanks for sharing with us the information. I already

imagined that

things like that could happen, but it's always better to

argue having

real examples.

Anyway, just in case someone reading the story decides to

blame URLs,

I just wanted to say that in my opinion the main issue

here is not

the technology or the GUID format being used. It's the

business model

and the management strategy.

I can easily imagine similar things happening to DOIs,

LSIDs or other

kinds of issued GUIDs if the institution(s) behind them simply

disappear.

Best Regards,

Renato

--

IT Researcher

CRIA - Reference Center on Environmental Information

http://www.cria.org.br/

On 24 Nov 2006 at 13:37, Roderic Page wrote:

The Open Access web-only journal "Phyloinformatics"

seems to have

disappeared, with the Internet address

http://www.phyloinformatics.org now up for sale. This means the

articles

have just disappeared!

There weren't many papers published, but some were

interesting and

have

been cited in the mainstream literature.

This also illustrates the problems with linking to digital

resources

using URLs, as opposed to identifiers such as DOIs. With

the loss of

the domain name, this journal has effectively died.

A sobering lesson...

Regards

Rod

_______________________________________________

TDWG-GUID mailing list

TDWG-GUID@mailman.nhm.ku.edu

http://mailman.nhm.ku.edu/mailman/listinfo/tdwg-guid

--------------------------------------------------------------------

P. Bryan Heidorn Graduate School of Library and Information Science

pheidorn@uiuc.edu University of Illinois at Urbana-Champaign MC-493

(V)217/ 244-7792 501 East Daniel St., Champaign, IL 61820-6212

(F)217/ 244-3302 https://netfiles.uiuc.edu/pheidorn/www

Online Calendar: http://tinyurl.com/6fd5q

Visit the Biobrowser Web site at http://www.biobrowser.org