Re: [Tdwg-guid] Demise of Phyloinformatics journal
Richards post and Napster keyword reminded me of a vague idea I had for some time to use P2P networks like bittorrent as an persitent storage space. You can read about it a bit more closely here:
http://www.pywrapper.com/markus/blog/2006/using-bittorrent-as-a-persistent-s...
Don't take it as a real proposal, but I like the general idea if it. It might even have been done already within the GRID community. But it conveys the original internet idea of distributing resources and minimizing impact if a nodes gets lost.
A quite nice discussion by the way. Markus -- Markus Döring Botanic Garden and Botanical Museum Berlin Dahlem, Dept. of Biodiversity Informatics Königin-Luise-Str. 6-8, D-14191 Berlin Phone: +49 30 83850-284 Email: m.doering@bgbm.org URL: http://www.bgbm.org/BioDivInf/
-----Original Message----- From: tdwg-guid-bounces@mailman.nhm.ku.edu [mailto:tdwg-guid-bounces@mailman.nhm.ku.edu] On Behalf Of Richard Pyle Sent: Sonntag, 26. November 2006 20:42 To: 'P. Bryan Heidorn'; tdwg-guid@mailman.nhm.ku.edu; 'Taxacom' Subject: Re: [Tdwg-guid] Demise of Phyloinformatics journal
I only just now read Bryan Heidorn's excellent post on this topic (below). One thing I would add is that the nature of the internet and electronic information allow us opportunities to ensure permanence and access that were either impossible, or prohibitively expensive even a decade ago. Imagine, for example, an internet protocol that allowed both institutions and individuals to "plug in" and expose their digitial catalogs of stored electronic publications (and other resources) such that the whereabouts of literally thousands of copies of every electronic publication could be known to anyone. The system I envision is somewhat of a cross between existing protocols for interlibrary loan, and the original Napster. Certainly all sorts of copyright issues need to be sorted out, but these are short-term problems (less than a century), compared to the long-term (multi-millenia?) issue of information persistence. The point is, knowing the whereabaouts of extant copies of digital documents, coupled with the amazing ease and low cost of duplication and global dissemination (not to mention plummeting costs of electronic storage media), would virtually guarantee the long-term persistence of digital information.
Any system is, of course, vulnerable to the collapse (or major perturbation) of human civilization. And the electronic translator problem I alluded to in an earlier post cannot be ignored. But to pretend that the potential doesn't exist or shouldn't be actively pursued is pure folly, in my opinion.
Aloha, Rich
Richard L. Pyle, PhD Database Coordinator for Natural Sciences and Associate Zoologist in Ichthyology Department of Natural Sciences, Bishop Museum 1525 Bernice St., Honolulu, HI 96817 Ph: (808)848-4115, Fax: (808)847-8252 email: deepreef@bishopmuseum.org http://hbs.bishopmuseum.org/staff/pylerichard.html
-----Original Message----- From: tdwg-guid-bounces@mailman.nhm.ku.edu [mailto:tdwg-guid-bounces@mailman.nhm.ku.edu] On Behalf Of P. Bryan Heidorn Sent: Friday, November 24, 2006 8:22 AM To: tdwg-guid@mailman.nhm.ku.edu; Taxacom Subject: Re: [Tdwg-guid] Demise of Phyloinformatics journal
The problem and solution has less to do with the Internet and more to do with institutional longevity. The permanence of paper has less to do with acid free paper and more to do with the relative permanence of the institutions that house them. Most paper documents over a hundred years old have been lost forever because there were no permanent institutions to hold them until the advent of public and academic libraries. Papers in individual scientists collections are discarded when they die. War and economic upheavals left paper in rain and fire. It is foolhardy to assume that what is on paper is safe.
We know that dissemination of information in electronic form is must more economical than paper dissemination. The issue is development of proper institutions with adequate stable funding to develop and maintain copies into "perpetuity". Commercial publishers, are clearly not the answer for preservation. Corporations and publishers go out of business all the time. It is only because libraries kept paper copies that we still have a record.
Digital preservation and access problems exist for all sciences and government documents so there is no need to the biodiversity community to go it alone on this. We are just in the beginning of the digital publishing history and have not yet established adequate preservation mechanisms within libraries to handle data curation, preservation and access in all the situations where it is necessary. There are projects underway world wide to address this issue. In the United States the Library of Congress The National Digital Information Infrastructure and Preservation Program http:// www.digitalpreservation.gov/ is one example. The U.S. Government agency the Institute of Museum and Library Services (IMLS) http:// www.imls.gov/ began grant programs to train librarians and museum curators in digital librarianship and most recently in digital data curation http://www.imls.gov/applicants/grants/ 21centuryLibrarian.shtm is addressing the education issues. The University of North Carolina
http://www.ils.unc.edu/digccurr2007/
papers.html and the University of Illinois http://sci.lis.uiuc.edu/ DCEP/ have begun working on best practices and education. This week say the successful Data Curation Conference (DCC) in Glasgow, Scotland http://www.dcc.ac.uk/events/dcc-2006/. DCC will be running "Long-term Curation and Preservation of Journals" 31 January 2007. (as an aside, at DCC conference I saw results of a survey in "Attitudes and aspirations in a diverse world: the Project StORe perspective on scientific repositories" Graham Pryor, University of Edinburgh http://www.dcc.ac.uk/events/dcc-2006/ programme/presentations/g-pryor.ppt that more scientists trusted publishers to save their digital documents than their home institutions and libraries! It is clear that scientists are generally not trained in economics and that the information technology management of many institutions must be abysmal!
We need something like to 5 institution rule for distribution to apply for digital documents. Digital documents need to be replicated as well for both access and preservation. Institutions like the Internet Archive help with some of the current problems. Institutional Repositories (IR) are another. Many universities and libraries world wide are beginning these. It is authors' responsibility to deposit their publications in these institutions and to support their creation. JSTOR and other institutions also exist. They all have their weaknesses and additional research, development and funding is needed to adequately address the issues. Also, all journals need to be managed using good data curation principles but al too often the publishers in spite of best intentions are not educated in such issues.
Digital publishing of taxonomic literature are not the full answer for current poor dissemination of taxonomic literature. The deposit of a published name in five institutions is a preservation rule, not a dissemination rule. We hurt science and human health is we do not at the same time address the information access issue. We need to aspire to better dissemination and preservation. Electronic publishing will help but only if appropriate institutions in place.
On the smaller issue, DOIs for publications, electronic or paper is a no-brainer. URLs were never designed to be permanent. URLs were designed to be reused and be flexible. With DOIs we can place the same paper in multiple digital or physical locations and reliably find copies.
Bryan Heidorn
P. Bryan Heidorn Graduate School of Library and Information Science pheidorn@uiuc.edu University of Illinois at Urbana-Champaign MC-493 (V)217/ 244-7792 501 East Daniel St., Champaign, IL
61820-6212
(F)217/ 244-3302 https://netfiles.uiuc.edu/pheidorn/www
On Nov 24, 2006, at 9:54 AM, Renato De Giovanni wrote:
Rod,
Thanks for sharing with us the information. I already
imagined that
things like that could happen, but it's always better to
argue having
real examples.
Anyway, just in case someone reading the story decides to
blame URLs,
I just wanted to say that in my opinion the main issue
here is not
the technology or the GUID format being used. It's the
business model
and the management strategy.
I can easily imagine similar things happening to DOIs,
LSIDs or other
kinds of issued GUIDs if the institution(s) behind them simply disappear.
Best Regards,
Renato
IT Researcher CRIA - Reference Center on Environmental Information http://www.cria.org.br/
On 24 Nov 2006 at 13:37, Roderic Page wrote:
The Open Access web-only journal "Phyloinformatics"
seems to have
disappeared, with the Internet address http:// www.phyloinformatics.org now up for sale. This means the
articles
have just disappeared!
There weren't many papers published, but some were
interesting and
have been cited in the mainstream literature.
This also illustrates the problems with linking to digital
resources
using URLs, as opposed to identifiers such as DOIs. With
the loss of
the domain name, this journal has effectively died.
A sobering lesson...
Regards
Rod
TDWG-GUID mailing list TDWG-GUID@mailman.nhm.ku.edu http://mailman.nhm.ku.edu/mailman/listinfo/tdwg-guid
TDWG-GUID mailing list TDWG-GUID@mailman.nhm.ku.edu http://mailman.nhm.ku.edu/mailman/listinfo/tdwg-guid
TDWG-GUID mailing list TDWG-GUID@mailman.nhm.ku.edu http://mailman.nhm.ku.edu/mailman/listinfo/tdw%3E g-guid
Great article, Markus! Very similar to what I had in mind. I've never visited BitTorrent, but I gather that its structure and function are not altogether different from the original Napster. Your description of a system that monitors available copies of any digital document and automatically ensures that a minimum number of copies are extant is *exactly* what I was thinking. In my view, there wouldn't be only one "hall monitor" server, but dozens or hundreds (likely correlated with major institutions or hard-core individuals with ample available storage space). And I would probably draw the line for minimum number of copies at closer to 100 or so, and also include algorithms to ensure they are adequately distributed on geographic scales. Obviously, GUIDs would be a critical component of such a system.
It's a much bigger issue than our community is able to solve, I think -- but certainly we could implement some pilot projects along these lines for our own data needs, to see how such a system might work within our context.
Aloha, Rich
-----Original Message----- From: tdwg-guid-bounces@mailman.nhm.ku.edu [mailto:tdwg-guid-bounces@mailman.nhm.ku.edu] On Behalf Of "Döring, Markus" Sent: Monday, November 27, 2006 5:56 AM To: tdwg-guid@mailman.nhm.ku.edu Subject: Re: [Tdwg-guid] Demise of Phyloinformatics journal
Richards post and Napster keyword reminded me of a vague idea I had for some time to use P2P networks like bittorrent as an persitent storage space. You can read about it a bit more closely here:
http://www.pywrapper.com/markus/blog/2006/using-bittorrent-as- a-persistent-storage-space/
Don't take it as a real proposal, but I like the general idea if it. It might even have been done already within the GRID community. But it conveys the original internet idea of distributing resources and minimizing impact if a nodes gets lost.
A quite nice discussion by the way. Markus -- Markus Döring Botanic Garden and Botanical Museum Berlin Dahlem, Dept. of Biodiversity Informatics Königin-Luise-Str. 6-8, D-14191 Berlin Phone: +49 30 83850-284 Email: m.doering@bgbm.org URL: http://www.bgbm.org/BioDivInf/
-----Original Message----- From: tdwg-guid-bounces@mailman.nhm.ku.edu [mailto:tdwg-guid-bounces@mailman.nhm.ku.edu] On Behalf Of Richard Pyle Sent: Sonntag, 26. November 2006 20:42 To: 'P. Bryan Heidorn'; tdwg-guid@mailman.nhm.ku.edu; 'Taxacom' Subject: Re: [Tdwg-guid] Demise of Phyloinformatics journal
I only just now read Bryan Heidorn's excellent post on this topic (below). One thing I would add is that the nature of the
internet and
electronic information allow us opportunities to ensure
permanence and
access that were either impossible, or prohibitively
expensive even a
decade ago. Imagine, for example, an internet protocol
that allowed
both institutions and individuals to "plug in" and expose their digitial catalogs of stored electronic publications (and other resources) such that the whereabouts of literally thousands
of copies
of every electronic publication could be known to anyone.
The system I
envision is somewhat of a cross between existing protocols for interlibrary loan, and the original Napster. Certainly all
sorts of
copyright issues need to be sorted out, but these are short-term problems (less than a century), compared to the long-term (multi-millenia?) issue of information persistence. The point is, knowing the whereabaouts of extant copies of digital documents, coupled with the amazing ease and low cost of duplication
and global
dissemination (not to mention plummeting costs of
electronic storage
media), would virtually guarantee the long-term persistence
of digital
information.
Any system is, of course, vulnerable to the collapse (or major perturbation) of human civilization. And the electronic translator problem I alluded to in an earlier post cannot be ignored. But to pretend that the potential doesn't exist or shouldn't be actively pursued is pure folly, in my opinion.
Aloha, Rich
Richard L. Pyle, PhD Database Coordinator for Natural Sciences and Associate Zoologist in Ichthyology Department of Natural Sciences, Bishop Museum 1525 Bernice St., Honolulu, HI 96817 Ph: (808)848-4115, Fax: (808)847-8252 email: deepreef@bishopmuseum.org http://hbs.bishopmuseum.org/staff/pylerichard.html
-----Original Message----- From: tdwg-guid-bounces@mailman.nhm.ku.edu [mailto:tdwg-guid-bounces@mailman.nhm.ku.edu] On Behalf Of P. Bryan Heidorn Sent: Friday, November 24, 2006 8:22 AM To: tdwg-guid@mailman.nhm.ku.edu; Taxacom Subject: Re: [Tdwg-guid] Demise of Phyloinformatics journal
The problem and solution has less to do with the Internet
and more
to do with institutional longevity. The permanence of paper has less to do with acid free
paper and more
to do with the relative permanence of the institutions that house them. Most paper documents over a hundred years old have
been lost
forever because there were no permanent institutions to hold them until the advent of public and academic libraries. Papers in individual scientists collections are discarded when they
die. War
and economic upheavals left paper in rain and fire. It is
foolhardy
to assume that what is on paper is safe.
We know that dissemination of information in electronic
form is must
more economical than paper dissemination. The issue is
development
of proper institutions with adequate stable funding to
develop and
maintain copies into "perpetuity". Commercial publishers, are clearly not the answer for
preservation.
Corporations and publishers go out of business all the
time. It is
only because libraries kept paper copies that we still have a record.
Digital preservation and access problems exist for all
sciences and
government documents so there is no need to the biodiversity community to go it alone on this. We are just in the beginning of the digital publishing history and have not yet
established adequate
preservation mechanisms within libraries to handle data curation, preservation and access in all the situations where it is
necessary.
There are projects underway world wide to address this issue. In the United States the Library of Congress The National Digital Information Infrastructure and Preservation Program http:// www.digitalpreservation.gov/ is one example. The U.S. Government agency the Institute of Museum and Library Services (IMLS) http:// www.imls.gov/ began grant programs to train librarians and museum curators in digital librarianship and most recently in digital data curation http://www.imls.gov/applicants/grants/ 21centuryLibrarian.shtm is addressing the education issues. The University of North Carolina
http://www.ils.unc.edu/digccurr2007/
papers.html and the University of Illinois
DCEP/ have begun working on best practices and education.
This week
say the successful Data Curation Conference (DCC) in Glasgow, Scotland http://www.dcc.ac.uk/events/dcc-2006/. DCC will
be running
"Long-term Curation and Preservation of Journals" 31 January 2007. (as an aside, at DCC conference I saw
results of a
survey in "Attitudes and aspirations in a diverse world:
the Project
StORe perspective on scientific repositories" Graham Pryor, University of Edinburgh http://www.dcc.ac.uk/events/dcc-2006/ programme/presentations/g-pryor.ppt that more scientists trusted publishers to save their digital documents than their home institutions and libraries! It is clear that scientists are generally not trained in economics and that the information technology management of many institutions must be abysmal!
We need something like to 5 institution rule for distribution to apply for digital documents. Digital documents need to be
replicated
as well for both access and preservation. Institutions like the Internet Archive help with some of
the current
problems. Institutional Repositories (IR) are another. Many
universities and
libraries world wide are beginning these. It is authors' responsibility to deposit their publications in these
institutions
and to support their creation. JSTOR and other institutions also exist. They all have their weaknesses and additional research, development and funding is needed to adequately address
the issues.
Also, all journals need to be managed using good data curation principles but al too often the publishers in spite of best intentions are not educated in such issues.
Digital publishing of taxonomic literature are not the
full answer
for current poor dissemination of taxonomic literature.
The deposit
of a published name in five institutions is a
preservation rule, not
a dissemination rule. We hurt science and human health
is we do not
at the same time address the information access issue.
We need to
aspire to better dissemination and preservation. Electronic publishing will help but only if appropriate institutions
in place.
On the smaller issue, DOIs for publications, electronic
or paper is
a no-brainer. URLs were never designed to be permanent. URLs were designed to be reused and be flexible. With DOIs we can place the same paper in multiple digital or physical locations and reliably find copies.
Bryan Heidorn
P. Bryan Heidorn Graduate School of Library and
Information
Science pheidorn@uiuc.edu University of Illinois at Urbana-Champaign MC-493 (V)217/ 244-7792 501 East Daniel St., Champaign, IL
61820-6212
(F)217/ 244-3302 https://netfiles.uiuc.edu/pheidorn/www
On Nov 24, 2006, at 9:54 AM, Renato De Giovanni wrote:
Rod,
Thanks for sharing with us the information. I already
imagined that
things like that could happen, but it's always better to
argue having
real examples.
Anyway, just in case someone reading the story decides to
blame URLs,
I just wanted to say that in my opinion the main issue
here is not
the technology or the GUID format being used. It's the
business model
and the management strategy.
I can easily imagine similar things happening to DOIs,
LSIDs or other
kinds of issued GUIDs if the institution(s) behind them simply disappear.
Best Regards,
Renato
IT Researcher CRIA - Reference Center on Environmental Information http://www.cria.org.br/
On 24 Nov 2006 at 13:37, Roderic Page wrote:
The Open Access web-only journal "Phyloinformatics"
seems to have
disappeared, with the Internet address http:// www.phyloinformatics.org now up for sale. This means the
articles
have just disappeared!
There weren't many papers published, but some were
interesting and
have been cited in the mainstream literature.
This also illustrates the problems with linking to digital
resources
using URLs, as opposed to identifiers such as DOIs. With
the loss of
the domain name, this journal has effectively died.
A sobering lesson...
Regards
Rod
TDWG-GUID mailing list TDWG-GUID@mailman.nhm.ku.edu http://mailman.nhm.ku.edu/mailman/listinfo/tdwg-guid
TDWG-GUID mailing list TDWG-GUID@mailman.nhm.ku.edu http://mailman.nhm.ku.edu/mailman/listinfo/tdwg-guid
TDWG-GUID mailing list TDWG-GUID@mailman.nhm.ku.edu http://mailman.nhm.ku.edu/mailman/listinfo/tdw%3E g-guid
TDWG-GUID mailing list TDWG-GUID@mailman.nhm.ku.edu http://mailman.nhm.ku.edu/mailman/listinfo/tdwg-guid
I think such a system is quite well within the grasp of this community- even without any particularly novel new developments. We have a system for unique IDs (LSIDs) which can be assigned to each document (actually each combination of object + metadata). Assuming the documents are stored in an environment exposed by a protocol such as OAI (Open Archives Initiative), a harvester could easily retrieve copies of documents (actually any objects with IDs). There's nothing to stop the harvester cache being exposed by the same protocol. With a group of these harvester + OAI servers, and no limits on subscriptions then each harvester would have a copy of everything, probably an undesirable outcome.
Harvester reach could be restricted by queries such as "all objects of type document" or "all objects published before 1999" or any other query supported by the metadata. Or, given the availability of one or more indexers, which index all the available OAI services, a query such as "all objects for which there are only 9 copies" could be executed. The result would be a list of LSIDs that need to be retrieved by the cache. Of course there will be time lags between index and harvester states, so there will likely end up being more than 10 copies of objects per cache, but is that really a problem?
All the pieces necessary for building such a system already exist in the WASABI framework - LSID assignment, OAI server, OAI harvester, indexer, cache. The only real modification is to adapt the WASABI server to store objects along with their metadata, but this was kind of planned to support media objects. I don't mean to preach WASABIsh here, such a topic has been on my mind for a while (actually distributed object storage, not just documents). TAPIR and other protocols would probably work just fine as well with some modifications.
It seems pretty simple, but perhaps I'm missing some important pieces?
cheers, Dave V.
Richard Pyle said the following on 28-11-2006 09:05:
Great article, Markus! Very similar to what I had in mind. I've never visited BitTorrent, but I gather that its structure and function are not altogether different from the original Napster. Your description of a system that monitors available copies of any digital document and automatically ensures that a minimum number of copies are extant is *exactly* what I was thinking. In my view, there wouldn't be only one "hall monitor" server, but dozens or hundreds (likely correlated with major institutions or hard-core individuals with ample available storage space). And I would probably draw the line for minimum number of copies at closer to 100 or so, and also include algorithms to ensure they are adequately distributed on geographic scales. Obviously, GUIDs would be a critical component of such a system.
It's a much bigger issue than our community is able to solve, I think -- but certainly we could implement some pilot projects along these lines for our own data needs, to see how such a system might work within our context.
Aloha, Rich
-----Original Message----- From: tdwg-guid-bounces@mailman.nhm.ku.edu [mailto:tdwg-guid-bounces@mailman.nhm.ku.edu] On Behalf Of "Döring, Markus" Sent: Monday, November 27, 2006 5:56 AM To: tdwg-guid@mailman.nhm.ku.edu Subject: Re: [Tdwg-guid] Demise of Phyloinformatics journal
Richards post and Napster keyword reminded me of a vague idea I had for some time to use P2P networks like bittorrent as an persitent storage space. You can read about it a bit more closely here:
http://www.pywrapper.com/markus/blog/2006/using-bittorrent-as- a-persistent-storage-space/
Don't take it as a real proposal, but I like the general idea if it. It might even have been done already within the GRID community. But it conveys the original internet idea of distributing resources and minimizing impact if a nodes gets lost.
A quite nice discussion by the way. Markus -- Markus Döring Botanic Garden and Botanical Museum Berlin Dahlem, Dept. of Biodiversity Informatics Königin-Luise-Str. 6-8, D-14191 Berlin Phone: +49 30 83850-284 Email: m.doering@bgbm.org URL: http://www.bgbm.org/BioDivInf/
-----Original Message----- From: tdwg-guid-bounces@mailman.nhm.ku.edu [mailto:tdwg-guid-bounces@mailman.nhm.ku.edu] On Behalf Of Richard Pyle Sent: Sonntag, 26. November 2006 20:42 To: 'P. Bryan Heidorn'; tdwg-guid@mailman.nhm.ku.edu; 'Taxacom' Subject: Re: [Tdwg-guid] Demise of Phyloinformatics journal
I only just now read Bryan Heidorn's excellent post on this topic (below). One thing I would add is that the nature of the
internet and
electronic information allow us opportunities to ensure
permanence and
access that were either impossible, or prohibitively
expensive even a
decade ago. Imagine, for example, an internet protocol
that allowed
both institutions and individuals to "plug in" and expose their digitial catalogs of stored electronic publications (and other resources) such that the whereabouts of literally thousands
of copies
of every electronic publication could be known to anyone.
The system I
envision is somewhat of a cross between existing protocols for interlibrary loan, and the original Napster. Certainly all
sorts of
copyright issues need to be sorted out, but these are short-term problems (less than a century), compared to the long-term (multi-millenia?) issue of information persistence. The point is, knowing the whereabaouts of extant copies of digital documents, coupled with the amazing ease and low cost of duplication
and global
dissemination (not to mention plummeting costs of
electronic storage
media), would virtually guarantee the long-term persistence
of digital
information.
Any system is, of course, vulnerable to the collapse (or major perturbation) of human civilization. And the electronic translator problem I alluded to in an earlier post cannot be ignored. But to pretend that the potential doesn't exist or shouldn't be actively pursued is pure folly, in my opinion.
Aloha, Rich
Richard L. Pyle, PhD Database Coordinator for Natural Sciences and Associate Zoologist in Ichthyology Department of Natural Sciences, Bishop Museum 1525 Bernice St., Honolulu, HI 96817 Ph: (808)848-4115, Fax: (808)847-8252 email: deepreef@bishopmuseum.org http://hbs.bishopmuseum.org/staff/pylerichard.html
-----Original Message----- From: tdwg-guid-bounces@mailman.nhm.ku.edu [mailto:tdwg-guid-bounces@mailman.nhm.ku.edu] On Behalf Of P. Bryan Heidorn Sent: Friday, November 24, 2006 8:22 AM To: tdwg-guid@mailman.nhm.ku.edu; Taxacom Subject: Re: [Tdwg-guid] Demise of Phyloinformatics journal
The problem and solution has less to do with the Internet
and more
to do with institutional longevity. The permanence of paper has less to do with acid free
paper and more
to do with the relative permanence of the institutions that house them. Most paper documents over a hundred years old have
been lost
forever because there were no permanent institutions to hold them until the advent of public and academic libraries. Papers in individual scientists collections are discarded when they
die. War
and economic upheavals left paper in rain and fire. It is
foolhardy
to assume that what is on paper is safe.
We know that dissemination of information in electronic
form is must
more economical than paper dissemination. The issue is
development
of proper institutions with adequate stable funding to
develop and
maintain copies into "perpetuity". Commercial publishers, are clearly not the answer for
preservation.
Corporations and publishers go out of business all the
time. It is
only because libraries kept paper copies that we still have a record.
Digital preservation and access problems exist for all
sciences and
government documents so there is no need to the biodiversity community to go it alone on this. We are just in the beginning of the digital publishing history and have not yet
established adequate
preservation mechanisms within libraries to handle data curation, preservation and access in all the situations where it is
necessary.
There are projects underway world wide to address this issue. In the United States the Library of Congress The National Digital Information Infrastructure and Preservation Program http:// www.digitalpreservation.gov/ is one example. The U.S. Government agency the Institute of Museum and Library Services (IMLS) http:// www.imls.gov/ began grant programs to train librarians and museum curators in digital librarianship and most recently in digital data curation http://www.imls.gov/applicants/grants/ 21centuryLibrarian.shtm is addressing the education issues. The University of North Carolina
http://www.ils.unc.edu/digccurr2007/
papers.html and the University of Illinois
DCEP/ have begun working on best practices and education.
This week
say the successful Data Curation Conference (DCC) in Glasgow, Scotland http://www.dcc.ac.uk/events/dcc-2006/. DCC will
be running
"Long-term Curation and Preservation of Journals" 31 January 2007. (as an aside, at DCC conference I saw
results of a
survey in "Attitudes and aspirations in a diverse world:
the Project
StORe perspective on scientific repositories" Graham Pryor, University of Edinburgh http://www.dcc.ac.uk/events/dcc-2006/ programme/presentations/g-pryor.ppt that more scientists trusted publishers to save their digital documents than their home institutions and libraries! It is clear that scientists are generally not trained in economics and that the information technology management of many institutions must be abysmal!
We need something like to 5 institution rule for distribution to apply for digital documents. Digital documents need to be
replicated
as well for both access and preservation. Institutions like the Internet Archive help with some of
the current
problems. Institutional Repositories (IR) are another. Many
universities and
libraries world wide are beginning these. It is authors' responsibility to deposit their publications in these
institutions
and to support their creation. JSTOR and other institutions also exist. They all have their weaknesses and additional research, development and funding is needed to adequately address
the issues.
Also, all journals need to be managed using good data curation principles but al too often the publishers in spite of best intentions are not educated in such issues.
Digital publishing of taxonomic literature are not the
full answer
for current poor dissemination of taxonomic literature.
The deposit
of a published name in five institutions is a
preservation rule, not
a dissemination rule. We hurt science and human health
is we do not
at the same time address the information access issue.
We need to
aspire to better dissemination and preservation. Electronic publishing will help but only if appropriate institutions
in place.
On the smaller issue, DOIs for publications, electronic
or paper is
a no-brainer. URLs were never designed to be permanent. URLs were designed to be reused and be flexible. With DOIs we can place the same paper in multiple digital or physical locations and reliably find copies.
Bryan Heidorn
P. Bryan Heidorn Graduate School of Library and
Information
Science pheidorn@uiuc.edu University of Illinois at Urbana-Champaign MC-493 (V)217/ 244-7792 501 East Daniel St., Champaign, IL
61820-6212
(F)217/ 244-3302 https://netfiles.uiuc.edu/pheidorn/www
On Nov 24, 2006, at 9:54 AM, Renato De Giovanni wrote:
Rod,
Thanks for sharing with us the information. I already
imagined that
things like that could happen, but it's always better to
argue having
real examples.
Anyway, just in case someone reading the story decides to
blame URLs,
I just wanted to say that in my opinion the main issue
here is not
the technology or the GUID format being used. It's the
business model
and the management strategy.
I can easily imagine similar things happening to DOIs,
LSIDs or other
kinds of issued GUIDs if the institution(s) behind them simply disappear.
Best Regards,
Renato
IT Researcher CRIA - Reference Center on Environmental Information http://www.cria.org.br/
On 24 Nov 2006 at 13:37, Roderic Page wrote:
The Open Access web-only journal "Phyloinformatics"
seems to have
disappeared, with the Internet address http:// www.phyloinformatics.org now up for sale. This means the
articles
have just disappeared!
There weren't many papers published, but some were
interesting and
have been cited in the mainstream literature.
This also illustrates the problems with linking to digital
resources
using URLs, as opposed to identifiers such as DOIs. With
the loss of
the domain name, this journal has effectively died.
A sobering lesson...
Regards
Rod
TDWG-GUID mailing list TDWG-GUID@mailman.nhm.ku.edu http://mailman.nhm.ku.edu/mailman/listinfo/tdwg-guid
TDWG-GUID mailing list TDWG-GUID@mailman.nhm.ku.edu http://mailman.nhm.ku.edu/mailman/listinfo/tdwg-guid
TDWG-GUID mailing list TDWG-GUID@mailman.nhm.ku.edu http://mailman.nhm.ku.edu/mailman/listinfo/tdw%3E g-guid
TDWG-GUID mailing list TDWG-GUID@mailman.nhm.ku.edu http://mailman.nhm.ku.edu/mailman/listinfo/tdwg-guid
TDWG-GUID mailing list TDWG-GUID@mailman.nhm.ku.edu http://mailman.nhm.ku.edu/mailman/listinfo/tdwg-guid
Hi Dave,
I think you just about summed up where we are headed with the architecture in terms of enabling this kind of thing. We have resolution (LSID) and harvest (OAI) giving us the main backbone for flow of data. The different caches (thematic caches perhaps) can provide search services on top of this, whether these be SPARQL or TAPIR or whatever based. These are nice clean interfaces so the whole thing stands a chance of being implementation independent. A semantic/ WASABI based thematic network might include data from TAPIR providers as well as home spun Perl scripts or even static files.
A network of caches would, in theory, allow a missing resource to be resurrected from its pieces. It would still be nice if it hadn't gone away in the first place though.
The sun is shining and I feel we are heading in the right direction - but as you say the unknown unknowns are still lurking out there!
All the best,
Roger
On 28 Nov 2006, at 03:07, Dave Vieglais wrote:
I think such a system is quite well within the grasp of this community- even without any particularly novel new developments. We have a system for unique IDs (LSIDs) which can be assigned to each document (actually each combination of object + metadata). Assuming the documents are stored in an environment exposed by a protocol such as OAI (Open Archives Initiative), a harvester could easily retrieve copies of documents (actually any objects with IDs). There's nothing to stop the harvester cache being exposed by the same protocol. With a group of these harvester + OAI servers, and no limits on subscriptions then each harvester would have a copy of everything, probably an undesirable outcome.
Harvester reach could be restricted by queries such as "all objects of type document" or "all objects published before 1999" or any other query supported by the metadata. Or, given the availability of one or more indexers, which index all the available OAI services, a query such as "all objects for which there are only 9 copies" could be executed. The result would be a list of LSIDs that need to be retrieved by the cache. Of course there will be time lags between index and harvester states, so there will likely end up being more than 10 copies of objects per cache, but is that really a problem?
All the pieces necessary for building such a system already exist in the WASABI framework - LSID assignment, OAI server, OAI harvester, indexer, cache. The only real modification is to adapt the WASABI server to store objects along with their metadata, but this was kind of planned to support media objects. I don't mean to preach WASABIsh here, such a topic has been on my mind for a while (actually distributed object storage, not just documents). TAPIR and other protocols would probably work just fine as well with some modifications.
It seems pretty simple, but perhaps I'm missing some important pieces?
cheers, Dave V.
Richard Pyle said the following on 28-11-2006 09:05:
Great article, Markus! Very similar to what I had in mind. I've never visited BitTorrent, but I gather that its structure and function are not altogether different from the original Napster. Your description of a system that monitors available copies of any digital document and automatically ensures that a minimum number of copies are extant is *exactly* what I was thinking. In my view, there wouldn't be only one "hall monitor" server, but dozens or hundreds (likely correlated with major institutions or hard-core individuals with ample available storage space). And I would probably draw the line for minimum number of copies at closer to 100 or so, and also include algorithms to ensure they are adequately distributed on geographic scales. Obviously, GUIDs would be a critical component of such a system.
It's a much bigger issue than our community is able to solve, I think -- but certainly we could implement some pilot projects along these lines for our own data needs, to see how such a system might work within our context.
Aloha, Rich
-----Original Message----- From: tdwg-guid-bounces@mailman.nhm.ku.edu [mailto:tdwg-guid-bounces@mailman.nhm.ku.edu] On Behalf Of "Döring, Markus" Sent: Monday, November 27, 2006 5:56 AM To: tdwg-guid@mailman.nhm.ku.edu Subject: Re: [Tdwg-guid] Demise of Phyloinformatics journal
Richards post and Napster keyword reminded me of a vague idea I had for some time to use P2P networks like bittorrent as an persitent storage space. You can read about it a bit more closely here:
http://www.pywrapper.com/markus/blog/2006/using-bittorrent-as- a-persistent-storage-space/
Don't take it as a real proposal, but I like the general idea if it. It might even have been done already within the GRID community. But it conveys the original internet idea of distributing resources and minimizing impact if a nodes gets lost.
A quite nice discussion by the way. Markus -- Markus Döring Botanic Garden and Botanical Museum Berlin Dahlem, Dept. of Biodiversity Informatics Königin-Luise-Str. 6-8, D-14191 Berlin Phone: +49 30 83850-284 Email: m.doering@bgbm.org URL: http://www.bgbm.org/BioDivInf/
-----Original Message----- From: tdwg-guid-bounces@mailman.nhm.ku.edu [mailto:tdwg-guid-bounces@mailman.nhm.ku.edu] On Behalf Of Richard Pyle Sent: Sonntag, 26. November 2006 20:42 To: 'P. Bryan Heidorn'; tdwg-guid@mailman.nhm.ku.edu; 'Taxacom' Subject: Re: [Tdwg-guid] Demise of Phyloinformatics journal
I only just now read Bryan Heidorn's excellent post on this topic (below). One thing I would add is that the nature of the
internet and
electronic information allow us opportunities to ensure
permanence and
access that were either impossible, or prohibitively
expensive even a
decade ago. Imagine, for example, an internet protocol
that allowed
both institutions and individuals to "plug in" and expose their digitial catalogs of stored electronic publications (and other resources) such that the whereabouts of literally thousands
of copies
of every electronic publication could be known to anyone.
The system I
envision is somewhat of a cross between existing protocols for interlibrary loan, and the original Napster. Certainly all
sorts of
copyright issues need to be sorted out, but these are short-term problems (less than a century), compared to the long-term (multi-millenia?) issue of information persistence. The point is, knowing the whereabaouts of extant copies of digital documents, coupled with the amazing ease and low cost of duplication
and global
dissemination (not to mention plummeting costs of
electronic storage
media), would virtually guarantee the long-term persistence
of digital
information.
Any system is, of course, vulnerable to the collapse (or major perturbation) of human civilization. And the electronic translator problem I alluded to in an earlier post cannot be ignored. But to pretend that the potential doesn't exist or shouldn't be actively pursued is pure folly, in my opinion.
Aloha, Rich
Richard L. Pyle, PhD Database Coordinator for Natural Sciences and Associate Zoologist in Ichthyology Department of Natural Sciences, Bishop Museum 1525 Bernice St., Honolulu, HI 96817 Ph: (808)848-4115, Fax: (808)847-8252 email: deepreef@bishopmuseum.org http://hbs.bishopmuseum.org/staff/pylerichard.html
-----Original Message----- From: tdwg-guid-bounces@mailman.nhm.ku.edu [mailto:tdwg-guid-bounces@mailman.nhm.ku.edu] On Behalf Of P. Bryan Heidorn Sent: Friday, November 24, 2006 8:22 AM To: tdwg-guid@mailman.nhm.ku.edu; Taxacom Subject: Re: [Tdwg-guid] Demise of Phyloinformatics journal
The problem and solution has less to do with the Internet
and more
to do with institutional longevity. The permanence of paper has less to do with acid free
paper and more
to do with the relative permanence of the institutions that house them. Most paper documents over a hundred years old have
been lost
forever because there were no permanent institutions to hold them until the advent of public and academic libraries. Papers in individual scientists collections are discarded when they
die. War
and economic upheavals left paper in rain and fire. It is
foolhardy
to assume that what is on paper is safe.
We know that dissemination of information in electronic
form is must
more economical than paper dissemination. The issue is
development
of proper institutions with adequate stable funding to
develop and
maintain copies into "perpetuity". Commercial publishers, are clearly not the answer for
preservation.
Corporations and publishers go out of business all the
time. It is
only because libraries kept paper copies that we still have a record.
Digital preservation and access problems exist for all
sciences and
government documents so there is no need to the biodiversity community to go it alone on this. We are just in the beginning of the digital publishing history and have not yet
established adequate
preservation mechanisms within libraries to handle data curation, preservation and access in all the situations where it is
necessary.
There are projects underway world wide to address this issue. In the United States the Library of Congress The National Digital Information Infrastructure and Preservation Program http:// www.digitalpreservation.gov/ is one example. The U.S. Government agency the Institute of Museum and Library Services (IMLS) http:// www.imls.gov/ began grant programs to train librarians and museum curators in digital librarianship and most recently in digital data curation http://www.imls.gov/applicants/grants/ 21centuryLibrarian.shtm is addressing the education issues. The University of North Carolina
http://www.ils.unc.edu/digccurr2007/
papers.html and the University of Illinois
DCEP/ have begun working on best practices and education.
This week
say the successful Data Curation Conference (DCC) in Glasgow, Scotland http://www.dcc.ac.uk/events/dcc-2006/. DCC will
be running
"Long-term Curation and Preservation of Journals" 31 January 2007. (as an aside, at DCC conference I saw
results of a
survey in "Attitudes and aspirations in a diverse world:
the Project
StORe perspective on scientific repositories" Graham Pryor, University of Edinburgh http://www.dcc.ac.uk/events/dcc-2006/ programme/presentations/g-pryor.ppt that more scientists trusted publishers to save their digital documents than their home institutions and libraries! It is clear that scientists are generally not trained in economics and that the information technology management of many institutions must be abysmal!
We need something like to 5 institution rule for distribution to apply for digital documents. Digital documents need to be
replicated
as well for both access and preservation. Institutions like the Internet Archive help with some of
the current
problems. Institutional Repositories (IR) are another. Many
universities and
libraries world wide are beginning these. It is authors' responsibility to deposit their publications in these
institutions
and to support their creation. JSTOR and other institutions also exist. They all have their weaknesses and additional research, development and funding is needed to adequately address
the issues.
Also, all journals need to be managed using good data curation principles but al too often the publishers in spite of best intentions are not educated in such issues.
Digital publishing of taxonomic literature are not the
full answer
for current poor dissemination of taxonomic literature.
The deposit
of a published name in five institutions is a
preservation rule, not
a dissemination rule. We hurt science and human health
is we do not
at the same time address the information access issue.
We need to
aspire to better dissemination and preservation. Electronic publishing will help but only if appropriate institutions
in place.
On the smaller issue, DOIs for publications, electronic
or paper is
a no-brainer. URLs were never designed to be permanent. URLs were designed to be reused and be flexible. With DOIs we can place the same paper in multiple digital or physical locations and reliably find copies.
Bryan Heidorn
P. Bryan Heidorn Graduate School of Library and
Information
Science pheidorn@uiuc.edu University of Illinois at Urbana-Champaign MC-493 (V)217/ 244-7792 501 East Daniel St., Champaign, IL
61820-6212
(F)217/ 244-3302 https://netfiles.uiuc.edu/pheidorn/www
On Nov 24, 2006, at 9:54 AM, Renato De Giovanni wrote:
Rod,
Thanks for sharing with us the information. I already
imagined that
things like that could happen, but it's always better to
argue having
real examples.
Anyway, just in case someone reading the story decides to
blame URLs,
I just wanted to say that in my opinion the main issue
here is not
the technology or the GUID format being used. It's the
business model
and the management strategy.
I can easily imagine similar things happening to DOIs,
LSIDs or other
kinds of issued GUIDs if the institution(s) behind them simply disappear.
Best Regards,
Renato
IT Researcher CRIA - Reference Center on Environmental Information http://www.cria.org.br/
On 24 Nov 2006 at 13:37, Roderic Page wrote:
> The Open Access web-only journal "Phyloinformatics"
seems to have
> disappeared, with the Internet address http:// > www.phyloinformatics.org now up for sale. This means the
articles
> have just disappeared! > > There weren't many papers published, but some were
interesting and
> have > been cited in the mainstream literature. > > This also illustrates the problems with linking to digital
resources
> using URLs, as opposed to identifiers such as DOIs. With
the loss of
> the domain name, this journal has effectively died. > > A sobering lesson... > > Regards > > Rod
TDWG-GUID mailing list TDWG-GUID@mailman.nhm.ku.edu http://mailman.nhm.ku.edu/mailman/listinfo/tdwg-guid
TDWG-GUID mailing list TDWG-GUID@mailman.nhm.ku.edu http://mailman.nhm.ku.edu/mailman/listinfo/tdwg-guid
TDWG-GUID mailing list TDWG-GUID@mailman.nhm.ku.edu http://mailman.nhm.ku.edu/mailman/listinfo/tdw%3E g-guid
TDWG-GUID mailing list TDWG-GUID@mailman.nhm.ku.edu http://mailman.nhm.ku.edu/mailman/listinfo/tdwg-guid
TDWG-GUID mailing list TDWG-GUID@mailman.nhm.ku.edu http://mailman.nhm.ku.edu/mailman/listinfo/tdwg-guid
<vieglais.vcf> _______________________________________________ TDWG-GUID mailing list TDWG-GUID@mailman.nhm.ku.edu http://mailman.nhm.ku.edu/mailman/listinfo/tdwg-guid
I certainly agree with Dave that the technology exists, and I agree with Roger that we seem to be on the right path. My comment about it being "a much bigger issue than our community is able to solve" was more along the lines of ensuring persistence for centuries or millenia. The Library of Congress was appropriated $100 million to deal with this issue (http://www.digitalpreservation.gov/about/index.html), which is a just a bit more than we have access to. The real problem, of course, is that because digital media have existed for only a few decades, we don't have an established track record to say, with adequate confidence, that we "know" how to preserve digital data for centuries or millenia (in the way that some paper-based media have survived for such periods of time). This is why the system along the lines of what we're discussing can only really be thought of as a "pilot project".
The fact of the matter is, we don't really need to have confidence that our system is good enough to perservere for centuries. We only have to be confident that it will perservere until technology establishes a system that *will* survive for centuries. If we're lucky, that probably will happen within the next few decades.
So...to echo Roger, "the sun is shining and I feel we are heading in the right direction" -- Rumsfeldisms notwithstanding.
Aloha, Rich
Richard L. Pyle, PhD Database Coordinator for Natural Sciences and Associate Zoologist in Ichthyology Department of Natural Sciences, Bishop Museum 1525 Bernice St., Honolulu, HI 96817 Ph: (808)848-4115, Fax: (808)847-8252 email: deepreef@bishopmuseum.org http://hbs.bishopmuseum.org/staff/pylerichard.html
-----Original Message----- From: Dave Vieglais [mailto:vieglais@ku.edu] Sent: Monday, November 27, 2006 5:08 PM To: Richard Pyle Cc: "'"Döring, Markus"'"; tdwg-guid@mailman.nhm.ku.edu Subject: Re: [Tdwg-guid] Demise of Phyloinformatics journal
I think such a system is quite well within the grasp of this community- even without any particularly novel new developments. We have a system for unique IDs (LSIDs) which can be assigned to each document (actually each combination of object + metadata). Assuming the documents are stored in an environment exposed by a protocol such as OAI (Open Archives Initiative), a harvester could easily retrieve copies of documents (actually any objects with IDs). There's nothing to stop the harvester cache being exposed by the same protocol. With a group of these harvester + OAI servers, and no limits on subscriptions then each harvester would have a copy of everything, probably an undesirable outcome.
Harvester reach could be restricted by queries such as "all objects of type document" or "all objects published before 1999" or any other query supported by the metadata. Or, given the availability of one or more indexers, which index all the available OAI services, a query such as "all objects for which there are only 9 copies" could be executed. The result would be a list of LSIDs that need to be retrieved by the cache. Of course there will be time lags between index and harvester states, so there will likely end up being more than 10 copies of objects per cache, but is that really a problem?
All the pieces necessary for building such a system already exist in the WASABI framework - LSID assignment, OAI server, OAI harvester, indexer, cache. The only real modification is to adapt the WASABI server to store objects along with their metadata, but this was kind of planned to support media objects. I don't mean to preach WASABIsh here, such a topic has been on my mind for a while (actually distributed object storage, not just documents). TAPIR and other protocols would probably work just fine as well with some modifications.
It seems pretty simple, but perhaps I'm missing some important pieces?
cheers, Dave V.
Richard Pyle said the following on 28-11-2006 09:05:
Great article, Markus! Very similar to what I had in mind.
I've never
visited BitTorrent, but I gather that its structure and
function are
not altogether different from the original Napster. Your
description
of a system that monitors available copies of any digital
document and
automatically ensures that a minimum number of copies are extant is *exactly* what I was thinking. In my view, there wouldn't
be only one
"hall monitor" server, but dozens or hundreds (likely
correlated with
major institutions or hard-core individuals with ample
available storage space).
And I would probably draw the line for minimum number of copies at closer to 100 or so, and also include algorithms to ensure they are adequately distributed on geographic scales. Obviously,
GUIDs would be
a critical component of such a system.
It's a much bigger issue than our community is able to
solve, I think
-- but certainly we could implement some pilot projects along these lines for our own data needs, to see how such a system
might work within our context.
Aloha, Rich
-----Original Message----- From: tdwg-guid-bounces@mailman.nhm.ku.edu [mailto:tdwg-guid-bounces@mailman.nhm.ku.edu] On Behalf Of
"Döring,
Markus" Sent: Monday, November 27, 2006 5:56 AM To: tdwg-guid@mailman.nhm.ku.edu Subject: Re: [Tdwg-guid] Demise of Phyloinformatics journal
Richards post and Napster keyword reminded me of a vague
idea I had
for some time to use P2P networks like bittorrent as an persitent storage space. You can read about it a bit more closely here:
http://www.pywrapper.com/markus/blog/2006/using-bittorrent-as- a-persistent-storage-space/
Don't take it as a real proposal, but I like the general
idea if it.
It might even have been done already within the GRID
community. But
it conveys the original internet idea of distributing
resources and
minimizing impact if a nodes gets lost.
A quite nice discussion by the way. Markus -- Markus Döring Botanic Garden and Botanical Museum Berlin Dahlem, Dept. of Biodiversity Informatics Königin-Luise-Str. 6-8, D-14191 Berlin Phone: +49 30 83850-284 Email: m.doering@bgbm.org URL: http://www.bgbm.org/BioDivInf/
-----Original Message----- From: tdwg-guid-bounces@mailman.nhm.ku.edu [mailto:tdwg-guid-bounces@mailman.nhm.ku.edu] On Behalf
Of Richard
Pyle Sent: Sonntag, 26. November 2006 20:42 To: 'P. Bryan Heidorn'; tdwg-guid@mailman.nhm.ku.edu; 'Taxacom' Subject: Re: [Tdwg-guid] Demise of Phyloinformatics journal
I only just now read Bryan Heidorn's excellent post on this topic (below). One thing I would add is that the nature of the
internet and
electronic information allow us opportunities to ensure
permanence and
access that were either impossible, or prohibitively
expensive even a
decade ago. Imagine, for example, an internet protocol
that allowed
both institutions and individuals to "plug in" and expose their digitial catalogs of stored electronic publications (and other resources) such that the whereabouts of literally thousands
of copies
of every electronic publication could be known to anyone.
The system I
envision is somewhat of a cross between existing protocols for interlibrary loan, and the original Napster. Certainly all
sorts of
copyright issues need to be sorted out, but these are short-term problems (less than a century), compared to the long-term (multi-millenia?) issue of information persistence. The point is, knowing the whereabaouts of extant copies of digital documents, coupled with the amazing ease and low cost of duplication
and global
dissemination (not to mention plummeting costs of
electronic storage
media), would virtually guarantee the long-term persistence
of digital
information.
Any system is, of course, vulnerable to the collapse (or major perturbation) of human civilization. And the electronic
translator
problem I alluded to in an earlier post cannot be
ignored. But to
pretend that the potential doesn't exist or shouldn't be actively pursued is pure folly, in my opinion.
Aloha, Rich
Richard L. Pyle, PhD Database Coordinator for Natural Sciences and Associate Zoologist in Ichthyology Department of Natural Sciences, Bishop Museum 1525 Bernice St., Honolulu, HI 96817 Ph: (808)848-4115, Fax: (808)847-8252 email: deepreef@bishopmuseum.org http://hbs.bishopmuseum.org/staff/pylerichard.html
-----Original Message----- From: tdwg-guid-bounces@mailman.nhm.ku.edu [mailto:tdwg-guid-bounces@mailman.nhm.ku.edu] On Behalf Of P. Bryan Heidorn Sent: Friday, November 24, 2006 8:22 AM To: tdwg-guid@mailman.nhm.ku.edu; Taxacom Subject: Re: [Tdwg-guid] Demise of Phyloinformatics journal
The problem and solution has less to do with the Internet
and more
to do with institutional longevity. The permanence of paper has less to do with acid free
paper and more
to do with the relative permanence of the institutions
that house
them. Most paper documents over a hundred years old have
been lost
forever because there were no permanent institutions to
hold them
until the advent of public and academic libraries. Papers in individual scientists collections are discarded when they
die. War
and economic upheavals left paper in rain and fire. It is
foolhardy
to assume that what is on paper is safe.
We know that dissemination of information in electronic
form is must
more economical than paper dissemination. The issue is
development
of proper institutions with adequate stable funding to
develop and
maintain copies into "perpetuity". Commercial publishers, are clearly not the answer for
preservation.
Corporations and publishers go out of business all the
time. It is
only because libraries kept paper copies that we still have a record.
Digital preservation and access problems exist for all
sciences and
government documents so there is no need to the biodiversity community to go it alone on this. We are just in the
beginning of
the digital publishing history and have not yet
established adequate
preservation mechanisms within libraries to handle data
curation,
preservation and access in all the situations where it is
necessary.
There are projects underway world wide to address this issue. In the United States the Library of Congress The
National Digital
Information Infrastructure and Preservation Program http:// www.digitalpreservation.gov/ is one example. The U.S. Government agency the Institute of Museum and Library Services (IMLS) http:// www.imls.gov/ began grant programs to train librarians and museum curators in digital librarianship and most recently in digital data curation http://www.imls.gov/applicants/grants/ 21centuryLibrarian.shtm is addressing the education issues. The University of North Carolina
http://www.ils.unc.edu/digccurr2007/
papers.html and the University of Illinois
DCEP/ have begun working on best practices and education.
This week
say the successful Data Curation Conference (DCC) in Glasgow, Scotland http://www.dcc.ac.uk/events/dcc-2006/. DCC will
be running
"Long-term Curation and Preservation of Journals" 31 January 2007. (as an aside, at DCC conference I saw
results of a
survey in "Attitudes and aspirations in a diverse world:
the Project
StORe perspective on scientific repositories" Graham Pryor, University of Edinburgh http://www.dcc.ac.uk/events/dcc-2006/ programme/presentations/g-pryor.ppt that more scientists trusted publishers to save their digital documents than their home institutions and libraries! It is clear that scientists are generally not trained in economics and that the information technology management of many institutions must be abysmal!
We need something like to 5 institution rule for distribution to apply for digital documents. Digital documents need to be
replicated
as well for both access and preservation. Institutions like the Internet Archive help with some of
the current
problems. Institutional Repositories (IR) are another. Many
universities and
libraries world wide are beginning these. It is authors' responsibility to deposit their publications in these
institutions
and to support their creation. JSTOR and other institutions also exist. They all have their weaknesses and additional research, development and funding is needed to adequately address
the issues.
Also, all journals need to be managed using good data curation principles but al too often the publishers in spite of best intentions are not educated in such issues.
Digital publishing of taxonomic literature are not the
full answer
for current poor dissemination of taxonomic literature.
The deposit
of a published name in five institutions is a
preservation rule, not
a dissemination rule. We hurt science and human health
is we do not
at the same time address the information access issue.
We need to
aspire to better dissemination and preservation. Electronic publishing will help but only if appropriate institutions
in place.
On the smaller issue, DOIs for publications, electronic
or paper is
a no-brainer. URLs were never designed to be permanent.
URLs were
designed to be reused and be flexible. With DOIs we can place the same paper in multiple digital or physical locations and reliably find copies.
Bryan Heidorn
P. Bryan Heidorn Graduate School of Library and
Information
Science pheidorn@uiuc.edu University of Illinois at Urbana-Champaign MC-493 (V)217/ 244-7792 501 East Daniel St., Champaign, IL
61820-6212
(F)217/ 244-3302 https://netfiles.uiuc.edu/pheidorn/www
On Nov 24, 2006, at 9:54 AM, Renato De Giovanni wrote:
Rod,
Thanks for sharing with us the information. I already
imagined that
things like that could happen, but it's always better to
argue having
real examples.
Anyway, just in case someone reading the story decides to
blame URLs,
I just wanted to say that in my opinion the main issue
here is not
the technology or the GUID format being used. It's the
business model
and the management strategy.
I can easily imagine similar things happening to DOIs,
LSIDs or other
kinds of issued GUIDs if the institution(s) behind them simply disappear.
Best Regards,
Renato
IT Researcher CRIA - Reference Center on Environmental Information http://www.cria.org.br/
On 24 Nov 2006 at 13:37, Roderic Page wrote:
> The Open Access web-only journal "Phyloinformatics"
seems to have
> disappeared, with the Internet address http:// > www.phyloinformatics.org now up for sale. This means the
articles
> have just disappeared! > > There weren't many papers published, but some were
interesting and
> have > been cited in the mainstream literature. > > This also illustrates the problems with linking to digital
resources
> using URLs, as opposed to identifiers such as DOIs. With
the loss of
> the domain name, this journal has effectively died. > > A sobering lesson... > > Regards > > Rod
TDWG-GUID mailing list TDWG-GUID@mailman.nhm.ku.edu http://mailman.nhm.ku.edu/mailman/listinfo/tdwg-guid
TDWG-GUID mailing list TDWG-GUID@mailman.nhm.ku.edu http://mailman.nhm.ku.edu/mailman/listinfo/tdwg-guid
TDWG-GUID mailing list TDWG-GUID@mailman.nhm.ku.edu http://mailman.nhm.ku.edu/mailman/listinfo/tdw%3E g-guid
TDWG-GUID mailing list TDWG-GUID@mailman.nhm.ku.edu http://mailman.nhm.ku.edu/mailman/listinfo/tdwg-guid
TDWG-GUID mailing list TDWG-GUID@mailman.nhm.ku.edu http://mailman.nhm.ku.edu/mailman/listinfo/tdwg-guid
It is important to inform the Library of Congress and other entities about the special needs of the taxonomic community so that they can put the appropriate mechanisms into place. The library community solution including, academic libraries, LC and its counterparts in other countries do make a fairly survivable system that might indeed survive some major upheavals. The biodiversity community need not go it alone. It is better to share the deeper pockets of nuclear physics, astronomy, and medicine. We have unique needs but we can rely on subsystems put into place for these other sciences.
Also, currently, going digital does not mean not using paper. For the near future it would be good to have a few paper copies of important documents along with the digital copy. Also, if we have digital copy, at any point in the future, assuming we still have trees to make paper, we can decide to abandon the digital systems and print out what we want. It is not an either/or decision not and will not be in the future either.
On Nov 28, 2006, at 1:40 PM, Richard Pyle wrote:
I certainly agree with Dave that the technology exists, and I agree with Roger that we seem to be on the right path. My comment about it being "a much bigger issue than our community is able to solve" was more along the lines of ensuring persistence for centuries or millenia. The Library of Congress was appropriated $100 million to deal with this issue (http://www.digitalpreservation.gov/about/index.html), which is a just a bit more than we have access to. The real problem, of course, is that because digital media have existed for only a few decades, we don't have an established track record to say, with adequate confidence, that we "know" how to preserve digital data for centuries or millenia (in the way that some paper-based media have survived for such periods of time). This is why the system along the lines of what we're discussing can only really be thought of as a "pilot project".
The fact of the matter is, we don't really need to have confidence that our system is good enough to perservere for centuries. We only have to be confident that it will perservere until technology establishes a system that *will* survive for centuries. If we're lucky, that probably will happen within the next few decades.
So...to echo Roger, "the sun is shining and I feel we are heading in the right direction" -- Rumsfeldisms notwithstanding.
Aloha, Rich
Richard L. Pyle, PhD Database Coordinator for Natural Sciences and Associate Zoologist in Ichthyology Department of Natural Sciences, Bishop Museum 1525 Bernice St., Honolulu, HI 96817 Ph: (808)848-4115, Fax: (808)847-8252 email: deepreef@bishopmuseum.org http://hbs.bishopmuseum.org/staff/pylerichard.html
-----Original Message----- From: Dave Vieglais [mailto:vieglais@ku.edu] Sent: Monday, November 27, 2006 5:08 PM To: Richard Pyle Cc: "'"Döring, Markus"'"; tdwg-guid@mailman.nhm.ku.edu Subject: Re: [Tdwg-guid] Demise of Phyloinformatics journal
I think such a system is quite well within the grasp of this community- even without any particularly novel new developments. We have a system for unique IDs (LSIDs) which can be assigned to each document (actually each combination of object + metadata). Assuming the documents are stored in an environment exposed by a protocol such as OAI (Open Archives Initiative), a harvester could easily retrieve copies of documents (actually any objects with IDs). There's nothing to stop the harvester cache being exposed by the same protocol. With a group of these harvester + OAI servers, and no limits on subscriptions then each harvester would have a copy of everything, probably an undesirable outcome.
Harvester reach could be restricted by queries such as "all objects of type document" or "all objects published before 1999" or any other query supported by the metadata. Or, given the availability of one or more indexers, which index all the available OAI services, a query such as "all objects for which there are only 9 copies" could be executed. The result would be a list of LSIDs that need to be retrieved by the cache. Of course there will be time lags between index and harvester states, so there will likely end up being more than 10 copies of objects per cache, but is that really a problem?
All the pieces necessary for building such a system already exist in the WASABI framework - LSID assignment, OAI server, OAI harvester, indexer, cache. The only real modification is to adapt the WASABI server to store objects along with their metadata, but this was kind of planned to support media objects. I don't mean to preach WASABIsh here, such a topic has been on my mind for a while (actually distributed object storage, not just documents). TAPIR and other protocols would probably work just fine as well with some modifications.
It seems pretty simple, but perhaps I'm missing some important pieces?
cheers, Dave V.
Richard Pyle said the following on 28-11-2006 09:05:
Great article, Markus! Very similar to what I had in mind.
I've never
visited BitTorrent, but I gather that its structure and
function are
not altogether different from the original Napster. Your
description
of a system that monitors available copies of any digital
document and
automatically ensures that a minimum number of copies are extant is *exactly* what I was thinking. In my view, there wouldn't
be only one
"hall monitor" server, but dozens or hundreds (likely
correlated with
major institutions or hard-core individuals with ample
available storage space).
And I would probably draw the line for minimum number of copies at closer to 100 or so, and also include algorithms to ensure they are adequately distributed on geographic scales. Obviously,
GUIDs would be
a critical component of such a system.
It's a much bigger issue than our community is able to
solve, I think
-- but certainly we could implement some pilot projects along these lines for our own data needs, to see how such a system
might work within our context.
Aloha, Rich
-----Original Message----- From: tdwg-guid-bounces@mailman.nhm.ku.edu [mailto:tdwg-guid-bounces@mailman.nhm.ku.edu] On Behalf Of
"Döring,
Markus" Sent: Monday, November 27, 2006 5:56 AM To: tdwg-guid@mailman.nhm.ku.edu Subject: Re: [Tdwg-guid] Demise of Phyloinformatics journal
Richards post and Napster keyword reminded me of a vague
idea I had
for some time to use P2P networks like bittorrent as an persitent storage space. You can read about it a bit more closely here:
http://www.pywrapper.com/markus/blog/2006/using-bittorrent-as- a-persistent-storage-space/
Don't take it as a real proposal, but I like the general
idea if it.
It might even have been done already within the GRID
community. But
it conveys the original internet idea of distributing
resources and
minimizing impact if a nodes gets lost.
A quite nice discussion by the way. Markus -- Markus Döring Botanic Garden and Botanical Museum Berlin Dahlem, Dept. of Biodiversity Informatics Königin-Luise-Str. 6-8, D-14191 Berlin Phone: +49 30 83850-284 Email: m.doering@bgbm.org URL: http://www.bgbm.org/BioDivInf/
-----Original Message----- From: tdwg-guid-bounces@mailman.nhm.ku.edu [mailto:tdwg-guid-bounces@mailman.nhm.ku.edu] On Behalf
Of Richard
Pyle Sent: Sonntag, 26. November 2006 20:42 To: 'P. Bryan Heidorn'; tdwg-guid@mailman.nhm.ku.edu; 'Taxacom' Subject: Re: [Tdwg-guid] Demise of Phyloinformatics journal
I only just now read Bryan Heidorn's excellent post on this topic (below). One thing I would add is that the nature of the
internet and
electronic information allow us opportunities to ensure
permanence and
access that were either impossible, or prohibitively
expensive even a
decade ago. Imagine, for example, an internet protocol
that allowed
both institutions and individuals to "plug in" and expose their digitial catalogs of stored electronic publications (and other resources) such that the whereabouts of literally thousands
of copies
of every electronic publication could be known to anyone.
The system I
envision is somewhat of a cross between existing protocols for interlibrary loan, and the original Napster. Certainly all
sorts of
copyright issues need to be sorted out, but these are short-term problems (less than a century), compared to the long-term (multi-millenia?) issue of information persistence. The point is, knowing the whereabaouts of extant copies of digital documents, coupled with the amazing ease and low cost of duplication
and global
dissemination (not to mention plummeting costs of
electronic storage
media), would virtually guarantee the long-term persistence
of digital
information.
Any system is, of course, vulnerable to the collapse (or major perturbation) of human civilization. And the electronic
translator
problem I alluded to in an earlier post cannot be
ignored. But to
pretend that the potential doesn't exist or shouldn't be actively pursued is pure folly, in my opinion.
Aloha, Rich
Richard L. Pyle, PhD Database Coordinator for Natural Sciences and Associate Zoologist in Ichthyology Department of Natural Sciences, Bishop Museum 1525 Bernice St., Honolulu, HI 96817 Ph: (808)848-4115, Fax: (808)847-8252 email: deepreef@bishopmuseum.org http://hbs.bishopmuseum.org/staff/pylerichard.html
-----Original Message----- From: tdwg-guid-bounces@mailman.nhm.ku.edu [mailto:tdwg-guid-bounces@mailman.nhm.ku.edu] On Behalf Of P. Bryan Heidorn Sent: Friday, November 24, 2006 8:22 AM To: tdwg-guid@mailman.nhm.ku.edu; Taxacom Subject: Re: [Tdwg-guid] Demise of Phyloinformatics journal
The problem and solution has less to do with the Internet
and more
to do with institutional longevity. The permanence of paper has less to do with acid free
paper and more
to do with the relative permanence of the institutions
that house
them. Most paper documents over a hundred years old have
been lost
forever because there were no permanent institutions to
hold them
until the advent of public and academic libraries. Papers in individual scientists collections are discarded when they
die. War
and economic upheavals left paper in rain and fire. It is
foolhardy
to assume that what is on paper is safe.
We know that dissemination of information in electronic
form is must
more economical than paper dissemination. The issue is
development
of proper institutions with adequate stable funding to
develop and
maintain copies into "perpetuity". Commercial publishers, are clearly not the answer for
preservation.
Corporations and publishers go out of business all the
time. It is
only because libraries kept paper copies that we still have a record.
Digital preservation and access problems exist for all
sciences and
government documents so there is no need to the biodiversity community to go it alone on this. We are just in the
beginning of
the digital publishing history and have not yet
established adequate
preservation mechanisms within libraries to handle data
curation,
preservation and access in all the situations where it is
necessary.
There are projects underway world wide to address this issue. In the United States the Library of Congress The
National Digital
Information Infrastructure and Preservation Program http:// www.digitalpreservation.gov/ is one example. The U.S. Government agency the Institute of Museum and Library Services (IMLS) http:// www.imls.gov/ began grant programs to train librarians and museum curators in digital librarianship and most recently in digital data curation http://www.imls.gov/applicants/grants/ 21centuryLibrarian.shtm is addressing the education issues. The University of North Carolina
http://www.ils.unc.edu/digccurr2007/
papers.html and the University of Illinois
DCEP/ have begun working on best practices and education.
This week
say the successful Data Curation Conference (DCC) in Glasgow, Scotland http://www.dcc.ac.uk/events/dcc-2006/. DCC will
be running
"Long-term Curation and Preservation of Journals" 31 January 2007. (as an aside, at DCC conference I saw
results of a
survey in "Attitudes and aspirations in a diverse world:
the Project
StORe perspective on scientific repositories" Graham Pryor, University of Edinburgh http://www.dcc.ac.uk/events/dcc-2006/ programme/presentations/g-pryor.ppt that more scientists trusted publishers to save their digital documents than their home institutions and libraries! It is clear that scientists are generally not trained in economics and that the information technology management of many institutions must be abysmal!
We need something like to 5 institution rule for distribution to apply for digital documents. Digital documents need to be
replicated
as well for both access and preservation. Institutions like the Internet Archive help with some of
the current
problems. Institutional Repositories (IR) are another. Many
universities and
libraries world wide are beginning these. It is authors' responsibility to deposit their publications in these
institutions
and to support their creation. JSTOR and other institutions also exist. They all have their weaknesses and additional research, development and funding is needed to adequately address
the issues.
Also, all journals need to be managed using good data curation principles but al too often the publishers in spite of best intentions are not educated in such issues.
Digital publishing of taxonomic literature are not the
full answer
for current poor dissemination of taxonomic literature.
The deposit
of a published name in five institutions is a
preservation rule, not
a dissemination rule. We hurt science and human health
is we do not
at the same time address the information access issue.
We need to
aspire to better dissemination and preservation. Electronic publishing will help but only if appropriate institutions
in place.
On the smaller issue, DOIs for publications, electronic
or paper is
a no-brainer. URLs were never designed to be permanent.
URLs were
designed to be reused and be flexible. With DOIs we can place the same paper in multiple digital or physical locations and reliably find copies.
Bryan Heidorn
P. Bryan Heidorn Graduate School of Library and
Information
Science pheidorn@uiuc.edu University of Illinois at Urbana-Champaign MC-493 (V)217/ 244-7792 501 East Daniel St., Champaign, IL
61820-6212
(F)217/ 244-3302 https://netfiles.uiuc.edu/pheidorn/www
On Nov 24, 2006, at 9:54 AM, Renato De Giovanni wrote:
> Rod, > > Thanks for sharing with us the information. I already
imagined that
> things like that could happen, but it's always better to argue having > real examples. > > Anyway, just in case someone reading the story decides to blame URLs, > I just wanted to say that in my opinion the main issue
here is not
> the technology or the GUID format being used. It's the business model > and the management strategy. > > I can easily imagine similar things happening to DOIs, LSIDs or other > kinds of issued GUIDs if the institution(s) behind them simply > disappear. > > Best Regards, > > Renato > -- > IT Researcher > CRIA - Reference Center on Environmental Information > http://www.cria.org.br/ > > On 24 Nov 2006 at 13:37, Roderic Page wrote: > >> The Open Access web-only journal "Phyloinformatics"
seems to have
>> disappeared, with the Internet address http:// >> www.phyloinformatics.org now up for sale. This means the
articles
>> have just disappeared! >> >> There weren't many papers published, but some were interesting and >> have >> been cited in the mainstream literature. >> >> This also illustrates the problems with linking to digital resources >> using URLs, as opposed to identifiers such as DOIs. With the loss of >> the domain name, this journal has effectively died. >> >> A sobering lesson... >> >> Regards >> >> Rod > > _______________________________________________ > TDWG-GUID mailing list > TDWG-GUID@mailman.nhm.ku.edu > http://mailman.nhm.ku.edu/mailman/listinfo/tdwg-guid
TDWG-GUID mailing list TDWG-GUID@mailman.nhm.ku.edu http://mailman.nhm.ku.edu/mailman/listinfo/tdwg-guid
TDWG-GUID mailing list TDWG-GUID@mailman.nhm.ku.edu http://mailman.nhm.ku.edu/mailman/listinfo/tdw%3E g-guid
TDWG-GUID mailing list TDWG-GUID@mailman.nhm.ku.edu http://mailman.nhm.ku.edu/mailman/listinfo/tdwg-guid
TDWG-GUID mailing list TDWG-GUID@mailman.nhm.ku.edu http://mailman.nhm.ku.edu/mailman/listinfo/tdwg-guid
TDWG-GUID mailing list TDWG-GUID@mailman.nhm.ku.edu http://mailman.nhm.ku.edu/mailman/listinfo/tdwg-guid
As a sidebar to this topic, one of the techiques developing in digital preservation is the use of unique identifiers such as PUIDs to unquely identify data formats. These can be used in conjunction with data format registries, such as PRONOM, to be alerted to the "expiry" of support for a particular format and to determine a viable migration path.
An introduction to this can be found on my favourite site: http://en.wikipedia.org/wiki/PRONOM_technical_registry
There is an opensource tool called DROID that can be run on repositories to automatically identify formats: http://droid.sourceforge.net/wiki/index.php/Development_History
Neil ------- Neil Thomson Head of Data & Digital Systems The Natural History Museum, Cromwell Road, London SW7 5BD Tel: +44 (0)20 7942 5294, Fax: +44 (0)20 7942 5559, Email: n.thomson@nhm.ac.uk http://www.nhm.ac.uk
-----Original Message----- From: tdwg-guid-bounces@mailman.nhm.ku.edu [mailto:tdwg-guid-bounces@mailman.nhm.ku.edu] On Behalf Of Richard Pyle Sent: 28 November 2006 19:40 To: 'Dave Vieglais' Cc: tdwg-guid@mailman.nhm.ku.edu Subject: Re: [Tdwg-guid] Demise of Phyloinformatics journal
I certainly agree with Dave that the technology exists, and I agree with Roger that we seem to be on the right path. My comment about it being "a much bigger issue than our community is able to solve" was more along the lines of ensuring persistence for centuries or millenia. The Library of Congress was appropriated $100 million to deal with this issue (http://www.digitalpreservation.gov/about/index.html), which is a just a bit more than we have access to. The real problem, of course, is that because digital media have existed for only a few decades, we don't have an established track record to say, with adequate confidence, that we "know" how to preserve digital data for centuries or millenia (in the way that some paper-based media have survived for such periods of time). This is why the system along the lines of what we're discussing can only really be thought of as a "pilot project".
The fact of the matter is, we don't really need to have confidence that our system is good enough to perservere for centuries. We only have to be confident that it will perservere until technology establishes a system that *will* survive for centuries. If we're lucky, that probably will happen within the next few decades.
So...to echo Roger, "the sun is shining and I feel we are heading in the right direction" -- Rumsfeldisms notwithstanding.
Aloha, Rich
Richard L. Pyle, PhD Database Coordinator for Natural Sciences and Associate Zoologist in Ichthyology Department of Natural Sciences, Bishop Museum 1525 Bernice St., Honolulu, HI 96817 Ph: (808)848-4115, Fax: (808)847-8252 email: deepreef@bishopmuseum.org http://hbs.bishopmuseum.org/staff/pylerichard.html
-----Original Message----- From: Dave Vieglais [mailto:vieglais@ku.edu] Sent: Monday, November 27, 2006 5:08 PM To: Richard Pyle Cc: "'"Döring, Markus"'"; tdwg-guid@mailman.nhm.ku.edu Subject: Re: [Tdwg-guid] Demise of Phyloinformatics journal
I think such a system is quite well within the grasp of this community- even without any particularly novel new developments. We have a system for unique IDs (LSIDs) which can be assigned to each document (actually each combination of object + metadata). Assuming the documents are stored in an environment exposed by a protocol such as OAI (Open Archives Initiative), a harvester could easily retrieve copies of documents (actually any objects with IDs). There's nothing to stop the harvester cache being exposed by the same protocol. With a group of these harvester + OAI servers, and no limits on subscriptions then each harvester would have a copy of everything, probably an undesirable outcome.
Harvester reach could be restricted by queries such as "all objects of type document" or "all objects published before 1999" or any other query supported by the metadata. Or, given the availability of one or more indexers, which index all the available OAI services, a query such as "all objects for which there are only 9 copies" could be executed. The result would be a list of LSIDs that need to be retrieved by the cache. Of course there will be time lags between index and harvester states, so there will likely end up being more than 10 copies of objects per cache, but is that really a problem?
All the pieces necessary for building such a system already exist in the WASABI framework - LSID assignment, OAI server, OAI harvester, indexer, cache. The only real modification is to adapt the WASABI server to store objects along with their metadata, but this was kind of planned to support media objects. I don't mean to preach WASABIsh here, such a topic has been on my mind for a while (actually distributed object storage, not just documents). TAPIR and other protocols would probably work just fine as well with some modifications.
It seems pretty simple, but perhaps I'm missing some important pieces?
cheers, Dave V.
Richard Pyle said the following on 28-11-2006 09:05:
Great article, Markus! Very similar to what I had in mind.
I've never
visited BitTorrent, but I gather that its structure and
function are
not altogether different from the original Napster. Your
description
of a system that monitors available copies of any digital
document and
automatically ensures that a minimum number of copies are extant is *exactly* what I was thinking. In my view, there wouldn't
be only one
"hall monitor" server, but dozens or hundreds (likely
correlated with
major institutions or hard-core individuals with ample
available storage space).
And I would probably draw the line for minimum number of copies at closer to 100 or so, and also include algorithms to ensure they are adequately distributed on geographic scales. Obviously,
GUIDs would be
a critical component of such a system.
It's a much bigger issue than our community is able to
solve, I think
-- but certainly we could implement some pilot projects along these lines for our own data needs, to see how such a system
might work within our context.
Aloha, Rich
-----Original Message----- From: tdwg-guid-bounces@mailman.nhm.ku.edu [mailto:tdwg-guid-bounces@mailman.nhm.ku.edu] On Behalf Of
"Döring,
Markus" Sent: Monday, November 27, 2006 5:56 AM To: tdwg-guid@mailman.nhm.ku.edu Subject: Re: [Tdwg-guid] Demise of Phyloinformatics journal
Richards post and Napster keyword reminded me of a vague
idea I had
for some time to use P2P networks like bittorrent as an persitent storage space. You can read about it a bit more closely here:
http://www.pywrapper.com/markus/blog/2006/using-bittorrent-as- a-persistent-storage-space/
Don't take it as a real proposal, but I like the general
idea if it.
It might even have been done already within the GRID
community. But
it conveys the original internet idea of distributing
resources and
minimizing impact if a nodes gets lost.
A quite nice discussion by the way. Markus -- Markus Döring Botanic Garden and Botanical Museum Berlin Dahlem, Dept. of Biodiversity Informatics Königin-Luise-Str. 6-8, D-14191 Berlin Phone: +49 30 83850-284 Email: m.doering@bgbm.org URL: http://www.bgbm.org/BioDivInf/
-----Original Message----- From: tdwg-guid-bounces@mailman.nhm.ku.edu [mailto:tdwg-guid-bounces@mailman.nhm.ku.edu] On Behalf
Of Richard
Pyle Sent: Sonntag, 26. November 2006 20:42 To: 'P. Bryan Heidorn'; tdwg-guid@mailman.nhm.ku.edu; 'Taxacom' Subject: Re: [Tdwg-guid] Demise of Phyloinformatics journal
I only just now read Bryan Heidorn's excellent post on this topic (below). One thing I would add is that the nature of the
internet and
electronic information allow us opportunities to ensure
permanence and
access that were either impossible, or prohibitively
expensive even a
decade ago. Imagine, for example, an internet protocol
that allowed
both institutions and individuals to "plug in" and expose their digitial catalogs of stored electronic publications (and other resources) such that the whereabouts of literally thousands
of copies
of every electronic publication could be known to anyone.
The system I
envision is somewhat of a cross between existing protocols for interlibrary loan, and the original Napster. Certainly all
sorts of
copyright issues need to be sorted out, but these are short-term problems (less than a century), compared to the long-term (multi-millenia?) issue of information persistence. The point is, knowing the whereabaouts of extant copies of digital documents, coupled with the amazing ease and low cost of duplication
and global
dissemination (not to mention plummeting costs of
electronic storage
media), would virtually guarantee the long-term persistence
of digital
information.
Any system is, of course, vulnerable to the collapse (or major perturbation) of human civilization. And the electronic
translator
problem I alluded to in an earlier post cannot be
ignored. But to
pretend that the potential doesn't exist or shouldn't be actively pursued is pure folly, in my opinion.
Aloha, Rich
Richard L. Pyle, PhD Database Coordinator for Natural Sciences and Associate Zoologist in Ichthyology Department of Natural Sciences, Bishop Museum 1525 Bernice St., Honolulu, HI 96817 Ph: (808)848-4115, Fax: (808)847-8252 email: deepreef@bishopmuseum.org http://hbs.bishopmuseum.org/staff/pylerichard.html
-----Original Message----- From: tdwg-guid-bounces@mailman.nhm.ku.edu [mailto:tdwg-guid-bounces@mailman.nhm.ku.edu] On Behalf Of P. Bryan Heidorn Sent: Friday, November 24, 2006 8:22 AM To: tdwg-guid@mailman.nhm.ku.edu; Taxacom Subject: Re: [Tdwg-guid] Demise of Phyloinformatics journal
The problem and solution has less to do with the Internet
and more
to do with institutional longevity. The permanence of paper has less to do with acid free
paper and more
to do with the relative permanence of the institutions
that house
them. Most paper documents over a hundred years old have
been lost
forever because there were no permanent institutions to
hold them
until the advent of public and academic libraries. Papers in individual scientists collections are discarded when they
die. War
and economic upheavals left paper in rain and fire. It is
foolhardy
to assume that what is on paper is safe.
We know that dissemination of information in electronic
form is must
more economical than paper dissemination. The issue is
development
of proper institutions with adequate stable funding to
develop and
maintain copies into "perpetuity". Commercial publishers, are clearly not the answer for
preservation.
Corporations and publishers go out of business all the
time. It is
only because libraries kept paper copies that we still have a record.
Digital preservation and access problems exist for all
sciences and
government documents so there is no need to the biodiversity community to go it alone on this. We are just in the
beginning of
the digital publishing history and have not yet
established adequate
preservation mechanisms within libraries to handle data
curation,
preservation and access in all the situations where it is
necessary.
There are projects underway world wide to address this issue. In the United States the Library of Congress The
National Digital
Information Infrastructure and Preservation Program http:// www.digitalpreservation.gov/ is one example. The U.S. Government agency the Institute of Museum and Library Services (IMLS) http:// www.imls.gov/ began grant programs to train librarians and museum curators in digital librarianship and most recently in digital data curation http://www.imls.gov/applicants/grants/ 21centuryLibrarian.shtm is addressing the education issues. The University of North Carolina
http://www.ils.unc.edu/digccurr2007/
papers.html and the University of Illinois
DCEP/ have begun working on best practices and education.
This week
say the successful Data Curation Conference (DCC) in Glasgow, Scotland http://www.dcc.ac.uk/events/dcc-2006/. DCC will
be running
"Long-term Curation and Preservation of Journals" 31 January 2007. (as an aside, at DCC conference I saw
results of a
survey in "Attitudes and aspirations in a diverse world:
the Project
StORe perspective on scientific repositories" Graham Pryor, University of Edinburgh http://www.dcc.ac.uk/events/dcc-2006/ programme/presentations/g-pryor.ppt that more scientists trusted publishers to save their digital documents than their home institutions and libraries! It is clear that scientists are generally not trained in economics and that the information technology management of many institutions must be abysmal!
We need something like to 5 institution rule for distribution to apply for digital documents. Digital documents need to be
replicated
as well for both access and preservation. Institutions like the Internet Archive help with some of
the current
problems. Institutional Repositories (IR) are another. Many
universities and
libraries world wide are beginning these. It is authors' responsibility to deposit their publications in these
institutions
and to support their creation. JSTOR and other institutions also exist. They all have their weaknesses and additional research, development and funding is needed to adequately address
the issues.
Also, all journals need to be managed using good data curation principles but al too often the publishers in spite of best intentions are not educated in such issues.
Digital publishing of taxonomic literature are not the
full answer
for current poor dissemination of taxonomic literature.
The deposit
of a published name in five institutions is a
preservation rule, not
a dissemination rule. We hurt science and human health
is we do not
at the same time address the information access issue.
We need to
aspire to better dissemination and preservation. Electronic publishing will help but only if appropriate institutions
in place.
On the smaller issue, DOIs for publications, electronic
or paper is
a no-brainer. URLs were never designed to be permanent.
URLs were
designed to be reused and be flexible. With DOIs we can place the same paper in multiple digital or physical locations and reliably find copies.
Bryan Heidorn
P. Bryan Heidorn Graduate School of Library and
Information
Science pheidorn@uiuc.edu University of Illinois at Urbana-Champaign MC-493 (V)217/ 244-7792 501 East Daniel St., Champaign, IL
61820-6212
(F)217/ 244-3302 https://netfiles.uiuc.edu/pheidorn/www
On Nov 24, 2006, at 9:54 AM, Renato De Giovanni wrote:
Rod,
Thanks for sharing with us the information. I already
imagined that
things like that could happen, but it's always better to
argue having
real examples.
Anyway, just in case someone reading the story decides to
blame URLs,
I just wanted to say that in my opinion the main issue
here is not
the technology or the GUID format being used. It's the
business model
and the management strategy.
I can easily imagine similar things happening to DOIs,
LSIDs or other
kinds of issued GUIDs if the institution(s) behind them simply disappear.
Best Regards,
Renato
IT Researcher CRIA - Reference Center on Environmental Information http://www.cria.org.br/
On 24 Nov 2006 at 13:37, Roderic Page wrote:
> The Open Access web-only journal "Phyloinformatics"
seems to have
> disappeared, with the Internet address http:// > www.phyloinformatics.org now up for sale. This means the
articles
> have just disappeared! > > There weren't many papers published, but some were
interesting and
> have > been cited in the mainstream literature. > > This also illustrates the problems with linking to digital
resources
> using URLs, as opposed to identifiers such as DOIs. With
the loss of
> the domain name, this journal has effectively died. > > A sobering lesson... > > Regards > > Rod
TDWG-GUID mailing list TDWG-GUID@mailman.nhm.ku.edu http://mailman.nhm.ku.edu/mailman/listinfo/tdwg-guid
TDWG-GUID mailing list TDWG-GUID@mailman.nhm.ku.edu http://mailman.nhm.ku.edu/mailman/listinfo/tdwg-guid
TDWG-GUID mailing list TDWG-GUID@mailman.nhm.ku.edu http://mailman.nhm.ku.edu/mailman/listinfo/tdw%3E g-guid
TDWG-GUID mailing list TDWG-GUID@mailman.nhm.ku.edu http://mailman.nhm.ku.edu/mailman/listinfo/tdwg-guid
TDWG-GUID mailing list TDWG-GUID@mailman.nhm.ku.edu http://mailman.nhm.ku.edu/mailman/listinfo/tdwg-guid
_______________________________________________ TDWG-GUID mailing list TDWG-GUID@mailman.nhm.ku.edu http://mailman.nhm.ku.edu/mailman/listinfo/tdwg-guid
Dear all,
The Phyloinformatics journal shows how easy social contracts can fail and formerly trusted archives just disappear... There are some initiatives which aim to provide standards for the preservation and archiving of digital contents. For example the OAIS reference model (http://public.ccsds.org/publications/archive/650x0b1.pdf) which aims to become ISO. Or here in germany the the DINI initiative (http://www.dini.de/) and the NESTOR project (http://www.langzeitarchivierung.de/index.php?newlang=eng). Both german initiatives aim to provide certificates for long term archives, I guess similar to the quality standard ISO 9001 .
It would be very interesting to me if you knew similar initiatives from your countries?
Today, it is impossible to estimate how good the ability of a content provider (including taxonomic databases) for real long term archiving really is . Certificates could help a lot I think, especially if they would require a 'fall back strategy' of the candidate archive. Such a strategy could simple be to nominate a partner archive which would store the content on their servers in case of emergency (and resolve the LSIDs in our case).
best regards, Robert
-----Ursprüngliche Nachricht----- Von: tdwg-guid-bounces@mailman.nhm.ku.edu [mailto:tdwg-guid-bounces@mailman.nhm.ku.edu]Im Auftrag von P. Bryan Heidorn Gesendet: Dienstag, 28. November 2006 23:14 An: Richard Pyle Cc: tdwg-guid@mailman.nhm.ku.edu Betreff: Re: [Tdwg-guid] Demise of Phyloinformatics journal
It is important to inform the Library of Congress and other entities about the special needs of the taxonomic community so that they can put the appropriate mechanisms into place. The library community solution including, academic libraries, LC and its counterparts in other countries do make a fairly survivable system that might indeed survive some major upheavals. The biodiversity community need not go it alone. It is better to share the deeper pockets of nuclear physics, astronomy, and medicine. We have unique needs but we can rely on subsystems put into place for these other sciences.
Also, currently, going digital does not mean not using paper. For the near future it would be good to have a few paper copies of important documents along with the digital copy. Also, if we have digital copy, at any point in the future, assuming we still have trees to make paper, we can decide to abandon the digital systems and print out what we want. It is not an either/or decision not and will not be in the future either.
On Nov 28, 2006, at 1:40 PM, Richard Pyle wrote:
I certainly agree with Dave that the technology exists, and I agree with Roger that we seem to be on the right path. My comment about it being "a much bigger issue than our community is able to solve" was more along the lines of ensuring persistence for centuries or millenia. The Library of Congress was appropriated $100 million to deal with this issue (http://www.digitalpreservation.gov/about/index.html), which is a just a bit more than we have access to. The real problem, of course, is that because digital media have existed for only a few decades, we don't have an established track record to say, with adequate confidence, that we "know" how to preserve digital data for centuries or millenia (in the way that some paper-based media have survived for such periods of time). This is why the system along the lines of what we're discussing can only really be thought of as a "pilot project".
The fact of the matter is, we don't really need to have confidence that our system is good enough to perservere for centuries. We only have to be confident that it will perservere until technology establishes a system that *will* survive for centuries. If we're lucky, that probably will happen within the next few decades.
So...to echo Roger, "the sun is shining and I feel we are heading in the right direction" -- Rumsfeldisms notwithstanding.
Aloha, Rich
Richard L. Pyle, PhD Database Coordinator for Natural Sciences and Associate Zoologist in Ichthyology Department of Natural Sciences, Bishop Museum 1525 Bernice St., Honolulu, HI 96817 Ph: (808)848-4115, Fax: (808)847-8252 email: deepreef@bishopmuseum.org http://hbs.bishopmuseum.org/staff/pylerichard.html
-----Original Message----- From: Dave Vieglais [mailto:vieglais@ku.edu] Sent: Monday, November 27, 2006 5:08 PM To: Richard Pyle Cc: "'"Döring, Markus"'"; tdwg-guid@mailman.nhm.ku.edu Subject: Re: [Tdwg-guid] Demise of Phyloinformatics journal
I think such a system is quite well within the grasp of this community- even without any particularly novel new developments. We have a system for unique IDs (LSIDs) which can be assigned to each document (actually each combination of object + metadata). Assuming the documents are stored in an environment exposed by a protocol such as OAI (Open Archives Initiative), a harvester could easily retrieve copies of documents (actually any objects with IDs). There's nothing to stop the harvester cache being exposed by the same protocol. With a group of these harvester + OAI servers, and no limits on subscriptions then each harvester would have a copy of everything, probably an undesirable outcome.
Harvester reach could be restricted by queries such as "all objects of type document" or "all objects published before 1999" or any other query supported by the metadata. Or, given the availability of one or more indexers, which index all the available OAI services, a query such as "all objects for which there are only 9 copies" could be executed. The result would be a list of LSIDs that need to be retrieved by the cache. Of course there will be time lags between index and harvester states, so there will likely end up being more than 10 copies of objects per cache, but is that really a problem?
All the pieces necessary for building such a system already exist in the WASABI framework - LSID assignment, OAI server, OAI harvester, indexer, cache. The only real modification is to adapt the WASABI server to store objects along with their metadata, but this was kind of planned to support media objects. I don't mean to preach WASABIsh here, such a topic has been on my mind for a while (actually distributed object storage, not just documents). TAPIR and other protocols would probably work just fine as well with some modifications.
It seems pretty simple, but perhaps I'm missing some important pieces?
cheers, Dave V.
Richard Pyle said the following on 28-11-2006 09:05: Great article, Markus! Very similar to what I had in mind. I've never visited BitTorrent, but I gather that its structure and function are not altogether different from the original Napster. Your description of a system that monitors available copies of any digital document and automatically ensures that a minimum number of copies are extant is *exactly* what I was thinking. In my view, there wouldn't be only one "hall monitor" server, but dozens or hundreds (likely correlated with major institutions or hard-core individuals with ample available storage space). And I would probably draw the line for minimum number of copies at closer to 100 or so, and also include algorithms to ensure they are adequately distributed on geographic scales. Obviously, GUIDs would be a critical component of such a system.
It's a much bigger issue than our community is able to solve, I think -- but certainly we could implement some pilot projects along these lines for our own data needs, to see how such a system might work within our context.
Aloha, Rich
-----Original Message----- From: tdwg-guid-bounces@mailman.nhm.ku.edu [mailto:tdwg-guid-bounces@mailman.nhm.ku.edu] On Behalf Of "Döring, Markus" Sent: Monday, November 27, 2006 5:56 AM To: tdwg-guid@mailman.nhm.ku.edu Subject: Re: [Tdwg-guid] Demise of Phyloinformatics journal
Richards post and Napster keyword reminded me of a vague idea I had for some time to use P2P networks like bittorrent as an persitent storage space. You can read about it a bit more closely here:
http://www.pywrapper.com/markus/blog/2006/using-bittorrent-as- a-persistent-storage-space/
Don't take it as a real proposal, but I like the general idea if it. It might even have been done already within the GRID community. But it conveys the original internet idea of distributing resources and minimizing impact if a nodes gets lost.
A quite nice discussion by the way. Markus -- Markus Döring Botanic Garden and Botanical Museum Berlin Dahlem, Dept. of Biodiversity Informatics Königin-Luise-Str. 6-8, D-14191 Berlin Phone: +49 30 83850-284 Email: m.doering@bgbm.org URL: http://www.bgbm.org/BioDivInf/
-----Original Message----- From: tdwg-guid-bounces@mailman.nhm.ku.edu [mailto:tdwg-guid-bounces@mailman.nhm.ku.edu] On Behalf Of Richard Pyle Sent: Sonntag, 26. November 2006 20:42 To: 'P. Bryan Heidorn'; tdwg-guid@mailman.nhm.ku.edu; 'Taxacom' Subject: Re: [Tdwg-guid] Demise of Phyloinformatics journal
I only just now read Bryan Heidorn's excellent post on this topic (below). One thing I would add is that the nature of the internet and electronic information allow us opportunities to ensure permanence and access that were either impossible, or prohibitively expensive even a decade ago. Imagine, for example, an internet protocol that allowed both institutions and individuals to "plug in" and expose their digitial catalogs of stored electronic publications (and other resources) such that the whereabouts of literally thousands of copies of every electronic publication could be known to anyone. The system I envision is somewhat of a cross between existing protocols for interlibrary loan, and the original Napster. Certainly all sorts of copyright issues need to be sorted out, but these are short-term problems (less than a century), compared to the long-term (multi-millenia?) issue of information persistence. The point is, knowing the whereabaouts of extant copies of digital documents, coupled with the amazing ease and low cost of duplication and global dissemination (not to mention plummeting costs of electronic storage media), would virtually guarantee the long-term persistence of digital information.
Any system is, of course, vulnerable to the collapse (or major perturbation) of human civilization. And the electronic translator problem I alluded to in an earlier post cannot be ignored. But to pretend that the potential doesn't exist or shouldn't be actively pursued is pure folly, in my opinion.
Aloha, Rich
Richard L. Pyle, PhD Database Coordinator for Natural Sciences and Associate Zoologist in Ichthyology Department of Natural Sciences, Bishop Museum 1525 Bernice St., Honolulu, HI 96817 Ph: (808)848-4115, Fax: (808)847-8252 email: deepreef@bishopmuseum.org http://hbs.bishopmuseum.org/staff/pylerichard.html
-----Original Message----- From: tdwg-guid-bounces@mailman.nhm.ku.edu [mailto:tdwg-guid-bounces@mailman.nhm.ku.edu] On Behalf Of P. Bryan Heidorn Sent: Friday, November 24, 2006 8:22 AM To: tdwg-guid@mailman.nhm.ku.edu; Taxacom Subject: Re: [Tdwg-guid] Demise of Phyloinformatics journal
The problem and solution has less to do with the Internet and more to do with institutional longevity. The permanence of paper has less to do with acid free paper and more to do with the relative permanence of the institutions that house them. Most paper documents over a hundred years old have been lost forever because there were no permanent institutions to hold them until the advent of public and academic libraries. Papers in individual scientists collections are discarded when they die. War and economic upheavals left paper in rain and fire. It is foolhardy to assume that what is on paper is safe.
We know that dissemination of information in electronic form is must more economical than paper dissemination. The issue is development of proper institutions with adequate stable funding to develop and maintain copies into "perpetuity". Commercial publishers, are clearly not the answer for preservation. Corporations and publishers go out of business all the time. It is only because libraries kept paper copies that we still have a record.
Digital preservation and access problems exist for all sciences and government documents so there is no need to the biodiversity community to go it alone on this. We are just in the beginning of the digital publishing history and have not yet established adequate preservation mechanisms within libraries to handle data curation, preservation and access in all the situations where it is necessary. There are projects underway world wide to address this issue. In the United States the Library of Congress The National Digital Information Infrastructure and Preservation Program http:// www.digitalpreservation.gov/ is one example. The U.S. Government agency the Institute of Museum and Library Services (IMLS) http:// www.imls.gov/ began grant programs to train librarians and museum curators in digital librarianship and most recently in digital data curation http://www.imls.gov/applicants/grants/ 21centuryLibrarian.shtm is addressing the education issues. The University of North Carolina http://www.ils.unc.edu/digccurr2007/ papers.html and the University of Illinois http://sci.lis.uiuc.edu/ DCEP/ have begun working on best practices and education. This week say the successful Data Curation Conference (DCC) in Glasgow, Scotland http://www.dcc.ac.uk/events/dcc-2006/. DCC will be running "Long-term Curation and Preservation of Journals" 31 January 2007. (as an aside, at DCC conference I saw results of a survey in "Attitudes and aspirations in a diverse world: the Project StORe perspective on scientific repositories" Graham Pryor, University of Edinburgh http://www.dcc.ac.uk/events/dcc-2006/ programme/presentations/g-pryor.ppt that more scientists trusted publishers to save their digital documents than their home institutions and libraries! It is clear that scientists are generally not trained in economics and that the information technology management of many institutions must be abysmal!
We need something like to 5 institution rule for distribution to apply for digital documents. Digital documents need to be replicated as well for both access and preservation. Institutions like the Internet Archive help with some of the current problems. Institutional Repositories (IR) are another. Many universities and libraries world wide are beginning these. It is authors' responsibility to deposit their publications in these institutions and to support their creation. JSTOR and other institutions also exist. They all have their weaknesses and additional research, development and funding is needed to adequately address the issues. Also, all journals need to be managed using good data curation principles but al too often the publishers in spite of best intentions are not educated in such issues.
Digital publishing of taxonomic literature are not the full answer for current poor dissemination of taxonomic literature. The deposit of a published name in five institutions is a preservation rule, not a dissemination rule. We hurt science and human health is we do not at the same time address the information access issue. We need to aspire to better dissemination and preservation. Electronic publishing will help but only if appropriate institutions in place. On the smaller issue, DOIs for publications, electronic or paper is a no-brainer. URLs were never designed to be permanent. URLs were designed to be reused and be flexible. With DOIs we can place the same paper in multiple digital or physical locations and reliably find copies.
Bryan Heidorn --
-------------------------------------------------------------------- P. Bryan Heidorn Graduate School of Library and Information Science pheidorn@uiuc.edu University of Illinois at Urbana-Champaign MC-493 (V)217/ 244-7792 501 East Daniel St., Champaign, IL 61820-6212 (F)217/ 244-3302 https://netfiles.uiuc.edu/pheidorn/www
On Nov 24, 2006, at 9:54 AM, Renato De Giovanni wrote:
Rod,
Thanks for sharing with us the information. I already imagined that things like that could happen, but it's always better to argue having real examples.
Anyway, just in case someone reading the story decides to blame URLs, I just wanted to say that in my opinion the main issue here is not the technology or the GUID format being used. It's the business model and the management strategy.
I can easily imagine similar things happening to DOIs, LSIDs or other kinds of issued GUIDs if the institution(s) behind them simply disappear.
Best Regards,
Renato -- IT Researcher CRIA - Reference Center on Environmental Information http://www.cria.org.br/
On 24 Nov 2006 at 13:37, Roderic Page wrote:
The Open Access web-only journal "Phyloinformatics" seems to have disappeared, with the Internet address http:// www.phyloinformatics.org now up for sale. This means the articles have just disappeared!
There weren't many papers published, but some were interesting and have been cited in the mainstream literature.
This also illustrates the problems with linking to digital resources using URLs, as opposed to identifiers such as DOIs. With the loss of the domain name, this journal has effectively died.
A sobering lesson...
Regards
Rod
_______________________________________________ TDWG-GUID mailing list TDWG-GUID@mailman.nhm.ku.edu http://mailman.nhm.ku.edu/mailman/listinfo/tdwg-guid
_______________________________________________ TDWG-GUID mailing list TDWG-GUID@mailman.nhm.ku.edu http://mailman.nhm.ku.edu/mailman/listinfo/tdwg-guid
_______________________________________________ TDWG-GUID mailing list TDWG-GUID@mailman.nhm.ku.edu http://mailman.nhm.ku.edu/mailman/listinfo/tdw%3E g-guid
_______________________________________________ TDWG-GUID mailing list TDWG-GUID@mailman.nhm.ku.edu http://mailman.nhm.ku.edu/mailman/listinfo/tdwg-guid
_______________________________________________ TDWG-GUID mailing list TDWG-GUID@mailman.nhm.ku.edu http://mailman.nhm.ku.edu/mailman/listinfo/tdwg-guid
_______________________________________________ TDWG-GUID mailing list TDWG-GUID@mailman.nhm.ku.edu http://mailman.nhm.ku.edu/mailman/listinfo/tdwg-guid
-- -------------------------------------------------------------------- P. Bryan Heidorn Graduate School of Library and Information Science pheidorn@uiuc.edu University of Illinois at Urbana-Champaign MC-493 (V)217/ 244-7792 501 East Daniel St., Champaign, IL 61820-6212 (F)217/ 244-3302 https://netfiles.uiuc.edu/pheidorn/www Online Calendar: http://tinyurl.com/6fd5q Visit the Biobrowser Web site at http://www.biobrowser.org
Hi all:
We are grappling with the same kinds of issues here at the NCBI (Library of Medicine) though our mandate covers the NIH-funded biological literature (as opposed to the NSF-funded literature) so our archive is weak on systematics and such.
PubMed Central (PMC) is a full-text literature archive, which has backscanned complete runs of some participating journals, and is also intended to include all papers published on NIH-funded research. We have also developed a portable version of this system (pPMC) which has been used to set up collaborating national archives in other countries, e.g. UK-PMC.
I just wanted to mention it as a potential model, since it hadn't come up in the discussion yet.
http://www.pubmedcentral.nih.gov/ http://www.ukpmc.org/
Cheers,
:Scott federhen@ncbi.nlm.nih.gov GenBank Taxonomy & LinkOut
On Nov 29, 2006, at 4:26 AM, Robert Huber wrote:
Dear all,
The Phyloinformatics journal shows how easy social contracts can fail and formerly trusted archives just disappear... There are some initiatives which aim to provide standards for the preservation and archiving of digital contents. For example the OAIS reference model (http://public.ccsds.org/ publications/archive/650x0b1.pdf) which aims to become ISO. Or here in germany the the DINI initiative (http:// www.dini.de/) and the NESTOR project (http://www.langzeitarchivierung.de/index.php?newlang=eng). Both german initiatives aim to provide certificates for long term archives, I guess similar to the quality standard ISO 9001 .
It would be very interesting to me if you knew similar initiatives from your countries?
Today, it is impossible to estimate how good the ability of a content provider (including taxonomic databases) for real long term archiving really is . Certificates could help a lot I think, especially if they would require a 'fall back strategy' of the candidate archive. Such a strategy could simple be to nominate a partner archive which would store the content on their servers in case of emergency (and resolve the LSIDs in our case).
best regards, Robert
-----Ursprüngliche Nachricht----- Von: tdwg-guid-bounces@mailman.nhm.ku.edu [mailto:tdwg-guid- bounces@mailman.nhm.ku.edu]Im Auftrag von P. Bryan Heidorn Gesendet: Dienstag, 28. November 2006 23:14 An: Richard Pyle Cc: tdwg-guid@mailman.nhm.ku.edu Betreff: Re: [Tdwg-guid] Demise of Phyloinformatics journal
It is important to inform the Library of Congress and other entities about the special needs of the taxonomic community so that they can put the appropriate mechanisms into place. The library community solution including, academic libraries, LC and its counterparts in other countries do make a fairly survivable system that might indeed survive some major upheavals. The biodiversity community need not go it alone. It is better to share the deeper pockets of nuclear physics, astronomy, and medicine. We have unique needs but we can rely on subsystems put into place for these other sciences.
Also, currently, going digital does not mean not using paper. For the near future it would be good to have a few paper copies of important documents along with the digital copy. Also, if we have digital copy, at any point in the future, assuming we still have trees to make paper, we can decide to abandon the digital systems and print out what we want. It is not an either/or decision not and will not be in the future either.
On Nov 28, 2006, at 1:40 PM, Richard Pyle wrote:
I certainly agree with Dave that the technology exists, and I agree with Roger that we seem to be on the right path. My comment about it being "a much bigger issue than our community is able to solve" was more along the lines of ensuring persistence for centuries or millenia. The Library of Congress was appropriated $100 million to deal with this issue (http://www.digitalpreservation.gov/about/index.html), which is a just a bit more than we have access to. The real problem, of course, is that because digital media have existed for only a few decades, we don't have an established track record to say, with adequate confidence, that we "know" how to preserve digital data for centuries or millenia (in the way that some paper-based media have survived for such periods of time). This is why the system along the lines of what we're discussing can only really be thought of as a "pilot project".
The fact of the matter is, we don't really need to have confidence that our system is good enough to perservere for centuries. We only have to be confident that it will perservere until technology establishes a system that *will* survive for centuries. If we're lucky, that probably will happen within the next few decades.
So...to echo Roger, "the sun is shining and I feel we are heading in the right direction" -- Rumsfeldisms notwithstanding.
Aloha, Rich
Richard L. Pyle, PhD Database Coordinator for Natural Sciences and Associate Zoologist in Ichthyology Department of Natural Sciences, Bishop Museum 1525 Bernice St., Honolulu, HI 96817 Ph: (808)848-4115, Fax: (808)847-8252 email: deepreef@bishopmuseum.org http://hbs.bishopmuseum.org/staff/pylerichard.html
-----Original Message----- From: Dave Vieglais [mailto:vieglais@ku.edu] Sent: Monday, November 27, 2006 5:08 PM To: Richard Pyle Cc: "'"Döring, Markus"'"; tdwg-guid@mailman.nhm.ku.edu Subject: Re: [Tdwg-guid] Demise of Phyloinformatics journal
I think such a system is quite well within the grasp of this community- even without any particularly novel new developments. We have a system for unique IDs (LSIDs) which can be assigned to each document (actually each combination of object + metadata). Assuming the documents are stored in an environment exposed by a protocol such as OAI (Open Archives Initiative), a harvester could easily retrieve copies of documents (actually any objects with IDs). There's nothing to stop the harvester cache being exposed by the same protocol. With a group of these harvester + OAI servers, and no limits on subscriptions then each harvester would have a copy of everything, probably an undesirable outcome.
Harvester reach could be restricted by queries such as "all objects of type document" or "all objects published before 1999" or any other query supported by the metadata. Or, given the availability of one or more indexers, which index all the available OAI services, a query such as "all objects for which there are only 9 copies" could be executed. The result would be a list of LSIDs that need to be retrieved by the cache. Of course there will be time lags between index and harvester states, so there will likely end up being more than 10 copies of objects per cache, but is that really a problem?
All the pieces necessary for building such a system already exist in the WASABI framework - LSID assignment, OAI server, OAI harvester, indexer, cache. The only real modification is to adapt the WASABI server to store objects along with their metadata, but this was kind of planned to support media objects. I don't mean to preach WASABIsh here, such a topic has been on my mind for a while (actually distributed object storage, not just documents). TAPIR and other protocols would probably work just fine as well with some modifications.
It seems pretty simple, but perhaps I'm missing some important pieces?
cheers, Dave V.
Richard Pyle said the following on 28-11-2006 09:05:
Great article, Markus! Very similar to what I had in mind.
I've never
visited BitTorrent, but I gather that its structure and
function are
not altogether different from the original Napster. Your
description
of a system that monitors available copies of any digital
document and
automatically ensures that a minimum number of copies are extant is *exactly* what I was thinking. In my view, there wouldn't
be only one
"hall monitor" server, but dozens or hundreds (likely
correlated with
major institutions or hard-core individuals with ample
available storage space).
And I would probably draw the line for minimum number of copies at closer to 100 or so, and also include algorithms to ensure they are adequately distributed on geographic scales. Obviously,
GUIDs would be
a critical component of such a system.
It's a much bigger issue than our community is able to
solve, I think
-- but certainly we could implement some pilot projects along these lines for our own data needs, to see how such a system
might work within our context.
Aloha, Rich
-----Original Message----- From: tdwg-guid-bounces@mailman.nhm.ku.edu [mailto:tdwg-guid-bounces@mailman.nhm.ku.edu] On Behalf Of
"Döring,
Markus" Sent: Monday, November 27, 2006 5:56 AM To: tdwg-guid@mailman.nhm.ku.edu Subject: Re: [Tdwg-guid] Demise of Phyloinformatics journal
Richards post and Napster keyword reminded me of a vague
idea I had
for some time to use P2P networks like bittorrent as an persitent storage space. You can read about it a bit more closely here:
http://www.pywrapper.com/markus/blog/2006/using-bittorrent-as- a-persistent-storage-space/
Don't take it as a real proposal, but I like the general
idea if it.
It might even have been done already within the GRID
community. But
it conveys the original internet idea of distributing
resources and
minimizing impact if a nodes gets lost.
A quite nice discussion by the way. Markus -- Markus Döring Botanic Garden and Botanical Museum Berlin Dahlem, Dept. of Biodiversity Informatics Königin-Luise-Str. 6-8, D-14191 Berlin Phone: +49 30 83850-284 Email: m.doering@bgbm.org URL: http://www.bgbm.org/BioDivInf/
-----Original Message----- From: tdwg-guid-bounces@mailman.nhm.ku.edu [mailto:tdwg-guid-bounces@mailman.nhm.ku.edu] On Behalf
Of Richard
Pyle Sent: Sonntag, 26. November 2006 20:42 To: 'P. Bryan Heidorn'; tdwg-guid@mailman.nhm.ku.edu; 'Taxacom' Subject: Re: [Tdwg-guid] Demise of Phyloinformatics journal
I only just now read Bryan Heidorn's excellent post on this topic (below). One thing I would add is that the nature of the
internet and
electronic information allow us opportunities to ensure
permanence and
access that were either impossible, or prohibitively
expensive even a
decade ago. Imagine, for example, an internet protocol
that allowed
both institutions and individuals to "plug in" and expose their digitial catalogs of stored electronic publications (and other resources) such that the whereabouts of literally thousands
of copies
of every electronic publication could be known to anyone.
The system I
envision is somewhat of a cross between existing protocols for interlibrary loan, and the original Napster. Certainly all
sorts of
copyright issues need to be sorted out, but these are short-term problems (less than a century), compared to the long-term (multi-millenia?) issue of information persistence. The point is, knowing the whereabaouts of extant copies of digital documents, coupled with the amazing ease and low cost of duplication
and global
dissemination (not to mention plummeting costs of
electronic storage
media), would virtually guarantee the long-term persistence
of digital
information.
Any system is, of course, vulnerable to the collapse (or major perturbation) of human civilization. And the electronic
translator
problem I alluded to in an earlier post cannot be
ignored. But to
pretend that the potential doesn't exist or shouldn't be actively pursued is pure folly, in my opinion.
Aloha, Rich
Richard L. Pyle, PhD Database Coordinator for Natural Sciences and Associate Zoologist in Ichthyology Department of Natural Sciences, Bishop Museum 1525 Bernice St., Honolulu, HI 96817 Ph: (808)848-4115, Fax: (808)847-8252 email: deepreef@bishopmuseum.org http://hbs.bishopmuseum.org/staff/pylerichard.html
> -----Original Message----- > From: tdwg-guid-bounces@mailman.nhm.ku.edu > [mailto:tdwg-guid-bounces@mailman.nhm.ku.edu] On Behalf Of P. > Bryan Heidorn > Sent: Friday, November 24, 2006 8:22 AM > To: tdwg-guid@mailman.nhm.ku.edu; Taxacom > Subject: Re: [Tdwg-guid] Demise of Phyloinformatics journal > > The problem and solution has less to do with the Internet
and more
> to do with institutional longevity. > The permanence of paper has less to do with acid free
paper and more
> to do with the relative permanence of the institutions
that house
> them. Most paper documents over a hundred years old have
been lost
> forever because there were no permanent institutions to
hold them
> until the advent of public and academic libraries. Papers in > individual scientists collections are discarded when they
die. War
> and economic upheavals left paper in rain and fire. It is
foolhardy
> to assume that what is on paper is safe. > > We know that dissemination of information in electronic
form is must
> more economical than paper dissemination. The issue is
development
> of proper institutions with adequate stable funding to
develop and
> maintain copies into "perpetuity". > Commercial publishers, are clearly not the answer for
preservation.
> Corporations and publishers go out of business all the
time. It is
> only because libraries kept paper copies that we still have a > record. > > Digital preservation and access problems exist for all
sciences and
> government documents so there is no need to the biodiversity > community to go it alone on this. We are just in the
beginning of
> the digital publishing history and have not yet
established adequate
> preservation mechanisms within libraries to handle data
curation,
> preservation and access in all the situations where it is
necessary.
> There are projects underway world wide to address this issue. > In the United States the Library of Congress The
National Digital
> Information Infrastructure and Preservation Program http:// > www.digitalpreservation.gov/ is one example. The U.S. > Government agency the Institute of Museum and Library Services > (IMLS) http:// www.imls.gov/ began grant programs to train > librarians and museum curators in digital librarianship and most > recently in digital data curation > http://www.imls.gov/applicants/grants/ > 21centuryLibrarian.shtm is addressing the education issues. > The University of North Carolina http://www.ils.unc.edu/digccurr2007/ > papers.html and the University of Illinois
> DCEP/ have begun working on best practices and education.
This week
> say the successful Data Curation Conference (DCC) in Glasgow, > Scotland http://www.dcc.ac.uk/events/dcc-2006/. DCC will
be running
> "Long-term Curation and Preservation of Journals" > 31 January 2007. (as an aside, at DCC conference I saw
results of a
> survey in "Attitudes and aspirations in a diverse world:
the Project
> StORe perspective on scientific repositories" Graham Pryor, > University of Edinburgh http://www.dcc.ac.uk/events/dcc-2006/ > programme/presentations/g-pryor.ppt that more scientists trusted > publishers to save their digital documents than their home > institutions and libraries! It is clear that scientists are > generally not trained in economics and that the information > technology management of many institutions must be abysmal! > > We need something like to 5 institution rule for distribution to > apply for digital documents. Digital documents need to be
replicated
> as well for both access and preservation. > Institutions like the Internet Archive help with some of
the current
> problems. > Institutional Repositories (IR) are another. Many
universities and
> libraries world wide are beginning these. It is authors' > responsibility to deposit their publications in these
institutions
> and to support their creation. JSTOR and other institutions also > exist. They all have their weaknesses and additional research, > development and funding is needed to adequately address
the issues.
> Also, all journals need to be managed using good data curation > principles but al too often the publishers in spite of best > intentions are not educated in such issues. > > Digital publishing of taxonomic literature are not the
full answer
> for current poor dissemination of taxonomic literature.
The deposit
> of a published name in five institutions is a
preservation rule, not
> a dissemination rule. We hurt science and human health
is we do not
> at the same time address the information access issue.
We need to
> aspire to better dissemination and preservation. Electronic > publishing will help but only if appropriate institutions
in place.
> On the smaller issue, DOIs for publications, electronic
or paper is
> a no-brainer. URLs were never designed to be permanent.
URLs were
> designed to be reused and be flexible. > With DOIs we can place the same paper in multiple digital or > physical locations and reliably find copies. > > Bryan Heidorn > -- >
> P. Bryan Heidorn Graduate School of Library and
Information
> Science > pheidorn@uiuc.edu University of Illinois at > Urbana-Champaign MC-493 > (V)217/ 244-7792 501 East Daniel St., Champaign, IL 61820-6212 > (F)217/ 244-3302 https://netfiles.uiuc.edu/pheidorn/www > > > On Nov 24, 2006, at 9:54 AM, Renato De Giovanni wrote: > >> Rod, >> >> Thanks for sharing with us the information. I already imagined that >> things like that could happen, but it's always better to > argue having >> real examples. >> >> Anyway, just in case someone reading the story decides to > blame URLs, >> I just wanted to say that in my opinion the main issue here is not >> the technology or the GUID format being used. It's the > business model >> and the management strategy. >> >> I can easily imagine similar things happening to DOIs, > LSIDs or other >> kinds of issued GUIDs if the institution(s) behind them simply >> disappear. >> >> Best Regards, >> >> Renato >> -- >> IT Researcher >> CRIA - Reference Center on Environmental Information >> http://www.cria.org.br/ >> >> On 24 Nov 2006 at 13:37, Roderic Page wrote: >> >>> The Open Access web-only journal "Phyloinformatics" seems to have >>> disappeared, with the Internet address http:// >>> www.phyloinformatics.org now up for sale. This means the articles >>> have just disappeared! >>> >>> There weren't many papers published, but some were > interesting and >>> have >>> been cited in the mainstream literature. >>> >>> This also illustrates the problems with linking to digital > resources >>> using URLs, as opposed to identifiers such as DOIs. With > the loss of >>> the domain name, this journal has effectively died. >>> >>> A sobering lesson... >>> >>> Regards >>> >>> Rod >> >> _______________________________________________ >> TDWG-GUID mailing list >> TDWG-GUID@mailman.nhm.ku.edu >> http://mailman.nhm.ku.edu/mailman/listinfo/tdwg-guid > > > > _______________________________________________ > TDWG-GUID mailing list > TDWG-GUID@mailman.nhm.ku.edu > http://mailman.nhm.ku.edu/mailman/listinfo/tdwg-guid
TDWG-GUID mailing list TDWG-GUID@mailman.nhm.ku.edu http://mailman.nhm.ku.edu/mailman/listinfo/tdw%3E g-guid
TDWG-GUID mailing list TDWG-GUID@mailman.nhm.ku.edu http://mailman.nhm.ku.edu/mailman/listinfo/tdwg-guid
TDWG-GUID mailing list TDWG-GUID@mailman.nhm.ku.edu http://mailman.nhm.ku.edu/mailman/listinfo/tdwg-guid
TDWG-GUID mailing list TDWG-GUID@mailman.nhm.ku.edu http://mailman.nhm.ku.edu/mailman/listinfo/tdwg-guid
--
P. Bryan Heidorn Graduate School of Library and Information Science pheidorn@uiuc.edu University of Illinois at Urbana-Champaign MC-493 (V)217/ 244-7792 501 East Daniel St., Champaign, IL 61820-6212 (F)217/ 244-3302 https://netfiles.uiuc.edu/pheidorn/www Online Calendar: http://tinyurl.com/6fd5q Visit the Biobrowser Web site at http://www.biobrowser.org
TDWG-GUID mailing list TDWG-GUID@mailman.nhm.ku.edu http://mailman.nhm.ku.edu/mailman/listinfo/tdwg-guid
Hi Neil
That is a neat project! I like days when I learn about something totally new that shows foresight.
Thanks
Lee
Lee Belbin Manager, TDWG Infrastructure Project Email: lee@tdwg.org Phone: +61(0)419 374 133 -----Original Message----- From: tdwg-guid-bounces@mailman.nhm.ku.edu [mailto:tdwg-guid-bounces@mailman.nhm.ku.edu] On Behalf Of Neil Thomson Sent: Wednesday, 29 November 2006 7:35 PM To: Richard Pyle; Dave Vieglais Cc: tdwg-guid@mailman.nhm.ku.edu Subject: Re: [Tdwg-guid] Demise of Phyloinformatics journal
As a sidebar to this topic, one of the techiques developing in digital preservation is the use of unique identifiers such as PUIDs to unquely identify data formats. These can be used in conjunction with data format registries, such as PRONOM, to be alerted to the "expiry" of support for a particular format and to determine a viable migration path.
An introduction to this can be found on my favourite site: http://en.wikipedia.org/wiki/PRONOM_technical_registry
There is an opensource tool called DROID that can be run on repositories to automatically identify formats: http://droid.sourceforge.net/wiki/index.php/Development_History
Neil ------- Neil Thomson Head of Data & Digital Systems The Natural History Museum, Cromwell Road, London SW7 5BD Tel: +44 (0)20 7942 5294, Fax: +44 (0)20 7942 5559, Email: n.thomson@nhm.ac.uk http://www.nhm.ac.uk
-----Original Message----- From: tdwg-guid-bounces@mailman.nhm.ku.edu [mailto:tdwg-guid-bounces@mailman.nhm.ku.edu] On Behalf Of Richard Pyle Sent: 28 November 2006 19:40 To: 'Dave Vieglais' Cc: tdwg-guid@mailman.nhm.ku.edu Subject: Re: [Tdwg-guid] Demise of Phyloinformatics journal
I certainly agree with Dave that the technology exists, and I agree with Roger that we seem to be on the right path. My comment about it being "a much bigger issue than our community is able to solve" was more along the lines of ensuring persistence for centuries or millenia. The Library of Congress was appropriated $100 million to deal with this issue (http://www.digitalpreservation.gov/about/index.html), which is a just a bit more than we have access to. The real problem, of course, is that because digital media have existed for only a few decades, we don't have an established track record to say, with adequate confidence, that we "know" how to preserve digital data for centuries or millenia (in the way that some paper-based media have survived for such periods of time). This is why the system along the lines of what we're discussing can only really be thought of as a "pilot project".
The fact of the matter is, we don't really need to have confidence that our system is good enough to perservere for centuries. We only have to be confident that it will perservere until technology establishes a system that *will* survive for centuries. If we're lucky, that probably will happen within the next few decades.
So...to echo Roger, "the sun is shining and I feel we are heading in the right direction" -- Rumsfeldisms notwithstanding.
Aloha, Rich
Richard L. Pyle, PhD Database Coordinator for Natural Sciences and Associate Zoologist in Ichthyology Department of Natural Sciences, Bishop Museum 1525 Bernice St., Honolulu, HI 96817 Ph: (808)848-4115, Fax: (808)847-8252 email: deepreef@bishopmuseum.org http://hbs.bishopmuseum.org/staff/pylerichard.html
-----Original Message----- From: Dave Vieglais [mailto:vieglais@ku.edu] Sent: Monday, November 27, 2006 5:08 PM To: Richard Pyle Cc: "'"Döring, Markus"'"; tdwg-guid@mailman.nhm.ku.edu Subject: Re: [Tdwg-guid] Demise of Phyloinformatics journal
I think such a system is quite well within the grasp of this community- even without any particularly novel new developments. We have a system for unique IDs (LSIDs) which can be assigned to each document (actually each combination of object + metadata). Assuming the documents are stored in an environment exposed by a protocol such as OAI (Open Archives Initiative), a harvester could easily retrieve copies of documents (actually any objects with IDs). There's nothing to stop the harvester cache being exposed by the same protocol. With a group of these harvester + OAI servers, and no limits on subscriptions then each harvester would have a copy of everything, probably an undesirable outcome.
Harvester reach could be restricted by queries such as "all objects of type document" or "all objects published before 1999" or any other query supported by the metadata. Or, given the availability of one or more indexers, which index all the available OAI services, a query such as "all objects for which there are only 9 copies" could be executed. The result would be a list of LSIDs that need to be retrieved by the cache. Of course there will be time lags between index and harvester states, so there will likely end up being more than 10 copies of objects per cache, but is that really a problem?
All the pieces necessary for building such a system already exist in the WASABI framework - LSID assignment, OAI server, OAI harvester, indexer, cache. The only real modification is to adapt the WASABI server to store objects along with their metadata, but this was kind of planned to support media objects. I don't mean to preach WASABIsh here, such a topic has been on my mind for a while (actually distributed object storage, not just documents). TAPIR and other protocols would probably work just fine as well with some modifications.
It seems pretty simple, but perhaps I'm missing some important pieces?
cheers, Dave V.
Richard Pyle said the following on 28-11-2006 09:05:
Great article, Markus! Very similar to what I had in mind.
I've never
visited BitTorrent, but I gather that its structure and
function are
not altogether different from the original Napster. Your
description
of a system that monitors available copies of any digital
document and
automatically ensures that a minimum number of copies are extant is *exactly* what I was thinking. In my view, there wouldn't
be only one
"hall monitor" server, but dozens or hundreds (likely
correlated with
major institutions or hard-core individuals with ample
available storage space).
And I would probably draw the line for minimum number of copies at closer to 100 or so, and also include algorithms to ensure they are adequately distributed on geographic scales. Obviously,
GUIDs would be
a critical component of such a system.
It's a much bigger issue than our community is able to
solve, I think
-- but certainly we could implement some pilot projects along these lines for our own data needs, to see how such a system
might work within our context.
Aloha, Rich
-----Original Message----- From: tdwg-guid-bounces@mailman.nhm.ku.edu [mailto:tdwg-guid-bounces@mailman.nhm.ku.edu] On Behalf Of
"Döring,
Markus" Sent: Monday, November 27, 2006 5:56 AM To: tdwg-guid@mailman.nhm.ku.edu Subject: Re: [Tdwg-guid] Demise of Phyloinformatics journal
Richards post and Napster keyword reminded me of a vague
idea I had
for some time to use P2P networks like bittorrent as an persitent storage space. You can read about it a bit more closely here:
http://www.pywrapper.com/markus/blog/2006/using-bittorrent-as- a-persistent-storage-space/
Don't take it as a real proposal, but I like the general
idea if it.
It might even have been done already within the GRID
community. But
it conveys the original internet idea of distributing
resources and
minimizing impact if a nodes gets lost.
A quite nice discussion by the way. Markus -- Markus Döring Botanic Garden and Botanical Museum Berlin Dahlem, Dept. of Biodiversity Informatics Königin-Luise-Str. 6-8, D-14191 Berlin Phone: +49 30 83850-284 Email: m.doering@bgbm.org URL: http://www.bgbm.org/BioDivInf/
-----Original Message----- From: tdwg-guid-bounces@mailman.nhm.ku.edu [mailto:tdwg-guid-bounces@mailman.nhm.ku.edu] On Behalf
Of Richard
Pyle Sent: Sonntag, 26. November 2006 20:42 To: 'P. Bryan Heidorn'; tdwg-guid@mailman.nhm.ku.edu; 'Taxacom' Subject: Re: [Tdwg-guid] Demise of Phyloinformatics journal
I only just now read Bryan Heidorn's excellent post on this topic (below). One thing I would add is that the nature of the
internet and
electronic information allow us opportunities to ensure
permanence and
access that were either impossible, or prohibitively
expensive even a
decade ago. Imagine, for example, an internet protocol
that allowed
both institutions and individuals to "plug in" and expose their digitial catalogs of stored electronic publications (and other resources) such that the whereabouts of literally thousands
of copies
of every electronic publication could be known to anyone.
The system I
envision is somewhat of a cross between existing protocols for interlibrary loan, and the original Napster. Certainly all
sorts of
copyright issues need to be sorted out, but these are short-term problems (less than a century), compared to the long-term (multi-millenia?) issue of information persistence. The point is, knowing the whereabaouts of extant copies of digital documents, coupled with the amazing ease and low cost of duplication
and global
dissemination (not to mention plummeting costs of
electronic storage
media), would virtually guarantee the long-term persistence
of digital
information.
Any system is, of course, vulnerable to the collapse (or major perturbation) of human civilization. And the electronic
translator
problem I alluded to in an earlier post cannot be
ignored. But to
pretend that the potential doesn't exist or shouldn't be actively pursued is pure folly, in my opinion.
Aloha, Rich
Richard L. Pyle, PhD Database Coordinator for Natural Sciences and Associate Zoologist in Ichthyology Department of Natural Sciences, Bishop Museum 1525 Bernice St., Honolulu, HI 96817 Ph: (808)848-4115, Fax: (808)847-8252 email: deepreef@bishopmuseum.org http://hbs.bishopmuseum.org/staff/pylerichard.html
-----Original Message----- From: tdwg-guid-bounces@mailman.nhm.ku.edu [mailto:tdwg-guid-bounces@mailman.nhm.ku.edu] On Behalf Of P. Bryan Heidorn Sent: Friday, November 24, 2006 8:22 AM To: tdwg-guid@mailman.nhm.ku.edu; Taxacom Subject: Re: [Tdwg-guid] Demise of Phyloinformatics journal
The problem and solution has less to do with the Internet
and more
to do with institutional longevity. The permanence of paper has less to do with acid free
paper and more
to do with the relative permanence of the institutions
that house
them. Most paper documents over a hundred years old have
been lost
forever because there were no permanent institutions to
hold them
until the advent of public and academic libraries. Papers in individual scientists collections are discarded when they
die. War
and economic upheavals left paper in rain and fire. It is
foolhardy
to assume that what is on paper is safe.
We know that dissemination of information in electronic
form is must
more economical than paper dissemination. The issue is
development
of proper institutions with adequate stable funding to
develop and
maintain copies into "perpetuity". Commercial publishers, are clearly not the answer for
preservation.
Corporations and publishers go out of business all the
time. It is
only because libraries kept paper copies that we still have a record.
Digital preservation and access problems exist for all
sciences and
government documents so there is no need to the biodiversity community to go it alone on this. We are just in the
beginning of
the digital publishing history and have not yet
established adequate
preservation mechanisms within libraries to handle data
curation,
preservation and access in all the situations where it is
necessary.
There are projects underway world wide to address this issue. In the United States the Library of Congress The
National Digital
Information Infrastructure and Preservation Program http:// www.digitalpreservation.gov/ is one example. The U.S. Government agency the Institute of Museum and Library Services (IMLS) http:// www.imls.gov/ began grant programs to train librarians and museum curators in digital librarianship and most recently in digital data curation http://www.imls.gov/applicants/grants/ 21centuryLibrarian.shtm is addressing the education issues. The University of North Carolina
http://www.ils.unc.edu/digccurr2007/
papers.html and the University of Illinois
DCEP/ have begun working on best practices and education.
This week
say the successful Data Curation Conference (DCC) in Glasgow, Scotland http://www.dcc.ac.uk/events/dcc-2006/. DCC will
be running
"Long-term Curation and Preservation of Journals" 31 January 2007. (as an aside, at DCC conference I saw
results of a
survey in "Attitudes and aspirations in a diverse world:
the Project
StORe perspective on scientific repositories" Graham Pryor, University of Edinburgh http://www.dcc.ac.uk/events/dcc-2006/ programme/presentations/g-pryor.ppt that more scientists trusted publishers to save their digital documents than their home institutions and libraries! It is clear that scientists are generally not trained in economics and that the information technology management of many institutions must be abysmal!
We need something like to 5 institution rule for distribution to apply for digital documents. Digital documents need to be
replicated
as well for both access and preservation. Institutions like the Internet Archive help with some of
the current
problems. Institutional Repositories (IR) are another. Many
universities and
libraries world wide are beginning these. It is authors' responsibility to deposit their publications in these
institutions
and to support their creation. JSTOR and other institutions also exist. They all have their weaknesses and additional research, development and funding is needed to adequately address
the issues.
Also, all journals need to be managed using good data curation principles but al too often the publishers in spite of best intentions are not educated in such issues.
Digital publishing of taxonomic literature are not the
full answer
for current poor dissemination of taxonomic literature.
The deposit
of a published name in five institutions is a
preservation rule, not
a dissemination rule. We hurt science and human health
is we do not
at the same time address the information access issue.
We need to
aspire to better dissemination and preservation. Electronic publishing will help but only if appropriate institutions
in place.
On the smaller issue, DOIs for publications, electronic
or paper is
a no-brainer. URLs were never designed to be permanent.
URLs were
designed to be reused and be flexible. With DOIs we can place the same paper in multiple digital or physical locations and reliably find copies.
Bryan Heidorn
P. Bryan Heidorn Graduate School of Library and
Information
Science pheidorn@uiuc.edu University of Illinois at Urbana-Champaign MC-493 (V)217/ 244-7792 501 East Daniel St., Champaign, IL
61820-6212
(F)217/ 244-3302 https://netfiles.uiuc.edu/pheidorn/www
On Nov 24, 2006, at 9:54 AM, Renato De Giovanni wrote:
Rod,
Thanks for sharing with us the information. I already
imagined that
things like that could happen, but it's always better to
argue having
real examples.
Anyway, just in case someone reading the story decides to
blame URLs,
I just wanted to say that in my opinion the main issue
here is not
the technology or the GUID format being used. It's the
business model
and the management strategy.
I can easily imagine similar things happening to DOIs,
LSIDs or other
kinds of issued GUIDs if the institution(s) behind them simply disappear.
Best Regards,
Renato
IT Researcher CRIA - Reference Center on Environmental Information http://www.cria.org.br/
On 24 Nov 2006 at 13:37, Roderic Page wrote:
> The Open Access web-only journal "Phyloinformatics"
seems to have
> disappeared, with the Internet address http:// > www.phyloinformatics.org now up for sale. This means the
articles
> have just disappeared! > > There weren't many papers published, but some were
interesting and
> have > been cited in the mainstream literature. > > This also illustrates the problems with linking to digital
resources
> using URLs, as opposed to identifiers such as DOIs. With
the loss of
> the domain name, this journal has effectively died. > > A sobering lesson... > > Regards > > Rod
TDWG-GUID mailing list TDWG-GUID@mailman.nhm.ku.edu http://mailman.nhm.ku.edu/mailman/listinfo/tdwg-guid
TDWG-GUID mailing list TDWG-GUID@mailman.nhm.ku.edu http://mailman.nhm.ku.edu/mailman/listinfo/tdwg-guid
TDWG-GUID mailing list TDWG-GUID@mailman.nhm.ku.edu http://mailman.nhm.ku.edu/mailman/listinfo/tdw%3E g-guid
TDWG-GUID mailing list TDWG-GUID@mailman.nhm.ku.edu http://mailman.nhm.ku.edu/mailman/listinfo/tdwg-guid
TDWG-GUID mailing list TDWG-GUID@mailman.nhm.ku.edu http://mailman.nhm.ku.edu/mailman/listinfo/tdwg-guid
_______________________________________________ TDWG-GUID mailing list TDWG-GUID@mailman.nhm.ku.edu http://mailman.nhm.ku.edu/mailman/listinfo/tdwg-guid
_______________________________________________ TDWG-GUID mailing list TDWG-GUID@mailman.nhm.ku.edu http://mailman.nhm.ku.edu/mailman/listinfo/tdwg-guid
participants (9)
-
"Döring, Markus"
-
Dave Vieglais
-
Lee Belbin
-
Neil Thomson
-
P. Bryan Heidorn
-
Richard Pyle
-
Robert Huber
-
Roger Hyam
-
Scott Federhen, NCBI