[Tdwg-guid] Demise of Phyloinformatics journal

Wed Nov 29 23:14:42 CET 2006

Hi Neil

That is a neat project! I like days when I learn about something totally new
that shows foresight. 

Thanks

Lee

Lee Belbin
Manager, TDWG Infrastructure Project
Email: lee at tdwg.org
Phone: +61(0)419 374 133 
-----Original Message-----
From: tdwg-guid-bounces at mailman.nhm.ku.edu
[mailto:tdwg-guid-bounces at mailman.nhm.ku.edu] On Behalf Of Neil Thomson
Sent: Wednesday, 29 November 2006 7:35 PM
To: Richard Pyle; Dave Vieglais
Cc: tdwg-guid at mailman.nhm.ku.edu
Subject: Re: [Tdwg-guid] Demise of Phyloinformatics journal

As a sidebar to this topic, one of the techniques developing in digital
preservation is the use of unique identifiers such as PUIDs to uniquely
identify data formats. These can be used in conjunction with data format
registries, such as PRONOM, to be alerted to the "expiry" of support for a
particular format and to determine a viable migration path.

An introduction to this can be found on my favourite site:
  http://en.wikipedia.org/wiki/PRONOM_technical_registry

There is an opensource tool called DROID that can be run on repositories to
automatically identify formats:
  http://droid.sourceforge.net/wiki/index.php/Development_History

Neil
-------
Neil Thomson
Head of Data & Digital Systems
The Natural History Museum, Cromwell Road, London SW7 5BD
Tel: +44 (0)20 7942 5294,
Fax: +44 (0)20 7942 5559,
Email: n.thomson at nhm.ac.uk
http://www.nhm.ac.uk 
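DROID-style identification works by matching byte signatures near the start of a file against a format registry. The Python sketch here is a hypothetical, heavily simplified stand-in: the signature table, the placeholder identifiers (not real PRONOM PUIDs) and the "at risk" notes are assumptions for illustration. Real PRONOM signatures also cover byte offsets, format versions and end-of-file patterns.

```python
from pathlib import Path

# Hypothetical signature table: (leading bytes, placeholder id, format name).
# The ids are placeholders, NOT real PRONOM PUIDs.
SIGNATURES = [
    (b"%PDF-", "example/pdf", "Portable Document Format"),
    (b"\x89PNG\r\n\x1a\n", "example/png", "Portable Network Graphics"),
    (b"PK\x03\x04", "example/zip", "ZIP archive"),
]

# Hypothetical local policy: formats flagged for review, with a suggested action.
AT_RISK = {
    "example/zip": "container format; identify and migrate its contents",
}


def identify(header: bytes):
    """Return (id, name) for the first signature the header starts with."""
    for magic, fmt_id, name in SIGNATURES:
        if header.startswith(magic):
            return fmt_id, name
    return None, "unknown"


def scan(root: str):
    """Walk a directory tree and report (path, id, name, risk note) per file."""
    report = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        with path.open("rb") as fh:
            # 16 bytes is enough for every signature in the toy table above.
            fmt_id, name = identify(fh.read(16))
        report.append((str(path), fmt_id, name, AT_RISK.get(fmt_id)))
    return report
```

Calling `scan("some/repository")` returns one row per file; rows with a non-empty risk note are the ones a registry-driven workflow would queue for migration.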

-----Original Message-----
From: tdwg-guid-bounces at mailman.nhm.ku.edu
[mailto:tdwg-guid-bounces at mailman.nhm.ku.edu] On Behalf Of Richard Pyle
Sent: 28 November 2006 19:40
To: 'Dave Vieglais'
Cc: tdwg-guid at mailman.nhm.ku.edu
Subject: Re: [Tdwg-guid] Demise of Phyloinformatics journal

I certainly agree with Dave that the technology exists, and I agree with
Roger that we seem to be on the right path. My comment about it being "a
much bigger issue than our community is able to solve" was more along the
lines of ensuring persistence for centuries or millennia. The Library of
Congress was appropriated $100 million to deal with this issue
(http://www.digitalpreservation.gov/about/index.html), which is just a bit
more than we have access to. The real problem, of course, is that because
digital media have existed for only a few decades, we don't have an
established track record to say, with adequate confidence, that we "know"
how to preserve digital data for centuries or millennia (in the way that some
paper-based media have survived for such periods of time). This is why a
system along the lines of what we're discussing can only really be thought
of as a "pilot project".

The fact of the matter is, we don't really need to have confidence that our
system is good enough to persevere for centuries. We only have to be
confident that it will persevere until technology establishes a system that
*will* survive for centuries. If we're lucky, that probably will happen
within the next few decades.

So...to echo Roger, "the sun is shining and I feel we are heading in the
right direction" -- Rumsfeldisms notwithstanding.

Aloha,
Rich

Richard L. Pyle, PhD
Database Coordinator for Natural Sciences
  and Associate Zoologist in Ichthyology Department of Natural Sciences,
Bishop Museum
1525 Bernice St., Honolulu, HI 96817
Ph: (808)848-4115, Fax: (808)847-8252
email: deepreef at bishopmuseum.org
http://hbs.bishopmuseum.org/staff/pylerichard.html

> -----Original Message-----
> From: Dave Vieglais [mailto:vieglais at ku.edu]
> Sent: Monday, November 27, 2006 5:08 PM
> To: Richard Pyle
> Cc: "Döring, Markus"; tdwg-guid at mailman.nhm.ku.edu
> Subject: Re: [Tdwg-guid] Demise of Phyloinformatics journal
> 
> I think such a system is quite well within the grasp of this
> community, even without any particularly novel developments. We
> have a system for unique IDs (LSIDs) which can be assigned to each
> document (actually each combination of object + metadata). Assuming
> the documents are stored in an environment exposed by a protocol such
> as OAI (Open Archives Initiative), a harvester could easily retrieve
> copies of documents (actually any objects with IDs).
> There's nothing to stop the harvester cache being exposed by the same
> protocol. With a group of these harvester + OAI servers, and no
> limits on subscriptions, each harvester would have a copy of
> everything, probably an undesirable outcome.
> 
> Harvester reach could be restricted by queries such as "all objects of 
> type document" or "all objects published before 1999" or any other 
> query supported by the metadata.  Or, given the availability of one or 
> more indexers, which index all the available OAI services, a query 
> such as "all objects for which there are only 9 copies" could be 
> executed.  The result would be a list of LSIDs that need to be 
> retrieved by the cache.  Of course there will be time lags between 
> index and harvester states, so there will likely end up being more 
> than 10 copies of objects per cache, but is that really a problem?
> 
> All the pieces necessary for building such a system already exist in 
> the WASABI framework - LSID assignment, OAI server, OAI harvester, 
> indexer, cache.
> The only real modification is to adapt the WASABI server to store 
> objects along with their metadata, but this was kind of planned to 
> support media objects. I don't mean to preach WASABIsh here; such a
> topic has been on my mind for a while (actually distributed object
> storage, not just documents).
> TAPIR and other protocols would probably work just fine as well with 
> some modifications.
> 
> It seems pretty simple, but perhaps I'm missing some important pieces?
> 
> 
> cheers,
>   Dave V.
> 
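A minimal Python sketch of the replica-count query Dave describes ("all objects for which there are only 9 copies"), assuming each cache can report the set of LSIDs it holds. The function names, cache names and LSIDs are hypothetical; this is not WASABI code.

```python
from collections import defaultdict


def build_index(holdings):
    """Invert {cache: set of LSIDs} into {LSID: set of caches holding it}."""
    index = defaultdict(set)
    for cache, lsids in holdings.items():
        for lsid in lsids:
            index[lsid].add(cache)
    return index


def under_replicated(index, target):
    """LSIDs with fewer than `target` known copies, fewest copies first."""
    short = [(len(caches), lsid) for lsid, caches in index.items() if len(caches) < target]
    return [lsid for _, lsid in sorted(short)]


def harvest_list(index, cache, target):
    """LSIDs that `cache` should retrieve: under-replicated and not yet held there."""
    return [lsid for lsid in under_replicated(index, target) if cache not in index[lsid]]


# Hypothetical holdings for three caches.
holdings = {
    "cache-a": {"urn:lsid:example.org:docs:1", "urn:lsid:example.org:docs:2"},
    "cache-b": {"urn:lsid:example.org:docs:1"},
    "cache-c": {"urn:lsid:example.org:docs:1"},
}
index = build_index(holdings)
print(harvest_list(index, "cache-b", target=3))  # ['urn:lsid:example.org:docs:2']
```

With `target=10` this is the "only 9 copies" query. If several caches act on the same stale index before it is refreshed, each may fetch the same object, which is how the overshoot past 10 copies that Dave mentions arises.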
> Richard Pyle said the following on 28-11-2006 09:05:
> > Great article, Markus! Very similar to what I had in mind. I've never
> > visited BitTorrent, but I gather that its structure and function are
> > not altogether different from the original Napster. Your description
> > of a system that monitors available copies of any digital document and
> > automatically ensures that a minimum number of copies are extant is
> > *exactly* what I was thinking. In my view, there wouldn't be only one
> > "hall monitor" server, but dozens or hundreds (likely correlated with
> > major institutions or hard-core individuals with ample available
> > storage space). And I would probably draw the line for minimum number
> > of copies at closer to 100 or so, and also include algorithms to
> > ensure they are adequately distributed on geographic scales.
> > Obviously, GUIDs would be a critical component of such a system.
> > 
> > It's a much bigger issue than our community is able to solve, I think
> > -- but certainly we could implement some pilot projects along these
> > lines for our own data needs, to see how such a system might work
> > within our context.
> > 
> > Aloha,
> > Rich
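Rich's "hall monitor" with a minimum copy count and geographic spread can be approximated by a greedy placement rule: while an object is below the minimum, add a host from the region that currently holds the fewest copies. This is one simple heuristic chosen for illustration, not a scheme proposed in the thread; the host names and regions are hypothetical.

```python
from collections import Counter


def choose_new_hosts(current_hosts, candidates, region_of, min_copies):
    """Greedily pick extra hosts until an object has `min_copies` copies,
    always taking a host from the region with the fewest copies so far
    (ties broken alphabetically by host name)."""
    counts = Counter(region_of[h] for h in current_hosts)
    pool = sorted(h for h in candidates if h not in current_hosts)
    chosen = []
    for _ in range(max(0, min_copies - len(current_hosts))):
        if not pool:
            break  # not enough hosts available to reach the minimum
        best = min(pool, key=lambda h: (counts[region_of[h]], h))
        chosen.append(best)
        counts[region_of[best]] += 1
        pool.remove(best)
    return chosen


# Hypothetical hosts and the regions they sit in.
region_of = {
    "honolulu-1": "oceania",
    "london-1": "europe",
    "berlin-1": "europe",
    "lawrence-1": "north-america",
    "campinas-1": "south-america",
}
print(choose_new_hosts({"london-1"}, region_of.keys(), region_of, min_copies=3))
```

A monitor would run this per object on each pass; with Rich's threshold, `min_copies` would be set to something like 100, and a real deployment would also weigh free storage and host reliability.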
> > 
> >> -----Original Message-----
> >> From: tdwg-guid-bounces at mailman.nhm.ku.edu
> >> [mailto:tdwg-guid-bounces at mailman.nhm.ku.edu] On Behalf Of "Döring, Markus"
> >> Sent: Monday, November 27, 2006 5:56 AM
> >> To: tdwg-guid at mailman.nhm.ku.edu
> >> Subject: Re: [Tdwg-guid] Demise of Phyloinformatics journal
> >>
> >> Richard's post and the Napster keyword reminded me of a vague idea I
> >> have had for some time: using P2P networks like BitTorrent as a
> >> persistent storage space. You can read a bit more about it here:
> >>
> >> http://www.pywrapper.com/markus/blog/2006/using-bittorrent-as-a-persistent-storage-space/
> >>
> >> Don't take it as a real proposal, but I like the general idea of it.
> >> It might even have been done already within the GRID community. But
> >> it conveys the original internet idea of distributing resources and
> >> minimizing impact if a node gets lost.
> >>
> >> A quite nice discussion, by the way.
> >> Markus
> >> --
> >>  Markus Döring
> >>  Botanic Garden and Botanical Museum Berlin Dahlem,  Dept. of 
> >> Biodiversity Informatics  Königin-Luise-Str. 6-8, D-14191 Berlin
> >>  Phone: +49 30 83850-284
> >>  Email: m.doering at bgbm.org
> >>  URL: http://www.bgbm.org/BioDivInf/
> >>
> >>
> >>> -----Original Message-----
> >>> From: tdwg-guid-bounces at mailman.nhm.ku.edu
> >>> [mailto:tdwg-guid-bounces at mailman.nhm.ku.edu] On Behalf Of Richard Pyle
> >>> Sent: Sunday, 26 November 2006 20:42
> >>> To: 'P. Bryan Heidorn'; tdwg-guid at mailman.nhm.ku.edu; 'Taxacom'
> >>> Subject: Re: [Tdwg-guid] Demise of Phyloinformatics journal
> >>>
> >>>
> >>>
> >>> I only just now read Bryan Heidorn's excellent post on this topic
> >>> (below). One thing I would add is that the nature of the internet and
> >>> electronic information allows us opportunities to ensure permanence
> >>> and access that were either impossible or prohibitively expensive
> >>> even a decade ago. Imagine, for example, an internet protocol that
> >>> allowed both institutions and individuals to "plug in" and expose
> >>> their digital catalogs of stored electronic publications (and other
> >>> resources) such that the whereabouts of literally thousands of copies
> >>> of every electronic publication could be known to anyone. The system
> >>> I envision is somewhat of a cross between existing protocols for
> >>> interlibrary loan and the original Napster. Certainly all sorts of
> >>> copyright issues need to be sorted out, but these are short-term
> >>> problems (less than a century), compared to the long-term
> >>> (multi-millennia?) issue of information persistence. The point is,
> >>> knowing the whereabouts of extant copies of digital documents,
> >>> coupled with the amazing ease and low cost of duplication and global
> >>> dissemination (not to mention plummeting costs of electronic storage
> >>> media), would virtually guarantee the long-term persistence of
> >>> digital information.
> >>>
> >>> Any system is, of course, vulnerable to the collapse (or major
> >>> perturbation) of human civilization. And the electronic translator
> >>> problem I alluded to in an earlier post cannot be ignored. But to
> >>> pretend that the potential doesn't exist or shouldn't be actively
> >>> pursued is pure folly, in my opinion.
> >>>
> >>> Aloha,
> >>> Rich
> >>>
> >>> Richard L. Pyle, PhD
> >>> Database Coordinator for Natural Sciences
> >>>   and Associate Zoologist in Ichthyology Department of Natural 
> >>> Sciences, Bishop Museum
> >>> 1525 Bernice St., Honolulu, HI 96817
> >>> Ph: (808)848-4115, Fax: (808)847-8252
> >>> email: deepreef at bishopmuseum.org
> >>> http://hbs.bishopmuseum.org/staff/pylerichard.html
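One way to picture the "plug in and expose your catalog" idea is an aggregator that merges every participant's catalog into a single where-is index. Going beyond what the post says, this Python sketch assumes each catalog entry carries a SHA-256 checksum, so the aggregator can also notice when two holders' copies of the same document differ. Document identifiers and holder names are hypothetical.

```python
import hashlib
from collections import defaultdict


def sha256_hex(data: bytes) -> str:
    """Checksum a holder would publish alongside each catalog entry (assumption)."""
    return hashlib.sha256(data).hexdigest()


def merge_catalogs(catalogs):
    """Merge {holder: {doc_id: checksum}} into {doc_id: {checksum: set of holders}}."""
    merged = defaultdict(lambda: defaultdict(set))
    for holder, entries in catalogs.items():
        for doc_id, checksum in entries.items():
            merged[doc_id][checksum].add(holder)
    return merged


def where_is(merged, doc_id):
    """Every known holder of any copy of doc_id, sorted by name."""
    return sorted(h for holders in merged.get(doc_id, {}).values() for h in holders)


def has_divergent_copies(merged, doc_id):
    """True when holders report different checksums for the same document."""
    return len(merged.get(doc_id, {})) > 1


# Hypothetical catalogs from three participants.
good = sha256_hex(b"article text")
bad = sha256_hex(b"article text, damaged copy")
catalogs = {
    "library-a": {"doc:42": good},
    "museum-b": {"doc:42": good},
    "individual-c": {"doc:42": bad},
}
merged = merge_catalogs(catalogs)
print(where_is(merged, "doc:42"))              # ['individual-c', 'library-a', 'museum-b']
print(has_divergent_copies(merged, "doc:42"))  # True
```

Using `.get` on lookups keeps queries for unknown documents from inserting empty entries into the index.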
> >>>
> >>>
> >>>
> >>>> -----Original Message-----
> >>>> From: tdwg-guid-bounces at mailman.nhm.ku.edu
> >>>> [mailto:tdwg-guid-bounces at mailman.nhm.ku.edu] On Behalf Of P. 
> >>>> Bryan Heidorn
> >>>> Sent: Friday, November 24, 2006 8:22 AM
> >>>> To: tdwg-guid at mailman.nhm.ku.edu; Taxacom
> >>>> Subject: Re: [Tdwg-guid] Demise of Phyloinformatics journal
> >>>>
> >>>> The problem and solution have less to do with the Internet and more
> >>>> to do with institutional longevity.
> >>>> The permanence of paper has less to do with acid-free paper and more
> >>>> to do with the relative permanence of the institutions that house
> >>>> it. Most paper documents over a hundred years old have been lost
> >>>> forever because there were no permanent institutions to hold them
> >>>> until the advent of public and academic libraries. Papers in
> >>>> individual scientists' collections are discarded when they die. War
> >>>> and economic upheavals left paper in rain and fire. It is foolhardy
> >>>> to assume that what is on paper is safe.
> >>>>
> >>>> We know that dissemination of information in electronic form is much
> >>>> more economical than paper dissemination. The issue is the
> >>>> development of proper institutions with adequate stable funding to
> >>>> develop and maintain copies into "perpetuity".
> >>>> Commercial publishers are clearly not the answer for preservation.
> >>>> Corporations and publishers go out of business all the time. It is
> >>>> only because libraries kept paper copies that we still have a
> >>>> record.
> >>>>
> >>>> Digital preservation and access problems exist for all sciences and
> >>>> government documents, so there is no need for the biodiversity
> >>>> community to go it alone on this. We are just at the beginning of
> >>>> the history of digital publishing and have not yet established
> >>>> adequate preservation mechanisms within libraries to handle data
> >>>> curation, preservation and access in all the situations where they
> >>>> are necessary.
> >>>> There are projects underway worldwide to address this issue.
> >>>> In the United States, the Library of Congress's National Digital
> >>>> Information Infrastructure and Preservation Program
> >>>> (http://www.digitalpreservation.gov/) is one example. The U.S.
> >>>> government agency the Institute of Museum and Library Services
> >>>> (IMLS, http://www.imls.gov/) began grant programs to train
> >>>> librarians and museum curators in digital librarianship and, most
> >>>> recently, in digital data curation; its program at
> >>>> http://www.imls.gov/applicants/grants/21centuryLibrarian.shtm is
> >>>> addressing the education issues.
> >>>> The University of North Carolina
> >>>> (http://www.ils.unc.edu/digccurr2007/papers.html) and the University
> >>>> of Illinois (http://sci.lis.uiuc.edu/DCEP/) have begun working on
> >>>> best practices and education. This week saw the successful Data
> >>>> Curation Conference (DCC) in Glasgow, Scotland
> >>>> (http://www.dcc.ac.uk/events/dcc-2006/). DCC will be running
> >>>> "Long-term Curation and Preservation of Journals" on 31 January 2007.
> >>>> (As an aside, at the DCC conference I saw results of a survey in
> >>>> "Attitudes and aspirations in a diverse world: the Project StORe
> >>>> perspective on scientific repositories" by Graham Pryor, University
> >>>> of Edinburgh,
> >>>> http://www.dcc.ac.uk/events/dcc-2006/programme/presentations/g-pryor.ppt,
> >>>> showing that more scientists trusted publishers to save their
> >>>> digital documents than their home institutions and libraries!) It is
> >>>> clear that scientists are generally not trained in economics and
> >>>> that the information technology management of many institutions
> >>>> must be abysmal!
> >>>>
> >>>> We need something like the 5-institution rule for distribution to
> >>>> apply to digital documents. Digital documents need to be replicated
> >>>> as well, for both access and preservation.
> >>>> Institutions like the Internet Archive help with some of the current
> >>>> problems.
> >>>> Institutional Repositories (IR) are another. Many universities and
> >>>> libraries worldwide are beginning these. It is authors'
> >>>> responsibility to deposit their publications in these institutions
> >>>> and to support their creation. JSTOR and other institutions also
> >>>> exist. They all have their weaknesses, and additional research,
> >>>> development and funding are needed to adequately address the issues.
> >>>> Also, all journals need to be managed using good data curation
> >>>> principles, but all too often the publishers, in spite of best
> >>>> intentions, are not educated in such issues.
> >>>>
> >>>> Digital publishing of taxonomic literature is not the full answer
> >>>> for the current poor dissemination of taxonomic literature. The
> >>>> deposit of a published name in five institutions is a preservation
> >>>> rule, not a dissemination rule. We hurt science and human health if
> >>>> we do not at the same time address the information access issue.
> >>>> We need to aspire to better dissemination and preservation.
> >>>> Electronic publishing will help, but only if appropriate
> >>>> institutions are in place.
> >>>> On the smaller issue, DOIs for publications, electronic or paper,
> >>>> are a no-brainer. URLs were never designed to be permanent. URLs
> >>>> were designed to be reused and be flexible.
> >>>> With DOIs we can place the same paper in multiple digital or
> >>>> physical locations and reliably find copies.
> >>>>
> >>>> Bryan Heidorn
> >>>> --
> >>>>
> >>>> --------------------------------------------------------------------
> >>>>    P. Bryan Heidorn    Graduate School of Library and Information Science
> >>>>    pheidorn at uiuc.edu   University of Illinois at Urbana-Champaign MC-493
> >>>>    (V)217/ 244-7792    501 East Daniel St., Champaign, IL 61820-6212
> >>>>    (F)217/ 244-3302    https://netfiles.uiuc.edu/pheidorn/www
> >>>>
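Bryan's point about DOIs is indirection: a citation carries a stable identifier, and a resolver maps it to whichever locations currently hold the document. A toy Python resolver, assuming a simple in-memory table and a caller-supplied availability check; the identifier and URLs are hypothetical, and this is not the DOI system's actual resolution API.

```python
from typing import Callable, Dict, List, Optional


class Resolver:
    """Toy persistent-identifier resolver: one identifier, many locations."""

    def __init__(self) -> None:
        self._locations: Dict[str, List[str]] = {}

    def register(self, identifier: str, url: str) -> None:
        """Record another location holding a copy of `identifier`."""
        urls = self._locations.setdefault(identifier, [])
        if url not in urls:
            urls.append(url)

    def retire(self, url: str) -> None:
        """Forget a location everywhere it appears, e.g. after a domain lapses."""
        for urls in self._locations.values():
            if url in urls:
                urls.remove(url)

    def resolve(self, identifier: str, is_available: Callable[[str], bool]) -> Optional[str]:
        """Return the first registered location that passes the availability check."""
        for url in self._locations.get(identifier, []):
            if is_available(url):
                return url
        return None


# Hypothetical identifier and locations.
r = Resolver()
pid = "doi:10.9999/example.1"
r.register(pid, "http://journal.example.org/article1.pdf")
r.register(pid, "http://repository.example.edu/article1.pdf")
live = {"http://repository.example.edu/article1.pdf"}  # the journal site is gone
print(r.resolve(pid, lambda url: url in live))  # http://repository.example.edu/article1.pdf
```

Citations that use `pid` keep working when the journal's domain lapses, as long as another copy is registered. Renato's caveat still applies: the indirection only helps for as long as some institution keeps the resolver's table current.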
> >>>>
> >>>> On Nov 24, 2006, at 9:54 AM, Renato De Giovanni wrote:
> >>>>
> >>>>> Rod,
> >>>>>
> >>>>> Thanks for sharing the information with us. I had already imagined
> >>>>> that things like that could happen, but it's always better to argue
> >>>>> with real examples.
> >>>>>
> >>>>> Anyway, just in case someone reading the story decides to blame
> >>>>> URLs, I just wanted to say that in my opinion the main issue here is
> >>>>> not the technology or the GUID format being used. It's the business
> >>>>> model and the management strategy.
> >>>>>
> >>>>> I can easily imagine similar things happening to DOIs, LSIDs or
> >>>>> other kinds of issued GUIDs if the institution(s) behind them simply
> >>>>> disappear.
> >>>>>
> >>>>> Best Regards,
> >>>>>
> >>>>> Renato
> >>>>> --
> >>>>> IT Researcher
> >>>>> CRIA - Reference Center on Environmental Information 
> >>>>> http://www.cria.org.br/
> >>>>>
> >>>>> On 24 Nov 2006 at 13:37, Roderic Page wrote:
> >>>>>
> >>>>>> The Open Access web-only journal "Phyloinformatics" seems to have
> >>>>>> disappeared, with the Internet address http://www.phyloinformatics.org
> >>>>>> now up for sale. This means the articles have just disappeared!
> >>>>>>
> >>>>>> There weren't many papers published, but some were interesting and
> >>>>>> have been cited in the mainstream literature.
> >>>>>>
> >>>>>> This also illustrates the problems with linking to digital resources
> >>>>>> using URLs, as opposed to identifiers such as DOIs. With the loss of
> >>>>>> the domain name, this journal has effectively died.
> >>>>>>
> >>>>>> A sobering lesson...
> >>>>>>
> >>>>>> Regards
> >>>>>>
> >>>>>> Rod
> >>>>>
> >>>>> _______________________________________________
> >>>>> TDWG-GUID mailing list
> >>>>> TDWG-GUID at mailman.nhm.ku.edu
> >>>>> http://mailman.nhm.ku.edu/mailman/listinfo/tdwg-guid