[Tdwg-guid] Demise of Phyloinformatics journal

Wed Nov 29 10:26:13 CET 2006

Dear all,

The Phyloinformatics journal shows how easily social contracts can fail and
formerly trusted archives simply disappear.
There are some initiatives that aim to provide standards for the
preservation and archiving of digital content.
For example, the OAIS reference model
(http://public.ccsds.org/publications/archive/650x0b1.pdf), which aims
to become an ISO standard. Or, here in Germany, the DINI initiative
(http://www.dini.de/) and the NESTOR project
(http://www.langzeitarchivierung.de/index.php?newlang=eng).
Both German initiatives aim to provide certificates for long-term archives,
I guess similar to the quality standard ISO 9001.

I would be very interested to hear of similar initiatives in your
countries.

Today, it is impossible to estimate how good a content provider
(including taxonomic databases) really is at true long-term archiving.
Certificates could help a lot, I think, especially if they required a
'fall-back strategy' from the candidate archive. Such a strategy could simply
be to nominate a partner archive that would store the content on its
servers in case of emergency (and resolve the LSIDs in our case).
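
A minimal sketch of such a fall-back strategy, assuming each archive can be
modelled as a function that takes an LSID and returns the stored content. The
names make_archive and resolve_with_fallback, the in-memory holdings, and the
example LSID are all hypothetical and only illustrate the idea; a real
resolver would go through the LSID resolution machinery.

```python
from typing import Callable, Optional

# Hypothetical resolver type: takes an LSID, returns the content or None.
Resolver = Callable[[str], Optional[bytes]]


def make_archive(name: str, holdings: dict, online: bool = True) -> Resolver:
    """Build a toy resolver backed by an in-memory dict (stand-in for an archive)."""
    def resolve(lsid: str) -> Optional[bytes]:
        if not online:
            raise ConnectionError(f"{name} is unreachable")
        return holdings.get(lsid)
    return resolve


def resolve_with_fallback(lsid: str, primary: Resolver, partner: Resolver) -> Optional[bytes]:
    """Try the primary archive first; on failure or a miss, ask the partner archive."""
    for resolver in (primary, partner):
        try:
            content = resolver(lsid)
        except ConnectionError:
            continue  # archive is gone or down: move on to the next one
        if content is not None:
            return content
    return None


# The LSID below is made up for illustration.
lsid = "urn:lsid:example.org:articles:42"
holdings = {lsid: b"article text"}
dead_primary = make_archive("primary", holdings, online=False)
partner = make_archive("partner", holdings)
result = resolve_with_fallback(lsid, dead_primary, partner)
print(result)
```

The point of the design is that the identifier stays the same whichever
archive ends up answering, so citations keep working after the primary
archive has disappeared.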

best regards, Robert

 -----Original Message-----
From: tdwg-guid-bounces at mailman.nhm.ku.edu
[mailto:tdwg-guid-bounces at mailman.nhm.ku.edu] On Behalf Of P. Bryan Heidorn
Sent: Tuesday, 28 November 2006 23:14
To: Richard Pyle
Cc: tdwg-guid at mailman.nhm.ku.edu
Subject: Re: [Tdwg-guid] Demise of Phyloinformatics journal

  It is important to inform the Library of Congress and other entities about
the special needs of the taxonomic community so that they can put the
appropriate mechanisms into place. The library community's solution, including
academic libraries, LC, and its counterparts in other countries, does make a
fairly survivable system that might indeed survive some major upheavals. The
biodiversity community need not go it alone. It is better to share the
deeper pockets of nuclear physics, astronomy, and medicine. We have unique
needs but we can rely on subsystems put into place for these other sciences.

  Also, currently, going digital does not mean not using paper. For the near
future it would be good to have a few paper copies of important documents
along with the digital copy. Also, if we have a digital copy, at any point in
the future, assuming we still have trees to make paper, we can decide to
abandon the digital systems and print out what we want. It is not an
either/or decision now, and will not be in the future either.

  On Nov 28, 2006, at 1:40 PM, Richard Pyle wrote:

    I certainly agree with Dave that the technology exists, and I agree with
    Roger that we seem to be on the right path. My comment about it being "a
    much bigger issue than our community is able to solve" was more along the
    lines of ensuring persistence for centuries or millennia.  The Library of
    Congress was appropriated $100 million to deal with this issue
    (http://www.digitalpreservation.gov/about/index.html), which is just a bit
    more than we have access to. The real problem, of course, is that because
    digital media have existed for only a few decades, we don't have an
    established track record to say, with adequate confidence, that we "know"
    how to preserve digital data for centuries or millennia (in the way that some
    paper-based media have survived for such periods of time).  This is why the
    system along the lines of what we're discussing can only really be thought
    of as a "pilot project".

    The fact of the matter is, we don't really need to have confidence that our
    system is good enough to persevere for centuries.  We only have to be
    confident that it will persevere until technology establishes a system that
    *will* survive for centuries. If we're lucky, that probably will happen
    within the next few decades.

    So...to echo Roger, "the sun is shining and I feel we are heading in the
    right direction" -- Rumsfeldisms notwithstanding.

    Aloha,
    Rich

    Richard L. Pyle, PhD
    Database Coordinator for Natural Sciences
      and Associate Zoologist in Ichthyology
    Department of Natural Sciences, Bishop Museum
    1525 Bernice St., Honolulu, HI 96817
    Ph: (808)848-4115, Fax: (808)847-8252
    email: deepreef at bishopmuseum.org
    http://hbs.bishopmuseum.org/staff/pylerichard.html

      -----Original Message-----
      From: Dave Vieglais [mailto:vieglais at ku.edu]
      Sent: Monday, November 27, 2006 5:08 PM
      To: Richard Pyle
      Cc: "Döring, Markus"; tdwg-guid at mailman.nhm.ku.edu
      Subject: Re: [Tdwg-guid] Demise of Phyloinformatics journal

      I think such a system is quite well within the grasp of this
      community, even without any particularly novel
      developments.  We have a system for unique IDs (LSIDs) which
      can be assigned to each document (actually each combination
      of object + metadata).  Assuming the documents are stored in
      an environment exposed by a protocol such as OAI (Open
      Archives Initiative), a harvester could easily retrieve
      copies of documents (actually any objects with IDs).
      There's nothing to stop the harvester cache being exposed by
      the same protocol.  With a group of these harvester + OAI
      servers, and no limits on subscriptions then each harvester
      would have a copy of everything, probably an undesirable outcome.

      Harvester reach could be restricted by queries such as "all
      objects of type document" or "all objects published before
      1999" or any other query supported by the metadata.  Or,
      given the availability of one or more indexers, which index
      all the available OAI services, a query such as "all objects
      for which there are only 9 copies" could be executed.  The
      result would be a list of LSIDs that need to be retrieved by
      the cache.  Of course there will be time lags between index
      and harvester states, so there will likely end up being more
      than 10 copies of objects per cache, but is that really a problem?
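
A minimal sketch of the "all objects for which there are only 9 copies" query,
assuming the indexer can be reduced to a mapping from each LSID to the set of
caches known to hold it. The function names, cache names, and example LSIDs
are hypothetical, and the in-memory dictionaries stand in for whatever the
indexer would actually gather from the OAI services.

```python
from collections import defaultdict

MIN_COPIES = 10  # threshold implied by the "only 9 copies" example


def build_index(caches: dict) -> dict:
    """Map each LSID to the set of caches that currently hold it."""
    index = defaultdict(set)
    for cache_name, lsids in caches.items():
        for lsid in lsids:
            index[lsid].add(cache_name)
    return index


def to_harvest(index: dict, cache_name: str, min_copies: int = MIN_COPIES) -> list:
    """LSIDs with fewer than min_copies known copies that this cache lacks."""
    return sorted(
        lsid
        for lsid, holders in index.items()
        if len(holders) < min_copies and cache_name not in holders
    )


# Toy state: three caches and two made-up LSIDs.
caches = {
    "cache-a": {"urn:lsid:example.org:doc:1", "urn:lsid:example.org:doc:2"},
    "cache-b": {"urn:lsid:example.org:doc:1"},
    "cache-c": set(),
}
index = build_index(caches)
wanted = to_harvest(index, "cache-c", min_copies=2)
print(wanted)
```

Because the index is only a snapshot, several caches may run the same query
and fetch the same object at once, which is the overshoot mentioned above.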

      All the pieces necessary for building such a system already
      exist in the WASABI framework - LSID assignment, OAI server,
      OAI harvester, indexer, cache.
      The only real modification is to adapt the WASABI server to
      store objects along with their metadata, but this was kind of
      planned to support media objects.  I don't mean to preach
      WASABIsh here, such a topic has been on my mind for a while
      (actually distributed object storage, not just documents).
      TAPIR and other protocols would probably work just fine as
      well with some modifications.

      It seems pretty simple, but perhaps I'm missing some important pieces?

      cheers,
        Dave V.

      Richard Pyle said the following on 28-11-2006 09:05:
        Great article, Markus! Very similar to what I had in mind. I've never
        visited BitTorrent, but I gather that its structure and function are
        not altogether different from the original Napster.  Your description
        of a system that monitors available copies of any digital document and
        automatically ensures that a minimum number of copies are extant is
        *exactly* what I was thinking.  In my view, there wouldn't be only one
        "hall monitor" server, but dozens or hundreds (likely correlated with
        major institutions or hard-core individuals with ample available
        storage space).
        And I would probably draw the line for minimum number of copies at
        closer to 100 or so, and also include algorithms to ensure they are
        adequately distributed on geographic scales. Obviously, GUIDs would be
        a critical component of such a system.
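
A minimal sketch of what one "hall monitor" check might look like, assuming
each known copy is recorded with a GUID, a host, and a coarse region label.
The Copy record, the replication_report function, and the min_regions
threshold are hypothetical; only the figure of roughly 100 copies comes from
the message above.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    guid: str
    host: str
    region: str  # coarse region label, e.g. a continent (assumed granularity)


def replication_report(copies, min_copies=100, min_regions=3):
    """Return GUIDs that have too few copies or too few distinct regions."""
    count = Counter(c.guid for c in copies)
    regions = {}
    for c in copies:
        regions.setdefault(c.guid, set()).add(c.region)
    return {
        guid: {"copies": count[guid], "regions": len(regions[guid])}
        for guid in count
        if count[guid] < min_copies or len(regions[guid]) < min_regions
    }


# Toy data with small thresholds so the example stays readable.
copies = [
    Copy("doc:1", "h1", "Europe"),
    Copy("doc:1", "h2", "Europe"),
    Copy("doc:1", "h3", "Oceania"),
    Copy("doc:2", "h1", "Europe"),
    Copy("doc:2", "h4", "Asia"),
    Copy("doc:2", "h5", "Oceania"),
]
report = replication_report(copies, min_copies=3, min_regions=3)
print(report)
```

Here doc:1 has enough copies but is flagged because they sit in only two
regions, which is the case a plain copy count would miss.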

        It's a much bigger issue than our community is able to solve, I think
        -- but certainly we could implement some pilot projects along these
        lines for our own data needs, to see how such a system might work
        within our context.

        Aloha,
        Rich

          -----Original Message-----
          From: tdwg-guid-bounces at mailman.nhm.ku.edu
          [mailto:tdwg-guid-bounces at mailman.nhm.ku.edu] On Behalf Of
          "Döring, Markus"
          Sent: Monday, November 27, 2006 5:56 AM
          To: tdwg-guid at mailman.nhm.ku.edu
          Subject: Re: [Tdwg-guid] Demise of Phyloinformatics journal

          Richard's post and the Napster keyword reminded me of a vague idea I
          have had for some time: using P2P networks like BitTorrent as a
          persistent storage space. You can read about it in more detail here:

          http://www.pywrapper.com/markus/blog/2006/using-bittorrent-as-a-persistent-storage-space/

          Don't take it as a real proposal, but I like the general idea of it.
          It might even have been done already within the GRID community. But
          it conveys the original internet idea of distributing resources and
          minimizing the impact if a node gets lost.

          A quite nice discussion by the way.
          Markus
          --
           Markus Döring
           Botanic Garden and Botanical Museum Berlin Dahlem,
           Dept. of Biodiversity Informatics
           Königin-Luise-Str. 6-8, D-14191 Berlin
           Phone: +49 30 83850-284
           Email: m.doering at bgbm.org
           URL: http://www.bgbm.org/BioDivInf/

            -----Original Message-----
            From: tdwg-guid-bounces at mailman.nhm.ku.edu
            [mailto:tdwg-guid-bounces at mailman.nhm.ku.edu] On Behalf Of
            Richard Pyle
            Sent: Sunday, 26 November 2006 20:42
            To: 'P. Bryan Heidorn'; tdwg-guid at mailman.nhm.ku.edu; 'Taxacom'
            Subject: Re: [Tdwg-guid] Demise of Phyloinformatics journal

            I only just now read Bryan Heidorn's excellent post on this topic
            (below). One thing I would add is that the nature of the internet and
            electronic information allow us opportunities to ensure permanence and
            access that were either impossible or prohibitively expensive even a
            decade ago.  Imagine, for example, an internet protocol that allowed
            both institutions and individuals to "plug in" and expose their
            digital catalogs of stored electronic publications (and other
            resources) such that the whereabouts of literally thousands of copies
            of every electronic publication could be known to anyone. The system I
            envision is somewhat of a cross between existing protocols for
            interlibrary loan and the original Napster.  Certainly all sorts of
            copyright issues need to be sorted out, but these are short-term
            problems (less than a century), compared to the long-term
            (multi-millennia?) issue of information persistence. The point is,
            knowing the whereabouts of extant copies of digital documents,
            coupled with the amazing ease and low cost of duplication and global
            dissemination (not to mention plummeting costs of electronic storage
            media), would virtually guarantee the long-term persistence of digital
            information.

            Any system is, of course, vulnerable to the collapse (or major
            perturbation) of human civilization.  And the electronic translator
            problem I alluded to in an earlier post cannot be ignored.  But to
            pretend that the potential doesn't exist or shouldn't be actively
            pursued is pure folly, in my opinion.

            Aloha,
            Rich

            Richard L. Pyle, PhD
            Database Coordinator for Natural Sciences
              and Associate Zoologist in Ichthyology
            Department of Natural Sciences, Bishop Museum
            1525 Bernice St., Honolulu, HI 96817
            Ph: (808)848-4115, Fax: (808)847-8252
            email: deepreef at bishopmuseum.org
            http://hbs.bishopmuseum.org/staff/pylerichard.html

              -----Original Message-----
              From: tdwg-guid-bounces at mailman.nhm.ku.edu
              [mailto:tdwg-guid-bounces at mailman.nhm.ku.edu] On Behalf Of
              P. Bryan Heidorn
              Sent: Friday, November 24, 2006 8:22 AM
              To: tdwg-guid at mailman.nhm.ku.edu; Taxacom
              Subject: Re: [Tdwg-guid] Demise of Phyloinformatics journal

              The problem and solution have less to do with the Internet and more
              to do with institutional longevity.
              The permanence of paper has less to do with acid-free paper and more
              to do with the relative permanence of the institutions that house
              it. Most paper documents over a hundred years old have been lost
              forever because there were no permanent institutions to hold them
              until the advent of public and academic libraries. Papers in
              individual scientists' collections are discarded when they die. War
              and economic upheavals left paper in rain and fire. It is foolhardy
              to assume that what is on paper is safe.

              We know that dissemination of information in electronic form is much
              more economical than paper dissemination. The issue is development
              of proper institutions with adequate stable funding to develop and
              maintain copies into "perpetuity".
              Commercial publishers are clearly not the answer for preservation.
              Corporations and publishers go out of business all the time. It is
              only because libraries kept paper copies that we still have a
              record.

              Digital preservation and access problems exist for all sciences and
              government documents, so there is no need for the biodiversity
              community to go it alone on this. We are just at the beginning of
              digital publishing history and have not yet established adequate
              preservation mechanisms within libraries to handle data curation,
              preservation and access in all the situations where it is necessary.
              There are projects underway worldwide to address this issue.
              In the United States, the Library of Congress's National Digital
              Information Infrastructure and Preservation Program
              (http://www.digitalpreservation.gov/) is one example. The U.S.
              Government agency the Institute of Museum and Library Services
              (IMLS, http://www.imls.gov/) began grant programs to train
              librarians and museum curators in digital librarianship and, most
              recently, in digital data curation
              (http://www.imls.gov/applicants/grants/21centuryLibrarian.shtm),
              and so is addressing the education issues.
              The University of North Carolina
              (http://www.ils.unc.edu/digccurr2007/papers.html) and the
              University of Illinois (http://sci.lis.uiuc.edu/DCEP/) have begun
              working on best practices and education. This week
              saw the successful Data Curation Conference (DCC) in Glasgow,
              Scotland (http://www.dcc.ac.uk/events/dcc-2006/). DCC will be running
              "Long-term Curation and Preservation of Journals" on
              31 January 2007. (As an aside, at the DCC conference I saw results of a
              survey in "Attitudes and aspirations in a diverse world: the Project
              StORe perspective on scientific repositories" by Graham Pryor,
              University of Edinburgh
              (http://www.dcc.ac.uk/events/dcc-2006/programme/presentations/g-pryor.ppt),
              showing that more scientists trusted
              publishers to save their digital documents than their home
              institutions and libraries! It is clear that scientists are
              generally not trained in economics and that the information
              technology management of many institutions must be abysmal!)

              We need something like the five-institution rule for distribution to
              apply to digital documents. Digital documents need to be replicated
              as well, for both access and preservation.
              Institutions like the Internet Archive help with some of the current
              problems.
              Institutional Repositories (IR) are another. Many universities and
              libraries worldwide are beginning these. It is the authors'
              responsibility to deposit their publications in these institutions
              and to support their creation. JSTOR and other institutions also
              exist. They all have their weaknesses, and additional research,
              development and funding are needed to adequately address the issues.
              Also, all journals need to be managed using good data curation
              principles, but all too often the publishers, in spite of best
              intentions, are not educated in such issues.

              Digital publishing of taxonomic literature is not the full answer
              to the current poor dissemination of taxonomic literature. The deposit
              of a published name in five institutions is a preservation rule, not
              a dissemination rule.  We hurt science and human health if we do not
              at the same time address the information access issue. We need to
              aspire to better dissemination and preservation. Electronic
              publishing will help, but only if appropriate institutions are in place.
              On the smaller issue, DOIs for publications, electronic or paper, are
              a no-brainer. URLs were never designed to be permanent. URLs were
              designed to be reused and be flexible.
              With DOIs we can place the same paper in multiple digital or
              physical locations and reliably find copies.

              Bryan Heidorn
              --
              --------------------------------------------------------------------
                 P. Bryan Heidorn    Graduate School of Library and Information Science
                 pheidorn at uiuc.edu   University of Illinois at Urbana-Champaign MC-493
                 (V)217/ 244-7792    501 East Daniel St., Champaign, IL  61820-6212
                 (F)217/ 244-3302    https://netfiles.uiuc.edu/pheidorn/www

              On Nov 24, 2006, at 9:54 AM, Renato De Giovanni wrote:

                Rod,

                Thanks for sharing the information with us. I already imagined that
                things like that could happen, but it's always better to argue
                from real examples.

                Anyway, just in case someone reading the story decides to blame URLs,
                I just wanted to say that in my opinion the main issue here is not
                the technology or the GUID format being used. It's the business model
                and the management strategy.

                I can easily imagine similar things happening to DOIs, LSIDs or other
                kinds of issued GUIDs if the institution(s) behind them simply
                disappear.

                Best Regards,

                Renato
                --
                IT Researcher
                CRIA - Reference Center on Environmental Information
                http://www.cria.org.br/

                On 24 Nov 2006 at 13:37, Roderic Page wrote:

                  The Open Access web-only journal "Phyloinformatics" seems to have
                  disappeared, with the Internet address
                  http://www.phyloinformatics.org now up for sale. This means the
                  articles have just disappeared!

                  There weren't many papers published, but some were interesting and
                  have been cited in the mainstream literature.

                  This also illustrates the problems with linking to digital resources
                  using URLs, as opposed to identifiers such as DOIs. With the loss of
                  the domain name, this journal has effectively died.

                  A sobering lesson...

                  Regards

                  Rod

                _______________________________________________
                TDWG-GUID mailing list
                TDWG-GUID at mailman.nhm.ku.edu
                http://mailman.nhm.ku.edu/mailman/listinfo/tdwg-guid

  --
  --------------------------------------------------------------------
    P. Bryan Heidorn    Graduate School of Library and Information Science
    pheidorn at uiuc.edu   University of Illinois at Urbana-Champaign MC-493
    (V)217/ 244-7792    501 East Daniel St., Champaign, IL  61820-6212
    (F)217/ 244-3302    https://netfiles.uiuc.edu/pheidorn/www
    Online Calendar: http://tinyurl.com/6fd5q
    Visit the Biobrowser Web site at http://www.biobrowser.org
