<html><head><meta http-equiv="Content-Type" content="text/html; charset=windows-1252"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;">The irony is that this is what LSIDs were supposed to do: they had semantics for describing the nature of the data and how to access it (HTTP, FTP, SOAP, etc.). In the end, too many moving parts and a failure to keep things simple (i.e., resolve in a browser, be easy to serve) killed them. We need to keep things simple and useful if we are to avoid that trap again.<div><br></div><div>Regards</div><div><br></div><div>Rod<br><div><br><div><div>On 6 May 2014, at 07:28, John Deck &lt;<a href="mailto:jdeck@berkeley.edu">jdeck@berkeley.edu</a>&gt; wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><meta http-equiv="Content-Type" content="text/html; charset=Windows-1252"><div dir="ltr">The biggest appeal to me of the linked data framework is connecting data across domains... you have bio-, geo-, eco-, -omic, not to mention all their various media representations. &nbsp;Hard to believe that DOIs solve all of these needs (just wait till someone wants to assign DOIs to loci from NextGen sequencing, or delve into transcriptomics with this). &nbsp;I'd hope that GUID services could provide high-level, consistent metadata (such as the DataCite metadata, but maybe with a bit more, like type), and provide a clear articulation of the service that stands behind the identifiers, no matter what you're dealing with.&nbsp;<div>

<div><br><div>As far as delivering more specific RDF, I'm more inclined to think along the lines of your last sentence: "<span style="font-family: arial, sans-serif; font-size: 12.727272033691406px;">&nbsp;But maybe that's just a function of where the data provider chooses to redirect RDF requests" and let the providers themselves describe it.</span></div>

</div></div><div><span style="font-family: arial, sans-serif; font-size: 12.727272033691406px;"><br></span></div><div><span style="font-family: arial, sans-serif; font-size: 12.727272033691406px;">John</span></div>

</div><div class="gmail_extra"><br><br><div class="gmail_quote">On Mon, May 5, 2014 at 5:42 PM, Steve Baskauf <span dir="ltr">&lt;<a href="mailto:steve.baskauf@vanderbilt.edu" target="_blank">steve.baskauf@vanderbilt.edu</a>&gt;</span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><u></u>

<div bgcolor="#ffffff" text="#000000">

I'm a big fan of not reinventing the wheel, and as such find the idea

of using DOIs appealing.&nbsp; I think they pretty much follow all of the

"rules" set out in the TDWG GUID Applicability Standard. &nbsp; They also

play nicely in the Linked Data universe in their HTTP URI form, i.e.

they redirect to HTML or RDF depending on the request header.&nbsp; <br>

<br>

But I have a question for someone who understands how DOIs work better

than I do.&nbsp; The HTML representation seems to arise by redirection to

whatever is the current web page&nbsp; for the resource.&nbsp; You can see this

by pasting this DOI for a specimen into a browser:

<a href="http://dx.doi.org/10.7299/X7VQ32SJ" target="_blank">http://dx.doi.org/10.7299/X7VQ32SJ</a> which redirects to

<a href="http://arctos.database.museum/guid/UAM:Ento:230092" target="_blank">http://arctos.database.museum/guid/UAM:Ento:230092</a> when HTML is

requested by a client.&nbsp; However, when the client requests RDF, one gets

redirected to a DataCite metadata page:

<a href="http://data.datacite.org/10.7299/X7VQ32SJ" target="_blank">http://data.datacite.org/10.7299/X7VQ32SJ</a> .&nbsp; Can the creator of the DOI

redirect to any desired URI for the RDF?&nbsp; <br>
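<div>A minimal Python sketch of the content negotiation described above, using only the standard library. The helper names (<code>build_request</code>, <code>final_url</code>) are hypothetical, and <code>text/turtle</code> is just one RDF media type a client might ask for. The requests are built but not sent here; calling <code>final_url</code> needs network access, and where the resolver redirects today may differ from what is reported in this message.</div>
<pre>
```python
import urllib.request

# Resolver prefix taken from the DOI link quoted in this message.
DOI_RESOLVER = "http://dx.doi.org/"


def build_request(doi, accept):
    """Build (but do not send) a GET request for a DOI with the given Accept header."""
    return urllib.request.Request(DOI_RESOLVER + doi, headers={"Accept": accept})


def final_url(request):
    """Send the request, follow any redirects, and return the URL finally reached."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.geturl()


# Same identifier, two different representations requested.
html_request = build_request("10.7299/X7VQ32SJ", "text/html")
rdf_request = build_request("10.7299/X7VQ32SJ", "text/turtle")
```
</pre>
<div>Passing <code>html_request</code> and then <code>rdf_request</code> to <code>final_url</code> would show whether the two Accept headers lead to different landing URLs.</div>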

<br>

The resulting RDF metadata doesn't have any of the kind of useful

information about the specimen that you get on the web page but rather

looks like what you would expect for a publication (creator, publisher,

date, etc.):<br>

<br>

<span>&lt;<a href="http://linkeddata.informatik.hu-berlin.de/uridbg/index.php?url=http://dx.doi.org/10.7299/X7VQ32SJ&amp;acceptheader=text%2Fturtle%3Bq%3D1%2Capplication%2Fx-turtle%3Bq%3D0.5&amp;useragentheader=" target="_blank">http://dx.doi.org/10.7299/X7VQ32SJ</a>&gt;<br>

&lt;<a href="http://linkeddata.informatik.hu-berlin.de/uridbg/index.php?url=http://purl.org/dc/terms/creator&amp;acceptheader=text%2Fturtle%3Bq%3D1%2Capplication%2Fx-turtle%3Bq%3D0.5&amp;useragentheader=" target="_blank">http://purl.org/dc/terms/creator</a>&gt;<br>

"Derek S. Sikes" ;<br>

&lt;<a href="http://linkeddata.informatik.hu-berlin.de/uridbg/index.php?url=http://purl.org/dc/terms/date&amp;acceptheader=text%2Fturtle%3Bq%3D1%2Capplication%2Fx-turtle%3Bq%3D0.5&amp;useragentheader=" target="_blank">http://purl.org/dc/terms/date</a>&gt;<br>

"2004" ;<br>

&lt;<a href="http://linkeddata.informatik.hu-berlin.de/uridbg/index.php?url=http://purl.org/dc/terms/identifier&amp;acceptheader=text%2Fturtle%3Bq%3D1%2Capplication%2Fx-turtle%3Bq%3D0.5&amp;useragentheader=" target="_blank">http://purl.org/dc/terms/identifier</a>&gt;<br>

"10.7299/X7VQ32SJ" ;<br>

&lt;<a href="http://linkeddata.informatik.hu-berlin.de/uridbg/index.php?url=http://purl.org/dc/terms/publisher&amp;acceptheader=text%2Fturtle%3Bq%3D1%2Capplication%2Fx-turtle%3Bq%3D0.5&amp;useragentheader=" target="_blank">http://purl.org/dc/terms/publisher</a>&gt;<br>

"University of Alaska Museum" ;<br>

&lt;<a href="http://linkeddata.informatik.hu-berlin.de/uridbg/index.php?url=http://purl.org/dc/terms/title&amp;acceptheader=text%2Fturtle%3Bq%3D1%2Capplication%2Fx-turtle%3Bq%3D0.5&amp;useragentheader=" target="_blank">http://purl.org/dc/terms/title</a>&gt;<br>

"UAM:Ento:230092 - Grylloblatta campodeiformis" ;<br>

&lt;<a href="http://linkeddata.informatik.hu-berlin.de/uridbg/index.php?url=http://www.w3.org/2002/07/owl#sameAs&amp;acceptheader=text%2Fturtle%3Bq%3D1%2Capplication%2Fx-turtle%3Bq%3D0.5&amp;useragentheader=" target="_blank">http://www.w3.org/2002/07/owl#sameAs</a>&gt;<br>

"info:doi/10.7299/X7VQ32SJ" , "doi:10.7299/X7VQ32SJ" .<br>

<br>

Can one control what kinds of metadata are provided in "</span>DataCite's

metadata"? <span>Assuming that we get our act together and adopt

an RDF guide for Darwin Core, it would be nice for the RDF metadata to

look more like the description of a specimen and less like the

description of a book.&nbsp; But maybe that's just a function of where the

data provider chooses to redirect RDF requests.<br>

<br>

Steve<br>

</span><div><div class="h5"><br>

John Deck wrote:

<blockquote type="cite">

  <div dir="ltr">

  <div>&nbsp;+1 on DOIs, and on ARKs &nbsp;(see:&nbsp;<a href="https://wiki.ucop.edu/display/Curation/ARK" target="_blank">https://wiki.ucop.edu/display/Curation/ARK</a>

), and I'll also mention IGSNs &nbsp;(see&nbsp;&nbsp;<a href="http://www.geosamples.org/" target="_blank">http://www.geosamples.org/</a>).&nbsp;IGSN

is rapidly gaining traction for geo-samples. &nbsp;I don't know of anyone

using them for bio-samples but they offer many features that we've been

asking for as well. &nbsp;What our community considers a sample (or

observation) is diverse enough that multiple ID systems are probably

inevitable and perhaps even warranted. &nbsp;</div>

  <div><br>

  </div>

  <div>Whatever the ID system, the data providers (museums, field

researchers, labs, etc..) must adopt that identifier and use it

whenever linking to downstream sequence, image, and sub-sampling

repository agencies. This is great to say in theory but difficult

to do in reality because the decision to adopt long term and stable

identifiers is often an institutional one, and the technology is still

new and argued about, in particular, on this fine list. &nbsp;Further, those

agencies that receive data associated with a GUID must honor that

source GUID when passing to consumers and other aggregators, who must

also have some level of confidence in the source GUIDs as well. &nbsp; Thus,

a primary issue that we're confronted with here is trust.</div>

  <div><br>

  </div>

  <div>Having Hilmar's hackathon support several possible GUID schemes

(each with their own long term persistence strategy), and sponsored by

a well known global institution affiliated with biodiversity

informatics that could offer technical guidance to data providers, good

name branding, and the nuts and bolts expertise to demonstrate good

shepherding of source GUIDs through a data aggregation chain would be

ideal. &nbsp;I nominate GBIF :)</div>

  <div><br>

  </div>

  <div>John Deck<br>

  </div>

  </div>

  <div class="gmail_extra"><br>

  <br>

  <div class="gmail_quote">On Mon, May 5, 2014 at 1:09 PM, Roderic Page

  <span dir="ltr">&lt;<a href="mailto:r.page@bio.gla.ac.uk" target="_blank">r.page@bio.gla.ac.uk</a>&gt;</span>

wrote:<br>

  <blockquote class="gmail_quote" style="border-left:1px solid rgb(204,204,204);margin:0pt 0pt 0pt 0.8ex;padding-left:1ex">

    <div>Hi Markus,

    <div><br>

    </div>

    <div>I have three use cases:</div>

    <div><br>

    </div>

    <div>1. Linking sequences in GenBank to voucher specimens. Lots of

voucher specimens are listed in GenBank but not linked to digital

records for those specimens. These links are useful in two directions,

one is to link GBIF to genomic data, the second is to enhance data in

both databases, see&nbsp;<a href="http://iphylo.blogspot.co.uk/2012/02/linking-gbif-and-genbank.html" target="_blank">http://iphylo.blogspot.co.uk/2012/02/linking-gbif-and-genbank.html</a>

(e.g., by adding missing georeferencing that is available in one

database but not the other).</div>

    <div><br>

    </div>

    <div>2. Linking to specimens cited in the literature. I've done

some work on this in BioStor, see&nbsp;<a href="http://iphylo.blogspot.co.uk/2012/02/linking-gbif-and-biodiversity-heritage.html" target="_blank">http://iphylo.blogspot.co.uk/2012/02/linking-gbif-and-biodiversity-heritage.html</a>

&nbsp;One immediate benefit of this is that GBIF could display the

scientific literature associated with a specimen, so we get access to

the evidence supporting identification, georeferencing, etc.&nbsp;</div>

    <div><br>

    </div>

    <div>3. Citation metrics for collections, see&nbsp;<a href="http://iphylo.blogspot.co.uk/2013/05/the-impact-of-museum-collections-one.html" target="_blank">http://iphylo.blogspot.co.uk/2013/05/the-impact-of-museum-collections-one.html</a>

and&nbsp;<a href="http://iphylo.blogspot.co.uk/2012/02/gbif-specimens-in-biostor-who-are-top.html" target="_blank">http://iphylo.blogspot.co.uk/2012/02/gbif-specimens-in-biostor-who-are-top.html</a>

Based on citations of specimens in the literature, and in databases

such as GenBank (i.e., basically combining 1 + 2 above) we can

demonstrate the value of a collection.</div>

    <div><br>

    </div>

    <div>All of these use cases depend on GBIF occurrenceIds remaining

stable. I have often ranted on iPhylo when this doesn't happen:&nbsp;<a href="http://iphylo.blogspot.co.uk/2012/07/dear-gbif-please-stop-changing.html" target="_blank">http://iphylo.blogspot.co.uk/2012/07/dear-gbif-please-stop-changing.html</a></div>

    <div><br>

    </div>

    <div>Regards</div>

    <div><br>

    </div>

    <div>Rod</div>

    <div>

    <div>

    <div><br>

    </div>

    <div><br>

    </div>

    <div><br>

    <div>

    <div>On 5 May 2014, at 20:51, Markus Döring &lt;<a href="mailto:mdoering@gbif.org" target="_blank">mdoering@gbif.org</a>&gt;

wrote:</div>

    <br>

    <blockquote type="cite">

      <div>Hi Rod,

      <div><br>

      </div>

      <div>I agree GBIF has trouble keeping identifiers stable for

*some* records, but in general we do a much better job than the

original publishers in the first place. We try hard to keep GBIF ids

stable even if publishers change collection codes, register datasets

twice or do other things to break a simple automated way of mapping

source records to existing GBIF ids. Also, the stable identifier in GBIF

has never been the URL; it is the local GBIF integer alone. The

GBIF services that consume those ids have changed over the years, but

it's pretty trivial to adjust if you use the GBIF ids instead of the

URLs. If there is a clear need to have stable URLs instead I am sure we

can get that working easily.</div>

      <div><br>

      </div>

      <div>The two real issues for GBIF are a) duplicates and b)

records with varying local identifiers of any sort (triplet,

occurrenceID or whatever else).</div>

      <div><br>

      </div>

      <div>When it comes to the varying source identifiers I always

liked the idea of flagging those records and datasets as unstable, so

it is obvious to users. This is not 100% safe, but most terrible

datasets change all of their ids and that is easily detectable.</div>
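      <div>One way the flagging idea above might be sketched in Python: compare the source ids seen in two successive snapshots of a dataset and flag it when most of the old ids have disappeared. The function names and the 0.5 threshold are assumptions made for illustration, not a description of anything GBIF runs.</div>
<pre>
```python
def id_turnover(previous_ids, current_ids):
    """Fraction of previously seen source ids missing from the current snapshot."""
    previous = set(previous_ids)
    if not previous:
        return 0.0  # nothing to compare against yet
    missing = previous - set(current_ids)
    return len(missing) / len(previous)


def is_unstable(previous_ids, current_ids, threshold=0.5):
    """Flag a dataset when at least `threshold` of its old ids have vanished."""
    return id_turnover(previous_ids, current_ids) >= threshold
```
</pre>
      <div>A dataset that only adds new records keeps a turnover of zero, while one that re-mints every id reaches 1.0 and is flagged.</div>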

      <div>Also with a service like that it would become more obvious

to publishers how important stable source ids are.</div>

      <div><br>

      </div>

      <div>Before jumping on DOIs as the next big thing I would really

like to understand what needs the community has around specimen ids.</div>

      <div>Gabi clearly has a very real use case, are there others we

know about?</div>

      <div><br>

      </div>

      <div><br>

      </div>

      <div>Markus</div>

      <div><br>

      </div>

      <div><br>

      </div>

      <div><br>

      </div>

      <div><br>

      </div>

      <div>

      <div>On 05 May 2014, at 21:05, Roderic Page &lt;<a href="mailto:r.page@bio.gla.ac.uk" target="_blank">r.page@bio.gla.ac.uk</a>&gt; wrote:</div>

      <br>

      <blockquote type="cite">

        <div>Hi Hilmar,

        <div><br>

        </div>

        <div>I'm not arguing that we shouldn't build a resolver (I have

one that I use, Rich has mentioned he's got one, Markus has one at

GBIF, etc.).</div>

        <div><br>

        </div>

        <div>Nor do I think we should wait for institutional and social

commitment (because then we'd never get anything done).</div>

        <div><br>

        </div>

        <div>But I do think it would be useful to think it through. For

example, it's easy to create a URL for a specimen. Easy peasy. OK, how

do I discover that URL? How do I discover these for all specimens?

Sounds like I need a centralised discovery service like you've described.</div>

        <div><br>

        </div>

        <div>How do I handle changes in those URLs? I built a specimen

code to GBIF resolver for BioStor so that I could link to specimens,

GBIF changed lots of those URLs, all my work was undone, boy does GBIF

suck sometimes. For example, if I map codes to URLs, I need to handle

cases when they change.&nbsp;</div>

        <div><br>

        </div>

        <div>If URLs can change, is there a way to defend against that

(this is one reason for DOIs, or other methods of indirection, such as

PURLs).&nbsp;</div>

        <div><br>

        </div>

        <div>If providers change, will the URLs change? Is there a way

to defend against that (again, DOIs handle this nicely by virtue of (a)

indirection, and (b) lack of branding).</div>
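        <div>A toy Python sketch of the indirection argument: the identifier stays fixed while the URL it points to is updated behind it. The <code>Resolver</code> class is hypothetical and in-memory only; the DOI and Arctos URL are the ones quoted elsewhere in this thread, and the <code>example.org</code> address is invented to stand in for a changed location.</div>
<pre>
```python
class Resolver:
    """Toy in-memory resolver: persistent identifier mapped to its current URL."""

    def __init__(self):
        self._targets = {}

    def register(self, identifier, url):
        """Create or update the target URL for an identifier."""
        self._targets[identifier] = url

    def resolve(self, identifier):
        """Return the current URL, or None if the identifier is unknown."""
        return self._targets.get(identifier)


resolver = Resolver()
resolver.register("10.7299/X7VQ32SJ", "http://arctos.database.museum/guid/UAM:Ento:230092")
before = resolver.resolve("10.7299/X7VQ32SJ")

# Hypothetical move to a new host: only the mapping changes, not the identifier.
resolver.register("10.7299/X7VQ32SJ", "http://example.org/specimen/UAM:Ento:230092")
after = resolver.resolve("10.7299/X7VQ32SJ")
```
</pre>
        <div>Anything that cited the identifier keeps working after the move; anything that cited the original URL directly would not.</div>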

        <div><br>

        </div>

        <div>How can I encourage people to use the specimen service?

What can I do to make them think it will persist? Can I convince

academic publishers to trust it enough to link to it in articles?

What's the pitch to Pensoft, to Magnolia Press, to Springer and

Elsevier?</div>

        <div><br>

        </div>

        <div>Is there some way to make the service itself become

trusted? For example if I look at a journal and see that it has DOIs

issued by CrossRef, I take that journal more seriously than if it's

just got simple URLs. I know that papers in that journal will be linked

into the citation network, I also know that there is a backup plan if

the journal goes under (because you need that to have DOIs in

CrossRef). Likewise, I think Figshare got a big boost when it started

minting DOIs (wow, a DOI, I know DOIs, you mean I can now cite stuff

I've uploaded there?).&nbsp;</div>

        <div><br>

        </div>

        <div>How can museums and herbaria be persuaded to keep their

identifiers stable? What incentives can we provide (e.g., citation

metrics for collections)? What system would enable us to do this? What

about tracing funding (e.g., the NSF paid for these n papers, and they

cite these y specimens, from these z collections, so science paid for

by the NSF requires these collections to exist).</div>

        <div><br>

        </div>

        <div>I guess I'm arguing that we should think all this through,

because a specimen code to specimen URL is a small piece of the puzzle.

Now, I'm desperately trying not to simply say what I think is

blindingly obvious here (put DOIs on specimens, add metadata to

specimen and specimen citation services, and we are done), but I think

if we sit back and look at where we want to be, this is exactly what we

need (or something functionally equivalent). Until we see the bigger

picture, we will be stuck in amateur hour.</div>

        <div><br>

        </div>

        <div>Take &nbsp;a look at:</div>

        <div><br>

        </div>

        <div><a href="http://search.crossref.org/" target="_blank">http://search.crossref.org</a></div>

        <div><a href="http://www.crossref.org/fundref/" target="_blank">http://www.crossref.org/fundref/</a></div>

        <div><a href="http://support.crossref.org/" target="_blank">http://support.crossref.org/</a></div>

        <div><a href="https://prospect.crossref.org/splash/" target="_blank">https://prospect.crossref.org/splash/</a></div>

        <div><br>

        </div>

        <div>Isn't this the kind of stuff we'd like to do? If so, let's

work out what's needed and make it happen.</div>

        <div><br>

        </div>

        <div>In short, I think we constantly solve an immediate problem

in the quickest way we know how, without thinking it through. I'd argue

that if we think about the bigger picture (what do we want to be able

to do, what are the questions we want to be able to ask) then things

become clearer. This is independent of getting everyone's agreement

(but it would help if we made their agreement seem a no brainer by

providing solutions to things that cause them pain).</div>

        <div><br>

        </div>

        <div><br>

        </div>

        <div>Regards</div>

        <div><br>

        </div>

        <div>Rod</div>

        <div><br>

        <div>

        <div>On 5 May 2014, at 19:14, Hilmar Lapp &lt;<a href="mailto:hlapp@nescent.org" target="_blank">hlapp@nescent.org</a>&gt;

wrote:</div>

        <br>

        <blockquote type="cite">

          <div dir="ltr">

          <div class="gmail_extra"><br>

          <div class="gmail_quote">On Mon, May 5, 2014 at 1:29 PM,

Roderic Page <span dir="ltr">&lt;<a href="mailto:r.page@bio.gla.ac.uk" target="_blank">r.page@bio.gla.ac.uk</a>&gt;</span>

wrote:<br>

          <blockquote class="gmail_quote" style="border-left:1px solid rgb(204,204,204);margin:0px 0px 0px 0.8ex;padding-left:1ex">Contrary

to Hilmar, there is more to this than simply a quick hackathon. Yes, a

service that takes metadata and returns one or more identifiers is a

good idea and easy to create (there will often be more than one because

museum codes are not unique). But who maintains this service? Who

maintains the identifiers? Who do I complain to if they break? How do

we ensure that they persist when, say, a museum closes down, moves its

collection, or changes its web technology? Who provides the tools that

add value to the identifiers? (There's no point having them if they are

not useful.)<br>

          </blockquote>
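          <div>A small Python sketch of the lookup service mentioned in the quoted text: it takes specimen metadata and returns every matching identifier, since a museum code may not be unique. All codes, catalogue numbers, and <code>example.org</code> identifiers below are made up for illustration.</div>
<pre>
```python
from collections import defaultdict

# Hypothetical records: (institution code, catalogue number, identifier).
RECORDS = [
    ("XYZ", "12345", "http://example.org/id/xyz-ento-12345"),
    ("XYZ", "12345", "http://example.org/id/xyz-mamm-12345"),
    ("ABC", "1001", "http://example.org/id/abc-1001"),
]


def build_index(records):
    """Group identifiers by (institution code, catalogue number)."""
    index = defaultdict(list)
    for institution, catalogue_number, identifier in records:
        index[(institution, catalogue_number)].append(identifier)
    return index


def lookup(index, institution, catalogue_number):
    """Return every identifier matching the code; the list may hold zero, one, or many."""
    return list(index.get((institution, catalogue_number), []))


INDEX = build_index(RECORDS)
```
</pre>
          <div>Here <code>lookup(INDEX, "XYZ", "12345")</code> returns two candidates, which is the ambiguity a real service would have to expose rather than hide.</div>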

          <div><br>

          </div>

          <div>Jonathan Rees pointed this out to me too off-list. Just

for the record, this isn't contrary but fully in line with what I was

saying (or trying to say). Yes, I didn't elaborate that part, assuming,

perhaps rather erroneously, that all this goes without saying, but I

did mention that one part of this becoming a real solution has to be an

institution with an in-scope cyberinfrastructure mandate that&nbsp;going

in&nbsp;would make a commitment to sustain the resolver, including working

with partners on the above slew of questions. The institution I gave

was iDigBio; perhaps for some reason that would not be a good choice,

but whether they are or not wasn't my point.</div>

          <div><br>

          </div>

          <div>I will add one point to this, though. It seems to me

that by continuing to argue that we can't go ahead with building a

resolver that works (as far as technical requirements are concerned)

before we have fully addressed the institutional and social

long-term sustainability commitment problem, we are and have been

making this one big hairy problem that we can't make any practical

pragmatic headway about, rather than breaking it down into parts, some

of which (namely the primarily technical ones) are actually fairly

straightforward to solve. As a result, to this day we don't have some

solution that even though it's not very sustainable yet, at least

proves to everyone how critical it is, and that the community can rally

behind. Perhaps that's naïve, but I do think that once there's a

solution the community rallies behind, ways to sustain it will be

found.&nbsp;</div>

          <div><br>

          </div>

          <div>&nbsp; -hilmar</div>

          </div>

-- <br>

          <div dir="ltr">

          <div>Hilmar Lapp -:- <a href="http://informatics.nescent.org/wiki" target="_blank">informatics.nescent.org/wiki</a>

-:- <a href="http://lappland.io/" target="_blank">lappland.io</a><br>

          </div>

          <br>

          </div>

          </div>

          </div>

        </blockquote>

        </div>

        <br>

        </div>

        </div>

      </blockquote>

      </div>

      <br>

      </div>

    </blockquote>

    </div>

    <br>

    </div>

    </div>

    </div>

    </div>

    <br>

_______________________________________________<br>

tdwg-content mailing list<br>

    <a href="mailto:tdwg-content@lists.tdwg.org" target="_blank">tdwg-content@lists.tdwg.org</a><br>

    <a href="http://lists.tdwg.org/mailman/listinfo/tdwg-content" target="_blank">http://lists.tdwg.org/mailman/listinfo/tdwg-content</a><br>

    <br>

  </blockquote>

  </div>

  <br>

  <br clear="all">

  <div><br>

  </div>

-- <br>

John Deck<br>

<a href="tel:%28541%29%20321-0689" value="+15413210689" target="_blank">(541) 321-0689</a><br>

  </div>

</blockquote>

<br>

</div></div><pre cols="72">-- 

Steven J. Baskauf, Ph.D., Senior Lecturer

Vanderbilt University Dept. of Biological Sciences

postal mail address:

PMB 351634

Nashville, TN  37235-1634,  U.S.A.

delivery address:

2125 Stevenson Center

1161 21st Ave., S.

Nashville, TN 37235

office: 2128 Stevenson Center

phone: <a href="tel:%28615%29%20343-4582" value="+16153434582" target="_blank">(615) 343-4582</a>,  fax: <a href="tel:%28615%29%20322-4942" value="+16153224942" target="_blank">(615) 322-4942</a>

If you fax, please phone or email so that I will know to look for it.

<a href="http://bioimages.vanderbilt.edu/" target="_blank">http://bioimages.vanderbilt.edu</a>

<a href="http://vanderbilt.edu/trees" target="_blank">http://vanderbilt.edu/trees</a>

</pre>

</div>


<br></blockquote></div><br><br clear="all"><div><br></div>-- <br>John Deck<br>(541) 321-0689<br>

</div>

</blockquote></div><br></div></div></body></html>