tdwg-content
Software available for LSID resolution, Linked Data, OAI-PMH [SEC=UNCLASSIFIED]
by Paul Murray 28 Jun '11
We have uploaded to google code the software currently being used at biodiversity.org.au to serve up LSID metadata, linked-data objects identified by URI, and reply to OAI-PMH requests.
The code is available here:
svn checkout http://ala-nsl.googlecode.com/svn/service-layer/trunk service-layer
or
svn checkout https://ala-nsl.googlecode.com/svn/service-layer/trunk service-layer
(I find that http doesn't work for me - something to do with authenticating to google code through the corporate firewall)
The system is our "quick, get something working" interim solution for creating a web presence for our data, pending a full National Species List application. It is a suite of XSQL source files that are loaded into an instance of the eXist XML database (exist.sourceforge.net).
Using it is a matter of
* acquiring a domain name that you wish to use for your LSID authority and hosting the software
* generating dumps of your data as XML
* identifying your XML elements with lsids
* writing XSL style sheets to convert your XML into the various output formats that you wish to support.
As this system imposes few constraints on the underlying XML, you do not need to "massage" your data into some specific format. Being small and self-contained, it could be a useful way of exposing small data sets peculiar to some space - although it accommodates millions of records perfectly well at our installation.
The payoff is that the system does correctly implement the LSID standard (etc). This can be seen by navigating to
http://lsid.tdwg.org/summary/urn:lsid:biodiversity.org.au:apni.taxon:54321
As you see - the software at tdwg has no difficulty talking to our resolver and fetching the data from it.
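For readers unfamiliar with the identifier being resolved above, here is a minimal sketch of how an LSID breaks down into its colon-separated parts (authority, namespace, object, optional revision). The `Lsid` type and `parse_lsid` function are names invented for this illustration; they are not part of the service-layer code.

```python
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    """Components of an LSID (hypothetical helper type for this sketch)."""
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]  # None when the LSID carries no version


def parse_lsid(text: str) -> Lsid:
    """Split an LSID URN into its colon-separated components."""
    parts = text.split(":")
    # Expected layout: urn, lsid, authority, namespace, object[, revision]
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {text!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    authority, namespace, object_id = parts[2:5]
    if not (authority and namespace and object_id):
        raise ValueError(f"LSID has an empty component: {text!r}")
    # An absent or empty sixth field is treated as "no revision".
    revision = (parts[5] or None) if len(parts) == 6 else None
    return Lsid(authority, namespace, object_id, revision)


lsid = parse_lsid("urn:lsid:biodiversity.org.au:apni.taxon:54321")
print(lsid.authority, lsid.namespace, lsid.object_id, lsid.revision)
```

The authority component is the domain name a resolver would look up, which is why acquiring a domain is the first step in the list above.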
Likewise, linked data is implemented - although HEAD requests are not getting through correctly and so the first test on this page fails.
http://validator.linkeddata.org/vapour?vocabUri=http%3A%2F%2Fbiodiversity.o…
Nevertheless: uriburner, zeitgeist and so on manage to retrieve and process the data from our URIs correctly.
Caveats:
* The system is not tested with LSIDs with versions. I believe that the usual use for versioned LSIDs is for serving up data (eg: media) rather than metadata, and that is not the focus of this bit of software.
* The system requires some configuration, both internally and at the web server. Most particularly, we needed to perform mappings at our reverse proxy to get URIs passed through to the correct targets behind our firewall.
* The script that loads the executable XSQL into eXist is a unix script which works on Solaris and on OSX. The repository does not include an equivalent windows command file.
* You will almost certainly need to write at least some XSLT to get the output you want. The repository comes with a demo dataset.
* There is minimal documentation, I'm afraid. The READMEs point you at our confluence page, but there's not much there in relation to getting this bit of software going.
* The system does not magically vanish all the issues with regard to RDF and XML vocabularies and the like. Conforming to standards is a matter of generating and loading conformant XML, or tweaking your XSLT.
If you have received this transmission in error please notify us immediately by return e-mail and delete all copies. If this e-mail or any attachments have been sent to you in error, that error does not constitute waiver of any confidentiality, privilege or copyright in respect of information in the e-mail or attachments.
Please consider the environment before printing this email.
09 Jun '11
For better or worse, the TDWG Applicability Statement for GUID's, and
most of the TDWG community, uses "GUID" in a generic sense, not
conformant to RFC 4122 which declares "GUID" and "UUID" to be
equivalent terms.
Also, according to http://www.rfc-editor.org, ---which by STD 1 always
contains the current status of any RFC---it seems that RFC 4122 has
never advanced past "Proposed Standard" in the 6 years since it was
proposed. So, however one reads RFC 4122 on the question of "*ONE*
GUID" (meaning in the 4122 context "*ONE* UUID"), at best "*ONE* GUID"
is a proposal, not a standard in the sense of IETF. Not even a Draft
Standard.
Finally, FWIW, the author of
http://en.wikipedia.org/wiki/Universally_unique_identifier seems to
not take "Unique" to mean at most one per resource:
"The intent of UUIDs is to enable distributed systems to uniquely
identify information without significant central coordination. Thus,
anyone can create a UUID and use it to identify something with
reasonable confidence that the identifier will never be
unintentionally used by anyone for anything else. Information labeled
with UUIDs can therefore be later combined into a single database
without needing to resolve name conflicts."
I find nothing in the above, nor in RFC 4122 that prohibits multiple
UUIDs for the same resource, counterproductive as that might be. The
TDWG GUID Applicability standard, however, needs some cleaning on
related points, since it has some internally confusing narrative on
this issue. On balance, it comes out implicitly in favor of a single
UUID (or any other kind of "GUID") for each resource, issued only by
the resource provider; but explicitly permits multiple schemes for
identifying a resource.
Bob Morris
On Wed, Jun 8, 2011 at 10:56 PM, Paul Murray <pmurray(a)anbg.gov.au> wrote:
>
> On 08/06/2011, at 8:05 AM, Steve Baskauf wrote:
>
> ... I think it's foolish to regard all of these different
> resolution mechanisms as distinct "identifiers". There is *ONE* GUID. It
> is: A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523. There are ten different ways to
> make it actionable. It therefore meets the recommendations of the
> applicability statement.
>
> The problem is that when you create an HTTP URI out of a UUID, you are
> creating an identifier whether you think you are or not.
>
> Jumping in again, but perhaps RFC 4122 might help a little here.
> A GUID (or UUID) is a set of 128 bits, 16 octets, 32 hex digits, 5 inches of
> punched paper tape. However you choose to write or express it, there is
> indeed "*ONE* GUID".
> A URI is not a GUID. This:
> http://example.org/A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523
> is a different URI to this
> http://my.organisation.org/A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523
> is a different URI to this:
> http://example.org/A9F435E08ED746DDBAB4EA8E5BF41523
> Furthermore, these uris have nothing whatever to do with the guid - apart
> from the fact that it's obvious to we humans that they do.
> Fortunately, there is a standard for expressing a guid/uuid as a URI, and it
> is the "uuid" urn namespace, defined in RFC-4122. Thus:
> urn:uuid:A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523
> is a URI that - according to a w3c standard - corresponds to the 128-bit
> guid. This:
> urn:uuid:A9F435E08ED746DDBAB4EA8E5BF41523
> is *not valid* - it doesn't conform to the schema. There is one unique (case
> insensitive) uuid urn for any guid, and a defined equivalence between them.
> These are not "cool uris", but guids are inherently uncool so that's to be
> expected.
> If you want to use GUIDs for identifiers and need equivalent URIs (for use
> in RDF and the semweb), then urn:uuid:<the guid> might be a good way to go.
>
> _______________________________________________
> tdwg-content mailing list
> tdwg-content(a)lists.tdwg.org
> http://lists.tdwg.org/mailman/listinfo/tdwg-content
>
>
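The point in the quoted message, that a UUID is one 128-bit value however it is written and that `urn:uuid:` gives it a single URI form, can be checked with Python's standard `uuid` module. This is only an illustrative sketch: the module is lenient on input and accepts the unhyphenated spelling, even though that spelling is not valid inside a `urn:uuid:` URN.

```python
import uuid

# Two spellings of the same 128-bit value from the quoted message.
hyphenated = uuid.UUID("A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523")
compact = uuid.UUID("A9F435E08ED746DDBAB4EA8E5BF41523")

# Both parse to the same value: 16 octets, regardless of notation.
same_value = hyphenated == compact
octet_count = len(hyphenated.bytes)

# The module emits the URN form in lower case with hyphens.
canonical_urn = hyphenated.urn

print(same_value)
print(octet_count)
print(canonical_urn)
```

Comparing the parsed values rather than the strings sidesteps differences in case and punctuation.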
--
Robert A. Morris
Emeritus Professor of Computer Science
UMASS-Boston
100 Morrissey Blvd
Boston, MA 02125-3390
IT Staff
Filtered Push Project
Department of Organismal and Evolutionary Biology
Harvard University
email: morris.bob(a)gmail.com
web: http://efg.cs.umb.edu/
web: http://etaxonomy.org/mw/FilteredPush
http://www.cs.umb.edu/~ram
phone (+1) 857 222 7992 (mobile)
Hi Steve (and Dave),
[NB: After having composed the email below, just before sending it, I re-read your initial email more carefully and realized that you said you already had the ITIS TSNs, and were looking to add the NamebankIDs! Doh! Well, in case you (or anyone else) is interested in methods of matching names to get TSNs, I'll go ahead and send this anyway. But do note the comments below about the ITIS "versions" and ongoing overhaul of the vascular plant data in ITIS!!! -Dave]
I noticed this just before leaving work last week, and was out yesterday, but I wanted to chime in on this. I'm glad the uBio tools are meeting your needs (they do have some cool stuff!), but it should be noted that those tools are using a static snapshot of ITIS data from January 2009, and we have added about 50,000 additional scientific names, and updated tens of thousands of names beyond that (most of that in the last 6 months, as the frequency of loads dropped off in 2009-2010 due to technical issues).
I also want to note that ITIS is right in the middle of a full update of the vascular plant data in ITIS, and we're loading updated families on a monthly basis... and at long last we are tackling all the leftover issues from several bulk loads from USDA PLANTS data that left unreconciled bits of ITIS' older vascular plant data in various confusing states... so it is a VAST improvement that is underway.
There are several options for bouncing your names off the current version of ITIS.
One is to automate a matching process using the live ITIS data, based on the existing ITIS Web Services. I am CC'ing Alan Hampson, our IT fellow who built the Web Services ( http://www.itis.gov/web_service.html ), in case you'd like to follow up with him on that option. The advantage is that once you have a process in place it is completely self-serve and can always utilize the current ITIS data. If you have the resources to do this I think it would be greatly to your advantage to use this approach.
You can explore some ideas for client software to use the services at:
http://www.itis.gov/ws_develop.html
And for more information on ITIS web services try
http://www.itis.gov/ws_description.html
http://www.itis.gov/ITISWebService.xml
The ability to flag multiply-matched names (as you noted) should probably be considered, so that appropriate manual steps can be taken. This solution will allow you to take advantage of subsequent updates to ITIS with a minimum of additional effort, and given that the plant data are in the middle of a major overhaul, this bears consideration!
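As a rough sketch of the flagging step, assuming the match results have already been collected as (name, TSN) pairs by whatever client is used: the function name `split_matches` and the sample rows are made up for illustration and say nothing about how the ITIS services return data.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def split_matches(
    rows: Iterable[Tuple[str, str]],
) -> Tuple[Dict[str, str], Dict[str, List[str]]]:
    """Separate names with exactly one TSN from names with several."""
    by_name: Dict[str, List[str]] = defaultdict(list)
    for name, tsn in rows:
        if tsn not in by_name[name]:  # ignore exact duplicate rows
            by_name[name].append(tsn)
    unique = {n: tsns[0] for n, tsns in by_name.items() if len(tsns) == 1}
    ambiguous = {n: tsns for n, tsns in by_name.items() if len(tsns) > 1}
    return unique, ambiguous


# Placeholder rows; these are not real names or real TSNs.
rows = [
    ("Genus alpha", "1001"),
    ("Genus beta", "1002"),
    ("Genus beta", "1003"),
    ("Genus alpha", "1001"),
]
unique, ambiguous = split_matches(rows)
print(unique)
print(ambiguous)
```

Names that land in `ambiguous` are the ones to set aside for manual review.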
Another possibility is to grab a full snapshot of the ITIS data, and load it into a database so you can do what you wish. The obvious drawback is that it goes out of date, as with the ITIS snapshot uBio is currently using. But it puts you in the driver's seat re what to do & getting new versions of ITIS. Some general information about the full exports is in the following page, although conspicuously absent is any mention of the MySQL version which (assuming you have the free MySQL properly installed & configured) can be loaded with just a few clicks or a few command lines (depending on your platform):
http://www.itis.gov/ftp_download.html
And the current ITIS data are all here for downloading:
http://www.itis.gov/downloads/
A third option, which I note with some trepidation, is the old "Compare Nomenclature/Taxonomy" function on the ITIS site:
http://www.itis.gov/taxmatch_ftp.html
This is a VERY old function that we do plan on replacing (timeframe not yet certain), and it is vulnerable to timeouts, etc., which is why it notes to limit the number of names per pass. But with smaller chunks of names it does work quite well. The caveat is that I would make sure to choose the 4th option in Step 4, as it is at least aware (unlike the 3 other options) of multiply-matched name cases, and lists them separately at the bottom of the report. Just a bare listing of the scientific names, with the word "name" at the top, saved as plain text, is all that is needed for input.
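Preparing the input in smaller chunks could look something like the sketch below. It writes plain-text files that each start with the word "name" followed by one scientific name per line. The chunk size of 500 is an arbitrary assumption (the page itself states the actual limit), and the function and file names are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import List, Sequence


def write_name_chunks(
    names: Sequence[str], out_dir: Path, chunk_size: int = 500
) -> List[Path]:
    """Write names to numbered text files, each headed by the word 'name'."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    out_dir.mkdir(parents=True, exist_ok=True)
    paths: List[Path] = []
    for start in range(0, len(names), chunk_size):
        chunk = names[start:start + chunk_size]
        path = out_dir / f"names_{start // chunk_size + 1:03d}.txt"
        # Header line first, then one bare name per line.
        path.write_text("name\n" + "\n".join(chunk) + "\n", encoding="utf-8")
        paths.append(path)
    return paths


# Demo with placeholder names and a deliberately tiny chunk size.
with tempfile.TemporaryDirectory() as tmp:
    files = write_name_chunks(
        ["Genus alpha", "Genus beta", "Genus gamma"], Path(tmp), chunk_size=2
    )
    chunk_count = len(files)
    first_chunk = files[0].read_text(encoding="utf-8")

print(chunk_count)
print(first_chunk)
```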
A final option would be to ask someone at ITIS to handle the matching for you (leaving you to decide re the multiply-matched names). This might be simple from your end, but is suboptimal as it leaves you in the same position as you are now should you want or need to compare names again in the future (whether due to acquiring new names in your system, or wanting to check against a later updated version of ITIS), and it pulls someone here (probably me) off of the push to get more updates into ITIS. But in a pinch, I'm certainly willing to try to help you, should it come down to that! I would just ask that you seriously consider the web services option (in particular) or the others above first.
I hope this helps some. If you have already run all your matches against the old "ITIS" data via uBio then you might consider re-running (against the current ITIS data) at least the leftover names that you did not yet get matched. Let us know if you have questions (the itiswebmaster(a)itis.gov address goes to myself and Alan and several others, so that might be the best bet for a follow-up unless you have a question specifically for me).
Regards,
Dave
David Nicolson
Data Development Coordinator, Integrated Taxonomic Information System
Biologist, USGS Core Science Systems, Biological Informatics Program
nicolsod(a)si.edu Office 202-633-2149 Fax 202-786-2934
http://www.itis.gov/
http://www.cbif.gc.ca/itis/
"Nihil sumas necesse est..."
-----Original Message-----
Date: Fri, 20 May 2011 05:42:03 -0500
From: Steve Baskauf <steve.baskauf(a)vanderbilt.edu>
Subject: Re: [tdwg-content] ITIS TSNID to uBio NamebankIDs mapping
To: "David Remsen (GBIF)" <dremsen(a)gbif.org>
Cc: "tdwg-content(a)lists.tdwg.org" <tdwg-content(a)lists.tdwg.org>
Message-ID: <4DD6457B.2080204(a)vanderbilt.edu>
Content-Type: text/plain; charset="iso-8859-1"
Thanks, all, for the responses. The "Compare to ITIS" function does
just what I want. I did a test run of 1000 names and it worked like a
charm. I will need to do a little massaging because sometimes two or
more ITIS IDs come back for each uBio ID. But I can handle that.
Steve
David Remsen (GBIF) wrote:
> Steve
>
> Have you tried this?
> http://www.ubio.org/clients/ITIS/index.php
>
> or this?
> http://www.ubio.org/services/mapper/index2.php
>
> All this ubio talk makes me think we were on to something. Worth a thought about adopting the new standards and tools and making it really smooth.
>
> DR
>
>
> On 20 May 2011, at 04:46, Steve Baskauf wrote:
>
>
>> I have generated a csv spreadsheet of about 39 000 plant names for the
>> U.S. which has the ITIS TSNIDs for the names in a column. I would like
>> to have the uBio Namebank IDs in another column of the table. I have
>> been looking them up on the uBio website by typing in the names as I
>> need to know the IDs, but after doing about 300 of them, I'm getting
>> tired of it. Does anybody have a clever idea of a way to get the other
>> 38 000 Namebank IDs without looking them up. I'm sure that it would be
>> possible to find this out because uBio gets names from ITIS. However, I
>> haven't seen any clues about how to do it in an automated fashion. I'm
>> guessing that there might be some way to use the uBio web services, but
>> if so, it isn't obvious and I probably don't have the skills to carry it
>> out anyway.
>>
>> Any ideas?
>> Steve
>>
>> --
>> Steven J. Baskauf, Ph.D., Senior Lecturer
>> Vanderbilt University Dept. of Biological Sciences
>>
>> postal mail address:
>> VU Station B 351634
>> Nashville, TN 37235-1634, U.S.A.
>>
>> delivery address:
>> 2125 Stevenson Center
>> 1161 21st Ave., S.
>> Nashville, TN 37235
>>
>> office: 2128 Stevenson Center
>> phone: (615) 343-4582, fax: (615) 343-6707
>> http://bioimages.vanderbilt.edu
>>
>> _______________________________________________
>> tdwg-content mailing list
>> tdwg-content(a)lists.tdwg.org
>> http://lists.tdwg.org/mailman/listinfo/tdwg-content
>>
>>
>
> .
>
>
--
Steven J. Baskauf, Ph.D., Senior Lecturer
Vanderbilt University Dept. of Biological Sciences
postal mail address:
VU Station B 351634
Nashville, TN 37235-1634, U.S.A.
delivery address:
2125 Stevenson Center
1161 21st Ave., S.
Nashville, TN 37235
office: 2128 Stevenson Center
phone: (615) 343-4582, fax: (615) 343-6707
http://bioimages.vanderbilt.edu
Re: [tdwg-content] Why UUIDs alone are not adequate as GUIDs, was Re: ITIS TSNID to uBio NamebankIDs mapping
by Chuck Miller 08 Jun '11
Yes, ratified following the official TDWG process. Ratified is the final status of a proposed standard that has followed the process all the way through and has become "official". This process replaced the deprecated "member voting" formerly conducted at the annual meeting.
Chuck
-----Original Message-----
From: tdwg-content-bounces(a)lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Kevin Richards
Sent: Wednesday, June 08, 2011 4:16 PM
To: Richard Pyle; 'Steve Baskauf'
Cc: tdwg-content(a)lists.tdwg.org
Subject: Re: [tdwg-content] Why UUIDs alone are not adequate as GUIDs, was Re: ITIS TSNID to uBio NamebankIDs mapping
Answering your question about whether the GUID applicability statements are "ratified" standards...
To be honest, I am not sure of the difference between an un-ratified standard and a ratified standard. My impression was that we have done all we need to with the applicability statements in the standards process, so perhaps they are ratified??
An email from 22 February (ironic date - the date of our devastating earthquake) about the applicability statements:
"The TDWG Executive Committee has approved the Life Sciences Identifiers Applicability Statement (LSID_AS) and the Globally Unique Identifiers (GUID_AS) Applicability Statement as new TDWG standards.
The Executive committee acknowledges Kevin Richards of Landcare New Zealand as author of the GUID Applicability Statement. Likewise, Kevin Richards, Ricardo Pereira (TDWG Infrastructure Project), Donald Hobern (Atlas of Living Australia), Roger Hyam (TDWG Infrastructure Project), Lee Belbin (TDWG Infrastructure Project) and Stan Blum (California Academy of Sciences) as co-authors of the LSID Applicability Statement.
The committee also greatly appreciated the patience and perseverance of Ben Richardson of the Department of Environment and Conservation of Western Australia who was the Review Manager for these standards. The process, as can be seen from the institutional associations, was in this case longer than all would have liked, but we hope that the standards will prove useful to the Biodiversity Informatics community.
We would also thank all those who were involved as formal or public reviewers of these standards. Your input was greatly appreciated and was in various ways, incorporated into the final standards.
These standards can be downloaded from http://www.tdwg.org/standards/150/download/.
Chuck Miller
TDWG Chair
On behalf of the TDWG Executive Committee"
Kevin
-----Original Message-----
From: tdwg-content-bounces(a)lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Richard Pyle
Sent: Thursday, 9 June 2011 8:46 a.m.
To: 'Steve Baskauf'
Cc: tdwg-content(a)lists.tdwg.org
Subject: Re: [tdwg-content] Why UUIDs alone are not adequate as GUIDs, was Re: ITIS TSNID to uBio NamebankIDs mapping
Hi Steve,
First of all, I owe you (and the list at large) a sincere apology for my excessively long and largely discombobulated email. I was distracted by many things, and I ended up writing it in chunks over the course of a full day. You were very clear on who you were responding to in each section, but I probably lost track of that because of the discontinuous mode of my response. Another problem is that I converted my reply to plain text, and that caused me to lose track in a few places whom I was responding to. Again, my sincere apologies.
> For the purposes of clarity, any time I say "GUID" here, I intend it
> in the sense of the TDWG GUID Applicability Statement.
OK thanks. That became clear as I responded, but somehow I didn't pick up on that when I first started responding. But even the TDWG GUID Applicability Statement (TGAS) is not perfectly clear or consistent in its use of the term GUID. In some cases, the term implies self-actionability; in other cases, it says what to do when GUIDs are not self-actionable.
> In the GBIF "Adoption of Persistent Identifiers for Biodiversity
> Informatics" document
> (http://www2.gbif.org/Persistent-Identifiers.pdf),
> the term "persistent actionable identifiers" is used instead of GUID,
> but in the interest of brevity I'll use GUID.
OK, fair enough. The GBIF document was the most recent one I contributed to, so I was thinking in those terms for using the qualified "persistent actionable identifiers" language in contrast to "GUID"; but I'm perfectly happy using the term "GUID" now that we have it (reasonably) well-defined.
> Thanks for taking the time to explain more about how GNUB will work.
> I am anxious to see it come to fruition and to use it.
I'm hoping that by late summer we'll have it functioning with several core services, and perhaps you and others on this list can help test those services and provide suggestions for new services. Before that can be a productive use of everyone's time, though, we need to hammer out some technical documentation. As I am writing this from my hotel room at Disney's Caribbean Beach Resort in Orlando (while my family naps after a long flight in preparation for some serious Magic Kingdom action tonight), I'm not really in a position to delve into this in too much detail right now. But I'll take a stab at it.
> First a word about the TDWG GUID Applicability Statement.
> You were expressing some reservations about calling it a "standard".
> If you go to http://www.tdwg.org/standards/, you will find it listed
> under "Current Standards".
My reservations were mostly about calling it a "ratified standard". I honestly don't know if it is or isn't, but I don't remember a vote on it (like there was for TCS and for the "ratified" DwC). Perhaps Kevin Richards or someone else at TDWG can clarify (for both of us).
> So an understanding of the "appropriate" way to apply something like a
> UUID must be inferred from the general statements and examples about
> UUIDs, by "reading between the lines" by considering how general
> recommendations about GUIDs would impact the handling of UUIDs, and by
> analogy to how LSIDs (another non-HTTP URI-based GUID) are handled.
Perhaps instead of reading between the lines, the discussion surrounding the drafting of the "TGAS" is available online somewhere. That would include details about the thinking behind the final wording.
> So based on this, you are correct to call a UUID a GUID. However, the
> part that I disagree with is:
>
> ... I think it's foolish to regard all of these different resolution
> mechanisms as distinct "identifiers". There is *ONE* GUID. It
> is: A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523. There are ten different
> ways to make it actionable. It therefore meets the recommendations of
> the applicability statement.
You are not alone in disagreeing with me on this.
> The problem is that when you create an HTTP URI out of a UUID, you are
> creating an identifier whether you think you are or not.
Fair enough; but by that definition & logic, *every* HTTP URI (sensu the "Contemporary View" explained at http://www.w3.org/TR/uri-clarification/; i.e., inclusive of things we sometimes call URN or URL) is an identifier. But I think that goes well beyond the scope of the discussion we're having here about GUIDs.
> I suppose as a matter of semantics, you could say "I don't intend for
> the ten ways I showed of making my UUID actionable to be GUIDs", but
> if I encounter one of them, how am I supposed to know that?
That is *exactly* the point I was trying to get at in my earlier message. Right now, everything that resolves via HTTP GET must be treated as a GUID. But it's not guaranteed to be persistent (thinking again in terms of the more explicit "persistent actionable identifiers"). I think our community can do better than that. The problem is not the resolution -- I can (and intend to) persist all ten service syntax forms, so they will all fit the TGAS recommendation as GUIDs. But that doesn't do you any good if you're trying to compare cited objects in two different datasets that each happened to use different syntax for the resolution mechanism.
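To make the comparison problem concrete, here is one possible workaround, sketched under the assumption that each dataset's identifier embeds the UUID in its usual hyphenated form somewhere in the string. The helper names and both example identifiers are illustrative only, including the LSID namespace shown.

```python
import re
import uuid
from typing import Optional

# Matches the 8-4-4-4-12 hyphenated hex layout, in either case.
_UUID_RE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}",
    re.IGNORECASE,
)


def embedded_uuid(identifier: str) -> Optional[uuid.UUID]:
    """Return the first hyphenated UUID found in the string, if any."""
    match = _UUID_RE.search(identifier)
    return uuid.UUID(match.group(0)) if match else None


def same_underlying_uuid(a: str, b: str) -> bool:
    """True only when both strings embed a UUID and the values are equal."""
    ua, ub = embedded_uuid(a), embedded_uuid(b)
    return ua is not None and ua == ub


# Illustrative identifiers as two datasets might have recorded them.
dataset_a = "urn:lsid:zoobank.org:act:A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523"
dataset_b = "http://example.org/a9f435e0-8ed7-46dd-bab4-ea8e5bf41523"

print(same_underlying_uuid(dataset_a, dataset_b))
```

This only works because a human knows both strings wrap a UUID. Nothing in the URIs themselves declares the equivalence, which is the gap under discussion.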
A little more context might be helpful here. Those ten different mechanisms to resolve ZooBank identifiers existed before the drafting of the TGAS document. I assumed, at the time I established them, that everyone would see as clearly as I do that the need for identification is different from the need for "resolution" (=actionability). So strong was the opposition to what seemed obvious to me, that I followed my normal pattern in such cases, which is to assume that I was wrong. But the unsettling part is that the more carefully I thought about it, the more obvious it became that I was right, and the opposing viewpoint was wrong (despite the inherent assumption by various big-name web luminaries, who I otherwise hold enormous respect for). So, through the early TDWG/GBIF discussions, and both TDWG/GBIF GUID workshops, and the drafting of the various TDWG and GBIF documents, I stubbornly maintained this perspective (that identification and resolution should not be conflated). I believe that it was my stubbornness that accounts for the acknowledgement of the distinction between identification and resolution in TGAS and other documents.
Now, the easy way out would be to throw in the towel and terminate 9 of those resolution services, and make everyone happy with a single ZooBank URI that can be actioned via HTTP GET. But to do so instills in me the same sort of lack of conviction that I would feel if I confessed to a crime I did not commit just because it was the easy way out. On this issue, I'm not ready to do that, because it is so glaringly obvious to me that we *must* maintain a distinction between identification and resolution.
> You may not think that an HTTP proxied non-HTTP URI GUID (e.g. an HTTP
> proxied UUID) is a GUID, but anyone who is interested in describing
> the properties of the identified resource in RDF (which should be
> everyone, GUID A.S.
> recommendation 10) will think so.
Not everyone. But I concede that most would. And this is what I want to fix.
Another part of the TGAS that I quoted was this part (p 11):
"For non-self-resolving GUIDs, such as UUIDs, resolution of that GUID via the HTTP protocol’s GET method (the standard method by which a resource is retrieved on the web) must be implemented. This ensures that the data for the object being identified can be obtained from the provider of that GUID with tools that a majority of Internet users and developers already understand and use."
This, I believe, is one of the paragraphs inserted because of my insistence that the roles of identification and actionability be distinguished. Nothing in that statement -- or anywhere else in the TGAS that I am aware of -- suggests that HTTP-proxied "non-self-resolving GUIDs" themselves represent distinct GUIDs. Nor does it say that multiple mechanisms for establishing that HTTP-proxied actionability function represent a violation of Recommendation 4.
> The GUID A.S. does not contain any RDF examples (unfortunately) but
> the LSID Applicability Statement talks in detail about how LSIDs should be used in RDF.
> Recommendation 29 of the LSID A.S. states that "objects must be
> identified by an LSID in its standard form using the rdf:about
> attribute". You can do this with an LSID because it is a urn (subset
> of the more generic URI) and therefore a describable thing in RDF.
> However, a UUID cannot be used similarly in an rdf:about attribute
> because it is not any kind of URI. It is just a globally unique string.
Right -- which is exactly why ZooBank identifiers are presented publicly as LSIDs (with proper resolution mechanisms), rather than simply as UUIDs. But that doesn't change the fact that the UUID is the "real" identifier, and is simply "wrapped" in LSID-compliant resolution metadata. But I will say that I also regard the LSID as a bona-fide "identifier" in and of itself, because that's how the LSID spec is written. So I (grudgingly) admit that our minting of LSIDs commits us to treating the full-context LSID as though it is a distinct identifier from the UUID that it encapsulates. However, I don't think this applies to all the flavors of HTTP proxying, because there is no spec (that I am aware of) that says "all HTTP URIs should be treated as though they are GUIDs" -- even though, by some definitions, they technically are.
> Recommendation 31 says "All references to objects identified by LSIDs
> using the rdf:resource attribute must use a proxy version of the LSID."
Right, and this is where I think I dropped the ball on ZooBank LSID resolution. At the moment, resolving a ZooBank LSID directly (e.g., via Rod Page's LSID tester, or TDWG's LSID resolver service) returns the proper RDF (thanks to Kevin Richards, who set that service up). However, the HTTP proxy version returns HTML by default. I needed to do this because I didn't (and still don't) know enough about applying style sheets to RDF to render them in a human-friendly form. I spoke with Rob Whitton about this last week, and he will have this fixed soon.
> Recommendation 30 says that the description of all objects identified
> by an LSID must contain an owl:sameAs, owl:equivalentProperty or
> owl:equivalentClass statement expressing the equivalence between the object identifier in its standard form and its proxy version.
Ahh!! OK, this may be the fatal bullet to my argument. But let me explain a bit further:
The "true" GUID for a ZooBank record is the UUID. The standard form of presenting this UUID to the public is as an LSID. I'm happy with saying that the LSID *is* the TDWG-context GUID for the record (calling the UUID the "true" GUID is just a semantic technicality that has no real bearing in the context of TDWG standards). The standard http proxy for ZooBank LSIDs is "http://zoobank.org/[LSID]" -- that is, the LSID appended to a "http://zoobank.org" prefix.
I have no argument with the Recommendation 30 that says there should be an owl:sameAs, owl:equivalentProperty or owl:equivalentClass statement expressing the equivalence between the LSID and its proxy version.
But I do have an argument against the notion that *any* web service that can resolve the LSID into its constituent metadata (whether HTTP, RDF, or whatever) must be treated as a distinct GUID, with a similar need for the owl:sameAs [etc.] statement.
Perhaps this, ultimately, is the crux of our argument.
> I don't think you were seriously suggesting that all 12 of the
> identifiers on the list would actually be used in "real life". You
> were making a point about how a UUID could be made actionable.
In part yes. But what I was really saying is that it's silly to think of all of those different metadata resolution services as distinct GUIDs (even though in the broad sense, all HTTP URIs are technically GUIDs). Also, it depends on what you mean by "used in real life". They should certainly not be used in "real life" as identifiers of the sort you gave examples for. But they may well be "used" in other real-life contexts.
> But my point is that you simply cannot meet the requirements of the
> GUID A.S. with ONLY a UUID.
We may be quibbling about semantics here. I never said that the TGAS was met with ONLY a UUID. My point was, the UUID *is* the identifier, and it can meet the TGAS requirements and recommendations *provided* that there is an appropriate HTTP GET resolution service for it, and provided that the UUID is exposed externally only in the context of the relevant resolution metadata. In other words, I *COMPLETELY* agree with you (and have tried to make this clear all along) that one would never see something like "<dc:identifier>A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523</dc:identifier>" in an RDF (or other similar) document. But I do believe that something like "<dc:identifier>http://zoobank.org/A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523</dc:identifier>" *would* be compliant.
> You MUST have an HTTP proxied version of it in order to "do the right thing"
> (i.e. GUID A.S. rec 10) and provide metadata in the form of RDF serialized as XML.
Yes, exactly.
> That HTTP proxied version isn't just going to be seen as a "resolution mechanism".
But my point is that it *should* be. In other words, our community should rise to that level of sophistication, because it would, I am quite certain, benefit us in the long run.
> If you and GNUB are going to participate in BiSciCol as I understand
> it to be developing (and I believe that you are), you will HAVE to
> have an HTTP URI version of your UUIDs and in that context the raw
> UUID will be relatively irrelevant.
Of course! And if you ever thought otherwise, then obviously I am not expressing myself well. Maybe part of our argument is that you are focused on implementation, and I am speaking more on principle. I thought I made it clear in my first post on this thread that a UUID by itself is not actionable (recall my example of walking through the park and discovering a UUID written on a slip of paper), and therefore not, by itself, functional as a persistent actionable identifier (sensu TDWG/GBIF). My only point in all of this is that identification and resolution are two separate functions, and we should be sophisticated enough to recognize the distinction. I don't know if it's feasible, but I think one way that it could be made feasible comes back to my suggestion of a registry of resolution services. This is not going backward; it's going forward. However, our community may have its hands full with just implementing the things we most need to implement, and may not have the luxury of time and resources to implement a standard acknowledgement of the distinction between resolution services and object identification -- but my contention is that we ignore that distinction at our peril.
> My point is that you should decide on just one of these HTTP URIs and
> use that as your identifier when you communicate with the outside
> world.
That is already the case (has been the case ever since July 2007, when Kevin Richards set up our LSID resolution service).
> My preference would be "http://zoobank.org/A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523"
> as the shortest and least complex one that would do everything that
> needs to get done.
Well, for various reasons we went with the LSID version:
"http://zoobank.org/urn:lsid:zoobank.org:act:A9F435E0-8ED7-46DD-BAB4-EA8E5BF…"
Or, as RDF in accordance with the LSID spec:
http://zoobank.org/authority/metadata/?lsid=urn:lsid:zoobank.org:act:A9F435…
> I guess that there isn't problem with the other nine existing, but
> from my point of view there is nothing but harm to be done by exposing
> them to the outside world.
I guess that depends on what you mean by "exposing" them. In my mind, they are already "exposed" because they work. However, I don't think anyone would (or should) embed them in semantic documents as though they were TDWG-style GUIDs. HOWEVER, the point I was originally making is that if we could (rightly) recognize the different roles of identification and resolution, then we wouldn't have a problem. You could very easily use your preferred "short" version of "http://zoobank.org/A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523", and a reasoning service would have no difficulty recognizing it as identifying the same object as urn:lsid:zoobank.org:act:A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523, or http://zoobank.org/urn:lsid:zoobank.org:act:A9F435E0-8ED7-46DD-BAB4-EA8E5BF…. I realize there is no elegant way to do this using existing RDF syntax, which is why this is *really* a much more fundamental argument than just TDWG-space. But in my extremely naïve way of representing it, it might look something like:
<rdf:Description rdf:about="http://zoobank.org/A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523">
<dc:identifier>A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523</dc:identifier>
<xxx:resolutionService>http://zoobank.org/</xxx:resolutionService>
</rdf:Description>
...which would have no trouble combining with a document that had something like this:
<rdf:Description rdf:about="http://zoobank.org/?uuid=A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523">
<dc:identifier>A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523</dc:identifier>
<xxx:resolutionService>http://zoobank.org/?uuid=</xxx:resolutionService>
</rdf:Description>
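To illustrate the point about separating identification from resolution, here is a minimal Python sketch of how a consumer might recognize several resolution forms as pointing at the same object, simply by pulling out the embedded UUID. The function names (embedded_uuid, same_object) are hypothetical and not part of any ZooBank or TDWG service; the example strings are the ones used in this message.

```python
import re
from typing import Optional

# Matches a UUID in its canonical 8-4-4-4-12 hexadecimal form.
UUID_PATTERN = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-"
    r"[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


def embedded_uuid(identifier: str) -> Optional[str]:
    """Return the first UUID found in the string (upper-cased), or None."""
    match = UUID_PATTERN.search(identifier)
    return match.group(0).upper() if match else None


def same_object(a: str, b: str) -> bool:
    """True when both strings carry the same embedded UUID (hypothetical helper)."""
    uuid_a, uuid_b = embedded_uuid(a), embedded_uuid(b)
    return uuid_a is not None and uuid_a == uuid_b


# Resolution forms taken from the examples in this message.
forms = [
    "http://zoobank.org/A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523",
    "http://zoobank.org/?uuid=A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523",
    "urn:lsid:zoobank.org:act:A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523",
]
print(all(same_object(forms[0], other) for other in forms[1:]))
```

This is only string matching, not reasoning over RDF; a real service would presumably also need the registry of resolution services to know which prefixes it can trust.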
> The other point which I was trying to make is: why would you choose to
> expose to the outside world an identifier that only does part of the
> desirable things that we want (i.e. my list of 8 desirable attributes
> of a GUID), when you could use a modification of that identifier that would do everything you want?
I would *never* "choose" to do that. However, I may very well be stuck with that due to insufficient resources and expertise. *That* is what I intend to fix now that I (finally) have both resources and expertise.
> But with virtually no additional cost (15 minutes of time from
> somebody who knows how to create a single 3 kB XSLT file)
Ah....if only I had 15 minutes of such a person's time before now! :-)
> I would assert the same thing about LSIDs. Why would you create an
> identifier that is part of (what seems to me to be universally
> recognized as) a dead technology when you could create a simpler HTTP
> URI that would do the same thing and potentially more?
The answer to that is much easier, and should be self-evident when you consider what I already mentioned previously: that the service was established in the summer of 2007. At that time, LSID was absolutely NOT dead, and indeed was actively being promoted by both TDWG and GBIF. This was the outcome of the two GUID workshops those organizations sponsored. There certainly were detractors to LSIDs back then, making the same arguments they are making now. To the extent that LSIDs are currently perceived as "dead" by some, that is due largely to the self-fulfilling prophecy of those detractors.
But in any case, regardless of whether LSIDs really are dead or not, and regardless of why that may be so (if it is so), there were very good reasons why ZooBank went with LSIDs. And while I realize that the four years since then are a veritable EON in IT contexts, keep in mind that ZooBank has to think in terms of centuries. In that context, the HTTP protocol is not guaranteed to be persistent, and things like DOI are pretty-much downright ephemeral. In fact, this is exactly why I went with UUIDs in the first place. As long as electronic data are stored in binary form, 128 bits will have mathematical stability. *That's* why I realized that UUIDs were the only defensible choice for the "real" identifier, and is the identifier that ZooBank will persist. The choice of LSID as a resolution protocol was, as already stated, influenced by the thinking of our community at the time. *My* thinking at the time was that the only thing with any real plausibility of ICZN-scale longevity was binary data encoding (even that may not withstand more than a few decades), so I embraced UUIDs (which is to say, I embraced 128-bit identity). Everything else (LSID protocol, HTTP protocol, etc.) could be regarded as no more than the "resolution mechanism du jour". Perhaps this starts to explain why I keep emphasizing the distinction between identity and metadata resolution. The ZooBank registry has to think in terms of long-term identity, and assume that resolution mechanisms will continue to change as the technological wind blows.
> In the case of uBio and Biodiversity Collections Index, they were set
> up when LSIDs were believed to be the "Next Big Thing".
Actually, all of us were implementing them at the same time. I think IPNI was one of the first; BCI came later. This all emerged from the two TDWG/GBIF GUID workshops.
> That did not turn out to be the case, so those organizations are stuck
> with painful HTTP URIs like
> "http://biocol.org/urn:lsid:biocol.org:col:35115" and
> "http://www.ubio.org/authority/metadata.php?lsid=urn:lsid:ubio.org:namebank:…"
> when they could have had "http://biocol.org/35115"
> and "http://www.ubio.org/9479554". I would say "lesson learned" -
Ha! Hardly! We are only just now beginning to start learning lessons. Let's revisit this conversation again in a couple of decades and see how many more lessons are yet in store for us.
In any case, my family just woke up from their nap, so I'll have to look at the rest of your message later, after some time with Mickey and the gang.
Aloha,
Rich
Richard L. Pyle, PhD
Database Coordinator for Natural Sciences, Associate Zoologist in Ichthyology, Dive Safety Officer
Department of Natural Sciences, Bishop Museum
1525 Bernice St., Honolulu, HI 96817
Ph: (808)848-4115, Fax: (808)847-8252
email: deepreef(a)bishopmuseum.org
http://hbs.bishopmuseum.org/staff/pylerichard.html
_______________________________________________
tdwg-content mailing list
tdwg-content(a)lists.tdwg.org
http://lists.tdwg.org/mailman/listinfo/tdwg-content
Proposal to reconcile Current DarwinCore with Possible Fully Semantic Representations
by Peter DeVries 08 Jun '11
Hi,
I was thinking that we are a bit stuck on current vs. new representations
and on the question of whether a potential semantic version should be fully
decidable or whether an executable version would be more appropriate.
There is also the issue of the various data restrictions that are entailed
in different data sets.
I created the diagram below that I hope will illustrate the following idea.
End users format their data in DarwinCore.
GBIF or some other group would need to clean and normalize this data.
They could then output this data in various forms.
1) One based on a fully decidable model
2) One based on an executable model
3) One based on an executable model that would be open and public
This would require some sort of data fuzzing, and I was thinking of
replacing the actual GPS records with a URI that represents a 10km x 10km
region of the earth.
The size of 10km x 10km is flexible, it just needs to be some standard
size and represented by a URI
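
As a rough illustration of the fuzzing idea, the Python sketch below snaps a coordinate to a fixed grid cell and returns a URI for that cell instead of the exact position. Everything specific here is an assumption made for the example: the base URI http://example.org/grid/ is a placeholder, and the cell size of 0.1 degree is used only because it is roughly 11 km in latitude (cells get narrower in longitude away from the equator), so it is not a true 10km x 10km grid.

```python
import math

# ASSUMPTION: placeholder base URI, not an existing service.
GRID_BASE = "http://example.org/grid/"

# ASSUMPTION: 10 cells per degree (0.1 degree cells, about 11 km in latitude).
CELLS_PER_DEGREE = 10


def grid_cell_uri(lat: float, lon: float,
                  cells_per_degree: int = CELLS_PER_DEGREE) -> str:
    """Return a URI naming the grid cell that contains the given point."""
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError("latitude/longitude out of range")
    # Shift to non-negative values, then count whole cells from the origin.
    row = math.floor((lat + 90.0) * cells_per_degree)
    col = math.floor((lon + 180.0) * cells_per_degree)
    return f"{GRID_BASE}{row}_{col}"


# Two nearby points fall in the same cell, so the public record
# no longer reveals the exact location.
print(grid_cell_uri(43.0731, -89.4012))
print(grid_cell_uri(43.0766, -89.4125))
```

An equal-area grid would be a better fit for a true fixed cell size; the degree-based grid is used here only because it keeps the sketch short.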
Here is the URI to the diagram
http://www.taxonconcept.org/storage/images/DarwinSWProposal.png
Respectfully,
- Pete
------------------------------------------------------------------------------------
Pete DeVries
Department of Entomology
University of Wisconsin - Madison
445 Russell Laboratories
1630 Linden Drive
Madison, WI 53706
Email: pdevries(a)wisc.edu
TaxonConcept <http://www.taxonconcept.org/> &
GeoSpecies<http://about.geospecies.org/> Knowledge
Bases
A Semantic Web, Linked Open Data <http://linkeddata.org/> Project
--------------------------------------------------------------------------------------
> An Occurrence is a combination of an Individual and an Event.
> An Occurrence is a coupling of an Individual and an Event.
> An Occurrence is a pairing of an Individual and an Event.
How about:
An Occurrence is the reification of an individual's involvement in (entanglement with? presence at? relationship to?) an event. It reifies an "Event involvesIndividual Individual" fact.
The need for this construct is that we often need to say a number of additional things about an individual's involvement with (presence at) an event beyond simply asserting that there is some relationship. We need to say what tokens that individual left, what role that individual had (Predator? Prey? Parasite?), perhaps temporal or other limits of that particular individual at the event. Occurrence is the object to which these facts may be attached. An individual might meaningfully have more than one occurrence at an event - particularly in cases where events are part-of larger events, or where an individual somehow has multiple roles (hyenas chased away from their kill by a lion - or is it the other way around?).
To put it another way: "reification" = "tuple" = "association table" = "pulling a property out into an object". More or less.
To put it yet another way, an Occurrence object stands in relation to an event and an individual much as a TaxonRelationship object stands in relation to the two taxa it mentions. You could simply model taxonomy with a "hasSubtaxon" predicate, but we usually need to say a great deal more about taxonomic relationships than that.
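
The "pulling a property out into an object" idea can be sketched in Python as below. This is only an illustration of the reification pattern, not a proposed Darwin Core model: the class and field names (Individual, Event, Occurrence, role, tokens) and the identifiers are made up for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Individual:
    identifier: str


@dataclass(frozen=True)
class Event:
    identifier: str


@dataclass
class Occurrence:
    """Reifies the fact 'event involves individual' so more can be said about it."""
    individual: Individual
    event: Event
    role: Optional[str] = None                        # e.g. "predator", "prey"
    tokens: List[str] = field(default_factory=list)   # evidence left behind


# Hypothetical data: one hyena with two roles at the same event.
hyena = Individual("individual:hyena-1")
lion = Individual("individual:lion-1")
kill = Event("event:kill-site-1")

occurrences = [
    Occurrence(hyena, kill, role="predator", tokens=["photo-001"]),
    Occurrence(hyena, kill, role="displaced scavenger"),
    Occurrence(lion, kill, role="kleptoparasite", tokens=["photo-002"]),
]

# The same individual can have more than one occurrence at one event.
hyena_at_kill = [o for o in occurrences
                 if o.individual == hyena and o.event == kill]
print(len(hyena_at_kill))
```

A bare "involvesIndividual" link between the event and the hyena would have nowhere to hang the role or the tokens, which is the motivation for the separate object.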
Producing a global taxon register (was: ITIS TSNID to uBio NamebankIDs mapping)
by Tony.Rees@csiro.au 05 Jun '11
Hi all (jumping in with some trepidation...)
It's good to hear some ramp-up may be coming of activity in the GNUB space (congratulations, Rich et al.). My main concern, however, is that it does not solve my particular problem - which is, in a nutshell, given "any" cited taxonomic name, what can we tell about it - with regard to its classification, nomenclatural and taxonomic/synonym status, and certain attributes (initially for my use case, simple geologic time - is it extant or not - and simple habitat classification - is it marine or not - though of course infinitely expandable from there).
To me the vision of GNUB is too grand - to index all usages of all names in all sources - and the vision of GNI is too limited - to index the names but not actually record/harmonise/verify/manage (in a structured way) any associated information. I'm after something in between - what I have tentatively previously called HCAL - a hierarchical catalogue of all life (presuming that at least one "management" hierarchy is incorporated) - or maybe just a GTR - global taxon register. Sort of, waiting for the Catalogue of Life and/or ITIS to be complete, for both extant and fossil taxa, and also incorporate selected "taxon attributes" as above. (This is the space into which my IRMNG database is cast as a preliminary/"working for now" solution, but obviously without the significant resourcing / community cooperation required to build and sustain the thing for the long term).
So my question is, how can such a product emerge from ongoing developments in GN* space, or other...
Over to the experts,
Best - Tony
________________________________________
From: tdwg-content-bounces(a)lists.tdwg.org [tdwg-content-bounces(a)lists.tdwg.org] On Behalf Of Richard Pyle [deepreef(a)bishopmuseum.org]
Sent: Saturday, 4 June 2011 8:48 AM
To: tdwg-content(a)lists.tdwg.org
Subject: Re: [tdwg-content] ITIS TSNID to uBio NamebankIDs mapping
Working backwards through this thread...
I hadn't read Dima's post until just now, and I see that at least a couple of his points (i.e., #2, #5, #6) apply to exposing the UUIDs externally. However, I think that a simple protocol (such as replacing spaces with "_", and avoiding characters that look the same but are different -- such as the Cyrillic 'a') could go a long way to mitigating those problems.
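
A minimal Python sketch of such a protocol is below. It is only an assumption about what "simple" could mean here: spaces become "_", and any letter whose Unicode character name does not start with "LATIN" is flagged for review. The function names are hypothetical, and the check is crude (it would also flag legitimate non-Latin letters), so treat it as a starting point rather than a complete confusables check.

```python
import unicodedata
from typing import List


def to_url_token(name_string: str) -> str:
    """Collapse runs of whitespace and join the parts with underscores."""
    return "_".join(name_string.split())


def non_latin_letters(name_string: str) -> List[str]:
    """Return letters whose Unicode name does not start with 'LATIN'."""
    return [
        ch for ch in name_string
        if ch.isalpha() and not unicodedata.name(ch, "").startswith("LATIN")
    ]


print(to_url_token("Danaus plexippus (Linnaeus 1758)"))

# "Betul\u0430" ends with CYRILLIC SMALL LETTER A, which looks like Latin "a".
suspicious = non_latin_letters("Betul\u0430")
print([unicodedata.name(ch) for ch in suspicious])
```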
On the other hand, it really depends on what the identifier is for. The string "Danaus_plexippus_(Linnaeus_1758)" may be more friendly to our eyes, but "A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523" is definitely more friendly to a computer (Dima's points 1, 3 & 4, among others). My feeling is that the push for GUIDs is more about enabling computer-computer conversations, than it is about enabling human-human or human-computer interactions; and therefore we should not get bogged down in the "ugliness" of the identifiers. In the context of electronic data services, the "ugliness" potential of the "Danaus_plexippus_(Linnaeus_1758)" approach to identifiers is far greater than the ugliness potential of "A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523", when it comes to interlinking electronic biodiversity data. It is nothing for a computer to render relevant metadata of the object identified by "A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523" into "Danaus plexippus (Linnaeus_1758)" on a computer screen or piece of paper for human-eyeball consumption. But there are many pitfalls (some noted by Dima) for a computer to unambiguously resolve "Danaus_plexippus_(Linnaeus_1758)" back to a meaningful data object.
I guess my revised point is: GNI (and uBio/NameBank) are essentially the only taxonomic databases out there where a human-friendly persistent/actionable identifier of the sort being discussed is even plausible as an option. It may not even be wise in this context (as per Dima's points), but it *might* be, depending on the need for a human-friendly identifier.
Maybe the simplest thing to do would be to not regard "http://gni.globalnames.org/name_strings/Danaus_plexippus_(Linnaeus_1758)" as an identifier per se, but rather as a protocol for a web service. In other words, if you append a text string to the root URL "http://gni.globalnames.org/name_strings/", GNI would run that text string against its index and return whatever metadata based on a text-string match. This is not mutually exclusive with an "identifier" in the form of "http://gni.globalnames.org/name_strings/A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523", that would less ambiguously resolve a known record in GNI. At this point, the line between "identifier" and "service" gets fuzzy, of course. But the analogy is true in ZooBank:
The persistent "Identifier" looks like this:
A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523
One way that this identifier can be represented as an *actionable* identifier is this:
urn:lsid:zoobank.org:act:A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523
Another "actionable" form of the identifier might be this:
http://zoobank.org/urn:lsid:zoobank.org:act:A9F435E0-8ED7-46DD-BAB4-EA8E5BF…
or this:
http://zoobank.org/A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523
or even this(?):
http://lsid.tdwg.org/urn:lsid:zoobank.org:act:A9F435E0-8ED7-46DD-BAB4-EA8E5…
(all of which work, by the way)
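
To make the distinction concrete, here is a small Python sketch that starts from the bare UUID and derives several of the "actionable" forms listed above. It assumes the ZooBank HTTP proxy is simply the LSID (or the UUID) appended to the "http://zoobank.org/" prefix; the function name actionable_forms and the dictionary keys are made up for the example, and the code only builds strings without checking that anything resolves.

```python
import uuid
from typing import Dict


def actionable_forms(raw_uuid: str) -> Dict[str, str]:
    """Derive LSID and HTTP-proxied forms from a bare UUID string."""
    # uuid.UUID validates the string; str() gives lower case, so upper-case
    # it again to match the form used in this thread.
    canonical = str(uuid.UUID(raw_uuid)).upper()
    lsid = f"urn:lsid:zoobank.org:act:{canonical}"
    return {
        "uuid": canonical,
        "lsid": lsid,
        # ASSUMPTION: proxy = prefix + LSID, and prefix + UUID for the short form.
        "lsid_proxy": f"http://zoobank.org/{lsid}",
        "uuid_proxy": f"http://zoobank.org/{canonical}",
    }


for label, value in actionable_forms(
        "A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523").items():
    print(label, value)
```

The point of the sketch is that only the first entry is the identity; the rest are wrappers that can be regenerated whenever the resolution mechanism changes.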
However, the following are examples of what I would think of as *services*:
http://www.google.com/search?q=Danaus+plexippus+(Linnaeus+1758)
http://lsid.tdwg.org/summary/urn:lsid:zoobank.org:act:A9F435E0-8ED7-46DD-BA…
http://darwin.zoology.gla.ac.uk/~rpage/lsid/tester/?q=urn:lsid:zoobank.org:…
But really, from the perspective of the end-user, does it matter if it's an identifier or a service? Ultimately, they ask the questions, and the answers appear on their computer screens.
Aloha,
Rich
> -----Original Message-----
> From: tdwg-content-bounces(a)lists.tdwg.org [mailto:tdwg-content-
> bounces(a)lists.tdwg.org] On Behalf Of Dmitry Mozzherin
> Sent: Friday, June 03, 2011 4:34 AM
> To: David Remsen (GBIF)
> Cc: tdwg-content(a)lists.tdwg.org; Dmitry Mozzherin; Orrell, Thomas; Alan J
> Hampson; Nicolson, David; Gerald Guala
> Subject: Re: [tdwg-content] ITIS TSNID to uBio NamebankIDs mapping
>
> In my opinion UUIDs have a few advantages over strings --
>
> 1. It is uuid, so it will work with uuid tools (current and future ones)
> 2. It is less ambiguous -- For example -- what is the difference between Betulа and
> Betula for your eyes? (one of them has a Cyrillic 'a')
> 3. Database wise it is faster to search because it is just a 128bit number, while
> a name is at least 245 byte varchar -- it makes searching much faster because
> in relational databases the size of keys directly affects the search
> speed
> 4. UUID v. 5
> (http://en.wikipedia.org/wiki/Universally_unique_identifier)
> allows a UUID to be generated algorithmically without looking up a database (no
> need for network connection)
> 5. Links like http://gni.globalnames.org/name_strings/Danaus_plexippus_(Linnaeus_1758) might be ambiguous -- I can think of several ways I can represent the name string
> part in the url and they will all resolve to the same thing in GNI.
> 6. Unescaped unicode characters in url containing literal name strings (people
> will forget to escape them) will depend on an implementation of a url
> resolver
>
> That said, links like
> http://gni.globalnames.org/name_strings/Danaus_plexippus_(Linnaeus_1758)
> are definitely attractive, and it is good to have them as another way to access
> a name!
> My personal preference would be not to use them as the main identifier because
> of reasons 1, 2, 3 and 5.
>
> Dima
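
Point 4 above (generating the UUID from the name string with no database lookup) can be sketched with Python's standard uuid module. The namespace is an assumption: the thread does not say which namespace GNI uses, so this example derives one from the DNS name "globalnames.org", and its output should not be expected to match the GNI identifiers quoted in this thread. The function names are hypothetical.

```python
import uuid

# ASSUMPTION: namespace derived from the DNS name "globalnames.org".
# The namespace actually used by GNI is not stated in this thread.
NAME_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "globalnames.org")


def name_string_uuid(name_string: str) -> uuid.UUID:
    """Deterministically derive a version 5 UUID from a name string."""
    return uuid.uuid5(NAME_NAMESPACE, name_string)


def name_string_uri(name_string: str) -> str:
    """Append the derived UUID to the GNI name_strings URL prefix."""
    return ("http://gni.globalnames.org/name_strings/"
            + str(name_string_uuid(name_string)))


first = name_string_uuid("Danaus plexippus (Linnaeus 1758)")
second = name_string_uuid("Danaus plexippus (Linnaeus 1758)")
print(first == second)   # same input, same UUID, no lookup needed
print(first.version)
print(name_string_uri("Danaus plexippus (Linnaeus 1758)"))
```

Because the derivation is purely a function of the namespace and the string, two groups holding the same name string arrive at the same UUID independently, but any difference in spelling, spacing or encoding produces a different one.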
>
>
>
>
> On Fri, Jun 3, 2011 at 7:59 AM, David Remsen (GBIF) <dremsen(a)gbif.org>
> wrote:
> > Why not use the name as the basis for the resolvable identifier
> > instead of a uuid? Isn't there a 1:1 cardinality between the name and
> > the uuid in the GNI? Doesn't that mean that
> >
> > http://gni.globalnames.org/name_strings/4ef223c4-0c3e-5e84-ace9-755c34c601ec
> > and
> >
> > http://gni.globalnames.org/name_strings/Danaus_plexippus_(Linnaeus_1758)
> >
> > are equally unique? The latter is certainly more readable. In those
> > cases where the namestring is a homonym like
> >
> > http://gni.globalnames.org/name_strings/Oenanthe
> >
> > couldn't you just return the addresses of the two globally unique
> > forms of the name when you resolve it?
> >
> > http://gni.globalnames.org/name_strings/Oenanthe_Smith_1899
> >
> > http://gni.globalnames.org/name_strings/Oenanthe_Jones_1900
> >
> > Wouldn't those be as globally unique and easier to read and adjust to?
> > Or am I missing something. I always wanted to do that with ubio IDs
> > after a back and forth with Gregor Hagedorn and wished we hadn't
> > exposed those integers.
> >
> > DR
> >
> >> Hi Steve,
> >>
> >> I don't have time to go through this in detail, and I can't speak for
> >> the GNI, but I can tell you about how the GNI URI's work at least for now.
> >>
> >> A while back Dima Mozzherin and I were looking into how triples etc.
> >> might be of use to the GNI.
> >>
> >> We needed a way to generate unique URI's for each name.
> >>
> >> We wanted to avoid having to keep these in sync and not require
> >> everyone to look each ID up through some service.
> >>
> >> Dima came up with the following plan. We use the namestring as seed
> >> to generate a unique UUID.
> >>
> >> Basically this is a shared algorithm which the GNI and TaxonConcept
> >> both use. But it could be used by anyone.
> >>
> >> You feed the name string to the algorithm and it spits out a UUID. We
> >> then append that to a URI and web service so it is resolvable.
> >>
> >> So the name Danaus plexippus (Linnaeus 1758) =>
> >> 4ef223c4-0c3e-5e84-ace9-755c34c601ec
> >>
> >> So if the GNI and another group have the same namestring they
> >> have the same UUID.
> >>
> >> People can then link their data set to the GNI with the following
> >> URI
> >>
> >> http://gni.globalnames.org/name_strings/4ef223c4-0c3e-5e84-ace9-755c34c601ec
> >>
> >> RDF
> >> http://gni.globalnames.org/name_strings/4ef223c4-0c3e-5e84-ace9-755c34c601ec.rdf
> >>
> >> If you think of your data set as one table and the GNI
> >> as another, this URI serves as the foreign key that connects them
> >> together.
> >> as another, this URI serves as the foreign key that connects them
> >> together.
> >>
> >> Some on the list don't like how these look, but there is a tremendous
> >> advantage in not having to worry about syncing two large data sets
> >> and determining if a given integer is already in use.
> >>
> >> Also Rod Page has written recently about UUIDs.
> >> http://iphylo.blogspot.com/2011/05/zoobank-on-couchdb-uuids-replication.html
> >>
> >> There may be a way to do something similar with bit.ly like
> >> identifiers that are shorter (mCcSp), but I think the general idea
> >> is a good one.
> >>
> >> If you recall from my talk at TDWG, I was able to use these to make
> >> statements that one namestring was a synonym etc. of another etc.
> >>
> >> The algorithm we use is written in Ruby but it could be ported to many
> >> different languages since UUIDs are widely supported.
> >>
> >> Respectfully,
> >>
> >> - Pete
> >>
> >>
> >>
> >> On Thu, Jun 2, 2011 at 11:41 PM, Steven J. Baskauf <
> >> steve.baskauf(a)vanderbilt.edu> wrote:
> >>
> >>> My email access has been sporadic since this thread developed, so
> >>> at this point I'll respond to points made in several of the
> >>> messages.
> >>>
> >>> First, I should note that there has been previous discussion on this
> >>> list on a similar topic from
> >>> http://lists.tdwg.org/pipermail/tdwg-content/2011-January/002231.html
> >>> through
> >>> http://lists.tdwg.org/pipermail/tdwg-content/2011-January/002231.html.
> >>> One can review what was said at that time rather quickly by starting
> >>> on the first linked message and clicking on the "Next Message" link
> >>> until you get to the end of the range I gave above.
> >>>
> >>> My reason for the request for information that started this thread
> >>> was that I wanted to link to a URI that would anchor the name
> >>> portion of a name/sensu pair (TNU or Taxon Concept a la TCS if you
> >>> prefer) as in this RDF
> >>> snippet:
> >>>
> >>> <tc:nameString>Quercus rubra L.</tc:nameString>
> >>> <tc:hasName
> >>> rdf:about="http://www.ubio.org/authority/metadata.php?lsid=urn:lsid:ubio.org:namebank:448439"/>
> >>>
> >>>
> >>> At this point in the discussion, I'm not actually talking about
> >>> creating a link to a taxon concept but rather to a taxon name, so
> >>> some of the issues Pete raised don't apply here (e.g. what's the
> >>> "right" name for a concept - the question here is simply what's a
> >>> stable identifier for the name). In principle, I could probably
> >>> just provide the name string and be done
> >>> with it. However, having some degree of faith that Smart, Computer
> >>> Savvy People might some day be able to use the metadata returned by
> >>> the URI (or perhaps metadata which they already have in a triple
> >>> store onsite) to do cool things like knowing that my name is the
> >>> same as an orthographic variant or that "Quercus rubra L." is
> >>> basically the same thing as "Quercus rubra", I would like to also
> >>> provide a functional URI.
> >>>
> >>> As an end-user who isn't very interested in the technical issues
> >>> involving names, I don't really care what URI I use. I would prefer
> >>> for it to be widely recognized and for it to "work" (i.e. be
> >>> resolvable). In the earlier (January) thread, there was discussion
> >>> about existing identifiers. There were a number of posts, but in
> >>> particular
> >>> http://lists.tdwg.org/pipermail/tdwg-content/2011-January/002258.html
> >>> http://lists.tdwg.org/pipermail/tdwg-content/2011-January/002259.html
> >>> discussed the relative merits of ITIS and uBio ID numbers. My
> >>> take-home message from this was that uBio represented the largest
> >>> single set of names with assigned identifiers (see
> >>> http://gni.globalnames.org/data_sources cited in Pete's email) and
> >>> that uBio metadata provides useful references.
> >>> Hence my interest in referencing uBio ids as a URI. However, as a
> >>> practical matter, the organizations that I share images with either
> >>> want ITIS TSNs (EOL and Morphbank) or just names (Discover Life).
> >>> Nobody is asking for uBio identifiers or any other identifier.
> >>>
> >>> I found Kevin's comment at
> >>> http://lists.tdwg.org/pipermail/tdwg-content/2011-May/002486.html
> >>> very thought-provoking: "My thoughts are that the most likely way this
> >>> will be solved is by standard market type pressures - ie the best
> >>> solution/IDs will be used the most and 'float' to the top." I'm not
> >>> going to make a judgment about what is the "best" solution or ID.
> >>> But I would say that in "computer"
> >>> history, being the "best" doesn't necessarily mean that something
> >>> will be used. Take for example, the FOAF vocabulary. What the heck
> >>> is Friend of a Friend? I would venture to say that most of the
> >>> people using the FOAF vocabulary don't know or care. The FOAF
> >>> vocabulary was the one that people started to use and once that
> >>> happened, people didn't switch even if there was something better.
> >>> I'm not familiar with the history of other stuff like YouTube and
> >>> Craig's List, but I would guess that they weren't necessarily "the
> >>> best" systems - they were just the one that the most people started
> >>> using first and once that happened, people didn't switch. I'm using
> >>> ITIS IDs because they are easy to get and the people I communicate
> >>> with want them. Whether they are the "best" or "done correctly"
> >>> doesn't matter to me as much as the fact that they are widely
> >>> recognized and stable (and that thus far every name that I've looked
> >>> for has been in their database).
> >>>
> >>> I think that one reason why this question has been on my mind is
> >>> that I've been waiting for GNUB (Global Name Use Bank) to come out.
> >>> I'm not really up on how it is going to work, but my impression is
> >>> that it was going to be based on the Global Name Index (GNI) which
> >>> was mentioned in that earlier January thread. At that point, the
> >>> GNI names didn't have any identifiers that were exposed to the
> >>> public as permanent GUIDs. I'm assuming that if GNUB refers to GNI
> >>> names, they will have some kind of identifiers. So if that happens
> >>> how is the GUID recommendation 8 going to be followed? As Kevin
> >>> said in
> >>> http://lists.tdwg.org/pipermail/tdwg-content/2011-June/002499.html
> >>> "What I take from recommendation 8 of the GUID applicability guide
> >>> ... is that if you DON'T already have a record in your own database
> >>> for a taxon name/concept, then reuse an existing one. " What we
> >>> have here with GNI is a situation where none of the records have
> >>> identifiers. In my mind, the "best practice" according to
> >>> recommendation 8 would be for the GNI to reuse existing identifiers
> >>> where they exist and NOT make up new ones. This is a bit more
> >>> complicated because the ITIS identifiers (which are in common
> >>> use)
> >>> don't have an http URI version that is resolvable, and while the
> >>> uBio identifiers have a resolvable http URI, it's in the form of a
> >>> proxied LSID, which I've already complained is very ugly. So I'd
> >>> like to hear some ideas about how to have "reused" identifiers in
> >>> the GNI.
> >>>
> >>> One thing that comes to my mind would be to have a "domain name"
> >>> like "http://purl.org/gni/" or "http://purl.org/tn/" ("tn" for
> >>> "taxon name") and to follow it with a namespace/id combination
> >>> similar to what is done with LSIDs. So for example "itis/19408" and
> >>> "ubio/448439" could be appended, creating
> >>> http://purl.org/gni/itis/19408 and
> >>> http://purl.org/gni/ubio/448439 for "Quercus rubra L." Both URIs
> >>> could point to the same RDF and that RDF could indicate that the two
> >>> identifiers are owl:sameAs. I realize from what Bob Morris has
> >>> cautioned in the past that there are problems with owl:sameAs when
> >>> the two things aren't actually the same thing (e.g. if the uBio ID
> >>> refers to a name string only but the ITIS TSN refers to the name
> >>> plus an "accepted" status and a relationship to parent taxa).
> >>> However, if there were an understanding that the GNI only refers to
> >>> name strings, then one could still refer to
> >>> http://purl.org/gni/itis/19408 as an identifier for the name string
> >>> of the thing (whatever it is) that is referred to by an ITIS TSN of
> >>> 19408. I don't think there would be a problem saying that it and
> >>> the uBio ID were "owl:sameAs". Some kind of solution like this would
> >>> allow people to easily generate a resolvable URI for a name if they
> >>> were using ITIS TSNs or uBio IDs. If the name that one wanted to
> >>> use was so obscure that it was one of the 9.5 million names that
> >>> uBio has that ITIS doesn't have, then that name would only have the
> >>> ubio version. I have no idea whether this would be a good idea or
> >>> not, but I was really cringing to think about 19 million newly
> >>> minted UUIDs appended to "http://gni.globalnames.org/" and
> >>> figuring out how to connect those horrid things to the names and
> >>> ITIS TSNs that I'm already using. I think that I said this before,
> >>> but using the purl.org domain rather than one like
> >>> http://gni.globalnames.org/ would in the future allow somebody else
> >>> to take over management of providing the metadata when the GUIDs are
> >>> resolved without having to deal with issues of who "owns" the domain
> >>> name.
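The namespace/id pattern proposed above can be sketched in a few lines of Python. This is only an illustration: the purl.org/gni/ base is the hypothetical domain floated in the message (nothing in this thread says it has been registered), and the helper names are made up for the example. It builds the two URIs mentioned for "Quercus rubra L." and writes the owl:sameAs statement as one N-Triples line.

```python
# Sketch of the proposed "base/namespace/id" URI pattern.
# BASE is the hypothetical PURL domain suggested in the message, not a live service.
BASE = "http://purl.org/gni/"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def name_uri(namespace: str, local_id: str) -> str:
    """Build a URI such as http://purl.org/gni/itis/19408 (hypothetical helper)."""
    if not namespace or not local_id:
        raise ValueError("namespace and local_id must be non-empty")
    return f"{BASE}{namespace.lower()}/{local_id}"


def same_as_triple(uri_a: str, uri_b: str) -> str:
    """Return one N-Triples line asserting that two name URIs are owl:sameAs."""
    return f"<{uri_a}> <{OWL_SAME_AS}> <{uri_b}> ."


itis = name_uri("itis", "19408")    # ITIS TSN quoted in the message
ubio = name_uri("ubio", "448439")   # uBio ID quoted in the message
print(same_as_triple(itis, ubio))
```

Whether owl:sameAs is the right predicate depends on the caveat raised above: it only holds if both identifiers are understood to refer to the name string.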
> >>>
> >>> Steve
> >>>
> >>>
> >>>
> >>> Kevin Richards wrote:
> >>>
> >>> Pete,
> >>>
> >>> I'm not trying to say what you are doing is a waste of
> >>> time/impossible. I actually think RDF + semantics are a good way
> >>> forward, but this
> >>> really implies that we need to rely on the semantics and linkages
> >>> rather than having a SINGLE ID for a taxon name. (which is what I
> >>> thought Steve was getting at). Each instance of a taxon name can
> >>> have its own ID and then all these instances are connected via
> >>> ontology defined semantic links. This seems more appropriate to me
> >>> than insisting everyone uses the "Global Taxon Name ID X".
> >>>
> >>>
> >>>
> >>> In your example of *Aedes triseriatus* and *Ochlerotatus
> >>> triseriatus* - these are two different names so they need two
> >>> different IDs, they may be linked by a single taxon concept, but
> >>> they are separate names. So which of these now 3 IDs do you expect
> >>> people to use, and according to what source??
> >>>
> >>>
> >>>
> >>> For example if we have a name, e.g. the Robin, Erithacus rubecula,
> >>> mentioned in ITIS (TSN: 559964) and also in EOL
> >>> (www.eol.org/pages/1051567), also in GBIF
> >>> (http://data.gbif.org/species/21266780), also in Avibase
> >>> (http://avibase.bsc-eoc.org/species.jsp?avibaseid=C809B2B90399A43D),
> >>> which ID are you hoping people will use? Would you put the ITIS ID
> >>> in your own dataset as the ID for that name? Unlikely. Or would it be
> >>> better to link them up with semantic linkages?
> >>>
> >>>
> >>>
> >>> What I take from recommendation 8 of the GUID applicability guide
> >>> (as Steve puts it, "stop making up new identifiers when somebody else
> >>> already has one for the thing you are talking about") is that if you
> >>> DON'T already have a record in your own database for a taxon
> >>> name/concept, then reuse an existing one, NOT ditch all your current
> >>> IDs and adopt someone else's (especially hard considering how
> >>> difficult it is to work out which of the multitude of name and
> >>> concept IDs directly relates to your taxon name).
> >>>
> >>>
> >>>
> >>> I am all for limiting the number of IDs for the "same" thing, but in
> >>> some
> >>> cases it is more useful to build linkages than force this tight
> >>> integration
> >>> of data and IDs. Especially for taxon names and concepts, where it is
> >>> complex to define if you are even talking about the "same" thing or not.
> >>>
> >>>
> >>>
> >>> Kevin
> >>>
> >>>
> >>>
> >>> *From:* Peter DeVries
> >>> [mailto:pete.devries@gmail.com<pete.devries(a)gmail.com>]
> >>>
> >>> *Sent:* Wednesday, 1 June 2011 12:38 p.m.
> >>> *To:* Kevin Richards
> >>> *Cc:* Steve Baskauf; tdwg-content(a)lists.tdwg.org; Gerald Guala;
> >>> Nicolson,
> >>> David; Alan J Hampson; Orrell, Thomas
> >>> *Subject:* Re: [tdwg-content] ITIS TSNID to uBio NamebankIDs mapping
> >>>
> >>>
> >>>
> >>> Hi Kevin,
> >>>
> >>>
> >>>
> >>> I forgot to mention some other things that are different about my
> >>> project.
> >>>
> >>>
> >>>
> >>> You can write a simple SPARQL query to get a list of all the
> >>> TaxonConcepts that have ITIS IDs, or all those that have ITIS and
> >>> NCBI IDs, etc.
> >>>
> >>>
> >>>
> >>> You can do this on any SPARQL endpoint that hosts the data.
> >>>
> >>>
> >>>
> >>> You can download the entire data set and run the queries on your own
> >>> endpoint.
> >>>
> >>>
> >>>
> >>> You can write a script that runs the query and downloads the ITIS
> >>> numbers
> >>> and exports them to CSV etc.
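A script of the kind described here could look roughly like the following Python sketch. The endpoint address is left as a parameter because none is given in this message, and the function names are illustrative. It assumes the endpoint speaks the standard SPARQL protocol over HTTP GET and can return the SPARQL JSON results format.

```python
import csv
import io
import json
import urllib.parse
import urllib.request


def run_query(endpoint: str, query: str) -> dict:
    """Send a SPARQL query via HTTP GET and return the parsed JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)


def bindings_to_csv(results: dict) -> str:
    """Convert a SPARQL JSON results document into CSV text."""
    columns = results["head"]["vars"]
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(columns)
    for binding in results["results"]["bindings"]:
        # Variables left unbound (e.g. by OPTIONAL) are absent from a binding.
        writer.writerow([binding.get(col, {}).get("value", "") for col in columns])
    return buffer.getvalue()
```

Calling bindings_to_csv(run_query(endpoint, query)) and writing the returned text to a file gives one row per solution, with an empty cell wherever a variable was unbound.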
> >>>
> >>>
> >>>
> >>> - Pete
> >>>
> >>>
> >>>
> >>> On Tue, May 31, 2011 at 5:16 PM, Peter DeVries
> >>> <pete.devries(a)gmail.com> wrote:
> >>>
> >>> Hi Kevin,
> >>>
> >>> On Tue, May 31, 2011 at 3:27 PM, Kevin Richards <
> >>> RichardsK(a)landcareresearch.co.nz> wrote:
> >>>
> >>> This is exactly why this problem still exists and will be very
> >>> complex to solve - everyone says "we should have a single ID for a
> >>> specific taxon name, there seem to be several IDs 'out there' that
> >>> refer to the same taxon name, so I'm going to create another ID to
> >>> link them all up" - yet another ID that no one will particularly want
> >>> to follow - you would have to get everyone to agree that your
> >>> combination/integration of taxon names is the best one and hope
> >>> everyone follows it - unlikely in this domain.
> >>>
> >>>
> >>>
> >>> Isn't this kind of what The Plant List and eBird already do?
> >>>
> >>>
> >>>
> >>> A difference being that they tie these to a specific name and specific
> >>> classification.
> >>>
> >>>
> >>>
> >>> The Plant List is not really even open, so it is difficult for people
> >>> to adopt it en masse.
> >>>
> >>>
> >>>
> >>> For instance, if I manage a herbarium, how do I easily reconcile my
> >>> species
> >>> list with the entities represented in the Plant List?
> >>>
> >>>
> >>>
> >>> eBird has millions of records which implies that they have been able to
> >>> convince the observers in the field to adopt their system. You are
> >>> correct
> >>> in that there are probably a lot of taxonomists that don't like their
> >>> list.
> >>>
> >>> It differs from many of the other classifications, but remember the
> >>> system rewards them for not agreeing. Note the difference between
> >>> the microbial taxonomists and other taxonomists. In the case of the
> >>> microbial workers, the system rewards them for solving problems, not
> >>> debating alternatives. Also, if a good idea comes out that will make
> >>> it easier for the microbiologists to solve the problems they are
> >>> rewarded for solving, they are less likely to care whose idea it is.
> >>>
> >>>
> >>>
> >>> Like the microbiologists, there are lots of biologists that work with
> >>> species with the goal of addressing some non-taxonomic problem.
> >>>
> >>>
> >>>
> >>> They don't really care if the name is *Aedes triseriatus* or
> >>> *Ochlerotatus triseriatus*, but they do care that the identifier
> >>> that they connect their data to is stable.
> >>>
> >>>
> >>>
> >>> In regard to the issue of market forces, I suspect (but have no
> >>> knowledge of this) that there were probably decisions made in
> >>> devising these lists that have more to do with appeasing certain
> >>> personalities than creating the best list. With the way this system
> >>> rewards people it is likely that the "correct" version will float to
> >>> the top only after that person has passed away. I don't have much
> >>> faith that the best system will always float to the top. That has a
> >>> lot to do with the personalities and how the system rewards are set
> >>> up. Theoretically, it is possible for one strong personality or group
> >>> to force others to adopt their less than optimal solution - at least
> >>> this seems to happen in other environments.
> >>>
> >>>
> >>>
> >>> Also, there are all sorts of ways that people can use the publication
> >>> record to rewrite history. Simply cite the review paper that cites the
> >>> original paper. Or don't cite it at all.
> >>>
> >>>
> >>>
> >>> I would have used only the ITIS TSN but if the name changes the ID
> >>> changes.
> >>> This isn't "wrong", it just does not solve my problem.
> >>>
> >>>
> >>>
> >>> * ITIS also should add the spiders from the World Spider Catalog.
> >>>
> >>>
> >>>
> >>> Another issue that I think has inhibited adoption of a common list is
> >>> that
> >>> people can't agree on a particular name or a particular classification.
> >>>
> >>>
> >>>
> >>> Since you can model a species concept as having many names and many
> >>> classifications why not do so?
> >>>
> >>>
> >>>
> >>> If this idea had originally been accepted, I would not have needed to
> >>> create TaxonConcept.org.
> >>>
> >>>
> >>>
> >>> My plan has always been to get something that works to solve some
> >>> problems and then let some larger group take it over.
> >>>
> >>>
> >>>
> >>> In a sense, I am more like the microbiologists in that I am not being
> >>> paid
> >>> to solve this or debate this problem.
> >>>
> >>>
> >>>
> >>> I am doing it because I think something like this is needed, and it is
> >>> an
> >>> interesting and personally rewarding puzzle.
> >>>
> >>>
> >>>
> >>> - Pete
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> My thoughts are that the most likely way this will be solved is by
> >>> standard market type pressures - ie the best solution/IDs will be
> >>> used the most and "float" to the top. It is easy to say that the
> >>> global taxon name data is a mess, but if you think about it, 30 years
> >>> ago taxon name data were very disparate, duplicated, unconnected,
> >>> many with NO IDs at all. So I believe we are making progress and that
> >>> we will continue to do so, albeit at a fairly slow rate.
> >>>
> >>> Kevin
> >>>
> >>>
> >>>
> >>> "I agree. This was one of the reasons that I set up TaxonConcept the
> >>> way I did. It attempts to connect both the LOD entities and the
> >>> foreign key based entities."
> >>>
> >>> Please consider the environment before printing this email
> >>> Warning: This electronic message together with any attachments is
> >>> confidential. If you receive it in error: (i) you must not read, use,
> >>> disclose, copy or retain it; (ii) please contact the sender immediately
> >>> by
> >>> reply email and then delete the emails.
> >>> The views expressed in this email may not be those of Landcare
> >>> Research New Zealand Limited. http://www.landcareresearch.co.nz
> >>>
> >>>
> >>>
> >>>
> >>> --
> >>>
> >>> ------------------------------------------------------------------------------------
> >>> Pete DeVries
> >>> Department of Entomology
> >>> University of Wisconsin - Madison
> >>> 445 Russell Laboratories
> >>> 1630 Linden Drive
> >>> Madison, WI 53706
> >>> Email: pdevries(a)wisc.edu
> >>> TaxonConcept <http://www.taxonconcept.org/> &
> >>> GeoSpecies<http://about.geospecies.org/> Knowledge
> >>> Bases
> >>> A Semantic Web, Linked Open Data <http://linkeddata.org/> Project
> >>>
> >>> --------------------------------------------------------------------------------------
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> --
> >>> Steven J. Baskauf, Ph.D., Senior Lecturer
> >>> Vanderbilt University Dept. of Biological Sciences
> >>>
> >>> postal mail address:
> >>> VU Station B 351634
> >>> Nashville, TN 37235-1634, U.S.A.
> >>>
> >>> delivery address:
> >>> 2125 Stevenson Center
> >>> 1161 21st Ave., S.
> >>> Nashville, TN 37235
> >>>
> >>> office: 2128 Stevenson Center
> >>> phone: (615) 343-4582, fax: (615)
> >>> 343-6707http://bioimages.vanderbilt.edu
> >>>
> >>>
> >>
> >>
> >> _______________________________________________
> >> tdwg-content mailing list
> >> tdwg-content(a)lists.tdwg.org
> >> http://lists.tdwg.org/mailman/listinfo/tdwg-content
> >>
> >
> >
> >
> > ----------------------------------------------------------------------------
> > David Remsen, Senior Programme Officer
> > Electronic Catalog of Names of Known Organisms
> > Global Biodiversity Information Facility Secretariat
> > Universitetsparken 15, DK-2100 Copenhagen, Denmark
> > Tel: +45-35321472 Fax: +45-35321480
> > Skype: dremsen
> > ----------------------------------------------------------------------------
> >
> >
> >
04 Jun '11
A new edition of this book just came out.
*Semantic Web for the Working Ontologist, Second Edition: Effective Modeling
in RDFS and OWL* [Paperback]
Dean Allemang, James Hendler
http://www.amazon.com/Semantic-Web-Working-Ontologist-Second/dp/0123859654/
Starting on page 326 there is a good section on OWL subsets and Modeling
Philosophy.
It describes provable models (decidable) and executable models.
There is a lot of good info in here, but I thought I would highlight these
sentences.
*"In fact, it is quite challenging to come up with a logical system that can
represent anything useful that is also decidable"*
Also, under Executable Models:
*"A different motivation for modeling in the Semantic Web is to form an
integrated picture of some sort of domain by federating information from
multiple sources"*
Individuals should really read the entire chapter and come to their own
conclusions, but I think these are issues that people need to think about.
To what extent can we design a system that actually is decidable?
How are these models going to be used, what is their purpose, and what are
some clear use cases?
Here is the use case that started me down the path of linked data and the
semantic web.
I am in location X,Y and have a specimen of family Z.
What species of family Z are expected here and what are the characters that
can be used to differentiate them?
Give me a list of other resources that contain information about these
species.
Respectfully,
- Pete
------------------------------------------------------------------------------------
Pete DeVries
Department of Entomology
University of Wisconsin - Madison
445 Russell Laboratories
1630 Linden Drive
Madison, WI 53706
Email: pdevries(a)wisc.edu
TaxonConcept <http://www.taxonconcept.org/> &
GeoSpecies<http://about.geospecies.org/> Knowledge
Bases
A Semantic Web, Linked Open Data <http://linkeddata.org/> Project
--------------------------------------------------------------------------------------
USDA Plants Growth Forms added to TaxonConcept Plants - Example SPARQL Queries
by Peter DeVries 03 Jun '11
I have a new update to my data set. It is not in the cloud yet but you can
download it or run queries on my server.
See http://www.taxonconcept.org/rdf_and_sitemap/
The USDA Plants data set has lists of plants expected in each county and
state.
I have marked up this data as triples for several states in the Midwest as
well as Massachusetts and Texas.
I have also marked up the data for each Wisconsin County.
The USDA data set also has an attribute called "GrowthForm".
I have these attributes marked up in this vocabulary:
OWL http://lod.taxonconcept.org/ontology/usda_plants.owl
Doc http://lod.taxonconcept.org/ontology/usda_plants_doc/index.html
Here are the Growth forms I have as triples.
usda_plant:Growth_Habit_Tree
usda_plant:Growth_Habit_Shrub
usda_plant:Growth_Habit_Subshrub
usda_plant:Growth_Habit_Forb_Herb
usda_plant:Growth_Habit_Graminoid
You can see this in this example from the Knowledge Base
http://lsd.taxonconcept.org/describe/?url=http%3A%2F%2Flod.taxonconcept.org…
bit.ly http://bit.ly/locSNl
You can now run the following kinds of queries:
What plants with the USDA Growth Form Forb_Herb are expected in Door County,
WI?
PREFIX txn: <http://lod.taxonconcept.org/ontology/txn.owl#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX door_county_wi: <http://sws.geonames.org/5250768/>
PREFIX usda_plant: <http://lod.taxonconcept.org/ontology/usda_plants.owl#>
# Forb/herb species concepts expected in Door County, WI, with an optional thumbnail
SELECT DISTINCT ?s ?image ?col_class ?sciname WHERE {
  ?s rdf:type txn:SpeciesConcept .
  ?s rdf:type usda_plant:Growth_Habit_Forb_Herb .
  ?s txn:isExpectedIn door_county_wi: .
  ?s txn:inCoLClass ?col_class .
  ?s txn:hasScientificName ?sciname .
  OPTIONAL { ?s txn:thumbnail ?image . }
}
ORDER BY ASC(?col_class)
LIMIT 650
This link is the query encoded in a URI
<
http://lsd.taxonconcept.org/isparql/view/?query=PREFIX%20txn%3A%20%20%3Chtt…
>
Since this does not always go through email, here is the bit.ly link
http://bit.ly/j5PLzH
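Encoding a query into a link like the one above is ordinary percent-encoding of the query text. Here is a minimal Python sketch, assuming the viewer takes the query in a "query" parameter as the visible start of the link suggests. The one-line query is a stand-in rather than the full query from this message, and the function name is made up for the example.

```python
from urllib.parse import quote, unquote

# Prefix visible in the (truncated) link above.
VIEW_BASE = "http://lsd.taxonconcept.org/isparql/view/?query="


def query_link(sparql: str) -> str:
    """Percent-encode a SPARQL query and append it to the viewer URL."""
    return VIEW_BASE + quote(sparql, safe="")


stand_in = "PREFIX txn: <http://lod.taxonconcept.org/ontology/txn.owl#>"
link = query_link(stand_in)
print(link)
# Decoding the encoded part gives back the original query text.
print(unquote(link[len(VIEW_BASE):]) == stand_in)
```

Because the encoded form is long and full of percent signs, mail clients often break it, which is why a shortened link is also given.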
Since a number of the TDWG BioBlitz species are
included in the USDA Plants data set, those species concepts now have the
USDA attributes attached.
Since many of these related and now linked data sets exist under different
names, it would have been difficult for this to work without some identifier
like my species concepts.
Respectfully,
- Pete
This is probably worth posting here.
Begin forwarded message:
> From: David Eades <dceades(a)illinois.edu>
> Date: 25 May 2011 00:17:20 CEST
> To: "'David Remsen (GBIF)'" <dremsen(a)gbif.org>
> Cc: "'Burke Chih-Jen Ko (GBIF)'" <bko(a)gbif.org>, 'Hernán Pereira' <ellocodelassembler(a)gmail.com>, "Flood, Rich" <jrflood(a)illinois.edu>
> Subject: error in TDWG geographic classification
> Reply-To: <dceades(a)illinois.edu>
>
> Dear David,
>
> Hernán Pereira and Rich Flood have located what we believe is an error in
> the TDWG data that has been widely used. The geographic coordinates
> associated with Sardinia actually form the outline of Corsica. The
> geographic coordinates associated with Corsica actually outline Sardinia.
> We got the data from TDWG in 2001. I have separately notified Kew, which
> has the same data we got from TDWG available for download at
> http://kew.org/gis/tdwg/.
>
> David
>
>
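A swap like this is easy to test for mechanically, because Corsica lies north of Sardinia. Here is a rough Python sketch, assuming each outline is a list of (longitude, latitude) pairs. The two boxes are crude stand-ins made up for illustration, not the TDWG coordinates, and the function names are hypothetical.

```python
def mean_latitude(outline):
    """Average latitude of an outline given as (longitude, latitude) pairs."""
    return sum(lat for _lon, lat in outline) / len(outline)


def labels_look_swapped(corsica_outline, sardinia_outline):
    """True if the outline labelled Corsica sits south of the one labelled Sardinia."""
    return mean_latitude(corsica_outline) < mean_latitude(sardinia_outline)


# Crude bounding boxes for illustration only (hypothetical, not the TDWG polygons).
northern_island = [(8.5, 41.4), (9.6, 41.4), (9.6, 43.0), (8.5, 43.0)]
southern_island = [(8.1, 38.9), (9.8, 38.9), (9.8, 41.2), (8.1, 41.2)]

# Correct labelling: the northern outline is Corsica.
print(labels_look_swapped(northern_island, southern_island))
# Labelling as described in the report: the two outlines are exchanged.
print(labels_look_swapped(southern_island, northern_island))
```

The same comparison could be run against the downloaded data by passing in the coordinate lists stored under each name.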