Re: [tdwg-content] status of uBio
I’d be glad to host it here as well.
Chuck
From: jonathan.rees@gmail.com [mailto:jonathan.rees@gmail.com] On Behalf Of Jonathan A Rees Sent: Thursday, October 15, 2015 11:39 AM To: Chuck Miller Cc: Steve Baskauf; tdwg-content@lists.tdwg.org Subject: Re: [tdwg-content] status of uBio
Even just providing a dump file of it somewhere would be a big advance over the current situation. I'd be happy to host it, if I can get ahold of it (of course assuming the blessing of whoever is responsible).
Good intentions. However, uBio has historically expressly not allowed the download or redistribution of its content in bulk. (Perhaps they’ve changed this lately, but if so, I’m not aware that they have.)
Science would be a lot farther than it is if everyone were less possessive about their data, taxonomies, source code, etc.
-hilmar
On Oct 15, 2015, at 12:41 PM, Chuck Miller <Chuck.Miller@mobot.org> wrote:
_______________________________________________ tdwg-content mailing list tdwg-content@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-content
-- Hilmar Lapp -:- genome.duke.edu -:- lappland.io
Is there still a 'they' that can respond? If there is no longer a 'they', is this a case of a resource that does more harm by being revived than by being turned off? I did not see evidence of activity in uBio prior to its disappearance; we ought to strive for resources that can acquire and integrate new content with minimal effort and that can demonstrate this. Perhaps we should ask the once-present 'they', and the entities or individuals that have since assumed ownership, to release the content and functionality for reabsorption elsewhere.
David P. Shorthouse
For me it's not a question of supporting new users, promoting uBio, or keeping up with curation activity; it's a matter of supporting existing uses and promoting scientific reproducibility. I know of two projects that use uBio (TreeBASE and PhyloPic) and would be surprised if they weren't just the tip of the iceberg. The identifiers could be buried in papers, data files, software, and so on.
Moving on to other resources, and using uBio as an input, are fine ideas, but it's not either/or. It's highly desirable to retain the ability to resolve current uBio identifiers somehow, even if the means for doing so are awkward (e.g. scanning a dump file or using an API not designed for the purpose).
But I suspect Hilmar is right, and all those identifiers will become inert, if not now then within a few years.
Jonathan
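The "scanning a dump file" fallback mentioned above could look roughly like the sketch below. It is illustrative only: no uBio dump format was ever published in this thread, so the tab-separated layout (integer namebank ID, then name-string) and the sample rows are assumptions, and `resolve_from_dump` / `build_index` are hypothetical helper names.

```python
import csv
import io
from typing import Dict, Optional, TextIO


def resolve_from_dump(dump: TextIO, wanted_id: int) -> Optional[str]:
    """Linear scan of a HYPOTHETICAL tab-separated dump for one uBio ID.

    Assumed layout: column 0 is the integer namebank ID, column 1 the
    name-string. Header rows and malformed rows are skipped.
    """
    for row in csv.reader(dump, delimiter="\t"):
        if len(row) < 2 or not row[0].strip().isdigit():
            continue
        if int(row[0]) == wanted_id:
            return row[1]
    return None


def build_index(dump: TextIO) -> Dict[int, str]:
    """One pass over the dump so that repeated lookups become dict hits."""
    index: Dict[int, str] = {}
    for row in csv.reader(dump, delimiter="\t"):
        if len(row) >= 2 and row[0].strip().isdigit():
            index[int(row[0])] = row[1]
    return index


# Made-up rows, for illustration only.
sample = io.StringIO("namebank_id\tname_string\n101\tPuma concolor\n102\tQuercus alba L.\n")
print(resolve_from_dump(sample, 102))  # Quercus alba L.
```

A single linear scan is awkward but needs nothing beyond the file itself; if many identifiers have to be resolved, building the index once is the obvious trade of memory for speed.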
I got this uBio status update from David Remsen, who is at MBL but no longer involved with uBio:
"uBio and associated systems (such as Nomenclator Zoologicus) are installed on machines for which the supporting software has not been updated in some time and was crashing regularly. It is currently offline but Dmitry Mozzherin has a solution to host it in a Docker container that would stabilize it for use. We are looking to have it back in service in the next weeks before Dima heads to Champaign and Species File Software."
Chuck
I have been administering uBio for the last year, but now I am moving from MBL. The uBio machine is in bad shape, and it crashes after a few hours of work. My plan is to create Docker containers for the database, code and data, which should make the whole system much more stable and much more manageable. The good news is that I will definitely try my best to do it; the bad news is that I am spread thinner than usual with the move, transferring hardware and the grant, GN things, EOL things, and figuring out what to do with the house, etc. The uBio 'code' part is about 35 GB, which makes the task more complicated, but I am quite optimistic that I will be able to build the containers and put them on an MBL machine, run them from the University of Illinois, or give them to Naturalis, depending on what makes more sense for Dave Remsen, MBL, and everyone interested in the project.
Thanks all for the information and comments about the status of uBio. I'm glad to hear that the server will probably come back up. If not, then I hope the data will be made available to those who said they would be willing to host it.
I have been interested in using the uBio identifiers for several reasons:
1. They have managed to stick around for a long time and are stable in their format (as LSIDs and HTTP proxied LSIDs).
2. The coverage of names is really good for plants, animals, different geographic locations, etc. I also use ITIS identifiers, but it's fairly common for me to not be able to find one for the name I need, which almost never happens with uBio.
3. It's somewhat clear what uBio identifiers refer to: names vs. something more nebulous involving taxa or ... something. (Not trying to push your button, Rich Pyle).
4. You can actually get RDF associated with the LSID version of the uBio identifiers. I was wanting to download some to play with in our new triplestore (http://rdf.library.vanderbilt.edu) when I discovered that the server was down. The RDF is somewhat ad hoc, but hey, it's there.
There isn't really any other source that has all of these characteristics. So please keep uBio going indefinitely, if at all possible.
Steve
Hello Steve, Thanks for triggering an interesting thread. Just waving the IPNI flag for a minute:
1. They have managed to stick around for a long time and are stable in their format (as LSIDs and HTTP proxied LSIDs).
At ipni.org we support our own HTTP proxy for LSIDs: http://ipni.org/urn:lsid:ipni.org:names:12345-1 … and it's been an age since I tried to resolve an LSID using the formal LSID spec, but a quick run-through today shows that all the steps appear to be in working order.
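A short sketch of how an LSID breaks down and how the HTTP proxy form in the ipni.org example is built. It assumes the usual `urn:lsid:<authority>:<namespace>:<object>[:<revision>]` layout and that the proxy URL is simply the base `http://ipni.org/` followed by the full LSID, as in the example URL above. The helper names are hypothetical, and nothing here performs an actual network lookup.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(urn: str) -> LSID:
    """Split urn:lsid:<authority>:<namespace>:<object>[:<revision>]."""
    parts = urn.strip().split(":")
    if len(parts) not in (5, 6) or [p.lower() for p in parts[:2]] != ["urn", "lsid"]:
        raise ValueError(f"not an LSID: {urn!r}")
    return LSID(*parts[2:])


def http_proxy_url(urn: str, base: str = "http://ipni.org/") -> str:
    """Prefix the full LSID with an HTTP base, following the ipni.org example."""
    parse_lsid(urn)  # validate before building the URL
    return base + urn.strip()


print(http_proxy_url("urn:lsid:ipni.org:names:12345-1"))
# http://ipni.org/urn:lsid:ipni.org:names:12345-1
```

Keeping the whole LSID in the URL path means the HTTP form can always be mapped back to the original identifier, which is what makes such proxies useful for stability.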
2. The coverage of names is really good for plants, animals, different geographic locations, etc. I also use ITIS identifiers but it's fairly common for me to not be able to find one for the name I need, which almost never happens with uBio.
IPNI is comprehensive for vascular plants (at species level). We’ll be addressing the data gaps at infra-specific level – but it’s very useful for us to be armed with reasons (from users like yourself) as to why we should spend time doing this.
3. It's somewhat clear what uBio identifiers refer to: names vs. something more nebulous involving taxa or ... something. (Not trying to push your button, Rich Pyle).
IPNI only serves data about names, no taxa here. We are pushing IPNI IDs into our taxonomic resources so that a user can flexibly match a name and get an IPNI identifier (the nomenclatural part), and then as a later step query a taxonomic resource for their current view as to the taxonomic status of that name. We are aiming for a clean separation of names matching from the (multiple, potentially different) uses of those names to form taxonomies.
4. You can actually get RDF associated with the LSID version of the uBio identifiers. I was wanting to download some to play with in our new triplestore (http://rdf.library.vanderbilt.edu) when I discovered that the server was down. The RDF is somewhat ad hoc, but hey, it's there.
The HTTP proxy above returns RDF metadata for the specified record. As for new developments (most of the functionality outlined above is 8-10 years old), I've been working on exposing IPNI data through standard matching services, namely the OpenRefine reconciliation API, which permits much more flexible name look-ups using a lot of heuristics gathered from matching data port exercises carried out at Kew over the past few years. Currently the main hitch with this service is that it includes duplicate records, which we're working on resolving right now. The service is outlined here: http://data1.kew.org/reconciliation/ and details of the IPNI service in particular are here: http://data1.kew.org/reconciliation/about/IpniName
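For anyone wanting to try a reconciliation service like the one described above, here is a rough sketch of the client side. It assumes the general OpenRefine reconciliation convention (a batch of queries sent as a JSON object in a `queries` form parameter; each answer carries a `result` list of candidates with `score` and `match` fields). The exact IPNI endpoint path is not given here and is not assumed, and both helper names are hypothetical.

```python
import json
from typing import Any, Dict, List, Optional
from urllib.parse import urlencode


def build_reconciliation_body(names: List[str], limit: int = 3) -> str:
    """Form-encode a batch of name queries as a single `queries` parameter."""
    queries = {f"q{i}": {"query": name, "limit": limit} for i, name in enumerate(names)}
    return urlencode({"queries": json.dumps(queries)})


def best_candidate(response: Dict[str, Any], key: str) -> Optional[Dict[str, Any]]:
    """Prefer a candidate the service flags as a match; otherwise take the top score."""
    candidates = response.get(key, {}).get("result", [])
    if not candidates:
        return None
    flagged = [c for c in candidates if c.get("match")]
    return flagged[0] if flagged else max(candidates, key=lambda c: c.get("score", 0))


body = build_reconciliation_body(["Quercus alba", "Acer rubrum L."])
# `body` would be POSTed as application/x-www-form-urlencoded to the service endpoint.
```

Falling back to the highest score when nothing is flagged as a match is only one possible policy. Given the duplicate records mentioned above, a careful client may prefer to send ambiguous results to a human instead.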
If you’re interested in using IPNI, I’d be happy to hear any comments on the functionality above and / or any requests as to how we can make it more useful for you.
cheers, Nicky
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Steve Baskauf Sent: 16 October 2015 14:09 To: Dmitry Mozzherin dmozzherin@gmail.com Cc: Chuck Miller (Contact) chuck.miller@mobot.org; tdwg-content@lists.tdwg.org; Jonathan A Rees rees@mumble.net; Shorthouse, David david.shorthouse@umontreal.ca Subject: Re: [tdwg-content] status of uBio
--
Steven J. Baskauf, Ph.D., Senior Lecturer
Vanderbilt University Dept. of Biological Sciences
postal mail address:
PMB 351634
Nashville, TN 37235-1634, U.S.A.
delivery address:
2125 Stevenson Center
1161 21st Ave., S.
Nashville, TN 37235
office: 2128 Stevenson Center
phone: (615) 343-4582, fax: (615) 322-4942
If you fax, please phone or email so that I will know to look for it.
It's somewhat clear what uBio identifiers refer to: names vs.
something more nebulous involving taxa or ... something.
(Not trying to push your button, Rich Pyle).
:-)
Actually, saying that uBio identifiers refer to “names” is a bit like saying uBio identifiers refer to “stuff” (which is only slightly more ambiguous than “names”). I think what you mean (a bit more explicitly) is that uBio identifiers refer to “name-strings”, wherein each unique literal UTF-8-encoded string of characters purported to represent a scientific name receives an integer as an identifier.
GNI also assigns unique identifiers to such name-strings, but the difference is that the GNI identifier is a hash of the string itself. Thus, given a string, you can algorithmically derive the GNI identifier for it. There is no way to algorithmically convert the uBio name-strings into their corresponding integer uBio identifiers.
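To make the contrast concrete, here is a minimal sketch of both schemes. It uses Python's standard name-based (version 5) UUIDs, which is the kind GNI uses according to this thread. The namespace derivation (`uuid5(NAMESPACE_DNS, "globalnames.org")`) is an ASSUMPTION, so check it against the GN blog post before relying on it to reproduce real GNI identifiers. `SequentialRegistry` is a hypothetical stand-in for integer minting, not uBio's actual code.

```python
import itertools
import uuid
from typing import Dict

# ASSUMPTION: namespace derived from "globalnames.org"; verify against GN docs.
GN_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "globalnames.org")


def name_string_uuid(name_string: str) -> uuid.UUID:
    """Derive a version 5 (SHA-1, name-based) UUID from the exact UTF-8 name-string."""
    return uuid.uuid5(GN_NAMESPACE, name_string)


class SequentialRegistry:
    """Illustrative integer minting: the ID depends on arrival order, not content."""

    def __init__(self) -> None:
        self._ids: Dict[str, int] = {}
        self._next = itertools.count(1)

    def mint(self, name_string: str) -> int:
        if name_string not in self._ids:
            self._ids[name_string] = next(self._next)
        return self._ids[name_string]


a, b = SequentialRegistry(), SequentialRegistry()
a.mint("Puma concolor"); a.mint("Quercus alba")
b.mint("Quercus alba"); b.mint("Puma concolor")
print(a.mint("Puma concolor"), b.mint("Puma concolor"))  # 1 2
print(name_string_uuid("Puma concolor") == name_string_uuid("Puma concolor"))  # True
```

The two registries give the same string different integers because order matters, while anyone hashing the same string gets the same UUID. The flip side is that any change to the string, even a single character, yields a different UUID.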
I have a question, and a suggestion.
Question: Are new uBio integer identifiers ever going to be minted? Or do we just want to maintain them for legacy purposes?
Suggestion: Regardless of the answer to the question, I suggest that Dima generate an index of uBio integer identifiers and the corresponding hash UUIDs used by GNI. I will import these into BioGUID.org, so going forward we will always have an index to translate the uBio integers into the GN hashed UUIDs. From there, it would be a simple step to allow GN services to process uBio identifiers natively, and do all the cool things that Steve (and others) would like uBio/GN to do.
Aloha,
Rich
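The index Rich suggests amounts to a crosswalk table. A minimal sketch follows, assuming the index arrives as (uBio integer ID, name-string) pairs and that GN UUIDs are name-based version 5 UUIDs under a namespace derived from "globalnames.org". That namespace, the accepted `urn:lsid:ubio.org:namebank:<id>` form, and the sample IDs are all assumptions for illustration, and the helper names are hypothetical.

```python
import uuid
from typing import Dict, Iterable, Optional, Tuple

# ASSUMPTION: GN-style namespace; verify against the published GN UUID scheme.
GN_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "globalnames.org")


def build_crosswalk(rows: Iterable[Tuple[int, str]]) -> Dict[int, uuid.UUID]:
    """Map each uBio integer ID to the hash UUID of its name-string."""
    return {ubio_id: uuid.uuid5(GN_NAMESPACE, name) for ubio_id, name in rows}


def translate(crosswalk: Dict[int, uuid.UUID], identifier: str) -> Optional[uuid.UUID]:
    """Accept a bare integer or an LSID ending in one (assumed uBio LSID form)."""
    tail = identifier.strip().rsplit(":", 1)[-1]
    return crosswalk.get(int(tail)) if tail.isdigit() else None


cw = build_crosswalk([(101, "Puma concolor"), (102, "Quercus alba L.")])  # made-up IDs
print(translate(cw, "urn:lsid:ubio.org:namebank:102"))
```

Because the UUID side can always be recomputed from the name-string, the only part of such an index that truly needs preserving is the integer-to-string pairing that uBio itself holds.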
All name-strings from uBio are in GNI, and they have version 5 UUIDs generated from them, as described in this blog post: http://globalnames.org/news/2015/05/31/gn-uuid-0-5-0. And we do have uBio IDs associated with these UUIDs via GNI.
On Fri, Oct 16, 2015 at 3:25 PM, Richard Pyle deepreef@bishopmuseum.org wrote:
participants (8)
- Chuck Miller
- Dmitry Mozzherin
- Hilmar Lapp
- Jonathan A Rees
- Nicky Nicolson
- Richard Pyle
- Shorthouse, David
- Steve Baskauf