[Tdwg-guid] LSIDs for PubMed, GenBank, and NCBI Taxonomy
The web server that hosts most of my LSID work (including the Taxonomy Search Engine) was hacked recently. Rebuilding it is taking time, but one positive outcome is that I'm taking the chance to clean up some LSID stuff.
I've rebuilt the LSID authority for PubMed, GenBank, and NCBI Taxonomy (by hacking Roger Hyam's PHP code - IBM's Perl stack is giving me grief on a Fedora Core 4 box). These are, of course, experimental, but hopefully the RDF will be of interest. I've tried to use standard vocabularies and to link between records wherever possible: for example, a PubMed record for a paper lists the sequences referred to in that paper, and a sequence links back to the paper in which it was published. I've set this up partly to support my attempt to build a triple store for ants (see http://iphylo.blogspot.com/2006/05/ants-rdf-and-triple-stores.html), which I hope to have running in time for GUID2.
You can try out the LSIDs, which have the form
urn:lsid:ncbi.nlm.nih.gov.lsid.zoology.gla.ac.uk:<namespace>:<id>
where <namespace> is pubmed, gi, or taxon, e.g.
urn:lsid:ncbi.nlm.nih.gov.lsid.zoology.gla.ac.uk:pubmed:16601190
urn:lsid:ncbi.nlm.nih.gov.lsid.zoology.gla.ac.uk:gi:87047074
urn:lsid:ncbi.nlm.nih.gov.lsid.zoology.gla.ac.uk:taxon:369204
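The structure of these identifiers can be made explicit with a small parser. This is only a sketch, assuming the general LSID layout urn:lsid:authority:namespace:object with an optional trailing revision; the names Lsid and parse_lsid are made up for illustration and are not part of any LSID toolkit.

```python
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    """Components of an LSID (illustrative type, not from any LSID library)."""
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text: str) -> Lsid:
    """Split urn:lsid:authority:namespace:object[:revision] into its parts."""
    parts = text.strip().split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"expected 5 or 6 colon-separated fields: {text!r}")
    # The "urn" and "lsid" prefixes are compared case-insensitively.
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    if not all(parts[2:5]):
        raise ValueError(f"authority, namespace and object must be non-empty: {text!r}")
    revision = parts[5] if len(parts) == 6 and parts[5] else None
    return Lsid(parts[2], parts[3], parts[4], revision)


if __name__ == "__main__":
    lsid = parse_lsid("urn:lsid:ncbi.nlm.nih.gov.lsid.zoology.gla.ac.uk:pubmed:16601190")
    print(lsid.namespace, lsid.object_id)  # prints "pubmed 16601190"
```

Splitting on ":" is enough here because none of the three namespaces (pubmed, gi, taxon) uses colons inside the identifier.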
There's a lot more which could be added to the metadata, but I hope this is of interest.
Regards
Rod
----------------------------------------
Professor Roderic D. M. Page
Editor, Systematic Biology
DEEB, IBLS
Graham Kerr Building
University of Glasgow
Glasgow G12 8QP
United Kingdom
Phone: +44 141 330 4778
Fax: +44 141 330 2792
email: r.page@bio.gla.ac.uk
web: http://taxonomy.zoology.gla.ac.uk/rod/rod.html
reprints: http://taxonomy.zoology.gla.ac.uk/rod/pubs.html
Subscribe to Systematic Biology through the Society of Systematic Biologists Website: http://systematicbiology.org
Search for taxon names: http://darwin.zoology.gla.ac.uk/~rpage/portal/
Find out what we know about a species: http://ispecies.org
Rod's rants on phyloinformatics: http://iphylo.blogspot.com
Hi Rod,
We have implementations of NCBI PubMed, GenBank, Protein, and OMIM; Gene is on the way. I'm in the process of updating our implementations since NCBI upgraded their web service to 1.4. I have created OWL ontologies for the various databases that you might find useful in your implementation. They will be posted online soon, but I could send them to you if you like.
- Ben
tdwg-guid-bounces@mailman.nhm.ku.edu wrote on 05/06/2006 06:12:13 AM:
[snip]
TDWG-GUID mailing list TDWG-GUID@mailman.nhm.ku.edu http://mailman.nhm.ku.edu/mailman/listinfo/tdwg-guid
Dear Ben,
Who do I contact regarding the lsid.biopathways.org resolver? It's a bit flaky, and I think the DNS SRV record is out of date. My crude LSID tester reports errors (e.g., http://linnaeus.zoology.gla.ac.uk/~rpage/lsid/tester/?q=urn%3Alsid%3Ancbi.nlm.nih.gov.lsid.biopathways.org%3Apubmed%3A12441807), which I suspect is because the SRV record states that the resolver is on port 9090 when it's actually on port 80 (at least, I get a WSDL from http://lsid.biopathways.org:80/authority/ but not from http://lsid.biopathways.org:9090/authority/).
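The kind of check described here can be sketched as follows. The sketch assumes the usual LSID convention of looking up a DNS SRV record named _lsid._tcp.<authority>, and the /authority/ path seen in the URLs above. The function names and the serves_wsdl callback are hypothetical; the real DNS and HTTP calls are left to the caller so the logic runs without a network, and the stub in the example simply mimics the behaviour reported above.

```python
from typing import Callable, Iterable, Optional


def srv_query_name(authority: str) -> str:
    """DNS name whose SRV record advertises the resolver for an LSID authority."""
    return f"_lsid._tcp.{authority}"


def authority_url(host: str, port: int) -> str:
    """URL at which the authority service is expected to hand out its WSDL."""
    return f"http://{host}:{port}/authority/"


def find_working_port(
    host: str,
    advertised_port: int,
    candidate_ports: Iterable[int],
    serves_wsdl: Callable[[str], bool],
) -> Optional[int]:
    """Return the first port that serves a WSDL, trying the advertised one first."""
    tried = set()
    for port in [advertised_port, *candidate_ports]:
        if port in tried:
            continue
        tried.add(port)
        if serves_wsdl(authority_url(host, port)):
            return port
    return None


if __name__ == "__main__":
    # Stub standing in for a real HTTP probe: only port 80 answers.
    def stub(url: str) -> bool:
        return url == "http://lsid.biopathways.org:80/authority/"

    print(srv_query_name("ncbi.nlm.nih.gov.lsid.biopathways.org"))
    working = find_working_port("lsid.biopathways.org", 9090, [80], stub)
    if working is not None and working != 9090:
        print(f"SRV record advertises 9090 but the WSDL is on port {working}")
```

A mismatch between the advertised and the working port does not by itself show which side is wrong: the SRV record may be stale, or the service on the advertised port may simply be down.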
As an aside, I guess this is an interesting management issue -- how does one discover who to complain to when an LSID authority is broken?
I've had a quick look at the PubMed stuff that I did manage to get out of the Biopathways site. My concern is that it perhaps models the PubMed record a little too closely (in the same way that a lot of NCBI's XML is horrendous because it models the underlying ASN.1 - yuck). If we're going to integrate this stuff with other providers (e.g., RDF feeds from the publishing industry or projects like Connotea), then I think we need to adopt generic vocabularies (such as Dublin Core and PRISM) rather than object-specific ones. The last thing a user wants is to have to learn a new vocabulary for each data source; that rather defeats the dream of easy integration.
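To make the point concrete, here is a minimal sketch of describing a paper with generic predicates instead of source-specific ones. The input field names (ArticleTitle, AuthorName, and so on) and the record values are invented placeholders, not real PubMed output; the Dublin Core element namespace is the standard one, and the PRISM namespace shown is the 1.2 basic namespace as I understand it, so treat it as an assumption to verify.

```python
from typing import Dict, List, Union

DC = "http://purl.org/dc/elements/1.1/"
PRISM = "http://prismstandard.org/namespaces/1.2/basic/"  # assumed PRISM 1.2 namespace

# Hypothetical source-specific field names mapped to shared predicates.
FIELD_TO_PREDICATE: Dict[str, str] = {
    "ArticleTitle": DC + "title",
    "AuthorName": DC + "creator",
    "JournalTitle": PRISM + "publicationName",
    "Volume": PRISM + "volume",
    "StartPage": PRISM + "startingPage",
}


def escape_literal(value: str) -> str:
    """Escape backslashes, quotes and newlines for an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )


def to_ntriples(subject: str, record: Dict[str, Union[str, List[str]]]) -> List[str]:
    """Emit one N-Triples line per mapped field value; unmapped fields are skipped."""
    lines = []
    for field, value in record.items():
        predicate = FIELD_TO_PREDICATE.get(field)
        if predicate is None:
            continue  # no generic equivalent in this sketch
        values = value if isinstance(value, list) else [value]
        for item in values:
            lines.append(f'<{subject}> <{predicate}> "{escape_literal(str(item))}" .')
    return lines


if __name__ == "__main__":
    record = {
        "ArticleTitle": "An example title",     # placeholder value
        "AuthorName": ["A. Author", "B. Author"],  # placeholder values
        "Volume": "1",
        "InternalRecordStatus": "ignored",      # source-specific, no mapping
    }
    subject = "urn:lsid:ncbi.nlm.nih.gov.lsid.zoology.gla.ac.uk:pubmed:16601190"
    print("\n".join(to_ntriples(subject, record)))
```

A consumer that already understands dc:title and dc:creator can read this output without knowing anything about the source it came from, which is the integration argument being made.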
Regards
Rod
On 6 May 2006, at 15:43, Benjamin H Szekely wrote:
[snip]
Hi Rod,
The Perl-implemented authorities run on port 80 and the Java ones (which are NCBI etc.) run on port 9090. However, NCBI recently updated their web services, causing our Java stuff to break. We are in the midst of fixing it.
As for the ontologies themselves, I couldn't agree more. However, the task of implementing the ontologies was so daunting that, as a first pass, I just did a naive transliteration from their Web Service XML format to RDF and wrote Axis-Jastor translation code in the middle. My hope is to begin replacing predicates with DC predicates where possible, and to have domain experts suggest ontologies that we could reuse for the more bio-specific predicates.
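The incremental replacement described above could look roughly like this. It is a sketch only, written in Python rather than the Java used for the actual authorities; the source predicate URIs under example.org are invented stand-ins for the transliterated ones, and rewrite_predicates is a hypothetical helper.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

DC = "http://purl.org/dc/elements/1.1/"
SRC = "http://example.org/ncbi-pubmed#"  # hypothetical transliterated vocabulary

# Only predicates with an obvious Dublin Core equivalent are listed.
PREDICATE_MAP: Dict[str, str] = {
    SRC + "ArticleTitle": DC + "title",
    SRC + "PubDate": DC + "date",
}


def rewrite_predicates(
    triples: List[Triple], mapping: Dict[str, str]
) -> Tuple[List[Triple], List[str]]:
    """Swap mapped predicates; report the ones left as-is for later review."""
    rewritten = [(s, mapping.get(p, p), o) for s, p, o in triples]
    unmapped = sorted({p for _, p, _ in triples if p not in mapping})
    return rewritten, unmapped


if __name__ == "__main__":
    subject = "urn:lsid:ncbi.nlm.nih.gov.lsid.biopathways.org:pubmed:12441807"
    triples = [
        (subject, SRC + "ArticleTitle", "An example title"),  # placeholder value
        (subject, SRC + "MeshHeading", "An example heading"),  # placeholder value
    ]
    rewritten, unmapped = rewrite_predicates(triples, PREDICATE_MAP)
    print(rewritten[0][1])  # prints "http://purl.org/dc/elements/1.1/title"
    print(unmapped)
```

Predicates without a mapping pass through unchanged, so the list of unmapped predicates doubles as the to-do list for domain experts.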
- Ben
Roderic Page r.page@bio.gla.ac.uk wrote on 05/08/2006 04:12:16 AM:
[snip]
participants (2)
- Benjamin H Szekely
- Roderic Page