It's taken me a while to finish this project, but I have successfully turned Darwin Core Archives from GBIF and GGBN having Event, Taxon, and Occurrence cores into RDF, and merged them with RDF generated from an EOL traitbank archive and the pre-existing Bioimages RDF. The resulting graph describes over 13 thousand distinct organisms and contains 8 million triples. I've compressed the entire non-Bioimages RDF graph into a downloadable zip file (the Bioimages graph is available elsewhere) so that anyone can play with it. It's also available for querying at Vanderbilt's Heard Library SPARQL endpoint.
You can read the gory details of the experiment at http://baskauf.blogspot.com/2016/11/guid-o-matic-meets-dwc-rdf-octopus.html, and there should be enough detail that anyone who is serious about this could replicate the exercise. I doubt that many people will, but you are also welcome to read the blog post the same way I read National Geographic: skim the high points and look at the pictures. You might read the descriptions of each of the five datasets and look at their octopus diagrams, then skip to the end and paste the SPARQL queries into the endpoint to try them out for yourself (or hack them to find out more interesting things).
Thanks Quentin Groom, Willem Coetzer, Gabi Dröge, and Anne Thessen for sharing their cool datasets. I'm excited about this and am looking forward to playing more with the big merged graph when I have more time.
Steve
Steve Baskauf wrote:
Markus and all,
Thanks for the responses. I have downloaded several great datasets from GBIF, so I have a lot to play with. I've got occurrence, taxon, and event core archives, which is great. I'd still be interested in an example of a material sample core archive if anyone has a good one, particularly if it has rich extension files linked to it. I'll probably report on my experimentation in a future blog post.
Steve
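For anyone sorting through downloaded archives to see which core each one uses, here is a minimal Python sketch. It is not part of the workflow described in this thread. It assumes the archive has a meta.xml descriptor at the root of the zip (the usual Darwin Core text layout), and the function name describe_archive is made up for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespace used by Darwin Core Archive meta.xml descriptors.
DWC_TEXT_NS = "{http://rs.tdwg.org/dwc/text/}"


def describe_archive(source):
    """Return (core_row_type, extension_row_types) for a Darwin Core Archive.

    `source` is a path to the zip file or a binary file-like object.
    Assumption: meta.xml sits at the top level of the zip.
    """
    with zipfile.ZipFile(source) as archive:
        with archive.open("meta.xml") as handle:
            root = ET.parse(handle).getroot()

    core = root.find(f"{DWC_TEXT_NS}core")
    if core is None:
        raise ValueError("meta.xml has no <core> element")

    # Each <extension> element declares the row type of one extension file.
    extensions = [
        ext.get("rowType") for ext in root.findall(f"{DWC_TEXT_NS}extension")
    ]
    return core.get("rowType"), extensions
```

Calling describe_archive on a downloaded zip returns the core's rowType URI (for example one ending in Event, Taxon, or Occurrence) together with the row types of any extensions, which makes it easy to pick out archives with unusual cores or rich extension files.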
Markus Döring wrote:
Hi Steve, you can find thousands of openly published dwc archives at the GBIF registry: http://www.gbif.org/dataset
Unless it is a BioCASe or DiGIR resource, each details page contains the link to the DwC-A on the right side. Taxon and Event core datasets are always DwC-As.
best, markus
Sent from my iPhone
On 25.10.2016 at 06:35, Quentin Groom <quentin.groom@plantentuinmeise.be> wrote:
Hi Steve,
I have a sampling event dataset on GBIF, which is quite richly populated. http://www.gbif.org/dataset/5d784d06-fa1d-4f00-8cdc-663d04d26061
Regards
Quentin
Dr. Quentin Groom (Botany and Information Technology)
Botanic Garden Meise Domein van Bouchout B-1860 Meise Belgium
ORCID: 0000-0002-0596-5376 http://orcid.org/0000-0002-0596-5376
Landline: +32 (0) 226 009 20 ext. 364 FAX: +32 (0) 226 009 45
E-mail: quentin.groom@plantentuinmeise.be Skype name: qgroom Website: http://www.botanicgarden.be
On 25 October 2016 at 02:06, Steve Baskauf <steve.baskauf@vanderbilt.edu> wrote:
I have been playing around with turning Darwin Core Archives into RDF [1] and would like to extend my experiments to attempting to integrate data from archives that have very different core files. I have played with the dwcaMolluscsAndorra.zip Occurrence archive mentioned in Annex 3 of the DwC-A How-To Guide [2], but the Whales-DWC-A.zip file isn't available any more (I guess because of the demise of Google Code). A little bit of Google searching failed to turn up additional obvious example files.
If anyone would be interested in making DwC-A archives available to me, I'd appreciate it. I'm particularly interested in archives that have cores other than Occurrence (although an Occurrence archive that has more complex data, including extensions, than the mollusc example would be welcome). If there are any MaterialSample or Event core archives, that would be particularly interesting. Preferably, I'd like to have access to archives that contain publicly available data, since I'm likely to blog about the outcome and potentially use the data in examples. If there are publicly available archives downloadable via a URL, you can reply to the list; otherwise let me know how I could access your example.
Thanks in advance for your help!
Steve Baskauf
[1] http://baskauf.blogspot.com/2016/10/guid-o-matic-meets-darwin-core-archives.html
[2] http://www.gbif.org/resource/80636
-- Steven J. Baskauf, Ph.D., Senior Lecturer Vanderbilt University Dept. of Biological Sciences
postal mail address: PMB 351634 Nashville, TN 37235-1634, U.S.A.
delivery address: 2125 Stevenson Center 1161 21st Ave., S. Nashville, TN 37235
office: 2128 Stevenson Center phone: (615) 343-4582, fax: (615) 322-4942 If you fax, please phone or email so that I will know to look for it.
http://bioimages.vanderbilt.edu http://vanderbilt.edu/trees
_______________________________________________
tdwg-content mailing list tdwg-content@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-content