[tdwg-content] Merged RDF generated from diverse DwC archives, was Re: sample Darwin Core Archives of various flavors

Steve Baskauf steve.baskauf at vanderbilt.edu
Mon Nov 7 05:57:46 CET 2016


It's taken me a while to finish this project, but I have successfully 
turned Darwin Core Archives from GBIF and GGBN having Event, Taxon, and 
Occurrence cores into RDF, and merged them with RDF generated from an 
EOL traitbank archive and the pre-existing Bioimages RDF.  The resulting 
graph describes over 13 thousand distinct organisms and contains 8 
million triples.  I've compressed the entire non-Bioimages RDF graph 
into a downloadable zip file (the Bioimages graph is available 
elsewhere) so that anyone can play with it.  It's also available for 
querying at Vanderbilt's Heard Library SPARQL endpoint. 
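
For anyone who has not looked inside a Darwin Core Archive: the core type 
(Event, Taxon, Occurrence, etc.) is declared in the archive's meta.xml 
file as the rowType attribute of the core element.  Below is a minimal 
sketch of reading it with the Python standard library.  It is not the 
conversion software used in this experiment; the function names and the 
tiny in-memory example archive are hypothetical and for illustration only.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the Darwin Core text guidelines, used by meta.xml.
DWC_TEXT_NS = "{http://rs.tdwg.org/dwc/text/}"


def describe_archive(zip_bytes):
    """Return (core_row_type, [extension_row_types]) from a DwC-A's meta.xml."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        root = ET.fromstring(archive.read("meta.xml"))
    core = root.find(DWC_TEXT_NS + "core")
    extensions = root.findall(DWC_TEXT_NS + "extension")
    return core.get("rowType"), [ext.get("rowType") for ext in extensions]


def build_example_archive():
    """Build a made-up, metadata-only archive in memory (illustration only)."""
    meta = """<?xml version="1.0" encoding="UTF-8"?>
<archive xmlns="http://rs.tdwg.org/dwc/text/">
  <core rowType="http://rs.tdwg.org/dwc/terms/Event">
    <files><location>event.txt</location></files>
  </core>
  <extension rowType="http://rs.tdwg.org/dwc/terms/Occurrence">
    <files><location>occurrence.txt</location></files>
  </extension>
</archive>"""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("meta.xml", meta)
    return buffer.getvalue()


if __name__ == "__main__":
    core_type, extension_types = describe_archive(build_example_archive())
    print("core:", core_type)
    print("extensions:", extension_types)
```

To try it on a real archive, pass the bytes of a downloaded zip file to 
describe_archive() instead of the made-up example.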

You can read the gory details of the experiment at 
http://baskauf.blogspot.com/2016/11/guid-o-matic-meets-dwc-rdf-octopus.html 
and there should be enough details that anyone who is serious about this 
could replicate the exercise.  I doubt that many people will, but you 
are also welcome to read the blog post the same way I read National 
Geographic: skim the high points and look at the pictures.  You might 
read the descriptions of each of the five datasets and look at their 
octopus diagrams, then skip to the end and try pasting the SPARQL 
queries into the endpoint to try them out for yourself (or hack them to 
find out more interesting things).

Thanks Quentin Groom, Willem Coetzer, Gabi Dröge, and Anne Thessen for 
sharing their cool datasets.  I'm excited about this and am looking 
forward to playing more with the big merged graph when I have more time.

Steve

Steve Baskauf wrote:
> Markus and all,
>
> Thanks for the responses.  I have downloaded several great datasets 
> from GBIF, so I have a lot to play with.  I've 
> got occurrence, taxon, and event core archives, which is great.  I'd 
> still be interested in an example of a material sample core archive if 
> anyone has a good one, particularly if it has rich extension files 
> linked to it.  I'll probably report on my experimentation in a future 
> blog post.
>
> Steve
>
> Markus Döring wrote:
>> Hi Steve,
>> you can find thousands of openly published dwc archives at the GBIF 
>> registry: http://www.gbif.org/dataset
>>
>> Unless it is a BioCASe or DiGIR resource, each details page contains 
>> the link to the DwC-A on the right side.  Taxon and Event core datasets 
>> are always DwC-As.
>>
>> best,
>> markus
>>
>> Sent from my iPhone
>>
>> On 25.10.2016 at 06:35, Quentin Groom 
>> <quentin.groom at plantentuinmeise.be> wrote:
>>
>>> Hi Steve,
>>> I have a sampling event dataset on GBIF, which is quite richly 
>>> populated.
>>> http://www.gbif.org/dataset/5d784d06-fa1d-4f00-8cdc-663d04d26061
>>> Regards
>>> Quentin
>>>
>>>
>>>
>>> Dr. Quentin Groom
>>> (Botany and Information Technology)
>>>
>>> Botanic Garden Meise
>>> Domein van Bouchout
>>> B-1860 Meise
>>> Belgium
>>>
>>> ORCID: 0000-0002-0596-5376 <http://orcid.org/0000-0002-0596-5376>
>>>
>>> Landline: +32 (0) 226 009 20 ext. 364
>>> FAX:      +32 (0) 226 009 45
>>>
>>> E-mail:     quentin.groom at plantentuinmeise.be
>>> Skype name: qgroom
>>> Website:    www.botanicgarden.be
>>>
>>>
>>> On 25 October 2016 at 02:06, Steve Baskauf 
>>> <steve.baskauf at vanderbilt.edu> wrote:
>>>
>>>     I have been playing around with turning Darwin Core Archives
>>>     into RDF [1] and would like to extend my experiments to
>>>     attempting to integrate data from archives that have very
>>>     different core files.  I have played with the
>>>     dwcaMolluscsAndorra.zip Occurrence archive mentioned in Annex 3
>>>     of the DwC-A How-To Guide [2], but the Whales-DWC-A.zip file
>>>     isn't available any more (I guess because of the demise of
>>>     Google Code).  A little bit of Google searching failed to turn
>>>     up additional obvious example files.
>>>
>>>     If anyone would be interested in making DwC-A archives available
>>>     to me, I'd appreciate it.  I'm particularly interested in
>>>     archives that have cores other than Occurrence (although an
>>>     Occurrence archive that has more complex data, including
>>>     extensions, than the mollusc example would be welcome).  If
>>>     there are any MaterialSample or Event core archives, that would
>>>     be particularly interesting.
>>>     Preferably, I'd like to have access to archives that contain
>>>     publicly available data, since I'm likely to blog about the
>>>     outcome and potentially use the data in examples.  If there are
>>>     publicly available archives downloadable via a URL, you can
>>>     reply to the list - otherwise let me know how I could access
>>>     your example.
>>>
>>>     Thanks in advance for your help!
>>>     Steve Baskauf
>>>
>>>     [1]
>>>     http://baskauf.blogspot.com/2016/10/guid-o-matic-meets-darwin-core-archives.html
>>>     [2] http://www.gbif.org/resource/80636
>>>
>>>     -- 
>>>     Steven J. Baskauf, Ph.D., Senior Lecturer
>>>     Vanderbilt University Dept. of Biological Sciences
>>>
>>>     postal mail address:
>>>     PMB 351634
>>>     Nashville, TN  37235-1634,  U.S.A.
>>>
>>>     delivery address:
>>>     2125 Stevenson Center
>>>     1161 21st Ave., S.
>>>     Nashville, TN 37235
>>>
>>>     office: 2128 Stevenson Center
>>>     phone: (615) 343-4582,  fax: (615) 322-4942
>>>     If you fax, please phone or email so that I will know to look
>>>     for it.
>>>     http://bioimages.vanderbilt.edu
>>>     http://vanderbilt.edu/trees
>>>
>>>
>>>     _______________________________________________
>>>     tdwg-content mailing list
>>>     tdwg-content at lists.tdwg.org
>>>     http://lists.tdwg.org/mailman/listinfo/tdwg-content
>>>
>
