[tdwg-tag] DwC-A and JSON

Alex Thompson godfoder at acis.ufl.edu
Fri Aug 21 19:02:00 CEST 2015


As someone who is normally a big proponent of JSON as a general 
information representation format, I'd have to say that you're likely to 
run into a myriad of issues with this. Chief among those would be that 
JSON doesn't tend to play well with progressive decoding - most JSON 
libraries force you to parse and decode the entire file, often in 
memory, before you have access to any of the information. This works 
fine for things like APIs where the information is generally quite 
small, but for something like a DwC-A, where each item can have 40-50 
properties and there can be hundreds of thousands of items, it quickly 
becomes unmanageable. There are progressive JSON decoding libraries in 
most languages, but needing one is a hurdle to effective use, since they 
normally aren't part of the standard library.

The two major strategies around this are either using some kind of 
hybrid delimited-JSON format (one JSON record per entry, normally 
separated by newlines or null bytes), or writing hundreds of thousands of 
individual JSON files into a zip or tar archive directly, without first 
writing them to disk. I've tried both, and I don't really like either of 
them for anything more than a quick hack.
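
For illustration, here is a minimal sketch of both strategies in Python, 
using only the standard library. The function names, the sample records, 
and the choice of one "<id>.json" member per record are assumptions made 
for this example, not part of any DwC-A convention.

```python
import io
import json
import zipfile


def write_ndjson(records, stream):
    """Write one JSON object per line (newline-delimited JSON)."""
    for record in records:
        # json.dumps escapes embedded newlines, so each record stays on one line.
        stream.write(json.dumps(record) + "\n")


def read_ndjson(stream):
    """Yield records one at a time; only a single line is decoded at once."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


def write_json_zip(records, fileobj, id_key="id"):
    """Write each record as its own archive member, without temp files on disk."""
    with zipfile.ZipFile(fileobj, "w", zipfile.ZIP_DEFLATED) as archive:
        for record in records:
            archive.writestr(f"{record[id_key]}.json", json.dumps(record))


# Hypothetical sample data, for illustration only.
records = [
    {"id": "ABC123", "scientificName": "Puma concolor"},
    {"id": "ABC124", "scientificName": "Lynx rufus"},
]

# Strategy 1: newline-delimited JSON, read back lazily.
ndjson_buffer = io.StringIO()
write_ndjson(records, ndjson_buffer)
ndjson_buffer.seek(0)
decoded = list(read_ndjson(ndjson_buffer))

# Strategy 2: one JSON file per record, written straight into a zip archive.
zip_buffer = io.BytesIO()
write_json_zip(records, zip_buffer)
with zipfile.ZipFile(zip_buffer) as archive:
    member_names = archive.namelist()
```

In both cases a consumer can process one record at a time, but it has to 
know about the wrapper convention (line delimiters or archive layout) 
before any ordinary JSON library becomes useful.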

In terms of advanced serializations for DwC-A type data, I'd much rather 
see something like DwC-SQLite or DwC-HDF5, which would start to give us 
some real tools to work with something other than a star schema.
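
As a rough sketch of what that could allow, the Python example below 
builds a small SQLite database in which a table hangs off another 
non-core table rather than off the core. No "DwC-SQLite" layout is 
defined anywhere; the table names, columns, and sample values are 
invented for illustration.

```python
import sqlite3

# In-memory database; a real file path would work the same way.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE occurrence (
        id TEXT PRIMARY KEY,
        scientificName TEXT
    );
    -- Hypothetical table shared by many identifications. It links to
    -- identification, not to the core, which a star schema cannot express.
    CREATE TABLE agent (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE identification (
        id INTEGER PRIMARY KEY,
        occurrence_id TEXT NOT NULL REFERENCES occurrence(id),
        agent_id INTEGER NOT NULL REFERENCES agent(id),
        scientificName TEXT
    );
    """
)

# Placeholder rows, for illustration only.
conn.execute("INSERT INTO occurrence VALUES (?, ?)", ("ABC123", "Puma concolor"))
conn.execute("INSERT INTO agent VALUES (?, ?)", (1, "A. Example"))
conn.execute(
    "INSERT INTO identification VALUES (?, ?, ?, ?)",
    (1, "ABC123", 1, "Puma concolor"),
)

# Walk from the core through identification to the agent in one query.
rows = conn.execute(
    """
    SELECT o.id, i.scientificName, a.name
    FROM occurrence AS o
    JOIN identification AS i ON i.occurrence_id = o.id
    JOIN agent AS a ON a.id = i.agent_id
    """
).fetchall()
```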

- Alex

P.S.
You could always do this:

meta.xml:
<core encoding="utf-8" fieldsTerminatedBy="\t" linesTerminatedBy="\n" 
fieldsEnclosedBy="" ignoreHeaderLines="1" 
rowType="http://rs.tdwg.org/dwc/terms/Occurrence">

occurrence.txt:
id <tab> dynamicProperties
ABC123 <tab> {<all of your actual data>}
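
A consumer of such a file could decode it row by row, roughly as in the 
Python sketch below. It assumes the file is tab-delimited with a single 
header line and no field quoting, matching the meta.xml attributes 
above, and that each dynamicProperties value is a JSON object on one 
line. The sample row and the read_occurrences helper are made up for 
this example.

```python
import csv
import io
import json

# Stand-in for occurrence.txt; the property values are hypothetical.
sample = (
    "id\tdynamicProperties\n"
    'ABC123\t{"scientificName": "Puma concolor", "individualCount": 2}\n'
)


def read_occurrences(stream):
    """Yield one record per row, decoding dynamicProperties as JSON."""
    # QUOTE_NONE mirrors fieldsEnclosedBy="": the double quotes inside the
    # JSON are passed through as ordinary characters.
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader)  # ignoreHeaderLines="1"
    for row in reader:
        record = dict(zip(header, row))
        record["dynamicProperties"] = json.loads(record["dynamicProperties"])
        yield record


occurrences = list(read_occurrences(io.StringIO(sample)))
```

This only works if the JSON never contains a raw tab or newline; 
standard JSON encoders escape both inside strings, so compact 
single-line output is safe.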

On 08/21/2015 12:43 PM, Bob Morris wrote:
> Is there or should there be a form of DwC-A serialized with JSON? If
> no, should Interest Group X ( X= ???) initiate some discussion or
> Task. If IG X is already at work, where is its discussion?
>
> Alternatively, should my question be something like "What is the JSON
> alternative to DwC-A and where is it, or should it be, discussed?"
>
> Thanks
> Bob
>


