[tdwg-tag] DwC-A and JSON
Alex Thompson
godfoder at acis.ufl.edu
Fri Aug 21 19:02:00 CEST 2015
As someone who is normally a big proponent of JSON as a general
information representation format, I'd have to say that you're likely to
run into a myriad of issues with this. Chief among those would be that
JSON doesn't tend to play well with progressive decoding: most JSON
libraries force you to parse and decode the entire file, often in
memory, before you have access to any of the information. This works
fine for things like APIs, where the payloads are generally quite
small, but for something like a DwC-A, where each item can have 40-50
properties and there can be hundreds of thousands of items, it quickly
gets unmanageable. There are progressive JSON decoding libraries in
most languages, but they are a hurdle to effective usage, since they
normally aren't part of the standard library.
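For illustration, here is a minimal Python sketch of progressive decoding using only the standard library. Python's `json.load()` builds the entire document in memory, and the stdlib has no streaming parser, but `json.JSONDecoder.raw_decode` can pull one value at a time out of a text buffer. The function name `iter_json_array` is hypothetical, and the sketch assumes the file is a single top-level JSON array of records; separator checking is deliberately loose.

```python
import io
import json
from typing import Any, Iterator, TextIO

_WS = " \t\r\n"


def iter_json_array(stream: TextIO, chunk_size: int = 65536) -> Iterator[Any]:
    """Yield the elements of a top-level JSON array one at a time.

    Memory use is roughly one element plus one read chunk, instead of the
    whole decoded document that json.load() would build.
    """
    decoder = json.JSONDecoder()
    buf, pos = "", 0
    started, eof = False, False
    while True:
        while pos < len(buf) and buf[pos] in _WS:
            pos += 1
        if pos < len(buf):
            if not started:
                if buf[pos] != "[":
                    raise ValueError("expected a top-level JSON array")
                started, pos = True, pos + 1
                continue
            if buf[pos] == "]":
                return
            try:
                obj, end = decoder.raw_decode(buf, pos)
            except json.JSONDecodeError:
                obj, end = None, -1  # element is probably cut off mid-chunk
            if end >= 0:
                nxt = end
                while nxt < len(buf) and buf[nxt] in _WS:
                    nxt += 1
                # Only trust a decode once the following ',' or ']' is in the
                # buffer, so a number split across chunks ("1." + "5") is not
                # emitted early.
                if nxt < len(buf) and buf[nxt] in ",]":
                    yield obj
                    pos = nxt + 1 if buf[nxt] == "," else nxt
                    continue
        if eof:
            raise ValueError("malformed or truncated JSON array")
        # Discard the consumed prefix, then pull in the next chunk.
        buf, pos = buf[pos:], 0
        chunk = stream.read(chunk_size)
        eof = chunk == ""
        buf += chunk
```

Usage would look like `for rec in iter_json_array(open("occurrences.json", encoding="utf-8")): ...` (file name hypothetical). This is exactly the kind of extra machinery a consumer has to carry around just to avoid loading everything at once.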
The two major strategies around this are either using some kind of
hybrid delimited format (JSON records separated by newlines or null
bytes), or writing hundreds of thousands of individual JSON files
directly into a zip or tar archive without first writing them to disk.
I've tried both, and I don't really like either of them for anything
more than a quick hack.
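A rough Python sketch of both workarounds, for comparison. The function names are hypothetical, records are assumed to be plain dicts, and the zip variant assumes each record carries a unique `id` key that can serve as the member file name. A null-byte-delimited variant would be the same as the first strategy with `"\0"` in place of `"\n"`.

```python
import io
import json
import zipfile
from typing import IO, Any, Dict, Iterable, Iterator, TextIO, Union

Record = Dict[str, Any]


# Strategy 1: newline-delimited JSON, one record per line.
def write_ndjson(records: Iterable[Record], out: TextIO) -> None:
    for rec in records:
        # json.dumps escapes embedded newlines inside strings as "\n",
        # so every record occupies exactly one physical line.
        out.write(json.dumps(rec) + "\n")


def read_ndjson(src: TextIO) -> Iterator[Record]:
    for line in src:
        if line.strip():  # tolerate blank lines
            yield json.loads(line)


# Strategy 2: one small JSON file per record, streamed straight into a zip.
def write_json_zip(records: Iterable[Record], target: Union[str, IO[bytes]]) -> None:
    with zipfile.ZipFile(target, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for rec in records:
            # writestr adds the member from memory; no temp file on disk.
            zf.writestr(f"{rec['id']}.json", json.dumps(rec))


def read_json_zip(source: Union[str, IO[bytes]]) -> Iterator[Record]:
    with zipfile.ZipFile(source) as zf:
        for name in zf.namelist():
            yield json.loads(zf.read(name))
```

Both work, but neither is self-describing: a consumer has to know out of band that "one line = one record" or "one member = one record", which is part of why they feel like hacks.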
In terms of advanced serializations for DwC-A-type data, I'd much rather
see something like DwC-SQLite or DwC-HDF5, which would start to give us
some real tools for working with something other than a star schema.
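To make the DwC-SQLite idea a little more concrete, here is a hypothetical sketch using Python's built-in `sqlite3`. There is no DwC-SQLite specification; the layout below (one table per DwC-A data file, with the extension's `coreid` as a foreign key into the core) is an assumption for illustration, and the table names, columns, and example.org URLs are made up.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Hypothetical DwC-SQLite layout: one table per DwC-A data file."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE occurrence (
            id             TEXT PRIMARY KEY,  -- DwC-A core id
            scientificName TEXT,
            country        TEXT
        );
        CREATE TABLE multimedia (
            coreid     TEXT NOT NULL REFERENCES occurrence(id),  -- star-schema link
            identifier TEXT
        );
        CREATE INDEX multimedia_coreid ON multimedia(coreid);
    """)
    con.executemany("INSERT INTO occurrence VALUES (?, ?, ?)", [
        ("ABC123", "Puma concolor", "US"),
        ("DEF456", "Puma concolor", "MX"),
        ("GHI789", "Lynx rufus", "US"),
    ])
    con.executemany("INSERT INTO multimedia VALUES (?, ?)", [
        ("ABC123", "https://example.org/img/1.jpg"),
        ("ABC123", "https://example.org/img/2.jpg"),
        ("GHI789", "https://example.org/img/3.jpg"),
    ])
    return con


con = build_demo_db()
# Ad hoc queries across core and extension, with no custom archive reader.
rows = con.execute("""
    SELECT o.scientificName, COUNT(m.identifier) AS images
    FROM occurrence AS o LEFT JOIN multimedia AS m ON m.coreid = o.id
    GROUP BY o.scientificName
    ORDER BY o.scientificName
""").fetchall()
print(rows)  # [('Lynx rufus', 1), ('Puma concolor', 2)]
```

The point is less the schema than the tooling: indexes, joins, and aggregation come for free, and nothing forces every relationship to hang off a single core table.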
- Alex
P.S.
You could always do this:
meta.xml:
<core encoding="utf-8" fieldsTerminatedBy="\t" linesTerminatedBy="\n"
fieldsEnclosedBy="" ignoreHeaderLines="1"
rowType="http://rs.tdwg.org/dwc/terms/Occurrence">
occurrence.txt:
id <tab> dynamicProperties
ABC123 <tab> {<all of your actual data>}
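A minimal Python sketch of reading and writing that tab-separated layout, assuming the meta.xml settings above (tab field separator, newline line separator, no field enclosure, one header line). The function names are hypothetical. The trick is safe because `json.dumps` escapes any tab or newline inside strings and its default separators use spaces, so the JSON payload never splits a row.

```python
import io
import json
from typing import Any, Dict, Iterable, Iterator, TextIO, Tuple

Row = Tuple[str, Dict[str, Any]]


def write_occurrence_txt(rows: Iterable[Row], out: TextIO) -> None:
    """Write id<TAB>JSON rows, preceded by a single header line."""
    out.write("id\tdynamicProperties\n")
    for record_id, props in rows:
        if "\t" in record_id or "\n" in record_id:
            raise ValueError(f"id cannot contain tab or newline: {record_id!r}")
        # json.dumps escapes control characters (tab, newline) inside strings.
        out.write(f"{record_id}\t{json.dumps(props)}\n")


def read_occurrence_txt(src: TextIO) -> Iterator[Row]:
    """Yield (id, decoded dynamicProperties), skipping the header line."""
    next(src, None)  # ignoreHeaderLines="1"
    for line in src:
        line = line.rstrip("\n")
        if not line:
            continue
        record_id, payload = line.split("\t", 1)
        yield record_id, json.loads(payload)
```

The trade-off is that generic DwC-A tools will only ever see dynamicProperties as one opaque string column.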
On 08/21/2015 12:43 PM, Bob Morris wrote:
> Is there or should there be a form of DwC-A serialized with JSON? If
> no, should Interest Group X ( X= ???) initiate some discussion or
> Task. If IG X is already at work, where is its discussion?
>
> Alternatively, should my question be something like "What is the JSON
> alternative to DwC-A and where is it, or should it be, discussed?"
>
> Thanks
> Bob
>