[tdwg-tag] DwC-A and JSON

Tim Robertson trobertson at gbif.org
Mon Aug 24 10:35:04 CEST 2015

I’d suggest TDWG hold back on this until the W3C CSV on the Web working group finishes (Feb. 2016).
I submitted DwC-A as a use case which was accepted (http://w3c.github.io/csvw/use-cases-and-requirements/) and have been following the progress.

As far as I can tell, the recommendations from that group will provide one possible future evolution of DwC-A, covering tabular formats, encoding, microsyntax, JSON and RDF serialisation and deserialisation, controlled terms, generic models (i.e. not star-schema) etc. It is because of this group that I have held back on updating the DwC text guidelines to address the issues we all know about, as I believe they will be covered there.

Adopting W3C recommendations / standards will allow TDWG to focus on biodiversity-specific issues - namely vocabularies and classes / models - and less on serialisation formats.

I aim to write up / present a proposal on the future of DwC-A built around the recommendations to coincide with the conclusion of the W3C group.  It should be a fairly logical progression from where we are today, and backwards compatibility looks doable.   I’d be very happy to work with others on that.


On 21 Aug 2015, at 19:02, Alex Thompson <godfoder at acis.ufl.edu> wrote:

> As someone who is normally a big proponent of JSON as a general 
> information representation format, I'd have to say that you're likely to 
> run into a myriad of issues with this. Chief among those would be that 
> JSON doesn't tend to play well with progressive decoding - most JSON 
> libraries force you to parse and decode the entire file, often in 
> memory, before you have access to any of the information. This works 
> fine for things like APIs where the information is generally quite 
> small, but for something like a DwC-A, where each item can have 40-50 
> properties, and there can be hundreds of thousands of items, it quickly 
> gets unmanageable. There are progressive JSON decoding libraries in 
> most languages, but it is a hurdle to effective usage, since they 
> normally aren't part of the standard library packages.
> The two major strategies around this are either using some kind of 
> hybrid JSON-delimited (normally new line or null byte) format, or 
> writing hundreds of thousands of individual JSON files into a zip or tar 
> archive directly without first writing them to disk. I've tried both, 
> and I don't really like either of them for anything more than a quick hack.
> In terms of advanced serializations for DwC-A type data, I'd much rather 
> see something like DwC-SQLite, or DwC-HDF5 that would start to give us 
> some real tools to work with something other than a star schema.
> - Alex
> P.S.
> You could always do this:
> meta.xml:
> <core encoding="utf-8" fieldsTerminatedBy="\t" linesTerminatedBy="\n" 
> fieldsEnclosedBy="" ignoreHeaderLines="1" 
> rowType="http://rs.tdwg.org/dwc/terms/Occurrence">
> occurrence.txt:
> id <tab> dynamicProperties
> ABC123 <tab> {<all of your actual data>}
> On 08/21/2015 12:43 PM, Bob Morris wrote:
>> Is there or should there be a form of DwC-A serialized with JSON? If
>> no, should Interest Group X ( X= ???) initiate some discussion or
>> Task. If IG X is already at work, where is its discussion?
>> Alternatively, should my question be something like "What is the JSON
>> alternative to DwC-A and where is it, or should it be, discussed?"
>> Thanks
>> Bob
> _______________________________________________
> tdwg-tag mailing list
> tdwg-tag at lists.tdwg.org
> http://lists.tdwg.org/mailman/listinfo/tdwg-tag
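
For readers unfamiliar with the newline-delimited JSON strategy mentioned in the quoted message, here is a minimal Python sketch of how such a file could be decoded one record at a time instead of loading it whole. The function name iter_records and the sample field names are hypothetical, chosen only for illustration; nothing here is a DwC-A specification or anyone's actual implementation.

```python
import io
import json
from typing import Iterator, TextIO


def iter_records(stream: TextIO) -> Iterator[dict]:
    """Yield one decoded record per non-empty line of a newline-delimited JSON stream."""
    for line in stream:
        line = line.strip()
        if line:  # skip blank lines between records
            yield json.loads(line)


# Hypothetical sample data; the field names are illustrative only.
sample = io.StringIO(
    '{"id": "ABC123", "scientificName": "Puma concolor"}\n'
    '{"id": "ABC124", "scientificName": "Lynx rufus"}\n'
)

# Only one line is held in memory at a time while iterating.
ids = [record["id"] for record in iter_records(sample)]
print(ids)
```

Because each line is a complete JSON document, only the standard library json module is needed, at the cost of the file as a whole no longer being a single valid JSON document.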
