[tdwg-tag] DwC-A and JSON

Tim Robertson trobertson at gbif.org
Mon Aug 24 10:35:04 CEST 2015


I’d suggest TDWG hold back on this until the W3C CSV on the Web Working Group finishes its work (Feb. 2016).
I submitted DwC-A as a use case, which was accepted (http://w3c.github.io/csvw/use-cases-and-requirements/), and have been following the progress.

As far as I can tell, the recommendations from that group will provide one possible future evolution of DwC-A, covering tabular formats, encoding, microsyntax, JSON and RDF serialisation and deserialisation, controlled terms, generic models (i.e. not star-schema), etc.  It is because of this group that I have held back on updating the DwC text guidelines to address the issues we all know about, as I believe they will be covered there.

By adopting W3C recommendations / standards, TDWG can focus on biodiversity-specific issues (namely vocabularies and classes / models) and less on serialisation formats.

I aim to write up / present a proposal on the future of DwC-A built around the recommendations to coincide with the conclusion of the W3C group.  It should be a fairly logical progression from where we are today, and backwards compatibility looks doable.   I’d be very happy to work with others on that.

Thanks,
Tim


On 21 Aug 2015, at 19:02, Alex Thompson <godfoder at acis.ufl.edu> wrote:

> As someone who is normally a big proponent of JSON as a general 
> information representation format, I'd have to say that you're likely to 
> run into a myriad of issues with this. Chief among those would be that 
> JSON doesn't tend to play well with progressive decoding - most JSON 
> libraries force you to parse and decode the entire file, often in 
> memory, before you have access to any of the information. This works 
> fine for things like APIs where the information is generally quite 
> small, but for something like a DwC-A, where each item can have 40-50 
> properties and there can be hundreds of thousands of items, it quickly 
> becomes unmanageable. There are progressive JSON decoding libraries in 
> most languages, but using them is a hurdle to effective usage, since they 
> normally aren't part of the standard library packages.
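Alex's point about whole-file decoding can be illustrated with Python's standard library: `json.load` builds the entire document in memory, but `json.JSONDecoder.raw_decode` can decode one value at a time from a text buffer. The sketch below uses a hypothetical helper (`iter_json_array` is not from any library) and assumes the data is a single top-level JSON array of records; it keeps only the unparsed tail of the input, roughly one record plus one chunk, in memory.

```python
import io
import json
from typing import Iterator, TextIO


def iter_json_array(stream: TextIO, chunk_size: int = 65536) -> Iterator[object]:
    """Yield the elements of a top-level JSON array one at a time (hypothetical helper)."""
    decoder = json.JSONDecoder()
    buf = ""
    eof = False

    def fill() -> None:
        # Append the next chunk of text, or record that the input is exhausted.
        nonlocal buf, eof
        chunk = stream.read(chunk_size)
        if chunk:
            buf += chunk
        else:
            eof = True

    def peek() -> str:
        # Drop leading whitespace, reading more input if needed; '' means EOF.
        nonlocal buf
        while True:
            buf = buf.lstrip()
            if buf or eof:
                return buf[:1]
            fill()

    if peek() != "[":
        raise ValueError("expected a top-level JSON array")
    buf = buf[1:]
    if peek() == "]":
        return
    while True:
        peek()
        while True:
            try:
                obj, end = decoder.raw_decode(buf)
                # A complete element must be followed by ',' or ']'. If the buffer
                # ends right here (e.g. a number cut at a chunk boundary), read more.
                if end < len(buf) or eof:
                    break
            except json.JSONDecodeError:
                if eof:
                    raise
            fill()
        yield obj
        buf = buf[end:]
        sep = peek()
        if sep == "]":
            return
        if sep != ",":
            raise ValueError("malformed JSON array")
        buf = buf[1:]


if __name__ == "__main__":
    data = '[{"id": "ABC123", "scientificName": "Puma concolor"}, {"id": "ABC124"}]'
    for record in iter_json_array(io.StringIO(data), chunk_size=8):
        print(record["id"])
```

A record that spans many chunks is re-parsed from its start after each read, so this is only reasonable when individual records are small relative to `chunk_size`, as DwC occurrence records usually are.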
> 
> The two major strategies around this are either using some kind of 
> hybrid delimited-JSON format (records separated by newlines or null 
> bytes), or writing hundreds of thousands of individual JSON files 
> directly into a zip or tar archive without first writing them to disk. 
> I've tried both, and I don't really like either of them for anything 
> more than a quick hack.
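Both workarounds Alex describes can be sketched with the Python standard library. The function names are hypothetical, not part of any DwC tooling, and the record layout (plain dicts keyed by Darwin Core term names, each with a unique `id`) is an assumption for illustration.

```python
import io
import json
import zipfile
from typing import IO, Iterable, Iterator, TextIO


# Strategy 1: newline-delimited JSON, one record per line.
def write_ndjson(records: Iterable[dict], out: TextIO) -> None:
    for rec in records:
        # json.dumps escapes newlines inside string values, so each record
        # occupies exactly one physical line.
        out.write(json.dumps(rec, ensure_ascii=False) + "\n")


def read_ndjson(stream: TextIO) -> Iterator[dict]:
    # Only one line is decoded at a time, so memory is bounded by the largest record.
    for line in stream:
        if line.strip():
            yield json.loads(line)


# Strategy 2: one JSON member per record, written straight into a zip archive.
# ZipFile.writestr takes the data from memory, so no per-record temp files are needed.
def write_zip_of_json(records: Iterable[dict], target: IO[bytes]) -> None:
    with zipfile.ZipFile(target, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for rec in records:
            # Assumes each record has a unique, filename-safe 'id' value.
            zf.writestr(f"{rec['id']}.json", json.dumps(rec, ensure_ascii=False))


def read_zip_of_json(source: IO[bytes]) -> Iterator[dict]:
    with zipfile.ZipFile(source) as zf:
        for name in zf.namelist():
            # json.loads accepts the UTF-8 bytes returned by ZipFile.read.
            yield json.loads(zf.read(name))
```

Either layout lets a consumer decode one record at a time; the zip variant pays for that with one archive entry (and its header overhead) per record, which is part of why it feels like a hack at the scale of hundreds of thousands of records.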
> 
> In terms of advanced serializations for DwC-A-type data, I'd much rather 
> see something like DwC-SQLite or DwC-HDF5, which would start to give us 
> some real tools for working with something other than a star schema.
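DwC-SQLite is only named here as an idea, and no such specification is referenced in this thread. As a purely hypothetical sketch, Python's `sqlite3` module can show what a relational layout offers beyond a star schema: below, an image table hangs off the identification table rather than off the occurrence core, a second-level relationship that a DwC-A extension (which can only point at the core) cannot express. All table and column names are assumptions, borrowing Darwin Core and Audubon Core term names.

```python
import sqlite3

# Hypothetical schema: not a published DwC-SQLite layout.
SCHEMA = """
CREATE TABLE occurrence (
    occurrenceID   TEXT PRIMARY KEY,
    basisOfRecord  TEXT,
    eventDate      TEXT
);
CREATE TABLE identification (
    identificationID TEXT PRIMARY KEY,
    occurrenceID     TEXT NOT NULL REFERENCES occurrence(occurrenceID),
    scientificName   TEXT,
    dateIdentified   TEXT
);
-- Hangs off an extension table rather than the core.
CREATE TABLE identification_image (
    identificationID TEXT NOT NULL REFERENCES identification(identificationID),
    accessURI        TEXT NOT NULL
);
"""


def build_example_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO occurrence VALUES (?, ?, ?)",
                 ("ABC123", "PreservedSpecimen", "2015-08-21"))
    conn.executemany(
        "INSERT INTO identification VALUES (?, ?, ?, ?)",
        [("ID1", "ABC123", "Puma concolor", "2015-08-22"),
         ("ID2", "ABC123", "Puma concolor coryi", "2015-08-23")],
    )
    conn.execute("INSERT INTO identification_image VALUES (?, ?)",
                 ("ID2", "https://example.org/img/ID2.jpg"))
    conn.commit()
    return conn


def images_for_occurrence(conn: sqlite3.Connection, occurrence_id: str):
    # Two-hop join: occurrence -> identification -> image.
    sql = """
        SELECT i.scientificName, img.accessURI
        FROM identification AS i
        JOIN identification_image AS img ON img.identificationID = i.identificationID
        WHERE i.occurrenceID = ?
        ORDER BY i.dateIdentified
    """
    return conn.execute(sql, (occurrence_id,)).fetchall()
```

With `PRAGMA foreign_keys = ON`, SQLite also rejects an identification row that points to a non-existent occurrence, a referential check that plain delimited text files do not provide.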
> 
> - Alex
> 
> P.S.
> You could always do this:
> 
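> meta.xml:
> <archive xmlns="http://rs.tdwg.org/dwc/text/">
>   <core encoding="utf-8" fieldsTerminatedBy="\t" linesTerminatedBy="\n" 
>         fieldsEnclosedBy="" ignoreHeaderLines="1" 
>         rowType="http://rs.tdwg.org/dwc/terms/Occurrence">
>     <files><location>occurrence.txt</location></files>
>     <id index="0"/>
>     <field index="1" term="http://rs.tdwg.org/dwc/terms/dynamicProperties"/>
>   </core>
> </archive>
> 
> occurrence.txt:
> id <tab> dynamicProperties
> ABC123 <tab> {<all of your actual data>}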
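The P.S. layout can be consumed with plain line splitting and one `json.loads` call per row. Because each row is decoded independently, this also gives the record-at-a-time decoding discussed above. A minimal sketch with hypothetical function names, assuming the tab-delimited `occurrence.txt` described by that meta.xml, a single header line, and ids that contain no tabs or newlines:

```python
import io
import json
from typing import Iterable, Iterator, TextIO, Tuple


def write_occurrence_txt(rows: Iterable[Tuple[str, dict]], out: TextIO) -> None:
    # One header line, matching ignoreHeaderLines="1".
    out.write("id\tdynamicProperties\n")
    for occ_id, props in rows:
        # json.dumps escapes tabs and newlines inside strings, so the JSON
        # value cannot break the tab/newline-delimited layout.
        out.write(f"{occ_id}\t{json.dumps(props, ensure_ascii=False)}\n")


def read_occurrence_txt(stream: TextIO, ignore_header_lines: int = 1) -> Iterator[Tuple[str, dict]]:
    for line_no, line in enumerate(stream):
        if line_no < ignore_header_lines:
            continue
        line = line.rstrip("\n")
        if not line:
            continue
        # Split on the first tab only: column 0 is the id, column 1 the JSON.
        occ_id, raw = line.split("\t", 1)
        yield occ_id, json.loads(raw)
```

Since fieldsEnclosedBy is empty, no CSV quoting rules apply; the JSON string escaping is what keeps each value on one line.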
> 
> On 08/21/2015 12:43 PM, Bob Morris wrote:
>> Is there, or should there be, a form of DwC-A serialized with JSON? If
>> not, should Interest Group X (X = ???) initiate some discussion or a
>> Task? If IG X is already at work, where is its discussion?
>> 
>> Alternatively, should my question be something like "What is the JSON
>> alternative to DwC-A and where is it, or should it be, discussed?"
>> 
>> Thanks
>> Bob
>> 
> 
> _______________________________________________
> tdwg-tag mailing list
> tdwg-tag at lists.tdwg.org
> http://lists.tdwg.org/mailman/listinfo/tdwg-tag
> 


