[tdwg-tag] DwC-A and JSON

Alex Thompson godfoder at acis.ufl.edu
Mon Aug 24 15:12:34 CEST 2015


Ditto for iDigBio.

- Alex

On 08/24/2015 04:40 AM, John Wieczorek wrote:
> I'd be very happy to work with you on that.
>
> On Mon, Aug 24, 2015 at 10:35 AM, Tim Robertson <trobertson at gbif.org> wrote:
>
>     I’d suggest TDWG hold back on this until the W3C CSV on the Web
>     Working Group finishes (Feb. 2016). I submitted DwC-A as a use case,
>     which was accepted
>     (http://w3c.github.io/csvw/use-cases-and-requirements/), and I have
>     been following its progress.
>
>     As far as I can tell, the recommendations from that group will
>     provide one possible future evolution of DwC-A covering tabular
>     formats, encoding, microsyntax, JSON and RDF serialisation and
>     deserialisation, controlled terms, generic models (i.e. not
>     star-schema), etc. It is because of this group that I have held
>     back on updating the DwC text guidelines to address the issues we
>     all know about, as I believe they will be covered there.
>
>     Adopting W3C recommendations / standards will allow TDWG to focus
>     on biodiversity-specific issues, namely vocabularies and classes /
>     models, and less on serialisation formats.
>
>     I aim to write up / present a proposal on the future of DwC-A,
>     built around the recommendations, to coincide with the conclusion
>     of the W3C group. It should be a fairly logical progression from
>     where we are today, and backwards compatibility looks doable.
>     I’d be very happy to work with others on that.
>
>     Thanks,
>     Tim
>
>
>     On 21 Aug 2015, at 19:02, Alex Thompson <godfoder at acis.ufl.edu> wrote:
>
>     > As someone who is normally a big proponent of JSON as a general
>     > information representation format, I'd have to say that you're likely
>     > to run into a myriad of issues with this. Chief among those would be
>     > that JSON doesn't tend to play well with progressive decoding: most
>     > JSON libraries force you to parse and decode the entire file, often in
>     > memory, before you have access to any of the information. This works
>     > fine for things like APIs, where the information is generally quite
>     > small, but for something like a DwC-A, where each item can have 40-50
>     > properties and there can be hundreds of thousands of items, it quickly
>     > gets unmanageable. There are progressive JSON decoding libraries in
>     > most languages, but relying on them is a hurdle to effective use,
>     > since they normally aren't part of the standard library packages.
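The progressive decoding Alex describes can be approximated with nothing but Python's standard `json` module. This is a minimal sketch under an assumption that is not part of any DwC-A proposal: the data is a single top-level JSON array of records. `iter_json_array` is a hypothetical helper name; it relies on the real `json.JSONDecoder.raw_decode` method to peel one element at a time off a buffered text stream.

```python
import io
import json
from typing import Any, Iterator, TextIO

# Characters that may legally follow an array element in JSON.
_DELIMITERS = " \t\r\n,]"


def iter_json_array(stream: TextIO, chunk_size: int = 65536) -> Iterator[Any]:
    """Yield the elements of a top-level JSON array one at a time.

    Only the unconsumed tail of the input is buffered, so memory use is
    roughly one chunk plus the largest single element, not the whole file.
    """
    decoder = json.JSONDecoder()
    buf = ""
    # Read until something other than leading whitespace is available.
    while not buf.strip():
        chunk = stream.read(chunk_size)
        if not chunk:
            raise ValueError("empty input")
        buf += chunk
    buf = buf.lstrip()
    if not buf.startswith("["):
        raise ValueError("expected a top-level JSON array")
    buf = buf[1:]

    while True:
        buf = buf.lstrip()
        if buf.startswith(","):
            buf = buf[1:].lstrip()
        if buf.startswith("]"):
            return
        try:
            item, end = decoder.raw_decode(buf)
            # Accept the element only if a delimiter follows it; otherwise a
            # number such as 1.5 split across chunks could be read as 1.
            complete = end < len(buf) and buf[end] in _DELIMITERS
        except json.JSONDecodeError:
            complete = False
        if not complete:
            # Need more input. Malformed JSON is only reported at end of stream.
            chunk = stream.read(chunk_size)
            if not chunk:
                raise ValueError("truncated or malformed JSON array")
            buf += chunk
            continue
        yield item
        buf = buf[end:]


if __name__ == "__main__":
    # Made-up records, read back with a deliberately tiny chunk size.
    doc = json.dumps([{"id": f"ABC{i}", "individualCount": i} for i in range(3)])
    for record in iter_json_array(io.StringIO(doc), chunk_size=8):
        print(record)
```

Dedicated streaming parsers handle errors and performance more carefully; the point of the sketch is only that each record becomes usable before the whole file has been decoded.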
>     >
>     > The two major strategies for working around this are either a hybrid
>     > delimited-JSON format (normally newline- or null-byte-delimited), or
>     > writing hundreds of thousands of individual JSON files directly into a
>     > zip or tar archive without first writing them to disk. I've tried both,
>     > and I don't really like either of them for anything more than a quick
>     > hack.
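Both work-arounds Alex mentions are easy to sketch in Python with the standard library, assuming records are plain dicts. The function names and the `records/00000000.json` member naming are hypothetical. Only the newline-delimited variant is shown; a null-byte variant would split on `\0` instead.

```python
import io
import json
import zipfile
from typing import Any, BinaryIO, Dict, Iterable, Iterator, TextIO

Record = Dict[str, Any]


def write_ndjson(records: Iterable[Record], out: TextIO) -> None:
    """Strategy 1: newline-delimited JSON, one compact record per line."""
    for rec in records:
        # json.dumps escapes control characters (including newlines) inside
        # strings, so each record occupies exactly one physical line.
        out.write(json.dumps(rec) + "\n")


def read_ndjson(inp: TextIO) -> Iterator[Record]:
    """Decode one line at a time; only the current record is held in memory."""
    for line in inp:
        if line.strip():
            yield json.loads(line)


def write_zip_of_json(records: Iterable[Record], target: BinaryIO) -> None:
    """Strategy 2: one JSON document per member, written straight into a zip."""
    with zipfile.ZipFile(target, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for n, rec in enumerate(records):
            # Hypothetical member naming scheme: records/00000000.json, ...
            zf.writestr(f"records/{n:08d}.json", json.dumps(rec))


def read_zip_of_json(source: BinaryIO) -> Iterator[Record]:
    """Open and decode members one by one rather than extracting to disk."""
    with zipfile.ZipFile(source) as zf:
        for name in zf.namelist():
            with zf.open(name) as member:
                yield json.load(member)


if __name__ == "__main__":
    sample = [{"id": "ABC123", "locality": "line one\nline two"}, {"id": "XYZ789"}]
    text = io.StringIO()
    write_ndjson(sample, text)
    text.seek(0)
    print(list(read_ndjson(text)))

    blob = io.BytesIO()
    write_zip_of_json(sample, blob)
    blob.seek(0)
    print(list(read_zip_of_json(blob)))
```

The zip variant pays a per-member header and directory entry for every record, which is part of why hundreds of thousands of tiny files feel like a hack.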
>     >
>     > In terms of advanced serializations for DwC-A type data, I'd much
>     > rather see something like DwC-SQLite or DwC-HDF5, which would start to
>     > give us some real tools to work with something other than a star schema.
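To make "something other than a star schema" concrete, here is a hypothetical relational layout in SQLite, built through Python's `sqlite3` module. No DwC-SQLite specification is implied: the tables, columns, and made-up rows are illustrative only. The layout chains event, occurrence, and identification, so an occurrence can carry a full identification history.

```python
import sqlite3

# Hypothetical layout: no "DwC-SQLite" specification exists. Tables and
# columns simply borrow Darwin Core class and term names.
SCHEMA = """
CREATE TABLE event (
    eventID   TEXT PRIMARY KEY,
    eventDate TEXT
);
CREATE TABLE occurrence (
    occurrenceID TEXT PRIMARY KEY,
    eventID      TEXT NOT NULL REFERENCES event(eventID),
    recordedBy   TEXT
);
CREATE TABLE identification (
    identificationID TEXT PRIMARY KEY,
    occurrenceID     TEXT NOT NULL REFERENCES occurrence(occurrenceID),
    scientificName   TEXT,
    dateIdentified   TEXT
);
"""

# Most recent identification of each occurrence, joined up to its event.
# This is a three-level chain; a one-core-plus-extensions star schema
# could only represent it by flattening one of the levels.
LATEST_IDS = """
SELECT o.occurrenceID, e.eventDate, i.scientificName
FROM occurrence AS o
JOIN event AS e ON e.eventID = o.eventID
JOIN identification AS i ON i.occurrenceID = o.occurrenceID
WHERE i.dateIdentified = (
    SELECT MAX(dateIdentified) FROM identification
    WHERE occurrenceID = o.occurrenceID
)
ORDER BY o.occurrenceID
"""


def build_example(conn: sqlite3.Connection) -> None:
    """Create the schema and load a few made-up rows."""
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO event VALUES (?, ?)",
                     [("E1", "2015-06-01"), ("E2", "2015-06-02")])
    conn.executemany("INSERT INTO occurrence VALUES (?, ?, ?)",
                     [("O1", "E1", "A. Collector"), ("O2", "E2", "A. Collector")])
    conn.executemany("INSERT INTO identification VALUES (?, ?, ?, ?)",
                     [("I1", "O1", "Felis concolor", "1990-01-01"),
                      ("I2", "O1", "Puma concolor", "2010-05-12"),
                      ("I3", "O2", "Quercus alba", "2015-06-02")])


if __name__ == "__main__":
    with sqlite3.connect(":memory:") as conn:
        build_example(conn)
        for row in conn.execute(LATEST_IDS):
            print(row)
```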
>     >
>     > - Alex
>     >
>     > P.S.
>     > You could always do this:
>     >
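>     > meta.xml:
>     > <archive xmlns="http://rs.tdwg.org/dwc/text/">
>     >   <core encoding="utf-8" fieldsTerminatedBy="\t" linesTerminatedBy="\n"
>     >         fieldsEnclosedBy="" ignoreHeaderLines="1"
>     >         rowType="http://rs.tdwg.org/dwc/terms/Occurrence">
>     >     <files><location>occurrence.txt</location></files>
>     >     <id index="0"/>
>     >     <field index="1" term="http://rs.tdwg.org/dwc/terms/dynamicProperties"/>
>     >   </core>
>     > </archive>
>     >
>     > occurrence.txt:
>     > id <tab> dynamicProperties
>     > ABC123 <tab> {<all of your actual data>}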
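On the consuming side, the P.S. layout means each data row is an id, a tab, and a JSON object in the `dynamicProperties` column. A minimal Python sketch of a reader, assuming exactly two tab-separated columns and one header line; `read_dynamic_properties` is a hypothetical name. Because `fieldsEnclosedBy` is empty, quote characters are read as ordinary text.

```python
import csv
import io
import json
from typing import Any, Dict, Iterator, TextIO, Tuple


def read_dynamic_properties(inp: TextIO) -> Iterator[Tuple[str, Dict[str, Any]]]:
    """Yield (id, decoded dynamicProperties) pairs from a tab-separated core file."""
    # fieldsTerminatedBy="\t" and fieldsEnclosedBy="": split on tabs and
    # treat quote characters (which JSON is full of) as ordinary text.
    reader = csv.reader(inp, delimiter="\t", quoting=csv.QUOTE_NONE)
    next(reader, None)  # ignoreHeaderLines="1"
    for row in reader:
        if not row:
            continue  # tolerate blank lines
        if len(row) != 2:
            raise ValueError(f"expected 2 columns, got {len(row)}: {row!r}")
        record_id, payload = row
        yield record_id, json.loads(payload)


if __name__ == "__main__":
    # json.dumps escapes tabs and newlines inside strings, so the payload
    # cannot break the row structure.
    payload = json.dumps({"sex": "female", "notes": "has\ta tab"})
    text = "id\tdynamicProperties\nABC123\t" + payload + "\n"
    for rec_id, props in read_dynamic_properties(io.StringIO(text)):
        print(rec_id, props)
```

The catch is that every producer must emit compact, escaped JSON; a single hand-written payload containing a raw tab or newline breaks the row.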
>     >
>     > On 08/21/2015 12:43 PM, Bob Morris wrote:
>     >> Is there, or should there be, a form of DwC-A serialized with JSON? If
>     >> not, should Interest Group X (X = ???) initiate some discussion or a
>     >> Task? If IG X is already at work, where is its discussion?
>     >>
>     >> Alternatively, should my question be something like "What is the JSON
>     >> alternative to DwC-A and where is it, or should it be, discussed?"
>     >>
>     >> Thanks
>     >> Bob
>     >>
>     >
>     > _______________________________________________
>     > tdwg-tag mailing list
>     > tdwg-tag at lists.tdwg.org <mailto:tdwg-tag at lists.tdwg.org>
>     > http://lists.tdwg.org/mailman/listinfo/tdwg-tag
>     >
>
