Is there or should there be a form of DwC-A serialized with JSON? If not, should Interest Group X (X = ???) initiate some discussion or Task? If IG X is already at work, where is its discussion?
Alternatively, should my question be something like "What is the JSON alternative to DwC-A and where is it, or should it be, discussed?"
Thanks, Bob
As someone who is normally a big proponent of JSON as a general information representation format, I'd have to say that you're likely to run into a myriad of issues with this. Chief among those would be that JSON doesn't tend to play well with progressive decoding - most JSON libraries force you to parse and decode the entire file, often in memory, before you have access to any of the information. This works fine for things like APIs, where the information is generally quite small, but for something like a DwC-A, where each item can have 40-50 properties and there can be hundreds of thousands of items, it quickly gets unmanageable. There are progressive JSON decoding libraries in most languages, but needing one is a hurdle to effective usage, since they normally aren't part of the standard library packages.
The two major strategies around this are either using some kind of hybrid JSON-delimited (normally new line or null byte) format, or writing hundreds of thousands of individual JSON files into a zip or tar archive directly without first writing them to disk. I've tried both, and I don't really like either of them for anything more than a quick hack.
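A minimal sketch of the two workarounds described above, using only the Python standard library. The record fields, the helper names (`write_ndjson`, `read_ndjson`, `write_json_zip`) and the member naming scheme are illustrative assumptions, not part of any DwC specification.

```python
import io
import json
import zipfile


def write_ndjson(records, stream):
    """Write one JSON object per line (newline-delimited JSON)."""
    for record in records:
        # json.dumps escapes embedded newlines, so each record stays on one line.
        stream.write(json.dumps(record) + "\n")


def read_ndjson(stream):
    """Yield records one at a time; only the current line is decoded."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


def write_json_zip(records, fileobj, key="occurrenceID"):
    """Write each record as its own JSON member of a zip archive.

    writestr() takes the content directly, so no temporary files are created.
    """
    with zipfile.ZipFile(fileobj, "w", zipfile.ZIP_DEFLATED) as archive:
        for record in records:
            archive.writestr(f"{record[key]}.json", json.dumps(record))


# Hypothetical sample records for illustration only.
records = [
    {"occurrenceID": "ABC123", "scientificName": "Puma concolor"},
    {"occurrenceID": "ABC124", "scientificName": "Lynx rufus"},
]

# Strategy 1: newline-delimited JSON, read back as a stream.
buffer = io.StringIO()
write_ndjson(records, buffer)
buffer.seek(0)
names = [record["scientificName"] for record in read_ndjson(buffer)]

# Strategy 2: one JSON file per record inside a zip archive.
zip_bytes = io.BytesIO()
write_json_zip(records, zip_bytes)
with zipfile.ZipFile(zip_bytes) as archive:
    members = archive.namelist()
```

Both variants avoid loading the whole dataset at once, but neither is plain JSON any more: a consumer has to know about the line framing or the archive layout, which is part of why they feel like hacks.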
In terms of advanced serializations for DwC-A type data, I'd much rather see something like DwC-SQLite, or DwC-HDF5 that would start to give us some real tools to work with something other than a star schema.
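To make the idea concrete, here is a rough sketch of what a "DwC-SQLite" file could allow. No such format exists in this thread; the table layout, the `occurrence_media` link table and the example.org URL are all hypothetical. The point is only that a relational file can hold a many-to-many relationship, which a star schema can represent only by repeating the extension row for each core record.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE occurrence (
    occurrenceID   TEXT PRIMARY KEY,
    scientificName TEXT
);
CREATE TABLE media (
    mediaID   TEXT PRIMARY KEY,
    accessURI TEXT
);
-- Link table: one image may show several occurrences and vice versa.
CREATE TABLE occurrence_media (
    occurrenceID TEXT REFERENCES occurrence(occurrenceID),
    mediaID      TEXT REFERENCES media(mediaID),
    PRIMARY KEY (occurrenceID, mediaID)
);
""")

# Hypothetical sample data: two occurrences visible in the same photograph.
conn.executemany(
    "INSERT INTO occurrence VALUES (?, ?)",
    [("ABC123", "Puma concolor"), ("ABC124", "Lynx rufus")],
)
conn.execute(
    "INSERT INTO media VALUES (?, ?)",
    ("IMG1", "https://example.org/img/1.jpg"),
)
conn.executemany(
    "INSERT INTO occurrence_media VALUES (?, ?)",
    [("ABC123", "IMG1"), ("ABC124", "IMG1")],
)

# The media row is stored once and joined to both occurrences on demand.
rows = conn.execute("""
    SELECT o.scientificName, m.accessURI
    FROM occurrence AS o
    JOIN occurrence_media AS om ON om.occurrenceID = o.occurrenceID
    JOIN media AS m ON m.mediaID = om.mediaID
    ORDER BY o.occurrenceID
""").fetchall()
```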
- Alex
P.S. You could always do this:

meta.xml:
<core encoding="utf-8" fieldsTerminatedBy="\t" linesTerminatedBy="\n" fieldsEnclosedBy="" ignoreHeaderLines="1" rowType="http://rs.tdwg.org/dwc/terms/Occurrence">

occurrence.txt:
id,dynamicProperties
ABC123 <tab> {<all of your actual data>}
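A reader for the layout in the P.S. might look roughly like the sketch below. It assumes the file is tab-delimited with no quoting and one header line, matching the `<core>` attributes shown, and that the second column holds one JSON object per row. The sample payload and the `read_core` helper are made up for illustration.

```python
import csv
import io
import json

# Hypothetical file content: the header line is skipped, then each row is
# an id, a tab, and a JSON object carrying the actual data.
sample = (
    "id,dynamicProperties\n"
    'ABC123\t{"scientificName": "Puma concolor", "depth": 12}\n'
)


def read_core(stream):
    """Yield (id, decoded JSON) pairs, one row at a time."""
    # fieldsTerminatedBy="\t" and fieldsEnclosedBy="" map to a tab delimiter
    # with quoting disabled, so the double quotes inside the JSON are kept.
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    next(reader, None)  # ignoreHeaderLines="1"
    for row in reader:
        if not row:
            continue
        record_id, payload = row[0], row[1]
        yield record_id, json.loads(payload)


rows = list(read_core(io.StringIO(sample, newline="")))
```

This works line by line because a JSON encoder escapes tabs and newlines inside strings, so a compactly serialized object never contains a raw field or line terminator.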
In reply to Bob Morris, 08/21/2015 12:43 PM.
I’d suggest TDWG hold back on this until the W3C CSV on the Web group finishes (Feb. 2016). I submitted DwC-A as a use case, which was accepted (http://w3c.github.io/csvw/use-cases-and-requirements/), and have been following the progress.
As far as I can tell, the recommendations from that group will provide one possible future evolution of DwC-A, covering tabular formats, encoding, microsyntax, JSON and RDF serialisation and deserialisation, controlled terms, generic models (i.e. not star schema), etc. It is because of this group that I have held back on updating the DwC text guidelines to address the issues we all know about, as I believe they will be covered there.
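As a rough illustration of the direction described here, the sketch below builds a small metadata document for a tab-delimited occurrence file. The property names (`@context`, `url`, `dialect`, `tableSchema`, `columns`, `propertyUrl`, `primaryKey`) follow the CSV on the Web metadata vocabulary as I understand it and should be checked against the W3C documents; the `csvw_metadata` helper, the file name and the choice of terms are assumptions for illustration only.

```python
import json

DWC = "http://rs.tdwg.org/dwc/terms/"


def csvw_metadata(csv_url, terms, key="occurrenceID"):
    """Describe a delimited file whose columns are Darwin Core terms."""
    return {
        "@context": "http://www.w3.org/ns/csvw",
        "url": csv_url,
        # Plays the role of fieldsTerminatedBy / ignoreHeaderLines in meta.xml.
        "dialect": {"delimiter": "\t", "header": True},
        "tableSchema": {
            "columns": [
                # propertyUrl ties each column to its term URI.
                {"name": term, "titles": term, "propertyUrl": DWC + term}
                for term in terms
            ],
            "primaryKey": key,
        },
    }


metadata = csvw_metadata(
    "occurrence.txt", ["occurrenceID", "scientificName", "eventDate"]
)
print(json.dumps(metadata, indent=2))
```

The data file itself stays a plain delimited table; only the description moves from meta.xml to a JSON document.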
Adopting W3C recommendations / standards will allow TDWG to focus on biodiversity-specific issues - namely vocabularies and classes / models - and less on serialisation formats.
I aim to write up / present a proposal on the future of DwC-A built around the recommendations to coincide with the conclusion of the W3C group. It should be a fairly logical progression from where we are today, and backwards compatibility looks doable. I’d be very happy to work with others on that.
Thanks, Tim
In reply to Alex Thompson (godfoder@acis.ufl.edu), 21 Aug 2015, 19:02.
tdwg-tag mailing list tdwg-tag@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-tag
I'd be very happy to work with you on that.
In reply to Tim Robertson (trobertson@gbif.org), Mon, Aug 24, 2015, 10:35 AM.
Ditto for iDigBio.
- Alex
In reply to John Wieczorek, 08/24/2015 04:40 AM.
participants (4)
- Alex Thompson
- Bob Morris
- John Wieczorek
- Tim Robertson