[tdwg-content] [IPT] Reverting the process of DwC standardization

Menashe' Eliezer menashe.eliezer at gmail.com
Thu Oct 29 15:05:47 CET 2015


Hello,
Resending the same message due to a subscription problem.

--
Menashè


2015-10-29 12:15 GMT+01:00 Menashe' Eliezer <menashe.eliezer at gmail.com>:

> Hello,
> Please see my updated suggestion at
> https://github.com/gbif/ipt/issues/1165
> IMHO Open Refine is not the right tool. One can simply use org.apache.poi
> in a Java application to read all the information from the different
> files inside the DwC archive and create an ODS file with the combined
> matrix, which also takes any parentEventID into account. I'm sorry I
> don't have time to do it myself.
> I hope it's clear.
> --
> Menashè
>
>
> 2015-10-28 18:57 GMT+01:00 Shorthouse, David <
> david.shorthouse at umontreal.ca>:
>
>> All,
>>
>> Is part of the issue being expressed here because the raw ecological data
>> sets we're discussing are small-ish matrices rather than occurrences, with
>> site codes as columns, taxa as rows and measures of density/abundance as
>> cells (and similar for environmental variables)? Such structures are often
>> used as input for software that executes, e.g., ordinations, classification &
>> regression trees, species richness estimates. The shortcoming of such a
>> structure is the inherent idiosyncratic nature of "site codes", with
>> variable numbers of them, i.e. an arbitrary number of columns. I doubt it
>> was ever designed for ease of dataset integration, but rather for ease of
>> computation. Representing this structure as Event core requires significant
>> transposition & potential for error if it were manual. Open Refine is one
>> such tool that could permit bi-directional transpositions (DwC -> matrix
>> and then matrix -> DwC), but it is still clunky and accommodation of
>> extensions is virtually non-existent. But perhaps Open Refine recipes and
>> guides get us one step closer to finding a balance between the need for
>> standardized representation & efficient transport (DwC) vs. end-users who
>> want matrices for ease of computation.
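A minimal sketch of the bi-directional transposition described above (matrix -> long records and back), using only the Python standard library. The site codes, taxa and counts are made up, and mapping a site code to eventID and a cell to organismQuantity is an assumption for illustration, not a recommended DwC mapping.

```python
import csv
import io

# Hypothetical taxa-by-site abundance matrix (all values are made up).
MATRIX_CSV = """taxon,SiteA,SiteB,SiteC
Quercus robur,12,0,3
Fagus sylvatica,5,7,0
"""


def matrix_to_long(text):
    """Turn a taxa x sites matrix into one record per (site, taxon) cell."""
    reader = csv.DictReader(io.StringIO(text))
    site_codes = reader.fieldnames[1:]  # works for any number of site columns
    records = []
    for row in reader:
        for site in site_codes:
            records.append({
                "eventID": site,                  # assumed mapping
                "scientificName": row["taxon"],
                "organismQuantity": row[site],    # assumed mapping
            })
    return records


def long_to_matrix(records):
    """Rebuild the matrix (header row plus one row per taxon) from long records."""
    sites, taxa, cells = [], [], {}
    for rec in records:
        if rec["eventID"] not in sites:
            sites.append(rec["eventID"])
        if rec["scientificName"] not in taxa:
            taxa.append(rec["scientificName"])
        cells[(rec["scientificName"], rec["eventID"])] = rec["organismQuantity"]
    rows = [["taxon"] + sites]
    for taxon in taxa:
        rows.append([taxon] + [cells.get((taxon, s), "") for s in sites])
    return rows


long_records = matrix_to_long(MATRIX_CSV)
round_trip = long_to_matrix(long_records)
```

The round trip only preserves column and row order because the records are emitted in matrix order; a real archive would need the original layout stored somewhere.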
>>
>> David P. Shorthouse
>>
>> On Tue, Oct 27, 2015 at 7:36 AM, David Valentim Dias <dvdias at sibbr.gov.br
>> > wrote:
>>
>>> Hi again,
>>>
>>> I think the problem targets both. DwC solves one problem but creates
>>> another for researchers less "skilled" in table manipulation.
>>> Ecological data with occurrences results in three tables, and
>>> manipulating them gets harder as the number of cores or extensions
>>> grows.
>>> Two possible solutions come to mind: create a new term describing the
>>> original layout of the columns (so we can use csvjoin, as Menashe
>>> suggests), or give the IPT an option to store the original table with
>>> the resource.
>>> We can always use external links in the EML and save the file somewhere,
>>> but this means creating another service and managing more logins (i.e.
>>> resource costs and new problems).
>>>
>>> I think any solution will need IPT changes.
>>>
>>> 2015-10-27 9:08 GMT-02:00 Menashe' Eliezer <menashe.eliezer at gmail.com>:
>>>
>>>> Hi Tim,
>>>> I believe that the IPT feature I've requested long ago could be helpful
>>>> for David: https://github.com/gbif/ipt/issues/1165
>>>> Neither consumers nor data providers have a DwC-A viewer, so they
>>>> need to join the separate csv files to get one table in a
>>>> worksheet.
>>>> Web applications like the one on the OBIS website do let end users
>>>> download one big table.
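A rough sketch of the join mentioned above, assuming two csv files with header rows and a shared eventID column. The file contents and identifiers are hypothetical; in a real archive the file names and column order come from meta.xml, which this sketch does not read.

```python
import csv
import io

# Hypothetical contents of two files from an Event-core archive.
EVENT_CSV = """eventID,parentEventID,eventDate
site1,,2015-05-01
site1-plotA,site1,2015-05-01
"""
OCCURRENCE_CSV = """occurrenceID,eventID,scientificName,individualCount
occ1,site1-plotA,Quercus robur,12
occ2,site1-plotA,Fagus sylvatica,5
"""


def read_rows(text):
    """Parse csv text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def join_on_event(events, occurrences):
    """Left-join each occurrence to its event so every output row is self-contained."""
    by_id = {e["eventID"]: e for e in events}
    joined = []
    for occ in occurrences:
        event = by_id.get(occ["eventID"], {})  # no match: keep the occurrence as-is
        row = dict(occ)
        for key, value in event.items():
            if key != "eventID":
                row[key] = value
        joined.append(row)
    return joined


flat = join_on_event(read_rows(EVENT_CSV), read_rows(OCCURRENCE_CSV))
```

Each row of `flat` carries the event columns (including parentEventID) next to the occurrence columns, which is the "one big table" shape; writing it out with csv.DictWriter would give a file that opens directly in a worksheet.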
>>>>
>>>> Best regards,
>>>> Menashè
>>>>
>>>>
>>>> 2015-10-27 9:53 GMT+01:00 Tim Robertson <trobertson at gbif.org>:
>>>>
>>>>> Hi David
>>>>> (CC’ing the IPT list as this might be an IPT specific thread -
>>>>> http://lists.gbif.org/mailman/listinfo/ipt)
>>>>>
>>>>> For clarification: is your question about the DwC-A standard, where
>>>>> this is possible as Alex says, or is it specific to the IPT tool?
>>>>>
>>>>> Do you imagine a scenario where you’d effectively map the same
>>>>> extension 2 times - once to interpreted and once to verbatim - or do you
>>>>> envisage a different data schema for each?
>>>>>
>>>>> Thanks,
>>>>> Tim
>>>>>
>>>>>
>>>>>
>>>>> On 23 Oct 2015, at 16:00, Alex Thompson <godfoder at acis.ufl.edu> wrote:
>>>>>
>>>>> David,
>>>>>
>>>>> It's certainly possible, within the context of a Darwin Core Archive,
>>>>> to include other files within the ZIP file that lie outside the schema of
>>>>> the archive. Both GBIF and iDigBio do this when generating downloads for
>>>>> various reasons (RIGHTS & LICENSE files, additional EML metadata, etc).
>>>>> However, I do not believe it is possible to do this within IPT. You might
>>>>> submit an issue on the IPT issue tracker (
>>>>> https://github.com/gbif/ipt/issues) for potential inclusion of this
>>>>> feature in a future version of IPT.
>>>>>
>>>>> There are workarounds you can use to include additional data in Darwin
>>>>> Core archives, but none of them will exactly match your old format. For
>>>>> instance, you could include an additional Occurrence file with the
>>>>> values as JSON in dynamicProperties, or in some other verbatim format in
>>>>> the occurrenceRemarks field. Both of those would at least give some method of
>>>>> single-row access (vs joining multiple measurementOrFacts to a single event
>>>>> id) if that is the primary concern, even if they would require additional
>>>>> parsing steps to be useful.
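A small sketch of the dynamicProperties workaround described above, assuming the original row is available as a dict. The column names, values and the occurrenceID are made up for illustration; only dynamicProperties, eventID and occurrenceID are Darwin Core terms.

```python
import json

# Hypothetical original row: one sampling site with several measured variables.
original_row = {"site": "SiteA", "depth_m": 12.5, "temperature_c": 8.1, "ph": 7.9}


def to_dynamic_properties(row, skip=("site",)):
    """Serialise the non-key columns as a JSON string for dynamicProperties."""
    payload = {k: v for k, v in row.items() if k not in skip}
    return json.dumps(payload, sort_keys=True)


def from_dynamic_properties(text):
    """The additional parsing step a consumer needs to get the columns back."""
    return json.loads(text)


record = {
    "occurrenceID": "occ1",            # hypothetical identifier
    "eventID": original_row["site"],   # assumed mapping of site code to eventID
    "dynamicProperties": to_dynamic_properties(original_row),
}
restored = from_dynamic_properties(record["dynamicProperties"])
```

This gives single-row access to the original values without joining several measurementOrFact rows, at the cost of the JSON parsing step on the consumer side.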
>>>>>
>>>>> Alex Thompson
>>>>> iDigBio Infrastructure
>>>>>
>>>>>
>>>>> On 10/23/2015 09:40 AM, David Valentim Dias wrote:
>>>>>
>>>>> Dear colleagues,
>>>>>
>>>>> Here at SiBBr we're using the new eventCore and measurementOrFacts and,
>>>>> after the process of standardization to DwC and publishing, we think
>>>>> some users/researchers will want the "original" table format for
>>>>> various reasons.
>>>>>
>>>>> Is it possible to have a verbatimTable or some place where we can store
>>>>> the original table/column format?
>>>>>
>>>>> Regards
>>>>>
>>>>>
>> _______________________________________________
>> IPT mailing list
>> IPT at lists.gbif.org
>> http://lists.gbif.org/mailman/listinfo/ipt
>>
>>
>