[tdwg-tag] Unresolved technical issues - proposed decisions.

Fri Aug 27 18:38:42 CEST 2010

To clarify: my understanding is that "everyone writes to their own table" 
doesn't mean "each app writes to its own table" (which gives us nothing) but rather 
"each instance of each app writes to its own table", which provides 
additional security for the data and more reliable provenance.

Here's how this could work:
i. Before a field session, the app says to the user "Do you want to create 
a new table? If so, enter your Google id and password." Then the app creates 
the table, and shares it with the Google account of the data_merge_script.

ii. The merged table, like the Identifications table, will have columns 
"table_id" and "row_id" for identifying observations.

If one of the apps can provide (i), and Michael can provide (ii), that 
would be great (and was my first choice). But given our aggressive 
deadline of everything working by 9/15, I think we should allow developers to write to a common table.
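
For reference, the per-instance scheme in (i) and (ii) could be sketched roughly
as below. This is only an illustration in plain Python using in-memory
dictionaries; it does not call the Fusion Tables API, and the function name
merge_tables and the sample table ids are made up for the example. The only
things taken from the plan above are the "table_id" and "row_id" columns.

```python
from typing import Dict, List

# Hypothetical in-memory stand-in for per-instance tables:
# {table_id: {row_id: {column_name: value}}}
Tables = Dict[str, Dict[str, Dict[str, str]]]


def merge_tables(tables: Tables) -> List[Dict[str, str]]:
    """Flatten per-instance tables into one list of rows.

    Every merged row keeps the id of the table it came from and its
    row id within that table, so (table_id, row_id) identifies the
    observation and records its provenance.
    """
    merged = []
    for table_id in sorted(tables):
        for row_id, fields in sorted(tables[table_id].items()):
            record = dict(fields)          # copy the observation columns
            record["table_id"] = table_id  # provenance: source table
            record["row_id"] = row_id      # provenance: row in that table
            merged.append(record)
    return merged


if __name__ == "__main__":
    sample = {
        "instance_b": {"1": {"scientificName": "Danaus plexippus"}},
        "instance_a": {"1": {"scientificName": "Quercus alba"},
                       "2": {"scientificName": "Turdus migratorius"}},
    }
    for row in merge_tables(sample):
        print(row)
```

Note that row ids only need to be unique within a table; the pair
(table_id, row_id) is what identifies an observation in the merged table.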

(Michael, to test your script can you just make a bunch of copies of the 
table, and populate them with data from the EOL Flickr group (using Javier's scraper) or with junk? Or do you need 
something more?)
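
If junk data is enough, something like the following could generate it. Again
a sketch only: make_junk_tables, the table ids, and the species list are
hypothetical, the column names scientificName, lat and long are assumed to
match the Occurrences table, and actually loading the rows into Fusion Tables
is left out.

```python
import random
from typing import Dict, List


def make_junk_tables(n_tables: int, rows_per_table: int,
                     seed: int = 0) -> Dict[str, List[dict]]:
    """Build n_tables lists of random occurrence-like rows for testing."""
    rng = random.Random(seed)  # seeded so test runs are repeatable
    names = ["Quercus alba", "Danaus plexippus", "Turdus migratorius"]
    tables = {}
    for t in range(n_tables):
        table_id = f"junk_table_{t}"  # placeholder id, not a real table
        tables[table_id] = [
            {
                "scientificName": rng.choice(names),
                "lat": round(rng.uniform(-90, 90), 5),     # WGS84 latitude
                "long": round(rng.uniform(-180, 180), 5),  # WGS84 longitude
            }
            for _ in range(rows_per_table)
        ]
    return tables


if __name__ == "__main__":
    for table_id, rows in make_junk_tables(2, 3).items():
        print(table_id, rows)
```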

A new twist: We're talking with the Fusion Table folks this afternoon, and may end up as beta-testers 
for new functionality that allows row-level authentication within a table. 
Stay posted.

Joel.

On Thu, 26 Aug 2010, Javier de la Torre wrote:

> Hi Michael,
>
> The motivation for using one single table is to make things easier, considering how little time we have to test things. That being said, I see the following advantages:
>
> 1) Simplicity. Not having to merge tables will avoid a lot of problems, and we will need less coordination, such as asking people which tables we have to merge.
>
> 2) No lack of flexibility: we are not thousands of developers. I don't see why everybody cannot add whatever fields they want to the master table. Plus, if they ask before adding, we might already have the field, which reduces the work needed for mappings. We can have one single wiki page describing every field in the master table.
>
> 3) We get live data, with no need for 20-minute refresh periods. We can create apps that visualize in real time how the data get entered; seeing things happen in real time is one of the more interesting visualizations.
>
> But again, the biggest reason for me is to avoid merging multiple tables, one per application. I would prefer that we not need to do integration with so little time ahead.
>
> What do you see as the biggest problems with using one single table for everybody?
>
> In any case, I have to say a combination of both can also work. Actually, if someone uses their own Fusion Table, that will be the same scenario as scraping images from the EOL Flickr group.
>
> Javier de la Torre
> www.vizzuality.com
>
> On Aug 26, 2010, at 6:58 PM, Michael Giddens wrote:
>
>>  Joel,
>>
>> Based on our discussion with David R. and the phone conference 2 weeks
>> back, you were thinking everyone can write to their own tables and then we
>> merge the data into one master table.  I wrote a script to do this
>> but have not tested it.  It is something we can do if you have more
>> than 2 sample tables with data.  This way, anyone who shares their table
>> with userX will get harvested, and that data will be loaded into the
>> master read-only table, which is reloaded every x minutes.  And if there are
>> custom tables that are not part of your default table standards, we can
>> possibly set up a unique harvester during the conference to transform them to
>> the correct column model.  This way everyone is independent but at the
>> same time able to view the master data every 20 minutes or whatever
>> update time we set.  We can do the same for different table types.
>>
>> Any questions just let me know or contact me directly.
>>
>> Michael Giddens
>> Office: 225-238-1879
>> skype: mikegiddens
>>
>> On 8/26/2010 11:45 AM, joel sachs wrote:
>>> Hi Everyone -
>>>
>>> Javier and I chatted yesterday about unresolved technical issues. Our
>>> primary goal was simplicity. Here are the outcomes. Feel free to suggest
>>> something different, if either i) you're planning on developing around
>>> your suggestion; or ii) you believe that to not follow your suggestion
>>> would be a grave mistake.
>>>
>>> NOTE: by "app" we mean anything that is writing to the table - could be a
>>> smartphone app, could be a web form, could be a screenscraper or
>>> tweet-parser.
>>>
>>> 1. All apps will write to the same table. Each app will need a Google
>>> account, which will be given write-access to the table.
>>>
>>> 2. The occurrence_id will be the row_id assigned by Fusion Tables. This is
>>> not seen in the web interface, but is available from the API.
>>>
>>> 3. To support crowdsourcing of image identification, alternative
>>> identifications, arguments, etc., there will be a 2nd table with columns
>>> {occurrence_id, scientificName, vernacularName, Kingdom, identifiedBy,
>>> identificationResources, identificationRemarks}. One of each of these
>>> columns will also exist in the Occurrences table. When the observation is
>>> first reported, any identification will be entered in the Occurrences
>>> table, with subsequent identifications in the Identifications
>>> table. (The main motivation for keeping the initial identification in the Occurrences
>>> table is to ease the process for app developers.)
>>>
>>> 4. Multiple multimedia URLs will be listed in a single column, comma
>>> separated.
>>>
>>> 5. The lat and long columns will be WGS84. If app developers wish, they
>>> are welcome to build support for additional datums into their apps, since
>>> the apps will have authority to create additional columns. If this doesn't
>>> happen, anyone planning to use another datum will need to announce
>>> themselves ahead of time, and we will create appropriate columns for
>>> them.
>>>
>>> 6. The table will contain Kingdom, Phylum, Class, Order, and Family
>>> columns. Javier will write software to resolve identifications to the
>>> Catalog of Life, and to populate the taxonomy columns. If someone wants to
>>> resolve observations to a different classification, they are free to do
>>> so, and are encouraged to publish the results in Fusion Tables.
>>>
>>> 7. In addition to the identification table, there is scope for creating an
>>> Annotations table and front-end, should anyone wish to tackle this.
>>>
>>>
>>> Here's a timetable that would be nice to adhere to:
>>>
>>> Sept. 1: Final versions of the Occurrences and Identifications tables
>>> published to Fusion Tables. Write access will only be granted to
>>> developers. These will be the test tables. The day before the bioblitz,
>>> they will become the development tables, and test records will be
>>> expunged.
>>>
>>> Sept. 3: Table documentation and sample code for writing to the tables.
>>>
>>> Sept. 8: Taxonomic resolution, validation, assignment of taxon
>>> GUIDs/LSIDs.
>>>
>>> Sept. 15: At least one Image identification framework in place.
>>>
>>> Sept. 15 - Sept. 28: Testing and work on visualization and other
>>> data-oriented services (e.g. publication as DwC or Linked Data, etc.)
>>>
>>> Best to all -
>>> Joel.
>>>
>>>
>>> _______________________________________________
>>> tdwg-tag mailing list
>>> tdwg-tag at lists.tdwg.org
>>> http://lists.tdwg.org/mailman/listinfo/tdwg-tag
>>>