[tdwg-tapir] Re: AW: AW: TapirLite first steps

Wed Dec 19 11:17:11 CET 2007

Guido,
I've been playing with your TAPIR service and it seems it doesn't understand
paging. The specs are not awfully clear about whether this is optional, but
I think any service should support the start and limit parameters. See
http://www.tdwg.org/dav/subgroups/tapir/1.0/docs/TAPIRSpecification_2007-07-18.html#toc54

For example:
http://idaho.ipd.uka.de/GgSRS/tapir?op=i&t=names&genus=P%25&limit=10

should return only the first 10 records, while this should return 10 records
starting with the 20th:

http://idaho.ipd.uka.de/GgSRS/tapir?op=i&t=names&genus=P%25&limit=10&start=20
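
A minimal sketch of how start and limit could be applied on the provider
side, assuming the matching records are already in an in-memory list and
that start is a zero-based offset (if the service counts from 1, the slice
shifts by one). The function name page_records is hypothetical and not part
of any TAPIR library:

```python
from urllib.parse import parse_qs


def page_records(records, query_string):
    """Return the slice of `records` selected by `start` and `limit`.

    Assumption: `start` is a zero-based offset and defaults to 0;
    a missing `limit` means "no upper bound".
    """
    params = parse_qs(query_string)
    start = int(params.get("start", ["0"])[0])
    limit_values = params.get("limit")
    limit = int(limit_values[0]) if limit_values else None
    if start < 0 or (limit is not None and limit < 0):
        raise ValueError("start and limit must not be negative")
    end = None if limit is None else start + limit
    return records[start:end]
```

With this sketch, a request carrying limit=10&start=20 would yield the ten
records at offsets 20 to 29.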

Looking forward to the plazi and golden gate presentation today!
Regards,
Markus

"Guido Sautter" wrote on 22.11.2007 at 23:25:

> Hi Renato,
> 
> I've created the servlet now, and it seems to work ... could you check if
> all the XML files are valid, or if I violated the protocol somewhere? I'm
> especially nervous about the search template, since the inventory templates
> seem less complex. My service URL is
> http://idaho.ipd.uka.de/GgSrsTapir/tapir?, just append what you like and
> tell me if the server's response is (a) sensible, and (b) valid according to
> the TAPIR specification.
> 
> Thanks a lot,
> Guido
> 
> -----Original Message-----
> From: Renato De Giovanni [mailto:renato at cria.org.br]
> Sent: Friday, 16 November 2007 13:25
> To: Guido Sautter
> Cc: Donat Agosti; Terry Catapano; Markus Döring
> Subject: Re: AW: TapirLite first steps
> 
> 
> Hi Guido,
> 
> Did you already start to implement something based on that TAPIRLite recipe?
> 
> Although I still have doubts about which concepts to use (DarwinCore, TDWG
> ontology, etc.), I'm pretty much convinced that this work is feasible. By
> the way, this kind of doubt doesn't prevent implementation; it only
> prevents us from formalizing the TAPIR documents (XML schemas, output
> models, query templates, capabilities response, etc.). But this should be
> easy to do later, and I can certainly help.
> 
> I'm attaching a document that I'm using to collect notes and ideas. The
> first part is what we call a TAPIR CNS file, which should contain all the
> definitions that we need. I just listed the concepts that I understood from
> you and added a couple of query templates. These are only aliases. As I
> said, we can leave the respective GUIDs and documents to the end.
> 
> So, continuing from the last step in the TAPIRLite recipe:
> 
>> 7- When the operation is search, you should get the value of the parameter
>> "template" or "t", probably delegate processing to another part of your
>> code according to the template, and then produce the response according to
>> the format that we still need to define.
> 
> The value of the "template" parameter will be the associated alias or the
> GUID. I'm already suggesting some aliases in the attached file for the
> query templates, for instance "get_publications".
> 
> Each template will have a known set of parameters that can be passed. I
> would suggest that we start with no parameters for the inventory
> templates. For the search templates you could start with these parameters:
> 
> bllat: bottom left latitude
> bllong: bottom left longitude
> trlat: top right latitude
> trlong: top right longitude
> taxonname: like comparison with the scientific name
> loc: like comparison with the location
> 
> So you should check if each parameter exists and in that case add the
> respective filter condition to your local search.
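
The check described above could be sketched as follows. This is only an
illustration: the column names (latitude, longitude, scientific_name,
location_name) and the function build_where are hypothetical, and the
bounding box is assumed not to cross the 180° meridian.

```python
# Maps each optional request parameter to a SQL condition and a converter.
# The column names are assumptions made for this sketch.
FILTERS = {
    "bllat": ("latitude >= ?", float),
    "bllong": ("longitude >= ?", float),
    "trlat": ("latitude <= ?", float),
    "trlong": ("longitude <= ?", float),
    "taxonname": ("scientific_name LIKE ?", str),
    "loc": ("location_name LIKE ?", str),
}


def build_where(params):
    """Build a WHERE clause and its bound values from the given parameters.

    Only parameters that are present contribute a condition; an empty
    request yields an empty clause and no values.
    """
    conditions, values = [], []
    for name, (condition, convert) in FILTERS.items():
        if name in params:
            conditions.append(condition)
            values.append(convert(params[name]))
    return " AND ".join(conditions), values
```

Using placeholders rather than string concatenation keeps the request values
out of the SQL text itself.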
> 
> Each template will also have its own response structure. I'm suggesting
> different response structures for the two search templates (see the
> attached document).
> 
> I would recommend starting with an implementation like this and making
> adjustments later. What do you think?
> 
> Please note that you're the ones to conceive the API so that it best suits
> your needs, so please feel free to make any changes to the suggested
> names/parameters/structures (I'll let you know if the changes are
> incompatible with the protocol).
> 
> Hope this helps,
> --
> Renato
> 
> 
>> Hi Renato,
>> 
>> thanks a lot for all the tips :-) I think I see now that TapirLite is what
>> we can actually do, and we are way closer to our goal now. Our data model
>> is as follows (very simplistic):
>> - Each document is identified by some metadata (MODS header) and contains
>> one or more treatments in its body
>> - A treatment is a chunk of text describing exactly one given taxon, maybe
>> comparable to a section or sub-section in other scientific papers
>> - A treatment refers to one specific taxon, which is identified by its
>> Linnaean (scientific) name, from now on referred to as "the taxon"
>> - A treatment may contain names and coordinates of locations where
>> specimens of the taxon have been collected
>> - A treatment may contain free text comprising morphological descriptions
>> of the taxon, and other aspects like ecology, discussions, etc.
>> 
>> So from a data point of view, a treatment is basically an atomic record in
>> a biodiversity database, with the following attributes:
>> - exactly one scientific name of the taxon treated, plus individual parts
>> of that name, like genus, subgenus, species, and subspecies
>> - exactly one set of the MODS metadata of the publication the treatment
>> belongs to, plus the numbers of the first and the last page
>> - zero to many locations with (at least) name, longitude and latitude as
>> subordinate attributes
>> - zero to many paragraphs of free text for further descriptions,
>> discussions, etc.
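
The record described in this list could be written down roughly as below.
All class and field names are hypothetical and chosen only for this sketch,
and the MODS metadata is left as a plain dictionary:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Location:
    name: str
    longitude: float
    latitude: float


@dataclass
class Treatment:
    scientific_name: str            # exactly one name per treatment
    mods: Dict[str, str]            # MODS metadata of the publication
    first_page: int
    last_page: int
    genus: Optional[str] = None     # individual parts of the name
    subgenus: Optional[str] = None
    species: Optional[str] = None
    subspecies: Optional[str] = None
    locations: List[Location] = field(default_factory=list)   # zero to many
    paragraphs: List[str] = field(default_factory=list)       # zero to many

    def name_location_pairs(self):
        """Return one (scientific name, location) pair per location."""
        return [(self.scientific_name, loc) for loc in self.locations]
```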
>> 
>> In the first place, we want to offer this data as pairs of taxon name and
>> location data:
>> - {<scientificName> (genus=G species=S), <locationName> (long=X lat=Y)}
>> - {<scientificName> (genus=G species=S), <linkToPublication>} (the example
>> you used in your last mail)
>> - {<locationName> (long=X lat=Y), <linkToPublication>}
>> 
>> but also as plain lists of either taxon names or locations. If I got it
>> right, the latter would be the purpose of an inventory request. All the
>> attributes could be identified as the respective concepts in DarwinCore.
>> 
>> We want to offer search functionality using either locations (by name or
>> long/lat) or taxon names (the parts, not only the scientificName as a
>> whole) as the predicates, using LIKE for comparing string values, but
>> Cartesian distance for long/lat. This allows for:
>> - Computing the dispersal of a taxon
>> - Computing the fauna of a location or area
>> - Getting all the literature on some taxon
>> - Getting all the literature on some area
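
A small sketch of the distance predicate mentioned above, treating long/lat
as plane coordinates measured in degrees. The function name and the tuple
layout (name, longitude, latitude) are assumptions for illustration; over
large areas a great-circle distance would be more accurate than this plain
Cartesian one.

```python
import math


def within_distance(locations, longitude, latitude, max_distance):
    """Return the locations whose Cartesian distance (in degrees) to the
    given point is at most `max_distance`.

    Each location is a (name, longitude, latitude) tuple.
    """
    return [
        loc
        for loc in locations
        if math.hypot(loc[1] - longitude, loc[2] - latitude) <= max_distance
    ]
```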
>> 
>> Hope this explains a little more. If any questions remain, just ask.
>> 
>> All the best,
>> Guido
>> 
>> -----Original Message-----
>> From: Renato De Giovanni [mailto:renato at cria.org.br]
>> Sent: Wednesday, 31 October 2007 19:55
>> To: Guido Sautter
>> Cc: Döring; Markus; Donat Agosti; Terry Catapano
>> Subject: TapirLite first steps
>> 
>> 
>> Guido,
>> 
>> These are the main points that you need to consider when developing a
>> TAPIRLite provider:
>> 
>> 1- You don't need to worry about XML parsing, only KVP requests (key-value
>> pairs in HTTP GET or HTTP POST).
>> 
>> 2- Your script should first detect which operation was requested (check
>> for a parameter named "operation" or "op" which could contain "metadata",
>> "m", "capabilities", "c", "inventory", "i", "search", "s", "ping", "p",
>> all case insensitive). When the parameter is not passed, assume metadata
>> operation. After detecting the operation you can delegate response
>> processing to different parts of your code.
>> 
>> 3- TAPIR responses are almost always included in a TAPIR envelope (except
>> search responses when the parameter "envelope" or "e" was passed and
>> contains "0" or "false"). I'm attaching a template for the TAPIR envelope
>> (tapir_envelope.xml).
>> 
>> 4- When the operation is metadata, just get the content from a local XML
>> file and put it into the response body. I'm attaching a sample TAPIR
>> metadata content (metadata.xml).
>> 
>> 5- When the operation is capabilities, just get the content from a local
>> XML file and put it into the response body. I'm attaching a sample TAPIR
>> capabilities content (capabilities.xml).
>> 
>> 6- When the operation is inventory, you should get the value of the
>> parameter "template" or "t", probably delegate processing to another part
>> of your code according to the template, and then produce the response. You
>> can find a sample inventory response here:
>> http://wiki.tdwg.org/twiki/bin/view/TAPIR/ExampleTapirMessages#iresp
>> 
>> 7- When the operation is search, you should get the value of the parameter
>> "template" or "t", probably delegate processing to another part of your
>> code according to the template, and then produce the response according to
>> the format that we still need to define.
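
Steps 2, 3, 6 and 7 above could be sketched as a single request-parsing
function. This is an illustration under stated assumptions, not a reference
implementation: the function names are hypothetical, parameter names are
matched exactly as written, and only the parameter values are treated as
case insensitive.

```python
from urllib.parse import parse_qs

# Long and short forms of each operation, mapped to the long form.
OPERATIONS = {
    "metadata": "metadata", "m": "metadata",
    "capabilities": "capabilities", "c": "capabilities",
    "inventory": "inventory", "i": "inventory",
    "search": "search", "s": "search",
    "ping": "ping", "p": "ping",
}


def first_param(params, *names, default=None):
    """Return the first value of the first parameter name that is present."""
    for name in names:
        if name in params:
            return params[name][0]
    return default


def parse_request(query_string):
    """Return (operation, template, use_envelope) for a KVP query string."""
    params = parse_qs(query_string)
    # Step 2: a missing operation parameter means "metadata".
    raw_op = first_param(params, "operation", "op", default="metadata")
    operation = OPERATIONS.get(raw_op.lower())
    if operation is None:
        raise ValueError(f"unknown operation: {raw_op!r}")
    # Steps 6 and 7: the template alias (or GUID), if any.
    template = first_param(params, "template", "t")
    # Step 3: only search responses may drop the envelope.
    flag = first_param(params, "envelope", "e", default="1").lower()
    use_envelope = not (operation == "search" and flag in ("0", "false"))
    return operation, template, use_envelope
```

The returned operation name can then be used to delegate to the code that
produces the metadata, capabilities, inventory, search or ping response.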
>> 
>> To better help you define the structure of the search results, I would
>> need to understand more about your data model (sorry, I'm not a
>> biologist). Do you have any UML diagram or ER diagram that could help? I
>> still have basic doubts such as "can a taxonomic treatment be related to
>> more than one scientific name and locality"?
>> 
>> Could you also come up with a flat list of things for which you want to be
>> able to search and return content? Just use natural language and then
>> we'll see how to create some GUID for each concept.
>> 
>> I need to leave now...
>> 
>> Regards,
>> --
>> Renato
>>