[tdwg-tapir] Re: Re: Re: TapirLite first steps
Guido, I've been playing with your TAPIR service and it seems it doesn't understand paging. The specs are not awfully clear about whether this is optional, but I think any service should support the start & limit parameters. See http://www.tdwg.org/dav/subgroups/tapir/1.0/docs/TAPIRSpecification_2007-07-18.html#toc54
For example: http://idaho.ipd.uka.de/GgSRS/tapir?op=i&t=names&genus=P%25&limi...
Should only return the first 10 records, while this should return 10 records starting with the 20th:
http://idaho.ipd.uka.de/GgSRS/tapir?op=i&t=names&genus=P%25&limi... 0
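Paging for requests like these could be handled with something as small as the following sketch, assuming the matching records are already collected in a java.util.List (the helper name and defaults are mine, not taken from the GgSRS code):

    import java.util.List;

    // Sketch only: apply the TAPIR "start" and "limit" paging parameters to an
    // in-memory result list; method and parameter names are illustrative.
    static <T> List<T> page(List<T> records, String startParam, String limitParam) {
        int start = (startParam == null) ? 0 : Integer.parseInt(startParam);       // default start: 0
        int limit = (limitParam == null) ? records.size() : Integer.parseInt(limitParam);
        int from = Math.min(Math.max(start, 0), records.size());
        int to = Math.min(from + limit, records.size());
        return records.subList(from, to);
    }

With no parameters at all it simply returns everything, i.e. the behaviour of a service that ignores paging.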
Looking forward to the plazi and golden gate presentation today! Regards, Markus
"Guido Sautter" wrote on 22.11.2007 23:25 Uhr:
Hi Renato,
I've created the servlet now, and it seems to work ... could you check if all the XML files are valid, or if I violated the protocol somewhere? I'm especially nervous about the search template, since the inventory templates seem less complex. My service URL is http://idaho.ipd.uka.de/GgSrsTapir/tapir?, just append what you like and tell me if the server's response is (a) sensible, and (b) valid according to the TAPIR specification.
Thanks a lot, Guido
-----Original Message----- From: Renato De Giovanni [mailto:renato@cria.org.br] Sent: Friday, 16 November 2007 13:25 To: Guido Sautter Cc: Donat Agosti; Terry Catapano; Markus Döring Subject: Re: Re: TapirLite first steps
Hi Guido,
Did you already start to implement something based on that TAPIRLite recipe?
Although I still have doubts about which concepts to use (DarwinCore, TDWG ontology, etc.), I'm pretty much convinced that this work is feasible. By the way, this kind of doubt doesn't prevent implementation - it only affects how we formalize the TAPIR documents (XML schemas, output models, query templates, capabilities response, etc.). But this should be easy to do later - and I can certainly help.
I'm attaching a document that I'm using to collect notes and ideas. The first part is what we call a TAPIR CNS file, which should contain all the definitions we need. I just listed the concepts that I understood from you and added a couple of query templates. These are only aliases; as I said, we can leave the respective GUIDs and documents to the end.
So, continuing from the last step in the TAPIRLite recipe:
7- When the operation is search, you should get the value of the parameter "template" or "t", probably delegate processing to another part of your code according to the template, and then produce the response according to the format that we still need to define.
The value for the "template" parameter will be the associated alias or the guid. I'm already suggesting some aliases in the attached file for the query template - for instance "get_publications".
Each template will have a known set of parameters that can be passed. I would suggest that we start with no parameters for the inventory templates. For the search templates you could start with these parameters:
- bllat: bottom left latitude
- bllong: bottom left longitude
- trlat: top right latitude
- trlong: top right longitude
- taxonname: like comparison with the scientific name
- loc: like comparison with the location
So you should check whether each parameter exists and, if so, add the respective filter condition to your local search.
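A minimal sketch of that check, assuming the filter conditions end up in an SQL WHERE clause built for a PreparedStatement (the column names and the helper are assumptions about the local schema, not part of the protocol):

    import java.util.ArrayList;
    import java.util.List;
    import javax.servlet.http.HttpServletRequest;

    // Sketch only: turn the optional search parameters into SQL filter conditions.
    // Column names (scientific_name, location_name, latitude, ...) are assumed.
    static String buildWhereClause(HttpServletRequest request, List<Object> values) {
        List<String> conditions = new ArrayList<String>();
        String taxonName = request.getParameter("taxonname");
        if (taxonName != null) {
            conditions.add("scientific_name LIKE ?");
            values.add(taxonName);
        }
        String loc = request.getParameter("loc");
        if (loc != null) {
            conditions.add("location_name LIKE ?");
            values.add(loc);
        }
        String bllat = request.getParameter("bllat");
        String trlat = request.getParameter("trlat");
        if ((bllat != null) && (trlat != null)) {
            conditions.add("latitude BETWEEN ? AND ?");
            values.add(Double.valueOf(bllat));
            values.add(Double.valueOf(trlat));
        }
        // ... same pattern for bllong / trlong ...
        return conditions.isEmpty() ? "" : (" WHERE " + String.join(" AND ", conditions));
    }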
Each template will also have its own response structure. I'm suggesting different response structures for the two search templates (see the attached document).
I would recommend starting with an implementation along these lines and making adjustments later. What do you think?
Please note that you're the ones conceiving the API, so it should best suit your needs; please feel free to make any changes to the suggested names/parameters/structures (I'll let you know if the changes have some incompatibility with the protocol).
Hope this helps,
Renato
Hi Renato,
thanks a lot for all the tips :-) I see now that TapirLite is what we actually can do, and we are much closer to our goal. Our data model is as follows (very simplistic):
- Each document is identified by some metadata (MODS header) and contains one or more treatments in its body
- A treatment is a chunk of text describing exactly one given taxon, maybe comparable to a section or sub-section in other scientific papers
- A treatment refers to one specific taxon, which is identified by its Linnaean (scientific) name, from now on referred to as "the taxon"
- A treatment may contain names and coordinates of locations where specimens of the taxon have been collected
- A treatment may contain free text comprising morphological descriptions of the taxon, and other aspects like ecology, discussions, etc.
So from a data point of view, a treatment is basically an atomic record in a biodiversity database, with the following attributes (a rough sketch of such a record follows the list):
- exactly one scientific name of the taxon treated, plus individual parts of that name, like genus, subgenus, species, and subspecies
- exactly one set of the MODS metadata of the publication the treatment belongs to, plus the number of the first and the last page
- zero to many locations with (at least) name, longitude and latitude as subordinate attributes
- zero to many paragraphs of free text for further descriptions, discussions, etc.
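As announced above, a rough sketch of such a record as a plain Java class (all class and field names are illustrative, not taken from the actual plazi/GoldenGATE code):

    import java.util.List;

    // Sketch only: a treatment as an atomic record; names are illustrative.
    class TreatmentRecord {
        String scientificName;                          // exactly one full taxon name ...
        String genus, subgenus, species, subspecies;    // ... plus its individual parts
        ModsMetadata publication;                       // MODS metadata of the containing document
        int firstPage, lastPage;
        List<Location> locations;                       // zero to many collecting locations
        List<String> descriptionParagraphs;             // zero to many free-text paragraphs
    }

    class Location {
        String name;
        double longitude, latitude;
    }

    class ModsMetadata {
        String title, authors, year;                    // simplified stand-in for the MODS header
    }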
In the first place, we want to offer this data as pairs of taxon name and location data:
- {<scientificName> (genus=G species=S), <locationName> (long=X lat=Y)}
- {<scientificName> (genus=G species=S), <linkToPublication>} (the example you used in your last mail)
- {<locationName> (long=X lat=Y), <linkToPublication>}
but also as plain lists of either taxon names or locations. If I got it right, the latter would be the purpose of an inventory request. All the attributes could be identified as the respective concepts in DarwinCore.
We want to offer search functionality with either locations (by name or long/lat) or taxon names (the individual parts, not only the scientificName as a whole) as predicates, using LIKE for comparing string values, but plain Cartesian distance for long/lat (see the small sketch after the list below). This allows for:
- Computing the dispersal of a taxon
- Computing the fauna of a location or area
- Getting all the literature on some taxon
- Getting all the literature on some area
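The long/lat comparison mentioned above is plain Cartesian distance in degrees, not true geographic distance; a tiny sketch (method name illustrative):

    // Sketch only: Cartesian distance between two long/lat points, in degrees.
    static double cartesianDistance(double long1, double lat1, double long2, double lat2) {
        double dLong = long1 - long2;
        double dLat = lat1 - lat2;
        return Math.sqrt((dLong * dLong) + (dLat * dLat));
    }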
Hope this explains a little more. If any questions remain, just ask.
All the best, Guido
-----Original Message----- From: Renato De Giovanni [mailto:renato@cria.org.br] Sent: Wednesday, 31 October 2007 19:55 To: Guido Sautter Cc: Markus Döring; Donat Agosti; Terry Catapano Subject: TapirLite first steps
Guido,
These are the main points that you need to consider when developing a TAPIRLite provider:
1- You don't need to worry about XML parsing, only KVP (key-value-pair requests in HTTP GET or HTTP POST)
2- Your script should first detect which operation was requested (check for a parameter named "operation" or "op" which could contain "metadata", "m", "capabilities", "c", "inventory", "i", "search", "s", "ping", "p", all case insensitive). When the parameter is not passed, assume metadata operation. After detecting the operation you can delegate response processing to different parts of your code.
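A minimal sketch of that detection step, assuming a Java servlet (the helper name is illustrative):

    import javax.servlet.http.HttpServletRequest;

    // Sketch only: detect the requested TAPIR operation from the "operation"/"op"
    // KVP parameter, case-insensitively, defaulting to metadata.
    static String detectOperation(HttpServletRequest request) {
        String op = request.getParameter("operation");
        if (op == null) op = request.getParameter("op");
        if (op == null) return "metadata";                  // no parameter: assume metadata
        op = op.toLowerCase();
        if (op.equals("m") || op.equals("metadata")) return "metadata";
        if (op.equals("c") || op.equals("capabilities")) return "capabilities";
        if (op.equals("i") || op.equals("inventory")) return "inventory";
        if (op.equals("s") || op.equals("search")) return "search";
        if (op.equals("p") || op.equals("ping")) return "ping";
        return "metadata";                                  // unknown value: could also trigger an error response
    }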
3- TAPIR responses are almost always included in a TAPIR envelope (except search responses when the parameter "envelope" or "e" was passed and contains "0" or "false"). I'm attaching a template for the TAPIR envelope (tapir_envelope.xml).
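The corresponding envelope check could be as small as this sketch (helper name illustrative):

    import javax.servlet.http.HttpServletRequest;

    // Sketch only: search responses stay inside the TAPIR envelope unless
    // "envelope"/"e" is passed as "0" or "false".
    static boolean useEnvelope(HttpServletRequest request) {
        String env = request.getParameter("envelope");
        if (env == null) env = request.getParameter("e");
        return !("0".equals(env) || "false".equalsIgnoreCase(env));
    }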
4- When the operation is metadata, just get the content from a local XML file and put it into the response body. I'm attaching a sample TAPIR metadata content (metadata.xml).
5- When the operation is capabilities, just get the content from a local XML file and put it into the response body. I'm attaching a sample TAPIR capabilities content (capabilities.xml).
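For points 4 and 5 the servlet only needs to stream the local XML file into the response body; a sketch, written as if the file already contains the complete content to send (the file handling and helper name are assumptions):

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.InputStream;
    import java.io.OutputStream;
    import javax.servlet.http.HttpServletResponse;

    // Sketch only: copy a static XML file (e.g. metadata.xml or capabilities.xml)
    // into the response body; where the file lives is an assumption.
    static void writeStaticXml(HttpServletResponse response, File xmlFile) throws java.io.IOException {
        response.setContentType("text/xml; charset=UTF-8");
        InputStream in = new FileInputStream(xmlFile);
        OutputStream out = response.getOutputStream();
        byte[] buffer = new byte[4096];
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
        }
        in.close();
    }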
6- When the operation is inventory, you should get the value of the parameter "template" or "t", probably delegate processing to another part of your code according to the template, and then produce the response. You can find a sample inventory response here: http://wiki.tdwg.org/twiki/bin/view/TAPIR/ExampleTapirMessages#iresp
7- When the operation is search, you should get the value of the parameter "template" or "t", probably delegate processing to another part of your code according to the template, and then produce the response according to the format that we still need to define.
To better help you define the structure of the search results, I would need to understand more about your data model (sorry, I'm not a biologist). Do you have any UML or ER diagram that could help? I still have basic doubts such as "can a taxonomic treatment be related to more than one scientific name and locality?"
Could you also come up with a flat list of things for which you want to be able to search and return content? Just use natural language and then we'll see how to create some GUID for each concept.
I need to leave now...
Regards,
Renato