[tdwg-tapir] Interpretation of TAPIR filters

Tue Dec 4 18:17:54 CET 2007

Hi Markus,

I think we could still recommend that providers share their data using 
the same datatypes defined by the conceptual schemas, but without being 
too strict. After all, it's better to share something using a 
different datatype than not to share anything at all. In the future, the 
TAPIR Tester could check this and raise a warning when the concept 
datatype differs from the mapped (declared) datatype.
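
To illustrate the kind of check the Tester could perform, here is a 
minimal sketch. It is not part of any existing tool: the function name 
and the two dictionaries (concept id -> datatype URI) are hypothetical, 
and it assumes both sides are already available as plain mappings.

```python
def datatype_warnings(schema_types, declared_types):
    """Return one warning per concept whose declared datatype differs
    from the datatype defined by the conceptual schema.

    Both arguments are hypothetical dicts mapping a concept id to a
    datatype URI. Concepts the provider did not map are skipped.
    """
    messages = []
    for concept, expected in schema_types.items():
        declared = declared_types.get(concept)
        if declared is not None and declared != expected:
            messages.append(
                f"{concept}: declared {declared}, schema defines {expected}"
            )
    return messages
```

A mismatch only produces a message here, never a failure, which matches 
the idea of recommending rather than enforcing the schema datatypes.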

So if we all agree, here's a summary of the necessary changes:

* Add an optional attribute (datatype) for each mapped concept in 
capabilities responses.
* Datatypes would come from XML Schema built-in datatypes and should 
be declared with the full URI, such as:
http://www.w3.org/2001/XMLSchema#int
* Providers should declare the underlying datatype used when mapping 
the concept, which should preferably be the same datatype defined by 
the corresponding conceptual schema.
* The default datatype is http://www.w3.org/2001/XMLSchema#string
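
As a rough sketch of how a client might read such declarations: the 
XML fragment below is a simplified, hypothetical stand-in for a 
capabilities response (no namespaces, made-up concept ids), and the 
"datatype" attribute is the proposed addition, not something defined 
by the current protocol. Concepts without the attribute fall back to 
the string default.

```python
import xml.etree.ElementTree as ET

DEFAULT_DATATYPE = "http://www.w3.org/2001/XMLSchema#string"

# Hypothetical, simplified fragment; real responses are namespaced
# and contain more structure than this.
SAMPLE = """
<concepts>
  <mappedConcept id="http://example.org/schema/CatalogNumber"
                 datatype="http://www.w3.org/2001/XMLSchema#int"/>
  <mappedConcept id="http://example.org/schema/ScientificName"/>
</concepts>
"""

def declared_datatypes(xml_text):
    """Map each concept id to its declared datatype URI, applying the
    string default when the optional attribute is absent."""
    root = ET.fromstring(xml_text)
    return {
        element.get("id"): element.get("datatype", DEFAULT_DATATYPE)
        for element in root.iter("mappedConcept")
    }
```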

However, there's one remaining issue: should we handle custom 
datatypes? For example, what would be the corresponding datatype for 
DarwinCore/ABCD collecting dates? (Currently it's a custom DateTimeISO.)

Regarding standard TAPIR errors, I certainly agree it would be 
worthwhile to define them. I wish I had more time to review what we 
have and make a proposal.

About TapirLink, I think I used a similar approach: an error is 
raised if you try to use "like" with non-string datatypes. Also, the 
configurator doesn't check whether the underlying datatype is 
compatible with the one defined by the conceptual schema.
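
For illustration only, the "like" restriction could be sketched as 
follows. This is not TapirLink or PyWrapper code: the function is 
hypothetical, and the exception simply borrows the error name Markus 
suggested, which is not a standardized TAPIR error yet.

```python
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"

class CopNotSupportedByLocalDatatype(Exception):
    """Hypothetical error for a comparison operator (COP) that cannot
    be applied to the concept's underlying datatype."""

def check_like(concept_id, datatype=XSD_STRING):
    """Raise if "like" is used on a concept whose declared datatype
    is not a string. Undeclared datatypes default to string."""
    if datatype != XSD_STRING:
        raise CopNotSupportedByLocalDatatype(
            f'"like" cannot be used with {concept_id} ({datatype})'
        )
```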

Best Regards,
--
Renato

On 4 Dec 2007 at 10:48, Döring, Markus wrote:

> Hi,
> Just catching up.
> I agree with Renato about all the filter issues as you probably have
> guessed. Regarding sorting now. In pywrapper I have left the "how" it gets
> sorted to the underlying database type. And that might be very different to
> the conceptual or model one. But I wonder if it is really important to be
> specific about the sorting order? At least it is sorted in some stable way.
> 
> I agree it makes sense to announce that datatype in the capabilities, so you
> understand the sorting. That can easily be done using xml schema datatypes
> (should cover nearly all db types). The underlying datatype also affects
> what COPs you can use with it. You will get an error with PyWrapper for
> example if you do a LIKE on a date or integer type.
> 
> If we force people to adapt their datatypes in the underlying database,
> this is quite a burden. Every data provider will then need a copy of
> their database and they will never be able to use the original dataset. That
> might not be a problem, and in fact it allows you to do quite some data
> transformation in between, but for many providers I know this will be too
> much - or they will almost never update their data clone.
> 
> So for now I would suggest to indicate the underlying db type in the concept
> capabilities and just use whatever there is for ordering. We should probably
> come up with a standard error for the CopNotSupportedByLocalDatatype. How do
> you deal with this in TapirLink, Renato?
> 
> Markus