[tdwg-tapir] Suggested changes

17 Jan 2007

Dear all,

I recently finished a new TAPIR provider implementation and I've been
taking notes about some issues with the protocol schema and the
specification. Most are minor points that are currently unspecified or
inconsistent across the schema, the specification and the wiki notes. Some
are real errors that need to be fixed, and others are suggestions to
improve the way some things are currently defined.

I'm listing all issues below, each with a suggested approach. I have
already discussed most of them with Markus, and we have similar opinions.
Please feel free to share your ideas or suggest different approaches. If
there are no objections, I will probably change the XML Schema and the
specification by the end of this week.

Best Regards,
--
Renato

=========================

Issues and proposed changes to TAPIR

1) Currently the schema says that indexingElement is unbounded. However,
there must be one and only one indexingElement for each output model. This
is an error and must be fixed in the schema.
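As a rough illustration of the intended constraint, a provider or client could check that every output model carries exactly one indexingElement. This is a minimal sketch using Python's standard xml.etree.ElementTree on a simplified, namespace-free document; the element names outputModel and indexingElement come from the text above, while the wrapper element, the `path` attribute and the function name are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

def check_indexing_elements(xml_text):
    """Return one problem message per outputModel that does not contain
    exactly one indexingElement child (simplified, no namespaces)."""
    root = ET.fromstring(xml_text)
    problems = []
    for i, model in enumerate(root.iter("outputModel")):
        count = len(model.findall("indexingElement"))
        if count != 1:
            problems.append(f"outputModel #{i}: found {count} indexingElement(s)")
    return problems

# Hypothetical document: the second model wrongly declares two indexing elements.
doc = """<models>
  <outputModel><indexingElement path="/records/record"/></outputModel>
  <outputModel>
    <indexingElement path="/a"/>
    <indexingElement path="/b"/>
  </outputModel>
</models>"""
print(check_indexing_elements(doc))
# ['outputModel #1: found 2 indexingElement(s)']
```

The real fix belongs in the XML Schema itself (one occurrence, not unbounded); a check like this only helps while existing documents are being migrated.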

2) Some time ago there was a proposal to allow renaming <value> elements
in inventory responses to make parsing easier
(http://lists.tdwg.org/pipermail/tdwg-tapir/2005-October/000009.html).
Apparently there was an agreement about this, but for some reason the
schema was never changed. Suggested approach: change the schema and the
specification accordingly. In inventory requests, concept elements will
have an optional "tagName" attribute, and inventory responses will need to
allow xsd:any inside <record>.
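To make the proposal concrete, here is a minimal sketch of how a provider might build one inventory <record> when some concepts carry a tagName and others do not. It assumes a simplified, namespace-free <record> and represents each concept only by its optional tagName; the function name build_record is hypothetical.

```python
import xml.etree.ElementTree as ET

def build_record(tag_names, values):
    """Build one inventory <record>, one child element per requested concept.

    `tag_names` holds each concept's optional tagName (None when absent);
    without a tagName the child keeps the current default name, <value>.
    """
    record = ET.Element("record")
    for tag_name, value in zip(tag_names, values):
        child = ET.SubElement(record, tag_name or "value")
        child.text = value
    return record

rec = build_record(["genus", None], ["Abies", "alba"])
print(ET.tostring(rec, encoding="unicode"))
# <record><genus>Abies</genus><value>alba</value></record>
```

Because the child names are now chosen by the client, the response schema can no longer list them in advance, which is why <record> would need xsd:any.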

3) Currently the "basicSchemaLanguage" capability for response structures
includes "...ComplexType, simpleType, complexContent and simpleContent
only when related to local definitions". However, "xsd:extension" and
"xsd:restriction" are optional capabilities. Since complexContent and
simpleContent must be used with either "xsd:extension" or
"xsd:restriction", the two statements contradict each other. Suggested
approach: remove complexContent and simpleContent from
basicSchemaLanguage, which also keeps it as "basic" as possible.

4) header/source@accesspoint is a mandatory attribute that can contain
either an IP address or a name. This is fine for TAPIR responses, where
@accesspoint should be the service accesspoint. For requests, however, it
is inconvenient to force clients to put an accesspoint there (and the value
may contradict the REMOTE_ADDR environment variable). Suggested approach:
make @accesspoint an optional attribute and change the specification to
say that @accesspoint must be present in all responses, indicating the
service accesspoint. In requests, @accesspoint must be present in all
<source> elements except the last one, and whenever it is present in a
request it must contain an IP address. The IP of the last <source> must
always be taken from REMOTE_ADDR.
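The request-side rule can be sketched as follows, assuming the <source> elements have already been parsed into a list of dicts in document order (a hypothetical stand-in for the real XML). The function name is illustrative; the IP check uses Python's standard ipaddress module.

```python
import ipaddress

def resolve_source_ips(sources, remote_addr):
    """Return one IP per <source> element of a request header, in order.

    `sources` is a list of dicts holding the optional "accesspoint" value.
    Every source except the last must carry an IP address; the IP of the
    last source is always taken from REMOTE_ADDR.
    """
    if not sources:
        raise ValueError("a request header needs at least one <source>")
    ips = []
    for i, src in enumerate(sources):
        ap = src.get("accesspoint")
        is_last = i == len(sources) - 1
        if ap is None and not is_last:
            raise ValueError(f"<source> #{i} is missing @accesspoint")
        if ap is not None:
            ipaddress.ip_address(ap)  # raises ValueError if not a valid IP
        ips.append(remote_addr if is_last else ap)
    return ips

print(resolve_source_ips([{"accesspoint": "192.0.2.10"}, {}], "198.51.100.7"))
# ['192.0.2.10', '198.51.100.7']
```

Note that a value supplied on the last <source> is validated but then ignored in favour of REMOTE_ADDR, which is exactly what removes the possible contradiction.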

5) There's no way to track the original source using KVP when there are
intermediate services. Suggested approach: include an optional KVP
parameter "source_ip" (the same approach used by DiGIR).
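A minimal sketch of how an intermediate service might forward a KVP request under this proposal. It assumes that source_ip should always hold the original client's IP, so an existing value is preserved and only a missing one is filled in with the caller's address; the function name and the sample "op" parameter are illustrative, not taken from the specification.

```python
from urllib.parse import urlencode

def forward_kvp(params, caller_ip):
    """Rebuild the query string to send to the next service.

    Keeps an existing source_ip (the original client further up the chain);
    otherwise the direct caller is the original source.
    """
    forwarded = dict(params)
    forwarded.setdefault("source_ip", caller_ip)
    return urlencode(forwarded)

print(forward_kvp({"op": "ping"}, "192.0.2.10"))
# op=ping&source_ip=192.0.2.10
```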

6) Currently, local variables are defined with their own elements, like
<dateLastUpdated/>. This makes it hard for a service to define new
variables (it requires extending the schema!) and also to identify
variables when parsing a request. Suggested approach: use the same
strategy adopted by parameters, <variable name="var_name"/>, with the
difference that @name will be an extensible controlled vocabulary.
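The parsing benefit can be sketched with a simple name-to-handler registry: a single code path handles every <variable>, and supporting a new name only means adding an entry. The vocabulary entries below ("dateLastUpdated", "totalRecords") and the context dict are hypothetical examples, not an agreed vocabulary.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary: variable name -> function producing its value.
# New variables are added here without touching the XML Schema.
VARIABLES = {
    "dateLastUpdated": lambda ctx: ctx["last_updated"],
    "totalRecords": lambda ctx: str(ctx["total"]),
}

def resolve_variable(elem, ctx):
    """Resolve a <variable name="..."/> element against the vocabulary."""
    name = elem.get("name")
    if name not in VARIABLES:
        raise ValueError(f"unsupported variable: {name}")
    return VARIABLES[name](ctx)

ctx = {"last_updated": "2007-01-15", "total": 1042}
print(resolve_variable(ET.fromstring('<variable name="dateLastUpdated"/>'), ctx))
# 2007-01-15
```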

7) <mappedConcept> in capabilities responses has an optional @alias.
Suggested approach: add an optional @alias attribute to <schema> as well.

8) There's no indication whether response structures must contain a
targetNamespace or not. If there's no targetNamespace and the response
includes the TAPIR envelope, then custom search elements will be forced to
inherit from the TAPIR namespace. Suggested approach: change the specification
to say that response structures (schema) should always indicate a
targetNamespace.
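If the specification adopts this rule, a provider could reject response structures that lack a targetNamespace before using them. A minimal sketch with Python's xml.etree.ElementTree; the function name and the example namespace URI are illustrative.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

def has_target_namespace(xsd_text):
    """True if the root xs:schema element of a response structure declares
    a non-empty targetNamespace attribute."""
    root = ET.fromstring(xsd_text)
    if root.tag != f"{{{XSD_NS}}}schema":
        raise ValueError("not an XML Schema document")
    return bool(root.get("targetNamespace"))

good = ('<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" '
        'targetNamespace="http://example.org/my-structure"/>')
bad = '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"/>'
print(has_target_namespace(good), has_target_namespace(bad))  # True False
```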

9) The specification says that the default value for "start" (search and
inventory operations) is "1", but schema says the default is "0".
Suggested approach: use "0" (= index of the first record) and change the
specification.

10) The specification says that the default value for "limit" (search and
inventory operations) is "null" (unlimited), but the schema says the
default is "1". Suggested approach: when "limit" is not specified ("null"),
treat it as unlimited, and change the schema to remove "default=1".
Otherwise it would be necessary to define a specific value to indicate
"unlimited".
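Taken together, the proposed defaults in items 9 and 10 amount to ordinary zero-based paging where a missing limit means "no upper bound". A minimal sketch over an in-memory list (the function name is illustrative; a real provider would push this into its database query):

```python
def page(records, start=0, limit=None):
    """Return the records for a search/inventory request.

    start: 0-based index of the first record (default 0 = first record).
    limit: maximum number of records; None (not specified) means unlimited.
    """
    if start < 0:
        raise ValueError("start must be >= 0")
    if limit is None:
        return records[start:]
    if limit < 0:
        raise ValueError("limit must be >= 0")
    return records[start:start + limit]

print(page(["a", "b", "c", "d", "e"], start=1, limit=2))  # ['b', 'c']
```

Using None for "unlimited" avoids reserving a special number, which is the point made above.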

11) Currently, case sensitivity can be separately specified for "equals"
and "like" operators, while nothing is said about the "in" operator.
Suggested approach: have a single setting for case sensitivity related to
the three operators.
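A minimal sketch of a single case-sensitivity setting shared by the three operators: one normalisation function is chosen once and applied identically by equals, in and like. In this sketch only "*" is treated as a wildcard in like terms, and escaping is ignored; the function names are illustrative.

```python
import re

def make_comparators(case_sensitive):
    """Build equals/in/like tests that all honour one case-sensitivity flag."""
    norm = (lambda s: s) if case_sensitive else str.casefold

    def equals(value, term):
        return norm(value) == norm(term)

    def in_(value, terms):
        return norm(value) in {norm(t) for t in terms}

    def like(value, pattern):
        # '*' matches any run of characters; everything else is literal.
        regex = ".*".join(re.escape(part) for part in norm(pattern).split("*"))
        return re.fullmatch(regex, norm(value), re.DOTALL) is not None

    return equals, in_, like

equals, in_, like = make_comparators(case_sensitive=False)
print(equals("Abies", "ABIES"), in_("alba", ["ALBA", "nigra"]), like("Abies alba", "abies*"))
# True True True
```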

12) There's no default behaviour for "like" comparisons when the term
contains no wildcard. Suggested approach: add * at both the beginning and
the end of the term.
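The proposed default turns a wildcard-free term into a substring match. A minimal sketch, assuming "*" is the only wildcard and ignoring escaping (the function name is illustrative):

```python
import re

def like_match(value, term):
    """Proposed default: a like term without any '*' behaves as '*term*'."""
    if "*" not in term:
        term = f"*{term}*"
    regex = ".*".join(re.escape(part) for part in term.split("*"))
    return re.fullmatch(regex, value, re.DOTALL) is not None

print(like_match("Abies alba", "bies"))   # True: treated as *bies*
print(like_match("Abies alba", "bies*"))  # False: explicit wildcard anchors the start
```

Terms that already contain a wildcard are left untouched, so clients that want prefix or suffix matching keep full control.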

13) Nothing is specified about how to escape * in like terms. Suggested
approach: use _* as the escape sequence (avoiding reserved and unsafe URL
characters).
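A minimal sketch of how a provider might translate a like term under this escaping rule: "_*" becomes a literal asterisk, a bare "*" is a wildcard, and any other "_" stays a literal underscore. The proposal does not say how to write a literal underscore immediately followed by a wildcard; this sketch simply reads every "_*" as an escaped asterisk. Function names are illustrative.

```python
import re

def like_to_regex(term):
    """Translate a like term to a regex: '*' is a wildcard, '_*' a literal '*'."""
    parts = []
    i = 0
    while i < len(term):
        if term.startswith("_*", i):
            parts.append(re.escape("*"))  # escaped asterisk
            i += 2
        elif term[i] == "*":
            parts.append(".*")  # wildcard
            i += 1
        else:
            parts.append(re.escape(term[i]))  # any other character, literal
            i += 1
    return "".join(parts)

def matches_like(value, term):
    return re.fullmatch(like_to_regex(term), value, re.DOTALL) is not None

print(matches_like("5*3=15", "5_*3*"))  # True: literal '*', then wildcard
print(matches_like("543", "5_*3"))      # False: '_*' is not a wildcard
```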

14) The specification is not clear about what content should be included
in responses when one or more partial parameters are specified. Suggested
approach: response content should be minimal (non-greedy) when partial is
specified. This means "include only the partial nodes and all other nodes
(below and above) that are mandatory". When partial is not specified,
responses should be greedy (include all possible content).
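One possible reading of the minimal rule, sketched under stated assumptions: a node is included if it is named by a partial path, if it is an ancestor of such a node (needed to reach it), or if it is mandatory and its parent is included (covering mandatory nodes both below the partial nodes and beside their ancestors). Without partial, everything is included. The Node class, the slash-separated paths and the sample tree are hypothetical simplifications of a response structure.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    mandatory: bool = False
    children: list = field(default_factory=list)

def included_paths(root, partial=None):
    """Paths of the nodes to include: all of them (greedy) when partial is
    empty, otherwise partial nodes, their ancestors and reachable mandatory nodes."""
    kept = set()

    def visit(node, parent_path, parent_kept):
        path = f"{parent_path}/{node.name}"
        is_ancestor = any(p.startswith(path + "/") for p in partial or ())
        keep = (not partial or path in partial or is_ancestor
                or (node.mandatory and parent_kept))
        if keep:
            kept.add(path)
        for child in node.children:
            visit(child, path, keep)

    visit(root, "", True)
    return kept

tree = Node("records", True, [Node("record", True, [
    Node("id", True),
    Node("name", False, [Node("genus"), Node("species", True)]),
    Node("locality"),
])])
print(sorted(included_paths(tree, {"/records/record/name"})))
# ['/records', '/records/record', '/records/record/id',
#  '/records/record/name', '/records/record/name/species']
```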

=================================

Renato De Giovanni
