[tdwg-tapir] Tapir > Capabilities > Schemas > Concepts

Thu Nov 3 16:48:10 CET 2005

Hi Roger,

I'm not a PyWrapper developer, so my opinion is certainly not 
authoritative, but my understanding is that PyWrapper only maps concepts 
that correspond to attributes or leaf XML elements (those elements that 
contain only text node children and do not contain other elements). 

In other words, CanonicalName is not considered by PyWrapper to be a 
concept.  Neither is TaxonNames, TaxonName, or DataSet.  The concept 
mapping system in PyWrapper allows only a direct correspondence between 
a database column (or literal or concatenation of a combination of 
literals and columns) and an "atomic" XML element (or attribute).
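
To make that rule concrete, here is a rough Python sketch of how one might list the paths that qualify as concepts under it.  This is my own illustration, not PyWrapper code; the sample document, the id attribute, and the function name mappable_paths are all assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample instance; element names follow the paths in Roger's
# message, and the "id" attribute is invented for illustration.
SAMPLE = """
<DataSet>
  <TaxonNames>
    <TaxonName id="n1">
      <CanonicalName>
        <Genus>Quercus</Genus>
      </CanonicalName>
    </TaxonName>
  </TaxonNames>
</DataSet>
"""

def mappable_paths(element, prefix=""):
    """Return paths of attributes and of leaf elements (no child elements)."""
    path = f"{prefix}/{element.tag}"
    # Every attribute counts as an atomic, mappable item.
    paths = [f"{path}/@{name}" for name in element.attrib]
    children = list(element)
    if not children:
        # No element children, so this is a leaf element.
        paths.append(path)
    for child in children:
        paths.extend(mappable_paths(child, path))
    return paths

paths = mappable_paths(ET.fromstring(SAMPLE))
for p in paths:
    print(p)
```

Container elements such as CanonicalName or TaxonName never appear in the result, which matches the behaviour described above.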

My understanding is that the goal that led to this design decision was 
that one would want to query an abstract data model, and not an XML 
document.  This abstract data model is equivalent to a list of 
hashtables that contains concept = value pairs.  In my opinion this 
approach has both advantages and disadvantages. 
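
As a minimal sketch of what I mean by a list of hashtables, assuming a made-up table with genus and species columns: each concept is mapped to a column, a literal, or a concatenation of both, and each row becomes one flat dictionary.  The column names, the ".../Simple" concept path, and the to_record helper are hypothetical and not taken from PyWrapper.

```python
# Hypothetical database rows; column names are invented for illustration.
rows = [
    {"genus": "Quercus", "species": "robur"},
    {"genus": "Fagus", "species": "sylvatica"},
]

BASE = "/DataSet/TaxonNames/TaxonName/CanonicalName"

# Each concept maps to a list of parts: ("col", column) or ("lit", text).
# A single part is a direct mapping; several parts form a concatenation.
mapping = {
    f"{BASE}/Genus": [("col", "genus")],
    f"{BASE}/Simple": [("col", "genus"), ("lit", " "), ("col", "species")],
}

def to_record(row, mapping):
    """Build one flat concept = value hashtable from one database row."""
    return {
        concept: "".join(row[v] if kind == "col" else v for kind, v in parts)
        for concept, parts in mapping.items()
    }

# The abstract data model: a list of flat dictionaries, one per row.
records = [to_record(row, mapping) for row in rows]
for record in records:
    print(record)
```

Note that every value is a plain string keyed by a path; nothing in the structure groups related concepts together.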

On the one hand, using an abstract data model makes it possible to do 
views, which is considered a huge advantage.

On the other hand this definition of concept (essentially 1 XML element 
= 1 column) means that the abstract data model is limited to a simple 
hashtable.  This is like programming without data structures, like C 
with no structs or Java with no classes or objects, and is therefore 
somewhat limiting.

Another disadvantage is that this data model assumes that concepts are 
completely context-free.  In other words, no concept can depend upon 
another to set the context for its semantic meaning.  This is difficult 
to handle if you have a schema that allows a chunk of XML like the 
following:

<depth>
    <measurement>112</measurement>
    <unit>fathoms</unit>
</depth>

In this example, what we're really concerned with is the concept of 
depth (which cannot be expressed in PyWrapper/TAPIR).  Depth's full 
semantic meaning is dependent upon both measurement and unit.  If one 
substitutes measurement for depth (as you would do with PyWrapper), then 
depth/measurement is contextually constrained by the meaning of the 
independent concept depth/unit.  However there is no way within the 
concept mapping or view system to relate these two concepts 
automatically, so it's up to the developer of a particular view to 
understand the relationship between these concepts and to treat it 
appropriately. 

The problem is that, for some view schemas, one can't treat it properly 
even within the view subsystem.  The issue here is that PyWrapper cannot 
transform concept values.  So, if you wanted to map the above version of 
a depth concept into a Darwin Core MaNIS depth concept using PyWrapper, 
you could do one of two things.  You could forgo translation and map 
depth/measurement directly into dwc:DepthInMeters (which would be wrong 
because 112 fathoms != 112 meters), or you could set dwc:DepthInMeters to 
the concatenation of depth/measurement and depth/unit, in this case 
forming the string "112 fathoms" (which is also wrong because this is 
not a "depth in meters" and would prevent the generated Darwin Core 
instance from validating). 

The only proper way to handle this case with PyWrapper is to foresee 
that your user will want to use a Darwin Core view, and to create a new 
column in your underlying database that contains depth in meters, then 
map this concept into PyWrapper.  In essence, this bypasses the view 
subsystem and relies on concept mapping and your own translation program 
that exists outside of PyWrapper to do the job properly.
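
The external translation step could be as small as the following sketch, which combines depth/measurement and depth/unit into a single value in meters that you would then store in the new column.  The function name, the unit spellings, and the set of supported units are my assumptions; only the fathoms case comes from the example above.

```python
from decimal import Decimal

# Conversion factors to meters.  1 foot = 0.3048 m exactly and
# 1 fathom = 6 feet = 1.8288 m.  The unit spellings are assumptions.
METERS_PER_UNIT = {
    "meters": Decimal("1"),
    "feet": Decimal("0.3048"),
    "fathoms": Decimal("1.8288"),
}

def depth_in_meters(measurement, unit):
    """Combine depth/measurement and depth/unit into a depth in meters."""
    try:
        factor = METERS_PER_UNIT[unit]
    except KeyError:
        # Refuse to guess: an unknown unit must not pass through unconverted.
        raise ValueError(f"unknown depth unit: {unit!r}") from None
    return Decimal(measurement) * factor

print(depth_in_meters("112", "fathoms"))  # 204.8256
```

The result, not the raw 112, is what belongs in dwc:DepthInMeters.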

This is a general problem shared by all designs that use independent 
atomic concepts.

Please don't think I'm denigrating PyWrapper.  The problem of searching 
an arbitrary database structure and mapping the results into an 
arbitrary XML schema instance is difficult enough without having to 
worry about mapping non-atomic concepts.  I just wanted to point out 
that using atomic concepts has some serious limitations.

-Steve

Roger Hyam wrote:

> I am a little confused as to what is included in the capabilities 
> response under capabilities/schemas/schema.
>
> The annotation says:
>
> "Each known and mapped concept of a schema listed with a boolean flag 
> indicating if its searchable (default = true)."
>
> I have a genus field in my database and I map it to the following concept:
>
> <concept path="/DataSet/TaxonNames/TaxonName/CanonicalName/Genus" 
> searchable="true" />
>
> Should I also include the parent concepts:
>
> <concept path="/DataSet/TaxonNames/TaxonName/CanonicalName" 
> searchable="false" />
> <concept path="/DataSet/TaxonNames/TaxonName" searchable="false" />
> <concept path="/DataSet/TaxonNames" searchable="false" />
> <concept path="/DataSet" searchable="false" />
>
> or are they implied? I presume if they are included then they aren't 
> searchable as they can't be used to build filters.
>
> Does this:
>
> <concept path="/DataSet/TaxonNames/TaxonName/CanonicalName/Genus" 
> searchable="true" />
>
> mean:
>
>     * This concept has been mapped to data and can be used in
>       arbitrary response structures and filters in combination with
>       concepts from any of the other conceptual schemas listed in here.
>
> If so is the entire schemas section of capabilities response optional 
> if arbitrary views are not supported? Both the schemas and concepts 
> are given in views anyhow - so do we need to list them here?
>
> Your thoughts most appreciated - I may just have the wrong end of the 
> stick.
>
> Roger
>
>
>
>-- 
>
>-------------------------------------
> Roger Hyam
> Technical Architect
> Taxonomic Databases Working Group
>-------------------------------------
> http://www.tdwg.org
> roger at tdwg.org
> +44 1578 722782
>-------------------------------------
>  
>
>