[tdwg-tapir] Tapir > Capabilities > Schemas > Concepts

Fri Nov 4 10:29:34 CET 2005

Steven,
in principle I see the same problems with atomic concepts as you described.
The idea of TAPIR was to deal only with atomic concepts, but allow other software to be aware of a concept's meaning.

So the vision is to move away from xpath-style concept ids to ids used in "ontologies" at some time. The protocol request itself still doesn't know how measurement and unit are related, but external knowledge could provide this, and the simple atomic concept ID would be the key into the ontology.

So a wrapper receiving this kind of request could understand the relation and transform the values! It would probably not make much sense to allow searches on calculated values, but for additional result generation this would be fine.
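
A minimal Python sketch of that idea, using the measurement/unit pair from the depth example quoted further down. The ONTOLOGY table, the METERS_PER_UNIT table and the function name are hypothetical illustrations and not part of TAPIR or PyWrapper; the only assumption about the protocol is that the atomic concept ID serves as the lookup key.

```python
# Hypothetical external knowledge: which concept carries the unit for a measurement.
ONTOLOGY = {
    "depth/measurement": {"unit_concept": "depth/unit"},
}

# Conversion factors to meters (1 fathom = 6 feet = 1.8288 m).
METERS_PER_UNIT = {"meters": 1.0, "feet": 0.3048, "fathoms": 1.8288}


def depth_in_meters(record, concept="depth/measurement"):
    """Look up the related unit concept and convert the value to meters."""
    unit_concept = ONTOLOGY[concept]["unit_concept"]
    factor = METERS_PER_UNIT[record[unit_concept]]
    return float(record[concept]) * factor


# One record of atomic concept = value pairs, as a wrapper might hold it.
record = {"depth/measurement": "112", "depth/unit": "fathoms"}
meters = depth_in_meters(record)
```

Here the relation between the two concepts lives entirely outside the request; the wrapper only uses it when generating additional result values.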

Also, the current PyWrapper does deal with multiple mappings for a single concept, for example. It's not just 1 XML element = 1 column. A single mapping can in turn be a concatenation of columns or fixed strings. Admittedly these are just the most common and simple data transformations, but I don't see that the protocol stops you from creating more sophisticated ones.

Markus

-----Original Message-----
From: tdwg-tapir-bounces at lists.tdwg.org [mailto:tdwg-tapir-bounces at lists.tdwg.org] On Behalf Of Steven Perry
Sent: Thursday, 3 November 2005 16:48
To: tdwg-tapir at lists.tdwg.org
Subject: Re: [tdwg-tapir] Tapir > Capabilities > Schemas > Concepts

Hi Roger,

I'm not a PyWrapper developer, so my opinion is certainly not authoritative, but my understanding is that PyWrapper only maps concepts that correspond to attributes or leaf XML elements (those elements that contain only text node children and do not contain other elements). 
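
To illustrate what "attributes or leaf XML elements" means here, this is a small Python sketch that lists the atomic paths in a sample document. It is not PyWrapper code; the sample XML, the id attribute and the function name are assumptions made up for the illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample instance; the id attribute is invented for illustration.
SAMPLE = """<DataSet>
  <TaxonNames>
    <TaxonName id="n1">
      <CanonicalName>
        <Genus>Quercus</Genus>
      </CanonicalName>
    </TaxonName>
  </TaxonNames>
</DataSet>"""


def atomic_paths(element, prefix=""):
    """Yield paths of attributes and of elements that have no child elements."""
    path = f"{prefix}/{element.tag}"
    for name in element.attrib:
        yield f"{path}/@{name}"
    children = list(element)
    if not children:
        yield path
    for child in children:
        yield from atomic_paths(child, path)


paths = list(atomic_paths(ET.fromstring(SAMPLE)))
```

Only the attribute and the Genus element come out as candidate concepts; the container elements above them do not.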

In other words, CanonicalName is not considered by PyWrapper to be a concept.  Neither is TaxonNames, TaxonName, or DataSet.  The concept mapping system in PyWrapper allows only a direct correspondence between a database column (or literal or concatenation of a combination of literals and columns) and an "atomic" XML element (or attribute).

My understanding is that the goal that led to this design decision was that one would want to query an abstract data model, and not an XML document.  This abstract data model is equivalent to a list of hashtables that contain concept = value pairs.  In my opinion this approach has both advantages and disadvantages.
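
As a rough sketch of that abstract data model in Python (not actual PyWrapper code): each record is a dictionary keyed by concept path, and a filter is just a test on one key. The SpecificEpithet path, the sample values and the function name are hypothetical.

```python
GENUS = "/DataSet/TaxonNames/TaxonName/CanonicalName/Genus"
# Hypothetical second concept, added only to make the records less trivial.
EPITHET = "/DataSet/TaxonNames/TaxonName/CanonicalName/SpecificEpithet"

# The abstract data model: a list of hashtables of concept = value pairs.
records = [
    {GENUS: "Quercus", EPITHET: "robur"},
    {GENUS: "Quercus", EPITHET: "alba"},
    {GENUS: "Fagus", EPITHET: "sylvatica"},
]


def where_equals(records, concept, value):
    """Return the records whose value for the given concept equals value."""
    return [r for r in records if r.get(concept) == value]


matches = where_equals(records, GENUS, "Quercus")
```

Because a query only names concepts and never an XML structure, the same records could be serialized into any output view that uses those concepts.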

On the one hand, using an abstract data model makes it possible to do views, which is considered a huge advantage.

On the other hand this definition of concept (essentially 1 XML element = 1 column) means that the abstract data model is limited to a simple hashtable.  This is like programming without data structures, like C with no structs or Java with no classes or objects, and is therefore somewhat limiting.

Another disadvantage is that this data model assumes that concepts are completely context-free.  In other words, no concept can depend upon another to set the context for its semantic meaning.  This is difficult to handle if you have a schema that allows a chunk of XML like the following:

<depth>
    <measurement>112</measurement>
    <unit>fathoms</unit>
</depth>

In this example, what we're really concerned with is the concept of depth (which cannot be expressed in PyWrapper/TAPIR).  Depth's full semantic meaning is dependent upon both measurement and unit.  If one substitutes measurement for depth (as you would do with PyWrapper), then depth/measurement is contextually constrained by the meaning of the independent concept depth/unit.  However, there is no way within the concept mapping or view system to relate these two concepts automatically, so it's up to the developer of a particular view to understand the relationship between these concepts and to treat it appropriately.

The problem is that one can't even treat it properly for some view schemas using the view subsystem.  The issue here is that PyWrapper cannot do transformations on concept values.  So, if you wanted to map the above version of a depth concept into a Darwin Core MaNIS depth concept using PyWrapper, you could do one of two things.  You could forgo translation and map depth/measurement directly into dwc:DepthInMeters (which would be wrong because 112 fathoms != 112 meters), or you could set dwc:DepthInMeters to the concatenation of depth/measurement and depth/unit, in this case forming the string "112 fathoms" (which is also wrong because this is not a "depth in meters" and would prevent the generated Darwin Core instance from validating).
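
The two options can be sketched in Python as follows. This is only an illustration of a mapping built from columns and literals, not PyWrapper's actual mapping format; the column names depth_value and depth_unit and the evaluate function are hypothetical.

```python
def evaluate(mapping, row):
    """Build a concept value by concatenating column values and literal strings."""
    parts = []
    for kind, value in mapping:
        if kind == "column":
            parts.append(str(row[value]))
        elif kind == "literal":
            parts.append(value)
        else:
            raise ValueError(f"unknown mapping part: {kind}")
    return "".join(parts)


# Hypothetical database row behind the <depth> example.
row = {"depth_value": 112, "depth_unit": "fathoms"}

# Option 1: map the numeric column directly (a number, but in the wrong unit).
direct = evaluate([("column", "depth_value")], row)

# Option 2: concatenate value and unit (right meaning, but not a depth in meters).
concatenated = evaluate(
    [("column", "depth_value"), ("literal", " "), ("column", "depth_unit")], row
)
```

Neither result is a valid depth in meters, because a mapping of this kind can copy and join values but cannot compute with them.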

The only proper way to handle this case with PyWrapper is to foresee that your user will want to use a Darwin Core view, and to create a new column in your underlying database that contains depth in meters, then map this concept into PyWrapper.  In essence, this bypasses the view subsystem and relies on concept mapping and your own translation program that exists outside of PyWrapper to do the job properly.

This is a general problem shared by all designs that use independent atomic concepts.

Please don't think I'm denigrating PyWrapper.  The problem of searching an arbitrary database structure and mapping the results into an arbitrary XML schema instance is difficult enough without having to worry about mapping non-atomic concepts.  I just wanted to point out that using atomic concepts has some serious limitations.

-Steve

Roger Hyam wrote:

> I am a little confused as to what is included in the capabilities 
> response under capabilities/schemas/schema.
>
> The annotation says:
>
> "Each known and mapped concept of a schema listed with a boolean flag 
> indicating if its searchable (default = true)."
>
> I have a genus field in my database and I map it to the following concept:
>
> <concept path="/DataSet/TaxonNames/TaxonName/CanonicalName/Genus" 
> searchable="true" />
>
> Should I also include the parent concepts:
>
> <concept path="/DataSet/TaxonNames/TaxonName/CanonicalName" 
> searchable="false" />
> <concept path="/DataSet/TaxonNames/TaxonName" searchable="false" /> 
> <concept path="/DataSet/TaxonNames" searchable="false" /> <concept 
> path="/DataSet" searchable="false" />
>
> or are they implied? I presume if they are included then they aren't 
> searchable as they can't be used to build filters.
>
> Does this:
>
> <concept path="/DataSet/TaxonNames/TaxonName/CanonicalName/Genus" 
> searchable="true" />
>
> mean:
>
>     * This concept has been mapped to data and can be used in
>       arbitrary response structures and filters in combination with
>       concepts from any of the other conceptual schemas listed in here.
>
> If so is the entire schemas section of capabilities response optional 
> if arbitrary views are not supported? Both the schemas and concepts 
> are given in views anyhow - so do we need to list them here?
>
> Your thoughts most appreciated - I may just have the wrong end of the 
> stick.
>
> Roger
>
>
>
>--
>
>-------------------------------------
> Roger Hyam
> Technical Architect
> Taxonomic Databases Working Group
>-------------------------------------
> http://www.tdwg.org
> roger at tdwg.org
> +44 1578 722782
>-------------------------------------
>  
>
>
>-----------------------------------------------------------------------
>-
>
>_______________________________________________
>tdwg-tapir mailing list
>tdwg-tapir at lists.tdwg.org
>http://lists.tdwg.org/mailman/listinfo/tdwg-tapir_lists.tdwg.org
>  
>
