[tdwg-tapir] Fwd: Tapir protocol - Harvest methods?

Tue May 20 17:16:57 CEST 2008

Hi Markus,

Since DarwinCore is a generic list of elements that can be used by 
any application schema, I think it's OK to use them in the new schema 
that you're suggesting.

I agree that ideally we should try to define and use a common format 
for index files, although it seems that we will have at least two: 
CSV for simple data and probably another one in XML for complex data, 
right?

Regarding the XML for complex data, if you manage to find a generic 
schema that can be used in different contexts (not only biodiversity 
data) then I agree we could avoid extra attributes in the respective 
capabilities element. Otherwise, I would prefer to see some extra 
attribute (such as "outputModel") giving more information about the 
XML. Since TAPIR was designed to be generic, this should not be a 
problem because clients and networks are already free to decide and 
to mandate specific TAPIR capabilities. This doesn't mean that there 
will be lots of formats for index files. It's a matter of agreeing on 
a common format but still keeping the protocol generic to allow 
different uses by other communities.

I also agree we could advertise the index file through some new TAPIR 
element instead of using the custom slot.

Best Regards,
--
Renato 

On 16 May 2008 at 10:29, Markus Döring wrote:

> Renato,
> I was thinking along those lines too. It would be nice for TAPIR
> providers to announce the availability of the index files. I wouldn't
> mind adding it even to the regular TAPIR schema once it has proven to
> work with the custom slot approach you have given.
> 
> Regarding star-shaped data, I would prefer to agree on one format
> instead of allowing different ones, to save consumers from this pain.
> There is a straightforward XML serialisation for this scheme that we
> could use instead of tab files:
> 
> <record uri="">
>    <dwc:property1 />
>    <dwc:property2 />
>    <extA:record>
>      <extA:property1 />
>      <extA:property2 />
>    </extA:record>
>    <extB:record>
>      <extB:property1 />
>      <extB:property2 />
>    </extB:record>
> </record>
> 
> 
> The advantage is that it can be produced by TAPIR software, and XML
> serialisation is required for many services anyway, e.g. RSS.
> But then again, the whole point of the index files is that they are
> easy to generate and consume. On the other hand, this XML structure is
> pretty simple to process and can be generated from databases like
> SQL Server that have XML output straight away, without the need for
> scripting.
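
For producers that do need a script, the record structure above can be
generated with very little code. This is only a minimal sketch in Python
using the standard library: the namespace URIs, the build_record helper
and the sample values are assumptions made up for illustration, not part
of any TAPIR or DarwinCore schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs, chosen only for this sketch.
DWC = "http://example.org/dwc"
EXT_A = "http://example.org/extA"

ET.register_namespace("dwc", DWC)
ET.register_namespace("extA", EXT_A)


def build_record(uri, core, extensions):
    """Serialise one core row plus its extension rows as a <record> element.

    core: dict of property name -> value for the core (dwc) record.
    extensions: dict of extension namespace URI -> list of row dicts.
    """
    record = ET.Element("record", {"uri": uri})
    for name, value in core.items():
        ET.SubElement(record, f"{{{DWC}}}{name}").text = value
    for ns, rows in extensions.items():
        for row in rows:
            # Each extension row becomes a nested <ext:record> element.
            ext_record = ET.SubElement(record, f"{{{ns}}}record")
            for name, value in row.items():
                ET.SubElement(ext_record, f"{{{ns}}}{name}").text = value
    return record


record = build_record(
    "http://example.org/specimen/1",
    {"property1": "a", "property2": "b"},
    {EXT_A: [{"property1": "c", "property2": "d"}]},
)
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

The core row and each extension row map one-to-one onto elements, so the
same function works for any number of extensions.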
> 
> That touches on a different issue I am facing with the star scheme, by
> the way. I have created an identification extension for Darwin Core
> that holds the historical list of identification events and their
> outcomes. This is a YAML section of the metafile describing the columns
> for this extension through fully qualified concepts à la TAPIR:
> 
> identification:
>    - http://rs.tdwg.org/dwc/dwcore/ScientificName
>    - http://rs.tdwg.org/dwc/dwcore/AuthorYearOfScientificName
>    - http://rs.tdwg.org/dwc/dwcore/Family
>    - http://rs.tdwg.org/dwc/dwcore/IdentificationQualifier
>    - http://rs.tdwg.org/dwc/curatorial/DateIdentified
>    - http://rs.tdwg.org/dwc/curatorial/IdentifiedBy
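
To illustrate how a consumer might use such a column list, here is a
rough Python sketch that reads a tab-delimited extension file and keys
each value by its concept URI. It assumes one column per listed concept,
in the listed order, with no header row; that layout, the helper names
and the sample row are assumptions for illustration, and the column list
is written as a Python list to avoid depending on a YAML parser.

```python
import csv
import io

# Column concepts copied from the metafile section above, in column order.
IDENTIFICATION_COLUMNS = [
    "http://rs.tdwg.org/dwc/dwcore/ScientificName",
    "http://rs.tdwg.org/dwc/dwcore/AuthorYearOfScientificName",
    "http://rs.tdwg.org/dwc/dwcore/Family",
    "http://rs.tdwg.org/dwc/dwcore/IdentificationQualifier",
    "http://rs.tdwg.org/dwc/curatorial/DateIdentified",
    "http://rs.tdwg.org/dwc/curatorial/IdentifiedBy",
]


def local_name(concept_uri):
    """Return the last path segment of a fully qualified concept URI."""
    return concept_uri.rsplit("/", 1)[1]


def read_rows(tab_text, columns):
    """Parse tab-delimited text into dicts keyed by concept URI.

    Assumes (hypothetically) no header row and one column per concept.
    """
    reader = csv.reader(io.StringIO(tab_text), delimiter="\t",
                        quoting=csv.QUOTE_NONE)
    rows = []
    for values in reader:
        if len(values) != len(columns):
            raise ValueError(f"expected {len(columns)} columns, got {len(values)}")
        rows.append(dict(zip(columns, values)))
    return rows


# Illustrative row; the IdentificationQualifier column is left empty.
sample = "Aster alpinus\tL.\tAsteraceae\t\t1913-03-12\tKarl Marx\n"
rows = read_rows(sample, IDENTIFICATION_COLUMNS)
for concept, value in rows[0].items():
    print(local_name(concept), "=", value)
```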
> 
> When creating this I realised that pretty much all the concepts I was
> interested in already existed in Darwin Core or the curatorial
> extension. Wouldn't it be wise to reuse those concepts? Or are they
> strictly tied to the idea of a current identification and therefore
> can't be used for historical ones? This is probably more of a Darwin
> Core question than a TAPIR one, but we are all on this list anyway ...
> 
> The XML in that case would look something like this:
> 
> <record uri="http://mygarden.com/specimen/plants/54321-423-43-54-6-3-24-44">
>    <dwc:ScientificName>Aster alpinus subsp. parviceps</dwc:ScientificName>
>    ...
>    <ident:record>
>      <dwc:ScientificName>Aster alpinus</dwc:ScientificName>
>      <dwc:AuthorYearOfScientificName>L.</dwc:AuthorYearOfScientificName>
>      <dwc:Family>Asteraceae</dwc:Family>
>      <cur:DateIdentified>1913-03-12</cur:DateIdentified>
>      <cur:IdentifiedBy>Karl Marx</cur:IdentifiedBy>
>    </ident:record>
>    <ident:record>
>      <dwc:ScientificName>Aster alpinus subsp. parviceps</dwc:ScientificName>
>      <dwc:AuthorYearOfScientificName>Novopokr.</dwc:AuthorYearOfScientificName>
>      <dwc:Family>Asteraceae</dwc:Family>
>      <cur:DateIdentified>2003-09-07</cur:DateIdentified>
>      <cur:IdentifiedBy>Keith Richards</cur:IdentifiedBy>
>    </ident:record>
> </record>
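
On the consumer side, a record like this is easy to process once the
prefixes are bound to namespaces. The Python sketch below parses a
well-formed copy of the example and returns the identification history
in chronological order. The dwc and cur namespace URIs are assumed from
the concept URI prefixes used in this thread, the ident URI is entirely
hypothetical, and identification_history is just an illustrative helper.

```python
import xml.etree.ElementTree as ET

# Assumed prefix bindings; the ident namespace is hypothetical.
NS = {
    "dwc": "http://rs.tdwg.org/dwc/dwcore/",
    "cur": "http://rs.tdwg.org/dwc/curatorial/",
    "ident": "http://example.org/ident/",
}

DOC = """\
<record xmlns:dwc="http://rs.tdwg.org/dwc/dwcore/"
        xmlns:cur="http://rs.tdwg.org/dwc/curatorial/"
        xmlns:ident="http://example.org/ident/"
        uri="http://mygarden.com/specimen/plants/54321-423-43-54-6-3-24-44">
  <dwc:ScientificName>Aster alpinus subsp. parviceps</dwc:ScientificName>
  <ident:record>
    <dwc:ScientificName>Aster alpinus subsp. parviceps</dwc:ScientificName>
    <cur:DateIdentified>2003-09-07</cur:DateIdentified>
    <cur:IdentifiedBy>Keith Richards</cur:IdentifiedBy>
  </ident:record>
  <ident:record>
    <dwc:ScientificName>Aster alpinus</dwc:ScientificName>
    <cur:DateIdentified>1913-03-12</cur:DateIdentified>
    <cur:IdentifiedBy>Karl Marx</cur:IdentifiedBy>
  </ident:record>
</record>
"""


def identification_history(xml_text):
    """Return the nested identification events, oldest first."""
    record = ET.fromstring(xml_text)
    history = []
    for ident in record.findall("ident:record", NS):
        history.append({
            "name": ident.findtext("dwc:ScientificName", namespaces=NS),
            "date": ident.findtext("cur:DateIdentified", namespaces=NS),
            "by": ident.findtext("cur:IdentifiedBy", namespaces=NS),
        })
    # ISO 8601 dates sort chronologically as plain strings.
    return sorted(history, key=lambda event: event["date"])


for event in identification_history(DOC):
    print(event["date"], event["name"], "det.", event["by"])
```

Because the same dwc:ScientificName element appears both at the top
level and inside each ident:record, the consumer has to tell the current
identification from the historical ones by nesting alone.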
> 
> 
> Markus