[tdwg-tapir] Fwd: Tapir protocol - Harvest methods?

Roger Hyam rogerhyam at mac.com
Tue May 20 21:19:27 CEST 2008

The notion of star schemas fits very nicely with what I had in mind
for the RDF vocabularies. It would be good if each CSV file in the
star corresponded to a class in the vocabulary and the columns in the
CSV files mapped to properties in that vocabulary (or some other common
vocabulary such as VCARD or DC). It would then be trivial to map the
star to a semantic representation (such as the RDF returned from an
LSID) or vice versa.
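
To make the mapping concrete, here is a minimal Python sketch of what I
mean. It assumes each file in the star has the record identifier in its
first column and that a metafile tells us which class the file stands
for and which property each remaining column maps to. The
http://example.org/vocab/ namespace, the Specimen and Identification
class names, the hasIdentification link property and the STAR layout
are all made up for illustration; only the dwcore and curatorial
concept URIs and the sample values come from the thread below.

```python
import csv
import io

DWC = "http://rs.tdwg.org/dwc/dwcore/"
CUR = "http://rs.tdwg.org/dwc/curatorial/"
EX = "http://example.org/vocab/"  # placeholder namespace (assumption)
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical metafile: one entry per file in the star. "columns" lists
# the property for every column after the first (the record identifier).
STAR = {
    "core": {
        "class": EX + "Specimen",
        "columns": [DWC + "ScientificName", DWC + "Family"],
    },
    "identification": {
        "class": EX + "Identification",
        "link": EX + "hasIdentification",  # core record -> extension row
        "columns": [DWC + "ScientificName", CUR + "DateIdentified",
                    CUR + "IdentifiedBy"],
    },
}

RECORD = "http://mygarden.com/specimen/plants/54321-423-43-54-6-3-24-44"
CORE_CSV = RECORD + ",Aster alpinus subsp. parviceps,Asteraceae\n"
IDENT_CSV = (
    RECORD + ",Aster alpinus,1913-03-12,Karl Marx\n"
    + RECORD + ",Aster alpinus subsp. parviceps,2003-09-07,Keith Richards\n"
)


def literal(value):
    """Quote a string as a plain N-Triples style literal."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def rows_to_triples(name, text, delimiter=","):
    """Turn the rows of one star file into (subject, predicate, object) triples."""
    spec = STAR[name]
    triples = []
    for n, row in enumerate(csv.reader(io.StringIO(text), delimiter=delimiter)):
        if not row:
            continue  # skip blank lines
        record_uri, values = row[0], row[1:]
        if "link" in spec:
            # Extension row: a blank node hanging off the core record.
            subject = f"_:{name}{n}"
            triples.append((f"<{record_uri}>", f"<{spec['link']}>", subject))
        else:
            subject = f"<{record_uri}>"
        triples.append((subject, f"<{RDF_TYPE}>", f"<{spec['class']}>"))
        for prop, value in zip(spec["columns"], values):
            if value:
                triples.append((subject, f"<{prop}>", literal(value)))
    return triples


for s, p, o in rows_to_triples("core", CORE_CSV) + rows_to_triples(
        "identification", IDENT_CSV):
    print(s, p, o, ".")
```

Going the other way (RDF back to the star) is the same table read in
reverse: group triples by subject, look up the class to pick the file,
and write the property values out in column order.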

We can evolve the vocabularies to help this along.

This is probably all obvious but worth stating.

All the best,


On 20 May 2008, at 16:36, Markus Döring wrote:

> Renato,
> complex data can also be represented by tab files, with a file for
> each extension that has a pointer in the first column.
> That is what we originally had in mind with the star scheme.
> Markus
> On 20 May, 2008, at 17:16, Renato De Giovanni wrote:
>> Hi Markus,
>> Since DarwinCore is a generic list of elements that can be used by
>> any application schema, I think it's OK to use them in the new schema
>> that you're suggesting.
>> I agree that ideally we should try to define and use a common format
>> for index files, although it seems that we will have at least two:
>> csv for simple data and probably another one in XML for complex data,
>> right?
>> Regarding the XML for complex data, if you manage to find a generic
>> schema that can be used in different contexts (not only biodiversity
>> data) then I agree we could avoid extra attributes in the respective
>> capabilities element. Otherwise, I would prefer to see some extra
>> attribute (such as "outputModel") giving more information about the
>> XML. Since TAPIR was designed to be generic, this should not be a
>> problem because clients and networks are already free to decide and
>> to mandate specific TAPIR capabilities. This doesn't mean that there
>> will be lots of formats for index files. It's a matter of agreeing on
>> a common format but still keeping the protocol generic to allow
>> different uses by other communities.
>> I also agree we could advertise the index file through some new TAPIR
>> element instead of using the custom slot.
>> Best Regards,
>> --
>> Renato
>> On 16 May 2008 at 10:29, Markus Döring wrote:
>>> Renato,
>>> I was thinking along those lines too. It would be nice for TAPIRs to
>>> announce the availability of the index files. I wouldn't mind adding
>>> it even to the regular TAPIR schema once it has proven to work with
>>> the custom slot approach you have given.
>>> Regarding star-shaped data, I would prefer to agree on one format
>>> instead of allowing different ones, to save consumers from this pain.
>>> There is a straightforward XML serialisation for this scheme that we
>>> could use instead of tab files:
>>> <record uri="">
>>>  <dwc:property1 />
>>>  <dwc:property2 />
>>>  <extA:record>
>>>    <extA:property1 />
>>>    <extA:property2 />
>>>  </extA:record>
>>>  <extB:record>
>>>    <extB:property1 />
>>>    <extB:property2 />
>>>  </extB:record>
>>> </record>
>>> The advantage is that it can be produced by TAPIR software, and XML
>>> serialisation is required for many services anyway, e.g. RSS.
>>> But then again the whole point of the index files is that they are
>>> easy to generate and consume. On the other hand this XML structure
>>> is pretty simple to process and can be generated from databases like
>>> SQL Server that have XML output straight away without the need for
>>> scripting.
>>> That touches a different issue I am facing with the star scheme, by
>>> the way. I have created an identification extension for Darwin Core
>>> that holds the historical list of identification events and their
>>> outcomes. This is a YAML section of the metafile describing the
>>> columns for this extension through fully qualified concepts à la
>>> TAPIR:
>>> identification:
>>>  - http://rs.tdwg.org/dwc/dwcore/ScientificName
>>>  - http://rs.tdwg.org/dwc/dwcore/AuthorYearOfScientificName
>>>  - http://rs.tdwg.org/dwc/dwcore/Family
>>>  - http://rs.tdwg.org/dwc/dwcore/IdentificationQualifier
>>>  - http://rs.tdwg.org/dwc/curatorial/DateIdentified
>>>  - http://rs.tdwg.org/dwc/curatorial/IdentifiedBy
>>> When creating this I realised that pretty much all the concepts I was
>>> interested in already existed in Darwin Core or the curatorial
>>> extension. Wouldn't it be wise to reuse those concepts? Or are they
>>> strictly tied to the idea of a current identification and therefore
>>> can't be used for historical ones? This is probably more of a Darwin
>>> Core question than a TAPIR one, but we are all on this list anyway ...
>>> The XML in that case would look something like this:
>>> <record uri="http://mygarden.com/specimen/plants/54321-423-43-54-6-3-24-44">
>>>  <dwc:ScientificName>Aster alpinus subsp. parviceps</dwc:ScientificName>
>>>  ...
>>>  <ident:record>
>>>    <dwc:ScientificName>Aster alpinus</dwc:ScientificName>
>>>    <dwc:AuthorYearOfScientificName>L.</dwc:AuthorYearOfScientificName>
>>>    <dwc:Family>Asteraceae</dwc:Family>
>>>    <cur:DateIdentified>1913-03-12</cur:DateIdentified>
>>>    <cur:IdentifiedBy>Karl Marx</cur:IdentifiedBy>
>>>  </ident:record>
>>>  <ident:record>
>>>    <dwc:ScientificName>Aster alpinus subsp. parviceps</dwc:ScientificName>
>>>    <dwc:AuthorYearOfScientificName>Novopokr.</dwc:AuthorYearOfScientificName>
>>>    <dwc:Family>Asteraceae</dwc:Family>
>>>    <cur:DateIdentified>2003-09-07</cur:DateIdentified>
>>>    <cur:IdentifiedBy>Keith Richards</cur:IdentifiedBy>
>>>  </ident:record>
>>> </record>
>>> Markus
>> _______________________________________________
>> tdwg-tapir mailing list
>> tdwg-tapir at lists.tdwg.org
>> http://lists.tdwg.org/mailman/listinfo/tdwg-tapir
