Re: [tdwg-tapir] Fwd: Tapir protocol - Harvest methods?

20 May 2008

The notion of star schemas fits very nicely with what I had in mind
for the RDF vocabularies. It would be good if each of the CSV files
in the star corresponded to a class in the vocabulary and the columns
in the CSV files mapped to properties in that vocabulary (or some other
common vocabulary such as VCARD or DC). It would then be trivial to map
the star to a semantic representation (such as the RDF returned from
an LSID) or vice versa.
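
A minimal sketch of that mapping, assuming one CSV file per class and a
lookup table from column names to property URIs. The class URI, the
"uri" identifier column, and the sample row are hypothetical; only the
DarwinCore concept URIs are taken from this thread.

```python
import csv
import io

# Column name -> property URI. The DarwinCore URIs appear in this thread;
# the choice of columns is an assumption made for illustration.
COLUMN_TO_PROPERTY = {
    "ScientificName": "http://rs.tdwg.org/dwc/dwcore/ScientificName",
    "Family": "http://rs.tdwg.org/dwc/dwcore/Family",
}
# Hypothetical class URI that every row of this CSV file is typed with.
RECORD_CLASS = "http://example.org/vocab/Specimen"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def csv_to_ntriples(text, id_column="uri"):
    """Turn each CSV row into N-Triples lines: one type triple plus one
    triple per mapped, non-empty column."""
    lines = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = row[id_column]
        lines.append(f"<{subject}> <{RDF_TYPE}> <{RECORD_CLASS}> .")
        for column, prop in COLUMN_TO_PROPERTY.items():
            value = row.get(column)
            if value:
                escaped = value.replace("\\", "\\\\").replace('"', '\\"')
                lines.append(f'<{subject}> <{prop}> "{escaped}" .')
    return lines


sample = (
    "uri,ScientificName,Family\n"
    "http://example.org/specimen/1,Aster alpinus,Asteraceae\n"
)
triples = csv_to_ntriples(sample)
```

The reverse direction would walk the same table from property URI back
to column name.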

We can evolve the vocabularies to help this along.

This is probably all obvious but worth stating.

All the best,

Roger

On 20 May 2008, at 16:36, Markus Döring wrote:
...
Renato,
complex data can also be represented by tab files, with a file for
each extension that has a pointer in the first column.
That is what we originally had in mind with the star scheme.
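
A rough sketch of reading such a star, assuming the pointer in the
first column is the identifier of the core record and that the files
carry no header row. The function names and sample rows are
hypothetical.

```python
import csv
import io
from collections import defaultdict


def read_tab(text):
    """Parse tab-separated text into a list of rows, skipping blank lines."""
    return [row for row in csv.reader(io.StringIO(text), delimiter="\t") if row]


def join_star(core_text, extension_texts):
    """Attach extension rows to core records via the first-column pointer."""
    records = {}
    for row in read_tab(core_text):
        records[row[0]] = {"core": row[1:], "extensions": defaultdict(list)}
    for name, text in extension_texts.items():
        for row in read_tab(text):
            # Extension rows pointing at unknown core records are ignored.
            if row[0] in records:
                records[row[0]]["extensions"][name].append(row[1:])
    return records


core = "54321\tAster alpinus subsp. parviceps\n"
ident = (
    "54321\tAster alpinus\t1913-03-12\n"
    "54321\tAster alpinus subsp. parviceps\t2003-09-07\n"
)
star = join_star(core, {"identification": ident})
```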
Markus
On 20 May, 2008, at 17:16, Renato De Giovanni wrote:
...
Hi Markus,
Since DarwinCore is a generic list of elements that can be used by
any application schema, I think it's OK to use them in the new schema
that you're suggesting.
I agree that ideally we should try to define and use a common format
for index files, although it seems that we will have at least two:
csv for simple data and probably another one in XML for complex data,
right?
Regarding the XML for complex data, if you manage to find a generic
schema that can be used in different contexts (not only biodiversity
data) then I agree we could avoid extra attributes in the respective
capabilities element. Otherwise, I would prefer to see some extra
attribute (such as "outputModel") giving more information about the
XML. Since TAPIR was designed to be generic, this should not be a
problem because clients and networks are already free to decide and
to mandate specific TAPIR capabilities. This doesn't mean that there
will be lots of formats for index files. It's a matter of agreeing on
a common format but still keeping the protocol generic to allow
different uses by other communities.
I also agree we could advertise the index file through some new TAPIR
element instead of using the custom slot.
Best Regards,
--
Renato
On 16 May 2008 at 10:29, Markus Döring wrote:
...
Renato,
I was thinking along those lines too. It would be nice for TAPIRs to
announce the availability of the index files. I wouldn't mind adding
it even to the regular TAPIR schema once it has proven to work with
the custom slot approach you have given.
Regarding star-shaped data, I would prefer to agree on one format
instead of allowing different ones, to save consumers from this pain.
There is a straightforward XML serialisation for this scheme that we
could use instead of tab files:
<record uri="">
 <dwc:property1 />
 <dwc:property2 />
 <extA:record>
   <extA:property1 />
   <extA:property2 />
 </extA:record>
 <extB:record>
   <extB:property1 />
   <extB:property2 />
 </extB:record>
</record>
An advantage is that it can be produced by TAPIR software, and XML
serialisation is required for many services anyway, e.g. RSS.
But then again the whole point of the index files is that they are
easy to generate and consume. On the other hand this XML structure is
pretty simple to process and can be generated from databases like
SQL Server that have XML output straight away, without the need for
scripting.
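
For software that does need to script it, a minimal sketch of producing
this structure with Python's standard library might look as follows.
The extA namespace URI, the function name, and the sample values are
hypothetical; the dwc namespace is assumed from the concept URIs used
elsewhere in this thread.

```python
import xml.etree.ElementTree as ET

DWC = "http://rs.tdwg.org/dwc/dwcore/"   # assumed from the concept URIs
EXT_A = "http://example.org/extA/"        # hypothetical extension namespace
ET.register_namespace("dwc", DWC)
ET.register_namespace("extA", EXT_A)


def star_to_xml(uri, core, extension_rows):
    """Serialise one core record and its extension rows as a <record>."""
    record = ET.Element("record", {"uri": uri})
    for name, value in core.items():
        ET.SubElement(record, f"{{{DWC}}}{name}").text = value
    for row in extension_rows:
        ext = ET.SubElement(record, f"{{{EXT_A}}}record")
        for name, value in row.items():
            ET.SubElement(ext, f"{{{EXT_A}}}{name}").text = value
    return ET.tostring(record, encoding="unicode")


xml_text = star_to_xml(
    "http://example.org/specimen/1",
    {"ScientificName": "Aster alpinus"},
    [{"property1": "a", "property2": "b"}],
)
```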
That touches a different issue I am facing with the star scheme, by
the way. I have created an identification extension for Darwin Core
that holds the historical list of identification events and their
outcome. This is a YAML section of the metafile describing the columns
for this extension through fully qualified concepts à la TAPIR:
identification:
 - http://rs.tdwg.org/dwc/dwcore/ScientificName
 - http://rs.tdwg.org/dwc/dwcore/AuthorYearOfScientificName
 - http://rs.tdwg.org/dwc/dwcore/Family
 - http://rs.tdwg.org/dwc/dwcore/IdentificationQualifier
 - http://rs.tdwg.org/dwc/curatorial/DateIdentified
 - http://rs.tdwg.org/dwc/curatorial/IdentifiedBy
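
As an illustration of how a consumer might use such a section, the
sketch below labels the values of one extension row with these
concepts. It assumes that the first column of the tab file is the
pointer to the core record and that the metafile lists only the
remaining columns, in order; the metafile does not say so explicitly.
The list is written out as a Python literal rather than parsed from
YAML, and the function name and sample row are hypothetical.

```python
# Concepts copied from the "identification" section of the metafile.
IDENTIFICATION_COLUMNS = [
    "http://rs.tdwg.org/dwc/dwcore/ScientificName",
    "http://rs.tdwg.org/dwc/dwcore/AuthorYearOfScientificName",
    "http://rs.tdwg.org/dwc/dwcore/Family",
    "http://rs.tdwg.org/dwc/dwcore/IdentificationQualifier",
    "http://rs.tdwg.org/dwc/curatorial/DateIdentified",
    "http://rs.tdwg.org/dwc/curatorial/IdentifiedBy",
]


def label_row(row, concepts):
    """Split off the pointer column and key the rest by concept URI."""
    pointer, values = row[0], row[1:]
    if len(values) != len(concepts):
        raise ValueError("row does not match the columns in the metafile")
    return pointer, dict(zip(concepts, values))


pointer, fields = label_row(
    ["54321", "Aster alpinus", "L.", "Asteraceae", "", "1913-03-12", "Karl Marx"],
    IDENTIFICATION_COLUMNS,
)
```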
When creating this I realised that pretty much all the concepts I was
interested in already existed in Darwin Core or the curatorial
extension. Wouldn't it be wise to reuse those concepts? Or are they
strictly tied to the idea of a current identification and therefore
can't be used for historical ones? This is probably more of a Darwin
Core question than a TAPIR one, but we are all on this list anyway ...
The XML in that case would look something like this:
<record uri="http://mygarden.com/specimen/plants/54321-423-43-54-6-3-24-44">
 <dwc:ScientificName>Aster alpinus subsp. parviceps</dwc:ScientificName>
 ...
 <ident:record>
   <dwc:ScientificName>Aster alpinus</dwc:ScientificName>
   <dwc:AuthorYearOfScientificName>L.</dwc:AuthorYearOfScientificName>
   <dwc:Family>Asteraceae</dwc:Family>
   <cur:DateIdentified>1913-03-12</cur:DateIdentified>
   <cur:IdentifiedBy>Karl Marx</cur:IdentifiedBy>
 </ident:record>
 <ident:record>
   <dwc:ScientificName>Aster alpinus subsp. parviceps</dwc:ScientificName>
   <dwc:AuthorYearOfScientificName>Novopokr.</dwc:AuthorYearOfScientificName>
   <dwc:Family>Asteraceae</dwc:Family>
   <cur:DateIdentified>2003-09-07</cur:DateIdentified>
   <cur:IdentifiedBy>Keith Richards</cur:IdentifiedBy>
 </ident:record>
</record>
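
To show that a consumer can read the identification history out of
such a record with little effort, here is a sketch using Python's
standard library. The namespace declarations are not part of the
example above and are assumed here: the dwc and cur URIs follow the
concept URIs in the metafile, and the ident URI is hypothetical. The
sample is a shortened copy of the record.

```python
import xml.etree.ElementTree as ET

NS = {
    "dwc": "http://rs.tdwg.org/dwc/dwcore/",      # assumed from the metafile
    "cur": "http://rs.tdwg.org/dwc/curatorial/",  # assumed from the metafile
    "ident": "http://example.org/ident/",         # hypothetical
}

SAMPLE = """<record xmlns:dwc="http://rs.tdwg.org/dwc/dwcore/"
        xmlns:cur="http://rs.tdwg.org/dwc/curatorial/"
        xmlns:ident="http://example.org/ident/"
        uri="http://mygarden.com/specimen/plants/54321-423-43-54-6-3-24-44">
  <dwc:ScientificName>Aster alpinus subsp. parviceps</dwc:ScientificName>
  <ident:record>
    <dwc:ScientificName>Aster alpinus subsp. parviceps</dwc:ScientificName>
    <cur:DateIdentified>2003-09-07</cur:DateIdentified>
    <cur:IdentifiedBy>Keith Richards</cur:IdentifiedBy>
  </ident:record>
  <ident:record>
    <dwc:ScientificName>Aster alpinus</dwc:ScientificName>
    <cur:DateIdentified>1913-03-12</cur:DateIdentified>
    <cur:IdentifiedBy>Karl Marx</cur:IdentifiedBy>
  </ident:record>
</record>"""


def identification_history(xml_text):
    """Return the identification events of a record, oldest first."""
    root = ET.fromstring(xml_text)
    history = []
    for rec in root.findall("ident:record", NS):
        history.append({
            "name": rec.findtext("dwc:ScientificName", namespaces=NS),
            "date": rec.findtext("cur:DateIdentified", namespaces=NS),
            "by": rec.findtext("cur:IdentifiedBy", namespaces=NS),
        })
    # ISO 8601 dates sort correctly as plain strings.
    return sorted(history, key=lambda item: item["date"])


history = identification_history(SAMPLE)
```

Because the nested records reuse the dwc and cur concepts, the same
lookups work for the current identification on the outer record.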
Markus
_______________________________________________
tdwg-tapir mailing list
tdwg-tapir@lists.tdwg.org
http://lists.tdwg.org/mailman/listinfo/tdwg-tapir

Roger Hyam