[tdwg-content] Latest DwC as TAPIR output model?
Tim Robertson (GBIF)
trobertson at gbif.org
Sun Aug 8 11:34:36 CEST 2010
> Thanks all for your responses.
> I now understand the Simple Darwin Core a lot better.
> So we have an official tapir model for Simple Darwin Core and that
> will suffice for us for now.
> If we wish to implement it in Tapirlink what is required?
> Another option is to use IPT instead of Tapirlink to provide our
> data - is that correct?
> How stable is IPT in terms of development - is it still beta?
> Is IPT difficult to install and what is required?
You could evaluate the IPT for this, yes.
There will shortly be a release of the IPT (the RC4 release which will
be available for first testing next week) which will have some
enhancements and bug fixes and I would recommend waiting for this
before evaluating. The IPT has a TAPIR interface, although I am not
sure how much it has been tested in live deployments, as GBIF use the
DarwinCore Archive interface to harvest content as it is far quicker.
In addition, GBIF has initiated a major refactoring of the IPT to
address the feedback received in the first year of its use.
Specifically, users have requested that it be lightened up
significantly (e.g. server requirements), made easier to install and
configure, given better performance, and made more intuitive on some
of the pages on the user interface relating to the publication state
and the registration. We are starting a 4-week coding camp on this
tomorrow, and I anticipate testing of certain components to begin
towards the end of September, known as a "pre-RC5 testing phase".
This will not be a complete IPT and we will canvass the community to
determine how much functionality should be supported by the IPT for
RC5. This means there is a possibility that the IPT will not support
TAPIR but only the DarwinCore-Archive output format. Following RC5 we
will only address bugs and halt new functionality additions until the
IPT is considered fully stable and considered to be at a General
If you have experience using, and are happy with TAPIRLink and the
operation of your TAPIR network then TAPIRLink would likely be the
best product to use for the time being, and you might consider
evaluating the IPT either at RC4, or get involved around the time RC5
components are available for testing.
What the IPT provides:
- the publishing of checklist and occurrence information from either
SQL sources or CSV, tab file (etc) upload
- the ability to author dataset metadata documents to accompany the
data, or where the dataset is not digitally available (e.g.
documenting undigitised collections)
- a web application in the IPT for browsing each dataset
- registration with GBIF, associating the dataset with the institutions
of interest (the physical owners and the virtual hosting institutes)
- Fast full dataset transfer: due to the DarwinCore-Archive output
format, it becomes trivial to transfer an entire dataset in a single
- Support "many to one" extensions to the core records (occurrence or
taxon). Extension definitions may be registered for others to use.
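To illustrate the last two points, here is a minimal Python sketch of reading a zipped archive that holds one core file and one "many to one" extension file. The file names (occurrence.txt, measurement.txt), the tab-delimited layout and the id/coreid column names are assumptions made for this example only; a real DarwinCore-Archive describes its own files and columns in its descriptor, which this sketch does not parse.

```python
import csv
import io
import zipfile
from collections import defaultdict


def build_example_archive() -> bytes:
    """Build a tiny in-memory zip with a core file and one extension file.

    File names and columns are hypothetical, chosen for illustration.
    """
    core = (
        "id\tscientificName\tcountry\n"
        "1\tMacropus rufus\tAustralia\n"
        "2\tPhascolarctos cinereus\tAustralia\n"
    )
    ext = (
        "coreid\tmeasurementType\tmeasurementValue\n"
        "1\tweightInGrams\t120\n"
        "1\tlengthInMillimeters\t850\n"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("occurrence.txt", core)
        zf.writestr("measurement.txt", ext)
    return buf.getvalue()


def read_rows(zf: zipfile.ZipFile, name: str) -> list:
    """Read one tab-delimited member of the zip into a list of dicts."""
    with zf.open(name) as fh:
        text = io.TextIOWrapper(fh, encoding="utf-8", newline="")
        return list(csv.DictReader(text, delimiter="\t"))


def load_archive(data: bytes) -> list:
    """Return (core_row, [extension_rows]) pairs, joined on id/coreid."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        core_rows = read_rows(zf, "occurrence.txt")
        ext_rows = read_rows(zf, "measurement.txt")
    by_core = defaultdict(list)
    for row in ext_rows:
        by_core[row["coreid"]].append(row)
    return [(row, by_core.get(row["id"], [])) for row in core_rows]


records = load_archive(build_example_archive())
for core_row, extensions in records:
    print(core_row["scientificName"], len(extensions))
```

The whole dataset travels as one zip, and each core record can carry any number of extension rows, which is what a flat schema cannot express.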
I hope this helps guide your decision. Please let me know if I can
provide more information on the IPT.
> Really appreciate your quick feedback on this stuff.
> Best Regards
> From: Markus Döring [m.doering at mac.com]
> Sent: 08 August 2010 05:09
> To: Tim Robertson
> Cc: tdwg-content at lists.tdwg.org; Dave Martin; Ajay Ranipeta; Paul
> Subject: Re: [tdwg-content] Latest DwC as TAPIR output model?
> not sure where exactly the conversation started, but I assume you
> are aware of the official tapir model for the simple darwin core? It
> looks as if this is up to date:
> In case you would like a flexible model that also allows for non
> flat extension records, we have created a new terms dwc tapir model
> which is used by the IPT:
> The schema referenced is this one:
> It defines a "DarwinExtensions" hook element in addition to the
> regular dwc terms, but it places no restrictions on how the
> extensions are constructed; that is left wide open.
> On Aug 7, 2010, at 9:47, Tim Robertson (GBIF) wrote:
>> Hi all,
>> This thread started as a query about the availability of the latest
>> DwC in TAPIR output models, and specifically for use with TAPIRLink.
>> John correctly points out that the TDWG list should have been
>> copied from the start, and I am remedying that now. I have
>> condensed the thread to only the main points to make it easy to
>> digest, and suggest we continue the discussion here. The current
>> point in the discussion relates to the SimpleDarwinCore being flat
>> in nature, and therefore why it holds a subset of all the DwC terms
>> documented in the standard.
>> On Aug 7, 2010, at 7:59 AM, John Wieczorek wrote:
>>> Hi folks,
>>> I wish that this conversation had taken place on the tdwg-content
>>> list so that others might take advantage of the discussion, but I
>>> don't feel comfortable moving someone else's conversations there.
>> Good point John. Sorry all, this was my mistake.
>>> The Simple Darwin Core Schema is up to date. It does not contain
>>> the MeasurementOrFact terms, nor the ResourceRelationship terms,
>>> precisely because, as Renato said, the Simple Darwin Core is flat
>>> and the other terms in those two sets make little sense in a flat
>>> schema because they would allow you to share no more than one
>>> MeasurementOrFact and one ResourceRelationship.
>>> There is a Generic Darwin Core schema that provides a model for
>>> building other schemas from the Darwin Core, but it is not an
>>> application schema as the SimpleDarwinCore schema is. At least two
>>> groups are using the Generic Darwin Core schema imported into
>>> other schemas to extend the capabilities of the Generic Darwin
>>> Core - the germ plasm folks and the Apiary folks. The former have
>>> a published schema at http://code.google.com/p/darwincore/source/browse/#svn/
>>> trunk/xsd/profiles/germplasm, while the latter are working on one
>>> for herbarium sheet data entry, for which they are interested in
>>> many more types of annotations than just the Identification class
>>> found in Darwin Core presently.
>>> If there is more that you want to share than just the Simple
>>> Darwin Core, but don't actually need a more complex structure, one
>>> simple way to do it would be to use the dynamicProperties term
>>> from Simple Darwin Core. The description is at http://rs.tdwg.org/dwc/terms/index.htm#dynamicProperties
>>> and further explanation on its use can be found in the "Do
>>> More with Simple Darwin Core" section (http://rs.tdwg.org/dwc/terms/simple/index.htm#domore
>>> ) of the Simple Darwin Core page.
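As a rough illustration of the dynamicProperties approach described above, the following Python sketch packs a few extra values into a single string field on an otherwise flat record and reads them back. The "key=value; key=value" encoding, the helper names and the sample record are assumptions for this example; the term description linked above is the place to check what format is actually recommended.

```python
def pack_dynamic_properties(extra: dict) -> str:
    """Join extra key/value pairs into one string (assumed encoding)."""
    return "; ".join(f"{key}={value}" for key, value in extra.items())


def unpack_dynamic_properties(text: str) -> dict:
    """Split the string back into a dict; all values come back as strings."""
    result = {}
    for item in text.split(";"):
        if "=" not in item:
            continue  # skip empty or malformed fragments
        key, value = item.split("=", 1)
        result[key.strip()] = value.strip()
    return result


# A flat record: every field holds a single value, including the packed one.
record = {
    "catalogNumber": "M12345",  # hypothetical sample value
    "scientificName": "Macropus rufus",
    "dynamicProperties": pack_dynamic_properties(
        {"weightInGrams": 120, "tailLengthInMillimeters": 850}
    ),
}

extra = unpack_dynamic_properties(record["dynamicProperties"])
print(record["dynamicProperties"])
print(extra)
```

This keeps the record flat while still carrying more than one measurement, at the cost of the consumer having to parse the packed string.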
>>> Hope that helps,
>>> Simple Darwin Core is almost what I want but looking at the schema
>>> here http://rs.tdwg.org/dwc/xsd/tdwg_dwc_simple.xsd
>>> it is lacking the auxiliary terms that are listed here http://rs.tdwg.org/dwc/terms/index.htm
>>> - which are the Measurement or Fact terms and the Resource
>>> Relationship terms.
>>> So ideally I would like all the terms listed here http://rs.tdwg.org/dwc/terms/index.htm
>>> included; however, if that is too difficult or would take too long
>>> then I would be happy with the Simple Darwin Core schema as
>>> described here http://rs.tdwg.org/dwc/xsd/tdwg_dwc_simple.xsd
>>> I'm not sure if the simple DwC schema is up-to-date, but it seems to
>>> contain more than 150 elements. I think the idea behind "simple" lies
>>> in the fact that the schema defines a flat structure, not that it
>>> contains a subset of DarwinCore.
>>> I can check if that schema is up-to-date and then build the output
>>> but first I would just need to know from you guys if this kind of
>>> will suit your needs.
>>>> I was hoping the whole schema :-)
>>>> Here in Australia the OZCAM community (all Australian museums) has just
>>>> migrated its schema to the new standard - using about 60 or 70 of the
>>>> fields so we would like to be able to expose all of them.
>>>> With my TDWG exec hat on I'd like to think that any data
>>>> providers could
>>>> expose their data using the new DwC schema and GBIF being the
>>>> consumer of data should be doing their utmost to facilitate that
>>>> TapirLink is the ideal way to do that as many are using it
>>>> already and so
>>>> upgrading providers to the new schema using Tapirlink should be
>>>>> I would think an output based on the simple DwC should be made
>>>>> and GBIF will promote that with TAPIRLink, but I leave it to
>>>>> Paul to
>>>>> shout if he anticipates something else?
>>>>>> You're right. There's no output model for the latest DwC, but
>>>>>> it should be
>>>>>> easy to make one. Is there a specific XML Schema for the data
>>>>>> are you planning to use the simple DwC?
>>>>>> If you want, just let me know the schema and I can quickly
>>>>>> build a
>>>>>> for it.
>>>>>>> Do you know if anyone put together output models for the TAPIR
>>>>>>> the latest DwC?
>>>>>>> http://rs.tdwg.org/tapir/cs/dwc/ does not appear to have any
>>>>>>> and the
>>>>>>> folks in Australia are looking into this. Just want to make
>>>>>>> there aren't any floating around already.