[tdwg-tapir] DwC extensions

Dave Vieglais vieglais at ku.edu
Thu May 22 23:51:28 CEST 2008

Hi John,
0. Is the DarwinCore a) a data model or b) a set of common terms used
for searching species related data (i.e. indexes)?  The original
intent of the DwC is (b).  Its use as a data model was secondary.
Both are required but are not necessarily the same thing and much
confusion seems to arise when the two intents are mixed.

1. Specimens and observations are within scope of the DwC, but from an
IR point of view, it would be good if DwC elements / concepts can be
used to index other content (e.g. search for documents by
dwc:ScientificName), thus providing a common mechanism for discovery
of species related information.

2. Minimalist with appropriate mechanisms for extension.  The current
situation is ridiculous with 40 or so versions floating around, all
with overlapping concepts, but no mapping between definitions.  There
is a (crude) extension mechanism that, as far as I know, has only been
used by OBIS.  All other versions of the DwC have completely ignored
extension, thus leading to the current incompatibility between many
data providers.  There needs to be a robust core that ideally is never
changed, and mechanisms for extension so that specialist groups can
modify the model to their needs without losing interoperability.

For search terms (0.b), this can be pretty much a simple vocabulary
(list of terms).  For a data model (0.a) there are many mechanisms.
The semantic web offers many examples of content definitions that can
be reused, embedded, and extended.
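
As a rough illustration of the "simple vocabulary plus extension" idea
for search terms, here is a minimal Python sketch.  Everything in it is
an assumption made for the example: the term names other than
ScientificName, the "paleo" prefix, and the resolve_term helper are
hypothetical and not part of any DwC version.

```python
# Hypothetical core vocabulary: a flat set of search terms.
# Only ScientificName comes from the discussion; the rest are illustrative.
CORE_TERMS = frozenset({"ScientificName", "Country", "CollectionCode"})

# Hypothetical extension: it adds its own namespaced terms and never
# redefines anything in the core.
PALEO_EXTENSION = {"prefix": "paleo", "terms": frozenset({"ExampleEpoch"})}


def resolve_term(qualified_name, extensions=(PALEO_EXTENSION,)):
    """Return (namespace, term) if the qualified name is known, else None."""
    prefix, _, term = qualified_name.partition(":")
    if prefix == "dwc" and term in CORE_TERMS:
        return ("dwc", term)
    for ext in extensions:
        if prefix == ext["prefix"] and term in ext["terms"]:
            return (prefix, term)
    return None
```

The point of the sketch is only that the core set stays untouched: a
group adds terms under its own prefix, and a provider that knows only
the core can still answer every dwc: query.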

3. Extension is necessary when the semantics of the available
definitions are insufficient for an application.  A group (any group)
should be able to create an extension without fear of breaking the
DarwinCore.  Ideally, any creator of an extension should carefully
evaluate existing extensions and use those where appropriate.

4.  There should be a search term "GUID" in the core.  Content models
do not need to contain a GUID (though they should), but must be
identifiable and resolvable by a GUID.  Relationships between objects
should be through GUIDs.  Sounds a lot like RDF.
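
To make the RDF comparison concrete, here is a small Python sketch of
objects that are identified by GUIDs and related only through GUIDs.
All identifiers, the "ex:identifiedAs" predicate, and both helper
functions are made-up placeholders for illustration; it uses plain
tuples rather than a real RDF library.

```python
# Placeholder identifiers, not real GUIDs.
OCCURRENCE = "urn:example:occurrence:1"
TAXON = "urn:example:taxon:42"

# Statements as (subject, predicate, object) triples, in the spirit of RDF.
TRIPLES = [
    (OCCURRENCE, "dwc:ScientificName", "Puma concolor"),
    (OCCURRENCE, "ex:identifiedAs", TAXON),  # hypothetical predicate
    (TAXON, "dwc:ScientificName", "Puma concolor"),
]


def describe(guid, triples=TRIPLES):
    """Resolve a GUID to the properties asserted about it."""
    props = {}
    for subject, predicate, obj in triples:
        if subject == guid:
            props.setdefault(predicate, []).append(obj)
    return props


def related(guid, triples=TRIPLES):
    """Return the GUIDs of other described objects that this one links to."""
    subjects = {subject for subject, _, _ in triples}
    return [obj for subject, _, obj in triples
            if subject == guid and obj in subjects]
```

Here the occurrence record carries no embedded taxon data; the link is
just the taxon's GUID, which can be resolved separately with describe().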

5.  The DRAST (Darwin Record Application Schema for Tapir) looks ok,
but is really orthogonal to the approach taken by a much broader
community (e.g. examine FOAF, DC, etc.), though in line with the OGC
models.  Who knows which is "better"?  Both work, though the latter
approach is arguably more difficult to utilize in "mashup" approaches
/ applications.

6. (0.a) Defining a data model for interoperability and integration of
content implies restrictive constraints on element definitions.  (0.b)
the search engine (data provider) should be able to figure it out (lax

Dave V.

On Thu, May 22, 2008 at 1:15 PM, John R. WIECZOREK <tuco at berkeley.edu> wrote:
> I got sidetracked on this days ago, but feel in the light of recent star
> 1) Is species occurrence in nature and in collections the right scope for
> the Core?
> 2) Should the general philosophy of the Core be inclusive or minimalist?
> What are the characteristics of a concept that allow it to be in the Core?
> What are the characteristics of a concept that allow it to be added to an
> existing extension?
> 3) What are the defining characteristics of a group of related concepts that
> justify the creation of a new extension? Should extensions be based on
> abstract conceptual groupings/objects (events,
> identifications/determinations, places)? Or on special interests (paleo,
> curation, interaction)? Or on the stability of the concepts (core contains
> the proven stable concepts, extensions are more volatile)?
> 4) Should there be elements in the Core and extensions to hold GUIDs linking
> them to instances of related classes of objects, such as an occurrence to a
> TaxonConceptGUID, or an occurrence to a CoreGatheringGUID? Should every
> extension have a non-mandatory GUID allowing for the external resolution of
> the object?
> 5) What should the Darwin Tapir application schema look like?
> 6) Is it the right approach to have restrictions on content at the concept
> definition level? Where should the line be drawn? Arguments have been raised
> in the past about the DwC and extensions' content with respect to
> being restrictive versus open to incorrect content. For example, DayOfYear
> in the current DwC 1.4 (http://rs.tdwg.org/dwc/tdwg_dw_core.xsd) is typed as
> a dwc:dayOfYearDataType, which is defined
> in http://rs.tdwg.org/dwc/tdwg_basetypes.xsd as:
> <xs:simpleType name="dayOfYearDataType">
>   <xs:restriction base="xs:integer">
>     <xs:minInclusive value="1" />
>     <xs:maxInclusive value="366" />
>   </xs:restriction>
> </xs:simpleType>
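
For comparison, the dayOfYearDataType restriction quoted above amounts
to the following check, sketched here in Python.  The function names
and the strict/lax switch are assumptions made for illustration, showing
one way a restrictive data model and a more forgiving search index might
treat the same value; neither is part of the DwC schemas.

```python
def is_valid_day_of_year(value):
    """Mirror the quoted restriction: an integer from 1 to 366 inclusive."""
    if isinstance(value, bool) or not isinstance(value, int):
        return False
    return 1 <= value <= 366


def parse_day_of_year(text, strict=True):
    """Parse a DayOfYear value.

    strict=True  rejects content that violates the restriction.
    strict=False keeps the raw text so it can still be indexed as-is.
    """
    try:
        day = int(text.strip())
    except ValueError:
        day = None
    if day is not None and is_valid_day_of_year(day):
        return day
    if strict:
        raise ValueError(f"not a valid DayOfYear: {text!r}")
    return text
```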
