Hi John,

0. Is the DarwinCore a) a data model or b) a set of common terms used for searching species-related data (i.e. indexes)? The original intent of the DwC is (b). Its use as a data model was secondary. Both are required but are not necessarily the same thing, and much confusion seems to arise when the two intents are mixed.
1. Specimens and observations are within the scope of the DwC, but from an information retrieval (IR) point of view it would be good if DwC elements / concepts could be used to index other content (e.g. search for documents by dwc:ScientificName), thus providing a common mechanism for discovery of species-related information.
2. Minimalist, with appropriate mechanisms for extension. The current situation is ridiculous, with 40 or so versions floating around, all with overlapping concepts but no mapping between definitions. There is a (crude) extension mechanism that, as far as I know, has only been used by OBIS. All other versions of the DwC have completely ignored extension, leading to the current incompatibility between many data providers. There needs to be a robust core that ideally is never changed, and mechanisms for extension so that specialist groups can modify the model to their needs without losing interoperability.
For search terms (0.b), this can be pretty much a simple vocabulary (list of terms). For a data model (0.a) there are many mechanisms. The semantic web offers many examples of content definitions that can be reused, embedded, and extended.
3. Extension is necessary when the semantics of the available definitions are insufficient for an application. A group (any group) should be able to create an extension without fear of breaking the DarwinCore. Ideally, any creator of an extension should carefully evaluate existing extensions and use those where appropriate.
4. There should be a search term "GUID" in the core. Content models do not need to contain a GUID (though they should), but must be identifiable and resolvable by a GUID. Relationships between objects should be through GUIDs. Sounds a lot like RDF.
5. The DRAST (Darwin Record Application Schema for Tapir) looks OK, but is really orthogonal to the approach taken by a much broader community (e.g. examine FOAF, DC, etc.), though in line with the OGC models. Who knows which is "better"? Both work, though the latter approach is arguably more difficult to utilize in "mashup" approaches / applications.
6. (0.a) Defining a data model for interoperability and integration of content implies restrictive constraints on element definitions. (0.b) The search engine (data provider) should be able to figure it out (lax definitions).
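To make points 0.b, 1 and 4 a little more concrete, here is a toy Python sketch of GUID-keyed statements in the spirit of RDF. Everything in it is hypothetical: the `urn:example:` GUIDs, the `TripleStore` class and the `example:taxonConcept` predicate are illustrative only, and it is not RDF tooling or any TDWG specification. It shows a specimen record and an unrelated document both indexed by the same DwC search term, and a relationship expressed only through GUIDs.

```python
from collections import defaultdict

# Hypothetical GUIDs; any resolvable identifier scheme could stand in here.
SPECIMEN = "urn:example:specimen:1"
DOCUMENT = "urn:example:doc:42"
TAXON = "urn:example:taxon:7"


class TripleStore:
    """Toy store of (subject, predicate, object) statements keyed by GUID."""

    def __init__(self):
        self.triples = set()
        # Inverted index: (search term, value) -> set of subject GUIDs.
        self.index = defaultdict(set)

    def add(self, subject, predicate, obj):
        self.triples.add((subject, predicate, obj))
        self.index[(predicate, obj)].add(subject)

    def find(self, term, value):
        """Return the sorted GUIDs of all content indexed under term=value."""
        return sorted(self.index.get((term, value), set()))

    def objects(self, subject, predicate):
        """Return the sorted objects linked from subject by predicate."""
        return sorted(o for s, p, o in self.triples if s == subject and p == predicate)


store = TripleStore()
# A specimen record and a document, both indexed by the same DwC search term.
store.add(SPECIMEN, "dwc:ScientificName", "Puma concolor")
store.add(DOCUMENT, "dwc:ScientificName", "Puma concolor")
# The relationship points at a GUID instead of embedding the taxon record.
store.add(SPECIMEN, "example:taxonConcept", TAXON)  # hypothetical predicate

print(store.find("dwc:ScientificName", "Puma concolor"))
# → ['urn:example:doc:42', 'urn:example:specimen:1']
print(store.objects(SPECIMEN, "example:taxonConcept"))
# → ['urn:example:taxon:7']
```

The search side only needs the term list (a simple vocabulary); the content behind each GUID can follow whatever model its provider chooses.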
Dave V.
On Thu, May 22, 2008 at 1:15 PM, John R. WIECZOREK tuco@berkeley.edu wrote:
I got sidetracked on this days ago, but feel in the light of recent star
...
1) Is species occurrence in nature and in collections the right scope for the Core?
2) Should the general philosophy of the Core be inclusive or minimalist? What are the characteristics of a concept that allow it to be in the Core? What are the characteristics of a concept that allow it to be added to an existing extension?
3) What are the defining characteristics of a group of related concepts that justify the creation of a new extension? Should extensions be based on abstract conceptual groupings/objects (events, identifications/determinations, places)? Or on special interests (paleo, curation, interaction)? Or on the stability of the concepts (core contains the proven stable concepts, extensions are more volatile)?
4) Should there be elements in the Core and extensions to hold GUIDs linking them to instances of related classes of objects, such as an occurrence to a TaxonConceptGUID, or an occurrence to a CoreGatheringGUID? Should every extension have a non-mandatory GUID allowing for the external resolution of the object?
5) What should the Darwin Tapir application schema look like?
6) Is it the right approach to have restrictions on content at the concept definition level? Where should the line be drawn? Arguments have been raised in the past about the DwC and extensions' content with respect to being restrictive versus open to incorrect content. For example, DayOfYear in the current DwC 1.4 (http://rs.tdwg.org/dwc/tdwg_dw_core.xsd) is typed as a dwc:dayOfYearDataType, which is defined in http://rs.tdwg.org/dwc/tdwg_basetypes.xsd as:
<xs:simpleType name="dayOfYearDataType">
  <xs:restriction base="xs:integer">
    <xs:minInclusive value="1" />
    <xs:maxInclusive value="366" />
  </xs:restriction>
</xs:simpleType>
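The restrictive-versus-open trade-off in question 6 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names are hypothetical, the bounds are copied by hand from dayOfYearDataType rather than read from the XSD, and the lax variant simply returns None instead of rejecting the record.

```python
from typing import Optional

# Bounds copied by hand from dwc:dayOfYearDataType (minInclusive / maxInclusive).
MIN_DAY, MAX_DAY = 1, 366


def strict_day_of_year(raw: str) -> int:
    """Reject anything the schema restriction would reject (raises ValueError)."""
    value = int(raw.strip())  # raises ValueError on non-integers such as "12.5"
    if not MIN_DAY <= value <= MAX_DAY:
        raise ValueError(f"dayOfYear out of range: {value}")
    return value


def lax_day_of_year(raw: str) -> Optional[int]:
    """Accept the record as-is; return a usable value only when one can be derived."""
    try:
        return strict_day_of_year(raw)
    except ValueError:
        return None  # the consumer decides what to do with unusable content


for raw in ["45", "366", "0", "400", "mid-June"]:
    print(raw, lax_day_of_year(raw))
```

Like the schema itself, the strict check accepts 366 regardless of whether the year is a leap year; catching that would require looking at another element, which a single-element type restriction cannot do.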