tdwg-tag
May 2006
- 22 participants
- 41 discussions
[tdwg-tapir] Schema modification: r534 - tags/protocol_proposal_2004 trunk/protocol
by tdwg-tapir@lists.tdwg.org 19 May '06
Author: markus
Date: 2006-05-19 14:52:27 +0200 (Fri, 19 May 2006)
New Revision: 534
Added:
tags/protocol_proposal_2004/report_korean_v0.1.pdf
trunk/protocol/docs/
Removed:
trunk/protocol/Makefile
trunk/protocol/makepdf
trunk/protocol/report.pdf
trunk/protocol/report.txt
trunk/protocol/report_korean_v0.1.pdf
Log:
moved 2004 report into 2004 tag and removed it from trunk. Added new documentation "docs" folder
Copied: tags/protocol_proposal_2004/report_korean_v0.1.pdf (from rev 533, trunk/protocol/report_korean_v0.1.pdf)
Deleted: trunk/protocol/Makefile
===================================================================
--- trunk/protocol/Makefile 2006-05-17 14:22:41 UTC (rev 533)
+++ trunk/protocol/Makefile 2006-05-19 12:52:27 UTC (rev 534)
@@ -1,25 +0,0 @@
-#########
-### Latex
-
-%.pdf: %.ps
-	ps2pdf $<
-
-%.ps: %.dvi
-	dvips -Ppdf $< -o $@
-
-%.rtf: %.tex
-	latex2rtf -o $@ $<
-
-%.dvi: %.tex
-	latex $<
-
-%.eps: %.dia
-	dia --nosplash -e $@ $<
-
-
-cleantex:
-	rm -f *~ *.dvi *.log *.aux
-
-cleanps: cleantex
-	rm -f *.ps
-
Deleted: trunk/protocol/makepdf
===================================================================
--- trunk/protocol/makepdf 2006-05-17 14:22:41 UTC (rev 533)
+++ trunk/protocol/makepdf 2006-05-19 12:52:27 UTC (rev 534)
@@ -1,4 +0,0 @@
-echo "Transforming to Tex format..."
-txt2tags report.txt
-echo "Transforming to PDF format..."
-make report.pdf
Deleted: trunk/protocol/report.pdf
===================================================================
(Binary files differ)
Deleted: trunk/protocol/report.txt
===================================================================
--- trunk/protocol/report.txt 2006-05-17 14:22:41 UTC (rev 533)
+++ trunk/protocol/report.txt 2006-05-19 12:52:27 UTC (rev 534)
@@ -1,2770 +0,0 @@
-GBIF Data Access and Database Interoperability \\[20mm] A unified protocol for search and retrieval of distributed data
-Markus Döring \& Renato De Giovanni
-September 2004
-%! Target : tex
-%! Options : --toc --toc-level 6 --enum-title
-%! Encoding : iso-8859-1
-%! PostProc(tex): '\\section' '\\newpage\n\\section'
-%! PostProc(tex): '\[\[' '\\footnote{\\tiny{'
-%! PostProc(tex): '\]\]' '}}'
-%! PostProc(tex): '#' '\#'
-%! PostProc(tex): '--blankline--' '\\addvspace{5mm}'
-%! PostProc(tex): '--newpage--' '\\newpage'
-%! PostProc(tex): '--breakline--' '\\newline'
-
-
-= Summary =
-
-This report is the result of a study promoted by the Global Biodiversity Information Facility (GBIF) to integrate two of the main protocols being used for search and retrieval of biodiversity data. Together, DiGIR and BioCASe are currently serving approximately 40 million records from distributed and heterogeneous databases of biological collections and observation data holders.
-
-The unification of DiGIR and BioCASe will bring greater interoperability between the existing networks, therefore facilitating development of tools, stimulating participation of new data providers, and increasing access to biodiversity data. This effort is also seen as a considerable step towards having a single standard protocol to be used within the biodiversity area.
-
-This document contains a review of both protocols, a general study about other specifications and technologies that could serve as a potential source for new ideas, an integration proposal that could be implemented by the existing tools in a short term, and finally some additional recommendations.
-
-= Acknowledgements =
-
-The authors of this report wish to thank all individuals and institutions that contributed to and participated in various aspects of this work, in particular:
-
-Anton Güntsch, Dave Vieglais, Donald Hobern, Javier de la Torre, John Wieczorek, Stan Blum for their permanent contribution and assistance during the whole process.
-
-Gregor Hagedorn, Mauro Muñoz, Ronald Bourret, Wolfgang Lipp for all suggestions and healthy discussions.
-
-Global Biodiversity Information Facility (GBIF) for promoting and supporting the whole work.
-
-University of Kansas Natural History Museum and Biodiversity Research Center, California Academy of Sciences, and Museum of Vertebrate Zoology (MVZ) in Berkeley for creating and promoting DiGIR.
-
-Botanischer Garten und Botanisches Museum Berlin-Dahlem (BGBM) for creating and promoting BioCASE and for all assistance during this work.
-
-Centro de Referência em Informação Ambiental (CRIA) for its contributions to DiGIR and for all assistance during this work.
-
-= Revision history =
-
- || Date | Responsible | Notes |
- | 23 Sep 2004 | Markus Döring & Renato De Giovanni | Initial version |
- | 16 Nov 2004 | Renato De Giovanni | Minor corrections |
-
-= Introduction =
-
-During GBIF's Data Access and Database Interoperability subcommittee meeting held in Oaxaca in 2004, the importance of integrating DiGIR and BioCASe was fully recognized and considered a priority. Since then, GBIF has promoted a study involving representatives from both protocols.
-
-Specialists from the biodiversity community and representatives from the existing biodiversity networks have been contacted to provide comments and suggestions about the protocol they were using. Many people participated, and the whole work has been carried out by meetings, electronic messages and a public wiki web site[[http://ww3.bgbm.org/protocolwiki/]].
-
-Both original protocols have been reviewed and documented, and the study also included considerations taken from other existing standards such as XQuery, OGC specifications and SOAP.
-
-The results from that work have been documented in this report, including a complete proposal for the new integrated protocol.
-
-= Review of the main protocols =
-
-== DiGIR protocol ==
-
-=== Brief history ===
-
-DiGIR[[http://digir.net]] was conceived as a replacement for the Z39.50[[http://www.loc.gov/z3950/agency/]] protocol being used by the Species Analyst network (TSA)[[http://speciesanalyst.net]]. TSA was created as part of a research project that aimed at providing access to distributed natural history collections and observation databases with over 120 connected data sources.
-
-The Z39.50 protocol was considered too complicated, resulting in a steep learning curve for developers and in difficulty gaining acceptance from network administrators. Other technical reasons at that time included the lack of a more formal language to define conceptual schemas, and also limited support for XML and Unicode.
-
-The DiGIR protocol specification, its "default" conceptual schema known as DarwinCore[[http://speciesanalyst.net/docs/dwc/index.html]], and the development of the first related software originally involved three institutions: University of Kansas Natural History Museum and Biodiversity Research Center[[http://www.nhm.ku.edu]], California Academy of Sciences[[http://www.calacademy.org]], and Museum of Vertebrate Zoology[[http://www.mip.berkeley.edu/mvz/]] in Berkeley. The general strategy for the project was:
-
-- To use open protocols and standards (HTTP, XML and UDDI).
-
-- To clearly de-couple protocol, software and semantics.
-
-- To ease software installation and configuration for data providers as much as possible.
-
-
-Software tools were developed in a collaborative environment[[http://sourceforge.net/projects/digir]] following the "open source" model, and the results were made available under public license.
-
-The Species Analyst data providers are gradually migrating to DiGIR and building separate thematic networks such as Manis[[http://elib.cs.berkeley.edu/manis/]], HerpNet[[http://herpnet.org/]], FishNet[[http://habanero.nhm.ku.edu/fishnet/]] and ORNIS[[http://www.specifysoftware.org/Informatics/informaticsornis/]]. Other networks around the world have also adopted DiGIR as the main protocol (speciesLink[[http://splink.cria.org.br/]], OBIS[[http://iobis.org/]]) or as one of the supported protocols (GBIF[[http://www.gbif.net/]]).
-
-=== Description ===
-
-DiGIR is an XML-based protocol over HTTP to retrieve data from independent and heterogeneous databases. To achieve a uniform virtual view from network participants, each DiGIR resource needs to choose one or more conceptual schemas to use and then map some fields from a local database against the elements from those schemas. A conceptual schema is a common data model that serves as a reference on networks of federated databases.
-
-Different conceptual schemas can be defined by each network, and therefore the protocol can be used by any community regardless of the discipline. In DiGIR, a conceptual schema is represented as an XML Schema that follows a specific format and defines a set of elements (concepts) in a flat list. An example of such a schema is DarwinCore[[http://digir.net/schema/conceptual/darwin/2003/1.0/darwin2.xsd]], the conceptual schema originally proposed to be used by biological data sources in DiGIR-enabled networks.
-
-A typical DiGIR network using the components developed by the original project involves three different software components. The first one is client software, known as the "presentation layer". The client software interacts with end users through some interface, and communicates with another component known as the "portal engine". The portal engine is a query dispatcher. It knows the addresses of several data providers either by manual configuration or by interaction with UDDI registries. Queries are then distributed to the selected data providers, which are responsible for translating DiGIR messages to the query language used by local databases and retrieving the requested data.
-
-Other types of interaction and architecture are possible, including monitoring services, indexing services, and even portals on top of other portals.
-
-Data providers can serve data from several "resources". A DiGIR resource is seen as a local data source that has mapped its structure against a single conceptual schema, although a conceptual schema can be an extension from another one. These schemas provide a uniform view for heterogeneous resources. It is also known that a DiGIR resource can map its data structure against more than one conceptual schema, thus allowing for the existence of completely independent and modularized schemas. Networks can for instance map their DiGIR resources against a conceptual schema which is common among other networks, but also map additional conceptual schemas which could be specific to their context.
-
-The original scope of the DiGIR protocol schema[[http://digir.net/schema/protocol/2003/1.0/digir.xsd]] is to validate messages exchanged only with providers. Communication with resources can only be done through providers. In that case resources are referenced inside the messages (they don't have an access point such as the providers' URL). Messages exchanged between other components, like presentation layers and query dispatchers are not covered by DiGIR.
-
-=== Technical details ===
-
-**General message format**
-
-DiGIR messages have three main sections inside a <request> or <response> root element. The <header> section is present in all messages, and it includes information about the software version which produced the message, the time stamp, the URL or IP address from which the message originated, the URL or IP address to which the message is destined, and the type of request. Some types of request may have an additional attribute in the header destination element to reference a DiGIR resource.
-
-After the <header> there may be a content section depending on the type of message. In requests it may contain filter conditions and specifications about the concepts to be returned. In responses it contains the structured content being returned.
-
-Responses have an additional <diagnostics> section that carries information about possible errors, warnings, or simply some additional information such as the number of records that matched a query.
-
-Since a DiGIR provider can be seen as a query dispatcher regarding its local resources, a provider should be able to accept multiple destination elements in the header related to more than one local resource. In this case, responses from resources are concatenated and enclosed in a <responseWrapper> root element.
-
-The next pages cover all types of requests and responses with several examples of DiGIR messages.
-
---newpage--
-**Request and response types**
-
-There are three different request and response types that could be used to exchange messages with a DiGIR service:
-
---blankline--
-//DiGIR metadata operation//
-
-Metadata requests ask for information about a provider service and its resources. It is also the default request when a provider's URL is accessed without any parameters. Metadata requests contain only the header section, and the destination element should be a provider's access point.
-
-Sample DiGIR metadata request using an XML message:
-
-```
-<?xml version="1.0" encoding="utf-8" ?>
-<request xmlns="http://digir.net/schema/protocol/2003/1.0">
- <header>
- <version>1.0.0</version>
- <sendTime>2004-07-20T08:51:50-0500</sendTime>
- <source>127.0.0.1</source>
- <destination>
- http://example.net/provider/DiGIR.php
- </destination>
- <type>metadata</type>
- </header>
-</request>
-```
-
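As an illustration only (it is not part of the original report), a metadata request like the one above could be assembled programmatically. The following is a minimal Python sketch using the standard library; the function name `build_metadata_request` is hypothetical, and the element names and namespace are taken from the sample message.

```python
import xml.etree.ElementTree as ET

# Namespace of the DiGIR protocol schema, as used in the sample messages.
DIGIR_NS = "http://digir.net/schema/protocol/2003/1.0"


def build_metadata_request(access_point, source, send_time):
    """Build a DiGIR metadata request (hypothetical helper).

    A metadata request carries only the <header> section, and its
    destination is the provider's access point.
    """
    # Serialize the DiGIR namespace as the default namespace.
    ET.register_namespace("", DIGIR_NS)
    request = ET.Element(f"{{{DIGIR_NS}}}request")
    header = ET.SubElement(request, f"{{{DIGIR_NS}}}header")
    fields = (
        ("version", "1.0.0"),
        ("sendTime", send_time),
        ("source", source),
        ("destination", access_point),
        ("type", "metadata"),
    )
    for tag, text in fields:
        ET.SubElement(header, f"{{{DIGIR_NS}}}{tag}").text = text
    return request


request = build_metadata_request(
    "http://example.net/provider/DiGIR.php",
    "127.0.0.1",
    "2004-07-20T08:51:50-0500",
)
xml_text = ET.tostring(request, encoding="unicode")
```

How the serialized message is actually sent to the provider (for example as an HTTP parameter) is left out here, since this section does not specify it.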
-Metadata responses include general information about the provider (only name and access point); about the entity who is hosting the service (name, web site address, contact information, and an optional abstract); and about each available resource. Resources also have some general information such as name, code (used to reference resources in requests); content information such as number of available records, timestamp of the last updated record, abstract, keywords, and record basis; technical information such as maximum number of records that can be retrieved, minimum number of characters to be used in <like> operations; and other elements related to how to make citations and if there are any restrictions to use retrieved data. An important part of resource metadata is what conceptual schemas have been mapped.
-
---newpage--
-Sample DiGIR metadata response:
-
-```
-<?xml version="1.0" encoding="utf-8" ?>
-<response xmlns="http://digir.net/schema/protocol/2003/1.0">
- <header>
- <version>$Revision$</version>
- <sendTime>2004-07-20T08:51:55-0500</sendTime>
- <source>http://example.net:80/provider/DiGIR.php</source>
- <destination>127.0.0.1</destination>
- <type>metadata</type>
- </header>
- <content>
- <metadata>
- <provider>
- <name>Example Biodiversity Center</name>
- <accessPoint>
- http://example.net:80/provider/DiGIR.php
- </accessPoint>
- <implementation>$Revision$</implementation>
- <host>
- <name>Some Hosting Institution</name>
- <code>SHI</code>
- <relatedInformation>
- http://example.net/
- </relatedInformation>
- <contact type="technical">
- <name>Person A</name>
- <title>Support specialist</title>
- <emailAddress>person_a@example.net</emailAddress>
- <phone>+11 22 333333</phone>
- </contact>
- <abstract>Default hosting institution to be used
- in examples.
- </abstract>
- </host>
- <resource>
- <name>Herbarium dataset</name>
- <code>HDS</code>
- <relatedInformation>
- http://example.net/herbarium
- </relatedInformation>
- <contact type="administrative">
- <name>Person B</name>
- <title>Curator</title>
- <emailAddress>person_b@example.net</emailAddress>
- <phone>+11 22 334444</phone>
- </contact>
- <contact type="technical">
- <name>Person C</name>
- <title>Systems Analyst</title>
- <emailAddress>person_c@example.net</emailAddress>
- <phone>+11 22 334445</phone>
- </contact>
- <abstract>
- Plant specimen dataset from some region.
- </abstract>
- <keywords>plant, specimen</keywords>
- <citation>Herbarium dataset DiGIR provider.
- Retrieved on (date accessed).
- http://example.net:80/provider/DiGIR.php
- </citation>
- <useRestrictions>Users may not distribute, modify,
- transmit, reuse, repost, transfer, or use any
- content or information from this service for
- commercial purposes without prior written
- permission.
- </useRestrictions>
- <conceptualSchema schemaLocation=
- "http://example.net/schema/darwin2.xsd">
- http://digir.net/schema/conceptual/darwin/2003/1.0
- </conceptualSchema>
- <recordIdentifier>HDS</recordIdentifier>
- <recordBasis>specimen</recordBasis>
- <numberOfRecords>7887</numberOfRecords>
- <dateLastUpdated>
- 2004-05-06T13:56:00-0500
- </dateLastUpdated>
- <minQueryTermLength>3</minQueryTermLength>
- <maxSearchResponseRecords>
- 100
- </maxSearchResponseRecords>
- <maxInventoryResponseRecords>
- 1000
- </maxInventoryResponseRecords>
- </resource>
- </provider>
- </metadata>
- </content>
- <diagnostics>
- <diagnostic code="STATUS_INTERVAL" severity="info">
- 3600
- </diagnostic>
- <diagnostic code="STATUS_DATA" severity="info">
- 1,0,0
- </diagnostic>
- </diagnostics>
-</response>
-```
-
---blankline--
-//DiGIR inventory operation//
-
-Inventory requests ask for distinct values of a specific concept. An optional filter can be used, and a concept from one of the mapped conceptual schemas must be specified. It accepts an additional element that can be used to turn on counting of the total number of distinct values.
-
-Sample DiGIR inventory request:
-
-```
-<?xml version="1.0" encoding="UTF-8"?>
-<request xmlns="http://digir.net/schema/protocol/2003/1.0"
- xmlns:darwin=
- "http://digir.net/schema/conceptual/darwin/2003/1.0">
- <header>
- <version>1.0.0</version>
- <sendTime>2004-07-20T09:15:30-0500</sendTime>
- <source>127.0.0.1</source>
- <destination resource="HDS">
- http://example.net/provider/DiGIR.php
- </destination>
- <type>inventory</type>
- </header>
- <inventory>
- <filter>
- <equals>
- <darwin:Collector>Thomas, J.</darwin:Collector>
- </equals>
- </filter>
- <darwin:Genus/>
- <count>true</count>
- </inventory>
-</request>
-```
-
-Inventory responses include record elements enclosing each distinct value from the concept. A count of the number of occurrences of each distinct value is provided, and a total count for the number of distinct values is returned in the diagnostics section if this was requested. According to the protocol schema, paging is not possible in inventory requests, although some provider implementations do offer this feature.
-
-Sample DiGIR inventory response:
-
-```
-<?xml version="1.0" encoding="utf-8" ?>
-<response xmlns="http://digir.net/schema/protocol/2003/1.0">
- <header>
- <version>$Revision$</version>
- <sendTime>2004-07-20T09:15:32-0500</sendTime>
- <source resource="HDS">
- http://example.net:80/provider/DiGIR.php
- </source>
- <destination>127.0.0.1</destination>
- <type>inventory</type>
- </header>
- <content xmlns:darwin=
- "http://digir.net/schema/conceptual/darwin/2003/1.0"
- xmlns:xsd="http://www.w3.org/2001/XMLSchema"
- xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
- <record>
- <darwin:Genus count="4">Agave</darwin:Genus>
- </record>
- <record>
- <darwin:Genus count="8">Passiflora</darwin:Genus>
- </record>
- <record>
- <darwin:Genus count="26">Rubus</darwin:Genus>
- </record>
- <record>
- <darwin:Genus count="3">Solanum</darwin:Genus>
- </record>
- </content>
- <diagnostics>
- <diagnostic code="STATUS_INTERVAL" severity="info">
- 3600
- </diagnostic>
- <diagnostic code="STATUS_DATA" severity="info">
- 1,0,1
- </diagnostic>
- <diagnostic code="MATCH_COUNT" severity="info">
- 4
- </diagnostic>
- <diagnostic code="RECORD_COUNT" severity="info">
- 4
- </diagnostic>
- <diagnostic code="END_OF_RECORDS" severity="info">
- true
- </diagnostic>
- </diagnostics>
-</response>
-```
-
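To show how a client might consume such an inventory response, here is a minimal Python sketch (an illustration, not part of the original report). The function name `read_inventory` and the shortened `SAMPLE` message are assumptions; the element names, the `count` attribute and the diagnostic codes follow the sample response above.

```python
import xml.etree.ElementTree as ET

# Prefix-to-namespace map used for lookups; the URIs come from the samples.
NS = {
    "digir": "http://digir.net/schema/protocol/2003/1.0",
    "darwin": "http://digir.net/schema/conceptual/darwin/2003/1.0",
}

# Shortened version of the inventory response shown above.
SAMPLE = """<response xmlns="http://digir.net/schema/protocol/2003/1.0">
  <content xmlns:darwin="http://digir.net/schema/conceptual/darwin/2003/1.0">
    <record><darwin:Genus count="4">Agave</darwin:Genus></record>
    <record><darwin:Genus count="26">Rubus</darwin:Genus></record>
  </content>
  <diagnostics>
    <diagnostic code="MATCH_COUNT" severity="info">2</diagnostic>
    <diagnostic code="END_OF_RECORDS" severity="info">true</diagnostic>
  </diagnostics>
</response>"""


def read_inventory(xml_text, concept="darwin:Genus"):
    """Return ({distinct value: occurrences}, {diagnostic code: text})."""
    root = ET.fromstring(xml_text)
    values = {}
    for element in root.findall(f"digir:content/digir:record/{concept}", NS):
        values[element.text.strip()] = int(element.get("count"))
    diagnostics = {
        diag.get("code"): diag.text.strip()
        for diag in root.findall("digir:diagnostics/digir:diagnostic", NS)
    }
    return values, diagnostics


values, diagnostics = read_inventory(SAMPLE)
```

Here `values` maps each distinct genus to its number of occurrences, and `diagnostics` exposes entries such as `MATCH_COUNT` as strings.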
---blankline--
-//DiGIR search operation//
-
-Search requests have a mandatory <filter> element and an optional--breakline-- <records> element to specify what should be returned and how records should be structured in results. The record structure is optional because some requests may simply ask for the total number of records that match a filter condition. But when they are needed, the whole structure can be directly specified, as in the next example, or it can be referenced through an optional "schemaLocation" attribute in the <structure> element. Search requests also accept an additional element that can be used to turn on counting of the total number of matched records. Paging is possible by using the attributes "limit" and "start".
-
-Sample DiGIR search request:
-
-```
-<?xml version="1.0" encoding="UTF-8"?>
-<request
- xmlns="http://digir.net/schema/protocol/2003/1.0"
- xmlns:xsd="http://www.w3.org/2001/XMLSchema"
- xmlns:darwin=
- "http://digir.net/schema/conceptual/darwin/2003/1.0"
- xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
- <header>
- <version>1.0.0</version>
- <sendTime>2004-07-20T09:56:00-0500</sendTime>
- <source>127.0.0.1</source>
- <destination resource="HDS">
- http://example.net/provider/DiGIR.php
- </destination>
- <type>search</type>
- </header>
- <search>
- <filter>
- <and>
- <equals>
- <darwin:Collector>Thomas, J.</darwin:Collector>
- </equals>
- <equals>
- <darwin:Genus>Rubus</darwin:Genus>
- </equals>
- </and>
- </filter>
- <records limit="3" start="0">
- <structure>
- <xsd:element name="record"
- minOccurs="0"
- maxOccurs="unbounded">
- <xsd:complexType>
- <xsd:sequence>
- <xsd:element ref="darwin:InstitutionCode"/>
- <xsd:element ref="darwin:CollectionCode"/>
- <xsd:element ref="darwin:CatalogNumber"/>
- <xsd:element ref="darwin:ScientificName"/>
- <xsd:element ref="darwin:Latitude"/>
- <xsd:element ref="darwin:Longitude"/>
- </xsd:sequence>
- </xsd:complexType>
- </xsd:element>
- </structure>
- </records>
- <count>true</count>
- </search>
-</request>
-```
-
-Search responses include the requested records that have matched the query. They are formatted according to the given record structure.
-
-Sample DiGIR search response:
-
-```
-<?xml version="1.0" encoding="utf-8" ?>
-<response xmlns="http://digir.net/schema/protocol/2003/1.0">
- <header>
- <version>$Revision$</version>
- <sendTime>2004-07-20T09:56:03-0500</sendTime>
- <source resource="HDS">
- http://example.net:80/provider/DiGIR.php
- </source>
- <destination>127.0.0.1</destination>
- <type>search</type>
- </header>
- <content xmlns:darwin=
- "http://digir.net/schema/conceptual/darwin/2003/1.0"
- xmlns:xsd="http://www.w3.org/2001/XMLSchema"
- xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
- <record>
- <darwin:InstitutionCode>INS</darwin:InstitutionCode>
- <darwin:CollectionCode>COL</darwin:CollectionCode>
- <darwin:CatalogNumber>27694</darwin:CatalogNumber>
- <darwin:ScientificName>
- Rubus brasiliensis
- </darwin:ScientificName>
- <darwin:Latitude>-23.6625</darwin:Latitude>
- <darwin:Longitude>-46.7738</darwin:Longitude>
- </record>
- <record>
- <darwin:InstitutionCode>INS</darwin:InstitutionCode>
- <darwin:CollectionCode>COL</darwin:CollectionCode>
- <darwin:CatalogNumber>27907</darwin:CatalogNumber>
- <darwin:ScientificName>
- Rubus brasiliensis
- </darwin:ScientificName>
- <darwin:Latitude>-23.412</darwin:Latitude>
- <darwin:Longitude>-46.7899</darwin:Longitude>
- </record>
- <record>
- <darwin:InstitutionCode>INS</darwin:InstitutionCode>
- <darwin:CollectionCode>COL</darwin:CollectionCode>
- <darwin:CatalogNumber>27908</darwin:CatalogNumber>
- <darwin:ScientificName>
- Rubus urticaefolius
- </darwin:ScientificName>
- <darwin:Latitude>-23.412</darwin:Latitude>
- <darwin:Longitude>-46.7899</darwin:Longitude>
- </record>
- </content>
- <diagnostics>
- <diagnostic code="STATUS_INTERVAL" severity="info">
- 3600
- </diagnostic>
- <diagnostic code="STATUS_DATA" severity="info">
- 1,1,1
- </diagnostic>
- <diagnostic code="MATCH_COUNT" severity="info">
- 26
- </diagnostic>
- <diagnostic code="RECORD_COUNT" severity="info">
- 3
- </diagnostic>
- <diagnostic code="END_OF_RECORDS" severity="info">
- false
- </diagnostic>
- </diagnostics>
-</response>
-```
-
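The paging behaviour can be sketched in a few lines of Python (an illustration, not part of the original report). It assumes that "start" is a zero-based record offset, as the `start="0"` in the sample request suggests; the helper names `page_starts` and `next_start` are hypothetical.

```python
def page_starts(match_count, limit):
    """List the "start" offsets needed to page through all matched records.

    Assumes "start" is a zero-based offset and "limit" the page size.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    return list(range(0, match_count, limit))


def next_start(start, record_count, end_of_records):
    """Offset for the next request, or None once END_OF_RECORDS is true."""
    if end_of_records:
        return None
    return start + record_count


# Values from the sample response: MATCH_COUNT=26, RECORD_COUNT=3,
# END_OF_RECORDS=false, requested with limit="3" start="0".
starts = page_starts(26, 3)
following = next_start(0, 3, False)
```

With the sample values, nine requests would be needed to retrieve all 26 matching records, and the second request would use `start="3"`.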
-When there are NULL values or unmapped concepts that are present in record structures, the corresponding elements in responses have the "xsi:nil" attribute set to "true".
-
-**DiGIR filter encoding**
-
-The DiGIR protocol has its own generic query mechanism defined through the filter element, and it does not assume anything either about how data is locally stored (Relational database, XML files, Object database, etc) or about its particular query language (SQL, XPath, etc). The task of translating between a DiGIR query and a local database query language is completely delegated to the DiGIR Provider implementation.
-
-DiGIR filters can directly contain a single comparative expression like:
-
-```
-<filter>
- <equals>
- <darwin:ScientificName>
- Rubus brasiliensis
- </darwin:ScientificName>
- </equals>
-</filter>
-```
-
-Comparison operators include <equals>, <notEquals>, <lessThan>,--breakline--<lessThanOrEquals>, <greaterThan>, <greaterThanOrEquals>, and--breakline-- <like>. All of them are used to represent simple comparisons between a concept and a single value.
-
-Comparisons with NULL values can be performed using the "xsi:nil" attribute, which is already part of the XML Schema definition language:
-
-```
-<filter>
- <equals>
- <darwin:Latitude xsi:nil="true"/>
- </equals>
-</filter>
-```
-
-Another special comparison operator <in> can be used to represent comparisons between a single concept and a list of values:
-
-```
-<filter>
- <in>
- <list>
- <darwin:Genus>Physalis</darwin:Genus>
- <darwin:Genus>Rubus</darwin:Genus>
- <darwin:Genus>Byrsonima</darwin:Genus>
- </list>
- </in>
-</filter>
-```
-
-Comparison expressions can be combined through logical operators. There are four logical operators available: <and>, <andNot>, <or>, <orNot>. All of them are defined as binary operators, therefore it is necessary to nest logical operations if they involve more than two comparisons:
-
-```
-<filter>
- <and>
- <equals>
- <darwin:ScientificName>
- Rubus brasiliensis
- </darwin:ScientificName>
- </equals>
- <or>
- <greaterThan>
- <darwin:YearIdentified>1980</darwin:YearIdentified>
- </greaterThan>
- <like>
- <darwin:IdentifiedBy>Koch%</darwin:IdentifiedBy>
- </like>
- </or>
- </and>
-</filter>
-```
-
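Because the logical operators are binary, a client that wants to combine more than two comparisons has to nest them. The following Python sketch (an illustration, not part of the original report) folds a list of expressions into nested binary operators. The helper names `equals` and `combine` are hypothetical, and namespaces are simplified: the `darwin:` prefix is written literally into the tag name instead of being bound to its namespace URI.

```python
import xml.etree.ElementTree as ET


def equals(concept, value):
    """Build an <equals> comparison for one concept (simplified, no namespaces)."""
    node = ET.Element("equals")
    ET.SubElement(node, concept).text = value
    return node


def combine(operator, expressions):
    """Fold expressions into nested binary operators.

    combine("and", [a, b, c]) yields <and>a <and>b c</and></and>.
    """
    if not expressions:
        raise ValueError("at least one expression is required")
    result = expressions[-1]
    for expression in reversed(expressions[:-1]):
        node = ET.Element(operator)
        node.append(expression)
        node.append(result)
        result = node
    return result


filter_element = ET.Element("filter")
filter_element.append(
    combine(
        "and",
        [
            equals("darwin:Genus", "Rubus"),
            equals("darwin:Collector", "Thomas, J."),
            equals("darwin:InstitutionCode", "INS"),
        ],
    )
)
```

Every operator element produced this way has exactly two children, which matches the binary definition described above.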
-It is interesting to mention that concepts from different conceptual schemas can also be combined in the same filter. The namespace prefix in concept element names is always used to identify their schema.
-
-**DiGIR record structure**
-
-DiGIR search requests can also specify which content elements should be returned and how they should be structured. This is done through a record structure specification. The DiGIR record structure uses a simplified subset of XML Schema definitions, where content elements are referenced by the XML "ref" attribute:
-
-```
-<structure>
- <xsd:element name="record"
- minOccurs="0"
- maxOccurs="unbounded">
- <xsd:complexType>
- <xsd:sequence>
- <xsd:element ref="darwin:InstitutionCode"/>
- <xsd:element ref="darwin:CollectionCode"/>
- <xsd:element ref="darwin:CatalogNumber"/>
- <xsd:element ref="darwin:ScientificName"/>
- <xsd:element ref="darwin:Latitude"/>
- <xsd:element ref="darwin:Longitude"/>
- </xsd:sequence>
- </xsd:complexType>
- </xsd:element>
-</structure>
-```
-
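A flat record structure like the one above can also be generated from a list of concept names. This Python sketch is an illustration only and not part of the original report; the function name `build_structure` is hypothetical, and the `<structure>` element is left without a namespace for brevity.

```python
import xml.etree.ElementTree as ET

XSD = "http://www.w3.org/2001/XMLSchema"


def build_structure(concepts):
    """Build a flat <structure> whose record references the given concepts."""
    ET.register_namespace("xsd", XSD)
    structure = ET.Element("structure")
    record = ET.SubElement(
        structure,
        f"{{{XSD}}}element",
        {"name": "record", "minOccurs": "0", "maxOccurs": "unbounded"},
    )
    complex_type = ET.SubElement(record, f"{{{XSD}}}complexType")
    sequence = ET.SubElement(complex_type, f"{{{XSD}}}sequence")
    for concept in concepts:
        # Concepts are referenced, not redefined, so only "ref" is set.
        ET.SubElement(sequence, f"{{{XSD}}}element", {"ref": concept})
    return structure


structure = build_structure(
    ["darwin:InstitutionCode", "darwin:CatalogNumber", "darwin:ScientificName"]
)
```

Note that the `darwin` prefix inside the "ref" values still has to be declared on an enclosing element of the final message, as in the sample search request.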
-The syntax of record structures is not specified by the DiGIR protocol schema. The <structure> element is actually defined there as an XML complex type which accepts any content, although the definition of a record element is mandatory.
-
-The <record> element is the basis for looping, paging, and limiting records. With a relational database, the usual approach is to have an underlying "root table" defined during resource configuration whose records are bound to the <record> element.
-
-Since content elements are being referenced, their cardinality should be taken from the corresponding element definitions in the conceptual schema. Another possibility using record structures is to request further grouping of specific elements, as shown in the next example.
-
---newpage--
-```
-<structure>
- <xsd:element name="record"
- minOccurs="0"
- maxOccurs="unbounded">
- <xsd:complexType>
- <xsd:sequence>
- <xsd:element ref="darwin:InstitutionCode"/>
- <xsd:element ref="darwin:CollectionCode"/>
- <xsd:element ref="darwin:CatalogNumber"/>
- <xsd:element ref="darwin:ScientificName"/>
- <xsd:element name="coordinates">
- <xsd:complexType>
- <xsd:sequence>
- <xsd:element ref="darwin:Latitude"/>
- <xsd:element ref="darwin:Longitude"/>
- </xsd:sequence>
- </xsd:complexType>
- </xsd:element>
- </xsd:sequence>
- </xsd:complexType>
- </xsd:element>
-</structure>
-```
-
-And finally, the same structures could be remotely referenced by using the "schemaLocation" attribute.
-
-```
-<structure schemaLocation=
- "http://example.net/searchStructure.xml"/>
-```
-
-One important point regarding the way record structures are defined and used is that results will always be either custom grouping elements or instances of concept elements. It is not possible to rename elements (elements using the XML "ref" attribute cannot override names of the referenced elements[[http://www.w3.org/TR/xmlschema-1/#section-Constraints-on-XML-Repr…) and it is not possible to return values as XML attributes (attribute declarations cannot reference elements, they can only reference global attribute declarations[[http://www.w3.org/TR/xmlschema-1/#Attribute_Declaration_detai…). So, although the record structure provides some flexibility about how data should be returned, it is not flexible enough to produce a result which could be validated against external XML Schema representing content elements which are independent from the protocol.
-
-Concerning the use of modularized conceptual schemas, elements from a record structure can perfectly reference concepts from different schemas inside the same structure. Conceptual schemas are always associated with the namespace prefix inside the "ref" attribute.
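
The split between custom grouping elements and referenced concepts can be made visible with a few lines of code. The following is only a sketch: it reuses part of the record structure shown above, adds namespace declarations so that the fragment parses on its own (the "darwin" namespace URI is a placeholder, not the real one), and the function name `describe_structure` is made up for illustration.

```python
import xml.etree.ElementTree as ET

XSD = "http://www.w3.org/2001/XMLSchema"

# Part of the record structure from the example above. The namespace
# declarations are added here; the "darwin" URI is a placeholder.
STRUCTURE = """
<structure xmlns:xsd="http://www.w3.org/2001/XMLSchema"
           xmlns:darwin="http://example.net/darwin">
  <xsd:element name="record" minOccurs="0" maxOccurs="unbounded">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="darwin:InstitutionCode"/>
        <xsd:element ref="darwin:CatalogNumber"/>
        <xsd:element name="coordinates">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element ref="darwin:Latitude"/>
              <xsd:element ref="darwin:Longitude"/>
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</structure>
"""


def describe_structure(xml_text):
    """Return (custom grouping element names, referenced concepts by prefix)."""
    root = ET.fromstring(xml_text)
    groupings, concepts = [], {}
    for element in root.iter("{%s}element" % XSD):
        if "ref" in element.attrib:
            # "darwin:Latitude" -> prefix "darwin", local name "Latitude"
            prefix, _, local = element.attrib["ref"].partition(":")
            concepts.setdefault(prefix, []).append(local)
        elif "name" in element.attrib:
            groupings.append(element.attrib["name"])
    return groupings, concepts


groupings, concepts = describe_structure(STRUCTURE)
print(groupings)
print(concepts)
```

Grouping concepts by prefix also shows how a single structure could mix concepts from several conceptual schemas: each additional schema would simply appear as another key.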
-
-**DiGIR conceptual binding**
-
-As shown in previous examples, content elements from conceptual schemas are either directly used in messages (as it happens in filter comparison expressions and in inventory requests) or referenced (as it happens in record structure specifications).
-
-It was a protocol design decision that each content element from a conceptual schema should be derived from one of three abstract elements named <searchableData>, <returnableData>, or <searchableReturnableData>,--breakline-- which are defined in the main DiGIR protocol schema. Therefore, a concept could be searchable but not returnable, as it happens with the "BoundingBox" element from DarwinCore. There could also be returnable elements which are not searchable (maybe in the case of binary data types).
-
-This approach ensures that a filter expression can only use searchable concepts, and that an inventory operation can only request values from returnable concepts. Record structures should in theory have a similar restriction, in the sense that content elements should only point to returnable concepts, but as mentioned before, this is not being validated by the protocol.
-
-The original approach also considered adding another inheritance layer to classify concepts according to their data types, but that would bring additional complexity since XML Schema doesn't allow multiple inheritance. In that case, concepts would be derived from "alphaSearchableData", "numericSearchableData", and so on, which would ensure that string operations are only performed against concepts that are searchable strings. This idea was discarded.
-
-The way concepts are used in the protocol has two consequences for conceptual schemas. The first one is that all concepts must be bound to the protocol, since they need to be classified as searchable and/or returnable. Binding is done through the XML Schema "substitutionGroup" technique, and therefore it is not possible to use a completely independent conceptual schema.
-
-Another consequence is that the "substitutionGroup" technique requires that both the "head" element (abstract concept types in this case) and the member elements (the "real" concept elements) must always be declared in the global scope of the schema[[http://www.w3.org/TR/xmlschema-1/#Element_Equivalence_Class]].
-
-And due to the way DiGIR treats unmapped concepts and NULL values in results, all non-mandatory elements from a conceptual schema need to be declared as "nillable".
-
-So, a DiGIR-compatible conceptual schema is always a flat list of elements which are derived from one of the three abstract data elements defined by the protocol. And although this approach brings some restrictions, it also enables additional validation regarding the use of concepts in the protocol.
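
As a rough illustration of what "flat list of elements derived from one of the three abstract data elements" means in practice, the sketch below classifies the global elements of a schema by their substitution group. The schema fragment, the "digir" namespace URI, the element types and the "Notes" element are all invented for this example; only the three abstract element names come from the text above.

```python
import xml.etree.ElementTree as ET

XSD = "http://www.w3.org/2001/XMLSchema"
ABSTRACT_HEADS = {"searchableData", "returnableData", "searchableReturnableData"}

# Hypothetical conceptual schema fragment; namespaces and types are placeholders.
SCHEMA = """
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:digir="http://example.net/digir">
  <xsd:element name="ScientificName" type="xsd:string" nillable="true"
               substitutionGroup="digir:searchableReturnableData"/>
  <xsd:element name="BoundingBox" type="xsd:string" nillable="true"
               substitutionGroup="digir:searchableData"/>
  <xsd:element name="Notes" type="xsd:string"/>
</xsd:schema>
"""


def classify_concepts(schema_text):
    """Map each global element name to its abstract head, or None if unbound."""
    root = ET.fromstring(schema_text)
    result = {}
    # Direct children of <xsd:schema> are the global declarations.
    for element in root.findall("{%s}element" % XSD):
        head = element.get("substitutionGroup", "").rpartition(":")[2]
        result[element.get("name")] = head if head in ABSTRACT_HEADS else None
    return result


def unbound_concepts(schema_text):
    """Names of global elements that are not bound to the protocol."""
    return [name for name, head in classify_concepts(schema_text).items()
            if head is None]


print(classify_concepts(SCHEMA))
print(unbound_concepts(SCHEMA))
```

An element such as the invented "Notes" would be reported as unbound, which is the situation the protocol design rules out.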
-
-**DiGIR operation calls**
-
-Unfortunately there is no official specification about how messages should be exchanged using DiGIR. As a possible reference, the original provider implementation accepts requests in several ways:
-
-- As a single GET or POST parameter called "request" or "doc" containing either the DiGIR XML request document or a URL pointing to a request document.
-
-- A single GET or POST default parameter containing a URL pointing to a request document.
-
-- A set of GET or POST parameters containing parts of a request document: "filter", "resource", "startrec", "maxrecs", "clientkey", "recordstruct", "sortstruct", and "countrecs".
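
The first and third calling styles listed above could be assembled as follows. This is a sketch only: the endpoint URL, the request document URL, the resource name and the parameter values are hypothetical, and nothing here is taken from an official specification (none exists, as noted above). Only the parameter names come from the list.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical provider access point.
ENDPOINT = "http://example.net/digir/provider"

# Parameter names accepted by the original provider, as listed above.
PART_NAMES = {"filter", "resource", "startrec", "maxrecs",
              "clientkey", "recordstruct", "sortstruct", "countrecs"}


def request_url(endpoint, request_document):
    """Pass a whole request document (or a URL pointing to one) as 'request'."""
    return endpoint + "?" + urlencode({"request": request_document})


def parts_url(endpoint, **parts):
    """Pass a request as separate parameters, rejecting unknown names."""
    unknown = set(parts) - PART_NAMES
    if unknown:
        raise ValueError("unknown parameters: %s" % sorted(unknown))
    return endpoint + "?" + urlencode(parts)


url = request_url(ENDPOINT, "http://example.net/requests/search.xml")
url2 = parts_url(ENDPOINT, resource="herbarium", startrec="0", maxrecs="10")
print(url)
print(url2)
```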
-
-
-=== Existing DiGIR software ===
-
-At this moment, several software packages already support the DiGIR protocol. The implementations that have been developed as part of the original project are the most widely used. They are all freely available from the SourceForge site under a project called "digir", which includes:
-
-- DiGIR PHP provider: Component responsible for connecting local data sources to DiGIR networks. It has been implemented in PHP and works in conjunction with a web server. Although considered stable and fully functional, it does not support requests addressed to more than one resource at the same time. Local data sources are restricted to relational databases, and drivers are available for most vendors. Besides the files available from the SourceForge site, packages including web server and UDDI registration capabilities can be obtained from the GBIF site[[http://www.gbif.org/]] for a number of different platforms. The PHP provider is also being distributed as part of the Specify[[http://www.specifysoftware.org/]] software.
-
-- DiGIR portal engine: Component responsible for encapsulating a DiGIR network of distributed data providers. It receives requests from user interfaces or search agents, distributes them to the corresponding data providers, and integrates the responses into a single document before sending it back to the requestor. It was implemented in Java and runs with Tomcat. Provider access points are either manually configured or taken from UDDI registries.
-
-- DiGIR presentation layer: Component responsible for offering a user interface to query a DiGIR network. It communicates with a portal engine. It was also implemented in Java and runs in another Tomcat instance.
-
-
-Besides the original provider software, there are two other known provider implementations:
-
-- GBIF data repository tool[[http://circa.gbif.net/Public/irc/gbif/ict/library?l=/download_gbif_to…: A package including Zope with a Python implementation of a DiGIR provider, and a MySQL database with pre-configured tables.
-
-- Biota[[http://viceroy.eeb.uconn.edu/biota]] provider: A functional Java implementation of a provider which is currently being tested. It should be part of future versions of the Biota distribution.
-
-
-And besides the original portal engine, there is another portal implementation using PHP, known as the PHP DiGIR Portal[[http://digirportal.berkeley.edu/]]. It includes a user interface and it communicates with several providers using the socket-based script available from the original PHP provider implementation.
-
-And finally, there are some additional libraries which are also related to DiGIR:
-
-- GBIF portal library[[http://sourceforge.net/projects/gbif]]: The GBIF data portal uses Java libraries to communicate directly with providers.
-
-- Perl client library: The speciesLink search interface[[http://splink.cria.org.br/simple_search]] uses a Perl library to communicate with a portal engine. Only metadata and search messages are handled.
-
-- Globus DiGIR wrapper: The SEEK project[[http://seek.ecoinformatics.org]] has also developed a web service on top of the Globus toolkit[[http://www.globus.org/toolkit/]] to communicate with a portal engine.
-
-
-== BioCASe protocol ==
-
-=== Brief history ===
-
-The BioCASe protocol[[http://www.biocase.org/dev/protocol]] was developed in 2003 by the Biodiversity Informatics department of the Botanic Garden and Botanical Museum Berlin-Dahlem[[http://www.bgbm.org/BioDivInf]] for the BioCASE project[[http://www.biocase.org]], with goals similar to those of the DiGIR project. But instead of using Darwin Core as the default conceptual schema, BioCASE aimed to implement a distributed and heterogeneous network using ABCD[[http://www.bgbm.org/TDWG/CODATA/Schema/]], a rather complex and hierarchically structured XML standard for exchanging biological specimen or observation data. Initially the intention was to use the DiGIR protocol, but it soon became clear that DiGIR was not suitable for these needs, because its conceptual binding through XML substitution groups imposes two requirements on the conceptual schema:
-
-- The insertion of protocol specific parts into the conceptual schema
-
-- A flat list of global element declarations
-
-
-The only possible solution would have been to adopt a modified ABCD schema which contained a list of all its hierarchical elements as global elements. At the TDWG meeting in Indaiatuba 2002 such a list with a couple of thousand elements was presented and was not considered acceptable. From the BioCASE point of view, the protocol should not influence conceptual schema design in any major way.
-
-Based on the DiGIR protocol, a new protocol with a different conceptual binding was created, using simple XPaths[[http://www.w3.org/TR/xpath]] to refer to elements of the conceptual schema. This allows any hierarchical schema to be used, but has the disadvantage of losing strong XML-based validation all the way back into the conceptual schema. Additionally, some minor changes were made to the protocol, with a new capabilities request type being the most prominent one. All software was made publicly available under the Mozilla Public License.
-
-=== Description ===
-
-With the DiGIR protocol as its basis, BioCASe[[http://www.bgbm.org/biodivinf/Schema/protocol_1_3.xsd]] is also an XML-based protocol over HTTP to retrieve data from independent and heterogeneous databases. To achieve a uniform virtual view from network participants, each BioCASe data source needs to choose one or more conceptual schemas (or data exchange standards) and then map some fields from a local database against (some of) the elements of those schemas.
-
-Different conceptual schemas can be defined by each network, and therefore the protocol can be used by any community without regard to discipline. Conceptual schemas don't need to contain any protocol-specific add-ons, and apart from recursive structures, any XML Schema structure is supported, including choices and repeating elements. Each XML element or attribute is regarded as a concept that can be referred to via an XPath-based conceptual binding. Two examples of such schemas in current use are the Access to Biological Collection Data standard[[http://www.bgbm.org/TDWG/CODATA/Schema/ABCD-1.20.xsd]] and the BioCASE metaprofile for describing collections[[http://www.bgbm.org/biodivinf/Schema/BioCASE-MetaProfile-123.x….
-
-The BioCASE network consists of a central registry, a message broker service called the Unitloader[[http://www.biocase.org/dev/unitloader]] and providers running the BioCASE provider software[[http://www.biocase.org/dev/provider/]].
-
-The unitloader software is a Java class that generates protocol documents and distributes them via threads to the list of requested "datasource" services. It is not directly connected to a registry; instead it needs to be given the list of providers to be queried.
-
-A provider is regarded as a host which runs the BioCASE provider software. This is a collection of software tools for querying local datasources, for graphically configuring datasources[[http://www.biocase.org/dev/configtool]] and of course a number of wrappers that act as datasource services. Each datasource is restricted to a single database but may be configured for any number of accepted conceptual schemas. This way it is possible to work with ABCD and DarwinCore for example. Contrary to DiGIR there is no provider service giving access to all the datasources, but instead each individual datasource has its own access point URL.
-
-=== Technical details ===
-
-**General message format**
-
-A BioCASe message can be a request or a response - both made up of 3 major parts.
-
-The "header" section gives some information about the origin and destination of the message like a list of all services that have been involved in the chain of the message flow together with a timestamp. The software versions involved in creating the message are also specified.
-
-The "content" section only exists for responses and search or scan requests. It carries the requested response data in the format specified by the conceptual schema or specifies the request to be carried out by a datasource.
-It also contains attributes that hold status information for paging and record counts, such as the overall number of matched records in the database and how many records were returned or "dropped". Dropped records are those that matched the query but did not validate against the conceptual schema because of missing or erroneous mandatory data.
-
-The "diagnostics" section is used for debugging information and to provide warnings or error messages.
-
-**Request and response types**
-
-Similar to DiGIR there are 3 different message types supported.
-But instead of a metadata operation there is a slightly different capabilities operation.
-The DiGIR inventory type is called "scan" (as it was called in its original proposal at the time the BioCASe protocol was established).
-
---blankline--
-//BioCASe capabilities operation//
-
-A capabilities request allows a client to get information about which concepts are mapped (defined) in a provider database. This request type returns a list of XPaths identifying all mapped concepts. A capabilities request takes no parameters, and it is also the default response of the provider software if no operation is specified.
-
---newpage--
-Sample BioCASe capabilities request:
-
-```
-<?xml version="1.0" encoding="UTF-8" ?>
-<request xmlns="http://www.biocase.org/schemas/protocol/1.3">
- <header>
- <version software="unitloader">0.98</version>
- <sendTime>2003-09-25T17:02:45+02:00</sendTime>
- <source>198.14.7.54</source>
- <source>192.168.1.153</source>
- <type>capabilities</type>
- </header>
-</request>
-```
-
-Sample BioCASe capabilities response:
-
-```
-<?xml version='1.0' encoding='UTF-8'?>
-<response xmlns='http://www.biocase.org/schemas/protocol/1.3'>
- <header>
- <version software='Python Interpreter'>
- 2.3 (#46, Jul 29 2003, 18:54:32)
- [MSC v.1200 32 bit (Intel)]
- </version>
- <version software='Wrapper'>1.4.0 alpha</version>
- <version software='OS'>nt</version>
- <sendTime>2003-09-25T17:02:46+02:00</sendTime>
- <source>192.168.1.12</source>
- <destination>192.168.1.153</destination>
- <destination>198.14.7.54</destination>
- <type>capabilities</type>
- </header>
- <content>
- <capabilities>
- <SupportedSchemas request='true'
- namespace='http://www.tdwg.org/schemas/abcd/1.2'
- response='true'>
- <Concept>
- /DataSets/DataSet/DatasetDerivations/
-DatasetDerivation/DateSupplied
- </Concept>
- <Concept>
- /DataSets/DataSet/DatasetDerivations/
-DatasetDerivation/Description
- </Concept>
- <Concept>
- /DataSets/DataSet/DatasetDerivations/
-DatasetDerivation/Rights/CopyrighDeclaration
- </Concept>
- <Concept>
- /DataSets/DataSet/DatasetDerivations/
-DatasetDerivation/Rights/IPRDeclaration
- </Concept>
- <Concept>
- /DataSets/DataSet/DatasetDerivations/
-DatasetDerivation/Rights/LegalOwner/URLs/URL
- </Concept>
- <Concept>
- /DataSets/DataSet/DatasetDerivations/
-DatasetDerivation/Rights/RightsURL
- </Concept>
- <Concept>
- /DataSets/DataSet/DatasetDerivations/
-DatasetDerivation/Rights/SpecificRestrictions
- </Concept>
- <Concept>
- /DataSets/DataSet/DatasetDerivations/
-DatasetDerivation/Rights/TermsOfUse
- </Concept>
- <Concept>
- /DataSets/DataSet/DatasetDerivations/
-DatasetDerivation/Statements/Acknowledgement
- </Concept>
- <Concept>
- /DataSets/DataSet/DatasetDerivations/
-DatasetDerivation/Statements/Disclaimer
- </Concept>
- <Concept>
- /DataSets/DataSet/DatasetDerivations/
-DatasetDerivation/Statements/LogoURL
- </Concept>
- <Concept>
- /DataSets/DataSet/DatasetDerivations/
-DatasetDerivation/Statements/StatementURL
- </Concept>
- <Concept>
- /DataSets/DataSet/DatasetDerivations/
-DatasetDerivation/Supplier/Addresses/Address
- </Concept>
- <Concept>
- /DataSets/DataSet/DatasetDerivations/
-DatasetDerivation/Supplier/Person/PersonName
- </Concept>
- <Concept>
- /DataSets/DataSet/DatasetDerivations/
-DatasetDerivation/Supplier/URLs/URL
- </Concept>
- <Concept>
- /DataSets/DataSet/OriginalSource/
-SourceExpiryDate
- </Concept>
- <Concept>
- /DataSets/DataSet/OriginalSource/
-SourceInstitutionCode
- </Concept>
- <Concept>
- /DataSets/DataSet/OriginalSource/
-SourceLastUpdatedDate
- </Concept>
- <Concept>
- /DataSets/DataSet/OriginalSource/
-SourceName
- </Concept>
- <Concept>
- /DataSets/DataSet/OriginalSource/
-SourceNumberOfRecords
- </Concept>
- <Concept>
- /DataSets/DataSet/OriginalSource/
-SourceVersion
- </Concept>
- <Concept>
- /DataSets/DataSet/OriginalSource/
-SourceWebAddress
- </Concept>
- <Concept>
- /DataSets/DataSet/Units/Unit/
-CollectorsFieldNumber
- </Concept>
- <Concept>
- /DataSets/DataSet/Units/Unit/Gathering/
-GatheringAgents/GatheringAgent/Person/PersonName
- </Concept>
- <Concept>
- /DataSets/DataSet/Units/Unit/Gathering/
-GatheringDateTime/DateText
- </Concept>
- <Concept>
- /DataSets/DataSet/Units/Unit/Gathering/
-GatheringDateTime/ISODateTimeBegin
- </Concept>
- <Concept>
- /DataSets/DataSet/Units/Unit/Gathering/GatheringSite/
-Altitude/MeasurementAtomized/MeasurementLowerValue
- </Concept>
- <Concept>
- /DataSets/DataSet/Units/Unit/Gathering/GatheringSite/
-Altitude/MeasurementAtomized/MeasurementScale
- </Concept>
- <Concept>
- /DataSets/DataSet/Units/Unit/Gathering/GatheringSite/
-Aspect/AspectText
- </Concept>
- <Concept>
- /DataSets/DataSet/Units/Unit/Gathering/GatheringSite/
-BiotopeData/BiotopeText
- </Concept>
- <Concept>
- /DataSets/DataSet/Units/Unit/Gathering/GatheringSite/
-Country/CountryName
- </Concept>
- <Concept>
- /DataSets/DataSet/Units/Unit/Gathering/GatheringSite/
-Country/ISO2Letter
- </Concept>
- <Concept>
- /DataSets/DataSet/Units/Unit/Gathering/GatheringSite/
-LocalityText
- </Concept>
- <Concept>
- /DataSets/DataSet/Units/Unit/Gathering/GatheringSite/
-NamedAreas/NamedArea/NamedAreaName
- </Concept>
- <Concept>
- /DataSets/DataSet/Units/Unit/Gathering/GatheringSite/
-Slope/MeasurementAtomized/MeasurementLowerValue
- </Concept>
- <Concept>
- /DataSets/DataSet/Units/Unit/Gathering/GatheringSite/
-Slope/MeasurementAtomized/MeasurementScale
- </Concept>
- <Concept>
- /DataSets/DataSet/Units/Unit/Gathering/
-Project/ProjectTitle
- </Concept>
- <Concept>
- /DataSets/DataSet/Units/Unit/Identifications/
-Identification/Identifier/IdentifierPersonName
- /PersonName
- </Concept>
- <Concept>
- /DataSets/DataSet/Units/Unit/Identifications/
-Identification/TaxonIdentified/AuthorString
- </Concept>
- <Concept>
- /DataSets/DataSet/Units/Unit/Identifications/
-Identification/TaxonIdentified/HigherTaxa/HigherTaxon
- </Concept>
- <Concept>
- /DataSets/DataSet/Units/Unit/Identifications/
-Identification/TaxonIdentified/NameAuthorYearString
- </Concept>
- <Concept>
- /DataSets/DataSet/Units/Unit/Identifications/
-Identification/TaxonIdentified/ScientificNameAtomized/
-Botanical/FirstEpithet
- </Concept>
- <Concept>
- /DataSets/DataSet/Units/Unit/Identifications/
-Identification/TaxonIdentified/ScientificNameAtomized/
-Botanical/Genus
- </Concept>
- <Concept>
- /DataSets/DataSet/Units/Unit/Identifications/
-Identification/TaxonIdentified/ScientificNameAtomized/
-Botanical/Rank</Concept>
- <Concept>
- /DataSets/DataSet/Units/Unit/Identifications/
-Identification/TaxonIdentified/ScientificNameAtomized/
-Botanical/SecondEpithet
- </Concept>
- <Concept>
- /DataSets/DataSet/Units/Unit/UnitID
- </Concept>
- <Concept>
- /DataSets/DataSet/Units/Unit/UnitStateDomain/
-SpecimenUnit/UnitPreparation/PreparationType
- </Concept>
- </SupportedSchemas>
- <SupportedSchemas request='true'
- namespace='http://www.namespacetbd.org/darwin2'
- response='true'>
- <Concept>/RecordSet/Record/CatalogNumber</Concept>
- <Concept>/RecordSet/Record/CollectionCode</Concept>
- <Concept>/RecordSet/Record/Collector</Concept>
- <Concept>/RecordSet/Record/Country</Concept>
- <Concept>/RecordSet/Record/DateLastModified</Concept>
- <Concept>/RecordSet/Record/Family</Concept>
- <Concept>/RecordSet/Record/FieldNumber</Concept>
- <Concept>/RecordSet/Record/Genus</Concept>
- <Concept>/RecordSet/Record/IdentifiedBy</Concept>
- <Concept>/RecordSet/Record/InstitutionCode</Concept>
- <Concept>/RecordSet/Record/ScientificName</Concept>
- <Concept>/RecordSet/Record/Subspecies</Concept>
- </SupportedSchemas>
- </capabilities>
- </content>
- <diagnostics>
- <diagnostic>OK</diagnostic>
- </diagnostics>
-</response>
-```
-
-In the request example above, the <source> element occurs twice in the header, with the original source being the first source element of the sequence. The response lists the mapped concepts for 2 configured conceptual schemas, ABCD and DarwinCore. The schemas are identified by their namespace only.
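
A client could turn such a response into a simple lookup table of mapped concepts per schema namespace. The sketch below works on a heavily shortened copy of the response above (header and diagnostics omitted); the function name `mapped_concepts` is invented, and rejoining wrapped paths by dropping all whitespace is an assumption about how the line breaks in the printed sample should be read.

```python
import xml.etree.ElementTree as ET

NS = {"b": "http://www.biocase.org/schemas/protocol/1.3"}

# Shortened copy of the capabilities response shown above.
RESPONSE = """<response xmlns='http://www.biocase.org/schemas/protocol/1.3'>
  <content>
    <capabilities>
      <SupportedSchemas request='true'
          namespace='http://www.tdwg.org/schemas/abcd/1.2' response='true'>
        <Concept>
          /DataSets/DataSet/Units/Unit/Gathering/
          GatheringDateTime/DateText
        </Concept>
        <Concept>/DataSets/DataSet/Units/Unit/UnitID</Concept>
      </SupportedSchemas>
      <SupportedSchemas request='true'
          namespace='http://www.namespacetbd.org/darwin2' response='true'>
        <Concept>/RecordSet/Record/Genus</Concept>
      </SupportedSchemas>
    </capabilities>
  </content>
</response>
"""


def mapped_concepts(response_text):
    """Return {schema namespace: [concept paths]} from a capabilities response."""
    root = ET.fromstring(response_text)
    schemas = {}
    for schema in root.iterfind(".//b:SupportedSchemas", NS):
        # Dropping all whitespace rejoins paths that were wrapped across lines.
        paths = ["".join((concept.text or "").split())
                 for concept in schema.findall("b:Concept", NS)]
        schemas[schema.get("namespace")] = paths
    return schemas


schemas = mapped_concepts(RESPONSE)
for namespace, paths in schemas.items():
    print(namespace, len(paths))
```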
-
---blankline--
-//BioCASe scan operation//
-
-A scan request concentrates on one concept, referenced by an XPath to the element of the respective conceptual schema. It is essentially a SELECT DISTINCT in SQL and returns all unique values for this concept. In contrast to the DiGIR inventory, it is not possible to specify a filter for a scan.
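
The SQL analogy can be made concrete with an in-memory database. This is a sketch under stated assumptions: the table "unit", its columns and its rows are invented, and a real provider would derive the table and column from its own concept mapping rather than from client input.

```python
import sqlite3

# Invented local table standing in for a provider database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE unit (unit_id TEXT, genus TEXT)")
conn.executemany(
    "INSERT INTO unit VALUES (?, ?)",
    [("1008_1", "Astragalus"), ("1008_2", "Astragalus"),
     ("17", "Viola"), ("23", "Achillea")],
)


def scan(connection, table, column):
    """Return the sorted distinct values of one mapped column."""
    # table and column are expected to come from the provider's own
    # configuration, never directly from the request.
    sql = f'SELECT DISTINCT "{column}" FROM "{table}" ORDER BY "{column}"'
    return [row[0] for row in connection.execute(sql)]


values = scan(conn, "unit", "genus")
print(values)
```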
-
-Here is an abbreviated scan request example asking for distinct values of botanical genera:
-
-```
-<?xml version="1.0" encoding="UTF-8"?>
-<request xmlns="http://www.biocase.org/schemas/protocol/1.3">
- <header>
- <sendTime>2003-09-25T17:02:45+02:00</sendTime>
- <source>198.14.7.54</source>
- <type>scan</type>
- </header>
- <scan>
- <requestFormat>
- http://www.tdwg.org/schemas/abcd/1.2
- </requestFormat>
- <concept>
- /DataSets/DataSet/Units/Unit/Identifications/
-Identification/TaxonIdentified/ScientificNameAtomized/
-Botanical/Genus
- </concept>
- </scan>
-</request>
-```
-
-Scan responses deliver each sorted distinct value in a <value> element and provide the total number of values in the content attribute "recordCount":
-
-```
-<?xml version='1.0' encoding='UTF-8'?>
-<response
- xmlns='http://www.biocase.org/schemas/protocol/1.3'
- xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'
- xsi:schemaLocation=
- 'http://www.biocase.org/schemas/protocol/1.3
- http://www.bgbm.org/biodivinf/schema/protocol_1_3.xsd'>
- <header>
- <version software='Python Interpreter'>
- 2.3 (#46, Jul 29 2003, 18:54:32)
- [MSC v.1200 32 bit (Intel)]
- </version>
- <version software='PyWrapper'>0.98a</version>
- <version software='DB module'>
- MS SQL Server module v0.91, using mxODBC 2.0.1
- </version>
- <version software='OS'>nt</version>
- <sendTime>2003-09-25T17:02:46+02:00</sendTime>
- <source>192.168.1.12</source>
- <destination>198.14.7.54</destination>
- <type>scan</type>
- </header>
- <content recordDropped='0'
- recordStart='0'
- recordCount='188'>
- <scan>
- <value>Acantholimon</value>
- <value>Achillea</value>
- <value>Aethionema</value>
- <value>Ajuga</value>
- <value>Alchemilla</value>
- <value>Alkanna</value>
- <value>Allium</value>
- <value>Alopecurus</value>
- <value>Alyssum</value>
- ...
- <value>Viola</value>
- <value>Xeranthemum</value>
- <value>Ziziphora</value>
- </scan>
- </content>
- <diagnostics>
- <diagnostic>OK</diagnostic>
- </diagnostics>
-</response>
-```
-
---blankline--
-//BioCASe search operation//
-
-The BioCASe protocol requires a search request to specify a filter as well as the conceptual schema to be used for responding ("responseFormat"). The query filter may use concepts taken from a schema other than the response schema; that schema is identified by the "requestFormat" namespace.
-To only count matching records without returning any other data, the optional element "count" should be set to true. The search is stateful. Paging is supported by using the "start" and "limit" attributes which default to 0.
-
-Sample BioCASe search request:
-
-```
-<?xml version="1.0" encoding="UTF-8"?>
-<request xmlns="http://www.biocase.org/schemas/protocol/1.3">
- <header>
- <version software="unitloader">0.98</version>
- <sendTime>2003-09-25T17:02:45+02:00</sendTime>
- <source>198.14.7.54</source>
- <source>192.168.1.153</source>
- <type>search</type>
- </header>
- <search>
- <requestFormat>
- http://www.tdwg.org/schemas/abcd/1.2
- </requestFormat>
- <responseFormat start="0" limit="2">
- http://www.tdwg.org/schemas/abcd/1.2
- </responseFormat>
- <filter>
- <like path= "/DataSets/DataSet/Units/Unit/
-Identifications/Identification/TaxonIdentified/
-NameAuthorYearString">
- Ast*
- </like>
- </filter>
- <count>false</count>
- </search>
-</request>
-```
-
-The response using the ABCD schema is as follows:
-
-```
-<?xml version='1.0' encoding='latin1'?>
-<response xmlns=
- 'http://www.biocase.org/schemas/protocol/1.3'>
- <header>
- <version software='Python Interpreter'>
- 2.3 (#46, Jul 29 2003, 18:54:32)
- [MSC v.1200 32 bit (Intel)]
- </version>
- <version software='Wrapper'>1.4.0 alpha</version>
- <version software='DB module'>
- MS SQL Server module v0.91, using mxODBC 2.0.1
- </version>
- <version software='OS'>nt</version>
- <sendTime>2003-09-25T17:02:47+02:00</sendTime>
- <source>192.168.1.12</source>
- <destination>192.168.1.153</destination>
- <destination>198.14.7.54</destination>
- <type>search</type>
- </header>
- <content recordDropped='0'
- recordCount='2'
- recordStart='0'
- totalSearchHits='122'>
- <DataSets xmlns='http://www.tdwg.org/schemas/abcd/1.2'>
- <DataSet>
- <OriginalSource>
- <SourceInstitutionCode>B</SourceInstitutionCode>
- <SourceName>PonTaurus DB</SourceName>
- <SourceLastUpdatedDate>
- 2001-03-01
- </SourceLastUpdatedDate>
- <SourceNumberOfRecords>3068</SourceNumberOfRecords>
- </OriginalSource>
- <DatasetDerivations>
- <DatasetDerivation>
- <DateSupplied>2003-08-11</DateSupplied>
- <Supplier>
- <Organisation>
- <OrganisationName>
- Botanic Garden and
- Botanical Museum Berlin-Dahlem
- </OrganisationName>
- </Organisation>
- <Addresses>
- <Address>
- Königin Luise Str. 6-8, D-14191
- Berlin, Germany
- </Address>
- </Addresses>
- <EmailAddresses>
- <EmailAddress>
- m.doering(a)bgbm.org
- </EmailAddress>
- </EmailAddresses>
- <URLs>
- <URL>http://www.bgbm.org</URL>
- </URLs>
- </Supplier>
- <Rights>
- <LegalOwner>
- <Person>
- <PersonName>Markus Döring</PersonName>
- </Person>
- <Addresses>
- <Address>
- Königin Luise Str. 6-8, D-14191
- Berlin, Germany
- </Address>
- </Addresses>
- <EmailAddresses>
- <EmailAddress>
- m.doering(a)bgbm.org
- </EmailAddress>
- </EmailAddresses>
- </LegalOwner>
- </Rights>
- </DatasetDerivation>
- </DatasetDerivations>
- <Units>
- <Unit>
- <UnitID>1008_1</UnitID>
- <Identifications>
- <Identification>
- <TaxonIdentified>
- <HigherTaxa>
- <HigherTaxon>Fabaceae</HigherTaxon>
- </HigherTaxa>
- <NameAuthorYearString>
- Astragalus haussknechtii Bunge
- </NameAuthorYearString>
- <AuthorString>Bunge</AuthorString>
- <ScientificNameAtomized>
- <Botanical>
- <Genus>Astragalus</Genus>
- <FirstEpithet>
- haussknechtii
- </FirstEpithet>
- </Botanical>
- </ScientificNameAtomized>
- </TaxonIdentified>
- <Identifier>
- <IdentifierPersonName>
- <PersonName>Markus Döring</PersonName>
- </IdentifierPersonName>
- </Identifier>
- </Identification>
- </Identifications>
- <UnitStateDomain>
- <SpecimenUnit>
- <UnitPreparation>
- <PreparationType>
- dried and pressed
- </PreparationType>
- </UnitPreparation>
- </SpecimenUnit>
- </UnitStateDomain>
- <Gathering>
- <GatheringDateTime>
- <DateText>30-07-1999</DateText>
- <ISODateTimeBegin>
- 1999-07-30
- </ISODateTimeBegin>
- </GatheringDateTime>
- <GatheringAgents>
- <GatheringAgent>
- <Person>
- <PersonName>Markus Döring</PersonName>
- </Person>
- </GatheringAgent>
- </GatheringAgents>
- <Project>
- <ProjectTitle>PonTaurus 1999</ProjectTitle>
- </Project>
- <GatheringSite>
- <LocalityText>
- upper Arpalik (Maden) Deresi, slopes along
- the road from Darbogaz to tarn Karagöl.
- N37°27'47'' O34°36'82''
- </LocalityText>
- <Country>
- <CountryName>Turkey</CountryName>
- <ISO2Letter>TR</ISO2Letter>
- </Country>
- <NamedAreas>
- <NamedArea>
- <NamedAreaName>Nigde</NamedAreaName>
- </NamedArea>
- </NamedAreas>
- <Altitude>
- <MeasurementAtomized>
- <MeasurementScale>Meter</MeasurementScale>
- <MeasurementLowerValue>
- 2080
- </MeasurementLowerValue>
- </MeasurementAtomized>
- </Altitude>
- <BiotopeData>
- <BiotopeText>
- Grazed swards and open
- thorn-cushion communities.
- </BiotopeText>
- </BiotopeData>
- <Aspect>
- <AspectText>N</AspectText>
- </Aspect>
- <Slope>
- <MeasurementAtomized>
- <MeasurementScale>
- decimal degree
- </MeasurementScale>
- <MeasurementLowerValue>
- 27
- </MeasurementLowerValue>
- </MeasurementAtomized>
- </Slope>
- </GatheringSite>
- </Gathering>
- <CollectorsFieldNumber>
- 1008
- </CollectorsFieldNumber>
- </Unit>
- <Unit>
- <UnitID>1008_2</UnitID>
- <Identifications>
- <Identification>
- <TaxonIdentified>
- <HigherTaxa>
- <HigherTaxon>Fabaceae</HigherTaxon>
- </HigherTaxa>
- <NameAuthorYearString>
- Astragalus haussknechtii Bunge
- </NameAuthorYearString>
- <AuthorString>Bunge</AuthorString>
- <ScientificNameAtomized>
- <Botanical>
- <Genus>Astragalus</Genus>
- <FirstEpithet>
- haussknechtii
- </FirstEpithet>
- </Botanical>
- </ScientificNameAtomized>
- </TaxonIdentified>
- <Identifier>
- <IdentifierPersonName>
- <PersonName>Parolly</PersonName>
- </IdentifierPersonName>
- </Identifier>
- </Identification>
- </Identifications>
- <UnitStateDomain>
- <SpecimenUnit>
- <UnitPreparation>
- <PreparationType>
- dried and pressed
- </PreparationType>
- </UnitPreparation>
- </SpecimenUnit>
- </UnitStateDomain>
- <Gathering>
- <GatheringDateTime>
- <DateText>30-07-1999</DateText>
- <ISODateTimeBegin>
- 1999-07-30
- </ISODateTimeBegin>
- </GatheringDateTime>
- <GatheringAgents>
- <GatheringAgent>
- <Person>
- <PersonName>Markus Döring</PersonName>
- </Person>
- </GatheringAgent>
- </GatheringAgents>
- <Project>
- <ProjectTitle>PonTaurus 1999</ProjectTitle>
- </Project>
- <GatheringSite>
- <LocalityText>
- upper Arpalik (Maden) Deresi, slopes along
- the road from Darbogaz to tarn Karagöl.
- N37°27'47'' O34°36'82''
- </LocalityText>
- <Country>
- <CountryName>Turkey</CountryName>
- <ISO2Letter>TR</ISO2Letter>
- </Country>
- <NamedAreas>
- <NamedArea>
- <NamedAreaName>Nigde</NamedAreaName>
- </NamedArea>
- </NamedAreas>
- <Altitude>
- <MeasurementAtomized>
- <MeasurementScale>Meter</MeasurementScale>
- <MeasurementLowerValue>
- 2080
- </MeasurementLowerValue>
- </MeasurementAtomized>
- </Altitude>
- <BiotopeData>
- <BiotopeText>
- Grazed swards and open
- thorn-cushion communities.
- </BiotopeText>
- </BiotopeData>
- <Aspect>
- <AspectText>N</AspectText>
- </Aspect>
- <Slope>
- <MeasurementAtomized>
- <MeasurementScale>
- decimal degree
- </MeasurementScale>
- <MeasurementLowerValue>
- 27
- </MeasurementLowerValue>
- </MeasurementAtomized>
- </Slope>
- </GatheringSite>
- </Gathering>
- <CollectorsFieldNumber>
- 1008
- </CollectorsFieldNumber>
- </Unit>
- </Units>
- </DataSet>
- </DataSets>
- </content>
- <diagnostics>
- <diagnostic>OK</diagnostic>
- </diagnostics>
-</response>
-```
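
The attributes on the <content> element are what a client needs in order to page through a larger result. The sketch below reads them from a stripped-down response and computes the "start" values of the remaining requests. It assumes that the window simply advances by "limit" on each request; how dropped records interact with paging is not stated here, so that case is not handled. The function names are invented.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.biocase.org/schemas/protocol/1.3}"

# Stripped-down search response carrying only the paging attributes
# from the example above.
RESPONSE = """<response xmlns='http://www.biocase.org/schemas/protocol/1.3'>
  <content recordDropped='0' recordCount='2'
           recordStart='0' totalSearchHits='122'/>
</response>"""


def paging_status(response_text):
    """Read the paging attributes of <content> as integers."""
    content = ET.fromstring(response_text).find(NS + "content")
    return {name: int(value) for name, value in content.attrib.items()}


def remaining_starts(status, limit):
    """'start' values still to be requested, assuming a fixed window size."""
    first = status["recordStart"] + limit
    return list(range(first, status["totalSearchHits"], limit))


status = paging_status(RESPONSE)
starts = remaining_starts(status, limit=2)
print(status)
print(starts[:3], "...", starts[-1])
```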
-
-**BioCASe filter encoding**
-
-The <filter> element wraps the equivalent of the WHERE clause of a SQL statement. It is a nested structure using different combinations of logical and comparison operators specified as XML tags.
-The following operators are supported:
-
-- Binary comparison operators: equals, notEquals, lessThan,--breakline-- lessThanOrEquals, greaterThan, greaterThanOrEquals, like
-
-- Unary comparison operators: isNull, isNotNull
-
-- Comparison operators for multiple arguments: in
-
-- Unary logical operators: not
-
-- Binary logical operators: and, or
-
-
-Here is a more complex filter example illustrating several conditions based on ABCD:
-
-```
-<filter>
- <and>
- <like path="/DataSets/DataSet/Units/Unit/Identifications/
-Identification/TaxonIdentified/NameAuthorYearString">
- Abies*
- </like>
- <or>
- <like path="/DataSets/DataSet/Units/Unit/Identifications/
-Identification/TaxonIdentified/HigherTaxa/HigherTaxon">
- Pinace*
- </like>
- <and>
- <like path="/DataSets/DataSet/Units/Unit/Gathering/
-GatheringSite/Country/CountryName">
- *Russia*
- </like>
- <greaterThan path="/DataSets/DataSet/Units/Unit/
-Gathering/GatheringDateTime/ISODateTimeBegin">
- 2002-04
- </greaterThan>
- </and>
- </or>
- </and>
-</filter>
-```
-
-It is not possible to compare 2 concepts with each other.
-The IN operator comes in a special form and takes a list of values for a concept to be compared to:
-
-```
-<filter>
- <in path="/DataSets/DataSet/Units/Unit/
-Identifications/Identification/TaxonIdentified/HigherTaxa/
-HigherTaxon">
- <value>Pinaceae</value>
- <value>Pinophyta</value>
- <value>Pinophytina</value>
- </in>
-</filter>
-```
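Because the filter is described as wrapping the WHERE clause of a SQL statement, the translation can be sketched in a few lines of Python. This is only an illustration, not the PyWrapper implementation: the `COLUMNS` mapping, the column names, and the function names are hypothetical, and whitespace inside `path` attributes is stripped because the examples above break paths across lines.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from ABCD concept paths to local database columns.
COLUMNS = {
    "/DataSets/DataSet/Units/Unit/Gathering/GatheringSite/Country/CountryName": "country",
    "/DataSets/DataSet/Units/Unit/Identifications/Identification/"
    "TaxonIdentified/HigherTaxa/HigherTaxon": "higher_taxon",
}

# Binary comparison operators listed above and their SQL counterparts.
COMPARISONS = {
    "equals": "=", "notEquals": "<>", "lessThan": "<", "lessThanOrEquals": "<=",
    "greaterThan": ">", "greaterThanOrEquals": ">=", "like": "LIKE",
}


def to_sql(node, params):
    """Recursively turn one filter element into a SQL fragment with placeholders."""
    tag = node.tag
    if tag in ("and", "or"):
        parts = [to_sql(child, params) for child in node]
        return "(" + f" {tag.upper()} ".join(parts) + ")"
    if tag == "not":
        return "NOT (" + to_sql(node[0], params) + ")"
    # Paths may be wrapped across lines, so drop all whitespace before the lookup.
    column = COLUMNS["".join(node.get("path").split())]
    if tag == "isNull":
        return f"{column} IS NULL"
    if tag == "isNotNull":
        return f"{column} IS NOT NULL"
    if tag == "in":
        values = [value.text.strip() for value in node.findall("value")]
        params.extend(values)
        return f"{column} IN ({', '.join('?' for _ in values)})"
    value = (node.text or "").strip()
    if tag == "like":
        value = value.replace("*", "%")  # protocol wildcard to SQL wildcard
    params.append(value)
    return f"{column} {COMPARISONS[tag]} ?"


def filter_to_where(xml_text):
    """Return (where_clause, parameters) for a <filter> document."""
    root = ET.fromstring(xml_text)
    params = []
    return to_sql(root[0], params), params


EXAMPLE = """<filter>
 <or>
  <like path="/DataSets/DataSet/Units/Unit/Gathering/
GatheringSite/Country/CountryName">*Russia*</like>
  <in path="/DataSets/DataSet/Units/Unit/Identifications/Identification/TaxonIdentified/HigherTaxa/HigherTaxon">
   <value>Pinaceae</value>
   <value>Pinophyta</value>
  </in>
 </or>
</filter>"""

where, params = filter_to_where(EXAMPLE)
print(where)
print(params)
```

Values are passed as bound parameters rather than spliced into the SQL text, which keeps the sketch safe against injection through filter values.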
-
-**BioCASe conceptual binding**
-
-As mentioned before, the conceptual binding (in other words, how concepts from conceptual schemas are referenced from messages) is done by using a simple XPath-like expression. A protocol message therefore cannot be fully validated by a simple XML parser; it needs a separate program that analyzes the message and compares it to the conceptual schema definitions.
-
-XPaths serve as an identifier for a concept and are not really interpreted as being an XPath. They don't carry any more meaning than serving as an ID for a single concept within a conceptual schema which is identified by its namespace. Therefore only absolute paths using the path separator '/' in combination with the child:: axis are valid. Attributes are denoted by using a final '[@=name]' after the attribute holding element path.
-
-This technique allows working with any non-recursive potentially nested schema.
-
---newpage--
-**BioCASe operation calls**
-
-There is no existing specification about which CGI parameters are to be used for passing messages. It is recommended to use the parameter 'query' though.
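As a minimal sketch of the recommendation above, a client could pass the whole protocol message in a `query` parameter. The access point URL and the message text below are placeholders, and the function name is hypothetical.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_request_url(access_point, message_xml):
    """Append the protocol message as the 'query' CGI parameter."""
    return access_point + "?" + urlencode({"query": message_xml})


# Hypothetical access point and an abbreviated placeholder message.
message = "<request><header/><search/></request>"
url = build_request_url("http://example.net/pywrapper.cgi", message)
print(url)

# Round trip: the service side can recover the message from the query string.
recovered = parse_qs(urlsplit(url).query)["query"][0]
print(recovered == message)
```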
-
-=== Existing BioCASe software ===
-
-Software implementing the BioCASe protocol is currently being developed mainly by the BioCASE project[[http://www.biocase.org/dev]] itself:
-
-- PyWrapper[[http://www.biocase.org/dev/wrapper/]] being the core of the provider software parsing requests and assembling responses. It is written entirely in Python.
-
-- Provider software package including the local querytool[[http://www.biocase.org/dev/provider/]]. Changes to the PyWrapper will also require changes to the configtool[[http://www.biocase.org/dev/configtool]].
-
-- Java based unitloader classes[[http://www.biocase.org/dev/unitloader]] acting as a message broker. They are generating protocol messages via their simple API, sending threaded queries in parallel to several datasources and pooling their responses. They also implement alternative emailing services when required to get complete results with a large timeout.
-
-- The BioCASE collection metadata network, especially the java based metaloader software[[http://www.biocase.org/dev/metaloader]] responsible for harvesting a network of metaprofile data providers.
-
-- The Austrian GBIF node is currently developing another message broker and indexing system for their national network.
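The message broker behaviour described for the unitloader classes (sending queries to several datasources in parallel and pooling their responses) can be sketched as follows. This is an illustrative Python sketch, not the Java unitloader API; `broadcast` and the `send` callable are hypothetical names, and the transport is left to the caller.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def broadcast(message, datasources, send):
    """Send one message to every datasource in parallel and pool the results.

    `send(url, message)` is supplied by the caller and returns the response.
    Returns (responses, failures), both keyed by datasource URL.
    """
    responses, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max(1, len(datasources))) as pool:
        futures = {pool.submit(send, url, message): url for url in datasources}
        for future in as_completed(futures):
            url = futures[future]
            try:
                responses[url] = future.result()
            except Exception as exc:  # one failing provider must not stop the rest
                failures[url] = exc
    return responses, failures
```

Keeping failures separate from responses lets a broker report partial results instead of discarding everything when a single provider times out.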
-
-
-= Other standards and technologies =
-
-During the integration process, other standards and technologies were investigated: XQuery, as a powerful and generic query language that could possibly replace both protocols; OGC standards, because some of their geospatial and location based services address similar problems; and SOAP, for its increasing popularity as a means to develop web services.
-
-== XQuery ==
-
-XQuery is part of a new generation of query languages and it aims at being far more generic than the existing options (SQL for instance). Statements can be expressed in order to act across different data sources at the same time, including structured and semi-structured documents, relational databases, object repositories, or even web services[[http://www.w3.org/TR/xquery-use-cases/]]. XQuery could be used not only on search and update operations, but also on transformation and re-structuring of data being retrieved.
-
-In a network of distributed and heterogeneous databases, a possible way to use XQuery would be to write statements based on the data structure of a conceptual schema. Since each data provider would have mapped its local database against the same conceptual schema, those statements could be translated according to the local data structure by wrapper software. Due to the richness and complexity of XQuery, it would be too costly to write complete parsers for it. So instead of having such a parser in the wrapper, the wrapper could pass on the translated queries to a local XML server or middleware software capable of dealing with XQuery statements to be processed.
-
-There could be an interface for advanced users to write their own XQuery statements, but also other simplified interfaces which should be able to automatically generate at least the most used and required types of XQuery statements.
-
-However, one of the consequences of XQuery's flexibility is that it remains unclear how data providers could restrict the amount of data to be returned in a single response. This issue would also affect paging capabilities. Both features (limiting the number of records and paging) are present and considered important in the DiGIR protocol.
-
-The current version of the XQuery specification, although considered stable, is still a working draft, and there are only a few implementations of it. At present, most of the available XML servers and XML middleware tools are proprietary software[[http://www.rpbourret.com/xml/ProdsMiddleware.htm]], and not all of them support the complete XQuery specification. Using commercial software would certainly be too expensive for networks with so many data providers, and the costs would include not only licenses but equipment and training as well.
-
-XQuery parsers have been mostly implemented in native XML databases, some of which are free and open source[[http://www.rpbourret.com/xml/ProdsNative.htm]]. But this would require frequent data exporting for almost all production databases, and performance could also be a problem when querying larger data sets. Moreover, most database administrators are still unfamiliar with XML database products.
-
-On the other hand, only a few commercial relational databases have released prototype XQuery interfaces[[http://www.oreillynet.com/lpt/wlg/4991]], which is certainly not enough to cover the diverse infrastructure of existing data providers.
-
-Still, one specific tool has been identified as a potential candidate to be used in a scenario similar to the one previously suggested. The XQuark project[[http://xquark.objectweb.org/index.html]] is working on a suite of free and open source tools (XQuark Bridge and XQuark Fusion) that seem to implement the whole XQuery specification in a distributed environment. Both tools are platform independent (java), and each local data source needs to map its data structure against a conceptual schema through an object-relational mapping technique. Although the number of supported databases is still limited and the web services interface is not available yet, it would definitely be interesting to conduct a more detailed study to assess the viability of using these tools. With positive conclusions from such a study, the protocol could be substituted by XQuery, and almost all existing software could be replaced by the aforementioned tools or by similar ones.
-
-== OGC standards ==
-
-The Open Geospatial Consortium (OGC) is an international organisation which is developing standards for geospatial and location based services. At least two of the OGC standards are of particular interest to the protocols being discussed here: the Web Feature Service (WFS)[[http://www.opengis.org/docs/02-058.pdf]] and the Common Catalogue Query Language (CQL)[[http://www.opengis.org/docs/02-059.pdf]].
-
-Web Feature Services expose a set of features (that could be seen as generic objects) allowing discovery, query, and transformation operations like insert, delete, and update.
-
-Each service can handle one or more feature types (types of objects) and each feature type has an associated XML Schema defining its structure. Feature types and their structure can be discovered through "DescribeFeatureType" requests.
-
-Each feature type is free to have its own custom structure, although the associated XML element definition needs to inherit from a generic GML[[http://www.opengis.org/docs/02-023r4.pdf]] feature element using the "substitutionGroup" mechanism. And the associated XML type needs to extend a GML "AbstractFeatureType".
-
-Among its properties (which can also be complex) features typically have a geometry-valued property based on one of the GML simple geometry types. Simple geometries are either expressed by coordinates in two dimensions or by a curve delineation which is subject to linear interpolation. OGC has defined many spatial operators that can be used against geographic features: "Equals", "Disjoint", "Touches", "Within", "Overlaps", "Crosses", "Intersects", "Contains", "DWithin", and "Beyond".
-
-Since collecting and observational events fall in the category of geographic features, they could benefit from all these operators.
-It would definitely be interesting to add similar spatial operators in the new protocol.
-
-Another aspect of features is that each one needs a unique identifier to make database operations possible. Locally unique identifiers are sufficient for this purpose. However, combined with the service scope represented by the service URL, locally unique identifiers can easily become globally unique identifiers. The WFS specification correctly points out the usefulness of being able to directly reference feature instances through a global unique identifier which would include the service URL. So, besides being an identifier it would also be an address, and a default representation of the feature could be easily served by accessing that address. Although recommended, this functionality is left as an implementation specific decision, and it is not covered by the WFS specification. In our case, it would also be interesting to be able to directly reference specimen or other biodiversity objects using a URL address. A useful attribute in this case would be the object version, which is also an optional feature attribute defined in the WFS specification.
-
-Features (or collection of features) can be retrieved through "GetFeature" operations, which accept a filter specification quite similar to DiGIR and BioCASe filters, except for the spatial operators. The OGC filter encoding definition, including comparison, logical and spatial operators, is addressed by the CQL specification, and it includes the possibility of referencing features through their identifiers.
-
-The comparison operators defined in the filter encoding specification are: "PropertyIsEqualTo", "PropertyIsNotEqualTo", "PropertyIsLessThan",--breakline-- "PropertyIsGreaterThan", "PropertyIsLessThanOrEqualTo", --breakline-- "PropertyIsGreaterThanOrEqualTo", "PropertyIsLike", "PropertyIsNull", and "PropertyIsBetween". And the logical operators are "and", "or", and "not".
-
-The default representation of a feature is its complete structure, but a "GetFeature" request can ask for specific properties. Feature properties are referenced through a simplified XPath expression, both in filters and in "GetFeature" requests for a partial feature structure. The service can also decide about mandatory properties that should always be included in responses regardless of being asked in requests.
-
-The WFS specification covers additional methods which are not essential at this moment for the new protocol being suggested, including feature locking ("GetFeatureWithLock", "LockFeature") and transactions ("Transaction" with insert, update and delete operations). A transaction may include several operations at the same time.
-
-However, according to the WFS specification, a service may not implement all methods. A complete description of the service can be retrieved using a "GetCapabilities" request. A response for such request includes general information about the service, request types supported, the list of available feature types (and operations allowed on each one), and a section about filter encoding capabilities (listing the available functions and operators).
-
-Standards being defined in the biodiversity area, such as ABCD and DarwinCore, could possibly be seen as global feature type definitions. However, in the case of ABCD, the most logical candidate for a feature would be the "unit" element. This would exclude all metadata elements defined above it. In the case of DarwinCore, its elements could be feature properties from a generic "biological record" element. Regardless of these changes, in both cases, the schemas could not be directly used, since a feature type definition should be an element derived from another GML element, and its type should extend an abstract type also defined in GML. As it happens with DiGIR/DarwinCore, WFS binds content definition to the protocol.
-
-Comparing WFS to the protocols being considered in this report, there is one request type which is defined in both DiGIR and BioCASe (named "inventory" and "scan", respectively) that is not addressed by the WFS specification. This request type gives us the possibility to retrieve distinct values of mapped "concepts" (or feature properties in the WFS jargon).
-
-It also doesn't seem to be possible to get a customized view of features, by restructuring their properties, as it can happen now in DiGIR through the record structure specification in search requests.
-
-Considering extensibility, feature properties can be extended by changing the feature type definition, either by including new elements or by extending their XML types. Feature property names may be namespace qualified.
-
-Regarding specific Structured Descriptive Data[[http://160.45.63.11/Projects/TDWG-SDD/]] (SDD) needs, when retrieving data from WFS it doesn't seem to be possible to output "normalized" features referencing terminology or dictionary elements. And it is unclear if the capability of returning instances from different feature types would be enough to represent relationships between current SDD elements. Such capability is normally associated to a request explicitly including multiple queries involving different feature types.
-
-One important point about all OGC standards, including WFS, is that the specifications are being changed[[http://www.opengis.org/press/?page=pressrelease&view=20040318OWS2P… to be completely based on SOAP and WSDL to benefit from the growing number of existing SOAP tools and wrappers. Therefore, before seriously considering a migration to the WFS specification it would be wiser to wait for its next version.
-
-== SOAP ==
-
-SOAP is a generic XML-based protocol for exchanging information in a distributed environment. It defines an extensible messaging framework, including the possibility of making remote method calls, and it also offers a simplified way to encode parameters and return values. Messages can actually be exchanged over several underlying protocols (HTTP, TCP, SMTP) and all its design tries to abstract anything that could be specific to any platform or programming language.
-
-It must be clear that SOAP is only an additional protocol layer on which the existing protocols could be based. Merely using SOAP would not reconcile the differences between DiGIR and BioCASe. However, an integrated protocol could be based on SOAP or could be independent from it.
-
-At present, a considerable number of web services have been implemented using SOAP, and there are many SOAP wrappers available for most programming languages. This situation makes SOAP almost a natural choice when implementing a web service.
-
-Actually, SOAP and web service are not synonyms. According to W3C's definition, a "web service is a software system identified by a URI, whose public interfaces and bindings are defined and described using XML. Its definition can be discovered by other software systems. These systems may then interact with the web service in a manner prescribed by its definition, using XML based messages conveyed by Internet protocols"[[http://www.w3.org/TR/2002/WD-ws-gloss-20021114/]].
-
-Clearly, SOAP is not a requirement to develop a system that satisfies this definition, although it perfectly fits into the web service idea. The current implementations of both DiGIR and BioCASe could already be seen as web services, except for the fact that they still lack a formal interface description. In this context, interfaces are usually described through WSDL[[http://www.w3.org/TR/wsdl.html]], which at the moment includes bindings for SOAP messages, HTTP GET and POST verbs, and MIME format. Specific bindings for services that exchange XML messages without using SOAP can be described with WSDL through the mimeXml element[[http://www.w3.org/TR/wsdl#_mime:mimeXml]] from the MIME binding specification.
-
-Nevertheless, there are many similarities between some aspects of the protocols being discussed here and SOAP-based protocols. Both have a similar message structure, with a header part and a content (or body) part. This generic design is usually seen as a disadvantage of using SOAP since it makes it difficult, if not impossible, to validate the content inside the body part through XML Schema. Due to the similar approach, DiGIR and BioCASe have the same restriction at least on search response messages, but the other types of messages can be completely validated through XML Schema. By using SOAP, the content of all types of messages would have this restriction.
-
-From the perspective of protocol specification, moving to SOAP would be relatively easy. The header elements could move to the SOAP header, the content elements could move to the SOAP body part, and the standard fault strategy of SOAP could be used in place of the diagnostics elements.
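The mapping described above can be sketched in Python. The sketch assumes a message whose root element has un-namespaced `header`, `content`, and `diagnostics` children (the element names used in the responses shown earlier); the function name is hypothetical, and the only SOAP-specific value used is the SOAP 1.1 envelope namespace.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace


def to_soap(message_xml):
    """Move header children to soap:Header and content children to soap:Body."""
    message = ET.fromstring(message_xml)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    soap_header = ET.SubElement(envelope, f"{{{SOAP_NS}}}Header")
    soap_body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    for child in message:
        if child.tag == "header":
            soap_header.extend(list(child))
        elif child.tag == "content":
            soap_body.extend(list(child))
        # <diagnostics> is intentionally not copied: in a SOAP-based design
        # errors would be reported through SOAP faults instead.
    return envelope


sample = (
    "<response><header><version>1.0</version></header>"
    "<content><DataSets/></content>"
    "<diagnostics><diagnostic>OK</diagnostic></diagnostics></response>"
)
print(ET.tostring(to_soap(sample), encoding="unicode"))
```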
-
-From an implementation perspective, developers could take advantage of the large number of existing SOAP wrappers. Depending on the SOAP binding style being used, the wrapper could take complete care of XML parsing, data serialization and deserialization, thus allowing developers to directly use objects and variables in the way they are used to when programming. Generally, this would make both servers and clients easier to develop, although it is unclear if the wrappers would properly deal with more complex data structures such as filter and record structure parameters. But these facilities are mostly related to one specific SOAP binding style, the RPC/encoded. There are four possible binding styles[[http://www-106.ibm.com/developerworks/webservices/library/ws-whichw… for SOAP. The use of SOAP encoding is actually being discouraged due to interoperability problems between different wrappers. And the RPC binding style is also being discouraged due to scalability problems[[http://www-106.ibm.com/developerworks/webservices/library/ws-soap…. So, in case of using SOAP, the recommendation is to use the binding called document/literal to avoid scalability and interoperability issues, and this would mean almost the same amount of work being done now on the developer side to parse and produce messages.
-
-It also remains unclear how performance could be affected by introducing SOAP wrappers in the existing architecture, since part of the message parsing task would move from the current software to the SOAP wrappers. In particular, wrappers for interpreted languages could probably slow down performance, although in the case of PHP the newest version (5.0.0) already includes a built-in SOAP module. This would be better assessed through a more specific case study.
-
-In theory, it could be possible to embed SOAP functionalities in the existing software without using a wrapper, but this would mean an additional protocol level to worry about and to keep up with new specifications. This cost could be significant over time. So, in case of using SOAP, the recommendation is to use SOAP wrappers. And this would probably cause a considerable impact on code.
-
-It is worth mentioning that, depending on the way that SOAP wrappers are used, client or server code tends to be bound to that specific wrapper implementation and also to the web service SOAP binding itself. Therefore, switching between different SOAP wrappers, or between different versions of the same SOAP wrapper, or between different web service bindings could be a non-trivial and time consuming task. There are already more generic wrappers[[http://www-106.ibm.com/developerworks/webservices/library/ws-wsif… that provide binding independent access to web services by directly using the logical description of the service from the associated WSDL files. In case of using SOAP, the recommendation for developers is to use these generic wrappers, or at least to use a layer between their code and the SOAP wrapper.
-
-Deciding between using SOAP and keeping an application specific XML protocol is not a simple task in this case. Both approaches have advantages and disadvantages. A SOAP-based protocol could benefit from all tools and extensions that are being developed for the SOAP community, for instance security strategies[[http://www.nwfusion.com/news/tech/2002/1216techupdate.html]] and message brokering tools. Developers would have a large set of wrapping tools available. On the other hand, there are still several interoperability problems between different SOAP wrappers[[http://www.xmethods.com/soapbuilders/interop.html]], and it is also possible that sometimes SOAP-related issues consume most of the development effort[[http://www.artima.com/webservices/articles/whysoapP.html]]. Considering mainly the risks related to significant impact on code and performance, the recommendation is to keep the protocol independent from SOAP. If the advantages of using SOAP become clearer in the future, newer versions of the protocol could be based on it.
-
-However, not basing the protocol on SOAP does not prevent having a SOAP service on top of a message broker. This would be an interesting case study, since message brokers are certainly the kind of service most used by the community. A SOAP service on top of it could perhaps ease the development of user interfaces and also the integration of message brokers with networks that are trying to connect distributed web services of different types.
-
-= Integration proposal =
-
-== Overall strategy ==
-
-The proposed protocol aims at eliminating the current differences between DiGIR and BioCASe, while also adding new features and incorporating some ideas from other standards. Impact on existing software and networks was also considered, so that the new protocol could be implemented in a relatively short term.
-
-When reviewing other technologies and standards, XQuery has been considered a potential query language to be adopted in the future due to its remarkable flexibility and power. While still being at the stage of a working draft, it is expected to soon become a standard. By using a well-known and globally accepted query language, one could benefit from tools, documentation, and knowledge produced by other communities. But there would certainly be a dependency on XQuery parsers (which are still limited in number and in compliance with the specification). Using XQuery in a distributed and heterogeneous scenario would also require additional research in order to find a way of abstracting local data models. A consistent proposal using XQuery would deserve a deeper study and working prototypes before being suggested.
-
-OGC specifications, namely WFS and CQL, have interesting similarities when compared to both DiGIR and BioCASe. The CQL filtering specification is actually more powerful than the filters being used by DiGIR and BioCASe, and therefore provided new ideas to the suggested protocol. The WFS specification uses a different paradigm when it defines "features" as the object of queries, which is somewhat similar to DiGIR retrieving records, but different from BioCASe retrieving documents. Although OGC specifications have been a source for new ideas, they also lack some of the existing functionalities of the protocols being integrated (such as inventory operations). Regardless of these missing functionalities, suggesting the adoption of only one OGC specification would not be coherent since most of the specifications are connected to each other - it would probably be necessary to adopt all of them, including the extensive GML specification. Considering that OGC specifications are being changed to adhere to the SOAP protocol, any serious proposal to adopt them should first wait for their new versions.
-
-Using SOAP as a generic protocol layer has been considered as well. It would not add any functionality to the current protocols, but implementations could benefit from the existing SOAP wrappers, code generators, and also from easier interface definitions (WSDL includes a SOAP binding). Similarities between the SOAP message structure and the general message format used by DiGIR and BioCASe (kept by the suggested protocol) would certainly facilitate a migration to SOAP. However, a consistent proposal to adopt SOAP as a top level protocol layer would deserve working prototypes and thorough testing due to the significant number of existing performance and interoperability issues between SOAP implementations.
-
-The protocol being proposed keeps the main message structure and integrates the operations already present in DiGIR and BioCASe.
-
-== Service types ==
-
-Typical architectures of networks performing search and retrieval operations across distributed databases involve at least three possible different services:
-
-- Message broker service: Responsible for distributing messages to other services and pooling their responses.
-
-- Provider service: A specialized version of a message broker, since messages are only distributed to local resources. It is usually a software installation with many configurable underlying data sources.
-
-- Datasource service: Situated at the end of any request process, it is the service responsible for translating a request into a local query language and retrieving data from a specific data repository. It is usually associated with a single database, or with a subset of it.
-
-
-All services have many similarities that apparently suggest a common protocol. Even so, the services are clearly not the same, and using the same protocol schema to validate messages from all services would probably make the schema more complicated and less restrictive (allowing for some unreal combinations of elements). A single specification targeted to all services would likely lack the desirable clarity.
-
-Differences become more evident if we think that a message broker service could potentially be configured to access other message brokers, encapsulating whole networks in a cascading way. In this scenario, there could be many ways of reaching a datasource. Attention should be given to avoid endless loops, and special additional requests might be desirable to retrieve tree representations of networks in order to optimize queries.
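The endless-loop concern in cascading networks can be illustrated with a small traversal that remembers which services were already visited. The network description used here (a dictionary from broker access points to their configured targets) is a hypothetical structure chosen for the sketch; the report does not define one.

```python
def reachable_datasources(start, network):
    """Collect the datasources reachable from `start` without looping.

    `network` maps each message broker to the services it forwards to;
    any service that is not a key of `network` is treated as a datasource.
    """
    visited, datasources, stack = set(), set(), [start]
    while stack:
        node = stack.pop()
        if node in visited:
            continue  # already handled: this is what breaks broker cycles
        visited.add(node)
        if node in network:
            stack.extend(network[node])
        else:
            datasources.add(node)
    return datasources


# Two brokers that reference each other would loop forever without `visited`.
network = {
    "brokerA": ["brokerB", "ds1"],
    "brokerB": ["brokerA", "ds2"],
}
print(sorted(reachable_datasources("brokerA", network)))
```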
-
-The integration proposal presented here concentrates only on the protocol used to communicate directly with datasources, which has been considered the most critical part in the integration process.
-
-== Access points ==
-
-One of the main differences between DiGIR and BioCASe networks is that datasource services don't exist in DiGIR networks. DiGIR resources are only addressed by the protocol through a specific attribute inside the header. On the other hand, BioCASe does not have provider services - it addresses datasources directly through their access points.
-
-An access point is a URL used to communicate with a service. To optimize service discovery through repositories like UDDI it is desirable for each datasource to have its own access point, especially when providers have several underlying datasources related to different themes. Discovery mechanisms, message broker configurations, and even client software benefit from the possibility of directly communicating and referencing specific datasources. In the proposed protocol, each datasource must have its own access point.
-
-== Conceptual binding ==
-
-One of the main differences between DiGIR and BioCASe is related to how concepts are referenced. DiGIR chose to define concepts as extensions of abstract elements defined by the protocol, and this is done via substitution groups. Although this approach offered more validation possibilities to control how concepts are used, it also limited conceptual schemas to be a flat list of global elements bound to protocol definitions.
-
-This was the main reason for BioCASE to develop its own protocol: to be able to reference any element from external and hierarchical schemas. BioCASe references concepts through a simple absolute XPath expression using only the child axis in its abbreviated form (/).
-
-The main requirements for the new protocol concerning conceptual binding were:
-
-- To be able to reference concepts from schemas completely external and independent from the protocol.
-
-- To specify conceptual schemas when referencing concepts, therefore allowing datasources to use multiple conceptual schemas.
-
-- To allow conceptual schemas to be defined in different formats.
-
-
-The suggested protocol thus extends BioCASe's conceptual binding by allowing any kind of reference to external concept definitions, and by enabling clear distinction between concepts from different schemas. The proposed conceptual binding has the following properties:
-
-- Concepts are referenced through an XML attribute called "path", and are completely external to the protocol.
-
-- Paths can be seen as identifiers for concepts.
-
-- Concept paths must be prefixed by the namespace prefix of the respective conceptual schema (except when they are listed in capabilities responses).
-
-
-After the conceptual schema prefix, concept identifiers can use almost any string. When conceptual schemas are defined as pure XML Schemas, the suggested format to identify concepts is to use a simple XPath expression pointing to the corresponding elements in instance documents.
-
-The following XML Schema could be seen as a simple conceptual schema:
-
-```
-<schema xmlns="http://www.w3.org/2001/XMLSchema">
- <element name="dataset">
- <complexType>
- <sequence>
- <element name="record"
- minOccurs="0"
- maxOccurs="unbounded">
- <complexType>
- <sequence>
- <element name="scientificName" type="string"/>
- <element name="basisOfRecord" type="string"/>
- </sequence>
- <attribute name="catalogNumber"
- type="string"
- use="required"/>
- </complexType>
- </element>
- </sequence>
- </complexType>
- </element>
-</schema>
-```
-
-In this case, concepts could be referenced within the protocol as:
-
-```
-<concept path="cs:dataset/record/scientificName"/>
-```
-
-or
-
-```
-<concept path="cs:dataset/record/@catalogNumber"/>
-```
-
-Where "cs" is a namespace prefix associated to the conceptual schema and previously declared in the document like:
-
-```
-xmlns:cs="http://example.net/schemas/cs/1.0"
-```
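As a rough illustration of how a datasource might interpret such a reference, the sketch below splits a prefixed concept path into the conceptual schema namespace and the concept identifier. The function name is hypothetical; the prefix table stands for the namespace declarations found in the message.

```python
def resolve_concept(path, namespaces):
    """Split 'prefix:identifier' into (schema namespace, concept identifier)."""
    prefix, separator, identifier = path.partition(":")
    if not separator or prefix not in namespaces:
        raise ValueError(f"concept path without a declared schema prefix: {path!r}")
    return namespaces[prefix], identifier


# Prefix declarations as they would be collected from the message.
declared = {"cs": "http://example.net/schemas/cs/1.0"}
print(resolve_concept("cs:dataset/record/@catalogNumber", declared))
```

Because the identifier is treated as an opaque string after the prefix, the same lookup works for conceptual schemas that are not defined as XML Schemas.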
-
-== Filter encoding ==
-
-The differences between DiGIR and BioCASe regarding filter encoding include:
-
-- While DiGIR uses combined logical operators to express negation (<andNot>, <orNot>) BioCASe uses a unary <not> operator.
-
-- While DiGIR uses "xsi:nil" attributes to express NULL values, BioCASe has specific comparison operators (<isNull>, <isNotNull>).
-
-- While DiGIR defines values as content of concept elements, BioCASe defines values as direct content of comparison operator elements.
-
-
-After analysing these differences, a unary logical operator was considered the better way to express negation in the suggested protocol. In practical terms, it allows a filter to begin with a negated condition without any preceding binary logical operator (which is not possible in DiGIR). As a consequence, all combined operators were dropped. The <isNull> operator was considered more explicit than "xsi:nil", and better suited to the chosen conceptual binding technique. Finally, instead of defining values as content of operator elements or concept elements, it was considered a better solution to have a new element <literal> defining the value as an attribute.
-
-The suggested protocol has more changes and also incorporates new features to provide additional functionality, including:
-
-- Possibility of using more than two conditions inside binary logical elements (in this case, the position of binary logical elements represents only the boundaries of a sequence of conditions linked through the respective logical operator).
-
-- Possibility of comparing two concepts in the same condition (an example could be a typical data cleaning request searching for records where minimum altitude is greater than maximum altitude).
-
-- Possibility of using parameters (instructing the service to look for values in the environment variables).
-
-- Possibility of using arithmetic operators.
-
-
-Other enhancements considered important, such as spatial operators and functions, were left to future versions since they would demand additional study.
-
-A summary with examples about the new filter encoding follows.
-
-=== Expressions ===
-
-Expressions, new to both DiGIR and BioCASe protocols, are the basic atoms of a filter. They are solely used to build comparison statements, and a valid expression may consist of a literal, a concept, a parameter or an arithmetic operation.
-
-A literal represents a fixed value and takes the form of:
-
-```
-<literal value="42"/>
-```
-
-A concept uses the proposed conceptual binding to reference a concept defined in a conceptual schema (which in turn is mapped to local data repositories):
-
-```
-<concept path="cs:dataset/record/scientificName"/>
-```
-
-Parameters are similar to literals but instead of having a fixed value they take it from external sources such as the HTTP POST and GET environment variables. The name of the parameter within the POST/GET variables is given by the parameter definition.
-
-```
-<parameter name="sname"/>
-```
-
-The previous parameter expression would evaluate to "Ficus" for the following sample URL:
-
-```
-http://example.net/a.cgi?sname=Ficus
-```
-
-The 4 basic arithmetic operations <add>, <sub>, <mul>, <div> are all binary and take 2 further expressions as their arguments. The first argument is treated as the leftmost one. The arithmetic expression 13+7 would be encoded as:
-
-```
-<add>
- <literal value="13"/>
- <literal value="7"/>
-</add>
-```
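
The four expression kinds can be evaluated recursively. The following Python sketch is one possible reading, not part of the proposal: the function `evaluate`, the dictionaries `record` and `params`, and the decision to coerce arithmetic operands to floating point numbers are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

ARITHMETIC = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
    "div": lambda a, b: a / b,
}


def evaluate(node, record, params):
    """Evaluate an expression element to a Python value.

    record: concept path -> local value (stands in for the mapped data).
    params: HTTP GET/POST variables, name -> value.
    """
    if node.tag == "literal":
        return node.get("value")
    if node.tag == "concept":
        return record[node.get("path")]
    if node.tag == "parameter":
        return params[node.get("name")]
    if node.tag in ARITHMETIC:
        left, right = list(node)  # exactly two operands; the first is the leftmost
        return ARITHMETIC[node.tag](
            float(evaluate(left, record, params)),  # assumed numeric coercion
            float(evaluate(right, record, params)),
        )
    raise ValueError("unknown expression element: " + node.tag)


expr = ET.fromstring('<add><literal value="13"/><literal value="7"/></add>')
print(evaluate(expr, {}, {}))  # 20.0
param = ET.fromstring('<parameter name="sname"/>')
print(evaluate(param, {}, {"sname": "Ficus"}))  # Ficus
```

Because arithmetic operations take further expressions as arguments, nesting such as a concept multiplied by a parameter falls out of the same recursion.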
-
-=== Comparison operators ===
-
-The proposed operators can be classified according to the number of arguments they take:
-
-- Unary operators: <isNull>
-
-- Binary operators: <equals>, <greaterThan>, <lessThan>, <greaterThanOrEquals>, <lessThanOrEquals>, <like>
-
-- Unbounded operators: <in>
-
-
-All operators take a concept as their first argument. Unary operators take no additional arguments. Binary operators can take an expression as the second argument. Unbounded operators take one or more literal expressions as additional arguments. A simple comparison looks like:
-
-```
-<equals>
- <concept path="cs:dataset/record/scientificName"/>
- <literal value="Stipa"/>
-</equals>
-```
-
-=== Logical operators ===
-
-The integration proposal reduced the logical operators to the 3 basic ones: <and>, <or>, <not>. The unary <not> operator takes a single argument which can be a comparison or a logical operator. The other two operators, <and> and <or>, were changed from binary to unbounded and can therefore take any number of arguments, but at least two. A more elaborate filter could be:
-
-```
-<and>
- <like>
- <concept path="cs:dataset/record/scientificName"/>
- <literal value="Stipa%"/>
- </like>
- <equals>
- <concept path="cs:dataset/record/basisOfRecord"/>
- <literal value="observation"/>
- </equals>
- <not>
- <in>
- <concept path="cs:dataset/record/@catalogNumber"/>
- <values>
- <literal value="101"/>
- <literal value="102"/>
- </values>
- </in>
- </not>
-</and>
-```
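
To make the semantics of nested filters concrete, here is a small Python sketch that checks one record against a filter. It is a simplified illustration under stated assumptions: the names `matches` and `like` are hypothetical, only literal operands are handled, only a few operators are covered, and "%" is assumed to be the wildcard character as in the example above.

```python
import re
import xml.etree.ElementTree as ET

FILTER = """
<and>
  <like>
    <concept path="cs:dataset/record/scientificName"/>
    <literal value="Stipa%"/>
  </like>
  <not>
    <in>
      <concept path="cs:dataset/record/@catalogNumber"/>
      <values>
        <literal value="101"/>
        <literal value="102"/>
      </values>
    </in>
  </not>
</and>
"""


def like(value, pattern):
    """SQL-style match where '%' stands for any run of characters (assumed)."""
    regex = ".*".join(re.escape(part) for part in pattern.split("%"))
    return re.fullmatch(regex, value) is not None


def matches(node, record):
    """Return True if the record (concept path -> value) satisfies the filter node."""
    children = list(node)
    if node.tag == "and":
        return all(matches(child, record) for child in children)
    if node.tag == "or":
        return any(matches(child, record) for child in children)
    if node.tag == "not":
        return not matches(children[0], record)
    # Comparison operators: the first argument is always a concept.
    value = record.get(children[0].get("path"))
    if node.tag == "isNull":
        return value is None
    if node.tag == "in":
        allowed = [lit.get("value") for lit in children[1].findall("literal")]
        return value in allowed
    operand = children[1].get("value")  # only literal operands in this sketch
    if node.tag == "equals":
        return value == operand
    if node.tag == "like":
        return value is not None and like(value, operand)
    raise ValueError("unsupported operator: " + node.tag)


record = {
    "cs:dataset/record/scientificName": "Stipa pennata",
    "cs:dataset/record/@catalogNumber": "200",
}
print(matches(ET.fromstring(FILTER), record))  # True
```

A real wrapper would more likely translate the same tree into the query language of its backend than evaluate it record by record, but the traversal is the same.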
-
-== Response structure and view ==
-
-Although both DiGIR and BioCASe can return structured responses in search operations, they differ substantially in how results are specified and generated.
-
-DiGIR depends on a "record structure" parameter in order to generate search responses. A DiGIR record structure uses a schema (using a subset of the XML Schema language) to represent what concepts should be returned and how they should be structured in responses. At the same time, each DiGIR resource needs to define during configuration a "root table" which will serve as a basis for looping and generating "record" elements in responses. For this reason, record structures must contain the definition of a <record> element which is made of a complex type containing a sequence of either elements referencing concepts or further complex types. Since concepts are referenced through the XML Schema "ref" attribute, values can only be returned as content from XML elements with the same name as the corresponding concept.
-
-BioCASe, on the other hand, generates responses based on the structure of the conceptual schemas used by datasources. Each conceptual schema is converted into another XML file that serves as a basis for mapping and generating responses. Consequently, responses always follow the same structure as the conceptual schemas, but they are not bound to records from a single underlying "root table".
-
-The suggested protocol tried to reconcile both approaches by extending the idea of "record structures" using a more generic alternative of "response structures". A response structure is also a schema in XML Schema language, specifying how concepts should be returned, but without any fixed record binding. Response structures are actually part of a view definition which also comprises a mapping section and an indexing element definition that should be used as a reference for counting and paging.
-
-A view could look like:
-
-```
-<view>
- <structure>
- <xs:schema
- xmlns:xs="http://www.w3.org/2001/XMLSchema"
- targetNamespace="http://example.net/schemas/sp1">
- <xs:element name="dataset">
- <xs:complexType>
- <xs:sequence>
- <xs:element name="specimen"
- minOccurs="0"
- maxOccurs="unbounded">
- <xs:complexType>
- <xs:sequence>
- <xs:element name="acnum" type="xs:string"/>
- <xs:element name="sname" type="xs:string"/>
- </xs:sequence>
- <xs:attribute name="lastupdate"
- type="xs:dateTime"
- use="optional"/>
- </xs:complexType>
- </xs:element>
- </xs:sequence>
- </xs:complexType>
- </xs:element>
- </xs:schema>
- </structure>
- <indexingElement path="/dataset/specimen"/>
- <mapping xmlns:s="http://example.net/cs/1.0">
- <nodes>
- <node path="/dataset/specimen/acnum"/>
- <concept path="s:CatalogNumber"/>
- </nodes>
- <nodes>
- <node path="/dataset/specimen/sname"/>
- <concept path="s:ScientificName"/>
- </nodes>
- <nodes>
- <node path="/dataset/specimen/@lastupdate"/>
- <concept path="s:DateLastUpdated"/>
- </nodes>
- </mapping>
-</view>
-```
-
-The first <element> definition inside a <schema> should always be considered the root element when producing responses, so a possible result according to the given structure could be (excluding external protocol elements):
-
-```
-<dataset xmlns="http://example.net/schemas/sp1">
- <specimen lastupdate="2002-05-15T13:20:00Z">
- <acnum>423</acnum>
- <sname>Thylacinus cynocephalus</sname>
- </specimen>
- <specimen lastupdate="2001-02-19T16:04:01Z">
- <acnum>567</acnum>
- <sname>Sarcophilus harrisii</sname>
- </specimen>
-</dataset>
-```
-
-The mapping section uses a simple one-to-one mapping that could be extended in future versions of the protocol. It maps nodes (content elements or attributes) from the response structure to concepts from conceptual schemas used by the datasource.
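
A rough Python sketch of how the one-to-one mapping and the indexing element could drive response generation is given below. It is an assumption-laden illustration: the names `MAPPING`, `INDEXING_ELEMENT` and `build_response` are hypothetical, rows are plain dictionaries keyed by concept, the indexing element is assumed to sit directly under the root, and namespaces are ignored for brevity.

```python
import xml.etree.ElementTree as ET

# Node path -> concept, as in the mapping section of the example view.
MAPPING = {
    "/dataset/specimen/acnum": "s:CatalogNumber",
    "/dataset/specimen/sname": "s:ScientificName",
    "/dataset/specimen/@lastupdate": "s:DateLastUpdated",
}
INDEXING_ELEMENT = "/dataset/specimen"


def build_response(rows):
    """Create one indexing element per row and fill its nodes from the mapping."""
    root_name, item_name = INDEXING_ELEMENT.strip("/").split("/")
    root = ET.Element(root_name)
    for row in rows:
        item = ET.SubElement(root, item_name)
        for node_path, concept in MAPPING.items():
            leaf = node_path[len(INDEXING_ELEMENT) + 1:]
            value = row.get(concept)
            if value is None:
                continue  # missing values are simply skipped in this sketch
            if leaf.startswith("@"):
                item.set(leaf[1:], value)  # attribute node
            else:
                ET.SubElement(item, leaf).text = value  # content element
    return root


rows = [
    {"s:CatalogNumber": "423",
     "s:ScientificName": "Thylacinus cynocephalus",
     "s:DateLastUpdated": "2002-05-15T13:20:00Z"},
]
print(ET.tostring(build_response(rows), encoding="unicode"))
```

Counting and paging would then operate on the number of indexing elements produced, not on rows of any particular table.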
-
-Since the XML Schema language is vast and complex, wrappers are not expected to support the whole specification. Only a restricted subset of the language is required, basically including target namespaces, elements (with cardinality), attributes, sequences, and local declarations of complex and simple types (with simple and complex content). Additional aspects of the XML Schema language can be supported by wrappers and advertised in capabilities responses.
-
-The containment relationship between complex elements and their sub-elements or attributes must be translated into the query language of the datasource backend in order to generate responses. In the case of relational databases, one or more SQL statements could be used, taking into account the datasource's configuration settings regarding tables, joins, and mappings with the conceptual schemas.
-
-Resulting structures can arrange concepts in many ways, as elements' content or attributes' values, all accepting custom names, and the result can be validated against an external XML Schema.
-
-Separating the notion of response structures from conceptual schemas has two benefits: different views can be taken from multiple conceptual schemas, and conceptual schemas can be defined in different ways, perhaps without using XML Schema.
-
-== General message format ==
-
-The general message format has been kept. A root element, <request> or <response>, should contain a <header> section followed by another section with the operation name. The only difference is that both previous protocols defined the operation using a <type> element inside the header, and then used a generic <content> element following the header. This allowed for inconsistencies between the specified type and the actual content elements. By replacing the generic <content> element with an element named after the operation, validation becomes stricter and the schema more readable.
-
-In requests the operation element contains possible parameters to be passed:
-
-```
-<?xml version="1.0" encoding="utf-8" ?>
-<request xmlns="http://example.net/protocol/1.0">
- <header>
- <!-- header specific elements -->
- </header>
- <search>
- <!-- search operation specific parameters -->
- </search>
-</request>
-```
-
-In responses, the operation element contains possible results followed by a <diagnostics> section containing any relevant additional information, as well as warnings or errors:
-
-```
-<?xml version="1.0" encoding="utf-8" ?>
-<response xmlns="http://example.net/protocol/1.0">
- <header>
- <!-- header specific elements -->
- </header>
- <search>
- <!-- search operation result -->
- </search>
- <diagnostics>
- <!-- diagnostics information -->
- </diagnostics>
-</response>
-```
-
-=== Header ===
-
-Headers were also kept essentially the same, but some additional changes are being suggested. Besides removing the <type> element the main changes proposed are:
-
-- The attribute "resource" used by DiGIR networks has been removed from the destination element, since datasources are now referenced by their access points.
-
-- <source> elements are allowed to have multiple occurrences to track the addresses used by all services (stages) involved in a communication process. Intermediary services must include their address at the end of the list. By keeping track of the process, implementations of cascading message brokers will have a means to avoid endless loops.
-
-- The access point is now an attribute of <source> instead of its content, and a timestamp "sendtime" has also been incorporated as an attribute.
-
-- Inside <source> there's now an optional <software> element to indicate name and version of the software used to process the message.
-
-- A destination element is not necessary anymore, since the protocol covers only datasource services which are directly reached by their access points.
-
-
-A header from a request that has been dispatched by a message broker is shown in the next example.
-
-```
-<header>
- <source accesspoint="11.12.13.14"
- sendtime="2001-12-17T09:30:47-05:00">
- <software name="generic client software"/>
- </source>
- <source accesspoint="15.16.17.18"
- sendtime="2001-12-17T09:30:49-05:00">
- <software name="generic portal" version="0.95"/>
- </source>
-</header>
-```
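
The loop avoidance mentioned above can be sketched in Python as follows. This is only an illustration of the idea: the function `forward_header` and its arguments are hypothetical, and the proposal does not prescribe how a broker should react when it finds its own address in the list.

```python
import xml.etree.ElementTree as ET

HEADER = """
<header>
  <source accesspoint="11.12.13.14" sendtime="2001-12-17T09:30:47-05:00">
    <software name="generic client software"/>
  </source>
</header>
"""


def forward_header(header_xml, my_accesspoint, sendtime, software):
    """Append this broker to the end of the <source> list, refusing to loop."""
    header = ET.fromstring(header_xml)
    seen = [src.get("accesspoint") for src in header.findall("source")]
    if my_accesspoint in seen:
        # The message has already passed through this service once.
        raise RuntimeError("loop detected at " + my_accesspoint)
    source = ET.SubElement(header, "source",
                           accesspoint=my_accesspoint, sendtime=sendtime)
    ET.SubElement(source, "software", name=software)
    return header


header = forward_header(HEADER, "15.16.17.18",
                        "2001-12-17T09:30:49-05:00", "generic portal")
print([src.get("accesspoint") for src in header.findall("source")])
```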
-
-A header from a response to the previous request could be:
-
-```
-<header>
- <source
- accesspoint="http://example.net/datasource/a.cgi"
- sendtime="2001-12-17T09:30:50-05:00">
- <software name="generic wrapper" version="1.1.4"/>
- </source>
- <destination accesspoint="11.12.13.14"/>
-</header>
-```
-
-=== Diagnostics ===
-
-The <diagnostics> section present in responses is almost the same as it was before, containing one or more <diagnostic> elements, as in the next example:
-
-```
-<diagnostics>
- <diagnostic code="PRG_MISSING_LIBRARY" type="error">
- Could not find module MBSTRING
- </diagnostic>
-</diagnostics>
-```
-
-The former "severity" attribute was renamed to "type" and, as before, takes one of the following categories: debug, info, warn, error, fatal.
-
-However, from a client software perspective, unification of diagnostic codes was also considered important. The complete list of codes should be present in a final specification of this protocol.
-
-== Request operations and response types ==
-
-The proposed operations for the protocol include:
-
-- **Metadata:** Containing basic information to describe the service.
-
-- **Inventory:** Used to retrieve distinct values from a list of concepts.
-
-- **Search:** Main operation used to retrieve data from local data sources in a customizable response structure.
-
-- **View:** Used to get pre-defined views from local data.
-
-- **Capabilities:** Containing essential settings and technical information about the service functionality.
-
-- **Ping:** Used for monitoring purposes to check availability of services.
-
-
-=== Metadata operation ===
-
-The main idea of a metadata operation is to retrieve some basic description about a service, including things like label, abstract, keywords, related entities and contacts, etc.
-
-The existing DiGIR metadata operation served as the basis for this proposal but it has been changed in several ways in order to address a number of known issues. Main changes are:
-
-- Inclusion of the datasource access point (could be seen as redundant considering only direct communications with datasources, but it is actually essential considering that the response may go to a client software that doesn't know the service's address).
-
-- Inclusion of "lang" attribute in content elements that could be served in many languages, and change of their cardinality to "unbounded".
-
-- Removal of <implementation> element which was duplicated (header element already has this information in the <software> element).
-
-- Move of <minQueryTermLength> to the capabilities response.
-
-- Removal of elements <maxSearchResponseRecords> and <maxInventoryResponseRecords>, which were replaced by <maxElementLevels> and <maxElementRepetitions> in the capabilities response. <maxElementLevels> can be used to restrict the maximum number of nested elements in response structures, acting on the XML depth dimension. <maxElementRepetitions> can be used to restrict the maximum number of repetitions of each element produced by search responses, acting on the XML breadth dimension.
-
-- Removal of elements <recordBasis> and <recordIdentifier> which only make sense to specific datasources.
-
-- Change of <useRestrictions> to a generic <rights> element.
-
-- Renaming of previous resource <name> to <label>.
-
-- Inclusion of available local views in a list.
-
-- Change of <dateLastUpdated> and <numberOfRecords> elements to optional attributes from views (since a datasource can now actually serve representations of different things).
-
-
-Another issue worth mentioning is that, even if the datasource could also be reached through a provider service, there's no explicit relationship shown in the metadata response. A datasource is seen as a completely independent service.
-
-The term "provider" has caused a lot of confusion in the past because of its many possible meanings: provider service, provider software, original data provider, and provider institution. In the proposed protocol there is no explicit relationship with a provider service in datasource metadata responses, but it is possible to give clear credit to a provider institution. In fact, a datasource service can now give credit to as many entities (organisations, institutions, companies, etc.) as it wishes.
-
-Entities can also play more than one role at the same time, and roles are defined by the networks. The list of contacts associated with entities is the same as used by DiGIR. Another interesting point is that the same entity can be referenced by more than one service. For those cases, there is now an <identifier> element that can be used as a globally unique identifier to produce lists of entities without repetitions across many services.
-
-A metadata request requires no parameters:
-
-```
-<?xml version="1.0" encoding="utf-8" ?>
-<request xmlns="http://example.net/protocol/1.0">
- <header>
- <!-- header specific elements -->
- </header>
- <metadata/>
-</request>
-```
-
-A metadata response looks like:
-
-```
-<?xml version="1.0" encoding="utf-8" ?>
-<response xmlns="http://example.net/protocol/1.0">
- <header>
- <!-- header specific elements -->
- </header>
- <metadata>
- <label lang="en">Plant specimen database</label>
- <label lang="pt_BR">
- Banco de dados de Espécimes de Plantas
- </label>
- <accesspoint>
- http://example.net/datasource/a.cgi
- </accesspoint>
- <abstract lang="en">
- This resource contains specimen records
- of plants from all over the world
- </abstract>
- <keywords lang="en">plant, specimen</keywords>
- <citation lang="en">
- Plant specimen database. Retrieved on (date
- accessed). http://example.net/datasource/a.cgi
- </citation>
- <rights lang="en">Users may not distribute, modify,
- transmit, reuse, repost, transfer, or use any
- content or information from this service for
- commercial purposes without prior written permission.
- </rights>
- <conceptualSchemas>
- <conceptualSchema namespace=
- "http://example.net/schemas/specimen/1.0"/>
- </conceptualSchemas>
- <views>
- <view name="specimen"
- dateLastUpdated="2004-08-01T20:00:00-03:00"
- numberOfRecords="100"/>
- </views>
- <relatedEntities>
- <entity lang="en">
- <identifier>
- http://example.net/organisation/mydata.xml
- </identifier>
- <name>Biodiversity Informatics Institute</name>
- <name lang="pt_BR">
- Instituto de Informática para Biodiversidade
- </name>
- <acronym>BII</acronym>
- <logoURL>http://example.net/mylogo.gif</logoURL>
- <role>provider</role>
- <role>host</role>
- <description>
- Organisation created for this example
- </description>
- <relatedInformation>
- http://example.net/organisation/
- </relatedInformation>
- <contact type="administrative">
- <name>Person A</name>
- <title>Curator</title>
- <email>person_a@example.net</email>
- <phone>+111 11 111111</phone>
- </contact>
- <contact type="technical">
- <name>Person B</name>
- <title>Sysadmin</title>
- <email>person_b@example.net</email>
- <phone>+111 11 222222</phone>
- </contact>
- </entity>
- </relatedEntities>
- </metadata>
-</response>
-```
-
-As shown in the last example, the "lang" attribute can also be declared in the elements <metadata>, <entity> or <contact> as a default setting for all language-aware elements below them.
-
-=== Inventory operation ===
-
-Inventory operations, which are used to retrieve distinct values from concepts, are present in both DiGIR and BioCASe with a few minor differences. Apart from the different naming ("inventory" in DiGIR, and "scan" in BioCASe), DiGIR accepts filters in requests and has two different counting procedures: the total number of distinct values and the number of occurrences of each distinct value. All these features are present in the new protocol. However, neither DiGIR nor BioCASe can request distinct values from a combination of concepts, which should now be possible.
-
-The main features of inventory operations in the suggested protocol are:
-
-- An optional filter can be used in requests.
-
-- One or more concepts can be used as parameters (when more than one concept is specified, distinct combinations of their values are returned).
-
-- Concepts can be part of different conceptual schemas.
-
-- An optional "count" parameter can be used to request the total number of distinct records and the number of occurrences of each record in the local datasource repository.
-
-- Paging can be done using the parameters "start" and "limit" ("start" represents the index of the first record to be returned and begins with "0", while "limit" represents the number of records to be returned).
-
-- Responses include a <summary> element where "start" indicates the index of the first record returned, "next" indicates the index of the next record that could be retrieved using a subsequent request, "totalReturned" indicates the number of records returned, and "totalMatched" indicates the total number of records that matched the query (not necessarily returned).
-
-
-An inventory request looks like:
-
-```
-<?xml version="1.0" encoding="utf-8" ?>
-<request xmlns="http://example.net/protocol/1.0">
- <header>
- <!-- header specific elements -->
- </header>
- <inventory count="true"
- start="0"
- limit="2"
- xmlns:ns1="http://example.net/ns1"
- xmlns:ns2="http://example.net/ns2">
- <concepts>
- <concept path="ns1:record/taxon/fullname"/>
- <concept path="ns2:gazeteer/location/isocountry"/>
- </concepts>
- <filter>
- <!-- filter specific elements -->
- </filter>
- </inventory>
-</request>
-```
-
-An inventory response for this request could be:
-
-```
-<?xml version="1.0" encoding="utf-8" ?>
-<response>
- <header>
- <!-- header specific elements -->
- </header>
- <inventory>
- <record count="13">
- <value>Abies alba Mill.</value>
- <value>DE</value>
- </record>
- <record count="45">
- <value>Quercus robur L.</value>
- <value>DE</value>
- </record>
- <summary start="0"
- next="2"
- totalReturned="2"
- totalMatched="117"/>
- </inventory>
- <diagnostics>
- <!-- diagnostics information -->
- </diagnostics>
-</response>
-```
-
-The order of values inside <record> must correspond to the order of concepts specified in the request.
-
-The "totalMatched" and "count" attributes in responses are only present if the request asked for counting.
-
-If a request asks for counting but specifies "limit" zero, then no records are returned, but "totalMatched" is still present in the response.
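
The interplay of distinct combinations, counting and paging can be sketched in Python. This is a hedged illustration, not a reference implementation: the function `inventory`, the dictionary layout of its result, the sorted ordering of distinct combinations, and the choice to omit "next" when nothing is left are assumptions of the sketch.

```python
from collections import Counter


def inventory(records, concepts, start=0, limit=None, count=False):
    """Return distinct value combinations for the concepts, plus a summary."""
    tally = Counter(tuple(rec.get(c) for c in concepts) for rec in records)
    # Paging needs a stable order; sorting is an assumption of this sketch.
    distinct = sorted(tally, key=lambda t: tuple("" if v is None else v for v in t))
    end = len(distinct) if limit is None else start + limit
    page = distinct[start:end]
    result = []
    for values in page:
        entry = {"values": list(values)}  # same order as the requested concepts
        if count:
            entry["count"] = tally[values]
        result.append(entry)
    summary = {"start": start, "totalReturned": len(page)}
    if start + len(page) < len(distinct):
        summary["next"] = start + len(page)
    if count:
        summary["totalMatched"] = len(distinct)
    return result, summary


records = [
    {"name": "Abies alba Mill.", "country": "DE"},
    {"name": "Abies alba Mill.", "country": "DE"},
    {"name": "Quercus robur L.", "country": "DE"},
    {"name": "Quercus robur L.", "country": "FR"},
]
print(inventory(records, ["name", "country"], start=0, limit=2, count=True))
```

Calling the same function with `limit=0` and `count=True` returns no records but still reports "totalMatched", which mirrors the rule stated above.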
-
-=== Search operation ===
-
-Search operations simply put together two of the main ideas already discussed: views and filters. Additional parameters can be used to request partial views and to perform paging and counting.
-
-The main features of search operations are:
-
-- Search requests can specify how responses should structure results by means of view definitions.
-
-- View definitions can be completely declared in requests, or they can be remotely located and referenced, or a default local view can be used if none is specified.
-
-- Partial views can be requested by specifying nodes from the view.
-
-- Filters are now optional.
-
-- Paging and counting are based on the <indexingElement> from the view section.
-
-
-Supposing that the view used as an example in section "Response structure and view" is available from "http://example.net/viewlib/specimen.vwd", then a search request asking only for the element "/dataset/specimen/sname" could look like:
-
-```
-<?xml version="1.0" encoding="utf-8" ?>
-<request xmlns="http://example.net/protocol/1.0">
- <header>
- <!-- header specific elements -->
- </header>
- <search count="true" start="0" limit="2">
- <view
- location="http://example.net/viewlib/specimen.vwd"/>
- <partial>
- <node path="/dataset/specimen/sname"/>
- </partial>
- <filter xmlns:s="http://example.net/cs/1.0">
- <equals>
- <concept path="s:CatalogNumber"/>
- <literal value="423"/>
- </equals>
- </filter>
- </search>
-</request>
-```
-
-Concerning requests:
-
-- If no <view> section is specified, then the default local view should be used (and if there are no local views, then an error should be raised).
-
-- If no <partial> section with specific nodes is specified, then the whole view structure should be returned.
-
-- If no <filter> is present then no filtering is used.
-
-
-
-A search response for this request could be:
-
-```
-<?xml version="1.0" encoding="utf-8" ?>
-<response>
- <header>
- <!-- header specific elements -->
- </header>
- <search>
- <dataset xmlns="http://example.net/schemas/sp1">
- <specimen>
- <acnum>423</acnum>
- <sname>Thylacinus cynocephalus</sname>
- </specimen>
- </dataset>
- <summary start="0"
- totalReturned="1"
- totalMatched="1"/>
- </search>
- <diagnostics>
- <!-- diagnostics information -->
- </diagnostics>
-</response>
-```
-
-Notice that element <acnum> was not requested, but since it is a mandatory element according to the view definition it should always be returned. The optional attribute "lastupdate", in contrast, was not returned.
-
-Although search requests can specify how many elements should be returned (counted on the <indexingElement>), responses may be limited by a service setting called "maxElementRepetitions". This setting acts on the breadth dimension of the XML result (considering only the XML content inside the <search> element). It means that no element originating from the same "xsd:element" definition can be repeated more often than the maximum number of allowed repetitions. This setting gives data providers some control to avoid server overload and complete data dumping requests, and at the same time it can be seen as a paging limit.
-
-=== View operation ===
-
-This operation (not present in DiGIR and BioCASe) came up as a natural consequence after having defined all features of the suggested protocol. It actually does not offer more functionality than what could already be achieved through regular search operations. But it offers a simpler way of performing searches, leveraging direct access to original data providers and opening new possibilities.
-
-Since search responses are essentially the result of a combination of a view definition and a filter, direct access to local data can be considerably facilitated through local definitions of parameterized views. The idea is that datasources may define one or more XML views of their local data.
-
-These local views can be designed at will by data providers considering the conceptual schemas being used, or they can simply reference remote pre-defined views. Each local view has a local unique name that can be referenced by search and view operations. One of the local views should be set as being the default.
-
-As already seen in the "Filter encoding" section, comparison expressions inside filters can use a literal value or a parameterized value to be taken from environment variables (HTTP GET or HTTP POST parameters). This entire infrastructure is enough to provide simple direct access to local data.
-
-A local view could be defined and listed in the capabilities response as:
-
-```
-<views default="specimen">
- <view name="specimen"
- xmlns:s="http://example.net/cs/1.0">
- <structure>
- <xs:schema
- xmlns:xs="http://www.w3.org/2001/XMLSchema"
- targetNamespace="http://example.net/schemas/sp1">
- <xs:element name="dataset">
- <xs:complexType>
- <xs:sequence>
- <xs:element name="specimen"
- minOccurs="0"
- maxOccurs="unbounded">
- <xs:complexType>
- <xs:sequence>
- <xs:element name="acnum"
- type="xs:string"/>
- <xs:element name="sname"
- type="xs:string"/>
- </xs:sequence>
- </xs:complexType>
- </xs:element>
- </xs:sequence>
- </xs:complexType>
- </xs:element>
- </xs:schema>
- </structure>
- <indexingElement path="/dataset/specimen"/>
- <mapping>
- <nodes>
- <node path="/dataset/specimen/acnum"/>
- <concept path="s:CatalogNumber"/>
- </nodes>
- <nodes>
- <node path="/dataset/specimen/sname"/>
- <concept path="s:ScientificName"/>
- </nodes>
- </mapping>
- <filter>
- <equals>
- <concept path="s:CatalogNumber"/>
- <parameter name="id"/>
- </equals>
- </filter>
- </view>
-</views>
-```
-
-The view operation is the only one that is invoked exclusively through HTTP GET or HTTP POST parameters, without using an XML message as input. A simple call to the previously defined view would be:
-
-```
-http://example.net/a.cgi?operation=view&name=specimen&id=123
-```
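
For illustration, the following Python sketch builds such a call and shows how a service could recover the variables that <parameter> elements refer to. The helper name `view_url` is hypothetical; the query variables "operation", "name" and "id" are taken from the example URL above.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def view_url(accesspoint, name, **params):
    """Build a GET URL invoking a named local view with filter parameters."""
    query = {"operation": "view", "name": name}
    query.update(params)
    return accesspoint + "?" + urlencode(query)


url = view_url("http://example.net/a.cgi", "specimen", id="123")
print(url)  # http://example.net/a.cgi?operation=view&name=specimen&id=123

# On the service side, the query string yields the values for <parameter> elements.
variables = {key: values[0] for key, values in parse_qs(urlsplit(url).query).items()}
print(variables["id"])  # 123
```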
-
-The response would be an XML representation of part of the local data. In the given example, the parameterized filter takes the value of the "id" parameter and uses it in a comparison with the local mapping of the concept "CatalogNumber". If "CatalogNumber" maps to a field with unique values, then the response would be an XML representation of a specimen record, which could be even bookmarked or referenced:
-
-```
-<?xml version="1.0" encoding="utf-8" ?>
-<dataset>
- <specimen>
- <acnum>123</acnum>
- <sname>Rubus rosaefolius</sname>
- </specimen>
-</dataset>
-```
-
-Responses from view operations are the only ones that do not include specific protocol elements (like header and diagnostics) by default. They contain just the XML representation given by the view structure definition. However, it is possible to request the presence of protocol elements by using the parameter called "verbose" (in this case, the response will follow the same structure as search responses).
-
-Views can also perform paging when there are multiple "objects" (indexing elements) involved. Paging can be done using the "start" and "limit" parameters.
-
-Since views are only defined against concepts from conceptual schemas, they can be used by any datasource which uses the same conceptual schemas and has mapped the necessary concepts. Libraries of views are also possible, and could be used during service configuration.
-
-One interesting aspect of the view given as an example is that the respective calling string could serve as a globally unique identifier for an underlying object. It is therefore a special kind of identifier which also happens to be an address from which the latest version of the object can always be directly retrieved.
-
-=== Capabilities operation ===
-
-This operation was originally conceived in BioCASe to indicate the conceptual schemas used by datasources, as well as all mapped concepts from each schema. Although it can also be seen as metadata about services, it has been decided that two operations were more convenient, one with basic descriptions about the service (metadata) and another with technical information (capabilities). Some of the DiGIR metadata elements were moved to the capabilities operation described here. So the idea of a capabilities operation is to retrieve settings, available functionalities, and technical information about services.
-
-A capabilities request requires no parameters:
-
-```
-<?xml version="1.0" encoding="utf-8" ?>
-<request xmlns="http://example.net/protocol/1.0">
- <header>
- <!-- header specific elements -->
- </header>
- <capabilities/>
-</request>
-```
-
-A capabilities response includes several different sections:
-
-```
-<?xml version="1.0" encoding="utf-8" ?>
-<response>
- <header>
- <!-- header specific elements -->
- </header>
- <capabilities>
- <schemas>
- <!-- conceptual schemas being used -->
- </schemas>
- <views>
- <!-- possible XML views from local data -->
- </views>
- <settings>
- <!-- general service settings -->
- </settings>
- <operators>
- <!-- supported operators in filters -->
- </operators>
- <structure>
- <!-- supported set of response structure language -->
- </structure>
- </capabilities>
-</response>
-```
-
-**Conceptual schemas and mapped concepts**
-
-This section should contain a list of conceptual schemas being used by the service, with two mandatory attributes: namespace and location. Each conceptual schema has a list of mapped concepts, each one with two optional attributes:
-
-- Searchable: indicating if the concept can be used inside filter expressions (defaults to true).
-
-- Mandatory: indicating if the concept should be present in all search responses (defaults to false).
-
-
-This section could look like:
-
-```
-<schemas>
- <conceptualSchema
- namespace="http://example.net/schemas/specimen/1.0"
- location="http://example.net/schemas/specimen/1.0/sp.xsd">
- <concept path="scientificName"/>
- <concept path="catalogNumber"/>
- <concept path="basis" searchable="false"/>
- <concept path="ipr" mandatory="true"/>
- </conceptualSchema>
-</schemas>
-```
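
The attribute defaults described above can be illustrated with a minimal client-side sketch. It assumes the `<schemas>` fragment has already been extracted from a capabilities response and carries no XML namespace; the function name `read_concepts` is hypothetical and not part of the protocol.

```python
import xml.etree.ElementTree as ET

# The sample <schemas> section from the text, without a namespace (assumption).
SCHEMAS_XML = """
<schemas>
  <conceptualSchema
      namespace="http://example.net/schemas/specimen/1.0"
      location="http://example.net/schemas/specimen/1.0/sp.xsd">
    <concept path="scientificName"/>
    <concept path="catalogNumber"/>
    <concept path="basis" searchable="false"/>
    <concept path="ipr" mandatory="true"/>
  </conceptualSchema>
</schemas>
"""


def read_concepts(schemas_xml):
    """Return one dict per mapped concept, applying the documented defaults."""
    root = ET.fromstring(schemas_xml)
    concepts = []
    for schema in root.findall("conceptualSchema"):
        for concept in schema.findall("concept"):
            concepts.append({
                "namespace": schema.get("namespace"),
                "path": concept.get("path"),
                # searchable defaults to true, mandatory defaults to false
                "searchable": concept.get("searchable", "true") == "true",
                "mandatory": concept.get("mandatory", "false") == "true",
            })
    return concepts


concepts = read_concepts(SCHEMAS_XML)
searchable = [c["path"] for c in concepts if c["searchable"]]
mandatory = [c["path"] for c in concepts if c["mandatory"]]
```

A client could use the `searchable` list to reject filters on concepts such as "basis" before sending a search request.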
-
-**Views**
-
-This section simply lists all local views that are available to be used by search and view requests. Each view has a unique name, and one of them should be selected as a default. Views are an optional feature for data providers.
-
-```
-<views default="specimen">
- <view name="specimen">
- <!-- view definition -->
- </view>
- <view name="collector">
- <!-- view definition -->
- </view>
-</views>
-```
-
-**General settings**
-
-There are at least five possible settings of interest to clients. All of them give service providers control to avoid server overload caused by requests for excessive amounts of data:
-
-- minQueryTermLength: The minimum length of a wildcarded string used in "like" expressions.
-
-- maxElementRepetitions: The maximum number of repetitions allowed for any repeatable elements in responses (related only to the content section). It can also be used as a reference for paging.
-
-- maxElementLevels: The maximum number of levels allowed for responses (related only to the content section).
-
-- maxResponseTags: The maximum number of tags that can be returned in responses (related only to the content section).
-
-- maxResponseSize: The maximum size in kilobytes allowed to be returned in responses.
-
-
-This section could look like:
-
-```
-<settings>
- <minQueryTermLength>3</minQueryTermLength>
- <maxElementRepetitions>200</maxElementRepetitions>
- <maxElementLevels>30</maxElementLevels>
-</settings>
-```
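
As a rough illustration of how a client might honour these settings, the sketch below reads the sample `<settings>` section and checks a wildcarded term before it is used in a "like" expression. The wildcard character (`*`) and the rule that wildcards do not count towards the term length are assumptions made for the example; the text above does not define either.

```python
import xml.etree.ElementTree as ET

# The sample <settings> section from the text.
SETTINGS_XML = """<settings>
  <minQueryTermLength>3</minQueryTermLength>
  <maxElementRepetitions>200</maxElementRepetitions>
  <maxElementLevels>30</maxElementLevels>
</settings>"""


def read_settings(settings_xml):
    """Map each setting name to its integer value."""
    return {child.tag: int(child.text) for child in ET.fromstring(settings_xml)}


def like_term_allowed(term, settings, wildcard="*"):
    """Check a wildcarded term against minQueryTermLength.

    Assumption: wildcard characters are not counted in the term length.
    """
    minimum = settings.get("minQueryTermLength", 0)
    return len(term.replace(wildcard, "")) >= minimum


settings = read_settings(SETTINGS_XML)
```

Settings that are absent from the response, such as maxResponseSize here, are simply missing from the dictionary, so a client has to decide how to treat them.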
-
-**Supported operators**
-
-This section is used to indicate which operators can be used by filter encodings. The operators are divided into three categories:
-
-- logical: used to indicate capability of dealing with "and", "or" and "not" operators.
-
-- comparative: can be used to indicate support for "in", "isNull", "like" and "basicComparativeOperators" (=,<,<=,>,>=).
-
-- basic arithmetic operators: used to indicate support for addition, subtraction, multiplication and division.
-
-
-This section could look like:
-
-```
-<operators>
- <logical/>
- <comparative>
- <basicComparativeOperators/>
- <in/>
- <isNull/>
- <like/>
- </comparative>
- <basicArithmeticOperators/>
-</operators>
-```
-
-**Supported response structure**
-
-This section is used to indicate which subset of the XML Schema language is supported by the service in order to interpret response structures. A minimal subset is required for all wrapper implementations, and it is represented by the tag <basicSchemaLanguage>. Additional features of the XML Schema language may be optionally supported by wrappers.
-
-This section could look like:
-
-```
-<structure>
- <basicSchemaLanguage/>
- <choice/>
-</structure>
-```
-
-Other features include <group>, <import>, <references>, <extension>, <restriction>, and <substitutiongroup>.
-
-=== Ping operation ===
-
-This new operation has been introduced to enable better monitoring of services, without needing to use a metadata request (which usually needs to connect to some local database). Requests and responses are quite simple and can be used to check reachability, response time, and also to check if wrapper software is properly installed.
-
-Sample ping request:
-
-```
-<?xml version="1.0" encoding="utf-8" ?>
-<request xmlns="http://example.net/protocol/1.0">
- <header>
- <!-- header specific elements -->
- </header>
- <ping/>
-</request>
-```
-
-Sample ping response:
-
-```
-<?xml version="1.0" encoding="utf-8" ?>
-<response xmlns="http://example.net/protocol/1.0">
- <header>
- <!-- header specific elements -->
- </header>
- <pong/>
- <diagnostics>
- <!-- diagnostics information -->
- </diagnostics>
-</response>
-```
-
-== Operation calls ==
-
-Concerning operations that can use an XML document as input, it has been agreed that communication with the service will be done through a single HTTP POST or HTTP GET parameter called "request" which must contain either the XML message according to the specified protocol or a URL pointing to the XML message. These operations include: "metadata", "capabilities", "inventory", "search", and "ping".
-
-Operations that do not require complex parameters, like "metadata", "capabilities" and "ping" can also be invoked by passing an HTTP POST or an HTTP GET parameter called "operation" whose value should be the name of the operation.
-
-View operations do not use an XML message in requests, so they are only invoked by HTTP POST or HTTP GET parameters including: "operation" (always equal to "view"), "name" (name of the view), "verbose" (optionally requests additional protocol elements), "start" (index of first object to be retrieved), and "limit" (total number of objects to be retrieved).
-
-The default operation when a service is invoked without any parameters is a "metadata" operation.
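
The three calling styles described in this section could be assembled on the client side roughly as follows. This is only a sketch for HTTP GET: the access point URL and the helper names are hypothetical, and the value `"true"` for the "verbose" parameter is an assumption, since the accepted values are not specified above.

```python
from urllib.parse import urlencode

# Hypothetical access point of a datasource service.
ACCESS_POINT = "http://example.net/service"

SIMPLE_OPERATIONS = ("metadata", "capabilities", "ping")


def simple_operation_url(operation):
    """URL for an operation that needs no complex parameters."""
    if operation not in SIMPLE_OPERATIONS:
        raise ValueError("operation requires an XML request: " + operation)
    return ACCESS_POINT + "?" + urlencode({"operation": operation})


def view_url(name, start, limit, verbose=False):
    """URL for a view operation, which never carries an XML message."""
    params = {"operation": "view", "name": name, "start": start, "limit": limit}
    if verbose:
        params["verbose"] = "true"  # assumed value format
    return ACCESS_POINT + "?" + urlencode(params)


def xml_request_url(xml_message_or_url):
    """URL passing an XML message (or a URL pointing to one) as 'request'."""
    return ACCESS_POINT + "?" + urlencode({"request": xml_message_or_url})
```

For anything but short messages, sending the same "request" parameter with HTTP POST is likely the safer choice, because GET URLs are subject to length limits in many servers and proxies.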
-
-== Known and possible limitations ==
-
-Since all efforts were concentrated on the integration process, the use of this protocol by other data exchange schemas that are being proposed as standards to the biodiversity community was not properly studied. Schemas like SDD and Taxonomic Concept Transfer Schema (TCS)[[http://www.soc.napier.ac.uk/tdwg/index.php]] require features that may or may not be addressed by this protocol when producing search responses, such as:
-
-- Returning recursive elements.
-
-- Returning inter-related elements (when an element references another in a different part of the structure).
-
-
-Because the suggested protocol accepts flexible response structures, it might be possible to use some combination of mapping technique and response structure definition that could produce the desired results to the aforementioned schemas. However, additional research is required to better investigate these issues.
-
-Since a large part of biodiversity data has some geographic property, it would be highly desirable to support spatial operators in the protocol, as provided by the OGC CQL specification. However, the suggested protocol does not include any spatial operators in its current version.
-
-The proposed protocol also does not enable the definition and use of custom functions inside filters. This could be an interesting feature to improve even more filtering capabilities.
-
-When developing software to be compliant with the proposed protocol, the greatest challenge will probably be to dynamically generate responses to a requested custom structure with its own indexing element definition. When analyzing BioCASe experiences with dynamic response structures this seems to be feasible. Although the BioCASe protocol did not allow custom response structures, the internal way of storing them is very similar and has proven to work successfully.
-
-By using a similar implementation approach, the dynamic generation of SQL could be based on an acyclic connected graph representation of the database structure with tables' aliases as vertices and foreign/primary key relationships as edges. For a given response structure, a rooted tree representation of the graph would need to be extracted to be able to uniquely identify a path to the requested tables that could be used to generate dynamic SQL joins.
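
The graph-based approach can be sketched as follows. This is not the BioCASe implementation; the table aliases, key columns and the use of LEFT JOIN are illustrative assumptions. A breadth-first traversal from the root table yields a rooted tree, and the path from each requested table back to the root determines the joins.

```python
from collections import deque

# Hypothetical schema: table aliases as vertices, key relationships as edges.
EDGES = [
    ("specimen", "identification", "specimen.id = identification.specimen_id"),
    ("specimen", "collector", "specimen.id = collector.specimen_id"),
    ("identification", "taxon", "identification.taxon_id = taxon.id"),
]


def build_adjacency(edges):
    """Undirected adjacency list keyed by table alias."""
    adjacency = {}
    for left, right, condition in edges:
        adjacency.setdefault(left, []).append((right, condition))
        adjacency.setdefault(right, []).append((left, condition))
    return adjacency


def rooted_tree(adjacency, root):
    """Breadth-first traversal; maps each table to (parent, join condition)."""
    parents = {root: None}
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for neighbour, condition in adjacency.get(current, []):
            if neighbour not in parents:
                parents[neighbour] = (current, condition)
                queue.append(neighbour)
    return parents


def join_clause(parents, root, requested):
    """Build a FROM clause joining every table on the paths to 'requested'."""
    needed = []
    for table in requested:
        path = []
        node = table
        while node != root:  # walk up to the root, collecting the path
            path.append(node)
            node = parents[node][0]
        for alias in reversed(path):  # parents must be joined before children
            if alias not in needed:
                needed.append(alias)
    sql = "FROM " + root
    for alias in needed:
        sql += " LEFT JOIN " + alias + " ON " + parents[alias][1]
    return sql


tree = rooted_tree(build_adjacency(EDGES), "specimen")
```

Because the graph is acyclic and connected, each table has exactly one path to the root, which is what makes the generated join sequence unambiguous.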
-
-Hierarchical and valid XML could be created with only one problem being identified regarding repeatable elements. If the relational database structure is different from the "relational" XML structure, the algorithm used for XML generation in some cases may have problems identifying the correct XML node that represents the "many" side of a relationship. Any repeatable node needs to be created in the XML result for each table record joined to the parental record related to preceding XML nodes. Problems could arise when there are cascading repeatable elements without any attributes or any other mapped sub elements in the top of the chain. In this case, an algorithm may not know in which node the repetition should happen. As the BioCASe system has fixed and locally configured structures used for responses, it was possible to use local provider configurations to solve this. However, some ideas were proposed to overcome this problem in a custom response structure scenario, including the possibility to directly map repeatable elements and therefore providing additional references to the algorithms. These ideas can possibly eliminate the problem in most but maybe not all situations. So there could be some specific types of structures that will not be supported by datasource services depending on the algorithm used and on the mapping provided.
-
-In the case of denormalized databases, automatic normalization of the generated XML is not desired, so redundant data in the returned XML cannot be avoided. However, if denormalization is the result of using the same value of a concept for the whole datasource (such as global metadata concepts like collection name, institution name) it is possible to overcome the problem by allowing fixed values when mapping conceptual schemas. In this case, wrappers could easily avoid redundant information being returned.
-
-== Impact on software ==
-
-All changes and additions proposed will certainly impact all software that needs to deal with DiGIR or BioCASe protocols. All types of services will be affected.
-
-The least affected software components are probably message brokers, which will need to stamp their IP address in headers and maybe handle the different error codes. In particular, DiGIR message brokers will need to address datasources by access points (affecting configuration and message routing).
-
-Client software from both sides will need to adapt to changes in filter and response structure specification. However, response parsers will not change significantly, unless they want to make use of new functionalities like inventories accepting multiple concepts, or the new view operation. They will, however, need to consider some changes made in element names.
-
-For the moment, DiGIR provider services will not be officially covered by the protocol. Their configuration and installation will need to be changed to also serve as a set of datasource services.
-
-DiGIR providers (when also seen as datasource services) and BioCASe wrappers will be the most affected software. Accepting and parsing the new filter and response structure specifications, producing results according to any valid given structure, and taking into account many other changes, although feasible in a short term, will demand considerable work. However, new features like the view operation, and inventory operations with multiple concepts will not produce much impact.
-
-Configurators, which are usually integrated with wrappers, will need to adapt themselves to the proposed service metadata and to additional configuration parameters available from the new protocol.
-
-Considering all these consequences, networks must carefully decide on a strategy to upgrade all their software components.
-
-== Migration strategy ==
-
-When choosing software to be changed, priority should clearly be given to DiGIR providers and BioCASe wrappers, which should all become datasources compliant with the new protocol. The recommendation when changing these components is to keep as much compatibility as possible with previous protocols, so that they can be gradually substituted without experiencing significant service downtime.
-
-Backwards compatibility can be achieved by including an additional verification on top of these components to detect the protocol being used by requests, and then addressing each message from different protocols by different libraries or procedures. This should ideally be done keeping the same access point, in case of installed BioCASe wrappers, so that other client software or message brokers from different networks accessing the same service could be upgraded at different times. Another possibility for BioCASe wrappers could be to install and configure new datasources as separate and parallel services, but at some point it would require updates in UDDI registries for networks that use them.
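
One possible way to perform this additional verification is to inspect the namespace of the root element of each incoming message and route it to the matching library. The sketch below assumes that protocols can be told apart by namespace, which the text does not state; the namespace is the placeholder used in the examples of this document, and the handler functions are hypothetical.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace taken from the sample messages in this document.
NEW_PROTOCOL_NS = "http://example.net/protocol/1.0"


def root_namespace(xml_message):
    """Return the namespace URI of the root element, or '' if it has none."""
    root = ET.fromstring(xml_message)
    if root.tag.startswith("{"):
        return root.tag[1:].split("}", 1)[0]
    return ""


def dispatch(xml_message, handlers, legacy_handler):
    """Route a message to the handler registered for its namespace.

    Messages with an unknown namespace fall back to the legacy handler.
    """
    handler = handlers.get(root_namespace(xml_message), legacy_handler)
    return handler(xml_message)


# Hypothetical handlers standing in for the old and new protocol libraries.
def handle_new(message):
    return "new"


def handle_legacy(message):
    return "legacy"


HANDLERS = {NEW_PROTOCOL_NS: handle_new}
```

Since the check happens before any protocol-specific parsing, both libraries can stay behind the same access point.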
-
-DiGIR providers when acting on behalf of several datasources will necessarily have new access points for each current resource. Thematic DiGIR networks could gradually install and configure separate message brokers and client software, and then redirect their official addresses when all datasources have been upgraded.
-
-Tools to automate updates on configuration files will definitely be needed when installing new versions of datasources. If feasible, the best solution would be to allow the configuration parsing software (configurators or wrapper) to be able to read old and new versions of the configuration files and write out new versions when asked to. This reduces the number of independent software components needed.
-
-Another alternative is to develop proxy services that could serve as message translators between different protocols. However, this may not be an easy task, since new features from the proposed protocol cannot be translated to the existing ones. Likewise, any features from existing protocols which are not present in the new one could make it difficult to perform translations in the opposite direction, although this should not be the case here (only a few deprecated metadata elements have been removed).
-
-== Considerations about conceptual schemas ==
-
-The suggested protocol clearly separates conceptual schemas from possible views of mapped data. This fact provides great flexibility when conceiving conceptual schemas, which could be defined in several ways, not necessarily using the XML Schema language but at least using XML to benefit from the same parsing technology being used. Modularization and extension of conceptual schemas could be achieved by different means since datasources accept multiple conceptual schemas that are not bound to the protocol.
-
-The advantage of modularizing conceptual schemas is that they can be re-used by different networks that have only a partial intersection between their conceptual domains. When modularized, changes in one conceptual schema will not affect other parts of the conceptual domain. But from a data exchange perspective it doesn't matter if concepts are in the same schema or not, since they can be freely combined and aggregated by different views.
-
-However, it is still an open issue how to best represent and modularize conceptual schemas - this was not part of this work.
-
-The great flexibility provided by the protocol makes it possible for schemas like DarwinCore to choose between keeping its current approach, as a simple list of concept definitions, or not. Proper concept grouping and cardinalities when exchanging data can be achieved by XML views. And it is also possible for XML Schemas like ABCD to be modularized when seen as a conceptual schema.
-
-It is important to observe that if different conceptual schemas define the same concept using a different format, it may be difficult to achieve full interoperability between data providers even by using the same protocol. A classical example is related to how dates can be represented. Conceptual schemas that separate dates in three concepts (day, month, and year) like DarwinCore does, need to use more protocol features to represent a condition like: date >= '2002-05-19'. The associated filter encoding to represent this condition could be:
-
-```
-<or>
- <greaterThan>
- <concept path="c:year"/>
- <literal value="2002"/>
- </greaterThan>
- <and>
- <equals>
- <concept path="c:year"/>
- <literal value="2002"/>
- </equals>
- <or>
- <greaterThan>
- <concept path="c:month"/>
- <literal value="05"/>
- </greaterThan>
- <and>
- <equals>
- <concept path="c:month"/>
- <literal value="05"/>
- </equals>
- <greaterThanOrEquals>
- <concept path="c:day"/>
- <literal value="19"/>
- </greaterThanOrEquals>
- </and>
- </or>
- </and>
-</or>
-```
-
-While a conceptual schema defining dates using a single concept, like ABCD, would represent the same condition as:
-
-```
-<greaterThanOrEquals>
- <concept path="c:ISODateTime"/>
- <literal value="2002-05-19"/>
-</greaterThanOrEquals>
-```
-
-This example illustrates how differences between conceptual schema definitions may lead to interoperability issues even using the same protocol.
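
The logic of the two encodings can be checked with a short sketch. The function names are hypothetical and the functions only mirror the boolean structure of the filters above; they do not parse the XML encoding itself.

```python
from datetime import date


def split_date_filter(year, month, day):
    """Mirror of the nested or/and filter over separate year, month, day concepts."""
    return year > 2002 or (
        year == 2002 and (month > 5 or (month == 5 and day >= 19))
    )


def single_date_filter(value):
    """Mirror of the single-concept filter: date >= 2002-05-19."""
    return value >= date(2002, 5, 19)


samples = [
    date(2001, 12, 31),
    date(2002, 5, 18),
    date(2002, 5, 19),
    date(2002, 6, 1),
    date(2003, 1, 1),
]
results = [
    (split_date_filter(d.year, d.month, d.day), single_date_filter(d))
    for d in samples
]
```

Both forms select the same records, but the three-concept form needs nine operators where the single-concept form needs one.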
-
-The meaning of concepts should be as clear as possible when dealing with different conceptual schemas in order to avoid more problems. Another example could be a conceptual schema defining a concept called "ScientificName" as being the latest identification accepted for a specimen. Another conceptual schema could use the same name for a concept but with a different meaning, such as any identification of the specimen, not only the latest one. A response structure assuming a one-to-one relationship between specimen and identification would certainly get wrong results from the latter concept definition.
-
-Sometimes cardinality can be embedded in a concept definition, such as a "collector" concept that may contain several collector names concatenated in a single string, whereas other conceptual schemas may have multiple occurrences of a single collector concept. Response structures must be aware of these differences to avoid more problems.
-
-So the next step to achieve full integration and interoperability between DiGIR and BioCASe networks lies in an agreement at the conceptual schema level, by using at least a fully compatible set of concepts; or even better, by adopting a common set of concepts which could be derived from a concept library or an ontology for biodiversity data.
-
-= Additional recommendations =
-
-During the whole integration process, several additional topics related to the suggested protocol have also been discussed. They vary from a list of recommended specifications and desirable tools to specific research areas that could be interesting to investigate. A summary of these discussions is presented in the next paragraphs.
-
-== Specifications ==
-
-As soon as an agreement is reached about the common protocol, one of the first things necessary is to produce a formal specification describing all aspects of the new protocol and including all necessary implementation details. It would also be highly desirable to produce additional specifications to take full advantage of web services technology and to cover interoperation between other components that are part of the existing networks.
-
-=== WSDL service specification ===
-
-To be fully compliant with the web services definition it is necessary to have an XML description of the service interface and bindings. This is usually done through a WSDL document which formally specifies all possible interactions with the service. In theory, such a description could automate interaction between different services and could help with more sophisticated scientific workflows. A number of tools already exist which can take a WSDL document and automatically generate code in specific programming languages to communicate with the service. A WSDL document describing the suggested protocol could therefore provide additional means to facilitate development of network components as well as other tools.
-
-=== Provider and message broker services ===
-
-The protocol being proposed could be used almost in its complete form by other types of services that are typically part of the networks querying distributed data, namely providers and message brokers. These services are not officially covered by this work and they would likely deserve at least one additional specification.
-
-Having a set of specifications covering all services used by typical components from distributed query architectures would certainly facilitate the development of additional tools. Moreover, by using the same protocol these tools could potentially be shared by all networks.
-
-== Other suggested tools ==
-
-When upgrading the current code base to adjust it to the new protocol, and also when migrating the existing networks to use new software releases, some additional tools could be of great value.
-
-=== Service validators ===
-
-During the process of updating existing datasource software or producing new implementations conforming to a new protocol, it would be highly desirable to have an external tool capable of automatically checking protocol conformance and thus validating or not the implementation. XML validation is only a first step to verify correct functionality of a datasource service. There are many possible constructions for filters and response structures that could produce wrong responses even inside valid XML documents. These situations will demand careful attention and more elaborate tests.
-
-=== Automatic updates ===
-
-It could also be worthwhile to check if there is any tool that could be used to enable automatic updates on data provider software, or even if it would be feasible to develop such a tool. In this case, networks could potentially schedule the upgrade of all data providers' software in a short period of time. The same tool could be used not only during drastic network migration periods, but also during regular software updates that happen frequently.
-
-=== Proxy services ===
-
-Although translation between messages from the suggested protocol and messages from the existing ones may not be fully possible, proxy services could help with the gradual migration of networks. It would be necessary to carry out an initial study to assess how far and in which cases a translation service could successfully mediate communication between the different protocols. Depending on the conclusions, development and use of proxy services could play an important role when upgrading the networks.
-
-== Additional research ==
-
-Besides being a source for new ideas, some of the technologies considered during the integration process deserve more detailed investigation. Conclusions from such studies could drive future versions of the protocol to different directions.
-
-=== XQuark prototype ===
-
-As already mentioned in the XQuery section, a specific tool has been identified as a potential alternative for the protocol. To better assess its possible advantages and limitations compared to what has been proposed here and considering biodiversity networks' needs, it would be necessary to develop a prototype. The following items would require special attention:
-
-- Evaluate the possibility of having interfaces capable of generating XQuery statements.
-
-- Verify limitations and capabilities of the mapping language being used.
-
-- Verify how conceptual schemas are defined.
-
-- Evaluate the mapping interface.
-
-- Estimate the amount of work to extend the number of supported database drivers.
-
-- Conduct performance tests with large databases.
-
-- Check if there is a way for data providers to limit the amount of data returned per request.
-
-
-=== SOAP prototype ===
-
-To better evaluate advantages and limitations of basing the entire networks on the SOAP technology, it would also be necessary to develop a prototype service. As already mentioned, this service could be an additional layer on top of a message broker or on top of a comprehensive data provider. The following items would require special attention:
-
-- Verify interoperability issues, especially concerning complex data types like filters and response structures.
-
-- Conduct performance tests in comparison to existing services.
-
-
-Regardless of the results, frequent studies about such tools and technologies would certainly bring new ideas and contribute to the improvement of existing networks.
-
-=== Ontologies ===
-
-Another wide research area concerns how to better represent conceptual schemas. Some of the conceptual schemas being used now are limited to property definitions with their corresponding data types. Richer representations including classes and relationships would certainly offer more possibilities, especially for networks that need to represent more complex situations, such as species interactions.
-
-A number of ontology languages are available with different levels of robustness and expressivity. Many have been created to enable the Semantic Web idea and are based on XML. Richer conceptual representations combined with richer mappings will sooner or later be necessary for biodiversity networks.
-
-= Final comments =
-
-For several reasons, the adoption of a common protocol by DiGIR and BioCASe networks can be considered a priority:
-
-- The number of different implementations that rely on DiGIR and BioCASe protocols has grown considerably in recent years, which already makes upgrading the existing code base difficult. Implementations tend to be bound to one of the existing protocols since the effort to produce software compliant with both is usually unaffordable. However, it is highly desirable for all software implementations to be fully interoperable. By using interoperable server software, data providers will be able to get more visibility. On the other hand, fully interoperable client software will be able to access a larger number of data providers. It is expected that any software implementation will prefer to use a common protocol, and therefore new implementations being planned could save considerable development time if a common protocol is adopted by the community.
-
-- New thematic networks are also being planned and implemented. As soon as there are available tools compliant with a common protocol they could be immediately used by those networks, avoiding subsequent upgrades on installed software.
-
-- The number of data providers, which is already considerable, keeps growing on a regular basis. The later a common protocol is supported by tools and adopted by networks, the more difficult it will be to migrate them in the future.
-
-
-The protocol being proposed could be supported by existing software in a relatively short time, so that networks could already begin a migration process. Although many features were included, most of them don't need to be implemented from the beginning. Greater levels of functionality can be gradually achieved without breaking conformance with the protocol.
-
-However, further discussions with the community are still necessary to reach a common agreement as to the protocol to be adopted. Whether accepted in its current form or not, this work has been done with the hope to serve as another step towards complete integration of biodiversity data services.
-
-= References =
-
-**ABCD:** Access to Biological Collection Data. Task Group web site:--breakline--
-http://bgbm3.bgbm.fu-berlin.de/TDWG/CODATA/default.htm
-
-**BioCASE:** Biological Collection Access Service for Europe. Web site:--breakline--
-http://www.biocase.org/ --breakline--
-Protocol XML Schema:--breakline--
-http://www.bgbm.org/biodivinf/Schema/protocol_1_3.xsd
-
-**CQL:** Open Geospatial Consortium, Common Catalogue Query Language, "Filter Encoding Implementation Specification", Panagiotis A. Vretanos, version 1.0.0, Adopted Specification, 19 September 2001.--breakline--
-http://www.opengis.org/docs/02-059.pdf
-
-**DarwinCore:** A set of data element definitions designed to search primary biodiversity data. Task Group web site:--breakline--
-http://darwincore.calacademy.org
-
-**DiGIR:** Distributed Generic Information Retrieval. Web site:--breakline--
-http://digir.net --breakline--
-Protocol XML Schema:--breakline--
-http://digir.net/schema/protocol/2003/1.0/digir.xsd
-
-**GML:** Open Geospatial Consortium, "Geography Markup Language Implementation Specification", Simon Cox, Paul Daisey, Ron Lake, Clemens Portele, Arliss Whiteside, version 3.00, 29 January 2003.--breakline--
-http://www.opengis.org/docs/02-023r4.pdf
-
-**SDD:** Structure of Descriptive Data. Task Group web site:--breakline--
-http://160.45.63.11/Projects/TDWG-SDD/
-
-**Semantic Web:** World Wide Web Consortium, "Semantic Web Activity".--breakline--
-http://www.w3.org/2001/sw/
-
-**SOAP:** World Wide Web Consortium, "SOAP Version 1.2 Part 1: Messaging Framework", W3C Recommendation, 24 June 2003.--breakline--
-http://www.w3.org/TR/soap12-part1/
-
-**TCS:** Taxonomic Concept Transfer Schema. Task Group web site:--breakline--
-http://www.soc.napier.ac.uk/tdwg/index.php
-
-**UDDI:** OASIS Standards Consortium, "Universal Description, Discovery, and Integration". Web site:--breakline--
-http://www.uddi.org/
-
-**WFS:** Open Geospatial Consortium, "Web Feature Service Implementation Specification", Panagiotis A. Vretanos, version 1.0.0, Adopted Specification, 19 September 2002.--breakline--
-http://www.opengis.org/docs/02-058.pdf
-
-**WSDL:** World Wide Web Consortium, "Web Services Description Language (WSDL) 1.1", W3C Note 15 March 2001.--breakline--
-http://www.w3.org/TR/wsdl.html
-
-**XML:** World Wide Web Consortium, "Extensible Markup Language (XML) 1.0 (Third Edition)", W3C Recommendation, 04 February 2004.--breakline--
-http://www.w3c.org/TR/REC-xml
-
-**XML Schema:** World Wide Web Consortium, "XML Schema specification (Part0: Primer, Part1: Structures, and Part2: Datatypes)", W3C Recommendation, May 2001.--breakline--
-http://www.w3.org/XML/Activity.html#schema-wg
-
-**XPath:** World Wide Web Consortium, "XML Path Language (XPath) Version 1.0", W3C Recommendation, 16 November 1999.--breakline--
-http://www.w3.org/TR/xpath
-
-**XQuery:** World Wide Web Consortium, "XQuery: A Query Language for XML", W3C Working Draft, 23 July 2004.--breakline--
-http://www.w3c.org/TR/xquery/
-
-
Deleted: trunk/protocol/report_korean_v0.1.pdf
===================================================================
(Binary files differ)
Dear all,
Many thanks for the useful feedback on the strawman paper. Chuck is right in that there is a broader issue - is there a form of GUID that will work for everything and is that essential?
I was working on the assumptions that TDWG had already chosen LSIDs as the favo(u)red flavo(u)r of GUID and that, yes, it makes sense to have the same one in as many domains as possible. Hence my attempt to push thinly-disguised LSIDs onto the BHL. However, if it is the case that there is no requirement for the same GUID scheme to be adopted across all domains, then the picture alters substantially and I would be more likely to press for DOIs for the reasons stated by Rod.
There is a BHL workshop coming up on June 12th and I would appreciate any guidance on what I should be encouraging BHL to do so that we can interoperate with the GBIF architecture and systems. I am also involved with the EU-funded project EDIT (European Distributed Institute of Taxonomy) who want a "virtual library" and I am keen that there should be essentially one system for BHL, EDIT and GBIF to find and retrieve published (and unpublished) literature and documentation.
I have a further question about granularity. I have just seen a demonstration by a major publisher of a system that separately indexes and allows retrieval of all images, diagrams, tables, figures etc. within journal articles. Naturally, this is very expensive and beyond the means of BHL, but has gone down a storm with researchers on trial and is an indicator of the future. Can LSIDs cope with that level of detail?
With thanks,
Neil
-------
Neil Thomson
Head of Data & Digital Systems
The Natural History Museum, Cromwell Road, London SW7 5BD
Tel: +44 (0)20 7942 5294,
Fax: +44 (0)20 7942 5559,
Email: n.thomson(a)nhm.ac.uk
http://www.nhm.ac.uk
My initial thought is that a FAN-type system is probably intended for
the uncommon/un-handled case where a third party needs to add
annotations to someone else's LSIDs. I would imagine that in most cases
linkages between data stored at different locations/authorities would be
specified in the RDF of each LSID. E.g. one LSID's metadata might have a
field that indicated it was a "correctedSpellingOf" another LSID.
Kevin
>>> Steven Perry <smperry(a)ku.edu> 05/19/06 4:14 PM >>>
I've been thinking about annotation, attribution, and LSID Foreign
Authority Notification and wanted to continue the discussions from last
month.
While I like the idea of notification of annotations, I'm not a fan of
FAN. In the first paragraph of the FAN proposal it states "An example
authority might return metadata service endpoints that contain metadata
(for example annotations) about LSIDs for which it is not the
authority".
I'll explore this idea using two example LSID authorities, A and B.
Assume that A has published some metadata identified by an LSID it
assigned (A is the original source) and that B wants to annotate it. B
could be doing so for one of three reasons: to suggest a revision to A's
metadata (correction), to propose additional information that
supplements A's metadata (annotation), or to intentionally confuse
clients and disrupt the system (spam).
A close reading of the FAN proposal makes me think that, according to
the authors, B will create assertions directly about A's metadata
(meaning B will use A's LSIDs as the subject of assertions). FAN
provides a method for B to notify A that it has done so and suggests a
method for telling clients of A's resolver to directly contact B (using
a modified form of resolution) to get additional information about A's
LSID. This leads me to believe that the authors think that a single
metadata object could be split between multiple authorities.
Because of this I see three problems with FAN. It increases the burden
on LSID resolution clients and perhaps on the LSID resolution system as
a whole, it makes attribution of assertions about an LSID indirect, and
it makes it easier to spam the network.
FAN-aware resolution is more intensive than normal resolution and
requires the following process: The client knows an LSID. It performs
normal resolution by querying DNS for the service record of the
authority, contacting the authority to find the correct endpoint, and
calling getMetadata() on that endpoint. Next it must parse this
metadata and look for the special predicate which indicates that a
foreign authority contains more information about an LSID. Since the
client does not know if it has retrieved the entire metadata object
(some pieces may exist on other authorities), it must perform a modified
getMetadata() call for each foreign authority. To do so, the proposal
suggests that the client again queries DNS for the foreign authority
(because only the authority name is returned in the first resolution),
gets its endpoint, and calls getMetadata() with the original LSID
(though it was assigned by a different authority). The implied final
step is that the metadata from the foreign authority is merged with the
metadata from the authoritative source before being handed back to
calling code.
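The resolution steps above can be sketched in a few lines of Python.
This is only an illustration: the in-memory AUTHORITIES table stands in
for the DNS lookup, endpoint discovery and getMetadata() call, and the
predicate name hasForeignAuthority is a placeholder, since the actual
predicate used by the FAN proposal is not quoted in this thread.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Placeholder predicate name (assumption, not taken from the FAN proposal).
FOREIGN_AUTHORITY = "hasForeignAuthority"

# Toy stand-in for DNS + authority endpoints: authority -> {lsid -> triples}.
AUTHORITIES: Dict[str, Dict[str, List[Triple]]] = {
    "A": {
        "urn:lsid:A:names:1": [
            ("urn:lsid:A:names:1", "fullScientificName",
             "Salmo saler, Linnaeus, 1758"),
            ("urn:lsid:A:names:1", FOREIGN_AUTHORITY, "B"),
        ]
    },
    "B": {
        # B holds statements about an LSID it did not issue.
        "urn:lsid:A:names:1": [
            ("urn:lsid:A:names:1", "hasSpecimen", "urn:lsid:B:specimens:3"),
        ]
    },
}


def authority_of(lsid: str) -> str:
    """Return the authority part of urn:lsid:<authority>:<namespace>:<object>."""
    return lsid.split(":")[2]


def get_metadata(authority: str, lsid: str) -> List[Triple]:
    """Stand-in for one full lookup (DNS, endpoint discovery, getMetadata())."""
    return list(AUTHORITIES.get(authority, {}).get(lsid, []))


def resolve_plain(lsid: str) -> Tuple[List[Triple], int]:
    """Normal resolution: one lookup against the issuing authority."""
    return get_metadata(authority_of(lsid), lsid), 1


def resolve_fan(lsid: str) -> Tuple[List[Triple], int]:
    """FAN-aware resolution: follow every foreign authority and merge."""
    merged = get_metadata(authority_of(lsid), lsid)
    lookups = 1
    foreign = [o for (_, p, o) in merged if p == FOREIGN_AUTHORITY]
    for authority in foreign:
        merged.extend(get_metadata(authority, lsid))
        lookups += 1
    return merged, lookups
```

Even with a single foreign authority the client performs two full
lookups for one LSID, and each additional foreign authority adds
another.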
The assumption that a metadata object *could* be split across multiple
authorities forces the client to query all foreign authorities, even if
those authorities only contain annotations that the client is not
interested in. To look at an example case of names and specimens, it
might be that we want notification of usage of names by specimens so we
can immediately retrieve all specimens for a name. Assume A issues a
name and B issues a specimen.
Under FAN, the B authority could add an assertion to their data store
that says [urn:lsid:A:names:1 hasSpecimen urn:lsid:B:specimens:3]. I
consider this an annotation (rather than a revision, modification, or
correction) because the fact that B linked a specimen to a name does not
change the semantic meaning of the name object (it merely provides
additional information). If you're interested in just the name and you
don't know by resolving it against A that you've got the complete set of
name metadata, you're forced to get a bunch of information about
specimens that you're not interested in (and there could be a great deal
of this kind of information).
To complicate matters, C might want to propose a correction to A. C
might believe that A misspelled the name. Consider this simple example:
A's original assertion:
urn:lsid:A:names:1 fullScientificName "Salmo saler, Linnaeus, 1758"
C's proposed modification:
urn:lsid:A:names:1 fullScientificName "Salmo salar, Linnaeus, 1758"
Under FAN, C would propose its modification and notify A that it did
so. When a FAN-aware client goes to resolve A:names:1, it will get back
some data that changes the meaning of A's original object (C's
assertion) and a lot of data that's not directly about it (all the
specimen assertions). What's worse, it now has both of these assertions
in the same model:
urn:lsid:A:names:1 fullScientificName "Salmo saler, Linnaeus, 1758"
urn:lsid:A:names:1 fullScientificName "Salmo salar, Linnaeus, 1758"
If the names class is defined with fullScientificName as a functional
property (can have only one assertion about the property on an instance)
we're in trouble and we no longer have a valid OWL instance. So the FAN
merge process has to know both that this is a problem, by understanding
the ontology for names, and how to fix it, by deciding whether to trust
C or not and select one or the other assertion but not both.
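As a rough illustration of the check the merge process would need,
here is a small Python sketch. It assumes the client has somehow
learned from the ontology which predicates are functional (the
FUNCTIONAL set below is hard-coded for the example), and it only
detects the conflict; deciding which assertion to keep is the trust
question discussed next.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]

# Assumption for this example: taken from the names ontology in a real system.
FUNCTIONAL: Set[str] = {"fullScientificName"}


def functional_conflicts(
    triples: Iterable[Triple], functional: Set[str] = FUNCTIONAL
) -> Dict[Tuple[str, str], List[str]]:
    """Return (subject, predicate) pairs that carry more than one value
    for a predicate declared functional."""
    values = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate in functional:
            values[(subject, predicate)].add(obj)
    return {key: sorted(objs) for key, objs in values.items() if len(objs) > 1}


merged = [
    ("urn:lsid:A:names:1", "fullScientificName", "Salmo saler, Linnaeus, 1758"),
    ("urn:lsid:A:names:1", "fullScientificName", "Salmo salar, Linnaeus, 1758"),
    ("urn:lsid:A:names:1", "hasSpecimen", "urn:lsid:B:specimens:3"),
]
conflicts = functional_conflicts(merged)
```

Here conflicts has a single entry, for the fullScientificName of
urn:lsid:A:names:1, listing both spellings.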
How does the client decide to trust or distrust C? The FAN proposal
suggests that the client gets a semantic description of C and uses that
to make the decision. I think it more likely that the code that calls
the client on behalf of the user should decide whether or not to trust
C.
So to fix all this we could propose that the FAN-aware client has to be
triply smart. In addition to having to know how to deal with foreign
authorities it should provide 1.) some mechanism for getting information
about a foreign authority and deciding whether to trust it or not, 2.)
some mechanism for conditionally merging the foreign metadata it cares
about (proposed modifications for example) while discarding other
information (specimens), and 3.) the ability to know whether or not its
merge process violates the integrity of the metadata. This seems to be
quite a burden. I think it will add not only complexity to LSID
resolution, but also slow it down tremendously. LSID resolution is a
low-level data identification and access mechanism. Since so many other
services will build upon it, it ought to be fast.
That brings up the idea of attribution. In FAN, the attribution of an
assertion is not directly tied to the authority of the LSID that acts as
the subject of the assertion. Instead, since foreign authorities are
allowed to make statements directly about an LSID issued by someone
else, the resolution client has to deduce attribution of each assertion
during the merge process (using the foreign authority predicate as
evidence). After the merge process is complete, downstream processes
will have lost this attribution information.
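The loss of attribution can be shown with a toy merge in Python. The
second function keeps the source authority alongside each statement
(a quad rather than a triple); that is my own illustration of what a
client would have to do to preserve attribution, not something the FAN
proposal specifies.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]
Quad = Tuple[str, str, str, str]

# Which authority served which statements about A's LSID.
sources: Dict[str, List[Triple]] = {
    "A": [("urn:lsid:A:names:1", "fullScientificName",
           "Salmo saler, Linnaeus, 1758")],
    "C": [("urn:lsid:A:names:1", "fullScientificName",
           "Salmo salar, Linnaeus, 1758")],
}


def merge_triples(by_authority: Dict[str, List[Triple]]) -> Set[Triple]:
    """Plain merge: the result no longer records who asserted what."""
    merged: Set[Triple] = set()
    for triples in by_authority.values():
        merged.update(triples)
    return merged


def merge_quads(by_authority: Dict[str, List[Triple]]) -> Set[Quad]:
    """Merge that carries the serving authority as a fourth element."""
    return {
        (s, p, o, authority)
        for authority, triples in by_authority.items()
        for (s, p, o) in triples
    }


flat = merge_triples(sources)
attributed = merge_quads(sources)
from_c = {q[:3] for q in attributed if q[3] == "C"}
```

Both statements in flat have an A-issued LSID as subject, so nothing in
the merged model says that one of them came from C.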
Finally, FAN opens the door to spammers because attribution is not
directly tied to the authority name. If I trust A and I resolve the
name id above I could get back information from C. It could easily be
the case that I don't trust C (because I believe it to be a spammer) but
that my LSID resolution client doesn't know this fact. The end user of
the client may be using a FAN-enabled resolver or not and never know.
The FAN proposal suggests that clients be notified, but what is your
average piece of software that calls a resolution client on behalf of a
user to do with this information?
I propose we take an alternative approach. I like the notification
system in FAN, but I dislike the assumption about distributed objects
and indirect attribution. I would rather we agree on an annotation and
linking system that holds the following to be true:
1.) An authority is never allowed to make an assertion whose subject is
an LSID that it is not authoritative for.
2.) LSIDs authoritatively attribute an assertion to the organization
that owns the fully qualified authority domain (this includes
subdomains, to support the case of hosted authority services).
3.) Objects are not distributed across authorities but instead link to
other first-class objects that may exist within other authorities.
4.) Annotations, corrections, etc. are valuable, and as such should be
first class objects that are attributed to their issuers by the normal
process of 2 above. This is annotation as linking; the subject of the
annotation is the LSID of the annotation, it is attributed to the
authority of the issuer, and an agreed upon predicate is used to define
the object that acts as domain of an annotation (what it applies to).
This guarantees correct attribution of annotations throughout the
network.
5.) We cannot stop spammers from publishing malicious assertions but we
can validate a given assertion acquired from some source external to the
LSID system by resolving the LSID and checking the resulting metadata
against the assertion. Spammers will find it difficult to circumvent
this method because it requires hijacking the DNS system. If you trust
that a source gave you a complete and full copy of an object you don't
have to resolve its LSID.
6.) We will often have to use systems that are external to LSID in
order to find the data/metadata we're interested in (at the very least
to identify LSIDs to resolve). These other systems have to understand
how to attribute assertions to their owners and can do so using the
rules above without relying on metadata about a foreign authority.
7.) We don't have to assume (as FAN does) that an authority is
responsible for providing all information in the universe that relates
in any way to its LSIDs.
8.) If you get back invalid metadata from an authority (not incorrect,
but metadata that, for example, violates a functional property
restriction), you automatically know who is at fault (the organization
that owns the authority name).
9.) A FAN-style notification system could still be used for logging and
tracking the use of one's data or for proposing corrections to an
issuer who might want to approve and incorporate those suggestions into
their published data.
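To make rules 1, 2 and 4 concrete, here is a minimal Python sketch of
C's spelling correction expressed as a first-class annotation. The
predicate names annotates and proposedFullScientificName and the LSID
urn:lsid:C:annotations:7 are invented for the example; the actual
predicate would be whatever we agree on.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical agreed-upon predicate linking an annotation to its target.
ANNOTATES = "annotates"


def authority_of(lsid: str) -> str:
    """Return the authority part of urn:lsid:<authority>:<namespace>:<object>."""
    return lsid.split(":")[2]


def rule_1_violations(publisher: str, triples: List[Triple]) -> List[Triple]:
    """Return the triples whose subject LSID was not issued by publisher."""
    return [t for t in triples if authority_of(t[0]) != publisher]


# Annotation as linking: C only makes statements about its own LSID.
annotation: List[Triple] = [
    ("urn:lsid:C:annotations:7", ANNOTATES, "urn:lsid:A:names:1"),
    ("urn:lsid:C:annotations:7", "proposedFullScientificName",
     "Salmo salar, Linnaeus, 1758"),
]

# FAN style: C makes a statement whose subject is A's LSID.
fan_style: List[Triple] = [
    ("urn:lsid:A:names:1", "fullScientificName",
     "Salmo salar, Linnaeus, 1758"),
]

ok = rule_1_violations("C", annotation)
bad = rule_1_violations("C", fan_style)
```

In the first form attribution falls out of the subject LSID itself
(authority C), and A's name object is untouched, so it cannot be made
invalid by someone else's assertion.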
So we might use a FAN-like system, but only for foreign annotation
notification, not foreign authority notification.
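The validation step in rule 5 above could look something like the
following Python sketch. The PUBLISHED table is a stand-in for real
LSID resolution against the issuing authority, and the function names
are mine.

```python
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]

# Stand-in for what the issuing authority returns when an LSID is resolved.
PUBLISHED: Dict[str, List[Triple]] = {
    "urn:lsid:A:names:1": [
        ("urn:lsid:A:names:1", "fullScientificName",
         "Salmo saler, Linnaeus, 1758"),
    ],
}


def resolve(lsid: str) -> List[Triple]:
    """Toy resolver: look the LSID up in the in-memory table."""
    return PUBLISHED.get(lsid, [])


def is_confirmed(
    triple: Triple, resolver: Callable[[str], List[Triple]] = resolve
) -> bool:
    """Check an assertion obtained elsewhere against the metadata
    returned by resolving its subject LSID."""
    return triple in resolver(triple[0])


genuine = ("urn:lsid:A:names:1", "fullScientificName",
           "Salmo saler, Linnaeus, 1758")
forged = ("urn:lsid:A:names:1", "fullScientificName", "Buy cheap fish")
```

An assertion that fails this check was not published by the authority
named in its subject LSID, whatever the external source claims.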
Any comments? Have I misunderstood the assumptions behind FAN?
-Steve
_______________________________________________
TDWG-GUID mailing list
TDWG-GUID(a)mailman.nhm.ku.edu
http://mailman.nhm.ku.edu/mailman/listinfo/tdwg-guid
Landcare Research
http://www.landcareresearch.co.nz
Rod,
Are you sure we're grown ups? I'm still working on that.
I think things certainly can be simplified if we allow ourselves to accept some imperfections and focus on getting something done that works.
Seems to me there is a really broad point to consider. Does the same GUID approach work for names, specimens and publications? DOI doesn't work well for names. But, LSID doesn't work well for publications.
Chuck
-----Original Message-----
From: Roderic Page [mailto:r.page@bio.gla.ac.uk]
Sent: Thursday, May 18, 2006 1:09 PM
To: tdwg-guid(a)mailman.nhm.ku.edu
Cc: Neil Thomson
Subject: Re: [Tdwg-guid] "Publication Bank", LSIDs and the BHL
Some quick, ill thought out thoughts:
The whole "semantically opaque" argument is probably overblown, in that
it overlooks one advantage of semantic content -- it helps debugging.
If an identifier breaks, it can help to have an identifier that is
interpretable. Furthermore, pretty much any GUID in use on the web has
loads of implicit semantic content, even DOIs. So unless we want GUIDs
like A4DA1824-E695-11DA-9693-000D93425524, I suggest we let this red
herring drop.
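As a small illustration of the point, here is a Python sketch that
reads the parts of an LSID (urn:lsid:authority:namespace:object, with
an optional revision) and inspects the UUID quoted above. The example
LSID and the function name are made up for the illustration.

```python
import uuid
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]


def parse_lsid(text: str) -> LSID:
    """Split urn:lsid:<authority>:<namespace>:<object>[:<revision>]."""
    parts = text.split(":")
    if (
        len(parts) not in (5, 6)
        or parts[0].lower() != "urn"
        or parts[1].lower() != "lsid"
    ):
        raise ValueError(f"not an LSID: {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return LSID(parts[2], parts[3], parts[4], revision)


# Hypothetical identifier, used only to show what a human can read off it.
example = parse_lsid("urn:lsid:example.org:names:12345")

# Even the "opaque" identifier above is not free of structure.
opaque = uuid.UUID("A4DA1824-E695-11DA-9693-000D93425524")
```

When something breaks, example.authority tells you whom to ask. The
UUID parses as version 1 (opaque.version), the time-based kind, so it
carries some implicit content too.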
The semantically opaque argument seems to me to be a statement that we
can't rely on any interpretation we may put on the identifier string.
That's fine, but in practice if there are semantics they can be useful,
so long as we're grown ups and realise it might break. There's an
interesting discussion related to this at
http://lists.w3.org/Archives/Public/public-swls-ws/2004Sep/att-0013/MayoBMIPosition.html.
I'm puzzled that Handles weren't mentioned, simply because they are (a)
free, and (b) already in use in digital library projects (e.g.,
http://digitallibrary.amnh.org/dspace/). Indeed, much as I love LSIDs,
and at the risk of opening a can of worms, if I were setting up a
digital literature repository, LSIDs would be last on my list of GUIDs.
I'd look seriously at DOIs (and talk to the DOI people about what it
would actually cost, and figure out how come Germany can make this free
for scientists -- http://www.std-doi.de/front_content.php), then look
at handles (which some BHL members are already using), then LSIDs. My
argument for using DOIs would be the added value from CrossRef -- you'd
get immediate integration into electronic publishing (i.e., linkable
references), and isn't that part of the goal...?
One complication about DOIs is whether there should be only one DOI for
a publication, issued by the copyright holder/publisher. I don't mean
one can't have DOIs for parts, just whether there are issues if, say,
Springer has a DOI for a publication and so does BHL (or anybody else).
Lastly, IMHO any GUID that is not resolvable now is a waste of time, so
I see no value in an identifier like urn:bhl:...
Regards
Rod
On 18 May 2006, at 15:56, Donald Hobern wrote:
> After my post yesterday, Neil Thomson forwarded me this paper he has
> prepared for the Biodiversity Heritage Library on their need for
> GUIDs. His document ends by explaining how BHL identifiers could end
> up looking "a bit like an LSID". Obviously I would recommend that
> they should actually just be LSIDs, but Neil is interested in any
> feedback on this paper and it clearly relates very much to all that we
> discussed under the general heading of "Publication Bank".
>
> Thanks,
>
> Donald
>
> --
>
> ---------------------------------------------------------------
> Donald Hobern (dhobern(a)gbif.org)
> Programme Officer for Data Access and Database Interoperability
> Global Biodiversity Information Facility Secretariat
> Universitetsparken 15, DK-2100 Copenhagen, Denmark
> Tel: +45-35321483 Mobile: +45-28751483 Fax: +45-35321480
> ---------------------------------------------------------------
>
> <GUIDs-BHL-2-1.doc>_______________________________________________
> TDWG-GUID mailing list
> TDWG-GUID(a)mailman.nhm.ku.edu
> http://mailman.nhm.ku.edu/mailman/listinfo/tdwg-guid
>
------------------------------------------------------------------------
Professor Roderic D. M. Page
Editor, Systematic Biology
DEEB, IBLS
Graham Kerr Building
University of Glasgow
Glasgow G12 8QP
United Kingdom
Phone: +44 141 330 4778
Fax: +44 141 330 2792
email: r.page(a)bio.gla.ac.uk
web: http://taxonomy.zoology.gla.ac.uk/rod/rod.html
reprints: http://taxonomy.zoology.gla.ac.uk/rod/pubs.html
Subscribe to Systematic Biology through the Society of Systematic
Biologists Website: http://systematicbiology.org
Search for taxon names: http://darwin.zoology.gla.ac.uk/~rpage/portal/
Find out what we know about a species: http://ispecies.org
Rod's rants on phyloinformatics: http://iphylo.blogspot.com
Hi,
I am new to these discussions, so I am sorry if I am not interpreting
things correctly.
I think of a species as a concept. The label for that concept changes when
the name changes but the idea that this is a species remains.
So if I want to represent the species Aedes triseriatus, I would mark
it up as a class.
Since I am focusing on species found in Wisconsin, I might consider
having the Wisconsin Aedes triseriatus be a subclass of the North American
subclass of Aedes triseriatus, which is a subclass of all Aedes triseriatus.
If the label for Aedes triseriatus changes to Ochlerotatus triseriatus, the
species concept is still the same. What has changed is that others feel
that this species' relationship to other species is different from what they
thought previously.
In looking over these discussions, it would seem that if species should
be represented as classes, then LSIDs should not be used to represent them.
It would be OK, however, to represent one individual Ochlerotatus
triseriatus using an LSID.
This makes me wonder if the descriptors used for this species should
be represented as classes such as
PalpiBanded
PalpiUnbanded
Then you would be able to train a computer to assign a given image
to either the set of PalpiBanded or PalpiUnbanded.
It also makes me think that the best way to represent the Wisconsin
species is with a unique URL that includes metadata pointing to
the other representations of this concept at uBio NameBank,
NCBI, etc.
A species concept url could be something like
www.ex.org/SpeciesConcepts/549173 -> points to RDF, Web, uBio, NCBI
A character concept url could be something like
www.ex.org/CharacterConcepts/2343234 <- PalpiBanded
www.ex.org/CharacterConcepts/2342945 <- PalpiUnbanded
A given image might be marked up as an instance of PalpiBanded.
This would allow a set of PalpiBanded images to be created and
accessed, making image-based computer identification easier.
So far I am thinking of the following structure:
/TaxonConcepts/Kingdom
/TaxonConcepts/Phylum
...
/SpeciesConcepts/549173 <- use number since name changes? i.e. concept
of Aedes triseriatus
/CharacterConcepts/2343234 <- number? for a particular character state
/HabitatConcepts/Boreal_Forest <- points to specific definition, examples
/BioticIndexConcepts/9 <- points to definition of a Biotic Index of 9, examples
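A quick Python sketch of how this structure might behave, just to
check my own thinking. The http:// scheme, the image file names and
the function names are placeholders; only the paths and numbers come
from the lists above.

```python
from collections import defaultdict
from typing import Dict, List, Set

BASE = "http://www.ex.org"  # scheme added as an assumption


def concept_url(kind: str, ident) -> str:
    """Build a stable, number-based concept URL such as
    http://www.ex.org/SpeciesConcepts/549173."""
    return f"{BASE}/{kind}Concepts/{ident}"


# One species concept, several labels: the URL survives a name change.
SPECIES = concept_url("Species", 549173)
labels: Dict[str, List[str]] = {
    SPECIES: ["Aedes triseriatus", "Ochlerotatus triseriatus"],
}

PALPI_BANDED = concept_url("Character", 2343234)
PALPI_UNBANDED = concept_url("Character", 2342945)

# Character-state concept -> images marked up as instances of it.
instances: Dict[str, Set[str]] = defaultdict(set)


def mark_up(image: str, character_state: str) -> None:
    """Record that an image is an instance of a character-state concept."""
    instances[character_state].add(image)


def images_for(character_state: str) -> List[str]:
    """Return the set of images for one character state, sorted."""
    return sorted(instances[character_state])


mark_up("image_002.jpg", PALPI_BANDED)
mark_up("image_001.jpg", PALPI_BANDED)
mark_up("image_003.jpg", PALPI_UNBANDED)
banded = images_for(PALPI_BANDED)
```

The banded list is then the training set for the PalpiBanded state,
and renaming the species only adds a label; it does not change any
URL.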
Am I thinking this through correctly?
Thanks in Advance :-)
---------------------------------------------------------------
Pete DeVries
Department of Entomology
University of Wisconsin - Madison
445 Russell Laboratories
1630 Linden Drive
Madison, WI 53706
Email: pdevries(a)wisc.edu
---------------------------------------------------------------
[tdwg-tapir] Schema modification: r533 - in trunk/protocol: . examples
by tdwg-tapir@lists.tdwg.org 17 May '06
Author: markus
Date: 2006-05-17 16:22:41 +0200 (Wed, 17 May 2006)
New Revision: 533
Modified:
trunk/protocol/examples/metadata_response.xml
trunk/protocol/tapir.xsd
Log:
custom metadata extensions are optional. updated example to validate
Modified: trunk/protocol/examples/metadata_response.xml
===================================================================
--- trunk/protocol/examples/metadata_response.xml 2006-05-17 14:04:27 UTC (rev 532)
+++ trunk/protocol/examples/metadata_response.xml 2006-05-17 14:22:41 UTC (rev 533)
@@ -1,28 +1,31 @@
<?xml version="1.0" encoding="UTF-8"?>
<!--
-TAPIR Metadata Response example. This example only includes mandatory concepts.
-The metadata part in TAPIR is likely to be changed after comunity discussion.
+TAPIR Metadata Response example.
-->
<response xmlns="http://res.tdwg.org/tapir/1.0"
+ xmlns:dc="http://purl.org/dc/elements/1.1/"
+ xmlns:dct="http://purl.org/dc/terms/"
+ xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#"
+ xmlns:vcard="http://www.w3.org/2001/vcard-rdf/3.0#"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://res.tdwg.org/tapir/1.0 http://ww2.biocase.org/svn/tapir/trunk/protocol/tapir.xsd">
- <header>
+ <header xmlns="">
<source accesspoint="http://accesspoint_url" sendtime="2005-11-11T12:23:56.023+01:00">
<software name="TAPIR_Implementation" version="1.0"/>
</source>
</header>
<metadata>
<dc:title>Australian Fruit Flies</dc:title>
- <dc:type>http://purl.org/dc/dcmitype/Service<dc:type>
- <accesspoint>http://www.flies.au/pywrapper/flies</accesspoint>
+ <dc:type>http://purl.org/dc/dcmitype/Service</dc:type>
+ <accesspoint xmlns="">http://www.flies.au/pywrapper/flies</accesspoint>
<dc:description>Australian fruit flies collection of georeferenced records observations</dc:description>
<dc:language>EN</dc:language>
<dc:subject>fuit flies observation australia diptera Tephritidae</dc:subject>
<dct:bibliographicCitation>Australian Fruit Flies TAPIR provider</dct:bibliographicCitation>
<dc:rights>free</dc:rights>
<dct:modified>2006-01-01T00:00:00</dct:modified>
- <indexingPreferences startTime="09:30:00Z" maxDuration="PT1H" frequency="P1M" />
- <relatedEntity>
+ <indexingPreferences xmlns="" startTime="09:30:00Z" maxDuration="PT1H" frequency="P1M" />
+ <relatedEntity xmlns="">
<role>technical host</role>
<entity>
<identifier>someGUID</identifier>
@@ -31,19 +34,19 @@
<logoURL>http://somehost/entitylogo.png</logoURL>
<description>Description about the TAPIR working group</description>
<relatedInformation>http://somehost/moreinfo</relatedInformation>
- <geo:Point>
- <geo:lat>45.256</geo:lat>
- <geo:long>-71.92</geo:long>
- </geo:Point>
<hasContact>
<role>system administrator</role>
<vcard:VCARD>
<vcard:FN>Frank Sinatra</vcard:FN>
<vcard:TITLE>Main singer</vcard:TITLE>
+ <vcard:TEL>some phone number</vcard:TEL>
<vcard:EMAIL>f.sinatra(a)tapir.com</vcard:EMAIL>
- <vcard:TEL>some phone number</vcard:TEL>
</vcard:VCARD>
</hasContact>
+ <geo:Point>
+ <geo:lat>45.256</geo:lat>
+ <geo:long>-71.92</geo:long>
+ </geo:Point>
</entity>
</relatedEntity>
</metadata>
Modified: trunk/protocol/tapir.xsd
===================================================================
--- trunk/protocol/tapir.xsd 2006-05-17 14:04:27 UTC (rev 532)
+++ trunk/protocol/tapir.xsd 2006-05-17 14:22:41 UTC (rev 533)
@@ -1478,7 +1478,7 @@
supplier</xsd:documentation>
</xsd:annotation>
</xsd:element>
- <xsd:element ref="custom"/>
+ <xsd:element ref="custom" minOccurs="0"/>
</xsd:sequence>
</xsd:complexType>
<xsd:complexType name="relatedEntityInformationType">
@@ -1574,7 +1574,7 @@
<xsd:documentation>Location of the entity in decimal WGS84 latitude and longitude (and optional altitude) as defined by the W3C Basic Geo Vocabulary</xsd:documentation>
</xsd:annotation>
</xsd:element>
- <xsd:element ref="custom"/>
+ <xsd:element ref="custom" minOccurs="0"/>
</xsd:sequence>
<xsd:attribute ref="lang"/>
</xsd:complexType>
[tdwg-tapir] Schema modification: r532 - in trunk/protocol: . examples
by tdwg-tapir@lists.tdwg.org 17 May '06
Author: markus
Date: 2006-05-17 16:04:27 +0200 (Wed, 17 May 2006)
New Revision: 532
Added:
trunk/protocol/w3c_geo.xsd
Removed:
trunk/protocol/georss.xsd
trunk/protocol/gmlgeorss.xsd
trunk/protocol/xlink.xsd
Modified:
trunk/protocol/dcterms.xsd
trunk/protocol/examples/metadata_response.xml
trunk/protocol/tapir.xsd
trunk/protocol/vcard.xsd
Log:
Hopefully the final modification to the metadata response in TAPIR.
Instead of GeoRSS, the W3C Basic Geo Vocabulary is now used. All external schemas are included in the repository as simplified versions that support everything TAPIR needs. This way we don't need to parse huge schemas and follow endless import links.
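For readers who want to see what the switch to the W3C Basic Geo Vocabulary means for a client, here is a minimal sketch of reading a geo:Point from an entity block like the one in the example metadata response. It assumes the geo prefix is bound to the standard W3C namespace http://www.w3.org/2003/01/geo/wgs84_pos# (the prefix declaration is not shown in this commit), and the SAMPLE fragment and the read_point helper are hypothetical names used only for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: "geo" is bound to the W3C Basic Geo (WGS84) namespace.
GEO_NS = "http://www.w3.org/2003/01/geo/wgs84_pos#"

# Hypothetical fragment modelled on the <entity> block in metadata_response.xml.
SAMPLE = """<entity xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#">
  <name>TAPIR working group</name>
  <geo:Point>
    <geo:lat>45.256</geo:lat>
    <geo:long>-71.92</geo:long>
  </geo:Point>
</entity>"""


def read_point(entity_xml):
    """Return (lat, long) as floats, or None if the entity has no geo:Point."""
    root = ET.fromstring(entity_xml)
    point = root.find(f"{{{GEO_NS}}}Point")
    if point is None:
        return None
    # Basic Geo stores latitude and longitude in separate child elements,
    # so no coordinate-list parsing is needed.
    lat = float(point.findtext(f"{{{GEO_NS}}}lat"))
    lon = float(point.findtext(f"{{{GEO_NS}}}long"))
    return lat, lon


print(read_point(SAMPLE))
```

Because latitude and longitude are separate elements, a client needs only two lookups and none of the GML type hierarchy that the removed gmlgeorss.xsd carried.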
Modified: trunk/protocol/dcterms.xsd
===================================================================
--- trunk/protocol/dcterms.xsd 2006-05-16 15:24:07 UTC (rev 531)
+++ trunk/protocol/dcterms.xsd 2006-05-17 14:04:27 UTC (rev 532)
@@ -1,331 +1,112 @@
-<?xml version="1.0" encoding="UTF-8"?>
-<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
- xmlns:dc="http://purl.org/dc/elements/1.1/"
- xmlns:dcmitype="http://purl.org/dc/dcmitype/"
- targetNamespace="http://purl.org/dc/terms/"
- xmlns="http://purl.org/dc/terms/"
- elementFormDefault="qualified"
- attributeFormDefault="unqualified">
-
- <xs:annotation>
- <xs:documentation xml:lang="en">
- DCterms XML Schema
- XML Schema for http://purl.org/dc/terms/ namespace
-
- Created 2003-04-02
-
- Created by
-
- Tim Cole (t-cole3(a)uiuc.edu)
- Tom Habing (thabing(a)uiuc.edu)
- Jane Hunter (jane(a)dstc.edu.au)
- Pete Johnston (p.johnston(a)ukoln.ac.uk),
- Carl Lagoze (lagoze(a)cs.cornell.edu)
-
- This schema declares XML elements for the DC elements and
- DC element refinements from the http://purl.org/dc/terms/ namespace.
-
- It reuses the complexType dc:SimpleLiteral, imported from the dc.xsd
- schema, which permits simple element content, and makes the xml:lang
- attribute available.
-
- This complexType permits the derivation of other complexTypes
- which would permit child elements.
-
- DC elements are declared as substitutable for the abstract element dc:any, and
- DC element refinements are defined as substitutable for the base elements
- which they refine.
-
- This means that the default type for all XML elements (i.e. all DC elements and
- element refinements) is dc:SimpleLiteral.
-
- Encoding schemes are defined as complexTypes which are restrictions
- of the dc:SimpleLiteral complexType. These complexTypes restrict
- values to an appropriates syntax or format using data typing,
- regular expressions, or enumerated lists.
-
- In order to specify one of these encodings an xsi:type attribute must
- be used in the instance document.
-
- Also, note that one shortcoming of this approach is that any type can be
- applied to any of the elements or refinements. There is no convenient way
- to restrict types to specific elements using this approach.
-
- </xs:documentation>
-
- </xs:annotation>
-
-
- <xs:import namespace="http://www.w3.org/XML/1998/namespace"
- schemaLocation="http://www.w3.org/2001/03/xml.xsd">
- </xs:import>
-
- <xs:import namespace="http://purl.org/dc/elements/1.1/"
- schemaLocation="dc.xsd"/>
-
- <xs:import namespace="http://purl.org/dc/dcmitype/"
- schemaLocation="dcmitype.xsd"/>
-
- <xs:element name="alternative" substitutionGroup="dc:title"/>
-
- <xs:element name="tableOfContents" substitutionGroup="dc:description"/>
- <xs:element name="abstract" substitutionGroup="dc:description"/>
-
- <xs:element name="created" substitutionGroup="dc:date"/>
- <xs:element name="valid" substitutionGroup="dc:date"/>
- <xs:element name="available" substitutionGroup="dc:date"/>
- <xs:element name="issued" substitutionGroup="dc:date"/>
- <xs:element name="modified" substitutionGroup="dc:date"/>
- <xs:element name="dateAccepted" substitutionGroup="dc:date"/>
- <xs:element name="dateCopyrighted" substitutionGroup="dc:date"/>
- <xs:element name="dateSubmitted" substitutionGroup="dc:date"/>
-
- <xs:element name="extent" substitutionGroup="dc:format"/>
- <xs:element name="medium" substitutionGroup="dc:format"/>
-
- <xs:element name="isVersionOf" substitutionGroup="dc:relation"/>
- <xs:element name="hasVersion" substitutionGroup="dc:relation"/>
- <xs:element name="isReplacedBy" substitutionGroup="dc:relation"/>
- <xs:element name="replaces" substitutionGroup="dc:relation"/>
- <xs:element name="isRequiredBy" substitutionGroup="dc:relation"/>
- <xs:element name="requires" substitutionGroup="dc:relation"/>
- <xs:element name="isPartOf" substitutionGroup="dc:relation"/>
- <xs:element name="hasPart" substitutionGroup="dc:relation"/>
- <xs:element name="isReferencedBy" substitutionGroup="dc:relation"/>
- <xs:element name="references" substitutionGroup="dc:relation"/>
- <xs:element name="isFormatOf" substitutionGroup="dc:relation"/>
- <xs:element name="hasFormat" substitutionGroup="dc:relation"/>
- <xs:element name="conformsTo" substitutionGroup="dc:relation"/>
-
- <xs:element name="spatial" substitutionGroup="dc:coverage"/>
- <xs:element name="temporal" substitutionGroup="dc:coverage"/>
-
- <xs:element name="audience" substitutionGroup="dc:any"/>
-
- <xs:element name="mediator" substitutionGroup="audience"/>
- <xs:element name="educationLevel" substitutionGroup="audience"/>
-
- <xs:element name="accessRights" substitutionGroup="dc:rights"/>
-
- <xs:element name="bibliographicCitation" substitutionGroup="dc:identifier"/>
-
- <xs:complexType name="LCSH">
- <xs:simpleContent>
- <xs:restriction base="dc:SimpleLiteral">
- <xs:simpleType>
- <xs:restriction base="xs:string"/>
- </xs:simpleType>
- <xs:attribute ref="xml:lang" use="prohibited"/>
- </xs:restriction>
- </xs:simpleContent>
- </xs:complexType>
-
- <xs:complexType name="MESH">
- <xs:simpleContent>
- <xs:restriction base="dc:SimpleLiteral">
- <xs:simpleType>
- <xs:restriction base="xs:string"/>
- </xs:simpleType>
- <xs:attribute ref="xml:lang" use="prohibited"/>
- </xs:restriction>
- </xs:simpleContent>
- </xs:complexType>
-
- <xs:complexType name="DDC">
- <xs:simpleContent>
- <xs:restriction base="dc:SimpleLiteral">
- <xs:simpleType>
- <xs:restriction base="xs:string"/>
- </xs:simpleType>
- <xs:attribute ref="xml:lang" use="prohibited"/>
- </xs:restriction>
- </xs:simpleContent>
- </xs:complexType>
-
- <xs:complexType name="LCC">
- <xs:simpleContent>
- <xs:restriction base="dc:SimpleLiteral">
- <xs:simpleType>
- <xs:restriction base="xs:string"/>
- </xs:simpleType>
- <xs:attribute ref="xml:lang" use="prohibited"/>
- </xs:restriction>
- </xs:simpleContent>
- </xs:complexType>
-
- <xs:complexType name="UDC">
- <xs:simpleContent>
- <xs:restriction base="dc:SimpleLiteral">
- <xs:simpleType>
- <xs:restriction base="xs:string"/>
- </xs:simpleType>
- <xs:attribute ref="xml:lang" use="prohibited"/>
- </xs:restriction>
- </xs:simpleContent>
- </xs:complexType>
-
- <xs:complexType name="Period">
- <xs:simpleContent>
- <xs:restriction base="dc:SimpleLiteral">
- <xs:simpleType>
- <xs:restriction base="xs:string"/>
- </xs:simpleType>
- <xs:attribute ref="xml:lang" use="prohibited"/>
- </xs:restriction>
- </xs:simpleContent>
- </xs:complexType>
-
- <xs:complexType name="W3CDTF">
- <xs:simpleContent>
- <xs:restriction base="dc:SimpleLiteral">
- <xs:simpleType>
- <xs:union memberTypes="xs:gYear xs:gYearMonth xs:date xs:dateTime"/>
- </xs:simpleType>
- <xs:attribute ref="xml:lang" use="prohibited"/>
- </xs:restriction>
- </xs:simpleContent>
- </xs:complexType>
-
- <xs:complexType name="DCMIType">
- <xs:simpleContent>
- <xs:restriction base="dc:SimpleLiteral">
- <xs:simpleType>
- <xs:restriction base="dcmitype:DCMIType"/>
- </xs:simpleType>
- <xs:attribute ref="xml:lang" use="prohibited"/>
- </xs:restriction>
- </xs:simpleContent>
- </xs:complexType>
-
- <xs:complexType name="IMT">
- <xs:simpleContent>
- <xs:restriction base="dc:SimpleLiteral">
- <xs:simpleType>
- <xs:restriction base="xs:string"/>
- </xs:simpleType>
- <xs:attribute ref="xml:lang" use="prohibited"/>
- </xs:restriction>
- </xs:simpleContent>
- </xs:complexType>
-
- <xs:complexType name="URI">
- <xs:simpleContent>
- <xs:restriction base="dc:SimpleLiteral">
- <xs:simpleType>
- <xs:restriction base="xs:anyURI"/>
- </xs:simpleType>
- <xs:attribute ref="xml:lang" use="prohibited"/>
- </xs:restriction>
- </xs:simpleContent>
- </xs:complexType>
-
- <xs:complexType name="ISO639-2">
- <xs:simpleContent>
- <xs:restriction base="dc:SimpleLiteral">
- <xs:simpleType>
- <xs:restriction base="xs:string"/>
- </xs:simpleType>
- <xs:attribute ref="xml:lang" use="prohibited"/>
- </xs:restriction>
- </xs:simpleContent>
- </xs:complexType>
-
- <xs:complexType name="RFC1766">
- <xs:simpleContent>
- <xs:restriction base="dc:SimpleLiteral">
- <xs:simpleType>
- <xs:restriction base="xs:language"/>
- </xs:simpleType>
- <xs:attribute ref="xml:lang" use="prohibited"/>
- </xs:restriction>
- </xs:simpleContent>
- </xs:complexType>
-
- <xs:complexType name="RFC3066">
- <xs:simpleContent>
- <xs:restriction base="dc:SimpleLiteral">
- <xs:simpleType>
- <xs:restriction base="xs:language"/>
- </xs:simpleType>
- <xs:attribute ref="xml:lang" use="prohibited"/>
- </xs:restriction>
- </xs:simpleContent>
- </xs:complexType>
-
- <xs:complexType name="Point">
- <xs:simpleContent>
- <xs:restriction base="dc:SimpleLiteral">
- <xs:simpleType>
- <xs:restriction base="xs:string"/>
- </xs:simpleType>
- <xs:attribute ref="xml:lang" use="prohibited"/>
- </xs:restriction>
- </xs:simpleContent>
- </xs:complexType>
-
- <xs:complexType name="ISO3166">
- <xs:simpleContent>
- <xs:restriction base="dc:SimpleLiteral">
- <xs:simpleType>
- <xs:restriction base="xs:string"/>
- </xs:simpleType>
- <xs:attribute ref="xml:lang" use="prohibited"/>
- </xs:restriction>
- </xs:simpleContent>
- </xs:complexType>
-
- <xs:complexType name="Box">
- <xs:simpleContent>
- <xs:restriction base="dc:SimpleLiteral">
- <xs:simpleType>
- <xs:restriction base="xs:string"/>
- </xs:simpleType>
- <xs:attribute ref="xml:lang" use="prohibited"/>
- </xs:restriction>
- </xs:simpleContent>
- </xs:complexType>
-
- <xs:complexType name="TGN">
- <xs:simpleContent>
- <xs:restriction base="dc:SimpleLiteral">
- <xs:simpleType>
- <xs:restriction base="xs:string"/>
- </xs:simpleType>
- <xs:attribute ref="xml:lang" use="prohibited"/>
- </xs:restriction>
- </xs:simpleContent>
- </xs:complexType>
-
- <xs:group name="elementsAndRefinementsGroup">
- <xs:annotation>
- <xs:documentation xml:lang="en">
- This group is included as a convenience for schema authors
- who need to refer to all the DC elements and element refinements
- in the http://purl.org/dc/elements/1.1/ and
- http://purl.org/dc/terms namespaces.
- N.B. Refinements available via substitution groups.
- </xs:documentation>
- </xs:annotation>
-
- <xs:sequence>
- <xs:choice minOccurs="0" maxOccurs="unbounded">
- <xs:element ref="dc:any" />
- </xs:choice>
- </xs:sequence>
- </xs:group>
-
- <xs:complexType name="elementOrRefinementContainer">
- <xs:annotation>
- <xs:documentation xml:lang="en">
- This is included as a convenience for schema authors who need to define a root
- or container element for all of the DC elements and element refinements.
- </xs:documentation>
- </xs:annotation>
-
- <xs:choice>
- <xs:group ref="elementsAndRefinementsGroup"/>
- </xs:choice>
- </xs:complexType>
-
-
-</xs:schema>
+<?xml version="1.0" encoding="UTF-8"?>
+<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
+ xmlns:dc="http://purl.org/dc/elements/1.1/"
+ xmlns:dcmitype="http://purl.org/dc/dcmitype/"
+ targetNamespace="http://purl.org/dc/terms/"
+ xmlns="http://purl.org/dc/terms/"
+ elementFormDefault="qualified"
+ attributeFormDefault="unqualified">
+
+ <xs:annotation>
+ <xs:documentation xml:lang="en">
+ DCterms XML Schema
+ XML Schema for http://purl.org/dc/terms/ namespace
+
+ Created 2003-04-02
+
+ Created by
+
+ Tim Cole (t-cole3(a)uiuc.edu)
+ Tom Habing (thabing(a)uiuc.edu)
+ Jane Hunter (jane(a)dstc.edu.au)
+ Pete Johnston (p.johnston(a)ukoln.ac.uk),
+ Carl Lagoze (lagoze(a)cs.cornell.edu)
+
+ This schema declares XML elements for the DC elements and
+ DC element refinements from the http://purl.org/dc/terms/ namespace.
+
+ It reuses the complexType dc:SimpleLiteral, imported from the dc.xsd
+ schema, which permits simple element content, and makes the xml:lang
+ attribute available.
+
+ This complexType permits the derivation of other complexTypes
+ which would permit child elements.
+
+ DC elements are declared as substitutable for the abstract element dc:any, and
+ DC element refinements are defined as substitutable for the base elements
+ which they refine.
+
+ This means that the default type for all XML elements (i.e. all DC elements and
+ element refinements) is dc:SimpleLiteral.
+
+ Encoding schemes are defined as complexTypes which are restrictions
+ of the dc:SimpleLiteral complexType. These complexTypes restrict
+ values to an appropriates syntax or format using data typing,
+ regular expressions, or enumerated lists.
+
+ In order to specify one of these encodings an xsi:type attribute must
+ be used in the instance document.
+
+ Also, note that one shortcoming of this approach is that any type can be
+ applied to any of the elements or refinements. There is no convenient way
+ to restrict types to specific elements using this approach.
+
+ </xs:documentation>
+
+ </xs:annotation>
+
+
+ <xs:import namespace="http://www.w3.org/XML/1998/namespace"
+ schemaLocation="http://www.w3.org/2001/03/xml.xsd">
+ </xs:import>
+
+ <xs:import namespace="http://purl.org/dc/elements/1.1/"
+ schemaLocation="dc.xsd"/>
+
+ <xs:import namespace="http://purl.org/dc/dcmitype/"
+ schemaLocation="dcmitype.xsd"/>
+
+ <xs:element name="alternative" substitutionGroup="dc:title"/>
+
+ <xs:element name="tableOfContents" substitutionGroup="dc:description"/>
+ <xs:element name="abstract" substitutionGroup="dc:description"/>
+
+ <xs:element name="created" substitutionGroup="dc:date"/>
+ <xs:element name="valid" substitutionGroup="dc:date"/>
+ <xs:element name="available" substitutionGroup="dc:date"/>
+ <xs:element name="issued" substitutionGroup="dc:date"/>
+ <xs:element name="modified" substitutionGroup="dc:date"/>
+ <xs:element name="dateAccepted" substitutionGroup="dc:date"/>
+ <xs:element name="dateCopyrighted" substitutionGroup="dc:date"/>
+ <xs:element name="dateSubmitted" substitutionGroup="dc:date"/>
+
+ <xs:element name="extent" substitutionGroup="dc:format"/>
+ <xs:element name="medium" substitutionGroup="dc:format"/>
+
+ <xs:element name="isVersionOf" substitutionGroup="dc:relation"/>
+ <xs:element name="hasVersion" substitutionGroup="dc:relation"/>
+ <xs:element name="isReplacedBy" substitutionGroup="dc:relation"/>
+ <xs:element name="replaces" substitutionGroup="dc:relation"/>
+ <xs:element name="isRequiredBy" substitutionGroup="dc:relation"/>
+ <xs:element name="requires" substitutionGroup="dc:relation"/>
+ <xs:element name="isPartOf" substitutionGroup="dc:relation"/>
+ <xs:element name="hasPart" substitutionGroup="dc:relation"/>
+ <xs:element name="isReferencedBy" substitutionGroup="dc:relation"/>
+ <xs:element name="references" substitutionGroup="dc:relation"/>
+ <xs:element name="isFormatOf" substitutionGroup="dc:relation"/>
+ <xs:element name="hasFormat" substitutionGroup="dc:relation"/>
+ <xs:element name="conformsTo" substitutionGroup="dc:relation"/>
+
+ <xs:element name="spatial" substitutionGroup="dc:coverage"/>
+ <xs:element name="temporal" substitutionGroup="dc:coverage"/>
+
+ <xs:element name="audience" substitutionGroup="dc:any"/>
+
+ <xs:element name="mediator" substitutionGroup="audience"/>
+ <xs:element name="educationLevel" substitutionGroup="audience"/>
+
+ <xs:element name="accessRights" substitutionGroup="dc:rights"/>
+
+ <xs:element name="bibliographicCitation" substitutionGroup="dc:identifier"/>
+
+</xs:schema>
Modified: trunk/protocol/examples/metadata_response.xml
===================================================================
--- trunk/protocol/examples/metadata_response.xml 2006-05-16 15:24:07 UTC (rev 531)
+++ trunk/protocol/examples/metadata_response.xml 2006-05-17 14:04:27 UTC (rev 532)
@@ -11,20 +11,42 @@
<software name="TAPIR_Implementation" version="1.0"/>
</source>
</header>
- <metadata>
- <label>Example Data Provider</label>
- <accesspoint>http://accesspoint_url</accesspoint>
- <relatedEntities>
- <entity>
- <name>TAPIR working group</name>
- <contact>
- <name>Frank Sinatra</name>
- <email>f.sinatra(a)tapir.com</email>
- <role>system administrator</role>
- </contact>
- <role>Hosting Provider</role>
- </entity>
- </relatedEntities>
- </metadata>
+ <metadata>
+ <dc:title>Australian Fruit Flies</dc:title>
+ <dc:type>http://purl.org/dc/dcmitype/Service<dc:type>
+ <accesspoint>http://www.flies.au/pywrapper/flies</accesspoint>
+ <dc:description>Australian fruit flies collection of georeferenced records observations</dc:description>
+ <dc:language>EN</dc:language>
+ <dc:subject>fuit flies observation australia diptera Tephritidae</dc:subject>
+ <dct:bibliographicCitation>Australian Fruit Flies TAPIR provider</dct:bibliographicCitation>
+ <dc:rights>free</dc:rights>
+ <dct:modified>2006-01-01T00:00:00</dct:modified>
+ <indexingPreferences startTime="09:30:00Z" maxDuration="PT1H" frequency="P1M" />
+ <relatedEntity>
+ <role>technical host</role>
+ <entity>
+ <identifier>someGUID</identifier>
+ <name>TAPIR working group</name>
+ <acronym>TAPIR-WG</acronym>
+ <logoURL>http://somehost/entitylogo.png</logoURL>
+ <description>Description about the TAPIR working group</description>
+ <relatedInformation>http://somehost/moreinfo</relatedInformation>
+ <geo:Point>
+ <geo:lat>45.256</geo:lat>
+ <geo:long>-71.92</geo:long>
+ </geo:Point>
+ <hasContact>
+ <role>system administrator</role>
+ <vcard:VCARD>
+ <vcard:FN>Frank Sinatra</vcard:FN>
+ <vcard:TITLE>Main singer</vcard:TITLE>
+ <vcard:EMAIL>f.sinatra(a)tapir.com</vcard:EMAIL>
+ <vcard:TEL>some phone number</vcard:TEL>
+ </vcard:VCARD>
+ </hasContact>
+ </entity>
+ </relatedEntity>
+ </metadata>
+
</response>
Deleted: trunk/protocol/georss.xsd
===================================================================
--- trunk/protocol/georss.xsd 2006-05-16 15:24:07 UTC (rev 531)
+++ trunk/protocol/georss.xsd 2006-05-17 14:04:27 UTC (rev 532)
@@ -1,113 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>
-<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:georss="http://www.georss.org/georss/10" targetNamespace="http://www.georss.org/georss/10" xmlns:gml="http://www.opengis.net/gml" elementFormDefault="qualified" version="1.0">
- <xs:import namespace="http://www.opengis.net/gml" schemaLocation="gmlgeorss.xsd"/>
-
- <xs:element name="where">
- <xs:annotation>
- <xs:documentation>Root property element of a GeoRSS GML instance</xs:documentation>
- <xs:documentation>Container for optional GeoRSS attributes</xs:documentation>
- </xs:annotation>
- <xs:complexType>
- <xs:choice>
- <xs:element ref="gml:Point"/>
- <xs:element ref="gml:LineString"/>
- <xs:element ref="gml:Polygon"/>
- <xs:element ref="gml:Envelope"/>
- </xs:choice>
- <xs:attributeGroup ref="georss:whereAttrGroup"/>
- </xs:complexType>
- </xs:element>
-
- <xs:element name="point" type="georss:SimplePositionType">
- <xs:annotation>
- <xs:documentation>Point property element containing a pair of coordinates representing latitude then longitude in the WGS84 coordinate reference system</xs:documentation>
- <xs:documentation>This GeoRSS Simple element maps completely onto the where + gml:Point combination of GeoRSS GML</xs:documentation>
- </xs:annotation>
- </xs:element>
-
- <xs:element name="line" type="georss:SimplePositionType">
- <xs:annotation>
- <xs:documentation>Line property element containing a list of pairs of coordinates representing latitude then longitude in the WGS84 coordinate reference system</xs:documentation>
- <xs:documentation>This GeoRSS Simple element maps completely onto the where + gml:LineString combination of GeoRSS GML</xs:documentation>
- </xs:annotation>
- </xs:element>
-
- <xs:element name="polygon" type="georss:SimplePositionType">
- <xs:annotation>
- <xs:documentation>Closed ring property element containing a list of pairs of coordinates (first pair and last pair identical) representing latitude then longitude in the WGS84 coordinate reference system</xs:documentation>
- <xs:documentation>This GeoRSS Simple element maps completely onto the where + gml:Polygon combination of GeoRSS GML</xs:documentation>
- </xs:annotation>
- </xs:element>
-
- <xs:element name="box" type="georss:SimplePositionType">
- <xs:annotation>
- <xs:documentation>Rectangular envelope property element containing two pairs of coordinates (lower left envelope corner, upper right envelope corner) representing latitude then longitude in the WGS84 coordinate reference system</xs:documentation>
- <xs:documentation>This GeoRSS Simple element maps completely onto the where + gml:Envelope combination of GeoRSS GML</xs:documentation>
- </xs:annotation>
- </xs:element>
-
- <!-- ================================================= -->
-
- <xs:complexType name="SimplePositionType">
- <xs:simpleContent>
- <xs:annotation>
- <xs:documentation>Extended doubleList with the addition of GeoRSS where attributes</xs:documentation>
- </xs:annotation>
-
- <xs:extension base="georss:doubleList">
- <xs:attributeGroup ref="georss:whereAttrGroup"/>
- </xs:extension>
- </xs:simpleContent>
- </xs:complexType>
-
- <!-- ================================================= -->
-
- <xs:simpleType name="doubleList">
- <xs:annotation>
- <xs:documentation>XML List based on XML Schema double type, identical to gml:doubleList. An element of this type contains a space-separated list of double values</xs:documentation>
- </xs:annotation>
- <xs:list itemType="xs:double"/>
- </xs:simpleType>
-
- <!-- ================================================= -->
-
- <xs:attributeGroup name="whereAttrGroup">
- <xs:annotation>
- <xs:documentation>Optional additional parameters for a georss location property </xs:documentation>
- </xs:annotation>
-
- <xs:attribute name="featuretypetag" type="xs:NCName" use="optional">
- <xs:annotation>
- <xs:documentation>Optional where attribute indicating the type of geographic entity is being referred to. Default is "location"</xs:documentation>
- </xs:annotation>
- </xs:attribute>
-
- <xs:attribute name="relationshiptag" type="xs:NCName" use="optional">
- <xs:annotation>
- <xs:documentation>Optional where attribute indicating how geotagged content is related to the represented location. Default is "isLocatedAt"</xs:documentation>
- </xs:annotation>
- </xs:attribute>
-
- <xs:attribute name="elev" type="xs:double" use="optional">
- <xs:annotation>
- <xs:documentation>Optional where attribute indicating a GPS-measured elevation in meters (e.g. WGS84 geoid height)</xs:documentation>
- </xs:annotation>
- </xs:attribute>
-
- <xs:attribute name="floor" type="xs:double" use="optional">
- <xs:annotation>
- <xs:documentation>Optional where attribute indicating elevation by building floor</xs:documentation>
- </xs:annotation>
- </xs:attribute>
-
- <xs:attribute name="radius" type="xs:double" use="optional">
- <xs:annotation>
- <xs:documentation>Optional where attribute indicating size in meters of a radius or buffer being indicated around the geometry (e.g. radius of circular area around a point geometry </xs:documentation>
- </xs:annotation>
- </xs:attribute>
-
- </xs:attributeGroup>
-
- <!-- ================================================= -->
-
-</xs:schema>
\ No newline at end of file
Deleted: trunk/protocol/gmlgeorss.xsd
===================================================================
--- trunk/protocol/gmlgeorss.xsd 2006-05-16 15:24:07 UTC (rev 531)
+++ trunk/protocol/gmlgeorss.xsd 2006-05-17 14:04:27 UTC (rev 532)
@@ -1,1038 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>
-<schema xmlns="http://www.w3.org/2001/XMLSchema" xmlns:sch="http://www.ascc.net/xml/schematron" xmlns:gml="http://www.opengis.net/gml" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:smil20="http://www.w3.org/2001/SMIL20/" xmlns:smil20lang="http://www.w3.org/2001/SMIL20/Language" targetNamespace="http://www.opengis.net/gml" elementFormDefault="qualified" version="3.1.1">
- <annotation>
- <documentation>GML Subset schema for gml:Point,gml:LineString,gml:Polygon,gml:Envelope written by gmlSubset.xslt. </documentation>
- <documentation>Manually revised for reference by the GeoRSS application schema</documentation>
- </annotation>
- <import namespace="http://www.w3.org/1999/xlink" schemaLocation="http://schemas.opengeospatial.net/gml/3.1.1/xlink/xlinks.xsd"/>
-
- <!-- ================================================= -->
- <element name="Point" type="gml:PointType" substitutionGroup="gml:_GeometricPrimitive"/>
-
- <!-- ================================================= -->
- <complexType name="PointType">
-
- <annotation>
-
- <documentation>A Point is defined by a single coordinate tuple.</documentation>
-
- </annotation>
-
- <complexContent>
-
- <extension base="gml:AbstractGeometricPrimitiveType">
-
- <sequence>
-
- <choice>
-
- <annotation>
-
- <documentation>GML supports two different ways to specify the direct poisiton of a point. 1. The "pos" element is of type DirectPositionType.</documentation>
-
- </annotation>
-
- <element ref="gml:pos"/>
-
- <element ref="gml:coordinates">
-
- <annotation>
-
- <documentation>Deprecated with GML version 3.1.0 for coordinates with ordinate values that are numbers. Use "pos" instead. The "coordinates" element shall only be used for coordinates with ordinates that require a string representation, e.g. DMS representations.</documentation>
-
- </annotation>
-
- </element>
-
- <element ref="gml:coord">
-
- <annotation>
-
- <documentation>Deprecated with GML version 3.0. Use "pos" instead. The "coord" element is included for backwards compatibility with GML 2.</documentation>
-
- </annotation>
-
- </element>
-
- </choice>
-
- </sequence>
-
- </extension>
-
- </complexContent>
-
- </complexType>
-
- <!-- ================================================= -->
- <complexType name="AbstractGeometricPrimitiveType" abstract="true">
-
- <annotation>
-
- <documentation>This is the abstract root type of the geometric primitives. A geometric primitive is a geometric object that is not decomposed further into other primitives in the system. All primitives are oriented in the direction implied by the sequence of their coordinate tuples.</documentation>
-
- </annotation>
-
- <complexContent>
-
- <extension base="gml:AbstractGeometryType"/>
-
- </complexContent>
-
- </complexType>
-
- <!-- ================================================= -->
- <complexType name="AbstractGeometryType" abstract="true">
-
- <annotation>
-
- <documentation>All geometry elements are derived directly or indirectly from this abstract supertype. A geometry element may have an identifying attribute ("gml:id"), a name (attribute "name") and a description (attribute "description"). It may be associated with a spatial reference system (attribute "srsName"). The following rules shall be adhered: - Every geometry type shall derive from this abstract type. - Every geometry element (i.e. an element of a geometry type) shall be directly or indirectly in the substitution group of _Geometry.</documentation>
-
- </annotation>
-
- <complexContent>
-
- <extension base="gml:AbstractGMLType">
-
- <attribute name="gid" type="string" use="optional">
-
- <annotation>
-
- <documentation>This attribute is included for backward compatibility with GML 2 and is deprecated with GML 3. This identifer is superceded by "gml:id" inherited from AbstractGMLType. The attribute "gid" should not be used anymore and may be deleted in future versions of GML without further notice.</documentation>
-
- </annotation>
-
- </attribute>
-
- <attributeGroup ref="gml:SRSReferenceGroup"/>
-
- </extension>
-
- </complexContent>
-
- </complexType>
-
- <!-- ================================================= -->
- <complexType name="AbstractGMLType" abstract="true">
-
- <annotation>
-
- <documentation>All complexContent GML elements are directly or indirectly derived from this abstract supertype
- to establish a hierarchy of GML types that may be distinguished from other XML types by their ancestry.
- Elements in this hierarchy may have an ID and are thus referenceable. </documentation>
-
- </annotation>
-
- <sequence>
-
- <group ref="gml:StandardObjectProperties"/>
-
- </sequence>
-
- <attribute ref="gml:id" use="optional"/>
-
- </complexType>
-
- <!-- ================================================= -->
- <group name="StandardObjectProperties">
-
- <annotation>
-
- <documentation>This content model group makes it easier to construct types that
- derive from AbstractGMLType and its descendents "by restriction".
- A reference to the group saves having to enumerate the standard object properties. </documentation>
-
- </annotation>
-
- <sequence>
-
- <element ref="gml:metaDataProperty" minOccurs="0" maxOccurs="unbounded"/>
-
- <element ref="gml:description" minOccurs="0"/>
-
- <element ref="gml:name" minOccurs="0" maxOccurs="unbounded">
-
- <annotation>
-
- <documentation>Multiple names may be provided. These will often be distinguished by being assigned by different authorities, as indicated by the value of the codeSpace attribute. In an instance document there will usually only be one name per authority. </documentation>
-
- </annotation>
-
- </element>
-
- </sequence>
-
- </group>
-
- <!-- ================================================= -->
- <element name="metaDataProperty" type="gml:MetaDataPropertyType">
-
- <annotation>
-
- <documentation>Contains or refers to a metadata package that contains metadata properties. </documentation>
-
- </annotation>
-
- </element>
-
- <!-- ================================================= -->
- <complexType name="MetaDataPropertyType">
-
- <annotation>
-
- <documentation>Base type for complex metadata property types.</documentation>
-
- </annotation>
-
- <choice minOccurs="0">
-
- <element ref="gml:_MetaData"/>
-
- <any processContents="lax"/>
-
- </choice>
-
- <attributeGroup ref="gml:AssociationAttributeGroup"/>
-
- <attribute name="about" type="anyURI" use="optional"/>
-
- </complexType>
-
- <!-- ================================================= -->
- <attributeGroup name="AssociationAttributeGroup">
-
- <annotation>
-
- <documentation>Attribute group used to enable property elements to refer to their value remotely. It contains the “simple link” components from xlinks.xsd, with all members “optional”, and the remoteSchema attribute, which is also optional. These attributes can be attached to any element, thus allowing it to act as a pointer. The 'remoteSchema' attribute allows an element that carries link attributes to indicate that the element is declared in a remote schema rather than by the schema that constrains the current document instance. </documentation>
-
- </annotation>
-
- <attributeGroup ref="xlink:simpleLink"/>
-
- <attribute ref="gml:remoteSchema" use="optional"/>
-
- </attributeGroup>
-
- <!-- ================================================= -->
- <attribute name="remoteSchema" type="anyURI">
-
- <annotation>
-
- <documentation>Reference to an XML Schema fragment that specifies the content model of the property’s value. This is in conformance with the XML Schema Section 4.14 Referencing Schemas from Elsewhere. </documentation>
-
- </annotation>
-
- </attribute>
-
- <!-- ================================================= -->
- <element name="description" type="gml:StringOrRefType">
-
- <annotation>
-
- <documentation>Contains a simple text description of the object, or refers to an external description. </documentation>
-
- </annotation>
-
- </element>
-
- <!-- ================================================= -->
- <complexType name="StringOrRefType">
-
- <annotation>
-
- <documentation>
-This type is available wherever there is a need for a "text" type property. It is of string type, so the text can be included inline, but the value can also be referenced remotely via xlinks from the AssociationAttributeGroup. If the remote reference is present, then the value obtained by traversing the link should be used, and the string content of the element can be used for an annotation. </documentation>
-
- </annotation>
-
- <simpleContent>
-
- <extension base="string">
-
- <attributeGroup ref="gml:AssociationAttributeGroup"/>
-
- </extension>
-
- </simpleContent>
-
- </complexType>
-
- <!-- ================================================= -->
-
-
- <!-- ================================================= -->
- <element name="name" type="gml:CodeType">
-
- <annotation>
-
- <documentation>Label for the object, normally a descriptive name. An object may have several names, typically assigned by different authorities. The authority for a name is indicated by the value of its (optional) codeSpace attribute. The name may or may not be unique, as determined by the rules of the organization responsible for the codeSpace. </documentation>
-
- </annotation>
-
- </element>
-
- <!-- ================================================= -->
- <complexType name="CodeType">
-
- <annotation>
-
- <documentation>Name or code with an (optional) authority. Text token.
- If the codeSpace attribute is present, then its value should identify a dictionary, thesaurus
- or authority for the term, such as the organisation who assigned the value,
- or the dictionary from which it is taken.
- A text string with an optional codeSpace attribute. </documentation>
-
- </annotation>
-
- <simpleContent>
-
- <extension base="string">
-
- <attribute name="codeSpace" type="anyURI" use="optional"/>
-
- </extension>
-
- </simpleContent>
-
- </complexType>
-
- <!-- ================================================= -->
- <attribute name="id" type="ID">
-
- <annotation>
-
- <documentation>Database handle for the object. It is of XML type “ID”, so is constrained to be unique in the XML document within which it occurs. An external identifier for the object in the form of a URI may be constructed using standard XML and XPointer methods. This is done by concatenating the URI for the document, a fragment separator “#”, and the value of the id attribute. </documentation>
-
- </annotation>
-
- </attribute>
-
- <!-- ================================================= -->
- <attributeGroup name="SRSReferenceGroup">
-
- <annotation>
-
- <documentation>Optional reference to the CRS used by this geometry, with optional additional information to simplify use when a more complete definition of the CRS is not needed. </documentation>
-
- </annotation>
-
- <attribute name="srsName" type="anyURI" use="optional">
-
- <annotation>
-
- <documentation>In general this reference points to a CRS instance of gml:CoordinateReferenceSystemType (see coordinateReferenceSystems.xsd). For well known references it is not required that the CRS description exists at the location the URI points to. If no srsName attribute is given, the CRS must be specified as part of the larger context this geometry element is part of, e.g. a geometric element like point, curve, etc. It is expected that this attribute will be specified at the direct position level only in rare cases.</documentation>
-
- </annotation>
-
- </attribute>
-
- <attribute name="srsDimension" type="positiveInteger" use="optional">
-
- <annotation>
-
- <documentation>The "srsDimension" is the length of coordinate sequence (the number of entries in the list). This dimension is specified by the coordinate reference system. When the srsName attribute is omitted, this attribute shall be omitted. </documentation>
-
- </annotation>
-
- </attribute>
-
- <attributeGroup ref="gml:SRSInformationGroup"/>
-
- </attributeGroup>
-
- <!-- ================================================= -->
- <attributeGroup name="SRSInformationGroup">
-
- <annotation>
-
- <documentation>Optional additional and redundant information for a CRS to simplify use when a more complete definition of the CRS is not needed. This information shall be the same as included in the more complete definition of the CRS, referenced by the srsName attribute. When the srsName attribute is included, either both or neither of the axisLabels and uomLabels attributes shall be included. When the srsName attribute is omitted, both of these attributes shall be omitted. </documentation>
-
- </annotation>
-
- <attribute name="axisLabels" type="gml:NCNameList" use="optional">
-
- <annotation>
-
- <documentation>Ordered list of labels for all the axes of this CRS. The gml:axisAbbrev value should be used for these axis labels, after spaces and forbiddden characters are removed. When the srsName attribute is included, this attribute is optional. When the srsName attribute is omitted, this attribute shall also be omitted. </documentation>
-
- </annotation>
-
- </attribute>
-
- <attribute name="uomLabels" type="gml:NCNameList" use="optional">
-
- <annotation>
-
- <documentation>Ordered list of unit of measure (uom) labels for all the axes of this CRS. The value of the string in the gml:catalogSymbol should be used for this uom labels, after spaces and forbiddden characters are removed. When the axisLabels attribute is included, this attribute shall also be included. When the axisLabels attribute is omitted, this attribute shall also be omitted. </documentation>
-
- </annotation>
-
- </attribute>
-
- </attributeGroup>
-
- <!-- ================================================= -->
- <simpleType name="NCNameList">
-
- <annotation>
-
- <documentation>A set of values, representing a list of token with the lexical value space of NCName. The tokens are seperated by whitespace.</documentation>
-
- </annotation>
-
- <list itemType="NCName"/>
-
- </simpleType>
-
- <!-- ================================================= -->
-
-
- <!-- ================================================= -->
- <element name="pos" type="gml:DirectPositionType">
-
- <annotation>
-
- <appinfo>
-
- <sch:pattern>
-
- <sch:rule context="gml:pos">
-
- <sch:extends rule="CRSLabels"/>
-
- </sch:rule>
-
- </sch:pattern>
-
- </appinfo>
-
- </annotation>
-
- </element>
-
- <!-- ================================================= -->
- <complexType name="DirectPositionType">
-
- <annotation>
-
- <documentation>DirectPosition instances hold the coordinates for a position within some coordinate reference system (CRS). Since DirectPositions, as data types, will often be included in larger objects (such as geometry elements) that have references to CRS, the "srsName" attribute will in general be missing, if this particular DirectPosition is included in a larger element with such a reference to a CRS. In this case, the CRS is implicitly assumed to take on the value of the containing object's CRS.</documentation>
-
- </annotation>
-
- <simpleContent>
-
- <extension base="gml:doubleList">
-
- <attributeGroup ref="gml:SRSReferenceGroup"/>
-
- </extension>
-
- </simpleContent>
-
- </complexType>
-
- <!-- ================================================= -->
- <simpleType name="doubleList">
-
- <annotation>
-
- <documentation>XML List based on XML Schema double type. An element of this type contains a space-separated list of double values</documentation>
-
- </annotation>
-
- <list itemType="double"/>
-
- </simpleType>
-
- <!-- ================================================= -->
-
-
- <!-- ================================================= -->
- <element name="coordinates" type="gml:CoordinatesType">
-
- <annotation>
-
- <documentation>Deprecated with GML version 3.1.0.</documentation>
-
- </annotation>
-
- </element>
-
- <!-- ================================================= -->
- <complexType name="CoordinatesType">
-
- <annotation>
-
- <documentation>Tables or arrays of tuples.
- May be used for text-encoding of values from a table.
- Actually just a string, but allows the user to indicate which characters are used as separators.
- The value of the 'cs' attribute is the separator for coordinate values,
- and the value of the 'ts' attribute gives the tuple separator (a single space by default);
- the default values may be changed to reflect local usage.
- Defaults to CSV within a tuple, space between tuples.
- However, any string content will be schema-valid. </documentation>
-
- </annotation>
-
- <simpleContent>
-
- <extension base="string">
-
- <attribute name="decimal" type="string" default="."/>
-
- <attribute name="cs" type="string" default=","/>
-
- <attribute name="ts" type="string" default=" "/>
-
- </extension>
-
- </simpleContent>
-
- </complexType>
-
- <!-- ================================================= -->
- <element name="coord" type="gml:CoordType">
-
- <annotation>
-
- <documentation>Deprecated with GML 3.0 and included for backwards compatibility with GML 2. Use the "pos" element instead.</documentation>
-
- </annotation>
-
- </element>
-
- <!-- ================================================= -->
- <complexType name="CoordType">
-
- <annotation>
-
- <documentation>Represents a coordinate tuple in one, two, or three dimensions. Deprecated with GML 3.0 and replaced by DirectPositionType.</documentation>
-
- </annotation>
-
- <sequence>
-
- <element name="X" type="decimal"/>
-
- <element name="Y" type="decimal" minOccurs="0"/>
-
- <element name="Z" type="decimal" minOccurs="0"/>
-
- </sequence>
-
- </complexType>
-
- <!-- ================================================= -->
- <element name="_GeometricPrimitive" type="gml:AbstractGeometricPrimitiveType" abstract="true" substitutionGroup="gml:_Geometry">
-
- <annotation>
-
- <documentation>The "_GeometricPrimitive" element is the abstract head of the substituition group for all (pre- and user-defined) geometric primitives.</documentation>
-
- </annotation>
-
- </element>
-
- <!-- ================================================= -->
- <element name="_Geometry" type="gml:AbstractGeometryType" abstract="true" substitutionGroup="gml:_GML">
-
- <annotation>
-
- <documentation>The "_Geometry" element is the abstract head of the substituition group for all geometry elements of GML 3. This includes pre-defined and user-defined geometry elements. Any geometry element must be a direct or indirect extension/restriction of AbstractGeometryType and must be directly or indirectly in the substitution group of "_Geometry".</documentation>
-
- <appinfo>
-
- <sch:pattern>
-
- <sch:rule context="gml:_Geometry">
-
- <sch:extends rule="CRSLabels"/>
-
- </sch:rule>
-
- </sch:pattern>
-
- </appinfo>
-
- </annotation>
-
- </element>
-
- <!-- ================================================= -->
- <element name="_GML" type="gml:AbstractGMLType" abstract="true" substitutionGroup="gml:_Object">
-
- <annotation>
-
- <documentation>Global element which acts as the head of a substitution group that may include any element which is a GML feature, object, geometry or complex value</documentation>
-
- </annotation>
-
- </element>
-
- <!-- ================================================= -->
- <element name="_Object" abstract="true">
-
- <annotation>
-
- <documentation>This abstract element is the head of a substitutionGroup hierararchy which may contain either simpleContent or complexContent elements. It is used to assert the model position of "class" elements declared in other GML schemas. </documentation>
-
- </annotation>
-
- </element>
-
- <!-- ================================================= -->
- <element name="LineString" type="gml:LineStringType" substitutionGroup="gml:_Curve"/>
-
- <!-- ================================================= -->
- <complexType name="LineStringType">
-
- <annotation>
-
- <documentation>A LineString is a special curve that consists of a single segment with linear interpolation. It is defined by two or more coordinate tuples, with linear interpolation between them. It is backwards compatible with the LineString of GML 2, GM_LineString of ISO 19107 is implemented by LineStringSegment.</documentation>
-
- </annotation>
-
- <complexContent>
-
- <extension base="gml:AbstractCurveType">
-
- <sequence>
-
- <choice>
-
- <annotation>
-
- <documentation>GML supports two different ways to specify the control points of a line string. 1. A sequence of "pos" (DirectPositionType) or "pointProperty" (PointPropertyType) elements. "pos" elements are control points that are only part of this curve, "pointProperty" elements contain a point that may be referenced from other geometry elements or reference another point defined outside of this curve (reuse of existing points). 2. The "posList" element allows for a compact way to specifiy the coordinates of the control points, if all control points are in the same coordinate reference systems and belong to this curve only. The number of direct positions in the list must be at least two.</documentation>
-
- </annotation>
-
- <choice minOccurs="2" maxOccurs="unbounded">
-
- <element ref="gml:pos"/>
-
- <element ref="gml:pointProperty"/>
-
- <element ref="gml:pointRep">
-
- <annotation>
-
- <documentation>Deprecated with GML version 3.1.0. Use "pointProperty" instead. Included for backwards compatibility with GML 3.0.0.</documentation>
-
- </annotation>
-
- </element>
-
- <element ref="gml:coord">
-
- <annotation>
-
- <documentation>Deprecated with GML version 3.0. Use "pos" instead. The "coord" element is included for backwards compatibility with GML 2.</documentation>
-
- </annotation>
-
- </element>
-
- </choice>
-
- <element ref="gml:posList"/>
-
- <element ref="gml:coordinates">
-
- <annotation>
-
- <documentation>Deprecated with GML version 3.1.0. Use "posList" instead.</documentation>
-
- </annotation>
-
- </element>
-
- </choice>
-
- </sequence>
-
- </extension>
-
- </complexContent>
-
- </complexType>
-
- <!-- ================================================= -->
- <complexType name="AbstractCurveType" abstract="true">
-
- <annotation>
-
- <documentation>An abstraction of a curve to support the different levels of complexity. The curve can always be viewed as a geometric primitive, i.e. is continuous.</documentation>
-
- </annotation>
-
- <complexContent>
-
- <extension base="gml:AbstractGeometricPrimitiveType"/>
-
- </complexContent>
-
- </complexType>
-
- <!-- ================================================= -->
- <element name="pointProperty" type="gml:PointPropertyType">
-
- <annotation>
-
- <appinfo>
-
- <sch:pattern>
-
- <sch:rule context="gml:pointProperty">
-
- <sch:extends rule="hrefOrContent"/>
-
- </sch:rule>
-
- </sch:pattern>
-
- </appinfo>
-
- <documentation>This property element either references a point via the XLink-attributes or contains the point element. pointProperty is the predefined property which can be used by GML Application Schemas whenever a GML Feature has a property with a value that is substitutable for Point.</documentation>
-
- </annotation>
-
- </element>
-
- <!-- ================================================= -->
- <complexType name="PointPropertyType">
-
- <annotation>
-
- <documentation>A property that has a point as its value domain can either be an appropriate geometry element encapsulated in an element of this type or an XLink reference to a remote geometry element (where remote includes geometry elements located elsewhere in the same document). Either the reference or the contained element must be given, but neither both nor none.</documentation>
-
- </annotation>
-
- <sequence>
-
- <element ref="gml:Point" minOccurs="0"/>
-
- </sequence>
-
- <attributeGroup ref="gml:AssociationAttributeGroup">
-
- <annotation>
-
- <documentation>This attribute group includes the XLink attributes (see xlinks.xsd). XLink is used in GML to reference remote resources (including those elsewhere in the same document). A simple link element can be constructed by including a specific set of XLink attributes. The XML Linking Language (XLink) is currently a Proposed Recommendation of the World Wide Web Consortium. XLink allows elements to be inserted into XML documents so as to create sophisticated links between resources; such links can be used to reference remote properties. A simple link element can be used to implement pointer functionality, and this functionality has been built into various GML 3 elements by including the gml:AssociationAttributeGroup. </documentation>
-
- </annotation>
-
- </attributeGroup>
-
- </complexType>
-
- <!-- ================================================= -->
- <element name="pointRep" type="gml:PointPropertyType">
-
- <annotation>
-
- <documentation>Deprecated with GML version 3.1.0. Use "pointProperty" instead. Included for backwards compatibility with GML 3.0.0.</documentation>
-
- </annotation>
-
- </element>
-
- <!-- ================================================= -->
- <element name="posList" type="gml:DirectPositionListType">
-
- <annotation>
-
- <appinfo>
-
- <sch:pattern>
-
- <sch:rule context="gml:posList">
-
- <sch:extends rule="CRSLabels"/>
-
- </sch:rule>
-
- </sch:pattern>
-
- </appinfo>
-
- <appinfo>
-
- <sch:pattern>
-
- <sch:rule context="gml:posList">
-
- <sch:extends rule="Count"/>
-
- </sch:rule>
-
- </sch:pattern>
-
- </appinfo>
-
- </annotation>
-
- </element>
-
- <!-- ================================================= -->
- <complexType name="DirectPositionListType">
-
- <annotation>
-
- <documentation>DirectPositionList instances hold the coordinates for a sequence of direct positions within the same coordinate reference system (CRS).</documentation>
-
- </annotation>
-
- <simpleContent>
-
- <extension base="gml:doubleList">
-
- <attributeGroup ref="gml:SRSReferenceGroup"/>
-
- <attribute name="count" type="positiveInteger" use="optional">
-
- <annotation>
-
- <documentation>"count" allows to specify the number of direct positions in the list. If the attribute “count” is present then the attribute “srsDimension” shall be present, too.</documentation>
-
- </annotation>
-
- </attribute>
-
- </extension>
-
- </simpleContent>
-
- </complexType>
-
- <!-- ================================================= -->
- <element name="_Curve" type="gml:AbstractCurveType" abstract="true" substitutionGroup="gml:_GeometricPrimitive">
-
- <annotation>
-
- <documentation>The "_Curve" element is the abstract head of the substituition group for all (continuous) curve elements.</documentation>
-
- </annotation>
-
- </element>
-
- <!-- ================================================= -->
- <element name="Polygon" type="gml:PolygonType" substitutionGroup="gml:_Surface"/>
-
- <!-- ================================================= -->
- <complexType name="PolygonType">
-
- <annotation>
-
- <documentation>A Polygon is a special surface that is defined by a single surface patch. The boundary of this patch is coplanar and the polygon uses planar interpolation in its interior. It is backwards compatible with the Polygon of GML 2, GM_Polygon of ISO 19107 is implemented by PolygonPatch.</documentation>
-
- </annotation>
-
- <complexContent>
-
- <extension base="gml:AbstractSurfaceType">
-
- <sequence>
-
- <element ref="gml:exterior" minOccurs="0"/>
-
- <element ref="gml:interior" minOccurs="0" maxOccurs="unbounded"/>
-
- </sequence>
-
- </extension>
-
- </complexContent>
-
- </complexType>
-
- <!-- ================================================= -->
- <complexType name="AbstractSurfaceType">
-
- <annotation>
-
- <documentation>
- An abstraction of a surface to support the different levels of complexity. A surface is always a continuous region of a plane.
- </documentation>
-
- </annotation>
-
- <complexContent>
-
- <extension base="gml:AbstractGeometricPrimitiveType"/>
-
- </complexContent>
-
- </complexType>
-
- <!-- ================================================= -->
- <element name="exterior" type="gml:AbstractRingPropertyType">
-
- <annotation>
-
- <documentation>A boundary of a surface consists of a number of rings. In the normal 2D case, one of these rings is distinguished as being the exterior boundary. In a general manifold this is not always possible, in which case all boundaries shall be listed as interior boundaries, and the exterior will be empty.</documentation>
-
- </annotation>
-
- </element>
-
- <!-- ================================================= -->
- <complexType name="AbstractRingPropertyType">
-
- <annotation>
-
- <documentation>
- Encapsulates a ring to represent the surface boundary property of a surface.
- </documentation>
-
- </annotation>
-
- <sequence>
-
- <element ref="gml:_Ring"/>
-
- </sequence>
-
- </complexType>
-
- <!-- ================================================= -->
- <element name="_Ring" type="gml:AbstractRingType" abstract="true" substitutionGroup="gml:_Geometry">
-
- <annotation>
-
- <documentation>The "_Ring" element is the abstract head of the substituition group for all closed boundaries of a surface patch.</documentation>
-
- </annotation>
-
- </element>
-
- <!-- ================================================= -->
- <complexType name="AbstractRingType" abstract="true">
-
- <annotation>
-
- <documentation>
- An abstraction of a ring to support surface boundaries of different complexity.
- </documentation>
-
- </annotation>
-
- <complexContent>
-
- <extension base="gml:AbstractGeometryType"/>
-
- </complexContent>
-
- </complexType>
-
- <!-- ================================================= -->
- <element name="interior" type="gml:AbstractRingPropertyType">
-
- <annotation>
-
- <documentation>A boundary of a surface consists of a number of rings. The "interior" rings seperate the surface / surface patch from the area enclosed by the rings.</documentation>
-
- </annotation>
-
- </element>
-
- <!-- ================================================= -->
- <element name="_Surface" type="gml:AbstractSurfaceType" abstract="true" substitutionGroup="gml:_GeometricPrimitive">
-
- <annotation>
-
- <documentation>The "_Surface" element is the abstract head of the substituition group for all (continuous) surface elements.</documentation>
-
- </annotation>
-
- </element>
- <!-- =========== J.Lieberman - elements added manually to script result ========= -->
- <!-- =========== Abstract Metadata supertype ========================= -->
- <element name="_MetaData" type="gml:AbstractMetaDataType" abstract="true" substitutionGroup="gml:_Object">
- <annotation>
- <documentation>Abstract element which acts as the head of a substitution group for packages of MetaData properties. </documentation>
- </annotation>
- </element>
- <!-- =========================================================== -->
- <complexType name="AbstractMetaDataType" abstract="true" mixed="true">
- <annotation>
- <documentation> An abstract base type for complex metadata types.</documentation>
- </annotation>
- <attribute ref="gml:id" use="optional"/>
- </complexType>
- <!-- =========================================================== -->
- <!-- Envelope -->
- <!-- =========================================================== -->
- <element name="Envelope" type="gml:EnvelopeType"/>
- <!-- =========================================================== -->
- <complexType name="EnvelopeType">
- <annotation>
- <documentation>Envelope defines an extent using a pair of positions defining opposite corners in arbitrary dimensions. The first direct position is the "lower corner" (a coordinate position consisting of all the minimal ordinates for each dimension for all points within the envelope), the second one the "upper corner" (a coordinate position consisting of all the maximal ordinates for each dimension for all points within the envelope).</documentation>
- </annotation>
- <choice>
- <sequence>
- <element name="lowerCorner" type="gml:DirectPositionType"/>
- <element name="upperCorner" type="gml:DirectPositionType"/>
- </sequence>
- <element ref="gml:coord" minOccurs="2" maxOccurs="2">
- <annotation>
- <appinfo>deprecated</appinfo>
- <documentation>deprecated with GML version 3.0</documentation>
- </annotation>
- </element>
- <element ref="gml:pos" minOccurs="2" maxOccurs="2">
- <annotation>
- <appinfo>deprecated</appinfo>
- <documentation>Deprecated with GML version 3.1. Use the explicit properties "lowerCorner" and "upperCorner" instead.</documentation>
- </annotation>
- </element>
- <element ref="gml:coordinates">
- <annotation>
- <documentation>Deprecated with GML version 3.1.0. Use the explicit properties "lowerCorner" and "upperCorner" instead.</documentation>
- </annotation>
- </element>
- </choice>
- <attributeGroup ref="gml:SRSReferenceGroup"/>
- </complexType>
- <!-- =========================================================== -->
- <element name="LinearRing" type="gml:LinearRingType" substitutionGroup="gml:_Ring"/>
- <!-- =========================================================== -->
- <complexType name="LinearRingType">
- <annotation>
- <documentation>A LinearRing is defined by four or more coordinate tuples, with linear interpolation between them; the first and last coordinates must be coincident.</documentation>
- </annotation>
- <complexContent>
- <extension base="gml:AbstractRingType">
- <sequence>
- <choice>
- <annotation>
- <documentation>GML supports two different ways to specify the control points of a linear ring.
-1. A sequence of "pos" (DirectPositionType) or "pointProperty" (PointPropertyType) elements. "pos" elements are control points that are only part of this ring, "pointProperty" elements contain a point that may be referenced from other geometry elements or reference another point defined outside of this ring (reuse of existing points).
-2. The "posList" element allows for a compact way to specifiy the coordinates of the control points, if all control points are in the same coordinate reference systems and belong to this ring only. The number of direct positions in the list must be at least four.</documentation>
- </annotation>
- <choice minOccurs="4" maxOccurs="unbounded">
- <element ref="gml:pos"/>
- <element ref="gml:pointProperty"/>
- <element ref="gml:pointRep">
- <annotation>
- <documentation>Deprecated with GML version 3.1.0. Use "pointProperty" instead. Included for backwards compatibility with GML 3.0.0.</documentation>
- </annotation>
- </element>
- </choice>
- <element ref="gml:posList"/>
- <element ref="gml:coordinates">
- <annotation>
- <documentation>Deprecated with GML version 3.1.0. Use "posList" instead.</documentation>
- </annotation>
- </element>
- <element ref="gml:coord" minOccurs="4" maxOccurs="unbounded">
- <annotation>
- <documentation>Deprecated with GML version 3.0 and included for backwards compatibility with GML 2. Use "pos" elements instead.</documentation>
- </annotation>
- </element>
- </choice>
- </sequence>
- </extension>
- </complexContent>
- </complexType>
-<!-- =========================================================== -->
-</schema>
Modified: trunk/protocol/tapir.xsd
===================================================================
--- trunk/protocol/tapir.xsd 2006-05-16 15:24:07 UTC (rev 531)
+++ trunk/protocol/tapir.xsd 2006-05-17 14:04:27 UTC (rev 532)
@@ -5,7 +5,7 @@
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:dct="http://purl.org/dc/terms/"
- xmlns:georss="http://www.georss.org/georss/10"
+ xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#"
xmlns:vcard="http://www.w3.org/2001/vcard-rdf/3.0#"
elementFormDefault="unqualified"
attributeFormDefault="unqualified" version="1.0" xml:lang="EN" >
@@ -30,8 +30,8 @@
schemaLocation="dc.xsd"/>
<xsd:import namespace="http://purl.org/dc/terms/"
schemaLocation="dcterms.xsd"/>
- <xsd:import namespace="http://www.georss.org/georss/10"
- schemaLocation="georss.xsd"/>
+ <xsd:import namespace="http://www.w3.org/2003/01/geo/wgs84_pos#"
+ schemaLocation="w3c_geo.xsd"/>
<xsd:import namespace="http://www.w3.org/2001/vcard-rdf/3.0#"
schemaLocation="vcard.xsd"/>
<!-- ============================================= -->
@@ -1569,9 +1569,9 @@
</xsd:sequence>
</xsd:complexType>
</xsd:element>
- <xsd:element ref="georss:where" minOccurs="0">
+ <xsd:element ref="geo:Point" minOccurs="0">
<xsd:annotation>
- <xsd:documentation>Location of the entity in decimal WGS84 latitude and longitude as defined by GeoRSS/GML</xsd:documentation>
+ <xsd:documentation>Location of the entity in decimal WGS84 latitude and longitude (and optional altitude) as defined by the W3C Basic Geo Vocabulary</xsd:documentation>
</xsd:annotation>
</xsd:element>
<xsd:element ref="custom"/>
Modified: trunk/protocol/vcard.xsd
===================================================================
--- trunk/protocol/vcard.xsd 2006-05-16 15:24:07 UTC (rev 531)
+++ trunk/protocol/vcard.xsd 2006-05-17 14:04:27 UTC (rev 532)
@@ -3,7 +3,6 @@
targetNamespace="http://www.w3.org/2001/vcard-rdf/3.0#"
xmlns="http://www.w3.org/2001/XMLSchema"
xmlns:xml="http://www.w3.org/XML/1998/namespace"
- xmlns:xlink="http://www.w3.org/1999/xlink"
xmlns:vCard="http://www.w3.org/2001/vcard-rdf/3.0#"
elementFormDefault="qualified"
attributeFormDefault="qualified">
@@ -32,10 +31,7 @@
</annotation>
- <import namespace="http://www.w3.org/XML/1998/namespace"
- schemaLocation="http://www.w3.org/2001/03/xml.xsd"/>
- <import namespace="http://www.w3.org/1999/xlink" schemaLocation="xlink.xsd"/>
-
+ <import namespace="http://www.w3.org/XML/1998/namespace" schemaLocation="http://www.w3.org/2001/03/xml.xsd"/>
<element name="VCARD">
<complexType>
<sequence>
Added: trunk/protocol/w3c_geo.xsd
===================================================================
--- trunk/protocol/w3c_geo.xsd 2006-05-16 15:24:07 UTC (rev 531)
+++ trunk/protocol/w3c_geo.xsd 2006-05-17 14:04:27 UTC (rev 532)
@@ -0,0 +1,59 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
+ targetNamespace="http://www.w3.org/2003/01/geo/wgs84_pos#"
+ xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#"
+ elementFormDefault="qualified" version="1.0">
+ <xs:annotation>
+ <xs:documentation xml:lang="en">
+ W3C Basic Geo Vocabulary XML Schema.
+ This schema represents the Geo vocabulary for RDF in the XML schema language.
+ It was developed for use in TAPIR.
+ </xs:documentation>
+
+ <xs:appinfo xmlns:dc="http://purl.org/dc/elements/1.1/">
+ <dc:title>W3C Basic Geo Vocabulary XML Schema</dc:title>
+ <dc:creator>Markus Döring, m.doering(a)bgbm.org</dc:creator>
+ <dc:relation>http://www.w3.org/2003/01/geo/</dc:relation>
+ </xs:appinfo>
+ </xs:annotation>
+
+ <!-- ================================================= -->
+
+ <xs:element name="Point">
+ <xs:complexType>
+ <xs:sequence>
+ <xs:element ref="geo:lat"/>
+ <xs:element ref="geo:long"/>
+ <xs:element ref="geo:alt" minOccurs="0"/>
+ </xs:sequence>
+ </xs:complexType>
+ </xs:element>
+ <xs:element name="lat" type="geo:latitudeType"/>
+ <xs:element name="long" type="geo:longitudeType"/>
+ <xs:element name="alt" type="xs:float"/>
+
+ <!-- ================================================= -->
+
+ <xs:simpleType name="latitudeType">
+ <xs:annotation>
+ <xs:documentation>A simple type representing decimal latitude values from -90 to 90 degrees</xs:documentation>
+ </xs:annotation>
+ <xs:restriction base="xs:float">
+ <xs:maxInclusive value="90"></xs:maxInclusive>
+ <xs:minInclusive value="-90"></xs:minInclusive>
+ </xs:restriction>
+
+ </xs:simpleType>
+
+ <xs:simpleType name="longitudeType">
+ <xs:annotation>
+ <xs:documentation>A simple type representing decimal longitude values from -180 to 180 degrees</xs:documentation>
+ </xs:annotation>
+ <xs:restriction base="xs:float">
+ <xs:maxInclusive value="180"></xs:maxInclusive>
+ <xs:minInclusive value="-180"></xs:minInclusive>
+ </xs:restriction>
+
+ </xs:simpleType>
+
+</xs:schema>
\ No newline at end of file
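
For illustration only (not part of the commit): a minimal sketch of an instance document that the w3c_geo.xsd schema added above should accept. The coordinate and altitude values are made up for the example; geo:alt is optional per the schema (minOccurs="0") and could be left out.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical instance; values are illustrative, not taken from the commit. -->
<geo:Point xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#">
  <!-- Element order follows the xs:sequence in the schema: lat, long, then optional alt. -->
  <geo:lat>52.4553</geo:lat>
  <geo:long>13.3045</geo:long>
  <geo:alt>51.0</geo:alt>
</geo:Point>
```

Because the schema sets elementFormDefault="qualified" and declares lat, long and alt as global elements, all child elements carry the geo: namespace prefix.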
Deleted: trunk/protocol/xlink.xsd
===================================================================
--- trunk/protocol/xlink.xsd 2006-05-16 15:24:07 UTC (rev 531)
+++ trunk/protocol/xlink.xsd 2006-05-17 14:04:27 UTC (rev 532)
@@ -1,104 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>
-<schema targetNamespace="http://www.w3.org/1999/xlink"
- xmlns="http://www.w3.org/2001/XMLSchema"
- xmlns:xlink="http://www.w3.org/1999/xlink"
- elementFormDefault="qualified"
- attributeFormDefault="qualified">
-
- <annotation>
- <documentation xml:lang="en">
- XLink XML Schema
-
- The source of this XSD is the METS: Metadata Encoding and Transmission Standard
- project at the Library of Congress
- http://www.loc.gov/standards/mets/xlink.xsd
- </documentation>
-
- <appinfo xmlns:dc="http://purl.org/dc/elements/1.1/">
- <dc:title>XLink XML Schema</dc:title>
- <dc:creator>METS: Metadata Encoding and Transmission Standard</dc:creator>
- <dc:source>http://www.loc.gov/standards/mets/xlink.xsd</dc:source>
- <dc:relation>http://www.loc.gov/standards/mets/</dc:relation>
- </appinfo>
- </annotation>
-
- <!-- global attributes -->
- <attribute name="href" type="anyURI"/>
- <attribute name="role" type="string"/>
- <attribute name="arcrole" type="string"/>
- <attribute name="title" type="string" />
- <attribute name="show">
- <simpleType>
- <restriction base="string">
- <enumeration value="new" />
- <enumeration value="replace" />
- <enumeration value="embed" />
- <enumeration value="other" />
- <enumeration value="none" />
- </restriction>
- </simpleType>
- </attribute>
- <attribute name="actuate">
- <simpleType>
- <restriction base="string">
- <enumeration value="onLoad" />
- <enumeration value="onRequest" />
- <enumeration value="other" />
- <enumeration value="none" />
- </restriction>
- </simpleType>
- </attribute>
- <attribute name="label" type="string" />
- <attribute name="from" type="string" />
- <attribute name="to" type="string" />
-
- <attributeGroup name="simpleLink">
- <attribute name="type" type="string" fixed="simple" form="qualified" />
- <attribute ref="xlink:href" use="optional" />
- <attribute ref="xlink:role" use="optional" />
- <attribute ref="xlink:arcrole" use="optional" />
- <attribute ref="xlink:title" use="optional" />
- <attribute ref="xlink:show" use="optional" />
- <attribute ref="xlink:actuate" use="optional" />
- </attributeGroup>
-
- <attributeGroup name="extendedLink">
- <attribute name="type" type="string" fixed="extended" form="qualified" />
- <attribute ref="xlink:role" use="optional" />
- <attribute ref="xlink:title" use="optional" />
- </attributeGroup>
-
- <attributeGroup name="locatorLink">
- <attribute name="type" type="string" fixed="locator" form="qualified" />
- <attribute ref="xlink:href" use="required" />
- <attribute ref="xlink:role" use="optional" />
- <attribute ref="xlink:title" use="optional" />
- <attribute ref="xlink:label" use="optional" />
- </attributeGroup>
-
- <attributeGroup name="arcLink">
- <attribute name="type" type="string" fixed="arc" form="qualified" />
- <attribute ref="xlink:arcrole" use="optional" />
- <attribute ref="xlink:title" use="optional" />
- <attribute ref="xlink:show" use="optional" />
- <attribute ref="xlink:actuate" use="optional" />
- <attribute ref="xlink:from" use="optional" />
- <attribute ref="xlink:to" use="optional" />
- </attributeGroup>
-
- <attributeGroup name="resourceLink">
- <attribute name="type" type="string" fixed="resource" form="qualified" />
- <attribute ref="xlink:role" use="optional" />
- <attribute ref="xlink:title" use="optional" />
- <attribute ref="xlink:label" use="optional" />
- </attributeGroup>
-
- <attributeGroup name="titleLink">
- <attribute name="type" type="string" fixed="title" form="qualified" />
- </attributeGroup>
-
- <attributeGroup name="emptyLink">
- <attribute name="type" type="string" fixed="none" form="qualified" />
- </attributeGroup>
-
-</schema>
\ No newline at end of file