[tdwg-tapir] Minor changes
Dear all,
Yesterday I finally managed to produce valid ABCD output from an instance of TapirLink that has mapped DarwinCore concepts. This could have been achieved earlier by using a simplified XML Schema for ABCD, but I wanted to use the official one. The corresponding output model (inside our repository under cs/dwc/1.4/models) still needs more mappings between ABCD and DarwinCore elements, but it already works with the most important concepts.
One of the things that I needed to do was to create more environment variables to be used by output models, because there are mandatory elements in ABCD that don't correspond to any DarwinCore concept. However, they do correspond to TAPIR metadata elements.
The current TAPIR specification recognizes the following environment variables:
  date
  timestamp
  dataSourceName
  accessPoint
  lastUpdated
I created these new ones in TapirLink, not only because of ABCD but also for the RSS2 output model:
  dateCreated
  metadataLanguage
  dataSourceLanguage
  dataSourceDescription
  rights
  technicalContactName
  technicalContactEmail
  contentContactName
  contentContactEmail
If the specification doesn't include these variables, they will remain unofficial, which may hurt future interoperability. So one of the things I would like to suggest is to include these variables in the controlled vocabulary defined by TAPIR.
Please let me know if this is OK and if you have more suggestions. (Markus, some time ago there was the idea of including all TAPIR metadata as environment variables so that they could also be returned in search responses. Would you like to suggest more variables?)
Remember that TAPIR environment variables are optional (providers don't need to support them) and extensible (providers can define new variables if they want to).
Here are two other minor protocol changes, prompted by issues found during the development of TAPIR tools for the GISIN network:
* Restrict POST requests to "application/x-www-form-urlencoded" (the default POST encoding). Reason: TAPIR requests can include multiple parameters with the same name ("concept" and "tagname" in the inventory operation), but some languages like PHP cannot handle this properly: only the last parameter becomes available from the $_REQUEST global variable. In these cases it is necessary to extract the parameters manually, and this doesn't seem to be an easy task if POST is used with "multipart/form-data". Since this last encoding is mostly used for sending large quantities of binary data, which is definitely not the case for a TAPIR request, I think we can safely avoid it.
* Include an optional attribute "alias" in output models and query templates that are advertised in capabilities responses. The CNS configuration file already allows aliases for output models and query templates, but these are not present in capabilities responses, unlike concept and schema aliases.
Please let me know if you have any comments, suggestions or objections to these changes; otherwise I'll take the liberty of making them in the next few days.
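To illustrate the repeated-parameter issue from the first point: a minimal sketch, in Python rather than PHP, of manually extracting every value of a repeated parameter from a raw "application/x-www-form-urlencoded" body. The request body, the "op" parameter value and the example.org concept identifiers are placeholders made up for this illustration, not taken from a real TAPIR provider.

```python
from typing import Dict, List
from urllib.parse import parse_qs


def extract_repeated_params(raw_body: str) -> Dict[str, List[str]]:
    """Parse a urlencoded body, keeping every value of repeated names."""
    # parse_qs maps each parameter name to the list of all its values,
    # in the order they appear, so nothing is overwritten.
    return parse_qs(raw_body, keep_blank_values=True)


# Hypothetical inventory request body (placeholder values).
body = (
    "op=inventory"
    "&concept=http%3A%2F%2Fexample.org%2Fdwc%2FGenus"
    "&concept=http%3A%2F%2Fexample.org%2Fdwc%2FFamily"
    "&tagname=genus"
    "&tagname=family"
)

params = extract_repeated_params(body)
print(params["concept"])
print(params["tagname"])
```

With the urlencoded form the raw body is a single flat string, so this kind of extraction stays simple; a "multipart/form-data" body would need a full MIME parser first.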
Best Regards, -- Renato
Renato, as you might have guessed all your proposed changes are fine with me.
Regarding the metadata variables, I would make all existing metadata values available as TAPIR variables. Maybe we can leave out the indexing preferences. This would mean adding dc:subject, dct:bibliographicCitation and dct:modified.
I am not sure yet how you will retrieve metadata values for RelatedEntities though. There might be several of them. Do you have a simple rule in mind for what to do in that case? This applies to all contact data.
cheers -- Markus
On 15.05.2007, at 16:03, Renato De Giovanni wrote:
tdwg-tapir mailing list tdwg-tapir@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-tapir
Hi Markus,
I agree we can add "subject" and "bibliographicalCitation" to the list of predefined variables. dct:modified is already present in "lastUpdated".
We never considered multiplicity of environment variables or relationships between them. In principle they should be just single independent values. My suggestion is to keep using this simple approach, unless we really need more complex functionality.
Anyway, adding more values to the official list of variables is the kind of change that we should always be able to make without affecting existing implementations. Perhaps we could be stricter by adding a prefix to all TAPIR predefined variables and then defining the corresponding type as the union of the list of prefixed values and any string that does not start with the prefix. This way at least we completely avoid name clashes with custom variables.
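A minimal sketch of how such a union type could be checked. The prefix "tapir:" is purely hypothetical (no prefix has been chosen in this discussion), and only the five variables currently recognized by the specification are listed as predefined.

```python
# Hypothetical prefix, used only for illustration.
PREFIX = "tapir:"

# Predefined variables: the five currently in the specification, prefixed.
PREDEFINED = {
    PREFIX + name
    for name in ("date", "timestamp", "dataSourceName", "accessPoint", "lastUpdated")
}


def is_valid_variable(name: str) -> bool:
    """Accept a predefined prefixed name, or any custom name without the prefix."""
    if name.startswith(PREFIX):
        # Names carrying the prefix must come from the controlled list.
        return name in PREDEFINED
    # Custom names are free-form; rejecting the empty string is an
    # assumption of this sketch.
    return len(name) > 0


print(is_valid_variable("tapir:lastUpdated"))  # predefined
print(is_valid_variable("myCustomVariable"))   # custom, no prefix
print(is_valid_variable("tapir:unknownName"))  # prefixed but not in the list
```

Because a custom name can never start with the prefix, new predefined variables can be added later without clashing with names that providers already use.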
I'm using simple rules to get the content of technicalContactName and technicalContactEmail. I just get the first system administrator of the first technical host entity. In the case of contentContactName and contentContactEmail I get the first data administrator of the first data supplier entity. The specification should also be explicit about this.
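The rule above can be sketched as follows. This is not the TapirLink code: the dictionary layout ("roles", "contacts", "name", "email") is a made-up representation of the metadata, and returning nothing when the first matching entity has no matching contact is an assumption about how to read "first of the first".

```python
from typing import Any, Dict, List, Optional, Tuple


def first_contact(
    related_entities: List[Dict[str, Any]],
    entity_role: str,
    contact_role: str,
) -> Optional[Tuple[str, str]]:
    """Return (name, email) of the first contact with contact_role
    inside the first entity that has entity_role, or None."""
    for entity in related_entities:
        if entity_role in entity["roles"]:
            for contact in entity["contacts"]:
                if contact_role in contact["roles"]:
                    return contact["name"], contact["email"]
            # Only the first matching entity is considered (assumption).
            return None
    return None


# Hypothetical metadata with placeholder names and addresses.
entities = [
    {
        "roles": ["data supplier"],
        "contacts": [
            {"roles": ["data administrator"], "name": "A. Curator", "email": "curator@example.org"},
        ],
    },
    {
        "roles": ["technical host"],
        "contacts": [
            {"roles": ["system administrator"], "name": "B. Admin", "email": "admin@example.org"},
            {"roles": ["system administrator"], "name": "C. Backup", "email": "backup@example.org"},
        ],
    },
]

technical = first_contact(entities, "technical host", "system administrator")
content = first_contact(entities, "data supplier", "data administrator")
print(technical, content)
```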
Let me know if you have other ideas.
Best Regards, -- Renato
Renato, I fully agree with keeping the variables simple. And if we specify one way of retrieving the contact details, that's fine with me. I just wanted to point out this issue. I also don't mind not having contact variables at all. In that case a provider simply needs to map to those few ABCD concepts in addition to pure Darwin Core. -- Markus
On 22.05.2007, at 01:12, Renato De Giovanni wrote:
participants (2): Markus Döring, Renato De Giovanni