[tdwg-tapir] Response content type specification
Hi Everyone, I've come across a minor issue with some existing TAPIR installations that should be easily fixed and will likely save some frustration down the road.
The TAPIR spec (http://www.tdwg.org/dav/subgroups/tapir/1.0/docs/TAPIRSpecification_2007-07-...) specifies a response Content-type of "text/xml". RFC 3023 (http://www.ietf.org/rfc/rfc3023.txt) states that in this case, when no "charset" parameter is given in the HTTP response header, the implied character encoding of the response document is "us-ascii" (see section 8.5), regardless of the encoding declared inside the XML document itself.
So, for example:
Good:
  response header = Content-type: text/xml; charset="utf-8"
  response document signature = <?xml version="1.0" encoding="utf-8"?>
  result = document is assumed to be UTF-8
Not so good:
  response header = Content-type: text/xml
  response document signature = <?xml version="1.0" encoding="utf-8"?>
  result = document is assumed to be us-ascii
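As a rough illustration of the RFC 3023 rules described above, the Python sketch below works out which encoding a consumer should assume from the Content-Type header. The function name `effective_charset` is hypothetical, and the sketch simplifies RFC 3023: for application/xml it only looks at the XML encoding declaration and ignores byte order marks and UTF-16 detection.

```python
import re
from email.message import Message

# Matches the encoding pseudo-attribute of an XML declaration at the very
# start of the document, e.g. <?xml version="1.0" encoding="utf-8"?>
_XML_ENCODING = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def effective_charset(content_type: str, body: bytes = b"") -> str:
    """Return the charset a consumer should assume (simplified RFC 3023)."""
    msg = Message()
    msg["Content-Type"] = content_type
    media_type = msg.get_content_type()  # lower-cased, parameters stripped
    charset = msg.get_param("charset")   # quotes removed, None if absent

    # An explicit charset parameter always wins over the XML declaration.
    if isinstance(charset, str) and charset:
        return charset.lower()

    maintype, _, subtype = media_type.partition("/")
    is_xml = subtype == "xml" or subtype.endswith("+xml")

    if maintype == "text" and is_xml:
        # text/xml and text/*+xml without a charset: us-ascii, whatever the
        # XML declaration says.
        return "us-ascii"
    if maintype == "application" and is_xml:
        # application/xml and application/*+xml: defer to the document's own
        # encoding declaration, defaulting to UTF-8 when there is none.
        match = _XML_ENCODING.match(body)
        return match.group(1).decode("ascii").lower() if match else "utf-8"
    raise ValueError(f"not an XML media type: {media_type}")
```

For instance, `effective_charset("text/xml", b'<?xml version="1.0" encoding="utf-8"?>')` returns `"us-ascii"`, which is the "Not so good" case: the declaration inside the document is overridden by the header default.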
None of the TAPIR installations I've examined so far set a charset value, so consumer applications assume an encoding of "us-ascii", which is likely to cause problems whenever a response contains non-ASCII characters. This was also a significant issue for DiGIR provider installations.
The solution is likely to be quite simple, and there seem to be two basic options:
1. Configure the web server / application to add a charset value of "UTF-8", so that the consumer does not fall back to the default of us-ascii.
or
2. Return a Content-type of "application/xml" or one of the application/*+xml types. In this case RFC 3023 defers to the document itself: the XML encoding declaration (or a byte order mark) determines the encoding, and UTF-8 is assumed if neither is present.
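For option 1, a minimal sketch of what this might look like in a Python WSGI application is below. `tapir_app` and the sample document are hypothetical stand-ins, not part of any TAPIR implementation; the point is only that the charset goes into the Content-Type header and the body is encoded to match. Option 2 would amount to sending "application/xml" instead.

```python
# Option 1: declare the charset explicitly on a text/xml response.
CONTENT_TYPE = "text/xml; charset=utf-8"


def tapir_app(environ, start_response):
    """Hypothetical WSGI app returning an XML document with an explicit charset."""
    document = (
        '<?xml version="1.0" encoding="utf-8"?>\n'
        "<response><locality>São Paulo</locality></response>\n"
    )
    body = document.encode("utf-8")  # bytes on the wire match the declared charset
    start_response(
        "200 OK",
        [
            ("Content-Type", CONTENT_TYPE),
            ("Content-Length", str(len(body))),  # length in bytes, not characters
        ],
    )
    return [body]
```

It can be tried locally with the standard library, e.g. `wsgiref.simple_server.make_server("", 8000, tapir_app).serve_forever()`.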
Note that specifying the content type does not by itself make the response correctly encoded: it is still up to the web application (TAPIR in this case) to ensure that the output stream is actually UTF-8 encoded.
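One way to keep the declaration and the actual bytes in step is to let the XML serializer do the encoding rather than concatenating strings. The sketch below uses the standard library's `xml.etree.ElementTree.tostring` (the `xml_declaration` argument needs Python 3.8 or later); the element names are made up for illustration and are not TAPIR schema elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical record containing non-ASCII text.
root = ET.Element("record")
ET.SubElement(root, "locality").text = "São Paulo"

# The serializer writes an encoding declaration and encodes the characters
# with the same codec, so the two cannot disagree.
body = ET.tostring(root, encoding="utf-8", xml_declaration=True)

# Alternatively, encoding="us-ascii" escapes non-ASCII characters as numeric
# character references (&#227; here), which is safe under either default.
ascii_body = ET.tostring(root, encoding="us-ascii")
```

The ASCII-only variant is a defensive choice: even a consumer that wrongly assumes us-ascii will read it correctly, at the cost of slightly larger output.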
regards, Dave V.
Good spotting Dave.
This is now fixed in the TapirDotNET implementation of TAPIR. I have updated the HerbIMI TapirDotNET installation at http://lsid.herbimi.info/TapirDotNET/tapir.aspx/herbIMI, so if you would like to check this provider, I would be interested in your results.
Kevin
In reply to: "Dave Vieglais" vieglais@ku.edu, 9/11/2007 7:59 a.m.
_______________________________________________
tdwg-tapir mailing list
tdwg-tapir@lists.tdwg.org
http://lists.tdwg.org/mailman/listinfo/tdwg-tapir
Hi Kevin, you're good:
---
python responsetests.py -v 20 -p " http://lsid.herbimi.info/TapirDotNET/tapir.aspx/herbIMI?op=inventory&cou... "
Testing: http://lsid.herbimi.info/TapirDotNET/tapir.aspx/herbIMI?op=inventory&cou...
INFO:root:[HTTP HEADER] content-length: 12019
INFO:root:[HTTP HEADER] x-powered-by: ASP.NET
INFO:root:[HTTP HEADER] set-cookie: ASP.NET_SessionId=uxfg0l55colhnjmxzinwxp55; path=/; HttpOnly
INFO:root:[HTTP HEADER] x-aspnet-version: 2.0.50727
INFO:root:[HTTP HEADER] server: Microsoft-IIS/6.0
INFO:root:[HTTP HEADER] connection: close
INFO:root:[HTTP HEADER] pragma: no-cache:
INFO:root:[HTTP HEADER] cache-control: private
INFO:root:[HTTP HEADER] date: Thu, 08 Nov 2007 21:29:37 GMT
INFO:root:[HTTP HEADER] content-type: text/xml; charset="utf-8"
== Results ==
Test: "HTTP Status" [OK]: No Worries
Test: "Response Encoding" [OK]: No Worries
Test: "Document Encoding" [OK]: No Worries
---
The GBIF REST service doesn't do quite so well, though:
---
python responsetests.py -v 20 -p " http://newportal.gbif.org/ws/rest/provider/list?stylesheet=&maxresults=1..."
Testing: http://newportal.gbif.org/ws/rest/provider/list?stylesheet=&maxresults=1...
INFO:root:[HTTP HEADER] date: Thu, 08 Nov 2007 21:37:01 GMT
INFO:root:[HTTP HEADER] transfer-encoding: chunked
INFO:root:[HTTP HEADER] connection: close
INFO:root:[HTTP HEADER] content-type: text/xml
INFO:root:[HTTP HEADER] server: Apache/2.0.52 (Red Hat)
== Results ==
Test: "HTTP Status" [OK]: No Worries
Test: "Response Encoding" [WARNING]: No character encoding was specified for the text/[*+]xml content type | "Content-type: text/xml"
Test: "Response Encoding" [WARNING]: Falling back to the RFC 3023 default of us-ascii character encoding.
Test: "Document Encoding" [ERROR]: Bozo exception. | Document declared as us-ascii, but parsed as utf-8
---
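The source of responsetests.py isn't included in the thread, so the following is only a hypothetical approximation of the "Response Encoding" and "Document Encoding" checks whose output appears above: warn when a text/* response has no charset, then verify that the body really decodes with the charset a consumer would assume. The function names are made up, the messages only loosely follow the log, and the application/xml branch is simplified to plain UTF-8.

```python
import urllib.request
from email.message import Message


def check_encoding(content_type: str, body: bytes) -> list:
    """Hypothetical check: warn on a missing charset and verify the body decodes."""
    results = []
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_param("charset")

    if not isinstance(charset, str) or not charset:
        if msg.get_content_maintype() == "text":
            # Any text/* type (text/xml included) without a charset:
            # RFC 3023 says the consumer must assume us-ascii.
            results.append(("Response Encoding", "WARNING",
                            "no charset given; falling back to us-ascii"))
            charset = "us-ascii"
        else:
            # Simplification: assume UTF-8 for application/xml and friends
            # instead of reading the XML encoding declaration.
            charset = "utf-8"

    try:
        body.decode(charset)
    except (LookupError, UnicodeDecodeError) as exc:
        results.append(("Document Encoding", "ERROR",
                        f"body does not decode as {charset}: {exc}"))
    return results


def fetch_and_check(url: str) -> list:
    """Fetch a provider response and run check_encoding on it."""
    with urllib.request.urlopen(url) as response:
        return check_encoding(response.headers.get("Content-Type", ""), response.read())
```

A UTF-8 document containing non-ASCII characters served as plain "text/xml" yields a WARNING followed by an ERROR, mirroring the GBIF result; `fetch_and_check` needs network access to run against a live provider.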
These tests (plus a whole bunch more) will be available as an online service real soon, to help data providers test their services fairly rigorously before exposing them to the real world.
regards, Dave V.
In reply to: Kevin Richards RichardsK@landcareresearch.co.nz, Nov 8, 2007 1:46 PM
Dave, Thanks for pointing this out. Should be fixed in PyWrapper now too. Markus
In reply to: "Dave Vieglais", 08.11.2007 19:59
Hi Dave,
It's now fixed in TapirLink too - thanks for pointing out the problem.
I also added a recommendation to the TAPIR spec not to omit the charset in the HTTP header. I'll wait for other amendments before publishing the next version of the document.
Best Regards, -- Renato
In reply to: Dave Vieglais, 8 Nov 2007 at 10:59
participants (5)
- Dave Vieglais
- Donald Hobern
- Döring, Markus
- Kevin Richards
- Renato De Giovanni