[tdwg-tapir] Response content type specification

Renato De Giovanni renato at cria.org.br
Sat Nov 10 01:27:43 CET 2007


Hi Dave,

It's now fixed in TapirLink too - thanks for pointing out the 
problem.

I also added a recommendation to the TAPIR spec not to omit the
charset in the HTTP header. I'll wait for other amendments
before publishing the next version of the document.

Best Regards,
--
Renato

On 8 Nov 2007 at 10:59, Dave Vieglais wrote:

> Hi Everyone,
> I've come across a minor issue with some existing TAPIR installations
> that should be easy to fix, and fixing it now will likely save some
> frustration down the road.
> 
> The TAPIR spec (http://www.tdwg.org/dav/subgroups/tapir/1.0/docs/TAPIRSpecification_2007-07-18.html#toc16)
> specifies a response Content-type of "text/xml". RFC 3023
> (http://www.ietf.org/rfc/rfc3023.txt) states that in this case,
> when no "charset" parameter is specified in the HTTP response header,
> the implied character encoding of the response document is "us-ascii"
> (see s8.5), regardless of the encoding named in the document's XML
> declaration.
> 
> So, for example:
> 
> Good:
>   response header =  Content-type: text/xml; charset="utf-8"
> 
>   response document signature =  <?xml version="1.0" encoding="utf-8"?>
> 
> result = document is assumed to be UTF-8
> 
> Not so good:
>   response header =  Content-type: text/xml
> 
>   response document signature =  <?xml version="1.0" encoding="utf-8"?>
> 
> result = document is assumed to be us-ascii
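The defaulting rule in the two examples above can be sketched in Python. This is a minimal illustration, assuming the consumer has the raw Content-type header and, optionally, the encoding named in the XML declaration; the function name `effective_charset` is hypothetical, and byte order mark detection (also part of section 4.3.3 of the XML spec) is deliberately left out.

```python
from email.message import Message
from typing import Optional


def effective_charset(content_type: str, declared: Optional[str] = None) -> str:
    """Return the charset a consumer should use (simplified RFC 3023 rules).

    content_type -- raw value of the HTTP Content-type header
    declared     -- encoding from the document's XML declaration, if any
    """
    msg = Message()
    msg["Content-Type"] = content_type
    media_type = msg.get_content_type()   # lower-cased, e.g. "text/xml"
    charset = msg.get_content_charset()   # lower-cased, or None if absent

    is_xml = media_type in ("text/xml", "application/xml") or media_type.endswith("+xml")
    if not is_xml:
        raise ValueError(f"not an XML media type: {media_type}")

    if charset is not None:
        # An explicit charset parameter is authoritative.
        return charset
    if media_type.startswith("text/"):
        # text/xml without charset: us-ascii, whatever the XML declaration says.
        return "us-ascii"
    # application/xml (and +xml types) without charset: use the XML
    # declaration, falling back to UTF-8 (BOM sniffing omitted here).
    return (declared or "utf-8").lower()


print(effective_charset('text/xml; charset="utf-8"', "utf-8"))  # utf-8
print(effective_charset("text/xml", "utf-8"))                   # us-ascii
print(effective_charset("application/xml", "utf-8"))            # utf-8
```

The second call reproduces the "not so good" case: the document declares utf-8, but the header wins and the consumer ends up with us-ascii.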
> 
> 
> None of the TAPIR installations I've examined so far set a charset
> value, so consumer applications assume a character encoding of
> "us-ascii", which is likely to cause problems whenever a response
> contains non-ASCII characters. This was also a significant issue for
> DiGIR provider installations.
> 
> The solution is likely to be quite simple, and there seem to be two
> basic options:
> 
> 1. Configure the web server or application to add a charset value of
> "UTF-8" so the consumer does not fall back to the default of us-ascii.
> 
> or
> 
> 2. Return a Content-type of "application/xml" (or one of the "+xml"
> types built on it). In this case RFC 3023 defers to the document's own
> XML encoding declaration, which falls back to UTF-8 when no
> declaration (or byte order mark) is present.
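Both options can be illustrated with a toy WSGI application in Python. This is a sketch, not TapirLink's code; `make_app`, `run`, `OPTION_1` and `OPTION_2` are hypothetical names, and the response body is a placeholder document rather than a real TAPIR response.

```python
from wsgiref.util import setup_testing_defaults

# Option 1: keep text/xml but state the charset explicitly.
OPTION_1 = 'text/xml; charset="utf-8"'
# Option 2: use application/xml; RFC 3023 then defers to the document's
# own encoding declaration, which below says utf-8.
OPTION_2 = "application/xml"


def make_app(content_type):
    """Build a toy WSGI app standing in for a TAPIR provider (hypothetical)."""
    def app(environ, start_response):
        xml = ('<?xml version="1.0" encoding="utf-8"?>\n'
               "<response>S\u00e3o Paulo</response>\n")
        body = xml.encode("utf-8")  # bytes on the wire really are UTF-8
        start_response("200 OK", [
            ("Content-Type", content_type),
            ("Content-Length", str(len(body))),
        ])
        return [body]
    return app


def run(app):
    """Call a WSGI app in-process and return (status, headers, body)."""
    environ = {}
    setup_testing_defaults(environ)
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = dict(headers)
        return lambda data: None  # unused write() callable

    body = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], body
```

Option 1 is usually the smaller change, since consumers that already expect "text/xml" keep working; option 2 relies on every response carrying a correct XML declaration.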
> 
> Note that simply specifying the content type does not automatically
> make the response properly encoded - it is still up to the web
> application (TAPIR in this case) to ensure that the output stream is
> actually UTF-8 encoded.
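Because the header only describes the bytes, a provider can cheaply check that what it sends really is UTF-8. A minimal Python sketch, assuming the whole response body is available as bytes; `check_utf8_body` is a hypothetical helper, and its regular expression only handles a simple XML declaration at the very start of the document.

```python
import re
from typing import List

# Matches a simple XML declaration at the very start of the body and
# captures the declared encoding name.
XML_DECL = re.compile(rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']')


def check_utf8_body(body: bytes) -> List[str]:
    """Return a list of problems; an empty list means the body looks consistent."""
    problems = []
    match = XML_DECL.match(body)
    if match and match.group(1).decode("ascii").lower() not in ("utf-8", "utf8"):
        problems.append(f"XML declaration says {match.group(1).decode('ascii')}")
    try:
        body.decode("utf-8")
    except UnicodeDecodeError as exc:
        problems.append(f"not valid UTF-8 at byte {exc.start}")
    return problems


doc = '<?xml version="1.0" encoding="utf-8"?>\n<r>S\u00e3o Paulo</r>\n'
print(check_utf8_body(doc.encode("utf-8")))    # []
print(check_utf8_body(doc.encode("latin-1")))  # one "not valid UTF-8" entry
```

A body written out as Latin-1 but still labelled utf-8 fails the decode step, which is the kind of mismatch a correct Content-type header cannot fix on its own.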
> 
> regards,
>   Dave V.




More information about the tdwg-tag mailing list