[tdwg-tapir] Response content type specification

Renato De Giovanni renato at cria.org.br
Sat Nov 10 01:27:43 CET 2007

Hi Dave,

It's now fixed in TapirLink too - thanks for pointing this out.

I also added a recommendation in the TAPIR spec for not omitting the 
charset in the HTTP header. I'll still wait for other amendments 
before publishing the next version of the document.

Best Regards,

On 8 Nov 2007 at 10:59, Dave Vieglais wrote:

> Hi Everyone,
> I've come across a minor issue with some existing TAPIR installations
> that should be easily fixed and will likely save some frustrations
> down the road.
> The TAPIR spec (http://www.tdwg.org/dav/subgroups/tapir/1.0/docs/TAPIRSpecification_2007-07-18.html#toc16)
> indicates a response Content-type of "text/xml".  RFC 3023
> (http://www.ietf.org/rfc/rfc3023.txt) indicates that in this case,
> when no "charset" parameter is specified in the HTTP response header,
> the implied character encoding of the response document is "us-ascii"
> (see section 8.5).
> So, for example:
> Good:
>   response header =  Content-type: text/xml; charset="utf-8"
>   response document signature =  <?xml version="1.0" encoding="utf-8"?>
> result = document is assumed to be UTF-8
> Not so good:
>   response header =  Content-type: text/xml
>   response document signature =  <?xml version="1.0" encoding="utf-8"?>
> result = document is assumed to be us-ascii
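
To make the rule above concrete, here is a minimal Python sketch of how a
consumer might work out the charset implied by the HTTP header alone. The
function name implied_charset is hypothetical, and returning None for
anything other than text/xml is a simplification: for application/xml,
RFC 3023 defers to the XML declaration or byte order mark in the document
itself.

```python
from email.message import Message


def implied_charset(content_type):
    """Return the charset implied by a Content-Type header value.

    Hypothetical helper for illustration only. Returns None when the
    header carries no charset information and the consumer should fall
    back to the encoding declared inside the XML document.
    """
    msg = Message()
    msg["Content-Type"] = content_type

    # An explicit charset parameter always wins.
    charset = msg.get_param("charset")
    if isinstance(charset, str) and charset:
        return charset.lower()

    # text/xml without a charset parameter is treated as us-ascii,
    # whatever the XML declaration says.
    if msg.get_content_type() == "text/xml":
        return "us-ascii"

    # Other types (e.g. application/xml): nothing implied by the header.
    return None


print(implied_charset('text/xml; charset="utf-8"'))
print(implied_charset("text/xml"))
print(implied_charset("application/xml"))
```

The first call corresponds to the "Good" case and the second to the
"Not so good" case.
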
> None of the TAPIR installations that I've examined so far sets a
> charset value, so a conforming consumer application will assume
> "us-ascii", which is likely to cause problems as soon as a response
> contains non-ASCII characters.  This was also a significant issue for
> DiGIR provider installations.
> The solution is likely to be quite simple, and there seem to be two
> basic options:
> 1. Configure the webserver / application to insert a charset value of
> "UTF-8" to avoid the consumer falling back to the default of us-ascii.
> or
> 2. Return a Content-type of "application/xml" or one of its subtypes.
> In this case RFC 3023 indicates the default character encoding should
> be assumed to be UTF-8.
> Note that simply specifying the content type does not automatically
> make the response properly encoded - it is still up to the web
> application (TAPIR in this case) to ensure that the output stream is
> actually UTF-8 encoded.
> regards,
>   Dave V.
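
On the provider side, option 1 and the closing note (declare the charset,
and make sure the bytes really are UTF-8) can be sketched in a few lines
of Python. This is not how TapirLink or any other TAPIR provider is
implemented; build_xml_response is a hypothetical helper, shown only to
illustrate the idea.

```python
def build_xml_response(document):
    """Encode an XML document as UTF-8 and build matching HTTP headers.

    Hypothetical helper for illustration only. `document` is a text
    string whose XML declaration already says encoding="utf-8".
    """
    # Encode explicitly, so the bytes on the wire match the declared
    # charset instead of depending on a platform default.
    body = document.encode("utf-8")

    headers = [
        # State the charset in the header, so that consumers do not
        # fall back to us-ascii for text/xml.
        ("Content-Type", "text/xml; charset=utf-8"),
        # Length is counted in bytes, not characters.
        ("Content-Length", str(len(body))),
    ]
    return headers, body


headers, body = build_xml_response(
    '<?xml version="1.0" encoding="utf-8"?><name>Araçá</name>'
)
```

A consumer that decoded this body as us-ascii would fail on the first
non-ASCII byte, which is why the header and the actual encoding need to
agree.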
