[tdwg-tapir] Response content type specification

Kevin Richards RichardsK at landcareresearch.co.nz
Thu Nov 8 22:46:25 CET 2007

Good spotting Dave.

This is fixed now in the TapirDotNET implementation of Tapir.
I have updated the HerbIMI TapirDotNET implementation at http://lsid.herbimi.info/TapirDotNET/tapir.aspx/herbIMI
So if you would like to check this provider, I would be interested in your results.


>>> "Dave Vieglais" <vieglais at ku.edu> 9/11/2007 7:59 a.m. >>>

Hi Everyone,
I've come across a minor issue with some existing TAPIR installations
that should be easily fixed and will likely save some frustrations
down the road.

The TAPIR spec (http://www.tdwg.org/dav/subgroups/tapir/1.0/docs/TAPIRSpecification_2007-07-18.html#toc16)
indicates a response Content-type of "text/xml".  RFC 3023
(http://www.ietf.org/rfc/rfc3023.txt) indicates that in this case,
when no "charset" parameter is specified in the HTTP response header,
the implied character encoding of the response document is "us-ascii"
(see s8.5).

So, for example:

  response header =  Content-type: text/xml; charset="utf-8"

  response document signature =  <?xml version="1.0" encoding="utf-8"?>

result = document is assumed to be UTF-8

Not so good:
  response header =  Content-type: text/xml

  response document signature =  <?xml version="1.0" encoding="utf-8"?>

result = document is assumed to be us-ascii
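
The two cases above can be sketched from the consumer's side. This is only
an illustration of the RFC 3023 defaults as I read them, not code from any
TAPIR client; the function name effective_xml_charset and its xml_declared
parameter are made up for the example.

```python
from email.message import Message


def effective_xml_charset(content_type, xml_declared=None):
    """Return the encoding a strict RFC 3023 consumer would assume.

    content_type -- value of the HTTP Content-Type response header
    xml_declared -- encoding named in the XML declaration, if any
                    (hypothetical parameter, for illustration only)
    Only text/xml and application/xml are considered here.
    """
    msg = Message()
    msg["Content-Type"] = content_type

    # An explicit charset parameter always wins.
    charset = msg.get_content_charset()
    if charset is not None:
        return charset

    # text/xml without charset: us-ascii, whatever the document says.
    if msg.get_content_type() == "text/xml":
        return "us-ascii"

    # application/xml without charset: fall back to the XML declaration,
    # and to UTF-8 when there is none.
    return (xml_declared or "utf-8").lower()


print(effective_xml_charset('text/xml; charset="utf-8"', "utf-8"))
print(effective_xml_charset("text/xml", "utf-8"))
```

The first call corresponds to the good case, the second to the "not so
good" one, where the encoding declared in the document is ignored.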

None of the TAPIR installations I've examined so far set a charset
value, so a consumer application that follows RFC 3023 will assume
"us-ascii", and any non-ASCII characters in the response are then
likely to cause problems.  This was also a significant issue for DiGIR
provider installations.

The solution is likely to be quite simple, and there seem to be two
basic options:

1. Configure the webserver / application to insert a charset value of
"UTF-8" to avoid the consumer falling back to the default of us-ascii.


2. Return a Content-type of "application/xml" or one of its subtypes.
In this case RFC 3023 indicates the default character encoding should
be assumed to be UTF-8.

Note that simply specifying the content type does not automatically
make the response properly encoded - it is still up to the web
application (TAPIR in this case) to ensure that the output stream is
actually UTF-8 encoded.
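
A minimal sketch of option 1 together with the note above, assuming a
Python WSGI provider purely for illustration (existing TAPIR
implementations use other platforms). The names TAPIR_DOC,
build_xml_response and app are hypothetical.

```python
# Hypothetical response document containing a non-ASCII character.
TAPIR_DOC = '<?xml version="1.0" encoding="utf-8"?>\n<name>Kützing</name>\n'


def build_xml_response(document):
    """Return (headers, body) with the header and the bytes in agreement."""
    # Encode the output stream as UTF-8 ourselves; the header alone
    # does not make the response properly encoded.
    body = document.encode("utf-8")
    headers = [
        # Explicit charset, so the consumer does not fall back to us-ascii.
        ("Content-Type", 'text/xml; charset="utf-8"'),
        # Length in bytes, not characters.
        ("Content-Length", str(len(body))),
    ]
    return headers, body


def app(environ, start_response):
    """Minimal WSGI application that serves the document."""
    headers, body = build_xml_response(TAPIR_DOC)
    start_response("200 OK", headers)
    return [body]
```

The point is that the charset in the header, the encoding named in the XML
declaration and the encoding actually applied to the bytes all agree.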

  Dave V.
tdwg-tapir mailing list
tdwg-tapir at lists.tdwg.org