[tdwg-tapir] Response content type specification

Dave Vieglais vieglais at ku.edu
Thu Nov 8 22:42:26 CET 2007


Hi Kevin,
you're good:

---
python responsetests.py -v 20 -p "http://lsid.herbimi.info/TapirDotNET/tapir.aspx/herbIMI?op=inventory&count=true&concept=http://rs.tdwg.org/ontology/voc/TaxonOccurrence#/rdf:RDF/to:TaxonOccurrence/to:collector"

Testing:
http://lsid.herbimi.info/TapirDotNET/tapir.aspx/herbIMI?op=inventory&count=true&concept=http://rs.tdwg.org/ontology/voc/TaxonOccurrence#/rdf:RDF/to:TaxonOccurrence/to:collector
INFO:root:[HTTP HEADER] content-length: 12019
INFO:root:[HTTP HEADER] x-powered-by: ASP.NET
INFO:root:[HTTP HEADER] set-cookie: ASP.NET_SessionId=uxfg0l55colhnjmxzinwxp55; path=/; HttpOnly
INFO:root:[HTTP HEADER] x-aspnet-version: 2.0.50727
INFO:root:[HTTP HEADER] server: Microsoft-IIS/6.0
INFO:root:[HTTP HEADER] connection: close
INFO:root:[HTTP HEADER] pragma: no-cache:
INFO:root:[HTTP HEADER] cache-control: private
INFO:root:[HTTP HEADER] date: Thu, 08 Nov 2007 21:29:37 GMT
INFO:root:[HTTP HEADER] content-type: text/xml; charset="utf-8"
== Results ==
Test: "HTTP Status" [OK]: No Worries
Test: "Response Encoding" [OK]: No Worries
Test: "Document Encoding" [OK]: No Worries

---
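To check a header like the one above, the "Response Encoding" test needs the charset. It travels as a parameter of the Content-type header and may legally be quoted, as it is here. Below is a minimal sketch, assuming Python 3 and letting the standard library's email.message.Message do the MIME parameter parsing. parse_content_type is a hypothetical helper, not code from responsetests.py:

```python
from email.message import Message


def parse_content_type(header_value):
    """Split a Content-Type header into (media type, charset or None).

    Hypothetical helper for illustration; not part of responsetests.py.
    """
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_type() returns the lowercased "type/subtype";
    # get_content_charset() unquotes and lowercases the charset
    # parameter, or returns None when there is no charset parameter.
    return msg.get_content_type(), msg.get_content_charset()


print(parse_content_type('text/xml; charset="utf-8"'))  # ('text/xml', 'utf-8')
print(parse_content_type("text/xml"))                   # ('text/xml', None)
```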

The GBIF rest service doesn't do quite so well though:

---
python responsetests.py -v 20 -p "http://newportal.gbif.org/ws/rest/provider/list?stylesheet=&maxresults=10"

Testing:
http://newportal.gbif.org/ws/rest/provider/list?stylesheet=&maxresults=10
INFO:root:[HTTP HEADER] date: Thu, 08 Nov 2007 21:37:01 GMT
INFO:root:[HTTP HEADER] transfer-encoding: chunked
INFO:root:[HTTP HEADER] connection: close
INFO:root:[HTTP HEADER] content-type: text/xml
INFO:root:[HTTP HEADER] server: Apache/2.0.52 (Red Hat)
== Results ==
Test: "HTTP Status" [OK]: No Worries
Test: "Response Encoding" [WARNING]: No character encoding was specified for the text/[*+]xml content type | "Content-type: text/xml"
Test: "Response Encoding" [WARNING]: Falling back to the RFC 3023 default of us-ascii character encoding.
Test: "Document Encoding" [ERROR]: Bozo exception. | Document declared as us-ascii, but parsed as utf-8

---
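The "Document Encoding" error above comes down to a mismatch. The header implies us-ascii, but the body contains bytes that only make sense as UTF-8. Here is a rough sketch of such a check, assuming the raw response body is available as bytes. check_document_encoding and its messages are hypothetical and only loosely mirror the tool's output:

```python
def check_document_encoding(body, declared):
    """Compare raw body bytes against the encoding the header implies.

    Hypothetical helper, loosely modelled on the "Document Encoding" test.
    Returns a (status, message) tuple.
    """
    try:
        body.decode(declared)
        return "OK", "No Worries"
    except UnicodeDecodeError:
        pass
    # The declared encoding failed; see whether the bytes are really UTF-8.
    try:
        body.decode("utf-8")
    except UnicodeDecodeError:
        return "ERROR", f"Document could not be decoded as {declared} or utf-8"
    return "ERROR", f"Document declared as {declared}, but parsed as utf-8"


body = '<?xml version="1.0" encoding="utf-8"?><collector>M\u00fcller</collector>'.encode("utf-8")
print(check_document_encoding(body, "us-ascii"))
# ('ERROR', 'Document declared as us-ascii, but parsed as utf-8')
```

A body that happens to be pure ASCII passes as us-ascii even if the server meant UTF-8. A missing charset can therefore go unnoticed until a record containing an accented name is returned.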

These tests (plus a whole bunch more) will be available as an online service
real soon, with the intent of helping data providers test their services
fairly rigorously before exposing them to the real world.

regards,
 Dave V.


On Nov 8, 2007 1:46 PM, Kevin Richards <RichardsK at landcareresearch.co.nz> wrote:
>
>
> Good spotting Dave.
>
> This is fixed now in the TapirDotNET implementation of Tapir.
> I have updated the HerbIMI TapirDotNET implementation at
> http://lsid.herbimi.info/TapirDotNET/tapir.aspx/herbIMI
> So if you would like to check this provider, I would be interested in your
> results.
>
> Kevin
>
> >>> "Dave Vieglais" <vieglais at ku.edu> 9/11/2007 7:59 a.m. >>>
>
>
> Hi Everyone,
> I've come across a minor issue with some existing TAPIR installations
> that should be easily fixed and will likely save some frustrations
> down the road.
>
> The TAPIR spec
> (http://www.tdwg.org/dav/subgroups/tapir/1.0/docs/TAPIRSpecification_2007-07-18.html#toc16)
> indicates a response Content-type of "text/xml".  RFC 3023
> (http://www.ietf.org/rfc/rfc3023.txt) indicates that in this case,
> when no "charset" parameter is specified in the HTTP response header,
> the implied character encoding of the response document is "us-ascii"
> (see section 8.5).
>
> So, for example:
>
> Good:
>   response header =  Content-type: text/xml; charset="utf-8"
>
>   response document signature =  <?xml version="1.0" encoding="utf-8"?>
>
> result = document is assumed to be UTF-8
>
> Not so good:
>   response header =  Content-type: text/xml
>
>   response document signature =  <?xml version="1.0" encoding="utf-8"?>
>
> result = document is assumed to be us-ascii
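The Good and Not so good cases above can be written down as a small decision function. This is a sketch of one reading of RFC 3023: an explicit charset wins, text/xml without one defaults to us-ascii, and application/xml falls back to the XML declaration and then to UTF-8. assumed_encoding is a hypothetical name. Byte order marks and UTF-16 are ignored for brevity:

```python
import re
from email.message import Message

# Encoding pseudo-attribute of an XML declaration at the very start of the body.
XML_DECL_ENCODING = re.compile(
    rb"""^<\?xml[^>]*?encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def assumed_encoding(content_type, body):
    """Encoding a consumer should assume (hypothetical helper, no BOM handling)."""
    msg = Message()
    msg["Content-Type"] = content_type
    media_type = msg.get_content_type()
    charset = msg.get_content_charset()
    if charset:
        # An explicit charset parameter always wins.
        return charset
    if media_type == "text/xml" or (
        media_type.startswith("text/") and media_type.endswith("+xml")
    ):
        # text/xml and text/*+xml without charset: us-ascii, whatever the
        # XML declaration inside the document says.
        return "us-ascii"
    # application/xml and related types: use the XML declaration, else UTF-8.
    match = XML_DECL_ENCODING.match(body)
    return match.group(1).decode("ascii").lower() if match else "utf-8"


doc = b'<?xml version="1.0" encoding="utf-8"?><response/>'
print(assumed_encoding('text/xml; charset="utf-8"', doc))  # utf-8
print(assumed_encoding("text/xml", doc))                   # us-ascii
print(assumed_encoding("application/xml", doc))            # utf-8
```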
>
>
> None of the TAPIR installations I've examined so far sets a charset
> value, so consumer applications will assume a character encoding of
> "us-ascii", which is likely to cause problems for them.  This was also a
> significant issue for DiGIR provider installations.
>
> The solution is likely to be quite simple, and there seem to be two
> basic options:
>
> 1. Configure the webserver / application to insert a charset value of
> "UTF-8" to avoid the consumer falling back to the default of us-ascii.
>
> or
>
> 2. Return a Content-type of "application/xml" (or one of the related
> application/*+xml types).  In this case RFC 3023 defers to the
> document's own encoding declaration or byte order mark, and UTF-8 is
> assumed when neither is present.
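Option 1 can be illustrated with a minimal Python WSGI application (PEP 3333), chosen because it is easy to run standalone. This sketch does not show how TapirDotNET or any other TAPIR implementation is actually wired up. tapir_app and its response body are hypothetical. The key point is that the header names the charset and the body is encoded with that same charset:

```python
def tapir_app(environ, start_response):
    """Hypothetical WSGI app that labels and encodes its XML as UTF-8."""
    xml_text = (
        '<?xml version="1.0" encoding="utf-8"?>\n'
        "<response><collector>M\u00fcller</collector></response>\n"
    )
    body = xml_text.encode("utf-8")  # the bytes really are UTF-8...
    headers = [
        # ...and, per option 1, the header says so explicitly.
        ("Content-Type", "text/xml; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]
```

To try it locally, wsgiref.simple_server.make_server("", 8000, tapir_app).serve_forever() serves it on port 8000 (an arbitrary choice). Changing the header to "application/xml" would correspond to option 2.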
>
> Note that simply specifying the content type does not automatically
> make the response properly encoded - it is still up to the web
> application (TAPIR in this case) to ensure that the output stream is
> actually UTF-8 encoded.
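One way to keep the declared and actual encodings in step is to let the XML serializer do the encoding instead of assembling strings by hand. The sketch below uses Python's xml.etree.ElementTree; the xml_declaration argument needs Python 3.8 or later. build_response and the element names are hypothetical and are not taken from the TAPIR schema:

```python
import xml.etree.ElementTree as ET


def build_response(collectors):
    """Serialize a toy (hypothetical) response document as UTF-8 bytes."""
    root = ET.Element("response")
    for name in collectors:
        ET.SubElement(root, "collector").text = name
    # The serializer writes the XML declaration and encodes the text itself,
    # so the declared encoding and the actual bytes cannot drift apart.
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


payload = build_response(["M\u00fcller", "\u0141ukasz"])
# Decoding raises UnicodeDecodeError if the bytes are not valid UTF-8.
text = payload.decode("utf-8")
```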
>
> regards,
>   Dave V.
> _______________________________________________
> tdwg-tapir mailing list
> tdwg-tapir at lists.tdwg.org
> http://lists.tdwg.org/mailman/listinfo/tdwg-tapir
>