No to XHTML

Introduction

Quite a few people have jumped on the XHTML bandwagon. This article examines the claims made about XHTML and lists the problems that arise from using it.

Claims made about XHTML

  • XHTML is clean markup; HTML uses deprecated code like <font> tags

    An amazing number of people associate XHTML with clean, non-presentational markup, and HTML with presentational markup that uses deprecated code such as <font> tags. Sadly, this includes quite a few authors of books on (X)HTML and/or CSS. In reality, XHTML 1.0 under a Transitional doctype allows deprecated presentational code like <font> tags, and there is nothing to stop authors from writing HTML without it (see the first sketch after this list).

  • It's stricter.

    There's nothing to stop an author from applying the same "strictness" to HTML. This is purely down to the author; it doesn't need a stricter DTD. You prefer having closing paragraph tags? What's stopping you from using them? They are valid under HTML. For those who like validation to warn them if they, for example, forget to close a paragraph element, it's not difficult to validate against a custom HTML DTD that is as "strict" as the XHTML DTD. A short guide to validating against a custom DTD.

  • W3C recommends it.

    They don't. The rules for both HTML and XHTML have been laid out in W3C Recommendations, but the W3C does not recommend one over the other.

  • Lots of other sites use it, so it must be good.

    Lots of lemmings are jumping off cliffs; do you want to be a lemming?

  • I serve XHTML because I use XML authoring tools.

    If your authoring process benefits from using XML then by all means use it. It does not form an argument for serving XHTML to clients. Anyone using XML tools knows how easy it is to generate HTML from X(HT)ML.

  • It promotes web standards.

    HTML 4.01 Strict is also a web standard. And for those with a craving for displaying those silly "Valid XHTML" buttons on their pages: users shouldn't be bothered with useless information about the site's mechanics on the main content pages.

  • It's the future, I want to prepare for it now.

    Forgive me for being sceptical about people who claim that they can see into the future (unless you've managed to steal the All Seeing Eye from your Deity). XHTML may well turn out to be a dead duck, with Web Applications 1.0 (a.k.a. HTML 5), as developed by the WHATWG, as an alternative way forward.

    Current UAs are all native tag soup slurpers that at best have an XML parser tacked onto them. Given the enormous amount of text/html content on the web, it's inconceivable that a client meant to handle web content could consist of an XHTML parser only, and this will remain the case for the foreseeable future.

    Richard Cornford wrote a very informative post on the DOM & future proofing of XHTML served as text/html that shatters any illusions on this issue.

  • XHTML will clean up the web, since it has to be well formed to be successfully parsed by the client.
    • This only applies if XHTML is served as application/xhtml+xml (read on to find out about the problems caused by serving XHTML as application/xhtml+xml).
    • The real problem with the content on the web today is that it's not properly structured and semantically marked up. Validity is far less important, and well-formedness is merely a technical requirement that stems from the way X(HT)ML is parsed; it has no relevance or benefit otherwise (see the second sketch after this list).
  • It's better for resource-strapped devices such as cellphones

    This might apply if XHTML is served as application/xhtml+xml, but such devices would only benefit if they could be equipped with an XML-only parser, which in turn would exclude them from accessing HTML. Given that Opera has proven that it's possible to run a full-blown HTML client on current-day mobile devices, how likely do you think it is that we'll see mobile devices with an XML-only parser? Would you buy one if you could also get one with a full-blown HTML client that can access far more content? With advances in mobile hardware capabilities this scenario will only become less likely.

  • XHTML allows mixed namespace content

    This only applies if the XHTML is served as application/xhtml+xml, as in this XHTML and SVG demo. The same result can be achieved with HTML by embedding the content, as in this HTML and SVG demo (the last sketch after this list shows both methods). I'm not aware of any significant advantage that the mixed namespace method has over the embedding method.
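
To put the first two claims to bed with an example (a made-up sketch, not taken from any particular site): the first fragment below validates as XHTML 1.0 Transitional despite its deprecated presentational markup, while the second is clean, fully closed HTML 4.01 Strict.

    <!-- Validates as XHTML 1.0 Transitional: deprecated
         presentational markup is perfectly legal here -->
    <p align="center"><font color="red" size="4">Welcome!</font></p>

    <!-- Valid HTML 4.01 Strict: no presentational markup, and
         nothing stops you from closing every element -->
    <p class="welcome">Welcome!</p>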
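
As for well-formedness, the second sketch: it is a purely mechanical requirement, orthogonal to good markup.

    <!-- Fine HTML: the parser infers the closing tags; as XML
         this is a fatal well-formedness error -->
    <p>First paragraph
    <p>Second paragraph

    <!-- Perfectly well formed, yet structurally and semantically
         worthless -->
    <div><div>A heading?</div><div>Some text.</div></div>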
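
And the embedding sketch: the two methods from the demos linked above boil down to something like this (the file name and dimensions are made up for the example).

    <!-- Mixed namespaces: inline SVG in XHTML served as
         application/xhtml+xml -->
    <svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">
      <circle cx="50" cy="50" r="40"/>
    </svg>

    <!-- The embedding method in plain HTML 4.01 -->
    <object data="drawing.svg" type="image/svg+xml"
        width="100" height="100"></object>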

The problems with XHTML

XHTML 1.0 Transitional

Authoring or converting HTML documents to XHTML 1.0 Transitional while claiming that it's "the latest standard" has got to be one of the great idiocies on the web today. Properly structured and semantically marked up documents without presentational markup are what it's all about, and authoring to a Strict standard should be the first step towards achieving that. If you want to raise a flag for ignorance then there's nothing better than the use of XHTML 1.0 Transitional, naturally with those sexy "Valid XHTML" buttons proudly displayed on all your pages.

Serving XHTML 1.0 Strict as text/html

This is at best pointless; UAs will treat it as tag soup. Add the XML declaration at the top of your documents, as the W3C recommends, and you'll kick IE and some versions of Opera into quirks mode; omit the XML declaration, and the document can only use the default character encodings UTF-8 or UTF-16.
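
For the record, the declaration in question is the single line below, shown above a typical doctype (a sketch; the encoding is chosen purely for illustration):

    <?xml version="1.0" encoding="iso-8859-1"?>
    <!-- Anything before the doctype, including the XML declaration
         above, trips the UAs mentioned into quirks mode -->
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">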

Serving XHTML 1.1 as text/html

This violates W3C guidelines: the exemption that XHTML following the Appendix C guidelines may be served as text/html only applies to XHTML 1.0.
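
For comparison, the two doctypes; only documents using the first fall under the Appendix C exemption:

    <!-- XHTML 1.0 Strict: may be served as text/html -->
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

    <!-- XHTML 1.1: application/xhtml+xml only -->
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
        "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">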

Serving XHTML as application/xhtml+xml

A number of problems result from this:

  • IE doesn't support XHTML served as such; it will prompt the user on what to do with it. [Demo]
  • It's not supported as well as HTML even on UAs that are in principle capable of handling XHTML; for example, entity references are not supported by various UAs that are otherwise capable of rendering XHTML as XHTML.
  • JavaScript's document.write can't be used; it would need to be replaced by convoluted DOM methods (see the sketch after this list).
  • Gecko based UAs (the Mozilla family of browsers) are currently not capable of incrementally rendering XHTML served as such. This means that a Gecko based UA will not render anything until the document has completely downloaded. At best this delays document rendering; if, for example due to a network problem, a document doesn't completely download, it prevents document rendering full stop. It's amazing how people can argue against the use of tables for layout on the grounds that it typically causes reflows as the content downloads, yet happily serve true XHTML to Gecko, causing an even more serious problem. Note that Mozilla themselves recommend not using XHTML and serve HTML 4.01 instead.
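
To illustrate the document.write point from the list above, a minimal sketch (the paragraph text is made up):

    <!-- Works under text/html, but not under
         application/xhtml+xml -->
    <script type="text/javascript">
      document.write("<p>Hello, world<\/p>");
    </script>

    <!-- The replacement: DOM methods, with the namespace-aware
         createElementNS so the element ends up as an XHTML p -->
    <script type="text/javascript">
      var p = document.createElementNS(
          "http://www.w3.org/1999/xhtml", "p");
      p.appendChild(document.createTextNode("Hello, world"));
      document.getElementsByTagName("body")[0].appendChild(p);
    </script>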

Content negotiation

Some have resorted to server-client content negotiation: by parsing the UA's HTTP Accept header, they serve XHTML with the proper application/xhtml+xml content type to UAs that claim an ability to handle it; clients that don't are served XHTML with text/html as the content type, or the server generates HTML code on the fly and serves that instead.
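
For illustration, heavily trimmed-down Accept headers of the kind involved (illustrative, not exact strings): a Gecko based browser explicitly lists application/xhtml+xml, while IE simply claims everything:

    Accept: application/xhtml+xml,text/html;q=0.9,*/*;q=0.1   (Gecko)
    Accept: */*                                               (IE)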

This approach suffers from a number of problems:

  • UA Accept headers are unreliable, the most notorious being IE's: it claims to handle everything by including */* in its Accept header. That can be worked around, but even well implemented content negotiation cannot guarantee that you won't trip one or more UAs up; you can't test the HTTP Accept headers of all existing UAs.
  • Reflecting a diminished capacity to render XHTML served as such, some versions of Opera 7 indicate in their HTTP Accept header that they prefer text/html; properly implemented content negotiation will respect that by serving them text/html.
  • Generating XHTML or HTML on the fly depending on the client's Accept header causes server overhead and raises caching issues.
  • In another very informative post, Richard Cornford explains the scripting issues with content negotiation.

Gecko based browsers are the only clients with a reasonable market share that you could serve true XHTML to, and as previously mentioned this causes problems for users because Gecko currently does not render XHTML incrementally.

Conclusion

All these problems, and for what? So that you can claim to use "the latest standard"? Say it with me: oh please ... Just say no to XHTML.