Jero.net



HTML vs. XHTML

Most of the web authors who code their websites by hand have switched to XHTML over the last few years. Most of them switched because it was newer than HTML and because it was an XML application (which was cool, because XML is new as well). They were right about that of course, but is new always the best option?

XHTML was designed as an XML application. We all know what XML is. If not, there's always Google. HTML, however, is an SGML application. XML is based on SGML, but there are some major differences between the two, and those differences of course carry over to the two markup languages, XHTML and HTML. Let's have a look at them:

That means no more tag soup and the ability to use cool namespaces like SVG. Oh wait, I think I forgot something… Of course: Internet Explorer (abbreviated as IE). The very browser that was the first to support CSS1 and bits of CSS2 and that brought us AJAX, but that is now the most hated browser in web design land.

After several years, IE still doesn't support XHTML properly. By proper XHTML I mean XHTML served with the correct MIME type. A MIME type is a string that tells the browser what kind of file it is receiving. HTML has the MIME type text/html. XHTML has the MIME type application/xhtml+xml. Because IE doesn't support the MIME type application/xhtml+xml, you can't use real XHTML in IE. The "solution" most web authors find is to just send their XHTML document as text/html. However, when a browser reads that file, it thinks it's an HTML document and not an XHTML document. So you cannot use XHTML with the proper MIME type unless you're willing to screw over all IE users.

Note: You might think that the DOCTYPE would let the browser know you're using XHTML, but that's not the case. The DOCTYPE has no effect on that at all. As far as choosing a parser goes, the DOCTYPE doesn't serve any purpose besides acting cool on the first line of the document (browsers do use it to pick between quirks and standards rendering mode, but that's a different story). So the MIME type tells the browser which markup language you're using, not the DOCTYPE.
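To make that concrete, here's a rough sketch of the decision a browser makes. It's a simplified model and not how any real browser is implemented, and the function name parserForContentType is made up for this example. The point is that only the Content-Type header goes in; the DOCTYPE never gets a vote.

```javascript
// Hypothetical helper: a simplified model of how a browser picks a parser.
// Only the Content-Type header value is consulted, never the DOCTYPE.
function parserForContentType(contentType) {
  // Drop parameters such as "; charset=utf-8" and normalise the case.
  const mime = contentType.split(";")[0].trim().toLowerCase();

  if (mime === "text/html") {
    return "html"; // tag soup parser
  }
  if (
    mime === "application/xhtml+xml" ||
    mime === "application/xml" ||
    mime === "text/xml"
  ) {
    return "xml"; // strict XML parser
  }
  return "other"; // not a markup document as far as this sketch cares
}

console.log(parserForContentType("text/html; charset=utf-8"));
console.log(parserForContentType("application/xhtml+xml"));
```

Feed it an XHTML document labelled text/html and it still answers "html", which is exactly the problem.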

Some of you might wonder what the problem is with sending XHTML as text/html. The problem is very simple: it's not XHTML anymore. You're sending XHTML as text/html, so the browser thinks it's an HTML document. This means that you can't use any of XHTML's advantages. No more strictness that prevents a web author from writing tag soup and no more support for namespaces. But besides that, it's also invalid HTML. None of the XHTML DOCTYPEs is allowed by the HTML 4.01 specification. The same obviously applies to both the xmlns and the xml:lang attributes.

Also, imagine this piece of code: <img src="./pic" alt="…"/>. This is of course well-formed XHTML, but in HTML, we've got a problem. As I stated before, HTML is an SGML application. SGML is a complex markup language with many features. One of them is a feature called "SHORTTAG". This feature allows the character / to be used to close a start tag and, later on, to end the element, like this: <strong/some text/. So if we look back at the XHTML example, it should be interpreted as <img src="./pic" alt="…">> by conforming SGML parsers, which is how the document should be read if it is sent as text/html. Luckily for most web authors, browsers don't parse HTML as SGML, because their parsers have been bent to work with tag soup instead, which leaves us with no visible problem. Unfortunately for web authors who actually care about the HTML standard, a well-formed XHTML document served as text/html means something different than you intended.
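If you have trouble picturing that, here's a toy sketch that rewrites markup the way I described above. It is nowhere near a real SGML parser: the function name sgmlNetReading is invented for this example, it only handles the start tag side of the shorthand (the closing / after the content is left alone), and it ignores comments, unquoted attribute values and everything else SGML has to offer.

```javascript
// Toy illustration only: shows how a slash inside a start tag would be read
// under SGML's SHORTTAG rules. Not a real SGML parser.
function sgmlNetReading(markup) {
  let out = "";
  let inTag = false; // are we between "<" and the end of the tag?
  let quote = null;  // the quote character of the attribute value we're in

  for (const ch of markup) {
    if (!inTag) {
      if (ch === "<") inTag = true;
      out += ch;
    } else if (quote !== null) {
      // Inside an attribute value: a slash here is just data.
      if (ch === quote) quote = null;
      out += ch;
    } else if (ch === '"' || ch === "'") {
      quote = ch;
      out += ch;
    } else if (ch === "/" && out[out.length - 1] !== "<") {
      // The slash closes the start tag right here.
      // (A slash directly after "<" is an ordinary end tag, so skip those.)
      out += ">";
      inTag = false;
    } else if (ch === ">") {
      out += ch;
      inTag = false;
    } else {
      out += ch;
    }
  }
  return out;
}

console.log(sgmlNetReading('<img src="./pic" alt="x"/>'));
// prints: <img src="./pic" alt="x">>
```

The extra > at the end is plain text as far as SGML is concerned, so a conforming parser would show it on the page.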

As you can see, there are quite a few reasons not to serve XHTML as text/html. Let's summarize them:

So I made the decision to use HTML instead of XHTML and I don't think that's bound to change soon. There are just no reasons for me to use XHTML, besides that it will likely be what we'll be using in the future. Unfortunately, the future is not now, because we're still stuck with browsers specialized in parsing tag soup and web authors who have no idea what they're doing. Maybe a really interesting XML namespace will persuade me to change to XHTML, but I haven't seen one yet.

Right now, I'm sure some of you are waiting for the line where I say that content negotiation is the solution to all problems. Unfortunately, it's not that easy. As I previously stated, there are quite a few differences between HTML and XHTML, so checking the Accept header for application/xhtml+xml and then changing the DOCTYPE and the start tag (adding the xmlns and xml:lang attributes) is clearly not enough.
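For reference, the Accept header check itself could look something like the sketch below. The function names qualityOf and chooseContentType are my own, and I'm assuming you only want to send application/xhtml+xml when the browser lists it explicitly with a q value at least as high as text/html. Wildcards like */* are ignored on purpose, because a browser that accepts "anything" hasn't promised it can parse XHTML.

```javascript
// Hypothetical helper: the q value a browser gives to one exact MIME type.
// Returns 0 when the type is not listed explicitly (wildcards are ignored).
function qualityOf(acceptHeader, mimeType) {
  for (const part of acceptHeader.split(",")) {
    const [range, ...params] = part.split(";").map((s) => s.trim());
    if (range.toLowerCase() !== mimeType) continue;

    let q = 1; // a missing q parameter means q=1
    for (const param of params) {
      const match = /^q=([0-9.]+)$/i.exec(param);
      if (match) q = parseFloat(match[1]);
    }
    return q;
  }
  return 0;
}

// Hypothetical helper: pick the MIME type to serve for a given Accept header.
function chooseContentType(acceptHeader) {
  const header = acceptHeader || "";
  const xhtml = qualityOf(header, "application/xhtml+xml");
  const html = qualityOf(header, "text/html");
  return xhtml > 0 && xhtml >= html ? "application/xhtml+xml" : "text/html";
}

console.log(chooseContentType("text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"));
console.log(chooseContentType("image/gif, image/jpeg, */*"));
```

The first header is the kind a browser with XHTML support sends, the second is the kind you get from a browser that only advertises a few image types and a wildcard. And as I said, this is the easy part.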

First of all, you need to make sure your scripts are compatible with both application/xhtml+xml and text/html. In XHTML, the document.documentElement.nodeName property returns html, while in HTML it returns HTML. document.write() also doesn't work in XHTML.
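A minimal sketch of what "compatible with both" means for those two cases: compare node names without caring about case, and build content with DOM methods instead of document.write(). The helper names rootIsHtmlElement and appendParagraph are made up for this example, and they take the document as a parameter so you can try them outside a browser as well.

```javascript
// Hypothetical helper: works whether nodeName is "HTML" (text/html)
// or "html" (application/xhtml+xml).
function rootIsHtmlElement(doc) {
  return doc.documentElement.nodeName.toLowerCase() === "html";
}

// Hypothetical helper: adds a paragraph without document.write(),
// using only DOM methods that exist in both kinds of document.
function appendParagraph(doc, parent, text) {
  const p = doc.createElement("p");
  p.appendChild(doc.createTextNode(text));
  parent.appendChild(p);
  return p;
}

// In a browser you would call these with the real document, for example:
//   if (rootIsHtmlElement(document)) {
//     appendParagraph(document, document.body, "Hello");
//   }
```

This only covers the two differences mentioned above, so don't treat it as a complete checklist.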

When you're done fixing your scripts, you're definitely not done yet. By looking at the XHTML 1.0 DTD you'll see that the content of both the script and style elements is declared as #PCDATA instead of CDATA as we were used to in HTML. That means characters like < and & inside them are parsed as markup. Crap indeed. In order to make sure your CSS and scripts work correctly in XHTML, you have to put the content of these elements in CDATA marked sections like this:

<script type="application/javascript">
<![CDATA[
 …script…
]]>
</script>

Of course, you have to make sure these CDATA marked sections are not there when the document is text/html because that would result in invalid HTML.
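In a content negotiation setup that means the wrapper has to depend on the MIME type you ended up choosing. Here's a small sketch of that idea. The function name inlineScript is invented, it assumes it runs on the server while the page is being generated, and it simply refuses script code containing ]]>, because that sequence would end the marked section early.

```javascript
// Hypothetical helper: builds an inline script element for the MIME type
// the page is being served as. Only the XHTML version gets a CDATA section.
function inlineScript(code, contentType) {
  const open = '<script type="application/javascript">';
  const close = "</script>";

  if (contentType === "application/xhtml+xml") {
    if (code.includes("]]>")) {
      // "]]>" would terminate the marked section in the middle of the script.
      throw new Error('script code must not contain "]]>"');
    }
    return `${open}\n<![CDATA[\n${code}\n]]>\n${close}`;
  }

  // text/html: script content is already CDATA, so no marked section.
  return `${open}\n${code}\n${close}`;
}

console.log(inlineScript("var ok = 1 < 2;", "application/xhtml+xml"));
console.log(inlineScript("var ok = 1 < 2;", "text/html"));
```

You'd need the same treatment for style elements, which this sketch leaves out.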

Congrats! You've now successfully made sure your site works as both application/xhtml+xml and text/html! But why!? Why do you want it so badly!? You've still got no advantage compared to text/html, because you still can't use any of XML's powers. And that, only that, is the difference between HTML 4.01 and XHTML 1.0. You're not helping the user with this, only your own satisfaction. And if you made an error in your content negotiation script, you're even at risk of screwing over some of your potential customers.

Although Appendix C of the XHTML 1.0 Specification tells us we're allowed to use text/html as the MIME type to serve XHTML, because XHTML is supposed to be backwards compatible with HTML, I still hold to my conclusion: XHTML should not be sent as text/html. Doing so makes it invalid HTML, because HTML 4.01 is simply not forward compatible with XHTML.