Jero.net

HTML support

One of the arguments I use when I try to convince someone to use HTML instead of XHTML is the fact that IE doesn't support real XHTML. By XHTML I obviously mean XHTML served with the application/xhtml+xml media type. However, when you take a closer look at HTML, you'll see that no user agent supports real HTML either.

Let's go back in history, way back, to the year 1967. In that ancient time the concept of "markup languages" was born, at the hands of William W. Tunnicliffe. However, it was Charles Goldfarb who came up with the first widely used descriptive markup language: SGML.

As you probably know, HTML is in theory an SGML application. In fact, the relation between HTML and SGML is exactly the same as the relation between XHTML and XML: HTML is an SGML application just as XHTML is an XML application. However, in the real world the relation between SGML and HTML is hard to find. HTML has broken up with its mother and is now living by itself.

Of course, every piece of HTML code that validates according to the HTML Validator will also validate according to an SGML validator (link to an online SGML validator, anyone?). So nothing has really changed in HTML over the years; the W3C has of course taken good care of this. However, while things did not change over the years, things did get lost.

When you take a look at the SGML declaration of HTML 4, you'll see a document that probably even Harry Potter can't decipher. Apparently, some people and machines can. Unfortunately, I'm not among these people. However, I do know enough about SGML declarations to know that the part at the bottom (the FEATURES clause) is fairly interesting. As you can guess, this part tells the parser which SGML features are allowed in this SGML application.
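
For orientation, the clause in question looks roughly like the excerpt below. This is a transcription of the FEATURES clause from the SGML declaration in the HTML 4.01 specification; treat the whitespace and layout as approximate and check the specification itself for the authoritative text.

```sgml
FEATURES
  MINIMIZE
    DATATAG  NO
    OMITTAG  YES
    RANK     NO
    SHORTTAG YES
  LINK
    SIMPLE   NO
    IMPLICIT NO
    EXPLICIT NO
  OTHER
    CONCUR   NO
    SUBDOC   NO
    FORMAL   YES
```

The two entries that matter for the rest of this article are OMITTAG YES and SHORTTAG YES.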

You might have read about the OMITTAG and SHORTTAG features before. The HTML Validator mentions them in its error messages. However, there's still a big chance you've never heard or read about them before. No worries, because I'll give you an explanation of these features.

OMITTAG
The OMITTAG feature allows tag omission. As you might know, XML requires all elements to have start and end tags. In HTML this is different, because HTML has OMITTAG set to YES in its SGML declaration. This means you can leave out certain start and end tags for some elements. An example is the LI element: its end tag can be left out. So the code below is 100% valid HTML:
<ul>
 <li>Item 1
 <li>Item 2
 <li>Item 3
</ul>
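
To make the omission explicit, here is the same list with every end tag written out. This is only an illustration of what a conforming parser is supposed to infer from the DTD, not a claim about how any particular browser builds its tree:

```html
<ul>
 <li>Item 1</li>
 <li>Item 2</li>
 <li>Item 3</li>
</ul>
```

Each LI start tag implies the end of the previous LI, and the UL end tag implies the end of the last one.
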
SHORTTAG
SHORTTAG is even stranger than OMITTAG. It offers several ways to make your markup even shorter. The first is allowing the author to use empty tags. With this, we can make the previous example even smaller:
<ul>
 <li>Item 1
 <>Item 2
 <>Item 3
</>

Another feature we can use is leaving our tags unclosed. What this means is that you can drop the > character when multiple tags sit right next to each other. So let's use this to make our previous example somewhat smaller again:

<ul<li>Item 1
 <>Item 2
 <>Item 3
</>

Last but not least, we have a feature called "null end-tags". What this means is that instead of using a closing tag like </element> or </>, we can just use the forward slash ("/") instead. Unfortunately we cannot use this on our previous example. We already managed to reduce the number of characters by 32%, so I guess we should be happy with that result. However, let's use a different example to show how null end-tags work:

<p/This is some text/

This is exactly the same as a normal P element, but here we use the / to close the start tag instead of the > character. And of course the entire end tag is replaced by the / character.
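
Spelled out, the null end-tag example and its fully tagged equivalent look like this. The second pair is an extra illustration that is not part of the testcases; it assumes an inline EM element to show that the same shorthand works inside other content:

```html
<!-- null end-tag form -->
<p/This is some text/

<!-- equivalent markup with ordinary tags -->
<p>This is some text</p>

<!-- hypothetical inline example, same rule -->
<p>Some <em/important/ text</p>
<p>Some <em>important</em> text</p>
```

One consequence is that the text inside such an element cannot contain a / character, because the first one ends the element.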

So now that you understand these two features, let's see what popular user agents do with them. To find out, I created several documents (or testcases) which can be used to check how user agents handle them. I tested these testcases only on Windows with Firefox, Opera and Internet Explorer. Results for other browsers can be posted in the comments.

In the first testcase, which tests OMITTAG, you'll see that no browser has any trouble with OMITTAG. Not even IE! Surprised? I'm not. Just look at the IMG element. Ever closed it? Don't think so... So basically if you're coding HTML, you're always using this feature. Bet you didn't know that.

Anyway, when you look at the testcase for SHORTTAG, you'll see that we're not so fortunate. The lang attribute, which leaves out the quotation marks, is the only feature of SHORTTAG that is supported by all browsers. All others are giving the browsers a hard time. Firefox seems to be the only browser to render the TITLE element correctly, but by taking a look at the DOM Inspector, you'll see that that's because it thinks </</> is a comment (very odd).
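
For readers who have not seen the unquoted form, the sketch below shows the kind of markup meant here. It is a hypothetical fragment, not the actual testcase, and the attribute value en is an assumption chosen for illustration:

```html
<!-- attribute value without quotation marks -->
<p lang=en>This paragraph is in English.</p>

<!-- the same markup with the quotation marks written out -->
<p lang="en">This paragraph is in English.</p>
```

In HTML 4, unquoted values are only allowed when the value consists of letters, digits, hyphens, periods, underscores and colons.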

So as you can see, after almost 40 years, modern user agents are still incapable of handling SHORTTAG. I find this pretty sad, to be honest. If it were possible, I'd use it on some occasions, because I'm forced to use HTML anyway. But I do realise that SHORTTAG is only one of the many things that SGML (and thus also HTML) has to offer. With this article I'm not trying to say that we should use XHTML after all, contrary to what I said earlier. This article is just meant to show everyone that even HTML, the one markup language we can safely rely on because we "know" it's "widely supported", is not well supported at all. A great example of the current state of user agents this is…