One of the arguments I use when I try to convince someone that they should
use HTML instead of XHTML is the fact
that IE doesn't support real
XHTML. By XHTML I
obviously mean XHTML served with the application/xhtml+xml media
type. However, when you take a closer look at HTML, you'll see that no
user agent supports real HTML either.
Let's go back in history, way back. Back to the year 1967. In that ancient time the concept of "markup languages" was born, at the hands of William W. Tunnicliffe. However, it was Charles Goldfarb who came up with the first widely-used descriptive markup language: SGML.
As you probably know, HTML is in theory an SGML application. In fact, the relation between HTML and SGML is exactly the same as the relation between XHTML and XML. HTML is an SGML application as XHTML is an XML namespace. However, in the real world the relation between SGML and HTML is far from found. HTML has broken up with its mother and is now living by itself.
Of course, every piece of HTML code that validates according to the HTML Validator will also validate according to an SGML validator (link to an online SGML validator, anyone?). So nothing has really changed in HTML over the years. The W3C has of course taken good care of this. However, while things did not change over the years, things did get lost.
When you take a look at the SGML
declaration of HTML 4, you'll see a document that even Harry Potter probably
couldn't decipher. Apparently, some people and machines can. Unfortunately,
I'm not among them. However, I do know enough about SGML
declarations to know that the part at the bottom (the FEATURES
clause) is fairly interesting. As you can guess, this part tells the parser
which SGML features are allowed in this SGML application.
You might have read about the OMITTAG and SHORTTAG
features before. The HTML Validator mentions them in its error
messages. However, there's still a big chance you've never heard or read
about them before. No worries, because I'll give you an explanation of these
features.
OMITTAG
The OMITTAG feature allows tag omission. As you might know,
XML requires all elements to have start and end tags. However, in HTML this is
different. The reason is that HTML has OMITTAG set to
YES in its SGML declaration. This means you can leave out certain
start and end tags for some elements. An example is the LI element.
The end tag for this element can be left out. So the code below is 100% valid
HTML:
<ul> <li>Item 1 <li>Item 2 <li>Item 3 </ul>
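To make it concrete what the parser fills in for you, here is a small Python sketch that puts the omitted LI end tags back. It is only an illustration, not how a real SGML parser works: the function name add_li_end_tags is made up for this example, and it assumes a flat list with lowercase tags and no nesting.

```python
import re

# An open <li> ends where the next <li> or the list's </ul> begins.
# Group 3 keeps whatever whitespace separated the items.
LI_PATTERN = re.compile(r"(<li>)(.*?)(?:</li>)?(\s*)(?=<li>|</ul>)", re.DOTALL)


def add_li_end_tags(markup: str) -> str:
    """Insert the </li> end tags that OMITTAG lets an author leave out.

    Toy sketch: flat lists only, lowercase tags, no attributes.
    Items that already carry an end tag are left as they are.
    """
    return LI_PATTERN.sub(r"\1\2</li>\3", markup)


if __name__ == "__main__":
    print(add_li_end_tags("<ul> <li>Item 1 <li>Item 2 <li>Item 3 </ul>"))
```

Running it on the list above prints the fully tagged version, which is the document a conforming parser would build from the shorter markup anyway.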
SHORTTAG
SHORTTAG is even stranger than OMITTAG.
SHORTTAG allows multiple things to make your markup even shorter.
The first is allowing the author to use empty tags. With this, we can make the
previous example even smaller:
<ul> <li>Item 1 <>Item 2 <>Item 3 </>
Another feature we can use is keeping our tags unclosed. What this means is
that you can forget about the > character if you have multiple
tags next to each other. So let's use this to make our previous example
somewhat smaller again:
<ul<li>Item 1 <>Item 2 <>Item 3 </>
Last but not least, we have a feature called "null end-tags". What this
means is that instead of using a closing tag like </element>
or </>, we can just use the forward slash ("/") instead.
Unfortunately we cannot use this on our previous example. We already managed
to reduce the number of characters by 32%, so I guess we should be happy
with that result. However, let's use a different example to show how null
end-tags work:
<p/This is some text/
This is exactly the same as a normal P element, but here we
use the / to close the start tag instead of the > character.
And of course the entire end tag is replaced by the / character.
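If you want to see the equivalence spelled out, the following Python sketch rewrites the null end-tag form into ordinary start and end tags. The function name expand_null_end_tags is hypothetical, and the pattern is deliberately naive: it assumes a tag without attributes and content that contains no further markup and no slashes.

```python
import re

# <name/content/ : the first "/" closes the start tag,
# the second "/" stands in for the whole end tag.
NET_PATTERN = re.compile(r"<([A-Za-z][A-Za-z0-9]*)/([^/<]*)/")


def expand_null_end_tags(markup: str) -> str:
    """Rewrite <name/content/ as <name>content</name>.

    Toy sketch: no attributes, no nested markup, no "/" inside the content.
    """
    return NET_PATTERN.sub(r"<\1>\2</\1>", markup)


if __name__ == "__main__":
    print(expand_null_end_tags("<p/This is some text/"))
```

Markup that already uses normal tags passes through unchanged, because the pattern only matches a tag name that is directly followed by a slash.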
So now that you understand these two features, let's see what popular user agents do with them. In order to do this, I created several documents (or testcases) which can be used to check how user agents handle them. I tested these testcases only on Windows with Firefox, Opera and Internet Explorer. Results for other browsers can be posted in the comments.
In the first testcase, which tests
OMITTAG, you'll see that no browser has any trouble with
OMITTAG. Not even IE! Surprised? I'm not. Just look at the
IMG element. Ever closed it? Don't think so... So basically if
you're coding HTML, you're always using this feature. Bet you didn't know that.
Anyway, when you look at the testcase
for SHORTTAG, you'll see that we're not so fortunate. Leaving out
the quotation marks around the value of the lang attribute is the only
feature of SHORTTAG that is supported by all browsers. All the others
give the browsers a hard time. Firefox seems to be the only browser to
render the TITLE element correctly, but by taking a look at the
DOM Inspector, you'll see that that's because it thinks
</</> is a comment (very odd).
So as you can see, after almost 40 years, modern user agents are still
incapable of handling SHORTTAG. I find this pretty sad, to be
honest. If it were possible, I'd use it on some occasions, because I'm forced to use
HTML anyway. But I do realise that SHORTTAG is only one of the
many things that SGML (and thus also HTML) has to offer. However, with this
article I'm not trying to say that we should use XHTML after all, contrary to
what I said earlier. This article is just to show everyone that even HTML, the one markup
language we can safely rely on because we "know" it's "widely supported", is not
well-supported at all. A great example of the current state of user agents this
is…
Copyright © 2005 - 2007 Jeroen van der Meer. All rights reserved.