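As you probably know, every HTML document requires a DOCTYPE which links to a DTD. That DTD specifies three things: which elements are allowed, which attributes those elements may have, and which character entity references are available.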
This is of course nothing new if you've ever read an HTML DTD, but if you haven't, it might be interesting to know.
I read the HTML 4.01 Strict DTD because, after I made an HTML parser using regular expressions, I wanted to make a real HTML parser: one that could interpret an entire document instead of only a few inline elements. But since HTML is an SGML application, it would make more sense to make an SGML parser first. And what do all SGML applications start with? Exactly, a DTD.
So I started to read the HTML 4.01 Strict DTD, but the more I read it, the more useless it seemed. At first it seems almost logical to have a file that specifies the allowed elements and attributes, but when you start working with it, it doesn't seem so logical anymore. What should you do if you have a document that uses an element that is not defined in the DTD? Should you just delete it from the DOM? Of course not, so you just display its content and see whether you can do anything with its attributes.
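To make that concrete, here is a minimal sketch in Python of a parser that keeps unknown elements instead of throwing them away. It uses the standard library's `html.parser`; the `KNOWN_ELEMENTS` set, the `LenientCollector` class and the `collect` function are hypothetical names made up for this illustration, and the tiny whitelist only stands in for "whatever the DTD defines".

```python
from html.parser import HTMLParser

# Hypothetical, deliberately tiny stand-in for "elements the DTD defines".
KNOWN_ELEMENTS = {"html", "body", "p", "em", "strong"}


class LenientCollector(HTMLParser):
    """Keeps all text and records unknown elements instead of dropping them."""

    def __init__(self):
        super().__init__()
        self.text = []
        self.unknown = []

    def handle_starttag(self, tag, attrs):
        if tag not in KNOWN_ELEMENTS:
            # Not in the whitelist: remember the name and attributes anyway,
            # so the application can still try to do something with them.
            self.unknown.append((tag, dict(attrs)))

    def handle_data(self, data):
        # Text is kept no matter which element it sits in.
        self.text.append(data)


def collect(markup):
    parser = LenientCollector()
    parser.feed(markup)
    parser.close()
    return "".join(parser.text), parser.unknown


text, unknown = collect('<p>Hello <blink speed="fast">world</blink>!</p>')
```

The parser never consults a DTD here: the content of the unknown `blink` element ends up in `text`, and its attributes are available in `unknown` for whatever the application wants to do with them.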
But that kind of feels like you're completely dissing the DTD. User agents
don't need the DTD because the DTD just specifies what the markup should look
like, but not what to do with it. That's all hardcoded in the user agent, so
that kind of makes the DTD useless. On top of that, modern user agents don't
even do anything with it, apart from using the DOCTYPE to trigger quirks
or standards mode.
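A rough Python sketch of that DOCTYPE sniffing could look like the following. It is deliberately coarse: it assumes that any DOCTYPE means standards mode and no DOCTYPE means quirks mode, whereas real browsers follow a longer set of rules that also look at the public identifier. `DoctypeSniffer` and `rendering_mode` are hypothetical names for this illustration.

```python
from html.parser import HTMLParser


class DoctypeSniffer(HTMLParser):
    """Records the DOCTYPE declaration as a plain string; never fetches a DTD."""

    def __init__(self):
        super().__init__()
        self.doctype = None

    def handle_decl(self, decl):
        # decl is the text between "<!" and ">", e.g. 'DOCTYPE html'.
        if self.doctype is None and decl.lower().startswith("doctype"):
            self.doctype = decl


def rendering_mode(markup):
    # Assumption for this sketch: any DOCTYPE -> standards, none -> quirks.
    sniffer = DoctypeSniffer()
    sniffer.feed(markup)
    sniffer.close()
    return "standards" if sniffer.doctype else "quirks"
```

Note that the DTD the DOCTYPE points to is never downloaded or read; the declaration is treated as nothing more than a switch.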
So if it were up to me, I would just get rid of DTDs. Of course, for HTML that would not be a logical step because it's an SGML application. But XHTML uses a DTD as well, even though it is an XML application and XML itself does not require one. So getting rid of the DTD there could well be possible. The specifications could be used to hold the information that the DTD would've held.
If the specification could hold all that information (and I don't see why it couldn't), we could just get rid of the DTD thing and start working with only markup. Why make it so hard on ourselves if no one uses the DTD anyway? Except for the character references of course… But wait! XML has all the character references you need:
&lt; &gt; &amp; &apos; &quot;
Combine those entities with UTF-8, and you can use all the characters you want. Because of this, I can safely repeat myself by saying that DTDs are useless. They were nice in SGML, but because the makers of SGML never thought about what applications should do if they encountered a document that heavily abused the markup (aka tag soup), they kind of lost their charm. The next generation of markup languages should therefore just forget about DTDs, because hardly any application uses them, and if they do, they use them incorrectly…
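As a small illustration of the point about XML's five predefined entities plus UTF-8, here is a sketch in Python using the standard library's `xml.etree.ElementTree`. The sample document is made up for this example; it has no DOCTYPE and no DTD at all.

```python
import xml.etree.ElementTree as ET

# No DOCTYPE, no DTD: only the five entities predefined by XML are used,
# and the non-ASCII character is written literally in a UTF-8 document.
document = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<p title="say &quot;hi&quot;">caf\u00e9 &lt;b&gt; &amp; &apos;x&apos;</p>'
).encode("utf-8")

root = ET.fromstring(document)

body_text = root.text          # entity references resolved by the parser
title_text = root.get("title")  # same for attribute values
```

The parser resolves all five references on its own, and the `é` needs no entity because the document encoding already covers it.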
Copyright © 2005 - 2007 Jeroen van der Meer. All rights reserved.