Wednesday, June 1, 2022

Margin notes to “HTML 4.0 in Netscape and Explorer” (2000)


This document consists of annotations to HTML 4.0 in Netscape and Explorer, a very valuable review of the differences between the specification and its implementation in Netscape Navigator 4.0 and Microsoft Internet Explorer 4.0. The review, written by Stephanos Piperoglou and published on the Web by WebReference, is a useful resource for Web authors who wish to use HTML 4.0 features. In addition to describing the support or lack of support for HTML 4.0 in the two browsers, it contains a well-written discussion of some essential topics of HTML authoring.

As a whole, the document HTML 4.0 in Netscape and Explorer gives a fine overview of its subject area, and it also presents very valuable background information. However, it contains numerous errors of detail.

The structure of this document reflects the structure of the document being commented on.

This document tries to clarify some details and emphasize some issues in the review - as well as point out some errors. (Notes about errors are partly based on practical testing and partly on the browsers' own documentation. For Netscape, there is a convenient reference for HTML tags as supported by Netscape 4.0 and earlier, but of course we cannot know for sure whether it exactly corresponds to reality.) Disclaimer: I have not systematically checked the details in the document I'm commenting on; there might well be errors which I have missed.

It is interesting that the document itself does not conform to the HTML 4.0 specification, for instance due to the lack of ALT attributes in IMG elements. This, however, is probably caused by a technical editing process at WebReference that adapts documents to the company's policies (which would thus need some revision). Notice that WebReference's site is a very useful resource for Web authors and contains, among other things, a page about Internet glossaries and Dmitry's design lab.

The headings below are links to the corresponding sections in the document being commented on.

The introductory section explains well what the document is intended to cover, what has been left out, and why. However, it does not give these limitations of scope the emphasis they need. It is crucial for a reader to realize what the document is about and what it is not about.

Specifically, it should be emphasized that the document does not discuss "internationalization" issues. It is understandable that they were left out, but it is a very important omission, since "i18n" is one of the important new features, if not the new feature, in HTML 4.0 as compared with HTML 3.2.

A similar lack of emphasis can be observed with regard to browser versions. Although the introduction says that the document only discusses the 4.0 versions of the two browsers, a reader - especially a casual reader who just visits a page discussing the support for some specific HTML feature - may easily get the wrong impression.

It should also be noted that Netscape 4.0 and IE 4.0 are not monoliths. There are differences in minor versions and between platforms. For this reason, when these annotations say that there is an error in HTML 4.0 in Netscape and Explorer, this is to be taken as referring to the observed behavior of at least some releases of Netscape 4.0 and IE 4.0. - Notice, for example, that Brian Wilson's HTML Tag Support History flags several features as being supported by Netscape 4.0 and IE 4.0 from some specific minor version onwards. And it does not even mention any support in Netscape 4.0 for the important OBJECT element! (This probably means that the first releases of Netscape 4.0 did not support it.)

It is somewhat confusing that the document discusses, in addition to support for HTML 4.0 in the browsers, some elements and attributes which are proprietary in the sense of not being in HTML 4.0. In addition to the section Proprietary Elements, such notes (admittedly interesting per se to some authors) appear in different parts of the document alongside information relating to its central topic.

This short section contains a nice introduction to some basic characteristics of the HTML language. However, there are some exaggerations:

The HTML 4.0 specification is a huge, intensely technical document that contains hundreds of different elements and can have all sorts of applications. It is at times difficult to understand. This is not an error on behalf of the authors. In fact, very few people are expected to read the specification; HTML was originally intended to be a document format that the author wouldn't learn or see, but something that would be created by programs designed for this task.

First, the specification can be called large, but hardly huge. It does not contain hundreds of elements, at least if the word "element" is taken in the sense that it is used in the specification. It is hardly true that very few people are expected to read it. The specification contains several introductory and tutorial parts - many of them rather good - which obviously imply that it was written for a wide audience. And the specification itself says, in its About the HTML 4.0 Specification part:

This document has been written with two types of readers in mind: authors and implementors. We hope the specification will provide authors with the tools they need to write efficient, attractive, and accessible documents, without over-exposing them to HTML's implementation details.

And who says HTML was not intended to be writable and readable by humans? There seems to be a rumor that Tim Berners-Lee himself has said something like that, but if that's true, perhaps someone did not get the joke. Anyway, the HTML language was from the beginning defined as being of media type (MIME type) text/html, and the choice of the major type text (as opposed to application) implied a certain commitment. RFC 1521 (now superseded by RFC 2046) contained the following characterization of the type text:

The primary subtype, "plain", indicates plain (unformatted) text. No special software is required to get the full meaning of the text, aside from support for the indicated character set. Subtypes are to be used for enriched text in forms where application software may enhance the appearance of the text, but such software must not be required in order to get the general idea of the content.

And finally, the statement that Netscape and IE "implement most of it [the HTML 4.0 specification] incorrectly" is a gross exaggeration. Useful as it may be as a warning to people who have fallen for the widespread propaganda that the version numbers of HTML, Netscape and IE go hand in hand, it simply isn't true. It might be regarded as true if "it" referred to the new features in HTML 4.0, but that's another story.

This section discusses the often misrepresented issue of "standards" in a very enlightening way. However, some descriptions of the background paint the wrong picture. I am specifically referring to the statement that "pages written for Navigator used features that didn't exist in any other browser, and since people wanted them, they used Navigator". This is a confusing oversimplification. In reality, Netscape Navigator became popular because it was the only graphic browser from a large company. One need not postulate that "people" wanted Netscape specific enhancements.

The document recommends HTML 3.2 "for simple applications". This probably gives the wrong impression. The rational reason for starting a gradual switch from HTML 3.2 to HTML 4.0 (usually via HTML 4.0 Transitional) is not the ability to include new features that might work, at least partially, on Netscape 4.0 and IE 4.0. Part of the reason is that HTML 4.0, especially in the Strict version, contains more restrictions, such as the requirement that all IMG elements must contain an ALT attribute. But more importantly, HTML 4.0 allows us to include things like the LANG attribute, which are an investment for the future; we should expect them to be ignored by current browsers but supported by future ones. And by using such features, with great potential on the one hand and graceful degradation on the other, we encourage the developers of browsers and other user agents to start actually making use of the enhanced markup.

The statement that HTML 4.0 is "most probably" the last HTML specification is, hopefully, just provocative. The continuation "If all goes well, HTML will slowly die off and give its place to XML, which solves almost all of the problems present in HTML", when written by a competent author, must be some kind of joke.

This section contains very important information under a heading which could hardly be less exciting. It warns about the inadequacies of Netscape and IE in the basic processing of HTML, at the level of lexical and syntactic analysis. One can refer to this section when one wants to explain what it means to call those browsers "tag soup" or "tag salad" browsers.

In the discussion of character references, the document is as confused as the HTML 4.0 specification. It uses the term "character reference" for two things that must be kept distinct, for both theoretical and practical reasons:

  • numeric character references, for example &#229;
  • named entity references which are used to denote characters, for example &aring;

Although these constructs can be used for similar purposes, they are essentially different from the SGML point of view. Notice that as far as the specifications are concerned, numeric character references depend only on the so-called document character set, whereas named entity references depend on separate entity declarations, which vary from one HTML version to another. Thus, numeric character references work more universally than named entity references. This applies especially to Netscape 4.0. The situation is worth emphasizing, since it is the opposite of what most people intuitively expect once they have understood the basics of character code issues!
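The difference can be illustrated with a minimal Python sketch, assuming a much simplified parser; the functions resolve_numeric and resolve_named are hypothetical and not taken from any browser. A numeric reference needs nothing but its code point, whereas a named reference means nothing unless the name is found in an entity table. Python's html.entities.name2codepoint stands in for the full HTML 4.0 entity set, and the hypothetical table latin1_only only roughly approximates an older, Latin-1-only set such as that of HTML 3.2.

```python
from html.entities import name2codepoint
from typing import Mapping


def resolve_numeric(ref: str) -> str:
    """Resolve a numeric character reference such as '&#229;' or '&#xE5;'.

    Only the number matters: it is read directly as a code point in the
    document character set, so no entity declarations are needed.
    """
    body = ref[2:-1]  # strip the leading '&#' and the trailing ';'
    if body[:1] in ("x", "X"):
        return chr(int(body[1:], 16))  # hexadecimal form
    return chr(int(body))  # decimal form


def resolve_named(ref: str, entity_table: Mapping[str, int]) -> str:
    """Resolve a named entity reference such as '&aring;'.

    The name must be looked up in an entity table; if the table lacks it,
    the reference cannot be resolved and is left as literal text here.
    """
    name = ref[1:-1]  # strip the leading '&' and the trailing ';'
    if name not in entity_table:
        return ref
    return chr(entity_table[name])


# Hypothetical reduced table: only entities for code points below 256.
latin1_only = {name: cp for name, cp in name2codepoint.items() if cp < 256}

print(resolve_numeric("&#229;"), resolve_named("&aring;", latin1_only))  # å å
print(resolve_numeric("&#945;"), resolve_named("&alpha;", latin1_only))  # α &alpha;
```

The numeric reference &#945; resolves no matter which entity table is in use, while &alpha; survives only where the table happens to contain it.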

The practical recommendation on comments is a good one as regards the intended meaning, but its formulation is confusing, if not misleading. There is a better formulation in the Web Design Group's Web Authoring FAQ (see the answer to question 3).

The discussion of URIs in this section is very confusing. For example, what does it refer to when it says that browsers do "this" automatically? Obviously to encoding characters in URIs - the correct term is "encode", not "escape" - but does it mean that the browsers really do the encoding? The reality seems to be that sometimes they do, sometimes they don't. Anyway, the question is irrelevant in the comparison of the browsers against the HTML 4.0 specification, since the specification imposes no requirements on the processing of incorrect URIs (i.e. URIs which contain characters which may not occur in a URI without encoding).
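For readers unsure what "encoding" means here: a character that may not appear in a URI as such is replaced by a percent sign followed by two hexadecimal digits for each of its octets. The following is a minimal Python sketch using the standard urllib.parse.quote function; the wrapper name encode_uri_path is hypothetical, and converting non-ASCII characters to UTF-8 first follows the convention recommended in the HTML 4.0 specification's appendix notes, not necessarily what Netscape 4.0 or IE 4.0 actually did.

```python
from urllib.parse import quote


def encode_uri_path(raw: str) -> str:
    """Percent-encode characters that may not occur literally in a URI path.

    quote() leaves letters, digits and a few unreserved punctuation marks
    alone, keeps '/' (listed in `safe`) so the path structure survives,
    and encodes everything else, converting non-ASCII characters to
    UTF-8 octets first.
    """
    return quote(raw, safe="/")


print(encode_uri_path("my page.html"))  # my%20page.html
print(encode_uri_path("docs/å.html"))   # docs/%C3%A5.html
print(encode_uri_path("100%.html"))     # 100%25.html
```

Note that the percent sign itself must be encoded, since it would otherwise be taken as the start of an encoded octet.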

The paragraph about Frame Target Names seems to be based on a misunderstanding. It complains that "The two browsers both interpret target attributes as window names and not frame names if a corresponding frame doesn't exist". But the observed behavior is just what browsers are required to do by the specification (right where you'd expect to find it, in the specification of target semantics):

If any target attribute refers to an unknown frame F, the user agent should create a new window and frame, assign the name F to the frame, and load the resource designated by the element in the new frame.

(That is, a new window must be opened, and it will act as a frame for the purpose of interpreting any subsequent link traversals with the TARGET attribute set.)
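The rule can be modeled in a few lines. This is a deliberately simplified, hypothetical Python model: the class BrowserModel and its fields are invented for illustration, and reserved target names such as _blank are not handled. Following a link whose target names an unknown frame opens one new window whose frame gets that name; later links with the same target reuse it.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class BrowserModel:
    """Hypothetical model of TARGET resolution, not any real browser's code."""

    frames: Dict[str, str] = field(default_factory=dict)  # frame name -> URL
    windows_opened: int = 0

    def follow_link(self, href: str, target: str) -> None:
        if target not in self.frames:
            # Unknown frame F: create a new window and frame named F.
            self.windows_opened += 1
        # Load the resource into the frame named F (new or existing).
        self.frames[target] = href


browser = BrowserModel()
browser.follow_link("help1.html", "help")  # no frame "help" yet: new window
browser.follow_link("help2.html", "help")  # reuses the window named "help"
print(browser.windows_opened, browser.frames["help"])  # 1 help2.html
```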

Consequently, the practical advice to avoid "this technique" (apparently, the TARGET attribute) "to display documents in other windows" and to "use a scripting language like JavaScript to do the same" gets things the wrong way round. If one wants a link which opens in a new window by default or, as in Netscape and IE, as the only way of following it, then the TARGET attribute works more universally than JavaScript.

The statement "Both Netscape and Explorer ignore - - any SGML construct in HTML except for comments" is potentially very misleading. After all, HTML markup itself consists of SGML constructs. One really cannot express things like this very compactly, so it would be wise to refer to the subsection SGML features with limited support in the specification.

The document seems to recommend omitting the HTML and HEAD tags, saying that they "can safely be omitted". Even more strangely, it says that the HTML tags have no real use; strictly speaking, it says that "it" has no use, in a context where the only singular antecedent would be "the html element" (perhaps confusing tags and elements). The recommendation is unwise, since the HTML start tag gives the author the best way to specify the overall language of the document (using e.g. <HTML LANG="en-US">). Moreover, the HEAD tags make the structure clearer to anyone reading the HTML source; authors often have difficulty realizing what belongs to the head and what belongs to the body, and explicitly marking up these parts should help in the mental effort.

In the discussion of the TITLE attribute, there is a good recommendation "useful to use title to give more information about hyperlinks that don't contain a lot of information about the link destination", but the continuation "(like the all too common phrase 'please click here')" might be read as an encouragement to use such link texts and just sugar them with TITLE attributes!

The document discusses various refresh mechanisms and makes some good points. Although the whole topic is outside the scope of a comparison of the browsers against the HTML 4.0 specification, the misinformation given calls for a correction. One should not think that "the same effect can easily by re-created using JavaScript". There is no easy way, or any way, to make a non-JavaScript browser execute JavaScript. Moreover, to the extent that "refresh" is really needed, it should be handled at the server level, as the Web Design Group's Web Authoring FAQ suggests (see the answer to question 27).

The document says that it is important (emphasis mine) "to note that the body element can contain character data that is not included in a block-level element such as a paragraph". Notice that such character data is allowed only in HTML 4.0 Transitional, not in HTML 4.0 Strict, and it hardly serves a useful purpose. Moreover, it can confuse things when style sheets are used. (The document more or less says this, but in a somewhat odd way.)

The discussion of whitespace is confusing. It first says that Netscape and IE "support whitespace rules with the exception of the rule that whitespace immediately after a start tag or before an end tag should be ignored". This implies that they break the rules - notice that the rule is a "shall", not a "should" - and they actually do. But the continuation "this should not be a problem as long as you code according to the specification" does not make any sense, since markup that fully conforms to the specification may contain exactly such whitespace. For a short discussion of the real problems Netscape and IE cause in this area, please refer to White Space Bugs in Browsers by E. Stephen Mack.
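The rule in question comes from SGML, and the HTML 4.0 specification restates it in its notes on line breaks: a line break immediately after a start tag, or immediately before an end tag, is to be ignored. Below is a rough Python sketch of what a conforming parser would do, assuming simple tags with no ">" inside attribute values and ignoring comments and other SGML constructs; the function name drop_ignorable_line_breaks is hypothetical.

```python
import re

# A start tag such as <A HREF="x"> followed directly by a line break.
START_TAG_THEN_NEWLINE = re.compile(r"(<[A-Za-z][^>]*>)\r?\n")
# A line break followed directly by an end tag such as </A>.
NEWLINE_THEN_END_TAG = re.compile(r"\r?\n(</[A-Za-z][^>]*>)")


def drop_ignorable_line_breaks(source: str) -> str:
    """Remove the line breaks that SGML says must be ignored (simplified)."""
    source = START_TAG_THEN_NEWLINE.sub(r"\1", source)
    return NEWLINE_THEN_END_TAG.sub(r"\1", source)


markup = '<A HREF="x.html">\nlink text\n</A>'
print(repr(drop_ignorable_line_breaks(markup)))  # '<A HREF="x.html">link text</A>'
```

A browser that skips this step treats both line breaks as spaces, which typically shows up as stray underlined space around the link text, even though the markup is perfectly valid.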

The information about the presentation of the DFN element is incorrect: Netscape presents its content as normal text (as explained in section DFN - Defined Term in the HTML 4.0 Reference by WDG).

The statement that "subscripts and superscripts are by default rendered at the same size as the surrounding text" is incorrect. Both Netscape and IE present them in a smaller font.

The statement that IE displays a tooltip for an ACRONYM element when it has a TITLE attribute does not seem to hold in general.

As regards hyphenation, it is true that both browsers treat the soft hyphen as a normal hyphen. However, as a simple test shows, IE may break a line at a hyphen (normal or soft).

For the PRE element as supported by Netscape, the proprietary COLS attribute does not correspond to the WIDTH attribute as defined in the specifications. Instead, it instructs Netscape to wrap lines if the specified number of columns is exceeded, quite contrary to the basic meaning of PRE!

Although it is true that IE supports the INS and DEL elements, the support is rather simplistic and potentially misleading especially for the former. IE uses (by default) underline to indicate insertion (INS), which may very easily make people regard it as a link, due to the very common practice of using underline for links. Thus, the implementation is hardly satisfactory; notice that the HTML 4.0 specification says that "user agents should render inserted and deleted text in ways that make the change obvious" (emphasis mine).

The statement that "lists are supported by both browsers exactly as the specification states" should be taken with a grain of salt. The people who programmed the browsers never bothered to try implementing the DIR and MENU elements in the manner suggested in the specifications. And the HTML 4.0 specification now says, with resignation:

The DIR element was designed to be used for creating multicolumn directory lists. The MENU element was designed to be used for single column menu lists. Both elements have the same structure as UL, just different rendering. In practice, a user agent will render a DIR or MENU list exactly as a UL list.

I'd say that the difference between UL, DIR and MENU was structural, reflecting substantial differences in what kind of a list we have. It's probably mostly due to lack of adequate implementations that this issue has now been turned into a purely presentational one.

Similarly to the general statement about HTML 4.0 support in the browsers in section Of HTML and learning it, a claim is made that "Netscape supports almost none of [the HTML 4.0 Table model]". Here, luckily, this is immediately followed by a statement which indicates that the author is referring to the novelties of HTML 4.0 as compared with HTML 3.2.

It is well known that there are serious bugs in the browsers' implementation of table rendering. For example, the Netscape bug with nested tables (often completely incorrect rendering when omissible end tags are omitted) persists in Netscape 4.0.

It is true that "Netscape also supports the nowrap attribute to table cells as defined in the HTML 4.0 specification", but since this appears in a paragraph which begins "Netscape supports one or two additional features on top of the HTML 3.2 Table Model", it must be noted that the nowrap attribute was defined already in HTML 3.2.

The statement that IE supports the THEAD and TFOOT elements is technically correct in the sense that IE recognizes the tags and is able to present the TFOOT part after the content of the table. However, both the header part and the footer part are rendered like normal table rows, without any scrolling of the body under fixed headings on screen and without any repeated headings on new pages in printed output. This is rather far from the intended implementation; the HTML 4.0 specification says (in fact twice) the following:

Table rows may be grouped into a head, foot, and body sections, (via the THEAD, TFOOT and TBODY elements, respectively). Row groups convey additional structural information and may be rendered by user agents in ways that emphasize this structure. User agents may exploit the head/body/foot division to support scrolling of body sections independently of the head and foot sections. When long tables are printed, the head and foot information may be repeated on each page that contains table data.

In HTML 4.0, the TFOOT element must precede the TBODY element in the source. Therefore simply ignoring the THEAD and TFOOT tags (but not their content), which is what Netscape probably does, involves graceful degradation for THEAD but not for TFOOT: the footer rows appear before the body rows. We could thus draw the following practical conclusion: it is safe, and advisable, to use THEAD, but it is probably best to avoid using TFOOT.
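The ordering argument can be made concrete with a small Python simulation. The data and function names are hypothetical, and real table rendering is of course far more involved. Row groups are listed in the source order that the HTML 4.0 DTD requires for TABLE: THEAD, then TFOOT, then TBODY.

```python
from typing import List, Tuple

RowGroup = Tuple[str, str]  # (row group element, row content)

# Source order as required by HTML 4.0: THEAD?, TFOOT?, TBODY+
table_source: List[RowGroup] = [
    ("thead", "Item | Price"),
    ("tfoot", "Total | 30"),
    ("tbody", "Apples | 10"),
    ("tbody", "Pears | 20"),
]


def render_conforming(groups: List[RowGroup]) -> List[str]:
    """Display head first, body next and foot last, as intended."""
    position = {"thead": 0, "tbody": 1, "tfoot": 2}
    # sorted() is stable, so body rows keep their relative order.
    return [row for _, row in sorted(groups, key=lambda g: position[g[0]])]


def render_ignoring_tags(groups: List[RowGroup]) -> List[str]:
    """Ignore the THEAD/TFOOT tags: rows simply appear in source order."""
    return [row for _, row in groups]


print(render_conforming(table_source))
# ['Item | Price', 'Apples | 10', 'Pears | 20', 'Total | 30']
print(render_ignoring_tags(table_source))
# ['Item | Price', 'Total | 30', 'Apples | 10', 'Pears | 20']
```

The header row comes first in both cases, which is why THEAD degrades gracefully; the total row, however, jumps above the data it summarizes.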

The document does not discuss the problems that have been observed in the implementation of links (the A element), at least in some versions of the browsers. These include the failure to support an A NAME element with empty content (<A NAME="name"></A>) and failures to properly follow a link whose target anchor is inside a table.

The remark that the TITLE attribute for an A element "is very useful when referring to documents without specifying their full title" is valid, but the parenthetical remark '(like the all too common "please click here")' is misleading. It can easily be read as a suggestion to keep using "please click here" and just add the TITLE attribute!

The statement "Sadly, the link element remains largely unimplemented" is true, but the explanation "mostly because there was never a concerned effort to define standard link types before the HTML 4.0 specification" is questionable. First, the main reason is that Netscape and Microsoft did not want to implement the element even in a very simple and obvious way - such as constructing buttons corresponding to LINK elements, using whatever REL (or REV) value there is. Second, there were many "concerned efforts". Third, the HTML 4.0 specification does not change the situation very much; it is about as vague as the HTML 3.2 specification in this respect, and the list of link types in the HTML 4.0 specification is wanting - it does not even contain a value indicating a simple and very common link upwards in a hierarchy (named Up in some proposals). As I remarked already in my comments on the HTML 4.0 draft:

The draft lists some values without making any requirement on or even suggestion to supporting them. Authors "may use" some "recognized" link types which are claimed to have "conventional interpretations".

The full heading of this section continues with "Make it look nice". This suggests a rather one-sided view of the use of images and other illustrations.

The statement that "the img element is supported as stated in the specification by both browsers, with the exception of the longdesc attribute" can probably be regarded as technically true. However, there are serious flaws in the quality of implementation. The presentation of ALT texts when browsing with image loading off is often very poor. In particular, if the WIDTH and HEIGHT attributes are specified, they determine the area reserved for the image, and if the ALT text does not fit there, the browsers make no effort to fix the situation. Moreover, when those attributes are used to scale an image, the quality of the scaled image is often very poor. This suggests that authors should consider HEIGHT and WIDTH for IMG as defectively implemented and use them with caution.

The description of the implementation (or, rather, lack of implementation) of the OBJECT element is mostly correct and enlightening. However, the wording is partly sloppy. One should not characterize the OBJECT element as "the most reliable way to embed objects in HTML documents to date"! (The intended meaning is probably that it would be the most reliable if implemented adequately.)

Antti Näyhä has composed a detailed comparison of OBJECT implementation on a set of browsers, including IE 4 and Netscape 4.

In the statement "The applet element is supported - - except for the object element", the latter "element" should read "attribute".

This short section discusses only support for style-sheet-related HTML constructs, not support for style sheets themselves. This is natural given the purpose of the document, but the short note about style sheet support might give too optimistic a view: "Netscape and Explorer support stylesheets to a limited degree."

In reality, the support is not only limited but also buggy. See e.g. CSS References by WDG for information about the status of CSS support. In particular, there is a detailed Style Sheets Compatibility Chart by Web Review.

No comment.

The following statement is oddly formulated:

The contents of the noframes element are ignored by both browsers, so it is a good idea to follow the specification when including such an element in a frameset document.

Of course, the reason for including a NOFRAMES element (which is strongly recommended, and even required by the proposed WAI rules) is that there are browsers which are capable of presenting its content, either as the only alternative for pages with frames or according to user options (since some browsers can work both with and without frames). The fact that Netscape and IE do not render the content of NOFRAMES simply implies that an author need not worry about causing problems to users of these browsers when writing a NOFRAMES element.

Note: The ACCEPT attribute for the FORM element, in addition to not being supported by Netscape and IE, is not even included in the formal syntax (DTD) in the HTML 4.0 specification. This is probably an oversight. Notice that because of this, the W3C validator gives an error message if a FORM element contains an ACCEPT attribute.

The discussion of the WRAP attribute has a typo: "wrap=soft or wrap=physical" should read "wrap=soft or wrap=virtual". But more importantly, the basic content here seems to be wrong; both browsers support wrap=soft and wrap=hard. The major difference is that wrap=soft is the default on IE, whereas Netscape has the correct default, no wrapping (which can be explicitly specified as wrap=off). For more information, see my document How to limit the number of characters entered in a textarea in an HTML form, in particular the section Implementations, especially wrapping.
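What the wrap settings mean for the submitted data can be sketched roughly as follows. This is a model under stated assumptions only: Python's textwrap module stands in for the browser's line breaking (real browsers break according to the rendered width), the typed text is assumed to contain no line breaks of its own, and the function name submitted_value is hypothetical. With soft (Netscape: virtual) wrapping, lines are wrapped only on screen; with hard (Netscape: physical) wrapping, the wrap points become CR LF line breaks in the submitted value; with wrap=off, no wrapping takes place at all.

```python
import textwrap


def submitted_value(text: str, cols: int, wrap: str) -> str:
    """Approximate the value a TEXTAREA submits for a given WRAP setting.

    'off' and 'soft' send the text as typed; 'hard' also sends the line
    breaks inserted at the wrap points. Assumes the typed text contains
    no line breaks of its own; textwrap only approximates browser wrapping.
    """
    if wrap == "hard":
        return "\r\n".join(textwrap.wrap(text, width=cols))
    return text


print(repr(submitted_value("aaa bbb ccc", 7, "soft")))  # 'aaa bbb ccc'
print(repr(submitted_value("aaa bbb ccc", 7, "hard")))  # 'aaa bbb\r\nccc'
```

The distinction matters when the number of characters entered is limited: with hard wrapping, the inserted line breaks count as characters too.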

This overview of proprietary elements recognized by Netscape and IE might be interesting to people who have to maintain pages written using them. However, despite the adequate warning which discourages their use, some formulations (e.g. "it is useful to note that multicol elements can be nested") might be read as encouraging, not discouraging.


Originally written in June 1998. No major changes have been made since then, except that a link was added, the presentation was (hopefully) made more pleasant by adding some styles and a table of contents, and a major correction about WRAP for TEXTAREA was added.

Date of last update: 2000-02-18

Jukka Korpela, jkorpela@malibutelecom.com

