Instructional Module X11c

How to Do XHTML Right


Overview


Since XHTML is based on the XML standard, doing XHTML right involves following the rules for XML as well as XHTML itself. This is referred to as making a document well-formed and valid.

  • Following the rules of XML corresponds closely to what is meant by making a document well-formed.
    Web pages cannot be valid if they aren't also well-formed.
  • Following the rules specific to XHTML is what generally makes a document valid.
    Web pages can be well-formed but not valid, if they have appropriate XML structure, but don't follow the proper XHTML rules.

 


Well-formed Documents
Definition from W3C


The specifications (rules) for XML are managed and shown at the W3C site: http://www.w3.org/TR/REC-xml. These specifications are very formal and technical. They are intended for software developers rather than Web-page coders.

For a taste of formal specifications, here is the definition of a well-formed XML document:

[Definition: A textual object is a well-formed XML document if:]

  1. Taken as a whole, it matches the production labeled document.
  2. It meets all the well-formedness constraints given in this specification.
  3. Each of the parsed entities which is referenced directly or indirectly within the document is well-formed.

Document

[1] document ::= prolog element Misc*

Matching the document production implies that:

  1. It contains one or more elements.
  2. [Definition: There is exactly one element, called the root, or document element, no part of which appears in the content of any other element.] For all other elements, if the start-tag is in the content of another element, the end-tag is in the content of the same element. More simply stated, the elements, delimited by start- and end-tags, nest properly within each other.

[Definition: As a consequence of this, for each non-root element C in the document, there is one other element P in the document such that C is in the content of P, but is not in the content of any other element that is in the content of P. P is referred to as the parent of C, and C as a child of P.]

What does this mean? From this very abstract series of definitions we get to very specific rules. The following section explains how...

Document structure

Prolog - Element - Misc

Let's start by looking at the overall definition of well-formed:

A textual object is a well-formed XML document if:

Taken as a whole, it matches the production labeled document.

This refers to the definition of Document given right below:

[1] document ::= prolog element Misc*

This means a document is made up of a prolog, followed by exactly one element (the root), and may optionally have miscellaneous items at the end: comments, processing instructions, and white space.

The prolog is the part that gives information about the type of document this is. In XHTML, these lines would be considered the prolog:

<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

Root Element



Element is defined here:

  1. It contains one or more elements.
  2. [Definition: There is exactly one element, called the root, or document element, no part of which appears in the content of any other element.] For all other elements, if the start-tag is in the content of another element, the end-tag is in the content of the same element. More simply stated, the elements, delimited by start- and end-tags, nest properly within each other.

This means that a well-formed XHTML document must have a root element, which contains all the other elements. The root element in an XHTML document is:

<html xmlns="http://www.w3.org/1999/xhtml">
</html>

This can't be inside any other element - that is, no other tags can enclose the <html> </html> tags. (The lines that make up the prolog are declarations, not elements, so they don't have end tags.)
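
To see the root element from a program's point of view, here is a minimal sketch using Python's standard xml.etree.ElementTree parser. The page content (the title and the paragraph) is made up for illustration, and the parser checks well-formedness only, not validity.

```python
import xml.etree.ElementTree as ET

# A minimal XHTML page; the title and paragraph text are made up for illustration.
PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Example</title></head>
  <body><p>Hello, world!</p></body>
</html>"""

# fromstring() returns the root (document) element.
root = ET.fromstring(PAGE)

# ElementTree writes the namespace in braces in front of the element name.
print(root.tag)

# Everything else in the document lives inside the root element.
child_names = [child.tag.split("}")[-1] for child in root]
print(child_names)
```

The first line printed is the namespace from the xmlns attribute followed by html; the second lists head and body, the two elements directly inside the root.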


The Misc (miscellaneous) section is defined to allow the file to contain comments, processing instructions, and "white space" after the element.

White space can be spaces (generated by pressing the space bar), tabs, or new-lines (from pressing the Enter or Return keys). These characters are often encountered at the ends of text files and may make it easier for text editing software to manipulate the text; the formal definition allows for this as a convenience.
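
The prolog - element - Misc structure can be tried out with a short script. This is only a sketch, assuming Python's standard xml.etree.ElementTree parser; the helper name parses and the tiny page body are invented for the example. The parser checks well-formedness and does not fetch or apply the DTD.

```python
import xml.etree.ElementTree as ET

# The prolog from this module: XML declaration plus document type declaration.
PROLOG = (
    b'<?xml version="1.0" encoding="iso-8859-1"?>\n'
    b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" '
    b'"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n'
)

# The single root element (contents are placeholders for illustration).
ELEMENT = (
    b'<html xmlns="http://www.w3.org/1999/xhtml">'
    b'<head><title>Example</title></head><body></body></html>'
)

def parses(document):
    """Return True if the parser accepts the document as well-formed."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False

print(parses(PROLOG + ELEMENT))                           # prolog + element
print(parses(PROLOG + ELEMENT + b"\n\n<!-- end -->\n"))   # white space and a comment after
print(parses(PROLOG + ELEMENT + b"stray text"))           # text after the root element
print(parses(PROLOG + ELEMENT + b"<html></html>"))        # a second root element
```

The first two documents are accepted; the last two are rejected, because only white space, comments, and processing instructions may follow the root element.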

Parents and Children



The final part of the definition describes the concept of nesting. It means you can put one element inside another, but one element can't be partly inside and partly outside another. Think of a set of kitchen mixing bowls.

Organizing the document this way results in a hierarchical structure - a bit like a family, more like a company. The final definition of the well-formedness series goes into detail about this:

[Definition: As a consequence of this, for each non-root element C in the document, there is one other element P in the document such that C is in the content of P, but is not in the content of any other element that is in the content of P. P is referred to as the parent of C, and C as a child of P.]

The reason for this parent-child arrangement is partly to take advantage of a very efficient data structure that can be used in computers, called a tree. Each element in the tree has one parent and may have any number of children. When a browser reads an XHTML file, rather than looking at it simply as a string of letters, it organizes it into a tree. This simplifies the way the files are processed and speeds up displaying the Web page. But if the elements aren't properly nested, no tree can be built: an XML parser must stop with an error, and a browser that treats the page as ordinary HTML has to guess at the intended structure, so the Web page may be displayed wrong.
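
The parent-child structure described above can be sketched in a few lines of Python. This is an illustration only, assuming the standard xml.etree.ElementTree parser rather than a browser's own machinery; the page content and the helper names local_name and outline are invented for the example.

```python
import xml.etree.ElementTree as ET

# A small XHTML page; the text is made up for illustration.
PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Example</title></head>
  <body>
    <p>Hello, <strong>world</strong>!</p>
    <p>Second paragraph.</p>
  </body>
</html>"""

def local_name(element):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return element.tag.split("}")[-1]

def outline(element, depth=0):
    """Return one indented line per element, with children under their parent."""
    lines = ["  " * depth + local_name(element)]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines

root = ET.fromstring(PAGE)
print("\n".join(outline(root)))

# Every non-root element has exactly one parent (P is the parent of C).
parent_of = {child: parent for parent in root.iter() for child in parent}
strong = root.find(".//{http://www.w3.org/1999/xhtml}strong")
print(local_name(parent_of[strong]))
```

The outline prints html at the left margin with head and body indented beneath it, and the last line shows that the parent of the strong element is p.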

[Diagram: what a well-formed XHTML document looks like to a browser]

This example is described in more detail in module X20d "CSS Anatomy".

Summary of Well-Formedness

Well-Formedness can be summarized in these three simple rules:

  1. All non-empty elements must have start and end tags. For example, <p>...</p>
  2. All empty elements must close themselves with /> at the end. For example, <br />
  3. All elements must be nested properly: First-In Last-Out. For example,

<body>
    <p>Hello, world!</p>
</body>
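
The three rules can be checked mechanically. The sketch below assumes Python's standard xml.etree.ElementTree parser; the helper name is_well_formed and the sample snippets are invented for the example. It tests well-formedness only and says nothing about validity.

```python
import xml.etree.ElementTree as ET

def is_well_formed(markup):
    """Return True if the markup parses as well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

# One good snippet, then one violation of each of the three rules.
samples = {
    "properly nested":          "<body><p>Hello, world!</p></body>",
    "rule 1: missing end tag":  "<body><p>Hello, world!</body>",
    "rule 2: <br> not closed":  "<body><p>Hello,<br>world!</p></body>",
    "rule 3: overlapping tags": "<p><strong><em>Hi</strong></em></p>",
}

results = {label: is_well_formed(markup) for label, markup in samples.items()}
for label, ok in results.items():
    print(label, "->", "well-formed" if ok else "NOT well-formed")
```

Only the first snippet is reported as well-formed; each of the other three breaks one of the rules above.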


 
Validity
XML Definition


A document is valid when all its elements and attributes conform to the XML-based definition of the language.

XML definitions are put into Document Type Definitions (DTDs). The DTD for XHTML is the document referred to in the Prolog of the Web page:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

Don't worry - I won't quote any of the Document Type Definition here! But take a look at the DTD yourself: browse to http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd. This is a document designed, not primarily for humans, but for computer software to read. However, more than any human-oriented description, textbook or instructional material, it is the ultimate authority on what XHTML is.

Elements and Attributes


XML languages are built of elements, which are the chunks of text that include:

  1. Start tag
  2. Text and nested elements
  3. End tag

For example:

<p>Hello, World!</p>

or

<p>XML languages are built of <strong>elements</strong></p>

Empty elements consist of:

  1. Tag ending with />

Empty element examples:

<meta name="Author" content="Sarah Strong" />

<br />

Attributes are modifiers or characteristics of elements. The attributes belonging to each type of element are described in the DTD. An XML attribute consists of two parts:

  1. Name
    Only names defined in the DTD for a particular element are allowed in the start tag of that element. Names are followed by the "=" (equals) sign.
  2. Value
    Possible values are also defined in the DTD. They must be enclosed in quote marks.

Attribute examples:

<p align="center"> What's Up, doc?</p>

<meta name="Author" content="Sarah Strong" />
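
As an illustration of names and values, this sketch reads the attributes of the examples above with Python's standard xml.etree.ElementTree parser. The parser does not consult the DTD, so it cannot tell whether a name or value is allowed; it only enforces the XML rule that values are quoted.

```python
import xml.etree.ElementTree as ET

# An empty element with two attributes (from the example above).
meta = ET.fromstring('<meta name="Author" content="Sarah Strong" />')
print(meta.get("name"))       # value of the attribute called "name"
print(meta.get("content"))    # value of the attribute called "content"
print(len(meta), meta.text)   # an empty element has no children and no text

# An element with one attribute and some text.
para = ET.fromstring('<p align="center"> What\'s Up, doc?</p>')
print(para.attrib)

# Leaving out the quote marks makes the document not well-formed.
try:
    ET.fromstring("<p align=center> What's Up, doc?</p>")
    unquoted_accepted = True
except ET.ParseError:
    unquoted_accepted = False
print(unquoted_accepted)
```

Whether align is actually permitted on p, and which values it may take, is a validity question that only the DTD (and a validating tool) can answer.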

Validation Service


Because XHTML is somewhat more complex and picky than HTML, W3C offers a validation service on-line. This determines whether a file is valid XHTML or not. (And remember that to be valid, a file must be well-formed as well.)

At the validation service Website, you can have a file validated either on the Web or from a file on your computer. Selecting the file is quite easy.

The hard part is understanding the error messages that come up! They can be very frustrating, and the best guide is experience.

Validate all your XHTML files at W3C's MarkUp Validation Service:
http://validator.w3.org/


About This Document

Audience


This module is for people who are beginning to learn XHTML and need to know the rules for creating valid, well-formed code.

 

Objectives

On successful completion of this module, you will be able to:

  1. Identify the rules for a valid, well-formed XHTML document
  2. Validate your document using a web-based validator to check for poorly formed markup
Module X11c: How to Do XHTML Right
This document is part of a modular instruction series in Computer Instruction. For more information, see the overview or the list of modules in this series, X, XML, etc. This document has been used in the following classes: INP 150.
History:
Original: 9 September 2003, by Laurence J. Krieg
Last modification: Monday, 31-Aug-2009 11:48:07 EDT
Copyright
Copyright © 2003, Laurence J. Krieg, Washtenaw Community College
Instructors: You may point to this file in your Web-based materials; however, its location may change without notice.
Students: You are welcome to make a copy for your personal use.
All other uses: Please contact the author, Laurence J. Krieg, for permission: krieg@ieee.org.
