HTML Concepts

Module W22c


Browsers and HTML

Overview


We've all done it: opened a Web page in a browser. But have you ever stopped to wonder how a nicely formatted Web page is made to look the way it does?

Every Web page starts out as a plain text file - the simplest kind of file there is. This text file is stored on a Web server and sent over the Internet to our computers in that same plain form. It's the browser's job to take that plain file and make it look pretty. But of course the browser doesn't know what "pretty" is - the person who designed the page has to tell it. And the instructions have to be written in a particular language: HTML, the HyperText Markup Language.

So if you want to understand how a Web page is created, you have to understand how HTML works. Let's start by examining the role of the browser.

The Role of a Browser


The browser - a program like Netscape, Internet Explorer, Safari, Opera, and others - has two main jobs:

  1. To ask Web servers for files;
  2. To display the files according to their type.

When the browser receives a file, its first job is to figure out what type of file it is. It could be any of a limitless variety of file types! Here are just a few:

  • Image files - pictures, diagrams...
  • Sound files - music, news reports...
  • Video clips
  • Plain text files with no formatting instructions - information intended entirely for humans
  • Plain text files with HTML formatting - information for humans mixed with formatting instructions for the browser

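The browser determines what kind of file it is either by looking at the first few bytes (characters) in the file, or from information supplied outside the file that gives its MIME type - for example, the Content-Type header a Web server sends along with the file, such as text/html for an HTML page.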

Here, we're only concerned about plain text files with HTML formatting.

How a Browser Reads HTML


Plain text files with no HTML are intended entirely for humans to read. The browser doesn't need to understand any of it - all it needs to do is send the characters straight to the screen (pretty much). But HTML documents contain two types of information: only part of it is for us humans. The rest of it is a set of directions for the browser.

How does a browser know it's looking at an HTML file? The first few characters must identify the file as HTML! Here are two ways:

  1. The simple HTML tag:
    <HTML>
  2. The more specific Document Type Declaration - here is an example:
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
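The two identifiers can also appear together, with the document type line first and the HTML tag right after it. The fragment below is only a sketch of that arrangement, not a complete page; the comment in the middle is a placeholder invented for this illustration.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML>

<!-- The rest of the page would go here -->

</HTML>
```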

 

For Browsers Only

What's in it for Me?

 


In a way, an HTML file is like a script for a movie or a play. Part of the material is for the audience to hear, but the rest identifies the characters speaking and gives the stage directions. When actors read the script, they memorize the lines they need to speak for the audience, and they act out the stage directions with their movements. They have to understand how to read the script and tell the difference between the stage directions and the speeches.

Similarly, once the browser knows it's looking at an HTML document, it begins separating the instructions about how to make the page "pretty" from the information for humans. How does it do that, if the file is all sent in "plain text"?

HTML Delimiters

 


All the instructions in an HTML file are marked off or delimited by special characters. These always work in pairs, to show the beginning and the end of each HTML tag or other special instruction, such as special characters that don't occur on all keyboards.

  • For instruction tags, the delimiters are the angle-brackets: < and > Examples:
    <b> <i> <HTML>
  • For special characters, the delimiters are the ampersand & and the semicolon ; Examples:
    &cent; &pound; &yen; &#169;
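As a rough illustration, here is one line that uses both kinds of delimiters at once. The sentence itself is made up for this sketch.

```html
<!-- Tags are delimited by < and >; special characters by & and ; -->
The toll is <b>25&cent;</b> in the US and <i>&pound;1</i> in the UK. &#169; 2000
```

A browser should show the first amount in bold with a ¢ sign, the second in italic with a £ sign, and a © symbol before the year. Everything outside the delimiters passes through as ordinary text.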

Let's look at how tags and special characters work...

Tags


Just as tags are surrounded by delimiters, text for humans is surrounded by tags. Most tags (but not all) work in pairs: the second is usually like the first, but with a forward slash / to show that it's closing. Here are a couple of examples:

To make some text bold:

HTML: The mayor was <b>very pleased</b> with your work.
Display by browser: The mayor was very pleased with your work.

To make some text italic:

HTML: The mayor was <i>very pleased</i> with your work.
Display by browser: The mayor was very pleased with your work.

Tags can also be nested - that is, you can put one set of tags inside another.

To make some text both bold and italic:

HTML: The mayor was <b><i>very pleased</i></b> with your work.
Display by browser: The mayor was very pleased with your work.

Caution! Tags have to be closed in the reverse of the order in which they were opened - in other words, they must be nested neatly!

Bad example:

HTML: The mayor was <b><i>very pleased</b></i> with your work.
Display by browser: The mayor was very pleased with your work.

What does a browser do when it finds a bad example? In most cases, it tries to ignore the problem and do the best it can without bothering the reader. This is good for the reader, but not necessarily good all around. The browser may become confused and not display things correctly, and in some cases it may even crash or cause an "illegal operation" error. Good coding of HTML will avoid these problems.

Notice that HTML tags can be either upper or lower case letters: the browser doesn't care.
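For instance, assuming a browser that follows the case rule just described, the two sentences below should display identically, each with "very pleased" in bold. The <br> tag, which starts a new line, is used here only to keep the two sentences apart.

```html
The mayor was <B>very pleased</B> with your work.
<br>
The mayor was <b>very pleased</b> with your work.
```

Picking one style and sticking to it makes the file easier for fellow coders to read, even though the browser accepts either.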

Special Characters


Special characters are any that are not among the standard 128 defined by ASCII - the American Standard Code for Information Interchange - as well as any that have special uses in HTML or may not be present on all keyboards. Many special characters have names to make them easy to remember; all of them have numbers. Some are shown here:

Character   Name        Number
<           &lt;        &#60;
>           &gt;        &#62;
&           &amp;       &#38;
¢           &cent;      &#162;
¿           &iquest;    &#191;
©           &copy;      &#169;
™           &trade;     &#8482;
®           &reg;       &#174;

The names of some special characters are case-sensitive: in other words, it makes a difference if you use capitals or lower-case letters. This is particularly true of the names for accented letters and letters used in some languages.
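Because < and & are themselves delimiters, they have to be written by name (or number) whenever they should appear on screen as ordinary characters. The fragment below is a small sketch of that idea; the sample text, including the business name, is invented for illustration.

```html
<!-- To show a tag on screen instead of having the browser obey it, -->
<!-- write its angle brackets as &lt; and &gt; -->
Use the &lt;b&gt; tag for bold text.

<!-- A literal ampersand is written as &amp; -->
Smith &amp; Jones, &copy; 2000

<!-- Case matters in names: &Eacute; is the capital letter, &eacute; the small one -->
&Eacute;cole, caf&eacute;
```

A browser should display this as: Use the <b> tag for bold text. Smith & Jones, © 2000 École, café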

 

Organizing an HTML File

Main Parts

 


Each HTML file begins with a tag that identifies it as an HTML file. The simplest and most common is the <HTML> tag, with the </HTML> tag to balance it at the end of the document.

At any point in a document, you may put comment tags, which begin with <!-- and end with -->. These are (mainly) ignored by the browser, and not displayed or printed. They are just for the benefit of our fellow HTML coders! Example:

<HTML>

<!--This is a comment-->

<!-- This is not a complete HTML file: more tags need to be put in.

-->

</HTML>

 

Every well-formed HTML file consists of two parts: the head and the body. Let's look at each...

The Head

 


The head is the first part of the file. (Duh...) Its purpose is to give information about the document as a whole, and none of it shows in the main browser window. Here are some things that go in the head:

  • The title. Every Web page should have a title! Titles appear in the Title Bar at the top of the browser window, and they also appear in search engine results, history lists, and many other places.
  • Meta tags. These are tags that give background information, such as the name of the author, a brief description of the page, keywords, and more.
  • Styles. In the latest versions of HTML, designers can set up their own display instructions for the browser, specifying fonts, type sizes, colors, and many other features. These "styles" can be defined in the head of the document.
  • Script functions. Scripting languages like JavaScript can define procedures that perform actions. A commonly used example: "rollovers," images that change when the mouse pointer rolls over them. These functions are usually defined in the head area.

Here is a page with its head and a title defined:

<HTML>

<head>

<title>How to Put a Head in a Web Page</title>

</head>

<!-- This is not a complete HTML file: it still needs a body!

-->

</HTML>
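The other items in the list of head contents can be sketched the same way. In the example below, the author name, description, keywords, style rule, and the showGreeting function are all invented for illustration; a simple alert stands in for a real script such as a rollover.

```html
<HTML>

<head>

<title>How to Put a Head in a Web Page</title>

<!-- Meta tags: background information about the page -->
<meta name="author" content="A. Student">
<meta name="description" content="A sample page showing what goes in the head">
<meta name="keywords" content="HTML, head, title">

<!-- Styles: display instructions for the browser -->
<style type="text/css">
  body { font-family: Arial, sans-serif; color: navy; }
</style>

<!-- Script functions: defined here, to be used later in the body -->
<script type="text/javascript">
  function showGreeting() {
    alert("Hello from the head!");
  }
</script>

</head>

<!-- This is not a complete HTML file: it still needs a body! -->

</HTML>
```

As with the title, none of this appears in the main browser window; it only tells the browser (and search engines) about the page.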

 

The Body


The body is the part of the Web page we see. Until we put text in the body, our readers will have nothing to read!

<HTML>

<head>

<title>How to Put a Head in a Web Page</title>

</head>

<body>

Here is something for you to read!

</body>

</HTML>

Here's what the browser would display:

 

Here is something for you to read!

 

Not much to look at, but it's a good start!
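The body is also where the tags and special characters described earlier come into play. The page below is a sketch that extends the example with a little markup; the added sentences are sample text, and the <p> tag simply starts each new paragraph.

```html
<HTML>

<head>

<title>How to Put a Head in a Web Page</title>

</head>

<body>

<!-- Tags mark up the text; special characters use & and ; -->
<p>Here is something for you to read!</p>
<p>The mayor was <b><i>very pleased</i></b> with your work.</p>
<p>Copyright &#169; 2000</p>

</body>

</HTML>
```

A browser should show three short paragraphs, with "very pleased" in bold italic and a © symbol in the last line.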

 

HTML Flexibility


To understand how HTML works, it's important to understand what it's for. It was developed by Tim Berners-Lee and the European Particle Physics Laboratory (CERN) in Geneva, Switzerland, to provide a way for scientists to share their research results quickly and easily using the Internet. Because the Internet can connect with many kinds of computers and many kinds of display systems, HTML had to be very flexible.

Browsers are therefore given the responsibility to adapt an HTML file as best they can to the computer, the display, and the window in which it is running. The display could be high or low resolution, many-colored or color-challenged, long and narrow, short and wide, large or small. To make this work, several principles are employed:

  • Lines of text are ordinarily wrapped as needed to fit the screen
  • Lines of text in an HTML file are to be joined together unless separator tags instruct otherwise. Separator tags are:
    • <p> paragraph: start new line, leaving space between the lines
    • <br> break: start new line with no blank space between the lines
  • All "whitespace" in HTML files is treated the same - the browser shows one blank space. Whitespace includes these characters:
    • blank - ASCII 32
    • tab - ASCII 9
    • carriage return - ASCII 13
    • line feed - ASCII 10
  • Multiple whitespace characters are treated as one. If the HTML file contains ten blanks in a row, only one blank is displayed by the browser. If the HTML file has a space followed by two tabs and three carriage returns, only one blank is displayed.
  • Errors in the HTML code are not announced by the browser, and are ignored as much as possible

Here is an example illustrating this flexibility:

HTML:

<p><i>Genly Ai:</i> "The Ekumen wants an alliance with the nations of Gethen."
<br><i>King Argaven: </i>"What for?" <br><i>Genly Ai: </i>"Material profit.
Increase of knowledge.

The augmentation
of the complexity and intensity of the field of intelligent life.

The enrichment
of harmony and the
greater glory of God.
Curiosity.
Adventure.
Delight."
<ul>
<li>
from <i>The Left Hand of Darkness</i> by Ursula Le Guin</li>
</ul>

Display by browser (narrow):

Genly Ai: "The Ekumen wants an
alliance with the nations of
Gethen."
King Argaven: "What for?"
Genly Ai: "Material profit.
Increase of knowledge. The
augmentation of the complexity and
intensity of the field of
intelligent life. The enrichment of
harmony and the greater glory of
God. Curiosity. Adventure. Delight."

  • from The Left Hand of Darkness
    by Ursula Le Guin

Display by browser (wide):

Genly Ai: "The Ekumen wants an alliance with the nations of Gethen."
King Argaven: "What for?"
Genly Ai: "Material profit. Increase of knowledge. The augmentation of the complexity and intensity of the field of intelligent life. The enrichment of harmony and the greater glory of God. Curiosity. Adventure. Delight."

  • from The Left Hand of Darkness by Ursula Le Guin
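A smaller sketch isolates the whitespace rule on its own. The words are placeholders; what matters is the run of ten blanks after "one" and the blank lines before "four".

```html
<!-- Ten blanks and several line breaks: each run displays as one blank -->
<p>one          two
three


four
<br>five
```

Assuming a browser that applies the principles listed above, the display would be "one two three four" on one line and "five" on the next, because only the <br> tag starts a new line.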

The result of these principles is that HTML seems to give more control to the browser than to the HTML designer. This can lead to frustrated designers, but it is (at least partly) responsible for the explosive growth of the World Wide Web.

 

About this document...

Audience:

This is for people who know the basic concepts of the World Wide Web and Web pages, and want to know how HTML works.

Objectives:

When you successfully complete this lesson, you will be able to...

  1. Explain the process by which a browser reads a text file with HTML and converts it to a formatted display;
  2. Describe the difference between text with information for a browser, and text with information for human beings;
  3. Explain how a browser identifies the two kinds of text by looking for special characters used as delimiters;
  4. Identify the delimiter characters used in HTML;
  5. Identify the two main parts of an HTML file: the head and the body;
  6. Identify opening and closing tags customary and required on all HTML pages;
  7. Identify and explain the purpose of the <head> section;
  8. Identify and explain the purpose of the <body> section;
  9. Identify and explain the purpose of the <title> tag;
  10. Identify and explain the purpose of the <p> paragraph tag.

Module W22c:

This document is part of a modular instruction series in Computer Information Systems. For more information, see the overview or the list of modules in this series, W: World Wide Web. This document has been used in the following classes: INP 165, Basic HTML; INP 140, Building a Web Site.

Author:

Laurence J. Krieg

Institution:

Internet Professional Department, Washtenaw Community College

History:

Original: 19 Oct 2000
Last modification: Thursday, 09-Sep-2004 11:17:19 EDT

Copyright:

Copyright © 2000-2004, Laurence J. Krieg, Washtenaw Community College.
Instructors: You may point to this file in your Web-based materials.
Students: You may make a copy for your personal use.
All other uses: contact the author, Laurence J. Krieg, for permission. Email krieg@ieee.org