Lecture 10: Comments, Special Characters, and Meta Data
Overview:
- (X)HTML Comments
- Special Characters
- Overview of Meta Tags
- Describing a Document
- Indicating a Content Type
- Giving Instructions to Bots
- Redirects
- Caching and Expiration
(X)HTML Comments
(X)HTML comments are coded as:
<!-- This is a comment -->
- <!-- always begins the comment
- --> always ends the comment
- I recommend leaving at least one space after <!-- and at least one space before -->, so the comment text does not touch either delimiter.
- Comments do not render on the page.
- Comments are used for a variety of reasons, including identifying the author of the code, inserting reminders, or providing explanations for the current (or future) authors about something in the code.
- Comments can even be used when troubleshooting rendering problems, because whatever code is inside the comments will not be rendered. You start by putting a large area of the code inside comments and ensure that the rendering problem is gone. Then you systematically move the comments to smaller and smaller segments of code, each time checking to see if the rendering problem has returned. Through this process you will eventually isolate the code that is causing the rendering issue.
- Multiple sets of comments can be included in a document; there is no limit to this.
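As a rough sketch of the troubleshooting approach described above, the snippet below comments out one suspect block so that the browser skips it while the rest of the page still renders. The id values and content are hypothetical, made up for illustration. Keep in mind that comments cannot be nested and that the text inside a comment should not contain two hyphens in a row, so a block that already contains comments has to be split around them.

```html
<div id="header">
  <h1>Page Title</h1>
</div>

<!-- Troubleshooting: sidebar temporarily disabled to see whether it causes the layout problem
<div id="sidebar">
  <p>Suspect content goes here.</p>
</div>
-->

<div id="content">
  <p>Main content still renders normally.</p>
</div>
```

If the problem disappears, the sidebar is the likely cause; restore it and comment out smaller pieces inside it to narrow things down further.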
Special Characters
- One of the readings for today covered the character entity references in HTML 4 and XHTML 1.0. These represent characters that either cannot be entered via the keyboard or have special meaning in the code if entered directly; coding them as special characters avoids that problem.
- Each special character can be written two ways: with a numerical value that identifies it or with a name (a short series of characters) that identifies it.
- Two of the most commonly used special characters are &amp; (creates an ampersand; can also be coded as &#38;) and &nbsp; (creates a non-breaking space; can also be coded as &#160;).
- The ampersand needs to be encoded because the regular character (the one typed in from the keyboard) has its own meaning in code: it begins every special character, and it separates values when data is passed from web pages to scripts. An ampersand that is part of your page content needs to be distinguished from those uses.
- Non-breaking spaces render as regular spaces but prevent the content being joined together from wrapping to new lines when the window is too small. I recommend using these sparingly.
- Typically it is not necessary to encode quotes as &quot; (or &#34;) in your page content, but if issues arise you can use the special character. These would be quotes in the text on the page, not in your tags.
- To prevent confusion with tags, content that involves greater-than signs (&gt; or &#62;) or less-than signs (&lt; or &#60;) should have those characters encoded; otherwise they could be confused as the start or end of tags.
- Important: Always start a special character with & and end it with a semicolon. Also be aware that old browsers (some version 4 browsers and earlier browsers) may not support all of the special characters.
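A short sketch of these special characters used in page content follows. The sentences themselves are made up for illustration; only the entity names and numbers come from the HTML 4 / XHTML 1.0 entity lists.

```html
<!-- Named forms -->
<p>Research &amp; Development</p>
<p>Use the &lt;p&gt; tag to start a paragraph.</p>
<p>She said, &quot;Hello.&quot;</p>
<p>Lecture&nbsp;10 stays together on one line.</p>

<!-- Numeric forms of the same characters -->
<p>Research &#38; Development</p>
<p>Use the &#60;p&#62; tag to start a paragraph.</p>
<p>She said, &#34;Hello.&#34;</p>
<p>Lecture&#160;10 stays together on one line.</p>
```

Both sets of paragraphs should render identically; the named forms are usually easier to read in the source.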
Overview of Meta Tags
- Meta tags are always located between the <head> </head> tags and provide information about the document or instructions for devices/programs reading the document.
- Meta tags are coded as <meta />, which means that they need to self-terminate (there is no closing tag).
- Meta tags do not render on the page; the user could only see them by viewing the source code.
| Attribute | Usage and Effect | Values Accepted | Default | Deprecated? |
|---|---|---|---|---|
| content | This required attribute gives the information about the document or the instructions to the device/program reading the file. | Text | Depends on usage | Not deprecated |
| http-equiv | Used for instructions to the device/program; typically these are for HTTP header information. | Text | | Not deprecated |
| name | Defines the type of information about the document. | Text | | Not deprecated |
Describing a Document
- The most common meta tags for describing a document concern its keywords and a written summary/overview of its content.
The following keywords could be used by a search engine spider to match the page to the user's search terms:
<meta name="keywords" content="Jason Withrow washtenaw community college internet professional program" />
The following description could be displayed by a search engine in its results:
<meta name="description" content="Instructional site for Jason Withrow, Internet Professional instructor at Washtenaw Community College." />
- Due to abuse of the keywords meta tag (authors have used words not related to their page in order to increase the odds that someone would reach the page) most spiders do not use them.
- Given this lack of support, it is questionable whether keywords should be provided, though many authors still include them.
- If keywords are used, they should come from the content of the page.
- Keep the content for your description meta tag short and when writing it try to imagine how it would read to someone scanning through a list of search engine results.
- One final note is that keywords and description should be about that particular page, not about the entire website. Your goal is to help users locate the exact page that meets their information needs, out of all the pages on the website and out of the entire World Wide Web.
- You might also see people using name="author" with the content attribute giving the author's name (name="copyright" is also used), but spiders ignore those uses of <meta />.
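For reference, the author and copyright forms mentioned above would look roughly like this. The copyright wording and year are assumed placeholders, not values taken from an actual page.

```html
<meta name="author" content="Jason Withrow" />
<!-- Hypothetical copyright statement, for illustration only -->
<meta name="copyright" content="Copyright 2006 Jason Withrow" />
```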
Indicating a Content Type
One meta tag that we have used since the beginning of the class uses http-equiv to pass the 'Content-Type' header to the browser.
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
Giving Instructions to Bots
- Spiders are one type of bot (program) that moves around the Web gathering information from web pages.
The robots meta tag allows you to give some directions to these bots. The following code would instruct the bot to not index the page content (noindex) and to not follow links on that page (nofollow):
<meta name="robots" content="noindex,nofollow" />
- Specifying content="none" should be equivalent to the code above (indicating both noindex and nofollow).
- While you could specify content="index,follow" or content="all", those are the defaults and why the bots are visiting your page. Should you have to remind them why they are there? I never indicate those values.
- When would you want to instruct a bot to not index or not follow?
- Note that these are just recommendations; bots could ignore your instructions.
- There is also a robots.txt file that can reside on the web server and instructs bots concerning what pages they can and cannot visit.
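The two directives can also be set independently. As a small sketch, either of the following could be used on a page (one or the other, not both), depending on which behavior you want to restrict:

```html
<!-- Keep this page out of the index, but still follow its links -->
<meta name="robots" content="noindex,follow" />

<!-- Index this page, but do not follow its links -->
<meta name="robots" content="index,nofollow" />
```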
Redirects
- Meta tags can also be used to redirect the user to a new page. This ability to automatically request a new page from the server is called client-pull.
- This is especially useful if the URL for a page has changed. Put up a dummy page at the old URL, which automatically redirects the user to the new URL.
- Client-pull uses the http-equiv attribute, rather than the name attribute.
The code to redirect the user to a new URL could be:
<meta http-equiv="refresh" content="5; URL=http://www.google.com" />
- The number (in this case 5) specifies the number of seconds before the new URL (www.google.com) loads. This value can be set to 0 which loads the new page as soon as the current one has finished downloading.
- Note that the content attribute has only a single set of quotation marks, wrapped around the delay and the URL together, not two sets.
- As a courtesy to your users, always inform them that they will be redirected and provide them with the new URL (just in case the redirect is not configured correctly). In general, do not just redirect them without any warning.
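- You can also have the page refresh by not including a URL in your meta tag. This code would cause the page to be refreshed after 10 seconds; because the reloaded page contains the same meta tag, the refresh repeats every 10 seconds for as long as the page stays open: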
<meta http-equiv="refresh" content="10" />
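Putting the redirect advice together, a dummy page left at an old URL might look something like the sketch below. The destination address (www.example.com/new-page.html) and the wording of the notice are hypothetical; substitute your own.

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>This Page Has Moved</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<!-- Wait 5 seconds, then request the new URL (hypothetical address) -->
<meta http-equiv="refresh" content="5; URL=http://www.example.com/new-page.html" />
</head>
<body>
<!-- Notice of redirect, as a courtesy to the user -->
<p>This page has moved. You will be redirected in 5 seconds.</p>
<!-- Fallback link in case the redirect does not work -->
<p>New location: <a href="http://www.example.com/new-page.html">http://www.example.com/new-page.html</a></p>
</body>
</html>
```

The visible link gives users a way forward even if their browser ignores the refresh instruction.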
Caching and Expiration
- Browsers will typically cache files, including images, stylesheets, and your (X)HTML code. Caching refers to storing a copy of the file on your computer, so that you are not continually requesting the same file as you visit other pages on the website. This is done to improve performance and reduce bandwidth usage.
- One of the major drawbacks of caching is that you could be looking at outdated information, which is even more of an issue for news sites or sites where content changes rapidly.
To ask browsers not to cache a page, and to accommodate the variety of devices in use on the web, three meta tags are specified:
<meta http-equiv="cache-control" content="no-cache" />
<meta http-equiv="pragma" content="no-cache" />
<meta http-equiv="expires" content="-1" />
- The cache-control meta tag corresponds to the Cache-Control header, which was introduced in HTTP 1.1. HTTP is HyperText Transfer Protocol, the dominant protocol used for Web traffic.
- The pragma meta tag corresponds to the Pragma header from HTTP 1.0, which older software may still rely on.
- Both are included to cover our bases.
- The expires meta tag notes the date/time when the document is expired. Spiders could remove expired documents from their search listings. Expired documents would also not be cached, so each visit would cause the document to be retrieved from the server again.
- The content of the expires meta tag must be in GMT (Greenwich Mean Time) in order to be valid. The value given in the earlier example is invalid and intentionally so; invalid values indicate expiration now.
A valid expiration would be:
<meta http-equiv="expires" content="Mon, 19 Jun 2006 10:42:22 GMT" />
- Only use these three tags as necessary; for many websites you might not need them.
Full Example:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>Comments, Special Characters, and Meta Tags Example</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta name="robots" content="noindex,nofollow" />
<meta http-equiv="cache-control" content="no-cache" />
<meta http-equiv="pragma" content="no-cache" />
<meta http-equiv="expires" content="-1" />
<meta name="keywords" content="comments, special characters, meta tags, ampersand" />
<meta name="description" content="An example showing comments, special characters, and meta tags" />
<meta http-equiv="refresh" content="5; URL=http://www.google.com" />
<!-- Author: Jason Withrow -->
</head>
<body>
<!-- Example of an ampersand -->
<p>This is an ampersand: &amp;</p>
<!-- Notice of redirect to Google -->
<p>The page will change to Google (www.google.com) in 5 seconds.</p>
</body>
</html>