Overview |
MS Word is not primarily a Web page editor: Microsoft has other tools, notably FrontPage, intended for creating Web pages. However, many organizations have a wealth of information stored as MS Word documents, and it's often desirable to convert these for use on the Web.

Since Microsoft is very committed to the Internet and the Web, it's not surprising that they have put some effort into getting Word to convert documents into HTML. There's a basic problem, though: HTML has a much more restricted set of capabilities than any modern word processor, especially one as broad as MS Word. So unless the document is a very simple one, there will be features that don't convert to HTML. Word's HTML conversion also tends to distort spacing and fonts somewhat.

As a result, you'll just about always want to bring the output of Word's HTML converter into a Web page editor to see if you can restore some of the intended appearance. Often, you'll have to use substitutes or rethink the design to make it more Web oriented. But that's a good idea anyway: good paper-based designs often make poor Web-based designs!

I've put in a large table, based on MS Word's Help, that discusses all the major features of Word and what happens to them in HTML. But first, let's look at basic Word document conversion.
Converting a Word document to HTML is easy:
Let's look at some of the conversion details, first... |
Converting MS Word Special Features |
The following paragraph and table are found in MS Word Help under the title, "Learn what happens when you save a Word 97 document as a Web page." I have added my comments and suggestions to several of the topics.
From MS Word Help:

| Element | Word to HTML | Notes and Details |
|---|---|---|
| Comments | See note | Comments you insert with the Comments command on the Insert menu are removed. After saving the document in HTML format, however, you can enter comments and apply the Comments style. The comments will not appear when the Web page is displayed by a Web browser. |
| Font sizes | See note | Fonts are mapped to the closest HTML size available, which ranges from size 1 to 7. These numbers are not point sizes but are used as instructions for font sizes by Web browsers. Word displays the fonts in sizes ranging from 9 to 36. |
| Emboss, shadow, engrave, all caps, small caps, double strikethrough, and outline text effects (Format menu, Font command, Font tab) | No | These character formats are lost, but the text is retained. |
| Bold, strikethrough, italic, and underline effects | Yes | Some special underline effects, such as dotted underlines, are converted to a single underline, and some underline effects aren't converted. |
| Animated text (Format menu, Font command, Animation tab) | See note | Animations are lost, but the text is retained. For an animated effect, insert scrolling text into your page in the Web page authoring environment. |
| Graphics | See note | Graphics, such as pictures and clip art, are converted to GIF (.gif) format, unless the graphics are already in JPEG (.jpg) format. Drawing objects, such as text boxes and shapes, are not converted. Lines are converted to horizontal lines. |
| Tabs | Yes | Tabs are converted to the HTML tab character, represented in HTML source as &#9;. Tabs may appear as spaces in some Web browsers, so you may want to use indents or a table instead. |
| Fields | See note | Field results are converted to text; field codes are removed. For instance, if you insert a DATE field, the text of the date converts, but the date will not continue to update. |
| Tables of contents, tables of authorities, and indexes | See note | The information is converted, but indexes and tables of contents, figures, and authorities can't be updated automatically after conversion because they are based on field codes. The table of contents displays asterisks in place of the page numbers; these asterisks are hyperlinks that the reader can click to navigate through the Web page. You can replace the asterisks with text that you want to have displayed for the hyperlinks. |
| Drop caps | No | Drop caps are removed. In the Web page authoring environment, you can increase the size of one letter by selecting it and then clicking Increase Font Size. Or, if you have a graphic image of a letter, you can insert it in front of the text. |
| Drawing objects, such as AutoShapes, text effects, text boxes, and shadows | No | Drawing objects are not retained. You can use drawing tools in the Web page authoring environment by inserting Word Picture Objects. The object is converted to GIF format. |
| Equations, charts, and other OLE objects | See note | These items are converted to GIF images. The appearance is retained, but you won't be able to update these items. |
| Tables | Yes | Tables are converted, although settings that aren't supported in the Web page authoring environment are lost. Colored and variable width borders are not retained. |
| Table widths | See note | By default, tables are converted with a fixed width. To convert a table with percentage width (so that the table is sized relative to the browser window), set the option PercentageTableWidth=1 in the following Windows 95 Registry location: HKEY_LOCAL_MACHINE\Software\Microsoft\Shared Tools\Text Converters\Export\HTML\Options |
| Highlighting | No | Highlighting is lost. |
| Revision marks | No | Changes entered with the track changes feature are retained, but the revision marks are removed. |
| Page numbering | No | Because an HTML document is considered a single Web page, regardless of its length, page numbering is removed. |
| Margins | No | To control the layout of your page, you can use a table. |
| Borders around paragraphs and words | No | You can place borders around a table, and you can use horizontal lines to help emphasize or separate parts of your Web page. |
| Page borders | No | There isn't an HTML equivalent for a page border. You can make your pages more attractive by adding a background using the Background command on the Format menu. You can also place borders around a table, and you can use horizontal lines to help emphasize or separate parts of your Web page. |
| Headers and footers | No | There aren't equivalents for headers and footers in HTML. |
| Footnotes and endnotes | No | |
| Newspaper columns | No | For a multicolumn effect, use tables. |
| Styles | See note | User-defined styles are converted to direct formatting, provided the formatting is supported in HTML. For instance, if you convert a style that includes bold and shadow formatting, the bold formatting is retained as a direct formatting, but the shadow formatting is lost. |
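
The Tabs entry in the table above warns that converted tabs may show up as ordinary spaces in some browsers. If you would rather fix this after conversion than restructure the Word document, a small script can do the substitution. The following is only a sketch in Python, not part of Word or of any Web editor. It assumes the converted file writes each tab as the entity `&#9;`, and that a short run of non-breaking spaces is an acceptable stand-in. The function name `replace_tab_entities` and the default width of 4 are made up for illustration.

```python
def replace_tab_entities(html_text, width=4):
    """Replace each HTML tab entity with a run of non-breaking spaces.

    Assumption: the converter wrote tabs as the entity "&#9;".
    'width' is an arbitrary choice for this sketch, not a Word setting.
    """
    return html_text.replace("&#9;", "&nbsp;" * width)


if __name__ == "__main__":
    sample = "<P>Name&#9;Phone</P>"
    print(replace_tab_entities(sample, width=2))
```

For anything that needs to line up in columns, a table remains the more dependable choice, as the Help text suggests.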
MS Word Styles and What Happens to Them in HTML |
When you create an MS Word document, you can choose from several templates. One of them is listed as "Blank Web Page" under the Web Pages tab of the New dialog box. The actual template is normally stored in
C:\Program Files\Microsoft Office\Office\HTML.DOT
When you create or edit a Web page in Word, you get a choice of several HTML-related styles. One set of choices looks obvious but isn't: the series Heading 1 through Heading 6 appears to be the same as Netscape Composer's Heading 1 through Heading 6, which translate into HTML <H1> through <H6> tags. In MS Word, these do not produce the expected HTML tags! Instead, they produce custom-formatted text using various fonts and sizes. These are not bad, but they aren't standard either, and they depend on the availability of the font used, often Arial. To get genuine <H1> through <H6> tags in Word, use the MS Word styles H1 through H6, which are so far down in the list box that you can't see them without scrolling. This is illustrated in the following examples:
Microsoft HTML Template H1
Microsoft HTML Template Heading 1

Microsoft HTML Template H2
Microsoft HTML Template Heading 2

Microsoft HTML Template H3
Microsoft HTML Template Heading 3

Microsoft HTML Template H4
Microsoft HTML Template Heading 4

Microsoft HTML Template H5
Microsoft HTML Template Heading 5

Microsoft HTML Template H6
Microsoft HTML Template Heading 6
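
One way to check which kind of heading ended up in a converted file is to count the tags. The Python sketch below is an illustration only: it uses the standard `html.parser` module to count genuine heading tags and FONT tags in a piece of HTML. The sample markup is hypothetical, since the exact HTML that Word writes for its Heading 1 through Heading 6 styles is not reproduced in this module, and the names `HeadingCounter` and `count_headings` are invented for this example.

```python
from html.parser import HTMLParser


class HeadingCounter(HTMLParser):
    """Counts genuine H1..H6 start tags and FONT start tags."""

    HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.headings = 0
        self.font_tags = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lower case, so <H1> arrives as "h1".
        if tag in self.HEADING_TAGS:
            self.headings += 1
        elif tag == "font":
            self.font_tags += 1


def count_headings(html_text):
    """Return (number of heading tags, number of FONT tags)."""
    parser = HeadingCounter()
    parser.feed(html_text)
    parser.close()
    return parser.headings, parser.font_tags


if __name__ == "__main__":
    # Hypothetical sample: one real heading, one font-formatted look-alike.
    sample = '<H1>Real heading</H1><P><FONT FACE="Arial" SIZE=5>Look-alike</FONT></P>'
    print(count_headings(sample))
```

A page that reports no heading tags but several FONT tags was probably built with the Heading 1 through Heading 6 styles rather than H1 through H6.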
Graphics and Other Objects |
Word accepts many types of "objects" and displays them: images, drawings,
"Word Art," sounds, video clips, and imports from many other programs that
use Microsoft's "Object Linking and Embedding" (OLE) standard. Only a few
of these are converted when a document is translated into HTML. Here are
details, together with some advice on how to get a few of them into HTML...
Pictures in the Word document are put into the Web page this way:
From MS Word Help: "The Graphics Interchange Format filter (Gifimp32.flt) supports file format versions GIF87a (including interlacing) and GIF89a (including interlacing and transparency). The GIF filter works with the Portable Network Graphics filter (Png32.flt) to import GIF files into Word. The GIF filter is also used by the HTML converter to export pictures in a Word document to .gif images linked to an HTML page. The GIF filter has the following limitation: Only the first image of a multiimage [animated] GIF is imported." |
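
The two versions named in the Help text can be told apart by looking at the file itself: a GIF file begins with the six ASCII characters GIF87a or GIF89a. The following Python sketch reads that signature. It is an illustration only and has nothing to do with how Word's filter works internally; the function name `gif_version` and the file name in the usage comment are made up.

```python
def gif_version(data):
    """Return 'GIF87a' or 'GIF89a' if the bytes start with a GIF signature.

    Returns None for anything else. Only the first six bytes are examined.
    """
    signature = data[:6]
    if signature in (b"GIF87a", b"GIF89a"):
        return signature.decode("ascii")
    return None


# Usage sketch ("picture.gif" is a hypothetical file name):
#     with open("picture.gif", "rb") as f:
#         print(gif_version(f.read(6)))
```

Transparency is a GIF89a feature, so a file that reports GIF87a will not have a transparent background.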
Drawings are objects created by using the Microsoft Draw toolbar, or the older Microsoft Draw editor. They differ from the usual picture formats in that they are "vector graphics": a collection of shape and color objects in which each object is defined by the coefficients of the equation that describes its shape. As you might guess, translating these to HTML is not straightforward! But MS Draw is a very useful way of creating charts and diagrams, so it's worth knowing how to bring them onto the Web.
The simplest way to get these images to the Web is to use copy-and-paste. Some (but not all) graphics programs know how to accept Word Art and Microsoft Drawings. The general idea is:
Graphics Programs

Take a look at module W47c for more information about graphics programs. If you have a full-service graphics editor like Photoshop or PaintShop, you can use the copy-and-paste method described in the preceding section. If you don't have these tools, here's a "work-around" for converting Word Art and Drawing objects in Word to Web graphics using only tools from MS Windows 95 and MS Office Professional.
The wide variety of multimedia now available can be linked to Web pages and played either by the browsers themselves or by "plug-ins". This module doesn't cover plug-ins, but the good news for Web page creators is that all you have to do is create a link to the multimedia file and FTP the file to your Web server. The browser will do the rest, including telling users if they need to install a plug-in and, often, where to download it.
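
The link itself is an ordinary HTML anchor. If you have many media files to link, you could generate the anchors with a few lines of Python, as in the sketch below. This is an illustration only: the function name `media_link`, the file name, and the link text are hypothetical, and uploading the file to the server is still a separate step.

```python
from html import escape


def media_link(url, label):
    """Build an HTML anchor that points at a multimedia file.

    Both values are escaped so that characters such as & or quotes
    do not break the markup.
    """
    return '<a href="{}">{}</a>'.format(escape(url, quote=True), escape(label))


if __name__ == "__main__":
    # Hypothetical file name and link text.
    print(media_link("sounds/intro.wav", "Play the introduction"))
```

Paste the resulting anchor into the page wherever the link should appear.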
About this document... |
| | |
|---|---|
| Audience: | This is for people who are familiar with MS Word and Web editors, and want to take existing Word documents and convert them to Web documents without losing any more of the formatting than is necessary. |
| Objectives: | When you successfully complete this lesson, you will be able to... |
| Module W36c: | This document is part of a modular instruction series in Computer Information Systems. For more information, see the overview or the list of modules in this series, W: World Wide Web. This document has been used in the following classes: CIS 260 |
| Author: | Laurence J. Krieg |
| Institution: | |
| History: | Original: 29 Nov 1998. Last modification: Wednesday, 07-Nov-2001 13:03:36 EST |
| Copyright: | Copyright © 1999, Laurence J. Krieg. Instructors: You may point to this file in your Web-based materials. Students: you may make a copy for your personal use. All other uses: contact the author, Laurence J. Krieg for permission. Email krieg@ieee.org |