Instructional Module X50c

DTDs, Schemas, and Namespaces

Overview

  1. What are XML Namespaces? - X50c-ns
  2. What is an XML DTD? - X50c-dt
  3. What is an XML Schema? - X50c-sc
  4. What's the Difference between DTDs and Schemas? - X50c-ds

Overview

Aside from XML markup itself, three keys to understanding XML-based ontologies are namespaces and the two techniques for defining XML languages: DTDs and schemas. Those are the focus of this module.

 


What are XML Namespaces?

In a Nutshell:

A "namespace" is simply a collection of names used for some purpose. When XML files are put on the Internet, element names need to stay unique no matter who created them. To make that possible, each namespace is identified by a URI, which distinguishes it from every other namespace. Within a document, a short abbreviation (a prefix) stands for the URI, making it easy to tell apart element names from different namespaces.

 

What is a Namespace?

OK, let's start by understanding in general what a "namespace" is:

A namespace is a collection of names used for some purpose or other.

Just what does this mean?

  • A collection can be systematic, like a library, or a hodge-podge, like a child's collection of pretty rocks.
  • A name is a symbol used to distinguish one thing from another.
    • If the collection of names is systematic, they will uniquely distinguish one thing from another.
    • If the collection of names is unsystematic, two things could have the same name. If both turn up in the same space (two students with the same name in one college classroom, for instance), some other way is needed to tell them apart.
  • Names are used for many purposes. Consider:
    • Names of people
    • Names of books
    • Names of variables in a small program
    • Names of variables in a large program, in a language where all variables are global (that is, no matter how many sub-parts the program is divided into, all the variable names are available everywhere).
Ask yourself: How do humans get around the problem of multiple people with the same name? Answer: Lots of ways, including nicknames and Social Security Numbers.
Ask yourself: How do computers (computer scientists, really) get around the problem of multiple variables with the same name? Answer: By inventing languages with local variables and then avoiding global variables, and by prefixing unique identifiers to variable names when the names must be global.

How does XML Create and Use Namespaces?

XML uses names for two things: elements, and attributes of the elements.

If XML files all contained element and attribute names invented by the same person who creates the file, and if XML files were not shared on the Internet, there would be no problem. Since neither of these conditions is true, we've got a problem!

How did XML get into this trouble?
  • The first answer is pretty obvious: XML is intended as a way to share information on the Internet, so isolation is not an option. All element names could potentially be shared in contexts where they might not be unique.
  • The second reason is the need to mix-and-match ontologies. In other words, we may need to create an XML file with information that doesn't fit neatly into one ontology. We'd like to use vocabulary from more than one source to properly describe our topic.
  • Consider the nature of XML, too: it is designed to be extended in a decentralized way. No central authority has to approve your XML vocabulary (validating software checks a document only against the definition it names), so nothing keeps two ontologies from using the same name for an XML element.
How did W3C Get Us Out of the Problem?

W3C's XML Core Working Group provided this two-part solution in 1999:

  1. A way to uniquely distinguish one ontology from another globally: use Uniform Resource Identifiers (URIs) to identify each ontology.
    • Why use URIs? Because the context of XML namespaces is the Internet, and the Internet uses URIs for the purpose of uniquely distinguishing one thing from another in its namespace.
    • Officially, that's the only reason to use URIs in XML namespaces: because they provide uniqueness.
  2. A way to create "nicknames" for each namespace you use, so you don't have to repeat the URI every time you want to distinguish one namespace from another.
Declaring a Namespace

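First, you need to declare the namespace. The standard way is with an xmlns attribute on an element, usually the root element; you can define a "nickname" or not, at your convenience.

The prolog is the part before the root element. Some early, pre-standard software accepted a namespace declaration there, written as a processing instruction:

<?XML:namespace ns="http://poggin.wccnet.edu/xml/movies" prefix="m"?>
<movies>

    <!-- ... -->
</movies>

This form predates the 1999 Recommendation and is not part of it (processing-instruction names beginning with "xml" are reserved), so a conforming namespace-aware parser does not treat it as a namespace declaration. The standard way is to declare the namespace in the root element (the element that contains all the others) or in another element. First, without a nickname: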

<movies xmlns="http://poggin.wccnet.edu/xml/movies">
    <!-- ... -->
</movies>

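This identifies the namespace uniquely. Because there is no "nickname", it also becomes the default namespace for movies and for every unprefixed element name between <movies> and </movies>. Unprefixed attribute names are not affected: a default namespace never applies to attributes.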

In the root element with a "nickname":

<movies xmlns:m="http://poggin.wccnet.edu/xml/movies">
    <!-- ... -->
</movies>

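This makes it possible to use the "nickname" m to mark an element or attribute name as part of the namespace declared as "http://poggin.wccnet.edu/xml/movies". The movies element itself carries no prefix here, so it is not in that namespace; to put it there, write m:movies.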

Using Namespace "Nicknames"

The "nickname" is put in front of an element or attribute name, separated from it by a colon ":" (and no space).

<m:movies xmlns:m="http://poggin.wccnet.edu/xml/movies">
    <m:title>Memoirs of a Geisha</m:title>
    <m:contributor m:role="director">Rob Marshall</m:contributor>
    <m:contributor m:role="female lead">Ziyi Zhang </m:contributor>

</m:movies>

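In the example above, every element and every attribute is prefixed with the "nickname" m we gave to the namespace. This works, but it gets tiresome if you're typing it all yourself. Another option is to declare the namespace as the default, so you don't have to use the "nickname" on element names. The result is not quite identical: unprefixed attributes such as role below belong to no namespace, whereas m:role above belongs to the movies namespace. Many XML vocabularies leave their attributes unprefixed anyway. Here's how: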

<movies xmlns="http://poggin.wccnet.edu/xml/movies">
    <title>Memoirs of a Geisha</title>
    <contributor role="director">Rob Marshall</contributor>
    <contributor role="female lead">Ziyi Zhang </contributor>

</movies>
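Parsing both versions shows what a namespace-aware parser makes of the prefixed and default-namespace forms. The sketch below is a minimal illustration that assumes Python 3 and its standard xml.etree.ElementTree module, which writes a namespaced name as {URI}localname. The documents are trimmed-down copies of the examples above, and the helper name describe is hypothetical.

```python
import xml.etree.ElementTree as ET

MOVIES_NS = "http://poggin.wccnet.edu/xml/movies"

# Every element and attribute carries the "m" prefix.
PREFIXED = """<m:movies xmlns:m="http://poggin.wccnet.edu/xml/movies">
    <m:contributor m:role="director">Rob Marshall</m:contributor>
</m:movies>"""

# The same namespace declared as the default; no prefixes at all.
DEFAULTED = """<movies xmlns="http://poggin.wccnet.edu/xml/movies">
    <contributor role="director">Rob Marshall</contributor>
</movies>"""


def describe(text):
    """Return (root tag, first child's tag, first child's attribute names)."""
    root = ET.fromstring(text)
    child = root[0]  # the contributor element
    return root.tag, child.tag, sorted(child.attrib)


for label, text in (("prefixed", PREFIXED), ("default", DEFAULTED)):
    print(label, describe(text))
```

Both versions produce the same element names, {http://poggin.wccnet.edu/xml/movies}movies and {http://poggin.wccnet.edu/xml/movies}contributor. Only the attribute differs: the prefixed version yields {http://poggin.wccnet.edu/xml/movies}role, and the default-namespace version yields plain role.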

Let's look at an example that uses two namespaces: our own for movies, and the Dublin Core namespace.

<movies xmlns="http://poggin.wccnet.edu/xml/movies"
xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>Memoirs of a Geisha</dc:title>
    <contributor role="director">Rob Marshall</contributor>
    <contributor role="female lead">Ziyi Zhang </contributor>
</movies>

What we've done here is to declare our movies namespace as the default. We did this by writing plain xmlns, with no :m after it. But we used the Dublin Core title element, prefixing it with the "nickname" dc declared in the movies element.

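A note about the terminology:
  • I've used the term "nickname", but the official term is namespace prefix.
  • The part of a name after the prefix, or the whole name if there is no prefix, is its local name; a name without a prefix is often called unqualified.
  • A name written with a prefix, such as dc:title, is a qualified name, or QName.
  • Once a parser replaces the prefix with the URI it stands for, the resulting pair (namespace URI plus local name) is called an expanded name, sometimes informally a universal name.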
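Naming becomes concrete once a parser is involved. The parser discards the prefix and keeps only the namespace URI plus the local name. This sketch again assumes Python 3's standard xml.etree.ElementTree. It parses the two-namespace example with two different Dublin Core prefixes and shows that only the URI matters. The dublin prefix and the helper child_name are invented for illustration.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"

# The two-namespace example from above, trimmed to its dc:title element.
WITH_DC = """<movies xmlns="http://poggin.wccnet.edu/xml/movies"
        xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>Memoirs of a Geisha</dc:title>
</movies>"""

# The same document with the Dublin Core prefix spelled "dublin" instead of "dc".
WITH_DUBLIN = """<movies xmlns="http://poggin.wccnet.edu/xml/movies"
        xmlns:dublin="http://purl.org/dc/elements/1.1/">
    <dublin:title>Memoirs of a Geisha</dublin:title>
</movies>"""


def child_name(text):
    """Return the expanded name ElementTree reports for the first child element."""
    return ET.fromstring(text)[0].tag


print(child_name(WITH_DC))      # {http://purl.org/dc/elements/1.1/}title
print(child_name(WITH_DUBLIN))  # the same expanded name: only the URI counts

# When searching, the caller picks the prefix used in the query, not the document.
root = ET.fromstring(WITH_DUBLIN)
print(root.find("dc:title", {"dc": DC}).text)  # Memoirs of a Geisha
```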

Official and Unofficial Namespace Use

The W3C Recommendation "Namespaces in XML" gives the syntax for declaring namespaces. Its sole purpose is to distinguish between names used in different ontologies.

The URI used in declaring a namespace looks as if it should lead to a definition of that namespace, but officially it doesn't. In fact, the URI does not even have to be a URL (a Uniform Resource Locator), because it need not have any location associated with it. If you prefer, you could use your telephone number as a URI. A tel: URI (the scheme is defined in RFC 3966) is officially legitimate under RFC 3986, which defines the generic syntax of URIs!

<movies xmlns:m="tel:+1-734-973-3311">

Officially, any URI that is unique in the global context is a legitimate namespace identifier.
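Here is a minimal sketch of how a parser treats the telephone-number namespace above. It assumes Python 3's standard xml.etree.ElementTree. The parser accepts the URI without trying to look it up and uses the string only as a label.

```python
import xml.etree.ElementTree as ET

# A namespace identified by a telephone URI instead of a URL.
DOC = """<m:movies xmlns:m="tel:+1-734-973-3311">
    <m:title>Memoirs of a Geisha</m:title>
</m:movies>"""

# The parser never looks the URI up (nothing is fetched, and nobody is dialed);
# it simply uses the URI string as the namespace part of each expanded name.
root = ET.fromstring(DOC)
print(root.tag)     # {tel:+1-734-973-3311}movies
print(root[0].tag)  # {tel:+1-734-973-3311}title
```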

But...

Widely accepted "best practice" is to use the URL that points to the definition of the namespace - either a DTD or a schema. This is helpful for two reasons:

  1. This makes it possible for any human who needs to know how this namespace is organized, to follow the link and find more information. (I suppose a phone number could be used this way, but it's certainly not as helpful, and would be a real pain for the author!)
  2. It is also possible for software to follow the link and use information at the other end to validate the document against the definition of the namespace.

So whenever possible, a namespace URI should point to a definition of the namespace. Again: this is best practice, not an official requirement.


What is an XML DTD?

In a Nutshell:

A DTD is a Document Type Definition. The DTD defines each element, the content it may contain, and its attributes. A DTD's statements begin with <! and end with >.


DTDs: the Concept

DTD: Document Type Definition

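DTDs, in their present form, emerged as part of SGML in the 1980s. They are written in a declaration syntax of their own rather than in XML. Their content models use grammar operators (such as |, ?, * and +) in the style of Backus-Naur Form, a notation for expressing technical specifications and rules that is widely used in computer science.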

The purpose of a DTD is to define the elements and attributes of a document type - the variety of markup, whether XML or SGML - in such a way that both humans and software can use it.

Recognizing DTDs

DTDs can be either separate documents, or part of the XML file. Either way, you can recognize them by their delimiters:

<! and >

XML documents often have a Document Type Declaration in their prolog. This refers to the Document Type Definition elsewhere. For example, an XHTML document declares its type this way:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

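Software reads a Document Type Declaration as three pieces of information: the root element name, a public identifier, and a system identifier (the URL of the definition). The sketch below pulls those pieces out of the XHTML declaration above. It assumes Python 3's standard xml.parsers.expat module and a hypothetical helper, read_doctype. Expat is a non-validating parser and does not download the external DTD on its own.

```python
import xml.parsers.expat

# A minimal XHTML document using the Document Type Declaration shown above.
XHTML = """<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"><head><title>Test</title></head><body></body></html>
"""


def read_doctype(text):
    """Return the root name, public ID, and system ID from the DOCTYPE."""
    found = {}

    def on_doctype(name, system_id, public_id, has_internal_subset):
        found.update(name=name, public_id=public_id, system_id=system_id,
                     internal_subset=bool(has_internal_subset))

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(text, True)  # the external DTD is only referenced, never fetched
    return found


print(read_doctype(XHTML))
```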
The Document Type Definition itself (either in a separate document or in the XML file) typically consists of definitions of entities, elements, and their attributes, in the general form:

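<!ENTITY name "content">          (a general entity, used in the document as &name;)
<!ENTITY % name "content">        (a parameter entity, used inside the DTD as %name;)
<!ELEMENT name (child elements)>
<!ATTLIST element-name list of attributes and their features>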
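A DTD is meant for software as well as humans, so a parser can report each declaration it reads. This sketch assumes Python 3's standard xml.parsers.expat module. The small movies DTD and the helper list_declarations are hypothetical examples; the DTD includes a general entity, wcc, written without the %. Expat does not validate, so it reports the declarations without checking the document against them.

```python
import xml.parsers.expat

# A hypothetical movies document whose DTD sits in the internal subset [ ... ].
DOC = """<?xml version="1.0"?>
<!DOCTYPE movies [
  <!ENTITY wcc "Washtenaw Community College">
  <!ELEMENT movies (title, contributor*)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT contributor (#PCDATA)>
  <!ATTLIST contributor role CDATA #REQUIRED>
]>
<movies>
  <title>Memoirs of a Geisha</title>
  <contributor role="director">Rob Marshall</contributor>
</movies>
"""


def list_declarations(text):
    """Collect the ENTITY, ELEMENT and ATTLIST declarations Expat reports."""
    found = {"entities": [], "elements": [], "attributes": []}

    def on_entity(name, is_parameter, value, base, system_id, public_id, notation):
        found["entities"].append((name, value))

    def on_element(name, model):
        found["elements"].append(name)  # model describes the allowed content

    def on_attlist(element, attribute, att_type, default, required):
        found["attributes"].append((element, attribute, att_type, bool(required)))

    parser = xml.parsers.expat.ParserCreate()
    parser.EntityDeclHandler = on_entity
    parser.ElementDeclHandler = on_element
    parser.AttlistDeclHandler = on_attlist
    parser.Parse(text, True)
    return found


for kind, items in list_declarations(DOC).items():
    print(kind, items)
```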

More about DTDs in module X51c.

 


What is an XML Schema?

In a Nutshell:

An XML schema is a document that uses XML to define an XML language type, or document type. You can recognize a schema because it uses standard XML delimiters, begins with the <xs:schema> element as its root, and generally refers to the namespace of http://www.w3.org/2001/XMLSchema.


 

The Idea of Schemas

The idea behind schemas is to define types of XML using XML syntax. Validating software then needs to understand only XML syntax, rather than also having to interpret the separate declaration syntax of a DTD.

In addition, schemas are able to define more types of data format than DTDs do. This makes it possible for software to check much more precisely the validity of data entered in these files. In turn, this makes schema-defined XML more compatible with database management systems and other software that requires strict adherence to data types and formats.

Recognizing Schemas

XML schema documents are easy to recognize:

  • It is XML and usually has a prolog with <?xml version="1.0" encoding="utf-8"?> (the encoding doesn't matter for identification)
  • The root element is <schema ...>, usually written with a prefix such as <xs:schema ...> or <xsd:schema ...> (the prefix itself can be anything)
  • The namespace is usually http://www.w3.org/2001/XMLSchema
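The root element's prefix can vary, so a reliable way to recognize a W3C schema is to check the expanded name of the root element (namespace URI plus local name) rather than the literal text xs:schema. Here is a minimal sketch assuming Python 3's standard xml.etree.ElementTree. The helper is_w3c_schema, the tiny schemas, and the example.com look-alike are all hypothetical.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"


def is_w3c_schema(text):
    """True if the root element is 'schema' in the W3C XML Schema namespace."""
    return ET.fromstring(text).tag == "{%s}schema" % XSD_NS


# A tiny, hypothetical schema for the movies example, using the xs prefix.
XS_PREFIX = """<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="title" type="xs:string"/>
</xs:schema>"""

# The same schema with no prefix (XML Schema declared as the default namespace).
NO_PREFIX = """<?xml version="1.0"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema">
  <element name="title" type="string"/>
</schema>"""

# Looks like a schema by its prefix, but the (hypothetical) URI is someone else's.
LOOKALIKE = """<xs:schema xmlns:xs="http://example.com/not-xml-schema"/>"""

for label, text in (("xs prefix", XS_PREFIX), ("no prefix", NO_PREFIX),
                    ("look-alike", LOOKALIKE)):
    print(label, is_w3c_schema(text))
```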

Schemas can be constructed using any of several schema systems, the most widely used of which is that provided by W3C.

More about schemas in module X52c.

 


What's the Difference between DTDs and Schemas?

In a Nutshell:

DTDs and schemas both provide ways to describe XML languages. DTDs are an older way that uses its own non-XML syntax; schemas are a newer way that uses XML itself. Schemas provide greater simplicity and flexibility, but lack the ability to define "entities".

Learn more in this section about the difference between DTDs and schemas, and recommendations about which to use.

 

Comparing DTDs and Schemas

DTDs appeared with SGML, the ancestor of XML, in the early 1980s. Schemas, on the other hand, began appearing in the late 1990s along with XML itself. This chart outlines the differences:

Feature                       | DTDs                            | Schemas
Origin                        | 1982                            | 1999
Language                      | Own non-XML declaration syntax  | XML
Data types available          | 10                              | 38+
Entities available            | Yes                             | No
Data checking                 | Very general                    | Very specific
Software that understands it  | Almost all                      | Growing number (all new versions)
Learning curve                | Somewhat steeper                | Somewhat less steep
Varieties                     | 1 standard                      | 1 standard + 4-5 other forms widely used
More details                  | X51c                            | X52c
Ask yourself, before looking at the next section: Which would you recommend using for a new XML project: DTD, or schema? Answer: See "Recommendations" below.

Recommendations

Given the "competition" between DTDs and the various types of schema languages, what do you think is a wise course to follow when beginning a new project?

  1. Use DTDs?
  2. Use schemas?
  3. Use both?

First of all, most large projects don't start with totally new XML definitions. You would probably want to use DTDs if you're extending a large standard based on DTDs, or schemas if the standard is based on schemas. But that's not a hard-and-fast rule, since nothing prevents using a DTD in one namespace and a schema in another. There's a lot to be said for consistency, though!

The added data typing and checking available with schemas makes it possible to keep the data more consistent and accurate. Most current XML software can work with schemas, particularly the W3C standard variety.

Among the varieties of schema available, the W3C standard is almost always the best type to choose. Some varieties offer special features or capabilities not available in the W3C standard, but as with any choice between standard and non-standard, the standard offers the best long-term compatibility and path to the future.

What about DTDs? The one feature offered by DTDs that is not available in schemas is the ability to define entities. Entity references are short sequences of characters beginning with & and ending with ; that are replaced by other sequences of characters when an XML file is processed. The most familiar are the five "built-in" to the XML standard:

&lt; &gt; &amp; &quot; &apos;

which are rendered as:

< > & " '

Familiar HTML entities such as &eacute; (é) are not among them. In XML they work only when a DTD (such as the XHTML DTD) declares them, although a numeric character reference such as &#233; works anywhere.

(For more detail on standard entities, see module W22f.)

You don't need a DTD to use the standard entities. However, DTDs let you custom-build your own entities, so that longer sequences of characters - such as standard names and addresses that are repeated frequently in a data file - can be abbreviated. Although convenient, there is nothing in custom-built entities that can't be handled fairly easily by software in other ways.
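The difference between predefined and custom entities shows up as soon as a parser reads them. This sketch assumes Python 3's standard xml.etree.ElementTree, whose underlying Expat parser expands entities declared in a document's internal DTD subset. The helper text_of and the entity name wcc are hypothetical.

```python
import xml.etree.ElementTree as ET


def text_of(xml_text):
    """Return the root element's text, or None if the document is not well-formed."""
    try:
        return ET.fromstring(xml_text).text
    except ET.ParseError:
        return None


# The five predefined entities need no DTD.
print(text_of("<p>&lt; &gt; &amp; &quot; &apos;</p>"))  # < > & " '

# A custom entity must be declared; here it sits in the DTD's internal subset.
WITH_DTD = """<!DOCTYPE note [
  <!ENTITY wcc "Washtenaw Community College">
]>
<note>Welcome to &wcc;!</note>"""
print(text_of(WITH_DTD))  # Welcome to Washtenaw Community College!

# An HTML-style entity with no declaration is a well-formedness error...
print(text_of("<p>caf&eacute;</p>"))  # None
# ...but a numeric character reference needs no declaration at all.
print(text_of("<p>caf&#233;</p>"))  # café
```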

The most frequent circumstance making DTDs necessary is older, specialized software. Current general-purpose XML editors handle schemas, but older, specialized software may depend on DTDs. This might be the case if your project calls for building on an existing ontology that defines its XML using DTDs and has not been updated to handle schemas.

Bottom line: unless there is some circumstantial reason not to, use W3C schemas.

About This Document
Audience

This module is for people who are familiar with XML markup in general (modules X01c and X02c) and are interested in how namespaces, DTDs, and schemas work.

Objectives

On successful completion of this module, you will be able to explain the concepts of namespaces, DTDs, and schemas.

Module X50c: DTDs, Schemas, and Namespaces
This document is part of a modular instruction series in Computer Instruction. For more information, see the overview or the list of modules in this series, X: XML, XHTML, DHTML, CSS. This document has been used in the following classes: CIS 179.
History
Original: 6 November 2006, by Laurence J. Krieg
Last modification:
Copyright
Copyright © 2007, Laurence J. Krieg, Washtenaw Community College
Instructors: You may point to this file in your Web-based materials; however, its location may change without notice.
Students: You are welcome to make a copy for your personal use.
All other uses: Please contact the author, Laurence J. Krieg, for permission: krieg@ieee.org.
Background: X01c | Related modules | Module Home | Next reading: X51c
