|
Instructional Module X10a |
|
|
Data And Metadata |
|||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
In a Nutshell:
|
| An Example of "Facts" |
Here's a common set of "facts": Chiang Liu |
Ask yourself: |
|
| Here's another example: | 2122 NEC187 1700 NEC196 - NEC217 - NEC228 1731 |
Ask yourself: |
|
| Discussion and more questions |
In the first example, most people can identify what each "fact" represents. That's because we're accustomed to seeing names and addresses in this arrangement and order.
|
Metadata filled in for you: |
The second example represents a portion of a train schedule.
|
| So what? |
Humans need metadata sometimes. Computers need metadata always. The point: XML is a system for providing metadata, mainly for computers, but understandable for humans. |
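To make the point concrete, here is a minimal sketch of the train-schedule "facts" with metadata supplied as XML. The element and attribute names (schedule, train, stop, station, time) are hypothetical, and the reading of the raw values (2122 as a train number, the NEC codes as stations, a dash as "no time listed") is an assumption rather than something the schedule itself states.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Hypothetical markup: the names and the interpretation of each value are assumptions -->
<schedule>
  <train number="2122">
    <stop station="NEC187" time="1700"/>
    <stop station="NEC196"/>
    <stop station="NEC217"/>
    <stop station="NEC228" time="1731"/>
  </train>
</schedule>
```

With the labels in place, neither a person nor a program has to guess which value is a time and which is a station.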
Well-Formed XML | |
|---|---|
In a Nutshell: Well-formed XML conforms to these four rules, determined by the World Wide Web Consortium (W3C):
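As a rough illustration, the sketch below is a small document that satisfies constraints commonly cited for well-formedness: a single root element, a closing tag for every opening tag, properly nested elements, and quoted attribute values. The element names are hypothetical, and treating these as the four rules meant here is an assumption.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- One root element encloses everything else -->
<trains>
  <!-- Attribute values are quoted; the opening tag has a matching closing tag -->
  <train number="2122">
    <!-- Child elements close before their parent does (proper nesting) -->
    <origin>NEC187</origin>
    <destination>NEC228</destination>
  </train>
</trains>
```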
|
|
Validity |
|
|---|---|
In a Nutshell: In addition to being well-formed, XML files should also be valid: all the elements (tags and their contents) conform to the definition of the XML-based language being used.
|
|
What's the Difference between "Well-Formed" and "Valid"? |
To be valid, a document must first be well-formed. But there's more... No document is "just" XML, because XML isn't really a language; it's a set of rules for how to create a language. Every "XML" document is written in some specific XML-based language. Anybody can invent an XML-based language, assuming they know the rules! There are thousands of standard ones; here are some that are better known:
These XML-based languages, and all widely used ones, have online definitions. These definitions:
So in order to be valid, an XML document must be well-formed and must also conform to the rules of one (or more) specific XML language's official, public definition. Ask yourself: what's the advantage of having publicly available XML language definitions on the Internet? Answers: wide diffusion, increased use, less chance of misunderstanding ... and others. |
A Word about DTDs and Schemas |
There are two ways you can formally define an XML-based language:
These are the documents that provide a formal, online, machine-readable definition of the XML-based language. If you go into any detail at all with XML, you'll need to know a lot about DTDs and schemas. Here, we'll just point out a couple of important facts. Schemas are the best way to define a new XML language. Why? Because schemas use XML itself, and so are consistent with everything else connected to XML. They also provide many more options and finer control over data formats than DTDs. Most new standards are defined using schemas. DTDs are still important to know about, because they were the only way to define XML languages at first, so many of the original XML languages are defined using DTDs. They're also used in XML's predecessor, SGML (Standard Generalized Markup Language).
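To illustrate the point that schemas are written in XML itself, the sketch below uses the W3C XML Schema vocabulary to define a hypothetical mountain element. The element being defined is a simplified assumption for illustration, not part of any published standard.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Hypothetical schema: defines a mountain element with two children and one attribute -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="mountain">
    <xs:complexType>
      <xs:sequence>
        <!-- The data type restricts elevation to whole numbers above zero -->
        <xs:element name="elevation" type="xs:positiveInteger"/>
        <xs:element name="locality" type="xs:string"/>
      </xs:sequence>
      <xs:attribute name="name" type="xs:string" use="required"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

A DTD can describe roughly the same structure in its own non-XML syntax, but it has no data types such as xs:positiveInteger, which is one example of the finer control mentioned above.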
|
Where to Find Validators |
The best way to validate a document is to use W3C's validator, http://validator.w3.org/. Many XML software tools have validators built in. These are fine to use, but don't carry the authoritative weight of the W3C. |
Why Validity is Important |
Sometimes, we can get away with XML-based documents that are well-formed but not valid. (Rarely, we can even get away with documents that aren't even well-formed!) So if we can get away with it, why bother with formal validity? The answer is that XML's strength lies in being both open and standard. Invalid documents are weak, because they deceive others into thinking they're open and standard.
Creating valid documents - especially coding by hand - can be frustrating, as the validators don't usually give very helpful messages. Depending on your situation, there are a couple of things you can do:
|
Elements and Attributes | |
|---|---|
In a Nutshell:
|
Elements |
Elements are the basic unit of XML documents. Think of them as the atoms from which the "chemistry" of XML is derived. Let's look at an example of a simple XML document:
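A minimal sketch of such a document follows, using hypothetical element names. It contains an element that holds other elements, elements that hold text data, and an empty element.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Hypothetical example: element names are illustrative only -->
<train>
  <!-- Element containing other elements -->
  <route>
    <!-- Elements containing text data -->
    <origin>NEC187</origin>
    <destination>NEC228</destination>
  </route>
  <!-- Empty element: a single tag, here with an attribute -->
  <diningCar available="yes"/>
</train>
```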
This illustrates each of the types of elements:
Note also that elements that are not empty (the ones with data or other elements inside them) have a closing tag. The closing tag has a forward slash right after the opening angle bracket (</) and no attributes. Ask yourself: Is there another logical possibility for what could be in an element? Answer: Yes, you could have an element containing both text data and other elements. These are called mixed elements and are skipped here for simplicity. |
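For reference only, a mixed element might look like the sketch below. The element names are hypothetical.

```xml
<!-- Hypothetical mixed element: text data and child elements side by side -->
<note>The train leaves <station>NEC187</station> at <time>1700</time> sharp.</note>
```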
What's the Difference between Empty and Non-empty Elements? |
An empty element is one that has no content - it may have attributes, but everything is contained in one tag. Here are some more examples:
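The sketch below shows a few ways an empty element can be written. The examples element and the express element are hypothetical; the layout element is borrowed from Example 2 later in this module. XML also accepts an opening tag followed immediately by a closing tag as an empty element.

```xml
<!-- Hypothetical wrapper element so the examples form one well-formed document -->
<examples>
  <!-- Empty element with no attributes -->
  <express/>
  <!-- Empty element carrying data in an attribute -->
  <layout type="trac+trail"/>
  <!-- Also empty: an opening tag followed immediately by a closing tag -->
  <layout type="trac+trail"></layout>
</examples>
```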
Ask yourself: Is the flexibility illustrated in these examples more helpful, or more confusing? You're right. (Whatever you said!) But I hope you'll find it helpful. |
Attributes |
Attributes are properties of elements that are listed inside the element's main tag. Where to put attributes:
What attributes look like:
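A minimal sketch of attributes in place follows. The mountain and elevation elements come from Example 1 later in this module; the marker element is hypothetical.

```xml
<!-- Attributes go inside the opening tag, after the element name -->
<mountain name="Brokeoff Mountain">
  <!-- Each attribute is written name="value", and the value is always quoted -->
  <elevation unit="feet">9144</elevation>
  <!-- Self-closing tags can carry attributes too -->
  <marker type="summit"/>
</mountain>
```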
When to Use Attributes: Attribute values and the text content of elements are both examples of data, as opposed to metadata. So you may wonder how to decide whether to put data in an attribute or in an element's content. There are no hard-and-fast rules, but here are some considerations:
Ask yourself: What might be some other considerations in deciding whether a data item should be coded as an attribute value or element value? Answer: There are lots of considerations, and no one correct answer! |
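To show what the choice looks like in practice, the sketch below codes the same data both ways. The attribute names elevation and elevationUnit are hypothetical; both forms are well-formed.

```xml
<!-- Hypothetical comparison of the two styles -->
<mountains>
  <!-- Name, elevation, and unit stored as attribute values -->
  <mountain name="Brokeoff Mountain" elevation="9144" elevationUnit="feet"/>
  <!-- The same data stored mostly as element content -->
  <mountain>
    <name>Brokeoff Mountain</name>
    <elevation unit="feet">9144</elevation>
  </mountain>
</mountains>
```

One practical difference: an attribute name can appear only once in a tag and its value is plain text, while child elements can repeat and can contain further elements.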
XML Delimiters | |||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
In a Nutshell: Delimiters are boundary markers. XML uses three main delimiter pairs:
|
Angle Bracket Delimiters |
Angle brackets are used to mark the boundaries of tags.
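A minimal sketch of an element delimited this way is shown below. The tags are the ones discussed in this section; the content between them is hypothetical.

```xml
<!-- The content "Administration" is an assumption; only the tags come from the text -->
<building number="37">Administration</building>
```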
The example shows a typical non-empty XML element, which begins and ends with the tags <building number="37"> and </building>. While the tags delimit (mark the beginning and end of) the element, the tags themselves are delimited with the angle brackets < and >. To be specific about the angle brackets:
When one of these two symbols needs to appear as part of the data in an element, it must be represented by its entity code. Ask yourself: Why are delimiters necessary for tags in XML? Answer: Computers can recognize the tags much more quickly if they are clearly marked. |
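As a small sketch, the hypothetical element below uses the standard entity codes for the two angle brackets inside its data.

```xml
<!-- Hypothetical element; the codes for "less than" and "greater than" replace the raw symbols -->
<rule>elevation &gt; 5000 and elevation &lt; 10000</rule>
```

A program reading this element sees the data as: elevation > 5000 and elevation < 10000.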
Quotation Mark Delimiters |
Quotation marks are familiar to just about everyone, and work in XML much as they do in programming languages, or even literature. Here's an explanation, just to make sure everything is clear:
Here are some examples of quotes used in XML:
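The three sketches below use hypothetical element names and content, wrapped in a hypothetical examples element so that they form one well-formed document.

```xml
<!-- Hypothetical wrapper and element names, for illustration only -->
<examples>
  <!-- Double quotes delimit an attribute value -->
  <mountain name="Brokeoff Mountain"/>
  <!-- Single quotes work the same way -->
  <mountain name='Shoshone Point'/>
  <!-- Quotes delimit the attribute value and also appear as ordinary text in the data -->
  <remark speaker="guide">She said, "Watch your step."</remark>
</examples>
```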
In the first two examples, quotes (double and single) are used to delimit values of attributes. In the third, they are used that way, and also in the element's data. Here are the technical details about quotation marks:
Instead of beginning and ending delimiters being different from one another, the same delimiter is used at the beginning and end. Ask yourself: Why do you think quotation marks are used differently than angle brackets? Answer: Possibly because most people are already familiar with using quotes this way. (There are many possible reasons!) |
Entity Delimiters |
Entities in XML are codes that can be used as abbreviations or ways of entering repetitive or difficult data. They are discussed in another module (see below). The most commonly used entities are characters that are either:
Examples:
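A few sketches follow. The element names are hypothetical; the entity codes themselves are standard XML.

```xml
<!-- Hypothetical element names; each entity code starts with an ampersand and ends with a semicolon -->
<examples>
  <!-- The code for an ampersand, which would otherwise start an entity -->
  <company>Smith &amp; Sons</company>
  <!-- The code for a left angle bracket, which would otherwise start a tag -->
  <formula>a &lt; b</formula>
  <!-- The code for a double quote, used inside a double-quoted attribute value -->
  <sign text="The &quot;Summit&quot; Trail"/>
  <!-- A numeric character reference: 233 is the code for e with an acute accent -->
  <cafe>Caf&#233;</cafe>
</examples>
```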
Here are the technical details:
Note: The code for the semicolon hardly ever needs to be used. It is only treated as a delimiter when it is at the end of a properly formed entity code. Ask yourself: Why do you think the characters & and ; were chosen as delimiters? Answer: Your guess is as good as mine!
|
The XML Prolog Section | |
|---|---|
In a Nutshell: The Prolog is the optional part of the document that comes before the root element. Its purpose is to help XML software process the file correctly by giving background information, like the version of XML and the character encoding. |
|
What is a Prolog? |
In XML, the Prolog is the optional part of the document that comes before the root element. In this mini-example, we have a prolog: |
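A minimal sketch of a document with a prolog is shown below. The first line is the prolog, using the same declaration as the full examples later in this module; the root element name is hypothetical.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- The declaration on the first line states the XML version and the character encoding -->
<train/>
```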
What is the Prolog for? |
Its purpose is to help XML software process the file correctly by giving background information, like the version of XML and the character encoding. It's possible for software to process XML files without a prolog, but only if the software already knows all about the file. Information in the prolog gives software the ability to verify the file type and encoding, and make adjustments if either of these is unexpected. With this information, XML files have the potential to be used much more widely. If the XML language uses the older DTD (Document Type Definition), the doctype statement is part of the prolog, too. (We'll get to DTDs later.) Ask yourself: The prolog is optional; is it worth the trouble of including it? Answer: Yes, because it helps ensure the document is processed correctly.
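As a sketch of a prolog that includes a doctype statement, the hypothetical document below carries a small DTD inline. The element names and the DTD are assumptions for illustration; a real XML language would normally point to its published definition instead.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE train [
  <!ELEMENT train (origin, destination)>
  <!ELEMENT origin (#PCDATA)>
  <!ELEMENT destination (#PCDATA)>
]>
<!-- The prolog ends where the root element begins -->
<train>
  <origin>NEC187</origin>
  <destination>NEC228</destination>
</train>
```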
|
The XML Root Element | |
|---|---|
In a Nutshell: The root is the first element in an XML document, and it contains all the other elements of the document.
|
|
What is a Root Element? |
The root element is the starting point of an XML document.
Here's a mini-document: Ask yourself: What is the root element in this example? Answer: <train> |
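A mini-document with <train> as its root might look like the sketch below. The child elements, attributes, and values are hypothetical.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Hypothetical mini-document; train is the root because it encloses everything else -->
<train number="2122">
  <departs station="NEC187">1700</departs>
  <arrives station="NEC228">1731</arrives>
</train>
```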
Why is it called a "root"? |
It's called a "root" because of the way XML is designed to be processed. As soon as an XML file is opened by software that processes it, the software creates a data "tree". This is a very efficient way for computers to store and process data internally. In fact, one of the reasons XML has become such a widely-used method for storing and transmitting data is that it is designed to be processed using this efficient internal structure. All "tree" structures need to have a starting point, which is known as the "root" of the tree. From there, they branch out, with each element forming a limb or a leaf of the tree.
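The tree shape can be read straight off the nesting of a document. The sketch below trims Example 1 from later in this module and labels each level with a comment; the labels are illustrative.

```xml
<!-- Root of the tree -->
<mountains>
  <!-- Limb: an element that holds other elements -->
  <mountain name="Brokeoff Mountain">
    <!-- Leaves: elements that hold only data -->
    <elevation unit="feet">9144</elevation>
    <longitude>W121:33.605</longitude>
  </mountain>
</mountains>
```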
|
The XML Tag | |
|---|---|
In a Nutshell: XML tags are the markers for the beginning and end of each element.
|
|
Anatomy of a Tag |
The basic structure of all XML tags is simple. You've seen them! They look like the tags in this simple element:
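A simple element of this kind, borrowed from Example 2 later in this module, is sketched below.

```xml
<!-- Opening tag, content, closing tag; the element name is the same in both tags -->
<model>Heavy Hauler</model>
```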
The components:
Beyond the basics, what's in the tag depends on what role it serves. There are three roles:
|
Opening Tags |
The opening tag marks the beginning of an element that has content; after the content will come a closing tag.
Opening tags can also have any number of attributes:
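The sketch below shows an opening tag with several attributes. The id attribute comes from Example 2 later in this module; color and leased are hypothetical.

```xml
<!-- Each attribute is written name="value"; attributes are separated by whitespace -->
<tractor id="CK200511" color="red" leased="no">
  <model>Heavy Hauler</model>
</tractor>
```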
Learn more about attributes... |
Closing Tags |
Closing tags, at the end of an element, are simple: they just have a slash right before the name of the element:
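For instance, in this element from Example 2 later in this module, the closing tag is </weight>.

```xml
<!-- The opening tag carries the attribute; the closing tag is just a slash and the name -->
<weight unit="pounds">12000</weight>
```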
Tip: the closing tag never has attributes, even if the opening tag does. |
Self-Closing Tags |
Elements that have no content are known as empty elements. They can have attributes inside the element's tag, but they have nothing after the tag. To show processing software that it need not look for content, empty elements are required to have a slash before the closing delimiter:
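The layout element from Example 2 later in this module is written this way.

```xml
<!-- The slash comes right before the closing angle bracket -->
<layout type="trac+trail"/>
```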
This kind of tag is called "self-closing" because it doesn't need to be followed by a closing tag. |
XML Document Examples | |
|---|---|
In a Nutshell: In this section, there are two examples of XML documents. You can use these to get an idea of what XML documents look like, or to review the basics of XML document structure.
|
|
Example 1 |
<?xml version="1.0" encoding="utf-8"?>
<mountains>
  <mountain name="Brokeoff Mountain">
    <elevation unit="feet">9144</elevation>
    <lattitude>N40:26.717</lattitude>
    <longitude>W121:33.605</longitude>
    <locality>Lassen Volcanic Wilderness
      Tehama County
      California
      United States
    </locality>
  </mountain>
  <mountain name="Shoshone Point">
    <elevation unit="feet">5672</elevation>
    <lattitude>N40:41.045</lattitude>
    <longitude>W116:32.353</longitude>
    <locality>Eureka County
      Nevada
      United States
    </locality>
  </mountain>
</mountains>
Ask yourself: Which line is the prolog? Answer: <?xml version="1.0" encoding="utf-8"?>
Ask yourself: What is the root element? Answer: <mountains>
Can you find any "empty" elements? Answer: this document has no empty elements
Can you find an element with an attribute? Answer: mountain and elevation have attributes
Ask yourself: What (if any) are the attributes? Answer: "name" and "unit"
Ask yourself: What (if any) are the attribute values? Answer: "Brokeoff Mountain", "feet" (in two places), and "Shoshone Point" |
Example 2 |
<?xml version="1.0" encoding="utf-8"?>
<rigs>
  <rig>
    <layout type="trac+trail"/>
    <tractor id="CK200511">
      <vin>KW044HH8693779450</vin>
      <manufacturer>Kenworth</manufacturer>
      <model>Heavy Hauler</model>
      <modelYear>2002</modelYear>
      <yearAquired>2005</yearAquired>
      <horsepower>650</horsepower>
      <weight unit="pounds">12000</weight>
      <axles>3</axles>
      <tires>10</tires>
    </tractor>
    <trailer id="LE199904">
      <manufacturer>East</manufacturer>
      <model>Rear-dump 30</model>
      <yearAquired>1999</yearAquired>
      <modelYear>1999</modelYear>
      <weight unit="pounds">11000</weight>
      <capacity unit="tons">30</capacity>
      <axles>2</axles>
      <tires>8</tires>
    </trailer>
    <loadedWeight unit="pounds">83000</loadedWeight>
  </rig>
</rigs>
Ask yourself: What is the root element? Answer: <rigs>
Can you find any elements with an attribute? Answer: layout, tractor, trailer, capacity, weight, and loadedWeight have attributes
Ask yourself: What (if any) are the attributes? Answer: "type", "id" and "unit"
Can you find any "empty" elements? Answer: layout is an empty element |
Putting XML Together (Introductory) | |
|---|---|
In this section, you'll put simple XML files together. We'll do it in two steps:
|
|
Here's the Idea |
You're setting up a simple restaurant menu database, and you want to use XML. Here's what you'll need to represent:
|
Here's the Code to Put Together |
This part has the lines of code you'll need. Your job is to put them together in the right order. Use your mouse to copy the code, paste it into a text editor, and drag each line to the right place. (Hint: there are three menu items entered...)
|