
A Pleasant Little Chat about XML

An important part of the ProfHacker 101 Manifesto is that we want to foster change by teaching people bits and pieces of technology that we use every day but others find totally intimidating. Markup languages, programming languages, database schemas, and similar technologies are those things that I live and breathe but I know send others running for the hills.

I am not here to tell anyone to use or begin to think about working with anything that doesn’t have a natural place in your personal or professional lives, but I am here to help you gain a working vocabulary of some of these topics. Plus, the next installment of the “Working with APIs” series (see parts 1, 2, 3) will use XML and I wanted to make sure that I had something relatively concise that you could reference. This post is simply a gentle introduction to XML and what it is intended to do.

XML [EXtensible Markup Language] is what its name suggests: a markup language, much like HTML. In fact, XML and HTML are siblings (if you want to think of it that way) in that they are both derivatives of SGML, or “Standard Generalized Markup Language”.

Data that has been encoded (“marked up”) in XML is just a plain text file with text surrounded by tags. At its core, XML is platform and application independent; you don’t need a proprietary word processing program to create or read an XML document, and a text file containing XML can be opened on Windows, Mac, Linux/UNIX, or any other operating system that opens text files. [However, if you are looking for an XML editor, I recommend the <oXygen/> XML editor, although there are others.]

XML is not a programming language (neither is HTML for that matter). XML alone will not do anything at all. Much to @warnick’s dismay, a pony will not magically spring forth from an XML document… unless some other process has put the pony in place and XML is available to transport it to you, because XML was designed to store, transport, and exchange data.

Here’s the kicker: there are no XML tags to memorize, because with XML you create your own. When you mark up an HTML document you have to know things like <html><head><title></title></head><body></body></html> and everything in between, but with XML the structure of the document and the language you use to describe the data being stored are completely up to you.

The image used in this post shows an example of an XML document. XML documents have two major parts: the prolog and the body. The prolog contains the XML declaration statement, plus any processing instructions and comments you want to add. The following snippet is a valid prolog, and you can see at least some of it in the image as well:

<?xml version="1.0" ?>

After the prolog comes the content structure. XML is hierarchical, like a book: you know that in general books have titles and chapters, each of which contains paragraphs, and so forth. There is only one root element in an XML document, and in the case of the example in the image, the root element is “quiz”. But since I just mentioned a book, and books are easy to grasp, let me use the example of a book in a catalog. [NOTE: I yanked this example from Chapter 28, "Working with XML," of Sams Teach Yourself PHP, MySQL and Apache All in One, 4th ed., written by yours truly.]

The root element in this example is “Books”; the tags <Books></Books> surround all other information. Next, child elements are added to the document. In my Book example I’ll just pretend that Books only need elements for title, author, and publishing information. But the publishing information will likely contain more than one bit of information—you’ll need a publisher’s name, location, and year of publication. Not a problem—just create another set of child elements within your parent element (which also happens to be a child element of the root element). For example, just the <PublishingInfo> element could look like this:

<PublishingInfo>
<PublisherName>Sams Publishing</PublisherName>
<PublisherCity>Indianapolis</PublisherCity>
<PublishedYear>2008</PublishedYear>
</PublishingInfo>
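
If you would rather generate that fragment from a script than type it by hand, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree module. The function name build_publishing_info is my own invention for illustration; it is not part of the book example.

```python
import xml.etree.ElementTree as ET


def build_publishing_info(name, city, year):
    """Build a <PublishingInfo> element with three child elements.

    Hypothetical helper for illustration only.
    """
    info = ET.Element("PublishingInfo")
    # Each SubElement call nests a new child inside the parent element.
    ET.SubElement(info, "PublisherName").text = name
    ET.SubElement(info, "PublisherCity").text = city
    ET.SubElement(info, "PublishedYear").text = str(year)
    return info


publishing_info = build_publishing_info("Sams Publishing", "Indianapolis", 2008)

# Serialize the element tree back to a plain text string of XML.
xml_text = ET.tostring(publishing_info, encoding="unicode")
print(xml_text)
```

The output arrives on a single line with no indentation, which is fine: whitespace between elements is there for human readers, not for the software consuming the data.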


All together, a sample books.xml document with one entry could look something like this:

<?xml version="1.0" ?>
<!--Sample XML document -->
<Books>
  <Book>
    <Title>A Very Good Book</Title>
    <Author>Jane Doe</Author>
    <PublishingInfo>
      <PublisherName>Sams Publishing</PublisherName>
      <PublisherCity>Indianapolis</PublisherCity>
      <PublishedYear>2008</PublishedYear>
    </PublishingInfo>
  </Book>
</Books>
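
To show that XML on its own does nothing until some other process reads it, here is a rough sketch of a Python script that walks the sample document and pulls out the data. It uses the standard library's xml.etree.ElementTree module; the function name list_books and the dictionary keys are assumptions of mine, chosen just for this example.

```python
import xml.etree.ElementTree as ET

# The sample books.xml document from above, held in a string for convenience.
BOOKS_XML = """<?xml version="1.0" ?>
<!--Sample XML document -->
<Books>
  <Book>
    <Title>A Very Good Book</Title>
    <Author>Jane Doe</Author>
    <PublishingInfo>
      <PublisherName>Sams Publishing</PublisherName>
      <PublisherCity>Indianapolis</PublisherCity>
      <PublishedYear>2008</PublishedYear>
    </PublishingInfo>
  </Book>
</Books>"""


def list_books(xml_text):
    """Return one dictionary per <Book> element (hypothetical helper)."""
    root = ET.fromstring(xml_text)  # root is the <Books> element
    books = []
    for book in root.findall("Book"):
        books.append({
            "title": book.findtext("Title"),
            "author": book.findtext("Author"),
            # A path with a slash reaches into the nested child elements.
            "publisher": book.findtext("PublishingInfo/PublisherName"),
            "year": book.findtext("PublishingInfo/PublishedYear"),
        })
    return books


for entry in list_books(BOOKS_XML):
    print(entry["title"], "by", entry["author"], "-", entry["publisher"], entry["year"])
```

Notice that the script only works because it knows the tag names in advance. Since you invent your own tags, whoever reads your XML needs to know the structure you chose.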

There are two important rules (among many) for creating well-formed XML documents:

  • XML is case sensitive, so <Book> and <book> would be considered different elements.
  • All XML tags must be properly closed, XML tags must be properly nested, and no overlapping tags are allowed.
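
You can see both rules in action by handing a few snippets to an XML parser and watching which ones it rejects. The following is a small sketch using Python's standard xml.etree.ElementTree module; is_well_formed is a hypothetical helper name I made up for this post.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False otherwise.

    Hypothetical helper for illustration only.
    """
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Properly closed and properly nested: accepted.
good = is_well_formed("<Book><Title>A Very Good Book</Title></Book>")

# Overlapping tags: <Book> is closed before the <Title> inside it.
overlapping = is_well_formed("<Book><Title>A Very Good Book</Book></Title>")

# Missing closing tag for <Book>.
unclosed = is_well_formed("<Book><Title>A Very Good Book</Title>")

# Case matters: </title> does not close <Title>.
wrong_case = is_well_formed("<Book><Title>A Very Good Book</title></Book>")

print(good, overlapping, unclosed, wrong_case)

# <Book> and <book> are treated as two different elements.
root = ET.fromstring("<Books><Book/><book/></Books>")
child_tags = [child.tag for child in root]
print(child_tags)
```

Only the first snippet passes. Unlike a web browser, which will do its best with sloppy HTML, an XML parser simply refuses a document that breaks these rules.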

Ok, so that’s what XML can look like, but when do you use it? Well, it depends. Technically, you probably use XML every day, at least if you read blogs via a feed reader—all of the content that gets to that reader is in XML format—or if you use a third-party client to interact with Twitter. Remember, XML is used to store and transport data; when a client interfaces with Twitter using the Twitter API, data is sent back in response. That data looks something like this:

<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns:google="http://base.google.com/ns/1.0" xml:lang="en-US"
xmlns:openSearch="http://a9.com/-/spec/opensearch/1.1/"
xmlns="http://www.w3.org/2005/Atom"
xmlns:twitter="http://api.twitter.com/">
... (snip) ...
<entry>
  <id>tag:search.twitter.com,2005:4655492548</id>
  <published>2009-10-06T14:00:37Z</published>
  <link type="text/html"
  href="http://twitter.com/mkgold/statuses/4655492548"
  rel="alternate"/>
  <title>Late on these, but congrats to the @ProfHacker
  team on the @chronicle article and to @chutry on the First
  Monday piece. </title>
  <content type="html">Late on these, but congrats to
  the <a href="http://twitter.com/ProfHacker">@<b>ProfHacker</b></a>
  team on the <a href="http://twitter.com/chronicle">@chronicle</a>
  article and to <a href="http://twitter.com/chutry">@chutry</a> on
  the First Monday piece.</content>
  <updated>2009-10-06T14:00:37Z</updated>
  <link type="image/png"
  href="http://a3.twimg.com/profile_images/415797105/pic.jpg"
  rel="image"/>
  <twitter:source><a href="http://twitter.com"
  rel="nofollow">web</a></twitter:source>
  <twitter:lang>en</twitter:lang>
  <author>
    <name>mkgold (Matt Gold)</name>
    <uri>http://twitter.com/mkgold</uri>
  </author>
</entry>
</feed>

The entry above was the first result of the Twitter search for “ProfHacker” when I last checked a few minutes ago. If you were to access the search via the web you would get a different result: the data would not be sent via XML to your browser. But in order for a third-party Twitter client to show you the results, it first has to get the data (via XML) and then transform it into something readable for you.
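
To give a feel for that "get the data, then transform it" step, here is a rough sketch of how a client might read an entry like the one above, using Python's standard xml.etree.ElementTree module. I am working from a trimmed-down copy of the response held in a string, not making a live request, and the function name summarize_entries is hypothetical. The one wrinkle is the xmlns attribute on the feed element: it places every tag in the Atom namespace, so the lookups have to name that namespace too.

```python
import xml.etree.ElementTree as ET

# A trimmed-down copy of the search result, kept to a few elements.
ATOM_SAMPLE = """<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>tag:search.twitter.com,2005:4655492548</id>
    <published>2009-10-06T14:00:37Z</published>
    <link type="text/html"
      href="http://twitter.com/mkgold/statuses/4655492548"
      rel="alternate"/>
    <title>Late on these, but congrats to the @ProfHacker team on the @chronicle article and to @chutry on the First Monday piece.</title>
    <author>
      <name>mkgold (Matt Gold)</name>
      <uri>http://twitter.com/mkgold</uri>
    </author>
  </entry>
</feed>"""

# Map a short prefix of our choosing to the Atom namespace URI.
NS = {"atom": "http://www.w3.org/2005/Atom"}


def summarize_entries(xml_text):
    """Return author, date, text, and link for each entry (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    summaries = []
    for entry in root.findall("atom:entry", NS):
        link = entry.find("atom:link[@rel='alternate']", NS)
        summaries.append({
            "author": entry.findtext("atom:author/atom:name", namespaces=NS),
            "published": entry.findtext("atom:published", namespaces=NS),
            "text": entry.findtext("atom:title", namespaces=NS),
            "url": link.get("href") if link is not None else None,
        })
    return summaries


for item in summarize_entries(ATOM_SAMPLE):
    print(item["author"], "said:", item["text"])
    print(item["published"], item["url"])
```

A real client would do a great deal more (fetching the data, handling errors, formatting dates), but the core of it is this: find the elements you care about and pull the text out of them.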

Right then. So, XML stores and transports data, and you get to figure out the structure and the tags that you use. Needless to say, there are thousands of other posts (and a ton of books, too) about XML and all of the other technologies that play a role in the storage, transformation, and transportation of data. Heck, the O’Reilly XML in a Nutshell book is over 600 pages long. That’s a heck of a large nut.

Tell me, besides getting a pony, why exactly you think you might want to use XML, and what it is you don’t understand. I could write for hours about what I want to do, but my problems are not your problems, and at ProfHacker we’re here to help you solve your problems. What practical example do you want to see next?

[Image from Wikimedia Commons]
