CMP -- United Business Media

Intelligent Enterprise

Better Insight for Business Decisions




June 5, 2000, Volume 3 - Number 9



XML Quirks


Tips for taming extensible markup language

After all these years, I seem to be back on the conference circuit. In March, I traveled to the Netherlands to speak at Database Systems 2000, sponsored by Array Publications.

Before quickly returning home, I gave a short talk on Web databases, met some nice people, and attended a workshop on extensible markup language (XML). The workshop speaker, Hans Arents of IT Works of Belgium (www.ITworks.be), gave the lecture in Dutch (which I do not speak or read) and presented the slides in English. He has the delightful habit of cramming as much material as possible on his slides. Fortunately, the handout was a helpful quick guide on the subject, so I was able to follow along with the lecture portion of the workshop.

If you are a Web builder, let me describe XML as similar to HTML on steroids. If you are a database person, then think of XML as Standard Generalized Markup Language (SGML) on a diet.

Without going into a lot of detail, XML is a stream of Unicode text in which each opening tag in angle brackets is closed by a matching tag that begins with a slash (<item> ... </item>), in the same way that HTML uses them. The matched tags nest inside one another, and that nesting represents a hierarchical structure.
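To make the nesting concrete, here is a minimal sketch in Python, using the standard library's ElementTree parser. The sample document and its tag names (order, item, name, qty) are made up for illustration and are not from any real schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the tag names are invented for this sketch.
doc = "<order><item><name>bolt</name><qty>12</qty></item></order>"

def walk(element, depth=0):
    """Return (depth, tag, text) tuples in document order."""
    text = (element.text or "").strip()
    rows = [(depth, element.tag, text)]
    for child in element:
        rows.extend(walk(child, depth + 1))
    return rows

rows = walk(ET.fromstring(doc))
for depth, tag, text in rows:
    # Indentation mirrors how deeply each tag is nested.
    print("  " * depth + tag, text)
```

Each level of indentation in the printed output corresponds to one level of tag nesting in the text stream.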

Like HTML, XML can be used as a document format. However, XML can also link to objects that, instead of being passive like HTML, take various actions on their own. Although very nice for Web site programmers, who now have a standard language to build more active sites, this ability is not why XML is interesting for database people. XML is also a data format that can contain the information needed to read itself into a database. Much as I can with the CREATE DOMAIN statement in SQL-92, I can use XML to define my own data types. I can describe these data types with pattern matching or ranges, for example:

<datatype name = "partnumber" source = "string">

<pattern value = "\d{5}-\d{5}"/>

</datatype>

<datatype name = "carspeed" source = "integer">

<minInclusive>0</minInclusive>

<maxInclusive>250</maxInclusive>

</datatype>
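A receiving program could enforce the same two constraints in ordinary code. The sketch below is in Python and assumes the part number pattern is meant as five digits, a hyphen, and five digits, and that car speed is a whole number from 0 through 250 inclusive. The function names are hypothetical, and this is not how a schema processor works internally.

```python
import re

# Assumed reading of the partnumber pattern: five digits, a hyphen, five digits.
PART_NUMBER = re.compile(r"\d{5}-\d{5}")

def is_partnumber(text):
    """True when the whole string matches the part number pattern."""
    return PART_NUMBER.fullmatch(text) is not None

def is_carspeed(text):
    """True when the string parses as an integer in the range 0..250."""
    try:
        value = int(text)
    except ValueError:
        return False
    return 0 <= value <= 250

print(is_partnumber("12345-67890"), is_carspeed("251"))
```

The pattern check plays the role of the pattern facet, and the range check plays the role of minInclusive and maxInclusive.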

In addition to using the native or custom data types to define derived data types, I can also put references to Web sites, graphics, Java scripts, and just about anything else I want. In short, XML is a good tool for building Web sites.

I am a database person, however, and I have a different use for XML. Remember electronic data interchange (EDI)? In this system, industry groups were to meet under the ANSI X12 committee and agree on a set of code numbers for the transmission of standard documents within their industry. EDI was the first attempt at what we now call a business-to-business (B2B) tool. EDI had only limited success, because people found that documents can become very complex and consensus for an entire industry is hard to reach.

With XML, you only need consensus on terminology and representation. You can then put the data items into a structure of your choice, send the structure as a message, and let receivers pull the data items out as text strings and put the data into their back-end database.
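As a rough sketch of the receiving side, the Python below pulls the data items out of a message as text strings and loads them into a table. The message layout, the invoice_lines table, and the load_invoice function are all assumptions made for illustration, and an in-memory SQLite database stands in for whatever back end the receiver really uses.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical message; sender and receiver would have agreed on these tag names.
message = """<invoice>
  <customer>Acme</customer>
  <line><partnumber>12345-67890</partnumber><qty>4</qty></line>
  <line><partnumber>22222-33333</partnumber><qty>1</qty></line>
</invoice>"""

def load_invoice(xml_text, conn):
    """Extract invoice lines as text and insert them; return the row count."""
    root = ET.fromstring(xml_text)
    customer = root.findtext("customer")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS invoice_lines "
        "(customer TEXT, partnumber TEXT, qty INTEGER)"
    )
    rows = [
        (customer, line.findtext("partnumber"), int(line.findtext("qty")))
        for line in root.findall("line")
    ]
    conn.executemany("INSERT INTO invoice_lines VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)

conn = sqlite3.connect(":memory:")  # stand-in for the receiver's database
count = load_invoice(message, conn)
print(count, "rows loaded")
```

The sender never needs to know the table layout; only the tag names and their meaning have to be agreed on.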

People object that an XML file, containing all of those tags, is too big. Although an XML file is larger than a binary file in the native data format of the target database, other factors mitigate this drawback:

• The main reason for using XML is so you can send the XML file to any back-end database, without worrying about the details. Portability is important when you have no idea what those other databases might be.

• XML is pure Unicode, and modems are proficient at compressing Unicode text. Many other tools are also available for working with this text.

• Bandwidth is increasing every day, so sending XML files is becoming less of a problem.
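The compression point is easy to try for yourself. The Python sketch below builds a made-up inventory document with repeated tags and compresses it with zlib. DEFLATE is only a stand-in here for whatever compression a modem or transport applies, and the record layout is hypothetical, so treat the resulting ratio as an illustration rather than a benchmark.

```python
import zlib

# Hypothetical records: the tags repeat, only the digits change.
records = "".join(
    f"<part><partnumber>{i:05d}-{(i * 7) % 100000:05d}</partnumber>"
    f"<qty>{i % 9}</qty></part>"
    for i in range(500)
)
document = "<inventory>" + records + "</inventory>"

raw = document.encode("utf-8")
packed = zlib.compress(raw)      # DEFLATE, used as a stand-in compressor
ratio = len(packed) / len(raw)   # fraction of the original size

print(len(raw), "bytes raw,", len(packed), "bytes compressed")
```

Because the tag names repeat in every record, most of the markup overhead is the kind of redundancy a general-purpose compressor removes.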

The flexibility of the XML file is another advantage. The receiver can either convert the XML into HTML and display the file on a browser or rearrange the structure into a different XML document to be forwarded to yet another database. For example, I might receive an inventory in which the hierarchy puts suppliers above parts. I can rearrange the hierarchy structure and forward the XML file as parts above suppliers.
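Here is a minimal sketch of that rearrangement in Python. The inventory layout, the attribute names, and the invert function are assumptions for illustration; in practice a transformation language such as XSLT might be used instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical inventory with suppliers above parts.
source = """<inventory>
  <supplier name="Acme"><part id="bolt"/><part id="nut"/></supplier>
  <supplier name="Zenith"><part id="bolt"/></supplier>
</inventory>"""

def invert(xml_text):
    """Rebuild the tree with parts above suppliers."""
    old_root = ET.fromstring(xml_text)
    new_root = ET.Element("inventory")
    parts = {}  # part id -> new <part> element, in first-seen order
    for supplier in old_root.findall("supplier"):
        for part in supplier.findall("part"):
            part_id = part.get("id")
            if part_id not in parts:
                parts[part_id] = ET.SubElement(new_root, "part", id=part_id)
            # Attach this supplier under the part it supplies.
            ET.SubElement(parts[part_id], "supplier", name=supplier.get("name"))
    return new_root

inverted = invert(source)
print(ET.tostring(inverted, encoding="unicode"))
```

The data items are unchanged; only the parent-child relationship between suppliers and parts has been swapped.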

Support for XML is not a major concern. SQL Server 2000 has built-in XML support (the FOR XML clause and OPENXML), Informix has an XML DataBlade, and both DB2 and Oracle offer add-ins for XML.

Searching XML does not involve SQL, which uses logical predicates. The XQL language and other XML tools perform searches based on patterns. These patterns ask about the nesting of data items within each other. For example, you might ask for the “last names of persons,” who are nested within “customers on an invoice.” Oh well, I guess I can adjust to another conceptual model of how to do queries.
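To give a feel for a nesting-based query, the Python sketch below uses the limited path syntax in the standard library's ElementTree. This is not XQL itself, only a stand-in with a similar flavor, and the invoice document and its tag names are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical invoice; persons appear under customers and under a sales rep.
invoice = """<invoice>
  <customers>
    <person><first>Ann</first><last>Smith</last></person>
    <person><first>Bob</first><last>Jones</last></person>
  </customers>
  <salesrep><person><last>Brown</last></person></salesrep>
</invoice>"""

root = ET.fromstring(invoice)

# The path describes nesting: last names of persons within customers.
last_names = [node.text for node in root.findall("./customers/person/last")]
print(last_names)
```

The sales rep's last name is skipped because that person is not nested within customers, which is exactly the kind of condition the path expresses.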


Joe Celko is an Atlanta-based independent consultant. He is the author of Instant SQL Programming (Wrox Press, 1997). You can contact him at www.celko.com or 71062.1056@compuserve.com.
 

Copyright © 2000 CMP Media Inc. ALL RIGHTS RESERVED
No Reproduction without permission
