Intelligent Enterprise

Better Insight for Business Decisions

November 10, 2001



The Naked Truth

With XML, the metadata emperor still has no clothes

By Terry Moriarty

Integrating metadata from our multiple tools has always been problematic, and little progress has been made toward improving the situation. But in a recent discussion about this woe, a colleague exclaimed that soon there would be no problem: All his vendors, from data and object modeling and business rules development to data quality assessment and repository, had expressed their commitment to the extensible markup language (XML). Once these vendors delivered the ability to import and export metadata through XML, our metadata integration problem would be solved, he suggested.

His expectations may reflect the irrational exuberance many people have right now for XML. This exuberance is reminiscent of the story "The Emperor's New Clothes," in which prevailing beliefs blind people to the naked truth.

XML is bombarding us from every direction. Magazines flash the letters across their covers as a teaser to get you to read the articles inside. Companies seek clout by including these three powerful little letters in their name. Most information technology conferences devote at least one speaking slot to it. Heck, it even has its own conference. I don't think any other buzzword has achieved this level of hype so fast.

And it's just a programming language. That's right, a programming language.

XML - which is sponsored by the World Wide Web Consortium (W3C), an Internet-oriented standards organization - is primarily used to bring structure and flexibility to HTML (hypertext markup language), the language that Web browsers interpret to display Web pages. HTML comprises a set of tags that let you embed hyperlinks and describe how to format text and position graphics. A Web page's content is surrounded by these tags, which control fonts, colors, and the presentation of tables and lists. For you old IBM mainframers, if you ever wrote a document using Script, you're well on your way to becoming an HTML programmer.

Anyone who has mastered HTML tags can whip out Web pages very rapidly. But maintaining the pages over time is laborious. In an HTML document, formatting rules and content are intermixed. Finding the point within the HTML document where specific content needs to be changed can be difficult. Changing the presentation format of repetitive content, such as data presented in a table or a list, can be a very tedious process. In some cases, you may want to include the same content on all your Web pages, such as a copyright statement or signature information. You'd like to be able to format this type of information once and include it in every page. Unfortunately, HTML does not let you embed one document into another, so making a global change to several HTML documents requires changing each document individually.

XML addresses many of the deficiencies in HTML by separating a Web page's content from its formatting rules. The XML architecture fundamentally involves two files: the XML document (content), and a style sheet (formatting rules).

Typically, the style sheet contains commands that instruct a Web browser (or other interpreter) to transform an XML document into an HTML document. However, XML is not limited to supporting only HTML. In reality, XML is a "meta" programming language, which can be used to prepare the same data for different formats, such as ones that Web-enabled mobile phones or personal digital assistants can interpret.
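
As a rough sketch of how the two files connect, an XML document can name its style sheet with an xml-stylesheet processing instruction. The file name articles.xsl is an assumption made for illustration, and the tags are borrowed from the article-category example used later in this column.

```xml
<?xml version="1.0"?>
<!-- Hypothetical file name; the style sheet holds the formatting rules -->
<?xml-stylesheet type="text/xsl" href="articles.xsl"?>
<!-- The document itself holds only content -->
<ARTICLE_CATEGORY>
  <CATEGORY_NAME>Metadata Management</CATEGORY_NAME>
</ARTICLE_CATEGORY>
```

Pointing the same content at a different style sheet is then a one-line change to the href value.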

XML: The Data Specification

Besides separating content from formatting, XML also represents an evolution from tag languages like HTML because it supports user-defined tags. Therefore, I can make my XML documents easier to understand by creating tags that are meaningful to my data. For example, my Web site (www.inastrol.com) provides information about the articles I've written. The typical information about an article is provided, such as title, authors, magazine, and date published. In addition, I've categorized each article based on the topics it addresses, such as business rules, metadata management, or customer relationship management.

If I were to convert the source for my Web pages to XML, my customized tags would include article category, title, author, and so on. An example of the listing for this article in the Metadata Management category page specified in XML is shown in Listing 1.

The context of the document is much easier to grasp when the data elements are surrounded with meaningful tags, instead of just their formatting instructions. Formatting instructions, as I explained earlier, reside in the associated style sheet.

XML data is structured as a hierarchy, a series of one-to-many relationships between parent and child nodes. In my Web site, for instance, one Article Category parent can have many Article children.
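
A minimal sketch of such a document might look like the following. It is an illustration only, not the original listing: ARTICLE_CATEGORY, CATEGORY_NAME, and AUTHOR_NAME appear elsewhere in this column, while ARTICLE, TITLE, MAGAZINE, and DATE_PUBLISHED are assumed names.

```xml
<?xml version="1.0"?>
<!-- Hypothetical sketch: several tag names here are assumptions -->
<ARTICLE_CATEGORY>
  <CATEGORY_NAME>Metadata Management</CATEGORY_NAME>
  <!-- One category parent can hold many ARTICLE children -->
  <ARTICLE>
    <TITLE>The Naked Truth</TITLE>
    <AUTHOR_NAME>Terry Moriarty</AUTHOR_NAME>
    <MAGAZINE>Intelligent Enterprise</MAGAZINE>
    <DATE_PUBLISHED>November 10, 2001</DATE_PUBLISHED>
  </ARTICLE>
</ARTICLE_CATEGORY>
```

Each additional article in the category would be another ARTICLE element alongside the first.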

You can create integrity rules to ensure that an XML document contains only correctly structured data, through the Document Type Definition (DTD) feature. These rules specify the allowable structures between parent and child nodes. At the element level, for example, a rule can state whether an element is optional or required and whether it can occur more than once. (See Listing 2.)

Some of the XML DTD syntax can be quite cryptic. For example, the plus sign (+) after AUTHOR_NAME indicates a requirement for one or more author names. The question mark (?) after ABSTRACT indicates that the element is optional. An asterisk (*) indicates a zero-to-many relationship between the element and its parent node. #PCDATA (parsed character data), which denotes that any string can be defined for the element, is the only data type XML 1.0 supports for element content. However, future versions are expected to support more data types.

An important aspect of XML is that it allows a DTD to be shared across multiple documents.
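
To make the occurrence indicators concrete, here is a hedged sketch of a DTD kept in its own file so that several documents can share it. It is not the original listing: the file name articles.dtd and the element names other than ARTICLE_CATEGORY, CATEGORY_NAME, AUTHOR_NAME, and ABSTRACT are assumptions.

```xml
<!-- articles.dtd: hypothetical sketch with assumed element names -->
<!-- A category has one name and zero or more articles (*) -->
<!ELEMENT ARTICLE_CATEGORY (CATEGORY_NAME, ARTICLE*)>
<!-- One or more authors (+); the abstract is optional (?) -->
<!ELEMENT ARTICLE (TITLE, AUTHOR_NAME+, MAGAZINE, DATE_PUBLISHED, ABSTRACT?)>
<!-- Leaf elements hold plain character data -->
<!ELEMENT CATEGORY_NAME (#PCDATA)>
<!ELEMENT TITLE (#PCDATA)>
<!ELEMENT AUTHOR_NAME (#PCDATA)>
<!ELEMENT MAGAZINE (#PCDATA)>
<!ELEMENT DATE_PUBLISHED (#PCDATA)>
<!ELEMENT ABSTRACT (#PCDATA)>
```

A document could then point at the shared file with a declaration such as <!DOCTYPE ARTICLE_CATEGORY SYSTEM "articles.dtd"> placed ahead of its root element.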

XSL: The Programming Language

Extensible Stylesheet Language (XSL), the "programming" logic for XML, specifies the rules for transforming the data in an XML document into a document that the target environment, such as a Web browser, can process. An XSL file provides a view into a specific set of tags defined in another XML document. The same XSL file can be applied to multiple XML documents, so long as the specifications of the tags are shared by each XML document.

An XSL file, also known as a style sheet, contains the rules for selecting and presenting the data from its associated XML documents. If special handling is required for one of the tags defined in the XML document, a template is defined in the XSL file. For example, the following XSL template causes the category name to be displayed in a blue font and in accordance with Heading Style 1:

<xsl:template match="ARTICLE_CATEGORY">
  <h1><font COLOR="#0000ff">
    <xsl:value-of select="CATEGORY_NAME"/>
  </font></h1>
</xsl:template>

When the XML document is executed, the XSL template is applied to the data, generating the following HTML code:

<h1><font COLOR="#0000ff">
Metadata Management
</font>
</h1>

XSL supports most of the commands you expect to find in a programming language, such as for-each loop control, if-then-else constructs, and selection and filtering through wildcards. Commands exist for navigating both up and down the hierarchy. XML is rapidly evolving into a robust programming language.
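
As an illustration of the loop and conditional constructs, the sketch below extends the category template into a complete XSLT 1.0 style sheet. ARTICLE, TITLE, and ABSTRACT are assumed tag names, and the fallback text is invented for the example.

```xml
<?xml version="1.0"?>
<!-- Hypothetical sketch: ARTICLE, TITLE, and ABSTRACT are assumed tag names -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <xsl:template match="ARTICLE_CATEGORY">
    <h1><font COLOR="#0000ff"><xsl:value-of select="CATEGORY_NAME"/></font></h1>
    <ul>
      <!-- Loop over every ARTICLE child of the current category -->
      <xsl:for-each select="ARTICLE">
        <li>
          <xsl:value-of select="TITLE"/>
          <xsl:text>: </xsl:text>
          <!-- if-then-else: show the abstract when present, else a fallback -->
          <xsl:choose>
            <xsl:when test="ABSTRACT">
              <xsl:value-of select="ABSTRACT"/>
            </xsl:when>
            <xsl:otherwise>No abstract available.</xsl:otherwise>
          </xsl:choose>
        </li>
      </xsl:for-each>
    </ul>
  </xsl:template>
</xsl:stylesheet>
```

Inside the loop, a path such as ../CATEGORY_NAME would navigate back up to the parent category, and a select of * would match every child element regardless of name.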

Impact on Data Management

How does XML affect the data management organization? What we're seeing is the introduction of yet another technology into our environment. Web developers have realized that they must provide their users with access to corporate data. This is a good thing. However, XML represents an entirely new standard developed to accommodate this requirement, including new techniques for describing data structures. So we have another data definition language standard. Did we really need another one?

We are now faced with answering the following questions.

  • What infrastructure do we need to establish to ensure that DTD specifications are available for sharing across Web applications?
  • What naming standards should we follow for XML tags? (Some people are suggesting that the database column names can be used as the tag names, while others consider these names to be too cryptic.)
  • How do we ensure that the integrity rules a DTD specifies are consistent with the rules incorporated into our data models?
  • How do we manage the data embedded into XML documents?
  • Is there any data that is held only in XML documents that should be made available for sharing enterprisewide?
  • Are there situations in which XML can serve as the master data store for a portion of an enterprise's data?

As with any new technology, care must be taken to ensure that XML is properly integrated into the overall data management strategy.

XML and Metadata Integration

Vendors have been rushing to incorporate support for XML into their products. With each new release, more products will be able to import and export their metadata through XML documents. Does this mean that our metadata integration issues have been resolved? Hardly! Each vendor is creating DTDs that describe their product's metadata schema. Most often, this schema is unique to their product line. If a product could not share its metadata with other tools prior to XML, it is unlikely that it will be able to share its metadata once formatted as an XML document. It is not enough to generate XML syntax. To share metadata, products must share DTDs. To share DTDs, vendors must agree on a common metadata model.

Our industry has a dismal track record on agreeing on a common metadata model. And I see no relief in sight in this area. Currently, several standards groups are working with varying degrees of cooperation on metadata models: the Object Management Group's (OMG) Common Warehouse Metamodel, the Meta Data Coalition (MDC) and Microsoft with their Open Information Model, and the W3C's Resource Description Framework - to name just a few. (At press time, OMG and MDC had just merged - Ed.) It's difficult enough to get a vendor to support one standard, let alone multiple ones. Case in point: when IBM's user group GUIDE presented the company with a standard in 1986, not even IBM chose to use it as the basis of its repository metamodel.

Nothing seems to have changed in the last 20 years. I'm certainly not holding my breath until the standards community delivers a model that any of my vendors will actually build to. Mapping tools are becoming available, though, that read in disparate XML DTDs and let you define the mapping between them. With these tools, you can transform metadata exported under one product's DTD into a form that another product can import. Unfortunately, defining the mappings is left as an exercise for us.
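
One way such a mapping might be written down is as an XSL style sheet that turns one tool's export into another tool's import format. The sketch below is purely illustrative: both tag vocabularies (MODEL, ENTITY, and ATTRIBUTE on one side; Schema, Table, and Column on the other) are invented and do not come from any real product.

```xml
<?xml version="1.0"?>
<!-- Hypothetical mapping: neither tag vocabulary comes from a real product -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Tool A's MODEL root becomes tool B's Schema root -->
  <xsl:template match="/MODEL">
    <Schema>
      <xsl:apply-templates select="ENTITY"/>
    </Schema>
  </xsl:template>

  <!-- Each ENTITY becomes a Table; its name moves from a child element to an attribute -->
  <xsl:template match="ENTITY">
    <Table name="{ENTITY_NAME}">
      <xsl:apply-templates select="ATTRIBUTE"/>
    </Table>
  </xsl:template>

  <!-- Each ATTRIBUTE becomes a Column -->
  <xsl:template match="ATTRIBUTE">
    <Column name="{ATTRIBUTE_NAME}"/>
  </xsl:template>
</xsl:stylesheet>
```

Even in this toy case, someone has to decide that an ENTITY corresponds to a Table, and that decision is exactly the work a common metadata model would otherwise settle once.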

I applaud any group that takes on the responsibility for developing an industry standard. And I think that XML is an essential component of the Internet infrastructure. But when it comes to integrating metadata across various vendors' products, XML is just a thread that eventually can be used to weave the fabric of an integrated environment. Without a pattern, in the form of a single industry standard metadata model, we're going to end up with a poorly conceived patchwork that meets no one's needs. I'm afraid the metadata emperor may be better off continuing to run around in his birthday suit.

Terry Moriarty (terry@inastrol.com), president of Inastrol, a San Francisco-based information management consultancy, specializes in customer relationship information and metadata management.





