CMP -- United Business Media

Intelligent Enterprise

Better Insight for Business Decisions




April 28, 2000, Volume 3 - Number 7



Smaller is better when it comes to XML naming standards


XML Microstandards


William J. Lewis

If I were a bettin’ man, as the saying goes, I’d probably not bet in favor of much across-the-board adoption of standard XML schemas. I say this despite the fact that several industries are putting a lot of collective effort into developing these schemas, such as those in repositories at Biztalk.org and Xml.org. If XML is to fulfill hopes that it’ll be the data interchange lingua franca, we do need standards, but schema repositories just don’t provide adequate support for tag names.

What I would bet on is the continuing use of these repository-based schemas in the typical “search, download, and modify” scenario. Of course, then you have to ask how “standard” the product of such a “search, download, and modify” methodology ends up being. Typically, developers take what is essentially a monolith, chop away what they don’t want or need, and add what they require. Anyone who has done any coding in a real-world development environment will recognize this as the time-honored technique of code (as opposed to component or object) reuse.

Software developers have discovered that the most reusable components tend to be those with encapsulated functionality just big enough in scope to be highly utilitarian, and no bigger. These components take on an elemental nature, acting like vocabulary words. Likewise, a more fine-grained approach to developing truly standard, reusable data definitions for the digital economy might lead us toward smaller chunks of XML metadata. These smaller chunks (“microstandards,” if you will) could be highly cohesive groups of a few data elements or, even more likely, atomic data elements themselves. As we’ll see, they have the potential to richly supplement today’s shared collections of template schemas.

As the pace of activity around the Web-based XML schema repositories accelerates, the number of registered schemas increases dramatically. As an example, a quick count revealed that among the more than 300 schemas at one of the major XML repositories, at least eight of these describe purchase order documents. As a developer, which should I choose? Regardless of my choice, if my company deals with multiple trading partners — and what company doesn’t — which schemas will my partners choose? And across these multiple schemas, how many different tag names for “customer number,” “ship-to address,” or “purchase order number” are there likely to be?

FIGURE 1 Example UDEF classification hierarchy.


All of this may begin to sound familiar to those with a data administration background. The problem has been labeled “semantic dispersion,” the dilution of shared meaning due to proliferation of synonyms and homonyms. What is needed is a direction toward “semantic convergence,” clarity of shared meaning with a minimum number (approaching zero) of synonyms and homonyms. Are there frameworks or tools available today that support semantic convergence, and by doing so, help XML developers and their companies work toward a truly common vocabulary?

One such framework is the Universal Data Element Framework, or UDEF. Ron Schuldt, its originator, calls it “the ‘standard’ to harmonize other ‘standards.’” He describes the relationship between UDEF and XML in these terms:

“The fundamental thing that XML lacks is what UDEF has to offer. Specifically, XML lacks a rigorous rules-based approach for standardizing tag names across multiple domains of discourse. UDEF applies a rigorous rules-based approach for naming data elements (tags) from which one is able to derive a taxonomy-based intelligent ID to the name. The ID is the key that allows an unlimited number of aliases. Any XML system that is based on a normalized and widely adopted data dictionary (such as X.12, EDIFACT, STEP, and MIL-STD-2549) could continue using the tag names that are widely used or adopted, while using the UDEF ID to span the different standards.”

UDEF is a tool for identifying and resolving “semantic equivalence,” that is, multiple names meaning the same thing. It therefore resolves synonyms and prevents homonyms. It’s been used as the basis of a product configuration data dictionary for the federal government (MIL-STD-2549), and is currently under consideration by the Electronic Industries Alliance, as well as the XML/EDI Group.
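
To make the idea of an ID that “allows an unlimited number of aliases” a little more concrete, here is a minimal sketch in Python. It is only an illustration of the principle: the identifiers (ID-001, ID-002), the schema names, and the tag names are all invented for this example and are not real UDEF IDs or tags taken from any standard.

```python
from collections import defaultdict

# Hypothetical identifiers and tag names, invented for illustration only;
# they are not real UDEF IDs or tags quoted from any standard.
ALIASES = [
    ("ID-001", "schema_a", "PONumber"),
    ("ID-001", "schema_b", "PurchaseOrderNum"),
    ("ID-001", "schema_c", "OrderNo"),
    ("ID-002", "schema_a", "ShipToAddress"),
    ("ID-002", "schema_b", "DeliveryAddr"),
]


def build_index(aliases):
    """Index aliases both ways: (schema, tag) -> ID and ID -> {schema: tag}."""
    tag_to_id = {}
    id_to_tags = defaultdict(dict)
    for element_id, schema, tag in aliases:
        tag_to_id[(schema, tag)] = element_id
        id_to_tags[element_id][schema] = tag
    return tag_to_id, id_to_tags


def translate(tag, source, target, tag_to_id, id_to_tags):
    """Return the target schema's name for a source tag, or None if unmapped."""
    element_id = tag_to_id.get((source, tag))
    if element_id is None:
        return None
    return id_to_tags[element_id].get(target)


tag_to_id, id_to_tags = build_index(ALIASES)
print(translate("PONumber", "schema_a", "schema_c", tag_to_id, id_to_tags))
```

Each schema keeps the name it already uses, and the shared ID is the only thing the schemas have to agree on. A tag that has no alias in the target schema simply comes back as None, which is where a human would have to step in.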

Similar efforts to provide for enhanced XML semantics have been undertaken by Ontology.org, in support of both the CommerceNet eCo project and the ebXML initiative of OASIS and UN/CEFACT. Ontology.org, composed largely of representatives from academic organizations, believes that “it is becoming all too obvious that [XML’s] utility is severely limited unless people can agree on the semantics of the terms being used in the metadata.” They define “ontology” to include “a vocabulary of basic terms and a precise specification of what those terms mean.” CommerceNet is a large nonprofit group comprising commercial organizations with interests in e-commerce. Its eCo framework includes seven layers, the most detailed of which, the Data Element Layer, specifies the registration of schema-independent, fundamental building blocks. The ebXML initiative is newer, and its detailed registry specifications are not yet announced.

How might these multiple, interrelated efforts (and you can’t tell the players without a scorecard) affect the day-to-day efforts of an XML schema developer? Suppose you, the XML developer, need to develop (or discover) a schema for a new document, transaction, message, or record. To complete this assignment, you must evaluate and, if necessary, define each tag name the format requires. To make your work as reusable as possible, you want to use existing standards to the greatest extent you can. Where available standards fall short, you may want to submit your creation to be considered as a candidate standard.

Given the current state of affairs, when you’ve completed your assignment, you probably have not used an existing standard schema in its entirety. What you’ve developed is, for all intents and purposes, an entirely new schema. Even if you did use tags from a “standard” schema, there is no referenceable correlation between these instances in your schema and the standard. If you created new tag names, there is no way for another developer to access these without addressing your entire schema … and the next developer is back to square one.

If, however, the tag names themselves are reusable components, you could base your new schema on the standard tags available and add new tags where no standard existed. Then you could register your schema, the new tags, and your schema’s usage of all tags, new and old. Subsequent developers would have the benefit of not only your schema, but also all this additional detailed, cross-referenced information.

Sound laborious? Not necessarily. Imagine, if you will, a client-side schema editor on your workstation linked to an online, globally accessible, Web-based XML repository that includes reusable, comprehensive meta-metadata on tag names and their cross-references. Because this environment is based on a disciplined classification of meaning such as the UDEF, the editor and the repository work together to guide you to the tag names you need, simply by working through a tree or nested structure such as those presented by Windows Explorer or Yahoo. Once you locate a candidate tag, you could further drill into its properties, displaying its attributes, comments, descriptions, and cross-references to all registered schemas in which the tag (or any of its synonyms) occurs. You pull the tag name into your new schema, and repeat the process until you have all the metadata you need for your new schema.
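
The drill-down itself is easy to picture in code. The following Python sketch assumes the classification is available as a simple nested structure; the categories shown are placeholders made up for this example, not the contents of the UDEF or of any real repository.

```python
# A made-up classification tree; the categories are placeholders,
# not UDEF content or the contents of any real repository.
TAXONOMY = {
    "Document": {
        "PurchaseOrder": {"Identifier": {}, "Date": {}},
        "Invoice": {"Identifier": {}, "Amount": {}},
    },
    "Party": {
        "Customer": {"Identifier": {}, "Address": {}},
    },
}


def children(tree, path):
    """List the child categories found under a path such as ("Document",)."""
    node = tree
    for name in path:
        node = node[name]
    return sorted(node)


def find_paths(tree, name, prefix=()):
    """Yield every path in the tree that ends in the given category name."""
    for key, subtree in tree.items():
        path = prefix + (key,)
        if key == name:
            yield path
        yield from find_paths(subtree, name, path)


print(children(TAXONOMY, ("Document",)))
for path in find_paths(TAXONOMY, "Identifier"):
    print(" / ".join(path))
```

The search for “Identifier” returns three different paths, which is the point of a disciplined classification: the bare name is ambiguous, but the full path to it is not.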

If, for some reason, a standard tag name does not fit your needs (perhaps your company has its own internal naming standard), you could search through all the synonyms registered for the standard name and use one that fits. Or you could create and register your own name and cross-reference it to the standard, making it available to others as well.
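
As a rough sketch of what such a registry might keep track of, consider the Python below. The class names, method names, tag names, and schema name are all hypothetical, chosen for this example; no existing repository is known to expose this interface.

```python
from dataclasses import dataclass, field


@dataclass
class TagEntry:
    """One standard tag name plus its registered synonyms and usages."""
    standard_name: str
    synonyms: set = field(default_factory=set)
    used_in: set = field(default_factory=set)  # names of schemas using the tag


class TagRegistry:
    """Hypothetical in-memory registry; a real one would live on the Web."""

    def __init__(self):
        self._entries = {}  # standard name -> TagEntry
        self._lookup = {}   # any registered name -> standard name

    def register_standard(self, name):
        self._entries[name] = TagEntry(name)
        self._lookup[name] = name

    def register_synonym(self, synonym, standard_name):
        """Cross-reference a new name to an existing standard tag."""
        if standard_name not in self._entries:
            raise KeyError(f"unknown standard tag: {standard_name}")
        self._entries[standard_name].synonyms.add(synonym)
        self._lookup[synonym] = standard_name

    def record_usage(self, name, schema):
        """Note that a schema uses a tag, under whichever name it chose."""
        self._entries[self._lookup[name]].used_in.add(schema)

    def resolve(self, name):
        """Return the entry that a standard name or synonym points to."""
        return self._entries[self._lookup[name]]


registry = TagRegistry()
registry.register_standard("PurchaseOrderNumber")
registry.register_synonym("PO_NBR", "PurchaseOrderNumber")
registry.record_usage("PO_NBR", "acme_po_schema")

entry = registry.resolve("PO_NBR")
print(entry.standard_name, sorted(entry.synonyms), sorted(entry.used_in))
```

The house name PO_NBR stays in use inside the company, yet anyone who looks it up lands on the same entry as the standard name, along with the list of schemas that use it.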

Such a cross-reference would be invaluable when a schema needs to be mapped to another. To the extent their common tag usages are registered, the mapping and transformation between any two schemas — via an XSL style sheet, for example — could be automatically generated. The schema repository now functions not only as a source of templates, but also as an updatable dictionary and thesaurus. Once an XML tag name is mapped to a data element of a harmonizing standard, it does not need to be mapped again. If all of my trading partners and I have mapped our tag names to such a standard, I will not have to map to each and every purchase-order schema; the result will be “map once, use repeatedly.”
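
Here is one way the generation step might look, sketched in Python under some loud assumptions: the IDs and tag names are invented, each schema’s registered tags are assumed to be available as a simple ID-to-name table, and the mapping is a plain one-for-one rename. The sketch only writes the style sheet; applying it would take an XSLT processor, which is not included here.

```python
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"

# Hypothetical registrations: shared ID -> the tag name each schema uses.
source_tags = {"ID-001": "PONumber", "ID-002": "ShipToAddress", "ID-003": "BuyerName"}
target_tags = {"ID-001": "OrderNo", "ID-002": "DeliveryAddr"}


def shared_mappings(source, target):
    """Pair source and target tag names registered under the same ID."""
    return {source[i]: target[i] for i in source if i in target}


def build_stylesheet(renames):
    """Build an XSLT 1.0 style sheet that renames elements per the mapping."""
    ET.register_namespace("xsl", XSL_NS)
    root = ET.Element(f"{{{XSL_NS}}}stylesheet", {"version": "1.0"})

    # Identity template: copy anything that has no more specific rule.
    identity = ET.SubElement(root, f"{{{XSL_NS}}}template", {"match": "@*|node()"})
    copy = ET.SubElement(identity, f"{{{XSL_NS}}}copy")
    ET.SubElement(copy, f"{{{XSL_NS}}}apply-templates", {"select": "@*|node()"})

    # One rename template per tag that both schemas have registered.
    for old, new in sorted(renames.items()):
        rule = ET.SubElement(root, f"{{{XSL_NS}}}template", {"match": old})
        element = ET.SubElement(rule, f"{{{XSL_NS}}}element", {"name": new})
        ET.SubElement(element, f"{{{XSL_NS}}}apply-templates", {"select": "@*|node()"})

    return ET.tostring(root, encoding="unicode")


renames = shared_mappings(source_tags, target_tags)
print(build_stylesheet(renames))
```

In this sketch a tag with no counterpart in the target schema (BuyerName, here) falls through to the identity template and is copied unchanged. A real tool would more likely flag it for attention than pass it along silently.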

Rather than supplanting current schema repositories, these microstandards would, naturally, extend and enhance current XML repository capabilities. If widely adopted, tools and techniques operating at the tag-name level could significantly augment the resources that have already been developed and propel us forward toward the goal of developing a truly common vocabulary for e-business. Otherwise, we may find ourselves erecting a Tower of XML Babel.

William J. Lewis (datamodel@aol.com) is an associate director in the analytic business solutions practice of Cambridge Technology Partners and has more than 20 years of experience in IT.



RESOURCES

biztalk.org
ebXML: www.ebxml.org
eCo framework: eco.commerce.net
Electronic Industries Alliance: www.eia.com
OASIS: www.oasis-open.org
ontology.org
UDEF: www.udef.com
UN/CEFACT: www.unece.org/cefact
xml.org
XML/EDI Group: www.xmledi.org

 




