William J. Lewis
If I were a bettin' man, as the saying goes, I'd probably not bet in favor of much
across-the-board adoption of standard XML schemas. I say this despite the fact that several industries
are putting a lot of collective effort into developing these schemas, such as those in the repositories at
Biztalk.org and Xml.org. If XML is to fulfill hopes that it'll be the data interchange lingua
franca, we do need standards, but schema repositories just don't provide adequate support for standardizing tag
names.
What I would bet on is the continuing use of these repository-based schemas in the typical "search,
download, and modify" scenario. Of course, you then have to ask how standard the product
of such a search, download, and modify methodology ends up being. Typically, developers take
what is essentially a monolith, chop away what they don't want or need, and add what they require.
Anyone who has done any coding in a real-world development environment will recognize this as the
time-honored technique of code (as opposed to component or object) reuse.
Software developers have discovered that the most reusable components tend to be those with encapsulated
functionality just big enough in scope to be highly utilitarian, and no bigger. These components
take on an elemental nature, acting like vocabulary words. Likewise, applying a similarly fine-grained
approach to developing truly standard, reusable data definitions for the digital economy might lead us
toward smaller chunks of XML metadata. These smaller chunks (microstandards, if you will)
could be highly cohesive groups of a few data elements or, even more likely, atomic data elements
themselves, which, as we'll see, have the potential to richly supplement today's shared
collections of template schemas.
As the pace of activity around the Web-based XML schema repositories accelerates, the number of
registered schemas increases dramatically. As an example, a quick count revealed that among the more than
300 schemas at one of the major XML repositories, at least eight describe purchase order
documents. As a developer, which should I choose? Regardless of my choice, if my company deals with
multiple trading partners (and what company doesn't?), which schemas will my partners
choose? And across these multiple schemas, how many different tag names for customer
number, ship-to address, or purchase order number are there likely to be?
FIGURE 1 Example UDEF classification hierarchy.
All of this may begin to sound familiar to those with a data administration background. The problem
has been labeled "semantic dispersion": the dilution of shared meaning due to the proliferation of
synonyms and homonyms. What is needed is movement toward "semantic convergence": clarity of
shared meaning with a minimum number (approaching zero) of synonyms and homonyms. Are there frameworks or
tools available today that support semantic convergence and, by doing so, help XML developers and their
companies work toward a truly common vocabulary?
One such framework is the Universal Data Element Framework, or UDEF. Ron Schuldt, its originator, calls
it "the standard to harmonize other standards." He describes the
relationship between UDEF and XML in these terms:
"The fundamental thing that XML lacks is what UDEF has to offer. Specifically, XML lacks a rigorous
rules-based approach for standardizing tag names across multiple domains of discourse. UDEF applies a
rigorous rules-based approach for naming data elements (tags) from which one is able to derive a
taxonomy-based intelligent ID to the name. The ID is the key that allows an unlimited number of aliases.
Any XML system that is based on a normalized and widely adopted data dictionary (such as X.12, EDIFACT,
STEP, and MIL-STD-2549) could continue using the tag names that are widely used or adopted, while using
the UDEF ID to span the different standards."
UDEF is a tool for identifying and resolving semantic equivalence: multiple names meaning the same thing.
It therefore resolves synonyms and prevents homonyms. It's been used as the basis of a product configuration
data dictionary for the federal government (MIL-STD-2549), and is currently under consideration by the
Electronic Industries Alliance, as well as the XML/EDI Group.
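To make the idea of one ID carrying "an unlimited number of aliases" concrete, here is a minimal Python sketch of an alias registry. It is only an illustration under simple assumptions (an in-memory dictionary, no persistence); the IDs and tag names are hypothetical placeholders, not real UDEF identifiers, and UDEF's actual naming rules are not modeled.

```python
# A minimal sketch of an alias registry keyed by a shared data-element ID.
# The IDs ("ID-0001", ...) and tag names are hypothetical placeholders,
# not real UDEF identifiers.


class AliasRegistry:
    def __init__(self):
        self._alias_to_id = {}    # tag name -> element ID
        self._id_to_aliases = {}  # element ID -> set of tag names

    def register(self, element_id, tag_name):
        """Record tag_name as an alias (synonym) of element_id.

        Raises ValueError if tag_name is already bound to a different ID,
        which is how this registry refuses to create a homonym.
        """
        existing = self._alias_to_id.get(tag_name)
        if existing is not None and existing != element_id:
            raise ValueError(
                f"{tag_name!r} already names {existing}; cannot also name {element_id}"
            )
        self._alias_to_id[tag_name] = element_id
        self._id_to_aliases.setdefault(element_id, set()).add(tag_name)

    def resolve(self, tag_name):
        """Return the element ID for a tag name, or None if it is unregistered."""
        return self._alias_to_id.get(tag_name)

    def synonyms(self, tag_name):
        """Return every registered tag name that shares tag_name's ID."""
        element_id = self.resolve(tag_name)
        if element_id is None:
            return set()
        return set(self._id_to_aliases[element_id])


registry = AliasRegistry()
registry.register("ID-0001", "PONumber")
registry.register("ID-0001", "PurchaseOrderNo")
registry.register("ID-0001", "po_num")
registry.register("ID-0002", "ShipToAddress")

print(sorted(registry.synonyms("po_num")))
# ['PONumber', 'PurchaseOrderNo', 'po_num']
```

Three different tag names, perhaps from three different standards, all resolve to one ID, while an attempt to reuse "po_num" for a second meaning is rejected.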
Similar efforts to provide for enhanced XML semantics have been undertaken by Ontology.org, in support of
both the CommerceNet eCo project and the ebXML initiative of OASIS and UN/CEFACT. Ontology.org, composed
largely of representatives from academic organizations, believes that "it is becoming all too
obvious that [XML's] utility is severely limited unless people can agree on the semantics of the
terms being used in the metadata." The group defines an ontology as including a vocabulary of
basic terms and a precise specification of what those terms mean. CommerceNet is a large nonprofit
group of commercial organizations with interests in e-commerce. Its eCo framework includes
seven layers, the most detailed of which, the Data Element Layer, specifies the registration of
schema-independent, fundamental building blocks. ebXML is a newer initiative, and its detailed registry
specifications have not yet been announced.
How might these multiple, interrelated efforts (and you can't tell the players without a scorecard)
affect the day-to-day efforts of an XML schema developer? Suppose you, the XML developer, need to develop
(or discover) a schema for a new document, transaction, message, or record. To complete this assignment,
you must evaluate and, if necessary, define each tag name the format requires. To make your work as
reusable as possible, you want to use existing standards to the greatest extent you can. Where available
standards fall short, you may want to submit your creation to be considered as a candidate standard.
Given the current state of affairs, when you've completed your assignment, you probably have not
used an existing standard schema in its entirety. What you've developed is, for all intents and
purposes, an entirely new schema. Even if you did use tags from a standard schema, there is
no referenceable correlation between these instances in your schema and the standard. If you created new
tag names, there is no way for another developer to access them without addressing your entire schema,
so the next developer is back to square one. If, however, the tag names themselves are reusable
components, you could base your new schema on the standard tags available and add new tags only where none exist.
Then you could register your schema, the new tags, and your schema's usage of all tags, new and old.
Subsequent developers would have the benefit of not only your schema, but also all this additional
detailed, cross-referenced information.
Sound laborious? Not necessarily. Imagine, if you will, a client-side schema editor on your workstation
linked to an online, globally accessible, Web-based XML repository that includes reusable, comprehensive
meta-metadata on tag names and their cross-references. Because this environment is based on a disciplined
classification of meaning such as UDEF, the editor and the repository work together to guide you to
the tag names you need, simply by working through a tree or nested structure such as those presented by
Windows Explorer or Yahoo. Once you locate a candidate tag, you could further drill into its properties,
displaying its attributes, comments, descriptions, and cross-references to all registered schemas in
which the tag (or any of its synonyms) occurs. You pull the tag name into your new schema and repeat the
process until you have all the metadata you need for your new schema.
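The tree-style browsing described above can be sketched in a few lines of Python. This is a hypothetical illustration, not the UDEF hierarchy: the categories, labels, and IDs below are invented, and a real repository would sit behind a network service rather than an in-memory tree.

```python
# A minimal sketch of browsing a classification tree to locate a tag.
# The categories, labels, and IDs are hypothetical illustrations, not
# the actual UDEF hierarchy or its ID scheme.
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    label: str
    element_id: Optional[str] = None  # set only on nodes that name a data element
    children: Dict[str, "Node"] = field(default_factory=dict)

    def add(self, child: "Node") -> "Node":
        """Attach a child node and return it, so branches can be built in place."""
        self.children[child.label] = child
        return child

    def browse(self, *path: str) -> "Node":
        """Follow a path of labels downward, like expanding folders in a tree view."""
        node = self
        for label in path:
            node = node.children[label]  # KeyError if the path does not exist
        return node

    def list_children(self) -> List[str]:
        """Return the labels one level down, sorted for display."""
        return sorted(self.children)


# Build a tiny hypothetical hierarchy.
root = Node("root")
po = root.add(Node("Document")).add(Node("Purchase Order"))
po.add(Node("Number", element_id="ID-0001"))
po.add(Node("Ship-To Address", element_id="ID-0002"))
customer = root.add(Node("Party")).add(Node("Customer"))
customer.add(Node("Number", element_id="ID-0003"))

print(root.list_children())  # ['Document', 'Party']
print(root.browse("Document", "Purchase Order", "Number").element_id)  # ID-0001
print(root.browse("Party", "Customer", "Number").element_id)  # ID-0003
```

The label "Number" appears under two branches; it is the path through the classification, and therefore the ID, that distinguishes a purchase order number from a customer number. That is one way a disciplined hierarchy keeps homonyms apart.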
If, for some reason, a standard tag name does not fit your needs (perhaps your company has its own
internal naming standard), you could search through all the synonyms registered for the standard name and
use one that fits your needs. Or you could create and register your own and cross-reference it to the
standard, making it available to others as well.
Such a cross-reference would be invaluable when one schema needs to be mapped to another. To the extent
that their common tag usages are registered, the mapping and transformation between any two schemas (via
an XSL style sheet, for example) could be automatically generated. The schema repository now
functions not only as a source of templates, but also as an updatable dictionary and thesaurus. Once an
XML tag name is mapped to a data element of a harmonizing standard, it does not need to be mapped again.
If all of my trading partners and I have mapped our tag names to such a standard, I will not have to map
to each and every purchase-order schema; the result will be "map once, use
repeatedly." Rather than supplanting current schema repositories, these microstandards would,
naturally, extend and enhance current XML repository capabilities. If widely adopted, tools and
techniques operating at the tag-name level could significantly augment the resources that have already
been developed and propel us forward toward the goal of developing a truly common vocabulary for
e-business. Otherwise, we may find ourselves erecting a Tower of XML Babel.
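"Map once, use repeatedly" can be sketched as follows. Each party maps its own tag names to shared element IDs once, and the tag-to-tag map between any two schemas is derived from those registrations. The IDs and tag names are hypothetical, and where the text mentions generating an XSL style sheet, this sketch instead renames elements directly with Python's standard-library ElementTree; it ignores namespaces and attributes.

```python
# A minimal sketch of "map once, use repeatedly". IDs and tag names are
# hypothetical. Instead of generating XSL, elements are renamed directly
# with the standard-library ElementTree (namespaces/attributes ignored).
import xml.etree.ElementTree as ET

# Each schema's one-time registration: local tag name -> shared element ID.
ours = {"PurchaseOrder": "ID-0100", "PONumber": "ID-0001", "ShipTo": "ID-0002"}
partner = {"Order": "ID-0100", "po_num": "ID-0001", "delivery_address": "ID-0002"}


def derive_mapping(source, target):
    """Pair source and target tags that share an element ID.

    Assumes each ID appears at most once per schema; tags with no
    counterpart in the target are simply left out of the mapping.
    """
    id_to_target = {element_id: tag for tag, element_id in target.items()}
    return {
        tag: id_to_target[element_id]
        for tag, element_id in source.items()
        if element_id in id_to_target
    }


def transform(xml_text, mapping):
    """Rename every element whose tag appears in mapping; leave others alone."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        elem.tag = mapping.get(elem.tag, elem.tag)
    return ET.tostring(root, encoding="unicode")


mapping = derive_mapping(ours, partner)
doc = "<PurchaseOrder><PONumber>4471</PONumber><ShipTo>12 Main St</ShipTo></PurchaseOrder>"
print(transform(doc, mapping))
# <Order><po_num>4471</po_num><delivery_address>12 Main St</delivery_address></Order>
```

The payoff grows with the number of partners: hand-building a map between every pair of N schemas could take up to N(N-1)/2 mappings, whereas registering each schema once against the shared IDs takes N, and any pairwise map is then derived on demand.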
William J. Lewis (datamodel@aol.com) is an associate director in the
analytic business solutions practice of Cambridge Technology Partners and has more than 20 years of
experience in IT.