http://www.intelligententerprise.com/010723/411infosc1_1.jhtml

Information Supply Chain

XML Schemas Get the Nod
By Solomon H. Simon

In spite of the kick-start that B2B e-commerce provided for XML, many companies held back because they perceived XML as lacking standards. Now that the World Wide Web Consortium (W3C) has given XML Schemas final recommendation status, that resistance can start to subside. This status is tantamount to making XML Schemas a metadata standard and a valid alternative to Document Type Definitions (DTDs). With the general acceptance and use of schemas, companies will be ready to kick their XML communications and data interchange efforts into high gear.
XML Schemas greatly simplify the use of XML in business applications because they follow XML format, enable data reuse, are compatible with Extensible Stylesheet Language Transformations (XSLT), and are simpler than DTDs.

DTDS: THE METADATA FOR XML

XML was developed by a team of Standard Generalized Markup Language (SGML) experts who wanted a general markup language that was smaller than SGML so that it would be portable across the Web. They borrowed DTDs from the original SGML syntax to help maintain compatibility between XML and SGML, and the first XML applications and parsers were derived from existing SGML applications.

DTDs are the metadata for XML and serve as the foundation of XML documents and applications. A DTD contains the rules for writing an XML Web page and for instructing a browser on how to process it. The DTD designates valid syntax, structure, and format for defining the markup elements of an XML document. It helps a parser process an XML document by identifying important, mandatory, and optional information. The DTD also identifies where elements appear and how they are related. The author of an XML document uses a DTD the same way a construction worker uses a blueprint. The DTD provides a list of elements, rules, and specifications that define a content model for a category of documents.

WHAT IS A SCHEMA?

A schema is the XML construct used to represent the data elements, attributes, and their relationships as defined in the data model. By definition, a DTD and a schema are very similar. However, DTDs usually define simple, abstract text relationships, while schemas define more complex and concrete data and application relationships. A DTD is written as a flat list of declarations, while a schema nests its declarations in a hierarchical structure to indicate relationships. The XML Schema standard uses XML syntax exclusively, rather than borrowing from SGML, and it will augment, then later supplant, DTDs.
An XML document can satisfy two levels of syntax rules. First, if a document follows the general rules of XML, it is well formed. Second, if the document has a DTD or schema and also follows the specific rules in that DTD or schema, it is considered valid. An important characteristic of DTD-based valid XML documents is that they are also compatible with SGML and can be processed by most SGML tools. Document validity results in language compatibility.

DTDs and schemas define the metadata of an XML document, represented by the XML tags. Developers who create a new XML document without that metadata don't know how to proceed; they are like tourists in a new city without a road map. The metadata (DTD or schema) acts as a tourist information center that provides complete information on what path to follow to arrive at the desired destination.

DTDS VS. SCHEMAS: WHAT'S THE DIFFERENCE?

Both DTDs and schemas represent the metadata for an XML document. However, the structure of the two constructs is very different. Consider the following DTD snippet for a book:

    <!ELEMENT book (title, author+, ISBN, price)>
    <!ELEMENT author (first, last)>
    <!ELEMENT title (#PCDATA)>
    <!ELEMENT ISBN (#PCDATA)>
    <!ELEMENT price (#PCDATA)>
    <!ELEMENT first (#PCDATA)>
    <!ELEMENT last (#PCDATA)>

In this example, the basic information about a book is defined, along with rules for how elements are related. The first line indicates that a book has a title, one or more authors, an ISBN, and a price, in that order. The "+" indicates that an element, author in this case, must exist and may occur in multiple instances. The next line states that an author has a first and a last name. The next five lines indicate that title, ISBN, price, first, and last are character data. (PCDATA stands for parsed character data.) Now let's look at the equivalent schema, shown in Listing 1, for a book.
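Listing 1 is not reproduced alongside this text. As a rough stand-in, the following Python sketch embeds one way the book DTD could be expressed in W3C XML Schema syntax, then reads that schema with Python's standard xml.etree.ElementTree parser, which is possible only because a schema is itself an XML document. The schema text, the xsd prefix, and the helper name declared_elements are assumptions made for illustration; the author's listing may differ in detail.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical W3C XML Schema for the book example (not the author's Listing 1).
BOOK_SCHEMA = """\
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="book">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="title" type="xsd:string"/>
        <xsd:element name="author" minOccurs="1" maxOccurs="unbounded">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element name="first" type="xsd:string"/>
              <xsd:element name="last" type="xsd:string"/>
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
        <xsd:element name="ISBN" type="xsd:string"/>
        <xsd:element name="price" type="xsd:string"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
"""


def declared_elements(node, depth=0):
    """Return (depth, name, minOccurs, maxOccurs) for every element declaration.

    Occurrence values are the raw attribute strings, or None when omitted.
    """
    found = []
    for child in node:
        if child.tag == f"{{{XSD_NS}}}element":
            found.append((depth, child.get("name"),
                          child.get("minOccurs"), child.get("maxOccurs")))
            found.extend(declared_elements(child, depth + 1))
        else:
            # complexType and sequence are wrappers, so the depth is unchanged.
            found.extend(declared_elements(child, depth))
    return found


# The schema is ordinary XML, so a general-purpose parser can load it.
DECLARATIONS = declared_elements(ET.fromstring(BOOK_SCHEMA))

for depth, name, min_occurs, max_occurs in DECLARATIONS:
    line = "  " * depth + name
    if min_occurs is not None or max_occurs is not None:
        line += f" (minOccurs={min_occurs}, maxOccurs={max_occurs})"
    print(line)
```

The script prints each declared element indented by its nesting depth, which makes the tree structure of the schema visible in a way the flat DTD declarations are not.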
With some minor variations from the released standard, Microsoft Internet Explorer 5.0 can interpret XML and XML Schema files. (In this example, I've taken a few liberties for the sake of clarity.) A schema is longer than the equivalent DTD, but it may also contain more information. The initial <Schema ...> tag contains an ellipsis to indicate other "header" information, such as version numbers, path names for other information, and additional schemas that could be included. Note that all tags have a corresponding close tag. (I included the spaces and extra lines for readability only; they are not part of the schema standard.)

The first thing to notice is the hierarchical or taxonomic format. XML and XML Schemas follow a hierarchical, tree-structure format. The elements title, author, ISBN, and price all belong to the book element. The elements first and last are leaf nodes that belong to the author element. For simplicity, I have defined all elements to be string types, but other types, such as decimal, are available. The minOccurs attribute in the closing part of the author element states that there must be at least one author. The maxOccurs attribute states that there can be an unlimited number of authors.

THE BENEFITS OF SCHEMAS

XML provides an application-independent way of sharing data. Independent groups of people can agree to use a common schema for interchanging data. Your application can use a standard schema or DTD to verify that data you receive from the outside is valid. You can also use a schema or DTD to verify your own data. XML Schemas, like DTDs, can be used to specify the metadata of a particular class of documents. Unlike DTDs, however, XML Schemas use XML syntax. This is convenient because you don't have to learn a completely new syntax just to describe your grammar, although you do need to learn how to declare elements and attributes using XML Schemas.
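The idea of verifying incoming data can be sketched in Python. The standard library parser only checks that a document is well formed, so this sketch hand-codes the book content model from the DTD example rather than loading a real DTD or schema; a production system would use a validating parser instead. The names BOOK_MODEL, AUTHOR_MODEL, matches, and check_book, along with the sample documents, are hypothetical.

```python
import xml.etree.ElementTree as ET

# Content models copied by hand from the book DTD: (child name, min, max).
# A max of None means "no upper limit", like "+" on author in the DTD.
BOOK_MODEL = [("title", 1, 1), ("author", 1, None), ("ISBN", 1, 1), ("price", 1, 1)]
AUTHOR_MODEL = [("first", 1, 1), ("last", 1, 1)]


def matches(parent, model):
    """True if parent's children follow the model's order and occurrence limits."""
    tags = [child.tag for child in parent]
    pos = 0
    for name, low, high in model:
        count = 0
        while pos < len(tags) and tags[pos] == name and (high is None or count < high):
            pos += 1
            count += 1
        if count < low:
            return False
    return pos == len(tags)  # no unexpected trailing children


def check_book(text):
    """Classify received text as 'not well formed', 'invalid', or 'valid'.

    Simplified: it checks element order and counts only, not the leaf content.
    """
    try:
        root = ET.fromstring(text)  # fails if the general XML rules are broken
    except ET.ParseError:
        return "not well formed"
    if root.tag != "book" or not matches(root, BOOK_MODEL):
        return "invalid"
    if not all(matches(author, AUTHOR_MODEL) for author in root.findall("author")):
        return "invalid"
    return "valid"


# Made-up sample documents for illustration.
GOOD = ("<book><title>Sample</title>"
        "<author><first>Ann</first><last>Lee</last></author>"
        "<ISBN>000</ISBN><price>10</price></book>")
NO_AUTHOR = "<book><title>Sample</title><ISBN>000</ISBN><price>10</price></book>"
BROKEN = "<book><title>Sample</book>"

for label, doc in [("GOOD", GOOD), ("NO_AUTHOR", NO_AUTHOR), ("BROKEN", BROKEN)]:
    print(label, "->", check_book(doc))
```

The three samples separate the two levels of correctness: BROKEN fails the general rules of XML, NO_AUTHOR is well formed but breaks the rule that a book needs at least one author, and GOOD satisfies both.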
There are two other advantages to XML Schemas that do not come out in the example. First, schemas give the developer richer control over data type declarations than is possible in DTDs. Second, schemas allow greater reuse of metadata by permitting the developer to include more external schemas than DTDs allow.

The main reason to use schemas is to improve compatibility and consistency within an XML document or application. In isolation, it doesn't matter significantly whether an XML document uses a DTD or a schema. However, the moment that a developer or user wants to modify the document, share the document, or combine multiple documents, the differences become more apparent. Because schemas follow the XML format, it is easier to design tools, such as Extensible Stylesheet Language Transformations (XSLT) scripts, that will modify them.

A real concern about XML documents is that developers will use different vocabularies, which will minimize interoperability. To leverage the capabilities of XML, developers must be able to bend the syntax rules of a specific document without breaking the vocabulary. Although there are still obstacles to overcome, such as vocabulary, the W3C's recommendation of XML Schemas is a major step toward better data interchange between companies and, eventually, more sophisticated, widely used B2B e-commerce.

SOLOMON H. SIMON [hank.simon@lmco.com] has more than 20 years of experience in IT. He has a Ph.D. in AI, has worked as a chemist, and is currently consulting and writing about XML, WAP, and Bluetooth Web technologies. He is also the author of XML, part of the McGraw-Hill Emerging Business Technology Series.

RESOURCES

Roger L. Costello's tutorial on XML Schemas

Microsoft has a collection of Web sites related to a top-down description of XML Schemas, from what they are to how to build them:
msdn.microsoft.com/xml/xmlguide/schema-intro.asp
msdn.microsoft.com/xml/tutorial/author_schema.asp

Oracle has a technical site at: technet.oracle.com/tech/xml/schema_java/

Related Article on IntelligentEnterprise.com: "XML Will Make It Easier," April 16, 2001