Intelligent Enterprise

Better Insight for Business Decisions

March 27, 2001



Untangling the Web

SOAP uses XML as a simple and elegant solution that automates B2B transactions

By Greg Barish

Most of today's Web applications are built for human consumption. Because real people interact with these applications, information must be presented in a visually appealing way: users fill out HTML forms and receive static or dynamic HTML output in response. In recent years, however, more and more software agents - not people - have been interacting with these Web applications. Metacatalogs, for example, take a query entered through a single user interface and automatically query hundreds of existing online catalogs on the user's behalf. The long-term view of Web-based B2B is based on such automation. In fact, it is likely that the network traffic generated by such automation will eventually dwarf the traffic generated by human interaction.

THE ORIGINS OF SOAP
From IBM rejection to W3C recognition

SOAP was first proposed by Microsoft as a means for heterogeneous software objects to communicate over a network. The protocol's Microsoft origins may seem surprising considering that it is not tied directly to any Microsoft technology - rather, it is a proposal for an open standard. However, the truth is that the original 1998 proposal (which involved Microsoft, UserLand, and DevelopMentor Inc.) did emphasize an approach that favored what has become BizTalk - Microsoft's SOAP strategy. It was only after the input of IBM, which initially rejected it, that the proposal began to distance itself from its original Microsoft bent, evolving into something more open. Sun also initially rejected the proposal and only recently (June 2000) changed its tune, whispering support for the version that the W3C acknowledged in May of 2000. Several other B2B companies (Ariba, CommerceOne Corp, and Lotus among them) also supported the proposal submitted to the W3C.

While a nice visual interface is an asset when it comes to enabling humans to interact with machines, it is an unnecessary obstacle when machines communicate with each other. What B2B really needs is an easy way to integrate the back-end systems of participating organizations. And we're not talking about a solution in which each business maintains multiple interfaces to the same data. That's the way things work today, and visual interfaces have often proved to be unwieldy for this purpose. IT managers want a way to consolidate their data and functionality in one system that can be accessed over the Web by real people or automatically by software agents.

The Simple Object Access Protocol, better known as SOAP, is aimed squarely at this data consolidation problem. Recently acknowledged as a submission by the World Wide Web Consortium (W3C), SOAP uses XML and HTTP to define a component interoperability standard on the Web. SOAP enables Web applications to communicate with each other in a flexible, descriptive manner while enjoying the built-in network optimization and security of an HTTP-based messaging protocol. SOAP's foundations come from attempts to establish an XML-based form of RPC as well as Microsoft's own efforts to push its DCOM technology beyond Windows.

SOAP increases the utility of Web applications by defining a standard for how information should be requested by remote components and how it should be described upon delivery. The key to achieving both of these goals is the use of XML to give names not only to the functions and parameters being requested, but also to the data being returned.
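To make the naming idea concrete, here is a minimal Python sketch of what such a request and reply might look like. The envelope namespace is the one defined by SOAP 1.1; everything else (the `GetPrice` method, the `sku`, `currency`, and `price` names, and the `urn:example:catalog` namespace) is a hypothetical illustration rather than part of any real service.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
CATALOG_NS = "urn:example:catalog"  # hypothetical application namespace


def build_request(method, params):
    """Build a SOAP envelope whose body names the method and each parameter."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{CATALOG_NS}}}{method}")
    for name, value in params.items():
        # Each parameter travels as a named child element of the call.
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def read_response(xml_text, result_name):
    """Look up a returned value by its element name inside the SOAP body."""
    envelope = ET.fromstring(xml_text)
    body = envelope.find(f"{{{SOAP_ENV}}}Body")
    if body is None:
        return None
    node = body.find(f".//{result_name}")
    return node.text if node is not None else None


# Hypothetical request: ask a seller for the price of one item.
request_xml = build_request("GetPrice", {"sku": "A-100", "currency": "USD"})

# Hypothetical reply, written by hand here to stand in for the seller's server.
response_xml = (
    f'<e:Envelope xmlns:e="{SOAP_ENV}"><e:Body>'
    f'<c:GetPriceResponse xmlns:c="{CATALOG_NS}"><price>19.95</price>'
    "</c:GetPriceResponse></e:Body></e:Envelope>"
)
price = read_response(response_xml, "price")
```

Because both the call and the result are labeled, the receiving program finds `price` by name instead of by its position on a page.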

Why SOAP?

As it exists today, Web-based distributed computing is not widely practical. IT managers have just two ways to go about enabling components to talk to each other over the Internet. One method is to use what HTTP provides, which means marshalling input and output as part of a POST or GET request/reply scenario. The other way is to use existing component technologies (integrating as necessary) between servers. In the latter scenario, objects communicate using a binary protocol over TCP/IP rather than over HTTP.

Let's take the HTTP-based solution first. Under this approach, components invoke functionality on other remote components by issuing POST or GET requests and processing associated HTML replies. However, this process is not general; it is inherently inflexible and, at times, can be just plain ugly. To understand why, let's consider an example.
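As a rough illustration of what marshalling a call into a GET request involves, the following Python sketch packs function arguments into a query string and unpacks them again on the receiving side. The endpoint URL and the parameter names are hypothetical, and no network request is actually made.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

BASE_URL = "http://seller.example.com/catalog/search"  # hypothetical endpoint


def marshal_call(base_url, params):
    """Encode the call's arguments as the query string of a GET URL."""
    return f"{base_url}?{urlencode(params)}"


def unmarshal_call(url):
    """Recover the arguments on the receiving side; every value is a string."""
    query = urlsplit(url).query
    return {key: values[0] for key, values in parse_qs(query).items()}


url = marshal_call(BASE_URL, {"product": "desk lamp", "max_price": 40})
# url is "http://seller.example.com/catalog/search?product=desk+lamp&max_price=40"

args = unmarshal_call(url)
# args is {"product": "desk lamp", "max_price": "40"}
```

Note that the integer `40` comes back as the string `"40"`: a query string carries flat name/value pairs with no type information, which is part of why this approach does not generalize.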

Mix and Match

Suppose your company is trying to match sellers to buyers. You have established partnerships with several seller Web sites, each one different and each one providing access to its catalog via the Web. Now, suppose your company wants to integrate essentially all of these Web sites into one virtual catalog, so that when users query for some product, your system can match the query against those sellers that have the requested product. The problem is that the seller catalogs are huge and highly dynamic, and the sellers vary widely in how they store their data. Thus, downloading catalogs in their native format on a periodic basis is not always practical because (a) it is not always possible, (b) it may mean significant integration costs, and (c) it usually forces the need for very large, redundant databases. Since each seller distributes its catalog via the Web already, it would be far less costly if your company could simply extract that data from those pages - possibly even extract it on the fly (per query).

However, there is no simple solution to this problem of extraction. Extraction implies either that your company develops technology that allows the data to be extracted (or "scraped") from the seller's Web pages or that the seller provides an alternative, easy-to-parse interface to the data. Obviously, the root of the problem here is that the existing Web site data is prepared for human - not machine - consumption. Although useful data exists on Web pages, it is embedded within another type of data (HTML tags) that exists purely to help browsers produce a visual representation. Inflexibility is another problem with querying via HTTP. If the client or server wants to communicate more complex data types (such as a list of catalog items, each of which has a list of colors and/or features), some ad hoc method for encoding those data structures must be developed.
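The scraping approach can be sketched in a few lines of Python using the standard library's `html.parser` module. The sample page, its `name` and `price` class attributes, and the `CatalogScraper` class are all invented for illustration; a real seller's markup would differ, and that is exactly the difficulty.

```python
from html.parser import HTMLParser

# Hypothetical fragment of a seller's catalog page, written for human readers.
SAMPLE_PAGE = """
<html><body>
<table>
  <tr><td class="name">Desk lamp</td><td class="price"><b>$19.95</b></td></tr>
  <tr><td class="name">Floor lamp</td><td class="price"><b>$54.00</b></td></tr>
</table>
</body></html>
"""


class CatalogScraper(HTMLParser):
    """Collect item names and prices by watching for specific table cells."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._field = None  # class of the <td> currently being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._field = dict(attrs).get("class")
            if self._field == "name":
                self.items.append({})  # a name cell starts a new item

    def handle_endtag(self, tag):
        if tag == "td":
            self._field = None

    def handle_data(self, data):
        text = data.strip()
        if self._field and text and self.items:
            self.items[-1][self._field] = text


scraper = CatalogScraper()
scraper.feed(SAMPLE_PAGE)
items = scraper.items
```

The scraper works only as long as the seller keeps this exact layout. If the seller renames a class or moves the price out of the table, the extraction silently breaks, and the prices still arrive as display strings such as `"$19.95"` rather than typed values.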






