Intelligent Enterprise

Better Insight for Business Decisions



September 14, 1999, Volume 2 - Number 13


Through the Looking Glass


If portals can’t combat information overload, they’re nothing more than data warehouse interfaces by a new name

By Erik Thomsen



Alice saw a world of fantasy and illogic as she traveled through the looking glass. It’s the same thing I see when I travel through the Internet looking at information about portals. The term portal, having been applied to everything from text search engines to online analytical processing (OLAP) clients, certainly deserves to be nominated for entry into Alice’s wonderland of buzzwords. Its meaning has been stretched at least as thin as that of “business intelligence” or “data mining.”

To make some sense of this chaos, I will review the various product applications being called portals and outline the core technologies they rely on, explain my own vision for what portal applications should bring us, and identify some of the key challenges facing the enabling technologies that are critical to portals. (You can find a general architecture for portals on the publications page at DSSlab.com.)

Portal Technology

Whether referring to public Web sites or to corporate intranets, people consistently use the term “portal” to refer to a single point of data or information access. On the public side, sites such as Yahoo.com, Lycos.com, and Infoseek.com are generalized Web portals. In other words, whenever you are looking for something on the Internet, each of the Web portal companies would like to think that you first go to its site. In the same sense, sites such as Ebay.com, Lowestfare.com, and Amazon.com are niche, product, or topical portals. Amazon.com, for example, would like to think that whenever you look for a book, you first go to Amazon.com.

Regardless of the technology underlying a site, if it can attract the majority of browsers for that topic (especially if it can accumulate useful information about the habits of those browsers), the site has the potential to be extremely profitable. This profitability, on the potential of which the market is betting heavily, is evaluated in terms of the “e-commerce” it generates (including advertisers’ willingness to pay for site exposure) and the potential for cross-selling into the site’s regular visitors. It’s no wonder that so many people are trying to develop the next big public portal!

There are several grounding technologies for Internet portals. Most of these — such as high-performance transaction processing (traditional database technology), high connection bandwidth (from T1 lines and DSL to cable modems), graphical interfaces, Java virtual machines, CGI scripting, XML/HTML interface protocols, and secure transmissions — we now consider a part of the background computing technology infrastructure.

In addition to the technology infrastructure, there exist a number of specifically enabling technologies. The most prominent among them is the text analysis-driven search engine. Most of the information on the Web is textual. When you search from an Internet portal, you generally supply one or more key terms, in response to which you receive anywhere from 0 to 1,000 documents (more on the IQ gap of search engines later). The next most prominent enabling technology is data mining. For example, when you log on to a product portal site, request a product, and are told about other similar products (in the hope that you will be motivated to purchase them as well), it’s data mining that calculates the identity of those similar products. All in all, Internet portals are a reasonably new application, dating back only about three to four years.
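One simple way to picture the "similar products" calculation described above is to count co-purchases. The sketch below is only an illustration under stated assumptions: the baskets, the product names, and the function names `co_purchase_counts` and `similar_products` are all hypothetical, and real product portals may well use quite different data-mining methods.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase baskets; a real site would read these from its order history.
baskets = [
    {"book_a", "book_b", "book_c"},
    {"book_a", "book_b"},
    {"book_a", "book_d"},
    {"book_b", "book_c"},
]

def co_purchase_counts(baskets):
    """Count how often each ordered pair of products appears in the same basket."""
    counts = Counter()
    for basket in baskets:
        for x, y in combinations(sorted(basket), 2):
            counts[(x, y)] += 1
            counts[(y, x)] += 1
    return counts

def similar_products(product, baskets, top_n=2):
    """Return the products most often bought together with `product`."""
    counts = co_purchase_counts(baskets)
    related = [(other, n) for (p, other), n in counts.items() if p == product]
    # Most frequent first; ties are broken alphabetically so results are stable.
    related.sort(key=lambda pair: (-pair[1], pair[0]))
    return [other for other, _ in related[:top_n]]

print(similar_products("book_a", baskets))
```

With these toy baskets, someone requesting `book_a` would be shown `book_b` first, because the two were bought together most often.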

New Label, Same Stuff?

In contrast, the internal corporate portal seems similar to the user interface for a data warehouse or decision-support data in general, something that has been around for over a decade. On the surface, at least, the term “portal” seems more like a new word than a new thing. Doesn’t the term “corporate portal” basically mean ubiquitous thin-client access to data (whether offered to internal users or through some supply-chain effort to external suppliers or clients)? Being a bit cynical, but just a bit, couldn’t we go on to say that the term “portal” has taken off with the business community because traditional data warehousing efforts (which were supposed to provide end users with all the information they needed) failed? And thus the grand promise of corporate portals is essentially the same as the grand promise of data warehousing. Only now, people assume the supply side has been completely addressed. In other words, data warehousing has generated lots of data; people just can’t get at it. I did, by the way, make this supply-side criticism a few years ago in OLAP Solutions (John Wiley & Sons, 1997).

Companies such as Plumtree Software Inc., Information Advantage Inc., Documentum Inc., Mineshare Inc., Epicentric Inc., and Viador Inc. are all positioning themselves in the corporate portal space. Here, depending on the company, the access may be oriented more toward numeric or text data.

Most corporate portal products offer a single interface to all or most corporate data and applications. Thus, I can access numeric data, create text reports, or perform Internet searches through a single window. Additionally, I can subscribe to “channels” of information or publish documents that are themselves distributed to the appropriate people. The overriding allure is eliminating the information overload created over the last 15 years of incessant build-up of stored information. So, now to the $64,000 question: Do portals solve the problem of information overload?

I saw one attempt to answer that question affirmatively in an analyst’s paper on business portals written for a portal provider (Business Portals: Drivers, Definitions, and Rules, Wayne Eckerson, 1999). It compares the advantage of business portals to that of shopping malls: Everything is in one place. But how useful would a mall (let’s say, a super-duper mall) be to me if it comprised 1,000 shops, 75 of which sell shoes, when I want a particular pair of new shoes? I would be overwhelmed with choices. Malls do not reduce information (product) overload. What I really want (continuing with this hard-product analogy) is a personal valet who knows all my requirements and constraints as well as my capricious likes and dislikes so that when I ask this agent to get me a new pair of shoes, I’ll get exactly the shoes I desire! It’s a big jump from mall-like “access to everything” to a valet-like retrieval of “all and only what you want.”

Portal Goals

So the bad news is that corporate portals do not now deliver on the grand promise. The good news is that they do deliver on some of it and the technology is moving in the right direction. Before getting into the technical hurdles that lie between the reality here and the promise there, I will lay out my grand goals for portals, which are the same for corporate as well as Internet portals.

The grand goal for portals is to provide an easily maintainable computing environment (such as thin, Java-based software clients) that offers:

• A single point of access (or user interface) for lots of

- (Possibly) casual users (the mass of users corporate or public) and/or

- Collaborative users (more and more work is based on virtual teams) needing

• Fast response times (one of the requirements for OLAP) to

• High-level queries (one of the holy grails of computing that we have definitely not yet achieved) requiring

• Secure transactions (or there wouldn’t be any e-commerce), posed against

• Huge amounts of

- Structured (mostly numeric) and

- Unstructured (text and image) data

• Widely distributed across heterogeneous platforms (within companies and across the world) and requiring

• Substantial computation (a hallmark of decision support) and

• Multimedia metadata integration (or the meaning of single point of access becomes trivialized) to be resolved.

Portals do not currently fulfill this goal. However, there is a good amount they do today. Portals can currently provide reasonable levels of functionality, such as:

• Low-maintenance clients

• Secure transactions

• Connectivity to heterogeneous data sources

• Multiplatform distribution

• User scalability

• Data access scalability

• Minimal text retrieval, and

• Routing on the basis of simple or manual classification.

These features belong to what I’ll call first-generation portals. In a nutshell, they have a good chunk of the infrastructure and offer access to lots of data. But they’re not hitting the grand goal yet.

Some of the main challenges we face today as we head toward the grand goals are in the areas of text analysis, DSS metadata integration, multimedia integration, federated metadata, information visualization, and componentization. We will need to overcome these challenges if we want to evolve from providing data access to truly solving the problem of information overload.

Text analysis is the enabling technology behind

• Simple, personalized, browsable information classification schemes

• Intelligent routing

• Publish and subscribe

• Intelligent retrieval

• Document clustering and

• Intelligent dialogues for disambiguation.

Clearly, text analysis is a major limiting factor for portals. Unfortunately, the technology is still at an early stage, is based primarily on keyword searches, and isn’t smart enough to figure out what texts actually mean. Lots of work is happening in this arena. I know of at least one company that has moved beyond keywords in its searches: HNC uses a context-based method for similarity searches.
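To make the gap between keyword searches and similarity searches concrete, here is a minimal sketch that contrasts an all-or-nothing keyword match with a graded cosine similarity over term counts. It is not HNC's method, which the column does not describe. Cosine similarity is a standard technique swapped in for illustration, it still works on word overlap rather than meaning, and the function names are hypothetical.

```python
import math
from collections import Counter

PUNCTUATION = ".,;:!?\"'()"

def tokens(text):
    """Lowercase the text and split on whitespace, stripping simple punctuation."""
    stripped = (w.strip(PUNCTUATION).lower() for w in text.split())
    return [w for w in stripped if w]

def keyword_match(query, document):
    """Strict keyword search: every query term must appear in the document."""
    doc_terms = set(tokens(document))
    return all(term in doc_terms for term in tokens(query))

def cosine_similarity(text_a, text_b):
    """Cosine of the angle between two term-frequency vectors (0.0 to 1.0)."""
    a, b = Counter(tokens(text_a)), Counter(tokens(text_b))
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

query = "portal search engine"
document = "A search engine indexes text."
print(keyword_match(query, document))          # strict match fails: "portal" is absent
print(cosine_similarity(query, document) > 0)  # graded score still sees the overlap
```

The strict match rejects the document outright, while the graded score lets a portal rank partially relevant documents instead of hiding them.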

DSS metadata integration is the enabling technology behind

• Connecting Web-based text data (such as market analysis data) to dimension tables in a warehouse schema

• Connecting data-mining analyses with materialized views and

• Connecting data transformations in a data visualization tool with formula metadata in a data-mining tool.

Here again, we are still at an early stage of DSS metadata integration. For example, when data moves from a data mining application into an OLAP application, the relevant derivational metadata doesn’t carry forward. The “silo” criticism leveled against some data mart efforts can also apply to OLAP, visualization, and data-mining applications.

Multimedia semantic integration is the enabling technology behind

• Creating nominal or ordinal variables from text input

• Creating numeric variables from visual input and

• Sucking useful information from text documents in a dimensional framework alongside relevant numeric info.

For example, it would be nice to be able to classify managerial reports by their degree of optimism (on an ordinal scale of 1-5). Or it would be nice to create an “aggressivity” scale for text ads that would allow a marketing department to track how aggressive its competitors are becoming. And certainly, most corporations would like to enhance their data warehouses with appropriate demographic information gleaned either from public Web sources or from subscription services. It would also be useful for retail stores to analyze the visual aspects of product display by performing visual analysis on store images. (Just think of all the non-numeric factors that may influence sales, such as lighting, aisle color or width, and product category labeling.) This technology is a little further along than the others.
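As a rough illustration of turning text into an ordinal variable, the sketch below scores a report's optimism on the 1-5 scale mentioned above by counting words from two small word lists. The word lists, the neutral default of 3, and the function name `optimism_score` are assumptions made for this example. A usable classifier would need far more than word counting.

```python
# Hypothetical word lists; a real lexicon would be much larger and domain-specific.
POSITIVE = {"growth", "strong", "improved", "exceeded", "confident"}
NEGATIVE = {"decline", "weak", "missed", "risk", "shortfall"}

def optimism_score(report):
    """Map a report onto an ordinal scale from 1 (pessimistic) to 5 (optimistic)."""
    words = [w.strip(".,;:!?").lower() for w in report.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    if pos + neg == 0:
        return 3  # no evidence either way: treat as neutral
    balance = (pos - neg) / (pos + neg)  # ranges from -1.0 to 1.0
    return 1 + round((balance + 1) * 2)  # rescale -1..1 onto 1..5

print(optimism_score("Sales exceeded plan and growth was strong."))
print(optimism_score("Weak demand led to a shortfall and a decline."))
```

Once each report carries a score like this, it can sit in a dimensional framework next to the numeric measures it comments on.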

Federated metadata is the enabling technology behind de-centralized metadata.

We can’t possibly have centralized metadata forever; there’s too much stuff out there. We need a way to query “the world” for information about x and have all the world’s databases listening in on a “snoopy net” and volunteering “Hey that’s me! I’ve got info on that.” Currently there’s profiling based on sample queries.
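The "snoopy net" idea can be sketched as a broadcast in which each source decides for itself whether to answer, so no central catalog is needed. Everything here is hypothetical: the `Source` class, its topic keywords, and the `broadcast` function are illustrative names, and a real federation would have to deal with networks, security, and far richer self-descriptions.

```python
class Source:
    """A hypothetical data source that describes its own contents with topic keywords."""

    def __init__(self, name, topics):
        self.name = name
        self.topics = {t.lower() for t in topics}

    def has_info_on(self, topic):
        """Each source checks its own metadata; nothing is looked up centrally."""
        return topic.lower() in self.topics

def broadcast(topic, sources):
    """Ask every source about a topic and collect the names of those that volunteer."""
    return [s.name for s in sources if s.has_info_on(topic)]

sources = [
    Source("sales_warehouse", ["revenue", "customers"]),
    Source("market_reports", ["competitors", "demographics"]),
    Source("hr_database", ["employees"]),
]
print(broadcast("Demographics", sources))
```

The design point is that the knowledge of what each source holds stays with the source, which is what makes the metadata decentralized.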

Information visualization is the enabling technology behind understanding the complex relationships that form the bases for most of our important decisions (in other words, the facts or justification behind the big decisions are rarely black and white).

Tools are rapidly improving in this arena, but we still need help interpreting all this data, numeric and textual. And tools need to do a better job of figuring out how to visualize data. It shouldn’t be such a manual process.

Componentization enables us to minimize functional redundancy. It includes such challenges as better connectivity, better distribution of derived data (RDBMS vs. OLAP vs. portal cache), and better leveraging of current corporate investments in security. For example, it doesn’t make sense to maintain separate security for a backend relational database, an OLAP database and a Web front end. Whether through Novell Directory Services or Microsoft’s upcoming Active Directory Services, or some other one-stop security mechanism, portal providers will need to be able to leverage organizations’ existing security infrastructure.

Portals denote a style of decision-support architecture rather than any particular technology. First-generation portals provide at least a “single point of access” to information. They are part of a larger trend toward information (and political?) democratization. And with information, as with most things in life, it’s possible to have too much. Although portals do not yet solve the information overload or “Goldilocks” problem, a lot of good research is currently underway, especially in the area of text analysis — a topic I will drill down on in an upcoming column.



Erik Thomsen is an author, lecturer, researcher, and consultant focusing on OLAP and decision-support applications. He is cofounder of the Cambridge, Mass.-based consultancy Dimensional Systems and author of the book OLAP Solutions (John Wiley & Sons, 1997). You can reach him via email at erik@dimsys.com.






