Portal Technology
Whether referring to public Web sites or to corporate intranets, people consistently use the term portal to refer to a single point of data or information access. On the public side, sites such as Yahoo.com, Lycos.com, and Infoseek.com are generalized Web portals. In other words, whenever you are looking for something on the Internet, each of the Web portal companies would like to think that you first go to its site. In the same sense, sites such as Ebay.com, Lowestfare.com, and Amazon.com are niche, product, or topical portals. Amazon.com, for example, would like to think that whenever you look for a book, you first go to Amazon.com.
Regardless of the technology underlying a site, if it can attract the majority of browsers for that topic (especially if it can accumulate useful information about the habits of those browsers), the site has the potential to be extremely profitable. This profitability, on the potential of which the market is betting heavily, is evaluated in terms of the e-commerce it generates (including advertisers' willingness to pay for site exposure) and the potential for cross-selling into the site's regular visitors. It's no wonder that so many people are trying to develop the next big public portal!
There are several grounding technologies for Internet portals. Most of these (such as high-performance transaction processing (traditional database technology), high connection bandwidth (from T1 lines and DSL to cable modems), graphical interfaces, Java virtual machines, CGI scripting, XML/HTML interface protocols, and secure transmissions) we now consider a part of the background computing technology infrastructure.
In addition to the technology infrastructure, there exist a number of specifically enabling technologies. The most prominent among them is the text analysis-driven search engine. Most of the information on the Web is textual. When you search from an Internet portal, you generally supply one or more key terms, in response to which you receive anywhere from 0 to 1,000 documents (more on the IQ gap of search engines later). The next most prominent enabling technology is data mining. For example, when you log on to a product portal site, request a product, and are told about other similar products (in the hope that you will be motivated to purchase them as well), it's data mining that calculates the identity of those similar products. All in all, Internet portals are a reasonably new application, dating back only about three to four years.
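To make the "similar products" idea concrete, here is a minimal sketch of one simple way such a list could be computed: counting which products turn up in the same purchase baskets. This is only an illustration under my own assumptions; the basket data and the function names (co_purchase_counts, similar_products) are hypothetical, and real product portals may well use quite different data-mining methods.

```python
from collections import Counter
from itertools import combinations


def co_purchase_counts(baskets):
    """For each product, count how often every other product shares a basket with it."""
    counts = {}
    for basket in baskets:
        # Deduplicate so a product repeated within one basket is counted once.
        for a, b in combinations(sorted(set(basket)), 2):
            counts.setdefault(a, Counter())[b] += 1
            counts.setdefault(b, Counter())[a] += 1
    return counts


def similar_products(product, counts, top_n=3):
    """Return up to top_n products most often bought together with `product`."""
    return [item for item, _ in counts.get(product, Counter()).most_common(top_n)]


# Hypothetical purchase history, invented purely for illustration.
baskets = [
    ["olap book", "sql book", "coffee"],
    ["olap book", "sql book"],
    ["olap book", "data mining book"],
    ["sql book", "coffee"],
]
counts = co_purchase_counts(baskets)
print(similar_products("olap book", counts))
```

Raw co-occurrence counts favor products that are popular in general, which is one reason a production system would probably normalize them in some way.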
New Label, Same Stuff?
In contrast, the internal corporate portal seems similar to the user interface for a data warehouse or decision-support data in general, something that has been around for over a decade. On the surface, at least, the term portal seems more like a new word than a new thing. Doesn't the term corporate portal basically mean ubiquitous thin-client access to data (whether offered to internal users or through some supply-chain effort to external suppliers or clients)? Being a bit cynical, but just a bit, couldn't we go on to say that the reason the term portal has taken off with the business community is that traditional data warehousing efforts (which were supposed to provide end users with all the information they needed) failed? And thus the grand promise of corporate portals is essentially the same as the grand promise of data warehousing. Only now, people assume the supply side has been completely addressed. In other words, data warehousing has generated lots of data; people just can't get at it. I did, by the way, make this supply-side criticism a few years ago in OLAP Solutions (John Wiley & Sons, 1997).
Companies such as Plumtree Software Inc., Information Advantage Inc., Documentum Inc., Mineshare Inc., Epicentric Inc., and Viador Inc. are all positioning themselves in the corporate portal space. Here, depending on the company, the access may be oriented more toward numeric or text data.
Most corporate portal products offer a single interface to all or most corporate data and applications. Thus, I can access numeric data, create text reports, or perform Internet searches through a single window. Additionally, I can subscribe to channels of information or publish documents that are themselves distributed to the appropriate people. The overriding allure is eliminating the information overload created over the last 15 years of incessant build-up of stored information. So, now to the $64,000 question: Do portals solve the problem of information overload?
I saw one attempt to answer that question affirmatively in an analyst's paper on business portals written for a portal provider ("Business Portals: Drivers, Definitions, and Rules," Wayne Eckerson, 1999). It compares the advantage of business portals to that of shopping malls: Everything is in one place. But how useful would a mall (let's say, a super-duper mall) be to me if it comprised 1,000 shops, 75 of which sell shoes, when I want a particular pair of new shoes? I would be overwhelmed with choices. Malls do not reduce information (product) overload. What I really want (continuing with this hard-product analogy) is a personal valet who knows all my requirements and constraints as well as my capricious likes and dislikes so that when I ask this agent to get me a new pair of shoes, I'll get exactly the shoes I desire! It's a big jump from mall-like access to everything to a valet-like retrieval of all and only what you want.
Portal Goals
So the bad news is that corporate portals do not now deliver on the grand promise. The good news is that they do deliver on some of it and the technology is moving in the right direction. Before getting into the technical hurdles that lie between the reality here and the promise there, I will lay out my grand goals for portals, which are the same for corporate as well as Internet portals.
The grand goal for portals is to provide an easily maintainable computing environment (such as thin, Java-based software clients) that offers:
A single point of access (or user interface) for lots of
- (Possibly) casual users (the mass of users, corporate or public) and/or
- Collaborative users (more and more work is based on virtual teams) needing
Fast response times (one of the requirements for OLAP) to
High-level queries (one of the holy grails of computing that we have definitely not yet achieved) requiring
Secure transactions (or there wouldn't be any e-commerce), posed against
Huge amounts of
- Structured (mostly numeric) and
- Unstructured (text and image) data
Widely distributed across heterogeneous platforms (within companies and across the world) and requiring
Substantial computation (a hallmark of decision support) and
Multimedia metadata integration (or the meaning of single point of access becomes trivialized) to be resolved.
Portals do not currently fulfill this goal. However, there is a good amount they do today. Portals can currently provide reasonable levels of functionality, such as:
Low-maintenance clients
Secure transactions
Connectivity to heterogeneous data sources
Multiplatform distribution
User scalability
Data access scalability
Minimal text retrieval, and
Routing on the basis of simple or manual classification.
These features belong to what I'll call first-generation portals. In a nutshell, they have a good chunk of the infrastructure and offer access to lots of data. But they're not hitting the grand goal yet.
Some of the main challenges we face today as we head toward the grand goals are in the areas of text analysis, DSS metadata integration, multimedia integration, federated metadata, information visualization, and componentization. We will need to overcome these challenges if we want to evolve from providing data access to truly solving the problem of information overload.
Text analysis is the enabling technology behind
Simple, personalized, browsable information classification schemes
Intelligent routing
Publish and subscribe
Intelligent retrieval
Document clustering and
Intelligent dialogues for disambiguation.
Clearly, text analysis is a major limiting factor for portals. Unfortunately, the technology is still at an early stage, is based primarily on keyword searches, and isn't smart enough to figure out what texts actually mean. Lots of work is happening in this arena. I do know at least one company that has striven beyond keywords for its searches. HNC, for example, uses a context-based method for similarity searches.
DSS metadata integration is the enabling technology behind
Connecting Web-based text data (such as market analysis data) to dimension tables in a warehouse schema
Connecting data-mining analyses with materialized views and
Connecting data transformations in a data visualization tool with formula metadata in a data-mining tool.
Here again, we are still at an early stage of DSS metadata integration. For example, when data moves from a data-mining application into an OLAP application, the relevant derivational metadata doesn't carry forward. The silo criticism leveled against some data mart efforts can also apply to OLAP, visualization, and data-mining applications.
Multimedia semantic integration is the enabling technology behind
Creating nominal or ordinal variables from text input
Creating numeric variables from visual input and
Sucking useful information from text documents in a dimensional framework alongside relevant numeric info.
For example, it would be nice to be able to classify managerial reports by their degree of optimism (on an ordinal scale of 1-5). Or it would be nice to create an aggressivity scale for text ads that would allow a marketing department to track how aggressive their competitors were becoming. And certainly, most corporations would like to enhance their data warehouses with appropriate demographic information gleaned either from public Web sources or from subscription services. It would also be useful for retail stores to analyze the visual aspects of product display by performing visual analysis on store images. (Just think of all the non-numeric factors that may influence sales, such as lighting, aisle color or width, and product category labeling.) This technology is a little further along than the others.
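As a rough illustration of what creating an ordinal variable from text might look like, the sketch below scores a report on a 1-5 optimism scale by counting words from two small word lists. The word lists, the optimism_score function, and the scoring rule are all assumptions of mine, made up for this example; word counting is far cruder than the semantic analysis the goal really calls for.

```python
import re

# Hypothetical word lists, chosen only for illustration.
POSITIVE = {"growth", "improved", "strong", "exceeded", "gain"}
NEGATIVE = {"decline", "weak", "missed", "loss", "risk"}


def optimism_score(text):
    """Map a text to an ordinal optimism score from 1 (gloomy) to 5 (upbeat)."""
    words = re.findall(r"[a-z]+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    if pos + neg == 0:
        return 3  # no evidence either way: treat as neutral
    balance = (pos - neg) / (pos + neg)  # ranges from -1.0 to 1.0
    return round(balance * 2) + 3        # rescale to the 1..5 range


print(optimism_score("Strong growth exceeded plan"))              # 5
print(optimism_score("Weak sales, missed targets, loss widened"))  # 1
print(optimism_score("The meeting is on Tuesday"))                 # 3
```

Once a score like this exists, it can sit in a dimensional model next to the numeric measures, for example as an attribute of each report.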
Federated metadata is the enabling technology behind decentralized metadata.
We can't possibly have centralized metadata forever; there's too much stuff out there. We need a way to query the world for information about x and have all the world's databases listening in on a snoopy net and volunteering, "Hey, that's me! I've got info on that." Currently there's profiling based on sample queries.
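The "Hey, that's me!" idea can be sketched in a few lines: each source keeps its own metadata, and a query is broadcast to all of them rather than looked up in a central catalog. This is a toy sketch under my own assumptions; the DataSource class, the broadcast function, and the source names are hypothetical, and a real federation would also have to deal with networks, security, and differing vocabularies.

```python
class DataSource:
    """A data source that keeps its own metadata instead of registering centrally."""

    def __init__(self, name, topics):
        self.name = name
        self.topics = {t.lower() for t in topics}

    def knows_about(self, topic):
        """Answer the broadcast question: do I hold information on this topic?"""
        return topic.lower() in self.topics


def broadcast(topic, sources):
    """Ask every source about a topic and collect the names of those that volunteer."""
    return [s.name for s in sources if s.knows_about(topic)]


# Hypothetical sources, invented for illustration.
sources = [
    DataSource("sales_warehouse", ["revenue", "customers"]),
    DataSource("hr_database", ["employees", "salaries"]),
    DataSource("market_reports", ["competitors", "revenue"]),
]
print(broadcast("Revenue", sources))
```

Note that nothing here knows in advance which source holds what; each source answers for itself, which is the point of decentralizing the metadata.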
Information visualization is the enabling technology behind understanding the complex relationships that form the bases for most of our important decisions (in other words, the facts or justification behind the big decisions are rarely black and white).
Tools are rapidly improving in this arena, but we still need help interpreting all this data, numeric and textual. And tools need to do a better job of figuring out how to visualize data. It shouldn't be such a manual process.
Componentization enables us to minimize functional redundancy. It includes such challenges as better connectivity, better distribution of derived data (RDBMS vs. OLAP vs. portal cache), and better leveraging of current corporate investments in security. For example, it doesn't make sense to maintain separate security for a backend relational database, an OLAP database, and a Web front end. Whether through Novell Directory Services or Microsoft's upcoming Active Directory Services, or some other one-stop security mechanism, portal providers will need to be able to leverage organizations' existing security infrastructure.
Portals denote a style of decision-support architecture rather than any particular technology. First-generation portals provide at least a single point of access to information. They are part of a larger trend toward information (and political?) democratization. And with information, as with most things in life, it's possible to have too much. Although portals do not yet solve the information overload (or "Goldilocks") problem, a lot of good research is currently underway, especially in the area of text analysis, a topic I will drill down on in an upcoming column.
Erik Thomsen is an author, lecturer, researcher, and consultant focusing on OLAP and decision-support applications. He is cofounder of the Cambridge, Mass.-based consultancy Dimensional Systems and author of the book OLAP Solutions (John Wiley & Sons, 1997). You can reach him via email at erik@dimsys.com.