Intelligent Enterprise

Better Insight for Business Decisions

May 31, 2003

Data as it Happens

Data in real time, all the time, is what many enterprises want. To make it happen, IT needs to get the big picture and not burn out on one-off solutions for single applications.

by Mark Madsen

Continued from Page 2

Directories can become confusing when they deal with events rather than just data. An event is an interpretation of data, not just the data from a system. Where you choose to do the interpretation is a major decision. If the event is defined by the sending application, then the directory can be simple. If the event is defined at an integration hub, then the directory must also know how to define these events.
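To illustrate the second case, here is a minimal sketch of a hub directory that holds event definitions as well as data. The class name HubDirectory, the "low_stock" event, and the record fields are hypothetical and chosen only for illustration; they do not come from any particular product.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Event:
    name: str
    payload: dict


# Hypothetical rule type: a function that decides whether raw data is an event.
EventRule = Callable[[dict], bool]


class HubDirectory:
    """Directory that stores event definitions, not just data locations."""

    def __init__(self) -> None:
        self._rules: Dict[str, EventRule] = {}

    def define_event(self, name: str, rule: EventRule) -> None:
        self._rules[name] = rule

    def interpret(self, record: dict) -> List[Event]:
        """Return every defined event whose rule matches the raw record."""
        return [Event(name, record)
                for name, rule in self._rules.items() if rule(record)]


directory = HubDirectory()
# The interpretation lives at the hub, so the sender only ships raw data.
directory.define_event("low_stock", lambda r: r.get("quantity", 0) < 10)
events = directory.interpret({"sku": "A-100", "quantity": 4})
```

If the sending application defined the event instead, the rule would live in that application and the directory would only need to list the event name.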

Intermediate data storage. The nature of application and data interaction requirements will determine your intermediate data storage strategy. You might require an explicit or an implicit repository. If explicit, then you need to design the repository yourself. If implicit, the storage comes with the interaction model: for example, an application that interacts through a pub-sub approach relies on the infrastructure to store data until subscribers retrieve it, just as a message queue that holds messages until they're consumed is essentially a repository for real-time data.
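The implicit repository in a pub-sub arrangement can be sketched in a few lines. This is an in-memory illustration only, with a hypothetical Broker class and topic names; a real messaging product would add persistence, acknowledgments, and delivery guarantees.

```python
from collections import defaultdict, deque


class Broker:
    """Toy pub-sub broker: messages are held until each subscriber retrieves them."""

    def __init__(self):
        # topic -> {subscriber name -> queue of pending messages}
        self._queues = defaultdict(dict)

    def subscribe(self, topic, subscriber):
        self._queues[topic].setdefault(subscriber, deque())

    def publish(self, topic, message):
        # Every subscriber gets its own copy; this is the implicit storage.
        for pending in self._queues[topic].values():
            pending.append(message)

    def pending(self, topic, subscriber):
        return len(self._queues[topic][subscriber])

    def retrieve(self, topic, subscriber):
        pending = self._queues[topic][subscriber]
        messages = list(pending)
        pending.clear()
        return messages


broker = Broker()
broker.subscribe("orders", "billing")
broker.publish("orders", {"order_id": 1})
broker.publish("orders", {"order_id": 2})
stored = broker.pending("orders", "billing")      # data is sitting in the broker
received = broker.retrieve("orders", "billing")   # now it has been consumed
```

Nobody designed a repository here, yet the broker is holding real-time data between the publish and retrieve calls.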

If you want the infrastructure to define shared data elements, then it may be necessary to store interim or final values resulting from data transformations. How and where you decide to implement data transformation will influence your strategy for storing these values.

Data transformation. Just getting the raw data from one system to another isn't always sufficient. You need to manage the transformations if data must be normalized, put into a common format, or combined with other information. Transformation might be as simple as programming the process into the extract that feeds data into the infrastructure; or it could be as complex as an extract, transform, and load (ETL) product sitting at a hub and getting fed data from the spokes, if you've chosen a hub-and-spoke model.
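As a sketch of the simple end of that range, the function below normalizes customer records from two sources into one common format. The source names ("crm", "billing") and all field names are assumptions made up for the example.

```python
def normalize_customer(record: dict, source: str) -> dict:
    """Map a source-specific record onto a common {name, country} format."""
    if source == "crm":            # hypothetical source with a single name field
        name = record["full_name"]
        country = record["country_code"]
    elif source == "billing":      # hypothetical source with split name fields
        name = f'{record["first"]} {record["last"]}'
        country = record["country"]
    else:
        raise ValueError(f"unknown source: {source}")
    # Normalize spacing and capitalization so values from both sources compare equal.
    return {"name": name.strip().title(), "country": country.strip().upper()}


from_crm = normalize_customer(
    {"full_name": " ada lovelace ", "country_code": "gb"}, "crm")
from_billing = normalize_customer(
    {"first": "Ada", "last": "Lovelace", "country": "GB"}, "billing")
```

The same logic could sit in each extract or in one place at a hub; where it lives is the architectural decision the paragraph above describes.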

Metadata. If you want to do more than maintain a simple directory of available information, metadata becomes important. Similarly, if you want centrally managed transformations and the ability to trace transformations back to the data sources, you will require a metadata component.
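One way to picture tracing a transformation back to its data sources is a small catalog that records, for each derived element, which inputs produced it. This is a hedged sketch: MetadataCatalog and the element names are hypothetical, and real metadata repositories hold far more than lineage.

```python
class MetadataCatalog:
    """Records which inputs and which transformation produced each data element."""

    def __init__(self):
        self._elements = {}

    def register(self, name, transformation="none (source field)", inputs=()):
        self._elements[name] = {"transformation": transformation,
                                "inputs": list(inputs)}

    def trace_sources(self, name):
        """Walk the lineage back to the original source fields."""
        inputs = self._elements[name]["inputs"]
        if not inputs:
            return {name}          # no inputs means this is a source field
        sources = set()
        for input_name in inputs:
            sources |= self.trace_sources(input_name)
        return sources


catalog = MetadataCatalog()
catalog.register("crm.full_name")
catalog.register("billing.amount")
catalog.register("billing.tax")
catalog.register("total", "amount + tax", ["billing.amount", "billing.tax"])
catalog.register("customer_total", "join on customer",
                 ["crm.full_name", "total"])
origins = catalog.trace_sources("customer_total")
```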

Monitoring and management. Often an overlooked element of the architecture, monitoring and management become critical if you want the ability to see where the data came from, where it's going, and, in the event that it didn't get there, what happened. Management also includes handling crash recovery, fail-over, and related tasks. Monitoring and management is often one of the most difficult infrastructure components to implement, and it is often the one most dependent on the products you have selected. We're beginning to see tools for automating and self-healing the management of real-time data movement, but solid products are still some way off.
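The monitoring questions above (where did the data come from, where is it going, and what happened if it didn't arrive) amount to keeping a delivery history per message. The sketch below assumes hypothetical stage names ("sent", "delivered", "failed") and is not modeled on any specific management product.

```python
class DeliveryMonitor:
    """Keeps an ordered history of what happened to each message."""

    def __init__(self):
        self._log = {}

    def record(self, message_id, stage, detail=""):
        self._log.setdefault(message_id, []).append((stage, detail))

    def history(self, message_id):
        return list(self._log.get(message_id, []))

    def undelivered(self):
        # A message counts as undelivered if its latest stage is not "delivered".
        return [message_id for message_id, steps in self._log.items()
                if steps[-1][0] != "delivered"]


monitor = DeliveryMonitor()
monitor.record("m1", "sent", "orders -> billing")
monitor.record("m1", "delivered")
monitor.record("m2", "sent", "orders -> shipping")
monitor.record("m2", "failed", "subscriber offline")
problems = monitor.undelivered()
```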

Security. This component can be included with management but it's best addressed separately. The primary security concerns in real-time integration are an application's access to the data network, access to specific data, and authorization. The key difference is that we're focused on application communications, rather than a user-centric view of security.

The data movement infrastructure can handle data access controls while transporting the data, but no control exists once it passes to an application. The ideal approach is to have the infrastructure transmit the data securely, authenticating and authorizing its endpoints for data access. The endpoints can then deal with data security according to their individual rules. If the infrastructure is expected to interface with security models built into different applications, you will create unnecessary complications.
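The application-centric view can be sketched as a check the infrastructure runs before handing data to an endpoint: authenticate the application, then authorize it for the specific data. The AccessControl class, application IDs, secrets, and topic names below are all illustrative assumptions; a production system would use a real credential store and transport encryption.

```python
import hmac


class AccessControl:
    """Authenticates applications (not users) and authorizes them per topic."""

    def __init__(self):
        self._secrets = {}   # app_id -> shared secret (bytes)
        self._grants = {}    # app_id -> set of topics it may receive

    def register(self, app_id, secret, topics):
        self._secrets[app_id] = secret
        self._grants[app_id] = set(topics)

    def authenticate(self, app_id, secret):
        expected = self._secrets.get(app_id)
        # compare_digest avoids leaking information through comparison timing.
        return expected is not None and hmac.compare_digest(expected, secret)

    def authorize(self, app_id, topic):
        return topic in self._grants.get(app_id, set())

    def may_receive(self, app_id, secret, topic):
        return self.authenticate(app_id, secret) and self.authorize(app_id, topic)


acl = AccessControl()
acl.register("billing-app", b"s3cret", ["orders"])
allowed = acl.may_receive("billing-app", b"s3cret", "orders")
wrong_topic = acl.may_receive("billing-app", b"s3cret", "payroll")
wrong_secret = acl.may_receive("billing-app", b"guess", "orders")
```

Once may_receive returns True and the data is handed over, what the receiving application does with it is governed by that application's own rules, as described above.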

Technology Choices

Once you know the architectural model, the components of the architecture, and what services you need to provide, you're ready to make your technology choices. The marketplace offers many tools and products; the choice hinges entirely on how suitable a given technology is to the organization's specific needs.

Suitability can't be measured solely on how well a technology fits a single component or even multiple components. You must consider the interaction models. For example, a request-response model leads you to consider RPCs, Web services, and direct messaging tools.

Integration requirements for connecting the various core components of the architecture must also drive technology selection. It does no good to buy a great messaging product, only to get bogged down integrating it with your management framework, transformation services, and custom directory.

You must also consider your developers. Synchronous models are easier for most developers to grasp. Choosing an asynchronous message queuing product because it offers the most flexibility also has the potential of forcing developers to change from a familiar programming model. Making asynchronicity work could require extra training and a conceptual change for the developers, which they might not like.
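The conceptual change is easier to see side by side. In this sketch, which uses only Python's standard queue and threading modules, the same hypothetical lookup_price service is called synchronously and then through a pair of in-process queues standing in for a message queuing product.

```python
import queue
import threading


def lookup_price(sku: str) -> float:
    """Hypothetical service; the price table is made up for the example."""
    return {"A-100": 19.99}[sku]


# Synchronous model: call, block, and get the answer on the next line.
sync_price = lookup_price("A-100")

# Asynchronous model: send a request message and collect the reply later.
request_queue = queue.Queue()
reply_queue = queue.Queue()


def worker():
    while True:
        item = request_queue.get()
        if item is None:              # sentinel value tells the worker to stop
            break
        correlation_id, sku = item
        reply_queue.put((correlation_id, lookup_price(sku)))


thread = threading.Thread(target=worker)
thread.start()
request_queue.put((1, "A-100"))       # the caller is free to do other work here
correlation_id, async_price = reply_queue.get(timeout=5)
request_queue.put(None)
thread.join()
```

The asynchronous version needs a correlation ID to match replies to requests and a decision about what to do while waiting, which is the kind of extra thinking developers used to synchronous calls must take on.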

The number of technology components required for a real-time data integration infrastructure is relatively small. Keep in mind that the best product designs are simple. Avoid the ornamentation that comes with many products and focus on core components. Does it matter that an enterprise application integration software vendor can provide real-time data integration as well as supply data warehouse tools? The extra tools expand the scope — and can derail an otherwise sound effort.

Disciplined Approach

What general rules should you follow? First, leave to your vendors as much of the management, monitoring, and security components as possible. Next, keep the interface APIs simple; focus on the underlying interaction protocol. Finally, keep the services you provide to applications to a minimum — and keep them simple. Applications and data will change: That means that your infrastructure should be a source of stability and reuse.



Every good-sized, experienced organization will have a mix of off-the-shelf applications, customized software, and in-house systems that the integration infrastructure must address. With a disciplined approach, you will ensure a better future than the current mess of ad hoc integration that drags down too many strategic IT aspirations.


Mark Madsen [mmadsen@clickstreamdatawarehousing.com] is an award-winning IT architect who has been working in the data warehouse field for 10 years. He is one of the principal authors of Clickstream Data Warehousing (Wiley, 2002). For more information, visit the book Web site at www.clickstreamdatawarehousing.com.