
Intelligent Enterprise

Better Insight for Business Decisions





March 08, 2001



Data in the Time of Cholera

How a 19th century physician's data warehouse helped prevent the spread of cholera

By Steven Johnston

Continued from Page 1

The "Great Stench"

In the 1850s, London had several million inhabitants, and all of their sewage ended up in cesspools and ditches and eventually reached the River Thames. In the summer of 1858, the smell from the river became so terrible that it was called the "Great Stench of London." The Houses of Parliament hung chemically treated blankets in their windows to cut down on the smell from the river.

Of course, the real problem was that most Londoners got their drinking water from shallow local wells and ditches that were contaminated with sewage. The "high tech" residents got city water piped in from the Thames. At the time, several water companies supplied city water to London residents. Two of them, the Southwark and Vauxhall company and the Lambeth company, played an intriguing role in our current understanding of cholera. Both companies originally drew polluted water right out of the Thames. Then, in 1852, the Lambeth company moved its water intake facility 22 miles upstream from London and unknowingly began providing some London inhabitants with uncontaminated water.

Enter the Data Mining Doctor

LIFE-SAVING ETL
Early data mining efforts

Excerpt from John Snow's classic monograph On the Mode of Communication of Cholera, Second Edition, 1855 (reprinted by The Commonwealth Fund in Snow on Cholera, 1936).

"As the Registrar-General published a list of all the deaths from cholera which occurred in London in 1853, from the commencement of the epidemic in August to its conclusion in January 1854, I have been able to add up the number which occurred in the various sub-districts on the south side of the Thames, to which the water supply of the Southwark and Vauxhall, and the Lambeth Companies, extends. I have presented them in the table ... arranged in three groups...."

                                 Number of Houses   Deaths from Cholera   Deaths in Each 10,000 Houses
Southwark and Vauxhall Company             40,046                 1,263                            315
Lambeth Company                            26,107                    98                             37
Rest of London                            256,423                 1,422                             59

---

"The experiment, too, was on the grandest scale. No fewer than 300,000 people of both sexes, of every age and occupation, and of every rank and station, from gentlefolks down to the very poor, were divided into two groups without their choice, and, in most cases, without their knowledge; one group being supplied with water containing the sewage of London, and, amongst it, whatever might have come from the cholera patients, the other group having water quite free from such impurity."

John Snow was a physician practicing in London during this period. While tending to cholera victims, he came up with the crazy idea that cholera was spread from one victim to another through contaminated drinking water. Snow published a pamphlet in 1849 detailing this theory, but nobody paid any attention to it because it contradicted the well-established miasma theory of disease. According to the thinking of the time, Snow's theory was obviously wrong because of the dilution problem: if a group of people drink a poison that later kills them, and they excrete some of the poison before they die, the poison will soon be diluted to a safe level as this process is repeated over and over. Snow's theory implied that the cholera poison somehow had to grow within a victim, and that was a ludicrous idea at the time.

Not to be discouraged, Snow decided upon a different approach following the 1853-54 cholera epidemic. Snow took the death-certificate data collected by the London Registrar-General and created a data warehouse with it. The job of the Registrar-General was to collect operational data for the city of London such as marriages, births, and deaths for the purpose of taxing people. Snow did some extraction, transformation, and loading (ETL) on the cholera deaths by tabulating the addresses of the victims and determining where they got their water. He then compared the cholera deaths in the 1849 epidemic to those in the 1853-54 epidemic by water-supply category in a tabular report. Snow called this analysis his "Grand Experiment." See the sidebar, "Life-Saving ETL," for a brief extract from Snow's classic paper.
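In modern terms, the tabulation described above might look something like the following Python sketch. It is only an illustration of the extract, transform, and load steps: the records, the sub-district names, and the supplier lookup are all made up for the example and are not Snow's data, and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical lookup table: sub-district -> water supplier.
# The sub-district names are placeholders, not real London sub-districts.
SUPPLIER_BY_SUBDISTRICT = {
    "Sub-district A": "Southwark and Vauxhall",
    "Sub-district B": "Lambeth",
}

# Hypothetical operational records (made up for illustration only).
death_records = [
    {"cause": "cholera", "subdistrict": "Sub-district A"},
    {"cause": "cholera", "subdistrict": "Sub-district A"},
    {"cause": "cholera", "subdistrict": "Sub-district B"},
    {"cause": "consumption", "subdistrict": "Sub-district B"},
    {"cause": "cholera", "subdistrict": "Sub-district C"},
]


def extract(records):
    """Extract: keep only the cholera deaths from the operational data."""
    return [r for r in records if r["cause"] == "cholera"]


def transform(records):
    """Transform: tag each death with the water supplier for its sub-district."""
    return [
        SUPPLIER_BY_SUBDISTRICT.get(r["subdistrict"], "Other or unknown")
        for r in records
    ]


def load(suppliers):
    """Load: tabulate deaths per water-supply category."""
    return Counter(suppliers)


deaths_by_supplier = load(transform(extract(death_records)))
for supplier, deaths in deaths_by_supplier.items():
    print(f"{supplier}: {deaths}")
```

The sketch keeps the three steps as separate functions so that the supplier lookup, which was the laborious part of Snow's work, stays isolated from the counting.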

The result of Snow's Grand Experiment was that before 1852, your chances of getting cholera were not correlated with getting your water from either water company; but for the epidemic of 1853-54, your chances of getting cholera if your water was from the Southwark and Vauxhall company were more than eight times greater than if you got your water from the Lambeth company! See Map 1.
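The "more than eight times" figure can be checked against the sidebar table. The short Python sketch below recomputes deaths per 10,000 houses for the two companies from the house and death counts quoted there; the variable and function names are mine, and the calculation is a plain ratio of rates, not a reconstruction of Snow's own working.

```python
# House and death counts copied from the sidebar table.
SUPPLY = {
    "Southwark and Vauxhall Company": {"houses": 40_046, "deaths": 1_263},
    "Lambeth Company": {"houses": 26_107, "deaths": 98},
}


def deaths_per_10k_houses(houses, deaths):
    """Cholera deaths per 10,000 houses supplied."""
    return deaths / houses * 10_000


rates = {
    name: deaths_per_10k_houses(row["houses"], row["deaths"])
    for name, row in SUPPLY.items()
}

# How many times higher the death rate was for Southwark and Vauxhall houses.
ratio = rates["Southwark and Vauxhall Company"] / rates["Lambeth Company"]

for name, rate in rates.items():
    print(f"{name}: {rate:.1f} deaths per 10,000 houses")
print(f"Rate ratio: {ratio:.1f}")
```

The recomputed rates agree with the table's 315 and 37 once the fractions are dropped, and the ratio comes out a little above eight.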

Later in 1854, Snow had yet another opportunity to do some data warehousing. Cholera recurred in the Soho district of London, and about 600 people died from it in a 10-day period. Once again Snow took the operational death-certificate data from the Registrar-General, but this time he plotted the data on a clustering diagram instead of presenting it in tabular form. He used a stacked histogram technique plotted on a map of Soho to do the data mining. Based upon this map, Snow was able to convince the London Board of Guardians to remove the pump handle from the public pump located on Broad Street. The outbreak of cholera subsided with this operational change. It was later revealed that the Broad Street well was contaminated by an underground cesspool at 40 Broad Street, just three feet from the well. The Broad Street pump without a handle remains today as a tribute to Snow. (See Map 2.)

Remarkably, in 1854 Snow was able to do real-time data mining while people were dying and make an operational change on the fly. Unfortunately, old ideas die slowly; it was not until the Public Health Act of 1875 that the construction of proper sewage and water supply systems was mandated by law.

Discovery Through Observation

Data warehousing is a fledgling observational science that is less than 10 years old, so I believe there is much that you can learn from the other observational sciences. The current understanding of the customer base at many corporations is largely based upon folklore and anecdotal observations, much like the ideas surrounding cholera and miasma in the 19th century. Recognition of this problem has led to CRM efforts at many corporations.

For example, the airline industry has traditionally viewed its most valued customers as those who fly the most miles. Thus frequent-flier programs have historically been based upon miles flown with the airline. However, closer examination of customer data reveals that customer value is a much more complicated matter and needs to be based upon past and projected revenues and costs, household information, the customer's influence on others, and the market in which the customer travels. Similarly, the banking industry has traditionally viewed its customer base as a large collection of unrelated accounts. By tying these accounts to actual people and understanding what banking services these people need, the banking industry has been able to up-sell and cross-sell products to its existing customers.






