
Intelligent Enterprise

Better Insight for Business Decisions





May 28, 2002

The Hidden Truth

Data analysis can be a strategic weapon in your company's management and control of fraud

By Girish Keshav Palshikar


Medical Fraud

I'll illustrate how some of these techniques apply to fraud detection using a hypothetical, highly simplified medical insurance claims database. The database (maintained by the insurance company and populated from the claim documents that patients submit) consists of a single table with the following format:

  1. Patient ID (SSN)
  2. Sex (M/F)
  3. Age (0 to 120)
  4. Address
  5. Claim Date
  6. Illness Category, Illness ID, and Illness Description (may be more than one illness)
  7. Illness Duration Start Date - End Date
  8. Hospital ID(s)
  9. Doctor ID(s)
  10. IDs of diagnostic tests performed
  11. Names of medicines given
  12. Other treatments (for example, physiotherapy)
  13. Diagnostic tests bills
  14. Medicine bills
  15. Other treatment bills
  16. Hospital bills
  17. Doctors' charges
  18. Misc. amount (all other costs)
  19. Net Amount.
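To make the later checks concrete, the table can be modeled as a record type. The following is one possible Python encoding, not the article's: the field names are hypothetical, illness category and description are omitted for brevity, and the net-amount check assumes that Net Amount equals the sum of items 13 to 18, which the article does not state.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Claim:
    """One row of the simplified claims table (field names are hypothetical)."""
    patient_id: str                  # 1. Patient ID (SSN)
    sex: str                         # 2. "M" or "F"
    age: int                         # 3. 0 to 120
    address: str                     # 4.
    claim_date: date                 # 5.
    illness_ids: list                # 6. one or more illness IDs
    illness_start: date              # 7. illness duration start date
    illness_end: date                #    and end date
    hospital_ids: list               # 8.
    doctor_ids: list                 # 9.
    test_ids: list = field(default_factory=list)          # 10.
    medicines: list = field(default_factory=list)         # 11.
    other_treatments: list = field(default_factory=list)  # 12.
    test_bills: float = 0.0              # 13.
    medicine_bills: float = 0.0          # 14.
    other_treatment_bills: float = 0.0   # 15.
    hospital_bills: float = 0.0          # 16.
    doctor_charges: float = 0.0          # 17.
    misc_amount: float = 0.0             # 18.
    net_amount: float = 0.0              # 19.

    @property
    def duration_days(self) -> int:
        """Duration of illness in days, derived from item 7."""
        return (self.illness_end - self.illness_start).days

    def validation_errors(self) -> list:
        """Basic integrity checks; the last one ASSUMES item 19 = sum of items 13-18."""
        errors = []
        if self.sex not in ("M", "F"):
            errors.append("sex must be M or F")
        if not 0 <= self.age <= 120:
            errors.append("age out of range")
        if self.illness_end < self.illness_start:
            errors.append("illness ends before it starts")
        parts = (self.test_bills + self.medicine_bills + self.other_treatment_bills
                 + self.hospital_bills + self.doctor_charges + self.misc_amount)
        if abs(parts - self.net_amount) > 0.01:
            errors.append("net amount differs from sum of components")
        return errors
```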

The goal is to evaluate whether a specific new claim is "suspicious" in some way. If it is, the claim can be processed differently: cancel the claim payment, proceed with the payment, recall the claim, reduce the payment amount, or seek clarification from the hospital or patient.

To evaluate a new claim, you can often define various criteria, or indices, of suspiciousness. The claim receives a score on each criterion or index; typically, a high score on a given index indicates greater suspiciousness. Thus, a claim that scores high on many criteria is more suspicious. Examples of such criteria include:

  • The net amount is too large compared with the average amount in similar claims.
  • The cost of one or more diagnostic tests is too large compared with the average cost in similar claims.
  • The cost of a diagnostic test as a percentage of the net amount is too high compared with the average percentage in similar claims.
  • The previous two scores can be adapted for medicine costs, doctors' charges, hospital bills, and other costs.
  • The claim is a duplicate (a very similar claim by the same patient was paid in the recent past).
  • The address of the patient, hospital, or doctor is suspicious (missing ZIP Code, a P.O. Box number in the address, or errors in address components such as an incorrect phone number, ZIP Code, town name, or email address).
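A few of these indices can be sketched as small scoring functions. This is a minimal illustration under stated assumptions: claims are plain dicts with hypothetical keys (net_amount, test_bills, patient_id, illness_id, claim_date, address), `similar` is an already-selected list of comparable historical claims, a z-score clipped at zero stands in for "too large compared with the average", only two of the address checks are implemented, and the 90-day duplicate window, 10% amount tolerance, and equal weighting of indices are arbitrary choices.

```python
import re
from datetime import date
from statistics import mean, pstdev


def zscore(value, history):
    """Standard deviations by which value exceeds the mean of history (0 if no spread)."""
    sigma = pstdev(history)
    return 0.0 if sigma == 0 else (value - mean(history)) / sigma


def net_amount_index(claim, similar):
    # Only unusually LARGE amounts are suspicious, so negative z-scores are clipped to 0.
    return max(0.0, zscore(claim["net_amount"], [c["net_amount"] for c in similar]))


def diag_test_share_index(claim, similar):
    # Diagnostic-test cost as a share of the net amount, compared with similar claims.
    share = claim["test_bills"] / claim["net_amount"]
    shares = [c["test_bills"] / c["net_amount"] for c in similar]
    return max(0.0, zscore(share, shares))


def days_between(earlier: date, later: date) -> int:
    return (later - earlier).days


def duplicate_index(claim, paid, window_days=90, tolerance=0.10):
    # 1.0 if the same patient was recently paid for the same illness and a similar amount.
    for p in paid:
        gap = days_between(p["claim_date"], claim["claim_date"])
        if (p["patient_id"] == claim["patient_id"]
                and p["illness_id"] == claim["illness_id"]
                and 0 <= gap <= window_days
                and abs(p["net_amount"] - claim["net_amount"]) <= tolerance * claim["net_amount"]):
            return 1.0
    return 0.0


ZIP_RE = re.compile(r"\b\d{5}(?:-\d{4})?\b")  # US ZIP or ZIP+4
PO_BOX_RE = re.compile(r"\bP\.?\s*O\.?\s*Box\b", re.IGNORECASE)


def address_index(address):
    # 1.0 if the ZIP Code is missing or the address uses a P.O. Box; 0.0 otherwise.
    return 1.0 if not ZIP_RE.search(address) or PO_BOX_RE.search(address) else 0.0


def suspiciousness(claim, similar, paid):
    """Per-index scores and their unweighted sum (equal weights are an assumption)."""
    scores = {
        "net_amount": net_amount_index(claim, similar),
        "test_share": diag_test_share_index(claim, similar),
        "duplicate": duplicate_index(claim, paid),
        "address": address_index(claim["address"]),
    }
    return scores, sum(scores.values())
```

In a real system these thresholds and weights would be tuned against historical claims rather than fixed in code.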

You can define many more such indices. All of them have to be defined rigorously; the descriptions above are merely indicative. Ideally, the fraud control system should let users define such indices dynamically, outside the system itself, so that enhancements are easy to make. Because the indices represent knowledge about detecting fraud in claims data, a rule language can capture them in a knowledge base. The system can also provide a facility that lists claims similar to a given claim (based on k-nearest-neighbor algorithms, for example), along with a similarity score for each match, so that the end user can evaluate the claim against similar claims. Finally, from a pool of historical claims already known to be fraudulent or legitimate, machine-learning algorithms can construct a classifier (such as a decision tree) that helps evaluate new claims.
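The similar-claims facility could look like the following k-nearest-neighbor sketch. It assumes hypothetical dict keys (illness_id, sex, age, duration_days, net_amount), restricts candidates to the same illness ID and sex, min-max scales each numeric feature so that dollar amounts don't swamp ages, and reports similarity as 1 / (1 + Euclidean distance). The feature set and similarity formula are illustrative choices, not a prescribed design.

```python
import math

# Hypothetical numeric features used to measure similarity.
FEATURES = ("age", "duration_days", "net_amount")


def _scale(claim, ranges):
    """Min-max scale each feature using the candidates' (min, max) ranges."""
    return [0.0 if hi == lo else (claim[f] - lo) / (hi - lo)
            for f, (lo, hi) in zip(FEATURES, ranges)]


def similar_claims(claim, history, k=5):
    """Return up to k (similarity, claim) pairs, most similar first.

    Candidates must share the claim's illness ID and sex; similarity is
    1 / (1 + Euclidean distance) on the scaled features, so 1.0 means identical."""
    candidates = [c for c in history
                  if c["illness_id"] == claim["illness_id"] and c["sex"] == claim["sex"]]
    if not candidates:
        return []
    ranges = [(min(c[f] for c in candidates), max(c[f] for c in candidates))
              for f in FEATURES]
    target = _scale(claim, ranges)
    scored = [(1.0 / (1.0 + math.dist(target, _scale(c, ranges))), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

Sorting on the score alone avoids comparing the claim dicts when two scores tie.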

As a simple example, you can check the disease (illness) ID against the duration and costs. Using the historical claims database, you can easily build a histogram of the number of claims in each hospital-stay duration bin (0 to 2 days, 3 to 5 days, and so on) for a specific illness ID, sex, and age group. You can then compare the new claim's duration against this histogram: if it falls in a sparsely populated bin, the claim is at least somewhat suspicious. Clustering the historical data can detect such outliers automatically.
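The duration-histogram check takes only a few lines. In this sketch the bins beyond the text's "0 to 2" and "3 to 5" days are hypothetical, the 5% threshold for a "sparsely populated" bin is an illustrative assumption, and the durations passed in are assumed to be already filtered to one illness ID, sex, and age group.

```python
from bisect import bisect_right
from collections import Counter

# Lower edges (in days) of the duration bins: 0-2, 3-5, 6-10, 11-20, 21+.
# Only the first two bins come from the text; the rest are hypothetical.
BIN_EDGES = (0, 3, 6, 11, 21)


def duration_bin(days):
    """Index of the bin whose range contains `days` (days >= 0)."""
    return bisect_right(BIN_EDGES, days) - 1


def duration_histogram(durations):
    """Number of historical claims per duration bin."""
    return Counter(duration_bin(d) for d in durations)


def is_sparse(days, histogram, min_share=0.05):
    """True if the bin containing `days` holds less than `min_share` of all claims."""
    total = sum(histogram.values())
    return total > 0 and histogram[duration_bin(days)] / total < min_share
```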

Several other types of calculation, such as regression analysis and time-series analysis, can be performed for fraud detection. In time-series analysis, time-stamped data is analyzed for trends, seasonal patterns, and outliers. If necessary, the series is first transformed so that its variance is constant. Additional assumptions may be needed because the observations in claims data aren't necessarily at regular time intervals. Several time series in the claims data lend themselves to this analysis: for example, the NO. OF CLAIMS and the NET_AMOUNT (or any other component of the claim amount) for a specific hospital or all hospitals, for a specific illness ID or all illness IDs, and for a specific patient or all patients.
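Because claims arrive at irregular times, one common way to obtain a regularly spaced series is to aggregate them into fixed periods. The sketch below, assuming dict claims with a hypothetical claim_date key, totals NO. OF CLAIMS (or any amount field) per Monday-starting week, fills weeks without claims with zero, and offers log(1 + x) as one possible variance-stabilizing transform. Weekly periods and that particular transform are assumptions, not prescriptions from the article.

```python
import math
from collections import defaultdict
from datetime import date, timedelta


def week_start(d: date) -> date:
    """Monday of the week containing d."""
    return d - timedelta(days=d.weekday())


def weekly_series(claims, value=None):
    """Aggregate irregularly dated claims into consecutive weekly totals.

    value=None counts claims (NO. OF CLAIMS); otherwise claim[value] is summed,
    e.g. value="net_amount". Weeks without claims get 0 so the spacing is regular."""
    totals = defaultdict(float)
    for c in claims:
        totals[week_start(c["claim_date"])] += 1 if value is None else c[value]
    if not totals:
        return []
    series, week, last = [], min(totals), max(totals)
    while week <= last:
        series.append((week, totals.get(week, 0.0)))
        week += timedelta(weeks=1)
    return series


def log_transform(values):
    """log(1 + x): one common variance-stabilizing transform for counts and amounts."""
    return [math.log1p(v) for v in values]
```

To build the series for one hospital or illness, filter the claims first, for example `weekly_series([c for c in claims if c["hospital_id"] == "H1"])`, where hospital_id is again a hypothetical key.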

Suppose X is the time-varying quantity (NO. OF CLAIMS) in a time series, and let X(I) denote the value of X at time I. A graph of the week-to-week changes, X(I+1) - X(I), can be used to identify outliers quickly. Another graph plots the week-to-week percentage change, 100 * (X(I+1) - X(I)) / X(I). Autocorrelation and other techniques can be used to study these time series further.
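The week-to-week quantities are straightforward to compute. In this sketch, the rule for flagging outlying changes is a swapped-in choice: a median and median-absolute-deviation (MAD) rule, used because a single spike produces two large changes (into and out of the spike) that inflate an ordinary standard deviation. The 3.5 cutoff is a conventional but arbitrary threshold.

```python
from statistics import mean, median


def changes(x):
    """Week-to-week changes X(I+1) - X(I)."""
    return [x[i + 1] - x[i] for i in range(len(x) - 1)]


def pct_changes(x):
    """Week-to-week percentage changes 100 * (X(I+1) - X(I)) / X(I); None where X(I) == 0."""
    return [None if x[i] == 0 else 100.0 * (x[i + 1] - x[i]) / x[i]
            for i in range(len(x) - 1)]


def flag_changes(x, threshold=3.5):
    """Indices I whose change X(I+1) - X(I) is an outlier under a median/MAD rule."""
    d = changes(x)
    med = median(d)
    dev = [abs(v - med) for v in d]
    mad = median(dev)
    if mad == 0:
        # Most changes equal the median change; flag any departure from it.
        return [i for i, v in enumerate(dev) if v > 0]
    # 1.4826 * MAD estimates the standard deviation for normally distributed data.
    return [i for i, v in enumerate(dev) if v > threshold * 1.4826 * mad]


def autocorrelation(x, lag=1):
    """Sample autocorrelation of x at the given lag."""
    m = mean(x)
    denom = sum((v - m) ** 2 for v in x)
    num = sum((x[i] - m) * (x[i + lag] - m) for i in range(len(x) - lag))
    return num / denom
```

For a weekly count series with one spike, such as `[10, 12, 11, 13, 12, 11, 12, 13, 11, 12, 40, 12, 11, 12]`, `flag_changes` reports the jump into the spike and the drop out of it.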

The following are some variables that are important for fraud detection in the claims data. Multiple regression analysis can be performed on chosen subsets of these variables:

  • Age
  • Sex
  • Hospital ID
  • Illness ID
  • Duration of illness
  • Various cost components
  • Net amount.
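Multiple regression on a chosen subset of these variables can be sketched with ordinary least squares. The code below is a standard-library-only illustration that solves the normal equations directly; a numerical library would be preferable in practice. Categorical variables such as Sex, Hospital ID, and Illness ID must first be dummy-encoded (the hypothetical `one_hot` helper), and claims with large residuals, for example a net amount far above what age and duration predict, become candidates for review.

```python
def ols(X, y):
    """Least-squares coefficients [b0, b1, ..., bp] for y = b0 + b1*x1 + ... + bp*xp.

    Solves the normal equations (X^T X) b = X^T y by Gauss-Jordan elimination
    with partial pivoting."""
    rows = [[1.0] + [float(v) for v in r] for r in X]  # prepend intercept column
    p = len(rows[0])
    # Augmented matrix [X^T X | X^T y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda k: abs(A[k][col]))
        A[col], A[pivot] = A[pivot], A[col]
        if abs(A[col][col]) < 1e-12:
            raise ValueError("design matrix is singular (collinear variables?)")
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


def residuals(X, y, coef):
    """Observed minus fitted values; large residuals mark claims worth a closer look."""
    return [yi - (coef[0] + sum(b * x for b, x in zip(coef[1:], r)))
            for r, yi in zip(X, y)]


def one_hot(values):
    """Dummy-encode a categorical variable (the first sorted level is the baseline)."""
    levels = sorted(set(values))
    return [[1.0 if v == lvl else 0.0 for lvl in levels[1:]] for v in values]
```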

Statistical analysis can also be performed to identify outliers:

  • Test cost outliers
  • Hospital charges outliers
  • Medicines cost outliers
  • Illness duration outliers
  • Combination outliers (for example, doctors' charges and net amount).
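Tukey's interquartile-range fences are one simple, swapped-in way to flag single-variable outliers such as test or medicine costs. For the combination case, this sketch assumes that the ratio of doctors' charges to net amount is an acceptable proxy; that is only one of several possible choices (a multivariate distance would be another). The multiplier k = 1.5 is the conventional default.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Indices of values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    spread = k * (q3 - q1)
    return [i for i, v in enumerate(values) if v < q1 - spread or v > q3 + spread]


def ratio_outliers(part, total, k=1.5):
    """Combination outliers: claims whose part/total ratio (e.g. doctors' charges
    over net amount) is unusual relative to the other claims."""
    return iqr_outliers([p / t for p, t in zip(part, total)], k)
```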




Some important temporal parameters for a claim include CLAIM_DATE and the illness start and end dates (which give the duration of illness). The difference between CLAIM_DATE and ILLNESS_START_DATE, called CLAIM_DELAY, is an important independent variable. The relationship between NET_AMOUNT (on the X-axis) and duration of illness (on the Y-axis) can be shown in a scatter plot restricted to claims for a specific illness ID. You may find, for example, that most claims above $20,000 have a long duration of illness. The Pearson correlation coefficient R for these two variables is easy to compute and indicates how strongly they are linearly related. Analysis of variance can be used to check whether the mean duration of illness is the same for, say, all hospitals, and these comparisons can be repeated for claims in different NET_AMOUNT bins. If the means are not equal, further tests can determine whether specific hospitals show unusual behavior.
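The quantities in this paragraph can be computed as follows. This is a minimal sketch: Pearson's R is computed from its definition, and the one-way ANOVA returns only the F statistic (between-group mean square divided by within-group mean square). To obtain a p-value, F would still have to be compared against an F distribution with (k - 1, n - k) degrees of freedom, for example with `scipy.stats.f_oneway`. The grouping of durations by hospital ID is illustrative.

```python
from datetime import date
from math import sqrt
from statistics import mean


def claim_delay(claim_date: date, illness_start: date) -> int:
    """CLAIM_DELAY: days between ILLNESS_START_DATE and CLAIM_DATE."""
    return (claim_date - illness_start).days


def pearson_r(xs, ys):
    """Pearson correlation coefficient R of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def anova_f(groups):
    """One-way ANOVA F statistic, e.g. groups = {hospital_id: [durations of illness]}."""
    values = [v for g in groups.values() for v in g]
    grand, k, n = mean(values), len(groups), len(values)
    means = {key: mean(g) for key, g in groups.items()}
    ss_between = sum(len(g) * (means[key] - grand) ** 2 for key, g in groups.items())
    ss_within = sum((v - means[key]) ** 2 for key, g in groups.items() for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

A large F (relative to the F distribution) suggests that at least one hospital's mean duration differs from the others.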

All such statistical analyses need to be studied in depth and defined for the specific tasks of fraud detection and control in the medical claims domain. The fraud control system can then provide a large number of predefined statistical calculations oriented toward detecting suspicious data.

Fraud is an important phenomenon in today's wired commercial world. It causes huge losses and damages an organization's reputation and goodwill. Fraud management is a complex, knowledge-intensive process that involves deploying and effectively using tools based on a plethora of statistical and AI techniques.

The author wishes to thank Prof. Mathai Joseph for his support. Thanks to Dr. Manasee Palshikar for her patience, hopes, and confidence.


Girish Keshav Palshikar [girishp@pune.tcs.co.in] is a scientist at Tata Research Development and Design Centre (TRDDC) in Pune, India. TRDDC is the R&D Division of Tata Consultancy Services, India's largest software company. His areas of work include theory and applications of artificial intelligence.










