Abstract(s)
This thesis studies automobile insurance fraud. Its purpose is to increase knowledge of
fraudulent claims in the Portuguese market, while raising awareness of the use of data
mining techniques for this and other similar problems.
We apply data mining techniques to the problem of predicting automobile insurance fraud,
a problem of interest to insurance companies around the world. We present fraud
definitions and review the existing literature on the subject. Live policy and claim data
from the Portuguese insurance market in 2005 are used to train a Logit Regression Model
and a CHAID Classification and Regression Tree.
The use of data mining tools and techniques enabled the identification of underlying fraud
patterns specific to the raw data used to build the models. The list of potential fraud
indicators includes variables such as the policy’s tenure, the number of policy holders, not
admitting fault in the accident, and paying the premium in semiannual installments. Other
variables, such as the number of days between the accident and the filing of the claim,
the client’s age, and the geographical location of the accident, were also found to be
relevant in specific sub-populations of the dataset.
Model variables and coefficients are interpreted comparatively, and key performance results
are presented, including the percentage correctly classified (PCC), sensitivity, specificity,
and area under the ROC curve (AUROC). Both the Logit Model and the CHAID C&R Tree
achieve fair results in predicting automobile insurance fraud in this dataset.
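
For readers unfamiliar with the four performance measures named above, the following is a minimal sketch of how they can be computed for any binary fraud classifier. It is not the code used in the thesis. The function names, the 0.5 cut-off, and the toy labels and scores are all assumptions made for illustration, and the numbers are invented rather than taken from the Portuguese dataset.

```python
# Illustrative sketch only: hypothetical helper names and made-up toy data.

def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels where 1 = fraud, 0 = legitimate."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def pcc(tp, fp, tn, fn):
    """Percentage correctly classified, expressed as a fraction of all claims."""
    return (tp + tn) / (tp + fp + tn + fn)


def sensitivity(tp, fn):
    """Share of fraudulent claims that the model flags as fraud."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Share of legitimate claims that the model leaves unflagged."""
    return tn / (tn + fp)


def auroc(y_true, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen fraudulent claim receives
    a higher score than a randomly chosen legitimate one (ties count half).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example: eight claims, three of them fraudulent (invented values).
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.2, 0.6, 0.4, 0.7, 0.3, 0.1, 0.5]

# Assumed cut-off: a claim is flagged as fraud when its score is at least 0.5.
threshold = 0.5
y_pred = [1 if s >= threshold else 0 for s in scores]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
results = {
    "PCC": pcc(tp, fp, tn, fn),
    "sensitivity": sensitivity(tp, fn),
    "specificity": specificity(tn, fp),
    "AUROC": auroc(y_true, scores),
}

for name, value in results.items():
    print(f"{name}: {value:.3f}")
```

PCC, sensitivity, and specificity depend on the chosen cut-off, whereas AUROC summarizes ranking quality across all cut-offs. This matters for fraud data, where fraudulent claims are typically a small minority and PCC alone can look favorable for a model that flags almost nothing.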
