Chronic conditions are costly, but many of them are both preventable and predictable. We develop a model to predict the short-term (2-3 year) onset of one or more chronic conditions. Five chronic conditions are considered: heart disease, stroke, diabetes, hypertension and cancer. Predictions are based on standard demographic and socio-economic variables, risk factors such as smoking and obesity, and the presence of existing chronic conditions.

We compare two predictive models. The first is the multivariate probit (MVP), which accounts for correlation among the outcome variables. The second is the Multiclass Support Vector Machine (MSVM), which is considered a leading predictive method in machine learning. We use Australian data from the “Social, Economic, and Environmental Factor” (SEEF) study, a follow-up to the “45 and Up” study, which contains two repeated observations of 60,000 individuals over age 45 in NSW. We use 10-fold cross-validation to estimate the performance of the MVP and MSVM methods.
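As an illustration of the 10-fold cross-validation mentioned above, the following is a minimal sketch of how the sample could be partitioned into folds. It is not the authors' code: the function name `k_fold_indices`, the random seed, and the toy sample size are all assumptions made for the example.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Hypothetical helper for illustration; not taken from the study.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread samples as evenly as possible: the first (n_samples % k)
    # folds receive one extra observation.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size

# Toy sample size (assumed); the study itself has about 60,000 individuals.
folds = list(k_fold_indices(n_samples=103, k=10))
```

Each individual appears in exactly one test fold, so every observation is used once for evaluation and nine times for training.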

We find that MSVMs and MVPs have comparable rates of false positives. However, MSVMs are much better than MVPs at detecting true positives, with true positive detection rates that are on average 30% better than those of MVPs. Since long-term predictions are made by a sequence of short-term predictions, a 30% improvement in the accuracy of short-term predictions can lead to a substantial reduction of long-term uncertainty, which from a policy point of view is extremely valuable.
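To make the comparison concrete, the sketch below computes true positive and false positive rates for one condition and the relative gain of one classifier over another. The labels and predictions are invented toy values, not results from the study, and the helper names `rates` and `relative_improvement` are hypothetical. It also assumes the reported 30% is a relative improvement, which the text does not specify.

```python
def rates(y_true, y_pred):
    """Return (true positive rate, false positive rate) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    tpr = tp / (tp + fn) if (tp + fn) else float("nan")
    fpr = fp / (fp + tn) if (fp + tn) else float("nan")
    return tpr, fpr

def relative_improvement(new, baseline):
    """Relative gain of `new` over `baseline` (0.3 would mean 30% better)."""
    return (new - baseline) / baseline

# Toy data (assumed): 1 = condition develops, 0 = it does not.
y_true     = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
pred_mvp   = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]  # stand-in for one model
pred_msvm  = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]  # stand-in for the other

tpr_mvp, fpr_mvp = rates(y_true, pred_mvp)
tpr_msvm, fpr_msvm = rates(y_true, pred_msvm)
gain = relative_improvement(tpr_msvm, tpr_mvp)
```

In this toy case both classifiers have the same false positive rate while one detects more true positives, which mirrors the qualitative pattern described above.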

It is likely that this performance can be improved, and we do not claim to have built the optimal classifier. However, given the slow adoption of methods such as SVMs and deep learning in health care, we hope that this study will be a first step toward broader use of methods that may lead to large improvements over the status quo.

Author(s):  Shima Ghassen Pour, Federico Girosi