Extracting explanatory models from surveys

Health data come in very different forms, ranging from images (e.g., to detect anomalies such as cancer) to sound (e.g., to detect mood when monitoring well-being). In this project, we performed data mining on structured data, most commonly in the form of health surveys. While such surveys have typically been analyzed via regression techniques, data mining techniques are well suited to capturing the non-linearity often found in health behaviors. We used data mining techniques to analyze surveys of drinking behavior and found that short questionnaires on motives and demographic information were sufficient to correctly identify, for most respondents, whether they were at risk of engaging in binge drinking (Crutzen & Giabbanelli, 2014). The technique was later applied to investigate whether surveys completed by guardians/parents could be used to infer drinking behavior in adolescents, and we found that only self-reported rules had strong informative value (Crutzen et al., 2015). Both studies revealed that participants may be over-burdened by completing questionnaires of which only a subset is informative. We therefore extended the approach to population-level nutrition surveys and found, even more strikingly, that only 3% of recorded foods were sufficient to identify whether a respondent met key dietary recommendations (Giabbanelli & Adams, 2016). While the main innovation of this research has been to bring data mining techniques to new fields, we have also developed new techniques in recent years to face the complexity of the datasets and of the health authorities generating them (Belyi et al., 2016).
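The idea that only a small subset of survey items is informative can be illustrated with a minimal sketch. It is not the method used in the studies cited above: it ranks items by information gain, a simple stand-in for the classifiers used there, and it runs on synthetic data. The item names, the 1-5 response scale, and the rule that generates the outcome are all assumptions made for illustration.

```python
import math
import random
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """Reduction in label entropy obtained by splitting on one survey item."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[value].append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical survey items, each answered on a 1-5 scale (assumption).
ITEMS = ["motive_social", "motive_coping", "item_c", "item_d", "item_e", "item_f"]

random.seed(0)
rows = [{name: random.randint(1, 5) for name in ITEMS} for _ in range(2000)]

# Synthetic, non-linear outcome (assumption): a respondent is "at risk" only
# when one item is high AND another is low. The other items are pure noise.
labels = [int(r["motive_social"] >= 4 and r["motive_coping"] <= 2) for r in rows]

# Score every item, then rank from most to least informative.
gains = {name: information_gain([r[name] for r in rows], labels) for name in ITEMS}
ranked = sorted(ITEMS, key=gains.get, reverse=True)
top_two = ranked[:2]

for name in ranked:
    print(f"{name:15s} {gains[name]:.3f} bits")
```

In this synthetic setting, the two items that drive the outcome should stand well apart from the noise items, which mirrors the observation that a shorter questionnaire could carry most of the predictive information. With real survey data, the ranking would need to be validated on held-out respondents.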

Key references:

  • Crutzen, R., Giabbanelli, P.J. (2014) Using classifiers to identify binge drinkers based on drinking motives. Substance Use & Misuse, 49(1-2), 110-115.
  • Crutzen, R., Giabbanelli, P.J., Jander, A., Mercken, L., de Vries, H. (2015) Identifying binge drinkers based on parenting dimensions and alcohol-specific parenting practices: building classifiers on adolescent-parent paired data. BMC Public Health, 15(1), 747.
  • Giabbanelli, P.J., Adams, J. (2016) Identifying small groups of foods that can predict achievement of key dietary recommendations: data mining of the UK National Diet and Nutrition Survey, 2008-2012. Public Health Nutrition, 19(9), 1543-1551.
  • Belyi, E., Giabbanelli, P.J., Patel, I., Balabhadrapathruni, N.H., Abdallah, A.B., Hameed, W., Mago, V.K. (2016) Combining association rule mining and network analysis for pharmacy surveillance. Journal of Supercomputing, 72(5), 2014-2034.

Key collaborators:

  • Dr. Jean Adams, University of Cambridge, United Kingdom
  • Dr. Rik Crutzen, University of Maastricht, The Netherlands
  • Dr. Vijay Mago, Lakehead University, Canada