Machine Learning – Mind the Gap Blog

Substance use disorder (SUD) – also known as drug addiction – is one of the most problematic co-occurring conditions with ADHD. Children with ADHD are twice as likely to develop SUD later in life, compared to those without ADHD [1, 2]. Identifying children and adolescents who are at risk, even before they ever use substances, can be an important step in preventing SUD. But how do you know who is at risk? Artificial Intelligence (AI) technologies are promising tools to identify risk factors and predict the likelihood of an individual developing SUD later in life.

SUD and ADHD

SUD is a mental illness that involves impulsive use of substances such as alcohol, marijuana, nicotine and opioids. Substance use often starts with experimentation in adolescence, such as cigarette smoking and drinking. Studies have found that adolescents with ADHD often start experimenting at a younger age, become heavier users more quickly, and develop more severe functional impairments [3]. Indeed, a younger age at first use of a substance is associated with a higher risk of abusing that substance later in life.

Although SUD causes significant harm and imposes personal and societal burdens, it is a preventable and treatable condition. Early identification of at-risk youth is especially important because it would allow for more targeted early interventions. Studies have shown that school- and community-based prevention programs can be highly effective at reducing substance use. Treating and managing ADHD, with both medication and behavioral therapy, has also been found to be beneficial in reducing substance use [4, 5]. Being able to screen for and identify high-risk adolescents before their first use, and then to discourage and prevent the onset of substance use, would probably be the most cost-effective prevention program for SUD.

But how do you know who is high-risk and who is not, even before they start using drugs? One method is to use Artificial Intelligence (AI) technologies in combination with very large databases.

Machine Learning Prediction Models for SUD

Machine learning is a subset of AI technologies that can learn hidden patterns and attributes in a completely data-driven fashion. Machine learning algorithms learn from repeated exposure to large amounts of data, and subsequently make predictions based on what the model has learned from these training examples.
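To make "learning from repeated exposure to examples" a bit more concrete, here is a minimal sketch in Python. It is not the method used in our study: logistic regression is simply one of the simplest learning algorithms, and the data are synthetic, with a single made-up feature `x`. The names `train_logistic` and `predict` are hypothetical helpers for this illustration only.

```python
import math
import random


def predict(w, b, x):
    """Predicted probability of the outcome for feature value x."""
    return 1.0 / (1.0 + math.exp(-(w * x + b)))


def train_logistic(examples, epochs=200, lr=0.1):
    """Fit a weight and a bias by passing over the (x, y) examples many times."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in examples:
            error = y - predict(w, b, x)  # how far off the current guess is
            w += lr * error * x           # nudge the weight to reduce the error
            b += lr * error               # nudge the bias to reduce the error
    return w, b


# Synthetic training examples (an assumption for illustration, not registry data):
# cases with outcome 1 tend to have higher values of x than cases with outcome 0.
rng = random.Random(1)
examples = [(rng.gauss(1.0 if y else -1.0, 1.0), y) for y in [0, 1] * 100]

w, b = train_logistic(examples)
print("learned weight:", round(w, 2), "learned bias:", round(b, 2))
print("predicted risk at x = 2.0:", round(predict(w, b, 2.0), 2))
print("predicted risk at x = -2.0:", round(predict(w, b, -2.0), 2))
```

Nothing in the code says that higher values of `x` mean higher risk. The model picks that up from the examples alone, which is what "data-driven" means here.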

In a recently published EU-funded study in the Journal of Child Psychology and Psychiatry [6], my colleagues and I applied machine learning models to data collected from the Swedish national registers. These registers contain family and health data of millions of people, including information on clinical diagnoses such as ADHD and SUD. For our research, we focused on children with ADHD and trained various models to predict those who would eventually have a diagnosis of SUD, and those who would not. More than 19,000 children with ADHD were included in the study. The information that we used to train the models included psychiatric and somatic diagnoses, family history of these disorders, socioeconomic status, and birth complications.

The machine learning algorithm produced two useful models to predict SUD in children or adolescents with ADHD. The first one makes a prediction at age 17 for a future SUD diagnosis between ages 18 and 20. This is an important period when young adults, often leaving home for the first time, are more susceptible to peer influence and may start using substances. The second model, a longitudinal model, makes a yearly prediction at every age from 2 to 17 for an SUD diagnosis up to 10 years in the future. We found that both models were able to make significant predictions. This means that when we tested the models on a part of the dataset that was not used to train them, they predicted the SUD outcome correctly more often than would be expected by chance. You can interpret this as the models having learned certain patterns in the data that predict whether a person with ADHD will develop SUD or not.
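The idea of testing on held-out data and comparing against chance can be sketched as follows. This is an illustration under stated assumptions, not the study's evaluation code: the data are synthetic, the "model" is a deliberately trivial one-feature rule, and the area under the ROC curve (AUC) is used as one common way to measure performance, where 0.5 corresponds to chance. The function `auc` and all variable names are hypothetical.

```python
import random
from statistics import mean


def auc(labels, scores):
    """Chance that a randomly chosen positive case is scored above a negative one."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


rng = random.Random(0)

# Synthetic data (made up): one risk feature that is weakly related to the outcome.
records = []
for _ in range(2000):
    outcome = 1 if rng.random() < 0.2 else 0
    feature = rng.gauss(0.5 * outcome, 1.0)
    records.append((feature, outcome))

# Hold out 30% of the records; the model never sees them during training.
rng.shuffle(records)
split = int(0.7 * len(records))
train, test = records[:split], records[split:]

# "Training": learn from the training set only whether higher feature values
# go with the outcome or against it.
mean_pos = mean(x for x, y in train if y == 1)
mean_neg = mean(x for x, y in train if y == 0)
direction = 1.0 if mean_pos > mean_neg else -1.0

# Evaluation on the held-out records.
test_labels = [y for _, y in test]
test_scores = [direction * x for x, _ in test]
model_auc = auc(test_labels, test_scores)

# Chance baseline: the same scores evaluated against randomly shuffled labels.
shuffled_labels = test_labels[:]
rng.shuffle(shuffled_labels)
chance_auc = auc(shuffled_labels, test_scores)

print("held-out AUC:", round(model_auc, 3))
print("AUC with shuffled labels:", round(chance_auc, 3))
```

An AUC clearly above the shuffled-label baseline on records the model has never seen is the sense in which a prediction is better than chance. It says nothing yet about whether the prediction is accurate enough to act on.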

Early Detection of SUD Risk

One important discovery was that the longitudinal model was able to make significant predictions as early as age 2, for up to 10 years into the future. The earliest age of a valid SUD diagnosis in our dataset was 12. Using a method called “supervised learning”, we “taught” the model to identify those children at age 2 who would be diagnosed with SUD at age 12. Such supervised training was carried out at each age for outlooks of 1, 2, 5 or 10 years. Being able to successfully identify at-risk children years before their first substance use is a really promising result. Furthermore, the yearly prediction at every age provides a monitoring system that tracks whether the risk for SUD is increasing or decreasing over the years.
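For readers who want to see what "supervised" means in practice, the sketch below builds the training labels for one child: for every prediction age from 2 to 17 and every outlook of 1, 2, 5 or 10 years, the label says whether a first SUD diagnosis falls within that window. The exact window boundaries and the choice to stop labelling once a child has been diagnosed are assumptions made for this illustration, and `make_labels` is a hypothetical helper, not code from the study.

```python
HORIZONS = (1, 2, 5, 10)         # outlooks in years, as described in the text
PREDICTION_AGES = range(2, 18)   # one prediction per year from age 2 to 17


def make_labels(first_sud_age):
    """Return {(age, horizon): 0 or 1} for one child.

    first_sud_age is the age at first SUD diagnosis, or None if never diagnosed.
    """
    labels = {}
    for age in PREDICTION_AGES:
        if first_sud_age is not None and first_sud_age <= age:
            continue  # assumption: nothing left to predict after diagnosis
        for horizon in HORIZONS:
            diagnosed_in_window = (
                first_sud_age is not None
                and age < first_sud_age <= age + horizon
            )
            labels[(age, horizon)] = int(diagnosed_in_window)
    return labels


# A child first diagnosed at age 12, the earliest valid diagnosis age in the text.
labels = make_labels(12)
print("age 2, 10-year outlook:", labels[(2, 10)])
print("age 2, 5-year outlook:", labels[(2, 5)])
print("age 11, 1-year outlook:", labels[(11, 1)])
```

Each (age, outlook) pair then becomes its own prediction task: the model sees what was known about the child up to that age, together with the label, and learns which early information tends to go with a later diagnosis.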

What have we learned from our prediction models?

It is important to stress that the predictions of these models were not perfect. The algorithms are not always correct at identifying who will develop SUD and who will not. However, they do give insight into which predictors are more informative than others, for example the early detection and diagnosis of ADHD, and socioeconomic status. At the population level, an early diagnosis of ADHD was associated with a lower risk of developing SUD later in life, while adverse socioeconomic status was associated with a higher risk. However, we could not identify any predictors at the individual level that contributed to an increased or decreased risk. The ability to identify such specific risk factors for each individual would be extremely useful, because it would show where targeted interventions can be applied most effectively.

Although the prediction accuracies were modest and the models are not yet ready for real-life deployment, these findings have broad implications for policy-makers, parents, teachers and clinicians. As more and more data become available to train these AI models, more accurate and generalizable predictions of disease risk can be made at early ages. This makes it imperative to develop more effective and personalized measures for prevention and risk reduction. We will therefore continue our work to improve the prediction accuracy of our models.

Overall, we provided a proof of concept that machine learning technologies – by leveraging large volumes of data, such as those in national registers and other electronic health records – can be used to predict the risk of a disease such as SUD long before its onset.

Dr. Yanli Zhang-James is an associate professor at SUNY Upstate Medical University Department of Psychiatry and Behavioral Sciences. She is involved in the CoCA Project.

1. Ercan, E.S., et al., Childhood attention deficit/hyperactivity disorder and alcohol dependence: a 1-year follow-up. Alcohol Alcohol, 2003. 38(4): p. 352-6.

2. Wilens, T.E., et al., Does ADHD predict substance-use disorders? A 10-year follow-up study of young adults with ADHD. J Am Acad Child Adolesc Psychiatry, 2011. 50(6): p. 543-53.

3. Kousha, M., Z. Shahrivar, and J. Alaghband-Rad, Substance use disorder and ADHD: is ADHD a particularly “specific” risk factor? J Atten Disord, 2012. 16(4): p. 325-32.

4. Molina, B.S., et al., Delinquent behavior and emerging substance use in the MTA at 36 months: prevalence, course, and treatment effects. J Am Acad Child Adolesc Psychiatry, 2007. 46(8): p. 1028-40.

5. Schoenfelder, E.N., S.V. Faraone, and S.H. Kollins, Stimulant treatment of ADHD and cigarette smoking: a meta-analysis. Pediatrics, 2014. 133(6): p. 1070-80.

6. Zhang-James, Y., et al., Machine-Learning prediction of comorbid substance use disorders in ADHD youth using Swedish registry data. J Child Psychol Psychiatry, 2020.