Refining ADHD diagnosis with EEG: The impact of preprocessing and temporal
segmentation on classification accuracy
by Sandra García-Ponsoda, Alejandro Maté & Juan Trujillo (Computers in
Biology and Medicine 183, 2024, DOI 10.1016/j.compbiomed.2024.109305)

This blog post is a discussion of the paper. Everything in it revolves around a
single, clearly identified scholarly source: García-Ponsoda et al.’s 2024 article on
electroencephalography (EEG)–based diagnosis of Attention-Deficit/Hyperactivity
Disorder (ADHD). I cite the work in full above because the ideas, data and much
of the reasoning that follow originated with those authors, and correct attribution is
essential in a public-facing commentary.

ADHD is one of the most common neurodevelopmental disorders, yet its diagnosis
in clinics still leans heavily on behavioral observation and questionnaires. The
limitations are well known: subjective bias, variable thresholds for impairment and
an inability to pin down subtle neural differences that might distinguish genuine
ADHD from mimics such as anxiety or sleep loss. EEG, by contrast, is cheap,
portable and tolerant of fidgeting. Many teams have therefore tried to train
machine-learning classifiers on EEG signatures of ADHD.

García-Ponsoda and her colleagues step into that methodological minefield. They
formulate two deceptively simple questions. First, how profoundly does the depth
of EEG cleaning influence downstream classification accuracy? Second, when a
recording is chopped into temporal segments, are some epochs more diagnostic
than others?

The dataset is publicly available and well-characterised: 121 school-age children,
61 with clinician-diagnosed ADHD and 60 neurotypical controls. All
recordings use a 19-channel, 10-20 montage, sampled at 128 Hz while the child
completes a simple visual enumeration task that lasts, on average, fifty seconds.
That task length matters: it is long enough that fatigue or waxing-and-waning
attention can plausibly emerge, yet short enough for children to tolerate.
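
To make those numbers concrete, here is a small illustration of my own (not the authors’ code) showing how a roughly fifty-second, 19-channel recording sampled at 128 Hz can be chopped into fixed-length temporal segments of the kind the paper later evaluates. The ten-second window is an arbitrary choice for the example; the study compares several segmentation schemes rather than one fixed length.

```python
import numpy as np

SFREQ = 128          # sampling rate in Hz, as reported for the dataset
N_CHANNELS = 19      # standard 10-20 montage

def segment_recording(eeg, segment_seconds=10):
    """Split a (channels, samples) EEG array into non-overlapping segments.

    The segment length is illustrative only; the paper evaluates several
    segmentation choices, not a single fixed window.
    """
    seg_len = segment_seconds * SFREQ
    n_segments = eeg.shape[1] // seg_len
    # Drop the trailing remainder, then reshape to (segments, channels, samples).
    trimmed = eeg[:, :n_segments * seg_len]
    return trimmed.reshape(N_CHANNELS, n_segments, seg_len).transpose(1, 0, 2)

# Example: a simulated 50-second recording yields five 10-second segments.
fake_eeg = np.random.randn(N_CHANNELS, 50 * SFREQ)
print(segment_recording(fake_eeg).shape)   # (5, 19, 1280)
```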

The heart of the paper lies in a three-tiered pre-processing experiment. Tier 1
applies a conventional 0.5–40 Hz finite-impulse-response band-pass filter. Tier 2
layers on Artifact Subspace Reconstruction (ASR), an increasingly popular,
near-real-time algorithm that identifies noisy subspaces such as electrode pops or
bursts of muscle activity and reconstructs them from cleaner neighbors. Tier 3
adds Independent Component Analysis followed by automatic ICLabel
classification, stripping out residual ocular, muscular and line-noise components.
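
For readers who want to picture the three tiers in code, below is a hedged sketch using MNE-Python together with the asrpy and mne-icalabel packages. The library choices, parameter values and exact calls are my assumptions about how one could reproduce such a pipeline, not the authors’ published implementation.

```python
import mne
from asrpy import ASR                       # assumed ASR implementation
from mne_icalabel import label_components   # assumed ICLabel wrapper

def tier1_bandpass(raw):
    """Tier 1: conventional 0.5-40 Hz FIR band-pass filter."""
    return raw.copy().filter(l_freq=0.5, h_freq=40.0, fir_design="firwin")

def tier2_asr(raw):
    """Tier 2: Tier 1 plus Artifact Subspace Reconstruction."""
    filtered = tier1_bandpass(raw)
    asr = ASR(sfreq=filtered.info["sfreq"], cutoff=20)  # cutoff is illustrative
    asr.fit(filtered)
    return asr.transform(filtered)

def tier3_ica(raw):
    """Tier 3: Tier 2 plus ICA with automatic ICLabel rejection."""
    cleaned = tier2_asr(raw)
    ica = mne.preprocessing.ICA(n_components=15, method="infomax",
                                fit_params=dict(extended=True), random_state=0)
    ica.fit(cleaned)
    labels = label_components(cleaned, ica, method="iclabel")["labels"]
    # Reject components labelled as ocular, muscular, line noise, etc.;
    # keep those classified as brain activity or left undecided.
    ica.exclude = [i for i, lab in enumerate(labels)
                   if lab not in ("brain", "other")]
    return ica.apply(cleaned)
```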

Three models are used: Support-Vector Machines, k-Nearest Neighbors and
XGBoost, each subjected to five-fold cross-validation. Parameter grids are not
exhaustively tuned—sensible, given the combinatorial explosion caused by
cleaning tiers, segment choices, channel subsets and feature sets. Instead, the
authors rely on XGBoost’s robust defaults and its built-in regularization and
early-stopping logic. They also restrict the channel search to all singletons, all pairs
and all triples
out of the nineteen available channels.
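
The channel search is easy to picture in code. The sketch below, again my own simplification rather than the authors’ script, enumerates every one-, two- and three-channel subset of a standard 19-electrode 10-20 montage and scores each with five-fold cross-validated XGBoost; the feature-extraction step is assumed to have already produced a per-channel feature matrix.

```python
from itertools import combinations

import numpy as np
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

# Classic 19-channel 10-20 labels; the dataset's montage uses this layout.
CHANNELS = ["Fp1", "Fp2", "F7", "F3", "Fz", "F4", "F8", "T3", "C3", "Cz",
            "C4", "T4", "T5", "P3", "Pz", "P4", "T6", "O1", "O2"]

def evaluate_channel_subsets(features, labels, max_size=3):
    """Score every 1-, 2- and 3-channel subset with 5-fold CV XGBoost.

    `features` is assumed to map each channel name to an array of shape
    (n_subjects, n_features); the paper's feature set and model settings
    are richer than this simplified sketch.
    """
    results = {}
    for size in range(1, max_size + 1):
        for subset in combinations(CHANNELS, size):
            X = np.hstack([features[ch] for ch in subset])
            clf = XGBClassifier(n_estimators=100, random_state=0)  # near-default
            scores = cross_val_score(clf, X, labels, cv=5, scoring="accuracy")
            results[subset] = scores.mean()
    best = max(results, key=results.get)
    return best, results
```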

The headline number, 86.1 % mean accuracy (±5.7 % s.e.), is achieved by XGBoost
trained on ASR-cleaned, non-segmented recordings from the P3, P4 and C3
electrodes, using only statistically significant features.
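
The “only statistically significant features” step is worth unpacking: before training, features whose distributions do not differ between the ADHD and control groups are discarded. As a rough illustration of such a filter (the specific test and threshold here are my assumptions, not necessarily those used in the paper):

```python
import numpy as np
from scipy.stats import mannwhitneyu

def significant_feature_mask(X, y, alpha=0.05):
    """Keep features whose ADHD-vs-control difference is significant.

    Uses a Mann-Whitney U test per feature as one plausible choice of
    filter; the paper's exact statistical procedure may differ.
    """
    adhd, control = X[y == 1], X[y == 0]
    p_values = np.array([mannwhitneyu(adhd[:, j], control[:, j]).pvalue
                         for j in range(X.shape[1])])
    return p_values < alpha

# Usage: X_filtered = X[:, significant_feature_mask(X, y)]
```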

In essence, this study persuades on two levels. Substantively, it shows that careful
cleaning and time-sensitive analysis of EEG can approach clinically useful accuracy
while using only a handful of electrodes. Methodologically, it exemplifies a
transparent workflow (public data, open-source code, statistical feature filters and
explainable models) that deserves to be emulated in neurodiagnostic research.

Reference

García-Ponsoda, S., Maté, A., & Trujillo, J. (2024). Refining ADHD diagnosis
with EEG: The impact of preprocessing and temporal segmentation on
classification accuracy. Computers in Biology and Medicine, 183, 109305.
https://doi.org/10.1016/j.compbiomed.2024.109305