Introduction
Alzheimer’s disease (AD) is a challenging neurodegenerative disorder that affects millions of people worldwide. Many people delay seeking medical help because they believe memory loss is a natural part of growing old, which leads to late diagnosis and fewer treatment options. Traditional diagnostic tools such as PET scans, cerebrospinal fluid tests, and MRI tend to be invasive, costly, or not easily accessible.
As part of my ongoing research, I built an experimental prototype that combines an EEG dataset from OpenNeuro with machine learning to explore early detection of Alzheimer’s. Although this work is experimental and will not be used in the final research publication, it has deepened my skills in signal processing, feature extraction, and model development.
Why Alzheimer’s Detection Matters
Alzheimer’s currently has no cure, so early detection matters: timely intervention may slow the progression of the disease and improve patients’ quality of life. Studies have shown that increased theta power, decreased alpha power, and disrupted gamma coherence are often associated with Alzheimer’s. By applying deep learning to these spectral features, we aim to create a tool that could eventually assist clinicians in making early and accurate diagnoses.
Theoretical Background: PSD, DSP, and EEG Signals
A core part of this project is the extraction of power spectral density (PSD) features from EEG signals. PSD analysis reveals how the power of a signal is distributed across different frequencies. We estimate it with Welch’s method, which divides the signal into overlapping segments, applies a window to each, computes a periodogram (the squared magnitude of the Fast Fourier Transform, FFT) per segment, and averages the results. Averaging reduces the variance of the estimate, giving a more reliable PSD than a single FFT over the whole signal.
This process is a fundamental aspect of digital signal processing (DSP) and helps transform raw EEG data into a structured frequency-domain representation that highlights biomarkers related to Alzheimer’s.
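To make the idea concrete, here is a minimal sketch of Welch’s method on a synthetic signal using SciPy’s `scipy.signal.welch`. The sampling rate, segment length, and the 10 Hz test tone are illustrative assumptions, not values taken from the dataset, and this is not the code used in the project (which relies on MNE, shown later).

```python
import numpy as np
from scipy.signal import welch

fs = 250.0                                  # assumed sampling rate in Hz (illustrative)
t = np.arange(int(10 * fs)) / fs            # 10 seconds of sample times
rng = np.random.default_rng(0)

# Synthetic "EEG-like" trace: a 10 Hz (alpha-range) sinusoid plus white noise
x = np.sin(2 * np.pi * 10.0 * t) + 0.5 * rng.standard_normal(t.size)

# Welch: 2-second Hann-windowed segments with 50% overlap;
# one periodogram per segment, then the periodograms are averaged
nperseg = int(2 * fs)
freqs, psd = welch(x, fs=fs, window="hann", nperseg=nperseg, noverlap=nperseg // 2)

peak_freq = freqs[np.argmax(psd)]
print(f"Frequency resolution: {freqs[1] - freqs[0]:.2f} Hz")
print(f"PSD peak at {peak_freq:.1f} Hz")
```

The segment length sets the frequency resolution (here 1 / 2 s = 0.5 Hz), so longer segments resolve finer detail but leave fewer segments to average.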
The Preprocessing Pipeline
Before training the model, raw EEG recordings must be transformed into meaningful features. The dataset used here comes from OpenNeuro and is already extensively preprocessed, so we start from relatively clean recordings. Here’s a breakdown of the preprocessing steps implemented:
1. Data Loading and Label Mapping
We begin by loading EEG data using MNE-Python and reading participant metadata from a TSV file. The metadata maps diagnostic groups (‘A’ for Alzheimer’s, ‘F’ for Frontotemporal Dementia, and ‘C’ for healthy controls) to numeric labels.
import pandas as pd
metadata = pd.read_csv('Dataset/participants.tsv', sep='\t')
group_mapping = {'A': 0, 'F': 1, 'C': 2} # Map diagnostic groups to integers
metadata['label'] = metadata['Group'].map(group_mapping)
subject_labels = dict(zip(metadata['participant_id'], metadata['label']))
This mapping is essential because it links each subject’s EEG data to their clinical diagnosis, which gives us the labels needed for supervised learning.
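One detail worth guarding against: `Series.map` returns NaN for any group code that is missing from the mapping. The sketch below shows a simple check on a small in-memory table. The participant IDs and rows are made up for illustration and stand in for the real participants.tsv.

```python
import pandas as pd

# Hypothetical stand-in for participants.tsv (IDs and rows are invented)
metadata = pd.DataFrame({
    "participant_id": ["sub-001", "sub-002", "sub-003", "sub-004"],
    "Group": ["A", "F", "C", "A"],
})

group_mapping = {"A": 0, "F": 1, "C": 2}
metadata["label"] = metadata["Group"].map(group_mapping)

# .map() yields NaN for unknown group codes, so fail early
# instead of silently carrying unlabeled subjects forward
unmapped = metadata[metadata["label"].isna()]
if not unmapped.empty:
    raise ValueError(f"Unmapped groups: {sorted(unmapped['Group'].unique())}")

subject_labels = dict(zip(metadata["participant_id"], metadata["label"].astype(int)))
print(subject_labels)
```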
2. EEG Signal Processing
Although the EEG data is already cleaned, it still contains frequencies we do not need for this analysis. We apply an FIR band-pass filter (0.5–45 Hz) to remove slow drifts below 0.5 Hz and high-frequency content above 45 Hz (e.g., power line noise).
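# `raw` is the mne.io.Raw object already loaded for one subject
raw.filter(l_freq=0.5, h_freq=45.0, fir_design='firwin')  # FIR band-pass, 0.5–45 Hz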
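To see what such a band-pass does, the following sketch filters a synthetic signal with SciPy instead of MNE. It is not MNE’s internal implementation: the sampling rate, tap count, and the use of `filtfilt` for zero-phase filtering are assumptions chosen for illustration.

```python
import numpy as np
from scipy.signal import firwin, filtfilt

fs = 250.0                                   # assumed sampling rate in Hz
t = np.arange(int(30 * fs)) / fs             # 30 seconds of sample times

# 10 Hz component we want to keep, plus 50 Hz power-line interference
clean = np.sin(2 * np.pi * 10.0 * t)
noisy = clean + 0.8 * np.sin(2 * np.pi * 50.0 * t)

# Linear-phase FIR band-pass, 0.5-45 Hz. A long filter is needed for the
# narrow transition near 0.5 Hz (the tap count is an illustrative choice).
numtaps = 1651
taps = firwin(numtaps, [0.5, 45.0], pass_zero=False, fs=fs)

# Forward-backward filtering cancels the filter's phase delay
filtered = filtfilt(taps, [1.0], noisy)


def amplitude_at(signal, freq):
    """Return the amplitude of the FFT bin closest to `freq` (in Hz)."""
    spectrum = np.abs(np.fft.rfft(signal)) * 2 / signal.size
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)
    return spectrum[np.argmin(np.abs(freqs - freq))]


print("10 Hz amplitude after filtering:", round(amplitude_at(filtered, 10.0), 3))
print("50 Hz amplitude after filtering:", round(amplitude_at(filtered, 50.0), 5))
```

The 10 Hz component should pass almost unchanged, while the 50 Hz interference is strongly attenuated because it lies above the 45 Hz cutoff.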
Then we segment the continuous data into 2-second epochs with a 1-second overlap. This step captures transient neural patterns relevant to Alzheimer’s.
epochs = mne.make_fixed_length_epochs(raw, duration=2.0, overlap=1, preload=True)
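The windowing arithmetic can be sketched in plain NumPy: a 2-second window with a 1-second overlap means each new window starts 1 second after the previous one. The sampling rate and recording length below are assumptions for illustration, and MNE’s handling of the final partial window may differ from this sketch.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

fs = 250                                      # assumed sampling rate in Hz
n_channels, n_seconds = 19, 10
rng = np.random.default_rng(0)
data = rng.standard_normal((n_channels, n_seconds * fs))   # (channels, times)

win = int(2.0 * fs)                           # 2-second window = 500 samples
step = int((2.0 - 1.0) * fs)                  # duration minus overlap = 1-second hop

# Build every length-`win` window along the time axis,
# then keep only the windows that start on a multiple of `step`
windows = sliding_window_view(data, win, axis=1)[:, ::step, :]
epochs_arr = windows.transpose(1, 0, 2)       # (n_epochs, n_channels, n_times)

expected_n_epochs = (data.shape[1] - win) // step + 1
print(epochs_arr.shape, expected_n_epochs)
```

With 10 seconds of data this yields 9 epochs; without overlap it would yield only 5, so overlapping roughly doubles the number of training samples at the cost of correlated neighbors.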
3. PSD Calculation and Feature Extraction
For each epoch, we use Welch’s method to compute the PSD, and then extract relative band power (RBP) features for the standard EEG frequency bands: delta, theta, alpha, beta, and gamma. This step involves averaging the power within each frequency range and normalizing by the total power, resulting in a 4D tensor (epochs, channels, bands, 1) that is suitable as input for a deep learning model.
import numpy as np

# Welch PSD per epoch and channel, restricted to 0.5–45 Hz
psd = epochs.compute_psd(method="welch", fmin=0.5, fmax=45)
psds, freqs = psd.get_data(return_freqs=True)  # psds: (epochs, channels, freqs)

freq_bands = {
    "delta": (0.5, 4),
    "theta": (4, 8),
    "alpha": (8, 13),
    "beta": (13, 25),
    "gamma": (25, 45),
}

# Mean power per band. Band edges are inclusive on both sides, so a bin
# that falls exactly on an edge (e.g. 4 Hz) counts toward both neighboring bands.
band_power = {}
for band, (fmin, fmax) in freq_bands.items():
    idx = np.logical_and(freqs >= fmin, freqs <= fmax)
    band_power[band] = psds[:, :, idx].mean(axis=-1)

bp_abs = np.stack(list(band_power.values()), axis=-1)  # (epochs, channels, bands)
total_power = bp_abs.sum(axis=-1, keepdims=True)
rbp_relative = bp_abs / total_power  # the five band values now sum to 1
features = rbp_relative[..., np.newaxis]  # (epochs, channels, bands, 1)
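The same computation can be tried without any EEG data. The sketch below runs on a random, strictly positive array standing in for the PSD, and it is a variation rather than the exact code above: it uses half-open band edges so that no frequency bin is shared between bands. The array sizes and the 0.5 Hz frequency grid are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_epochs, n_channels = 4, 19
freqs = np.arange(0.5, 45.5, 0.5)                              # 0.5 Hz bins, 0.5 to 45 Hz
psds = rng.random((n_epochs, n_channels, freqs.size)) + 0.1    # fake positive PSD values

freq_bands = {"delta": (0.5, 4), "theta": (4, 8), "alpha": (8, 13),
              "beta": (13, 25), "gamma": (25, 45)}

band_power = []
for name, (fmin, fmax) in freq_bands.items():
    # Half-open [fmin, fmax) so edge bins are not shared;
    # only the last band keeps its upper edge (45 Hz)
    upper = freqs <= fmax if name == "gamma" else freqs < fmax
    idx = (freqs >= fmin) & upper
    band_power.append(psds[:, :, idx].mean(axis=-1))

bp_abs = np.stack(band_power, axis=-1)                 # (epochs, channels, bands)
rbp = bp_abs / bp_abs.sum(axis=-1, keepdims=True)      # bands sum to 1 per channel
features_demo = rbp[..., np.newaxis]                   # (epochs, channels, bands, 1)

print(features_demo.shape)
print(rbp[0, 0].round(3), rbp[0, 0].sum())
```

Because each value is a fraction of the summed band powers, the features are insensitive to overall amplitude differences between subjects or recording sessions.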
4. Label Vector Construction and Data Standardization
Finally, we associate each epoch with its corresponding diagnostic label using the metadata mapping and concatenate all subject features to form the final input matrix X. We also split the data into training and test sets. To improve training stability, we standardize X using StandardScaler. Because StandardScaler expects 2D input, we reshape the data to 2D, apply the scaler, and then reshape it back to its original 4D shape.
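The reshape, scale, reshape-back step can be sketched as follows with scikit-learn. The data is random placeholder input, the split parameters are illustrative, and fitting the scaler on the training split only is an assumption on my part (a common way to avoid leaking test statistics), not something stated above.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_demo = rng.random((200, 19, 5, 1))          # placeholder features
y_demo = rng.integers(0, 3, size=200)         # placeholder labels (0, 1, 2)

X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=0
)

# StandardScaler needs 2D input: flatten each sample to one row of 19 * 5 * 1 values
X_train_2d = X_train.reshape(X_train.shape[0], -1)
X_test_2d = X_test.reshape(X_test.shape[0], -1)

scaler = StandardScaler()
# Assumption: fit on the training rows only, then reuse those statistics for the test rows
X_train_scaled = scaler.fit_transform(X_train_2d).reshape(X_train.shape)
X_test_scaled = scaler.transform(X_test_2d).reshape(X_test.shape)

print(X_train_scaled.shape, X_test_scaled.shape)
```

Each of the 95 (channel, band) positions is scaled independently, so after scaling every position has zero mean and unit variance across the training epochs.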
5. Final Data Format
After all these steps, we can print the shape of the final input matrix X that we will feed into the model:
print("X shape:", X.shape)
This gives the output:
X shape: (69706, 19, 5, 1)
Each dimension means the following:
- 69706 Epochs: The total number of epochs (samples) extracted from all subjects. Each epoch represents a 2-second window of EEG data transformed into a feature map.
- 19 Channels: Each epoch's feature map has 19 rows, one per EEG channel.
- 5 Frequency Bands: The 5 columns in each feature map correspond to the delta, theta, alpha, beta, and gamma bands.
- 1 Channel (Grayscale Image): The final dimension (1) indicates that the data has a single channel, analogous to a grayscale image. Each pixel value corresponds to the normalized relative band power of a particular EEG channel in a specific frequency band.
Conclusion
This blog post has covered the theoretical background of power spectral density (PSD) and digital signal processing (DSP) as they relate to EEG signals, explained why EEG is a promising tool for Alzheimer’s detection, and detailed the preprocessing steps that transform raw EEG data into meaningful features for deep learning. Although the model is still experimental, this pipeline lays a strong foundation for future improvements.
In Part 2, we will dive into the model architecture and training strategies, and discuss how the model learns from these spectral features to classify EEG recordings.
For a detailed look at the code and further updates, please visit my GitHub repository: EEG-ML-Experiment.
vivekvohra/EEG-ML-Experiment: Automated EEG-Based Alzheimer’s Detection System