vivekvohra

Part 1: Detecting Alzheimer’s with EEG and Deep Learning – Theory, Motivation, and Preprocessing

Introduction

Alzheimer’s disease (AD) is a neurodegenerative disorder that affects millions of people worldwide. Many people delay seeking medical help because they believe memory loss is a natural part of growing old, which leads to late diagnosis and fewer treatment options. Traditional diagnostic tools such as PET scans, cerebrospinal fluid tests, and MRI can be invasive, costly, or hard to access.

As part of my ongoing research efforts, I built an experimental prototype that combines an EEG dataset from OpenNeuro with machine learning to explore early detection of Alzheimer’s. Although this work is experimental and will not be used in the final research publication, it has deepened my skills in signal processing, feature extraction, and model development.

Why Alzheimer’s Detection Matters

Alzheimer’s currently has no cure, so early detection matters: it allows timely intervention, which may slow the progression of the disease and improve patients’ quality of life. Studies have shown that increased theta power, decreased alpha power, and disrupted gamma coherence are often associated with Alzheimer’s. By applying deep learning to these spectral features, we aim to create a tool that could eventually assist clinicians in making early and accurate diagnoses.

Theoretical Background: PSD, DSP, and EEG Signals

A core part of this project is the extraction of power spectral density (PSD) features from EEG signals. PSD analysis reveals how the power of a signal is distributed across frequencies. We estimate the PSD with Welch’s method, which divides the signal into overlapping segments, computes the Fast Fourier Transform (FFT) of each windowed segment, and averages the results. Averaging over segments reduces the variance of the estimate compared with a single FFT over the whole signal.
This process is a fundamental aspect of digital signal processing (DSP) and helps transform raw EEG data into a structured frequency-domain representation that highlights biomarkers related to Alzheimer’s.
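
To make the segment, FFT, and average steps concrete, here is a minimal NumPy sketch of Welch’s method. It is an illustration only, not the code used in this project (the pipeline relies on MNE for this). The function name `welch_psd`, the segment length, the Hann window, and the toy 10 Hz signal are all assumptions chosen for the example.

```python
import numpy as np


def welch_psd(x, fs, nperseg=256, noverlap=128):
    """Estimate the PSD of a 1-D signal by averaging windowed periodograms.

    Assumes nperseg is even, so the last rfft bin is the Nyquist frequency.
    """
    step = nperseg - noverlap
    window = np.hanning(nperseg)
    # Scale so the result is a density (power per Hz).
    scale = 1.0 / (fs * np.sum(window ** 2))

    periodograms = []
    for start in range(0, len(x) - nperseg + 1, step):
        segment = x[start:start + nperseg]
        segment = segment - segment.mean()        # remove the DC offset
        spectrum = np.fft.rfft(segment * window)
        periodograms.append(scale * np.abs(spectrum) ** 2)

    psd = np.mean(periodograms, axis=0)           # average over segments
    psd[1:-1] *= 2                                # one-sided: skip DC and Nyquist
    freqs = np.fft.rfftfreq(nperseg, d=1.0 / fs)
    return freqs, psd


# Toy signal: a 10 Hz "alpha-like" sine plus noise, 10 s at 250 Hz.
fs = 250
t = np.arange(10 * fs) / fs
rng = np.random.default_rng(0)
signal = np.sin(2 * np.pi * 10 * t) + 0.5 * rng.standard_normal(t.size)

freqs, psd = welch_psd(signal, fs)
peak_freq = freqs[np.argmax(psd)]
print(f"Peak frequency: {peak_freq:.2f} Hz")
```

The peak lands on the frequency bin closest to 10 Hz. With a 256-sample segment at 250 Hz the bins are roughly 1 Hz apart, which shows the trade-off in Welch’s method: shorter segments give a smoother estimate but coarser frequency resolution.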

The Preprocessing Pipeline

Before training the model, raw EEG recordings must be transformed into meaningful features. The dataset used here is from OpenNeuro, which is already extensively preprocessed, providing us with a clean dataset. Here’s a breakdown of the preprocessing steps implemented:

[Figure: Pipeline]

1. Data Loading and Label Mapping

We begin by loading EEG data using MNE-Python and reading participant metadata from a TSV file. The metadata maps diagnostic groups (‘A’ for Alzheimer’s, ‘F’ for Frontotemporal Dementia, and ‘C’ for healthy controls) to numeric labels.

import pandas as pd

metadata = pd.read_csv('Dataset/participants.tsv', sep='\t')
group_mapping = {'A': 0, 'F': 1, 'C': 2}  # Map diagnostic groups to integers
metadata['label'] = metadata['Group'].map(group_mapping)
subject_labels = dict(zip(metadata['participant_id'], metadata['label']))

This mapping is essential because it links each subject’s EEG recording to their clinical diagnosis, which gives us the labels needed for supervised learning.

2. EEG Signal Processing

Although the EEG data is already cleaned, it still contains frequencies we do not need for this analysis. We apply a band-pass FIR filter (0.5–45 Hz) to keep the range of interest and suppress everything outside it (e.g., slow drifts below 0.5 Hz and power line noise).


raw.filter(0.5, 45, fir_design='firwin')
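
To show what a band-pass FIR filter does to a signal, here is a small NumPy sketch that builds a windowed-sinc filter and applies it to a toy signal. This is not how MNE designs its filter internally (MNE chooses the filter length and transition bands automatically), and the helper names `bandpass_fir` and `gain_at`, the tap count of 501, and the 250 Hz sampling rate are assumptions made for the example.

```python
import numpy as np


def bandpass_fir(low_hz, high_hz, fs, numtaps=501):
    """Windowed-sinc band-pass: difference of two low-pass filters."""
    n = np.arange(numtaps) - (numtaps - 1) / 2    # taps centered on zero

    def lowpass(cutoff_hz):
        # Ideal low-pass impulse response (np.sinc is the normalized sinc).
        return 2 * cutoff_hz / fs * np.sinc(2 * cutoff_hz / fs * n)

    return (lowpass(high_hz) - lowpass(low_hz)) * np.hamming(numtaps)


def gain_at(taps, freq_hz, fs):
    """Magnitude of the filter's frequency response at one frequency."""
    n = np.arange(taps.size)
    return np.abs(np.sum(taps * np.exp(-2j * np.pi * freq_hz * n / fs)))


fs = 250
t = np.arange(8 * fs) / fs
# 10 Hz component we want to keep plus 50 Hz "power line" interference.
x = np.sin(2 * np.pi * 10 * t) + np.sin(2 * np.pi * 50 * t)

taps = bandpass_fir(0.5, 45.0, fs)
# Symmetric taps with mode="same" keep the output aligned with the input.
y = np.convolve(x, taps, mode="same")

print("gain at 10 Hz:", round(gain_at(taps, 10.0, fs), 3))
print("gain at 50 Hz:", round(gain_at(taps, 50.0, fs), 3))
```

Away from the edges of the recording, `y` is close to the pure 10 Hz sine because the 50 Hz component falls in the stopband. With only 501 taps the 0.5 Hz lower edge is not sharp; a steep cutoff that low needs a much longer filter.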


Then we segment the continuous data into 2-second epochs with a 1-second overlap. This step captures transient neural patterns relevant to Alzheimer’s.


import mne

epochs = mne.make_fixed_length_epochs(raw, duration=2.0, overlap=1, preload=True)
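
The windowing idea behind this call can be sketched in plain NumPy. The helper `fixed_length_windows` below is hypothetical, and the 500 Hz sampling rate and random data are assumptions for the example. MNE’s own handling of the last partial window may differ.

```python
import numpy as np


def fixed_length_windows(data, sfreq, duration=2.0, overlap=1.0):
    """Cut (n_channels, n_times) data into overlapping windows.

    Returns an array of shape (n_epochs, n_channels, n_samples_per_epoch).
    """
    win = int(round(duration * sfreq))
    step = int(round((duration - overlap) * sfreq))
    starts = range(0, data.shape[1] - win + 1, step)
    return np.stack([data[:, s:s + win] for s in starts])


sfreq = 500                                   # assumed sampling rate
data = np.random.default_rng(0).standard_normal((19, 10 * sfreq))

windows = fixed_length_windows(data, sfreq)
print(windows.shape)                          # (9, 19, 1000)
```

Ten seconds of data yields nine 2-second windows rather than five, because each window starts one second after the previous one. The second half of each window is identical to the first half of the next, so neighbouring epochs are not independent samples.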


3. PSD Calculation and Feature Extraction

For each epoch, we use Welch’s method to compute the PSD, and then extract relative band power (RBP) features for the standard EEG frequency bands: delta, theta, alpha, beta, and gamma. This step averages the PSD within each frequency band and divides each band average by the sum over all five bands, so the five values for a given epoch and channel add up to 1. The result is a 4D tensor (epochs, channels, bands, 1) that is suitable as input for a deep learning model.

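import numpy as np

# Welch PSD for every epoch; psds has shape (n_epochs, n_channels, n_freqs)
psd = epochs.compute_psd(method="welch", fmin=0.5, fmax=45)
psds, freqs = psd.get_data(return_freqs=True)

freq_bands = {
    "delta": (0.5, 4),
    "theta": (4, 8),
    "alpha": (8, 13),
    "beta": (13, 25),
    "gamma": (25, 45),
}

# Mean PSD per band. Bounds are inclusive, so a frequency bin that falls
# exactly on a boundary (e.g. 4 Hz) contributes to both neighbouring bands.
band_power = {}
for band, (fmin, fmax) in freq_bands.items():
    idx = np.logical_and(freqs >= fmin, freqs <= fmax)
    band_power[band] = psds[:, :, idx].mean(axis=-1)

# Stack to (n_epochs, n_channels, n_bands) and normalize across bands
bp_abs = np.stack(list(band_power.values()), axis=-1)
total_power = bp_abs.sum(axis=-1, keepdims=True)
rbp_relative = bp_abs / total_power

# Add a trailing singleton axis: (n_epochs, n_channels, n_bands, 1)
features = rbp_relative[..., np.newaxis]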
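
One way to sanity-check the band-power logic is to run it on a synthetic spectrum where the answer is known in advance. The sketch below wraps the same steps in a hypothetical helper, `relative_band_power`, and feeds it a flat PSD with the theta range boosted. The frequency grid and array sizes are made up for the example.

```python
import numpy as np

FREQ_BANDS = {
    "delta": (0.5, 4),
    "theta": (4, 8),
    "alpha": (8, 13),
    "beta": (13, 25),
    "gamma": (25, 45),
}


def relative_band_power(psds, freqs, bands=FREQ_BANDS):
    """(n_epochs, n_channels, n_freqs) -> (n_epochs, n_channels, n_bands)."""
    band_means = []
    for fmin, fmax in bands.values():
        idx = (freqs >= fmin) & (freqs <= fmax)
        band_means.append(psds[:, :, idx].mean(axis=-1))
    bp_abs = np.stack(band_means, axis=-1)
    return bp_abs / bp_abs.sum(axis=-1, keepdims=True)


freqs = np.arange(0.5, 45.5, 0.5)                  # 0.5 Hz steps up to 45 Hz
psds = np.ones((3, 19, freqs.size))                # flat spectrum
psds[:, :, (freqs >= 4) & (freqs <= 8)] = 10.0     # boost the theta range

rbp = relative_band_power(psds, freqs)
dominant_band = list(FREQ_BANDS)[int(np.argmax(rbp[0, 0]))]
print(rbp.shape, dominant_band)
```

Theta comes out as the dominant band, and the five values for every epoch and channel sum to 1, which is the property that makes the features comparable across subjects with different overall signal amplitude.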

4. Label Vector Construction and Data Standardization

Finally, we assign each epoch the diagnostic label of its subject using the metadata mapping, and concatenate the features of all subjects to form the final input matrix X. We also split the data into training and test sets. To improve training stability, we standardize X with StandardScaler. Because the scaler expects 2D input, we flatten each sample, apply the scaler, and then reshape the result back to the original 4D shape.
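
The steps in this section can be sketched as follows. This is a NumPy-only illustration under several assumptions: the subject IDs and random features are made up, the post does not state the split ratio (80/20 is used here), and the z-scoring is written by hand to mirror what StandardScaler computes (per-column mean and standard deviation). Fitting the statistics on the training set only is also an assumption about the order of operations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-subject features: (n_epochs, 19 channels, 5 bands, 1).
subject_features = {
    "sub-001": rng.random((40, 19, 5, 1)),
    "sub-002": rng.random((30, 19, 5, 1)),
    "sub-003": rng.random((50, 19, 5, 1)),
}
subject_labels = {"sub-001": 0, "sub-002": 1, "sub-003": 2}

# Every epoch inherits the label of the subject it came from.
X = np.concatenate([subject_features[s] for s in subject_features])
y = np.concatenate([
    np.full(len(subject_features[s]), subject_labels[s])
    for s in subject_features
])

# Shuffled 80/20 split (ratio assumed for this example).
order = rng.permutation(len(X))
n_train = int(0.8 * len(X))
train_idx, test_idx = order[:n_train], order[n_train:]
X_train, X_test = X[train_idx], X[test_idx]
y_train, y_test = y[train_idx], y[test_idx]

# Flatten to 2D, z-score each column with training statistics, reshape back.
flat_train = X_train.reshape(len(X_train), -1)
flat_test = X_test.reshape(len(X_test), -1)
mean = flat_train.mean(axis=0)
std = flat_train.std(axis=0)
std[std == 0] = 1.0                       # avoid division by zero
X_train_scaled = ((flat_train - mean) / std).reshape(X_train.shape)
X_test_scaled = ((flat_test - mean) / std).reshape(X_test.shape)

print(X.shape, y.shape, X_train_scaled.shape, X_test_scaled.shape)
```

Reusing the training mean and standard deviation on the test set keeps the two sets on the same scale without letting test data influence the preprocessing.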

5. Final Data Format

After all these steps, we can print the shape of the final input matrix X that we will feed into the model:

print("X shape:", X.shape)

The output is:

X shape: (69706, 19, 5, 1)

This shape breaks down as follows:

  • 69706 Epochs: The total number of epochs (samples) extracted from all subjects. Each epoch is a 2-second window of EEG data transformed into a feature map.
  • 19 Channels: Each epoch's feature map has 19 rows, one per EEG channel.
  • 5 Frequency Bands: The 5 columns in each feature map correspond to the frequency bands: delta, theta, alpha, beta, and gamma.
  • 1 Channel (Grayscale Image): The final dimension (1) indicates that the data has a single channel, analogous to a grayscale image. Each "pixel" value is the standardized relative band power of one EEG channel in one frequency band.
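
As a quick illustration of how to read this layout, the sketch below indexes a small stand-in array with the same axis order. The random values and the 4-epoch size are placeholders, and the band order is assumed to follow the order in which the bands were defined earlier.

```python
import numpy as np

BANDS = ["delta", "theta", "alpha", "beta", "gamma"]

# Stand-in for the real X: 4 epochs instead of 69706.
X = np.random.default_rng(0).random((4, 19, 5, 1))

# One epoch viewed as a 19 x 5 "image" (channels x bands).
feature_map = X[0, :, :, 0]

# Theta value for each of the 19 channels in that epoch.
theta_per_channel = X[0, :, BANDS.index("theta"), 0]

print(feature_map.shape, theta_per_channel.shape)   # (19, 5) (19,)
```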

[Figure: RBP]

Conclusion

This blog post has covered the theoretical background of power spectral density (PSD) and digital signal processing (DSP) as they relate to EEG signals, explained why EEG is a promising tool for Alzheimer’s detection, and detailed the preprocessing steps that transform raw EEG data into meaningful features for deep learning. Although the model is still experimental, this pipeline gives me a solid foundation to improve on and learn from.

In Part 2, we will dive into the model architecture and training strategies, and discuss how the model learns from these spectral features to classify EEG recordings.


For a detailed look at the code and further updates, please visit my GitHub repository: EEG-ML-Experiment.
