Welcome Back! 🚀
You’ve made it this far. Great job! So far, we have:
✅ Set up AWS SageMaker with a Jupyter Notebook instance.
✅ Prepared the NSL-KDD dataset by cleaning, encoding categorical features, and normalizing numerical values.
✅ Uploaded the dataset to AWS S3 to be used for training.
Now, it’s time to bring everything together!
In this next section, we will go through:
☑️ Step 3 – Train an anomaly detection model on network security data using XGBoost.
☑️ Step 4 – Deploy the trained model as a SageMaker endpoint for real-time threat analysis.
☑️ Step 5 – Make predictions to classify network traffic as normal or malicious.
By the end of this section, you will have a fully deployed cybersecurity threat detection system running in the cloud! Let’s get started. 🚀
Step 3: Train an Anomaly Detection Model on Network Security Data
Now that our preprocessed NSL-KDD dataset is ready in AWS S3, we will:
- Use SageMaker’s built-in XGBoost algorithm for binary classification.
- Train the model using SageMaker-managed infrastructure.
- Store the trained model in S3 for later deployment.
3.1 Set Up SageMaker and Define Training Parameters
First, we will configure the SageMaker environment and specify the XGBoost training job.
import boto3
import sagemaker
from sagemaker import get_execution_role
# Initialize SageMaker session and get execution role
sagemaker_session = sagemaker.Session()
role = get_execution_role()
# Retrieve the built-in SageMaker XGBoost container
region = boto3.Session().region_name
xgboost_image = sagemaker.image_uris.retrieve("xgboost", region, "1.7-1")  # "latest" is no longer supported for XGBoost; pin a release version
print("SageMaker environment configured successfully!")
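The estimator and training calls below reuse `s3_bucket`, `s3_prefix`, and `s3_train_data` from Part 2. If you are resuming in a fresh notebook, here is a minimal sketch to redefine them (the bucket name and the train file name are placeholders; substitute the values from your own Part 2 setup):

```python
# Reused from Part 2 -- the bucket and file names below are placeholders;
# replace them with the values from your own setup.
s3_bucket = "cybersecurity-dataset-XXXXX"  # your bucket from Part 2
s3_prefix = "sagemaker/cybersecurity"      # prefix used when uploading
s3_train_data = f"s3://{s3_bucket}/{s3_prefix}/train/train.csv"  # assumed file name

print(s3_train_data)
```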
3.2 Configure the XGBoost Estimator
We now define the XGBoost model, set hyperparameters, and configure the SageMaker training job.
# Define the XGBoost model and training job
xgb_model = sagemaker.estimator.Estimator(
    image_uri=xgboost_image,
    role=role,
    instance_count=1,             # Single instance for training
    instance_type="ml.m5.large",  # Adjust as needed
    volume_size=5,                # Storage in GB for the training container
    output_path=f"s3://{s3_bucket}/{s3_prefix}/output",  # Model output location (s3_bucket and s3_prefix from Part 2)
    sagemaker_session=sagemaker_session,
)
# Set XGBoost hyperparameters
xgb_model.set_hyperparameters(
    objective="binary:logistic",  # Binary classification: the model outputs a probability
    num_round=100,                # Number of boosting rounds
    eval_metric="auc",            # Performance metric: Area Under the Curve (AUC)
    eta=0.2,                      # Learning rate
    max_depth=5,                  # Maximum tree depth
    subsample=0.8,                # Use 80% of the data for each boosting round
)
print("XGBoost training job configured successfully!")
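With `objective="binary:logistic"`, the trained model emits a probability by passing each raw boosting score through the logistic (sigmoid) function. A quick NumPy sketch of that transform and of the 0.5 cutoff we apply in Step 5:

```python
import numpy as np

def sigmoid(margin):
    """Logistic function: maps a raw boosting score to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-margin))

margins = np.array([-2.0, 0.0, 2.0])  # example raw scores
probs = sigmoid(margins)              # roughly 0.1192, 0.5, 0.8808
labels = (probs >= 0.5).astype(int)   # same threshold used in Step 5

print(labels)  # [0 1 1]
```

A raw score of exactly 0 maps to a probability of 0.5, which is why 0.5 is the natural default cutoff between the two classes.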
3.3 Start Training the Model
We now launch the training job on AWS SageMaker.
# Define the SageMaker training input (s3_train_data is the S3 URI of the training CSV from Part 2)
train_input = sagemaker.inputs.TrainingInput(s3_train_data, content_type="csv")
# Start training (a "validation" channel can also be passed so eval_metric is reported during training)
xgb_model.fit({"train": train_input})
print("Model training started... ⏳")
Monitor Training Progress
- Go to the AWS SageMaker Console → Training Jobs to see live logs.
- Training time depends on dataset size (usually a few minutes).
Once complete, the trained model will be stored in Amazon S3 at:
s3://cybersecurity-dataset-XXXXX/sagemaker/cybersecurity/output/
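To confirm the artifact actually landed, you can list the output prefix with boto3. A hedged sketch (the helper name is ours, and actually running it requires AWS credentials, which is why the import is deferred):

```python
def find_model_artifacts(bucket, prefix):
    """Return the S3 keys of model.tar.gz files under the training output prefix."""
    import boto3  # deferred so the sketch is readable without the SDK installed
    s3 = boto3.client("s3")
    resp = s3.list_objects_v2(Bucket=bucket, Prefix=prefix)
    return [obj["Key"] for obj in resp.get("Contents", [])
            if obj["Key"].endswith("model.tar.gz")]

# Example (fill in your bucket from Part 2):
# print(find_model_artifacts("cybersecurity-dataset-XXXXX", "sagemaker/cybersecurity/output"))
```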
Once training completes, we move to Step 4! 🚀
Step 4: Deploy the Model as a SageMaker Endpoint
Now that training is complete, we deploy the trained model as an endpoint so we can send network traffic data for real-time cybersecurity threat detection.
4.1 Deploy the Trained Model
from sagemaker.serializers import CSVSerializer
# Deploy the trained model as a SageMaker endpoint
predictor = xgb_model.deploy(
    initial_instance_count=1,     # Number of deployed instances
    instance_type="ml.m5.large",  # Choose based on expected traffic
    serializer=CSVSerializer(),   # Ensure input data is sent in CSV format
)
print("✅ Model deployed successfully as a SageMaker endpoint!")
✅ Deployment takes a few minutes; the call returns once the endpoint is InService and ready to serve predictions. 🎯
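The `predictor` object only exists inside this notebook session. From any other application, the same endpoint can be invoked through the SageMaker runtime API; a sketch (the helper name is ours, and the endpoint name can be read from `predictor.endpoint_name`):

```python
def invoke_endpoint(endpoint_name, csv_row):
    """Send one CSV-formatted feature row to a SageMaker endpoint; returns the probability."""
    import boto3  # deferred: actually calling this requires AWS credentials
    runtime = boto3.client("sagemaker-runtime")
    resp = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",
        Body=csv_row,  # e.g. "0.1,0.2,0.3,..." (42 comma-separated values)
    )
    return float(resp["Body"].read().decode("utf-8").strip())
```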
Step 5: Make Predictions on Network Traffic Data
Now that the model is deployed, we can send network traffic data to classify it as normal or malicious.
5.1 Test the Model with Sample Data
We will send a normalized network traffic sample to the model for prediction.
import numpy as np
# Example input (Ensure 42 features, replace with real normalized test data)
test_input = np.array([[
    0.1, 0.2, 0.3, 0.5, 0.7, 0.0, 1.0, 0.0, 150,
    0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9,
    0.5, 0.2, 1.0, 0.6, 0.3, 0.1, 0.0, 0.2, 0.3, 0.4,
    0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 0.0, 0.3, 0.2, 0.4,
    0.6, 0.8,
]])
print(f"Test input shape: {test_input.shape}") # Should be (1, 42)
# Send request to the deployed model
response = predictor.predict(test_input.tolist())
# Decode response
probability = float(response.decode("utf-8").strip()) # Convert to float
# Apply threshold to classify as 0 or 1
predicted_label = 1 if probability >= 0.5 else 0
print(f"🔍 Probability Score: {probability:.4f}")
print(f"🔍 Predicted Label: {predicted_label} (0: Normal, 1: Attack)")
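The 0.5 cutoff is a modeling choice, not a fixed rule. In security monitoring you may prefer a lower threshold to catch more attacks at the cost of more false alarms. A small helper (ours, not part of the SageMaker SDK) to experiment with different cutoffs over a batch of scores:

```python
def classify(probabilities, threshold=0.5):
    """Map probability scores to labels: 0 = normal, 1 = attack."""
    return [1 if p >= threshold else 0 for p in probabilities]

scores = [0.03, 0.48, 0.97]
print(classify(scores))                 # [0, 0, 1]
print(classify(scores, threshold=0.3))  # [0, 1, 1]
```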
5.2 Clean Up Resources
After testing, delete the endpoint to avoid unnecessary AWS charges.
predictor.delete_endpoint()
print("✅ SageMaker endpoint deleted. Training and deployment process complete!")
Remember to stop the notebook instance! Once it has stopped, go ahead and delete it.
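Stopping and deleting the notebook instance can also be scripted. A hedged sketch using boto3 (the function name is ours; deletion is irreversible, and running this requires AWS credentials):

```python
def stop_and_delete_notebook(instance_name):
    """Stop a SageMaker notebook instance, wait until it has stopped, then delete it."""
    import boto3  # deferred: requires AWS credentials to actually run
    sm = boto3.client("sagemaker")
    sm.stop_notebook_instance(NotebookInstanceName=instance_name)
    sm.get_waiter("notebook_instance_stopped").wait(NotebookInstanceName=instance_name)
    sm.delete_notebook_instance(NotebookInstanceName=instance_name)  # irreversible
```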
Final Thoughts: What We Achieved 🎯
✅ Trained an XGBoost model on network traffic data to detect anomalies.
✅ Deployed the model as a SageMaker endpoint for real-time cybersecurity threat analysis.
✅ Sent test network activity for real-time classification (normal vs. attack).
✅ Shut down resources to avoid unnecessary costs.
🚀 Congratulations! You now have a fully functional, cloud-based cybersecurity anomaly detection system using AWS SageMaker! 🔥