Welcome Back! 🚀
You’ve made it this far. Great job! So far, we have:
✅ Set up AWS SageMaker with a Jupyter Notebook instance.
✅ Prepared the NSL-KDD dataset by cleaning, encoding categorical features, and normalizing numerical values.
✅ Uploaded the dataset to AWS S3 to be used for training.
Now, it’s time to bring everything together!
In this next section, we will go through:
☑️ Step 3 – Train an anomaly detection model on network security data using XGBoost.
☑️ Step 4 – Deploy the trained model as a SageMaker endpoint for real-time threat analysis.
☑️ Step 5 – Make predictions to classify network traffic as normal or malicious.
By the end of this section, you will have a fully deployed cybersecurity threat detection system running in the cloud! Let’s get started. 🚀
Step 3: Train an Anomaly Detection Model on Network Security Data
Now that our preprocessed NSL-KDD dataset is ready in AWS S3, we will:
- Use SageMaker’s built-in XGBoost algorithm for binary classification.
- Train the model using SageMaker-managed infrastructure.
- Store the trained model in S3 for later deployment.
3.1 Set Up SageMaker and Define Training Parameters
First, we will configure the SageMaker environment and specify the XGBoost training job.
import boto3
import sagemaker
from sagemaker import get_execution_role
# Initialize SageMaker session and get execution role
sagemaker_session = sagemaker.Session()
role = get_execution_role()
# Retrieve the built-in SageMaker XGBoost container
region = boto3.Session().region_name
xgboost_image = sagemaker.image_uris.retrieve("xgboost", region, "1.7-1")  # "latest" is no longer supported for XGBoost; pin a release version
print("SageMaker environment configured successfully!")
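The estimator and training calls below reuse `s3_bucket`, `s3_prefix`, and `s3_train_data` from Part 2. If you are resuming in a fresh notebook, here is a minimal sketch to redefine them (the bucket name and the train file name are placeholders; substitute the values from your own Part 2 setup):

```python
# Reused from Part 2 -- the bucket and file names below are placeholders;
# replace them with the values from your own setup.
s3_bucket = "cybersecurity-dataset-XXXXX"  # your bucket from Part 2
s3_prefix = "sagemaker/cybersecurity"      # prefix used when uploading
s3_train_data = f"s3://{s3_bucket}/{s3_prefix}/train/train.csv"  # assumed file name

print(s3_train_data)
```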
3.2 Configure the XGBoost Estimator
We now define the XGBoost model, set hyperparameters, and configure the SageMaker training job.
# Define the XGBoost model and training job
xgb_model = sagemaker.estimator.Estimator(
    image_uri=xgboost_image,
    role=role,
    instance_count=1,             # Single instance for training
    instance_type="ml.m5.large",  # Adjust as needed
    volume_size=5,                # Storage in GB for the training container
    output_path=f"s3://{s3_bucket}/{s3_prefix}/output",  # Model output location (s3_bucket and s3_prefix from Part 2)
    sagemaker_session=sagemaker_session,
)
# Set XGBoost hyperparameters
xgb_model.set_hyperparameters(
    objective="binary:logistic",  # Binary classification: the model outputs a probability
    num_round=100,                # Number of boosting rounds
    eval_metric="auc",            # Performance metric: Area Under the Curve (AUC)
    eta=0.2,                      # Learning rate
    max_depth=5,                  # Maximum tree depth
    subsample=0.8,                # Use 80% of the data for each boosting round
)
print("XGBoost training job configured successfully!")
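With `objective="binary:logistic"`, the trained model emits a probability by passing each raw boosting score through the logistic (sigmoid) function. A quick NumPy sketch of that transform and of the 0.5 cutoff we apply in Step 5:

```python
import numpy as np

def sigmoid(margin):
    """Logistic function: maps a raw boosting score to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-margin))

margins = np.array([-2.0, 0.0, 2.0])  # example raw scores
probs = sigmoid(margins)              # roughly 0.1192, 0.5, 0.8808
labels = (probs >= 0.5).astype(int)   # same threshold used in Step 5

print(labels)  # [0 1 1]
```

A raw score of exactly 0 maps to a probability of 0.5, which is why 0.5 is the natural default cutoff between the two classes.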
3.3 Start Training the Model
We now launch the training job on AWS SageMaker.
# Define the SageMaker training input (s3_train_data is the S3 URI of the training CSV from Part 2)
train_input = sagemaker.inputs.TrainingInput(s3_train_data, content_type="csv")
# Start training (a "validation" channel can also be passed so eval_metric is reported during training)
xgb_model.fit({"train": train_input})
print("Model training started... ⏳")
Monitor Training Progress
- Go to the AWS SageMaker Console → Training Jobs to see live logs.
- Training time depends on dataset size (usually a few minutes).
Once complete, the trained model will be stored in Amazon S3 at:
s3://cybersecurity-dataset-XXXXX/sagemaker/cybersecurity/output/
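To confirm the artifact actually landed, you can list the output prefix with boto3. A hedged sketch (the helper name is ours, and actually running it requires AWS credentials, which is why the import is deferred):

```python
def find_model_artifacts(bucket, prefix):
    """Return the S3 keys of model.tar.gz files under the training output prefix."""
    import boto3  # deferred so the sketch is readable without the SDK installed
    s3 = boto3.client("s3")
    resp = s3.list_objects_v2(Bucket=bucket, Prefix=prefix)
    return [obj["Key"] for obj in resp.get("Contents", [])
            if obj["Key"].endswith("model.tar.gz")]

# Example (fill in your bucket from Part 2):
# print(find_model_artifacts("cybersecurity-dataset-XXXXX", "sagemaker/cybersecurity/output"))
```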
Once training completes, we move to Step 4! 🚀
Step 4: Deploy the Model as a SageMaker Endpoint
Now that training is complete, we deploy the trained model as an endpoint so we can send network traffic data for real-time cybersecurity threat detection.
4.1 Deploy the Trained Model
from sagemaker.serializers import CSVSerializer
# Deploy the trained model as a SageMaker endpoint
predictor = xgb_model.deploy(
    initial_instance_count=1,     # Number of deployed instances
    instance_type="ml.m5.large",  # Choose based on expected traffic
    serializer=CSVSerializer(),   # Ensure input data is sent in CSV format
)
print("✅ Model deployed successfully as a SageMaker endpoint!")
✅ Deployment takes a few minutes; the call returns once the endpoint is InService and ready to serve predictions. 🎯
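The `predictor` object only exists inside this notebook session. From any other application, the same endpoint can be invoked through the SageMaker runtime API; a sketch (the helper name is ours, and the endpoint name can be read from `predictor.endpoint_name`):

```python
def invoke_endpoint(endpoint_name, csv_row):
    """Send one CSV-formatted feature row to a SageMaker endpoint; returns the probability."""
    import boto3  # deferred: actually calling this requires AWS credentials
    runtime = boto3.client("sagemaker-runtime")
    resp = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",
        Body=csv_row,  # e.g. "0.1,0.2,0.3,..." (42 comma-separated values)
    )
    return float(resp["Body"].read().decode("utf-8").strip())
```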
Step 5: Make Predictions on Network Traffic Data
Now that the model is deployed, we can send network traffic data to classify it as normal or malicious.
5.1 Test the Model with Sample Data
We will send a normalized network traffic sample to the model for prediction.
import numpy as np
# Example input (Ensure 42 features, replace with real normalized test data)
test_input = np.array([[
    0.1, 0.2, 0.3, 0.5, 0.7, 0.0, 1.0, 0.0, 150,
    0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9,
    0.5, 0.2, 1.0, 0.6, 0.3, 0.1, 0.0, 0.2, 0.3, 0.4,
    0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 0.0, 0.3, 0.2, 0.4,
    0.6, 0.8,
]])
print(f"Test input shape: {test_input.shape}") # Should be (1, 42)
# Send request to the deployed model
response = predictor.predict(test_input.tolist())
# Decode response
probability = float(response.decode("utf-8").strip()) # Convert to float
# Apply threshold to classify as 0 or 1
predicted_label = 1 if probability >= 0.5 else 0
print(f"🔍 Probability Score: {probability:.4f}")
print(f"🔍 Predicted Label: {predicted_label} (0: Normal, 1: Attack)")
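The 0.5 cutoff is a modeling choice, not a fixed rule. In security monitoring you may prefer a lower threshold to catch more attacks at the cost of more false alarms. A small helper (ours, not part of the SageMaker SDK) to experiment with different cutoffs over a batch of scores:

```python
def classify(probabilities, threshold=0.5):
    """Map probability scores to labels: 0 = normal, 1 = attack."""
    return [1 if p >= threshold else 0 for p in probabilities]

scores = [0.03, 0.48, 0.97]
print(classify(scores))                 # [0, 0, 1]
print(classify(scores, threshold=0.3))  # [0, 1, 1]
```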
5.2 Clean Up Resources
After testing, delete the endpoint to avoid unnecessary AWS charges.
predictor.delete_endpoint()
print("✅ SageMaker endpoint deleted. Training and deployment process complete!")
Remember to stop the notebook instance! Once it has stopped, go ahead and delete it.
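Stopping and deleting the notebook instance can also be scripted. A hedged sketch using boto3 (the function name is ours; deletion is irreversible, and running this requires AWS credentials):

```python
def stop_and_delete_notebook(instance_name):
    """Stop a SageMaker notebook instance, wait until it has stopped, then delete it."""
    import boto3  # deferred: requires AWS credentials to actually run
    sm = boto3.client("sagemaker")
    sm.stop_notebook_instance(NotebookInstanceName=instance_name)
    sm.get_waiter("notebook_instance_stopped").wait(NotebookInstanceName=instance_name)
    sm.delete_notebook_instance(NotebookInstanceName=instance_name)  # irreversible
```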
Final Thoughts: What We Achieved 🎯
✅ Trained an XGBoost model on network traffic data to detect anomalies.
✅ Deployed the model as a SageMaker endpoint for real-time cybersecurity threat analysis.
✅ Sent test network activity for real-time classification (normal vs. attack).
✅ Shut down resources to avoid unnecessary costs.
🚀 Congratulations! You now have a fully functional, cloud-based cybersecurity anomaly detection system using AWS SageMaker! 🔥