Journey into Visual AI: Exploring FiftyOne Together — Part IV Model Evaluation

Author: Paula Ramos (Senior DevRel and Applied AI Research Advocate at Voxel51)

Good data can create good models, but how can we validate that our models are well-trained, fix mistakes, and address dataset gaps? The key lies in proper data management and model evaluation.


In my journey with FiftyOne (Blog 1, Blog 2, and Blog 3), I’ve explored its transformative impact on the MLOps pipeline, from data curation to models in production. Previously, I covered steps like data loading, curation, cleaning, and metadata management. Now, it’s time to focus on model evaluation, a critical step for building high-quality, reliable, and trustworthy AI/ML systems.


The Importance of Model Evaluation

Effective model evaluation goes beyond accuracy metrics. It ensures your models are:

  • Accurate: Perform as expected across various datasets.
  • Robust: Handle real-world variability gracefully.
  • Fair: Address potential biases in training data.
  • Explainable: Offer insights into decision-making.

Developers can adopt advanced techniques like automated pipelines, explainable AI (XAI) tools, and bias detection to meet these goals. This is where FiftyOne shines, offering visual and interactive tools that simplify and enhance model evaluation.

FiftyOne as a Solution for Model Evaluation

FiftyOne provides an intuitive approach to evaluating datasets and models, allowing developers to:

  • Visually inspect samples where a model performs well or poorly.
  • Identify labeling errors and dataset biases.
  • Streamline evaluation processes with interactive dashboards.
  • Lay the foundation for fairness, explainability, and better decision-making.

In the context of human action recognition, FiftyOne helps bridge the gap between raw data and actionable insights, especially for computer vision tasks like the Elderly Action Recognition Challenge.

Evaluating Models for Human Action Recognition

In this notebook, I demonstrate how to evaluate models for human action recognition using FiftyOne. We leverage the UCF101 dataset and integrate a pre-trained model from Hugging Face, walking through key steps like importing datasets, visualizing predictions, and improving model performance.

Importing the Dataset

First, let’s import the UCF101 dataset and launch the FiftyOne App:

import fiftyone as fo
import fiftyone.zoo as foz
# Load the UCF101 test dataset
dataset = foz.load_zoo_dataset("ucf101", split="test")
# Launch the FiftyOne app
session = fo.launch_app(dataset)

This gives you an interactive view of your data, enabling you to spot trends, errors, or biases.
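For example, a quick sanity check of the class distribution can surface imbalance before you evaluate anything. This assumes the labels live in a ground_truth field, which is the same field the evaluation code later in this post uses:

# Summarize the dataset: sample count, media type, and fields
print(dataset)

# Count samples per action class to spot imbalance at a glance
print(dataset.count_values("ground_truth.label"))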

Visualizing Predictions

FiftyOne allows you to visualize model predictions side-by-side with ground truth labels. This is especially useful for tasks like Elderly Action Recognition (EAR), where understanding subtle nuances is critical. The notebook referenced in this blog works with a Human Action Recognition (HAR) dataset and model, but the same workflow can be extrapolated to EAR.

from transformers import (
    VideoMAEImageProcessor,
    VideoMAEForVideoClassification,
)
import torch

# Load a pre-trained video classification model
processor = VideoMAEImageProcessor.from_pretrained(
    "sayakpaul/videomae-base-finetuned-ucf101-subset")
model = VideoMAEForVideoClassification.from_pretrained(
    "sayakpaul/videomae-base-finetuned-ucf101-subset")

# Predict labels for dataset samples
filepaths = dataset.values("filepath")
predicted_labels = []

for filepath in filepaths:
    # Sample 16 frames from the video (helper defined in the
    # notebook; see the sketch below)
    video_frames = construct_frames_array(filepath, 16)
    inputs = processor(video_frames, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    predicted_class = outputs.logits.argmax(-1).item()
    # Map the class index to its label and store as a Classification
    predicted_labels.append(
        fo.Classification(label=model.config.id2label[predicted_class])
    )

# Write all predictions to the dataset in one pass
dataset.set_values("predictions", predicted_labels)
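The loop above relies on a construct_frames_array() helper that is defined in the notebook. As a rough sketch of what such a helper could look like, assuming each filepath points to a video file, here is an OpenCV version that samples evenly spaced RGB frames. This is my illustration, not the notebook’s exact code:

import cv2
import numpy as np

def construct_frames_array(video_path, num_frames=16):
    """Sample `num_frames` evenly spaced RGB frames from a video file."""
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    indices = np.linspace(0, max(total - 1, 0), num_frames, dtype=int)
    frames = []
    for idx in indices:
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(idx))
        ok, frame = cap.read()
        if not ok:
            break
        # VideoMAE's processor expects RGB frames
        frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    cap.release()
    return frames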

Evaluating Performance with Confusion Matrices

FiftyOne simplifies the process of analyzing model performance. You can generate confusion matrices to pinpoint areas for improvement:

# Evaluate predictions
results = dataset.evaluate_classifications(
    "predictions",
    gt_field="ground_truth",
    eval_key="evaluation",
)

# Plot confusion matrix
results.plot_confusion_matrix().show()
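The results object also supports a per-class report, and because the default (simple) classification evaluation writes a boolean evaluation field to each sample, you can immediately isolate the mistakes in the App:

from fiftyone import ViewField as F

# Per-class precision, recall, and F1
results.print_report()

# Show only the samples the model got wrong
session.view = dataset.match(F("evaluation") == False)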

Using the Model Evaluation Panel

We have loaded a UCF101 subset in the App, with predictions over 10 labels. To showcase the Model Evaluation Panel, we modify the ground truth labels for 10% of the subset, which gives us two evaluations to compare: one against the original ground_truth and one against the modified labels (a rough sketch of this step follows below). With one or more evaluations, we can open the Model Evaluation panel to visualize and interactively explore the evaluation results in the App.
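The notebook walks through the exact steps; as a minimal sketch of one way to perturb 10% of the labels, assuming a cloned ground_truth_edited field (my illustration, not the notebook’s exact code):

import random

# Work on a copy of the labels, leaving the originals intact
dataset.clone_sample_field("ground_truth", "ground_truth_edited")

# Randomly relabel ~10% of the samples to simulate annotation noise
classes = dataset.distinct("ground_truth.label")
for sample in dataset.take(int(0.1 * len(dataset)), seed=51):
    sample["ground_truth_edited"].label = random.choice(classes)
    sample.save()

# A second evaluation, this time against the modified labels
dataset.evaluate_classifications(
    "predictions",
    gt_field="ground_truth_edited",
    eval_key="evaluation_edited",
)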

The panel’s home page shows a list of evaluations on the dataset, their current review status, and any notes you’ve added. Click on an evaluation to open its expanded view, which provides a set of expandable cards that dive into various aspects of the model’s performance:

  • Summary Card
  • Metric Performance Card
  • Class Performance Card
  • Confusion Matrix Card

Joining the Elderly Action Recognition Challenge

This notebook is part of the Elderly Action Recognition Challenge. It will help participants:

  • Work with pre-trained models.
  • Evaluate models effectively.
  • Submit meaningful solutions to advance action recognition for the elderly.

By focusing on model evaluation, we can make strides toward creating fair, interpretable, and impactful AI systems for healthcare and beyond.


Just wrapping up! 😀

Model evaluation is not just a step in the AI/ML workflow; it’s a critical process that determines the success and reliability of your solutions. With FiftyOne, you gain powerful tools to visualize, analyze, and improve your models in ways that traditional techniques simply can’t match. Whether you’re addressing human action recognition or diving into the nuances of elderly action detection, FiftyOne empowers you to identify weaknesses, address dataset gaps, and ultimately build models that make a real-world impact.

The journey doesn’t end here; model evaluation is an ongoing process that evolves with your data and objectives. By adopting advanced techniques and tools like FiftyOne, you’re not just building models; you’re building reliable, robust, and fair AI systems that solve meaningful problems.

I would love to hear about your experiences! Please share your thoughts, ask questions, and provide testimonials. Your insights might help others in our next posts. Don’t forget to participate in the challenge and try out the notebook I have created for you all.

Together, we can innovate in action recognition and make meaningful contributions to AI for Good. Let’s build something impactful!

Stay tuned for the next post, in which we’ll explore FiftyOne’s advanced features and continue evaluating models.

Let’s make this journey with FiftyOne a collaborative and enriching experience. Happy coding!


What is next?

I’m excited to share more about my journey at Voxel51! 🚀 If you’d like to follow along as I explore the world of AI and grow professionally, feel free to connect or follow me on LinkedIn. Let’s inspire each other to embrace change and reach new heights!

You can find me at some Voxel51 events (https://voxel51.com/computer-vision-events/), or if you want to join this amazing team, it’s worth taking a look at this page: https://voxel51.com/jobs/
