SVD is commonly used in collaborative filtering methods, especially for predicting ratings in recommendation systems (e.g., Netflix, Amazon).
To make predictions using SVD, we use the decomposed matrices (U, S, and Vt) to approximate missing values (i.e., ratings that a user has not given for a movie). Here's how it works:
How Predictions Are Made with SVD:
Reconstruction of the Matrix: After applying SVD to the user-item matrix, you get three matrices: U (user features), S (singular values), and Vt (movie features).
Approximation: By multiplying these matrices back together, you get an approximation of the original matrix, where missing values are filled in with predicted ratings.
Prediction: The reconstructed matrix provides approximate values for missing ratings based on the latent patterns discovered in the data.
NOTE: SVD doesn't "know" where the ratings are missing, but we can use the fact that SVD approximates the original matrix based on the patterns it detects in the data.
What SVD Does:
SVD decomposes this matrix into three matrices: U, S, and Vt.
- U matrix: Represents user features (latent factors related to users).
- S matrix: Contains singular values, representing the importance of each latent factor.
- Vt matrix: Represents movie features (latent factors related to movies).
SVD's goal is to find the patterns in the user ratings and approximate the original matrix by breaking it down into these three matrices. When we multiply them together, we get an approximation of the original matrix. For example:
Original matrix ≈ U * S * Vt
SVD and Missing Values:
Here's the key: SVD doesn't directly "know" the missing values. It works with the zeros where the data is missing, but it doesn't treat them as actual ratings. Instead, it tries to approximate the entire matrix (including the zeros) based on patterns in the known data.
- When performing SVD, we decompose the matrix where the missing values are filled with zero.
- The decomposition identifies patterns across users and movies.
- Even though the values are missing (i.e., zero), SVD will approximate these missing values based on the relationship between the non-zero ratings. For example, it may find that a particular user tends to rate movies similarly, or that certain movies are rated similarly by many users.
Here’s a simple example of a movie recommendation system using SVD (Singular Value Decomposition). This program will predict movie ratings for a user and recommend movies based on those predictions.
Steps:
- Prepare the data: We'll use a simple movie rating dataset (in this example, we'll create a small user-item matrix manually).
- Apply SVD: Decompose the matrix using SVD.
- Reconstruct the matrix: Use the decomposed matrices to predict missing values (i.e., missing ratings).
- Recommend movies: Based on the predicted ratings, recommend the top N movies for a user.
import numpy as np
# Step 1: Create a user-item matrix (ratings matrix)
# Rows: Users (0 to 4)
# Columns: Movies (0 to 4)
# Values: Ratings (1 to 5), 0 indicates a missing rating
ratings_matrix = np.array([
[5, 0, 0, 2, 0], # User 0's ratings
[4, 0, 0, 1, 0], # User 1's ratings
[1, 1, 0, 5, 4], # User 2's ratings
[0, 0, 5, 4, 0], # User 3's ratings
[0, 3, 4, 0, 5] # User 4's ratings
])
# Step 2: Perform SVD on the ratings matrix
# Decompose the matrix into three matrices: U, S, and Vt
U, S, Vt = np.linalg.svd(ratings_matrix, full_matrices=False)
# Step 3: Reconstruct the matrix by multiplying U, S, and Vt
# Note: We approximate the matrix by reducing the number of components (k)
k = 2 # Use 2 latent factors for this example (you can experiment with this value)
S_k = np.diag(S[:k]) # Take the first 'k' singular values
U_k = U[:, :k] # Take the first 'k' columns of U
Vt_k = Vt[:k, :] # Take the first 'k' rows of Vt
# Reconstruct the matrix using the reduced matrices
reconstructed_matrix = np.dot(U_k, np.dot(S_k, Vt_k))
# Step 4: Predict ratings for missing values (approximated ratings)
# Let's print the original matrix and the reconstructed matrix
print("Original Ratings Matrix:")
print(ratings_matrix)
print("\nReconstructed Ratings Matrix (Predictions for missing values):")
print(reconstructed_matrix)
# Step 5: Recommend movies for User 0 based on the predicted ratings
# For User 0, find the movies with the highest predicted ratings (excluding already rated ones)
user_index = 0 # User 0
predicted_ratings_user_0 = reconstructed_matrix[user_index]
# Find movies with highest predicted ratings (we'll ignore already rated movies)
rated_movies = ratings_matrix[user_index] > 0
predicted_ratings_user_0[rated_movies] = -1 # Exclude already rated movies by setting them to -1
# Get the indices of the top N recommended movies
N = 3 # Recommend top 3 movies
recommended_movie_indices = predicted_ratings_user_0.argsort()[-N:][::-1]
# Print recommended movies (indices of the movies)
print("\nTop 3 movie recommendations for User 0:")
for idx in recommended_movie_indices:
print(f"Movie {idx}: Predicted Rating = {predicted_ratings_user_0[idx]:.2f}")
Example Output:
Original Ratings Matrix:
[[5 0 0 2 0]
[4 0 0 1 0]
[1 1 0 5 4]
[0 0 5 4 0]
[0 3 4 0 5]]
Reconstructed Ratings Matrix (Predictions for missing values):
[[4.54577411 1.08670443 2.20015523 3.53068276 2.80830327]
[4.24466444 1.12701613 1.98279903 3.2210786 2.73432993]
[2.39280315 1.31704768 3.93904552 4.31580358 3.70635026]
[1.24819589 1.23391165 4.26573872 4.30332321 2.62780186]
[3.04222213 3.01697039 3.96412569 3.01974792 4.46130856]]
Top 3 movie recommendations for User 0:
Movie 2: Predicted Rating = 2.20
Movie 4: Predicted Rating = 2.81
Movie 3: Predicted Rating = 3.53
Top comments (0)