Analyzing the contents of a PDF document is a very common task. In the past, users were mostly interested in extracting text from a document; now they want to look deeper.
For example, suppose we have several documents containing images of tourist spots or points of interest, and we want to make these documents more searchable. Another example: we have PDF catalogs of clothes and we want to know which ones contain T-shirts.
Modern AI services can determine the type and nature of the images in such documents, which helps us solve this task.
In this article, we will consider an example of integrating Aspose.PDF for .NET with Microsoft Cognitive Services.
So, assume we have a bunch of PDF documents with images and we want to know what is depicted in them. We will then add or update the keywords in each document to make it more searchable.
Let's split our task into 3 subtasks:
- extract images
- analyze images and get the tags
- add/update meta information in PDF
Image extraction
To extract images from a PDF document we will use the ImagePlacementAbsorber class. First, we create an instance of ImagePlacementAbsorber, then get the images from the document using the Visit method, and filter out small images to avoid analyzing decorative and/or non-informative images.
var fileNames = new StringCollection();
// Create an instance of ImagePlacementAbsorber
var abs = new ImagePlacementAbsorber();
// Fill ImagePlacements collection with images from PDF
abs.Visit(new Aspose.Pdf.Document(pdfFileName));
// Filter small images and make an array for the future handling
var imagePlacements = abs.ImagePlacements
.Where(i => (i.Rectangle.Width > 50) && (i.Rectangle.Height > 50))
.ToArray();
We save each found image in a temporary directory for later analysis.
for (var i = 0; i < imagePlacements.Length; i++)
{
// Build the path with Path.Combine so separators are handled correctly
var fileName = Path.Combine(TempDir, $"image_{i}.jpg");
// Create a file stream to store image in the temporary directory
using (var stream = new FileStream(fileName, FileMode.Create))
{
// The using block closes the stream, so no explicit Close() is needed
imagePlacements[i].Image.Save(stream, ImageFormat.Jpeg);
}
// Add filename to the collection
fileNames.Add(fileName);
}
To store an extracted image, we create a file stream and pass it to the Save method of the placement's Image.
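The temporary directory mentioned above can also be created and cleaned up in one place. The following is a minimal sketch, not part of the original sample: TempDirSketch and WithTempDirectory are hypothetical names, and it assumes that a uniquely named folder under the system temp path is acceptable instead of a fixed one.

```csharp
using System;
using System.IO;

public static class TempDirSketch
{
    // Hypothetical helper: creates a uniquely named directory under the
    // system temp path, passes its path to the given work, and deletes
    // the directory afterwards, even if the work throws.
    public static T WithTempDirectory<T>(Func<string, T> work)
    {
        var dir = Path.Combine(
            Path.GetTempPath(),
            "extracted_images_" + Path.GetRandomFileName());
        Directory.CreateDirectory(dir);
        try
        {
            return work(dir);
        }
        finally
        {
            // Remove the directory together with any files saved into it
            Directory.Delete(dir, recursive: true);
        }
    }
}
```

With a helper like this, the extraction and recognition steps would run inside the callback, so the image files never outlive the analysis.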
Image recognition
As stated above, we will use the Microsoft Computer Vision service.
At the previous stage, we saved the image files in the temporary directory and collected their file names in a collection. Now we will upload each image to the Microsoft Computer Vision service and get the tags for the recognized objects. Each tag contains the Name and Confidence properties. The Confidence property gives the probability that Name correctly describes the object, so we can use it to filter out less valuable tags.
private static IEnumerable<string> CollectImageTags(
StringCollection imageFileNames,
double confidence)
{
// Create a client for Computer Vision service
var client = Authenticate(Endpoint, SubscriptionKey);
// Create a set of tags
var tags = new HashSet<string>();
foreach (var fileName in imageFileNames)
{
// Upload image and recognize it
var result = AnalyzeImage(client, fileName).Result;
// Get the tags collection
var imageTags =
result.Tags
// filter less valuable tags
.Where(iTag => iTag.Confidence > confidence)
// and select only names
.Select(iTag => iTag.Name);
// add new tags to the tag set
tags.UnionWith(imageTags);
}
return tags;
}
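To see the filtering step in isolation, here is a small sketch that runs without the Computer Vision service. SampleTag and TagFilterSketch are hypothetical stand-ins introduced only for illustration; the real service returns its own tag type, which, as described above, exposes Name and Confidence.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a service tag: a name and a confidence score.
public sealed class SampleTag
{
    public SampleTag(string name, double confidence)
    {
        Name = name;
        Confidence = confidence;
    }

    public string Name { get; }
    public double Confidence { get; }
}

public static class TagFilterSketch
{
    // Keeps the names of tags scoring above the threshold and merges them
    // across all images; the set removes names repeated between images.
    public static HashSet<string> Collect(
        IEnumerable<IEnumerable<SampleTag>> tagsPerImage,
        double threshold)
    {
        var tags = new HashSet<string>();
        foreach (var imageTags in tagsPerImage)
        {
            tags.UnionWith(imageTags
                .Where(t => t.Confidence > threshold)
                .Select(t => t.Name));
        }
        return tags;
    }
}
```

For instance, with a threshold of 0.9, an image tagged "tower" (0.98) and "sky" (0.60) and another tagged "tower" (0.95) and "bridge" (0.93) would yield a set containing only "tower" and "bridge".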
Two helper methods were used in the snippet above. The first creates a client for the Computer Vision API, and the second uploads and analyzes the image.
public static ComputerVisionClient Authenticate(string endpoint, string key)
{
var client =
new ComputerVisionClient(new ApiKeyServiceClientCredentials(key))
{Endpoint = endpoint};
return client;
}
public static async Task<TagResult> AnalyzeImage(ComputerVisionClient client, string imagePath)
{
if (imagePath == null) throw new ArgumentNullException(nameof(imagePath));
Console.WriteLine($"Analyzing the image {Path.GetFileName(imagePath)}...");
Console.WriteLine();
// Upload the local image as a stream and dispose the stream when done
using (var stream = File.OpenRead(imagePath))
{
return await client.TagImageInStreamAsync(stream);
}
}
The ComputerVisionClient is a class from the Microsoft.Azure.CognitiveServices.Vision.ComputerVision library. If you are interested in how to work with Microsoft Computer Vision, please refer to the Microsoft Cognitive Services documentation.
Update meta information
To work with meta information, the Aspose.PDF library provides the DocumentInfo class. For our task, we will use the DocumentInfo.Keywords property.
private static void SaveMetaData(
string pdfFileName,
IEnumerable<string> tags)
{
var document = new Aspose.Pdf.Document(pdfFileName);
_ = new DocumentInfo(document)
{
Keywords = string.Join("; ", tags)
};
document.Save(pdfFileName.Replace(".pdf","_tags.pdf"));
}
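Note that the method above replaces whatever keywords the document already had. If the existing keywords should be kept, the new tags can be merged into them first. The sketch below is one possible way to do this with plain strings; KeywordMergeSketch and Merge are hypothetical names, and it assumes that existing keywords are separated by semicolons or commas.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class KeywordMergeSketch
{
    // Hypothetical helper: merges new tags into an existing keyword string,
    // keeping the existing entries first and skipping duplicates
    // (compared case-insensitively).
    public static string Merge(string existingKeywords, IEnumerable<string> newTags)
    {
        // Assumption: existing keywords are separated by ';' or ','
        var existing = (existingKeywords ?? string.Empty)
            .Split(new[] { ';', ',' }, StringSplitOptions.RemoveEmptyEntries);

        var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        var merged = new List<string>();
        foreach (var keyword in existing.Concat(newTags))
        {
            var trimmed = keyword.Trim();
            // HashSet.Add returns false for a keyword that was already seen
            if (trimmed.Length > 0 && seen.Add(trimmed))
            {
                merged.Add(trimmed);
            }
        }
        return string.Join("; ", merged);
    }
}
```

The merged string could then be assigned to Keywords in place of string.Join("; ", tags), assuming the current value is read from the same property beforehand.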
So, let's look at the whole code:
using System;
using System.Collections.Generic;
using System.Collections.Specialized;
using System.Drawing.Imaging;
using System.Threading.Tasks;
using System.IO;
using System.Linq;
using Aspose.Pdf;
using Microsoft.Azure.CognitiveServices.Vision.ComputerVision;
using Microsoft.Azure.CognitiveServices.Vision.ComputerVision.Models;
namespace Aspose.PDF.Demo.ImageClassification
{
class Program
{
private const string SubscriptionKey = "<add key here>";
private const string Endpoint = "https://<add endpoint here>.cognitiveservices.azure.com/";
private const string LicenseFileName = @"<add license file here>";
private const string PdfFileName = @"C:\tmp\<file>";
private const string TempDir = "C:\\tmp\\extracted_images\\";
private static readonly License License = new Aspose.Pdf.License();
static void Main()
{
// You can use a trial version, but please note
// that you will be limited to 4 images.
//License.SetLicense(LicenseFileName);
AnalyzeImageContent(PdfFileName);
}
private static void AnalyzeImageContent(string pdfFileName)
{
// Specify the directories you want to manipulate.
var di = new DirectoryInfo(@TempDir);
try
{
// Determine whether the directory exists.
if (!di.Exists)
// Try to create the directory.
di.Create();
var images = ExtractImages(pdfFileName);
var tags = CollectImageTags(images, 0.9);
SaveMetaData(pdfFileName, tags);
// Delete the directory.
di.Delete(true);
}
catch (Exception ex)
{
Console.WriteLine(ex.Message);
}
}
private static void SaveMetaData(string pdfFileName, IEnumerable<string> tags)
{
var document = new Aspose.Pdf.Document(pdfFileName);
_ = new DocumentInfo(document)
{
Keywords = string.Join("; ", tags)
};
document.Save(pdfFileName.Replace(".pdf","_tags.pdf"));
}
private static IEnumerable<string> CollectImageTags(StringCollection imageFileNames, double confidence)
{
// Create a client for Computer Vision service
var client = Authenticate(Endpoint, SubscriptionKey);
// Create a set of tags
var tags = new HashSet<string>();
foreach (var fileName in imageFileNames)
{
// Upload image and recognize it
var result = AnalyzeImage(client, fileName).Result;
// Get the tags collection
var imageTags =
result.Tags
// filter less valuable tags
.Where(iTag => iTag.Confidence > confidence)
// and select only names
.Select(iTag => iTag.Name);
// add new tags to the tag set
tags.UnionWith(imageTags);
}
return tags;
}
private static StringCollection ExtractImages(string pdfFileName)
{
var fileNames = new StringCollection();
// Create an instance of ImagePlacementAbsorber
var abs = new ImagePlacementAbsorber();
// Fill ImagePlacements collection with images from PDF
abs.Visit(new Aspose.Pdf.Document(pdfFileName));
// Filter small images and make an array for the future handling
var imagePlacements = abs.ImagePlacements
.Where(i => (i.Rectangle.Width > 50) && (i.Rectangle.Height > 50))
.ToArray();
for (var i = 0; i < imagePlacements.Length; i++)
{
// Build the path with Path.Combine so separators are handled correctly
var fileName = Path.Combine(TempDir, $"image_{i}.jpg");
// Create a file stream to store image in the temporary directory
using (var stream = new FileStream(fileName, FileMode.Create))
{
// The using block closes the stream, so no explicit Close() is needed
imagePlacements[i].Image.Save(stream, ImageFormat.Jpeg);
}
// Add filename to the collection
fileNames.Add(fileName);
}
return fileNames;
}
public static ComputerVisionClient Authenticate(string endpoint, string key)
{
var client =
new ComputerVisionClient(new ApiKeyServiceClientCredentials(key))
{Endpoint = endpoint};
return client;
}
public static async Task<TagResult> AnalyzeImage(
ComputerVisionClient client,
string imagePath)
{
if (imagePath == null) throw new ArgumentNullException(nameof(imagePath));
Console.WriteLine($"Analyzing the image {Path.GetFileName(imagePath)}...");
Console.WriteLine();
// Upload the local image as a stream and dispose the stream when done
using (var stream = File.OpenRead(imagePath))
{
return await client.TagImageInStreamAsync(stream);
}
}
}
}