Analyzing images in PDF using Microsoft Cognitive Services.

#pdf #ai #csharp

Analyzing the contents of a PDF document is a very common task. But if earlier it was interesting for users to get a text from a document, now they want to look deeper.

Let's take an example, we have several documents containing images of tourist spots or points of interest, and we want to make these documents more searchable. Another example - we have PDF catalogs of clothes and we want to know which ones have T-shirts.

Modern AI services allow us to determine the type and nature of the images contained in them and they can help us to solve this task.

In this article, we will consider an example of integration Aspose.PDF for .NET with Microsoft Cognitive Services.
So, assume we have a bunch of PDF documents with images and we want to know what are depicts in them. We should add/update keywords in the document to make our documents more searchable.

Let split our task into 3 subtasks:

extract images
analyze images and get the tags
add/update meta information in PDF

Image extraction

To extract images from PDF document we will use an ImagePlacementAbsorber class. First, we create an instance of ImagePlacementAbsorber, then
get the images from the document using Visit method and filter small images to avoid analyze decorative and/or non-informative images.

var fileNames = new StringCollection();
// Create an instance of ImagePlacementAbsorber
var abs = new ImagePlacementAbsorber();
// Fill ImagePlacements collection with images from PDF
abs.Visit(new Aspose.Pdf.Document(pdfFileName));
// Filter small images and make an array for the future handling
var imagePlacements = abs.ImagePlacements
    .Where(i => (i.Rectangle.Width > 50) && (i.Rectangle.Height > 50))
    .ToArray();

We save each found image in a temporary directory for later analysis.

 for (var i = 0; i < imagePlacements.Count(); i++)
{
    var fileName = $@"{TempDir}\\image_{i}.jpg";
    // Create a file stream to store image in the temporary directory
    using (var stream = new FileStream(fileName, FileMode.Create))
    {
        imagePlacements[i].Image.Save(stream, ImageFormat.Jpeg);
        stream.Close();
    }
    // Add filename to the collection
    fileNames.Add(fileName);
}

To store the extracted image we should create a file stream and pass it to the ImagePlacement.Save method

Image recoginition

As stated above, we will use Microsoft Computer Vision service

At the previous stage, we got image files in the temporary directory and a list of files that can be saved in a specific variable. Now we will upload each image to the Microsoft Computer Vision service and will get the tags for recognized objects. Each tag contains the Name and Confidence properties. The Confidence property points to probability for correspondence Name to the object. Thus we can filter less valuable tags.

private static IEnumerable<string> CollectImageTags(
    StringCollection imageFileNames,
    double confidence)
{
    // Create a client for Computer Vision service
    var client = Authenticate(Endpoint, SubscriptionKey);
    // Create a set of tags
    var tags = new HashSet<string>();

    foreach (var fileName in imageFileNames)
    {
        // Upload image and recognize it
        var result = AnalyzeImage(client, fileName).Result;
        // Get the tags collection
        var imageTags =
            result.Tags
                    // filter less valuable tags
                .Where(iTag => iTag.Confidence > confidence)
                    // and select only names
                .Select(iTag => iTag.Name);
        // add new tags into tag's set
        tags.UnionWith(imageTags);
    }
    return tags;
}

Two helper methods were used in the snippet above. The first creates a client for Computer Vision API and the second uploads and analyzes the image.

public static ComputerVisionClient Authenticate(string endpoint, string key)
{
    var client =
        new ComputerVisionClient(new ApiKeyServiceClientCredentials(key))
            {Endpoint = endpoint};
    return client;
}

public static async Task<TagResult> AnalyzeImage(ComputerVisionClient client, string imagePath)
{
    Console.WriteLine($"Analyzing the image {Path.GetFileName(imagePath)}...");
    Console.WriteLine();
    // Analyze the URL image 
    return await client.TagImageInStreamAsync(
        File.OpenRead(imagePath ?? throw new ArgumentNullException(nameof(imagePath))));
}

The ComputerVisionClient is a class from Microsoft.Azure.CognitiveServices.Vision.ComputerVision library. If you interested in how to work with Microsoft Computer Vision, please refer to Microsoft Cognitive Services Documentation.

Update Meta information

To work with meta information in the Aspose.PDF library provides class DocumentInfo. According to our task, we will use DocumentInfo.Keywords property.

private static void SaveMetaData(
    string pdfFileName,
    IEnumerable<string> tags)
{
    var document = new Aspose.Pdf.Document(pdfFileName);
    _ = new DocumentInfo(document)
    {
        Keywords = string.Join("; ", tags)
    };
    document.Save(pdfFileName.Replace(".pdf","_tags.pdf"));
}

So, let look at whole code:

using System;
using System.Collections.Generic;
using System.Collections.Specialized;
using System.Drawing.Imaging;
using System.Threading.Tasks;
using System.IO;
using System.Linq;

using Aspose.Pdf;
using Microsoft.Azure.CognitiveServices.Vision.ComputerVision;
using Microsoft.Azure.CognitiveServices.Vision.ComputerVision.Models;


namespace Aspose.PDF.Demo.ImageClassification
{
    class Program
    {
        private const string SubscriptionKey = "<add key here>";
        private const string Endpoint = "https://<add endpoint here>.cognitiveservices.azure.com/";
        private const string LicenseFileName = @"<add license file here>";
        private const string PdfFileName = @"C:\tmp\<file>";

        private const string TempDir = "C:\\tmp\\extracted_images\\";
        private static readonly License License = new Aspose.Pdf.License();

        static void Main()
        {
            //you can use a trial version, but please note 
            //that you will be limited with 4 images. 
            //License.SetLicense(LicenseFileName);
            AnalyzeImageContent(PdfFileName);
        }

        private static void AnalyzeImageContent(string pdfFileName)
        {
            // Specify the directories you want to manipulate.
            var di = new DirectoryInfo(@TempDir);

            try
            {
                // Determine whether the directory exists.
                if (!di.Exists)
                    // Try to create the directory.
                    di.Create();

                var images = ExtractImages(pdfFileName);
                var tags = CollectImageTags(images, 0.9);
                SaveMetaData(pdfFileName, tags);

                // Delete the directory.
                di.Delete(true);
            }
            catch (Exception ex)
            {
                Console.WriteLine(ex.Message);
            }
        }

        private static void SaveMetaData(string pdfFileName, IEnumerable<string> tags)
        {
            var document = new Aspose.Pdf.Document(pdfFileName);
            _ = new DocumentInfo(document)
            {
                Keywords = string.Join("; ", tags)
            };
            document.Save(pdfFileName.Replace(".pdf","_tags.pdf"));
        }

        private static IEnumerable<string> CollectImageTags(StringCollection imageFileNames, double confidence)
        {
            // Create a client for Computer Vision service
            var client = Authenticate(Endpoint, SubscriptionKey);
            // Create a set of tags
            var tags = new HashSet<string>();

            foreach (var fileName in imageFileNames)
            {
                // Upload image and recognize it
                var result = AnalyzeImage(client, fileName).Result;
                // Get the tags collection
                var imageTags =
                    result.Tags
                            // filter less valuable tags
                        .Where(iTag => iTag.Confidence > confidence)
                            // and select only names
                        .Select(iTag => iTag.Name);
                // add new tags into tag's set
                tags.UnionWith(imageTags);
            }
            return tags;
        }

        private static StringCollection ExtractImages(string pdfFileName)
        {
            var fileNames = new StringCollection();
            // Create an instance of ImagePlacementAbsorber
            var abs = new ImagePlacementAbsorber();
            // Fill ImagePlacements collection with images from PDF
            abs.Visit(new Aspose.Pdf.Document(pdfFileName));
            // Filter small images and make an array for the future handling
            var imagePlacements = abs.ImagePlacements
                .Where(i => (i.Rectangle.Width > 50) && (i.Rectangle.Height > 50))
                .ToArray();

            for (var i = 0; i < imagePlacements.Count(); i++)
            {
                var fileName = $@"{TempDir}\\image_{i}.jpg";
                // Create a file stream to store image in the temporary directory
                using (var stream = new FileStream(fileName, FileMode.Create))
                {
                    imagePlacements[i].Image.Save(stream, ImageFormat.Jpeg);
                    stream.Close();
                }
                // Add filename to the collection
                fileNames.Add(fileName);
            }
            return fileNames;
        }

        public static ComputerVisionClient Authenticate(string endpoint, string key)
        {
            var client =
                new ComputerVisionClient(new ApiKeyServiceClientCredentials(key))
                    {Endpoint = endpoint};
            return client;
        }

        public static async Task<TagResult> AnalyzeImage(
            ComputerVisionClient client,
            string imagePath)
        {
            Console.WriteLine($"Analyzing the image {Path.GetFileName(imagePath)}...");
            Console.WriteLine();
            // Analyze the URL image 
            return await client.TagImageInStreamAsync(
                File.OpenRead(imagePath ?? throw new ArgumentNullException(nameof(imagePath))));
        }
    }
}

DEV Community

Analyzing images in PDF using Microsoft Cognitive Services.

Image extraction

Image recoginition

Update Meta information

Additional resources

Top comments (0)

Read next

My thoughts on Pieces

Unlocking Search Relevance: Large Language Models Power Pinterest Discovery

Breakthrough in AI Memory: Endowing Language Models with Human-like Episodic Recall

The Surprising Benefits of Smaller Language Models