Auguste for Fileforge


PDF Form Manipulation API

tl;dr: The new Fileforge API allows you to manipulate PDF forms with ease. Learn how to extract, mark, and fill forms programmatically in this blog article.

Introduction

Forms are an essential part of many business processes, from collecting customer information to processing orders. However, handling forms can be a time-consuming and error-prone process, especially when done manually. That's where Fileforge comes in.

With our latest API update, we're introducing new endpoints that allow you to extract, mark, and fill PDF forms programmatically. This makes it easier than ever to automate your form handling processes and reduce the risk of errors.

Note that you can also use our SDKs to interact with the API. We currently support Python, Node.js, and Ruby, with more languages coming soon.

What can you do with the new API endpoints?

As of today, we support 3 types of operations on PDF forms:

  • Extract: Extract form fields from a PDF document.
  • Mark: Mark form fields with comment annotations (debugging mode).
  • Fill: Fill form fields with data.
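
For orientation, each operation corresponds to one endpoint. The small lookup table below is only a convenience sketch: the URLs are the ones used in the requests in this article, while the `FORM_ENDPOINTS` name and the `endpointFor` helper are made up for this example. Note that the extract operation lives at the `detect` path.

```javascript
// Map each form operation to its endpoint.
// The URLs are taken from the requests in this article; the object and
// function names are hypothetical and exist only for this example.
const FORM_ENDPOINTS = {
  extract: 'https://api.fileforge.com/pdf/form/detect',
  mark: 'https://api.fileforge.com/pdf/form/mark',
  fill: 'https://api.fileforge.com/pdf/form/fill'
}

// Look up the endpoint for an operation, failing loudly on a typo.
function endpointFor(operation) {
  const url = FORM_ENDPOINTS[operation]
  if (!url) {
    throw new Error(`Unknown form operation: ${operation}`)
  }
  return url
}

console.log(endpointFor('extract'))
```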

How does it work?

We will execute the 3 operations on the following PDF document:

[Image: the sample PDF form]

Extracting form fields

Let's first take a look at how you can set up your payload and make a request to extract form fields from a PDF document.

import { readFileSync, writeFileSync } from 'fs'
import {FormData, Blob} from 'formdata-node'

// set-up the API key and the path to the document
const API_KEY = 'XXX-API-KEY-XXX'
const documentPath = './form.pdf'
const documentBuffer = readFileSync(documentPath)

// create a new FormData object and append the file to it
const payload = new FormData()
payload.append('file', new Blob([documentBuffer]), "form.pdf")

/**
 * Detects fields in a programmatic way. Returns a list of all available fields, field types and options (if any).
 */
const detectFields = await fetch('https://api.fileforge.com/pdf/form/detect', {
  method: 'POST',
  body: payload,
  headers: {
    'X-API-Key': API_KEY
  }
})

// get the fields as JSON
const fields = await detectFields.json()

// log the first 20 fields to the console
console.log(fields.slice(0, 20))

Note: You can get your API_KEY from the Fileforge dashboard.
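
If you would rather not hardcode the key in your script, one common approach is to read it from an environment variable. The sketch below assumes a variable called `FILEFORGE_API_KEY`, which is a name we picked for this example, not something the API requires.

```javascript
// Read the API key from the environment instead of hardcoding it.
// FILEFORGE_API_KEY is a hypothetical variable name chosen for this example.
function getApiKey(env = process.env) {
  const key = env.FILEFORGE_API_KEY
  if (!key) {
    throw new Error('FILEFORGE_API_KEY is not set')
  }
  return key
}

// Usage in the script above:
//   const API_KEY = getApiKey()
```

You would then start the script with the variable set in your shell, so the key never lands in version control.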

The detect request logs the following output to the console:

[
  {
    name: 'undefined',
    required: false,
    readOnly: false,
    locations: [ [Object] ],
    type: 'PDFCheckBox',
    isChecked: false
  },
  {
    name: 'Admissions and Records Enrollment Services at',
    required: false,
    readOnly: false,
    locations: [ [Object] ],
    type: 'PDFTextField',
    isPassword: false,
    isRichFormatted: false,
    isScrollable: true,
    isCombed: false,
    isMultiline: false,
    isFileSelector: false
  },
  {
    name: 'Name',
    required: false,
    readOnly: false,
    locations: [ [Object] ],
    type: 'PDFTextField',
    isPassword: false,
    isRichFormatted: false,
    isScrollable: true,
    isCombed: false,
    isMultiline: false,
    isFileSelector: false
  },
    ...
]

As you can see, the API returns a list of all available fields in the PDF document, along with their types and options. There are different types of fields, such as PDFCheckBox and PDFTextField, each with its own set of properties. You can find the full list of field types and properties in the API documentation.
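
To get a quick overview of a larger form, you could count the detected fields per type. This is a minimal sketch that works on data shaped like the output above. The sample array is trimmed to a few keys, and `countByType` is a helper name invented for this example.

```javascript
// Sample data shaped like the detect output above (trimmed to a few keys).
const sampleFields = [
  { name: 'undefined', type: 'PDFCheckBox', isChecked: false },
  { name: 'Admissions and Records Enrollment Services at', type: 'PDFTextField', isMultiline: false },
  { name: 'Name', type: 'PDFTextField', isMultiline: false }
]

// Count how many fields of each type the document contains.
function countByType(fields) {
  const counts = {}
  for (const field of fields) {
    counts[field.type] = (counts[field.type] ?? 0) + 1
  }
  return counts
}

console.log(countByType(sampleFields))
// → { PDFCheckBox: 1, PDFTextField: 2 }
```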

Notice that the first field is named undefined. This is because the field has no name attribute in the PDF document. We developed the 'mark' operation to help you identify such unnamed fields. Let's see how it works.

Marking form fields

Here we will mark the fields with a comment annotation and a green square. This will help you identify fields that are not named in the PDF document.

/**
 * Use this to write a debug PDF with comments highlighting each field name and position.
 * This is best opened in Acrobat Reader. Note: the fields themselves may hide the highlight.
 * To view the tags anyway, open the comments pane in Acrobat Reader.
 */
const markFields = await fetch('https://api.fileforge.com/pdf/form/mark', {
  method: 'POST',
  body: payload,
  headers: {
    'X-API-Key': API_KEY
  }
})

if (markFields.status !== 201) {
  throw new Error(await markFields.text())
}

// write the debug PDF to disk
writeFileSync('./form-debug.pdf', Buffer.from(await markFields.arrayBuffer()))

The resulting PDF document will look like this:

[Image: the debug PDF with each field marked]

As you can see, each field is marked with a comment annotation and a green square. This makes it easy to identify fields that are not named in the PDF document.
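
To know which entries you need to look up in the debug PDF, you can filter the detect response for unnamed fields. The sketch below assumes that unnamed fields are reported with the literal string 'undefined' as their name, as in the sample output earlier. It also treats a missing or empty name as unnamed, which is a guess on our side. `findUnnamed` is a helper name made up for this example.

```javascript
// Sample data shaped like the detect output (trimmed to a few keys).
const detectedFields = [
  { name: 'undefined', type: 'PDFCheckBox' },
  { name: 'Name', type: 'PDFTextField' }
]

// Return the position and type of every field without a usable name.
// Assumption: unnamed fields show up with the string 'undefined' as name;
// a missing or empty name is treated the same way.
function findUnnamed(fields) {
  return fields
    .map((field, index) => ({ index, type: field.type, name: field.name }))
    .filter(({ name }) => name === undefined || name === '' || name === 'undefined')
}

console.log(findUnnamed(detectedFields))
// → one entry: index 0, type 'PDFCheckBox'
```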

Filling form fields

Finally, let's see how you can fill form fields with data programmatically.

payload.append('options', new Blob([JSON.stringify({
  /**
   * The fields to fill. The name must match the field name exactly. There are various options depending on the field type.
   */
  fields: [{
    name: "Name",
    type: "PDFTextField",
    value: "Auguste Lefevre Grunewald"
  }, {
    name: "Office of Public Stewardship",
    type: 'PDFCheckBox',
    checked: true
  }, {
    name: "Phone Number",
    type: "PDFTextField",
    value: "+1 415 123-1234"
  }]
})], {
  type: "application/json"
}))

// fill the form
const filledForm = await fetch('https://api.fileforge.com/pdf/form/fill', {
  method: 'POST',
  body: payload,
  headers: {
    'X-API-Key': API_KEY
  }
})

// check for errors
if (filledForm.status !== 201) {
  throw new Error(await filledForm.text())
}

// write the filled form to disk
writeFileSync('./form-filled.pdf', Buffer.from(await filledForm.arrayBuffer()))

The resulting PDF document will look like this:

[Image: the filled PDF form]

As you can see, the form fields have been filled with the data we provided. This makes it easy to automate the process of filling out forms and reduce the risk of errors.

For other types of fields and options (e.g. checkboxes, radio buttons, dropdowns), please refer to the API documentation.
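
If your data lives in a plain object, you can combine it with the detect response to build the `fields` array instead of writing it by hand. The following is a sketch under a few assumptions: it only handles the two field types shown in this article, using the `value` and `checked` keys from the fill example above, and `buildFillFields` is a helper name invented here. Anything else throws, so a typo in a field name fails early rather than silently leaving a blank.

```javascript
// Turn a plain { fieldName: value } object into the `fields` array used in
// the fill request. `detected` is the array returned by the detect endpoint.
// Only PDFTextField and PDFCheckBox are handled in this sketch.
function buildFillFields(detected, values) {
  const typeByName = new Map(detected.map((field) => [field.name, field.type]))
  return Object.entries(values).map(([name, value]) => {
    const type = typeByName.get(name)
    if (type === undefined) {
      throw new Error(`No field named "${name}" in the document`)
    }
    if (type === 'PDFTextField') return { name, type, value: String(value) }
    if (type === 'PDFCheckBox') return { name, type, checked: Boolean(value) }
    throw new Error(`Field type ${type} is not handled by this sketch`)
  })
}

// Example with two of the fields from this article.
const detectedSample = [
  { name: 'Name', type: 'PDFTextField' },
  { name: 'Office of Public Stewardship', type: 'PDFCheckBox' }
]
const fillFields = buildFillFields(detectedSample, {
  'Name': 'Auguste Lefevre Grunewald',
  'Office of Public Stewardship': true
})
console.log(fillFields)
```

The result can be passed as the `fields` value inside the JSON `options` blob of the fill request.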

Conclusion

In this blog post, we've shown you how to use the new Fileforge API endpoints to extract, mark, and fill PDF forms programmatically. This makes it easier than ever to automate your form handling processes and reduce the risk of errors.

We hope you found this article helpful. If you have any questions or feedback, please feel free to reach out to us. We're always looking for ways to improve our API and make it more useful for our users.

Happy coding!

  • Try the new Fileforge API endpoints for form handling here. It's free!
  • Join the Fileforge community on Discord and share your feedback with us.
  • Contribute to the open-source library.

This article was originally published on Fileforge's blog
