Wesley Chun (@wescpy) for Google Workspace Developers

Posted on Mar 5, 2024 • Edited on Feb 13

Export Google Docs as PDF... without the Docs API?!?

#python #node #google #api

TL;DR:

Learn how to export Google Docs as PDF® without the Docs API! To be honest, that's the wrong API anyway. The Docs API is for document editing & formatting, while file upload/download, import/export, sharing/permissions, etc., require the Drive API. You'd think with a common operation like this that Google would have an example in their docs, but nope, so let's do it here & now in this first post of the "RECIPE" series.

Introduction

Are you a developer interested in using Google APIs? You're in the right place as this blog is dedicated to that craft using Python and sometimes Node.js. This post is part of what I call the "RECIPE" series because it highlights one small code sample that does something very specific that people may find useful, starting with Google Docs documents.

Google APIs like Drive, Docs, and Sheets, are Google Workspace (GWS) APIs, so before diving into the rest of this post, familiarize yourself with the security & credentials needed to access such APIs by checking out the separate posts on GWS APIs and OAuth client IDs.

Motivation

I don't know about you, but I find myself exporting Google Docs as PDF files fairly regularly. In the Google Docs editor, this involves pulling down File --> Download --> PDF Document. That works for a single document I'm working on, but this isn't scalable for an annual set of invoices, my students' term papers, or all legal documents pertaining to a court case.

As a software developer, I tend to turn to product APIs to help me do the things that humans shouldn't be doing in front of a UI (user interface), and one of those times is the export (meaning conversion & download) of Google Docs to PDF files when dealing with a massive number of documents. It's a problem literally begging for a programmatic solution.

I went hunting for such an example in Google's documentation, but nope, there's nothing like this in either the Drive API docs or the Docs API docs. While there are a few examples of doing this online, they're either too long or do other things I don't care about, and solutions from both ChatGPT and Gemini use service accounts (opening a Pandora's box which I'll cover in another post), so here we are.

Years ago, I covered how to upload & download files with the Drive API when v3 launched, and working on this post gave me the chance to revisit some of that code, modernize it, and customize it for Docs-to-PDF.

Code samples

As far as exporting goes, both the Docs app/UI and the Google Drive API support a variety of formats, including plain text, PDF, OpenDocument, and Microsoft Office file types. Check out the supported export formats page in the Drive API docs for the comprehensive list.
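As a rough illustration of those format choices, here is a minimal sketch pairing a handful of Docs export MIME types with file extensions. The EXPORT_TYPES table and the export_filename() helper are names I made up for this example (they are not part of the Drive API), and the table is deliberately partial; treat the supported export formats page as the authoritative list.

```python
# Partial, hand-picked table of Docs export MIME types -> file extensions.
# EXPORT_TYPES and export_filename() are illustrative names, not Drive API features.
EXPORT_TYPES = {
    'application/pdf': '.pdf',
    'text/plain': '.txt',
    'application/rtf': '.rtf',
    'application/epub+zip': '.epub',
    'application/vnd.oasis.opendocument.text': '.odt',
    'application/vnd.openxmlformats-officedocument.wordprocessingml.document': '.docx',
}

def export_filename(name, mimetype):
    """Return a local filename for a Doc exported as the given MIME type."""
    try:
        return name + EXPORT_TYPES[mimetype]
    except KeyError:
        # unknown type: fail loudly rather than guess an extension
        raise ValueError('no extension known for %r' % mimetype)

print(export_filename('Merged form letter', 'application/pdf'))
```

A helper like this would let you change the export type in one place instead of hardcoding the file extension next to it.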

While we are exporting (again, that's converting and downloading), you can also do straight-up ("blob") downloads using the API. Read more about both on the downloads and exports page in the API docs. (If you're seeking similar info for uploads & imports, I threw the links for both topics in a single SO Q&A.)

The code samples download a fictitious Google Doc called "Merged form letter." Assume you created an army of form letters after being inspired by the mail merge topic I covered a while back, which describes how to use the Drive, Sheets, and Docs APIs to accomplish that. Now you want to export all those Docs as PDF to print en masse or archive into a giant ZIP file to move elsewhere.

Python

Let's start with Python, performing the prerequisites below. If you're new to Google APIs, please review the post series covering OAuth client IDs before proceeding as these are just the steps without much explanation:

Create a new project from the Cloud/developer console or with gcloud projects create . . .; or reuse an existing project
Enable the Drive API from the console or with gcloud services enable drive.googleapis.com if you haven't already
Create new OAuth client ID & secret credentials and save the file to your local filesystem as client_secret.json.
Install the Google APIs client library for Python: pip install -U google-api-python-client google-auth-httplib2 google-auth-oauthlib (or pip3)

Ok, let's take a look at the Python script which you can also access in the repo, starting with the imports and constants:

from __future__ import print_function
import os

from google.auth.transport.requests import Request
from google.oauth2 import credentials
from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient import discovery

SCOPES = 'https://www.googleapis.com/auth/drive.readonly'
CLNT_ID_SCRT = 'client_secret.json'
OAUTH_TOKENS = 'storage.json'
FILENAME = 'Merged form letter'
MIMETYPE = 'application/pdf'

The os module is used for token management, and the __future__ import gives Python 2 users Python 3's print() function (3.x interpreters ignore it). The others bring in the required Google client libraries. The constants include the Drive read-only permission (scope) to request from the end-user, the pair of credential-related files (client ID & secret and OAuth tokens), and the target file metadata (name & export type).

creds = None
if os.path.exists(OAUTH_TOKENS):
    creds = credentials.Credentials.from_authorized_user_file(OAUTH_TOKENS)
if not (creds and creds.valid):
    if creds and creds.expired and creds.refresh_token:
        creds.refresh(Request())
    else:
        flow = InstalledAppFlow.from_client_secrets_file(CLNT_ID_SCRT, SCOPES)
        creds = flow.run_local_server()
with open(OAUTH_TOKENS, 'w') as token:
    token.write(creds.to_json())
DRIVE = discovery.build('drive', 'v3', credentials=creds)

This block of code is purely for security: Grab any locally-stored credentials. If they exist but have expired, use the refresh token to request a new access token. If no credentials exist, create the OAuth flow and run it, prompting the user for the necessary permissions. If the user opts in, the script now has valid credentials to connect to the Drive API, so save them (again locally) to storage.json so the code doesn't prompt the user to re-auth every time you/they run the script. With valid credentials, create an API client.

res = DRIVE.files().list(q="name='%s'" % FILENAME,
         fields='files(id)', pageSize=1).execute().get('files')
if res:
    print('** Downloading %r' % FILENAME)
    file_id = res[0]['id']
    data = DRIVE.files().export(fileId=file_id, mimeType=MIMETYPE).execute()
    if data:
        with open('%s.pdf' % FILENAME, 'wb') as fh:
            fh.write(data)

With proper credentials and an API client in-hand, it's time to do the real work:

Search for the first match (pageSize=1) in Drive matching the requested filename
Export the resulting file by its Drive file ID
Download and write the binary data locally with a .pdf file extension
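One caveat on the search step: the query string is built with plain string formatting, so a filename containing an apostrophe would produce a malformed query. Here's a hedged sketch of a helper that escapes the name and, as an optional extra, narrows the match to non-trashed Google Docs. build_query() and DOCS_MIMETYPE are hypothetical names of my own; the escaping rule (a backslash before each single quote and backslash) and the Docs MIME type are as I understand them from the Drive API search docs.

```python
DOCS_MIMETYPE = 'application/vnd.google-apps.document'  # Google Docs files in Drive

def build_query(name, docs_only=True):
    """Build a Drive files.list 'q' string matching files by exact name."""
    # escape backslashes first, then single quotes, so the name stays one literal
    safe = name.replace('\\', '\\\\').replace("'", "\\'")
    terms = ["name='%s'" % safe]
    if docs_only:
        # optional narrowing: only Google Docs that are not in the trash
        terms.append("mimeType='%s'" % DOCS_MIMETYPE)
        terms.append('trashed=false')
    return ' and '.join(terms)

print(build_query('Merged form letter'))
```

The result would drop into the files().list() call as q=build_query(FILENAME).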

This downloads the entire PDF as a single payload. If your Docs are particularly long, consider the googleapiclient.http.MediaIoBaseDownload class to download the PDF in chunks; example usage in the Drive API docs.
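For the curious, a chunked export might look roughly like the sketch below. MediaIoBaseDownload and export_media() come from the Google APIs client library for Python; export_in_chunks() and its downloader_cls parameter are my own inventions for this example (the parameter only exists so the loop can be exercised with a stand-in class, without touching the network). Treat it as a starting point under those assumptions, not the canonical way to do it.

```python
def export_in_chunks(drive, file_id, mimetype, dest_path, downloader_cls=None):
    """Export a Doc to dest_path chunk by chunk; return the number of chunks."""
    if downloader_cls is None:
        # imported lazily so the client library is only needed for real calls
        from googleapiclient.http import MediaIoBaseDownload as downloader_cls
    # export_media() builds a request whose body is the exported bytes
    request = drive.files().export_media(fileId=file_id, mimeType=mimetype)
    chunks = 0
    with open(dest_path, 'wb') as fh:
        downloader = downloader_cls(fh, request)
        done = False
        while not done:
            # next_chunk() writes into fh and returns (status, done)
            _status, done = downloader.next_chunk()
            chunks += 1
    return chunks
```

With the script above, a call would be along the lines of export_in_chunks(DRIVE, file_id, MIMETYPE, '%s.pdf' % FILENAME).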

Admittedly, the code is light on error checking/handling so as to focus on the core functionality; do what you need to do. Change the filename (and export type) as desired. Running it as-is with the proper permissions granted produces minimal output:

$ python3 drive_export_doc_pdf.py
** Downloading 'Merged form letter'
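Since the motivation was exporting many documents rather than one, here is a hedged sketch of how the same two API calls could be looped over every match instead of just the first. export_all() and its parameters are hypothetical names of mine; the only API features it relies on are the q, fields, and pageToken parameters of files().list(), the nextPageToken field in its response, and files().export() as used above.

```python
import os

def export_all(drive, query, mimetype='application/pdf', ext='.pdf', out_dir='.'):
    """Export every file matching query; return the list of paths written."""
    written = []
    page_token = None
    while True:
        # ask for one page of matches plus the token for the next page
        res = drive.files().list(q=query, pageToken=page_token,
                fields='nextPageToken, files(id, name)').execute()
        for f in res.get('files', []):
            data = drive.files().export(fileId=f['id'], mimeType=mimetype).execute()
            # replace path separators so a Doc's title can't escape out_dir
            path = os.path.join(out_dir, f['name'].replace(os.sep, '_') + ext)
            with open(path, 'wb') as fh:
                fh.write(data)
            written.append(path)
        page_token = res.get('nextPageToken')
        if not page_token:
            return written
```

Something like export_all(DRIVE, "name contains 'form letter'") would be the call. Note that this simple version overwrites the earlier file when two Docs share a name, so add a suffix (the file ID, say) if that matters to you.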

Python 2 support

Most of the world is on Python 3 today, but there are still some with dependencies on 2.x that make migration challenging. The Python code samples in this repo are Python 2-3 compatible, which is why you don't see modern features like async-await and f-strings.

Old Google Auth Python libraries support

Older Python auth libraries, primarily oauth2client, were deprecated in 2017 in favor of modern replacements. However, the newer libraries do not support OAuth token storage, which is why the current code is slightly longer than the *-old.py sample shown below and in the repo. For now, oauth2client still works, even in maintenance mode, and provides automated, threadsafe, and 2.x/3.x-compatible storage of and access to OAuth2 tokens for users, whereas the newer libraries do not.

As with the Python 2 support, I'm providing an equivalent *-old.py script using the older auth libraries for those who have dependencies on them and/or still have old code lying around that does. This version has fewer imports:

from __future__ import print_function

from googleapiclient import discovery
from httplib2 import Http
from oauth2client import file, client, tools

The security code has fewer lines because the libraries handle the OAuth token storage:

# check credentials from locally-stored OAuth2 tokens file; either
# refresh expired tokens or run flow to get new pair & create API client
store = file.Storage(OAUTH_TOKENS)
creds = store.get()
if not creds or creds.invalid:
    flow = client.flow_from_clientsecrets(CLNT_ID_SCRT, SCOPES)
    creds = tools.run_flow(flow, store)
DRIVE = discovery.build('drive', 'v3', http=creds.authorize(Http()))

Everything else (the constant declarations, the core functionality, etc.) is identical to the contemporary version. One last difference is the set of libraries you install on your machine or virtualenv:

pip install -U pip google-api-python-client oauth2client (or pip3)

Nothing else changes, and running it results in output identical to the current version. Now let's turn to JavaScript.

Node.js (JavaScript)

Node.js has prerequisites similar to Python's:

Create a new project from the Cloud/developer console or with gcloud projects create . . .; or reuse an existing project
Enable the Drive API from the console or with gcloud services enable drive.googleapis.com if you haven't already
Create new OAuth client ID & secret credentials and save the file to your local filesystem as client_secret.json.
Install the Google APIs client library for Node.js: npm i googleapis @google-cloud/local-auth

Now for the code sample which you can find in the repo:

const fs = require('fs').promises;
const path = require('path');
const process = require('process');
const {authenticate} = require('@google-cloud/local-auth');
const {google} = require('googleapis');

const CLNT_ID_SCRT = path.join(process.cwd(), 'client_secret.json');
const OAUTH_TOKENS = path.join(process.cwd(), 'storage.json');
const SCOPES = ['https://www.googleapis.com/auth/drive.readonly'];
const MIMETYPE = 'application/pdf';
const FILENAME = 'Merged form letter';

Like the Python version, perform the necessary imports on the Node.js side and create constants for the security stuff as well as the target file to download.

async function loadSavedCredentialsIfExist() {
  try {
    const content = await fs.readFile(OAUTH_TOKENS);
    const credentials = JSON.parse(content);
    return google.auth.fromJSON(credentials);
  } catch (err) {
    return null;
  }
}

async function saveCredentials(client) {
  const content = await fs.readFile(CLNT_ID_SCRT);
  const keys = JSON.parse(content);
  const key = keys.installed || keys.web;
  const payload = JSON.stringify({
    type: 'authorized_user',
    client_id: key.client_id,
    client_secret: key.client_secret,
    refresh_token: client.credentials.refresh_token,
    access_token: client.credentials.access_token,
    token_expiry: client.credentials.token_expiry,
    scopes: client.credentials.scopes,
  });
  await fs.writeFile(OAUTH_TOKENS, payload);
}

async function authorize() {
  let client = await loadSavedCredentialsIfExist();
  if (client) return client;
  client = await authenticate({
    scopes: SCOPES,
    keyfilePath: CLNT_ID_SCRT,
  });
  if (client.credentials) await saveCredentials(client);
  return client;
}

This is the chunk of security code, split up into separate functions to:

Load local credentials if they exist
Write (new) credentials locally
Check if user authorization is necessary, and if so, run it (the "OAuth flow")

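async function exportDocAsPDF(authClient) {
  const drive = google.drive({version: 'v3', auth: authClient});
  // search Drive for the first file matching the requested name
  let res = await drive.files.list({
    q: `name="${FILENAME}"`,
    fields: 'files(id)',
    pageSize: 1
  });
  const file = (res.data.files || [])[0];
  // like the Python version, do nothing if there is no match
  if (!file) return;
  console.log(`** Downloading '${FILENAME}'`);
  const destPath = path.join(process.cwd(), `${FILENAME}.pdf`);
  const fh = await fs.open(destPath, 'w');
  const dest = fh.createWriteStream();
  // export (convert & download) the Doc as a stream of PDF bytes
  res = await drive.files.export({
    fileId: file.id,
    mimeType: MIMETYPE,
  }, {responseType: 'stream'});
  // pipe() returns a stream, not a Promise, so wait for 'finish' explicitly
  await new Promise((resolve, reject) => {
    res.data.on('error', reject);
    dest.on('error', reject).on('finish', resolve);
    res.data.pipe(dest);
  });
}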

authorize().then(exportDocAsPDF).catch(console.error);

This is the key function that does all the "real" work, querying for the target file, and exporting the first matching result. Finally, "main" is at the bottom, chaining together the Promises of checking user authorization and performing the primary function, sending any errors to the console. The output is identical to the Python versions:

$ node drive_export_doc_pdf.js
** Downloading 'Merged form letter'

If you're looking for a more modern ECMAScript module, here's the equivalent import section in the .mjs version, also available in the repo (everything else is identical to the CommonJS version):

import fs from 'node:fs/promises';
import path from 'node:path';
import process from 'node:process';
import {authenticate} from '@google-cloud/local-auth';
import {google} from 'googleapis';

Its output is also identical to the others.

Summary

This wraps up today's post showing developers how to export Google Docs as PDF using the Google Drive API, with short code samples in Python and Node.js. Feel free to modify them for your own purposes and see how they can help you in your projects today. The next set of related posts will probably include uploading & importing and other Drive API features. I hope you find these Python and Node.js samples useful. Got a topic you want me to cover in the future or found an error in the post? Drop a note in the comments below!
