DEV Community

Weerayut Teja

The Struggle of Finding a Free Excel to PDF Converter: My Journey and Solution

Converting Excel files to PDF is a common task in many projects, whether for generating reports, sharing data, or creating documents. Like many developers, I initially believed this would be an easy task to automate. However, my search for a free, reliable solution turned into a frustrating journey filled with limitations, compatibility issues, and expensive tools.

Here’s how I overcame these challenges, built my own Excel-to-PDF converter, and made it available as an open-source tool for others who may be struggling like I did.


The Frustration

Commercial Tools

My initial search led me to paid solutions such as Aspose.Cells and Syncfusion. They offered robust features, but their licensing costs were well beyond what I could justify for small or personal projects.

Online Services

Free online converters seemed like a promising alternative, but they were unsuitable for automation. These tools often raised privacy concerns (since files are uploaded to third-party servers), had file size limits, and didn’t provide programmatic APIs.

Open-Source Libraries

I also explored open-source libraries, but most lacked the ability to convert Excel files to PDF. Even those that did were either unreliable or didn’t support modern Microsoft Office formats.


Discovering LibreOffice in Headless Mode

After weeks of searching, I came across the idea of running LibreOffice in headless mode. LibreOffice is a free, open-source office suite that can convert many file formats, including Excel workbooks, to PDF. In headless mode it runs without a graphical interface and can be driven entirely from the command line, which makes it well suited to automation.


How My Solution Works

To make this approach developer-friendly, I built a lightweight Go-based HTTP server that acts as a REST API. This server wraps LibreOffice’s functionality and allows any programming language to interact with it via HTTP requests.

Key Features

  1. Multiple File Format Support: Supports .xlsx, .xls, .csv, .docx, .pptx, and more.
  2. Automatic Cleanup: Temporary files are automatically deleted after one hour to save disk space.
  3. Custom Fonts: You can mount custom fonts by cloning the GitHub repository or using Docker volumes.
  4. Cross-Language Integration: Works with any programming language that supports HTTP.

The Temporary Directory Approach

Instead of relying on the system’s temporary directory, I opted to use a custom ./tmp directory. This ensures consistent behavior, as system temp directories sometimes have unpredictable permissions.


Implementation Details

How It Works

  1. File Upload: Clients upload an Excel file (or another supported format) to the /convert endpoint as a multipart/form-data POST request, using the form field name file.
  2. Temporary Storage: The server saves the file in the ./tmp directory with a timestamp-based filename.
  3. Conversion: LibreOffice is called in headless mode to convert the file to PDF and save the result in the same directory.
  4. File Cleanup: A background goroutine deletes files older than one hour.
  5. Response: The converted PDF is returned as the HTTP response.

Getting Started

GitHub Repository

You can find the source code at https://github.com/wteja/pdf-converter.

Docker Image

The project is also available as a Docker image: wteja/pdf-converter.

Running the Docker Container

docker pull wteja/pdf-converter
docker run -p 5000:5000 wteja/pdf-converter

Examples of Integrating with Other Languages

Since the service is exposed via HTTP, you can use any programming language to interact with it.

C#

using System.IO;
using System.Net.Http;

using var client = new HttpClient();
var fileContent = new ByteArrayContent(File.ReadAllBytes("example.xlsx"));
var formData = new MultipartFormDataContent { { fileContent, "file", "example.xlsx" } };

var response = await client.PostAsync("http://localhost:5000/convert", formData);
response.EnsureSuccessStatusCode(); // throw on non-2xx responses instead of writing an error body
var pdfBytes = await response.Content.ReadAsByteArrayAsync();
File.WriteAllBytes("output.pdf", pdfBytes);

Node.js

const axios = require("axios");
const FormData = require("form-data");
const fs = require("fs");

const form = new FormData();
form.append("file", fs.createReadStream("example.xlsx"));

axios.post("http://localhost:5000/convert", form, {
  headers: form.getHeaders(),
  responseType: "arraybuffer", // keep the PDF as raw bytes instead of decoding it as text
})
  .then(response => fs.writeFileSync("output.pdf", response.data))
  .catch(console.error);

Python

import requests

with open("example.xlsx", "rb") as f:
    response = requests.post("http://localhost:5000/convert", files={"file": f})

response.raise_for_status()  # stop on HTTP errors rather than saving an error body as output.pdf

with open("output.pdf", "wb") as f:
    f.write(response.content)

Go

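package main

import (
    "bytes"
    "io"
    "log"
    "mime/multipart"
    "net/http"
    "os"
)

func main() {
    file, err := os.Open("example.xlsx")
    if err != nil {
        log.Fatal(err)
    }
    defer file.Close()

    // Build a multipart/form-data body with the spreadsheet in the "file" field.
    body := &bytes.Buffer{}
    writer := multipart.NewWriter(body)
    part, err := writer.CreateFormFile("file", "example.xlsx")
    if err != nil {
        log.Fatal(err)
    }
    if _, err := io.Copy(part, file); err != nil {
        log.Fatal(err)
    }
    // Close writes the terminating boundary; it must happen before sending.
    if err := writer.Close(); err != nil {
        log.Fatal(err)
    }

    resp, err := http.Post("http://localhost:5000/convert", writer.FormDataContentType(), body)
    if err != nil {
        log.Fatal(err)
    }
    defer resp.Body.Close()
    if resp.StatusCode != http.StatusOK {
        log.Fatalf("conversion failed: %s", resp.Status)
    }

    out, err := os.Create("output.pdf")
    if err != nil {
        log.Fatal(err)
    }
    defer out.Close()
    if _, err := io.Copy(out, resp.Body); err != nil {
        log.Fatal(err)
    }
}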

Challenges and Trade-Offs

Image Size

The Docker image is 2.67 GB, mostly because of the dependencies LibreOffice requires. I tried smaller base images such as Alpine, but they shipped an older LibreOffice version that wasn't compatible with modern Microsoft Office formats. A Debian-based image offered the latest LibreOffice, but it was even larger (~3 GB).

Why It’s Worth It

The large image size is a reasonable trade-off when compared to the cost of commercial solutions. Once set up, the image can be reused across multiple projects without any additional licensing fees.


Conclusion

The frustration of finding a free Excel-to-PDF converter led me to build my own solution using LibreOffice in headless mode. While it’s not perfect, it’s free, reliable, and flexible. If you’re facing the same challenge, I hope this project saves you time and effort.

Check out the project on GitHub or pull the Docker image from Docker Hub. Let me know how it works for you or if you have suggestions for improvement.
