A sample application that exposes Docling document conversion as a REST service, using FastAPI in Python.
Introduction
This post comes out of yet another project request: a (much simplified) REST application, built with FastAPI, that uses Docling to convert PDF documents to Markdown.
First things first: what is Docling, and what does it do, in case you haven't heard of it? 😲😳
What is Docling
Docling simplifies document processing, parsing diverse formats — including advanced PDF understanding — and providing seamless integrations with the gen AI ecosystem.
Docling Features
- 🗂️ Parsing of multiple document formats incl. PDF, DOCX, XLSX, HTML, images, and more
- 📑 Advanced PDF understanding incl. page layout, reading order, table structure, code, formulas, image classification, and more
- 🧬 Unified, expressive DoclingDocument representation format
- ↪️ Various export formats and options, including Markdown, HTML, and lossless JSON
- 🔒 Local execution capabilities for sensitive data and air-gapped environments
- 🤖 Plug-and-play integrations incl. LangChain, LlamaIndex, Crew AI & Haystack for agentic AI
- 🔍 Extensive OCR support for scanned PDFs and images
- 💻 Simple and convenient CLI
Implementation
This first sample application is meant to run on a CPU-based platform.
The recommended steps are shown below.
- Create a venv.
# macOS/Linux version
python3.11 -m venv myenv
source myenv/bin/activate
- Install the required packages and dependencies.
pip install fastapi uvicorn docling python-multipart
- Write the code and save it as main.py (the uvicorn command further down expects that module name). 😊
from pathlib import Path

from fastapi import FastAPI, File, HTTPException, UploadFile
from docling.document_converter import DocumentConverter

app = FastAPI()


@app.post("/convert_pdf_to_markdown/")
async def convert_pdf_to_markdown(file: UploadFile = File(...)):
    # Reject anything the client does not declare as a PDF.
    if file.content_type != "application/pdf":
        raise HTTPException(status_code=400, detail="Invalid file type. Only PDF files are allowed.")

    temp_file_path = Path(f"./temp_{file.filename}")
    try:
        # Save the upload to disk, since the converter is given a file path.
        with open(temp_file_path, "wb") as f:
            f.write(await file.read())

        # A converter with default settings (default OCR and layout models).
        converter = DocumentConverter()
        result = converter.convert(str(temp_file_path))
        markdown_output = result.document.export_to_markdown()
        return {"markdown": markdown_output}
    except Exception as e:
        raise HTTPException(status_code=500, detail=f"An error occurred during conversion: {str(e)}")
    finally:
        # Remove the temporary file whether the conversion succeeded or failed.
        temp_file_path.unlink(missing_ok=True)
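The handler above builds its temporary path directly from the client-supplied filename and trusts the declared content type. Here is a minimal sketch of two standard-library helpers that could harden this. `looks_like_pdf` and `safe_temp_path` are hypothetical names introduced for illustration, not part of FastAPI or Docling, and the header check assumes the PDF signature sits at byte 0.

```python
import re
import uuid
from pathlib import Path
from typing import Optional

# Signature found at the start of a well-formed PDF file.
PDF_MAGIC = b"%PDF-"


def looks_like_pdf(data: bytes) -> bool:
    """Return True if the payload starts with the PDF header bytes."""
    return data.startswith(PDF_MAGIC)


def safe_temp_path(filename: Optional[str], base_dir: Path) -> Path:
    """Build a unique temporary path that stays inside base_dir."""
    # Keep only the last path component, so "../../x.pdf" becomes "x.pdf".
    name = Path(filename or "upload.pdf").name
    # Replace anything outside a conservative character set.
    name = re.sub(r"[^A-Za-z0-9._-]", "_", name) or "upload.pdf"
    # A random prefix avoids collisions between concurrent uploads.
    return base_dir / f"{uuid.uuid4().hex}_{name}"
```

In the endpoint, one option would be to call `safe_temp_path(file.filename, Path("."))` instead of formatting the filename into the path, and to return a 400 error when `looks_like_pdf` is false for the uploaded bytes.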
- Run the code in a Terminal.
# You can change the port from the default 8000 to another one
# if it is already in use
uvicorn main:app --port 8080
- Test your code in another Terminal (duh 🤓)
curl -X POST -F "file=@./docker-commands.pdf" http://127.0.0.1:8080/convert_pdf_to_markdown/
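If you would rather test from Python than from curl, the same request can be sketched with the standard library alone. This is a rough equivalent, not a definitive client: `build_multipart` and `convert_pdf` are hypothetical helper names, and the URL and form field name are taken from the curl command above.

```python
import json
import urllib.request
import uuid
from pathlib import Path


def build_multipart(field, filename, payload, content_type="application/pdf"):
    """Encode one file as multipart/form-data.

    Returns a (body_bytes, content_type_header) tuple.
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def convert_pdf(pdf_path, url="http://127.0.0.1:8080/convert_pdf_to_markdown/"):
    """POST a PDF to the running service and return the Markdown string."""
    path = Path(pdf_path)
    body, header = build_multipart("file", path.name, path.read_bytes())
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": header}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))["markdown"]


# Example call, with the server from the previous step running:
# print(convert_pdf("./docker-commands.pdf"))
```

The multipart body is built by hand here only to avoid an extra dependency. A third-party HTTP client would do the same job in fewer lines.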
- Sample output.
{"markdown":"## All Docker Commands\n\nHere's a comprehensive list of commonly used Docker commands, along with their usage:\n\n## Basic Docker Commands\n\n- 1. ocker version d Displays Docker version information. :\n- 2. ocker info d Provides detailed information about the Docker installation. :\n- 3. ocker --help d Lists all available Docker commands and options. :\n\n## mage Management Commands I\n\n- 1. ocker pull <image> d Downloads an image from a Docker registry (e.g., Docker : Hub).\n- ○ Example: ocker pull nginx d 2. ocker images d Lists all Docker images available on the system. : ○ Example: ocker images d 3. ocker rmi <image> d Deletes a Docker image from the system. : ○ Example: ocker rmi nginx d 4. ocker build -t <name> <path> d Builds a Docker image from a Dockerfile. : ○ Example: ocker build -t myapp:latest . d 5. ocker tag <source\\_image> <target\\_image> d Tags an image with a new : name. ○ Example: ocker tag nginx:latest myrepo/nginx:v1 d 6. ocker save -o <file> <image> d Saves an image to a tar archive. : ○ Example: ocker save -o nginx.tar nginx:latest d 7. ocker load -i <file> d Loads an image from a tar archive. :\n- ocker load -i nginx.tar\n- ○ Example: d\n\n## Container Management Commands\n\n- 1. ocker run <image> d Creates and starts a new container from an image. :\n- ○ Example: d\n- 2. ocker run -d <image> d : background).\n- ○ Example: d\n- 3. ocker run -it <image> d Runs a container interactively with a terminal. :\n- ○ Example: d\n- 4. ocker ps d :\n- ○ Example: d\n- 5. ocker ps -a d :\n- ○ Example: ocker ps -a d\n- 6. ocker stop <container> d Stops a running container. :\n- ○ Example: ocker stop my\\_container d\n- 7. ocker start <container> d Starts a stopped container. :\n- ○ Example: ocker start my\\_container d\n- 8. ocker restart <container> d Restarts a container. :\n- ○ Example: ocker restart my\\_container d\n- 9. ocker rm <container> d Deletes a stopped container. :\n- ○ Example: ocker rm my\\_container d\n- 10. 
ocker exec -it <container> <command> d Executes a command in a : unning container. r\n- ○ Example: ocker exec -it my\\_container bash d\n- 1. 1 ocker logs <container> d Displays logs from a container. :\n- ○ Example: ocker logs my\\_container d\n- 12. ocker attach <container> d Attaches to a running container's console. :\n- ○ Example: ocker attach my\\_container d\n- 13.\n- ocker kill <container> d Forcefully stops a container. :\n- ○ Example: ocker kill my\\_container d\n\n```
\nocker run nginx Runs a container in detached mode (in the ocker run -d nginx ocker run -it ubuntu bash Lists all running containers. ocker ps Lists all containers, including stopped ones.\n
```\n\n## Container Networking Commands\n\n- 1. ocker network ls d Lists all Docker networks. :\n- ○ Example: ocker network ls d\n- 2. ocker network create <name> d Creates a new Docker network. :\n- ○ Example: ocker network create my\\_network d\n- 3. ocker network rm <name> d Deletes a Docker network. :\n- ○ Example: ocker network rm my\\_network d\n- 4. ocker network connect <network> <container> d Connects a container to : a network.\n- ○ Example: ocker network connect my\\_network my\\_container d\n- 5. ocker network disconnect <network> <container> d Disconnects a : ontainer from a network. c\n- ○ Example: ocker network disconnect my\\_network my\\_container d\n\n## Volume Management Commands\n\n- 1. ocker volume ls d Lists all Docker volumes. :\n- ○ Example: ocker volume ls d\n- 2. ocker volume create <name> d Creates a new Docker volume. :\n- ○ Example: ocker volume create my\\_volume d\n- 3. ocker volume rm <name> d Deletes a Docker volume. :\n- ○ Example: ocker volume rm my\\_volume d\n- 4. ocker volume inspect <name> d Displays detailed information about a volume. :\n- ○ Example: ocker volume inspect my\\_volume d 5. ocker run -v <volume>:/path/in/container <image> d Mounts a volume : nto a container. i ○ Example: ocker run -v my\\_volume:/data nginx d\n\n## Dockerfile Commands\n\n- 1. ocker build -f <Dockerfile> d Builds an image from a specific Dockerfile. : ○ Example: ocker build -f Dockerfile . d\n\n## Docker Compose Commands\n\n- 1. ocker-compose up d Starts containers defined in a : ocker-compose.yml d ile. f\n- ocker-compose up\n- ○ Example: d\n- 2. ocker-compose down d Stops and removes containers, networks, and volumes : reated by c ocker-compose up d .\n- ○ Example: ocker-compose down d\n- 3. ocker-compose ps d Lists containers created by Docker Compose. :\n- ○ Example: ocker-compose ps d\n- 4. ocker-compose logs d Shows logs for containers managed by Docker Compose. :\n- ○ Example: ocker-compose logs d\n- 5. 
ocker-compose build d Builds or rebuilds services defined in a Compose file. :\n- ○ Example: ocker-compose build d\n\n## mage and Container Inspection I\n\n- 1. ocker inspect <container\\_or\\_image> d Returns low-level information about : a container or image.\n- ○ Example: ocker inspect my\\_container d\n- 2. ocker top <container> d Displays running processes in a container. :\n- ○ Example: ocker top my\\_container d\n- 3. ocker stats d Displays resource usage statistics of running containers. :\n- ○ Example: ocker stats d\n\n## System Cleanup Commands\n\n- 1. ocker system df d :\n- ○ Example: d\n- 2. ocker system prune d : networks, dangling images).\n- ○ Example: d\n- 3. ocker image prune d :\n- ○ Example: d\n- 4. ocker container prune d Removes all stopped containers. :\n- ○\n- Example: d\n\n```
\nDisplays information about disk usage by Docker. ocker system df Removes unused data (stopped containers, unused ocker system prune Removes unused and dangling images. ocker image prune\n
```\n\nocker container prune\n\n## Other Commands\n\n- 1. ocker commit <container> <image> d Creates a new image from a : ontainer's changes. c\n- ○ Example: ocker commit my\\_container my\\_image d\n- 2. ocker export <container> d Exports a container's filesystem to a tar archive. :\n- ○\n- Example: ocker export my\\_container > container.tar d\n- 3. ocker import <file> d Imports a tarball to create an image. :\n- ○ Example: ocker import container.tar my\\_imported\\_image d\n\nThese Docker commands cover the most common activities when working with Docker, anging from managing containers, images, and volumes to orchestrating multi-container r applications with Docker Compose. Mastering these commands helps in efficiently creating, deploying, and managing containerized applications."}%
Et voilà!
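The service returns the Markdown wrapped in a JSON object, so the newlines arrive as `\n` escapes. Below is a small sketch of how the reply could be unpacked and written to a `.md` file. `save_markdown` is a hypothetical helper name. The parser tolerates trailing characters, on the assumption that the reply was copied from a zsh terminal, which marks output lacking a final newline with a `%`.

```python
import json
from pathlib import Path


def save_markdown(response_text: str, out_path: Path) -> str:
    """Extract the "markdown" field from the JSON reply and write it to out_path."""
    # raw_decode parses the first JSON value and ignores anything after it,
    # such as a trailing "%" picked up from the terminal.
    payload, _ = json.JSONDecoder().raw_decode(response_text.lstrip())
    markdown = payload["markdown"]
    out_path.write_text(markdown, encoding="utf-8")
    return markdown


# Example: save_markdown(raw_reply, Path("docker-commands.md"))
```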
Conclusion
This article showed a simple REST implementation of Docling's conversion capabilities.
Stay tuned for more stories 🔜
Useful links
- Docling: https://github.com/DS4SD/docling
- Docling examples and documentation: https://ds4sd.github.io/docling/