DEV Community

Ajinkya Bobade


📝✨ClearText

This is a submission for the GitHub Copilot Challenge: Transitions and Transformations

What I Built

I built ClearText, an AI-powered text detection and enhancement tool that cleans up the text in images.


It's Perfect For 🎯

  • 📄 Document Digitization
  • 📚 Book Scanning
  • 📱 Mobile Photos of Text
  • 🖨️ Improving Scanned Documents
  • 📑 Text Enhancement in Images

Demo

ClearText Demo

Repo

GitHub Repository - ClearText

Here's an example of what ClearText can do:


ClearText takes an input image (left), removes the background noise, and outputs clean text (right).
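The post does not say exactly how the noise is removed. As a rough illustration only, the sketch below uses Otsu's global threshold, a classic binarization technique that is not necessarily what ClearText does, to turn a grayscale scan into black text on a white background. It assumes NumPy and an 8-bit grayscale input; `otsu_threshold` and `binarize` are hypothetical names.

```python
import numpy as np


def otsu_threshold(gray: np.ndarray) -> int:
    """Return the Otsu threshold t (0-255) for a uint8 grayscale image.

    Pixels <= t form the dark class (ink), pixels > t the light class.
    """
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    levels = np.arange(256, dtype=np.float64)
    w0 = np.cumsum(hist)                 # pixel count in the dark class for each t
    sum0 = np.cumsum(hist * levels)      # intensity sum of the dark class for each t
    w1 = hist.sum() - w0                 # pixel count in the light class
    total = sum0[-1]
    with np.errstate(divide="ignore", invalid="ignore"):
        mu0 = sum0 / w0
        mu1 = (total - sum0) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2   # (unnormalized) between-class variance
    # Thresholds that leave one class empty produce NaN; score them as 0
    between = np.where(np.isfinite(between), between, 0.0)
    return int(np.argmax(between))


def binarize(gray: np.ndarray) -> np.ndarray:
    """Map ink to black (0) and everything else to white (255)."""
    t = otsu_threshold(gray)
    return np.where(gray > t, 255, 0).astype(np.uint8)
```

A single global threshold struggles with uneven lighting and shadows, which is one reason a learned detector such as CRAFT is useful for locating the text before cleaning it.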

ClearText has potential uses in the following fields:

Document Processing 📄

  • Banking & Finance
    • 🏦 Check processing
    • 📊 Financial statement digitization

Healthcare 🏥

  • Medical Records
    • 📋 Patient records digitization
    • 🔬 Lab report enhancement

Legal Industry ⚖️

  • Document Management
    • 📜 Contract digitization
    • 🗄️ Case file processing

Academic Use Cases 📚

  • 📖 Textbook scanning
  • 📑 Research paper digitization

Copilot Experience 🤖

I used Copilot extensively throughout this project. Here are the ways Copilot helped me:

Code Completion 📝

  • Auto-completed common OpenCV operations
  • Suggested image processing parameters
  • Completed function signatures for Streamlit components

Chat Assistance 💬

  • Debugged ONNX model loading issues
  • Explained image processing pipeline
  • Suggested optimizations for image transformations

Inline Suggestions ⚡

  • Recommended error handling patterns
  • Suggested variable names and types

Model Switching 🔄

I used different models for specific tasks:

  • Code Completion: GitHub Copilot
  • Documentation: Claude
  • Debugging: GPT-4

Common Prompts Used 🎯

# Function implementation
/explain image processing pipeline
/suggest error handling
/optimize performance

Code Edits ✏️

  • Refactored image processing functions
  • Added blur/no-blur options
  • Improved error messages
  • Enhanced documentation
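The blur/no-blur option could look something like this minimal sketch, assuming a boolean flag that toggles a light 3x3 Gaussian smoothing pass on a grayscale NumPy image before detection. `preprocess` and `gaussian_blur_3x3` are hypothetical names; in an OpenCV codebase, `cv2.GaussianBlur(img, (3, 3), 0)` would normally do a similar job (with a different border mode).

```python
import numpy as np

# 3x3 binomial approximation of a Gaussian kernel; weights sum to 1
KERNEL = np.array([[1, 2, 1],
                   [2, 4, 2],
                   [1, 2, 1]], dtype=np.float64) / 16.0


def gaussian_blur_3x3(gray: np.ndarray) -> np.ndarray:
    """Smooth a uint8 grayscale image, replicating edge pixels at the border."""
    padded = np.pad(gray.astype(np.float64), 1, mode="edge")
    h, w = gray.shape
    out = np.zeros((h, w), dtype=np.float64)
    # Accumulate each kernel tap as a shifted, weighted copy of the image
    for dy in range(3):
        for dx in range(3):
            out += KERNEL[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)


def preprocess(gray: np.ndarray, blur: bool = True) -> np.ndarray:
    """Hypothetical preprocessing step with an optional blur pass."""
    if not blur:
        return gray.copy()
    return gaussian_blur_3x3(gray)
```

Blurring suppresses speckle noise but also softens thin strokes, so exposing it as an option rather than always applying it lets the user choose per image.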

Project Evolution & Contributions

Building on Open Source

This project builds on CRAFT (Character Region Awareness for Text Detection), the text detection model by CLOVA AI Research, and makes significant architectural and functional improvements to it:

1. Production-Ready Architecture 🏗️

  • Converted the research-focused PyTorch model to the ONNX format for production use
  • Used ONNX Runtime for optimized inference across different hardware
  • Added full Docker containerization for reliable deployment
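Once the model is in ONNX format, inference needs the same input layout the network was trained with. The sketch below assumes the exported graph keeps the original CRAFT PyTorch convention: RGB input, ImageNet mean/std scaled to the 0-255 range, NCHW `float32`, and height and width padded up to a multiple of 32. Resizing is omitted, and `to_craft_input` is a hypothetical helper.

```python
import numpy as np

# ImageNet statistics used by the original CRAFT preprocessing, scaled to 0-255
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32) * 255.0
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32) * 255.0


def to_craft_input(rgb: np.ndarray, multiple: int = 32) -> np.ndarray:
    """Convert an HxWx3 uint8 RGB image to a 1x3xH'xW' float32 tensor.

    H' and W' are H and W rounded up to `multiple`; padding is zeros at the
    bottom/right, applied before normalization.
    """
    h, w, _ = rgb.shape
    pad_h = -h % multiple  # extra rows needed to reach a multiple
    pad_w = -w % multiple  # extra columns needed to reach a multiple
    padded = np.pad(rgb, ((0, pad_h), (0, pad_w), (0, 0)), mode="constant")
    x = (padded.astype(np.float32) - MEAN) / STD   # per-channel normalization
    # HWC -> CHW, then add a batch dimension; keep memory contiguous for the runtime
    return np.ascontiguousarray(x.transpose(2, 0, 1)[np.newaxis])
```

The resulting array can then be passed to ONNX Runtime, for example `session.run(None, {session.get_inputs()[0].name: x})` on an `onnxruntime.InferenceSession`; the actual input name depends on how the model was exported.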

2. Enhanced Text Processing Pipeline 🔄

The original CRAFT model only detects text regions. ClearText expands on this by:

  • Adding custom image preprocessing for better text clarity
  • Implementing new post-processing transforms for enhanced output quality
  • Creating an entirely new text enhancement pipeline
  • Developing a user-friendly web interface for easy interaction
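On the output side, CRAFT predicts two score maps at half the input resolution: a region score and an affinity (link) score. The sketch below loosely follows the thresholding step in CRAFT's reference post-processing (default `low_text` and `link_threshold` of 0.4): it turns those maps into a full-resolution text mask and whitens everything outside it. This is an assumption about how such a mask could drive enhancement, not ClearText's actual code; both function names are hypothetical.

```python
import numpy as np


def text_mask_from_scores(region: np.ndarray, link: np.ndarray,
                          low_text: float = 0.4, link_threshold: float = 0.4,
                          scale: int = 2) -> np.ndarray:
    """Combine region/affinity score maps into a boolean mask at image resolution."""
    mask = (region > low_text) | (link > link_threshold)
    # Nearest-neighbour upsampling from score-map resolution to input resolution
    return mask.repeat(scale, axis=0).repeat(scale, axis=1)


def keep_text_only(gray: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Paint every pixel outside the text mask white (255).

    Assumes the mask is at least as large as the image, which holds when the
    image was padded up to a multiple of 32 before inference.
    """
    h, w = gray.shape
    mask = mask[:h, :w]  # drop the padding added before inference
    return np.where(mask, gray, 255).astype(np.uint8)
```

Keeping the original gray values inside the mask (rather than binarizing them) preserves stroke detail; a thresholding step could be applied inside the mask afterwards if pure black-on-white output is wanted.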

3. Major Output Improvements 📈

ClearText transforms the basic text detection output into a comprehensive text enhancement solution:

  • Original CRAFT: Basic text region detection
  • ClearText Additions:
    • Text clarity enhancement
    • Document digitization capabilities
    • Support for various document types (books, mobile photos, scanned documents)
    • Complete image processing pipeline

Transparency Statement

While this project builds on CRAFT's foundational text detection capabilities, ClearText adds substantial new functionality, architecture, and use cases. All original CRAFT code is credited and licensed under the MIT License.

Conclusion

Developing ClearText during the GitHub Copilot 1-Day Build Challenge has been an amazing journey. Without Copilot, turning a complex text detection model into an accessible, user-friendly web application would have been far more difficult. The project shows how AI can bridge the gap between computer vision research and practical, everyday use cases.
