DEV Community

Chris Cook

Highlights: Extract Highlighted Text from Images using OCR and CV in your Browser

One of my first posts on DEV.to was about extracting highlighted text from images using OCR and OpenCV with Python. The project was mostly a way to dive deep into Python while solving a challenging problem.

I never really took it further, even though the people I showed it to gave me good feedback and were interested in the project. It stayed in the back of my mind, and I hoped to develop it further if I ever found the time...

Then, a few months ago, someone not very technical opened an issue on the repo asking for instructions to get it running. Honestly, installing the right Python dependencies in the right versions was challenging even for me, so explaining it to someone else was probably hopeless.

Assistance required -- woud love your help!!! #2

I've desperately searched for ability to automatically extract manually highlighted text from photos and was delighted/relieved/excited to find your pyhighlight project. However, I can't seem to get it to work after downloading it and would greatly value your help!! I'm not completely technically illiterate, but I'm not overly familiar with using github and while I understand the fundamentals of coding, I have very limited proficiency in setting up the right environmental conditions to run and troubleshoot raw code like this (I can figure some stuff out by googling, looking at stack-exchange, etc., but sadly that only gets me so far...).

More specifically, after downloading the repository, installing things like pytesseract, opencvv, etc. and trying to run main.py, there seems to be some kind of issue with the very bottom of the code related to the line "parser.add_argument('img_input', type=Path, help="Input image")" and where you call main(args). Some kind of mismatch on the dictionary type I think. Based on random comments from others online, I can prevent the error by changing that line of code to the following instead:

parser.add_argument('--img_input', '--i', required = False, help = "Path to input image", default='/content/pyhighlight-ocr/Source')
parser.add_argument("-f", "--file", required=False)

However, after resolving that (not that I'm confident what I did was correct), I run into the following error, and have very little understanding on how to resolve it.
cv2.error: OpenCV(4.9.0) /Users/runner/work/opencv-python/opencv-python/opencv/modules/imgproc/src/color.cpp:196: error: (-215:Assertion failed) !_src.empty() in function 'cvtColor'

Beyond that, do you have any instructions, a guide, or a help file of some kind that you would be willing/able to share? For example, where do I put the photo files I want to extract the highlighted text so that when I run the main.py file everything works automatically? Do the input files need certain names/file formats? Will this work on more than one photo at a time? Do I need to download any other programs to get this work? Basically, how does one get this code to work, practically speaking???

So I took this as motivation to make the project more accessible by porting it to JavaScript so that it runs in the browser. This turned out to be easier than I expected: the two main dependencies of the Python project, OpenCV for image manipulation and Tesseract for OCR, are both available for JavaScript. Getting them to work well with React, however, was more challenging.
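
The post doesn't walk through the pipeline itself, but the "CV" half of a tool like this usually means isolating the highlighted regions before OCR runs. Below is a minimal, dependency-free TypeScript sketch of one common approach, HSV color thresholding for yellow highlighter ink. It is an illustrative assumption, not the actual Highlights implementation: the names `rgbToHsv`, `highlightMask`, `HsvRange`, and the `YELLOW` thresholds are hypothetical. In the browser, OpenCV.js provides built-in operations for the same idea (such as `cv.cvtColor` and `cv.inRange`), and the resulting mask would be applied to the image before handing it to Tesseract.

```typescript
// Hypothetical sketch: classify pixels as "highlighted" by thresholding in HSV space.
// The thresholds are illustrative assumptions, not values from the Highlights project.

/** Convert an RGB triple (0-255 each) to HSV with h in [0, 360), s and v in [0, 1]. */
function rgbToHsv(r: number, g: number, b: number): [number, number, number] {
  const rn = r / 255, gn = g / 255, bn = b / 255;
  const max = Math.max(rn, gn, bn);
  const min = Math.min(rn, gn, bn);
  const delta = max - min;

  // Hue depends on which channel is dominant; grey pixels (delta 0) get hue 0.
  let h = 0;
  if (delta !== 0) {
    if (max === rn) h = 60 * (((gn - bn) / delta) % 6);
    else if (max === gn) h = 60 * ((bn - rn) / delta + 2);
    else h = 60 * ((rn - gn) / delta + 4);
  }
  if (h < 0) h += 360;

  const s = max === 0 ? 0 : delta / max;
  return [h, s, max];
}

interface HsvRange {
  hMin: number; // lower hue bound in degrees
  hMax: number; // upper hue bound in degrees
  sMin: number; // minimum saturation (0-1), filters out white paper and grey text
  vMin: number; // minimum brightness (0-1), filters out dark ink
}

// Assumed range for yellow highlighter; real photos would need tuning.
const YELLOW: HsvRange = { hMin: 40, hMax: 70, sMin: 0.3, vMin: 0.5 };

/**
 * Build a binary mask (1 = highlighted, 0 = not) from RGBA pixel data laid out
 * like the browser's ImageData.data (4 bytes per pixel, alpha ignored).
 */
function highlightMask(rgba: Uint8ClampedArray, range: HsvRange = YELLOW): Uint8Array {
  const mask = new Uint8Array(rgba.length / 4);
  for (let i = 0; i < mask.length; i++) {
    const [h, s, v] = rgbToHsv(rgba[4 * i], rgba[4 * i + 1], rgba[4 * i + 2]);
    const inRange =
      h >= range.hMin && h <= range.hMax && s >= range.sMin && v >= range.vMin;
    mask[i] = inRange ? 1 : 0;
  }
  return mask;
}

// Three pixels: yellow highlighter, black text, white paper.
const pixels = new Uint8ClampedArray([
  255, 240, 60, 255,
  0, 0, 0, 255,
  255, 255, 255, 255,
]);
console.log(Array.from(highlightMask(pixels))); // [1, 0, 0]
```

Thresholding in HSV rather than RGB is the usual choice here because hue stays fairly stable under uneven lighting, while raw RGB values shift a lot between a well-lit and a shadowed part of the same page.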

Try it yourself at: zirkelc.github.io/highlights

If you like this project, have ideas for improvements, or want to contribute in any other way, please head over to GitHub, star the repository, and let me know what you think.

One funny note to end on: the person who opened the issue that finally got me to port the project to the browser never responded to the new version. So, if you are out there and reading this: please let me know what you think! 😄
