Introduction
Data entry is a crucial but often time-consuming task. Instead of manually entering data into Excel, we can use Python and Pandas to speed up the process significantly. While some basic Python knowledge is required, don't worry—most of it is just plain English reasoning.
In this blog, we'll explore different methods of data entry based on the format of the source data, such as physical documents, images, PDFs, and more.
Methods of Data Entry
There are two main approaches to data entry:
1. Automated Data Entry: Using OCR & Pandas in Python
If the data is in image or scanned document format, we can extract text using OCR (Optical Character Recognition) and process it with Pandas.
Example: Extracting Data from an Image
We'll use the EasyOCR library to extract text from an image file.
import easyocr
# Creating a reader that processes English text
reader = easyocr.Reader(['en'])
# Reading text from the image
result = reader.readtext('small_business_loan.png', detail=0)
for text in result:
print(text)
Output (Example OCR Result):
Table B: Value of Small Business Loans Outstanding by Depository Lender Size, 2016 to 2020
Billions of Dollars, Nominal
Percentage Change,
Lenders by Total Asset Size
...
As you can see, OCR might introduce errors in the extracted text. Rather than relying entirely on AI to correct it, a manual review and correction step is often necessary. Once corrected, we can save the text in a structured format like a CSV file.
Converting OCR Output to a CSV File
Once we have a cleaned text file, we can structure it into a CSV format for easy import into Excel.
import pandas as pd
from io import StringIO
if __name__ == "__main__":
with open("output.txt", "r") as f:
data = f.read()
data = data.strip() # Remove trailing newlines
data_lines = data.split("\n")
new_text = []
col_count = 0
for text in data_lines:
new_text.append(text)
if col_count != 6:
col_count += 1
new_text.append(",")
else:
col_count = 0
new_text.append("\n")
csv_content = "".join(new_text)
data_csv_file = StringIO(csv_content)
df = pd.read_csv(data_csv_file)
print(df)
Saving Data to Excel Format
To store this structured data in an Excel file, simply use:
df.to_excel("business_loans.xlsx", sheet_name="Business Loans", index=False)
2. Manual Data Entry: Why CSV is Faster Than Excel
Manual data entry in Excel requires frequent switching between cells using the mouse or arrow keys, which can slow you down. If you're entering data row-wise (e.g., filling out a loan application form), Excel forces you to press Tab or Arrow keys to navigate between columns.
A faster alternative is to enter data into a CSV file using only the keyboard. Commas (,) separate values, making it easier to stay focused without switching keys frequently.
Example: Small Business Loan Data in CSV Format
Lenders by Total Asset Size,2016,2017,2018,2019,2020,Percentage Change (2019-2020)
Less than $100 million,11.5,10.4,10,8.5,7.4,-13.4
$100 million to $499.9 million,96,90.5,86.4,82.2,87.4,6.3
$500 million to $999.9 million,52.8,55.6,54,51.9,70.7,36.1
$1 billion to $9.9 billion,139.5,133,127,129.9,215.4,65.8
$10 million to $49.9 billion,73.5,85.6,106.9,114.5,162,41.5
$50 billion or more,240.5,243.7,248.2,257.4,352.4,36.9
This structured format allows for quick keyboard entry without mouse interaction.
Converting CSV to Excel Using Python
Once the data is entered in CSV format, we can easily convert it into an Excel file using Pandas
import pandas as pd
df = pd.read_csv("small_business_loans.csv")
df.to_excel("small_business_loans.xlsx", index=False)
Conclusion: Choosing the Right Data Entry Method
To speed up data entry, choose the most efficient method based on your data format:
If the data is in an image/PDF → Use OCR + Python for automation.
If the data is already in text format → Convert it directly into CSV.
If manual entry is necessary → Use CSV format instead of Excel to speed up typing.
By using these techniques, you can automate and streamline data entry, saving time and effort while ensuring accuracy. 🚀
Top comments (0)