Overhead OCR Document Capture System

Optical Character Recognition (OCR) systems allow printed text to be converted into digital text that can be searched, edited, and stored electronically. Many OCR systems require consistent lighting and stable image capture in order to produce accurate results. To achieve this, a controlled imaging setup is often necessary.

This project demonstrates how to build a low-cost OCR document capture system using a one-sided enclosure and an overhead webcam mount. The enclosure stabilizes the document while the camera captures images from a fixed position directly above the page. This configuration improves image consistency and helps OCR software accurately detect characters.

In this guide, you will learn how to design and construct a one-sided OCR imaging box, mount a webcam in an overhanging position, and prepare the system for document scanning. The result is a simple but effective document capture rig that can be used for digitizing printed materials or experimenting with OCR technology.

Supplies

Materials

Popsicle sticks (for building the frame and camera mount)
KTOJOY 200 Pcs Craft Sticks Ice Cream Sticks Natural Wood Popsicle Craft Sticks 4.5 inch ~ $5.00
Webcam (used to capture images of the document)
Logitech C270 HD Webcam, 720p, Widescreen HD ~ $16.00

Tools

Hot glue gun (to assemble the structure)
COMTO Hot Glue Gun Kit with 30 Sticks, 20W Fast Preheating Mini Hot Glue Gun ~ $7.00
Personal laptop or computer (to connect the webcam and run OCR software)

Optional but helpful:

Printed paper for testing
Desk lamp for better lighting

Designing the OCR Imaging Structure

Before building, plan the layout of your OCR box. The goal is to keep the document flat while positioning the webcam at a fixed height directly above the page, so every capture is framed the same way with minimal perspective distortion.

This design uses a single vertical side wall attached to the base. From this wall, an overhanging arm extends over the platform to hold the webcam. The camera should point straight downward so the entire document fits within the frame.

The base acts as a stable document platform, keeping papers from moving during capture. Proper positioning of the wall and arm ensures the webcam remains steady, which is critical for accurate OCR results. Sketching the structure and measuring distances based on your paper size will make assembly easier and improve scanning quality.
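As a rough aid for the measuring step, the sketch below estimates how high the lens must sit for the whole page to fit in the frame. It assumes an ideal pinhole camera pointing straight down, a 16:9 frame, and the page's long side aligned with the frame's wide side. The function name `required_camera_height` and the 60-degree diagonal field of view are illustrative placeholders, not values taken from this project; check your webcam's specification sheet for its actual field of view.

```python
import math


def required_camera_height(page_long, page_short, diagonal_fov_deg, aspect=(16, 9)):
    """Smallest lens-to-page distance at which the whole page fits in frame.

    Units of the result match the units of the page dimensions.
    Assumes an ideal pinhole camera looking straight down.
    """
    aspect_w, aspect_h = aspect
    aspect_diag = math.hypot(aspect_w, aspect_h)
    tan_half_diag = math.tan(math.radians(diagonal_fov_deg) / 2)

    # Split the diagonal field of view into horizontal and vertical parts.
    tan_half_horizontal = tan_half_diag * aspect_w / aspect_diag
    tan_half_vertical = tan_half_diag * aspect_h / aspect_diag

    # Height needed for each page dimension; the larger one is the limit.
    height_for_long_side = page_long / (2 * tan_half_horizontal)
    height_for_short_side = page_short / (2 * tan_half_vertical)
    return max(height_for_long_side, height_for_short_side)


if __name__ == "__main__":
    # US Letter page (11 x 8.5 in); 60 degrees is a placeholder field of view.
    height = required_camera_height(11.0, 8.5, 60.0)
    print(f"Mount the lens at least {height:.1f} in above the page")
```

With a wide 16:9 frame, the page's short side is usually the limiting dimension, so the wall may need to be taller than the page is long. Treat the result as a starting point and confirm it with the live camera preview before gluing.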

Building the OCR Box

Now it’s time to assemble the structure. Start by creating the base platform using popsicle sticks glued together with hot glue. This will hold the document flat during scanning.

Next, attach the single vertical side wall to one edge of the base. Make sure it is perpendicular to the base and firmly secured, because this wall will support the overhanging webcam arm.

Finally, construct the overhanging arm using popsicle sticks and attach it to the vertical wall. Position the arm so the webcam will hang directly above the center of the document platform. Check that the structure is stable and that the arm can hold the webcam without sagging.

💡 Tip: Test the balance by temporarily placing the webcam on the arm before the glue fully sets to ensure it is positioned correctly.

Connecting the Webcam and Running OCR

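With the physical box complete, the next step is to connect the webcam to a computer and run the OCR software. The Python script below opens a live preview from the webcam positioned above the document, captures a frame when you press S, and processes it with Tesseract to extract the text. It relies on the opencv-python, pytesseract, pyttsx3, and numpy packages plus a local Tesseract installation. As written it targets Windows: it uses the DirectShow camera backend and the default Tesseract install path, so adjust both on other systems.

Each scan saves the original frame, the preprocessed frame, and the extracted text to files in the working directory, so the printed page becomes searchable, editable digital text. The live preview also lets you check the positioning, lighting, and stability of the webcam before scanning multiple pages.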

💡 Tip: Make sure the webcam is stable on the overhanging mount and the document is well-lit for the best OCR accuracy.

import time

import cv2
import numpy as np
import pytesseract
import pyttsx3

# Set Tesseract executable path (default Windows install location; adjust for your system)
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"

# Initialize text-to-speech engine
engine = pyttsx3.init()
engine.setProperty("rate", 150)

def find_usb_camera():
    """Find available USB camera automatically"""
    for index in range(5):  # Check indices 0-4
        cap = cv2.VideoCapture(index, cv2.CAP_DSHOW)  # Use DirectShow backend (Windows)
        if cap.isOpened():
            # Test if camera is readable
            ret, frame = cap.read()
            if ret and frame is not None:
                print(f"✓ Camera found at index {index}")
                cap.release()
                return index
        cap.release()
    return None

def preprocess_image(frame):
    """Advanced image preprocessing for better OCR accuracy"""
    # Convert to grayscale
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Apply bilateral filter for noise reduction while preserving edges
    gray = cv2.bilateralFilter(gray, 11, 17, 17)

    # Apply Gaussian blur to reduce noise
    gray = cv2.GaussianBlur(gray, (5, 5), 0)

    # Apply adaptive thresholding; THRESH_BINARY_INV makes the text white on black
    thresh = cv2.adaptiveThreshold(gray, 255,
                                   cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                   cv2.THRESH_BINARY_INV, 11, 2)

    # Morphological closing to fill small gaps in the (white) text strokes
    kernel = np.ones((2, 2), np.uint8)
    thresh = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, kernel)

    # Invert back to dark text on a light background, which Tesseract 4+ handles best
    return cv2.bitwise_not(thresh)

def extract_text(image):
    """Extract text with optimized Tesseract configuration"""
    # --oem 3: default OCR engine mode; --psm 6: assume a single uniform block of text
    custom_config = r'--oem 3 --psm 6'
    text = pytesseract.image_to_string(image, config=custom_config)
    return text

def improve_text(text):
    """Clean up OCR output by stripping whitespace and dropping empty lines"""
    lines = [line.strip() for line in text.split('\n') if line.strip()]
    cleaned_text = '\n'.join(lines)
    return cleaned_text

# Find USB camera automatically
print("=" * 50)
print("USB CAMERA TEXT SCANNER")
print("=" * 50)
print("Searching for USB camera...")

camera_index = find_usb_camera()
if camera_index is None:
    print("✗ No USB camera found. Trying default camera...")
    camera_index = 0

# Initialize USB webcam
cap = cv2.VideoCapture(camera_index, cv2.CAP_DSHOW)

# Request a higher resolution (the driver falls back to what the camera supports)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 1280)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 720)
cap.set(cv2.CAP_PROP_FPS, 30)

if not cap.isOpened():
    print("✗ Error: Could not open camera")
    raise SystemExit(1)

# Verify camera is working
ret, test_frame = cap.read()
if not ret or test_frame is None:
    print("✗ Error: Camera not returning frames")
    cap.release()
    raise SystemExit(1)

print(f"✓ USB Camera connected successfully (Index: {camera_index})")
print(f" Resolution: {int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))}x{int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))}")
print("=" * 50)
print("CONTROLS:")
print(" [S] - Scan text")
print(" [Q] - Quit")
print("=" * 50)

# Create window and move it to visible position
cv2.namedWindow("USB Camera - Text Scanner", cv2.WINDOW_NORMAL)
cv2.moveWindow("USB Camera - Text Scanner", 100, 100)

# Warm up camera so exposure and white balance can settle
print("Warming up camera...")
for i in range(5):
    ret, frame = cap.read()
    if ret:
        print(f" Frame {i + 1}/5 received")
    time.sleep(0.1)

print("✓ Camera ready! Window should be visible now.")
print("=" * 50)

while True:
    ret, frame = cap.read()
    if not ret:
        print("✗ Error: Lost camera connection")
        break
    if frame is None:
        print("✗ Error: Empty frame received")
        continue

    # Create a copy for display
    display_frame = frame.copy()

    # Add text overlay
    cv2.putText(display_frame, "USB Camera - Text Scanner", (10, 30),
                cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 255, 0), 2)
    cv2.putText(display_frame, "Press 'S' to scan | 'Q' to quit", (10, 70),
                cv2.FONT_HERSHEY_SIMPLEX, 0.7, (255, 255, 255), 2)
    cv2.putText(display_frame, "Make sure text is visible and well-lit", (10, 110),
                cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 255), 1)

    # Show frame
    cv2.imshow("USB Camera - Text Scanner", display_frame)

    # Check for key press (accept upper and lower case)
    key = cv2.waitKey(1) & 0xFF

    if key in (ord('s'), ord('S')):
        print("\n" + "=" * 50)
        print("SCANNING...")
        print("=" * 50)

        # Capture a fresh frame without the text overlay
        ret, scan_frame = cap.read()
        if not ret or scan_frame is None:
            print("✗ Error: Could not capture frame for scanning")
            continue

        # Save original frame for reference
        cv2.imwrite("scanned_frame_original.jpg", scan_frame)
        print("✓ Original frame saved as 'scanned_frame_original.jpg'")

        # Preprocess image
        processed_frame = preprocess_image(scan_frame)
        cv2.imwrite("scanned_frame_processed.jpg", processed_frame)
        print("✓ Processed frame saved as 'scanned_frame_processed.jpg'")

        # Extract text
        raw_text = extract_text(processed_frame)
        cleaned_text = improve_text(raw_text)

        print("\n----- EXTRACTED TEXT -----")
        if cleaned_text.strip():
            print(cleaned_text)
            print("--------------------------")
            print(f"✓ Text detected! ({len(cleaned_text)} characters)")

            # Announce the result aloud
            engine.say("Text detected")
            engine.runAndWait()

            # Save to file (UTF-8 so non-ASCII OCR output does not raise an encoding error)
            with open("scanned_text.txt", "w", encoding="utf-8") as f:
                f.write(cleaned_text)
            print("✓ Text saved to 'scanned_text.txt'")
        else:
            print("[No readable text found]")
            print("--------------------------")
            print("Tips for better results:")
            print(" - Ensure good lighting")
            print(" - Hold camera steady")
            print(" - Make sure text is in focus")
            print(" - Keep text within the frame")
            engine.say("No readable text found")
            engine.runAndWait()

        print("=" * 50)

    elif key in (ord('q'), ord('Q')):
        print("\nQuitting...")
        break

# Cleanup
cap.release()
cv2.destroyAllWindows()
print("✓ Camera released. Window closed.")
print("✓ Program closed successfully.")
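The script above hard-codes the default Windows install path for Tesseract. If Tesseract lives elsewhere on your machine, a small preflight check along these lines may help. This is a minimal sketch using only the standard library; the helper name `find_tesseract` and its parameters are hypothetical and not part of the original script.

```python
import os
import shutil

# Default Windows install location used by the script above.
WINDOWS_DEFAULT = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def find_tesseract(command="tesseract", fallbacks=(WINDOWS_DEFAULT,)):
    """Return a path to the Tesseract executable, or None if it can't be found."""
    # First look for the command on the system PATH.
    on_path = shutil.which(command)
    if on_path:
        return on_path
    # Otherwise try known install locations in order.
    for candidate in fallbacks:
        if os.path.isfile(candidate):
            return candidate
    return None


if __name__ == "__main__":
    path = find_tesseract()
    if path is None:
        print("Tesseract not found: install it or add it to PATH")
    else:
        print(f"Using Tesseract at {path}")
```

If a path is found, it could be assigned to `pytesseract.pytesseract.tesseract_cmd` in place of the hard-coded string.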

Testing and Optimizing Your OCR Box

Now that the box is built and the code is ready, it’s time to test the system. Place a document on the base platform, run the OCR script, and press S in the preview window to scan. Check that the camera captures the entire page and that the text is recognized correctly.
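To put a number on "correctly recognized", one option is to scan a test page whose text you already know and compare it with the OCR output. The sketch below uses Python's standard `difflib` module for a rough character-level similarity score. The function names `normalize` and `ocr_similarity` and the sample strings are illustrative assumptions, not part of the original script.

```python
import difflib


def normalize(text):
    """Collapse runs of whitespace so layout differences don't count as errors."""
    return " ".join(text.split())


def ocr_similarity(expected, recognized):
    """Return a 0.0-1.0 similarity ratio between known text and OCR output."""
    a, b = normalize(expected), normalize(recognized)
    if not a and not b:
        return 1.0
    return difflib.SequenceMatcher(None, a, b, autojunk=False).ratio()


if __name__ == "__main__":
    # Made-up example: typical OCR confusions such as "w" -> "vv" and "l" -> "1".
    expected = "The quick brown fox jumps over the lazy dog."
    recognized = "The quick brovvn fox jumps over the 1azy dog."
    print(f"Similarity: {ocr_similarity(expected, recognized):.2%}")
```

In practice, `recognized` could be read from the `scanned_text.txt` file the script writes. Comparing the score before and after each adjustment gives a simple way to tell whether a change to height, lighting, or placement actually helped.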

If the results aren’t perfect, adjust the setup:

Camera height or angle – make sure it points straight down and is centered.
Lighting – add or reposition lights to reduce shadows or glare.
Document placement – ensure the paper is flat against the base.

Once optimized, your OCR box is ready for reliable document scanning. You can expand the system later with improvements like automated capture, higher-resolution webcams, or multiple lighting sources for even better accuracy.
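When weighing a higher-resolution webcam, it can help to estimate how many pixels per inch the camera actually puts on the page. A commonly cited rule of thumb for Tesseract is about 300 DPI, which a webcam framing a full page will generally fall short of. The sketch below is a back-of-the-envelope calculation; the function name `effective_dpi` and the example widths are assumptions for illustration, so measure your own rig.

```python
def effective_dpi(pixels_across, inches_covered):
    """Pixels per inch at the document plane.

    pixels_across: frame width in pixels (e.g. 1280).
    inches_covered: physical width the frame spans on the platform, in inches.
    """
    if inches_covered <= 0:
        raise ValueError("inches_covered must be positive")
    return pixels_across / inches_covered


if __name__ == "__main__":
    # Example frame widths spanning 11 in and 15 in of the platform (assumed values).
    for width_px in (1280, 1920):
        for span_in in (11.0, 15.0):
            dpi = effective_dpi(width_px, span_in)
            print(f"{width_px} px across {span_in} in -> {dpi:.0f} DPI")
```

Lowering the camera to scan half a page at a time, or cropping tightly around the text, raises the effective DPI without changing hardware.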