Overhead OCR Document Capture System

by VedantBuilds in Design > 3D Design


Optical Character Recognition (OCR) systems allow printed text to be converted into digital text that can be searched, edited, and stored electronically. Many OCR systems require consistent lighting and stable image capture in order to produce accurate results. To achieve this, a controlled imaging setup is often necessary.

This project demonstrates how to build a low-cost OCR document capture system using a one-sided enclosure and an overhead webcam mount. The enclosure stabilizes the document while the camera captures images from a fixed position directly above the page. This configuration improves image consistency and helps OCR software accurately detect characters.

In this guide, you will learn how to design and construct a one-sided OCR imaging box, mount a webcam in an overhanging position, and prepare the system for document scanning. The result is a simple but effective document capture rig that can be used for digitizing printed materials or experimenting with OCR technology.

Supplies

Materials

  1. Popsicle sticks (for building the frame and camera mount): KTOJOY 200 Pcs Craft Sticks Ice Cream Sticks Natural Wood Popsicle Craft Sticks 4.5 inch, ~$5.00
  2. Webcam (used to capture images of the document): Logitech C270 HD Webcam, 720p, Widescreen HD, ~$16.00

Tools

  1. Hot glue gun (to assemble the structure): COMTO Hot Glue Gun Kit with 30 Sticks, 20W Fast Preheating Mini Hot Glue Gun, ~$7.00
  2. Personal laptop or computer (to connect the webcam and run OCR software)

Optional but helpful:

  1. Printed paper for testing
  2. Desk lamp for better lighting

Designing the OCR Imaging Structure

Before building, plan the layout of your OCR box. The goal is to keep the document flat while positioning the webcam at a fixed height directly above the page, so every capture has the same framing and minimal perspective distortion.

This design uses a single vertical side wall attached to the base. From this wall, an overhanging arm extends over the platform to hold the webcam. The camera should point straight downward so the entire document fits within the frame.

The base acts as a stable document platform, keeping papers from moving during capture. Proper positioning of the wall and arm ensures the webcam remains steady, which is critical for accurate OCR results. Sketching the structure and measuring distances based on your paper size will make assembly easier and improve scanning quality.
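
The arm height can be estimated from the paper size and the webcam's field of view before any glue is applied. The sketch below is a rough estimate only: it assumes an ideal pinhole camera pointing straight down at the page center, a 16:9 frame, and a 55° diagonal field of view (a placeholder value; check your webcam's spec sheet). The function name `min_camera_height` is made up for this illustration.

```python
import math


def min_camera_height(page_w_cm, page_h_cm, diag_fov_deg=55.0, aspect=(16, 9)):
    """Smallest lens-to-page distance (cm) at which the whole page fits in frame.

    Assumes the page's long side is aligned with the frame's long side.
    diag_fov_deg is an assumed value, not a measured one.
    """
    aw, ah = aspect
    diag = math.hypot(aw, ah)
    half_diag = math.tan(math.radians(diag_fov_deg) / 2)
    # Tangent of the half-angle along the frame's width and height
    half_w = half_diag * aw / diag
    half_h = half_diag * ah / diag
    long_side = max(page_w_cm, page_h_cm)
    short_side = min(page_w_cm, page_h_cm)
    # Both sides must fit, so the more demanding one sets the height
    return max(long_side / (2 * half_w), short_side / (2 * half_h))


# US Letter (21.59 x 27.94 cm) and A4 (21.0 x 29.7 cm)
for name, w, h in [("Letter", 21.59, 27.94), ("A4", 21.0, 29.7)]:
    print(f"{name}: camera at least {min_camera_height(w, h):.1f} cm above the page")
```

With a wide 16:9 frame, the page's short side is usually the limiting dimension, so a narrower document or a wider-angle webcam allows a shorter arm.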

Building the OCR Box

Now it's time to assemble the structure. Start by creating the base platform using popsicle sticks glued together with hot glue. This will hold the document flat during scanning.

Next, attach the single vertical side wall to one edge of the base. Make sure it is perpendicular to the base and firmly secured, because this wall will support the overhanging webcam arm.

Finally, construct the overhanging arm using popsicle sticks and attach it to the vertical wall. Position the arm so the webcam will hang directly above the center of the document platform. Check that the structure is stable and that the arm can hold the webcam without sagging.

💡 Tip: Test the balance by temporarily placing the webcam on the arm before the glue fully sets to ensure it is positioned correctly.

Connecting the Webcam and Running OCR

With the physical box complete, the next step is to connect your webcam to a computer and run the OCR script. The script needs the Tesseract OCR engine installed, plus the Python packages opencv-python, pytesseract, pyttsx3, numpy, and Pillow. It opens a live preview from the webcam positioned above the document; pressing S captures a frame, preprocesses it, and extracts the text using Optical Character Recognition (OCR).

Each scan saves the original frame, the processed frame, and the recognized text (scanned_text.txt), overwriting the files from the previous scan, so the result is searchable and editable. The live preview also lets you quickly check the positioning, lighting, and stability of the webcam before scanning multiple pages.

💡 Tip: Make sure the webcam is stable on the overhanging mount and the document is well-lit for the best OCR accuracy.

import cv2
import pytesseract
import pyttsx3
import numpy as np
from PIL import Image
import time


# Set Tesseract executable path (default Windows install location; adjust for your system)
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


# Initialize text-to-speech engine
engine = pyttsx3.init()
engine.setProperty("rate", 150)




def find_usb_camera():
    """Return the index of the first camera that delivers a frame, or None."""
    for index in range(5):  # Check indices 0-4
        # Use the DirectShow backend (Windows only)
        cap = cv2.VideoCapture(index, cv2.CAP_DSHOW)
        if cap.isOpened():
            # Test if camera is readable
            ret, frame = cap.read()
            if ret and frame is not None:
                print(f"✓ Camera found at index {index}")
                cap.release()
                return index
        cap.release()
    return None


def preprocess_image(frame):
    """Preprocess a BGR frame for better OCR accuracy."""
    # Convert to grayscale
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Apply bilateral filter for noise reduction while preserving edges
    gray = cv2.bilateralFilter(gray, 11, 17, 17)

    # Apply Gaussian blur to reduce remaining noise
    gray = cv2.GaussianBlur(gray, (5, 5), 0)

    # Adaptive thresholding; THRESH_BINARY_INV makes the text white on black
    thresh = cv2.adaptiveThreshold(gray, 255,
                                   cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                   cv2.THRESH_BINARY_INV, 11, 2)

    # Morphological closing fills small gaps inside the white strokes
    kernel = np.ones((2, 2), np.uint8)
    thresh = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, kernel)

    # Invert back to dark text on a light background, which
    # Tesseract 4 and later generally handle best
    return cv2.bitwise_not(thresh)


def extract_text(image):
    """Extract text with a custom Tesseract configuration."""
    # --oem 3: default OCR engine; --psm 6: assume a single uniform block of text
    custom_config = r'--oem 3 --psm 6'
    return pytesseract.image_to_string(image, config=custom_config)


def improve_text(text):
    """Strip whitespace from each line and drop empty lines."""
    lines = [line.strip() for line in text.split('\n') if line.strip()]
    return '\n'.join(lines)




# Find USB camera automatically
print("=" * 50)
print("USB CAMERA TEXT SCANNER")
print("=" * 50)
print("Searching for USB camera...")
camera_index = find_usb_camera()

if camera_index is None:
    print("✗ No USB camera found. Trying default camera...")
    camera_index = 0

# Initialize USB webcam
cap = cv2.VideoCapture(camera_index, cv2.CAP_DSHOW)

# Request 1280x720 at 30 fps (the driver may fall back to another mode)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 1280)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 720)
cap.set(cv2.CAP_PROP_FPS, 30)

if not cap.isOpened():
    print("✗ Error: Could not open camera")
    raise SystemExit(1)

# Verify camera is working
ret, test_frame = cap.read()
if not ret or test_frame is None:
    print("✗ Error: Camera not returning frames")
    cap.release()
    raise SystemExit(1)

print(f"✓ USB Camera connected successfully (Index: {camera_index})")
print(f"  Resolution: {int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))}x{int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))}")
print("=" * 50)
print("CONTROLS:")
print("  [S] - Scan text")
print("  [Q] - Quit")
print("=" * 50)

# Create window and move it to visible position
cv2.namedWindow("USB Camera - Text Scanner", cv2.WINDOW_NORMAL)
cv2.moveWindow("USB Camera - Text Scanner", 100, 100)

# Warm up camera
print("Warming up camera...")
for i in range(5):
    ret, frame = cap.read()
    if ret:
        print(f"  Frame {i + 1}/5 received")
    time.sleep(0.1)

print("✓ Camera ready! Window should be visible now.")
print("=" * 50)


while True:
    ret, frame = cap.read()

    if not ret:
        print("✗ Error: Lost camera connection")
        break

    if frame is None:
        print("✗ Error: Empty frame received")
        continue

    # Create a copy for display so the overlay never reaches the OCR input
    display_frame = frame.copy()

    # Add text overlay
    cv2.putText(display_frame, "USB Camera - Text Scanner", (10, 30),
                cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 255, 0), 2)
    cv2.putText(display_frame, "Press 'S' to scan | 'Q' to quit", (10, 70),
                cv2.FONT_HERSHEY_SIMPLEX, 0.7, (255, 255, 255), 2)
    cv2.putText(display_frame, "Make sure text is visible and well-lit", (10, 110),
                cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 255), 1)

    # Show frame
    cv2.imshow("USB Camera - Text Scanner", display_frame)

    # Check for key press
    key = cv2.waitKey(1) & 0xFF

    if key == ord('s'):
        print("\n" + "=" * 50)
        print("SCANNING...")
        print("=" * 50)

        # Capture a fresh frame for scanning
        ret, scan_frame = cap.read()
        if not ret or scan_frame is None:
            print("✗ Error: Could not capture frame for scanning")
            continue

        # Save original frame for reference
        cv2.imwrite("scanned_frame_original.jpg", scan_frame)
        print("✓ Original frame saved as 'scanned_frame_original.jpg'")

        # Preprocess image
        processed_frame = preprocess_image(scan_frame)
        cv2.imwrite("scanned_frame_processed.jpg", processed_frame)
        print("✓ Processed frame saved as 'scanned_frame_processed.jpg'")

        # Extract text
        raw_text = extract_text(processed_frame)
        cleaned_text = improve_text(raw_text)

        print("\n----- EXTRACTED TEXT -----")
        if cleaned_text.strip():
            print(cleaned_text)
            print("--------------------------")
            print(f"✓ Text detected! ({len(cleaned_text)} characters)")

            # Announce the result aloud
            engine.say("Text detected")
            engine.runAndWait()

            # Save to file (UTF-8 so non-ASCII characters are written safely)
            with open("scanned_text.txt", "w", encoding="utf-8") as f:
                f.write(cleaned_text)
            print("✓ Text saved to 'scanned_text.txt'")
        else:
            print("[No readable text found]")
            print("--------------------------")
            print("Tips for better results:")
            print("  - Ensure good lighting")
            print("  - Hold camera steady")
            print("  - Make sure text is in focus")
            print("  - Keep text within the frame")

            engine.say("No readable text found")
            engine.runAndWait()

        print("=" * 50)

    elif key == ord('q'):
        print("\nQuitting...")
        break

# Cleanup
cap.release()
cv2.destroyAllWindows()
print("✓ Camera released. Window closed.")
print("✓ Program closed successfully.")



Testing and Optimizing Your OCR Box

Now that the box is built and the code is ready, it's time to test the system. Place a document on the base platform, run the OCR script, and press S once the page is visible in the preview window. Check that the camera captures the entire page and that the text is correctly recognized.
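
One way to put a number on "correctly recognized" is to scan a page whose text you already know and compare it with the OCR output. The sketch below uses Python's standard `difflib` module. The helper names `normalize` and `ocr_similarity` and the sample strings are illustrative assumptions, not part of the scanner script.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Collapse whitespace so line-break differences are not counted as errors."""
    return " ".join(text.split())


def ocr_similarity(expected, recognized):
    """Return a 0..1 similarity ratio between the known text and the OCR output."""
    return SequenceMatcher(None, normalize(expected), normalize(recognized)).ratio()


# Made-up example: one character misread (l recognized as 1)
expected = "The quick brown fox jumps over the lazy dog."
recognized = "The quick brown fox\njumps over the 1azy dog."
score = ocr_similarity(expected, recognized)
print(f"Similarity: {score:.1%}")
```

To score a real scan, read the contents of scanned_text.txt into `recognized`. Comparing the score before and after a change to the lighting or camera position shows whether the change actually helped.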

If the results aren't perfect, adjust the setup:

  1. Camera height or angle – make sure it points straight down and is centered.
  2. Lighting – add or reposition lights to reduce shadows or glare.
  3. Document placement – ensure the paper is flat against the base.
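
Lighting problems can be spotted numerically as well as by eye. The sketch below is a rough heuristic using NumPy only: it reports the mean brightness, the fraction of near-white pixels (possible glare), and the brightness spread between the four quadrants of the image (possible shadow). The function name `lighting_report` and the `glare_level` threshold are assumptions chosen for illustration, not tuned values.

```python
import numpy as np


def lighting_report(gray, glare_level=250):
    """Summarize the exposure of an 8-bit grayscale image (2-D array)."""
    gray = np.asarray(gray)
    h, w = gray.shape
    # Mean brightness of each quadrant; a large spread suggests uneven light
    quads = [gray[:h // 2, :w // 2], gray[:h // 2, w // 2:],
             gray[h // 2:, :w // 2], gray[h // 2:, w // 2:]]
    quad_means = [float(q.mean()) for q in quads]
    return {
        "mean_brightness": float(gray.mean()),
        "glare_fraction": float((gray >= glare_level).mean()),
        "quadrant_spread": max(quad_means) - min(quad_means),
    }


# Synthetic example: a light page with a shadow across the left half
page = np.full((100, 200), 200, dtype=np.uint8)
page[:, :100] = 90
report = lighting_report(page)
print(report)
```

In the scanner script, a captured frame could be converted with cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY) and passed to this function. A high quadrant spread is a hint to reposition the desk lamp before rescanning.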

Once optimized, your OCR box is ready for reliable document scanning. You can expand the system later with improvements like automated capture, higher-resolution webcams, or multiple lighting sources for even better accuracy.