SignSpeak AI

by Thopireddy Yuvan Reddy in Circuits > Raspberry Pi

SignSpeak_AI.jpeg
poster.jpeg
Screenshot 2026-06-18 at 7.39.50 PM.png

SignSpeakAI is a Raspberry Pi-based project that converts sign language alphabet gestures into text.

In this project, the user shows a hand sign in front of a camera. The Raspberry Pi processes the camera feed and predicts which alphabet letter is being shown. The detected letter is then displayed on an I2C LCD display. An LED is also used to show the status of the system.

The goal of this project is to make a simple and affordable assistive technology prototype that can help convert sign language alphabets into readable text in real time.

Supplies

Screenshot 2026-06-17 at 1.58.17 PM.png
Screenshot 2026-06-17 at 1.53.25 PM.png
Screenshot 2026-06-17 at 1.49.27 PM.png
Screenshot 2026-06-17 at 1.47.24 PM.png
Screenshot 2026-06-17 at 1.51.55 PM.png
Screenshot 2026-06-17 at 1.52.21 PM.png
Screenshot 2026-06-17 at 1.52.48 PM.png
PSU-White-EU.jpg
4255998-40.jpg

HARDWARE:

  1. Raspberry Pi 5
  2. Raspberry Pi 5 Power Adapter
  3. Logitech C270 USB Webcam
  4. HC-SR04 Ultrasonic Distance Sensor
  5. 16×2 LCD Display with I2C Interface
  6. Push Button
  7. RGB LED
  8. Active Buzzer
  9. GPIO Ribbon Cable
  10. Jumper Wires

System Workflow

block.png
block diagram.jpeg

  1. Ultrasonic Sensor Activation: The system remains idle until a hand is detected within a specific distance. This saves processing power and avoids unnecessary predictions.
  2. Image Capture: A USB webcam captures the live video feed of the user's hand.
  3. Hand Landmark Detection: A pre-trained MediaPipe model extracts key hand landmarks (x, y, z coordinates). These landmarks describe the position of fingers and joints.
  4. Sign Classification: The extracted landmarks are sent to a machine learning classification model, which predicts the corresponding sign language alphabet.
  5. Visual Feedback: The predicted letter is displayed on an I2C LCD display. An RGB LED provides feedback: 🟢 green means a sign was detected successfully, 🔴 red means no hand was detected.
  6. Error Notification: If the system cannot recognize a valid sign, a buzzer sounds to notify the user.
  7. Data Logging: Each prediction is stored in a PostgreSQL database. The database records the predicted letter, confidence score, and timestamp. This data can later be viewed through the Gradio dashboard.
  8. Real-Time Dashboard: A Gradio web interface displays the live camera feed, predicted letters, confidence scores, prediction history, and system information.
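
The feedback rules in the workflow above can be sketched as one small Python function. This is only an illustrative sketch, not the project's actual code: the names Feedback and decide_feedback, and the values of TRIGGER_DISTANCE_CM and MIN_CONFIDENCE, are assumptions, because the article does not state the thresholds it uses.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical thresholds: the article does not give the real values.
TRIGGER_DISTANCE_CM = 30.0
MIN_CONFIDENCE = 0.6


@dataclass
class Feedback:
    lcd_text: str  # text to show on the I2C LCD
    led: str       # "green" or "red"
    buzzer: bool   # True when the buzzer should sound
    log: bool      # True when the prediction should be stored


def decide_feedback(distance_cm: float,
                    letter: Optional[str],
                    confidence: float) -> Feedback:
    """Map one distance reading and one classifier result to device outputs."""
    # No hand within range: stay idle, show red, nothing to log.
    if distance_cm > TRIGGER_DISTANCE_CM:
        return Feedback("", "red", False, False)
    # Hand present, but no landmarks were found or the model is unsure.
    if letter is None or confidence < MIN_CONFIDENCE:
        return Feedback("?", "red", True, False)
    # Valid sign: show the letter, green LED, and log the prediction.
    return Feedback(letter, "green", False, True)


print(decide_feedback(12.0, "A", 0.93))
```

Keeping the decision in a pure function like this makes it possible to test the behaviour on a laptop before any GPIO hardware is connected.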


Designing the Case

front_view.png
side_view.jpg
Top_view.jpg

Before assembling the electronics, I designed a custom enclosure to house all the components of SignSpeakAI. The goal was to create a sturdy structure that protects the hardware, keeps the wiring organized, and provides a fixed position for the camera to improve recognition accuracy.

The enclosure is built from 4 mm MDF sheets and has the following dimensions:

  1. Width: 250 mm
  2. Depth: 250 mm
  3. Height: 150 mm

The main enclosure stores the Raspberry Pi, wiring, and power connections while keeping everything hidden inside for a cleaner appearance.

Front Panel Components

The front panel contains all the user-facing components. An I2C LCD display is mounted in the center and is used to show the predicted sign language alphabet. Above the display, an ultrasonic sensor is mounted. The sensor detects when a user places their hand in front of the system and activates the recognition process only when needed.

An RGB LED and buzzer are also mounted on the front panel. The LED provides visual feedback about the system status, while the buzzer alerts the user when a valid sign cannot be recognized.

Raspberry Pi and Controls

The Raspberry Pi is mounted inside the enclosure to protect it from damage and keep all cables hidden. Additional cut-outs are included for the push button and power supply connection, allowing easy access from outside the box.

Camera Holder

To achieve a consistent viewing angle, I designed an L-shaped camera holder using two MDF boxes joined together. The vertical section raises the camera above the enclosure, while the horizontal section positions the webcam directly over the signing area.

This design gives the camera a clear view of the user's hand and helps improve the accuracy of the MediaPipe hand-tracking model and machine learning classifier.

Assembly

After cutting the MDF panels, the enclosure is assembled using small screws. The LCD display and ultrasonic sensor are secured with screws, while the LED is fixed using wood glue. Once assembled, the enclosure provides a strong and stable housing for the entire SignSpeakAI system.

The final design transforms SignSpeakAI from a collection of electronic components into a compact and professional-looking device that is suitable for demonstrations and everyday use.

Collect Dataset

IMG_1548.jpeg
IMG_1554.jpeg
IMG_1584.jpeg
13d8da18-9cce-4a52-96f3-f2a7535cde7c.JPG
2bbb8d8c-1dfe-40b2-a2d2-5f798881723f.JPG
355ffcd5-9f50-4711-891e-d413091d827d.JPG
IMG_1668.jpeg
IMG_1690.jpeg
02e48d35-dbeb-420a-a4cc-35e24f64d219.JPG
IMG_1925.jpeg

Before training the machine learning model, I needed to create a dataset containing examples of each sign language alphabet supported by SignSpeakAI.

For this project, I collected data for the following alphabets:

A, B, C, D, E, F, G, K, L, and O

To build a reliable dataset, I captured approximately 120 images for each class, resulting in a dataset of around 1,200 images. Collecting a large number of samples helped improve the model's ability to recognize signs under different conditions.

Data Collection Process

For each alphabet:

  1. Position the hand in front of the webcam.
  2. Form the correct ASL sign.
  3. Capture multiple images from slightly different positions and angles.
  4. Repeat until approximately 120 images are collected.
  5. Repeat the same procedure for all supported letters.

Why Collect So Many Images?

Hand signs can appear different depending on:

  1. Hand orientation
  2. Distance from the camera
  3. Lighting conditions
  4. Small variations in finger placement

By collecting around 120 images per class, the dataset becomes more diverse, helping the model generalize better and make more accurate predictions during real-time use.

Dataset Classes

The images below show one sample image from each class used in the dataset:

  1. A
  2. B
  3. C
  4. D
  5. E
  6. F
  7. G
  8. K
  9. L
  10. O

Although only one image per class is shown here, each class contains approximately 120 captured samples. These images formed the foundation of the SignSpeakAI dataset and were later processed using MediaPipe to extract hand landmarks for machine learning training.

Extracting Keypoints and Feature Engineering

After collecting the dataset, the next step was to convert the images into useful training data. Instead of training the model directly on full images, I used MediaPipe to extract hand key points.

MediaPipe detects important points on the hand, such as the fingertips, joints, and palm position. These landmarks are represented as coordinate values and describe the shape of the hand.

For each image, the program:

  1. Loads the image from the dataset
  2. Detects the hand using MediaPipe
  3. Extracts the hand landmark coordinates
  4. Saves the values into a CSV file with the correct label

To improve the performance of the model, I also performed feature engineering. In addition to the raw landmark coordinates, I calculated:

  1. Distances between important hand landmarks
  2. Angles formed between fingers and joints

These additional features help the model better understand the shape and orientation of the hand, leading to more accurate predictions.

The final CSV file contains the extracted landmarks, engineered features, and the corresponding alphabet label for each image. This dataset is then used to train the machine learning model.
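
To make the distance and angle features concrete, here is a minimal sketch that uses only the Python standard library. It assumes each hand is given as 21 (x, y, z) tuples in MediaPipe Hands order (index 0 is the wrist; 4, 8, 12, 16, and 20 are the fingertips). Which distances and angles the attached notebook actually computes is not stated in this step, so the wrist-to-fingertip distances and the angles between neighbouring fingertips used here are assumptions, as are the function names.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]

# MediaPipe Hands landmark indices (21 points per hand).
WRIST = 0
FINGERTIPS = (4, 8, 12, 16, 20)  # thumb, index, middle, ring, pinky tips


def distance(a: Point, b: Point) -> float:
    """Euclidean distance between two landmarks."""
    return math.dist(a, b)


def angle_deg(a: Point, b: Point, c: Point) -> float:
    """Angle at b, in degrees, between the segments b->a and b->c."""
    v1 = [a[i] - b[i] for i in range(3)]
    v2 = [c[i] - b[i] for i in range(3)]
    n1 = math.hypot(*v1)
    n2 = math.hypot(*v2)
    if n1 == 0.0 or n2 == 0.0:
        return 0.0  # degenerate case: two landmarks coincide
    cos = sum(x * y for x, y in zip(v1, v2)) / (n1 * n2)
    # Clamp to [-1, 1] to guard against floating-point drift.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))


def build_features(landmarks: Sequence[Point]) -> List[float]:
    """Return raw coordinates plus illustrative distance and angle features."""
    if len(landmarks) != 21:
        raise ValueError("expected 21 hand landmarks")
    features = [coord for point in landmarks for coord in point]  # 63 values
    wrist = landmarks[WRIST]
    # Five wrist-to-fingertip distances.
    features += [distance(wrist, landmarks[t]) for t in FINGERTIPS]
    # Four angles at the wrist between neighbouring fingertips.
    features += [angle_deg(landmarks[a], wrist, landmarks[b])
                 for a, b in zip(FINGERTIPS, FINGERTIPS[1:])]
    return features
```

One row of the CSV would then be the list returned by build_features followed by the letter label for that image.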

The complete Python notebook used for landmark extraction and feature engineering is attached to this step. The notebook uses MediaPipe to extract hand landmarks, calculates additional distance and angle features, and exports the final dataset to a CSV file for model training.

Train the Model

After generating the CSV file containing the extracted hand landmarks and engineered features, the next step was to train the machine learning model.

I experimented with several classification algorithms to determine which model would perform best for sign language recognition. Each model was tested and compared based on its prediction accuracy and performance.

After evaluating the different models, I selected Logistic Regression as the final classifier. It provided the most reliable results while maintaining fast prediction speeds, making it well suited for real-time use on the Raspberry Pi.

The model was trained using:

  1. MediaPipe hand landmark coordinates
  2. Distance-based features
  3. Angle-based features
  4. Corresponding sign language labels

Once training was complete, the model was evaluated on unseen data to verify its performance. The trained model can now recognize the supported sign language alphabets and is ready to be integrated into the SignSpeakAI system for real-time predictions.
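
The training and evaluation step could look roughly like the sketch below, which uses scikit-learn. It is not the notebook from this project: synthetic data from make_blobs stands in for the real CSV (1,200 samples and 10 classes mirror the dataset size described earlier, while the 20 features are arbitrary), and the 80/20 split, the StandardScaler step, and max_iter=1000 are assumptions.

```python
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def train_classifier(X, y, seed=0):
    """Fit Logistic Regression and return the model and held-out accuracy."""
    # Hold back 20% of the samples as unseen data (assumed split ratio).
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    # Scaling first helps when coordinates, distances, and angles
    # live on different numeric ranges.
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    model.fit(X_train, y_train)
    return model, model.score(X_test, y_test)


# Synthetic stand-in for the landmark CSV: 10 classes, about 120 samples each.
X, y = make_blobs(n_samples=1200, centers=10, n_features=20, random_state=0)
model, accuracy = train_classifier(X, y)
print(f"held-out accuracy: {accuracy:.3f}")
```

With the real dataset, X would hold the landmark and engineered feature columns and y the letter labels, and the fitted model could be saved (for example with joblib) for use on the Raspberry Pi.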

Preparing the Raspberry Pi

After designing the enclosure, the next step was to prepare the Raspberry Pi 5, which serves as the main controller of the SignSpeakAI system.

What You Need

  1. Raspberry Pi 5
  2. MicroSD Card (32 GB or larger recommended)
  3. Raspberry Pi Power Supply
  4. Internet Connection
  5. USB Webcam
  6. Computer for remote access

Step 1: Install Raspberry Pi OS

Open Raspberry Pi Imager on your computer and insert the microSD card. Select your Raspberry Pi model and choose Raspberry Pi OS (64-bit) as the operating system.

Step 2: Configure the Raspberry Pi

Before writing the operating system to the microSD card, configure the Raspberry Pi settings. Set a hostname, username, password, and enable SSH so that the Raspberry Pi can be accessed remotely. If using Wi-Fi, enter your network credentials as well.

Once the configuration is complete, write the operating system to the microSD card.

Step 3: Boot the Raspberry Pi

Insert the microSD card into the Raspberry Pi and connect the power supply. Allow the device to boot and connect to the network.

Step 4: Connect Through SSH

Using a computer on the same network, connect to the Raspberry Pi through SSH. This allows the system to be configured and managed remotely without requiring a separate monitor, keyboard, or mouse.

Step 5: Update the System

After connecting successfully, update the operating system and installed packages using the following commands:

sudo apt update

sudo apt upgrade -y

Step 6: Install Project Dependencies

Install the software packages and Python libraries that SignSpeakAI needs: OpenCV, MediaPipe, Gradio, PostgreSQL, NumPy, and scikit-learn. Together they cover camera input, hand landmark detection, machine learning inference, database management, and the dashboard.
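
After installing, a short script can confirm that the Python libraries are importable inside the virtual environment. This is a small sketch and not part of the project: the function name check_modules is hypothetical, and psycopg2 is only an assumed PostgreSQL driver, since the article does not say which one is used.

```python
import importlib.util
from typing import Dict

# Import names for the libraries listed in this step.
# psycopg2 is an assumption: the article does not name the PostgreSQL driver.
REQUIRED_MODULES = {
    "OpenCV": "cv2",
    "MediaPipe": "mediapipe",
    "Gradio": "gradio",
    "NumPy": "numpy",
    "scikit-learn": "sklearn",
    "PostgreSQL driver (assumed psycopg2)": "psycopg2",
}


def check_modules(modules: Dict[str, str]) -> Dict[str, bool]:
    """Report which top-level modules can be found, without importing them."""
    return {
        label: importlib.util.find_spec(name) is not None
        for label, name in modules.items()
    }


if __name__ == "__main__":
    for label, found in check_modules(REQUIRED_MODULES).items():
        print(f"{label:40s} {'OK' if found else 'MISSING'}")
```

Running it with the same Python interpreter that will run the app shows at a glance which packages still need to be installed.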

Step 7: Transfer the Project Files

Upload the SignSpeakAI project files, trained machine learning model, and database configuration files to the Raspberry Pi. Connect the USB webcam and verify that all hardware components are detected correctly.

The Raspberry Pi is now fully prepared and ready for the hardware integration and deployment of the SignSpeakAI system.

Creating Database

Screenshot 2026-06-18 at 6.25.02 PM.png
Screenshot 2026-06-18 at 6.24.20 PM.png

After preparing the Raspberry Pi, the next step was to create the PostgreSQL database for SignSpeakAI. The database is used to store every prediction made by the system so that the results can later be viewed in the Gradio dashboard.

The database contains three main tables: sign_alphabets, sessions, and translation_logs.

sign_alphabets Table

The sign_alphabets table stores the supported sign language letters.

It contains:

  1. letterID: unique ID for each letter
  2. letter: the alphabet letter, such as A, B, C, D, etc.

Storing each letter once in this table lets every prediction reference it by ID instead of repeating the letter as free text in each log row.

sessions Table

The sessions table records one row each time the SignSpeakAI system is started.

It contains:

  1. session_id: unique ID for each session
  2. start_time: when the session begins
  3. end_time: when the session ends

This makes it possible to group predictions based on when the system was used.

translation_logs Table

The translation_logs table stores the actual prediction results.

It contains:

  1. id: unique ID for each prediction
  2. session_id: links the prediction to a session
  3. letterID: links the prediction to a detected alphabet
  4. timestamp: time when the prediction was made
  5. confidence: confidence score of the model prediction

The translation_logs table is connected to both the sessions table and the sign_alphabets table using foreign keys. This keeps the database structured and avoids duplicate data.
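
The three tables and their foreign keys can be tried out with the sketch below. To keep it self-contained it uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, so the column types are SQLite types and are assumptions; only the table and column names come from the description above. The helper names create_database and log_prediction and the sample timestamps are hypothetical.

```python
import sqlite3

# Table and column names follow the article; types are SQLite stand-ins.
SCHEMA = """
CREATE TABLE sign_alphabets (
    letterID INTEGER PRIMARY KEY,
    letter   TEXT NOT NULL UNIQUE
);
CREATE TABLE sessions (
    session_id INTEGER PRIMARY KEY,
    start_time TEXT NOT NULL,
    end_time   TEXT
);
CREATE TABLE translation_logs (
    id         INTEGER PRIMARY KEY,
    session_id INTEGER NOT NULL REFERENCES sessions(session_id),
    letterID   INTEGER NOT NULL REFERENCES sign_alphabets(letterID),
    timestamp  TEXT NOT NULL,
    confidence REAL NOT NULL
);
"""


def create_database(path: str = ":memory:") -> sqlite3.Connection:
    """Create the three tables and seed the ten supported letters."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this enabled
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO sign_alphabets (letter) VALUES (?)",
                     [(letter,) for letter in "ABCDEFGKLO"])
    conn.commit()
    return conn


def log_prediction(conn, session_id, letter, confidence, timestamp):
    """Store one prediction, resolving the letter to its letterID."""
    row = conn.execute("SELECT letterID FROM sign_alphabets WHERE letter = ?",
                       (letter,)).fetchone()
    if row is None:
        raise ValueError(f"unsupported letter: {letter}")
    conn.execute(
        "INSERT INTO translation_logs (session_id, letterID, timestamp, confidence) "
        "VALUES (?, ?, ?, ?)",
        (session_id, row[0], timestamp, confidence),
    )
    conn.commit()


conn = create_database()
# Sample values only; a real session would use the current time.
conn.execute("INSERT INTO sessions (start_time) VALUES (?)", ("2026-06-18T18:00:00",))
log_prediction(conn, 1, "A", 0.93, "2026-06-18T18:00:05")
```

Because translation_logs references the other two tables, an insert with an unknown session_id or an unsupported letter is rejected instead of silently creating inconsistent history.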

Why I Used a Database

Using PostgreSQL allows SignSpeakAI to save prediction history instead of only showing the current result on the screen. This makes the project more useful because I can track the detected letters, confidence scores, and timestamps.

The saved data is also used in the Gradio dashboard to display the latest predictions and system history. This helps during testing because I can check which letters were detected correctly and how confident the model was.

By adding a database, SignSpeakAI becomes more than just a live prediction system. It can also store, review, and analyze sign language recognition results over time.

Maker Case

Before assembling the hardware, I designed the enclosure panels using Adobe Illustrator. Creating the design beforehand ensured that all components would fit correctly and that the final device would have a clean and professional appearance.

The enclosure was designed using 4 mm MDF and consists of a main box measuring 250 mm × 250 mm × 150 mm. Cut-outs were added for the LCD display, ultrasonic sensor, RGB LED, buzzer, push button, and power connections.

I also designed a custom L-shaped camera holder that positions the webcam above the signing area. This allows the camera to capture a consistent top-down view of the user's hand, improving sign recognition accuracy.

While creating the design, all measurements were carefully matched to the dimensions of the components to ensure proper alignment and easy assembly. The front panel was designed to hold the LCD display and sensors, while the internal space was sized to accommodate the Raspberry Pi and wiring.

After completing the design, the files were prepared for manufacturing by converting all measurements to millimeters and adjusting the line widths required for laser cutting.

The Adobe Illustrator design files used to create the enclosure have been included with this step, allowing others to modify, recreate, or laser-cut the case for their own SignSpeakAI build.

The finished design served as the blueprint for the final SignSpeakAI enclosure and ensured that all components could be assembled neatly and securely.

Hardware Components

hardware_setup.jpeg
hardware2.jpeg

With the machine learning model trained, the next step was to assemble the hardware components that make up the SignSpeakAI system.

The Raspberry Pi 5 acts as the central controller and communicates with all connected devices. Each component plays an important role in the overall functionality of the project.

Components Used

  1. Raspberry Pi 5
  2. USB Webcam
  3. I2C LCD Display
  4. Ultrasonic Sensor
  5. RGB LED
  6. Buzzer
  7. Push Button
  8. MDF Enclosure
  9. Jumper Wires
  10. Breadboard and Connectors

Hardware Integration

The USB webcam is used to capture the user's hand signs in real time. The captured frames are processed by MediaPipe and the machine learning model running on the Raspberry Pi.

The ultrasonic sensor is mounted on the front of the enclosure and detects when a user places their hand within range. This allows the system to activate only when needed.
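
The HC-SR04 reports distance as the duration of an echo pulse, so the "within range" check comes down to a short calculation: the sound travels to the hand and back, so the distance is half the pulse time multiplied by the speed of sound. The sketch below shows only that arithmetic, not the GPIO code. The function names and the 30 cm trigger distance are assumptions, since the article does not state the range it uses.

```python
# Speed of sound in air at roughly 20 °C, in centimetres per second.
SPEED_OF_SOUND_CM_PER_S = 34300.0

# Hypothetical trigger distance: the article does not give the real value.
TRIGGER_DISTANCE_CM = 30.0


def echo_to_distance_cm(echo_seconds: float) -> float:
    """Convert an HC-SR04 echo pulse duration to a one-way distance in cm."""
    if echo_seconds < 0:
        raise ValueError("echo duration cannot be negative")
    # The pulse covers the distance twice (out and back), so halve it.
    return echo_seconds * SPEED_OF_SOUND_CM_PER_S / 2.0


def hand_in_range(echo_seconds: float,
                  threshold_cm: float = TRIGGER_DISTANCE_CM) -> bool:
    """Return True when the measured distance is within the trigger range."""
    return echo_to_distance_cm(echo_seconds) <= threshold_cm


print(echo_to_distance_cm(0.001))
```

For example, a 1 ms echo corresponds to about 17 cm, which would wake the system under the assumed 30 cm threshold, while a 5 ms echo (about 86 cm) would leave it idle.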

The I2C LCD display shows the predicted sign language alphabet, allowing users to instantly see the recognized letter without needing to look at the dashboard.

The RGB LED provides visual feedback about the system status. A successful prediction triggers a green light, while an invalid or undetected sign results in a red indication.

The buzzer provides audio feedback whenever the system is unable to confidently recognize a valid sign.

A push button is also included to allow safe system control and shutdown when required.

Final Assembly

After connecting all components to the Raspberry Pi, the wiring was secured inside the enclosure and each component was tested individually. Once everything was working correctly, the complete hardware system was assembled and prepared for software deployment.

At this stage, the physical SignSpeakAI device was fully assembled and ready for real-time sign language recognition.

Create a Frontend Page (Gradio)

After setting up the model, database, and hardware, I created a frontend dashboard using Gradio. This dashboard makes SignSpeakAI easier to use because it displays the system output in a clean web interface.

The Gradio interface has five main pages:

1. About and Onboarding Page

This page introduces SignSpeakAI and explains the goal of the project. It gives users a quick overview of how the system works and what components are used, such as MediaPipe, Raspberry Pi, the webcam, and the machine learning model.

2. Live Prediction Page

This is the main page of the system. It shows the live camera feed and displays the predicted sign language letter in real time. The confidence score is also shown so the user can see how sure the model is about the prediction.

3. Data Page

The Data page displays the prediction history stored in the PostgreSQL database. It shows the latest detected letters, confidence scores, and timestamps. This makes it easier to check how the system performed during testing.

4. ASL Guide Page

The ASL Guide page shows reference images for the supported sign language alphabets. This helps users know which hand signs they can test with the system.

5. System / Debug Page

This page is used to check system information and hardware status. It helps during testing by showing whether the camera, model, database, and connected components are working correctly.

Using Gradio allowed me to create a simple and interactive dashboard without needing to build a full website from scratch. It also made the project easier to demonstrate because all important information is available in one place.

Automation of the Server

To make SignSpeakAI work like a standalone device, I configured the Raspberry Pi to automatically start the application whenever it boots. This removes the need to manually run the program each time the system is powered on.

The following systemd service was used:

[Unit]
Description=SignSpeakAI Gradio App
After=network.target
Wants=network.target

[Service]
User=student
WorkingDirectory=/home/student/2025-26-projectone-ctai-ThopireddyYuvan/RPi
ExecStart=/home/student/.venv/bin/python -u /home/student/2025-26-projectone-ctai-ThopireddyYuvan/RPi/app.py

Restart=always
RestartSec=5

StandardOutput=journal
StandardError=journal

[Install]
WantedBy=multi-user.target

This service automatically launches the SignSpeakAI application when the Raspberry Pi starts and restarts it if it unexpectedly stops. As a result, the system is always ready to use after powering on the device.

GitHub Link

All the source code, notebooks, and project files used to build SignSpeakAI are available on GitHub.

The repository contains:

  1. Raspberry Pi application code
  2. Machine learning training notebooks
  3. MediaPipe landmark extraction scripts
  4. Gradio dashboard files
  5. Database models and configuration
  6. Hardware integration code

You can access the complete project using the GitHub repository link below:

GitHub Repository: https://github.com/howest-mct/2025-26-projectone-ctai-ThopireddyYuvanReddy

Feel free to explore the code, modify the project, or use it as a starting point for your own sign language recognition system.