Gesture-Controlled Raspberry Pi Radio

This project is a Raspberry Pi radio prototype that can be controlled with hand gestures. A USB camera detects the hand, MediaPipe Tasks HandLandmarker extracts 21 hand landmarks, and a custom AI classifier predicts which gesture is being shown.

The goal of this project was to explore a different way of controlling a radio, one that does not rely only on small buttons, touchscreens or a phone app. The Raspberry Pi works as the functional internet radio, while the physical radio and enclosure make the prototype feel like a real object.

The system can recognise gestures for radio on, radio off, volume up, volume down, next station and mute. The relay is only used for safe ON/OFF control. Volume, mute and next station are handled as software actions inside the Raspberry Pi radio system.

Supplies

Main hardware

Raspberry Pi 5 8GB — about €131.95
USB camera / Logitech C270 — about €20.99
JBL GO 2 speaker — about €28.99
OLED display — about €19.12
Freenove Raspberry Pi project kit / keypad — about €59.95
5V 1-channel relay module — about €2.50
Retekess TR604 radio object — about €29.15
Raspberry Pi 27W USB-C power supply — about €12.95
32GB microSD card — about €19.91
Raspberry Pi 5 active cooler — about €5.64
Jumper wires and cable clips — about €10–€15
6 mm MDF laser-cut enclosure sheets — school workshop material / price depends on material use

Optional setup and debugging material

Portable monitor — about €89.99
HDMI adapter — about €12.95
USB external sound card — about €27.99
3.5 mm audio cable — about €18.69
Velcro strip — about €4.09

The full BOM subtotal in my project file is about €505.64, but this includes optional debugging material and parts that were already available. The real cost depends on which parts you already have.

Software

Python
MediaPipe Tasks HandLandmarker
scikit-learn
Gradio
PostgreSQL / database logging
Docker
GitHub

Tools

Laptop or desktop
Raspberry Pi OS
Laser cutter
Screwdrivers and basic hand tools

System Overview and AI Pipeline

The system starts with a USB camera. Each camera frame is processed by MediaPipe Tasks HandLandmarker, which detects 21 landmarks on the hand.

Each landmark has x, y and z coordinates, so one hand yields 21 × 3 = 63 numerical keypoint values. These 63 values are sent to a custom classifier, which predicts the gesture and returns a confidence score.
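As a minimal sketch of the step from 21 landmarks to 63 values: the function below flattens landmarks in the order x0, y0, z0, x1, y1, z1 and so on. The Landmark class is a hypothetical stand-in so the example runs without a camera; the only thing it assumes about the real MediaPipe landmark objects is that they expose x, y and z attributes. The function name is also my own choice for illustration.

```python
from typing import NamedTuple, Sequence

NUM_LANDMARKS = 21  # landmarks per detected hand


class Landmark(NamedTuple):
    """Hypothetical stand-in for one hand landmark with x, y, z attributes."""
    x: float
    y: float
    z: float


def flatten_landmarks(landmarks: Sequence) -> list[float]:
    """Turn 21 (x, y, z) landmarks into one flat list of 63 floats."""
    if len(landmarks) != NUM_LANDMARKS:
        raise ValueError(f"expected {NUM_LANDMARKS} landmarks, got {len(landmarks)}")
    features: list[float] = []
    for lm in landmarks:
        # Keep the order x, y, z per landmark so it matches the dataset columns.
        features.extend((float(lm.x), float(lm.y), float(lm.z)))
    return features


# Dummy landmarks, only to show the shape of the output.
dummy_hand = [Landmark(i / 20, 0.5, 0.0) for i in range(NUM_LANDMARKS)]
features = flatten_landmarks(dummy_hand)
print(len(features))
```

Rejecting a hand with the wrong number of landmarks early keeps malformed rows out of both the dataset and the live predictions.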

If the prediction is confident enough and stable for enough frames, the gesture is mapped to a radio action.

Basic pipeline:

Camera input → MediaPipe hand landmarks → 63 keypoint values → custom classifier → gesture prediction → debounce logic → radio action → OLED / Gradio / database feedback

Raspberry Pi Setup

The Raspberry Pi runs the main application. It starts the Gradio dashboard, camera stream, AI prediction code, radio logic and hardware feedback.

The JBL GO 2 speaker is used as the audio output for the internet radio. The OLED and keypad are used for local feedback and control.

The Raspberry Pi is placed inside the MDF enclosure. The wiring and components are mounted inside so the prototype can be moved and demonstrated as one system.

Hand Landmark Detection

For hand detection I used MediaPipe Tasks HandLandmarker. It does not classify the full camera image; it extracts hand landmarks first, and the gesture classifier works on those landmarks.

Each detected hand has 21 landmarks. Every landmark has x, y and z coordinates. This gives 63 input features for the classifier.

This was important because my project is a keypoint detection project, not a normal image classification project: the classifier never sees raw pixels, only the 63 landmark coordinates.

Collecting the Gesture Dataset

I collected my own dataset for the gestures. Each row contains the gesture label and 63 keypoint values.

The labels are:

ASL_A: Radio ON
ASL_B: Radio OFF
ASL_1: Volume UP
ASL_V: Volume DOWN
ASL_L: Next station
ASL_G: Mute

The CSV file uses this structure:

label, x0, y0, z0, x1, y1, z1, ..., x20, y20, z20

This means every row has 64 columns: 1 label and 63 numerical values.

After collecting the data, I validated the CSV file. The final dataset contains 1557 self-collected keypoint rows, 6 gesture classes, 64 columns and 0 invalid rows.
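The validation step can be sketched roughly as follows. This is not my exact validation script, only an illustration of the rule "1 label plus 63 numbers per row". It assumes the file may start with a header row beginning with "label" and that a valid value is any finite number; the function name and the sample rows are made up for the example.

```python
import csv
import io
import math

EXPECTED_COLUMNS = 64  # 1 label + 63 keypoint values
VALID_LABELS = {"ASL_A", "ASL_B", "ASL_1", "ASL_V", "ASL_L", "ASL_G"}


def validate_rows(lines):
    """Return (valid, invalid) row counts for an iterable of CSV lines."""
    valid = 0
    invalid = 0
    for row in csv.reader(lines, skipinitialspace=True):
        if not row or row[0] == "label":
            continue  # skip blank lines and an optional header row (assumption)
        ok = len(row) == EXPECTED_COLUMNS and row[0] in VALID_LABELS
        if ok:
            try:
                # Every keypoint value must parse as a finite float.
                ok = all(math.isfinite(float(value)) for value in row[1:])
            except ValueError:
                ok = False
        if ok:
            valid += 1
        else:
            invalid += 1
    return valid, invalid


# Made-up sample: one good row, one row that is too short, one with a bad value.
good = "ASL_A," + ",".join(["0.5"] * 63)
too_short = "ASL_B," + ",".join(["0.5"] * 10)
bad_value = "ASL_V," + ",".join(["0.5"] * 62 + ["oops"])
counts = validate_rows(io.StringIO("\n".join([good, too_short, bad_value])))
print(counts)
```

For a real file, the same function can be called with an open file handle instead of the StringIO sample.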

Training the Classifier

I trained and compared Random Forest and SVM models. Random Forest was useful as a strong baseline for tabular keypoint data. SVM performed best in the latest evaluation after scaling the features.

Random Forest works by building many decision trees and combining their votes. It is strong for small tabular datasets and is largely insensitive to feature scaling, because each tree splits on one feature at a time.

SVM works by finding the boundary that separates the gesture classes with the largest margin. Because SVM is distance and margin based, scaling is important so that all 63 coordinate features contribute on a comparable scale.
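The comparison can be sketched with scikit-learn as shown below. This is an illustration under assumptions, not my training script: the data is synthetic (random clusters with 6 classes and 63 features), and the RBF kernel, the number of trees, the 75/25 split and the random seeds are placeholder choices. The one point it is meant to show is that the scaler sits inside the SVM pipeline, so it is fitted on the training split only.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data, NOT the real gesture dataset.
rng = np.random.default_rng(0)
n_classes, n_per_class, n_features = 6, 40, 63
centers = rng.normal(size=(n_classes, n_features))
X = np.vstack(
    [center + 0.1 * rng.normal(size=(n_per_class, n_features)) for center in centers]
)
y = np.repeat(np.arange(n_classes), n_per_class)

# Stratify so every gesture class appears in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

models = {
    # Tree ensembles can be trained on the raw coordinates.
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    # The SVM gets a scaler in front of it, fitted on the training data only.
    "svm": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # accuracy on the held-out split

print(scores)
```

Exporting the whole pipeline rather than the bare SVM means the real-time code applies the same scaling that was used during training.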

The final exported model is used by the real-time prediction system.

Real-Time Gesture Prediction

The real-time prediction system does not trigger an action from one single frame. It uses a confidence threshold, stable-frame check and debounce logic.

In my real-time script, an action is only triggered when the confidence is above 0.75, the same gesture is detected for 8 consecutive frames, and the debounce time has passed.

This makes the system more reliable: a single misclassified frame or a hand moving between two poses cannot trigger an action, because the same gesture has to hold for several frames first.
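The three checks can be combined in one small class, sketched below. The 0.75 threshold and the 8 frames come from my script; the debounce time of 1.5 seconds, the class name GestureGate and the choice to reset the frame counter after an action are assumptions made for this example.

```python
from typing import Optional


class GestureGate:
    """Only lets a gesture through when it is confident, stable and not too soon."""

    def __init__(self, min_confidence: float = 0.75, stable_frames: int = 8,
                 debounce_seconds: float = 1.5):  # 1.5 s is an assumed value
        self.min_confidence = min_confidence
        self.stable_frames = stable_frames
        self.debounce_seconds = debounce_seconds
        self._last_label: Optional[str] = None
        self._streak = 0
        self._last_fire: Optional[float] = None

    def update(self, label: str, confidence: float, now: float) -> Optional[str]:
        """Feed one frame's prediction; return the label if an action should fire."""
        if confidence <= self.min_confidence:
            # A low-confidence frame breaks the streak.
            self._last_label = None
            self._streak = 0
            return None
        if label == self._last_label:
            self._streak += 1
        else:
            self._last_label = label
            self._streak = 1
        if self._streak < self.stable_frames:
            return None
        if self._last_fire is not None and now - self._last_fire < self.debounce_seconds:
            return None  # still inside the debounce window
        self._last_fire = now
        self._streak = 0  # assumption: a held gesture must become stable again
        return label


# Simulate 20 frames of the same confident gesture at 10 frames per second.
gate = GestureGate()
fired = []
for frame in range(20):
    action = gate.update("ASL_1", 0.9, now=frame * 0.1)
    if action is not None:
        fired.append((frame, action))
print(fired)
```

In this simulation the gesture fires once, on the eighth frame, and the debounce window blocks a second action during the remaining frames. Passing the timestamp in as an argument instead of reading the clock inside the class makes the logic easy to test without a camera.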

Gradio Dashboard

The Gradio dashboard was used to explain and monitor the system. It contains pages for onboarding, dataset status, live operation and debugging.

The dashboard shows the system pipeline, gesture mapping, dataset validation, prediction status, confidence, system mode and logs.

A normal Gradio image stream caused flickering, so I used a separate MJPEG stream for the camera feed and kept Gradio for the dashboard controls and status information.
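For readers who have not used MJPEG before: the stream is one long HTTP response in which every JPEG frame is sent as its own part, and the browser replaces the previous image with each new part. The sketch below only shows that framing. It does not show my server code; the boundary name "frame", the function names and the fake frame bytes are assumptions for the example.

```python
from typing import Iterable, Iterator

BOUNDARY = "frame"  # assumed boundary name, any unique token works

# Value for the Content-Type header of the HTTP response carrying the stream.
MJPEG_CONTENT_TYPE = f"multipart/x-mixed-replace; boundary={BOUNDARY}"


def mjpeg_part(jpeg_bytes: bytes) -> bytes:
    """Wrap one encoded JPEG frame as a single part of the multipart stream."""
    header = (
        f"--{BOUNDARY}\r\n"
        "Content-Type: image/jpeg\r\n"
        f"Content-Length: {len(jpeg_bytes)}\r\n"
        "\r\n"
    ).encode("ascii")
    return header + jpeg_bytes + b"\r\n"


def mjpeg_stream(frames: Iterable[bytes]) -> Iterator[bytes]:
    """Yield one multipart chunk per JPEG frame."""
    for jpeg_bytes in frames:
        yield mjpeg_part(jpeg_bytes)


# Fake frames stand in for real JPEG-encoded camera images.
fake_frames = [b"\xff\xd8fake-jpeg-1\xff\xd9", b"\xff\xd8fake-jpeg-2\xff\xd9"]
parts = list(mjpeg_stream(fake_frames))
print(MJPEG_CONTENT_TYPE, len(parts))
```

Because the browser handles this stream natively in an image element, the camera feed updates independently of the dashboard.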

OLED, Keypad and Relay

The OLED shows local system status. The keypad can be used for local control and mode switching. The relay is used only for safe ON/OFF control.

The relay does not control volume, mute or next station. Those actions are software events inside the Raspberry Pi internet radio.

This separation is important for safety. I do not claim that the relay physically controls every radio function. It is only used for ON/OFF switching.
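The separation between relay actions and software actions can be sketched as a small dispatch function. The gesture labels are the ones from my dataset; everything else is hypothetical: FakeRelay stands in for the real relay driver, and the volume step of 10, the number of stations and the toggle behaviour of mute are assumed values for the example.

```python
from dataclasses import dataclass, field
from typing import Optional

GESTURE_ACTIONS = {
    "ASL_A": "radio_on",
    "ASL_B": "radio_off",
    "ASL_1": "volume_up",
    "ASL_V": "volume_down",
    "ASL_L": "next_station",
    "ASL_G": "mute",
}


@dataclass
class FakeRelay:
    """Hypothetical stand-in for the relay driver; it only knows on and off."""
    closed: bool = False

    def set(self, closed: bool) -> None:
        self.closed = closed


@dataclass
class RadioState:
    relay: FakeRelay = field(default_factory=FakeRelay)
    volume: int = 50          # assumed 0-100 scale
    muted: bool = False
    station_index: int = 0
    station_count: int = 3    # assumed number of stations


def apply_gesture(state: RadioState, label: str) -> Optional[str]:
    """Apply one recognised gesture; only ON/OFF ever touches the relay."""
    action = GESTURE_ACTIONS.get(label)
    if action == "radio_on":
        state.relay.set(True)
    elif action == "radio_off":
        state.relay.set(False)
    elif action == "volume_up":
        state.volume = min(100, state.volume + 10)
    elif action == "volume_down":
        state.volume = max(0, state.volume - 10)
    elif action == "next_station":
        state.station_index = (state.station_index + 1) % state.station_count
    elif action == "mute":
        state.muted = not state.muted  # assumption: mute works as a toggle
    return action  # None for an unknown label, so nothing happens


state = RadioState()
for gesture in ["ASL_A", "ASL_1", "ASL_L", "ASL_G"]:
    apply_gesture(state, gesture)
print(state)
```

Keeping the relay calls in exactly two branches makes it easy to check that no other gesture can switch the hardware.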

Enclosure and Final Assembly

The enclosure was made from laser-cut MDF. The goal was to create a clean dock-style prototype where the Raspberry Pi, wiring and hardware modules could be placed inside.

The enclosure includes space for the camera, OLED, keypad, speaker, Raspberry Pi and cable routing. The top can be opened so the inside can still be reached for debugging and maintenance.

The enclosure also gives the prototype a more finished look for the poster fair and final demo.

Final Demo, Limitations and Improvements

In the final demo, the system can detect hand gestures, predict the gesture, trigger radio actions, show feedback on Gradio and OLED, and log events to the database.

The model is not perfect in every condition. Bad lighting, fast hand movement, unclear hand position or a partially visible hand can reduce reliability.

The hardware control is also limited by design: the relay only switches ON/OFF, while volume, mute and next station remain software actions. In a future version, I would improve the dataset with more users, lighting conditions and hand positions. I would also test the full system earlier and more often during the project.

AI was used to help brainstorm, structure, debug and review parts of this project. The final project choices, testing, implementation, hardware work and submission were done and checked by me.