Gesture-Controlled Raspberry Pi Radio

by Younes El Kouy in Circuits > Raspberry Pi

IMG_1832.jpg
IMG_1828.jpg

This project is a Raspberry Pi radio prototype that can be controlled with hand gestures. A USB camera captures the hand, MediaPipe Tasks HandLandmarker extracts 21 hand landmarks, and a custom AI classifier predicts which gesture is being shown.

The goal of this project was to explore a different way of controlling a radio, one that does not rely only on small buttons, touchscreens or a phone app. The Raspberry Pi works as the functional internet radio, while the physical radio and enclosure make the prototype feel like a real object.

The system can recognise gestures for radio on, radio off, volume up, volume down, next station and mute. The relay is only used for safe ON/OFF control. Volume, mute and next station are handled as software actions inside the Raspberry Pi radio system.

Supplies

IMG_1824.jpg
IMG_1826.jpg

Main hardware

  1. Raspberry Pi 5 8GB — about €131.95
  2. USB camera / Logitech C270 — about €20.99
  3. JBL GO 2 speaker — about €28.99
  4. OLED display — about €19.12
  5. Freenove Raspberry Pi project kit / keypad — about €59.95
  6. 5V 1-channel relay module — about €2.50
  7. Retekess TR604 radio object — about €29.15
  8. Raspberry Pi 27W USB-C power supply — about €12.95
  9. 32GB microSD card — about €19.91
  10. Raspberry Pi 5 active cooler — about €5.64
  11. Jumper wires and cable clips — about €10–€15
  12. 6 mm MDF laser-cut enclosure sheets — school workshop material / price depends on material use

Optional setup and debugging material

  1. Portable monitor — about €89.99
  2. HDMI adapter — about €12.95
  3. USB external sound card — about €27.99
  4. 3.5 mm audio cable — about €18.69
  5. Velcro strip — about €4.09

The full BOM subtotal in my project file is about €505.64, but this includes optional debugging material and parts that were already available. The real cost depends on which parts you already have.

Software

  1. Python
  2. MediaPipe Tasks HandLandmarker
  3. scikit-learn
  4. Gradio
  5. PostgreSQL / database logging
  6. Docker
  7. GitHub

Tools

  1. Laptop or desktop
  2. Raspberry Pi OS
  3. Laser cutter
  4. Screwdrivers and basic hand tools

System Overview and AI Pipeline

BlockDiagramPD02-v3.png
Screenshot 2026-06-19 065947.png

The system starts with a USB camera. Each camera frame is processed by MediaPipe Tasks HandLandmarker, which detects 21 landmarks on the hand.

Each landmark has x, y and z coordinates, so one hand yields 21 × 3 = 63 numerical keypoint values. These 63 values are sent to a custom classifier, which predicts the gesture and returns a confidence score.

If the prediction is confident enough and stable for enough frames, the gesture is mapped to a radio action.

Basic pipeline:

Camera input → MediaPipe hand landmarks → 63 keypoint values → custom classifier → gesture prediction → debounce logic → radio action → OLED / Gradio / database feedback

Raspberry Pi Setup

IMG_1824.jpg

The Raspberry Pi runs the main application. It starts the Gradio dashboard, camera stream, AI prediction code, radio logic and hardware feedback.

The JBL GO 2 speaker is used as the audio output for the internet radio. The OLED and keypad are used for local feedback and control.

The Raspberry Pi is placed inside the MDF enclosure. The wiring and components are mounted inside so the prototype can be moved and demonstrated as one system.

Hand Landmark Detection

01_mediapipe_hand_landmarks_working.png

For hand detection I used MediaPipe Tasks HandLandmarker. It does not classify the full camera image. Instead, it extracts the hand landmarks first, and the classifier works on those.

Each detected hand has 21 landmarks. Every landmark has x, y and z coordinates. This gives 63 input features for the classifier.

This was important because my project is a keypoint detection project, not a normal image classification project.
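The step from 21 landmarks to 63 features can be sketched as follows. This is a minimal illustration, not the project's actual code: the `Landmark` class is a hypothetical stand-in for the landmark objects MediaPipe returns (each with `x`, `y` and `z` attributes), and `flatten_landmarks` is a name chosen for this example.

```python
from dataclasses import dataclass
from typing import List, Sequence

NUM_LANDMARKS = 21          # landmarks per detected hand
FEATURES_PER_LANDMARK = 3   # x, y, z


@dataclass
class Landmark:
    """Hypothetical stand-in for one hand landmark with x, y, z attributes."""
    x: float
    y: float
    z: float


def flatten_landmarks(landmarks: Sequence[Landmark]) -> List[float]:
    """Turn 21 landmarks into [x0, y0, z0, ..., x20, y20, z20]."""
    if len(landmarks) != NUM_LANDMARKS:
        raise ValueError(f"expected {NUM_LANDMARKS} landmarks, got {len(landmarks)}")
    features: List[float] = []
    for lm in landmarks:
        features.extend((lm.x, lm.y, lm.z))
    return features


# Dummy values instead of a real camera frame.
dummy_hand = [Landmark(x=i / 100, y=i / 50, z=0.0) for i in range(NUM_LANDMARKS)]
features = flatten_landmarks(dummy_hand)
print(len(features))  # 63
```

Keeping the order fixed (x, y, z per landmark, landmark 0 first) matters, because the classifier expects the same column order at training time and at prediction time.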

Collecting the Gesture Dataset

02_keypoint_capture_window.png
04_dataset_csv_header_preview.png

I collected my own dataset for the gestures. Each row contains the gesture label and 63 keypoint values.

The labels are:

  1. ASL_A: Radio ON
  2. ASL_B: Radio OFF
  3. ASL_1: Volume UP
  4. ASL_V: Volume DOWN
  5. ASL_L: Next station
  6. ASL_G: Mute

The CSV file uses this structure:

label, x0, y0, z0, x1, y1, z1, ..., x20, y20, z20

This means every row has 64 columns: 1 label and 63 numerical values.

After collecting the data, I validated the CSV file. The final dataset contains 1557 self-collected keypoint rows, 6 gesture classes, 64 columns and 0 invalid rows.
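A validation pass of this kind could look like the sketch below. The exact checks are assumptions for illustration (64 columns, a known label, 63 finite numbers), and `is_valid_row` and `validate_csv` are hypothetical names. The example runs on a tiny in-memory CSV rather than the real dataset.

```python
import csv
import io
import math

GESTURE_LABELS = {"ASL_A", "ASL_B", "ASL_1", "ASL_V", "ASL_L", "ASL_G"}
# label, x0, y0, z0, ..., x20, y20, z20  -> 64 column names
EXPECTED_HEADER = ["label"] + [f"{axis}{i}" for i in range(21) for axis in "xyz"]


def is_valid_row(row):
    """A row is valid if it has 64 columns, a known label and 63 finite floats."""
    if len(row) != len(EXPECTED_HEADER):
        return False
    if row[0] not in GESTURE_LABELS:
        return False
    try:
        values = [float(v) for v in row[1:]]
    except ValueError:
        return False
    return all(math.isfinite(v) for v in values)


def validate_csv(text_stream):
    """Return (valid_rows, invalid_rows) for a CSV text stream with a header."""
    reader = csv.reader(text_stream, skipinitialspace=True)
    header = [name.strip() for name in next(reader)]
    if header != EXPECTED_HEADER:
        raise ValueError("unexpected CSV header")
    valid = invalid = 0
    for row in reader:
        if is_valid_row(row):
            valid += 1
        else:
            invalid += 1
    return valid, invalid


# Tiny in-memory example: one good row and one row with too few columns.
good_row = "ASL_A," + ",".join(["0.5"] * 63)
bad_row = "ASL_A,0.5,0.5"
sample = io.StringIO("\n".join([",".join(EXPECTED_HEADER), good_row, bad_row]) + "\n")
counts = validate_csv(sample)
print(counts)  # (1, 1)
```

For a real file, the same function could be called on `open("dataset.csv", newline="")`, where the file name is a placeholder.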

Training the Classifier

model_comparison.png
confusion_matrix_svm_latest.png

I trained and compared Random Forest and SVM models. Random Forest was useful as a strong baseline for tabular keypoint data. SVM performed best in the latest evaluation after scaling the features.

Random Forest builds many decision trees and combines their votes. It is a strong choice for small tabular datasets and is largely insensitive to feature scaling, because each tree split compares a single feature against a threshold.

SVM works by finding a maximum-margin boundary between gesture classes. Because SVM is distance and margin based, features with larger numeric ranges would otherwise dominate, so scaling is important to let all 63 coordinate features contribute comparably.

The final exported model is used by the real-time prediction system.
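The comparison can be sketched with scikit-learn as below. This is an illustration under stated assumptions, not the actual training script: the data is synthetic (random clusters standing in for the real keypoint rows), and the hyperparameters, the RBF kernel and the 80/20 split are choices made for this example only. The scaler sits inside the SVM pipeline so that the same scaling is applied automatically at prediction time.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

LABELS = ["ASL_A", "ASL_B", "ASL_1", "ASL_V", "ASL_L", "ASL_G"]

# Synthetic stand-in data: one random cluster of 63 features per gesture.
rng = np.random.default_rng(0)
centres = rng.uniform(0.0, 1.0, size=(len(LABELS), 63))
X = np.vstack([c + rng.normal(0.0, 0.05, size=(50, 63)) for c in centres])
y = np.repeat(LABELS, 50)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

models = {
    # Tree-based baseline, trained on the raw features.
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    # SVM with scaling; probability=True enables predict_proba for a confidence score.
    "svm": make_pipeline(
        StandardScaler(),
        SVC(kernel="rbf", probability=True, random_state=0),
    ),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # accuracy on the held-out split
    print(f"{name}: accuracy = {scores[name]:.3f}")

# Confidence for one sample: the highest class probability.
proba = models["svm"].predict_proba(X_test[:1])[0]
confidence = float(proba.max())
```

Scores on synthetic clusters say nothing about the real dataset. The sketch only shows the structure of the comparison.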

Real-Time Gesture Prediction

confidence_distribution.png
per_class_f1_comparison.png

The real-time prediction system does not trigger an action from one single frame. It uses a confidence threshold, stable-frame check and debounce logic.

In my real-time script, an action is only triggered when the confidence is above 0.75, the same gesture is detected for 8 consecutive frames, and the debounce time has passed.

This makes the system more reliable because a single misclassified frame, for example while the hand moves from one gesture to another, cannot trigger an action on its own.
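The three conditions can be combined in a small state machine, sketched below. The 0.75 threshold and the 8 stable frames come from the description above. The debounce duration of 1.5 seconds, the class name `GestureTrigger` and the choice to reset the frame counter after firing are assumptions for this example.

```python
import time
from typing import Callable, Optional


class GestureTrigger:
    """Fires a gesture only when it is confident, stable and not too recent."""

    def __init__(
        self,
        min_confidence: float = 0.75,
        stable_frames: int = 8,
        debounce_seconds: float = 1.5,  # assumed value, not from the project
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.min_confidence = min_confidence
        self.stable_frames = stable_frames
        self.debounce_seconds = debounce_seconds
        self._clock = clock
        self._last_label: Optional[str] = None
        self._streak = 0
        self._last_fired: Optional[float] = None

    def update(self, label: Optional[str], confidence: float) -> Optional[str]:
        """Feed one frame's prediction; return the label if an action should fire."""
        # Low confidence or no hand breaks the streak.
        if label is None or confidence <= self.min_confidence:
            self._last_label = None
            self._streak = 0
            return None
        # Count consecutive frames showing the same gesture.
        if label == self._last_label:
            self._streak += 1
        else:
            self._last_label = label
            self._streak = 1
        if self._streak < self.stable_frames:
            return None
        # Debounce: do not fire again too soon after the previous action.
        now = self._clock()
        if self._last_fired is not None and now - self._last_fired < self.debounce_seconds:
            return None
        self._last_fired = now
        self._streak = 0  # assumption: a held gesture must become stable again
        return label


# Simulated frames with a fake clock (0.1 s per frame).
fake_time = [0.0]
trigger = GestureTrigger(clock=lambda: fake_time[0])
results = []
for _ in range(8):
    results.append(trigger.update("ASL_A", 0.9))
    fake_time[0] += 0.1
print(results[-1])  # ASL_A
```

Passing the clock as a parameter keeps the logic testable without waiting in real time.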

Gradio Dashboard

IMG_1834.jpg
Screenshot 2026-06-19 065947.png

The Gradio dashboard was used to explain and monitor the system. It contains pages for onboarding, dataset status, live operation and debugging.

The dashboard shows the system pipeline, gesture mapping, dataset validation, prediction status, confidence, system mode and logs.

A normal Gradio image stream caused flickering, so I used a separate MJPEG stream for the camera feed and kept Gradio for the dashboard controls and status information.
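For readers unfamiliar with MJPEG: the stream is an HTTP response of type `multipart/x-mixed-replace`, in which each part is one JPEG image that replaces the previous one in the browser. The sketch below only shows this framing. The boundary name `frame` and the function `mjpeg_parts` are assumptions for this example, and the web server that would send these bytes is not shown.

```python
from typing import Iterable, Iterator

BOUNDARY = "frame"  # assumed boundary name; any token not found in the data works
# Value for the HTTP Content-Type header of the streaming response.
MJPEG_CONTENT_TYPE = f"multipart/x-mixed-replace; boundary={BOUNDARY}"


def mjpeg_parts(jpeg_frames: Iterable[bytes]) -> Iterator[bytes]:
    """Wrap each encoded JPEG frame as one part of a multipart MJPEG stream."""
    for jpeg in jpeg_frames:
        yield (
            f"--{BOUNDARY}\r\n".encode("ascii")
            + b"Content-Type: image/jpeg\r\n"
            + f"Content-Length: {len(jpeg)}\r\n\r\n".encode("ascii")
            + jpeg
            + b"\r\n"
        )


# Fake byte strings stand in for real JPEG-encoded camera frames.
fake_frames = [b"\xff\xd8frame-one\xff\xd9", b"\xff\xd8frame-two\xff\xd9"]
parts = list(mjpeg_parts(fake_frames))
print(len(parts))  # 2
```

In a real setup the frames would come from the camera loop, for example by JPEG-encoding each frame before passing it to the generator.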

OLED, Keypad and Relay

IMG_1826.jpg

The OLED shows local system status. The keypad can be used for local control and mode switching. The relay is used only for safe ON/OFF control.

The relay does not control volume, mute or next station. Those actions are software events inside the Raspberry Pi internet radio.

This separation is important for safety. I do not claim that the relay physically controls every radio function. It is only used for ON/OFF switching.
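The split between the relay and the software actions can be sketched as a small state object. Only the gesture labels and their meanings come from this project. The action names, the `RadioState` class, the volume step, the station names, the mute toggle and the rule that software actions are ignored while the radio is off are assumptions for this example. No GPIO code is shown: in this sketch `powered` is the only field the relay would follow.

```python
from dataclasses import dataclass, field
from typing import List

# Gesture labels from the dataset mapped to hypothetical action names.
GESTURE_TO_ACTION = {
    "ASL_A": "radio_on",
    "ASL_B": "radio_off",
    "ASL_1": "volume_up",
    "ASL_V": "volume_down",
    "ASL_L": "next_station",
    "ASL_G": "mute",
}
KNOWN_ACTIONS = set(GESTURE_TO_ACTION.values())


@dataclass
class RadioState:
    # Placeholder station names, not real streams.
    stations: List[str] = field(
        default_factory=lambda: ["Station 1", "Station 2", "Station 3"]
    )
    powered: bool = False   # the only state the relay would reflect
    volume: int = 50        # software volume, 0 to 100
    muted: bool = False
    station_index: int = 0

    def apply(self, action: str, volume_step: int = 10) -> None:
        """Update the state for one action; everything except power is software-only."""
        if action not in KNOWN_ACTIONS:
            raise ValueError(f"unknown action: {action}")
        if action == "radio_on":
            self.powered = True
        elif action == "radio_off":
            self.powered = False
        elif not self.powered:
            return  # assumption: ignore software actions while the radio is off
        elif action == "volume_up":
            self.volume = min(100, self.volume + volume_step)
        elif action == "volume_down":
            self.volume = max(0, self.volume - volume_step)
        elif action == "next_station":
            self.station_index = (self.station_index + 1) % len(self.stations)
        elif action == "mute":
            self.muted = not self.muted  # assumption: mute acts as a toggle


radio = RadioState()
for gesture in ["ASL_A", "ASL_1", "ASL_L", "ASL_G"]:
    radio.apply(GESTURE_TO_ACTION[gesture])
print(radio)
```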

Enclosure and Final Assembly

IMG_1818.jpg

The enclosure was made from laser-cut MDF. The goal was to create a clean dock-style prototype where the Raspberry Pi, wiring and hardware modules could be placed inside.

The enclosure includes space for the camera, OLED, keypad, speaker, Raspberry Pi and cable routing. The top can be opened so the inside can still be reached for debugging and maintenance.

The enclosure also gives the prototype a more finished look for the poster fair and final demo.

Final Demo, Limitations and Improvements

IMG_1830.jpg

In the final demo, the system can detect hand gestures, predict the gesture, trigger radio actions, show feedback on Gradio and OLED, and log events to the database.

The model is not perfect in every condition. Bad lighting, fast hand movement, unclear hand position or a partially visible hand can reduce reliability.

The relay is only used for ON/OFF. Volume, mute and next station are software actions. In a future version, I would extend the dataset with more users, lighting conditions and hand positions. I would also test the full system earlier and more often during the project.

AI was used to help brainstorm, structure, debug and review parts of this project. The final project choices, testing, implementation, hardware work and submission were done and checked by me.