How to Train a Custom YOLO Computer Vision Model for Raspberry Pi (From Scratch)

by Evan Vedh in Circuits > Raspberry Pi


Computer Vision is one of the most exciting fields in AI, and with models like YOLO (You Only Look Once), you can build powerful real-time object detection systems—even on small devices like a Raspberry Pi.

In this blog, I'll walk you through the complete process of training your own custom YOLO model from scratch, from collecting and labeling images with Label Studio to deploying the trained model on a Raspberry Pi.

Supplies


Raspberry Pi 4 or 5 - for deployment

Raspberry Pi Camera Module 2 or 3 / USB Webcam - for running live inference on the objects around you

Collect Your Dataset


First, you need images of the objects you want your model to detect.

🔹 Tips for a good dataset:

  1. Capture images from different angles
  2. Use different lighting conditions
  3. Include multiple backgrounds
  4. Aim for 200–500 images per class (more is generally better)

You can:

  1. Capture images using your phone or Raspberry Pi camera (a minimal capture script is sketched after this list)
  2. Download datasets from websites (like Kaggle or Roboflow)
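If you'd rather script the capture, here is a minimal sketch using OpenCV that saves a frame each time you press 's'. The output folder name and the camera index 0 are assumptions; adjust them for your setup:

import cv2
import os

SAVE_DIR = "dataset_images"  # assumed output folder; rename as you like
os.makedirs(SAVE_DIR, exist_ok=True)

cap = cv2.VideoCapture(0)  # 0 = the default camera
count = 0

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    cv2.imshow("Press 's' to save, 'q' to quit", frame)
    key = cv2.waitKey(1) & 0xFF
    if key == ord('s'):
        cv2.imwrite(os.path.join(SAVE_DIR, f"img_{count:04d}.jpg"), frame)
        count += 1
    elif key == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()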

Install Label Studio


Create a virtual environment and activate it. Open Command Prompt (these commands assume Windows) and run:

python -m venv label-studio
label-studio\Scripts\activate
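
If you are on Linux or macOS instead of Windows, the activation command is:

source label-studio/bin/activate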

Run the following command:

pip install label-studio

Type label-studio to start the server:

label-studio

After that, Label Studio runs locally on your computer and automatically opens in your browser at http://localhost:8080.

Sign Up and Log In to Label Studio

  1. If you haven't created an account yet, click the blue Sign up link at the bottom (highlighted in red) to go to the registration page.
  2. You can enter a fake email and password, since this local setup doesn't require real authentication. Fill those in and click the Create Account button to get started.
  3. To log in, use the same fake email and password you signed up with; the credentials just need to match what you entered before. Click the Log in button.

Set Up Your Labeling Environment

  1. To get started, click the blue Create Project button in the center of the screen. This will allow you to name your project and begin setting up your labeling environment.
  2. Enter a name for your project in the Project Name field, such as Object Detection, and add an optional description if needed.
  3. Once finished, click the Data Import tab at the top to proceed to uploading your images.
  4. In the Data Import tab, click the Upload Files button or drag and drop your images into the central box (only up to 100 images at a time). Once your files appear in the list, proceed to the next step.
  5. Navigate to the Labeling Setup tab and select the Object Detection with Bounding Boxes template from the Computer Vision section. This setup is ideal for drawing rectangular boxes around specific items to train your detection models.
  6. To customize your labels, click the Trash Can icon next to the existing Airplane and Car entries to delete them.
  7. Type your desired labels into the Add label names text box, putting each one on a new line. Once you've listed them, click the Add button to move them into your project's active label list.
  8. Review your list in the Labels section to ensure all necessary classes are present and correctly spelled. Once you are satisfied with the configuration, click the blue Save button in the top right corner to initialize your project.

Start Labeling Your Dataset

  1. Click the Label All Tasks button (highlighted in pink) to begin annotating your images. This will open the labeling interface where you can start drawing bounding boxes for each uploaded task in sequence.
  2. Select the object name from the label list at the bottom that corresponds to the item you see in the image. Once selected, draw the bounding box around the object and click the blue Submit button to save your work.
  3. Click the Export button in the top right corner to download your completed annotations.
  4. Select the YOLO with Images option from the export list to download both your text annotations and the image files in a single package. This ensures your dataset is ready for training without needing to manually pair labels and photos later (the label file format is sketched after this list).
  5. Once you have selected your desired format, click the blue Export button at the bottom of the window. This will generate and download the dataset file directly to your computer.
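For reference, the YOLO export gives you one .txt file per image plus a classes.txt listing your label names. Each line in a label file describes one bounding box: a class index followed by the box's center coordinates, width, and height, all normalized to the 0-1 range. The numbers below are purely illustrative:

# <class_id> <x_center> <y_center> <width> <height>
0 0.512 0.430 0.210 0.185
1 0.250 0.700 0.120 0.140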

Train the Model


Once the dataset is downloaded and saved to your computer, you are ready for the final step. Follow the detailed instructions provided in the Google Colab Notebook to upload your files and begin training your custom model.

The notebook will guide you through connecting your environment, installing necessary dependencies, and starting the training process.
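
If you are curious what the notebook boils down to, the core training call with the Ultralytics library looks roughly like this; the base checkpoint, data.yaml path, and epoch count are illustrative, and your notebook may use different values:

from ultralytics import YOLO

# Start from a small pretrained checkpoint and fine-tune it on your dataset.
model = YOLO("yolov8n.pt")

# data.yaml points at your train/val image folders and class names.
model.train(data="data.yaml", epochs=100, imgsz=640)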

Setting Up the Virtual Environment on the Raspberry Pi


Open the terminal on your Raspberry Pi and run this command to refresh your package lists. This ensures that your system is aware of the latest available software versions before you proceed with any installations.

sudo apt update

Next, install the virtual environment package for Python. If the terminal reports that python3-venv is already the newest version, your environment is ready to go.

sudo apt install python3-venv -y

Create a working directory for the project:

mkdir yolo_project

Then move into it:

cd yolo_project

Create the virtual environment with this command:

python3 -m venv venv

Next, run this command. You will see (venv) appear at the beginning of your command prompt, which confirms that you are working inside the isolated environment.

source venv/bin/activate

Installing the YOLO Framework


Run this command to update the Python package installer (pip) itself to the latest version. It connects to the Python Package Index (PyPI), checks for a newer version, and installs it, replacing the older version.

pip install --upgrade pip

Run this command to install the core Deep Learning Engine required to run your model. Torch provides the mathematical "brain" for neural network calculations, while torchvision adds the specific tools needed to process and interpret image data from your camera.

pip install torch==2.3.0 torchvision==0.18.0 --index-url https://download.pytorch.org/whl/cpu
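
To confirm the CPU build installed correctly, you can run a quick sanity check; it should print the version number and False, since the Pi has no CUDA GPU:

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"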

Run this command to install the main YOLO framework, providing the simplified code needed to load and run your detection models. It also automatically installs OpenCV, which is the essential library for processing video frames and displaying the live camera feed on your screen.

pip install ultralytics

To ensure your Raspberry Pi can read and run the optimized model folder, you need to install NCNN support for the Ultralytics library.

Run this command in your terminal:

pip install ultralytics ncnn
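
If you only copied the trained .pt weights over from Colab, you can convert them to NCNN on the Pi itself. Here is a minimal sketch, assuming your weights file is named yolo_model.pt; by default, Ultralytics writes the result to a folder named after the weights (e.g. yolo_model_ncnn_model):

from ultralytics import YOLO

# Convert the trained PyTorch weights into an NCNN model folder.
# The exported folder is what you later load with YOLO(...) for inference.
YOLO("yolo_model.pt").export(format="ncnn")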

Configure the Interpreter

  1. The final step is to run the detection script. To start, open the Run menu in Thonny and select Configure interpreter.
  2. Click the three-dot button to open the file browser.
  3. Navigate to your project folder and select the Python executable located at:
/home/pi/yolo_project/venv/bin/python3

Select the file named python3 and click OK. This specifically tells Thonny to use the Python interpreter in your virtual environment, ensuring it has access to the YOLO and camera libraries you installed.

Running the AI Inference Model

  1. Save your model files directly in the /home/pi directory so your script can find them automatically without needing a long file path. This simple setup ensures your Python code connects to your YOLO model instantly when you click Run in Thonny.
  2. Copy this code into your editor and run it. This version is for a USB webcam.
import cv2
from ultralytics import YOLO
import time

# 1. Load your model (change 'yolo_model.pt' to 'yolo_ncnn_export' if using NCNN)
model = YOLO('yolo_model.pt')

# 2. Initialize the camera (0 is usually the default device)
cap = cv2.VideoCapture(0)

# Set resolution for better FPS (e.g., 640x480)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)

prev_time = time.time()

while cap.isOpened():
    success, frame = cap.read()
    if not success:
        break

    # 3. Run YOLO inference on the frame
    results = model(frame, stream=True)

    # 4. Calculate FPS from the time between loop iterations
    curr_time = time.time()
    fps = 1 / (curr_time - prev_time)
    prev_time = curr_time

    # 5. Visualize the results on the frame
    for r in results:
        annotated_frame = r.plot()

        # Add FPS text to the frame
        cv2.putText(annotated_frame, f"FPS: {fps:.2f}", (20, 50),
                    cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)

        # 6. Display the live feed
        cv2.imshow("YOLO Real-Time Detection", annotated_frame)

    # Press 'q' to quit the program
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()

If you are using a Pi Camera Module instead, use this version:

import cv2
from picamera2 import Picamera2
from ultralytics import YOLO

# 1. Set up the Picamera2
picam2 = Picamera2()
# Using 640x480 for better FPS performance on a Raspberry Pi 4
picam2.preview_configuration.main.size = (640, 480)
picam2.preview_configuration.main.format = "RGB888"
picam2.configure("preview")
picam2.start()

# 2. Load your model
# Switch to 'yolo_model.pt' if testing the standard format
model = YOLO('yolo_ncnn_export')

print("Press 'q' in the window to quit...")

try:
    while True:
        # 3. Capture a frame from the camera
        frame = picam2.capture_array()

        # 4. Run YOLO inference
        # stream=True helps manage memory better during live loops
        results = list(model(frame, stream=True))

        # 5. Plot results and calculate FPS
        annotated_frame = results[0].plot()

        # Calculate FPS from the inference speed (reported in milliseconds)
        inference_time = results[0].speed['inference']
        if inference_time > 0:
            fps = 1000 / inference_time

            # Draw FPS in the top-right corner
            cv2.putText(annotated_frame, f'FPS: {fps:.1f}', (480, 40),
                        cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)

        # 6. Display the resulting frame
        cv2.imshow("YOLO PiCamera2 Detection", annotated_frame)

        if cv2.waitKey(1) == ord("q"):
            break

finally:
    # 7. Clean up
    picam2.stop()
    cv2.destroyAllWindows()

Switching to the NCNN model provides a massive performance boost, jumping from roughly 0.75 FPS to nearly 3 FPS or higher, as seen in the screenshots. This happens because NCNN is designed specifically for low-power ARM CPUs like the one in your Raspberry Pi, whereas the original .pt model is built for heavy-duty desktop processors. NCNN can use lower-precision arithmetic to turn complex math into smaller, faster calculations, and it sheds the "extra weight" of the full PyTorch library. By focusing purely on inference, it lets your hardware process frames far more efficiently without needing a dedicated GPU.


Now that you have successfully trained and deployed your YOLO model, you have a powerful tool that can be adapted for almost any visual task. Since your current setup is already optimized for a Raspberry Pi using NCNN, you are in a great position to build portable, real-time AI gadgets.

Here are a few exciting directions you could take your project next:

  1. Automated Coin Counter: Expand your current project to not only detect coins but also calculate the total monetary value in real-time as they pass under the camera.
  2. Shelf Stock Monitor: Train a model to recognize specific household items (like milk or soda) and have the Pi send you a notification or update a digital shopping list when an item is missing.
  3. Package Delivery Guard: Set up your Pi near a window or door to detect when a delivery person leaves a package and trigger a sound or a mobile alert.
  4. Pet Activity Tracker: Monitor where your pets spend most of their time, or create a Smart Pet Door that unlocks only when the camera identifies your specific cat or dog.
  5. PPE Compliance: Train the model to detect safety gear such as helmets, masks, and high-visibility vests to ensure safety protocols are followed in a workspace.
  6. Defect Detection: If you are into 3D printing or DIY electronics, you can use YOLO to spot common printing errors (like "spaghetti" filament) or missing components on a circuit board.
  7. Parking Spot Monitor: Use the camera to look over a driveway or small lot to identify which spaces are occupied and which are free.
  8. Wildlife Feeder Cam: Create a Smart Bird Feeder that identifies different species of birds and logs how often they visit.
  9. Rep Counter: Use the "Pose" version of YOLO to track body movements and automatically count squats, push-ups, or bicep curls during your workout.
  10. Smart Trash Sorter: Train it to distinguish between Plastic, Paper, and Metal to help automate recycling at home.