AR_Glass
With the rapid growth of the OpenClaw open-source ecosystem and cloud-based large language models, the AIoT industry is entering an unprecedented era of opportunity. Our team has long been dedicated to ESP-series development, accumulating extensive experience in embedded image processing and edge computing through the ESP-Claw project. A natural question emerged: Could we integrate ESP-Claw's lightweight AI capabilities into AR glasses, transforming the glasses themselves into an intelligent perception terminal?
This is the origin of our project. Rather than accepting AR glasses as mere "passive displays," we are attempting to let the ESP-Claw controller directly drive the optical module, completing the full pipeline of image acquisition, AI inference, and information overlay on the glasses themselves. Cloud-based large models provide powerful backend support (such as remote model updates and complex task offloading), while ESP-Claw handles latency-critical perception tasks locally. Together, they create a truly intelligent pair of glasses for the AIoT era.
This project is a low-difficulty, easy-to-replicate, and highly practical open-source smart AR glasses build, with a total cost kept under 1,000 RMB. It is suitable for hobbyists and makers, and welcomes commercial secondary development.
This guide walks you through building a pair of multifunctional AR glasses capable of video playback, thermal fusion, night vision, gesture recognition, SLAM mapping, ESP-Claw AI inference, and cloud-based large model collaboration.
Supplies
OSAAR features a 1920×1080 OLED display. The upper section houses a detachable thermal imaging module, and beneath it lies an integrated infrared camera. Core capabilities include:
| Feature | Description |
| --- | --- |
| Video Playback | HDMI input, acts as a head-mounted display |
| Thermal Fusion | Detachable thermal module, supports real-time thermal overlay |
| Night Vision | Built-in IR camera (OV5647 IR), usable in low-light conditions |
| Gesture Recognition | Based on MediaPipe, extensible for interaction |
| SLAM Mapping | Supports Unity / AR application development |
| ESP-Claw AI Inference | Local lightweight AI models (edge detection, object recognition) |
| Cloud LLM Collaboration | Connects to the cloud via WiFi/BLE for complex task offloading and model updates |
Optical Lens Assembly
1.1 Optical Principles
The core of AR display is the semi-transparent reflective optical system:
- Light from the screen is collimated by a convex lens, making the eye perceive the image as being far away
- A semi-transparent reflective coating on the lens allows the virtual image to overlay with the real world
- The human eye ultimately sees both the real world and the augmented information simultaneously
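The collimation described above follows from the thin-lens equation: the closer the screen sits to the focal plane of the lens, the farther away the virtual image appears. The sketch below is only an illustration of that relationship. The function name and the 20 mm focal length used in the usage note are assumptions, not measurements of any particular optical module.

```c
#include <math.h>

/* Thin-lens estimate of how far away the virtual image appears.
 * focal_mm:  focal length of the collimating lens (assumed value)
 * screen_mm: distance between the screen and the lens
 * Returns the apparent image distance in mm, INFINITY when the screen
 * sits exactly at the focal plane (fully collimated), or NAN when the
 * screen is beyond the focal plane (real image, unusable for AR). */
double virtual_image_distance_mm(double focal_mm, double screen_mm)
{
    if (screen_mm > focal_mm) {
        return NAN;
    }
    if (screen_mm == focal_mm) {
        return INFINITY;
    }
    /* From 1/f = 1/d_o + 1/d_i, taking the magnitude of d_i. */
    return focal_mm * screen_mm / (focal_mm - screen_mm);
}
```

With an assumed 20 mm lens, a screen placed 19 mm away yields a virtual image about 380 mm from the lens, and moving it to 19.5 mm pushes the image out to about 780 mm. Half a millimetre of error therefore changes the focus noticeably.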
1.2 Sourcing and Modifying the Optical Module
- Search for "AR glasses optical engine" or "prism OLED" on second-hand marketplaces to purchase a used module
- Remove the original LCOS (no open-source driver available), keeping the lens and semi-transparent reflective coating
- Ensure the internal space of the module can accommodate the ECX335AF screen
1.3 Screen Installation
- Install the ECX335AF OLED screen into the lower part of the optical module, strictly aligning it with the lens imaging center
- Install the ECX335 driver board in the upper compartment
- Connect the FPC cable (handle with care to avoid excessive bending)
- Fix with AB glue, ensuring precise screen-to-lens distance (affects focus)
Warning: The FPC cable bend radius must not be less than 5mm, otherwise signal lines may break, causing display anomalies.
1.4 Expected Display Performance
- Resolution: 1920×1080
- In-eye brightness: ~300 nits
- FOV: ~30°-40°
- Weight: Optical module ~40g, total target <100g
ESP-Claw Board Preparation & Firmware Flashing
2.1 ESP-Claw Core Features
ESP-Claw is the key to this project. Its core advantages include:
- Dual-core processor + image processing accelerator — Supports local AI inference
- MIPI/DVP camera interface — Directly connects to camera modules
- WiFi 6 + BLE 5.3 — Seamless access to cloud large models
- LCD/MIPI display interface — Can drive display panels
- Low-power design — Suitable for battery-powered applications
- Mature ESP ecosystem — Arduino/ESP-IDF development with abundant resources
2.2 Development Environment Setup
Arduino IDE Method (Recommended for Beginners):
- Install Arduino IDE 2.0+
- Add ESP32 board URL: https://espressif.github.io/arduino-esp32/package_esp32_dev_index.json
- Search and install "ESP32" support package in the Board Manager
- Select the correct board model and port
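ESP-IDF Method (Recommended for Advanced Users): install ESP-IDF following Espressif's official Get Started guide, then build the project with idf.py build and flash it with idf.py -p <PORT> flash monitor.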
2.3 Firmware Flashing
- Connect the ESP-Claw dev board to your computer (USB-UART)
- Open a sample program to verify camera and display functions
- Compile and upload
- Open the Serial Monitor and confirm successful initialization
Tip: First-time flashing may require holding the BOOT button while pressing RESET to enter download mode. Some boards have integrated auto-download circuits and do not require manual operation.
2.4 Hardware Connection Overview
Option A (ESP-Claw Standalone Mode) Connection:
Option B (ESP-Claw + Cloud Collaboration) Connection:
2.5 MIPI-to-HDMI Module Explanation
The MIPI-to-HDMI module (e.g., LT9611, IT6161, etc.) is responsible for converting the ESP-Claw's MIPI DSI signal into an HDMI signal for the ECX335 driver board. This is a critical link in the display chain; ensure the module is compatible with the ESP-Claw MIPI interface.
2.6 Power Supply Methods
- Li-Po Battery Power: A 3.7V Li-Po battery feeds a TP4056 charge/discharge management module, and its output is converted to 5V and 3.3V to supply the ESP-Claw, camera, and driver board
- Type-C Charging: Charge via Type-C port, which can also serve as a debug interface
- External Power: Use a power bank or charger for development and debugging
Note: The bottom-right corner connector for OLED refers to the ECX335 driver board. A standard HDMI to Micro-HDMI cable is sufficient.
Image Processing Algorithm Deployment
3.1 Preparation Tools
- USB Data Cable (USB-UART) ×1
- ESP-Claw Dev Board ×1
- Computer (with Arduino IDE or ESP-IDF installed)
3.2 Development Environment Configuration
Arduino IDE Method:
- Install Arduino IDE 2.0+
- Add ESP32 board URL: https://espressif.github.io/arduino-esp32/package_esp32_dev_index.json
- Search and install "ESP32" support package in the Board Manager
- Select board model "ESP32S3 Dev Module" (or corresponding ESP-Claw model)
- Select the correct COM port
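ESP-IDF Method: build with idf.py build, then flash and open the serial console with idf.py -p <PORT> flash monitor.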
3.3 Firmware Flashing Steps
- Connect the ESP-Claw dev board to your computer (USB-UART)
- Open a sample program to verify camera and display functions
- Compile and upload
- Open the Serial Monitor (baud rate 115200) and confirm successful camera initialization and capture information
Tip: First-time flashing may require holding the BOOT button while pressing RESET to enter download mode. Some boards have integrated auto-download circuits.
3.4 First Boot Verification
After flashing, power on the ESP-Claw and connect the ECX335 driver board via HDMI. You should see:
- Camera passthrough display ✓
- Basic system functions ✓
- Normal serial output ✓
If the display is normal, both the optical and ESP-Claw circuit sections are working correctly, and you can proceed to algorithm deployment.
Camera & Sensor Integration
4.1 MIPI/DVP Camera Integration (ESP-Claw)
ESP-Claw supports both MIPI CSI and DVP parallel interfaces. For the OV5647, the MIPI CSI connection is recommended.
After integration, you can perform:
- Local AI inference development (based on ESP-WHO framework)
- Real-time video acquisition and processing
- Cloud large model collaborative interaction
4.2 Image Processing Algorithm Deployment (Core Function)
ESP-Claw runs image processing algorithms locally, outputting results to the display:
Edge Detection Mode:
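A minimal sketch of what an edge detection pass can look like, assuming the camera frame has already been converted to an 8-bit grayscale buffer. It uses the Sobel operator with an L1 approximation of the gradient magnitude. The function name `sobel_edges` and the buffer layout are illustrative assumptions rather than part of any ESP-Claw API.

```c
#include <stdint.h>
#include <stdlib.h>

/* Sobel edge detection on an 8-bit grayscale frame.
 * src and dst are separate width*height buffers.
 * Border pixels are written as 0 because the 3x3 kernel does not fit there. */
void sobel_edges(const uint8_t *src, uint8_t *dst, int width, int height)
{
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            if (x == 0 || y == 0 || x == width - 1 || y == height - 1) {
                dst[y * width + x] = 0;
                continue;
            }
            const uint8_t *p = src + y * width + x;
            /* Horizontal gradient: right column minus left column. */
            int gx = -p[-width - 1] + p[-width + 1]
                     - 2 * p[-1]    + 2 * p[1]
                     - p[width - 1] + p[width + 1];
            /* Vertical gradient: bottom row minus top row. */
            int gy = -p[-width - 1] - 2 * p[-width] - p[-width + 1]
                     + p[width - 1] + 2 * p[width]  + p[width + 1];
            /* |gx| + |gy| avoids a square root on the microcontroller. */
            int mag = abs(gx) + abs(gy);
            dst[y * width + x] = (uint8_t)(mag > 255 ? 255 : mag);
        }
    }
}
```

The output buffer can be shown directly as a white-on-black contour image, or thresholded and drawn over the passthrough frame.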
Object Recognition Mode (Using ESP-WHO):
4.3 Cloud Large Model Collaboration (Option B Extension)
ESP-Claw connects to cloud large models via WiFi, achieving "edge-cloud collaboration":
4.4 Thermal Module (Optional Extension)
Integrate MLX90640 thermal module (I2C interface):
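Once temperatures have been read from the sensor, they need to be mapped to display intensities before they can be overlaid. The sketch below shows one simple way to do that with min-max normalisation over the MLX90640's 32×24 array. Reading the sensor over I2C is not shown, and the function name `thermal_to_gray` is a hypothetical helper, not part of a sensor library.

```c
#include <stddef.h>
#include <stdint.h>

#define THERMAL_PIXELS (32 * 24) /* MLX90640 array size */

/* Map temperatures (degrees Celsius) to 0..255 intensities.
 * The coldest pixel in the frame becomes 0 and the hottest becomes 255.
 * A frame with no temperature spread maps to all zeros. */
void thermal_to_gray(const float *temps_c, uint8_t *gray, size_t count)
{
    if (count == 0) {
        return;
    }
    float lo = temps_c[0];
    float hi = temps_c[0];
    for (size_t i = 1; i < count; i++) {
        if (temps_c[i] < lo) lo = temps_c[i];
        if (temps_c[i] > hi) hi = temps_c[i];
    }
    float span = hi - lo;
    for (size_t i = 0; i < count; i++) {
        gray[i] = span > 0.0f
            ? (uint8_t)((temps_c[i] - lo) * 255.0f / span + 0.5f)
            : 0;
    }
}
```

A typical call passes `THERMAL_PIXELS` as `count`. The resulting intensities can then be run through a colour palette and scaled up to the camera resolution for fusion.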
Display Output & AR Frame Composition
5.1 Display Chain
ESP-Claw display output path:
5.2 MIPI-to-HDMI Module Configuration
The MIPI-to-HDMI module (e.g., LT9611, IT6161, etc.) converts the MIPI DSI signal from ESP-Claw into an HDMI signal:
5.3 Frame Buffer Overlay Rendering
Overlay processed images + AR information for output:
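One way to sketch the overlay step is per-pixel alpha blending of an AR layer onto the camera frame. The example assumes both buffers are RGB565 (a common format for embedded displays) and that one colour value in the overlay is reserved as "transparent". Both are assumptions for illustration, and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Blend one RGB565 overlay pixel over a background pixel.
 * alpha: 0 keeps the background, 255 keeps the overlay. */
uint16_t blend_rgb565(uint16_t bg, uint16_t fg, uint8_t alpha)
{
    uint32_t inv = 255u - alpha;
    uint32_t r = (((fg >> 11) & 0x1Fu) * alpha + ((bg >> 11) & 0x1Fu) * inv) / 255u;
    uint32_t g = (((fg >> 5) & 0x3Fu) * alpha + ((bg >> 5) & 0x3Fu) * inv) / 255u;
    uint32_t b = ((fg & 0x1Fu) * alpha + (bg & 0x1Fu) * inv) / 255u;
    return (uint16_t)((r << 11) | (g << 5) | b);
}

/* Composite the overlay onto the frame in place.
 * Overlay pixels equal to `key` are treated as transparent and skipped. */
void compose_overlay(uint16_t *frame, const uint16_t *overlay,
                     size_t pixels, uint16_t key, uint8_t alpha)
{
    for (size_t i = 0; i < pixels; i++) {
        if (overlay[i] != key) {
            frame[i] = blend_rgb565(frame[i], overlay[i], alpha);
        }
    }
}
```

Skipping the key colour keeps the cost proportional to the amount of AR content drawn, which matters when most of the overlay is empty.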
5.4 Display Timing Optimization
Target frame rate: ≥15 fps is acceptable; 30 fps is ideal
Optimization strategies:
- Camera acquisition resolution: VGA (640x480) or lower
- Separate processing resolution from display resolution: process small image first, then upscale
- Use double buffering to avoid tearing
- Optimize key algorithms with ESP-Claw hardware accelerator
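The "process small, then upscale" strategy can be sketched with a nearest-neighbour upscaler, together with the per-frame time budget that the target frame rate implies. This is a plain C illustration on grayscale buffers. The function names are assumptions, and a real build might use a hardware scaler instead.

```c
#include <stdint.h>

/* Time available per frame, in microseconds, for a given frame rate. */
uint32_t frame_budget_us(uint32_t fps)
{
    return fps ? 1000000u / fps : 0u;
}

/* Nearest-neighbour upscale by an integer factor.
 * src is sw*sh pixels; dst must hold (sw*factor)*(sh*factor) pixels. */
void upscale_nearest(const uint8_t *src, int sw, int sh,
                     uint8_t *dst, int factor)
{
    int dw = sw * factor;
    int dh = sh * factor;
    for (int y = 0; y < dh; y++) {
        for (int x = 0; x < dw; x++) {
            /* Each source pixel is repeated factor*factor times. */
            dst[y * dw + x] = src[(y / factor) * sw + (x / factor)];
        }
    }
}
```

At 15 fps the whole capture, process, and display loop has roughly 66 ms per frame. At 30 fps that drops to about 33 ms, which is why processing a smaller image first is usually worth the loss of detail.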
Enclosure Assembly & System Integration
6.1 Circuit Layout
Complete system circuit diagram:
6.2 Assembly Steps
- Fix Optical Module: Secure the assembled ECX335 + lens module to the front of the glasses frame
- Install ESP-Claw: Mount the dev board on the temple or top beam, ensuring good heat dissipation
- Connect Camera: Connect OV5647 to ESP-Claw's MIPI interface via FPC cable
- Connect Display Chain: ESP-Claw MIPI → MIPI-to-HDMI Module → HDMI Cable → ECX335 Driver Board
- Connect Power: Li-Po battery connects to ESP-Claw's 5V/3.3V input via charge/discharge management module
- Install Buttons: Mount tactile push buttons on the outer side of the temple for mode switching
- Organize Cables: Secure cables with hot glue or zip ties to prevent movement
6.3 Power Supply Design
Warning: Li-Po batteries must have a protection circuit to prevent over-discharge/overcharge/short circuit. Do not use the device while charging.
6.4 Heat Dissipation Considerations
ESP-Claw generates some heat when running image processing:
- Attach a small heatsink to the ESP-Claw chip
- Add ventilation holes to the temple or top enclosure
- Avoid continuous high-load operation for extended periods
Functional Testing & Optimization
7.1 Power-On Test
- Insert battery and press the power button
- LED indicators light up (power LED steady, running LED flashing)
- Wait 3-5 seconds for initialization
- Display should show camera feed (passthrough mode)
7.2 Mode Switching Test
Press the temple button to switch modes and verify:
| Mode | Expected Effect | Verification Method |
| --- | --- | --- |
| Passthrough | Clear real-world view | Visual inspection |
| Edge Detection | Highlighted contours | Visual inspection |
| Face Detection | Green box around face | Test facing a person |
| Night Vision | Brightened dark scene | Test with lights off |
| Thermal Fusion | Color temperature overlay | Test with MLX90640 |
| Cloud Collaboration | AI analysis text overlay | Test with WiFi connected |
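Mode switching from a single tactile button can be sketched as a debounced press detector feeding a mode counter that wraps around. The enum values mirror the modes listed above. The type and function names, and the 30 ms debounce window, are assumptions chosen for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEBOUNCE_MS 30u /* assumed debounce window */

typedef enum {
    MODE_PASSTHROUGH = 0,
    MODE_EDGE_DETECTION,
    MODE_FACE_DETECTION,
    MODE_NIGHT_VISION,
    MODE_THERMAL_FUSION,
    MODE_CLOUD,
    MODE_COUNT
} ar_mode_t;

typedef struct {
    bool stable;         /* debounced state, true while pressed */
    bool last_raw;       /* most recent raw reading */
    uint32_t changed_ms; /* time of the last raw change */
} button_t;

/* Advance to the next mode, wrapping back to passthrough. */
ar_mode_t next_mode(ar_mode_t current)
{
    return (ar_mode_t)(((int)current + 1) % MODE_COUNT);
}

/* Returns true exactly once per debounced press.
 * raw is true while the button reads as held down. */
bool button_pressed(button_t *b, bool raw, uint32_t now_ms)
{
    if (raw != b->last_raw) {
        b->last_raw = raw;
        b->changed_ms = now_ms;
    }
    if (raw != b->stable && now_ms - b->changed_ms >= DEBOUNCE_MS) {
        b->stable = raw;
        return raw; /* report the press edge only, not the release */
    }
    return false;
}
```

In the main loop, the firmware would poll the button pin, pass the reading and a millisecond tick to `button_pressed`, and call `next_mode` whenever it returns true.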
7.3 Cloud Collaboration Test (Option B)
- Configure WiFi connection
- Set cloud API key
- Switch to cloud collaboration mode
- Check whether AI analysis results (e.g., object recognition, scene description) are overlaid on the display
7.4 Performance Optimization
If frame rate is too low, try the following optimizations:
- Reduce processing resolution: Camera acquires 640x480, process scaled to 320x240
- Reduce frame rate: Set camera to 15fps instead of 30fps
- Simplify algorithm: Use lighter edge detection operators (e.g., Roberts instead of Sobel)
- Disable WiFi/BLE: Turn off if wireless is not needed to save CPU
- Overclock: Boost ESP-Claw frequency from 240MHz to 280MHz (test stability)
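To illustrate the "lighter operator" suggestion, here is a minimal sketch of the Roberts cross operator on an 8-bit grayscale buffer. It reads four pixels per output instead of the eight neighbours a 3×3 Sobel kernel touches, at the cost of more noise sensitivity. The function name and buffer layout are assumptions.

```c
#include <stdint.h>
#include <stdlib.h>

/* Roberts cross edge detection on an 8-bit grayscale frame.
 * src and dst are separate width*height buffers.
 * The last row and column are written as 0 because the 2x2 kernel
 * does not fit there. */
void roberts_edges(const uint8_t *src, uint8_t *dst, int width, int height)
{
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            if (x == width - 1 || y == height - 1) {
                dst[y * width + x] = 0;
                continue;
            }
            const uint8_t *p = src + y * width + x;
            int g1 = p[0] - p[width + 1]; /* main diagonal difference */
            int g2 = p[1] - p[width];     /* anti-diagonal difference */
            int mag = abs(g1) + abs(g2);
            dst[y * width + x] = (uint8_t)(mag > 255 ? 255 : mag);
        }
    }
}
```

Combined with processing at 320x240 instead of 640x480, which cuts the pixel count from 307,200 to 76,800, this reduces per-frame work substantially. Actual frame-rate gains should be measured on the board.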
7.5 Battery Life Test
- 2000mAh battery + full function operation: ~1.5-2 hours
- WiFi/BLE off, reduced brightness: ~2.5-3 hours
- Passthrough only (no processing): ~3-4 hours
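The runtimes above can be cross-checked with simple capacity arithmetic. The sketch below computes the average current implied by a measured runtime and, in reverse, the runtime expected for a given average current. It ignores converter losses and battery derating, so treat the results as rough estimates. The function names are illustrative.

```c
/* Average current (mA) implied by running a battery of the given
 * capacity (mAh) for the given time (hours). */
double implied_current_ma(double capacity_mah, double runtime_h)
{
    return capacity_mah / runtime_h;
}

/* Runtime (hours) expected from a battery of the given capacity (mAh)
 * at a constant average current (mA). */
double estimated_runtime_h(double capacity_mah, double current_ma)
{
    return capacity_mah / current_ma;
}
```

For the 2000mAh battery, a 2-hour full-function runtime implies an average draw of about 1000 mA, and a 4-hour passthrough runtime implies about 500 mA. Measuring the actual current with a USB power meter is a quick way to validate these numbers for a specific build.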