AR_Glass

by ClumsyCalendar in Design > Digital Graphics

demo.jpeg

With the rapid growth of the OpenClaw open-source ecosystem and cloud-based large language models, the AIoT industry is entering an unprecedented era of opportunity. Our team has long been dedicated to ESP-series development, accumulating extensive experience in embedded image processing and edge computing through the ESP-Claw project. A natural question emerged: Could we integrate ESP-Claw's lightweight AI capabilities into AR glasses, transforming the glasses themselves into an intelligent perception terminal?

This is the origin of our project. Rather than accepting AR glasses as mere "passive displays," we are attempting to let the ESP-Claw controller directly drive the optical module, completing the full pipeline of image acquisition, AI inference, and information overlay on the glasses themselves. Cloud-based large models provide powerful backend support (such as remote model updates and complex task offloading), while ESP-Claw handles latency-critical perception tasks locally. Together, they create a truly intelligent pair of glasses for the AIoT era.

This project is a low-difficulty, easy-to-replicate, and highly practical open-source smart AR glasses build, with a total cost kept under 1,000 RMB. It is suitable for hobbyists and makers, and welcomes commercial secondary development.

This guide walks you through building a pair of multifunctional AR glasses capable of video playback, thermal fusion, night vision, gesture recognition, SLAM mapping, ESP-Claw AI inference, and cloud-based large model collaboration.

Supplies

routine.jpeg
Structure.png

OSAAR features a 1920×1080 OLED display. The upper section houses a detachable thermal imaging module, and beneath it lies an integrated infrared camera. Core capabilities include:

| Feature | Description |
| --- | --- |
| Video Playback | HDMI input, acts as a head-mounted display |
| Thermal Fusion | Detachable thermal module, supports real-time thermal overlay |
| Night Vision | Built-in IR camera (OV5647 IR), usable in low-light conditions |
| Gesture Recognition | Based on MediaPipe, extensible for interaction |
| SLAM Mapping | Supports Unity / AR application development |
| ESP-Claw AI Inference | Local lightweight AI models (edge detection, object recognition) |
| Cloud LLM Collaboration | Connects to the cloud via WiFi/BLE for complex task offloading and model updates |

Optical Lens Assembly

routine.jpeg

1.1 Optical Principles

The core of AR display is the semi-transparent reflective optical system:

  1. Light from the screen is collimated by a convex lens, making the eye perceive the image as being far away
  2. A semi-transparent reflective coating on the lens allows the virtual image to overlay with the real world
  3. The human eye ultimately sees both the real world and the augmented information simultaneously
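
The collimation described in step 1 follows from the thin-lens equation, 1/f = 1/d_o + 1/d_i. The sketch below is only an illustration under the assumption of an ideal thin lens; the function name and the example numbers are hypothetical and are not measurements of this optical module.

```cpp
#include <cmath>
#include <limits>

// Thin-lens equation: 1/f = 1/d_o + 1/d_i.
// Returns the image distance in mm for a screen placed screen_mm from the lens.
// A negative result is a virtual image on the screen side of the lens, which is
// what the eye looks at. When the screen sits exactly at the focal plane the
// light is fully collimated and the image distance is infinite.
double image_distance_mm(double focal_mm, double screen_mm) {
    double inv = 1.0 / focal_mm - 1.0 / screen_mm;
    if (std::fabs(inv) < 1e-12) {
        return std::numeric_limits<double>::infinity();
    }
    return 1.0 / inv;
}
```

With an assumed 20 mm focal length, a screen at 19 mm gives a virtual image about 380 mm away, while moving it just 1 mm further to 20 mm pushes the image to infinity. That sensitivity is why the screen-to-lens distance has to be set carefully.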

1.2 Sourcing and Modifying the Optical Module

  1. Search for "AR glasses optical engine" or "prism OLED" on second-hand marketplaces to purchase a used module
  2. Remove the original LCOS (no open-source driver available), keeping the lens and semi-transparent reflective coating
  3. Ensure the internal space of the module can accommodate the ECX335AF screen

1.3 Screen Installation

  1. Install the ECX335AF OLED screen into the lower part of the optical module, strictly aligning it with the lens imaging center
  2. Install the ECX335 driver board in the upper compartment
  3. Connect the FPC cable (handle with care to avoid excessive bending)
  4. Fix with AB glue, ensuring precise screen-to-lens distance (affects focus)
Warning: The FPC cable bend radius must not be less than 5mm, otherwise signal lines may break, causing display anomalies.

1.4 Expected Display Performance

  1. Resolution: 1920×1080
  2. In-eye brightness: ~300 nit
  3. FOV: ~30°-40°
  4. Weight: Optical module ~40g, total target <100g


ESP-Claw Board Preparation & Firmware Flashing

2.1 ESP-Claw Core Features

ESP-Claw is the key to this project. Its core advantages include:

  1. Dual-core processor + image processing accelerator — Supports local AI inference
  2. MIPI/DVP camera interface — Directly connects to camera modules
  3. WiFi 6 + BLE 5.3 — Seamless access to cloud large models
  4. LCD/MIPI display interface — Can drive display panels
  5. Low-power design — Suitable for battery-powered applications
  6. Mature ESP ecosystem — Arduino/ESP-IDF development with abundant resources

2.2 Development Environment Setup

Arduino IDE Method (Recommended for Beginners):

  1. Install Arduino IDE 2.0+
  2. Add ESP32 board URL: https://espressif.github.io/arduino-esp32/package_esp32_dev_index.json
  3. Search and install "ESP32" support package in the Board Manager
  4. Select the correct board model and port

ESP-IDF Method (Recommended for Advanced Users):

git clone --recursive https://github.com/espressif/esp-idf.git
cd esp-idf
./install.sh esp32s3
. ./export.sh

2.3 Firmware Flashing

  1. Connect the ESP-Claw dev board to your computer (USB-UART)
  2. Open a sample program to verify camera and display functions
  3. Compile and upload
// ESP-Claw key configuration example
#define CAMERA_MODEL_ESP_CLAW
#define PWDN_GPIO_NUM -1
#define RESET_GPIO_NUM -1
#define XCLK_GPIO_NUM 15
#define SIOD_GPIO_NUM 4 // SDA
#define SIOC_GPIO_NUM 5 // SCL
// ... configure pins according to specific board model
  4. Open the Serial Monitor and confirm successful initialization
Tip: First-time flashing may require holding the BOOT button while pressing RESET to enter download mode. Some boards have integrated auto-download circuits and do not require manual operation.

2.4 Hardware Connection Overview

Option A (ESP-Claw Standalone Mode) Connection:

[AR Glasses End] [ESP-Claw] [Power]
MIPI Camera ────→ MIPI/DVP Interface
HDMI Display ←──── MIPI-to-HDMI Module
FPC Screen Cable ──→ Driver Board ←── Micro-HDMI Cable
3.7V Li-Po + Charge/Discharge Management

Option B (ESP-Claw + Cloud Collaboration) Connection:

[AR Glasses End] [ESP-Claw] [Cloud/Phone]
MIPI Camera ────→ Local Preprocessing
HDMI Display ←──── Overlay Rendering ←─── Large Model Inference Results
WiFi/BLE ─────→ Cloud API

2.5 MIPI-to-HDMI Module Explanation

The MIPI-to-HDMI module (e.g., LT9611, IT6161, etc.) is responsible for converting the ESP-Claw's MIPI DSI signal into an HDMI signal for the ECX335 driver board. This is a critical link in the display chain; ensure the module is compatible with the ESP-Claw MIPI interface.

2.6 Power Supply Methods

  1. Li-Po Battery Power: 3.7V Li-Po battery passes through a TP4056 charge/discharge management module, boosted to 5V/3.3V to supply ESP-Claw, camera, and driver board
  2. Type-C Charging: Charge via Type-C port, which can also serve as a debug interface
  3. External Power: Use a power bank or charger for development and debugging
Note: The bottom-right corner connector for OLED refers to the ECX335 driver board. A standard HDMI to Micro-HDMI cable is sufficient.


Image Processing Algorithm Deployment

3.1 Preparation Tools

  1. USB Data Cable (USB-UART) ×1
  2. ESP-Claw Dev Board ×1
  3. Computer (with Arduino IDE or ESP-IDF installed)

3.2 Development Environment Configuration

Arduino IDE Method:

  1. Install Arduino IDE 2.0+
  2. Add ESP32 board URL: https://espressif.github.io/arduino-esp32/package_esp32_dev_index.json
  3. Search and install "ESP32" support package in the Board Manager
  4. Select board model "ESP32S3 Dev Module" (or corresponding ESP-Claw model)
  5. Select the correct COM port

ESP-IDF Method:

git clone --recursive https://github.com/espressif/esp-idf.git
cd esp-idf
./install.sh esp32s3
. ./export.sh

3.3 Firmware Flashing Steps

  1. Connect the ESP-Claw dev board to your computer (USB-UART)
  2. Open a sample program to verify camera and display functions
  3. Compile and upload
// ESP-Claw camera initialization configuration example
#include "esp_camera.h"

#define CAMERA_MODEL_ESP_CLAW
#define PWDN_GPIO_NUM -1   // power-down pin not connected
#define RESET_GPIO_NUM -1  // reset pin not connected
#define XCLK_GPIO_NUM 15
#define SIOD_GPIO_NUM 4 // SDA
#define SIOC_GPIO_NUM 5 // SCL
#define Y9_GPIO_NUM 16
#define Y8_GPIO_NUM 17
#define Y7_GPIO_NUM 18
#define Y6_GPIO_NUM 12
#define Y5_GPIO_NUM 10
#define Y4_GPIO_NUM 8
#define Y3_GPIO_NUM 9
#define Y2_GPIO_NUM 11
#define VSYNC_GPIO_NUM 6
#define HREF_GPIO_NUM 7
#define PCLK_GPIO_NUM 13

void setup() {
    Serial.begin(115200);
    // Zero-initialize so fields that are not set below do not hold garbage
    camera_config_t config = {};
    config.ledc_channel = LEDC_CHANNEL_0;
    config.ledc_timer = LEDC_TIMER_0;
    config.pin_pwdn = PWDN_GPIO_NUM;
    config.pin_reset = RESET_GPIO_NUM;
    config.pin_xclk = XCLK_GPIO_NUM;
    config.pin_sccb_sda = SIOD_GPIO_NUM;
    config.pin_sccb_scl = SIOC_GPIO_NUM;
    config.pin_d7 = Y9_GPIO_NUM;
    config.pin_d6 = Y8_GPIO_NUM;
    config.pin_d5 = Y7_GPIO_NUM;
    config.pin_d4 = Y6_GPIO_NUM;
    config.pin_d3 = Y5_GPIO_NUM;
    config.pin_d2 = Y4_GPIO_NUM;
    config.pin_d1 = Y3_GPIO_NUM;
    config.pin_d0 = Y2_GPIO_NUM;
    config.pin_vsync = VSYNC_GPIO_NUM;
    config.pin_href = HREF_GPIO_NUM;
    config.pin_pclk = PCLK_GPIO_NUM;
    config.xclk_freq_hz = 20000000;       // 20 MHz sensor clock
    config.pixel_format = PIXFORMAT_JPEG;
    config.frame_size = FRAMESIZE_VGA;    // 640x480
    config.jpeg_quality = 12;             // lower number = higher quality
    config.fb_count = 1;
    esp_err_t err = esp_camera_init(&config);
    if (err != ESP_OK) {
        Serial.printf("Camera init failed: 0x%x\n", err);
        return;
    }
    Serial.println("Camera initialized!");
}

void loop() {
    // Grab one frame, report its size, then hand the buffer back to the driver
    camera_fb_t *fb = esp_camera_fb_get();
    if (!fb) {
        Serial.println("Capture failed");
        return;
    }
    Serial.printf("Captured: %ux%u, %u bytes\n",
                  (unsigned)fb->width, (unsigned)fb->height, (unsigned)fb->len);
    esp_camera_fb_return(fb);
    delay(1000);
}
  4. Open the Serial Monitor (baud rate 115200) and confirm that the camera initializes and reports capture information
Tip: First-time flashing may require holding the BOOT button while pressing RESET to enter download mode. Some boards have integrated auto-download circuits.

3.4 First Boot Verification

After flashing, power on the ESP-Claw and connect the ECX335 driver board via HDMI. You should see:

  1. Camera passthrough display ✓
  2. Basic system functions ✓
  3. Normal serial output ✓

If the display is normal, both the optical and ESP-Claw circuit sections are working correctly, and you can proceed to algorithm deployment.

Camera & Sensor Integration

4.1 MIPI/DVP Camera Integration (ESP-Claw)

ESP-Claw supports both MIPI CSI and DVP parallel interfaces. For the OV5647, the MIPI connection is recommended:

OV5647 Camera ESP-Claw
┌──────────┐ ┌──────────┐
│ VCC 3.3V│←──────→│ 3.3V │
│ GND │←──────→│ GND │
│ MIPI_D0+│←──────→│ MIPI_D0+│
│ MIPI_D0-│←──────→│ MIPI_D0-│
│ MIPI_CLK+│←──────→│ MIPI_CLK+│
│ MIPI_CLK-│←──────→│ MIPI_CLK-│
│ I2C_SDA │←──────→│ GPIO_4 │
│ I2C_SCL │←──────→│ GPIO_5 │
└──────────┘ └──────────┘

After integration, you can perform:

  1. Local AI inference development (based on ESP-WHO framework)
  2. Real-time video acquisition and processing
  3. Cloud large model collaborative interaction

4.2 Image Processing Algorithm Deployment (Core Function)

ESP-Claw runs image processing algorithms locally, outputting results to the display:

Edge Detection Mode:

// Simple Sobel edge detection on an 8-bit grayscale image (one byte per pixel).
// Border pixels of dst are not written. abs() requires <stdlib.h>.
void sobel_edge_detect(uint8_t* src, uint8_t* dst, int width, int height) {
    int gx, gy;
    for (int y = 1; y < height - 1; y++) {
        for (int x = 1; x < width - 1; x++) {
            // Horizontal gradient: right column minus left column
            gx = (-1 * src[(y-1)*width + (x-1)]) + (-2 * src[y*width + (x-1)]) + (-1 * src[(y+1)*width + (x-1)]) +
                 (1 * src[(y-1)*width + (x+1)]) + (2 * src[y*width + (x+1)]) + (1 * src[(y+1)*width + (x+1)]);
            // Vertical gradient: bottom row minus top row
            gy = (-1 * src[(y-1)*width + (x-1)]) + (-2 * src[(y-1)*width + x]) + (-1 * src[(y-1)*width + (x+1)]) +
                 (1 * src[(y+1)*width + (x-1)]) + (2 * src[(y+1)*width + x]) + (1 * src[(y+1)*width + (x+1)]);
            // Threshold the L1 gradient magnitude into a binary edge map
            dst[y*width + x] = (abs(gx) + abs(gy)) > 128 ? 255 : 0;
        }
    }
}
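
The Sobel routine reads one byte per pixel, so its input has to be 8-bit grayscale. If the frame reaches this stage as decoded RGB instead, it needs converting first. The following is a minimal sketch under the assumption of packed RGB888 data (3 bytes per pixel); `luma` and `rgb888_to_gray` are illustrative names, not part of any ESP library.

```cpp
#include <cstddef>
#include <cstdint>

// Integer luma approximation: weights 77/150/29 out of 256,
// close to the usual 0.299/0.587/0.114 split.
inline uint8_t luma(uint8_t r, uint8_t g, uint8_t b) {
    return static_cast<uint8_t>((77 * r + 150 * g + 29 * b) >> 8);
}

// Converts packed RGB888 (assumed layout: R, G, B per pixel) to 8-bit grayscale.
void rgb888_to_gray(const uint8_t* rgb, uint8_t* gray, size_t pixel_count) {
    for (size_t i = 0; i < pixel_count; i++) {
        gray[i] = luma(rgb[3 * i], rgb[3 * i + 1], rgb[3 * i + 2]);
    }
}
```

Because the weights sum to 256, pure white maps to 255 and the conversion needs no floating point, which keeps it cheap on a microcontroller.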

Object Recognition Mode (Using ESP-WHO):

#include "dl_image.hpp"
#include "fb_gfx.h"

void face_detection_task() {
    camera_fb_t *fb = esp_camera_fb_get();
    if (!fb) {
        return;  // capture failed, nothing to process this cycle
    }
    // Face detection
    box_array_t *net_boxes = face_detect(fb->buf, fb->len, fb->format);
    if (net_boxes) {
        // Draw a labelled box around every detected face
        for (int i = 0; i < net_boxes->len; i++) {
            draw_box(fb, net_boxes->box[i], COLOR_GREEN);
            draw_text(fb, net_boxes->box[i].x, net_boxes->box[i].y - 10, "Face", COLOR_WHITE);
        }
        free(net_boxes->box);
        free(net_boxes);
    }
    display_output(fb);
    esp_camera_fb_return(fb);
}
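
Detection runs on the camera frame (VGA in the earlier configuration) while the panel is 1920×1080. If boxes are drawn after upscaling rather than before, their coordinates have to be mapped to display space. The sketch below assumes a simple stretch to full screen; the `Box` struct and `scale_box` are hypothetical helpers, not ESP-WHO types.

```cpp
// Axis-aligned box in pixel coordinates (hypothetical helper type).
struct Box {
    int x, y, w, h;
};

// Maps a box from source-frame coordinates to display coordinates.
// Horizontal and vertical factors differ when the aspect ratios differ
// (for example 640x480 stretched to 1920x1080).
Box scale_box(Box b, int src_w, int src_h, int dst_w, int dst_h) {
    Box out;
    out.x = b.x * dst_w / src_w;
    out.y = b.y * dst_h / src_h;
    out.w = b.w * dst_w / src_w;
    out.h = b.h * dst_h / src_h;
    return out;
}
```

For a 640×480 source the horizontal factor is 3 and the vertical factor is 2.25, so a square face box becomes slightly wide on screen unless the frame is letterboxed instead of stretched.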

4.3 Cloud Large Model Collaboration (Option B Extension)

ESP-Claw connects to cloud large models via WiFi, achieving "edge-cloud collaboration":

#include <WiFi.h>
#include <HTTPClient.h>

const char* ssid = "YOUR_WIFI_SSID";
const char* password = "YOUR_WIFI_PASSWORD";
const char* cloud_api = "https://api.openai.com/v1/chat/completions";

// Call WiFi.begin(ssid, password) in setup() before running this task.
void cloud_inference_task(camera_fb_t* fb) {
    if (WiFi.status() != WL_CONNECTED) {
        display_output(fb);  // no network: show the unannotated frame
        return;
    }
    // 1. Local preprocessing: extract key frames, compress
    uint8_t* compressed = compress_frame(fb, 50); // 50% compression quality
    // The payload below is text-only; the compressed image still has to be
    // encoded and attached in whatever form the chosen cloud API expects.
    // 2. Upload to cloud for large model inference
    HTTPClient http;
    http.begin(cloud_api);
    http.addHeader("Content-Type", "application/json");
    http.addHeader("Authorization", "Bearer YOUR_API_KEY");
    String payload = "{\"model\":\"gpt-4o\",\"messages\":[{\"role\":\"user\",\"content\":\"Analyze the objects in this image\"}]}";
    int httpCode = http.POST(payload);
    if (httpCode == 200) {
        String response = http.getString();
        // 3. Parse cloud response
        String result = parse_cloud_response(response);
        // 4. Overlay result on display frame
        overlay_text(fb, result, 20, 20, COLOR_GREEN);
    }
    http.end();
    // Release `compressed` here if compress_frame() returns a heap allocation.
    display_output(fb);
}
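
To send the compressed frame inside a JSON body, the binary data has to be turned into text first, and Base64 is the usual choice. The encoder below is a self-contained sketch of standard Base64 with padding; how the resulting string is placed in the request depends on the cloud API being used and is not shown here. The function name `base64_encode` is an illustrative choice.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Standard Base64 (RFC 4648 alphabet, '=' padding).
std::string base64_encode(const uint8_t* data, size_t len) {
    static const char tbl[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    out.reserve(((len + 2) / 3) * 4);
    for (size_t i = 0; i < len; i += 3) {
        // Pack up to three input bytes into a 24-bit group
        uint32_t n = static_cast<uint32_t>(data[i]) << 16;
        if (i + 1 < len) n |= static_cast<uint32_t>(data[i + 1]) << 8;
        if (i + 2 < len) n |= data[i + 2];
        // Emit four 6-bit symbols, padding where input bytes were missing
        out.push_back(tbl[(n >> 18) & 63]);
        out.push_back(tbl[(n >> 12) & 63]);
        out.push_back(i + 1 < len ? tbl[(n >> 6) & 63] : '=');
        out.push_back(i + 2 < len ? tbl[n & 63] : '=');
    }
    return out;
}
```

Base64 grows the data by roughly one third, so a JPEG of about 30 KB becomes about 40 KB of text. That is one more reason to compress and downscale before uploading.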

4.4 Thermal Module (Optional Extension)

Integrate MLX90640 thermal module (I2C interface):

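#include "MLX90640_API.h"

// Calibration parameters; fill once at startup with MLX90640_DumpEE()
// and MLX90640_ExtractParameters().
static paramsMLX90640 mlx_params;

void thermal_fusion_mode() {
    camera_fb_t *vis_fb = esp_camera_fb_get();
    if (!vis_fb) return;
    static uint16_t raw_frame[834];  // raw data for one subpage
    static float mlx90640To[768];    // 32x24 object temperatures
    if (MLX90640_GetFrameData(MLX90640_ADDR, raw_frame) >= 0) {
        float ta = MLX90640_GetTa(raw_frame, &mlx_params);
        // Emissivity 0.95 and reflected temperature ta - 8 are typical example values
        MLX90640_CalculateTo(raw_frame, &mlx_params, 0.95f, ta - 8.0f, mlx90640To);
        uint8_t* thermal_overlay = interpolate_thermal_to_visible(mlx90640To, vis_fb->width, vis_fb->height);
        overlay_thermal_to_frame(vis_fb, thermal_overlay);
    }
    display_output(vis_fb);
    esp_camera_fb_return(vis_fb);
}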


Display Output & AR Frame Composition

F76RW7TMQA4XR8Q.png

5.1 Display Chain

ESP-Claw display output path:

ESP-Claw MIPI/LCD Controller → MIPI-to-HDMI Module → HDMI Cable → ECX335 Driver Board → ECX335AF OLED Screen

5.2 MIPI-to-HDMI Module Configuration

The MIPI-to-HDMI module (e.g., LT9611, IT6161, etc.) converts the MIPI DSI signal from ESP-Claw into an HDMI signal:

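#include "driver/mipi_dsi.h"
#include "lt9611.h"
#include "esp_heap_caps.h"

static uint8_t* display_buffer = NULL;  // RGB888 frame buffer shared with display_output()

void display_init() {
    // Initialize MIPI DSI interface
    mipi_dsi_config_t dsi_config = {
        .lane_num = 2,
        .lane_rate_mbps = 1000,
        .format = MIPI_DSI_FORMAT_RGB888,
    };
    mipi_dsi_init(&dsi_config);
    // Initialize LT9611 bridge chip
    lt9611_init();
    lt9611_set_resolution(1920, 1080, 60); // 1080p60
    // Configure display buffer: about 6.2 MB, far more than internal RAM,
    // so it is allocated from external PSRAM
    display_buffer = (uint8_t*)heap_caps_malloc(1920 * 1080 * 3, MALLOC_CAP_SPIRAM);
    if (display_buffer == NULL) {
        return;  // allocation failed; display_output() must not be called
    }
}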
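
Before settling on lane count, lane rate, and refresh rate, it helps to check the numbers. The sketch below computes the frame buffer size and the active-pixel payload, and compares the payload with the raw link capacity. It ignores blanking intervals and protocol overhead, so real requirements are higher; the function names are illustrative.

```cpp
#include <cstdint>

// Bytes needed for one frame buffer.
uint64_t framebuffer_bytes(uint32_t w, uint32_t h, uint32_t bytes_per_pixel) {
    return static_cast<uint64_t>(w) * h * bytes_per_pixel;
}

// Active-pixel payload in bits per second (blanking and overhead excluded).
uint64_t active_payload_bps(uint32_t w, uint32_t h, uint32_t fps, uint32_t bits_per_pixel) {
    return static_cast<uint64_t>(w) * h * fps * bits_per_pixel;
}

// True if the raw capacity of the link (lanes x rate) covers the payload.
bool link_fits(uint32_t lanes, uint32_t lane_mbps, uint64_t payload_bps) {
    return static_cast<uint64_t>(lanes) * lane_mbps * 1000000ULL >= payload_bps;
}
```

With the values used in `display_init()`, one RGB888 frame is 6,220,800 bytes and 1080p60 needs about 2.99 Gbit/s of pixel data alone, which is more than 2 lanes × 1000 Mbit/s can carry. Under these assumptions, 1080p30 (about 1.49 Gbit/s) fits. Running at 60 Hz would take more lanes, a higher lane rate, or a narrower pixel format.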

5.3 Frame Buffer Overlay Rendering

Overlay processed images + AR information for output:

void display_output(camera_fb_t* fb) {
    // 1. Scale camera frame to display resolution
    scale_frame(fb->buf, display_buffer, fb->width, fb->height, 1920, 1080);
    // 2. Overlay AR information (based on current mode)
    switch (currentMode) {
        case EDGE_DETECT:
            // Edges already processed, display directly
            break;
        case FACE_DETECT:
            // Face boxes already drawn during processing
            break;
        case THERMAL_FUSION:
            // Thermal map already fused
            break;
        case NIGHT_VISION:
            // Frame already brightened
            break;
        default:
            // Passthrough and any other mode: show the frame unchanged
            break;
    }
    // 3. Overlay HUD information (fixed display)
    draw_hud_overlay(display_buffer);
    // 4. Push to display
    mipi_dsi_send_frame(display_buffer, 1920 * 1080 * 3);
}

void draw_hud_overlay(uint8_t* buf) {
    // Top-left: Mode indicator
    draw_text(buf, 20, 20, mode_names[currentMode], COLOR_GREEN);
    // Top-right: Battery level
    char batt_str[16];
    snprintf(batt_str, sizeof(batt_str), "BAT:%d%%", get_battery_level());
    draw_text(buf, 1800, 20, batt_str, COLOR_YELLOW);
    // Center: Crosshair
    draw_crosshair(buf, 960, 540, COLOR_RED);
}
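
The scaling step can be as simple as nearest-neighbour sampling. The sketch below shows one possible shape for a helper like `scale_frame`, with the same argument order as the call above. It assumes the source has already been decoded to packed RGB888; a JPEG frame from the camera would have to be decoded first. The name `scale_frame_nearest` is illustrative.

```cpp
#include <cstddef>
#include <cstdint>

// Nearest-neighbour scaling of a packed RGB888 image (3 bytes per pixel).
void scale_frame_nearest(const uint8_t* src, uint8_t* dst,
                         int src_w, int src_h, int dst_w, int dst_h) {
    for (int y = 0; y < dst_h; y++) {
        int sy = y * src_h / dst_h;  // source row for this output row
        for (int x = 0; x < dst_w; x++) {
            int sx = x * src_w / dst_w;  // source column for this output column
            const uint8_t* s = src + (static_cast<size_t>(sy) * src_w + sx) * 3;
            uint8_t* d = dst + (static_cast<size_t>(y) * dst_w + x) * 3;
            d[0] = s[0];
            d[1] = s[1];
            d[2] = s[2];
        }
    }
}
```

Nearest-neighbour looks blocky at a 3× enlargement but uses only integer arithmetic per pixel, which matters more than smoothness when the goal is 15 to 30 fps on a microcontroller.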

5.4 Display Timing Optimization

Target frame rate: >=15fps (acceptable), ideal 30fps

Optimization strategies:

  1. Camera acquisition resolution: VGA (640x480) or lower
  2. Separate processing resolution from display resolution: process small image first, then upscale
  3. Use double buffering to avoid tearing
  4. Optimize key algorithms with ESP-Claw hardware accelerator


Enclosure Assembly & System Integration

6.1 Circuit Layout

Complete system circuit diagram:

┌─────────────────────────────────────────────────────┐
│ [Glasses Frame / Headband] │
│ ┌─────────────┐ ┌─────────────┐ ┌──────────┐ │
│ │ OV5647 │ │ ESP-Claw │ │ 3.7V │ │
│ │ Camera │───→│ Controller │←───│ Li-Po │ │
│ │ (Front) │ │ (Temple/Top)│ │ (Temple)│ │
│ └─────────────┘ └──────┬──────┘ └──────────┘ │
│ │ │
│ ↓ MIPI │
│ ┌─────────────┐ │
│ │ MIPI-to-HDMI│ │
│ │ Bridge │ │
│ └──────┬──────┘ │
│ │ HDMI │
│ ┌──────┴──────┐ │
│ │ ECX335 Driver│←── Micro-HDMI │
│ │ + Screen │ │
│ │ (Optical) │ │
│ └─────────────┘ │
└─────────────────────────────────────────────────────┘

6.2 Assembly Steps

  1. Fix Optical Module: Secure the assembled ECX335 + lens module to the front of the glasses frame
  2. Install ESP-Claw: Mount the dev board on the temple or top beam, ensuring good heat dissipation
  3. Connect Camera: Connect OV5647 to ESP-Claw's MIPI interface via FPC cable
  4. Connect Display Chain: ESP-Claw MIPI → MIPI-to-HDMI Module → HDMI Cable → ECX335 Driver Board
  5. Connect Power: Li-Po battery connects to ESP-Claw's 5V/3.3V input via charge/discharge management module
  6. Install Buttons: Mount tactile push buttons on the outer side of the temple for mode switching
  7. Organize Cables: Secure cables with hot glue or zip ties to prevent movement

6.3 Power Supply Design

3.7V Li-Po Battery ──→ TP4056 Charging Module ──→ 5V Boost Module ──→ ESP-Claw / Camera / Driver Board
USB Type-C Charging Port ──→ TP4056 Charging Module (charge input)
Warning: Li-Po batteries must have a protection circuit to prevent over-discharge/overcharge/short circuit. Do not use the device while charging.
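
For the on-screen battery indicator and for a software low-battery warning, the cell voltage can be mapped to a rough percentage. The sketch below uses a straight line between an assumed empty voltage of 3.3 V and a full voltage of 4.2 V. A real Li-Po discharge curve is not linear, so treat the result as a coarse estimate; it does not replace the hardware protection circuit. The function name is hypothetical.

```cpp
// Rough state-of-charge estimate from cell voltage (linear approximation).
// empty_v and full_v are assumed thresholds; adjust them for the actual cell.
int battery_percent(float volts, float empty_v = 3.3f, float full_v = 4.2f) {
    if (volts <= empty_v) return 0;
    if (volts >= full_v) return 100;
    // Round to the nearest whole percent
    return static_cast<int>((volts - empty_v) * 100.0f / (full_v - empty_v) + 0.5f);
}
```

A reading of 0 is a sensible point for the firmware to dim the display and stop heavy processing, well before the protection circuit cuts power.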

6.4 Heat Dissipation Considerations

ESP-Claw generates some heat when running image processing:

  1. Attach a small heatsink to the ESP-Claw chip
  2. Add ventilation holes to the temple or top enclosure
  3. Avoid continuous high-load operation for extended periods


Functional Testing & Optimization

7.1 Power-On Test

  1. Insert battery and press the power button
  2. LED indicators light up (power LED steady, running LED flashing)
  3. Wait 3-5 seconds for initialization
  4. Display should show camera feed (passthrough mode)

7.2 Mode Switching Test

Press the temple button to switch modes and verify:

| Mode | Expected Effect | Verification Method |
| --- | --- | --- |
| Passthrough | Clear real-world view | Visual inspection |
| Edge Detection | Highlighted contours | Visual inspection |
| Face Detection | Green box around face | Test facing a person |
| Night Vision | Brightened dark scene | Test with lights off |
| Thermal Fusion | Color temperature overlay | Test with MLX90640 |
| Cloud Collaboration | AI analysis text overlay | Test with WiFi connected |
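
Switching modes with a single temple button comes down to two pieces: a debounced press detector and a mode value that wraps around. The sketch below is hardware-independent, taking the button level and the time as inputs. The enum values `PASSTHROUGH` and `CLOUD_COLLAB` and the 30 ms debounce window are assumptions; the other mode names follow the display code.

```cpp
#include <cstdint>

enum Mode { PASSTHROUGH, EDGE_DETECT, FACE_DETECT, NIGHT_VISION,
            THERMAL_FUSION, CLOUD_COLLAB, MODE_COUNT };

// Advances to the next mode and wraps back to the first one.
Mode next_mode(Mode m) {
    return static_cast<Mode>((m + 1) % MODE_COUNT);
}

// Reports one event per physical press once the input has been stable
// for stable_ms milliseconds.
class Debouncer {
public:
    explicit Debouncer(uint32_t stable_ms = 30) : stable_ms_(stable_ms) {}

    // pressed: raw button level, now_ms: current time in milliseconds.
    bool update(bool pressed, uint32_t now_ms) {
        if (pressed != last_raw_) {
            last_raw_ = pressed;
            changed_at_ = now_ms;  // restart the stability timer on every change
        }
        if (now_ms - changed_at_ >= stable_ms_ && pressed != stable_) {
            stable_ = pressed;
            return stable_;  // true on the press edge, false on release
        }
        return false;
    }

private:
    uint32_t stable_ms_;
    bool last_raw_ = false;
    bool stable_ = false;
    uint32_t changed_at_ = 0;
};
```

In an Arduino-style loop this would be called as `if (debouncer.update(button_is_down, millis())) mode = next_mode(mode);`, where `button_is_down` is the raw pin reading converted to a boolean.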

7.3 Cloud Collaboration Test (Option B)

  1. Configure WiFi connection
  2. Set cloud API key
  3. Switch to cloud collaboration mode
  4. Observe if AI analysis results (e.g., object recognition, scene description) are overlaid on the display

7.4 Performance Optimization

If frame rate is too low, try the following optimizations:

  1. Reduce processing resolution: Camera acquires 640x480, process scaled to 320x240
  2. Reduce frame rate: Set camera to 15fps instead of 30fps
  3. Simplify algorithm: Use lighter edge detection operators (e.g., Roberts instead of Sobel)
  4. Disable WiFi/BLE: Turn off if wireless is not needed to save CPU
  5. Overclock: Boost ESP-Claw frequency from 240MHz to 280MHz (test stability)
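
The Roberts operator mentioned in item 3 uses a 2×2 neighbourhood, so it reads four pixels per output instead of the eight neighbours Sobel needs. The sketch below is a minimal version for 8-bit grayscale input. The threshold is passed in as a parameter because Roberts magnitudes are smaller than Sobel's and the value has to be retuned; the default of 128 used earlier is unlikely to carry over unchanged.

```cpp
#include <cstdint>
#include <cstdlib>

// Roberts cross edge detection on an 8-bit grayscale image.
// The last row and last column of dst are not written.
void roberts_edge_detect(const uint8_t* src, uint8_t* dst,
                         int width, int height, int threshold) {
    for (int y = 0; y < height - 1; y++) {
        for (int x = 0; x < width - 1; x++) {
            // Differences along the two diagonals of the 2x2 block
            int gx = src[y * width + x] - src[(y + 1) * width + (x + 1)];
            int gy = src[y * width + (x + 1)] - src[(y + 1) * width + x];
            dst[y * width + x] = (std::abs(gx) + std::abs(gy)) > threshold ? 255 : 0;
        }
    }
}
```

Roberts is more sensitive to noise than Sobel because it does no smoothing, so it tends to work best on the downscaled processing image rather than on the full camera frame.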

7.5 Battery Life Test

  1. 2000mAh battery + full function operation: ~1.5-2 hours
  2. WiFi/BLE off, reduced brightness: ~2.5-3 hours
  3. Passthrough only (no processing): ~3-4 hours