AI Pet Pendant

by 陳亮 in Craft > Jewelry



DSCN1353.JPG
Pendant with Alibaba Cloud text to image then image to video AI generated clips

This Instructable shows how to make a pendant that your AI pet can live inside.

Supplies

  1. Waveshare ESP32-S3 1.46inch Round Display Development Board (without cover glass)
  2. MicroSD card (a 1 GB card is already big enough)
  3. 502525 Lipo battery

Optional for pendant case:

  1. 35 mm round glass cabochons
  2. one 12 mm M2 screw and nut
  3. Accessories to make the pendant into a necklace or a bag charm

Note:

If you would like to use the 3D-printed pendant case below, remember to order the version of the dev board without cover glass.

Idea

IMG_7882.jpg

Pet backpacks can help you carry your pet in crowded places. But carrying a pet around every day is pure torture. Therefore, blind boxes have appeared on the market to satisfy your desire to have a pet by your side.

Blind boxes only contain static objects, so why not replace them with AI pets?

Imagine an AI pet living inside a pendant. The pendant's screen is normally off; when you want to check on the AI pet, simply tap a button and the screen will display your pet's activities inside the pendant, just like connecting your phone to a home camera to check on your pet at home.

Proof of Concept

The ultimate goal is to raise an AI pet inside a pendant, but that is still a long way off. Let's first prove it is a good idea: we need a pendant that can play videos of the AI pet's life.

Let's break it into small steps:

  1. Create video clips of pet activities using generative AI.
  2. Convert the clips to a file format the dev device can play.
  3. Turn the dev device into a pendant.
  4. Tap a button on the pendant to play a random clip.

AI Video Generator

You can find all sorts of AI video generators online, and they all work on similar principles. You need to describe in words the video you want to generate. The more detailed the description, the more closely the generated video will match your expectations.

This project uses Alibaba Cloud as an example; other AI video generators should follow a similar approach.

Prompt

IMG_8106.PNG

"Prompt" refers to the text input received by the AI.

Just like writing a story, the "5 Ws" (Who, What, When, Where, Why) are the fundamental questions for gathering complete information. When writing a prompt for video generation, at least three of the "Ws" are required:

  1. Who, the subject of the video, e.g. a calico kitten with large amber eyes and fluffy fur.
  2. What, the event or action, e.g. the cat energetically chases a red laser dot across the rug.
  3. Where, the scene of the video, e.g. a high-tech, minimalist space capsule with curved white walls, holographic displays, floating snacks, and a large round window showing Earth or deep space.

The scene description sometimes also covers the time, i.e. "When", e.g. soft afternoon sunlight streams in through the window on the left.

The generated video is only 5-15 seconds long. It is not a complete story, so in most cases the "Why" information can be skipped.

Moreover, most AI image/video generators also let you specify the output style, e.g. cinematic, soft lighting, hyper-realistic, shallow depth of field, slow, graceful movement.

If you still don't know how to write a prompt, seek help from AI, just like I did ;>
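To make the structure concrete, the "W" fragments can be assembled into a single prompt string. This Python sketch is purely illustrative (the helper function and its parameters are my own, not part of any AI service's API); the example fragments come from the descriptions above.

```python
# Assemble a video-generation prompt from the "W" fragments.
# The fragment texts come from the examples above; the function
# itself is just an illustration, not any vendor's API.
def build_prompt(who, what, where, when=None, style=None):
    parts = [who, what, where]
    if when:
        parts.append(when)   # "When" is optional; often folded into the scene
    if style:
        parts.append(style)  # output style, e.g. cinematic, soft lighting
    return " ".join(parts)

prompt = build_prompt(
    who="A calico kitten with large amber eyes and fluffy fur.",
    what="The cat energetically chases a red laser dot across the rug.",
    where="A high-tech, minimalist space capsule with curved white walls.",
    when="Soft afternoon sunlight streams in through the window on the left.",
    style="Cinematic, soft lighting, hyper-realistic, shallow depth of field.",
)
print(prompt)
```

The more of these fragments you fill in, the more closely the generated video will match your expectations.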

Trial 1 - Image to Video

螢幕截圖 2026-03-28 下午12.51.35.png
Alibaba Cloud AI Image to Video
螢幕截圖 2026-03-29 下午4.45.28.png

For my first attempt, I simply selected sample images provided by Alibaba Cloud and generated a few video clips based on AI-suggested prompts. You can click the link below to access the YouTube playlist and view all the video outputs; the prompt used to generate each video is in its description.

Because the scene mentioned in the prompt doesn't match the selected image, you can see the scene gradually shift away from the house and eventually transform into a spaceship. This was unexpected, but very interesting.

Full Video list:

https://youtube.com/playlist?list=PLvFjwCSTDUWVYVsvr16NST3ONjLKI8NQp&si=rqce1EhY5hFiUI0a

Trial 2 - Text to Video

螢幕截圖 2026-03-26 下午10.38.11.png
Alibaba Cloud AI Text to Video

Then I tried the text-to-video model. Without an image to guide it, all details are based on the text description, so the output can vary a lot. For example, I didn't initially mention eye color and the AI generated green eyes; I didn't like that, so I added amber eyes to the prompt.

Although all the generated versions used the same scene description, there were still some differences in the output.

Full video list:

https://youtube.com/playlist?list=PLvFjwCSTDUWUKPMIZIdap7VigL1bGQmFY&si=Ulax1uFJGOu28CcN

Trial 3 - Text to Image to Video

螢幕截圖 2026-03-27 下午9.55.39.png
螢幕截圖 2026-03-27 下午9.55.26.png
Alibaba Cloud AI Text to Image to Video

Based on the experience of the previous two trials, I changed my strategy: to align the scenes across all video clips, I should first generate a good initial image.

I started by using a text to image model to create images of pets sleeping. After generating more than a dozen output images, I chose my favorite.

Based on that favorite image, a few video clips were generated using an image-to-video model. The prompts are identical except for the pet activity descriptions. The output quality is much better now, and all scenes are aligned.

Full video list:

https://youtube.com/playlist?list=PLvFjwCSTDUWUbhOdD4IqksB2Ls70hdrbT&si=xAba8PxieubKH2fw

Video Conversion

After downloading the generated video clips, you need to convert them to a format the dev board can play.

Here is a sample conversion command (change input.mp4 and output.avi to your own file names):

ffmpeg -y -i input.mp4 -ab 64k -ac 2 -ar 44100 -af loudnorm -c:a mp3 -pix_fmt yuvj420p -c:v mjpeg -q:v 7 -vf "fps=15,scale=-1:412:flags=lanczos,crop=412:412:0:(in_h-412)/2" output.avi

Then create a folder called "avi" on the MicroSD card and copy all the converted AVI files into it.
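If you have many clips, the command above can be wrapped in a small script. Here is a sketch in Python, assuming ffmpeg is on your PATH; the filter chain and codec options are copied verbatim from the command above, while the script structure itself is my own.

```python
import subprocess
from pathlib import Path

# Video filter copied verbatim from the ffmpeg command above:
# 15 fps, scale to 412 px height, centre-crop to 412x412.
VF = "fps=15,scale=-1:412:flags=lanczos,crop=412:412:0:(in_h-412)/2"

def convert_cmd(src, dst):
    """Build the ffmpeg argument list for one clip."""
    return ["ffmpeg", "-y", "-i", str(src),
            "-ab", "64k", "-ac", "2", "-ar", "44100", "-af", "loudnorm",
            "-c:a", "mp3", "-pix_fmt", "yuvj420p", "-c:v", "mjpeg",
            "-q:v", "7", "-vf", VF, str(dst)]

def convert_all(folder="."):
    """Convert every MP4 in `folder` into an "avi" subfolder,
    matching the MicroSD card layout described above."""
    out_dir = Path(folder) / "avi"
    out_dir.mkdir(exist_ok=True)
    for src in sorted(Path(folder).glob("*.mp4")):
        subprocess.run(convert_cmd(src, out_dir / (src.stem + ".avi")),
                       check=True)

if __name__ == "__main__":
    convert_all()
```

Run it in the folder containing your downloaded MP4 files, then copy the resulting "avi" folder onto the MicroSD card.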


Ref.:

https://ffmpeg.org/ffmpeg.html

Software Preparation

Arduino IDE

Download and install the latest version of the Arduino IDE if you have not already:

https://www.arduino.cc/en/software

Arduino-ESP32

Follow the installation steps to add Arduino-ESP32 support if you have not already:

https://docs.espressif.com/projects/arduino-esp32/en/latest/installing.html

Arduino_GFX Library

Open the Arduino IDE Library Manager by selecting the "Tools" menu -> "Manage Libraries...". Search for "GFX for various displays" and press the "Install" button.

You may refer to my previous Instructables for more information about Arduino_GFX.

Dev Device Pins

Open the Arduino IDE Library Manager by selecting the "Tools" menu -> "Manage Libraries...". Search for "Dev Device Pins" and press the "Install" button.

JPEGDEC

Open the Arduino IDE Library Manager by selecting the "Tools" menu -> "Manage Libraries...". Search for "JPEGDEC" and press the "Install" button.

arduino-libhelix

Download and import the arduino-libhelix library to Arduino IDE:

https://github.com/pschatzmann/arduino-libhelix.git

Note:

You may refer to the Arduino documentation for details on how to install/import a library into the Arduino IDE.

Upload Program

This project uses aviPlayer, developed in my previous project, to play video.

  1. Download the source code at GitHub: https://github.com/moononournation/aviPlayer.git
  2. Open "AviPlayer/AviPlayer.ino" in Arduino IDE
  3. At around line 36, make sure the line below is uncommented and all other Dev Device header includes are commented out:
#include "PINS_ESP32-S3-Touch-LCD-1_46.h"

Then compile and upload the program to the Waveshare ESP32-S3 1.46inch Round Display Development Board.

What Does the Program Do?

ESP32-S3-Touch-LCD-1.46B-details-size.jpg

AviPlayer.ino is a simple program: it reads the AVI files from the "avi" folder and plays one of them at random.
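The actual player is Arduino C++, but its file-selection logic boils down to something like this Python sketch (the "avi" folder name matches the MicroSD layout above; the function name and the SD mount path are illustrative, not taken from the aviPlayer source):

```python
import random
from pathlib import Path

def pick_random_clip(root="/sd"):
    """Mimic the player's behaviour: list the AVI files in the
    'avi' folder and pick one at random to play."""
    clips = sorted(Path(root, "avi").glob("*.avi"))
    if not clips:
        raise FileNotFoundError("no AVI files found in the avi folder")
    return random.choice(clips)
```

Each time you power on the pendant, a different clip may play, which is what makes the pet feel alive.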

This project uses a Waveshare dev device that can be powered by a Lipo battery. The dev device is always on when plugged into USB power, but when powered only by the Lipo battery it stays off by default.

Holding the power (PWR) button turns on the dev device. The program must then drive the BAT_Control (GPIO 7) pin to keep the power on; without that, the dev device powers off as soon as you release the power button. This behavior fits the "peek" design perfectly: hold the button for a few seconds to peek at what your pet is doing, then let the device turn off again. The design is very power-efficient, so you can carry the pendant for a long time without charging.
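The power-latch behavior can be modelled as a tiny state machine. The Python sketch below is only a model of the behavior described above: on the real board the latch is Arduino code driving GPIO 7, and the exact pin polarity should be checked against the aviPlayer source.

```python
class PowerLatch:
    """Model of the PWR-button / BAT_Control behavior described above."""

    def __init__(self):
        self.button_held = False
        self.bat_control = 0   # models GPIO 7; 1 = hold power on (assumed polarity)

    def powered(self):
        # The board stays on while the button is held OR the latch pin is set.
        return self.button_held or self.bat_control == 1

    def press_button(self):
        self.button_held = True
        self.bat_control = 1   # firmware latches power as soon as it boots

    def release_button(self):
        self.button_held = False

    def finish_playback(self):
        self.bat_control = 0   # release the latch: board powers off

latch = PowerLatch()
latch.press_button()      # hold PWR: board boots, firmware latches power
latch.release_button()    # still on, thanks to BAT_Control
print(latch.powered())    # the "peek": screen is showing a clip
latch.finish_playback()   # clip done: release the latch, back to sleep
print(latch.powered())
```

Because the board only draws power during these short peeks, the battery lasts a long time between charges.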

3D Print Pendant Case

Please download and 3D print the model at Thingiverse: https://www.thingiverse.com/thing:7324253

Assembly

Waveshare 1.46” Pendant assembly

Please follow above video for the pendant case assembly.

What's Next?

The Proof of Concept stage is finished. The next step is working towards the goal of raising an AI pet inside the pendant.

  1. We can keep using the "Text to Image to Video" approach. The image and base prompt stay unchanged, but the pet activities have infinite possibilities; whenever you have a new idea, you can generate new video clips.
  2. The ESP32-S3 is a WiFi-capable chip, so the program can be enhanced to retrieve new video clips wirelessly.
  3. New pet activity ideas can be suggested by AI, e.g. use a web crawler to digest new online pet videos and "teach" your pet to do the same.

At that stage your pet is still not really being raised inside the pendant; rather, you are raising it on the Internet. We look forward to hardware powerful enough to pack an independent AI into a portable device.