Real-Time Face-Aware Earring Recommender Using Haar Cascades

by Ranvi25 in Circuits > Computers


Have you ever stood in front of a mirror wondering which earrings actually suit your face? This project answers that question in real time using nothing more than a webcam, Python, and OpenCV's built-in Haar cascade classifiers.

The AI Jewelry Stylist analyses your face live on camera, measures six distinct facial signals, classifies your face shape into one of six categories, and recommends the best earring style from a set of eight options. All of this runs locally — no cloud, no machine learning frameworks, no special hardware.

The best part? Every technique used is explained from scratch, so you can adapt, extend, and make it your own.

Supplies

Hardware

  1. A computer running Windows, macOS, or Linux
  2. A webcam (built-in laptop cameras work perfectly)
  3. Good lighting — a desk lamp pointing at your face makes a real difference

Software

  1. Python 3.8 or later
  2. OpenCV — installed with one command:



pip install opencv-python

That's it. OpenCV ships with all the Haar cascade XML files you need already bundled inside it.

The Science: Face Shapes and Earring Styling


Before writing a single line of code it's worth understanding the real-world styling rules we're encoding. Professional stylists have used face shape as a guide for earring recommendations for decades.

Face shape | Ratio (H/W) | Key feature       | Best earrings                      | Avoid
-----------|-------------|-------------------|------------------------------------|----------------------
Oval       | ~1.35       | Balanced          | Almost anything, drops especially  | Overloading
Round      | < 1.10      | Wide              | Hoops, drops                       | Small studs
Square     | ~1.10       | Strong jaw        | Statement, hoops                   | Geometric studs
Oblong     | > 1.55      | Narrow            | Studs, clip-ons                    | Long drops
Heart      | ~1.40       | Wide forehead     | Chandelier, drops                  | Top-heavy pieces
Diamond    | ~1.45       | Wide cheeks       | Ear cuffs, studs                   | Wide statement pieces

The ratio column is the key number — it's simply bounding-box height divided by bounding-box width. A ratio close to 1.0 means a wide face, while a ratio above 1.5 means a long narrow face. Everything else in the algorithm flows from this one measurement.
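To make the ratio bands concrete, here is a minimal ratio-only classifier sketch. The typical ratios come from the table above, but note the band centres for Round and Oblong are assumptions, and ratio alone cannot reliably separate shapes with similar ratios (Square vs Round, Heart vs Diamond) — the full project combines the other signals for that.

```python
# Ratio-only face shape sketch: picks the shape whose typical H/W ratio
# (from the table above) is closest to the measured one. Round and Oblong
# band centres are assumed; the real classifier uses additional signals.
SHAPE_RATIOS = {
    "Round": 1.05,
    "Square": 1.10,
    "Oval": 1.35,
    "Heart": 1.40,
    "Diamond": 1.45,
    "Oblong": 1.60,
}

def classify_shape(face_h, face_w):
    ratio = face_h / face_w
    # Nearest-band match: smallest absolute distance to a typical ratio
    return min(SHAPE_RATIOS, key=lambda s: abs(SHAPE_RATIOS[s] - ratio))

print(classify_shape(270, 200))  # ratio 1.35 -> Oval
```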

How Haar Cascade Detection Works

A Haar cascade is a trained binary classifier that slides a detection window across an image at multiple scales. At each position it applies a chain of simple rectangle filters. If an early filter fails, the window is immediately rejected — which is what makes it so fast even on a basic laptop.

OpenCV ships with pre-trained Haar cascades for frontal faces, profile faces, eyes, and smiles. We use all four in this project to extract as much information as possible from a plain webcam feed.

The four cascades we use:

  1. haarcascade_frontalface_default.xml — primary face detection
  2. haarcascade_profileface.xml — catches side-on faces the frontal cascade misses
  3. haarcascade_eye.xml — locates both eyes so we can measure their spacing
  4. haarcascade_smile.xml — detects smiling and adjusts jewelry mood accordingly

One important trick: before running detection we call cv2.equalizeHist() on the greyscale frame. This stretches the image's contrast and significantly improves detection accuracy in dim or uneven lighting — something that matters a lot when someone is sitting at a desk with a single window on their left.

Extracting the Six Facial Signals


Once a face is detected we extract six numerical signals from the bounding box and from the eye sub-detections. These are the raw inputs to the scoring engine.

The first is the face ratio, which is simply the bounding box height divided by its width. This is the single most important signal — it's what drives the face shape classification in the next step.

The second is slimness, computed as (h - w) / h. This captures how narrow the face is independently of the ratio, and is the main driver behind whether drop earrings or studs score higher.

The third and fourth signals both come from the eye cascade. Eye span ratio is the distance between the two eye centres divided by the face width — wide-set eyes suit bolder, wider pieces while close-set eyes are better balanced by drops that draw the eyes apart. Eye height ratio is where the eyes sit vertically on the face as a fraction of face height. Eyes that sit high up on the face mean the lower half is prominent, so chandelier earrings score a boost to balance it out.

The fifth signal is face area — simply width times height in pixels. This acts as a rough proxy for how prominent and strong the facial features appear on screen. Larger faces carry statement pieces and chandeliers better than smaller, more delicate faces.

The sixth signal is smile detection, which is a binary yes/no from the Haar smile cascade. When a smile is detected the algorithm gives a small boost to playful styles — hoops and statement earrings — on the basis that someone who's smiling can carry a more expressive piece.

The eye signals are computed by running the eye cascade on the face region-of-interest only, not the full frame. We sort the detected eyes left-to-right, take the centre point of each, and compute span and height from those centres. If fewer than two eyes are found we fall back to neutral values of 0.5 and 0.4 — the EMA smoothing in Step 5 handles these occasional gaps without the recommendation jumping around.
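The eye-signal computation described above can be sketched as a small function. It takes the eye boxes the cascade returned for the face ROI and the face dimensions; the box coordinates are assumed to be relative to the ROI, as in the text.

```python
def eye_signals(eyes, face_w, face_h):
    """Compute (span_ratio, height_ratio) from Haar eye boxes.

    `eyes` is a list of (x, y, w, h) boxes relative to the face ROI.
    """
    if len(eyes) < 2:
        return 0.5, 0.4  # neutral fallback; EMA smooths over these gaps
    eyes = sorted(eyes, key=lambda e: e[0])[:2]  # left-to-right, keep two
    centres = [(x + w / 2, y + h / 2) for x, y, w, h in eyes]
    span = abs(centres[1][0] - centres[0][0]) / face_w
    height = ((centres[0][1] + centres[1][1]) / 2) / face_h
    return span, height

# Two eyes 100 px apart on a 200x260 face ROI -> span 0.5, height ~0.31
print(eye_signals([(30, 60, 40, 40), (130, 60, 40, 40)], 200, 260))
```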

The Scoring Engine


The scoring engine combines two layers that are added together and then normalised so all scores sum to 1.0 and can be displayed as percentages.

Layer 1 — Shape Affinity Table

Each face shape has a manually tuned affinity score between 0.0 and 1.0 for each of the eight earring types. These values encode the same expert styling knowledge from Step 1 — the difference is that here they're expressed as numbers the algorithm can actually work with.

An Oval face, for example, gets a 0.85 affinity for Drop Earrings and a 0.75 for Threader Earrings, because oval faces are balanced enough to carry almost any elongating style well. A Round face gets a 0.85 for Hoops because hoops are the classic counterbalance for a wider face shape. An Oblong face gets a 0.80 for Studs and Clip-Ons because those styles don't add any vertical length to a face that's already long. Every number in the table was set by working through the styling logic in Step 1 and translating it into a score.

These affinity scores are multiplied by 1.5 before being added to the totals, which gives the face shape classification a strong base influence that the continuous signals in Layer 2 then fine-tune on top of.

Layer 2 — Continuous Signal Adjustments

The six facial signals each add targeted boosts to specific earring types. This is what separates the algorithm from a simple lookup table — two people can share the same face shape category but still get different recommendations because their individual feature measurements differ.

The ratio signal pushes Drop Earrings and Threader Earrings higher the taller the face gets, and pushes Hoops higher the wider the face gets. For very tall faces with a ratio above 1.5 it also boosts Studs and Clip-Ons, because at that point elongating styles would make the face look even longer. The slimness signal reinforces this — a slim face gets an extra drop earring push on top of whatever the ratio already contributed.

The eye span signal is where things get interesting. Wide-set eyes get a boost toward Ear Cuffs and Statement Earrings, because bold pieces that draw attention outward complement the natural width. Close-set eyes get a boost toward Drop Earrings, because drops draw the eye downward and outward, creating the illusion of more space between the eyes. The eye height signal adds to Chandelier Earrings when the eyes sit high on the face, because chandeliers fill in the visual weight of the lower face.

Face area boosts Statement Earrings and Chandelier Earrings — not because bigger faces are more fashionable, but because larger, more prominent facial features can carry more dramatic pieces without being overwhelmed by them.

Finally, if a smile is detected, Hoops get a 0.3 boost and Statement Earrings get a 0.2 boost. This is a small but meaningful nudge — someone smiling has an expressive, open energy that suits a more playful piece.

Every adjustment is passed through clamp(x, 0, 1) before being added, which stops any single signal from dominating the result. Once all the adjustments are in, the scores are divided by their total so the final output is always a clean set of percentages that add up to 100.
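The two layers fit together as in the sketch below. Only the Oval affinities quoted earlier (0.85 for drops, 0.75 for threaders) come from the article; the other table entries and the exact adjustment formulas are illustrative placeholders that follow the same pattern.

```python
def clamp(x, lo=0.0, hi=1.0):
    return max(lo, min(hi, x))

# Layer 1 slice: Oval drop/threader values are from the text; the rest
# of these affinities are placeholders for illustration.
AFFINITY = {
    "Oval": {"Drop": 0.85, "Threader": 0.75, "Hoops": 0.70, "Studs": 0.60},
}

def score(shape, ratio, slim, smile):
    scores = {k: 1.5 * v for k, v in AFFINITY[shape].items()}  # base layer
    # Layer 2: clamped continuous adjustments (formulas are assumptions)
    scores["Drop"] += clamp(ratio - 1.0) + clamp(slim)
    scores["Hoops"] += clamp(1.2 - ratio) + (0.3 if smile else 0.0)
    # Normalise so the output reads as percentages summing to 100
    total = sum(scores.values())
    return {k: v / total for k, v in scores.items()}

pct = score("Oval", ratio=1.35, slim=0.26, smile=True)
```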

Stability: EMA Smoothing and Lock Threshold

Raw per-frame measurements jump around a lot. A face bounding box shifts by a few pixels every frame, lighting changes subtly, and the eye cascade fires inconsistently. Without smoothing the recommendation would flicker every few frames and feel completely unusable.

We fix this with two techniques working together.

Exponential Moving Average (EMA)

EMA weights recent measurements more heavily than old ones, but without the sharp edges of a simple rolling average. The update rule is just one line:




ema = alpha * new_value + (1 - alpha) * previous_ema

We use alpha = 0.15, meaning each new frame contributes only 15% of the update. The other 85% is carried over from the previous smoothed value. This gives a stable signal that still responds to genuine changes — if you move significantly closer to the camera the ratio will drift to reflect that over a second or two, rather than snapping instantly or ignoring the change entirely.

Each feature gets its own independent EMA state, stored in a dictionary keyed by name. This means the ratio, area, slimness, eye span, and eye height are all smoothed separately, which is important because they have very different natural ranges and rates of change.
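The per-feature smoothing can be sketched with one dictionary and a three-line helper; the first sample for a feature simply seeds its state.

```python
ALPHA = 0.15  # each new frame contributes 15% of the update

ema_state = {}  # one independent EMA per feature, keyed by name

def smooth(name, value):
    prev = ema_state.get(name, value)  # first sample seeds the state
    ema_state[name] = ALPHA * value + (1 - ALPHA) * prev
    return ema_state[name]

# A noisy per-frame ratio settles gradually instead of jumping around
for v in (1.30, 1.50, 1.28, 1.33):
    smooth("ratio", v)
```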

Lock Threshold

Even with EMA smoothing the top recommendation can occasionally swap between two close competitors for a few frames. The lock threshold fixes this by adding a second layer of inertia at the output level rather than the feature level.

The displayed recommendation only changes after 12 consecutive frames agree on a different winner. If the new top pick appears for 11 frames and then drops back, nothing changes. Only a sustained, consistent shift triggers an update. This makes the interface feel confident and deliberate rather than indecisive, which matters a lot for something the user is meant to read and act on.



if ranked[0][0] != locked_ranked[0][0]:
    lock_counter += 1
    if lock_counter >= LOCK_THRESHOLD:
        locked_ranked = ranked
        lock_counter = 0
else:
    lock_counter = max(0, lock_counter - 1)

The counter also decrements when the locked recommendation wins a frame, which means brief flickers don't accumulate — if the challenger appears for 5 frames, disappears for 5 frames, and reappears, the counter resets each time rather than building toward a false switch.

The HUD Overlay

The HUD gives the user all the information they need at a glance without cluttering the view of their face. It has two main parts: the label above the face box, and the side panel.

Face Box and Label

The face bounding box changes colour depending on the detected face shape — green for Oval, blue for Round, orange for Square, purple for Oblong, pink for Heart, and cyan for Diamond. This gives instant visual feedback on what the algorithm thinks it's looking at, which is useful both for the user and for debugging.

Floating centred above the box is the best pick label, showing the top earring recommendation and its confidence percentage. It sits on a small semi-transparent rounded rectangle so it stays readable against any background. The label is centred over the face rather than pinned to the top-left corner of the box, which looks much more deliberate and polished.

Side Panel

The side panel is a semi-transparent dark rectangle on the right side of the frame. It shows the top 5 ranked earring types, each with a horizontal score bar and a percentage. The bars are colour coded — gold for first, blue for second, purple for third, and so on — so you can see at a glance how close the competition is between the top picks.

Below the rankings the panel shows the face shape label in the matching box colour, a line indicating whether eyes were detected and whether a smile was found, and a small debug strip at the very bottom showing the current smoothed ratio and eye span values. That debug strip is worth keeping in even in a finished demo — it lets anyone watching immediately understand what the algorithm is measuring.

The Rounded Rectangle

OpenCV has no native rounded rectangle drawing function, so we build one ourselves. The approach is to draw two overlapping filled rectangles — one horizontal and one vertical — and then place four filled circles at the corners where the rectangles don't reach. That gives a shape that looks rounded. An alpha blend with the original frame using cv2.addWeighted then gives the translucency effect.

One important note: never use emoji in cv2.putText. OpenCV only renders ASCII characters — this is exactly why the original prototype showed ???? where the emoji should have been. Plain text labels only.

Running It

Save the full source code as jewelry_stylist.py and run:



python jewelry_stylist.py

A window opens showing your webcam feed with the HUD overlay. Sit about 50–80 cm from the camera in decent light. Press Q to quit.

If no face is detected, the bottom of the frame shows a prompt to move closer. If the recommendation feels wrong, the debug strip is your first stop — check what ratio and eye span values the algorithm is actually seeing. A ratio that seems off usually means the head isn't level or the camera isn't at eye height. An eye span stuck at 0.5 usually means the eye cascade didn't fire, which is almost always a lighting problem.

If confidence scores are very low across the board it usually means the face is partially out of frame. Centre yourself so your full face including forehead and chin is visible.

Extending the Idea


The techniques here generalise well. The same EMA smoothing, lock threshold, and Haar-based feature extraction pipeline could power a hat recommender, a glasses frame finder, or a hairstyle suggester — all from a plain webcam with no special hardware.

Within jewelry specifically, the most natural extension is necklace recommendations. The face shape categories map directly onto necklace styles — V-necks and pendant necklaces elongate round faces, chokers suit oblong faces, statement necklaces work on square faces. A second affinity table and a second line on the HUD panel would do it.

Skin tone estimation is another good one. If you sample the average pixel value inside the face ROI in HSV space, a simple hue classification gives warm, cool, or neutral undertones. That lets you add a metals recommendation — gold for warm tones, silver for cool — on top of the earring shape suggestions.

For the experience itself, a photo booth mode is a satisfying addition. Give the user a 5-second countdown before locking the final recommendation and saving a screenshot with cv2.imwrite(). It makes the whole thing feel much more like a finished product and much less like a demo.

If you want to handle groups, the algorithm currently only processes the largest detected face. You could extend it to loop over all detected faces and draw a separate mini-panel for each person in frame, which works well for two or three people but gets crowded quickly beyond that.

Conclusion

This project shows you don't need deep learning, a GPU, or an internet connection to build something genuinely useful. By combining four of OpenCV's built-in Haar cascades with a carefully designed multi-signal scoring engine, you get a real-time jewelry recommender that responds to your actual facial features.

The most important lesson is that domain knowledge matters more than algorithmic complexity. The scoring engine works not because of clever code, but because the underlying styling rules — developed by professional stylists over many years — are encoded carefully into the affinity table and the signal weights. The code is just the vehicle.

The same principle applies if you extend it. Before writing any new scoring logic, find the domain knowledge first. Look up how stylists actually think about necklace lengths, or how makeup artists approach skin undertones, and encode that knowledge directly. The algorithm will follow.