Wireless earbuds are everywhere, but they have always been limited to sound. A new prototype from researchers at the University of Washington aims to change that — by embedding tiny cameras into a pair of earbuds and connecting them to an AI system capable of answering questions about the visual world.
The system, called VueBuds, is the first set of camera-integrated wireless earbuds designed for real-time visual intelligence. Because it pairs dual ear-mounted cameras with a vision language model, users can ask spoken questions, such as “What am I holding?” or “Can you translate this?”, and receive spoken answers without reaching for a phone or strapping on smart glasses.
In an online study in which 74 participants rated AI-generated responses on a 1-to-5 scale, VueBuds achieved a mean opinion score of 3.33 across 17 visual question-answering tasks, compared to 3.32 for the Ray-Ban Meta smart glasses. The two results are essentially identical, even though VueBuds relies on low-resolution, monochrome cameras. “Our work establishes low-power camera-equipped earbuds as a compelling platform for visual intelligence, bringing rapidly advancing VLM capabilities to one of the most ubiquitous wearable form factors,” the researchers write.
The case for earbuds over glasses, the researchers argue, comes down to reach. In the same survey, 93.3% of participants reported using earbuds at least occasionally, compared to 62.7% for glasses. More than a third of participants said they never wear glasses at all, excluding them from any glasses-based AI system.
To build VueBuds, the team integrated an ultra-low-power Himax CMOS image sensor into a pair of Sony WF-1000XM3 earbuds, housed inside custom 3D-printed enclosures. Each camera draws under 5 milliwatts of power and streams monochrome imagery over Bluetooth Low Energy to a host device, where a vision language model processes the images and synthesizes a spoken reply. Even under heavy use, defined as 60 visual queries per hour, the camera hardware adds only 11 to 14 percent battery overhead, the study finds.

Maruchi Kim, Rasya Fawwaz, et al.
A key engineering challenge was the placement of the cameras. Unlike smart glasses, where the camera sits close to the eyes, ear-mounted cameras face partial obstruction from the wearer’s own face. The team addressed this by angling the two cameras slightly outward, producing a combined field of view of up to 108 degrees, comparable to Ray-Ban Meta’s 100 degrees, while confining the blind spot directly in front of the user to a range closer than comfortable interaction distances.
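The trade-off between outward angle, combined field of view, and the frontal blind spot can be illustrated with a simple top-down geometric sketch. This is not the researchers' model: it ignores occlusion by the face, and the per-camera field of view (88 degrees), outward yaw (10 degrees), and camera separation (16 cm) are hypothetical values chosen only so that the total matches the 108 degrees reported above.

```python
import math


def inner_edge_deg(per_camera_fov_deg, outward_yaw_deg):
    """Angle (from straight ahead, toward the centerline) of each camera's inner view edge."""
    inner = per_camera_fov_deg / 2 - outward_yaw_deg
    if inner <= 0:
        # The cameras are yawed so far outward that neither ever sees the centerline.
        raise ValueError("camera views never reach the centerline")
    return inner


def combined_fov_deg(per_camera_fov_deg, outward_yaw_deg):
    """Total horizontal coverage from the left camera's outer edge to the right camera's."""
    inner_edge_deg(per_camera_fov_deg, outward_yaw_deg)  # validate the geometry
    return per_camera_fov_deg + 2 * outward_yaw_deg


def blind_spot_depth_cm(camera_separation_cm, per_camera_fov_deg, outward_yaw_deg):
    """Distance straight ahead at which a point on the centerline first becomes visible."""
    inner = inner_edge_deg(per_camera_fov_deg, outward_yaw_deg)
    return (camera_separation_cm / 2) / math.tan(math.radians(inner))


if __name__ == "__main__":
    # Hypothetical values, not taken from the study.
    fov, yaw, separation = 88, 10, 16
    print(combined_fov_deg(fov, yaw))
    print(round(blind_spot_depth_cm(separation, fov, yaw), 1))
```

In this simplified model, increasing the outward yaw widens the combined field of view but pushes the point where the views first meet farther from the face, which is why only a slight outward angle is workable.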
In an in-person study with 16 participants across kitchen, office, and living room settings, VueBuds achieved 82.5% accuracy on object recognition, 94.3% on reading text, and 83.8% on translation tasks. Participants generally reported that the devices felt like ordinary earbuds. One noted a privacy advantage that glasses cannot offer: “I think I could put my hair down, and I can stop it from recording any visual data. Seems like that can’t be done with glasses.”
Others pointed to practical everyday applications. “When I’m running or biking, it’s a lot easier to just say a wake word and ask a question than to stop, take out my phone, take a photo, etc.,” one participant observed.
Source: 10.1145/3772318.3791322
