Augmented (non-visual) reality

Been thinking a lot about augmented reality recently, for fairly obvious reasons. The other night I was talking to Yiying Lu at the first LaunchUp Las Vegas about AR and the possibilities inherent in it, and it got me thinking.

I’ve always had a big interest in ambient information interfaces and what is apparently called “calm technology”. Ambient information is really the idea that much of the information we’re required to monitor/process in our daily lives doesn’t actually require our complete attention all the time; that our interface to it can be passive, subliminal, rather than active and engaged.

Several years ago, for instance, a company called Ambient Devices debuted a product called the Ambient Orb, which was basically a frosted globe with multicolored LEDs in it. This device connected up to your various data streams and changed color based on their status. You could set the Orb to glow red if you had unread email, or change color based upon the performance of your stocks. The idea was that you wouldn’t have to sit and actively engage with your data; you could reduce it to a simple, nearly binary yes/no or color-based alert system.

The Orb never really did much for me, because it used a weird non-standard wireless network to return data and it seemed pretty locked in to the use cases that Ambient had created for it; I have no need to constantly (or really ever) monitor stock data in real-time, for example. But it was an interesting idea in theory, and one which has never really gone away.

The most obvious example of a passive info display is a traditional clock face, with hour, minute and second hands. Most people on Earth can simply glance at a clock and instantly recognize what time it is — maybe not down to the precise second, but close enough for government work, as the old saying goes. Clocks are both ambient and precise interfaces, of course; if you look at the clock for more than a second you can determine the precise time — by “drilling down” past the “macro” interface (i.e. the general geometric configuration of the clock hands) and focusing on the “micro” interface (the second hand and the numbers).

A more modern example, in my opinion, is the icon “badge” that appears in Apple’s iOS (and, to a lesser extent, on Dock icons in Mac OS X). The badge overlaid on the Mail icon, for example, shows you how many unread emails you have. But it also functions as a simple binary indicator: if it’s there, you have unread mail. Again, you drill down by focusing on the number inside the badge, which is the actual number of unread emails.

This is all very fine and dandy, but I started thinking about one of the basic underlying assumptions in both ambient information and in augmented reality: namely, that the primary interface ought to be visual.

In our everyday lives, at least for the non-deaf, sound is our most often-used tool for passively gathering information about the state of our environment. We listen to the world as much as we look at it. And my experience suggests that my own brain is better at setting audio to a lower cognitive priority than vision, while still paying continuous partial attention to it.

I suspect this is true of most humans: this is why it is illegal (at least in America) to put a video player in front of the driver in a motor vehicle, but car radios have been with us since around 1930. You can listen to This American Life and simultaneously perform the complex mental gymnastics required to drive a car without killing yourself or anybody else…but very, very few of us could watch TAL’s companion TV series on an LCD screen above our dashboard and make it more than a mile or two without ending up in a ditch. I’m sure there’s a well-documented neurological reason for this, but I’m gonna just skip the Googling here and say with confidence: humans are better at passively monitoring audio than vision.

Musician/artist/producer Brian Eno discovered this in 1975, when he was laid up in bed after being hit by a car. As he explains it in the liner notes to Discreet Music, the album that essentially launched the “ambient” music genre:

In January this year I had an accident. I was not seriously hurt, but I was confined to bed in a stiff and static position. My friend Judy Nylon visited me and brought me a record of 18th century harp music. After she had gone, and with some considerable difficulty, I put on the record. Having laid down, I realized that the amplifier was set at an extremely low level, and that one channel of the stereo had failed completely. Since I hadn’t the energy to get up and improve matters, the record played on almost inaudibly. This presented what was for me a new way of hearing music – as part of the ambience of the environment just as the colour of the light and the sound of the rain were parts of that ambience. It is for this reason that I suggest listening to the piece at comparatively low levels, even to the extent that it frequently falls below the threshold of audibility.

This gave Eno the impetus to “make a piece that could be listened to and yet could be ignored… perhaps in the spirit of Satie who wanted to make music that could ‘mingle with the sound of the knives and forks at dinner.’”

And yet, despite the obvious advantage of auditory interfaces, most ambient information interfaces are still visual displays, like the Ambient Orb. (One notable exception is in-car GPS, which usually uses some sort of celebrity voice to give you ambiguous directions.) And, so far as I am aware, the bulk of augmented reality tools — which, by overlaying data onto our real environment, are a sort of cousin of ambient displays — use audio sparingly, if at all.

In many of these contexts, audio simply makes more sense. For one thing, as I argued above, we’re better at monitoring it passively. For another, playing and even manipulating audio in real time is far less power and CPU intensive than rendering graphics in real time, which is why the iPhone can serve as a perfectly serviceable guitar stomp box.

Imagine, for example, a simple navigation system for a mobile device that directed you towards your destination via the simple means of adjusting the balance and volume of the music playing via your audio player. The sound is “centered” when you’re facing your destination directly; turn right and the music grows louder in your left ear and vice versa. The closer you are to your destination, the louder the music plays. Such a system would require no engagement at all by the user; the mobile device could remain safe in the user’s pocket for the duration of their journey. It’s an incredibly intuitive and simple interface to a fairly complex computational system.
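To make that a bit more concrete, here is a rough sketch of how the balance and volume might be computed. Everything in it is my own assumption rather than any real product’s method: the function name `audio_cue`, the 1 km range, the minimum volume of 0.2, the choice to pan fully to one side once the destination is 90 degrees or more off-axis, and the use of constant-power panning. The compass heading and the bearing to the destination are assumed to come from the device’s sensors.

```python
import math


def audio_cue(heading_deg, bearing_deg, distance_m,
              max_distance_m=1000.0, min_volume=0.2):
    """Return (left_gain, right_gain) for the music, each in [0, 1].

    heading_deg:  direction the user is facing (compass degrees)
    bearing_deg:  compass direction from the user to the destination
    distance_m:   distance to the destination in metres
    All parameters and defaults are illustrative assumptions.
    """
    # Signed angle from where we face to where we want to go, in [-180, 180).
    # Negative means the destination is off to our left.
    relative = (bearing_deg - heading_deg + 180.0) % 360.0 - 180.0

    # Map to a pan position: -1 is hard left, 0 is centred, +1 is hard right.
    pan = max(-1.0, min(1.0, relative / 90.0))

    # The closer we are, the louder the music (never fully silent).
    closeness = 1.0 - max(0.0, min(1.0, distance_m / max_distance_m))
    volume = min_volume + (1.0 - min_volume) * closeness

    # Constant-power panning keeps perceived loudness steady as we turn.
    angle = (pan + 1.0) * math.pi / 4.0
    return volume * math.cos(angle), volume * math.sin(angle)


# Facing east (90) with the destination due east: the sound is centred.
left, right = audio_cue(heading_deg=90, bearing_deg=90, distance_m=200)

# Turn right to face south-east (135): the music shifts to the left ear.
left, right = audio_cue(heading_deg=135, bearing_deg=90, distance_m=200)
```

The resulting pair of gains would then be handed to whatever audio player is running; that part is left out, since it depends entirely on the platform.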

(Of course, such an interface would be rather annoying on its face for someone who was simply trying to listen to their tunes, but it gives an example of one modality for such a tool.)

Talking to Yiying, I was reminded of an old idea of mine: an entirely musical social network based on generative music created via musical “DNA”. Think about how music discovery apps like Pandora work: each song is weighed against a set of criteria, such as “slow/fast”, “exciting/mournful”, etc. Each track’s particular set of traits serves as its musical “chromosomes”, so to speak.

So imagine that you sign up for this network and feed it your Pandora or Last.fm playlists. It analyzes your music and creates a new music for you: a sort of personalized, endlessly generated, changing score: your theme music.

Your theme music plays over your mobile device as you navigate through the world — ambient music, as easily ignored as paid attention to. Now, imagine you walk by someone else using this notional network — someone else with their own theme music.

As you pass them, some magic wireless technology (Bluetooth, near-field communication or something powered by unobtanium) kicks in. It randomly merges your musical DNA with theirs: your theme music is subtly altered by the encounter, incorporating elements of their musical genes, and vice versa.
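As a toy illustration of that merge, assume each person’s musical DNA is just a dictionary of trait scores between 0 and 1. The trait names, the `merge_dna` function, the coin flip per trait and the `influence` knob are all hypothetical; this is a sketch of one way the idea could work, not a description of how Pandora or anything else actually does it.

```python
import random


def merge_dna(mine, theirs, influence=0.1, rng=None):
    """Return a new DNA dict: mine, nudged slightly toward theirs.

    Each trait has a 50% chance of being affected by the encounter.
    An affected trait moves `influence` of the way toward the other
    person's value, so a single encounter only changes things subtly.
    """
    rng = rng or random.Random()
    merged = {}
    for trait, value in mine.items():
        # If they lack a trait, treat it as identical to mine (no change).
        other = theirs.get(trait, value)
        if rng.random() < 0.5:
            value = value + influence * (other - value)
        merged[trait] = value
    return merged


# Hypothetical trait scores: 0 = slow/mournful, 1 = fast/exciting.
me = {"tempo": 0.2, "mood": 0.9}
stranger = {"tempo": 0.8, "mood": 0.1}

# The encounter changes both themes ("and vice versa").
my_new_theme = merge_dna(me, stranger)
their_new_theme = merge_dna(stranger, me)
```

Keeping `influence` small means no single passer-by can hijack your theme; it takes many encounters for your music to drift very far.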

This wouldn’t be limited to people, either. Buildings, landmarks, communities, even the time of day — your theme could be endlessly modified by the very landscape you walk through. Over time, your music would become a product of your encounters, your experiences, your movements. (Of course, this is true anyway of anybody’s music collection, in a metaphorical sense. But in this case it would be literal.) You could “follow” people based upon the grooviness of their personal soundtrack.

This is one (admittedly slightly silly) example of an auditory augmented reality. But there are more practical and practicable variations on this as well. I’d love to see more people in the UI/UX community think about and embrace the potential of auditory ambient/AR systems as ambient and AR interfaces become more prevalent.
