The world is a robot.

This afternoon, I attended an excellent talk by Ken Goldberg about “cloud robotics” — the idea of building robots that are essentially taught and controlled by the Internet “cloud”. As Ken was talking, I had a moment of pure epiphany about cloud robotics and the “Internet of Things”. I realized that the underlying assumptions about how this should all work are completely wrong.

First, a bit of a summary: in the traditional model, a robot is an autonomous (or semi-autonomous) object. Its behavior is pre-programmed into it, and it’s set loose to do whatever it does: build automobile chassis, or roll around your house vacuuming until it’s gotten every bit of the floor. These sorts of robots are extremely limited, because they can only deal with whatever it is they’re programmed to do in the first place. Your Roomba is very good at vacuuming your rug, but if it encounters a Coke can on the rug, it doesn’t know what to do with it — it either ignores it or it runs away in a robot’s version of existential dread.

“Cloud” robotics refers to the idea of robotic systems in which the behavior modeling is offloaded onto the Internet. A perfect example is Google’s self-driving car, which is absolutely incapable of driving itself around the most sedate of suburban neighborhoods without a constant connection to Google’s servers, which process the data from the car’s sensors, compare it to maps and predicted behavior, and report back to the car, which adjusts its actions accordingly. In this sense, the self-driving car isn’t self-driving at all. There is no direct human intervention, but in a very real sense, it’s Google’s servers that are behind the wheel.

There’s a lot of work being done in this area to make robots smarter and less dependent on human intervention. Ken talked about the idea, for example, of a robot that uploads a picture of an unfamiliar item to the cloud, which interprets the picture, deciphers what the object is, and returns instructions to the robot on how to deal with it. If the algorithms break down, we can even foresee a future in which robots “call in” to humans, who “tell” the robot how best to proceed.

This is all well and good and paints a rosy future, but at the moment it’s all a fantasy. Right now, there’s no standard way for robots to communicate with the cloud, and even if there were, there’s no standard way for that communication to be translated into action. Every robot works differently, every robot design is unique; one would have to write an entire software stack to deal with each and every model of robot.

In fact, robots in 2013 are very much like musical synthesizers were, up until the late 1980s. This is a digression, but bear with me.

If I showed up on the doorstep of a forward-thinking musician in 1979 or so and asked them to define a synthesizer, they’d tell me that it was an electronic device that made sounds. Synths were boxes, with a “brain” which took signals from an input controller — usually, but not always, a piano-style keyboard — and turned them into audio signals that were sent to an output (usually a speaker or a mixing board for recording). Though the principles of synthesis were pretty much the same, every synth went about it in a different way: the “brain” of a Moog was totally different from that of a Buchla, for example, and in many cases they even handled the input from their keyboards totally differently. Everyone was, not so much reinventing the wheel, but inventing their own wheel.

It occurred to somebody in the late 1970s that it would be really useful if you could control multiple synths from the same keyboard, or even figure out a way to record a series of notes that could be “played back” on a synth live, to allow much more complicated music than could be performed by a single person with one or even two keyboards. But at the time, there was no real way to accomplish this, due to the sui generis nature of every synth.

A lot of goofy hacks and kludges were invented to solve this problem — including devices that sat atop the keyboards of different synths and physically pressed the notes, using solenoids — until a group of nerds invented something called MIDI, or Musical Instrument Digital Interface, in the early 1980s — a protocol for allowing synthesizers to communicate amongst one another that is still the de facto standard today.

The entire MIDI protocol is too complex to get into here, but the gist of it is that a MIDI-enabled device can send or receive basically three instructions: turn X note on on channel Y at Z volume; turn X note off on channel Y; and send X value on channel Y to controller Z. That, a bunch of wanky technical details aside, is basically it! And while MIDI has its very serious limitations, it’s the basis of at least 50% of the musical sounds you hear every single day — from the ravey keyboard lead in Lady Gaga’s “Bad Romance” to the Hammond organ sound on your favorite indie track to the deep beats on Kanye’s new jam.
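
If you’re curious what that looks like on the wire, here’s a rough Python sketch of those three messages as raw bytes. The status bytes (0x90, 0x80, 0xB0) are the real MIDI 1.0 values; the channel, note, and controller numbers are just examples.

    def note_on(channel, note, velocity):
        # "Turn note X on on channel Y at Z volume"
        return bytes([0x90 | channel, note, velocity])

    def note_off(channel, note):
        # "Turn note X off on channel Y"
        return bytes([0x80 | channel, note, 0])

    def control_change(channel, controller, value):
        # "Send value X on channel Y to controller Z"
        return bytes([0xB0 | channel, controller, value])

    print(note_on(0, 60, 100))        # middle C, pretty loud, channel 1
    print(control_change(0, 42, 64))  # e.g. controller 42 set to its midpoint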

Aside from the ability for one synth to talk to another, MIDI allowed something else entirely: the ability to separate the input device from the output device. To our musician in 1979, a “synth” was a monolithic physical object; but in the 1980s, you began to see synth “brains” without keyboards and keyboards without brains that could be connected using standard MIDI protocols (and cables). And as desktop computers became more powerful, a “synthesizer” could just as easily refer to a piece of software, controlled by an inexpensive MIDI keyboard controller that sends MIDI signals over USB, as a big box sitting on your desk. In fact, you don’t even need a human performer at all; one of my hobbies is writing little apps that algorithmically generate MIDI commands and send them to my software synths. (I’ve actually released two of the resulting tracks commercially, and they’ve sold surprisingly well.)
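
The gist of those little apps, if you want to roll your own, is something like this sketch, which uses the Python mido library purely for illustration (it’s not necessarily what my own code uses):

    import random
    import time

    import mido  # pip install mido python-rtmidi

    # Open the default MIDI output: a software synth, a USB controller, whatever.
    out = mido.open_output()

    # Improvise a short phrase from a C minor pentatonic scale.
    scale = [60, 63, 65, 67, 70, 72]
    for _ in range(16):
        note = random.choice(scale)
        out.send(mido.Message('note_on', note=note, velocity=random.randint(60, 110)))
        time.sleep(0.25)
        out.send(mido.Message('note_off', note=note))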

Ask a musician in 2013 what a “synth” is, and they’re not likely to describe a big physical box to you; they’re more likely to tell you it’s an app that runs on their laptop or their iPad.

The monolithic, in other words, has become modular.

By contrast, ask an engineer in 2013 what a “robot” is, and they’ll tell you it’s a machine that can be programmed to carry out physical tasks. A robot looks like Wall-E or Asimo: it’s a thing, a discrete physical object.

But this is both a simplification and an overcomplication. A robot can just as easily be defined as a collection of input and output devices, or, if you prefer, “sensors” and “actuators”, connected by a cybernetic controller. The sensors take in data from the world; the cybernetic controller interprets the data, and makes the output devices do things, upon which the whole cycle begins again, in a feedback loop.

For example: a Roomba is, when you get right down to it, a collection of touch sensors hooked up to three motors (one to turn each of the two wheels, and one to turn the fan that actually vacuums stuff up) via a “brain”. When a given touch sensor sends a strong enough signal (via running into a wall or a cat or an ottoman), the brain makes the wheels change direction or speed; for the most part, the vacuum fan isn’t involved in this process at all, but keeps happily chugging away until the Roomba is turned off.
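
In rough Python, that whole “brain” amounts to something like the sketch below; the three hardware functions are simulated stand-ins, not iRobot’s actual firmware.

    import random
    import time

    def read_bump_sensors():
        # Stand-in for the touch sensors: occasionally pretend we hit something.
        return random.random() < 0.1, random.random() < 0.1

    def set_wheel_speeds(left, right):
        print(f"wheels: left={left:+.1f}  right={right:+.1f}")

    def set_vacuum(on):
        print("vacuum:", "on" if on else "off")

    def bump_and_wander():
        set_vacuum(True)                      # the fan chugs away the whole time
        while True:
            left_bump, right_bump = read_bump_sensors()
            if left_bump or right_bump:
                set_wheel_speeds(-0.2, -0.2)  # back away from the obstacle...
                time.sleep(0.5)
                if left_bump:
                    set_wheel_speeds(0.3, -0.3)   # ...then spin away from it
                else:
                    set_wheel_speeds(-0.3, 0.3)
                time.sleep(random.uniform(0.3, 1.0))
            else:
                set_wheel_speeds(0.3, 0.3)    # nothing in the way: cruise ahead
            time.sleep(0.05)

    # bump_and_wander()  # runs forever; stop it with Ctrl-C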

The value of these sensors and actuators broadly follows Metcalfe’s Law: each by itself is essentially useless, but when connected together — along with something to sort out the data from the sensors and decide what commands to send to the actuators — they become far more valuable than the sum of their parts. They become a “robot”.

But here’s the thing: they’re still just parts. We call them a “robot” when they’re put into a chassis together, but that’s just limited imagination on our part.

Let’s try something else instead. Let’s take all those components out of that little round chassis and reconfigure them entirely. Let’s mount the touch sensors into a console and call them by a slightly different name: “buttons”. (Because that is, in fact, what they are.) Let’s put those motors into the wings of a very light aircraft, to control the flaps and ailerons that adjust the aircraft’s movement in the air. And instead of a hardware chip, let’s give them something more akin to a nervous system that sends and receives signals — using radio, for example.

When you push the buttons, those signals are sent via radio to the motors, which — when combined together — move the airplane up and down and left and right. What you have now is a radio-controlled plane!

But let’s get more interesting. Let’s add a brain back in, but instead of that stupid simple chip, let’s do what the synth people did, and move it into software. After all, our laptop is a thousand times more powerful than the little microprocessor that used to be our Roomba’s tiny brain, right? And let’s swap out our touch sensors, our buttons, for another sensor: a GPS unit.

Now, we can use the infinite power of our laptop to take the simple signals from the GPS and translate them into simple instructions for our motors, which really can only go on and off. If the X coordinate of the GPS is too low, turn on the tail motor for two seconds (or, if it’s a servomotor, turn it by Z degrees). Once the X coordinate is right, turn it the other way.
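
Sketched out, that control loop looks something like this; read_gps and pulse_tail_motor are hypothetical stand-ins for the radio link, and the target and tolerance values are made up.

    import time

    TARGET_LON = -122.4194   # the longitude we want to reach (made up)
    TOLERANCE = 0.0005       # "close enough", in degrees

    def steer_toward_target(read_gps, pulse_tail_motor):
        # Crude bang-bang control: nudge the tail until the X coordinate is right.
        while True:
            lat, lon = read_gps()
            error = TARGET_LON - lon
            if abs(error) < TOLERANCE:
                break                                 # on course: stop correcting
            if error > 0:
                pulse_tail_motor("right", seconds=2)  # too far one way: nudge back
            else:
                pulse_tail_motor("left", seconds=2)   # too far the other way
            time.sleep(1)                             # let the plane respond, then re-check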

Let’s make it more interesting. Let’s use Google Maps to get the precise GPS coordinates of an arbitrary address, and send that as a reference point for our two motors. (We’ve taken the fan motor and used it to turn the propeller on our plane, but it’s still stupid and only needs to turn on when we begin and turn off when we’re done.)
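
The address lookup itself is nearly trivial. Here’s a rough sketch against the Google Maps Geocoding API; you’d need your own API key, and I’ve left out all error handling.

    import requests  # pip install requests

    API_KEY = "YOUR_KEY_HERE"

    def address_to_coords(address):
        # Ask the Geocoding API to turn a street address into lat/lng.
        resp = requests.get(
            "https://maps.googleapis.com/maps/api/geocode/json",
            params={"address": address, "key": API_KEY},
        )
        location = resp.json()["results"][0]["geometry"]["location"]
        return location["lat"], location["lng"]  # the new target for our control loop

    # target_lat, target_lon = address_to_coords("1600 Pennsylvania Ave NW, Washington, DC")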

Now we can simply type a street address into our interface, sit back, and wait for our Roomba to get there. Only it’s not a Roomba anymore, is it? Is it even a robot at all? It’s the same collection of sensors and actuators (well, almost). It’s doing the same thing — taking input, processing it, and using that processed data to control output.

A “robot” is merely our convenient placeholder for an arbitrary collection of sensors and actuators. There’s a certain amount of anthropomorphism in that: a “robot” is a thing, like a “person”. But the difference is that each of the active parts of a robot — the sensors and actuators — can, in fact, be addressed and controlled individually. If that input and output is coordinated by a subtle and complex system — a “brain” — those simple inputs and outputs can add up to a remarkably advanced robot indeed…the same way a synthesizer becomes much more powerful and versatile and capable of producing amazing things when you stop thinking of it as a piano keyboard with a box attached to it.

But that convenient placeholder — “robot” — has become a trap. Robots in 2013 are like synths in 1975 — each one is sui generis, each manufacturer reinvents the wheel. Every model of robot has a different onboard operating system, a different way of connecting input to output, a different protocol. And yet, how many actual types of actuators even exist? Rotary motors, linear motors, solenoids, pistons…almost every actuator in every robot on Earth is based on a set of mechanical devices that were pretty well understood by the end of the Second World War. And all of them operate by the same rough principle: send X amount of current for Y amount of time. (Or send X amount of current at Y duty cycle for Z amount of time, if you’re talking about components driven by pulse width modulation.)
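
For a sense of how simple that principle really is, here’s roughly all it takes to drive a motor from a Raspberry Pi PWM pin using the RPi.GPIO library (through a motor driver board in practice; this is an illustration, not a wiring guide).

    import time
    import RPi.GPIO as GPIO  # Raspberry Pi only

    GPIO.setmode(GPIO.BCM)
    GPIO.setup(18, GPIO.OUT)

    motor = GPIO.PWM(18, 1000)   # pin 18, pulsing at 1 kHz
    motor.start(50)              # 50% duty cycle: roughly half power...
    time.sleep(2)                # ...for two seconds
    motor.ChangeDutyCycle(100)   # then full power
    time.sleep(2)
    motor.stop()
    GPIO.cleanup()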

Inputs? Slightly more complicated, but as we’ve seen with computer peripherals, it’s perfectly possible to standardize even the most complex of inputs, provided we’re willing to offload the processing to software. That’s why there are standard USB protocols for everything from computer mice to webcams to, yes, even MIDI devices. Webcams may have different resolutions and color depths, but they’re still just sending an array of pixel data to software.

What if we stopped thinking of and designing robots as monolithic objects, and started thinking of them instead as useful collections of components? And designed simple protocols for passing sensor and actuator data to and from their brain or nervous system — protocols that could be standardized and given APIs and made replicable, and that could also transmit unique information about the robot when it connects? (USB-MIDI synths and controllers do this; when you connect one, it sends its model name and manufacturer to the MIDI-handling subsystem of the operating system. If you have a Mac, go to Applications->Utilities->Audio MIDI Setup and plug a cheap USB-MIDI controller in; you’ll see what I mean.)

Imagine a Bluetooth robot that, when paired with a computing device, sends an addressed list of its sensors and actuators to the client software, maybe like this:

  • 0 :: rotary motor :: "leftTread"
  • 1 :: rotary motor :: "rightTread"
  • 2 :: servomotor :: "robotArmElbow"

I’m just making that up off the top of my head, but you see what I mean. Or you could provide the developer with a list of endpoints; this is similar to the way that MIDI hardware synths come with manuals that show which controllers handle what, like “CC42: Filter Frequency”. (This lets you, the musician, know that if you assign the knob on your MIDI controller to CC42, when you turn it, it will adjust the filter frequency of your hardware synth.)
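
To make that concrete, the self-description might look something like the sketch below when the robot connects; everything here, from the model name to the field names, is invented for illustration.

    import json

    # The kind of "hello" blob a robot might send the moment it pairs.
    hello = json.loads("""
    {
      "model": "RoverBot 3000",
      "manufacturer": "Acme Robotics",
      "actuators": [
        {"address": 0, "type": "rotary motor", "name": "leftTread"},
        {"address": 1, "type": "rotary motor", "name": "rightTread"},
        {"address": 2, "type": "servomotor",   "name": "robotArmElbow"}
      ],
      "sensors": [
        {"address": 3, "type": "pressure", "name": "frontBumper", "range": [0, 255]}
      ]
    }
    """)

    # The client builds an address book so higher-level code can talk about
    # "leftTread" without caring which wire it lives on.
    address_book = {device["name"]: device["address"]
                    for device in hello["actuators"] + hello["sensors"]}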

This would allow the creation of simple network protocols for interacting with sensors and actuators, in which the business logic is offloaded to the cloud or a controller device. For example, imagine this bit of pseudocode:

    while (robot1/pressureSensor < 20) {
        robot1/leftTread.rotate(20);
    }

It doesn’t matter what the actual value range sent by robot1/pressureSensor is, in this simple example, so long as the cloud “knows” the proper range; it could be 0 to 1 or 0 to 255 or 0 to the distance from the Earth to the moon in micrometers. The same goes for the tread motor, or the servo, or the solenoid. It doesn’t matter any more than it matters to the HTML renderer in your browser whether you type a two-word declaration or a 500-word soliloquy into your Facebook status box; the client side takes care of all the tricky bits of displaying your text, converting it into POST data, and sending the data to be processed on the server side.
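
Fleshed out just a little, the pseudocode might look like this against an entirely made-up client library; the point is that the tricky bits (value ranges, wire format, transport) live inside RobotClient, not in the logic.

    import random
    import time

    class RobotClient:
        # A hypothetical client library for the hypothetical protocol above.
        def __init__(self, host):
            self.host = host

        def read(self, sensor):
            # In real life: fetch the sensor's endpoint and normalize its value.
            # Here we just simulate a reading between 0 and 100.
            return random.randint(0, 100)

        def command(self, actuator, verb, value):
            # In real life: post the command to the actuator's endpoint.
            print(f"{self.host}/{actuator} <- {verb}({value})")

    robot1 = RobotClient("robot1.local")
    while robot1.read("pressureSensor") < 20:
        robot1.command("leftTread", "rotate", 20)
        time.sleep(0.1)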

If every actuator and sensor became separately addressable, with all of the coordination between them handled by higher-level computing devices, the whole notion of the “robot” would cease to exist. It’s just components, some of which are physically joined, some of which are not, connected by routers. A camera on a pole could provide data that tells forklift robots how to coordinate their movement; a light sensor could tell all of the automated blinds on the east side of your house to roll themselves down when the sun rises…while also telling your coffeemaker to power on and start brewing your coffee; if the Weather Channel’s API says it’s going to be cold, your car automatically turns on the window defroster before you get in and turn on the engine.
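
A sketch of that last idea, with every endpoint invented for illustration:

    import urllib.request

    def get(url):
        return float(urllib.request.urlopen(url).read())

    def post(url):
        urllib.request.urlopen(urllib.request.Request(url, data=b"", method="POST"))

    # One sensor reading, several unrelated devices told what to do.
    if get("http://sensors.example.home/east-light") > 400:     # the sun is up
        for blind in ("bedroom", "office", "hallway"):
            post(f"http://blinds.example.home/{blind}/lower")   # roll the blinds down
        post("http://kitchen.example.home/coffeemaker/brew")    # and start the coffee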

The whole world, in effect, becomes one giant robot, a billion different actuators driven by a billion different sensors, all linked up and connected by the cloud. Nor do the “actuators” or the sensors necessarily need to be physical; again, we’re moving away from the idea of the robot as a device that does physical work. A robot that bangs a drum whenever you send a Tweet to a specific account is still a robot, right?

In fact, a roboticist of 2033 might think of a “robot” as a “set of behaviors that drive physical devices”, rather than as the physical devices themselves. One can even imagine different robotic “social networks”, where you can hook your devices up to a specific cloud that suits your tastes and needs. The military would have its own cloud; businesses would have intranet clouds to control their industrial robots; you might connect your “hardware cloud” of sensors and actuators up to a “software cloud” that learns behaviors from your friends and family.

It’s difficult to fully imagine this scenario, of course. And what I’m describing here isn’t easy. It requires a complete rethinking of how we design and envision robots — from monolithic to modular. But this transition is something that every aspect of technology seems to go through at some point, from industrialization to communications to computation, and even, as we’ve seen, music technology. I believe it’s a necessary paradigm shift.

What we’re doing is nothing less than making the world around us come to life, to act and react according to the information we create and share. In order to truly make that happen, we need to teach our devices to share as well.
