Admit it, the moment you saw Jarvis in Iron Man, you knew you wanted one, or maybe it was HAL 9000 before that, or even Samantha.
There’s something undeniably cool about an AI companion or secretary that you can interact with. Think about it. Would Star Wars be the same without R2-D2 and BB-8, or Iron Man without Jarvis?
If you’re like Mark Zuckerberg, however, you’ve gone out of your way to bring these fantasies to life. Meet Jarvis, an AI that Zuckerberg designed and built so that he and his family can control their house.
Built from scratch using the tools that Facebook gives developers and engineers on its platform, Jarvis is the perfect example of what can be achieved with modern computing power.
The idea of Jarvis was born when Zuckerberg decided to spend a year learning about AI. The Facebook CEO’s fascination with Iron Man naturally led him to the idea of Jarvis, and that’s exactly what he set about building.
To start with, Zuckerberg says that he needed to take stock of his home and figure out what he wanted to do. Most of his house is wired up via a Crestron system (a universal remote for your home), and he also had a multitude of appliances, including a toaster and a T-Shirt cannon — we kid you not — that he wanted his AI to control.
Baby steps
The first step was getting everything to talk to a central server, called the Jarvis server. As Zuckerberg discovered, this was much harder than he expected. Most of his home’s smart devices used proprietary APIs (Application Programming Interfaces) that made it almost impossible for the devices to talk to each other, let alone to a central server.
Another issue was that not all of the devices were ‘smart’ to begin with; the only way to connect some of them (think: toasters) was via smart switches and the like.
To solve this problem, Zuckerberg had to reverse-engineer APIs and modify the hardware of components that couldn’t be connected to his server.
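The article doesn’t show Zuckerberg’s actual code, but the architecture it describes, a central server that hides each device’s quirks behind one common interface, can be sketched roughly like this. All class and method names here are hypothetical illustrations, not anything from the real Jarvis:

```python
from abc import ABC, abstractmethod


class Device(ABC):
    """Common interface so every appliance looks the same to the server."""

    def __init__(self, name: str):
        self.name = name

    @abstractmethod
    def set_state(self, on: bool) -> None: ...


class SmartLight(Device):
    """A device with its own network API (reverse-engineered in Jarvis's case)."""

    def __init__(self, name: str):
        super().__init__(name)
        self.on = False

    def set_state(self, on: bool) -> None:
        # A real implementation would call the device's proprietary API here.
        self.on = on


class PowerSwitch:
    """A smart power switch that can cut or restore mains power."""

    def __init__(self):
        self.powered = False

    def set_power(self, on: bool) -> None:
        self.powered = on


class SwitchedAppliance(Device):
    """A 'dumb' appliance (like the toaster) wired through a smart switch."""

    def __init__(self, name: str, switch: PowerSwitch):
        super().__init__(name)
        self.switch = switch

    def set_state(self, on: bool) -> None:
        self.switch.set_power(on)


class JarvisServer:
    """Central registry: one place that knows how to reach everything."""

    def __init__(self):
        self.devices: dict[str, Device] = {}

    def register(self, device: Device) -> None:
        self.devices[device.name] = device

    def command(self, name: str, on: bool) -> None:
        self.devices[name].set_state(on)
```

The design choice is the point: once everything, smart or not, answers to the same `set_state` call, the server never needs to care whether it is flipping a relay or calling a vendor API.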
Among the challenges, one stands out: for safety reasons, a modern toaster won’t toast bread that was inserted before it was switched on. Zuckerberg had to track down a toaster from the 1950s that would, and then design a mechanism to trigger it with a digital signal.
Once done, his next priority was to build a glorified remote control. The Jarvis server needed to be able to communicate with and control all of this equipment, after all.
Listen, learn, and obey
That was the hard part. Natural language processing came next, and Zuckerberg claims that he was quite surprised at how easy it was. There were challenges, of course, but he says that the tools Facebook has at its disposal, the ones it gives its engineers and developers, are very robust and capable. In particular, he’s full of praise for Messenger’s bot API.
He started with text commands via Facebook Messenger because these were easier for Jarvis to understand. The idea behind using Messenger was simple: Zuckerberg wanted a means of controlling his home from wherever he was, even from outside.
He explains that the initial programming was easy. Teaching Jarvis to understand simple words like “bedroom” and “lights” was straightforward, but he ran into problems when he realized that his wife refers to a room as the “family room” while he calls it the “living room.”
Rather than hard-code these words into the system, Zuckerberg decided to use a deep-learning algorithm that can learn and adapt to these nuances.
This problem came to the fore when he started work on the music system in his house. At home, his music needs are satisfied by a Sonos system and Spotify. In Zuckerberg’s words, asking Jarvis to “play someone like Adele” is very different from asking it to “play some Adele.”
Taking things a step further, Jarvis needed to understand the classification of songs, know preferences and adapt to listening patterns.
This took a lot of tweaking, but Facebook already has some tools for coding AI and speech recognition, including open-source repositories on GitHub.
Voice recognition was the next step, and a relatively easy one to implement. The Messenger API already allows audio to be transmitted, and Jarvis only had to transcribe it. Zuckerberg even built a custom iOS app that would always listen to him and accept voice commands.
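The reason the voice step was easy is that it bolts onto the front of the existing text pipeline: transcribe the audio, then hand the result to the same handler the Messenger text path already uses. A rough sketch, with a fake `transcribe` standing in for a real speech-to-text service (the handler logic here is purely illustrative):

```python
def transcribe(audio_bytes: bytes) -> str:
    """Placeholder for a real speech-to-text call; here we just decode."""
    return audio_bytes.decode("utf-8")


def handle_text(command: str) -> str:
    """The same handler the text (Messenger) path already uses."""
    if "lights" in command.lower():
        return "Turning on the lights."
    return "Sorry, I didn't understand that."


def handle_audio(audio_bytes: bytes) -> str:
    """Voice support is just transcription in front of the text pipeline."""
    return handle_text(transcribe(audio_bytes))
```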
Face recognition and image processing were the obvious next step.
He set up multiple cameras around his house and at the front door. If someone presses the doorbell, the system uses Facebook’s facial recognition algorithms to determine who the person is and tells Zuckerberg as much. Zuckerberg can then ask Jarvis to unlock the door or perform a similar action.
Putting the ‘intelligence’ in AI
All of these steps were taken to help Jarvis understand context. Once Priscilla, Zuckerberg’s wife, joined in on the fun, Jarvis had to learn to differentiate between the two.
Asking Jarvis to turn up the AC in “my office” could mean either Priscilla’s office or Zuckerberg’s office, for example. Jarvis also needed to learn whose music to play when asked to do so.
A simple command like “turn on the lights” would require Jarvis to know who gave that command and where that person was and then to turn on the appropriate lights.
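Both ambiguities above, whose office is “my office,” and which lights “the lights” means, reduce to the same trick: resolve the command against the identity and location of the speaker. A toy resolver, with all names and the location-tracking mechanism assumed for illustration:

```python
# Hypothetical mapping of each person to "their" office.
OFFICES = {"mark": "mark_office", "priscilla": "priscilla_office"}


def resolve_target(command: str, speaker: str, locations: dict[str, str]) -> str:
    """Pick the room a command refers to, given who said it and where."""
    if "my office" in command.lower():
        return OFFICES[speaker]      # "my" depends on who is asking
    return locations[speaker]        # default: wherever the speaker is


def handle(command: str, speaker: str, locations: dict[str, str]) -> str:
    room = resolve_target(command, speaker, locations)
    return f"lights on in {room}"
```

So the same words produce different actions for different people in different rooms, which is exactly the open-ended behavior Zuckerberg describes.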
Zuckerberg succeeded in doing all of this by his lonesome. In fact, he’s so happy with his progress that he claims that such “open-ended” requests are his favorite commands these days.
Isn’t it far more convenient to say, “turn on the lights” and have them just turn on, rather than say, “turn on the lights in Mark Zuckerberg’s bedroom”?
Facebook’s work on voice recognition and image processing has helped immensely on this front.
In a blog post where he explains his work in more detail, Zuckerberg is full of praise for Facebook’s tools and the work his engineers have put in to make those tools more accessible to the average developer.
Call it what you will, a plug for Facebook or shameless product placement, the fact remains that Zuckerberg did all of this coding himself with tools that are available to you and me. It’s incredible work.