As the popularity of Amazon Alexa and other voice assistants grows, so too does the number of ways those assistants both do and can intrude on users’ privacy. Examples include hacks that use lasers to surreptitiously unlock connected doors and start cars, malicious assistant apps that eavesdrop and phish passwords, and discussions that are surreptitiously and routinely monitored by provider employees or are subpoenaed for use in criminal trials. Now, researchers have developed a device that may one day allow users to take back their privacy by warning when these devices are mistakenly or intentionally snooping on nearby people.
LeakyPick is placed in various rooms of a home or office to detect the presence of devices that stream nearby audio to the Internet. By periodically emitting sounds and monitoring subsequent network traffic (it can be configured to send the sounds when users are away), the ~$40 prototype detects the transmission of audio with 94-percent accuracy. The device monitors network traffic and provides an alert whenever the identified devices are streaming ambient sounds.
LeakyPick also tests devices for wake word false positives, i.e., words that incorrectly activate the assistants. So far, the researchers’ device has found 89 words that unexpectedly caused Alexa to stream audio to Amazon. Two weeks ago, a different team of researchers published more than 1,000 words or phrases that produce false triggers that cause the devices to send audio to the cloud.
“For many privacy-conscious consumers, having Internet-connected voice assistants [with] microphones scattered around their homes is a concerning prospect, despite the fact that smart devices are promising technology to enhance home automation and physical safety,” Ahmad-Reza Sadeghi, one of the researchers who designed the device, said in an email. “The LeakyPick device identifies smart home devices that unexpectedly record and send audio to the Internet and warns the user about it.”
Taking back user privacy
Voice-controlled devices typically use local speech recognition to detect wake words, and for usability, the devices are often programmed to accept similar-sounding words. When a nearby utterance resembles a wake word, the assistants send audio to a server that has more comprehensive speech recognition. Besides being prone to these inadvertent transmissions, assistants are also vulnerable to hacks that deliberately trigger wake words in order to send audio to attackers or carry out other security-compromising tasks.
In a paper published early this month, Sadeghi and other researchers—from Darmstadt University, the University of Paris Saclay, and North Carolina State University—wrote:
The goal of this paper is to devise a method for regular users to reliably identify IoT devices that 1) are equipped with a microphone, and 2) send recorded audio from the user’s home to external services without the user’s awareness. If LeakyPick can identify which network packets contain audio recordings, it can then inform the user which devices are sending audio to the cloud, as the source of network packets can be identified by hardware network addresses. This provides a way to identify both unintentional transmissions of audio to the cloud, as well as above-mentioned attacks, where adversaries seek to invoke specific actions by injecting audio into the device’s environment.
Achieving all of that required the researchers to overcome two challenges. The first is that most assistant traffic is encrypted. That prevents LeakyPick from inspecting packet payloads to detect audio codecs or other signs of audio data. Second, with new, previously unseen voice assistants coming out all the time, LeakyPick also has to detect audio streams from devices without prior training for each device. Previous approaches, including one called HomeSnitch, required advance training for each device model.
To clear the hurdles, LeakyPick periodically transmits audio in a room and monitors the resulting network traffic from connected devices. By temporally correlating the audio probes with observed characteristics of the network traffic that follows, LeakyPick enumerates connected devices that are likely to transmit audio. One way the device identifies likely audio transmissions is by looking for sudden bursts of outgoing traffic. Voice-activated devices typically send limited amounts of data when inactive. A sudden surge usually indicates a device has been activated and is sending audio over the Internet.
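The burst heuristic can be illustrated with a minimal Python sketch. It is not the researchers' code. It assumes packets have already been reduced to (timestamp, source MAC address, size in bytes) tuples, and the function names, the 10-second window, the factor of 5, and the 2,000-byte floor are all hypothetical values chosen for illustration.

```python
from collections import defaultdict


def bytes_per_window(packets, start, end):
    """Sum outgoing bytes per source MAC for packets with start <= ts < end."""
    totals = defaultdict(int)
    for ts, src_mac, size in packets:
        if start <= ts < end:
            totals[src_mac] += size
    return dict(totals)


def devices_with_burst(packets, probe_time, window=10.0, factor=5.0, min_bytes=2000):
    """Return MACs that sent far more data right after a probe than right before it.

    window, factor, and min_bytes are illustrative assumptions, not values
    taken from the LeakyPick paper.
    """
    before = bytes_per_window(packets, probe_time - window, probe_time)
    after = bytes_per_window(packets, probe_time, probe_time + window)
    flagged = []
    for mac, sent in after.items():
        baseline = before.get(mac, 0)
        # Require both a relative jump and an absolute minimum volume.
        if sent > max(factor * baseline, min_bytes):
            flagged.append(mac)
    return sorted(flagged)


# Hypothetical capture: device "aa" surges after the probe at t=10, "bb" does not.
packets = [
    (0.0, "aa", 100), (5.0, "aa", 100), (12.0, "aa", 4000), (13.0, "aa", 4000),
    (3.0, "bb", 200), (14.0, "bb", 250),
]
print(devices_with_burst(packets, probe_time=10.0))
```

Grouping by hardware address is what lets a flagged burst be attributed to a specific device even though the payload is encrypted.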
Using bursts alone is prone to false positives. To weed them out, LeakyPick employs a statistical approach based on an independent two-sample t-test to compare features of a device’s network traffic when idle and when it responds to audio probes. This method has the added benefit of working on devices the researchers have never analyzed. The method also allows LeakyPick to work not only for voice assistants that use wake words, but also for security cameras and other Internet-of-things devices that transmit audio without wake words.
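As a rough illustration of the statistical step, the sketch below compares one traffic feature (for example, bytes sent per second) between an idle baseline and the period after a probe. The article does not say which variant of the two-sample t-test the researchers used or what significance threshold they applied, so Welch's unequal-variance form and the fixed cutoff `t_critical=2.0` are assumptions made here, as are the function names and sample values.

```python
import math
from statistics import mean, variance


def welch_t(idle, probed):
    """Return (t statistic, degrees of freedom) for two independent samples.

    Uses Welch's unequal-variance form, which is an assumption of this sketch.
    Each sample needs at least two observations.
    """
    n1, n2 = len(idle), len(probed)
    v1, v2 = variance(idle) / n1, variance(probed) / n2
    se2 = v1 + v2
    diff = mean(probed) - mean(idle)
    if se2 == 0:
        # Both samples are constant: no spread to test against.
        return (0.0 if diff == 0 else math.copysign(math.inf, diff)), n1 + n2 - 2
    t = diff / math.sqrt(se2)
    df = se2 ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


def reacts_to_probe(idle, probed, t_critical=2.0):
    """Flag a device when post-probe traffic differs clearly from idle traffic.

    t_critical is a hypothetical cutoff; a real tool would derive a p-value
    from t and the degrees of freedom instead.
    """
    t, _ = welch_t(idle, probed)
    return abs(t) > t_critical


# Hypothetical bytes-per-second samples for one device.
idle_rate = [120, 130, 110, 125, 115]
after_probe = [900, 1100, 1000, 950, 1050]
print(reacts_to_probe(idle_rate, after_probe))
```

Because the comparison is made against each device's own baseline, nothing in it depends on knowing the device model in advance. Where SciPy is available, `scipy.stats.ttest_ind(idle_rate, after_probe, equal_var=False)` returns the same statistic along with a p-value.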
The researchers summarized their work this way:
At a high level, LeakyPick overcomes the research challenges by periodically transmitting audio into a room and monitoring the subsequent network traffic from devices. As shown in Figure 2, LeakyPick’s main component is a probing device that emits audio probes into its vicinity. By temporally correlating these audio probes with observed characteristics of subsequent network traffic, LeakyPick identifies devices that have potentially reacted to the audio probes by sending audio recordings.
LeakyPick identifies network flows containing audio recordings using two key ideas. First, it looks for traffic bursts following an audio probe. Our observation is that voice-activated devices typically do not send much data unless they are active. For example, our analysis shows that when idle, Alexa-enabled devices periodically send small data bursts every 20 seconds, medium bursts every 300 seconds, and large bursts every 10 hours. We further found that when it is activated by an audio stimulus, the resulting audio transmission burst has distinct characteristics. However, using traffic bursts alone results in high false positive rates.
Second, LeakyPick uses statistical probing. Conceptually, it first records a baseline measurement of idle traffic for each monitored device. Then it uses an independent two-sample t-test to compare the features of the device’s network traffic while being idle and of traffic when the device communicates after the audio probe. This statistical approach has the benefit of being inherently device agnostic. As we show in Section 5, this statistical approach performs as well as machine learning approaches, but is not limited by a priori knowledge of the device. It therefore outperforms machine learning approaches in cases where there is no pre-trained model for the specific device type available.
Finally, LeakyPick works for both devices that use a wake word and devices that do not. For devices such as security cameras that do not use a wake word, LeakyPick does not need to perform any special operations. Transmitting any audio will trigger the audio transmission. To handle devices that use a wake word or sound, e.g., voice assistants, security systems reacting on glass shattering or dog barking, LeakyPick is configured to prefix its probes with known wake words and noises (e.g., “Alexa”, “Hey Google”). It can also be used to fuzz test wake-words to identify words that will unintentionally transmit audio recordings.
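The probing and fuzzing loop described in that excerpt might look roughly like the following Python sketch. It is an illustration only: `play_audio` and `device_reacted` are hypothetical callbacks standing in for the speaker output and the traffic analysis, and the candidate words are placeholders rather than words from the paper.

```python
def build_probes(wake_words, phrase):
    """Prefix a probe phrase with each known wake word."""
    return [f"{wake} {phrase}".strip() for wake in wake_words]


def fuzz_wake_words(candidates, play_audio, device_reacted):
    """Play each candidate word and keep those followed by a detected transmission.

    play_audio(word) and device_reacted() are hypothetical callbacks supplied
    by the caller; this sketch does not generate speech or inspect traffic itself.
    """
    triggers = []
    for word in candidates:
        play_audio(word)
        if device_reacted():
            triggers.append(word)
    return triggers


# Stand-in callbacks: pretend the device reacts only to the second placeholder word.
played = []
hits = fuzz_wake_words(
    ["word_a", "word_b", "word_c"],
    play_audio=played.append,
    device_reacted=lambda: played[-1] == "word_b",
)
print(hits)
print(build_probes(["Alexa", "Hey Google"], "what time is it"))
```

Keeping playback and detection behind callbacks means the same loop can be exercised with stubs, as here, or wired to a text-to-speech engine and a traffic monitor.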
Guarding against accidental and malicious leaks
So far, LeakyPick, which gets its name from its mission to pick up the audio leakage of network-connected devices, has uncovered 89 non-wake words that can trigger Alexa into sending audio to Amazon. With more use, LeakyPick is likely to find additional words in Alexa and other voice assistants. The researchers have already found several false positives in Google Home. The 89 words appear on page 13 of the above-linked paper.
Besides detecting inadvertent audio transmissions, the device will spot virtually any activation of a voice assistant, including those that are malicious. An attack demonstrated last year shined lasers at Alexa, Google Home, and Apple Siri devices to make them unlock doors and start cars connected to a smart home. Sadeghi said LeakyPick would easily detect such a hack.
The prototype hardware consists of a Raspberry Pi 3B connected by Ethernet to the local network. It’s also connected by a headphone jack to a PAM8403 amplifier board, which in turn connects to a single generic 3W speaker. The device captures network traffic using a TP-LINK TL-WN722N USB Wi-Fi dongle that creates a wireless access point using hostapd and dnsmasq as the DHCP server. All wireless IoT devices in the vicinity will then connect to that access point.
To give LeakyPick Internet access, the researchers activated packet forwarding between the Ethernet (connected to the network gateway) and wireless network interfaces. The researchers wrote LeakyPick in Python. They used tcpdump to record packets and Google’s text-to-speech engine to generate the audio played by the probing device.
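For readers curious what per-device capture with tcpdump could look like from Python, here is a small sketch that only builds the command line. The article does not describe the flags the researchers used, so the choice of options, the interface name `wlan0`, the MAC address, and the output filename are all assumptions for illustration.

```python
def tcpdump_command(interface, mac, outfile):
    """Build a tcpdump argument list that records frames sent by one device.

    -i selects the capture interface, -n disables name resolution, and
    -w writes raw packets to a pcap file. The trailing words form the filter
    expression "ether src <mac>", which matches frames from that hardware address.
    """
    return ["tcpdump", "-i", interface, "-n", "-w", outfile, "ether", "src", mac]


# Hypothetical values; running the command requires tcpdump and capture privileges,
# e.g. subprocess.Popen(cmd) from a suitably privileged process.
cmd = tcpdump_command("wlan0", "aa:bb:cc:dd:ee:ff", "device.pcap")
print(" ".join(cmd))
```

Filtering on the source hardware address at capture time keeps each device's outgoing traffic in its own file, which fits the per-device comparison the article describes.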
With the increasing usage of devices that stream nearby audio and the growing corpus of ways they can fail or be hacked, it’s good to see research that proposes a simple, low-cost way to repel leaks. Until devices like LeakyPick are available—and even after that—people should carefully question whether the benefits of voice assistants are worth the risks. When assistants are present, users should keep them turned off or unplugged except when they’re in active use.