
This AI can tell what you’re typing just by listening


It's an innovative new way to invade people's privacy and break into systems.



We've seen drones exfiltrate data from computer servers using light, and attacks that steal data by listening to hard drives or by capturing the faint electromagnetic pulses computers emit. But now the messages you type can be decoded from the mere sound of your fingers tapping on the keys, according to a recent paper by researchers at Durham University, the University of Surrey, and Royal Holloway, University of London.



The researchers trained two machine learning models to recognize the distinctive clicks from each key on an Apple laptop keyboard. The models were trained on audio collected from two sources: a smartphone placed nearby and a video call conducted over Zoom. They report an accuracy of 95 percent for the smartphone-audio model and 93 percent for the Zoom-call model.

These models could make possible what’s known as an acoustic side-channel attack. While the technique presented in this paper relies on contemporary machine learning techniques, such attacks date back at least to the 1950s, when British intelligence services surreptitiously recorded mechanical encryption devices employed by the Egyptian government.

A laptop acoustic side-channel attack estimates what keys were pressed, and in which order, from audio recordings of a person using the laptop. These attacks can reveal sensitive information from the user, like bank PINs, account passwords, or government credentials.

The team's models are built around convolutional neural networks, or CNNs. Just as such networks can recognize faces in a crowd, they can recognize patterns in a spectrogram, a visual representation of an audio signal's frequencies over time. The program isolates the audio of each keypress, transforms its waveform into a spectrogram, extracts the frequency patterns of each click, and computes the relative probability that a given key was pressed.
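The first two stages of that pipeline can be sketched in a few lines. This is a hedged illustration, not the authors' code: the energy threshold, window length, and spectrogram settings below are made-up demo values, and a real attack would feed each spectrogram into a trained CNN rather than stopping at feature extraction.

```python
import numpy as np
from scipy.signal import spectrogram

def isolate_keypresses(audio, rate, threshold=0.1, window_ms=80):
    """Find keypress onsets by energy thresholding and return
    fixed-length audio segments starting at each onset."""
    win = int(rate * window_ms / 1000)
    energy = audio ** 2
    # Smooth the energy with a short moving average.
    kernel = np.ones(win // 4) / (win // 4)
    smoothed = np.convolve(energy, kernel, mode="same")
    above = smoothed > threshold * smoothed.max()
    # Rising edges of the thresholded envelope mark keypress onsets.
    onsets = np.flatnonzero(above[1:] & ~above[:-1])
    segments = []
    for i in onsets:
        seg = audio[i:i + win]
        if len(seg) == win:
            segments.append(seg)
    return segments

def to_spectrogram(segment, rate):
    """Turn one keypress segment into a 2-D time-frequency image
    that a CNN can classify like any other picture."""
    f, t, sxx = spectrogram(segment, fs=rate, nperseg=128, noverlap=64)
    return np.log1p(sxx)  # log scale, as is typical for audio features

# Synthetic demo: two noise-burst "clicks" in one second of silence.
np.random.seed(0)
rate = 16000
audio = np.zeros(rate)
for start in (int(0.2 * rate), int(0.6 * rate)):
    audio[start:start + 400] = np.random.randn(400)

segments = isolate_keypresses(audio, rate)
print(len(segments))            # one segment per detected click
spec = to_spectrogram(segments[0], rate)
print(spec.shape)               # (frequency bins, time bins)
```

The key design point the quote below alludes to is the last step: once each click is a log-spectrogram, keystroke classification becomes an ordinary image-classification problem.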




“We considered the acoustic data as an image for the CNN,” says Ehsan Toreini, a coauthor of the report. “I think that is the core reason our method works so well.”

The attack presented in the paper is limited in scope. The two audio-decoding models were trained and evaluated on data collected from a single user typing on one laptop, and the training process requires each key sound to be paired with a label identifying the key pressed. It remains to be seen how effective the attack would be against other laptop models, other users, and different audio environments, and the need for labelled training data limits how widely the models could be deployed.

Still, there are plausible scenarios in which an attacker would have access to labelled audio data of a person typing. Though that data may be difficult to collect covertly, a person could be coerced into providing it. In a recent episode, the hosts of the Smashing Security podcast discussed the paper and hypothesized a scenario in which a company requires new employees to provide that data so that they can be monitored later on. In an interview, coauthor Maryam Mehrnezhad said that “another example would be intimate partner violence. An ex-partner or current partner could be a bad actor in that scenario.”




The research team presents several ways to mitigate the risks of this attack. For one, you could simply type fast: touch-typing blurs individual key presses together, complicating keystroke isolation and decoding. Systemic changes would also help. Video-call services like Zoom could introduce audio noise or distortion into recordings to prevent machine learning models from easily matching the audio to typed characters.
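One way such a service-side defense could work can be sketched as follows. This is a hypothetical illustration, not anything Zoom actually implements: it flags short high-energy bursts in the audio (likely key clicks) and overlays broadband noise on just those samples, and every threshold here is an invented demo value.

```python
import numpy as np

def mask_transients(audio, rate, noise_level=0.5, window_ms=80):
    """Overlay broadband noise on short high-energy bursts, which are
    likely keystroke transients, so their spectral fingerprints are
    harder for a model to match. Thresholds are illustrative only."""
    rng = np.random.default_rng(0)
    win = int(rate * window_ms / 1000)
    # Moving-average energy envelope of the signal.
    energy = np.convolve(audio ** 2, np.ones(win) / win, mode="same")
    bursts = energy > 4 * np.median(energy)   # unusually loud regions
    noisy = audio.copy()
    noisy[bursts] += noise_level * rng.standard_normal(int(bursts.sum()))
    return noisy

# Demo: quiet room tone with one loud key click at 0.3 seconds.
rate = 16000
rng = np.random.default_rng(1)
audio = 0.01 * rng.standard_normal(rate)
audio[4800:5200] += rng.standard_normal(400)

noisy = mask_transients(audio, rate)
print(int(np.count_nonzero(noisy != audio)))  # samples masked around the click
```

Because the noise is added only around detected transients, speech in the rest of the call is left untouched while the click's spectrogram is corrupted.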

“The cybersecurity and privacy community should come up with more secure and privacy-preserving solutions that enable people to use modern technologies without risk and fear,” says Mehrnezhad. “We believe that there is room for industry and policymakers to find better solutions to protect the user in different contexts and applications.”

The researchers presented their paper at the recent 2023 IEEE European Symposium on Security and Privacy Workshops.
