How can I see the sound?

One well-known technology for remote and, so to speak, visual sound recording is the laser microphone. The technology is fairly simple.

A laser beam invisible to the human eye (usually in the infrared range) is aimed at a suitable object, most often a window pane, in the room where the conversation of interest to the eavesdroppers is taking place. The beam reflects off the surface and hits a receiver. Sound waves make the surface of the object vibrate, which in turn changes the behavior of the reflected beam. The receiver records these changes, and they are eventually converted into an audio recording of the conversation.

This technology has been in practical use since the Cold War and has featured in many spy films; you have probably seen it in one of them. Several companies produce ready-made laser listening devices, with a declared range of 500 or even 1,000 meters. There are two pieces of good news: first, laser microphones are very expensive; second, manufacturers (at least according to their own claims) sell them only to government agencies.

Still, according to Ben Nassi, the laser microphone has one serious disadvantage: it is an active method. For it to work, you have to “illuminate” an object with a laser beam, and that can be detected with an IR detector.

A few years ago, a group of researchers from MIT proposed an alternative, completely passive method of “visual recording”. The idea is the same: sound waves make the surface of an object vibrate, and those vibrations can be recorded visually.


To record these vibrations, the researchers used a high-speed camera shooting several thousand frames per second. By comparing the resulting frames (with a computer, naturally, not by hand), they were able to reconstruct the sound from the video.
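To make the idea of “comparing frames” concrete, here is a toy sketch in Python. It is not the MIT researchers’ algorithm, which the article does not describe; it only illustrates how a sequence of frames can be reduced to one number per frame, giving a one-dimensional signal. The function names and the choice of a brightness-weighted centroid are assumptions made for illustration.

```python
def frame_centroid(frame):
    """Brightness-weighted horizontal centroid of one grayscale frame.

    `frame` is a list of rows, each row a list of pixel intensities.
    (Hypothetical helper, chosen only for illustration.)
    """
    total = 0.0
    weighted = 0.0
    for row in frame:
        for x, value in enumerate(row):
            total += value
            weighted += x * value
    return weighted / total if total else 0.0


def frames_to_signal(frames):
    """One sample per frame: centroid shift relative to the first frame."""
    reference = frame_centroid(frames[0])
    return [frame_centroid(f) - reference for f in frames]


# Three tiny 1x4 "frames" in which a bright spot drifts to the right.
frames = [
    [[0, 10, 0, 0]],
    [[0, 5, 5, 0]],
    [[0, 0, 10, 0]],
]
signal = frames_to_signal(frames)
print(signal)  # [0.0, 0.5, 1.0]
```

Because each frame yields one sample, the camera’s frame rate acts as the audio sampling rate. A camera shooting several thousand frames per second can therefore capture vibrations only up to about half that many hertz.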

However, this method has a disadvantage of its own: converting the huge amount of visual information from a high-speed camera into sound requires an enormous amount of computing resources. Analyzing a single five-second video on a fairly powerful workstation took the MIT researchers 2-3 hours. So this method is not well suited to recognizing conversations on the fly.
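A quick back-of-the-envelope calculation shows how far that is from real time. The sketch below uses only the figures quoted above (five seconds of video, 2-3 hours of processing); the function name is made up for illustration.

```python
def slowdown_factor(processing_seconds, recording_seconds):
    """How many seconds of computation one second of recording costs."""
    return processing_seconds / recording_seconds


low = slowdown_factor(2 * 3600, 5)   # 2 hours for 5 seconds of video
high = slowdown_factor(3 * 3600, 5)  # 3 hours for 5 seconds of video
print(low, high)  # 1440.0 2160.0
```

In other words, processing ran roughly 1,400 to 2,200 times slower than real time, whereas live eavesdropping would need a factor of 1 or less.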

How Lamphone works

Ben Nassi and his colleagues at Ben-Gurion University came up with a new method of “visual wiretapping” and called it Lamphone. The main idea is to use a light bulb as the object from which sound-induced vibrations are read, hence the name of the technique.

A light bulb is about as simple and as bright as an object can be. That means there is no need to spend computing resources on analyzing fine image detail. It is enough to point a powerful telescope at the bulb and let the light it collects fall onto a photocell.

A light bulb does not emit light evenly in all directions (interestingly, the degree of unevenness varies: it is quite high for incandescent and LED lamps but much lower for fluorescent ones). Because of this unevenness, the vibration of the bulb caused by sound waves slightly changes the intensity of the light traveling toward the photocell. These changes are large enough to be detected reliably. By recording them and applying some simple transformations, the researchers were able to reconstruct the sound from the resulting “light recording”.
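The article does not say which transformations the researchers applied, so the following Python sketch is only an assumption about what the simplest possible version might look like: subtract the lamp’s steady brightness, then scale the remaining fluctuations to the range from -1 to 1, as an audio signal would be. The function names and the simulated readings are hypothetical.

```python
import math


def light_to_audio(samples):
    """Turn raw photocell readings into a normalized audio-like signal."""
    if not samples:
        return []
    # The steady brightness of the lamp carries no sound, so remove it.
    mean = sum(samples) / len(samples)
    centered = [s - mean for s in samples]
    # Scale so the largest fluctuation has magnitude 1.
    peak = max(abs(c) for c in centered)
    if peak == 0:
        return [0.0] * len(samples)
    return [c / peak for c in centered]


def simulate_photocell(freq_hz, sample_rate, n, brightness=1000.0, depth=0.5):
    """Made-up test data: constant brightness plus a tiny tone-shaped flicker."""
    return [
        brightness + depth * math.sin(2 * math.pi * freq_hz * i / sample_rate)
        for i in range(n)
    ]


readings = simulate_photocell(freq_hz=440.0, sample_rate=8000, n=8000)
audio = light_to_audio(readings)
```

The simulated flicker is deliberately tiny compared with the lamp’s brightness (0.5 against 1000), which mirrors the point made above: the useful signal is a slight change riding on top of a large constant light level. A real recording would also need noise removal, which this sketch leaves out.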

As a final test of the method, the researchers set up a listening device on a pedestrian bridge 25 meters from the window of a test room in which sound was played through a speaker. By pointing the telescope at a light bulb in the room, they recorded the fluctuations in the light and were able to convert them into audio recordings.

The recording turned out to be quite intelligible: for example, Shazam successfully identified the test tracks, “Let It Be” by the Beatles and “Clocks” by Coldplay, and Google’s speech recognition service correctly transcribed Donald Trump’s words from an election speech.


Ben Nassi and his colleagues managed to develop a method of “visual listening” that really works. Importantly, the method is completely passive, so it cannot be detected with any detector.

Just as important: unlike the MIT researchers’ method, Lamphone’s measurements are extremely simple, so converting them into sound does not require enormous computing resources. As a result, Lamphone can work in real time.

As Ben Nassi admits, the sound in the test room was played at very high volume during the experiment, so for now the results may be of mostly theoretical interest. On the other hand, the methods used to convert the “light recording” into sound were kept as simple as possible. The technique can therefore be improved further, for example with machine learning algorithms, which handle similar problems quite well.

Overall, the researchers themselves rate the practicality of the method as moderate, but they see potential to improve it by using more sophisticated methods of converting the photocell readings into sound.

Kaspersky Daily
