How important is "Clear Audio" for VoIP?
The most popular use of VoIP technology is speech communication, through applications such as Skype and Google Talk and through standard SIP phones.
For the purposes of speech communication, clear audio can be defined as an audio stream that has a low noise component (ideally no noise), and a high speech component (the audio content that we are trying to transmit). However, this specification alone is not sufficient to describe clear audio.
We need the speech signal to be transmitted to the listener at the other end of the communication system with the least possible modification along the way (ideally with no modification at all: the listener should hear the audio exactly as the speaker spoke it).
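The two requirements above can each be put into numbers. The following is a minimal sketch, not a standard measurement procedure: it uses pure sine tones as stand-ins for speech and noise, and the function names (`snr_db`, `fidelity_db`), the 8000 Hz sample rate, and the clipping example are all assumptions chosen for illustration.

```python
import math

SAMPLE_RATE = 8000  # samples per second (assumed; typical of narrowband telephony)


def tone(freq_hz, amplitude, n_samples):
    """Generate a sine tone as a list of float samples."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE)
            for n in range(n_samples)]


def power(samples):
    """Mean squared amplitude of a sequence of samples."""
    return sum(s * s for s in samples) / len(samples)


def snr_db(speech, noise):
    """Requirement 1: speech-to-noise ratio in decibels."""
    return 10.0 * math.log10(power(speech) / power(noise))


def fidelity_db(spoken, heard):
    """Requirement 2: ratio of speech power to the power of the change
    introduced along the way, in decibels (infinite if nothing changed)."""
    error_power = power([h - s for s, h in zip(spoken, heard)])
    if error_power == 0.0:
        return math.inf
    return 10.0 * math.log10(power(spoken) / error_power)


# Stand-in signals (hypothetical): a 200 Hz "speech" tone and a weaker
# 1000 Hz "noise" tone, each 0.1 s long.
speech = tone(200, 1.0, 800)
noise = tone(1000, 0.1, 800)

# A simple modification along the way: hard clipping at +/-0.8.
clipped = [max(-0.8, min(0.8, s)) for s in speech]

print(f"speech-to-noise ratio: {snr_db(speech, noise):.1f} dB")
print(f"fidelity after clipping: {fidelity_db(speech, clipped):.1f} dB")
```

A speech amplitude ten times the noise amplitude corresponds to a ratio of 20 dB. The second figure drops as the received audio departs further from what was spoken, which is why a stream can score well on the first measure and still sound distorted.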
These requirements become even more critical when the "listener" is not a human but a far less capable recognizer of human speech, such as an ASR (automatic speech recognition) program. ASR programs can be found in today's dictation software, telephone IVR (interactive voice response) systems, and command-and-control applications.
VoIP systems employ several technologies to optimize the efficiency of audio transmission. These technologies make a trade-off between the quality of the audio transmitted and the cost of transmitting it. To minimize the degradation of audio quality, it is important to ensure that clear audio is sent into the system regardless of the environment in which the end users are located.
How to provide clear audio for VoIP in noisy environments
To satisfy both requirements for clear audio (a high speech-to-noise ratio and high speech fidelity), we need to filter the noise out of the signal while leaving the speech intact. This can be accomplished in two broad ways:
- Remove as much of the noise as possible from the signal after it has been mixed in
- Prevent as much of the noise as possible from entering the signal in the first place
Technique 1 is commonly employed by DSP-based noise-cancelling systems that use frequency-based algorithms to remove noise. This is an inherently difficult problem, since speech and noise invariably overlap at many frequencies. While this approach often meets the first requirement (a high speech-to-noise ratio), it fails the second (high speech fidelity) to varying degrees: removing the "noisy" frequencies also removes speech frequencies, often critical ones, which distorts the speech. The effect is especially prominent at high noise levels. Some DSP-based systems, such as the Jawbone, use adaptive techniques to reduce how often speech is misidentified as noise.
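The trade-off in Technique 1 can be illustrated with a toy example. The sketch below is not how any particular product works: it assumes speech and noise power are known for five frequency bands (the numbers are made up) and applies a crude on/off mask, whereas real DSP systems estimate the noise and apply finer, time-varying gains.

```python
import math

# Hypothetical power per frequency band (arbitrary units), low to high.
# Speech and noise overlap: bands 0 and 2 carry a lot of both.
speech_power = [4.0, 9.0, 6.0, 3.0, 0.5]
noise_power = [8.0, 1.0, 5.0, 0.5, 0.2]


def snr_db(speech, noise):
    """Overall speech-to-noise ratio in decibels across all bands."""
    return 10.0 * math.log10(sum(speech) / sum(noise))


def mask_noisy_bands(speech, noise, noise_threshold):
    """Mute every band whose noise power exceeds the threshold.

    The filter only sees the mixed signal, so muting a band removes
    the speech in that band together with the noise.
    """
    gains = [0.0 if n > noise_threshold else 1.0 for n in noise]
    kept_speech = [g * s for g, s in zip(gains, speech)]
    kept_noise = [g * n for g, n in zip(gains, noise)]
    return kept_speech, kept_noise


kept_speech, kept_noise = mask_noisy_bands(speech_power, noise_power, 2.0)

snr_before = snr_db(speech_power, noise_power)
snr_after = snr_db(kept_speech, kept_noise)
speech_retained = sum(kept_speech) / sum(speech_power)

print(f"SNR before: {snr_before:.1f} dB, after: {snr_after:.1f} dB")
print(f"speech energy retained: {speech_retained:.0%}")
```

With these made-up numbers the speech-to-noise ratio improves, but a large share of the speech energy is discarded along with the noise, which is the loss of fidelity described above.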
Technique 2 is also a hard problem, since speech and noise travel through the same medium. A patented technology developed at UmeVoice exploits the noise-cancelling properties of a standard dual-port noise-cancelling microphone, using characteristics that distinguish noise from speech to keep the noise from entering the signal, and so meets both requirements for clear audio. This makes such a solution well suited not only to VoIP communication but also to high-quality speech applications such as speech recognition. UmeVoice makes the headsets theBoom, theBoom O, and theBoom Quiet, which offer the ability to communicate effectively even in the noisiest of environments.
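The general principle behind a dual-port microphone can be sketched with an idealized model. This is not a description of the UmeVoice technology, whose details are not given here. It is the textbook behaviour of a microphone that responds to the pressure difference between two closely spaced ports. The point-source model, the on-axis geometry, and every distance below are assumptions for illustration.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # metres per second in air (approximate)


def port_pressure(distance_m, freq_hz):
    """Complex sound pressure from an ideal point source: amplitude falls
    as 1/distance and phase advances with distance."""
    k = 2 * math.pi * freq_hz / SPEED_OF_SOUND  # wavenumber
    return cmath.exp(-1j * k * distance_m) / distance_m


def differential_gain(distance_m, port_spacing_m, freq_hz):
    """Output of a two-port (front minus rear) microphone relative to
    what the front port alone would pick up."""
    front = port_pressure(distance_m, freq_hz)
    rear = port_pressure(distance_m + port_spacing_m, freq_hz)
    return abs(front - rear) / abs(front)


def speech_advantage_db(mouth_m, noise_m, port_spacing_m, freq_hz):
    """How much more strongly a nearby source survives the subtraction
    than a distant one, in decibels."""
    near = differential_gain(mouth_m, port_spacing_m, freq_hz)
    far = differential_gain(noise_m, port_spacing_m, freq_hz)
    return 20.0 * math.log10(near / far)


# Assumed geometry: mouth 2 cm from the front port, noise source 2 m away,
# ports 1 cm apart, both sources on the microphone's axis.
for freq_hz in (200.0, 1000.0):
    advantage = speech_advantage_db(0.02, 2.0, 0.01, freq_hz)
    print(f"{freq_hz:.0f} Hz: nearby speech favoured by {advantage:.1f} dB")
```

In this model a distant source produces almost the same pressure at both ports, so the two largely cancel, while a source close to the front port produces a clearly larger pressure there and survives the subtraction. The advantage is greater at lower frequencies, and it depends on keeping the microphone close to the mouth.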
Freedom to communicate from anywhere?
In an increasingly global society, technology is making it possible to work productively and stay connected while being mobile. Voice is one of the most natural human modes of communication. Technologies that facilitate clear audio capture and transmission will be crucial in ensuring that people can have true freedom to communicate clearly and effectively.