Enhanced noise suppression in Jitsi Meet
For a while now, Jitsi Meet has been using the RNNoise library to compute voice activity detection scores for audio input tracks, leveraging those to implement features such as “talk while muted” and “noisy mic detection”. However, RNNoise can also denoise audio.
In this article we’ll briefly go through the steps taken to implement noise suppression with RNNoise in Jitsi Meet.
What’s RNNoise anyway?
RNNoise, as the authors describe it, “combines classic signal processing with deep learning, but it’s small and fast”. This makes it well suited for real-time audio, and it does a good job of denoising.
It’s written in C, which allows us to (relatively) easily use it on the Web by compiling it as a WASM module. Combined with a couple of optimizations, that gets us noise suppression functionality with very little added latency.
Working with Audio Worklets
Previously, Jitsi Meet processed audio using a ScriptProcessorNode, which handles audio samples on the main UI thread. Because the audio track wasn’t altered (we simply extracted some information from a copy of it), performance issues weren’t apparent. With noise suppression the track does get modified, so latency becomes noticeable, and any interference on the main UI thread degrades audio quality. We therefore switched to audio worklets.
Audio worklets run in a separate thread from the main UI thread, so samples can be processed without interference. We won’t go into the specifics of implementing one as there are plenty of awesome resources on the web such as: this and this. Our worklet implementation can be found here.
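For context, the skeleton of a worklet processor looks roughly like this. This is a minimal pass-through sketch, not Jitsi’s actual implementation; the stubbed base class and the processor name are ours, and exist only so the snippet is self-contained outside a browser:

```javascript
// In a real worklet, AudioWorkletProcessor and registerProcessor() are
// provided by the AudioWorkletGlobalScope; we stub the base class here so
// the sketch can run outside that scope.
const Base = typeof AudioWorkletProcessor !== 'undefined'
    ? AudioWorkletProcessor
    : class {};

class NoiseSuppressorProcessor extends Base {
    // Called at a fixed rate with 128-sample blocks per channel.
    process(inputs, outputs) {
        const input = inputs[0][0];   // first channel of the first input
        const output = outputs[0][0];

        if (input) {
            // Pass-through for illustration; denoising would happen here.
            output.set(input);
        }

        // Returning true keeps the processor alive.
        return true;
    }
}

if (typeof registerProcessor !== 'undefined') {
    registerProcessor('noise-suppressor', NoiseSuppressorProcessor);
}
```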
Webpack integration
Even though using an audio worklet looks fairly straightforward there were a couple of bumps along the road.
First off, and probably the most frustrating part, was making them work with webpack’s dev server.
Long story short: the dev server has some neat features, such as hot module replacement and live reloading, which rely on bootstrap code added to the output JavaScript bundle. The issue is that audio worklet code runs in the AudioWorkletGlobalScope context, which knows nothing about constructs like window, this or self. The bootstrap code, however, makes ample use of them, and there doesn’t seem to be a way to tell it that the context in which it’s running is a worklet.
We tried several approaches but the solution that worked for us was to ignore the dev server bootstrap code altogether for the worklet’s entry point, which can be configured in webpack config as follows:
module: {
    rules: [
        ...config.module.rules,
        {
            test: resolve(__dirname, 'node_modules/webpack-dev-server/client'),
            loader: 'null-loader'
        }
    ]
}
That took care of the dev server. However, production webpack bundling also introduced boilerplate that made use of the “forbidden” worklet objects. In this case it’s easily fixed by specifying the following output options:
output: {
    ...config.output,
    globalObject: 'AudioWorkletGlobalScope'
}
At this point we had a working worklet (pun intended) that didn’t break our development environment.
WASM in audio worklets
Next came adding the RNNoise WASM module. Jitsi uses RNNoise compiled with emscripten (more details in the project: https://github.com/jitsi/rnnoise-wasm). With the default settings the WASM module loads and compiles asynchronously; however, because the worklet loads without waiting for promises to resolve, everything needs to be synchronous. We therefore inline the WASM file by passing -s SINGLE_FILE=1 to emscripten, and tell it to compile the module synchronously with -s WASM_ASYNC_COMPILATION=0. With that in place, everything is loaded and ready to go when audio samples start coming in.
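As an illustration, a build invocation with those settings might look like the following. Only the two -s flags come from the description above; the file names and optimization level are our assumptions, not the actual rnnoise-wasm build configuration:

```shell
# Hypothetical emcc invocation (illustrative file names):
# SINGLE_FILE=1 inlines the .wasm binary into the generated .js output,
# WASM_ASYNC_COMPILATION=0 makes the module compile synchronously on load.
emcc rnnoise.c -O3 \
    -s SINGLE_FILE=1 \
    -s WASM_ASYNC_COMPILATION=0 \
    -o rnnoise.js
```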
Efficient audio processing
Audio processing in a worklet happens in the process() callback of the AudioWorkletProcessor implementation, at a fixed rate of 128 samples per call (this can’t be configured, as it could with a ScriptProcessorNode). RNNoise, however, expects 480 samples for each call to its denoising method, rnnoise_process_frame.
To make this work we implemented a circular buffer that minimizes copy operations for optimal performance. It works by having both the buffered samples and the ones that have already been denoised on the same Float32Array with a roll over policy. The full implementation can be found here.
To summarize: we keep track of how many audio samples we have buffered; once we have enough (480, to be precise), we send a view of that data to RNNoise, where it gets denoised in place (i.e. no additional copies are required). At this point the circular buffer has a denoised part and possibly some residual samples that didn’t fit in the initial 480, which get processed in the next iteration. The process repeats until we reach the end of the circular buffer, at which point we simply start from the beginning and overwrite “stale” samples; we consider them stale because by then they have already been denoised and sent.
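The buffering scheme described above can be sketched as follows. The class name, buffer size and method names here are ours, chosen for illustration; the real implementation is in the linked file:

```javascript
// Sketch of a circular buffer that bridges 128-sample worklet frames and
// 480-sample RNNoise frames. 3840 is a common multiple of both, so the
// read and write positions line up exactly at the roll-over point.
const RNNOISE_FRAME = 480;   // samples RNNoise consumes per call
const BUFFER_SIZE = 3840;    // multiple of both 128 and 480

class DenoiseBuffer {
    constructor() {
        this.buffer = new Float32Array(BUFFER_SIZE);
        this.writeIdx = 0;     // where the next incoming samples land
        this.denoisedIdx = 0;  // start of the next frame to denoise
    }

    // Append one 128-sample worklet frame. Returns a 480-sample view ready
    // to be denoised in place (e.g. by rnnoise_process_frame) once enough
    // samples have accumulated, otherwise null.
    push(frame) {
        this.buffer.set(frame, this.writeIdx);
        this.writeIdx += frame.length;

        if (this.writeIdx - this.denoisedIdx >= RNNOISE_FRAME) {
            const view = this.buffer.subarray(
                this.denoisedIdx, this.denoisedIdx + RNNOISE_FRAME);
            this.denoisedIdx += RNNOISE_FRAME;

            // Roll over at the end: the samples we overwrite next are
            // stale, having already been denoised and sent.
            if (this.writeIdx === this.buffer.length) {
                this.writeIdx = 0;
                this.denoisedIdx = 0;
            }
            return view;
        }
        return null;
    }
}
```

Returning a subarray view rather than a copy is what keeps the copy count down: RNNoise denoises directly inside the circular buffer’s backing store.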
The worklet code gets compiled as a separate .js bundle and lazy loaded as needed.
Using it in JaaS / via the iframe API
If you are a JaaS customer (or are using Jitsi Meet through the iframe API) we have added an API command to turn this on programmatically too! Check it out.
Check it out!
In Jitsi Meet this feature can be activated by simply clicking on the Noise Suppression button.
Since in this case a sound file is probably worth more than 1000 words, here is an audio sample demonstrating the denoising:
Original audio:
Denoised audio:
❤️ Your personal meetings team.
Author: Andrei Gavrilescu