Video encoding in Jami

A few weeks ago, we published a post on automatic bitrate adjustment during video calls. It explained how Jami detects fluctuations in the available bandwidth and adjusts the bitrate accordingly to prevent glitches caused by network congestion. In this week’s article, we dive deeper into how video encoding works, why a lower bitrate means a compromise in quality, and what strategy Jami uses to manage this compromise.

Video streams are encoded to minimize the bandwidth they require for transfer while preserving their quality as much as possible. Encoding works by identifying patterns in each frame that can be used to describe it without storing information about every single pixel. The simplest example is a completely black HD frame, which is much more efficient to describe by saying “all the pixels are black” than by saying “pixel 1 is black, pixel 2 is black, ... and pixel 2073600 is black”. This is an extreme case, but encoding algorithms can identify many complex patterns, not only within frames but also between them. For example, if something is moving in the video while the background remains still, the encoder can reuse the identical parts across multiple frames instead of describing them again. This is essentially how encoders create a compact representation of a video that requires significantly less bandwidth than the raw output of the camera.
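The two ideas above, describing a uniform frame compactly and reusing unchanged pixels between frames, can be illustrated with a toy sketch. This is not how H.264 or VP8 work internally; the functions `run_length_encode` and `changed_pixels` are hypothetical helpers written for this illustration, and frames are modelled as flat lists of grayscale values.

```python
def run_length_encode(pixels):
    """Collapse consecutive identical values into [value, count] pairs."""
    runs = []
    for value in pixels:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1  # extend the current run
        else:
            runs.append([value, 1])  # start a new run
    return runs


def changed_pixels(previous, current):
    """Return (index, new_value) for every pixel that differs between two frames."""
    return [
        (index, new)
        for index, (old, new) in enumerate(zip(previous, current))
        if old != new
    ]


WIDTH, HEIGHT = 1920, 1080  # HD frame: 2,073,600 pixels
black_frame = [0] * (WIDTH * HEIGHT)

# Within a frame: one run describes the entire black image.
encoded = run_length_encode(black_frame)

# Between frames: only the single pixel that changed needs to be described.
next_frame = list(black_frame)
next_frame[100] = 255
delta = changed_pixels(black_frame, next_frame)

print(len(encoded), "run(s) instead of", len(black_frame), "pixels")
print(len(delta), "changed pixel(s) between the two frames")
```

Real codecs rely on far more sophisticated tools (motion-compensated prediction, transforms, entropy coding), but the principle is the same: describe the redundancy instead of repeating the data.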

Most codecs (encoding and decoding algorithms) are lossy, meaning that the images after decoding are not always exactly the same as before encoding. However, the level of fidelity can be adjusted by defining a quality factor in the codec parameters. High-quality encoding gives better fidelity, but reducing the quality can be useful for lowering the bitrate and avoiding the glitches caused by limited bandwidth. Finding the right balance between quality and bitrate is complicated because the two are related, but the relationship between them is not straightforward. Some video sections are easier to encode than others (such as the black frame in the example above), and for them it is possible to maintain high quality while drastically reducing the bitrate. For harder sections, the opposite can be true. There are multiple methods for balancing these two variables, called “Rate Control Modes”, and each one is suited to a different use case.
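As a rough illustration of this trade-off, the sketch below rounds sample values down to a multiple of a step size. A larger step leaves fewer distinct values to transmit but introduces more error. This is a deliberately simplified model: real codecs quantize transform coefficients rather than raw pixel values, and the names `quantize` and `mean_abs_error` and the sample values are assumptions made for this example.

```python
def quantize(samples, step):
    """Round each sample down to the nearest multiple of `step` (lossy when step > 1)."""
    return [(sample // step) * step for sample in samples]


def mean_abs_error(original, decoded):
    """Average absolute difference between the original and decoded samples."""
    return sum(abs(a - b) for a, b in zip(original, decoded)) / len(original)


samples = [12, 13, 15, 200, 203, 207]  # hypothetical pixel intensities

for step in (1, 4, 8):
    decoded = quantize(samples, step)
    print(
        f"step={step}: {len(set(decoded))} distinct value(s), "
        f"mean error={mean_abs_error(samples, decoded):.2f}"
    )
```

With a step of 1 nothing is lost; as the step grows, the data becomes cheaper to describe and the decoded values drift further from the originals.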

We use the capped Constant Rate Factor (CRF) rate control mode in Jami because it is well suited to live streaming applications such as video calls. With this method, we define a quality target for the video, and the encoder determines the most appropriate bitrate depending on the type of video section to encode (movement needs more data to encode than a steady image). Then, we define a maximum bitrate to cap the amount of data that can be sent, so that the stream does not exceed the available bandwidth. This allows us to change the maximum bitrate on the fly when the available bandwidth fluctuates, thereby preventing glitches caused by congestion and packet drops.
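The behaviour described above can be modelled in a few lines. This is a conceptual sketch, not Jami's implementation: `capped_bitrate`, `simulate`, and all the numbers are hypothetical. The "demand" stands for the bitrate the encoder would like to use to reach the quality target on a given section, and the cap stands for the maximum bitrate currently allowed by the bandwidth estimate.

```python
def capped_bitrate(demand_kbps, max_kbps):
    """Bitrate actually used: what the quality target asks for, limited by the cap."""
    return min(demand_kbps, max_kbps)


def simulate(demands_kbps, caps_kbps):
    """Apply the cap section by section; the cap may change as bandwidth fluctuates."""
    return [capped_bitrate(demand, cap) for demand, cap in zip(demands_kbps, caps_kbps)]


# Hypothetical demands: still image, moderate motion, heavy motion, still image.
demands = [300, 1200, 2500, 400]
# Hypothetical caps: bandwidth drops halfway through, so the cap is lowered.
caps = [2000, 2000, 1000, 1000]

sent = simulate(demands, caps)
for demand, cap, used in zip(demands, caps, sent):
    note = "cap reached, quality reduced" if used < demand else "quality target met"
    print(f"demand={demand} kbps, cap={cap} kbps -> sent={used} kbps ({note})")
```

Easy sections stay well under the cap and keep their target quality, while demanding sections are limited to what the network can carry.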

Jami mainly uses the H.264 and VP8 codecs, but it is designed to work easily with any other codec. A lot of progress is being made in the field of encoders, especially with the advent of artificial intelligence, which makes it possible to exploit subtle patterns that humans would not think of in order to compress video with minimal impact on quality. As encoders become more and more efficient, Jami will stay up to date in order to provide the best videoconferencing experience possible to its users.

By François Naggar-Tremblay - Jami product manager

Photo by Denise Jans on Unsplash