
IVAS – taking 3GPP voice and audio services to a new immersive level

Jan 08, 2023

By Stefan Bruhn (Dolby Laboratories, Inc.), Markus Multrus (Fraunhofer IIS), Imre Varga (Qualcomm Incorporated, 3GPP SA4 Audio SWG Co-Chair)

First published Oct. 2022, in Highlights Issue 05  

It is almost established practice that 3GPP standardizes a new codec for its voice service every decade. The driver is the never-ending demand for enhanced Quality of Experience (QoE) while maintaining highly competitive service efficiency. Each time, from Adaptive Multi-Rate (AMR) through AMR-WB to Enhanced Voice Services (EVS), a quantum leap in service quality has been achieved. Current standardization work is taking place against the background of an upswing in immersive media services, such as the spatial or ‘surround’ audio experience that is already well established for streaming professionally generated content.

At the same time, advanced visual rendering systems such as large and curved TV screens, (head-tracked) VR gear and AR glasses are enabling the immersion of the user into the rendered Audio Visual (AV) scene.
What is currently lacking is a codec that enables immersive audio experiences to be shared from highly mobile and uncontrolled capture environments and rendered in other, virtually unconstrained environments such as homes, cars or conference rooms, using headsets, earbuds or multi-speaker systems with custom loudspeaker configurations.

3GPP SA4 is now closing this gap with the standardization of its codec for Immersive Voice and Audio Services (IVAS). IVAS will not only introduce immersion into the traditional voice service, it will also address the demand for more general immersive multimedia services. Service applications include, but are not limited to, conversational voice, multi-stream teleconferencing, conversational VR, streaming of user-generated live and non-live content, and AR/MR.

This article is written to draw attention to this important standardization work, to give User Equipment (UE) manufacturers and service providers the possibility to monitor or influence the standardization process and to make sure that the time-to-market of IVAS-enabled new immersive services and products is minimized.

Features and Use-cases

The IVAS codec will be built upon, and be backwards compatible with, the successful EVS codec. Thus, a single universal codec will be provided that incorporates the quality and performance attributes of EVS (such as excellent audio quality, low delay, an appropriate range of bit rates, high error resilience and practical implementation complexity) while taking them to the next – immersive – level.

For this, new features will be added, starting with immersive audio formats: channel-based audio (including stereo and common multi-channel configurations from 5.1 up to 7.1+4), binaural audio, scene-based audio (i.e., Ambisonics up to 3rd order) and object-based audio. IVAS will also support Metadata-Assisted Spatial Audio (MASA), a novel parametric spatial audio format optimized for direct UE pick-up without the loss incurred by converting to one of the other immersive formats. To enable playout on a multitude of devices, a rendering solution and an interface to an external renderer will be made available, including head-tracked rendering.
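To give a feel for the size of the inputs these formats imply, the following minimal sketch counts the audio channels behind the labels mentioned above. It is an illustration only and is not taken from the IVAS specifications: the function names are hypothetical, the loudspeaker label is assumed to read as main.LFE+height, and the Ambisonics count uses the standard (N+1)² relation for a full-sphere representation of order N.

```python
import re


def ambisonics_channels(order: int) -> int:
    """Return the number of signals in a full-sphere Ambisonics scene of the given order."""
    if order < 0:
        raise ValueError("order must be non-negative")
    # Standard relation: an order-N representation carries (N + 1)^2 signals.
    return (order + 1) ** 2


def loudspeaker_channels(layout: str) -> int:
    """Count channels in a layout label such as '5.1' or '7.1+4'.

    Assumption: the label is read as <main>.<LFE>[+<height>].
    """
    match = re.fullmatch(r"(\d+)\.(\d+)(?:\+(\d+))?", layout)
    if match is None:
        raise ValueError(f"unrecognized layout label: {layout!r}")
    main, lfe, height = match.groups()
    # The height group is None when the label has no '+<height>' part.
    return int(main) + int(lfe) + int(height or 0)


if __name__ == "__main__":
    for order in (1, 2, 3):
        print(f"Ambisonics order {order}: {ambisonics_channels(order)} channels")
    for layout in ("5.1", "7.1", "5.1+4", "7.1+4"):
        print(f"{layout}: {loudspeaker_channels(layout)} channels")
```

Under these assumptions, 3rd-order Ambisonics corresponds to 16 signals and a 7.1+4 layout to 12 loudspeaker channels, which illustrates why a compact parametric representation is attractive for capture on a UE.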

[Figure: Spatial audio capture and presentation]

In a stereo or immersive telephony use-case, a participant can capture and convey an immersive scene to a remote participant, e.g., to share the full immersive experience of an event. For spatial conferencing applications, the flexibility of the IVAS codec will provide multiple options for:

  • Ad-hoc conferencing calls with the transmission of the physical immersive scene picked up by a UE, e.g., placed on a table. Rendering of the immersive scene makes it easier to distinguish the talkers’ voices, clearly separated from ambient sounds, leading to more natural and effortless conferencing.

[Figure: Illustration of spatial audio capture]

  • More complex scenarios with multiple participants, transmitted as individual streams and spatially rendered on the receiving UE to match the video scene, for example.
  • Scenarios where an intermediate call server combines multiple participants into an immersive scene.

Furthermore, the IVAS codec will support content distribution use-cases, including streaming of stereo/immersive content and advanced VR/AR applications.

IVAS codec standardization in SA4 Audio SWG

An IVAS codec candidate is currently being developed. The Terms of Reference of that effort and any essential development project data (code repository, technical documentation, meeting reports) are publicly available on 3GPP Forge.

Once chosen, IVAS codec standardization will follow the traditional rigorous approach based on permanent documents (Pdocs), all agreed in 3GPP. Key Pdocs are design constraints and performance requirements. These ensure the standardized codec can be implemented on relevant UEs and that it is suitable for the intended service applications.

Any new codec undergoes a rigid selection process in which it must meet all 3GPP-agreed requirements. The process includes selection testing in which the quality of the candidate is formally evaluated against the performance requirements. A significant budget – in excess of a million euros – is dedicated to testing the IVAS codec.

A key element of selection testing is the reliance on capable, neutral laboratories. Accordingly, SA4 has issued a call for such labs. Once SA4 agrees on a lab assignment, the 3GPP Mobile Competence Centre (MCC) is tasked with setting up the necessary contracts.

The actual selection of the codec requires a determination by SA4 that the overall IVAS codec work item goals are fulfilled, based on an assessment of how the performance requirements and design constraints are met. It also involves determining that any other required data and documentation specified in the selection deliverables have been provided. The deliverables typically include draft IVAS codec specifications and reference C source code. Reference code is expected to become available as both fixed-point and floating-point code specifications, enabling efficient and timely implementations on relevant platforms. Subsequently, TSG SA will formally approve the codec selection and the provided specifications.

The IVAS codec is scheduled for 3GPP Release 18.


3GPP Working Group SA4 - Multimedia Codecs, Systems and Services is a part of the 3GPP Technical Specification Group Service and System Aspects (TSG SA). See more about SA4 at www.3gpp.org/specifications-groups