How to convert between OGG and WAV in 2023 with C#

Make Telegram and WhatsApp voice messages compatible with Azure Speech-to-Text and Text-to-Speech

Voice messages recorded by clients like WhatsApp and Telegram are encoded with the Opus codec, usually stored in an Ogg container with the file extension ".ogg" or ".oga". While this works well for short voice messages between smartphone microphones and small speakers, it is not supported out of the box by Microsoft Speech, which offers Speech-to-Text and Text-to-Speech capabilities: the Speech SDK expects uncompressed PCM audio as input and only decodes compressed formats such as Ogg/Opus when GStreamer is installed.

Furthermore, the most obvious answers on Stack Overflow, the output of GPT-4 and the code generated by Bing Chat all point to outdated solutions. They all favor the "NAudio.Vorbis" library, which decodes Ogg Vorbis rather than the newer Opus codec, and whose documentation covers local usage rather than cloud-native scenarios.

So here is how to make your voice messages compatible with Microsoft Speech:

The solution is Concentus.OggFile by Logan Stromberg.

GitHub - lostromb/concentus.oggfile: Implementing support for reading/writing .opus audio files using Concentus

Simply add the Concentus.OggFile package in your preferred way, for example with the NuGet package manager.

As the documentation for this library is also sparse, here are the snippets you probably came to copy and paste:

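```csharp
using System;
using System.IO;
using Concentus.Oggfile;
using Concentus.Structs;

// Decodes an Ogg/Opus voice message into raw 16-bit PCM (16 kHz, mono).
// Note: the result has no RIFF/WAV header, it contains only the PCM samples.
private byte[] ConvertOggToWav(byte[] audioBytes)
{
    // Decode directly at 16 kHz mono, the Speech SDK's default input format
    var decoder = new OpusDecoder(16000, 1);
    using var audioInput = new MemoryStream(audioBytes);
    using var output = new MemoryStream();
    var opus = new OpusOggReadStream(decoder, audioInput);

    while (opus.HasNextPacket)
    {
        short[] packet = opus.DecodeNextPacket();
        if (packet != null)
        {
            // Write each sample as 2 bytes; BitConverter uses the machine's byte order,
            // which is little-endian (as PCM expects) on all common .NET platforms
            foreach (short sample in packet)
            {
                output.Write(BitConverter.GetBytes(sample), 0, 2);
            }
        }
    }
    return output.ToArray();
}
```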

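The output of this method is raw (headerless) 16-bit PCM at 16 kHz mono. This matches the default format of a Microsoft.CognitiveServices.Speech push stream (PushAudioInputStream), so it can be written into one directly.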
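Headerless PCM is what the push stream wants, but audio players and tools that expect a real .wav file need a RIFF header in front of the samples. The following is a minimal sketch, assuming the 16 kHz, 16-bit mono PCM produced by ConvertOggToWav; WavUtil and WrapPcmInWav are hypothetical names, not part of Concentus or the Speech SDK. It writes the canonical 44-byte header: a RIFF descriptor, a 16-byte PCM "fmt " chunk and a "data" chunk.

```csharp
using System;
using System.IO;
using System.Text;

public static class WavUtil
{
    // Hypothetical helper (not part of Concentus or the Speech SDK):
    // prepends a canonical 44-byte RIFF/WAVE header to headerless PCM samples.
    public static byte[] WrapPcmInWav(byte[] pcm, int sampleRate = 16000,
                                      short channels = 1, short bitsPerSample = 16)
    {
        short blockAlign = (short)(channels * bitsPerSample / 8); // bytes per sample frame
        int byteRate = sampleRate * blockAlign;                    // bytes per second

        using var ms = new MemoryStream(44 + pcm.Length);
        // BinaryWriter always writes little-endian, as RIFF requires
        using (var w = new BinaryWriter(ms, Encoding.ASCII, leaveOpen: true))
        {
            // RIFF descriptor (IDs are written as raw bytes, not length-prefixed strings)
            w.Write(Encoding.ASCII.GetBytes("RIFF"));
            w.Write(36 + pcm.Length);            // total file size minus the first 8 bytes
            w.Write(Encoding.ASCII.GetBytes("WAVE"));

            // "fmt " chunk describing uncompressed PCM
            w.Write(Encoding.ASCII.GetBytes("fmt "));
            w.Write(16);                          // fmt chunk size for PCM
            w.Write((short)1);                    // audio format 1 = PCM
            w.Write(channels);
            w.Write(sampleRate);
            w.Write(byteRate);
            w.Write(blockAlign);
            w.Write(bitsPerSample);

            // "data" chunk holding the samples
            w.Write(Encoding.ASCII.GetBytes("data"));
            w.Write(pcm.Length);
            w.Write(pcm);
        }
        return ms.ToArray();
    }
}
```

For example, File.WriteAllBytes("voice.wav", WavUtil.WrapPcmInWav(ConvertOggToWav(oggBytes))) would produce a file that ordinary audio players can open. Keeping the header separate from the decoder lets the same PCM go either into a push stream or onto disk.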

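To convert the output of speech synthesis (Text-to-Speech) into a voice message that mobile messaging apps can play, you can use the same package again. The method expects raw 16 kHz, 16-bit mono PCM without a WAV header, for example from a synthesizer configured with SpeechSynthesisOutputFormat.Raw16Khz16BitMonoPcm: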

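```csharp
using System;
using System.IO;
using Concentus.Enums;
using Concentus.Oggfile;
using Concentus.Structs;

// Encodes raw 16-bit PCM (16 kHz, mono, no RIFF/WAV header) into an Ogg/Opus file.
public byte[] ConvertWavToOgg(byte[] audioBytes)
{
    var encoder = new OpusEncoder(16000, 1, OpusApplication.OPUS_APPLICATION_AUDIO);
    using var opusOut = new MemoryStream();
    var opus = new OpusOggWriteStream(encoder, opusOut);

    // Reinterpret every pair of bytes as one 16-bit sample
    int shortCount = audioBytes.Length / 2;
    short[] audioShorts = new short[shortCount];
    for (int i = 0; i < shortCount; i++)
    {
        audioShorts[i] = BitConverter.ToInt16(audioBytes, i * 2);
    }

    opus.WriteSamples(audioShorts, 0, shortCount);
    // Flush the remaining samples and write the final Ogg page
    opus.Finish();
    return opusOut.ToArray();
}
```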
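ConvertWavToOgg treats every input byte as a sample, so if the synthesizer hands you a complete WAV file (for example when a RIFF output format is configured), the header bytes would be encoded as audio too. A minimal sketch for stripping the header, assuming 16-bit PCM WAV input; WavReader and ExtractPcm are hypothetical names. It walks the RIFF chunks instead of assuming a fixed 44-byte header, since a file may contain extra chunks (such as LIST) before the data chunk. It does not inspect the "fmt " chunk, so the sample rate and channel count still have to match the encoder's 16 kHz mono.

```csharp
using System;
using System.Text;

public static class WavReader
{
    // Hypothetical helper: returns the payload of the "data" chunk of a RIFF/WAVE file.
    // Input that does not start with a RIFF/WAVE header is assumed to be raw PCM
    // already and is returned unchanged.
    public static byte[] ExtractPcm(byte[] wav)
    {
        bool isRiff = wav.Length >= 12
            && Encoding.ASCII.GetString(wav, 0, 4) == "RIFF"
            && Encoding.ASCII.GetString(wav, 8, 4) == "WAVE";
        if (!isRiff)
        {
            return wav;
        }

        // Each sub-chunk: 4-byte ASCII id, 4-byte little-endian size, then the payload
        int offset = 12;
        while (offset + 8 <= wav.Length)
        {
            string chunkId = Encoding.ASCII.GetString(wav, offset, 4);
            int chunkSize = BitConverter.ToInt32(wav, offset + 4);
            int payloadStart = offset + 8;

            if (chunkId == "data")
            {
                // Defensive: if the declared size is negative or exceeds the buffer,
                // take all remaining bytes instead
                int available = wav.Length - payloadStart;
                int length = (chunkSize < 0 || chunkSize > available) ? available : chunkSize;
                byte[] pcm = new byte[length];
                Array.Copy(wav, payloadStart, pcm, 0, length);
                return pcm;
            }

            // Skip this chunk; payloads are padded to an even number of bytes
            long next = (long)payloadStart + (uint)chunkSize + (chunkSize & 1);
            if (next > wav.Length)
            {
                break;
            }
            offset = (int)next;
        }
        throw new FormatException("No data chunk found in WAV input.");
    }
}
```

With the Speech SDK, a call such as ConvertWavToOgg(WavReader.ExtractPcm(result.AudioData)) would then pass only the samples to the encoder, while raw PCM input passes through untouched.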

Because Concentus is a pure managed C# implementation of Opus, both methods run on Azure Functions without any native dependencies.

That's it. I hope it helps.

Special thanks to Nicklas Swärd for providing an implementation:

NickSwardh/StreamSpeechToText: Stream Mp3 & Opus to Azure's Speech to Text without GStreamer (github.com)

Support Jens Caasen by becoming a sponsor. Any amount is appreciated!