When using the Azure Speech SDK (or Speech Studio) to recognize speech in Greek (`el-GR`), the recognition process hangs, and after a long time it fails #2737

PavlosIsaris · 2025-02-11T07:17:59Z

Issue with `el-GR` language: Speech Recognition canceled: CancellationReason.Error - Client buffer exceeded maximum size

Description

When using the Azure Speech SDK to recognize speech in Greek (el-GR), the recognition process hangs, and after a long time it fails with the following error:

Speech Recognition canceled: CancellationReason.Error Error details: Due to service inactivity, the client buffer exceeded maximum size. Resetting the buffer.

However, when the language is set to English (en-US), recognition completes as expected (although the output consists of English words rather than Greek). This seems to be a bug in the Azure Speech SDK.

The same happens in Azure Speech Studio. If you select "Greek" in the languages dropdown and upload an audio file with Greek content, the process hangs and does not output any text.

Speech SDK log taken from a run that exhibits the reported issue.

Log file:

STTLogs.txt

A stripped down, simplified version of your source code that exhibits the issue. Or, preferably, try to reproduce the problem with one of the public samples in this repository (or a minimally modified version of it), and share the code.

Code to Reproduce

import os
import time
import argparse
import azure.cognitiveservices.speech as speechsdk
from pydub import AudioSegment
from env_loader import load_env_values

class AzureSpeechToTextParser:
    """
    Parses an audio file to text using the Azure Speech SDK.
    Supports both compressed (MP3, MP4) and uncompressed (WAV) formats.
    Converts compressed files to WAV before processing, then deletes the WAV file after processing.
    """
    def __init__(self, audio_file_path: str, subscription_key: str, service_region: str):
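        # The class has no base class, so store the path directly instead of calling super().__init__()
        self.audio_file_path = audio_file_path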
        self.subscription_key = subscription_key
        self.service_region = service_region
        self.done = False
        self.speech_recognizer = None
        self.recognized_texts = []
        self.temp_wav_file = None  # Stores temp WAV file if conversion is needed

    def convert_to_wav(self):
        """
        Converts MP3/MP4 to WAV format and returns the new file path.
        """
        file_extension = os.path.splitext(self.audio_file_path)[-1].lower()

        if file_extension in [".mp3", ".mp4"]:
            print(f"Converting {file_extension} to WAV for processing...")

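            # Swap only the final extension; str.replace() could miss an upper-case extension or hit a match elsewhere in the path
            wav_path = os.path.splitext(self.audio_file_path)[0] + ".wav"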
            audio = AudioSegment.from_file(self.audio_file_path, format=file_extension[1:])  # Remove dot from extension
            audio = audio.set_channels(1).set_frame_rate(16000)  # Ensure Azure compatibility
            audio.export(wav_path, format="wav")

            self.temp_wav_file = wav_path  # Store for deletion later
            return wav_path
        return self.audio_file_path  # If already WAV, return as is

    def parse(self):
        """
        Parses the audio file into text. Converts MP3/MP4 to WAV before processing.
        Deletes the temporary WAV file after recognition.
        """
        self.done = False
        self.recognized_texts = []
        temp_file_used = False  # Track if conversion happened

        try:
            print("Initializing Azure Speech SDK...")
            speech_config = speechsdk.SpeechConfig(subscription=self.subscription_key, region=self.service_region)
            speech_config.speech_recognition_language = "el-GR"  # Change to "en-US" to make it work
            log_file_path = os.path.join(os.path.dirname(__file__), "STTLogs.txt")
            speech_config.set_property(speechsdk.PropertyId.Speech_LogFilename, log_file_path)
            # Convert file if necessary
            self.audio_file_path = self.convert_to_wav()
            if self.temp_wav_file:
                temp_file_used = True  # Mark that we need to delete the file later

            # Process the WAV file with Azure
            audio_config = speechsdk.audio.AudioConfig(filename=self.audio_file_path)
            self.speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)

            self.speech_recognizer.recognized.connect(self.recognized_callback)
            self.speech_recognizer.canceled.connect(self.cancel_callback)
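            self.speech_recognizer.session_started.connect(lambda evt: print('SESSION STARTED: {}'.format(evt)))
            # Also end the wait loop when the session stops, not only on cancellation
            self.speech_recognizer.session_stopped.connect(lambda evt: setattr(self, "done", True))
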
            print("Starting speech recognition...")
            self.speech_recognizer.start_continuous_recognition()

            while not self.done:
                time.sleep(0.5)

            self.speech_recognizer.stop_continuous_recognition()
            return self.recognized_texts

        except Exception as e:
            print(f"An error occurred during speech recognition: {e}")
            return None

        finally:
            # Delete the temporary WAV file if it was created
            if temp_file_used and self.temp_wav_file:
                try:
                    os.remove(self.temp_wav_file)
                    print(f"Deleted temporary file: {self.temp_wav_file}")
                except Exception as e:
                    print(f"Failed to delete temporary file: {e}")

    def recognized_callback(self, event):
        """Handles recognized text."""
        print(f"\n\nRecognized phrase: {event.result.text}")
        self.recognized_texts.append(event.result.text)

    def cancel_callback(self, event):
        """Handles recognition cancellation."""
        cancellation_details = event.result.cancellation_details
        print(f"Speech Recognition canceled: {cancellation_details.reason}")
        if cancellation_details.reason == speechsdk.CancellationReason.Error:
            print(f"Error details: {cancellation_details.error_details}")
        self.done = True

# Main execution
if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Convert audio file to text using Azure Speech SDK.")
    parser.add_argument("--file", required=True, help="Path to the audio file to be processed.")
    args = parser.parse_args()

    try:
        env_values = load_env_values()
        speech_key = env_values["AZURE_SPEECH_KEY"]
        speech_region = env_values["AZURE_SPEECH_REGION"]

        if not speech_key or not speech_region:
            raise ValueError("Azure speech key and region must be set in the environment variables")

        # set the file path as the "--file" argument
        file_path = args.file
        if not os.path.exists(file_path):
            raise FileNotFoundError(f"File not found: {file_path}")

        parser = AzureSpeechToTextParser(file_path, speech_key, speech_region)
        texts = parser.parse()
        print("\n\nFinal recognized text:")
        # Print each element in the list with its index; parse() returns None on failure
        for i, text in enumerate(texts or []):
            print(f"{i + 1}. {text}")
    except Exception as e:
        print(f"An error occurred in the main method: {e}")
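
Because the code above converts compressed input to 16 kHz mono before recognition, one way to rule out the audio format as a factor is to inspect the header of the WAV file that is actually sent. The following is a minimal sketch using only the standard-library `wave` module; the helper names `describe_wav` and `is_16k_mono_16bit` are hypothetical, and the 16-bit check is an assumption about the intended target format, not something the reproduction code enforces.

```python
import wave


def describe_wav(path: str) -> dict:
    """Return the basic PCM parameters stored in a WAV file header."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate_hz": rate,
            "bits_per_sample": wav.getsampwidth() * 8,
            # Duration follows from frame count and sample rate
            "duration_s": frames / rate if rate else 0.0,
        }


def is_16k_mono_16bit(info: dict) -> bool:
    """Check for the 16 kHz, mono, 16-bit layout (assumed target format)."""
    return (
        info["channels"] == 1
        and info["sample_rate_hz"] == 16000
        and info["bits_per_sample"] == 16
    )
```

Calling `describe_wav("test.wav")` on the input file and printing the result alongside the SDK log would show whether the Greek and English runs received identical audio.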

If relevant, a WAV file of your input audio.

Unfortunately GitHub does not allow the uploading of .wav files.

Additional information as shown below

Describe the bug

A clear and concise description of what the bug is. If things are not working as you expect,
describe exactly what you are getting and why that is not what you expect.
For example, speech recognition "does not work" may mean you got a cancellation
event with a particular error message, or you did not get any recognition events,
or the recognition result you got contains text that does not match what was spoken.

To Reproduce

Steps to reproduce the behavior:

1. Run the Python class provided as follows: python path/to/AzureSpeechToTextParser.py --file ~/Downloads/test.wav
2. Set speech_config.speech_recognition_language = "en-US" to temporarily make it "work" (even though the output will not be Greek words).

Expected behavior

The speech recognition process should work for Greek (el-GR) (it should output Greek phrases), without hanging and without exceeding the client buffer size.

Actual behavior

The speech recognition process fails with the following error:

Speech Recognition canceled: CancellationReason.Error
Error details: Due to service inactivity, the client buffer exceeded maximum size. Resetting the buffer.

Version of the Cognitive Services Speech SDK

azure-cognitiveservices-speech = "1.42.0"

Platform, Operating System, and Programming Language

OS: Ubuntu 22.04
Hardware: x64
Programming language: Python


pankopon · 2025-02-21T01:54:45Z

Hi, you can upload files of any format when you first zip the file(s) and then attach the zip package.
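
As a rough sketch of the zipping step, the standard-library `zipfile` module can package the audio and log files for attachment. The function name `zip_for_upload` and the file names in the usage note are hypothetical.

```python
import os
import zipfile


def zip_for_upload(paths, archive_path):
    """Write the given files into a single compressed zip archive."""
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for p in paths:
            # Store each file under its base name so local directories are not exposed
            zf.write(p, arcname=os.path.basename(p))
    return archive_path
```

For example, `zip_for_upload(["test.wav", "STTLogs.txt"], "repro.zip")` would produce one archive containing both files.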

The SDK does not have language specific logic so issues like this are typically due to the service behavior on a certain language or region.

I used the following simple code with a 5+ minute wav file for input_filename and "el-GR" for input_language and there were no errors. If the issue still occurs and especially so that you can reproduce it on Speech Studio then please consider reporting it there as well.

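import threading
import azure.cognitiveservices.speech as speechsdk

# speech_key, service_region, input_filename and input_language are assumed to be defined by the caller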

def recognize_speech_from_file():
    speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)
    speech_config.speech_recognition_language = input_language
    audio_config = speechsdk.audio.AudioConfig(filename=input_filename)
    speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)

    recognition_done = threading.Event()

    def recognized_cb(evt):
        result = evt.result
        if result.reason == speechsdk.ResultReason.RecognizedSpeech:
            print('RECOGNIZED: {}'.format(result.text))
        elif result.reason == speechsdk.ResultReason.NoMatch:
            print('NO MATCH: {}'.format(result.no_match_details.reason))

    def canceled_cb(evt):
        result = evt.result
        if result.reason == speechsdk.ResultReason.Canceled:
            cancellation_details = result.cancellation_details
            print('CANCELED: {}'.format(cancellation_details.reason))
            if cancellation_details.reason == speechsdk.CancellationReason.Error:
                print('Error details: {}'.format(cancellation_details.error_details))
                recognition_done.set()

    def stopped_cb(evt):
        print('SESSION STOPPED: {}'.format(evt))
        recognition_done.set()

    speech_recognizer.recognizing.connect(lambda evt: print('Recognizing: {}'.format(evt.result.text)))
    speech_recognizer.recognized.connect(recognized_cb)
    speech_recognizer.session_started.connect(lambda evt: print('SESSION STARTED: {}'.format(evt)))
    speech_recognizer.session_stopped.connect(stopped_cb)
    speech_recognizer.canceled.connect(canceled_cb)

    speech_recognizer.start_continuous_recognition()
    recognition_done.wait()
    speech_recognizer.stop_continuous_recognition()
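
The sample above blocks on a `threading.Event` instead of polling a flag. Since the reported problem is a run that hangs for a long time, a bounded wait may be useful while investigating: `Event.wait()` accepts a timeout and returns `False` if the event was never set. The following is a minimal sketch; `run_with_timeout` is a hypothetical helper, and `start` and `stop` stand in for whatever callables begin and end recognition.

```python
import threading


def run_with_timeout(start, stop, done: threading.Event, timeout_s: float) -> bool:
    """Call start(), wait up to timeout_s for `done`, then always call stop().

    Returns True if `done` was set in time and False if the wait timed out.
    """
    start()
    try:
        finished = done.wait(timeout=timeout_s)
    finally:
        # Stop even on timeout or error so the session is not left running
        stop()
    return finished
```

With the recognizer above, this could be called as `run_with_timeout(speech_recognizer.start_continuous_recognition, speech_recognizer.stop_continuous_recognition, recognition_done, 600)`, where the 600-second limit is an arbitrary choice.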

pankopon self-assigned this Feb 21, 2025

pankopon added the labels in-review, pending close, service-side issue, no reproduce, and python on Feb 21, 2025
