When using the Azure Speech SDK (or Speech Studio) to recognize speech in Greek (el-GR), the recognition process hangs, and after a long time it fails #2737

Open
PavlosIsaris opened this issue Feb 11, 2025 · 1 comment

PavlosIsaris commented Feb 11, 2025

Issue with el-GR language: Speech Recognition canceled: CancellationReason.Error - Client buffer exceeded maximum size

Description

When using the Azure Speech SDK to recognize speech in Greek (el-GR), the recognition process hangs, and after a long time it fails with the following error:

Speech Recognition canceled: CancellationReason.Error Error details: Due to service inactivity, the client buffer exceeded maximum size. Resetting the buffer.

However, when the language is set to English (en-US), the recognition process completes as expected (even though the output consists of English words). This seems to be a bug in the Azure Speech SDK.

The same happens in Azure Speech Studio. If you select "Greek" in the languages dropdown and upload an audio file containing Greek speech, the process hangs and does not output any text:

Image

  • Speech SDK log taken from a run that exhibits the reported issue.

Log file:

STTLogs.txt

  • A stripped down, simplified version of your source code that exhibits the issue. Or, preferably, try to reproduce the problem with one of the public samples in this repository (or a minimally modified version of it), and share the code.

Code to Reproduce

import os
import time
import argparse
import azure.cognitiveservices.speech as speechsdk
from pydub import AudioSegment
from env_loader import load_env_values

class AzureSpeechToTextParser:
    """
    Parses an audio file to text using the Azure Speech SDK.
    Supports both compressed (MP3, MP4) and uncompressed (WAV) formats.
    Converts compressed files to WAV before processing, then deletes the WAV file after processing.
    """
    def __init__(self, audio_file_path: str, subscription_key: str, service_region: str):
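        # No base class is defined, so store the path directly instead of passing it to super().__init__
        self.audio_file_path = audio_file_path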
        self.subscription_key = subscription_key
        self.service_region = service_region
        self.done = False
        self.speech_recognizer = None
        self.recognized_texts = []
        self.temp_wav_file = None  # Stores temp WAV file if conversion is needed

    def convert_to_wav(self):
        """
        Converts MP3/MP4 to WAV format and returns the new file path.
        """
        file_extension = os.path.splitext(self.audio_file_path)[-1].lower()

        if file_extension in [".mp3", ".mp4"]:
            print(f"Converting {file_extension} to WAV for processing...")

            # Swap only the trailing extension (str.replace could also alter matches elsewhere in the path)
            wav_path = os.path.splitext(self.audio_file_path)[0] + ".wav"
            audio = AudioSegment.from_file(self.audio_file_path, format=file_extension[1:])  # Remove dot from extension
            audio = audio.set_channels(1).set_frame_rate(16000)  # Ensure Azure compatibility
            audio.export(wav_path, format="wav")

            self.temp_wav_file = wav_path  # Store for deletion later
            return wav_path
        return self.audio_file_path  # If already WAV, return as is

    def parse(self):
        """
        Parses the audio file into text. Converts MP3/MP4 to WAV before processing.
        Deletes the temporary WAV file after recognition.
        """
        self.done = False
        self.recognized_texts = []
        temp_file_used = False  # Track if conversion happened

        try:
            print("Initializing Azure Speech SDK...")
            speech_config = speechsdk.SpeechConfig(subscription=self.subscription_key, region=self.service_region)
            speech_config.speech_recognition_language = "el-GR"  # Change to "en-US" to make it work
            log_file_path = os.path.join(os.path.dirname(__file__), "STTLogs.txt")
            speech_config.set_property(speechsdk.PropertyId.Speech_LogFilename, log_file_path)
            # Convert file if necessary
            self.audio_file_path = self.convert_to_wav()
            if self.temp_wav_file:
                temp_file_used = True  # Mark that we need to delete the file later

            # Process the WAV file with Azure
            audio_config = speechsdk.audio.AudioConfig(filename=self.audio_file_path)
            self.speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)

            self.speech_recognizer.recognized.connect(self.recognized_callback)
            self.speech_recognizer.canceled.connect(self.cancel_callback)
            self.speech_recognizer.session_started.connect(lambda evt: print('SESSION STARTED: {}'.format(evt)))
            

            print("Starting speech recognition...")
            self.speech_recognizer.start_continuous_recognition()

            while not self.done:
                time.sleep(0.5)

            self.speech_recognizer.stop_continuous_recognition()
            return self.recognized_texts

        except Exception as e:
            print(f"An error occurred during speech recognition: {e}")
            return None

        finally:
            # Delete the temporary WAV file if it was created
            if temp_file_used and self.temp_wav_file:
                try:
                    os.remove(self.temp_wav_file)
                    print(f"Deleted temporary file: {self.temp_wav_file}")
                except Exception as e:
                    print(f"Failed to delete temporary file: {e}")

    def recognized_callback(self, event):
        """Handles recognized text."""
        print(f"\n\nRecognized phrase: {event.result.text}")
        self.recognized_texts.append(event.result.text)

    def cancel_callback(self, event):
        """Handles recognition cancellation."""
        cancellation_details = event.result.cancellation_details
        print(f"Speech Recognition canceled: {cancellation_details.reason}")
        if cancellation_details.reason == speechsdk.CancellationReason.Error:
            print(f"Error details: {cancellation_details.error_details}")
        self.done = True

# Main execution
if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Convert audio file to text using Azure Speech SDK.")
    parser.add_argument("--file", required=True, help="Path to the audio file to be processed.")
    args = parser.parse_args()

    try:
        env_values = load_env_values()
        speech_key = env_values["AZURE_SPEECH_KEY"]
        speech_region = env_values["AZURE_SPEECH_REGION"]

        if not speech_key or not speech_region:
            raise ValueError("Azure speech key and region must be set in the environment variables")

        # set the file path as the "--file" argument
        file_path = args.file
        if not os.path.exists(file_path):
            raise FileNotFoundError(f"File not found: {file_path}")

        parser = AzureSpeechToTextParser(file_path, speech_key, speech_region)
        texts = parser.parse()
        print("\n\nFinal recognized text:")
        # print each recognized phrase with its 1-based index (parse() returns None on failure)
        for i, text in enumerate(texts or []):
            print(f"{i + 1}. {text}")
    except Exception as e:
        print(f"An error occurred in the main method: {e}")
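Since the script above converts compressed input to mono 16 kHz WAV before recognition, it may help to confirm what actually ends up in the file that is sent to the service. The following is a minimal sketch using only the standard-library wave module, which reads PCM WAV files. The helper names describe_wav and matches_expected_format are hypothetical, and the 16-bit sample width is an assumption rather than something stated in this issue.

```python
import wave


def describe_wav(path: str) -> dict:
    """Read basic PCM parameters from a WAV file header."""
    with wave.open(path, "rb") as wav_file:
        frames = wav_file.getnframes()
        rate = wav_file.getframerate()
        return {
            "channels": wav_file.getnchannels(),
            "sample_width_bytes": wav_file.getsampwidth(),
            "frame_rate": rate,
            "duration_seconds": frames / rate if rate else 0.0,
        }


def matches_expected_format(info: dict, channels: int = 1,
                            frame_rate: int = 16000,
                            sample_width_bytes: int = 2) -> bool:
    """Compare header values with the target format.

    The defaults mirror set_channels(1).set_frame_rate(16000) in the script;
    the 2-byte sample width is an assumption.
    """
    return (info["channels"] == channels
            and info["frame_rate"] == frame_rate
            and info["sample_width_bytes"] == sample_width_bytes)
```

Calling describe_wav("test.wav") before starting recognition and printing the result gives a quick way to rule out a malformed or unexpectedly long input file.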
  • If relevant, a WAV file of your input audio.

Unfortunately GitHub does not allow the uploading of .wav files.

  • Additional information as shown below

To Reproduce

Steps to reproduce the behavior:

  1. Run the Python script provided above as follows: python path/to/AzureSpeechToTextParser.py --file ~/Downloads/test.wav
  2. Set speech_config.speech_recognition_language = "en-US" to make recognition temporarily "work" (even though the output will not be Greek words)
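To compare el-GR and en-US without editing the source between runs, the language could be passed on the command line. This is only a sketch: the --language flag and the build_arg_parser helper are hypothetical additions that are not part of the script above.

```python
import argparse


def build_arg_parser() -> argparse.ArgumentParser:
    """Build a CLI parser with the original --file option plus a hypothetical --language option."""
    parser = argparse.ArgumentParser(
        description="Convert audio file to text using Azure Speech SDK.")
    parser.add_argument("--file", required=True,
                        help="Path to the audio file to be processed.")
    # Hypothetical flag: defaults to the failing locale from this report
    parser.add_argument("--language", default="el-GR",
                        help="Recognition locale, e.g. el-GR or en-US.")
    return parser


if __name__ == "__main__":
    args = build_arg_parser().parse_args()
    print(f"file={args.file} language={args.language}")
```

The parsed value would then be assigned to speech_config.speech_recognition_language inside parse() in place of the hard-coded "el-GR".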

Expected behavior

The speech recognition process should work for Greek (el-GR) (it should output Greek phrases), without hanging and without exceeding the client buffer size.

Actual behavior

The speech recognition process fails with the following error:

Speech Recognition canceled: CancellationReason.Error
Error details: Due to service inactivity, the client buffer exceeded maximum size. Resetting the buffer.

Version of the Cognitive Services Speech SDK

azure-cognitiveservices-speech = "1.42.0"

Platform, Operating System, and Programming Language

  • OS: Ubuntu 22.04
  • Hardware - x64
  • Programming language: Python
@PavlosIsaris PavlosIsaris changed the title When using the Azure Speech SDK to recognize speech in Greek (el-GR), the recognition process hangs, and after a long time it fails When using the Azure Speech SDK (or Speech Studio) to recognize speech in Greek (el-GR), the recognition process hangs, and after a long time it fails Feb 11, 2025
pankopon (Contributor) commented

Hi, you can upload files of any format if you first zip the file(s) and then attach the zip package.
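For reference, packaging the audio for upload can be done with the standard-library zipfile module. This is a minimal sketch; the helper name zip_files and the file names in the usage note are placeholders.

```python
import os
import zipfile


def zip_files(paths, archive_path):
    """Write the given files into a new zip archive, stored under their base names."""
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in paths:
            archive.write(path, arcname=os.path.basename(path))
    return archive_path
```

For example, zip_files(["test.wav", "STTLogs.txt"], "repro.zip") would produce a single package containing both the audio and the SDK log.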

The SDK does not have language-specific logic, so issues like this are typically due to the service behavior for a certain language or region.

I used the following simple code with a 5+ minute WAV file for input_filename and "el-GR" for input_language, and there were no errors. If the issue still occurs, and especially if you can reproduce it in Speech Studio, please consider reporting it there as well.

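import threading
import azure.cognitiveservices.speech as speechsdk

# speech_key, service_region, input_language and input_filename are expected to be defined by the caller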

def recognize_speech_from_file():
    speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)
    speech_config.speech_recognition_language = input_language
    audio_config = speechsdk.audio.AudioConfig(filename=input_filename)
    speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)

    recognition_done = threading.Event()

    def recognized_cb(evt):
        result = evt.result
        if result.reason == speechsdk.ResultReason.RecognizedSpeech:
            print('RECOGNIZED: {}'.format(result.text))
        elif result.reason == speechsdk.ResultReason.NoMatch:
            print('NO MATCH: {}'.format(result.no_match_details.reason))

    def canceled_cb(evt):
        result = evt.result
        if result.reason == speechsdk.ResultReason.Canceled:
            cancellation_details = result.cancellation_details
            print('CANCELED: {}'.format(cancellation_details.reason))
            if cancellation_details.reason == speechsdk.CancellationReason.Error:
                print('Error details: {}'.format(cancellation_details.error_details))
                recognition_done.set()

    def stopped_cb(evt):
        print('SESSION STOPPED: {}'.format(evt))
        recognition_done.set()

    speech_recognizer.recognizing.connect(lambda evt: print('Recognizing: {}'.format(evt.result.text)))
    speech_recognizer.recognized.connect(recognized_cb)
    speech_recognizer.session_started.connect(lambda evt: print('SESSION STARTED: {}'.format(evt)))
    speech_recognizer.session_stopped.connect(stopped_cb)
    speech_recognizer.canceled.connect(canceled_cb)

    speech_recognizer.start_continuous_recognition()
    recognition_done.wait()
    speech_recognizer.stop_continuous_recognition()
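The recognition_done.wait() call above blocks until a callback sets the event, so a run where the service never responds would wait indefinitely. A possible guard is to wait with a timeout, sketched below using only threading.Event. The helper name wait_for_recognition and the timeout values are assumptions, and the Timer merely simulates a callback firing.

```python
import threading


def wait_for_recognition(done_event: threading.Event, timeout_seconds: float) -> bool:
    """Return True if done_event was set before the timeout elapsed, False otherwise."""
    return done_event.wait(timeout=timeout_seconds)


if __name__ == "__main__":
    done = threading.Event()
    # Simulate a session_stopped/canceled callback that fires after 0.1 s
    threading.Timer(0.1, done.set).start()
    finished = wait_for_recognition(done, timeout_seconds=2.0)
    print("finished" if finished else "timed out")
```

In the code above, a False return would be the point at which to report the hang and call speech_recognizer.stop_continuous_recognition() instead of waiting further.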

@pankopon pankopon self-assigned this Feb 21, 2025
@pankopon pankopon added in-review In review pending close Closed soon without new activity service-side issue no reproduce We cannot reproduce this issue python Pull requests that update Python code labels Feb 21, 2025