High Latency from around 700ms to 2000ms - code running in india and using centralindia deployment #2730
Comments
I'm not sure what type of latency you are trying to measure: network latency, latency between recognized events, or something else. Most people asking about speech latency are looking at User Perceived Latency, which is the time from when they start speaking until the first hypothesis event is received back from the service. To measure that, you would need to modify your code a bit. I'd probably modify your while wait loop to take keyboard input and, each time a key is pressed, reset your "start_time = time.time()". That way you can see the latency the user will see between speaking and the application responding to them.
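The key-press approach described above can be sketched without touching the SDK at all. This is only a minimal illustration: the class name `PerceivedLatencyTimer` and its methods are hypothetical helpers, not part of the Speech SDK, and the clock is injectable so the logic can be checked without a microphone.

```python
import time


class PerceivedLatencyTimer:
    """Measures time from a manual 'I started speaking' mark to the first hypothesis.

    Hypothetical helper for illustration; not part of the Speech SDK.
    """

    def __init__(self, clock=time.monotonic):
        self._clock = clock      # injectable so tests can supply a fake clock
        self._start = None       # None means no measurement is in progress
        self.samples_ms = []     # one entry per completed measurement

    def mark_speech_start(self):
        # Call this on the key press, just before you start speaking.
        self._start = self._clock()

    def on_first_hypothesis(self):
        # Call this from whatever handler receives hypothesis events.
        # Only the first event after a mark is counted; later ones return None.
        if self._start is None:
            return None
        elapsed_ms = (self._clock() - self._start) * 1000.0
        self._start = None
        self.samples_ms.append(elapsed_ms)
        return elapsed_ms
```

In the poster's script this would presumably be wired up by calling `timer.mark_speech_start()` inside the keyboard loop and `timer.on_first_hypothesis()` from the handler connected to the recognizer's `recognizing` event; that wiring is an assumption, since the original code is not shown in the thread.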
If you need to speed up that very first recognition, you can bypass the initial network latency caused by the DNS lookup, connecting to the service, certificate validation, and upgrading from HTTP to WebSockets by opening the connection ahead of time:

    # Creates a recognizer with the given settings
    speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, auto_detect_source_language_config=auto_detect_source_language_config)

    # Get the connection object from the recognizer
    connection = speechsdk.Connection.from_recognizer(speech_recognizer)

    # Open the connection to reduce latency
    connection.open(True)

But that will only help speed up the very first recognition.
Hi @BrianMouncer, the latency I am talking about here is the latency reported by the SDK, which is as good as user perceived latency. It comes as a property (SpeechServiceResponse_RecognitionLatencyMs) of the speech-to-text results.
It is very strange that it comes out at around 700 ms to 2 seconds, which means it can't be used in real-time use cases such as a voicebot. Going through the code, do you think there is any change I can make to reduce latency? I want to auto-detect the language (hi-IN or en-IN) throughout the session and translate it to English using the Azure Speech Services API. I am interested in reducing the latency not only for the first recognition, but for all recognitions.
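To compare runs, it can help to reduce the per-result latency values to a few summary numbers. The sketch below is a hypothetical helper (`summarize_latencies` is not an SDK function). It assumes the property values arrive as strings of milliseconds, for example collected with something like `evt.result.properties.get(speechsdk.PropertyId.SpeechServiceResponse_RecognitionLatencyMs)` in a `recognized` handler, and that some results may carry no value.

```python
import statistics


def summarize_latencies(raw_values):
    """Summarize latency values given as strings of milliseconds.

    Hypothetical helper; entries that are missing or not integers are skipped.
    """
    values = []
    for raw in raw_values:
        try:
            values.append(int(raw))
        except (TypeError, ValueError):
            # None, empty strings, or non-numeric text: ignore this result
            continue
    if not values:
        return None
    return {
        "count": len(values),
        "min_ms": min(values),
        "median_ms": statistics.median(values),
        "max_ms": max(values),
    }


if __name__ == "__main__":
    # Illustrative inputs only, not measurements from the attached log.
    print(summarize_latencies(["700", "2000", "", None, "1200"]))
```

Reporting the median alongside the minimum and maximum makes it easier to tell whether the 2-second figures are occasional outliers or the typical case.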
I am using Azure Speech Services for speech to text and I am getting very high latency.
I am using the centralindia deployment of the Speech service.
pip show azure-cognitiveservices-speech ⬇
My code looks like this 🔽
Attached is the output with latency and other parameters 🔽
azure_stt_test.txt