In the previous blog, we learned about Text-to-Speech in Python. This time we will learn about Speech Recognition in Python. This will be a major step toward the creation of a voice assistant.
The SpeechRecognition package is not built into Python, so we are going to install it using pip, the Python package installer.
Many voice recognition packages exist on PyPI.
In this blog, we will primarily focus on the SpeechRecognition module.
$ pip install SpeechRecognition
This will install the SpeechRecognition package. Now we can use this package and its functions for speech recognition, and move a step further in creating our voice assistant.
SpeechRecognition will use our machine’s microphone to capture speech and convert it to a string. We will have to install PyAudio for this purpose.
If an error occurs when running pip install pyaudio, we can instead install PyAudio either by downloading a wheel file and installing it with pip, or by using pipwin.
Download the PyAudio .whl file from the link, then change to the directory that contains the downloaded file.
$ pip install .\PyAudio-0.2.11-cp39-cp39-win_amd64.whl
Another workaround is to install pipwin first and then use it to install PyAudio.
$ pip install pipwin
$ pipwin install pyaudio
The necessary packages for speech recognition have now been installed, so we can write the speech recognition code in Python.
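Before writing any recognition code, it can help to confirm that both packages are importable. The following is a minimal sketch that uses only the standard library. The helper name missing_packages is made up for this example, and it assumes the import names are speech_recognition and pyaudio.

```python
import importlib.util


def missing_packages(names):
    """Return the names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names (not the pip names) of the two packages installed above.
required = ["speech_recognition", "pyaudio"]
missing = missing_packages(required)

if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are installed.")
```

If a name shows up as missing, repeating the matching installation step above is the first thing to try.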
Speech Recognition in Python
import speech_recognition as sr

recognizer = sr.Recognizer()

with sr.Microphone() as source:
    print("Listening...")
    recognizer.adjust_for_ambient_noise(source)
    audio = recognizer.listen(source)

try:
    print("Recognizing...")
    query = recognizer.recognize_google(audio)
except sr.UnknownValueError:
    print("Could not understand audio")
else:
    # Runs only when recognition succeeded, so query is always defined here.
    print(query.lower())
Let’s understand the code line by line.
First of all, we import the speech_recognition library. We import it under the alias sr because the original name is quite long.
Recognizer class in Speech Recognition Library
recognizer = sr.Recognizer()
After importing, the first step is to create an instance of the Recognizer class from the speech_recognition library.
The recognizer variable now holds this Recognizer instance, and we will use it to call the recognizer's methods.
with sr.Microphone() as source:
After creating the Recognizer instance, we access the machine’s microphone using the speech_recognition library.
The code above uses a with statement, the same construct that is commonly used for file handling in Python. It makes sure the microphone is released when the block ends. The microphone instance is given an alias, i.e. source.
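To see what the with statement does here, the sketch below uses a stand-in class instead of the real microphone. DemoSource is hypothetical and exists only for illustration. The assumption is that sr.Microphone behaves in a similar way, opening the audio stream when the block starts and closing it when the block ends.

```python
class DemoSource:
    """Stand-in for an audio source; records when it is opened and closed."""

    def __init__(self):
        self.events = []

    def __enter__(self):
        # Runs at the start of the `with` block; the return value is bound
        # to the name after `as`.
        self.events.append("open")
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Runs when the block ends, even if an exception was raised inside.
        self.events.append("close")
        return False  # do not suppress exceptions


demo = DemoSource()
with demo as source:
    source.events.append("use")

print(demo.events)  # ['open', 'use', 'close']
```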
Adjust For Ambient Noise
The adjust_for_ambient_noise() function of the recognizer takes the microphone instance as input. It listens to the source for a short period (one second by default) and calibrates the recognizer's energy threshold, so that speech can still be picked up in a slightly noisy environment.
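adjust_for_ambient_noise() also accepts a duration argument (in seconds) that controls how long it samples background noise, and it updates the recognizer's energy_threshold attribute. A small sketch follows, where calibrate is a hypothetical helper and recognizer and source are the objects created earlier.

```python
def calibrate(recognizer, source, seconds=1.0):
    """Sample background noise and return the resulting energy threshold.

    `recognizer` is expected to be an sr.Recognizer and `source` an opened
    sr.Microphone (that is, one used inside its `with` block).
    """
    recognizer.adjust_for_ambient_noise(source, duration=seconds)
    return recognizer.energy_threshold


# Usage, assuming a working microphone:
#
#     import speech_recognition as sr
#     recognizer = sr.Recognizer()
#     with sr.Microphone() as source:
#         print("Energy threshold:", calibrate(recognizer, source, seconds=0.5))
```

A longer duration gives a steadier estimate of the background noise, at the cost of a longer pause before listening starts.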
Save Microphone Input Using Listen()
audio = recognizer.listen(source)
The code above records speech from the microphone and stores it in the audio variable.
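By default, listen() waits indefinitely for speech to start. It also accepts timeout and phrase_time_limit arguments, both in seconds. The sketch below shows one way to use them; record_phrase is a hypothetical helper name and the default limits are arbitrary choices for this example.

```python
def record_phrase(recognizer, source, wait_seconds=5, max_seconds=10):
    """Listen for one phrase and return the captured audio.

    wait_seconds: how long to wait for speech to start before giving up.
    max_seconds:  upper bound on the length of the recorded phrase.
    """
    return recognizer.listen(
        source,
        timeout=wait_seconds,
        phrase_time_limit=max_seconds,
    )
```

If no speech starts within the timeout, the library raises sr.WaitTimeoutError, so a caller that sets a timeout will probably want to catch that exception.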
Error Handling in Speech Recognition
One thing to keep in mind is that the recognizer might sometimes fail to recognize the speech, so we have to handle the error.
If the error is not handled, the program stops with an exception, so handling it is necessary.
try:
    print("Recognizing...")
    query = recognizer.recognize_google(audio)
except sr.UnknownValueError:
    print("Could not understand audio")
else:
    # Runs only when recognition succeeded, so query is always defined here.
    print(query.lower())
Speech Recognizer API
The SpeechRecognition module ships with many recognizers. For example:
recognize_bing(): Microsoft Bing Speech
recognize_google(): Google Web Speech Recognizer API
recognize_google_cloud(): Google Cloud Speech, requires the installation of the google-cloud-speech package
recognize_houndify(): Houndify by SoundHound
recognize_ibm(): IBM Speech-to-Text Recognizer
recognize_sphinx(): CMU Sphinx – PocketSphinx must be installed
In the above list of recognizers, only recognize_sphinx() works offline, using the CMU Sphinx engine; the rest require an internet connection.
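The list above can be written down as a small lookup table. This is only a sketch: NEEDS_INTERNET and get_recognize_method are hypothetical names, and it assumes the installed version of the library still provides each recognize_... method listed above.

```python
# Whether each engine from the list above needs an internet connection.
NEEDS_INTERNET = {
    "bing": True,
    "google": True,
    "google_cloud": True,
    "houndify": True,
    "ibm": True,
    "sphinx": False,
}


def get_recognize_method(recognizer, engine):
    """Return the bound recognize_<engine> method of `recognizer`."""
    if engine not in NEEDS_INTERNET:
        raise ValueError(f"Unknown engine: {engine!r}")
    method = getattr(recognizer, f"recognize_{engine}", None)
    if method is None:
        raise ValueError(f"This version has no recognize_{engine}()")
    return method


offline_engines = [name for name, online in NEEDS_INTERNET.items() if not online]
print(offline_engines)  # ['sphinx']
```

With a real recognizer, get_recognize_method(recognizer, "google")(audio) would be equivalent to calling recognizer.recognize_google(audio) directly.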
Here, we will be using the Google Web Speech Recognizer API.
The recognize_google() function uses the Google speech recognizer to recognize the recorded voice.
If the recognize_google() function fails to recognize the speech, it throws an UnknownValueError, which we catch in the except block before printing an error message to the console.
If everything works fine, the recognized speech will be saved in the query variable and printed.
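The recognition step can also be wrapped in a function that returns the text instead of printing it. The sketch below is one possible shape, not the only one. The helper name transcribe is hypothetical. It additionally catches sr.RequestError, which the library raises when the API cannot be reached, and passes the language argument that recognize_google() accepts.

```python
def transcribe(recognizer, audio, language="en-US"):
    """Return the recognized text in lowercase, or None if recognition fails."""
    # Imported here so that the exception classes come from the library.
    import speech_recognition as sr

    try:
        text = recognizer.recognize_google(audio, language=language)
    except sr.UnknownValueError:
        # The speech was captured but could not be understood.
        print("Could not understand audio")
        return None
    except sr.RequestError as error:
        # The API could not be reached or rejected the request.
        print(f"Could not request results: {error}")
        return None
    return text.lower()
```

Returning None on failure lets the calling code decide what to do next, for example asking the user to repeat the command.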
This is our simple Python speech recognition program. Our next target is to combine Text-to-Speech and Speech Recognition so that we can give the assistant a command and get voice output.