Text to Speech (TTS) in Python Using Pyttsx3

Text-to-Speech (TTS) technology converts written text into spoken words, which is useful in a wide range of applications such as automated customer service, accessibility for visually impaired users, and language learning. Python is a versatile programming language that is widely used for developing TTS applications. In this blog post, we will explore how to use Pyttsx3, a Python library for offline TTS, to build text-to-speech applications.

What is Pyttsx3?

Pyttsx3 is a Python library that allows developers to create text to speech (TTS) applications in a simple and easy manner. It is a cross-platform library that supports various operating systems, including Windows, Linux, and macOS.

Pyttsx3 is a wrapper around the speech engines already installed on the system: the Microsoft Speech API (SAPI5) on Windows, NSSpeechSynthesizer on macOS, and eSpeak on Linux and other platforms. It works offline and provides a simple interface for controlling speech output, including the voice, volume, and rate.

Install Python’s Pyttsx3 Library

Before we start, let’s install Pyttsx3 using pip, which is the most popular package manager for Python. Open a terminal or command prompt and type the following command:

pip install pyttsx3

Once installed, we can start using Pyttsx3 to build text to speech (TTS) applications in Python.

Text to Speech using Pyttsx3 in Python

Using Pyttsx3 is straightforward. We first need to import the pyttsx3 library in our Python code. We can do this by using the following command:

import pyttsx3

After importing the library, we call the pyttsx3.init() function, which returns an Engine object. This object acts as our text-to-speech engine. We can create it using the following command:

engine = pyttsx3.init()

Once we have created the engine object, we can use its say() method to queue text for speaking. The say() method takes a string as input, which is the text we want to convert into speech. Nothing is spoken at this point; the text is only added to the engine's command queue:

engine.say("Hello World")

In this example, we are queuing the string “Hello World” with the say() method.

After queuing the text, we need to play it. The runAndWait() method of the engine object processes all queued commands and blocks until the speech is complete before returning control to the program. We can use the following command to play our speech:

engine.runAndWait()

This command speaks everything queued by earlier say() calls.
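Putting these steps together, a minimal sketch might look like the following. The speak() helper and its optional engine parameter are illustrative names and not part of pyttsx3; the parameter exists only so that an already-created engine (or a stand-in object on a machine without audio) can be passed in.

```python
def speak(text, engine=None):
    """Queue `text` and block until it has been spoken.

    `engine` is a hypothetical hook for reusing an existing engine;
    when it is omitted, a new pyttsx3 engine is created.
    """
    if engine is None:
        import pyttsx3  # imported lazily, only when a real engine is needed
        engine = pyttsx3.init()
    engine.say(text)       # queue the utterance
    engine.runAndWait()    # process the queue and block until it is empty
    return engine
```

Calling speak("Hello World") should then speak the phrase with the system's default voice, assuming a working speech driver is installed.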

Customizing Voice Properties in Pyttsx3

Pyttsx3 lets us customize properties of the speech output, such as the speaking rate, the volume, and the voice. These properties can be set using the setProperty() method of the engine object and read back with getProperty().

Changing the Python Pyttsx3 TTS Voice

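By default, Pyttsx3 uses the default voice installed on our system. However, we can change the voice using the setProperty() method of the engine object.

The setProperty() method takes two arguments: the property we want to set and the value we want to set it to. For the 'voice' property, the value is the id of an installed voice. These ids are platform-specific (a registry path on Windows, for example), so a language code such as 'en-us' is not guaranteed to match anything. A portable approach is to take the id from the list of installed voices:

engine.setProperty('voice', engine.getProperty('voices')[0].id)

In this example, we are selecting the first voice reported by the system.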

Setting the Pyttsx3 Voice Speed

We can also change the speed of the speech using the setProperty() method.

engine.setProperty('rate', 150)

In this example, we are setting the speed of the speech to 150 words per minute. The documented default is 200 words per minute.
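Volume is set the same way as rate, through the 'volume' property, which the pyttsx3 documentation describes as a float between 0.0 and 1.0. As a small sketch, the helper below sets both values and clamps the volume into that range. The name apply_settings() is hypothetical and only used for illustration.

```python
def apply_settings(engine, rate=None, volume=None):
    """Set rate (words per minute) and volume (0.0 to 1.0) on an engine."""
    if rate is not None:
        engine.setProperty('rate', int(rate))
    if volume is not None:
        # Keep the value inside the documented 0.0 to 1.0 range
        clamped = max(0.0, min(1.0, float(volume)))
        engine.setProperty('volume', clamped)
    return engine
```

For example, apply_settings(engine, rate=150, volume=0.8) slows the speech slightly and lowers the volume to 80%.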

Listing the Installed Voices in Pyttsx3

The available voices depend on the operating system. On the Windows machine used here, the default is a male voice named David.

To change the voice of the pyttsx3 engine, we first get the list of installed voice objects.

voices = engine.getProperty('voices')

The getProperty() method of the engine takes a property name as a string and returns the current value of that property. For 'voices', the value is a list of Voice objects.

print(voices)

When we print the voices on this machine, we get a list that contains two objects.

[<pyttsx3.voice.Voice object at 0x000001AF1634D820>, <pyttsx3.voice.Voice object at 0x000001AF1634DB80>]

Let’s print each voice.

print(voices[0])
print(voices[1])

voices[0] – Microsoft David Desktop (Male Voice Default).

voices[1] – Female Voice named Microsoft Zira Desktop.

# voices[0]

<Voice id=HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech\Voices\Tokens\TTS_MS_EN-US_DAVID_11.0
name=Microsoft David Desktop - English (United States)
languages=[]
gender=None
age=None>

# voices[1]

<Voice id=HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech\Voices\Tokens\TTS_MS_EN-US_ZIRA_11.0
name=Microsoft Zira Desktop - English (United States)
languages=[]
gender=None
age=None>

Now that we understand the voice property and have stored the list returned by engine.getProperty('voices'), let’s change the voice of the engine to the female voice named Zira.

The naming convention of the pyttsx3 methods is convenient: getProperty() reads the property named by a string, and setProperty() sets it.

engine.setProperty('voice', voices[1].id)

The above code changes the voice to Zira. The setProperty() method takes the property name and, for 'voice', the id of the voice to switch to.
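Because the order and number of installed voices differ between machines, indexing with voices[1] can pick a different voice elsewhere or fail outright. One way around this, sketched below, is to search the list by name. The function name find_voice() is made up for this example.

```python
def find_voice(voices, keyword):
    """Return the first voice whose name contains `keyword`, or None."""
    needle = keyword.lower()
    for voice in voices:
        name = getattr(voice, 'name', '') or ''   # some drivers may report no name
        if needle in name.lower():
            return voice
    return None

# Possible usage with a real engine:
#   voice = find_voice(engine.getProperty('voices'), 'zira')
#   if voice is not None:
#       engine.setProperty('voice', voice.id)
```

If no voice matches, the helper returns None and the engine keeps its current voice.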

Change Pitch in Pyttsx3

Some pyttsx3 drivers accept a 'pitch' property through the setProperty() method of the engine object. This can be useful for adding emphasis or for making the speech sound more natural.

Pitch is not one of the properties listed in the pyttsx3 documentation (rate, voice, voices, and volume), so both support and the value scale depend on the driver and the installed version. The eSpeak driver in recent releases accepts a pitch value, while the SAPI5 and NSSpeechSynthesizer drivers may ignore it. Check the driver source for your version before relying on a specific range.

engine.setProperty('pitch', 0.8)  # Driver-dependent; may be ignored or scaled differently

In this code, we are requesting a pitch of 0.8. Whether this raises, lowers, or leaves the pitch unchanged depends on the driver in use.

Check Voice Pitch Support in Python Pyttsx3

It is important to note that pitch support is a property of the driver rather than of an individual voice. The Voice objects returned by getProperty('voices') carry only the attributes id, name, languages, gender, and age, so they cannot tell us whether a pitch change will take effect. What we can do is list the installed voices and their attributes, and then test a pitch change by ear:

voices = engine.getProperty('voices')
for voice in voices:
    # Voice objects expose id, name, languages, gender and age (no pitch attribute)
    print(f"{voice.id}: {voice.name} (languages={voice.languages})")

In this code, we are iterating through all the available voices and printing the attributes that pyttsx3 reports for each one.

Stopping the Pyttsx3 Engine

To interrupt speech, we can call the stop() method of the engine object.

engine.stop()

This command stops the current utterance and clears the command queue. It is not required after runAndWait(), which returns only once the queue is empty.

Convert a text file to speech using Python Pyttsx3

Pyttsx3 allows us to convert a text file to speech. Here’s an example:

import pyttsx3

engine = pyttsx3.init()
with open('sample.txt') as file:
    # Replace newlines with spaces so words on adjacent lines are not joined
    text = file.read().replace('\n', ' ')
engine.say(text)
engine.runAndWait()

In this example, we read the contents of a text file named sample.txt and use the say and runAndWait methods to convert the text to speech and play the output.
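For longer files, it can help to queue the text one sentence at a time instead of as a single large string, so that each sentence becomes its own utterance. The sketch below uses a simple regular expression for this. Both helper names are hypothetical, and the splitting rule is deliberately naive (it will split after abbreviations such as "Dr.").

```python
import re

def split_sentences(text):
    """Split text into sentences on '.', '!' or '?' followed by whitespace."""
    normalized = ' '.join(text.split())   # collapse newlines and repeated spaces
    if not normalized:
        return []
    return re.split(r'(?<=[.!?])\s+', normalized)

def queue_sentences(engine, text):
    """Queue each sentence as a separate utterance; return how many were queued."""
    sentences = split_sentences(text)
    for sentence in sentences:
        engine.say(sentence)
    return len(sentences)
```

After queue_sentences(engine, text), a single engine.runAndWait() call speaks all queued sentences in order.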

Save Text to Speech into File using Pyttsx3 Python

Sometimes, we may want to save the speech generated by Pyttsx3 to a file for later use. We can use the save_to_file() method of the engine object to do this. This method takes two arguments: the text to be spoken and the name of the file to which the speech should be saved.

engine.save_to_file('Hello, world!', 'output.wav')
engine.runAndWait()

In this code, we are using the save_to_file() method to queue the text “Hello, world!” for writing to a file named “output.wav”. Like say(), save_to_file() only queues the command, so we call the runAndWait() method to actually produce the file before proceeding with the rest of the code.

The file extension does not select the audio format. Pyttsx3 has no encoder options, and the platform driver decides what is written: typically WAV data on Windows and Linux, and AIFF data on macOS. A name such as “output.mp3” or “output.ogg” therefore produces a file whose extension does not match its contents:

engine.save_to_file('Hello, world!', 'output.ogg')  # Still the driver's native format, not OGG
engine.runAndWait()

Because the audio format may vary depending on the system on which the code is executed, use an external audio library or tool to convert the file if you need a specific format such as MP3 or OGG.
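Since the format can vary between systems, one way to find out what was actually written is to look at the first few bytes of the file. The sketch below checks for a handful of well-known signatures. The function names are made up for this example, and the check is a rough heuristic rather than a full parser.

```python
def detect_audio_format(header):
    """Guess the audio container from the first bytes of a file."""
    if header[:4] == b'RIFF' and header[8:12] == b'WAVE':
        return 'wav'
    if header[:4] == b'FORM' and header[8:12] in (b'AIFF', b'AIFC'):
        return 'aiff'
    if header[:4] == b'OggS':
        return 'ogg'
    # MP3 files usually start with an ID3 tag or an MPEG frame sync
    if header[:3] == b'ID3' or header[:2] in (b'\xff\xfb', b'\xff\xf3', b'\xff\xf2'):
        return 'mp3'
    return 'unknown'

def detect_file_format(path):
    """Read the first 12 bytes of `path` and guess its audio format."""
    with open(path, 'rb') as handle:
        return detect_audio_format(handle.read(12))
```

Running detect_file_format('output.wav') after saving shows which format the driver produced, regardless of the file name.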

Python Text to Speech with Flask using Pyttsx3

Pyttsx3 can also be used with Flask, which is a popular Python web framework. Here’s an example:

from flask import Flask, request
import pyttsx3

app = Flask(__name__)
engine = pyttsx3.init()

@app.route('/')
def index():
    return '''
        <form method="POST" action="/speak">
            <input type="text" name="text">
            <input type="submit" value="Speak">
        </form>
    '''

@app.route('/speak', methods=['POST'])
def speak():
    text = request.form['text']
    engine.say(text)
    engine.runAndWait()
    return 'Speech completed'

if __name__ == '__main__':
    app.run(debug=True)

In this example, we define a Flask application that has two routes: the root route that displays a form to enter text, and the /speak route that speaks the submitted text using Pyttsx3. We create one engine instance at import time and use the say and runAndWait methods in the /speak route. Note that the audio plays on the machine running the Flask server, not in the visitor's browser, and that the request does not return until the speech has finished.
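A web server may handle several requests at once, while a single engine should only process one runAndWait() call at a time. One possible way to handle this, sketched below, is to give the engine its own worker thread and have request handlers push text onto a queue. The SpeechWorker class is hypothetical, and whether pyttsx3 runs reliably outside the main thread depends on the platform and driver, so treat this as an assumption to verify on your system.

```python
import queue
import threading

class SpeechWorker:
    """Owns one engine on one thread and speaks queued texts in order."""

    def __init__(self, engine_factory):
        self._factory = engine_factory          # for example pyttsx3.init
        self._texts = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, text):
        """Add text to the queue and return immediately."""
        self._texts.put(text)

    def wait(self):
        """Block until everything submitted so far has been processed."""
        self._texts.join()

    def _run(self):
        engine = self._factory()                # create the engine on the worker thread
        while True:
            text = self._texts.get()
            try:
                engine.say(text)
                engine.runAndWait()
            except Exception as exc:            # keep the worker alive after a failure
                print(f"Speech failed: {exc}")
            finally:
                self._texts.task_done()
```

In the Flask example, the /speak route could then call worker.submit(text) and return right away instead of blocking on runAndWait().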

Python Pyttsx3 Text to Speech Callbacks

Pyttsx3 engines let us register callbacks, which are functions that are called at specific points during the speech synthesis process. Here’s an example:

import pyttsx3

def onStart(name):
    print('Starting to speak:', name)

def onEnd(name, completed):
    if completed:
        print('Speech completed successfully:', name)
    else:
        print('Speech interrupted:', name)

engine = pyttsx3.init()
engine.connect('started-utterance', onStart)
engine.connect('finished-utterance', onEnd)
engine.say("Hello, world!")
engine.runAndWait()

In this example, we define two callback functions named onStart and onEnd, which are called when speech synthesis starts and ends, respectively.

We then create an engine instance and use the connect method to register the callback functions. Finally, we use the say and runAndWait methods to convert the text to speech and play the output.
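The name passed to the callbacks is the optional second argument of say(), which makes it possible to tell utterances apart. As a sketch, the class below records which named utterances started and whether each one completed. UtteranceLog is an illustrative name; the parameter names name and completed follow the callback signatures in the pyttsx3 documentation.

```python
class UtteranceLog:
    """Records the outcome of each named utterance."""

    def __init__(self):
        self.started = []      # names in the order they began
        self.finished = {}     # name -> True if spoken completely

    def on_start(self, name):
        self.started.append(name)

    def on_end(self, name, completed):
        self.finished[name] = completed

    def attach(self, engine):
        """Register both callbacks on the given engine."""
        engine.connect('started-utterance', self.on_start)
        engine.connect('finished-utterance', self.on_end)
        return self

# Possible usage with a real engine:
#   log = UtteranceLog().attach(engine)
#   engine.say("Hello, world!", "greeting")
#   engine.runAndWait()
```

After runAndWait() returns, log.finished should map "greeting" to True if the phrase was spoken in full.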

Utterance Event Handlers

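Utterance event handlers are used to handle events related to utterances, such as the start of speech, the start of each word, and the completion of speech. We can use these event handlers to perform specific actions when certain events occur during speech. Pyttsx3 passes the event data to callbacks as keyword arguments, so the parameter names in a callback must match the documented ones (name, completed, location, length).

The pyttsx3 documentation lists three such events: started-utterance, started-word, and finished-utterance. A finished-word event is sometimes mentioned as a fourth, but it is not part of the documented API: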

1. started-utterance

This event occurs when the engine starts speaking an utterance.

Example:

import pyttsx3

def on_started_utterance(name):
    print("Started speaking...")

engine = pyttsx3.init()
engine.connect('started-utterance', on_started_utterance)

engine.say("This text will be spoken")
engine.runAndWait()

In this code, we are defining a function on_started_utterance() that will be called when the 'started-utterance' event occurs.

When the engine starts speaking the text, the utterance event handler on_started_utterance() will be called and will print the message “Started speaking…”.

2. finished-utterance

This event occurs when the engine finishes speaking an utterance.

Example:

import pyttsx3

def on_finished_utterance(name, completed):
    # completed is False if the utterance was interrupted before it ended
    print("Finished speaking...")

engine = pyttsx3.init()
engine.connect('finished-utterance', on_finished_utterance)

engine.say("This text will be spoken")
engine.runAndWait()

In this code, we are defining a function on_finished_utterance() that will be called when the 'finished-utterance' event occurs.

When the engine finishes speaking the text, the utterance event handler on_finished_utterance() will be called and will print the message “Finished speaking…”.

3. started-word

This event occurs when the engine starts speaking a word.

Example:

import pyttsx3

def on_started_word(name, location, length):
    print(f"Started speaking word at {location}...")

engine = pyttsx3.init()
engine.connect('started-word', on_started_word)

engine.say("This text will be spoken")
engine.runAndWait()

In this code, we are defining a function on_started_word() that will be called when the 'started-word' event occurs.

When the engine starts speaking each word in the text, the utterance event handler on_started_word() will be called and will print a message indicating the location of the word.
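The location and length arguments can be used to recover the word itself from the original text, for example to highlight it on screen while it is spoken. The sketch below assumes that location is a character offset into the string passed to say(), which may not hold for every driver or for non-ASCII text. The name make_word_tracker() is hypothetical.

```python
def make_word_tracker(text, spoken_words):
    """Return a started-word callback that appends each word to `spoken_words`."""
    def on_started_word(name, location, length):
        # Slice the word out of the original text (assumes character offsets)
        spoken_words.append(text[location:location + length])
    return on_started_word

# Possible usage with a real engine:
#   text = "This text will be spoken"
#   words = []
#   engine.connect('started-word', make_word_tracker(text, words))
#   engine.say(text)
#   engine.runAndWait()
```

Some drivers report word events only approximately or not at all, so the collected list may be shorter than the number of words in the text.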

4. finished-word

The pyttsx3 documentation does not define a 'finished-word' event. The connect() method accepts any topic name, so registering a handler for it raises no error, but the handler is not expected to be called. To find where a word ends, we can use the started-word event instead, since its location and length arguments describe the word's span in the text.

Example:

import pyttsx3

def on_started_word(name, location, length):
    # The word ends at location + length
    print(f"Word spans {location} to {location + length}...")

engine = pyttsx3.init()
engine.connect('started-word', on_started_word)

engine.say("This text will be spoken")
engine.runAndWait()

In this code, we are defining a function on_started_word() that will be called when the 'started-word' event occurs.

Each time the engine starts speaking a word in the text, the handler prints the start and end positions of that word.

Error Event Handler

Pyttsx3 also provides an error event handler that can be used to handle errors that may occur during the text to speech process.

The error event handler is called whenever an error occurs during the speaking process.

The syntax for registering an error event handler is similar to registering other event handlers. We use the connect() method of the engine object to register the error event handler function, as shown below:

import pyttsx3

def on_error(name, exception):
    print(f"Error occurred while speaking: {exception}")

engine = pyttsx3.init()
engine.connect('error', on_error)

engine.say("This text will be spoken")
engine.runAndWait()

In the above code, we have defined a function on_error() which takes two arguments: the name associated with the utterance that failed, and the exception that was raised. The second parameter must be called exception, because pyttsx3 passes event data to callbacks by keyword.

If an error occurs during the speaking process, the error event handler on_error() will be called, and it will print a message indicating the error.

By using the error event handler, we can handle errors that may occur during the text-to-speech process and respond to them appropriately. This can help us to create more robust and reliable text-to-speech applications using pyttsx3 in Python.

Driver Event Loop

Pyttsx3 in Python also lets us control the event loop that drives the Text-to-Speech engine. The same event handlers are used; what changes is how the loop that fires them is started and stopped.

To run the loop ourselves, we call the startLoop() method instead of runAndWait(). With its default argument, startLoop() runs the driver's event loop in the calling thread and blocks until endLoop() is called, typically from inside a callback.

import pyttsx3

def on_start(name):
    print(f"Starting to speak {name}...")

def on_end(name, completed):
    print(f"Finished speaking {name}, completed={completed}")
    engine.endLoop()

engine = pyttsx3.init()
engine.connect('started-utterance', on_start)
engine.connect('finished-utterance', on_end)

engine.say("This text will be spoken")
engine.startLoop()

In this code, we are defining the functions on_start() and on_end(), which will be called when the 'started-utterance' and 'finished-utterance' events occur.

When we run this code, the engine will speak the text “This text will be spoken”, and the on_start() function will be called when the engine starts speaking the utterance.

Once the engine finishes speaking the utterance, the on_end() function will be called, and the event loop will be stopped by calling endLoop().

Difference between startLoop() and runAndWait()

When using Pyttsx3 for text to speech, there are two main methods that can be used to initiate speech: startLoop() and runAndWait(). While these methods might seem similar at first glance, they actually have some important differences in their behavior.

startLoop()

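The startLoop() method starts the engine's event loop. Called with no arguments, it blocks in the calling thread until endLoop() is called. Called as startLoop(False), it returns immediately and leaves it to the application to call iterate() regularly, which is useful when the application has its own main loop and needs to perform other tasks while the engine is speaking.

import time
import pyttsx3

remaining = 2  # number of utterances still to finish

def on_end(name, completed):
    global remaining
    remaining -= 1

engine = pyttsx3.init()
engine.connect('finished-utterance', on_end)
engine.say("Hello, world!")
engine.say("This is a test.")

engine.startLoop(False)      # external loop: returns immediately
while remaining > 0:
    engine.iterate()         # let the engine process pending work
    time.sleep(0.05)         # other application work can go here
engine.endLoop()

In this example, we initialize the Pyttsx3 engine, queue up two phrases with the say() method, and call startLoop(False) so that we drive the loop ourselves.

We then call iterate() repeatedly until the finished-utterance callback has fired for both phrases, and finally call endLoop().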

runAndWait()

The runAndWait() method is used to start the speech synthesis engine and block the application until the engine has finished speaking. This can be useful when the application needs to wait for the engine to finish speaking before performing other tasks.

import pyttsx3

engine = pyttsx3.init()

engine.say("Hello, world!")
engine.say("This is a test.")

engine.runAndWait()

In this example, we initialize the Pyttsx3 engine and use the say() method to queue up two phrases to be spoken. We then call runAndWait() to start the engine and block the application until the engine has finished speaking both phrases.

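The key difference between these methods is who owns the loop. runAndWait() blocks until the queued commands have been processed and then returns on its own. startLoop() keeps the loop running until endLoop() is called, either blocking inside the driver's loop or, with startLoop(False), returning immediately so that the application can call iterate() from its own loop. It is important to choose the appropriate method based on the specific needs of the application.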
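When the application drives the loop itself, it is worth guarding against a loop that never finishes, for instance because a callback is never fired. The helper below is a sketch of that idea using startLoop(False), iterate(), and endLoop() from the pyttsx3 API. The name pump_until(), its parameters, and the default timeout are assumptions made for this example.

```python
import time

def pump_until(engine, is_done, timeout=30.0, interval=0.05):
    """Drive an external event loop until `is_done()` is true or time runs out.

    Returns True if `is_done()` became true, False on timeout.
    """
    deadline = time.monotonic() + timeout
    engine.startLoop(False)            # external loop: returns immediately
    try:
        while not is_done():
            if time.monotonic() >= deadline:
                return False
            engine.iterate()           # give the driver a chance to do work
            time.sleep(interval)       # other application work could go here
        return True
    finally:
        engine.endLoop()               # always leave the loop in a clean state
```

The is_done argument can be any function without parameters, such as one that checks a flag set by a finished-utterance callback.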

Speech Synthesis Markup Language (SSML)

SSML stands for Speech Synthesis Markup Language and it is a markup language that is used to control various aspects of the Text to Speech output such as pronunciation, pitch, rate, volume, etc. It allows users to add additional information to the text that is being spoken to provide a better TTS experience.

The commonly used SSML tags can be grouped into three categories:

1. Text Processing Tags

  • <say-as> tag: It is used to specify how a specific text should be pronounced. For example, you can use this tag to specify that a certain set of numbers should be read as a telephone number.
    • Input: <speak>My phone number is <say-as interpret-as="telephone">+1-123-456-7890</say-as>.</speak>
    • Output: “My phone number is plus one, one two three, four five six, seven eight nine zero.”
  • <phoneme> tag: It is used to specify the phonetic pronunciation of a word. This tag can be useful when you need to specify a word with a non-standard pronunciation.
    • Input: <speak>The word "schedule" can be pronounced as <phoneme alphabet="ipa" ph="ˈʃɛdjuːl">shedyool</phoneme> or <phoneme alphabet="ipa" ph="ˈskɛdjuːl">skedyool</phoneme>.</speak>
    • Output: “The word ‘schedule’ can be pronounced as shedyool or skedyool.”
  • <sub> tag: It is used to replace a word with another word or phrase. This tag can be useful when you need to correct a word or when you want to use a synonym instead of the original word.
    • Input: <speak>She ate a <sub alias="big">large</sub> slice of pizza.</speak>
    • Output: “She ate a big slice of pizza.”
  • <break> tag: It is used to insert a pause into the TTS output. You can specify the duration of the pause in milliseconds.
    • Input: <speak>This is a sentence.<break time="1000ms"/>This is another sentence.</speak>
    • Output: “This is a sentence. (1 second pause) This is another sentence.”

2. Prosody Tags

  • <prosody> tag: It is used to adjust the pitch, rate, and volume of the TTS output. You can also specify a duration for the changes.
    • Input: <speak><prosody pitch="+30%">This is spoken with a higher pitch.</prosody></speak>
    • Output: “This is spoken with a higher pitch.”
  • <emphasis> tag: It is used to emphasize a word or phrase in the TTS output.
    • Input: <speak>The <emphasis level="strong">cat</emphasis> ran quickly.</speak>
    • Output: “The CAT ran quickly.”
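  • <say-as interpret-as="interjection"> tag: A further use of the <say-as> tag, listed here because it affects delivery. It is used to speak a word as an interjection, which can add emotion to the TTS output. The interjection value is vendor-specific (Amazon Alexa supports it, for example) rather than part of the SSML standard.
    • Input: <speak><say-as interpret-as="interjection">Wow!</say-as> That was amazing.</speak>
    • Output: “Wow! That was amazing.”
  • <say-as interpret-as="spell-out"> tag: It is used to spell out a word or set of numbers character by character. Support for interpret-as values varies between speech engines.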
    • Input: <speak>The year is <say-as interpret-as="spell-out">2023</say-as>.</speak>
    • Output: “The year is two zero two three.”

3. Audio Tags

  • <audio> tag: It is used to include an audio file in the TTS output. You can specify the URL of the audio file.
    • Input: <speak>Listen to this sound:<audio src="https://example.com/sound.mp3"/></speak>
    • Output: Plays the audio file located at https://example.com/sound.mp3.
  • <desc> tag: It is used inside <audio> to provide a text description of the audio file. This tag can be useful for accessibility purposes.
    • Input: <speak>Listen to this sound:<audio src="https://example.com/sound.mp3"><desc>This is the sound of a bird singing.</desc></audio></speak>
    • Output: Plays the audio file located at https://example.com/sound.mp3 and provides a description of the audio file for accessibility purposes.
  • <p> tag: It marks a paragraph. It is a structural tag rather than an audio tag; here it wraps the audio clip so that the clip is treated as its own paragraph between the two sentences.
    • Input: <speak>This is a sentence. <p><audio src="https://example.com/sound.mp3"/></p> This is another sentence.</speak>
    • Output: “This is a sentence. (audio file plays) This is another sentence.”
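Writing SSML by hand makes it easy to forget to escape characters such as & or to leave a tag unclosed. The tags above can be assembled with a few small helpers, sketched below using only the Python standard library. The function names are made up for this example, and the well-formedness check only confirms that the result is valid XML, not that a given speech engine supports the tags.

```python
from xml.sax.saxutils import escape, quoteattr
import xml.etree.ElementTree as ET

def ssml_break(milliseconds):
    """Return a <break> element for a pause of the given length."""
    return f'<break time={quoteattr(str(int(milliseconds)) + "ms")}/>'

def ssml_prosody(text, **attributes):
    """Wrap escaped text in <prosody> with attributes such as pitch, rate, volume."""
    attrs = ''.join(f' {key}={quoteattr(str(value))}' for key, value in attributes.items())
    return f'<prosody{attrs}>{escape(text)}</prosody>'

def ssml_document(*parts):
    """Join already-built fragments inside <speak> and check the XML is well-formed."""
    document = '<speak>' + ''.join(parts) + '</speak>'
    ET.fromstring(document)   # raises ParseError if the markup is malformed
    return document
```

For example, ssml_document(escape("This is a sentence."), ssml_break(1000), escape("This is another sentence.")) rebuilds the <break> example from the list above.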

Whether these tags have any effect with Pyttsx3 depends on the underlying speech engine. The pyttsx3 documentation does not describe SSML support; the say() method passes the string to the platform driver, which may interpret the markup, ignore it, or read the tags aloud. Where it is supported, the text should be enclosed in a <speak> tag to indicate that it is SSML.

Pyttsx3 SSML Example

Here is an example of using SSML to add a pause in the TTS output:

import pyttsx3

engine = pyttsx3.init()
engine.say('<speak>This is a test <break time="500ms"/> of SSML tags.</speak>')
engine.runAndWait()

In this code, the <break> tag is used to insert a pause of 500 milliseconds in the TTS output.

Other SSML tags can be used to customize the TTS output in various ways. For example, the <prosody> tag can be used to adjust the pitch, rate, and volume of the TTS output. The <emphasis> tag can be used to emphasize certain words in the TTS output.

Here is an example of using the <prosody> tag to adjust the pitch, rate, and volume of the TTS output:

import pyttsx3

engine = pyttsx3.init()
engine.say('<speak><prosody pitch="high" rate="slow" volume="loud">This is a test of SSML tags.</prosody></speak>')
engine.runAndWait()

In this code, we are using the <prosody> tag to adjust the pitch to “high”, the rate to “slow”, and the volume to “loud” for the text “This is a test of SSML tags”.

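Where the underlying engine honors them, SSML tags let us create a more customized and natural-sounding TTS experience for our users. It is worth testing each tag on the target platform, since an engine that does not understand the markup may read it out literally.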
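If the speech engine on a given system does not understand SSML, one fallback is to remove the markup and speak only the text. The sketch below does this with the standard library XML parser. The function name ssml_to_plain_text() is hypothetical, and the approach is lossy: pauses are dropped, and for a <sub> tag the original text is kept rather than its alias.

```python
import xml.etree.ElementTree as ET

def ssml_to_plain_text(ssml):
    """Drop all markup from an SSML string and return the text it contains."""
    root = ET.fromstring(ssml)
    text = ''.join(root.itertext())   # text of every element, in document order
    return ' '.join(text.split())     # collapse whitespace left behind by removed tags
```

The plain string returned by this function can be passed to engine.say() on any platform.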

What’s Next?

If you’re interested in exploring more about Text to Speech, you should definitely check out the blog post on using Google Cloud Text to Speech (TTS) API. It covers everything from project creation on Google Cloud Platform to REST API Endpoint implementation, and it’s a fantastic companion to this blog.

You can also learn about Speech Recognition in Python.

Wrapping Up

In conclusion, we have covered a lot of ground in this complete guide to using Pyttsx3 for text-to-speech in Python. We started by learning how to install Pyttsx3 and set up a basic TTS application, and then went on to explore some of the more advanced features such as changing engine properties, using callbacks and event handlers, and working with driver event loops.

We also learned about SSML and how it can add more control over the prosody and audio output of a TTS application. Where the speech engine supports them, SSML tags like <prosody>, <say-as>, and <audio> allow more dynamic and expressive speech output.

Pyttsx3 is a flexible tool for adding offline text-to-speech functionality to Python applications, with a small and easy-to-use API. Whether you are building a chatbot, a virtual assistant, or any other type of application that requires speech output, Pyttsx3 is worth checking out.

3 Comments

  1. I am using Pyttsx3 in a Python app. The text to speech works fine but then the app just closes after the speech. I’m using the same code as above in my app.

    def speak(self,text):
        engine = pyttsx3.init()
        engine.say(text)
        engine.runAndWait()
        engine.stop()

    Any suggestions?? Thanks.
