Building a Text-to-Speech (TTS) Web Application Using Python Flask and TTS Library

Shubham Gupta, October 13, 2024

In this tutorial, we will walk through the steps to create a simple web application for converting text to speech (TTS) using Python’s Flask framework and a deep learning-based TTS model. We’ll build a basic form to take text input from the user, synthesize speech using the TTS library, and save the output as an audio file.

Prerequisites

Before starting, make sure you have:

Python 3.9.6 installed
Basic knowledge of Python, Flask, and HTML
The necessary Python packages installed: Flask, TTS, and soundfile (the install commands are listed in the comments at the top of tts.py in Step 3)

Project Structure

Here’s what the structure of your project will look like:

/tts_project
    /templates
        index.html
    main.py
    tts.py

index.html: Contains the form for text input.
main.py: The Flask application that handles routing and form submission.
tts.py: Contains the logic for converting text into speech using a pre-trained model.

Step 1: Set Up the HTML Form (`index.html`)

The first step is to create an HTML file that will serve as the front-end for our web app. This file contains a simple form where users can enter the text they want to convert to speech.

`templates/index.html`

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>TTS Web App</title>
</head>
<body>
    <h1>Text-to-Speech Converter</h1>
    <form action="/submit" method="POST">
        <label for="inputData">Enter your text:</label>
        <input type="text" id="inputData" name="inputData" required>
        <button type="submit">Convert to Speech</button>
    </form>
</body>
</html>

This form will submit the entered text to the Flask server via a POST request for conversion to speech.
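
To make the POST step concrete, here is a small sketch of what the browser sends when the form is submitted. It uses only the Python standard library, not Flask. The helper names `encode_form` and `decode_form` are hypothetical, and the sketch assumes the browser's default `application/x-www-form-urlencoded` encoding, which applies because the form has no `enctype` attribute.

```python
from urllib.parse import urlencode, parse_qs


def encode_form(text: str) -> str:
    """Build the request body a browser would send for the inputData field."""
    return urlencode({"inputData": text})


def decode_form(body: str) -> str:
    """Recover the field value, roughly what request.form['inputData'] gives you."""
    return parse_qs(body, keep_blank_values=True)["inputData"][0]


body = encode_form("Hello, world!")
print(body)               # URL-encoded body of the POST request
print(decode_form(body))  # the original text again
```

The key in the encoded body comes from the input's `name` attribute, so the name in the HTML and the key used on the server have to match exactly.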

Step 2: Create the Flask Application (`main.py`)

The next step is to set up the Flask backend that will handle the form submission and use the TTS library to convert the input text to speech.

`main.py`

from flask import Flask, request, render_template
from markupsafe import escape
from tts import convertTTS

app = Flask(__name__)

# Route to display the form
@app.route('/')
def index():
    return render_template('index.html')

# Route to handle form submission and text-to-speech conversion
# (only POST is allowed here, so no extra method check is needed)
@app.route('/submit', methods=['POST'])
def submit():
    # Retrieve the text input from the form
    input_data = request.form['inputData']

    # Call the convertTTS function from tts.py; this writes output_audio.wav
    convertTTS(input_data, 'output_audio')

    # Escape the user's text before echoing it back, to avoid HTML injection
    return f"<h1>Text-to-Speech Conversion Complete: {escape(input_data)}</h1>"

if __name__ == '__main__':
    app.run(debug=True)

This script uses Flask to serve the index.html file and handle the form submission.
When the form is submitted, the input text is passed to the convertTTS function from tts.py, which synthesizes the speech.
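
Because `submit()` always passes the fixed name `'output_audio'`, each request overwrites the file from the previous one. A minimal sketch of one way around that follows. The helper `make_output_name`, the `output` directory, and the `tts` prefix are all assumptions for illustration and are not part of the original code.

```python
import uuid
from pathlib import Path


def make_output_name(directory: str = "output", prefix: str = "tts") -> str:
    """Return a unique output path without extension (convertTTS appends '.wav')."""
    # Create the target directory if it does not exist yet.
    Path(directory).mkdir(parents=True, exist_ok=True)
    # uuid4 gives a random 32-character hex string, so names will not collide in practice.
    return str(Path(directory) / f"{prefix}_{uuid.uuid4().hex}")
```

In `submit()`, this could replace the fixed name: `convertTTS(input_data, make_output_name())`.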

Step 3: Text-to-Speech Conversion Logic (`tts.py`)

Now, let’s add the logic for converting text to speech using the TTS library (Coqui TTS, with its XTTS model) and saving the output as a .wav file.

`tts.py`

# Setup notes (Windows, Python 3.9.6). Run these in a terminal, not in this file:
# python --version
# python -m venv venv
# venv/Scripts/activate
# python.exe -m pip install --upgrade pip
# pip install flask
# pip install TTS --cache-dir "D:/internship/text_to_speech/.cache"
# Note: the Microsoft Visual C++ Build Tools may be needed for the TTS install
# pip uninstall torch torchvision torchaudio
# pip install transformers datasets torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --cache-dir "D:/internship/tts_project/.cache"
# pip install soundfile --cache-dir "D:/internship/tts_project/.cache"

from TTS.tts.configs.xtts_config import XttsConfig
from TTS.tts.models.xtts import Xtts
import soundfile as sf  # To save the output as a wav file

# Step 1: Load the model configuration
config = XttsConfig()
config.load_json("D:/internship/text_to_speech/assets/tts_configs/config.json")

# Step 2: Initialize the model
model = Xtts.init_from_config(config)

# Step 3: Load the pre-trained weights
model.load_checkpoint(config, checkpoint_dir="D:/internship/text_to_speech/assets/tts_configs", eval=True)

# Optional: If you have CUDA installed and want to use GPU, uncomment the line below
# model.cuda()

def convertTTS(text, output_file):
    # Step 4: Synthesize the output
    outputs = model.synthesize(
        text,
        config,
        speaker_wav="Girl.wav",  # Replace with the correct path
        gpt_cond_len=3,
        language="en",
    )

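    # Step 5: Save the synthesized speech to a wav file
    output_wav = outputs['wav']
    output_path = str(output_file) + '.wav'
    # XTTS produces audio at config.audio.output_sample_rate (24 kHz by default),
    # which differs from config.audio.sample_rate
    sf.write(output_path, output_wav, config.audio.output_sample_rate)

    print(f"Speech synthesis complete and saved to {output_path}")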

This script uses the TTS library to load a pre-trained model and convert the input text to speech. The synthesized speech is saved as a .wav file using the soundfile library. Make sure to replace "Girl.wav" with the path to your own speaker reference audio file, and adjust the configuration and checkpoint paths to match your setup.
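
As a quick sanity check on a saved file, the standard-library `wave` module can read back its sample rate and duration. The sketch below is independent of the TTS library. It writes one second of silence as a stand-in for synthesized audio, then reads the header. The file name `demo.wav`, the helper `wav_info`, and the 24000 Hz rate are illustrative assumptions, and the helper assumes a PCM-encoded WAV file.

```python
import os
import tempfile
import wave


def wav_info(path: str) -> tuple:
    """Return (sample_rate, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        return rate, wav.getnframes() / rate


# Stand-in for the synthesized output: one second of 16-bit mono silence.
demo_path = os.path.join(tempfile.gettempdir(), "demo.wav")
with wave.open(demo_path, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)       # 2 bytes per sample = 16-bit PCM
    wav.setframerate(24000)   # demo value only
    wav.writeframes(b"\x00\x00" * 24000)

print(wav_info(demo_path))
```

If the reported duration or the pitch of the played-back audio seems off, a likely cause is that the sample rate passed when saving does not match the rate at which the audio was generated.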

Conclusion

You’ve now built a simple text-to-speech (TTS) web application using Flask and the TTS library. This application allows users to enter text into a form, which is then converted into speech using a deep learning model and saved as an audio file.

You can further enhance this application by:

Adding a feature to allow users to download the generated audio file.
Supporting multiple languages or different voices.
Implementing more advanced error handling for user inputs.
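
As a starting point for the error-handling item, here is a minimal sketch of a validation helper that could run before `convertTTS` is called. The name `clean_text` and the 500-character limit are assumptions chosen for illustration, not requirements of Flask or the TTS library.

```python
MAX_CHARS = 500  # assumed limit; tune it for your model and hardware


def clean_text(raw: str, max_chars: int = MAX_CHARS) -> str:
    """Collapse whitespace and reject empty or overly long input."""
    # split() with no arguments drops leading, trailing, and repeated whitespace.
    text = " ".join(raw.split())
    if not text:
        raise ValueError("Please enter some text.")
    if len(text) > max_chars:
        raise ValueError(f"Text is too long (limit: {max_chars} characters).")
    return text
```

In `submit()`, the call could be wrapped in `try`/`except ValueError` so that invalid input returns an error message with a 400 status instead of starting a synthesis run.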

This setup can serve as the foundation for more complex TTS applications in the future!