Building a Text-to-Speech (TTS) Web Application Using Python Flask and the TTS Library
In this tutorial, we will walk through the steps to create a simple web application for converting text to speech (TTS) using Python’s Flask framework and a deep learning-based TTS model. We’ll build a basic form to take text input from the user, synthesize speech using the TTS library, and save the output as an audio file.
Prerequisites
Before starting, make sure you have:
- Python 3.9.6 installed
- Basic knowledge of Python, Flask, and HTML
- The necessary Python packages installed (as explained in this guide)
Project Structure
Here’s what the structure of your project will look like:
/tts_project
    /templates
        index.html
    main.py
    tts.py
- index.html: Contains the form for text input.
- main.py: The Flask application that handles routing and form submission.
- tts.py: Contains the logic for converting text into speech using a pre-trained model.
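If you prefer to create this layout from a script rather than by hand, the following is a minimal sketch using only the standard library. The helper name create_skeleton is hypothetical, and the demonstration builds the tree inside a temporary directory so it leaves nothing behind; pass your own path to create the real project folder.

```python
import tempfile
from pathlib import Path


def create_skeleton(root):
    """Create the empty folders and files for the project layout."""
    root = Path(root)
    # parents=True also creates the project root; exist_ok avoids errors on re-runs.
    (root / "templates").mkdir(parents=True, exist_ok=True)
    for relative in ("templates/index.html", "main.py", "tts.py"):
        (root / relative).touch()  # creates the file if it is not there yet
    return root


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    project = create_skeleton(Path(tmp) / "tts_project")
    for entry in sorted(project.rglob("*")):
        print(entry.relative_to(project).as_posix())
```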
Step 1: Set Up the HTML Form (index.html)
The first step is to create an HTML file that will serve as the front-end for our web app. This file contains a simple form where users can enter the text they want to convert to speech.
templates/index.html
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>TTS Web App</title>
</head>
<body>
    <h1>Text-to-Speech Converter</h1>
    <form action="/submit" method="POST">
        <label for="inputData">Enter your text:</label>
        <input type="text" id="inputData" name="inputData" required>
        <button type="submit">Convert to Speech</button>
    </form>
</body>
</html>
This form will submit the entered text to the Flask server via a POST request for conversion to speech.
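To make the request concrete, here is a small sketch of the body a browser would send when this form is submitted. It uses only Python's standard library, and the sample text is an arbitrary placeholder.

```python
from urllib.parse import parse_qs, urlencode

# The input's name attribute ("inputData") becomes the key in the request body.
form_fields = {"inputData": "Hello, world!"}

# A plain POST form is encoded as application/x-www-form-urlencoded.
body = urlencode(form_fields)
print(body)

# Decoding the body recovers the original text, which is roughly what
# Flask does when it fills in request.form on the server side.
decoded = parse_qs(body)
print(decoded["inputData"][0])
```

The key point is that the server looks the text up by the field name, so the name in the HTML and the key used in the Flask route must match.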
Step 2: Create the Flask Application (main.py)
The next step is to set up the Flask backend that will handle the form submission and use the TTS library to convert the input text to speech.
main.py
from flask import Flask, request, render_template
from markupsafe import escape

from tts import convertTTS

app = Flask(__name__)


# Route to display the form
@app.route('/')
def index():
    return render_template('index.html')


# Route to handle form submission and text-to-speech conversion
@app.route('/submit', methods=['POST'])
def submit():
    # Retrieve the text input from the form (this route only accepts POST)
    input_data = request.form['inputData']
    # Call the convertTTS function from tts.py; it writes output_audio.wav
    convertTTS(input_data, 'output_audio')
    # Respond with the processed text, escaped so user input is not rendered as HTML
    return f"<h1>Text-to-Speech Conversion Complete: {escape(input_data)}</h1>"


if __name__ == '__main__':
    app.run(debug=True)
- This script uses Flask to serve the index.html file and handle the form submission.
- When the form is submitted, the input text is passed to the convertTTS function from tts.py, which synthesizes the speech.
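Note that the route always passes the same base name, 'output_audio', so each submission overwrites the previous file. One possible way around this is to generate a distinct name per request. The sketch below assumes a hypothetical helper called unique_output_name; it is not part of Flask or the TTS library.

```python
import uuid


def unique_output_name(prefix="output_audio"):
    """Return a base file name that differs on every call (hypothetical helper)."""
    # uuid4().hex is a random 32-character hexadecimal string.
    return f"{prefix}_{uuid.uuid4().hex}"


first = unique_output_name()
second = unique_output_name()
print(first)
print(second)
```

Because convertTTS appends the .wav extension itself, the result could be passed directly as its second argument, for example convertTTS(input_data, unique_output_name()).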
Step 3: Text-to-Speech Conversion Logic (tts.py)
Now, let’s add the logic for converting text to speech using the TTS library and saving the output as a .wav file.
tts.py
# Environment setup (Windows, Python 3.9.6). Run these commands in a terminal,
# adjusting the --cache-dir paths to your own machine:
#   python --version
#   python -m venv venv
#   venv/Scripts/activate
#   python.exe -m pip install --upgrade pip
#   pip install TTS --cache-dir "D:/internship/text_to_speech/.cache"
#   (install the Visual C++ build tools first if this step fails to compile)
#   pip uninstall torch torchvision torchaudio
#   pip install transformers datasets torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --cache-dir "D:/internship/tts_project/.cache"
#   pip install soundfile --cache-dir "D:/internship/tts_project/.cache"
from TTS.tts.configs.xtts_config import XttsConfig
from TTS.tts.models.xtts import Xtts
import soundfile as sf # To save the output as a wav file
# Step 1: Load the model configuration
config = XttsConfig()
config.load_json("D:/internship/text_to_speech/assets/tts_configs/config.json")
# Step 2: Initialize the model
model = Xtts.init_from_config(config)
# Step 3: Load the pre-trained weights
model.load_checkpoint(config, checkpoint_dir="D:/internship/text_to_speech/assets/tts_configs", eval=True)
# Optional: If you have CUDA installed and want to use GPU, uncomment the line below
# model.cuda()
def convertTTS(text, output_file):
    # Step 4: Synthesize the output
    outputs = model.synthesize(
        text,
        config,
        speaker_wav="Girl.wav",  # Replace with the correct path
        gpt_cond_len=3,
        language="en",
    )
    # Step 5: Save the synthesized speech to a wav file
    output_wav = outputs['wav']
    output_path = str(output_file) + '.wav'
    # XTTS generates audio at output_sample_rate, which can differ from config.audio.sample_rate
    sf.write(output_path, output_wav, config.audio.output_sample_rate)
    print(f"Speech synthesis complete and saved to {output_path}")
This script uses the TTS library to load a pre-trained model and convert the input text to speech. The synthesized speech is saved as a .wav file using the soundfile library. Make sure to replace "Girl.wav" with the path to your own speaker audio file, and adjust the configuration and checkpoint paths accordingly.
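Since a wrong path is the most likely reason for the model to fail at startup, it can help to check the files before loading anything. The following is a minimal sketch using only the standard library; the helper name find_missing_files and the relative paths in the example are placeholders, so substitute the locations used on your machine.

```python
from pathlib import Path


def find_missing_files(paths):
    """Return the entries of paths that do not point to an existing file."""
    return [p for p in paths if not Path(p).is_file()]


# Placeholder paths for the config file and the speaker reference clip.
required = ["assets/tts_configs/config.json", "Girl.wav"]
missing = find_missing_files(required)
if missing:
    print("Missing files:", ", ".join(missing))
else:
    print("All required files were found.")
```

Running a check like this at the top of tts.py gives a readable message instead of a stack trace from deep inside the model loader.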
Conclusion
You’ve now built a simple text-to-speech (TTS) web application using Flask and the TTS library. This application allows users to enter text into a form, which is then converted into speech using a deep learning model and saved as an audio file.
You can further enhance this application by:
- Adding a feature to allow users to download the generated audio file.
- Supporting multiple languages or different voices.
- Implementing more advanced error handling for user inputs.
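As a starting point for the error-handling idea, here is a small sketch of an input check that could run before the text reaches the model. The function name clean_text and the 500-character limit are illustrative assumptions, not requirements of Flask or the TTS library.

```python
def clean_text(raw, max_chars=500):
    """Normalize user text and reject empty or overly long input.

    max_chars is an arbitrary illustrative limit.
    """
    # Collapse runs of whitespace (including newlines) into single spaces.
    text = " ".join(raw.split())
    if not text:
        raise ValueError("Please enter some text.")
    if len(text) > max_chars:
        raise ValueError(f"Text is too long (limit: {max_chars} characters).")
    return text


print(clean_text("  Hello   there \n"))
```

In the submit route, a ValueError raised here could be caught and turned into a message shown to the user instead of starting a synthesis that is bound to fail or take very long.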
This setup can serve as the foundation for more complex TTS applications in the future!