How to Build a Text-to-Speech (TTS) Application Using Python and SQLite

Shubham Gupta September 6, 2024

In this article, we’ll walk through how to build a basic Text-to-Speech (TTS) system using Python, the Coqui TTS library with its XTTS model, and SQLite for storing and managing tasks. The steps cover setting up the Python environment, installing the necessary dependencies, loading a pre-trained TTS model, and storing tasks in a SQLite database for batch processing.

How to Set Up a Text-to-Speech Project with XTTS Model

Step 1: Set Up the Python Environment

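To begin, make sure you’re using a supported Python version. This article was written with Python 3.9.6; at the time of writing, the TTS package supported Python 3.9 through 3.11, so newer interpreters may fail to install it. Use a virtual environment to isolate the project’s dependencies and keep your global Python installation clean.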

# Check Python version
python --version

# Set up a virtual environment
python -m venv venv

# Activate the virtual environment (Windows; in PowerShell: venv\Scripts\Activate.ps1)
venv\Scripts\activate
# On macOS/Linux, use instead:
# source venv/bin/activate

# Upgrade pip to the latest version
python -m pip install --upgrade pip
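Once the environment is active, a quick check from Python can confirm which interpreter you are running and whether it lives inside the virtual environment. This is only an illustrative sketch: the function names are hypothetical, and the 3.9 to 3.11 range is an assumption based on the TTS package's requirements at the time of writing, so check the current release before relying on it.

```python
import sys

# Assumed supported range for the TTS package (inclusive lower, exclusive upper).
MIN_VERSION = (3, 9)
MAX_VERSION_EXCLUSIVE = (3, 12)


def python_version_ok(version=None):
    """Return True if the (major, minor) version falls in the assumed range."""
    major_minor = tuple((version or sys.version_info)[:2])
    return MIN_VERSION <= major_minor < MAX_VERSION_EXCLUSIVE


def in_virtualenv():
    """A venv created with `python -m venv` points sys.prefix at the venv folder,
    while sys.base_prefix still points at the base interpreter."""
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    print("Version in assumed TTS range:", python_version_ok())
    print("Running inside a virtual environment:", in_virtualenv())
```

Save it as a script and run it with the activated interpreter; if the last line prints False, the venv is not active.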

Step 2: Install Required Dependencies

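Next, install the libraries the project depends on: TTS (the text-to-speech models and inference code), PyTorch with torchvision and torchaudio (the deep-learning framework the models run on), transformers and datasets, and soundfile for writing audio files. DeepSpeed is optional. The --cache-dir option only tells pip where to keep downloaded packages; replace the path with one that suits your machine, or drop the option.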

# Install the TTS library
pip install TTS --cache-dir "D:/internship/tts/.cache"

# Uninstall any previously installed versions of Torch
pip uninstall -y torch torchvision torchaudio

# Install specific versions of Torch and related libraries
pip install transformers datasets torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --cache-dir "D:/internship/tts/.cache"

# Optional GPU support: install the CUDA Toolkit from
# https://developer.nvidia.com/cuda-downloads
# and install CUDA-enabled Torch wheels from PyTorch's index, e.g.:
# pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu118

# Install the soundfile library to write audio files
pip install soundfile --cache-dir "D:/internship/tts/.cache"

# Optional: install DeepSpeed for faster inference
# (XTTS uses it only when load_checkpoint is called with use_deepspeed=True)
pip install deepspeed==0.10.3 --cache-dir "D:/internship/tts/.cache"
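After installation, you can confirm which versions actually ended up in the environment and whether PyTorch can see a GPU. This is a small sketch using the standard library's importlib.metadata; the helper names are hypothetical, and the CUDA check only imports torch if it is installed.

```python
import importlib.metadata
import importlib.util

PACKAGES = ["TTS", "torch", "torchvision", "torchaudio", "soundfile", "deepspeed"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def cuda_available():
    """Return True/False if torch is importable, or None if it is not installed."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch  # imported lazily so the script still runs without torch

    return torch.cuda.is_available()


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name}: {version or 'not installed'}")
    print("CUDA available:", cuda_available())
```

If CUDA shows as unavailable even though the toolkit is installed, the Torch wheel is most likely a CPU-only build.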

Step 3: Load the TTS Model

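With the environment in place, the next step is to load the pre-trained XTTS model. XttsConfig reads the model configuration from config.json, Xtts.init_from_config builds the model from that configuration, and load_checkpoint loads the pre-trained weights from checkpoint_dir. The paths below point to the author’s folder, which must already contain the XTTS model files (config.json, the checkpoint, and related files) downloaded beforehand; change them to match your setup.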

from TTS.tts.configs.xtts_config import XttsConfig
from TTS.tts.models.xtts import Xtts
import soundfile as sf

# Load the model configuration
config = XttsConfig()
config.load_json("D:/internship/tts/assets/tts_configs/config.json")

# Initialize the model
model = Xtts.init_from_config(config)

# Load pre-trained weights
model.load_checkpoint(config, checkpoint_dir="D:/internship/tts/assets/tts_configs", eval=True)

# Optional: Use GPU with CUDA
# model.cuda()
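Because load_checkpoint reads its files from checkpoint_dir, a missing download usually surfaces as an error deep inside the library. A small pre-flight check can make that failure clearer. This sketch rests on an assumption: the file names listed (config.json, model.pth, vocab.json) are those the XTTS-v2 release is commonly distributed with, and missing_model_files is a hypothetical helper, so adjust the list to the files you actually downloaded.

```python
from pathlib import Path

# Assumed XTTS file names; adjust to match your downloaded model folder.
EXPECTED_FILES = ("config.json", "model.pth", "vocab.json")


def missing_model_files(checkpoint_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are not present in checkpoint_dir."""
    folder = Path(checkpoint_dir)
    return [name for name in expected if not (folder / name).is_file()]


if __name__ == "__main__":
    missing = missing_model_files("D:/internship/tts/assets/tts_configs")
    print("Missing files:", missing or "none")
```

Calling it before Xtts.init_from_config lets you stop early with a readable message instead of a stack trace.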

Step 4: Convert Text to Speech

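We now create a function, convertTTS(), that converts text into speech and saves the result as a .wav file. XTTS clones the voice from a short reference recording, so input_audio is the path to a .wav clip of the target speaker and output_audio is the path of the file to write.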

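def convertTTS(text, input_audio, output_audio):
    # Synthesize speech in the voice of the reference clip
    outputs = model.synthesize(
        text,
        config,
        speaker_wav=input_audio,  # Path to the reference (speaker) wav file
        gpt_cond_len=3,  # seconds of reference audio used for voice conditioning
        language="en",
    )

    # Save the synthesized speech to a wav file.
    # XTTS generates audio at config.audio.output_sample_rate (24 kHz);
    # config.audio.sample_rate (22,050 Hz) describes the input audio, so writing
    # with it would make the result play back slightly slow and low.
    output_wav = outputs['wav']
    sf.write(output_audio, output_wav, config.audio.output_sample_rate)
    print(f"Speech synthesis complete and saved to {output_audio}")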
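sf.write handles the conversion from the model's floating-point samples to a .wav file. To see what that involves, or to avoid the soundfile dependency, the standard-library wave module can do the same job for 16-bit PCM. This is a minimal sketch, assuming the samples are floats between -1.0 and 1.0; write_wav_pcm16 and duration_seconds are hypothetical helpers, not part of TTS or soundfile. It also shows why the sample-rate argument matters: players use the rate stored in the header to decide playback speed.

```python
import struct
import wave


def write_wav_pcm16(path, samples, sample_rate):
    """Write mono float samples (assumed in [-1.0, 1.0]) as a 16-bit PCM WAV file."""
    frames = bytearray()
    for value in samples:
        clipped = max(-1.0, min(1.0, float(value)))  # guard against overshoot
        frames += struct.pack("<h", int(clipped * 32767))  # little-endian int16
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)  # mono
        wav_file.setsampwidth(2)  # 2 bytes = 16-bit samples
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(frames))


def duration_seconds(num_samples, sample_rate):
    """Playback length implied by the sample count and the header's sample rate."""
    return num_samples / sample_rate
```

For example, 24,000 samples last exactly one second when the header says 24000 Hz, but about 1.09 seconds when it says 22050 Hz, which is why the same audio sounds slower under the wrong rate.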

Step 5: Manage Tasks with SQLite

To manage the text data we want to convert, we’ll use SQLite. In this example, a SQLite database will store the tasks, each containing a text field that the TTS system will convert into speech.

1. Create the database and table:

import sqlite3

conn = sqlite3.connect('tts.db')
cursor = conn.cursor()

def createTable():
    cursor.execute('''CREATE TABLE IF NOT EXISTS tasks (
                        id INTEGER PRIMARY KEY AUTOINCREMENT,
                        text TEXT NOT NULL
                        )''')
    conn.commit()

2. Insert tasks into the database:

def createTask(text):
    cursor.execute('''INSERT INTO tasks (text) VALUES (?)''', (text,))
    conn.commit()

3. Fetch tasks from the database:

def fetchTasks():
    cursor.execute('SELECT * FROM tasks')
    return cursor.fetchall()

4. Process each task and convert it to speech: for each task, pass its text to convertTTS() with input.wav as the reference voice, and save the result as a file named after the task’s id (for example, 1.wav).

createTable()  # make sure the tasks table exists before querying it

for task_id, text in fetchTasks():
    convertTTS(text, 'input.wav', f'{task_id}.wav')

# Close the connection after processing
conn.close()
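The functions above share a module-level connection and cursor, which keeps the walkthrough short but makes the logic awkward to test. One alternative, sketched here under the assumption that you are free to restructure the code, passes the connection in explicitly and takes the conversion step as a callable. That lets you exercise the database logic against an in-memory SQLite database with a stand-in for convertTTS. The snake_case names and fake_convert are hypothetical.

```python
import sqlite3


def create_table(conn):
    conn.execute(
        """CREATE TABLE IF NOT EXISTS tasks (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               text TEXT NOT NULL
           )"""
    )
    conn.commit()


def create_task(conn, text):
    conn.execute("INSERT INTO tasks (text) VALUES (?)", (text,))
    conn.commit()


def fetch_tasks(conn):
    # Select columns explicitly so the tuple layout does not depend on table order.
    return conn.execute("SELECT id, text FROM tasks ORDER BY id").fetchall()


def process_tasks(conn, convert, reference_wav):
    """Call convert(text, reference_wav, output_path) for each task; return the paths."""
    outputs = []
    for task_id, text in fetch_tasks(conn):
        output_path = f"{task_id}.wav"
        convert(text, reference_wav, output_path)
        outputs.append(output_path)
    return outputs


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # throwaway database for a dry run
    create_table(conn)
    create_task(conn, "hello how are you?")

    def fake_convert(text, reference_wav, output_path):
        # Stand-in for convertTTS so no model is needed.
        print(f"would synthesize {text!r} with {reference_wav} -> {output_path}")

    print(process_tasks(conn, fake_convert, "input.wav"))
    conn.close()
```

In the real script you would pass convertTTS as the convert argument and a file-backed connection such as sqlite3.connect('tts.db').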

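Basic setup for TTS database and tasks table

The standalone script below creates tts.db, adds one sample task, and prints every row in the table. Running it once before the main script gives the batch loop something to process.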

import sqlite3
# Connect to database (or create if it doesn't exist)
conn = sqlite3.connect('tts.db')

# Create a cursor object
cursor = conn.cursor()

# Create a table (if it doesn't exist)
cursor.execute('''CREATE TABLE IF NOT EXISTS tasks (
                  id INTEGER PRIMARY KEY AUTOINCREMENT,
                  text TEXT NOT NULL
                 )''')

# Commit the changes
conn.commit()

# Insert a task into the table
cursor.execute('''INSERT INTO tasks (text)
                  VALUES (?)''', ('hello how are you?',))

# Commit the changes
conn.commit()

# Fetch all rows from the tasks table
cursor.execute('SELECT * FROM tasks')
rows = cursor.fetchall()

# Display the rows
for row in rows:
    print(row)

# Close the connection
conn.close()
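The setup script inserts a single row. To queue several texts at once, sqlite3's executemany runs the same parameterised INSERT for every item in a sequence. Here is a small sketch where seed_tasks is a hypothetical helper; using the connection as a context manager commits the transaction on success (and rolls it back on error) but does not close the connection, hence the explicit close.

```python
import sqlite3


def seed_tasks(db_path, texts):
    """Insert every string in texts as a new task and return the total row count."""
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                """CREATE TABLE IF NOT EXISTS tasks (
                       id INTEGER PRIMARY KEY AUTOINCREMENT,
                       text TEXT NOT NULL
                   )"""
            )
            # executemany expects a sequence of parameter tuples, hence (t,).
            conn.executemany("INSERT INTO tasks (text) VALUES (?)", [(t,) for t in texts])
        return conn.execute("SELECT COUNT(*) FROM tasks").fetchone()[0]
    finally:
        conn.close()
```

For example, seed_tasks('tts.db', ['hello how are you?', 'Welcome to the demo.']) would add two rows to the database used throughout this article and return the new row count.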

Complete code

# Python version used: 3.9.6
# python --version
# python -m venv venv
# venv\Scripts\activate   (Windows; on macOS/Linux: source venv/bin/activate)
# python -m pip install --upgrade pip
# pip install TTS --cache-dir "D:/internship/tts/.cache"
# pip uninstall -y torch torchvision torchaudio
# pip install transformers datasets torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --cache-dir "D:/internship/tts/.cache"
# Optional GPU support (CUDA Toolkit): https://developer.nvidia.com/cuda-downloads
# pip install soundfile --cache-dir "D:/internship/tts/.cache"
# pip install deepspeed==0.10.3 --cache-dir "D:/internship/tts/.cache"   (optional)

from TTS.tts.configs.xtts_config import XttsConfig
from TTS.tts.models.xtts import Xtts
import soundfile as sf  # To save the output as a wav file

# Step 1: Load the model configuration
config = XttsConfig()
config.load_json("D:/internship/tts/assets/tts_configs/config.json")

# Step 2: Initialize the model
model = Xtts.init_from_config(config)

# Step 3: Load the pre-trained weights
model.load_checkpoint(config, checkpoint_dir="D:/internship/tts/assets/tts_configs", eval=True)

# Optional: If you have CUDA installed and want to use GPU, uncomment the line below
# model.cuda()

# Step 4: Synthesize the output

def convertTTS(text, input_audio, output_audio):
    outputs = model.synthesize(
        text,
        config,
        speaker_wav=input_audio,  # Path to the reference (speaker) wav file
        gpt_cond_len=3,
        language="en",
    )

    # Step 5: Save the synthesized speech to a wav file
    # (XTTS outputs audio at config.audio.output_sample_rate, i.e. 24 kHz)
    output_wav = outputs['wav']
    sf.write(output_audio, output_wav, config.audio.output_sample_rate)

    print(f"Speech synthesis complete and saved to {output_audio}")

import sqlite3

conn = sqlite3.connect('tts.db')

# Create a cursor object
cursor = conn.cursor()

def createTable():
    # Create a table (if it doesn't exist)
    cursor.execute('''CREATE TABLE IF NOT EXISTS tasks (
                    id INTEGER PRIMARY KEY AUTOINCREMENT,
                    text TEXT NOT NULL
                    )''')

    # Commit the changes
    conn.commit()

def createTask(text):
    # Insert a task into the table
    cursor.execute('''INSERT INTO tasks (text)
                    VALUES (?)''', (text,))

    # Commit the changes
    conn.commit()

def fetchTasks():
    # Fetch all rows from the tasks table
    cursor.execute('SELECT * FROM tasks')
    return cursor.fetchall()

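createTable()  # make sure the tasks table exists before querying it
# createTask("hello how are you?")  # uncomment to queue a sample task

# print(fetchTasks())

for task_id, text in fetchTasks():
    convertTTS(text, 'input.wav', f'{task_id}.wav')

# Close the connection
conn.close()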
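As written, the loop regenerates every task each time the script runs, so adding one new task and re-running re-synthesizes everything. One possible refinement, sketched here as an assumption rather than part of the original design, is to skip tasks whose output file already exists. The helper pending_tasks is hypothetical; it takes rows shaped like the results of fetchTasks(), that is (id, text), plus an output folder.

```python
from pathlib import Path


def pending_tasks(rows, output_dir="."):
    """Yield (task_id, text, output_path) for tasks whose .wav file is missing."""
    folder = Path(output_dir)
    for task_id, text in rows:
        output_path = folder / f"{task_id}.wav"
        if not output_path.exists():  # already synthesized in an earlier run
            yield task_id, text, output_path


# Hypothetical usage with the functions from the complete code:
# for task_id, text, output_path in pending_tasks(fetchTasks()):
#     convertTTS(text, 'input.wav', str(output_path))
```

A half-written file left by an interrupted run would also be skipped, so delete it before re-running.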

Conclusion

In this project, we demonstrated how to build a basic TTS application using Python. We walked through the process of setting up the environment, installing required dependencies, loading a pre-trained TTS model, and processing text tasks stored in a SQLite database.

This is just a starting point. There are many ways to extend and improve this system, such as adding a user interface, supporting languages other than English through the language argument of synthesize(), or enabling GPU acceleration for faster processing.

Feel free to experiment with the code and take your text-to-speech application to the next level!