Developing Scalable AI/ML Projects with Python Modularity | AI/ML basics – Session 9

Shubham Gupta 2 Comments September 15, 2024

Building an application as a set of small, reusable Flask services makes it easier to manage and scale. In this article, we’ll create a modular project by integrating two machine learning projects: Text-to-Speech (TTS) and Email Spam Detection. Each project is exposed through a Flask GET API, and a small client script calls both APIs to combine them into one modular system.

What is a Modular System?

A modular system is one that is divided into independent, interchangeable components, making it easier to maintain, test, and scale. Each module performs a specific function and exposes a small interface, so it can be added, replaced, or removed with little impact on the rest of the system. This lets you build complex applications by reusing existing modules instead of duplicating code.
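As a rough illustration of "independent, interchangeable components", the sketch below treats each module as a plain callable and wires them together in one function. The names `run_pipeline`, `fake_spam_check`, and `fake_tts` are hypothetical stand-ins and not part of the projects in this article; the real modules would call the Flask APIs instead.

```python
from typing import Callable


def run_pipeline(text: str,
                 is_spam: Callable[[str], bool],
                 to_speech: Callable[[str], str]) -> str:
    """Run the spam check first and only convert clean text to speech."""
    if is_spam(text):
        return "rejected: text looks like spam"
    return to_speech(text)


# Hypothetical stand-in modules, used only to show how pieces can be swapped.
def fake_spam_check(text: str) -> bool:
    return "free iphone" in text.lower()


def fake_tts(text: str) -> str:
    return f"audio for {len(text)} characters"


print(run_pipeline("How are you?", fake_spam_check, fake_tts))
print(run_pipeline("You've won a free iPhone!", fake_spam_check, fake_tts))
```

Because `run_pipeline` only depends on the two callables it receives, either module can be replaced (for example, by a function that calls a remote API) without editing the pipeline itself.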

Complete code for Email Spam Detection Project

import os
import pandas as pd
import string
import nltk
from nltk.corpus import stopwords
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Step 1: Set up the environment (only run these commands in the terminal)
# python -m venv venv
# ./venv/Scripts/activate
# python.exe -m pip install --upgrade pip
# pip install pandas scikit-learn numpy nltk --cache-dir "D:/internship/supervised_learning/email_spam_ml/.cache"

# Step 2: Load and process email files, create the dataset

# Define paths to the directories containing the spam and ham emails
spam_dir = 'D:/internship/supervised_learning/email_spam_ml/datasets/spam'
ham_dir = 'D:/internship/supervised_learning/email_spam_ml/datasets/easy_ham'

# Function to read all email files and store them in a list
def load_emails_from_directory(directory, label):
    emails = []
    for filename in os.listdir(directory):
        with open(os.path.join(directory, filename), 'r', encoding='latin-1') as file:
            email_content = file.read()
            emails.append((email_content, label))  # Tuple (email_content, label)
    return emails

# Load spam and ham emails
spam_emails = load_emails_from_directory(spam_dir, 1)  # 1 for spam
ham_emails = load_emails_from_directory(ham_dir, 0)    # 0 for ham

# Combine spam and ham into a single list
all_emails = spam_emails + ham_emails

# Create a DataFrame with two columns: 'email' and 'label'
df = pd.DataFrame(all_emails, columns=['email', 'label'])

# Save the DataFrame to a CSV file
df.to_csv('spam_ham_dataset.csv', index=False)
print(f'Dataset saved with {len(df)} emails.')

# Step 3: Preprocess the emails
nltk.download('stopwords')

# Define preprocessing functions
def to_lowercase(text):
    return text.lower()

def remove_punctuation(text):
    return ''.join([char for char in text if char not in string.punctuation])

# Build the stop-word set once instead of on every call
STOP_WORDS = set(stopwords.words('english'))

def remove_stopwords(text):
    return ' '.join(word for word in text.split() if word not in STOP_WORDS)

# Apply preprocessing: lowercase, then strip punctuation, then drop stop words
df['cleaned_email'] = df['email'].apply(to_lowercase)
df['cleaned_email'] = df['cleaned_email'].apply(remove_punctuation)
df['cleaned_email'] = df['cleaned_email'].apply(remove_stopwords)

# Step 4: Vectorize the text data using TF-IDF
tfidf = TfidfVectorizer(max_features=3000)
X = tfidf.fit_transform(df['cleaned_email']).toarray()
y = df['label']

# Step 5: Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Step 6: Train the Naive Bayes model
model = MultinomialNB()
model.fit(X_train, y_train)

# Step 7: Evaluate the model
y_pred = model.predict(X_test)

# Print accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy * 100:.2f}%')

# Print confusion matrix
print('Confusion Matrix:')
print(confusion_matrix(y_test, y_pred))

# Print classification report
print('Classification Report:')
print(classification_report(y_test, y_pred))

# Step 8: Classify a new email with the trained model
def check_spam(email_text, model, tfidf_vectorizer):
    # Apply the same steps, in the same order, as for the training data:
    # lowercase -> remove punctuation -> remove stop words
    processed_email = remove_stopwords(remove_punctuation(to_lowercase(email_text)))
    email_vector = tfidf_vectorizer.transform([processed_email])
    prediction = model.predict(email_vector)
    return "SPAM" if prediction[0] == 1 else "NOT SPAM"

# Test the model with a new email
# new_email = "Congratulations! You've won a free iPhone. Click here to claim your prize."
# result = check_spam(new_email, model, tfidf)
# print(result)

# pip install Flask
from flask import Flask, jsonify, request

app = Flask(__name__)

# Sample route for the home page
@app.route('/')
def home():
    return "Welcome to the Spam Detection!"

# API that classifies the text passed in the 'email_text' query parameter
@app.route('/api/message', methods=['GET'])
def get_message():
    # Read 'email_text' from the URL; default to an empty string if it is missing
    email_text = request.args.get('email_text', '')
    result = check_spam(email_text, model, tfidf)
    return jsonify({"message": f"This mail is {result}!"})

if __name__ == '__main__':
    app.run(debug=True, port=6000)

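# Example request: http://127.0.0.1:6000/api/message?email_text=how%20are%20you
# Note: some browsers (e.g. Chrome) treat port 6000 as unsafe and refuse to open it;
# test the API with curl or the requests library instead.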
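The model only sees text that went through the lowercase, punctuation, and stop-word steps during training, so a new email should go through the same steps in the same order. One way to guarantee that is a single `preprocess` helper shared by the training code and `check_spam`. The sketch below uses a tiny hypothetical stop-word set (`DEMO_STOP_WORDS`) so that it runs without NLTK; in the project you would pass in `stopwords.words('english')` instead.

```python
import string

# Hypothetical, tiny stop-word set for the demo; this is not the NLTK list.
DEMO_STOP_WORDS = frozenset({"a", "the", "you", "have", "is", "to"})


def preprocess(text, stop_words=DEMO_STOP_WORDS):
    """Lowercase, strip punctuation, then drop stop words (in that order)."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(word for word in text.split() if word not in stop_words)


print(preprocess("You HAVE won a FREE prize!"))
```

Lowercasing first matters because the stop-word list is lowercase: if stop words were removed before lowercasing, "You" and "HAVE" would survive the filter.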

Complete code for Text to Speech

from TTS.tts.configs.xtts_config import XttsConfig
from TTS.tts.models.xtts import Xtts
import soundfile as sf  # To save the output as a wav file

# Step 1: Load the model configuration
config = XttsConfig()
config.load_json("D:/internship/tts/assets/tts_configs/config.json")

# Step 2: Initialize the model
model = Xtts.init_from_config(config)

# Step 3: Load the pre-trained weights
model.load_checkpoint(config, checkpoint_dir="D:/internship/tts/assets/tts_configs", eval=True)

# Optional: If you have CUDA installed and want to use GPU, uncomment the line below
# model.cuda()

# Read the contents of data.txt; the with-block closes the file automatically
with open('data.txt', 'r') as f:
    data = f.read()



import re
import time

def split_text_on_punctuation(text, length=250):
    # Use regex to split on ., !, ?, or , and keep the punctuation as separate items
    parts = re.split(r'([.!?,])', text)
    chunks = []
    current_chunk = ""

    # parts alternates between text and punctuation, so step by 2 to pair them up
    for i in range(0, len(parts), 2):
        # Trailing text after the last punctuation mark has no partner
        punctuation = parts[i + 1] if i + 1 < len(parts) else ""
        sentence = parts[i] + punctuation
        if len(current_chunk) + len(sentence) <= length:
            current_chunk += sentence
        else:
            # Skip empty chunks (e.g. when the very first sentence is too long)
            if current_chunk.strip():
                chunks.append(current_chunk.strip())
            current_chunk = sentence

    if current_chunk.strip():
        chunks.append(current_chunk.strip())

    return chunks

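def createTTS(text):
    # Step 4: Synthesize speech, using input.wav as the reference voice
    outputs = model.synthesize(
        text,
        config,
        speaker_wav="input.wav",  # Replace with the path to your reference voice sample
        gpt_cond_len=3,
        language="en",
    )

    # Step 5: Save the synthesized speech to a wav file named after the current timestamp
    output_wav = outputs['wav']
    timestamp = time.time()
    # XTTS generates audio at config.audio.output_sample_rate (24 kHz by default)
    sf.write(str(timestamp) + '.wav', output_wav, config.audio.output_sample_rate)
    return timestamp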


# pip install Flask
from flask import Flask, jsonify, request

app = Flask(__name__)

# Sample route for the home page
@app.route('/')
def home():
    return "Welcome to the TTS!"

# API that converts the text passed in the 'text' query parameter to speech
@app.route('/api/convert', methods=['GET'])
def get_text():
    # Read 'text' from the URL; default to an empty string if it is missing
    text = request.args.get('text', '')
    result = createTTS(text)
    return jsonify({"message": f"Created audio file for {result}!"})

if __name__ == '__main__':
    app.run(debug=True)    

#http://127.0.0.1:5000/api/convert?text=how%20are%20you
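The `split_text_on_punctuation` helper is defined in the listing above, but the endpoint never calls it. A minimal sketch of how chunked synthesis might look is given below. `synthesize_chunk` is a hypothetical placeholder for the `createTTS` call, and `chunk_text` is a simplified variant of the chunker so that the example runs without the TTS model.

```python
import re


def chunk_text(text, max_len=250):
    """Group sentence-like pieces into chunks of at most max_len characters.

    A single piece that is longer than max_len is kept whole.
    """
    # Split after ., !, ? or , while keeping the punctuation attached
    pieces = [p for p in re.split(r'(?<=[.!?,])\s*', text) if p]
    chunks, current = [], ""
    for piece in pieces:
        candidate = f"{current} {piece}".strip()
        if len(candidate) <= max_len:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


def synthesize_chunk(chunk):
    # Hypothetical placeholder: the real project would call createTTS(chunk)
    return f"<audio:{len(chunk)} chars>"


text = "Hello there. How are you today? I hope all is well, my friend."
for chunk in chunk_text(text, max_len=30):
    print(chunk, "->", synthesize_chunk(chunk))
```

Splitting at punctuation keeps each request short while avoiding cuts in the middle of a sentence, which is presumably why the original helper defaults to 250 characters per chunk.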

Complete code for modular project

# python -m venv venv
# ./venv/Scripts/activate
# python.exe -m pip install --upgrade pip
# pip install requests

import requests

def getSpamResponse(text):
    # URL of the spam detection API, e.g.
    # http://127.0.0.1:6000/api/message?email_text=how%20are%20you
    url = "http://127.0.0.1:6000/api/message"

    # Send a GET request; passing the text via params lets requests URL-encode
    # spaces and characters such as & or # correctly
    response = requests.get(url, params={"email_text": text})

    # Check if the request was successful (status code 200)
    if response.status_code == 200:
        json_data = response.json()  # Parse the JSON response
        return json_data['message']
    else:
        # Return an error message if the request was not successful
        return f"Failed to retrieve data. Status code: {response.status_code}"

def getTTSResponse(text):
    # URL of the text-to-speech API
    url = "http://127.0.0.1:5000/api/convert"

    # Send a GET request; params takes care of URL-encoding the text
    response = requests.get(url, params={"text": text})

    # Check if the request was successful (status code 200)
    if response.status_code == 200:
        json_data = response.json()  # Parse the JSON response
        return json_data['message']
    else:
        # Return an error message if the request was not successful
        return f"Failed to retrieve data. Status code: {response.status_code}"

# "Congratulations! You've won a free iPhone. Click here to claim your prize."

user_input = input("Give some text: ")

spam_response = getSpamResponse(user_input)

# Only send the text to the TTS module if the spam module accepts it
if spam_response == 'This mail is SPAM!':
    print("You can't convert this text to speech. Give some other text.")
else:
    print("You are good to go")
    print(getTTSResponse(user_input))
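Building a URL by string concatenation can break when the text contains characters such as `&` or `#`, because they have a special meaning in a URL. The sketch below uses only the standard library to show the encoding step; with `requests`, passing `params={"email_text": text}` performs it for you. `build_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_url(base, **params):
    """Return base plus a safely encoded query string."""
    return f"{base}?{urlencode(params)}"


text = "Win cash & prizes! Click #1"
url = build_url("http://127.0.0.1:6000/api/message", email_text=text)
print(url)

# Decoding the query string gives back the original text unchanged
decoded = parse_qs(urlsplit(url).query)["email_text"][0]
print(decoded == text)
```

Without encoding, everything after the `&` would be read as a second parameter and everything after the `#` as a fragment, so the spam module would only see part of the message.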

Conclusion

In this article, we built a modular system using Flask by integrating two separate machine learning projects: Text-to-Speech and Email Spam Detection. Flask serves as the backbone for connecting these modules, allowing us to create a scalable and maintainable system.

By following a modular approach, you can easily extend this system by adding new modules (e.g., speech-to-text, sentiment analysis) without disrupting the existing functionality. Flask’s simple and flexible structure makes it an ideal choice for building modular machine learning systems.
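One possible way to keep such extensions from touching existing code is a small registry that maps module names to handlers. The sketch below is an assumption about how this could be organised, not something the projects above implement; `MODULES`, `register`, `run`, and the two handlers are hypothetical.

```python
MODULES = {}


def register(name):
    """Decorator that adds a handler to the registry under the given name."""
    def decorator(func):
        MODULES[name] = func
        return func
    return decorator


@register("spam")
def spam_module(text):
    # Placeholder for a call to the spam detection API
    return "SPAM" if "prize" in text.lower() else "NOT SPAM"


# Adding a new module later does not change any code above this line.
@register("word_count")
def word_count_module(text):
    return len(text.split())


def run(name, text):
    """Look up a module by name and apply it to the text."""
    if name not in MODULES:
        raise KeyError(f"Unknown module: {name}")
    return MODULES[name](text)


print(run("spam", "Claim your prize now"))
print(run("word_count", "Claim your prize now"))
```

In a Flask setup, each registered handler could sit behind its own route or its own service, with the client choosing modules by name.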

2 Comments

Durga Prasad

2 months ago

How do I configure the Postman API with this modular project?

Shubham Gupta

Author

Reply to Durga Prasad

Could you elaborate on what you want to know? The video walks through the configuration step by step, and other sessions cover the basics of creating and running Flask APIs; you can watch them in the internship portal. I suggest joining a live session so that I can help you.