Developing Scalable AI/ML Projects with Python Modularity | AI/ML basics – Session 9
Creating a modular system in Python using Flask allows developers to manage and scale applications efficiently by dividing them into smaller, reusable components. In this article, we’ll explore how to create a modular Flask project by integrating two machine learning projects: Text-to-Speech (TTS) and Email Spam Detection. Each project runs as its own Flask service with a GET API, and a small client script connects the two into a scalable, modular system.
Read more -> Understanding Python Flask: A Beginner’s Guide to GET and POST Requests – Session 9
What is a Modular System?
A modular system is one that is divided into independent, interchangeable components, making it easy to maintain, test, and scale. Each module performs a specific function and can be integrated or removed without affecting the overall system. By creating a modular system, you can build complex applications by reusing existing modules and avoiding code duplication.
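To make the idea concrete, here is a minimal sketch of modules that share one interface and can be added or removed independently. The names `run_pipeline`, `collapse_whitespace`, `full`, and `without_punct` are illustrative assumptions and are not part of the projects below.

```python
import string
from typing import Callable, List

# Each module is an independent function with the same interface: str -> str.
TextModule = Callable[[str], str]


def to_lowercase(text: str) -> str:
    return text.lower()


def remove_punctuation(text: str) -> str:
    return "".join(ch for ch in text if ch not in string.punctuation)


def collapse_whitespace(text: str) -> str:
    # Hypothetical extra module, added only for illustration.
    return " ".join(text.split())


def run_pipeline(text: str, modules: List[TextModule]) -> str:
    """Pass the text through each module in order."""
    for module in modules:
        text = module(text)
    return text


# Removing one module does not require changes to the others.
full = [to_lowercase, remove_punctuation, collapse_whitespace]
without_punct = [to_lowercase, collapse_whitespace]

print(run_pipeline("Hello,   World!", full))           # hello world
print(run_pipeline("Hello,   World!", without_punct))  # hello, world!
```

Because every module takes and returns a string, the pipeline itself never needs to know which modules it contains.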
Complete code for Email Spam Detection Project
Read more
- Install Ubuntu and Set Up a Flask API on Linode – Session 6
- Building an Email Spam Detection Model – Supervised Learning – Session 7
- Save a trained model for later use – Session 7
import os
import pandas as pd
import string
import nltk
from nltk.corpus import stopwords
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
# Step 1: Set up the environment (only run these commands in the terminal)
# python -m venv venv
# ./venv/Scripts/activate
# python.exe -m pip install --upgrade pip
# pip install pandas scikit-learn numpy nltk --cache-dir "D:/internship/supervised_learning/email_spam_ml/.cache"
# Step 2: Load and process email files, create the dataset
# Define paths to the directories containing the spam and ham emails
spam_dir = 'D:/internship/supervised_learning/email_spam_ml/datasets/spam'
ham_dir = 'D:/internship/supervised_learning/email_spam_ml/datasets/easy_ham'
# Function to read all email files and store them in a list
def load_emails_from_directory(directory, label):
    emails = []
    for filename in os.listdir(directory):
        with open(os.path.join(directory, filename), 'r', encoding='latin-1') as file:
            email_content = file.read()
            emails.append((email_content, label))  # Tuple (email_content, label)
    return emails
# Load spam and ham emails
spam_emails = load_emails_from_directory(spam_dir, 1) # 1 for spam
ham_emails = load_emails_from_directory(ham_dir, 0) # 0 for ham
# Combine spam and ham into a single list
all_emails = spam_emails + ham_emails
# Create a DataFrame with two columns: 'email' and 'label'
df = pd.DataFrame(all_emails, columns=['email', 'label'])
# Save the DataFrame to a CSV file
df.to_csv('spam_ham_dataset.csv', index=False)
print(f'Dataset saved with {len(df)} emails.')
# Step 3: Preprocess the emails
nltk.download('stopwords')
# Define preprocessing functions
def to_lowercase(text):
    return text.lower()
def remove_punctuation(text):
    return ''.join([char for char in text if char not in string.punctuation])
def remove_stopwords(text):
    stop_words = set(stopwords.words('english'))
    return ' '.join([word for word in text.split() if word not in stop_words])
# Apply preprocessing
df['cleaned_email'] = df['email'].apply(lambda x: to_lowercase(x))
df['cleaned_email'] = df['cleaned_email'].apply(lambda x: remove_punctuation(x))
df['cleaned_email'] = df['cleaned_email'].apply(lambda x: remove_stopwords(x))
# Step 4: Vectorize the text data using TF-IDF
tfidf = TfidfVectorizer(max_features=3000)
X = tfidf.fit_transform(df['cleaned_email']).toarray()
y = df['label']
# Step 5: Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Step 6: Train the Naive Bayes model
model = MultinomialNB()
model.fit(X_train, y_train)
# Step 7: Evaluate the model
y_pred = model.predict(X_test)
# Print accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy * 100:.2f}%')
# Print confusion matrix
print('Confusion Matrix:')
print(confusion_matrix(y_test, y_pred))
# Print classification report
print('Classification Report:')
print(classification_report(y_test, y_pred))
# Step 8: Test the model with a new email
def check_spam(email_text, model, tfidf_vectorizer):
    # Apply the same steps, in the same order, as for the training data:
    # lowercase -> remove punctuation -> remove stopwords
    processed_email = remove_stopwords(remove_punctuation(to_lowercase(email_text)))
    email_vector = tfidf_vectorizer.transform([processed_email])
    prediction = model.predict(email_vector)
    return "SPAM" if prediction[0] == 1 else "NOT SPAM"
# Test the model with a new email
# new_email = "Congratulations! You've won a free iPhone. Click here to claim your prize."
# result = check_spam(new_email, model, tfidf)
# print(result)
# pip install Flask
from flask import Flask, jsonify, request
app = Flask(__name__)
# Sample route for the home page
@app.route('/')
def home():
    return "Welcome to the Spam Detection!"
# API to get a message from a GET parameter
@app.route('/api/message', methods=['GET'])
def get_message():
    # Retrieve the 'email_text' parameter from the URL, with a default value if it's not provided
    email_text = request.args.get('email_text', '')
    result = check_spam(email_text, model, tfidf)
    return jsonify({"message": f"This mail is {result}!"})
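if __name__ == '__main__':
    # Note: some browsers block port 6000 as an unsafe port;
    # clients such as requests or curl are not affected.
    app.run(debug=True, port=6000)
# Example: http://127.0.0.1:6000/api/message?email_text=how%20are%20you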
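The example URL above writes spaces as `%20`. Query strings need this kind of encoding for spaces and for characters such as `&` or `#`, which would otherwise cut the text short. As a small sketch using only the standard library (the helper name `build_spam_url` is an assumption for illustration), the URL could be built like this:

```python
from urllib.parse import quote, urlencode

# Endpoint taken from the listing above.
BASE_URL = "http://127.0.0.1:6000/api/message"


def build_spam_url(email_text: str) -> str:
    """Return the GET URL with email_text safely percent-encoded."""
    # quote_via=quote encodes spaces as %20 instead of the default '+'.
    query = urlencode({"email_text": email_text}, quote_via=quote)
    return f"{BASE_URL}?{query}"


print(build_spam_url("how are you"))
# http://127.0.0.1:6000/api/message?email_text=how%20are%20you
print(build_spam_url("Win $100 & more!"))
```

On the server side, `request.args.get('email_text')` returns the decoded text, so the Flask route needs no change.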
Complete code for Text to Speech
Read more
- How to Set Up a Python Virtual Environment in Visual Studio Code – Session 1
- How to Set Up a Text-to-Speech Project with XTTS Model – Session 2
- How to Build a Text-to-Speech (TTS) Application Using Python and SQLite – Session 3
- Text-to-Speech System Using Python and the Xtts Model – Text Limitations and Solution – Session 4
from TTS.tts.configs.xtts_config import XttsConfig
from TTS.tts.models.xtts import Xtts
import soundfile as sf # To save the output as a wav file
# Step 1: Load the model configuration
config = XttsConfig()
config.load_json("D:/internship/tts/assets/tts_configs/config.json")
# Step 2: Initialize the model
model = Xtts.init_from_config(config)
# Step 3: Load the pre-trained weights
model.load_checkpoint(config, checkpoint_dir="D:/internship/tts/assets/tts_configs", eval=True)
# Optional: If you have CUDA installed and want to use GPU, uncomment the line below
# model.cuda()
# Read the input text; the with-block closes the file automatically
with open('data.txt', 'r') as f:
    data = f.read()
import re
import time
def split_text_on_punctuation(text, length=250):
    # Use regex to split based on ., !, ?, or , and keep the punctuation
    sentences = re.split(r'([.!?,])', text)
    chunks = []
    current_chunk = ""
    for i in range(0, len(sentences), 2):  # Iterate by step of 2 to include punctuation
        # Re-combine sentence with its punctuation; the final piece may have none
        punctuation = sentences[i + 1] if i + 1 < len(sentences) else ""
        sentence = sentences[i] + punctuation
        if not current_chunk or len(current_chunk) + len(sentence) <= length:
            current_chunk += sentence
        else:
            chunks.append(current_chunk.strip())
            current_chunk = sentence
    if current_chunk.strip():
        chunks.append(current_chunk.strip())
    return chunks
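def createTTS(text):
    # Step 4: Synthesize speech for the given text
    outputs = model.synthesize(
        text,
        config,
        speaker_wav="input.wav",  # Replace with the correct path
        gpt_cond_len=3,
        language="en",
    )
    # Step 5: Save the synthesized speech to a wav file named after the current timestamp
    output_wav = outputs['wav']
    timestamp = time.time()
    # Note: if playback sounds slow or low-pitched, check whether your config defines
    # audio.output_sample_rate (XTTS typically outputs 24 kHz) and use that value here.
    sf.write(str(timestamp) + '.wav', output_wav, config.audio.sample_rate)
    return timestamp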
# pip install Flask
from flask import Flask, jsonify, request
app = Flask(__name__)
# Sample route for the home page
@app.route('/')
def home():
    return "Welcome to the TTS!"
# API to get a message from a GET parameter
@app.route('/api/convert', methods=['GET'])
def get_text():
    # Retrieve the 'text' parameter from the URL, with a default value if it's not provided
    text = request.args.get('text', '')
    result = createTTS(text)
    return jsonify({"message": f"Created audio file for {result}!"})
if __name__ == '__main__':
    app.run(debug=True)  # Flask's default port is 5000
# Example: http://127.0.0.1:5000/api/convert?text=how%20are%20you
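The listing defines `split_text_on_punctuation`, but `createTTS` receives the whole text in a single call. The sketch below shows one way the two could be combined for long inputs, synthesizing one chunk at a time. It is an assumption about intended usage and not the author's implementation: `split_text` is a compact stand-in for the splitter, and `fake_synthesize` replaces the real model call so the example runs without XTTS installed.

```python
import re
from typing import Callable, List


def split_text(text: str, length: int = 250) -> List[str]:
    """Split text on . ! ? , and pack the pieces into chunks of about `length` chars."""
    parts = re.split(r"([.!?,])", text)
    chunks, current = [], ""
    for i in range(0, len(parts), 2):
        # Re-attach the punctuation mark, if the piece has one.
        sentence = parts[i] + (parts[i + 1] if i + 1 < len(parts) else "")
        if current and len(current) + len(sentence) > length:
            chunks.append(current.strip())
            current = sentence
        else:
            current += sentence
    if current.strip():
        chunks.append(current.strip())
    return chunks


def synthesize_in_chunks(text: str, synthesize: Callable[[str], str],
                         length: int = 250) -> List[str]:
    """Call `synthesize` once per chunk and collect the results in order."""
    return [synthesize(chunk) for chunk in split_text(text, length)]


def fake_synthesize(chunk: str) -> str:
    # Hypothetical stand-in for createTTS; returns a label instead of audio.
    return f"audio for {len(chunk)} characters"


text = "Hello there. How are you? I am fine"
print(split_text(text, length=15))
print(synthesize_in_chunks(text, fake_synthesize, length=15))
```

In the real service, `createTTS` could be passed in place of `fake_synthesize`, producing one wav file per chunk.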
Complete code for modular project
# python -m venv venv
# ./venv/Scripts/activate
# python.exe -m pip install --upgrade pip
# pip install requests
import requests
def getSpamResponse(text):
    # URL of the spam detection API
    # http://127.0.0.1:6000/api/message?email_text=Congratulations! You've won a free iPhone. Click here to claim your prize.
    url = "http://127.0.0.1:6000/api/message"
    # Send a GET request; passing params lets requests URL-encode the text
    response = requests.get(url, params={"email_text": text})
    # Check if the request was successful (status code 200)
    if response.status_code == 200:
        json_data = response.json()  # Parse the JSON response
        return json_data['message']
    else:
        # Return an error message if the request was not successful
        return f"Failed to retrieve data. Status code: {response.status_code}"
def getTTSResponse(text):
    # URL of the TTS API
    url = "http://127.0.0.1:5000/api/convert"
    # Send a GET request; passing params lets requests URL-encode the text
    response = requests.get(url, params={"text": text})
    # Check if the request was successful (status code 200)
    if response.status_code == 200:
        json_data = response.json()  # Parse the JSON response
        return json_data['message']
    else:
        # Return an error message if the request was not successful
        return f"Failed to retrieve data. Status code: {response.status_code}"
# "Congratulations! You've won a free iPhone. Click here to claim your prize."
user_input = input("Give some text:")
spam_response = getSpamResponse(user_input)
if spam_response == 'This mail is SPAM!':
    print("You can't convert text to speech. Give some other text")
else:
    print("You are good to go")
    print(getTTSResponse(user_input))
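The decision logic above can also be written as a function that receives the two API callers as arguments, which makes it possible to try the flow without starting either Flask server. This is a sketch under assumptions: `convert_if_clean`, `fake_check_spam`, and `fake_tts` are hypothetical names, and the fake functions only imitate the message formats returned by the two APIs.

```python
from typing import Callable

# Exact message the spam API returns for spam, as in the listing above.
SPAM_MESSAGE = "This mail is SPAM!"


def convert_if_clean(text: str,
                     check_spam: Callable[[str], str],
                     text_to_speech: Callable[[str], str]) -> str:
    """Run the spam check first and call TTS only when the text is not spam."""
    if check_spam(text) == SPAM_MESSAGE:
        return "You can't convert text to speech. Give some other text"
    return text_to_speech(text)


def fake_check_spam(text: str) -> str:
    # Stand-in for getSpamResponse; flags one phrase instead of using a model.
    if "free iphone" in text.lower():
        return SPAM_MESSAGE
    return "This mail is NOT SPAM!"


def fake_tts(text: str) -> str:
    # Stand-in for getTTSResponse; no audio is created.
    return "Created audio file for demo!"


print(convert_if_clean("You've won a free iPhone.", fake_check_spam, fake_tts))
print(convert_if_clean("how are you", fake_check_spam, fake_tts))
```

With both services running, `getSpamResponse` and `getTTSResponse` could be passed in place of the stand-ins.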
Conclusion
In this article, we built a modular system using Flask by integrating two separate machine learning projects: Text-to-Speech and Email Spam Detection. Flask serves as the backbone for connecting these modules, allowing us to create a scalable and maintainable system.
By following a modular approach, you can easily extend this system by adding new modules (e.g., speech-to-text, sentiment analysis) without disrupting the existing functionality. Flask’s simple and flexible structure makes it an ideal choice for building modular machine learning systems.
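One possible way to keep such extensions cheap is a small registry that maps each module name to its endpoint, so that adding a module means adding one entry. The sketch below is an assumption for illustration only: the `sentiment` service, its port 7000, and the helper names `register_service` and `build_request_url` do not exist in the projects above.

```python
from urllib.parse import urlencode

# Module name -> (endpoint URL, name of the GET parameter that carries the text).
# The first two entries mirror the services built in this article.
SERVICES = {
    "spam": ("http://127.0.0.1:6000/api/message", "email_text"),
    "tts": ("http://127.0.0.1:5000/api/convert", "text"),
}


def register_service(name: str, url: str, param: str) -> None:
    """Add a module (or replace an existing one) without touching the others."""
    SERVICES[name] = (url, param)


def build_request_url(name: str, text: str) -> str:
    """Build the encoded GET URL for the named module."""
    url, param = SERVICES[name]
    return f"{url}?{urlencode({param: text})}"


# Hypothetical new module; no such service is defined in this article.
register_service("sentiment", "http://127.0.0.1:7000/api/sentiment", "text")

print(build_request_url("tts", "how are you"))
print(build_request_url("sentiment", "great day"))
```

The client code then looks up a module by name, so existing calls keep working when a new service is registered.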
How do I configure the Postman API with this modular project?
Can you elaborate? What do you want to know? The video walks through the configuration step by step. Other sessions cover the basics of creating and running Flask APIs; you can watch them in the internship portal. I suggest joining a live session so that I can help you.