Clean up

gatewayd-io · Oct 28, 2024 · 7db6c54
1 parent 096859a
commit 7db6c54

Showing 2 changed files with 0 additions and 114 deletions.
diff --git a/training/README.md b/training/README.md
diff --git a/training/train.py b/training/train.py
@@ -1,43 +1,3 @@
-"""Deep Learning Model Training with LSTM
-
-This Python script is used for training a deep learning model using
-Long Short-Term Memory (LSTM) networks.
-
-The script starts by importing necessary libraries. These include `sys`
-for interacting with the system, `pandas` for data manipulation, `tensorflow`
-for building and training the model, `sklearn` for splitting the dataset and
-calculating metrics, and `numpy` for numerical operations.
-
-The script expects two command-line arguments: the input file and the output directory.
-If these are not provided, the script will exit with a usage message.
-
-The input file is expected to be a CSV file, which is loaded into a pandas DataFrame.
-The script assumes that this DataFrame has a column named "Query" containing the text
-data to be processed, and a column named "Label" containing the target labels.
-
-The text data is then tokenized using the `Tokenizer` class from
-`tensorflow.keras.preprocessing.text` (TF/IDF). The tokenizer is fit on the text data
-and then used to convert the text into sequences of integers. The sequences are then
-padded to a maximum length of 100 using the `pad_sequences` function.
-
-The data is split into a training set and a test set using the `train_test_split` function
-from `sklearn.model_selection`. The split is stratified, meaning that the distribution of
-labels in the training and test sets should be similar.
-
-A Sequential model is created using the `Sequential` class from `tensorflow.keras.models`.
-The model consists of an Embedding layer, an LSTM layer, and a Dense layer. The model is
-compiled with the Adam optimizer and binary cross-entropy loss function, and it is trained
-on the training data.
-
-After training, the model is used to predict the labels of the test set. The predictions
-are then compared with the true labels to calculate various performance metrics, including
-accuracy, recall, precision, F1 score, specificity, and ROC. These metrics are printed to
-the console.
-
-Finally, the trained model is saved in the SavedModel format to the output directory
-specified by the second command-line argument.
-"""
-
 import sys
 import pandas as pd
 from tensorflow.keras.preprocessing.text import Tokenizer