Commit

Clean up
mostafa committed Oct 28, 2024
1 parent 096859a commit 7db6c54
Showing 2 changed files with 0 additions and 114 deletions.
74 changes: 0 additions & 74 deletions training/README.md

This file was deleted.

40 changes: 0 additions & 40 deletions training/train.py
@@ -1,43 +1,3 @@
"""Deep Learning Model Training with LSTM
This Python script is used for training a deep learning model using
Long Short-Term Memory (LSTM) networks.
The script starts by importing necessary libraries. These include `sys`
for interacting with the system, `pandas` for data manipulation, `tensorflow`
for building and training the model, `sklearn` for splitting the dataset and
calculating metrics, and `numpy` for numerical operations.
The script expects two command-line arguments: the input file and the output directory.
If these are not provided, the script will exit with a usage message.
The input file is expected to be a CSV file, which is loaded into a pandas DataFrame.
The script assumes that this DataFrame has a column named "Query" containing the text
data to be processed, and a column named "Label" containing the target labels.
The text data is then tokenized using the `Tokenizer` class from
`tensorflow.keras.preprocessing.text`. The tokenizer is fit on the text data to build a
word-to-index vocabulary and is then used to convert the text into sequences of integers.
The sequences are then padded or truncated to a fixed length of 100 using the
`pad_sequences` function.
The data is split into a training set and a test set using the `train_test_split` function
from `sklearn.model_selection`. The split is stratified, meaning that the distribution of
labels in the training and test sets should be similar.
A Sequential model is created using the `Sequential` class from `tensorflow.keras.models`.
The model consists of an Embedding layer, an LSTM layer, and a Dense layer. The model is
compiled with the Adam optimizer and the binary cross-entropy loss function, and it is
trained on the training data.
After training, the model is used to predict the labels of the test set. The predictions
are then compared with the true labels to calculate various performance metrics, including
accuracy, recall, precision, F1 score, specificity, and ROC. These metrics are printed to
the console.
Finally, the trained model is saved in the SavedModel format to the output directory
specified by the second command-line argument.
"""
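
The tokenize-and-pad step described in the docstring can be illustrated without TensorFlow. The following is a minimal pure-Python sketch, not the Keras implementation: the helper names `fit_word_index`, `texts_to_sequences`, and `pad` are hypothetical, whitespace splitting stands in for the Keras text filters, and ties between equally frequent words are broken alphabetically for determinism. Padding and truncation are applied at the front of each sequence, which matches the `pad_sequences` defaults.

```python
from collections import Counter


def fit_word_index(texts):
    """Build a word -> integer index map; frequent words get small indices."""
    counts = Counter(word for text in texts for word in text.lower().split())
    # Sort by descending frequency, then alphabetically to break ties.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    # Index 0 is reserved for padding, so real words start at 1.
    return {word: index for index, (word, _) in enumerate(ordered, start=1)}


def texts_to_sequences(texts, word_index):
    """Convert each text into a list of integer indices, skipping unknown words."""
    return [
        [word_index[word] for word in text.lower().split() if word in word_index]
        for text in texts
    ]


def pad(sequences, maxlen=100):
    """Left-pad with zeros, or keep only the last `maxlen` items, per sequence."""
    padded = []
    for sequence in sequences:
        kept = sequence[-maxlen:]  # truncate from the front if too long
        padded.append([0] * (maxlen - len(kept)) + kept)
    return padded


if __name__ == "__main__":
    queries = ["a b a", "b c"]
    index = fit_word_index(queries)
    print(pad(texts_to_sequences(queries, index), maxlen=4))
```

Every output row has exactly `maxlen` entries, which is what lets the sequences be stacked into a single array for the Embedding layer.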

import sys
import pandas as pd
from tensorflow.keras.preprocessing.text import Tokenizer
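
Most of the metrics named in the docstring follow directly from the four confusion-matrix counts. The sketch below is a pure-Python illustration under the assumption of binary 0/1 labels and already-thresholded predictions; the function name `binary_metrics` is hypothetical, and the deleted script may well have used `sklearn.metrics` instead. ROC is left out because it needs predicted scores rather than hard labels.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, recall, precision, F1, and specificity for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    def ratio(numerator, denominator):
        # Return 0.0 instead of raising when a class is absent.
        return numerator / denominator if denominator else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "recall": recall,
        "precision": precision,
        "f1": ratio(2 * precision * recall, precision + recall),
        "specificity": ratio(tn, tn + fp),
    }


if __name__ == "__main__":
    print(binary_metrics([1, 1, 0, 0, 0], [1, 0, 0, 0, 1]))
```

Specificity is the true-negative rate, TN / (TN + FP), so it is the recall of the negative class.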

0 comments on commit 7db6c54