Minhtrung23fix pylint #44

Open
wants to merge 105 commits into
base: dev
4acbe2e
Create ci.yml
minhtrung23 Sep 1, 2024
0c7f00c
Update __main__.py
minhtrung23 Sep 1, 2024
3146375
Update conf.py
minhtrung23 Sep 1, 2024
8ed731d
Update __main__.py
minhtrung23 Sep 1, 2024
2e7b702
Update cli.py
minhtrung23 Sep 1, 2024
8884e74
Update test_execution.py
minhtrung23 Sep 1, 2024
e98cf5a
Update test_wrapper.py
minhtrung23 Sep 1, 2024
871ffd2
Merge branch 'stair-lab:main' into minhtrung23fix-pylint
minhtrung23 Sep 4, 2024
b259286
Fix convention for .github/workflows/python-package.yml.py
minhtrung23 Sep 5, 2024
b6563df
Fix convention for docs.source.conf.py
minhtrung23 Sep 5, 2024
5920fa8
Fix convention for .github/workflows/python-package.yml.py
minhtrung23 Sep 5, 2024
a1a7b2e
Fix convention for github/workflows/python-package.yml.py
minhtrung23 Sep 5, 2024
2e658d3
Fix convention for .github/workflows/python-package.yml.py
minhtrung23 Sep 5, 2024
075d5c5
Fix convention for .github/workflows/python-package.yml.py
minhtrung23 Sep 5, 2024
c6f8769
Fix convention for src/melt/tools/data/dataset.py
minhtrung23 Sep 7, 2024
22519f4
Fix convention for src/melt/tools/data/loader.py
minhtrung23 Sep 7, 2024
73ca80f
Fix convention for src/melt/tools/data/__init__.py
minhtrung23 Sep 7, 2024
b2c7b95
Fix convention for src/melt/tools/data/parser.py
minhtrung23 Sep 7, 2024
d5441be
Delete .github/workflows/.github/workflows/ci.yml
minhtrung23 Sep 7, 2024
a7bd907
Fix convention for src/melt/tools/data/dataset.py.py
minhtrung23 Sep 7, 2024
2749bc5
Fix convention for src/melt/tools/metrics/data_stats_metric/__init__.py
minhtrung23 Sep 8, 2024
40c9143
Fix convention for src/melt/tools/metrics/data_stats_metric/data_stat…
minhtrung23 Sep 8, 2024
981e446
Fix convention for src/melt/tools/metrics/summac/utils_misc.py
minhtrung23 Sep 8, 2024
dc7f6b5
Fix convention for src/melt/tools/metrics/base.py
minhtrung23 Sep 8, 2024
442e72f
Fix convention for src/melt/tools/metrics/basic_metrics.py
minhtrung23 Sep 8, 2024
5860b17
Fix convention for src/melt/tools/metrics/bias.py
minhtrung23 Sep 8, 2024
0185b8e
Fix convention for src/melt/tools/metrics/calibration_metric.py
minhtrung23 Sep 8, 2024
6c08ec1
Fix convention for src/melt/tools/metrics/ir.py
minhtrung23 Sep 8, 2024
5e05d5a
Fix convention for src/melt/tools/metrics/language.py
minhtrung23 Sep 8, 2024
4eb2711
Fix convention for src/melt/tools/metrics/name_detector.py
minhtrung23 Sep 8, 2024
5182b82
Fix convention for src/melt/tools/metrics/name_detector.py
minhtrung23 Sep 8, 2024
c1564d5
Fix convention for src/melt/tools/metrics/name_detector.py
minhtrung23 Sep 8, 2024
59a0102
Fix convention for src/melt/tools/metrics/post_process.py
minhtrung23 Sep 8, 2024
d71867c
Fix convention for src/melt/tools/metrics/question_answering.py
minhtrung23 Sep 8, 2024
ce64b6e
Fix convention for src/melt/tools/metrics/reasoning.py
minhtrung23 Sep 8, 2024
74dd703
Fix convention for docs/source/conf.py
minhtrung23 Sep 8, 2024
2c66631
Fix convention for src/melt/tools/metrics/summac/model_summac.py
minhtrung23 Sep 9, 2024
ecc972a
Fix convention for src/melt/tools/metrics/question_answering.py
minhtrung23 Sep 9, 2024
40b4564
Merge branch 'dev' into minhtrung23fix-pylint
martinakaduc Sep 10, 2024
13a67b7
Fix convention for src/melt/tools/pipelines/__question_answering.py
minhtrung23 Sep 11, 2024
bb77293
Fix convention for src/melt/tools/pipelines/__question_answering.py
minhtrung23 Sep 11, 2024
6d77012
Create __question_answering_without_context.py
minhtrung23 Sep 11, 2024
37c1dbc
Update __question_answering_without_context.py
minhtrung23 Sep 11, 2024
6b1de0a
Create __summarization.py
minhtrung23 Sep 11, 2024
a4b8ad3
Create __multiple_choice_sentiment.py
minhtrung23 Sep 11, 2024
52e2d88
Create __multiple_choice_text_classification.py
minhtrung23 Sep 11, 2024
5cc2351
Update __summarization.py
minhtrung23 Sep 12, 2024
318af4a
Update __multiple_choice_sentiment.py
minhtrung23 Sep 12, 2024
ac6f066
Update __multiple_choice_text_classification.py
minhtrung23 Sep 12, 2024
a33661b
Create __multiple_choice_toxicity.py
minhtrung23 Sep 12, 2024
85f6968
Create __multiple_choice.py
minhtrung23 Sep 12, 2024
a624040
Create __language_modeling.py
minhtrung23 Sep 12, 2024
7303e71
Create __information_retrieval.py
minhtrung23 Sep 13, 2024
ab0bb53
Update __language_modeling.py
minhtrung23 Sep 13, 2024
954a3d2
Merge branch 'dev' into minhtrung23fix-pylint
minhtrung23 Sep 13, 2024
87b5d96
Create __reasoning.py
minhtrung23 Sep 13, 2024
a6d0a48
Create __math.py
minhtrung23 Sep 13, 2024
7d0487f
Create __translation.py
minhtrung23 Sep 13, 2024
78177f6
Create run.py
minhtrung23 Sep 13, 2024
417aec1
Update __main__.py
minhtrung23 Sep 14, 2024
f506a93
Update generation.py
minhtrung23 Sep 14, 2024
41e6938
Update cli.py
minhtrung23 Sep 14, 2024
4f8d096
Update cli.py (final
minhtrung23 Sep 17, 2024
bdf4eb4
Update __main__.py
minhtrung23 Sep 17, 2024
4926fb1
Update __main__.py
minhtrung23 Sep 18, 2024
802f5ea
Update cli.py
minhtrung23 Sep 18, 2024
c572e4a
Update generation.py
minhtrung23 Sep 18, 2024
f05c477
Update script_arguments.py
minhtrung23 Sep 18, 2024
5edc38b
Update __init__.py
minhtrung23 Sep 18, 2024
85f3010
Update dataset.py
minhtrung23 Sep 18, 2024
30ee3c5
Update parser.py
minhtrung23 Sep 18, 2024
6a34843
Update loader.py
minhtrung23 Sep 18, 2024
8561e41
Update script_arguments.py
minhtrung23 Sep 18, 2024
a3d57ff
Update __init__.py
minhtrung23 Sep 19, 2024
a030063
Update data_stats_metric.py
minhtrung23 Sep 19, 2024
58b338d
Update base.py
minhtrung23 Sep 19, 2024
ec9940e
Update basic_metrics.py
minhtrung23 Sep 19, 2024
0ac4e53
Update bias.py
minhtrung23 Sep 19, 2024
87a188d
Update calibration_metric.py
minhtrung23 Sep 19, 2024
3b0e577
Update ir.py
minhtrung23 Sep 19, 2024
f87645a
Update language.py
minhtrung23 Sep 19, 2024
4f03d8e
Update name_detector.py
minhtrung23 Sep 19, 2024
71776ba
Update post_process.py
minhtrung23 Sep 19, 2024
00f50c0
Update question_answering.py
minhtrung23 Sep 19, 2024
27894d5
Update reasoning.py
minhtrung23 Sep 19, 2024
64375a8
Update summary.py
minhtrung23 Sep 19, 2024
6e154c1
Update text_classification.py
minhtrung23 Sep 19, 2024
ccf4066
Update toxicity.py
minhtrung23 Sep 19, 2024
7a16b6a
Update translation_metric.py
minhtrung23 Sep 19, 2024
7b7dff9
Update utils.py
minhtrung23 Sep 19, 2024
ebc23b8
Update __information_retrieval.py
minhtrung23 Sep 20, 2024
b9adaa0
Update __language_modeling.py
minhtrung23 Sep 20, 2024
cf395b0
Update __math.py
minhtrung23 Sep 20, 2024
20f2985
Update __multiple_choice.py
minhtrung23 Sep 20, 2024
d1e1124
Update __multiple_choice_sentiment.py
minhtrung23 Sep 20, 2024
0608ed8
Update __multiple_choice_text_classification.py
minhtrung23 Sep 20, 2024
e18305f
Update __multiple_choice_toxicity.py
minhtrung23 Sep 20, 2024
fb2d800
Update __question_answering.py
minhtrung23 Sep 20, 2024
6b2f1b1
Update __question_answering_without_context.py
minhtrung23 Sep 20, 2024
ada86ff
Update __reasoning.py
minhtrung23 Sep 20, 2024
fd584a5
Update __summarization.py
minhtrung23 Sep 20, 2024
922c2ed
Update __translation.py
minhtrung23 Sep 20, 2024
c8d28a8
Delete src/melt/tools/pipelines/run.py
minhtrung23 Sep 20, 2024
ffe2e69
Update pipelines.py
minhtrung23 Sep 20, 2024
cb97c86
Update pipelines.py
minhtrung23 Sep 20, 2024
44 changes: 17 additions & 27 deletions docs/source/conf.py
@@ -10,53 +10,43 @@
import sys
from datetime import datetime

# -- Path setup --------------------------------------------------------------

# Add the path to your source code here.
# Path setup
sys.path.insert(0, os.path.abspath("../../src"))

# -- Project information -----------------------------------------------------

# Project information
PROJECT = "MELTs"
AUTHOR = "Thu Nguyen Hoang Anh"
COPYRIGHT = f"{datetime.datetime.now().year}, {AUTHOR}"
COPYRIGHT = f"{datetime.now().year}, {AUTHOR}"

# The version info for the project
VERSION = "0.1" # Short version (e.g., '0.1')
RELEASE = "0.1" # Full version (e.g., '0.1.0')
# The full version, including alpha/beta/rc tags
RELEASE = "0.1"

# -- General configuration ---------------------------------------------------
# General configuration
MASTER_DOC = "index"

MASTER_DOC = "index" # The name of the master document

# Sphinx extensions to use
# Sphinx extension modules as strings, can be built-in or custom
EXTENSIONS = [
"sphinx.ext.duration", # Measure build time
"sphinx.ext.autodoc", # Include documentation from docstrings
"sphinx.ext.coverage", # Check for documentation coverage
"sphinx.ext.doctest", # Test embedded doctests
"sphinx_rtd_theme", # Read the Docs theme
"sphinx.ext.duration",
"sphinx.ext.autodoc",
"sphinx.ext.coverage",
"sphinx_rtd_theme",
"sphinx.ext.doctest",
]

# Mock import for autodoc
# List of modules to mock during autodoc generation
AUTODOC_MOCK_IMPORTS = ["pyemd"]

# Paths that contain templates
TEMPLATES_PATH = ["_templates"]

# Patterns to ignore when looking for source files
# List of patterns to ignore when looking for source files
EXCLUDE_PATTERNS = []

# Sort members alphabetically in the autodoc
AUTODOC_MEMBER_ORDER = "alphabetical"

# Theme to use for HTML and HTML Help pages
# Options for HTML output
HTML_THEME = "sphinx_rtd_theme"

# Theme options for customizing the appearance of the theme
HTML_THEME_OPTIONS = {
# You can add theme-specific options here
}

# Paths that contain custom static files (e.g., style sheets)
# Paths for custom static files (like style sheets)
HTML_STATIC_PATH = ["_static"]
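
Review note on this hunk: as far as I know, Sphinx looks up its settings by their lowercase names (`project`, `copyright`, `extensions`, and so on), so the upper-case constants on both sides of this diff would probably be ignored at build time. A minimal sketch of an alternative, assuming the goal is only to quiet pylint's `invalid-name` and `redefined-builtin` messages, is to keep the lowercase names and disable those two checks for the file. The values are copied from the hunk above; the pylint comment is my addition.

```python
# pylint: disable=invalid-name,redefined-builtin
# Sketch only: keeps the lowercase names Sphinx expects and silences the
# two pylint checks that the upper-case renaming was working around.
import os
import sys
from datetime import datetime

# Path setup
sys.path.insert(0, os.path.abspath("../../src"))

# Project information
project = "MELTs"
author = "Thu Nguyen Hoang Anh"
# `datetime` is the class imported above, so `.now()` is called on it directly.
copyright = f"{datetime.now().year}, {author}"
release = "0.1"

# General configuration
extensions = [
    "sphinx.ext.duration",
    "sphinx.ext.autodoc",
    "sphinx.ext.coverage",
    "sphinx.ext.doctest",
    "sphinx_rtd_theme",
]
autodoc_mock_imports = ["pyemd"]
templates_path = ["_templates"]
exclude_patterns = []

# Options for HTML output
html_theme = "sphinx_rtd_theme"
html_static_path = ["_static"]
```

The file-level disable keeps the exception visible in one place instead of renaming every setting.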
102 changes: 13 additions & 89 deletions src/melt/__main__.py
@@ -1,94 +1,18 @@
"""
This script initializes NLP models and runs the main function from the 'cli' module.

The script performs the following tasks:
1. Downloads the 'punkt' tokenizer models using nltk.
2. Loads the spaCy 'en_core_web_sm' model, downloading it if necessary.
3. Imports and executes the 'main' function from the 'cli' module.

If any module or function cannot be imported, appropriate error messages are displayed.
"""

import logging
"Main"
import spacy
import nltk
from spacy.cli import download as spacy_download
from typing import NoReturn

# Configure logging with a descriptive name for the logger
logging.basicConfig(
format="%(asctime)s - %(levelname)s - %(message)s",
level=logging.INFO
)
logger = logging.getLogger("nlp_utils")

def download_nltk_resources() -> NoReturn:
"""Download the necessary NLTK resources.

Logs success or failure messages.
"""
try:
with nltk.download('punkt'):
logger.info("Successfully downloaded NLTK 'punkt' resource.")
except Exception as error:
logger.error("Failed to download NLTK resources: %s", error)
raise


def load_spacy_model(model_name: str = "en_core_web_sm") -> spacy.language.Language:
"""Load and return the spaCy model, downloading it if necessary.

Logs success or failure messages during the model loading process.

Args:
model_name (str): The name of the spaCy model to load.

Returns:
spacy.language.Language: The loaded spaCy model.
"""
try:
model = spacy.load(model_name)
logger.info("Successfully loaded spaCy model: %s", model_name)
except OSError:
logger.warning("spaCy model '%s' not found. Downloading...", model_name)
spacy_download(model_name)
model = spacy.load(model_name)
logger.info("Successfully downloaded and loaded spaCy model: %s", model_name)
except Exception as error:
logger.error("Failed to load spaCy model: %s", error)
raise
return model


def execute_cli_main() -> None:
"""Execute the 'main' function from the CLI module.

Logs success or failure messages about the import process and execution.
"""
try:
from cli import main as cli_main
logger.info("Successfully imported 'main' from 'cli' module.")
except ImportError as import_error:
logger.error("ImportError: %s", import_error)
try:
import cli
cli_main = cli.main
logger.info("Successfully imported 'cli' module directly.")
except ImportError as inner_import_error:
logger.critical("Failed to import 'cli' module: %s", inner_import_error)
raise
cli_main()


def main() -> None:
"""Main function to set up resources and execute the CLI.
from melt.cli import main

Ensures proper logging and execution flow.
"""
download_nltk_resources()
load_spacy_model()
execute_cli_main()
nltk.download('punkt_tab')
try:
spacy.load("en_core_web_sm")
except OSError:
print(
"Downloading the spacy en_core_web_sm model\n"
"(don't worry, this will only happen once)"
)
from spacy.cli import download

download("en_core_web_sm")

if __name__ == "__main__":
main()
main()
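
Review note on this hunk: both sides use the same "try to load the model, download it on `OSError`, then continue" pattern. A minimal sketch of that pattern as a small reusable helper is below. `load_or_download` is a hypothetical name, not something in this PR, and the spaCy wiring is shown only in comments so the block runs without spaCy installed. Separately, as far as I can tell `nltk.download` returns a boolean rather than a context manager, so the `with nltk.download('punkt'):` line in the longer side of the hunk would likely fail at runtime.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def load_or_download(load: Callable[[], T], download: Callable[[], None]) -> T:
    """Return load(); if it raises OSError, run download() once and retry."""
    try:
        return load()
    except OSError:
        # The resource is missing locally: fetch it, then try exactly once more.
        download()
        return load()


# Hypothetical wiring for spaCy (assumes spaCy is installed):
#   import spacy
#   from spacy.cli import download
#   nlp = load_or_download(
#       lambda: spacy.load("en_core_web_sm"),
#       lambda: download("en_core_web_sm"),
#   )
```

Passing the two steps in as callables keeps the retry logic testable without network access.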
71 changes: 12 additions & 59 deletions src/melt/cli.py
@@ -1,75 +1,28 @@
"""
This script initializes and runs the text generation pipeline using spaCy,
transformers, and dotenv. It also handles downloading the spaCy 'en_core_web_sm'
model if it is not already present.

The main function is responsible for:
1. Loading environment variables.
2. Parsing script arguments.
3. Running the generation process with the parsed arguments.
"""
try:
import spacy
except ImportError as e:
print(f"Failed to import 'spacy': {e}")

"Cli"
import spacy
from transformers import HfArgumentParser
from dotenv import load_dotenv
from melt.script_arguments import ScriptArguments
from melt.generation import generation
try:
spacy.load("en_core_web_sm")
except OSError:
print(
"Downloading the spacy en_core_web_sm model\n"
"(don't worry, this will only happen once)"
)
try:
from spacy.cli import download
download("en_core_web_sm")
from spacy.cli import download

except ImportError as e:
print(f"Failed to import 'spacy.cli': {e}")
try:
from transformers import HfArgumentParser
except ImportError as e:
print(f"Failed to import 'transformers': {e}")
download("en_core_web_sm")

try:
from dotenv import load_dotenv
except ImportError as e:
print(f"Failed to import 'dotenv': {e}")

try:
from .script_arguments import ScriptArguments
except ImportError as e:
print(f"Failed to import 'ScriptArguments' from 'script_arguments': {e}")
try:
from .generation import generation
except ImportError as e:
print(f"Failed to import 'generation' from 'generation': {e}")

def main():
"""
The main function that initializes the environment, parses script arguments,
and triggers the text generation process.

This function performs the following steps:
1. Loads environment variables using `load_dotenv()`.
2. Creates an argument parser for `ScriptArguments` using `HfArgumentParser`.
3. Parses the arguments into data classes.
4. Calls the `generation` function with the parsed arguments to perform the text generation.
# from .to_sheet import to_sheet
# from .to_sheet_std import to_sheet_std

Returns:
None
"""
def main():
"CLI"
load_dotenv()

# Ensure spaCy model is available
ensure_spacy_model()

# Parse command-line arguments
parser = HfArgumentParser(ScriptArguments)
args = parser.parse_args_into_dataclasses()[0]

# Execute the generation function with parsed arguments
generation(args)

if __name__ == "__main__":
main()
91 changes: 23 additions & 68 deletions src/melt/generation.py
@@ -1,69 +1,14 @@
"""
This module provides functionality for evaluating and
generating data using specified pipelines and datasets.

The `generation` function is the main entry point of this script. It performs the following tasks:
1. Initializes the seed for reproducibility.
2. Loads and processes the dataset using `DatasetWrapper`.
3. Sets up directories for saving results if they don't already exist.
4. Handles continuation of inference from a previous run if specified.
5. Creates a DataLoader for batching dataset examples.
6. Initializes the evaluation pipeline (`EvalPipeline`).
7. Runs the evaluation pipeline and saves the results to JSON files.

The script is designed to work with various configurations
specified in the `script_args` parameter, including options for
few-shot prompting and continuing from previous results.

Modules used:
- `os`: For file and directory operations.
- `.tools.data`: Contains `DatasetWrapper` for
dataset management.
- `.tools.pipelines`: Contains `EvalPipeline` for
evaluation processes.
- `.tools.utils.utils`: Provides utility functions such as
`save_to_json`, `set_seed`, and `read_json`.
- `torch.utils.data`: For data loading with `DataLoader`.
"""
"Generation"
import os
import sys
from torch.utils.data import DataLoader
from .tools.data import DatasetWrapper
from .tools.pipelines import EvalPipeline
from .tools.utils.utils import save_to_json, set_seed, read_json


from melt.tools.data import DatasetWrapper
from melt.tools.pipelines import EvalPipeline
from melt.tools.utils.utils import save_to_json, set_seed, read_json

sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
def generation(script_args):
"""
Executes the data generation process based on the provided script arguments.

This function performs the following steps:
1. Sets the random seed for reproducibility using `set_seed`.
2. Loads and optionally processes the dataset using `DatasetWrapper`.
3. Constructs filenames for saving generation results and metrics based on the script arguments.
4. Creates necessary directories for saving results if they don't already exist.
5. Determines the starting index and results to continue
inference from a previous run if specified.
6. Initializes a `DataLoader` for batching the dataset examples.
7. Initializes an `EvalPipeline` for evaluating the data.
8. Runs the evaluation pipeline and saves the results using the `save_results` function.
Args:
script_args (ScriptArguments): An object containing the configuration
and parameters for the data generation process.
- seed (int): Random seed for reproducibility.
- smoke_test (bool): Flag to indicate if a smaller subset
of data should be used for testing.
- dataset_name (str): Name of the dataset.
- model_name (str): Name of the model.
- output_dir (str): Directory to save generation results.
- output_eval_dir (str): Directory to save evaluation metrics.
- continue_infer (bool): Flag to continue inference from a previous run.
- per_device_eval_batch_size (int): Batch size for evaluation.
- fewshot_prompting (bool): Flag for few-shot prompting.

Returns:
None
"""
"Generation"
set_seed(script_args.seed)

# Load dataset (you can process it here)
@@ -76,19 +21,29 @@ def generation(script_args):
dataset_wrapper.dataset_testing.select(range(n_examples))
)
ds_exact_name = (
script_args.dataset_name.split("/")[-1]
script_args.lang
+ "_"
+ script_args.model_name.split("/")[-1]
+ f"_pt{dataset_wrapper.prompting_strategy}"
+ ("_fewshot" if script_args.fewshot_prompting else "")
+ dataset_wrapper.dataset_info.task
+ "_"
+ script_args.dataset_name.split("/")[-1].replace("_", "-")
+ "_"
+ script_args.model_name.split("/")[-1].replace("_", "-")
+ "_"
+ script_args.prompt_type
+ "_"
+ script_args.category
+ "_"
+ str(script_args.num_fs_shot)
+ "_pt" + dataset_wrapper.prompting_strategy
+ f"_seed{script_args.seed}"
)
)


json_file = os.path.join(
script_args.output_dir, f"generations_{ds_exact_name}.json"
)
metric_file = os.path.join(
script_args.output_eval_dir, f"metrics_{ds_exact_name}.json"
script_args.output_eval_dir, f"{ds_exact_name}.json"
)

# Save results
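
Review note on the `ds_exact_name` change: the longer chain of `+ "_" +` concatenations could be expressed with a join, which may be easier to read and to lint. The sketch below is only an illustration. `build_run_name` is a hypothetical helper, and its parameters mirror the `script_args` and `dataset_wrapper` fields visible in the hunk. It formats the prompting strategy with an f-string because the other side of the hunk does the same, which suggests the value may not always be a string; that is an assumption on my part.

```python
def build_run_name(
    lang: str,
    task: str,
    dataset_name: str,
    model_name: str,
    prompt_type: str,
    category: str,
    num_fs_shot: int,
    prompting_strategy: object,
    seed: int,
) -> str:
    """Build the file-name stem used for generation and metric files."""

    def short(name: str) -> str:
        # Keep only the part after the last "/" and avoid "_" inside a field,
        # since "_" is the separator between fields.
        return name.split("/")[-1].replace("_", "-")

    fields = [
        lang,
        task,
        short(dataset_name),
        short(model_name),
        prompt_type,
        category,
        str(num_fs_shot),
    ]
    # f-string formatting also works if prompting_strategy is an int.
    return "_".join(fields) + f"_pt{prompting_strategy}_seed{seed}"


example = build_run_name(
    "vi", "summarization", "org/my_dataset", "org/my_model",
    "fewshot", "general", 5, 0, 42,
)
# example == "vi_summarization_my-dataset_my-model_fewshot_general_5_pt0_seed42"
```

The example arguments are made up for illustration and do not come from the PR.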