Type4py for Large Datasets #14

Status: Open. Wants to merge 43 commits into base branch dev.

LangFeng0912

Overall: this PR adds scripts for processing large datasets.
The additions and updates are:
In "__main__.py": add the new CLI commands "learn_split", "gen_cluster", and "infer_projects".
In "vectorize.py": update the datapoint-generation function to work in batches.
Add "learn_split.py" for training the model as a separate step.
Add "gen_cluster.py" for generating type clusters from the trained model as a separate step.
Add new dataset-loading functions to "data_loaders.py".
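
To illustrate the batching idea mentioned for "vectorize.py", here is a minimal sketch of generating datapoints one batch at a time. It is not the code from this PR: `iter_batches`, `vectorize_in_batches`, and `toy_vector` are hypothetical names, and `toy_vector` is only a stand-in for a real embedding step.

```python
from typing import Callable, Iterator, List, Sequence

Vector = List[float]


def iter_batches(items: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `items`, each at most `batch_size` long."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def vectorize_in_batches(
    identifiers: Sequence[str],
    to_vector: Callable[[str], Vector],
    batch_size: int = 2,
) -> Iterator[List[Vector]]:
    """Lazily vectorize identifiers so that only one batch of vectors
    has to be held in memory at a time."""
    for batch in iter_batches(identifiers, batch_size):
        yield [to_vector(name) for name in batch]


def toy_vector(name: str) -> Vector:
    # Hypothetical stand-in for a real embedding:
    # [identifier length, number of underscores].
    return [float(len(name)), float(name.count("_"))]


# Collecting into a list here only for demonstration; on a large dataset
# each batch would be written to disk before the next one is produced.
batches = list(
    vectorize_in_batches(["x", "file_path", "n_items"], toy_vector, batch_size=2)
)
```

Because `vectorize_in_batches` is a generator, peak memory depends on `batch_size` rather than on the dataset size, which is presumably the motivation for the batched version.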

@mir-am added the enhancement (New feature or request) label on Mar 29, 2023

@mir-am (Member) left a comment

Thanks for the PR.
Please also create a new module called predict_split that loads the type clusters generated by gen_cluster. Its output should be a JSON file that can be processed by eval.
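
As a rough sketch of what such a module could do, the snippet below looks up the closest cluster points for each query vector and writes the predictions to a JSON file. Everything here is an assumption for illustration: the function names, the in-memory cluster representation, the nearest-neighbor lookup, and the JSON layout are hypothetical and would need to match whatever gen_cluster writes and eval expects.

```python
import json
import math
import os
import tempfile
from typing import Dict, List, Tuple

# Hypothetical cluster representation: (vector, type label) pairs.
ClusterPoint = Tuple[List[float], str]


def nearest_types(query: List[float], points: List[ClusterPoint], k: int = 2) -> List[str]:
    """Return the labels of the k cluster points closest to `query`
    by Euclidean distance."""
    ranked = sorted(points, key=lambda p: math.dist(query, p[0]))
    return [label for _, label in ranked[:k]]


def write_predictions(
    queries: Dict[str, List[float]],
    points: List[ClusterPoint],
    out_path: str,
    k: int = 2,
) -> None:
    """Write one prediction record per query to a JSON file.
    The record layout is an assumption, not the format eval requires."""
    records = [
        {"name": name, "predictions": nearest_types(vec, points, k)}
        for name, vec in queries.items()
    ]
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(records, f, indent=2)


# Toy data standing in for clusters produced by a previous step.
clusters = [([0.1, 0.0], "int"), ([5.0, 5.0], "str"), ([0.0, 0.3], "float")]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "predictions.json")
    write_predictions({"count": [0.0, 0.0]}, clusters, path)
    with open(path, encoding="utf-8") as f:
        loaded = json.load(f)
```

Keeping prediction and JSON serialization in separate functions would let the same lookup be reused by other commands without touching the file format.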

type4py/__main__.py (outdated, resolved)
type4py/__main__.py (outdated, resolved)
type4py/__main__.py (outdated, resolved)
type4py/gen_cluster.py (outdated, resolved)
type4py/gen_cluster.py (outdated, resolved)
type4py/learn_split.py (outdated, resolved)
type4py/learn_split.py (outdated, resolved)
type4py/learn_split.py (outdated, resolved)
type4py/data_loaders.py (resolved)
type4py/vectorize.py (outdated, resolved)

@mir-am (Member) left a comment

Thanks for the new changes. Please address my new comments.

type4py/__main__.py (outdated, resolved)
type4py/__main__.py (outdated, resolved)
type4py/__main__.py (outdated, resolved)
type4py/deploy/infer_project.py (resolved)
type4py/deploy/static_infer.py (outdated, resolved)
type4py/deploy/utils/pyre_merge.py (resolved)

@mir-am (Member) left a comment

Please address the minor comments in the review. Thanks!

type4py/deploy/utils/cst_utils.py (outdated, resolved)
type4py/deploy/utils/cst_utils.py (outdated, resolved)
type4py/deploy/utils/preprocess_utils.py (outdated, resolved)

@mir-am changed the base branch from main to dev on June 6, 2023, 14:30

@mir-am (Member) left a comment

Thanks for addressing the comments.
