Type4py for Large Datasets #14

LangFeng0912 · 2023-03-28T08:17:26Z

Overview: this PR adds scripts for processing large datasets.
The additions and updates are:
in "__main__.py": add the new CLI commands "learn_split", "gen_cluster", and "infer_projects"
in "vectorize.py": update the data point generation function to work in batches
add "learn_split.py" for training the model as a separate step
add "gen_cluster.py" for generating type clusters from the trained model as a separate step
add new dataset-loading functions in "data_loaders.py"
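
For readers unfamiliar with the batching idea, here is a minimal sketch of generating data points batch by batch instead of all at once. It is not the code in "vectorize.py": the names `iter_batches`, `vectorize_in_batches`, and `toy_embed` are hypothetical, and plain Python lists stand in for the arrays the real pipeline presumably builds.

```python
from typing import Callable, Iterator, List, Sequence

Vector = List[float]


def iter_batches(items: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `batch_size` items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def vectorize_in_batches(
    items: Sequence[str],
    embed: Callable[[str], Vector],
    batch_size: int,
) -> Iterator[List[Vector]]:
    """Embed one batch at a time so only a single batch is held in memory."""
    for batch in iter_batches(items, batch_size):
        yield [embed(item) for item in batch]


def toy_embed(text: str, dim: int = 4) -> Vector:
    """Hypothetical stand-in for a real embedding model."""
    vec = [0.0] * dim
    for i, ch in enumerate(text):
        vec[i % dim] += float(ord(ch))
    return vec
```

Each yielded batch could then be written to disk before the next one is produced, which is one way to keep peak memory bounded on a large dataset.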

mir-am

Thanks for the PR.
Also, create a new module called predict_split to load type clusters generated from gen_cluster. The output should be a JSON file that can be processed by eval.
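
To make the request concrete, a rough sketch of what such a module might do is given below. Everything in it is an assumption for illustration: the in-memory cluster layout (a list of label and vector pairs), the nearest-neighbour voting, and the function names `predict_types` and `predictions_to_json` are hypothetical, and the JSON layout that eval actually expects is not specified in this thread.

```python
import json
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# Assumed layout: each entry pairs a type label with its vector in the cluster space.
Cluster = List[Tuple[str, Sequence[float]]]


def predict_types(query: Sequence[float], clusters: Cluster, k: int = 3) -> List[str]:
    """Rank type labels by how often they occur among the k nearest entries."""
    nearest = sorted(clusters, key=lambda entry: math.dist(query, entry[1]))[:k]
    votes = Counter(label for label, _ in nearest)
    return [label for label, _ in votes.most_common()]


def predictions_to_json(
    queries: Dict[str, Sequence[float]], clusters: Cluster, k: int = 3
) -> str:
    """Serialize ranked predictions per identifier as a JSON object."""
    result = {name: predict_types(vec, clusters, k) for name, vec in queries.items()}
    return json.dumps(result, indent=2, sort_keys=True)
```

The returned string can be written to a file and read back by a later evaluation step.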

type4py/__main__.py

type4py/gen_cluster.py

type4py/learn_split.py

type4py/data_loaders.py

type4py/vectorize.py

mir-am

Thanks for the new changes. Please address my new comments.

type4py/__main__.py

type4py/deploy/infer_project.py

type4py/deploy/static_infer.py

type4py/deploy/utils/pyre_merge.py

mir-am

Please address the minor comments in the review. Thanks!

type4py/deploy/utils/cst_utils.py

type4py/deploy/utils/preprocess_utils.py

mir-am

Thanks for addressing the comments.

LangFeng0912 added 5 commits March 22, 2023 10:29

fix the issues in preprocess and np.stack in batches in vec

dbd307f

add the learn_sep into the type4py pipeline

896e819

add gen_cluster and update reduce for batches script

b38ba1e

add infer_project CLI command for inferring the dataset

bfb9732

add more comments

4cc376f

mir-am assigned LangFeng0912 Mar 29, 2023

mir-am added the "enhancement" label (New feature or request) Mar 29, 2023

mir-am reviewed Mar 31, 2023

LangFeng0912 added 6 commits April 6, 2023 18:17

fix the issues

084aaa2

fix the issues, add the predicts script and exceptions script

298c962

update infer-project .py

0408072

add project-base inference pipeline

e336713

add project-base inference for ml & hybrid

08a9f51

add script explanations

bd4bb81

mir-am reviewed Jun 1, 2023

LangFeng0912 added 2 commits June 5, 2023 18:58

update infer-project base approach name t4pyre and t4pyright

51a9693

update t4pyright logic in infer-project approach

5c5f522

mir-am reviewed Jun 6, 2023

type4py/deploy/utils/cst_utils.py (outdated)

type4py/deploy/utils/cst_utils.py (outdated)

type4py/deploy/utils/preprocess_utils.py (outdated)

mir-am changed the base branch from main to dev June 6, 2023 14:30

LangFeng0912 added 3 commits June 8, 2023 10:46

rename type_preprocess script

8698994

update TypeAnnotationFinder & Masker to libsa4py and import from it

e9b1a11

update comments

0b491e2

mir-am reviewed Jun 9, 2023

LangFeng0912 added 7 commits August 17, 2023 13:44

update vectorize

afbddd7

update preprocess

2e80ff1

update learn_split.py

f7d1fef

update learn_split.py

ed8af0d

update learn_split.py

2ff621e

update learn_split.py

1a5332a

update learn_split.py

40f1302

LangFeng0912 added 20 commits August 18, 2023 12:42

update pipeline

9aa640b

update pipeline

e01df3c

update pipeline

3eb4fe9

update Dockerfile for cuda version

33d169f

update model parameters

8e0f0de

update model parameters

3dc74c5

update model parameters

97c1c11

update infer main approach

67b8a70

update infer main approach

b369011

update infer main approach

acbdae1

update infer main approach

d612597

update infer main approach

704af65

update type preprocess_list

6fab977

update type preprocess_list

ace9baf

update eval scripts

1750b74

update eval scripts

46b3ea5

update infer project scripts

9592acf

update infer project scripts

40880e1

update infer project scripts

502a1ca

update README.md

3e94481
