Skip to content

Latest commit

 

History

History
45 lines (42 loc) · 1.12 KB

DATA.md

File metadata and controls

45 lines (42 loc) · 1.12 KB

Data Generation

To get arithmetic expression dataset, run the following command:

python3 arithmetic/data.py \
    --file ${DATA_DIR} \
    --length ${NUMBER_OF_OPERATORS} \
    --train_size 1e6 \
    --test_size 1e5\
    --number_range 11\
    --under
  • Here number_range specifies the number field (should be a prime).
  • under means there is a part of training data whose number of operators is under ${NUMBER_OF_OPERATORS}.

Script for linear equation dataset:

python3 equation/data.py \
    --file ${DATA_DIR} \
    --length ${NUMBER_OF_VARIABLES} \
    --train_size 1e6 \
    --test_size 1e5\
    --number_range 11

Script for longest increasing subsequence dataset:

python3 LIS/data.py \
    --file ${DATA_DIR} \
    --length ${LEN_INPUTS} \
    --train_size 1e6 \
    --test_size 1e5\
    --number_range ${NUM_RANGE}
  • In our experiment, we set number_range to 250.

Script for edit distance dataset:

python3 ED/data.py \
    --file ${DATA_DIR} \
    --length ${LEN_OF_FIRST_STRING} \
    --train_size 1e6 \
    --test_size 1e5\
    --using 8
  • Here using + 2 = the max size of working vocabulary.