- TODO List
- write scripts/dump_single/check_dump_single.py, which checks the output of dump_single.py
- write data/wrds_handler.py, which can handle a single dataset
- dump document
- dump check document
- place .bin file on /storage/qlib/qlib_data/wrds/
- write main.ipynb, which shows the output of dataset.prepare for each dataset; if loading the data takes too long, use the instruments with the longest observed period as the market (save test100.txt in the instruments subdir of each dataset)
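The instrument selection in the last item can be sketched as follows. This is a minimal illustration, not the repository's code: the helper names `longest_observed` and `write_instruments` are hypothetical, and the output assumes the usual Qlib instruments layout of one tab-separated `symbol`, `start date`, `end date` triple per line.

```python
from pathlib import Path


def longest_observed(spans, n=100):
    """Return the n (symbol, start, end) entries with the longest observed period.

    `spans` maps a symbol to its (first_date, last_date) pair of datetime.date values.
    """
    ranked = sorted(
        spans.items(),
        key=lambda item: (item[1][1] - item[1][0]).days,  # observed days
        reverse=True,
    )
    return [(symbol, start, end) for symbol, (start, end) in ranked[:n]]


def write_instruments(entries, path):
    """Write entries as tab-separated lines (assumed Qlib instruments format)."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    lines = [
        f"{symbol}\t{start.isoformat()}\t{end.isoformat()}"
        for symbol, start, end in entries
    ]
    path.write_text("\n".join(lines) + "\n")
```

Calling `write_instruments(longest_observed(spans), "<qlib_dir>/instruments/test100.txt")` would then produce the reduced market file, where `spans` has to be built from each dataset's own date range per symbol.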
- Dataset
dataset | dump | check | main.ipynb |
---|---|---|---|
CRSP monthly stocks | ✅ | ✅ | ✅ |
Compustat fundamentals | ✅ | ✅ | ✅ |
Compustat price data | ✅ | ✅ | ✅ |
Compustat Global fundamentals | ✅ | ✅ | ✅ |
Compustat Global price data | ⬜️ | ⬜️ | ⬜️ |
Compustat Global FX rates | ✅ | ✅ | ✅ |
Add the following code to your ~/.bashrc, then restart your shell:
# >>> conda initialize >>>
# !! Contents within this block are managed by 'conda init' !!
__conda_setup="$('/opt/miniconda3/bin/conda' 'shell.bash' 'hook' 2> /dev/null)"
if [ $? -eq 0 ]; then
    eval "$__conda_setup"
else
    if [ -f "/opt/miniconda3/etc/profile.d/conda.sh" ]; then
        . "/opt/miniconda3/etc/profile.d/conda.sh"
    else
        export PATH="/opt/miniconda3/bin:$PATH"
    fi
fi
unset __conda_setup
# <<< conda initialize <<<
# >>> activate wrds environment >>>
conda activate wrds
# <<< activate wrds environment <<<
- Clone the repository

      git clone https://github.com/caisikai/wrds
- Dump the `.parquet` file into `.bin` files

      # go into the repository
      cd wrds
      python scripts/dump_single/dump_single.py dump_all --csv_path /storage/wrds/crsp/sasdata/a_stock/msf.parquet --qlib_dir /storage/qlib/qlib_data/crsp/a_stock/msf_test --symbol_field_name permno --date_field_name date

  See more samples in here
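When several datasets need to be dumped, the command above can be assembled from Python. The sketch below only reuses the subcommand and flags shown in the example; `build_dump_command` is a hypothetical helper, and the symbol and date field names for datasets other than CRSP `msf` are not given here and would have to be looked up.

```python
import sys


def build_dump_command(parquet_path, qlib_dir, symbol_field, date_field):
    """Build the argument list for dump_single.py's dump_all subcommand.

    Only flags that appear in the README example are used.
    """
    return [
        sys.executable,
        "scripts/dump_single/dump_single.py",
        "dump_all",
        "--csv_path", str(parquet_path),
        "--qlib_dir", str(qlib_dir),
        "--symbol_field_name", symbol_field,
        "--date_field_name", date_field,
    ]


if __name__ == "__main__":
    cmd = build_dump_command(
        "/storage/wrds/crsp/sasdata/a_stock/msf.parquet",
        "/storage/qlib/qlib_data/crsp/a_stock/msf_test",
        symbol_field="permno",
        date_field="date",
    )
    print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, cwd="wrds", check=True)`, assuming the repository was cloned into `wrds` as in the step above.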
- Check the dump

      # go into the repository
      cd wrds
      python check_dump_single.py check_single --qlib_dir /storage/qlib/qlib_data/wrds/comp/d_global/currency/g_exrt_mth/ --check_symbol_num -1 --parquet_path /storage/wrds/comp/sasdata/d_global/currency/g_exrt_mth.parquet

  See more samples in here
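For a quick manual spot check of a single feature file, a `.bin` file can be read with the standard library alone. This sketch assumes the layout written by Qlib's stock `dump_bin.py` (little-endian float32 values, where the first value is the offset of the series in the calendar); whether `dump_single.py` keeps exactly this layout is an assumption, and `read_feature_bin` is a hypothetical helper, not part of the repository.

```python
import struct
from pathlib import Path


def read_feature_bin(path):
    """Return (start_index, values) from a Qlib-style feature .bin file.

    Assumed layout: little-endian float32; the first value is the calendar
    index of the first observation, the rest are the feature values.
    """
    raw = Path(path).read_bytes()
    count = len(raw) // 4  # number of complete float32 values
    if count == 0:
        raise ValueError(f"{path} contains no float32 values")
    floats = struct.unpack(f"<{count}f", raw[: count * 4])
    return int(floats[0]), list(floats[1:])
```

The returned values can then be compared by hand against the matching column of the source `.parquet` file for the same symbol, starting at the calendar position given by `start_index`.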
- Dataset demo
  - `data/wrds_handler.py` has a demo handler class that can provide a customized dataset.
  - `main.ipynb` provides several dataset demos that prepare pandas DataFrames for downstream tasks.
- The current code does not support Compustat Global price data: the dataset has about 0.2 billion rows, and loading it consumes too much memory.
- Fuse the price data and fundamental data
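To put the memory problem with Compustat Global price data in numbers, a rough lower bound is rows × columns × bytes per value. The sketch below is a back-of-the-envelope estimate only: `estimate_gib` is a hypothetical helper, float64 storage is assumed, and the column count in the example is a placeholder, since the real number of columns is not stated here.

```python
def estimate_gib(rows, columns, bytes_per_value=8):
    """Estimate the in-memory size in GiB of a dense numeric table."""
    return rows * columns * bytes_per_value / 2**30


if __name__ == "__main__":
    rows = 200_000_000  # about 0.2 billion rows
    # One float64 column alone needs about 1.5 GiB.
    print(f"1 column:   {estimate_gib(rows, 1):.1f} GiB")
    # Hypothetical width of 20 numeric columns (placeholder, not from the data).
    print(f"20 columns: {estimate_gib(rows, 20):.1f} GiB")
```

Intermediate copies made during grouping and sorting come on top of this figure, so processing the file in batches or per symbol is likely needed before this dataset can be dumped.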