This repository contains code for training deep Gaussian processes (GPs) for VaR prediction conditioned on LLM embeddings, e.g., LLaMa-3 finetuned with LoRA/QLoRA. Finetuning data is retrieved via the newsdata API. Combining finetuned LLMs with deep GPs is a good choice when:
(1) There is a reasonably large and up-to-date corpus for training and for generating RAG embeddings;
(2) The number of assets under consideration is relatively small, since exact GP inference scales as $\mathcal{O}(n^3)$ in the number of training points $n$;
(3) Robust uncertainty estimates are crucial to the success of the application (e.g., risk estimation with VaR).
Value at Risk (VaR) is a statistical measure that estimates the maximum potential loss an investment portfolio might experience within a defined timeframe and at a given confidence level, assuming stationary market conditions. VaR can be optimized for a portfolio with weights $w \in \mathbb{R}^d$ and assets $r_1, \dots, r_d$ represented as deep Gaussian processes:

$$r_i(x) \sim \mathcal{GP}\bigl(0,\; k_i(x, x')\bigr),$$
where the kernel is parameterized by a neural network $g_\theta$ [1],

$$k(x, x') = k_{\text{base}}\bigl(g_\theta(x),\; g_\theta(x')\bigr),$$

and the base kernel $k_{\text{base}}$ is, e.g., an RBF kernel:

$$k_{\text{base}}(z, z') = \sigma_f^2 \exp\!\left(-\frac{\lVert z - z' \rVert^2}{2\ell^2}\right).$$
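To make the deep-kernel construction concrete, here is a minimal NumPy sketch; the two-layer `feature_map` standing in for $g_\theta$ and all parameter shapes are hypothetical toy choices, not the network used in this repository.

```python
import numpy as np

def feature_map(X, W1, b1, W2, b2):
    """Toy two-layer network g_theta mapping inputs to kernel features."""
    H = np.tanh(X @ W1 + b1)
    return np.tanh(H @ W2 + b2)

def rbf_kernel(Z1, Z2, lengthscale=1.0, variance=1.0):
    """RBF base kernel evaluated on the learned features."""
    sq = ((Z1[:, None, :] - Z2[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * sq / lengthscale**2)

def deep_kernel(X1, X2, params):
    """Deep kernel: RBF applied to neural-network features of the inputs."""
    Z1 = feature_map(X1, *params)
    Z2 = feature_map(X2, *params)
    return rbf_kernel(Z1, Z2)

rng = np.random.default_rng(0)
params = (rng.normal(size=(4, 8)), np.zeros(8),
          rng.normal(size=(8, 3)), np.zeros(3))
X = rng.normal(size=(5, 4))
K = deep_kernel(X, X, params)
print(K.shape)  # (5, 5)
```

In practice the network weights are trained jointly with the kernel hyperparameters by maximizing the GP marginal likelihood.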
Since GPs produce probabilistic outputs, the portfolio return $r_p = w^\top r$ has a Gaussian posterior with mean $\mu_p$ and variance $\sigma_p^2$, so we can evaluate the VaR of a trained GP at a user-specified risk level $\alpha \in (0, 1)$:

$$\mathrm{VaR}_\alpha = -\bigl(\mu_p + \sigma_p\, \Phi^{-1}(1 - \alpha)\bigr),$$

where $\Phi^{-1}$ is the inverse CDF of the standard normal distribution.
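As a concrete illustration, the closed-form VaR of a Gaussian posterior can be computed as follows; `gaussian_var` and the portfolio numbers are hypothetical, and the per-asset posteriors are assumed independent here for simplicity (the trained deep GP would supply the actual posterior moments).

```python
import numpy as np
from scipy.stats import norm

def gaussian_var(mu_p, sigma_p, alpha=0.95):
    """Parametric VaR of a Gaussian portfolio-return posterior.

    mu_p, sigma_p: posterior mean and std of the portfolio return
    alpha: confidence level, e.g. 0.95
    """
    return -(mu_p + sigma_p * norm.ppf(1.0 - alpha))

# Illustrative portfolio posterior (assumed independent assets).
w = np.array([0.6, 0.4])          # portfolio weights
mu = np.array([0.01, 0.02])       # per-asset posterior means
cov = np.diag([0.03, 0.05]) ** 2  # per-asset posterior covariance
mu_p = w @ mu
sigma_p = np.sqrt(w @ cov @ w)
print(gaussian_var(mu_p, sigma_p, alpha=0.95))
```

Note the sign convention: VaR is reported as a positive loss, so a higher posterior mean return lowers it while higher posterior variance raises it.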
- Install the requirements:

  ```shell
  pip install -r requirements.txt
  ```
- Collect text from the newsdata API by running:

  ```shell
  python grab_news_text.py
  ```

  The retrieved text can be filtered by country, category, or language via the `params` argument.
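For reference, a sketch of the kind of filtered query `grab_news_text.py` issues; the endpoint and parameter names follow the public newsdata.io API documentation, `build_params` is a hypothetical helper, and `NEWSDATA_API_KEY` is a placeholder for your own key.

```python
# Endpoint per the public newsdata.io API docs (an assumption about
# what grab_news_text.py calls under the hood).
API_URL = "https://newsdata.io/api/1/news"

def build_params(api_key, country=None, category=None, language=None):
    """Assemble the query parameters used to filter retrieved articles."""
    params = {"apikey": api_key}
    if country:
        params["country"] = country
    if category:
        params["category"] = category
    if language:
        params["language"] = language
    return params

params = build_params("NEWSDATA_API_KEY", country="us",
                      category="business", language="en")
# Fetch with, e.g., requests.get(API_URL, params=params) and a real key.
print(params)
```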
- Preprocess and filter the raw text data:

  ```shell
  python fineweb-2-pipeline.py --dataset="path/to/your/dataset"
  ```
- Finetune with torchtune:

  ```shell
  tune run lora_finetune_single_device --config ./8B_lora_newstext.yaml
  ```
- Train the deep GP:

  ```shell
  python dgp.py
  ```
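The predictive algebra behind the final step can be sketched with a minimal single-layer exact GP in NumPy; `gp_posterior` and `rbf` are illustrative stand-ins, not the actual implementation in `dgp.py`, which stacks such layers into a deep GP with the learned kernel.

```python
import numpy as np

def rbf(A, B, ls=0.2):
    """Plain RBF kernel between two sets of inputs."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq / ls**2)

def gp_posterior(X_train, y_train, X_test, kernel, noise=1e-2):
    """Exact GP posterior mean and variance at the test inputs."""
    K = kernel(X_train, X_train) + noise * np.eye(len(X_train))
    Ks = kernel(X_train, X_test)
    Kss = kernel(X_test, X_test)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = np.diag(Kss) - (v ** 2).sum(0)
    return mean, var

# Fit a noisy sine and query two held-out points.
X = np.linspace(0.0, 1.0, 20)[:, None]
y = np.sin(2 * np.pi * X[:, 0])
Xs = np.array([[0.25], [0.75]])
mean, var = gp_posterior(X, y, Xs, rbf)
print(mean, var)
```

The posterior variance `var` is exactly the uncertainty estimate that feeds the VaR computation described above.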