Train deep Gaussian processes based on LoRA-finetuned LLMs.

LoRA-conditioned Deep Gaussian Processes

This repository contains code for training deep Gaussian processes (GPs) for VaR prediction, conditioned on LLM embeddings, e.g. from LLaMA-3 finetuned with LoRA/QLoRA. Finetuning data is retrieved using the newsdata API. Combining finetuned LLMs with deep GPs is a good choice when:

(1) There is a reasonably large and up-to-date corpus for training and for generating RAG embeddings;

(2) The number of assets under consideration is relatively small, since exact GP inference scales as $\mathcal{O}(n^3)$ in the number of training points $n$;

(3) Robust uncertainty estimates are crucial to the success of the application (e.g. risk estimation with VaR).

Background

Value at Risk (VaR) is a statistical measure that estimates the maximum potential loss an investment portfolio might experience within a defined timeframe and confidence level, assuming stationary market conditions. VaR can be optimized for a portfolio with weights $\mathbf{w} = (w_1, \ldots, w_n)$ and assets represented as deep Gaussian processes $f_1, \ldots, f_n$:

$$f(\mathbf{x}) = \sum_{i=1}^{n} w_i f_i(\mathbf{x}),$$

where each GP's kernel is parameterized by a neural network $g_{\theta}$ [1]

$$k_{\theta}(\mathbf{x}, \mathbf{x}') = k\big(g_{\theta}(\mathbf{x}),\, g_{\theta}(\mathbf{x}')\big),$$

and the base kernel function $k$ is, e.g., an RBF kernel

$$k(\mathbf{v}, \mathbf{v}') = \exp\left(-\frac{\lVert \mathbf{v} - \mathbf{v}' \rVert^2}{2\ell^2}\right).$$
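The deep-kernel construction above can be sketched in plain NumPy. This is an illustrative toy, not the training code in `dgp.py`: the two-layer `feature_map` stands in for $g_{\theta}$, and all names and dimensions here are made up for the example.

```python
import numpy as np

def feature_map(X, W1, b1, W2, b2):
    """Toy two-layer network g_theta mapping inputs to kernel features."""
    h = np.tanh(X @ W1 + b1)
    return np.tanh(h @ W2 + b2)

def rbf_kernel(V1, V2, lengthscale=1.0):
    """RBF base kernel k(v, v') = exp(-||v - v'||^2 / (2 l^2))."""
    sq_dists = ((V1[:, None, :] - V2[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * lengthscale**2))

def deep_kernel(X1, X2, params, lengthscale=1.0):
    """Deep kernel k_theta(x, x') = k(g_theta(x), g_theta(x'))."""
    return rbf_kernel(feature_map(X1, *params), feature_map(X2, *params), lengthscale)

rng = np.random.default_rng(0)
d, h, m = 4, 8, 3                          # input, hidden, and feature dims
params = (rng.normal(size=(d, h)), rng.normal(size=h),
          rng.normal(size=(h, m)), rng.normal(size=m))
X = rng.normal(size=(5, d))
K = deep_kernel(X, X, params)              # 5x5 Gram matrix
```

Because the RBF is applied to the network's outputs rather than the raw inputs, the kernel's notion of similarity is learned: training the network weights jointly with the GP (as in deep kernel learning) shapes the feature space in which distances are measured.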

Since GPs produce probabilistic outputs, we can evaluate the VaR from a trained GP and a user-specified risk level ($\alpha = 0$ is risk-free, $\alpha = 1$ is maximal risk) [2]

$$\mathrm{VaR}_{\alpha}(\mathbf{x}, \mathbf{z}) = \min\left\{\omega : P\big(f(\mathbf{x}, \mathbf{z}) \leq \omega\big) \geq \alpha\right\},$$

where $f(\mathbf{x},\mathbf{z})$ is a deep GP conditioned on a context embedding $\mathbf{z}$ generated by our finetuned LLaMA model. The VaR is the minimum upper bound $\omega$ such that $f(\mathbf{x},\mathbf{z})$ stays below $\omega$ with probability at least $\alpha$.
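Under this definition, $\mathrm{VaR}_{\alpha}$ is simply the $\alpha$-quantile of the GP's predictive distribution, so given posterior samples of $f(\mathbf{x},\mathbf{z})$ it can be estimated empirically. A minimal sketch (the posterior samples below are synthetic Gaussian draws standing in for a trained GP's output):

```python
import numpy as np

def value_at_risk(samples, alpha):
    """Empirical VaR: the smallest omega with P(f <= omega) >= alpha,
    i.e. the alpha-quantile of the posterior samples."""
    return np.quantile(samples, alpha)

# Synthetic stand-in for posterior samples of f(x, z) from a trained GP.
rng = np.random.default_rng(0)
samples = rng.normal(loc=0.01, scale=0.05, size=100_000)

var_95 = value_at_risk(samples, 0.95)  # approx. mu + 1.645 * sigma for Gaussian samples
```

For an exactly Gaussian predictive $f \sim \mathcal{N}(\mu, \sigma^2)$ the quantile has the closed form $\mu + \sigma\,\Phi^{-1}(\alpha)$; the sample-based estimate above also covers the non-Gaussian marginals that deep GPs can produce.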

How to Use This Repository

  1. First install the requirements:

    pip install -r requirements.txt
    
  2. Collect text from the newsdata API by running

    python grab_news_text.py
    

    The text retrieved can be filtered by country, category, or language with the params argument.

  3. Preprocess and filter raw text data using

    python fineweb-2-pipeline.py --dataset="path/to/your/dataset"
    
  4. Finetune with

    tune run lora_finetune_single_device --config ./8B_lora_newstext.yaml
    
  5. Train deep GP with

    python dgp.py
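The filtering in step 2 can be expressed as a small dictionary of query parameters. The field names below mirror the newsdata API's documented query parameters, but the exact names accepted by `grab_news_text.py`, the placeholder API key, and the endpoint URL are assumptions for illustration; check the script itself for the real interface.

```python
from urllib.parse import urlencode

# Illustrative filter settings for the `params` argument (assumed field names).
params = {
    "apikey": "YOUR_NEWSDATA_API_KEY",  # placeholder, not a real key
    "country": "us,gb",                 # comma-separated country codes
    "category": "business",             # e.g. business, technology, politics
    "language": "en",
}

def build_query(base_url, params):
    """Assemble the GET query string without issuing a request."""
    return f"{base_url}?{urlencode(params)}"

url = build_query("https://newsdata.io/api/1/news", params)
```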
    
