Commit
worked on paper references
LaurenzBeck committed Oct 8, 2024
1 parent f1eec5b commit 5f3b426
Showing 4 changed files with 8 additions and 9 deletions.
Binary file added docs/jats/data_drifts_by_topology_changes.png
Binary file added docs/jats/sampling_tree.png
11 changes: 5 additions & 6 deletions docs/paper.bib
@@ -155,12 +155,11 @@ @article{paszke_pytorch_nodate
file = {Paszke et al. - PyTorch An Imperative Style, High-Performance Dee.pdf:C\:\\Users\\Hundgeburth\\Zotero\\storage\\XZP46DKD\\Paszke et al. - PyTorch An Imperative Style, High-Performance Dee.pdf:application/pdf},
}

-@misc{maintainers_torchvision_2016,
-title = {{TorchVision}: {PyTorch}'s {Computer} {Vision} library},
-url = {https://github.com/pytorch/vision},
-author = {maintainers, TorchVision and {contributors}},
-month = nov,
-year = {2016},
+@software{torchvision2016,
+title = {{{TorchVision}}: {{PyTorch}}'s {{Computer Vision}} Library},
+author = {TorchVision maintainers and contributors},
+date = {2016-11},
+url = {https://github.com/pytorch/vision}
}

@misc{c0fec0de_anytree_2016,
6 changes: 3 additions & 3 deletions docs/paper.md
@@ -3,9 +3,9 @@ title: 'StreamGen: a Python framework for generating streams of labeled data'
tags:
- Python
- Data Generation
- Synthetic Data
- Data Streams
- Continual Learning
- Data Structures
- Function Composition
authors:
- name: Laurenz A. Farthofer
@@ -20,13 +20,13 @@ date: 19 August 2024
bibliography: paper.bib
---

-![A tree of sampling functions and transformations as a new data structure and framework for synthetic data generation. Samples are generated by traversing the tree from the root to the leaves. Each path through the tree represents its own class-conditional distribution. Each branching point represents a categorical distribution which determines the path to take for a sample during the tree traversal. By changing the parameters of the transformations over time, such trees can represent evolving distribution suitable to generate data streams (see \autoref{fig:parameter_schedule}).\label{fig:sampling_tree}](images/sampling_tree.png){ width=95% }
+![A tree of sampling functions and transformations as a new data structure and framework for synthetic data generation. Samples are generated by traversing the tree from the root to the leaves. Each path through the tree represents its own class-conditional distribution. The branching points represent categorical distributions which determine the path to take for a sample during the tree traversal. By changing the parameters of the transformations over time, such trees can represent evolving distribution suitable to generate data streams (see \autoref{fig:parameter_schedule}).\label{fig:sampling_tree}](images/sampling_tree.png){ width=95% }

# Summary

StreamGen is a framework for generating streams of labeled, synthetic data from trees composed of sampling functions and transformation monoids (see \autoref{fig:sampling_tree}).

-Due to the expensive nature of the labelling process, researchers and machine learning practitioners often rely on existing datasets and stochastic data augmentation pipelines like `torchvision.transforms.Compose` objects in torchvision [@maintainers_torchvision_2016]. While such methods and datasets are enough to study learning from static domains, emerging research fields like continual learning study learning on long streams of data, representing evolving experiences. StreamGen addresses this need by giving researchers a tool to model time-dependent, diverse class-conditional distributions.
+Due to the expensive nature of the labelling process, researchers and machine learning practitioners often rely on existing datasets and stochastic data augmentation pipelines like `torchvision.transforms.Compose` objects [TorchVision @torchvision2016]. While such methods and datasets are enough to study learning from static domains, emerging research fields like continual learning study learning on long streams of data, representing evolving experiences. StreamGen addresses this need by giving researchers a tool to model time-dependent, diverse class-conditional distributions.

Such distributions can be represented through the use of a [tree](https://en.wikipedia.org/wiki/Tree_(data_structure)) data structure (or other more general linked structures like directed acyclic graphs (DAG)) to store sampling functions and transformations. Samples are generated by traversing the tree from the root to the leaves. Each branching point represents a categorical distribution which determines the path to take for a sample during the tree traversal.
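
The traversal described in the paragraph above can be illustrated with a minimal sketch. It is not StreamGen's actual API: the `Node` class, the `sample` function, and the example tree are hypothetical names chosen for illustration, and only the Python standard library is assumed.

```python
from __future__ import annotations

import random
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Node:
    """One node of a hypothetical sampling tree (illustrative, not StreamGen's API)."""

    fn: Callable  # sampling function at the root, transformation everywhere else
    children: list[Node] = field(default_factory=list)
    probs: Optional[list[float]] = None  # categorical weights over the children
    label: Optional[str] = None  # class label, only set on leaves


def sample(root: Node, rng: random.Random):
    """Traverse the tree from the root to a leaf and return (sample, label)."""
    value = root.fn(rng)  # the root draws the initial sample
    node = root
    while node.children:
        # Branching point: choose one child from a categorical distribution.
        # With probs=None, random.choices falls back to uniform weights.
        node = rng.choices(node.children, weights=node.probs, k=1)[0]
        value = node.fn(value)  # apply the transformation found on this path
    return value, node.label


# Example tree: one Gaussian sampler feeding two class-specific transformations.
tree = Node(
    fn=lambda rng: rng.gauss(0.0, 1.0),
    children=[
        Node(fn=lambda x: x + 10.0, label="shifted"),
        Node(fn=lambda x: abs(x), label="rectified"),
    ],
    probs=[0.7, 0.3],
)

rng = random.Random(0)
stream = [sample(tree, rng) for _ in range(5)]
```

Each root-to-leaf path in this sketch yields its own class-conditional distribution, and `probs` plays the role of the categorical distribution at the branching point. Making the transformation parameters depend on a time step would be one way to model an evolving stream, but that is left out here.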
