Commit

docs: add benchmarks

percevalw committed Feb 14, 2024
1 parent f52c932 commit e8c3a08
Showing 4 changed files with 437 additions and 2 deletions.
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -31,4 +31,4 @@ repos:
hooks:
- id: blacken-docs
additional_dependencies: [black==20.8b1]
-      exclude: notebooks/
+      exclude: ^(notebooks/|docs/benchmark)
31 changes: 30 additions & 1 deletion README.md
@@ -33,7 +33,24 @@ pip install foldedtensor
- C++ optimized code for fast data loading from Python lists and refolding
- Flexibility in data representation, making it easy to switch between different layouts when needed

-## Example
+## Examples

At its simplest, `foldedtensor` can be used to convert nested Python lists into a PyTorch tensor:

```python
from foldedtensor import as_folded_tensor

ft = as_folded_tensor(
[
[0, 1, 2],
[3],
],
)
# FoldedTensor([[0, 1, 2],
# [3, 0, 0]])
```

You can also specify names and flattened/unflattened dimensions at the time of creation:

```python
import torch
@@ -54,7 +71,11 @@ ft = as_folded_tensor(
print(ft)
# FoldedTensor([[1, 2, 3],
# [4, 3, 0]])
```

Once created, you can change the shape of the tensor by refolding it:

```python
# Refold on the lines and words dims (flatten the samples dim)
print(ft.refold(("lines", "words")))
# FoldedTensor([[1, 0],
@@ -67,7 +88,11 @@ print(ft.refold(("lines", "words")))
# Refold on the words dim only: flatten everything
print(ft.refold(("words",)))
# FoldedTensor([1, 2, 3, 4, 3])
```

The tensor can be further used with standard PyTorch operations:

```python
# Working with PyTorch operations
embedder = torch.nn.Embedding(10, 16)
embedding = embedder(ft.refold(("words",)))
@@ -79,6 +104,10 @@ print(refolded_embedding.shape)
# torch.Size([2, 5, 16]) # 2 samples, 5 words max, 16 dims
```

## Benchmarks

View the comparisons of `foldedtensor` against various alternatives here: [docs/benchmarks](https://github.com/aphp/foldedtensor/blob/main/docs/benchmark.md).

## Comparison with alternatives

Unlike other ragged or nested tensor implementations, a FoldedTensor does not enforce a specific structure on the nested data, and does not require padding all dimensions. This provides the user with greater flexibility when working with data that can be arranged in multiple ways depending on the data transformation. Moreover, the C++ optimization ensures high performance, making it ideal for handling deeply nested tensors efficiently.
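
For instance, where `torch.nested` must convert to a fully padded tensor before most standard operations, a FoldedTensor can stay flat along its ragged dimension and be refolded into a padded view only when needed. A minimal sketch of the difference (the two-level input and the dimension names are illustrative, following the examples above):

```python
import torch
from foldedtensor import as_folded_tensor

data = [[1, 2, 3], [4, 5]]

# torch.nested: standard ops generally require a padded conversion,
# which pads the ragged dimension
nt = torch.nested.nested_tensor([torch.tensor(sub) for sub in data])
padded = nt.to_padded_tensor(0)  # tensor([[1, 2, 3], [4, 5, 0]])

# foldedtensor: keep the data flat along the ragged "words" dimension,
# and ask for a padded view only when one is actually needed
ft = as_folded_tensor(
    data,
    data_dims=("words",),
    full_names=("samples", "words"),
    dtype=torch.long,
)
print(ft)                               # FoldedTensor([1, 2, 3, 4, 5])
print(ft.refold(("samples", "words")))  # padded 2 x 3 view
```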
159 changes: 159 additions & 0 deletions docs/benchmark.md
@@ -0,0 +1,159 @@

Benchmarks
----------

This file was generated from [`scripts/benchmark.py`](../scripts/benchmark.py).

It compares the performance of `foldedtensor` with various alternatives for padding
and working with nested lists and tensors.

Versions:
- `torch.__version__ == '2.0.1'`
- `foldedtensor.__version__ == '0.3.2'`
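
The helpers `make_nested_list` and `python_padding` used throughout come from `scripts/benchmark.py` and are not reproduced in this file. As a rough sketch, such helpers could look like the following (an illustration under that assumption, not the script's actual code):

```python
import random

import torch
import torch.nn.functional as F


def make_nested_list(*sizes, value):
    # Each size is either a fixed int length, or a (low, high) tuple from
    # which a length is drawn uniformly at random.
    size, *rest = sizes
    if isinstance(size, tuple):
        size = random.randint(*size)
    if not rest:
        return [value] * size
    return [make_nested_list(*rest, value=value) for _ in range(size)]


def python_padding(nested_list):
    # Recursively pad a nested list into a dense tensor, in pure Python.
    if not isinstance(nested_list[0], list):
        return torch.as_tensor(nested_list)
    tensors = [python_padding(sub) for sub in nested_list]
    max_sizes = [max(t.shape[d] for t in tensors) for d in range(tensors[0].dim())]
    return torch.stack(
        [
            # F.pad takes (left, right) pad amounts starting from the last dim
            F.pad(t, [p for d in reversed(range(t.dim())) for p in (0, max_sizes[d] - t.shape[d])])
            for t in tensors
        ]
    )
```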


## Case 1 (pad variable-length nested list)

The following 3-level nested list has a length of 32 at the first level, lengths between 50 and 100 at the second, and between 25 and 30 at the third.

Comparisons:

```python
nested_list = make_nested_list(32, (50, 100), (25, 30), value=1)

%timeit python_padding(nested_list)
# 100 loops, best of 5: 13.32 ms per loop


%timeit foldedtensor.as_folded_tensor(nested_list)
# 100 loops, best of 5: 0.63 ms per loop
```



## Case 2 (same-length nested lists)

```python
nested_list = make_nested_list(32, 100, 30, value=1)

%timeit torch.tensor(nested_list)
# 100 loops, best of 5: 6.42 ms per loop


%timeit torch.LongTensor(nested_list)
# 100 loops, best of 5: 2.64 ms per loop


%timeit python_padding(nested_list)
# 100 loops, best of 5: 15.92 ms per loop


%timeit torch.nested.nested_tensor([torch.LongTensor(sub) for sub in nested_list]).to_padded_tensor(0)
# 100 loops, best of 5: 2.88 ms per loop


%timeit foldedtensor.as_folded_tensor(nested_list)
# 100 loops, best of 5: 0.93 ms per loop


```


## Case 3 (simple list)

```python
simple_list = make_nested_list(10000, value=1)

%timeit torch.tensor(simple_list)
# 100 loops, best of 5: 0.63 ms per loop


%timeit torch.LongTensor(simple_list)
# 100 loops, best of 5: 0.26 ms per loop


%timeit python_padding(simple_list)
# 100 loops, best of 5: 0.27 ms per loop


%timeit foldedtensor.as_folded_tensor(simple_list)
# 100 loops, best of 5: 0.07 ms per loop


```


## Case 4 (same-length nested lists to flat tensor)

```python
nested_list = make_nested_list(32, 100, 30, value=1)

%timeit torch.tensor(nested_list).view(-1)
# 100 loops, best of 5: 6.42 ms per loop


%timeit torch.LongTensor(nested_list).view(-1)
# 100 loops, best of 5: 2.68 ms per loop


%timeit python_padding(nested_list).view(-1)
# 100 loops, best of 5: 15.92 ms per loop


%timeit foldedtensor.as_folded_tensor(nested_list).view(-1)
# 100 loops, best of 5: 0.96 ms per loop


%timeit foldedtensor.as_folded_tensor(nested_list, data_dims=(2,))
# 100 loops, best of 5: 0.92 ms per loop


```

## Case 5 (variable-length nested lists to padded embeddings)

Nested lists with different lengths (second-level lists have lengths between 50 and 150). We compare `foldedtensor` with `torch.nested`.

```python
nested_list = make_nested_list(32, (50, 150), 30, value=1)

# Padding with 0

%timeit torch.nested.nested_tensor([torch.LongTensor(sub) for sub in nested_list]).to_padded_tensor(0)
# 100 loops, best of 5: 3.05 ms per loop


%timeit foldedtensor.as_folded_tensor(nested_list).as_tensor()
# 100 loops, best of 5: 0.95 ms per loop


# Padding with 1

%timeit torch.nested.nested_tensor([torch.FloatTensor(sub) for sub in nested_list]).to_padded_tensor(1)
# 100 loops, best of 5: 3.59 ms per loop


%timeit x = foldedtensor.as_folded_tensor(nested_list); x.masked_fill_(x.mask, 1)
# 100 loops, best of 5: 1.29 ms per loop


```


## Case 6 (2D padding)

```python
nested_list = make_nested_list(160, (50, 150), value=1)

%timeit python_padding(nested_list)
# 100 loops, best of 5: 1.18 ms per loop


%timeit torch.nested.nested_tensor([torch.LongTensor(sub) for sub in nested_list]).to_padded_tensor(0)
# 100 loops, best of 5: 1.06 ms per loop


%timeit torch.nn.utils.rnn.pad_sequence([torch.LongTensor(sub) for sub in nested_list], batch_first=True, padding_value=0)
# 100 loops, best of 5: 0.76 ms per loop


%timeit foldedtensor.as_folded_tensor(nested_list)
# 100 loops, best of 5: 0.13 ms per loop


```