add support for loops in CF graphs
amakelov committed Jul 5, 2024
1 parent f59aa99 commit a2c3b0f
Showing 40 changed files with 3,482 additions and 3,128 deletions.
8 changes: 4 additions & 4 deletions docs/docs/blog/cf.md
@@ -1,9 +1,9 @@
# Tidy computations
In data-driven fields like machine learning, a lot of effort is spent organizing
computational data so that it can be analyzed and manipulated. This blog
post introduces the `ComputationFrame` (CF) data structure, which provides a
natural and simple grammar of operations to automate this. It is implemented as
[part of](https://amakelov.github.io/mandala/03_cf/)
*computational data* — results of running programs — so that it can be analyzed
and manipulated. This blog post introduces the `ComputationFrame` (CF) data
structure, which provides a natural and simple grammar of operations to automate
this. It is implemented as [part of](https://amakelov.github.io/mandala/03_cf/)
[mandala](https://github.com/amakelov/mandala), a Python library for experiment
tracking and incremental computation.

20 changes: 10 additions & 10 deletions docs/docs/topics/02_retracing.md
@@ -82,7 +82,7 @@ with storage:
Loading data
Training model
Getting accuracy
AtomRef(1.0, hid='d16...', cid='b67...')
AtomRef(0.99, hid='d16...', cid='12a...')


## Retracing your steps with memoization
@@ -102,8 +102,8 @@ with storage:
```

AtomRef(hid='d0f...', cid='908...', in_memory=False) AtomRef(hid='f1a...', cid='69f...', in_memory=False)
AtomRef(hid='caf...', cid='5b8...', in_memory=False)
AtomRef(hid='d16...', cid='b67...', in_memory=False)
AtomRef(hid='caf...', cid='bf2...', in_memory=False)
AtomRef(hid='d16...', cid='12a...', in_memory=False)


This puts all the `Ref`s along the way in your local variables (as if you've
@@ -118,7 +118,7 @@ storage.unwrap(acc)



1.0
0.99
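The retracing pattern these hunks exercise looks roughly like the minimal sketch below. It uses made-up ops (`inc`, `double`) in place of the collapsed `load_data`/`train_model`-style pipeline, and it assumes the `from mandala.imports import Storage, op` import path: re-entering a `with storage:` block replays already-memoized calls as `AtomRef`s without executing the op bodies, and `storage.unwrap` fetches the concrete value behind a ref.

```python
# Minimal sketch of the retracing pattern; `inc` and `double` are hypothetical
# stand-ins for the pipeline ops whose definitions are collapsed in this diff.
from mandala.imports import Storage, op  # assumed import path

storage = Storage()  # assumed: defaults to an in-memory storage

@op
def inc(x):
    print("computing inc")
    return x + 1

@op
def double(x):
    print("computing double")
    return 2 * x

with storage:             # first run: executes and memoizes both calls
    y = inc(20)
    z = double(y)

with storage:             # retrace: identical calls hit the memo, nothing prints
    y = inc(20)           # y is an AtomRef, not a plain int
    z = double(y)         # ops accept refs produced by other ops

print(storage.unwrap(z))  # 42 -- the stored value behind the ref
```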



@@ -140,14 +140,14 @@ with storage:
print(acc)
```

AtomRef(hid='d16...', cid='b67...', in_memory=False)
AtomRef(hid='d16...', cid='12a...', in_memory=False)
Training model
Getting accuracy
AtomRef(1.0, hid='6fd...', cid='b67...')
AtomRef(0.99, hid='6fd...', cid='12a...')
Loading data
Training model
Getting accuracy
AtomRef(0.84, hid='158...', cid='6c4...')
AtomRef(0.86, hid='158...', cid='70e...')
Training model
Getting accuracy
AtomRef(0.91, hid='214...', cid='97b...')
@@ -178,8 +178,8 @@ with storage:
print(n_class, n_estimators, storage.unwrap(acc))
```

2 5 1.0
2 10 1.0
2 5 0.99
2 10 0.99
5 10 0.91
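The preceding hunks illustrate the incremental behaviour: parameter combinations seen in earlier runs come back as memoized refs, and only genuinely new combinations execute the op bodies (hence the extra `Training model` / `Getting accuracy` lines). Continuing the earlier sketch with its hypothetical `inc`/`double` ops, the same effect looks like this:

```python
# Continuing the sketch above: sweep over inputs, one of which (20) was already
# computed. Only the new inputs execute the op bodies and print from inside
# them; unwrap turns the refs back into plain values for display.
with storage:
    for x in (20, 21, 22):
        y = inc(x)
        z = double(y)
        print(x, storage.unwrap(z))
```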


@@ -199,5 +199,5 @@ with storage:
print(storage.unwrap(acc), storage.unwrap(model))
```

0.84 RandomForestClassifier(max_depth=2, n_estimators=5)
0.86 RandomForestClassifier(max_depth=2, n_estimators=5)

45 changes: 29 additions & 16 deletions docs/docs/topics/03_cf.md
@@ -259,14 +259,17 @@ print(cf.df(values='refs').to_markdown())
```

Extracting tuples from the computation graph:
var_0@output_0, var_1@output_1 = train_model(y_train=y_train, n_estimators=n_estimators, X_train=X_train)
Joining on columns: {'y_train', 'X_train', 'n_estimators', 'train_model'}
| | X_train | n_estimators | y_train | train_model | var_1 | var_0 |
var_0@output_0, var_1@output_1 = train_model(X_train=X_train, n_estimators=n_estimators, y_train=y_train)
Found variables {'var_0', 'var_1'} containing final elements
For variable var_1, found dependencies in nodes Index(['X_train', 'n_estimators', 'var_1', 'y_train', 'train_model'], dtype='object')
For variable var_0, found dependencies in nodes Index(['X_train', 'n_estimators', 'var_0', 'y_train', 'train_model'], dtype='object')
Merging history for the variable var_0 on columns: {'y_train', 'X_train', 'train_model', 'n_estimators'}
| | y_train | n_estimators | X_train | train_model | var_1 | var_0 |
|---:|:-----------------------------------------------------|:-----------------------------------------------------|:-----------------------------------------------------|:----------------------------------------------|:-----------------------------------------------------|:-----------------------------------------------------|
| 0 | AtomRef(hid='efa...', cid='a6d...', in_memory=False) | AtomRef(hid='98c...', cid='29d...', in_memory=False) | AtomRef(hid='faf...', cid='83f...', in_memory=False) | Call(train_model, cid='c4f...', hid='5f7...') | AtomRef(hid='760...', cid='46b...', in_memory=False) | AtomRef(hid='b25...', cid='462...', in_memory=False) |
| 1 | AtomRef(hid='efa...', cid='a6d...', in_memory=False) | AtomRef(hid='9fd...', cid='4ac...', in_memory=False) | AtomRef(hid='faf...', cid='83f...', in_memory=False) | Call(train_model, cid='5af...', hid='514...') | AtomRef(hid='784...', cid='238...', in_memory=False) | AtomRef(hid='331...', cid='e64...', in_memory=False) |
| 2 | AtomRef(hid='efa...', cid='a6d...', in_memory=False) | AtomRef(hid='235...', cid='c04...', in_memory=False) | AtomRef(hid='faf...', cid='83f...', in_memory=False) | Call(train_model, cid='204...', hid='c55...') | AtomRef(hid='5b7...', cid='f0a...', in_memory=False) | AtomRef(hid='208...', cid='c75...', in_memory=False) |
| 3 | AtomRef(hid='efa...', cid='a6d...', in_memory=False) | AtomRef(hid='120...', cid='9bc...', in_memory=False) | AtomRef(hid='faf...', cid='83f...', in_memory=False) | Call(train_model, cid='3be...', hid='e60...') | AtomRef(hid='646...', cid='acb...', in_memory=False) | AtomRef(hid='522...', cid='d5a...', in_memory=False) |
| 0 | AtomRef(hid='faf...', cid='83f...', in_memory=False) | AtomRef(hid='9fd...', cid='4ac...', in_memory=False) | AtomRef(hid='efa...', cid='a6d...', in_memory=False) | Call(train_model, cid='5af...', hid='514...') | AtomRef(hid='784...', cid='238...', in_memory=False) | AtomRef(hid='331...', cid='e64...', in_memory=False) |
| 1 | AtomRef(hid='faf...', cid='83f...', in_memory=False) | AtomRef(hid='98c...', cid='29d...', in_memory=False) | AtomRef(hid='efa...', cid='a6d...', in_memory=False) | Call(train_model, cid='c4f...', hid='5f7...') | AtomRef(hid='760...', cid='46b...', in_memory=False) | AtomRef(hid='b25...', cid='462...', in_memory=False) |
| 2 | AtomRef(hid='faf...', cid='83f...', in_memory=False) | AtomRef(hid='235...', cid='c04...', in_memory=False) | AtomRef(hid='efa...', cid='a6d...', in_memory=False) | Call(train_model, cid='204...', hid='c55...') | AtomRef(hid='5b7...', cid='f0a...', in_memory=False) | AtomRef(hid='208...', cid='c75...', in_memory=False) |
| 3 | AtomRef(hid='faf...', cid='83f...', in_memory=False) | AtomRef(hid='120...', cid='9bc...', in_memory=False) | AtomRef(hid='efa...', cid='a6d...', in_memory=False) | Call(train_model, cid='3be...', hid='e60...') | AtomRef(hid='646...', cid='acb...', in_memory=False) | AtomRef(hid='522...', cid='d5a...', in_memory=False) |
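As a usage note, `values='refs'` is what keeps the `AtomRef`/`Call` objects visible in the cells above; calling `cf.df()` with its defaults unwraps the value columns instead (the later example in this file shows that form). A small sketch, assuming the `cf` object built earlier in this file:

```python
# Two renderings of the same ComputationFrame (assumes the `cf` object from
# this section of 03_cf.md).
refs_df = cf.df(values='refs')  # cells hold Ref / Call objects, as shown above
vals_df = cf.df()               # value columns unwrapped (floats, models, ...);
                                # the Call columns still show call objects
print(vals_df.to_markdown())
```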


##
@@ -510,16 +513,26 @@ print(cf.df().drop(columns=['X_train', 'y_train']).to_markdown())

Extracting tuples from the computation graph:
X_train@output_0, y_train@output_2 = generate_dataset(random_seed=random_seed)
var_0@output_0, var_1@output_1 = train_model(y_train=y_train, n_estimators=n_estimators, X_train=X_train)
var_0@output_0, var_1@output_1 = train_model(X_train=X_train, n_estimators=n_estimators, y_train=y_train)
var_2@output_0 = eval_model(model=var_0)
Joining on columns: {'random_seed', 'y_train', 'X_train', 'generate_dataset', 'n_estimators', 'train_model'}
Joining on columns: {'random_seed', 'y_train', 'X_train', 'generate_dataset', 'var_0', 'n_estimators', 'train_model'}
| | n_estimators | random_seed | generate_dataset | train_model | var_1 | var_0 | eval_model | var_2 |
|---:|---------------:|--------------:|:---------------------------------------------------|:----------------------------------------------|--------:|:-----------------------------------------------------|:---------------------------------------------|--------:|
| 0 | 80 | 42 | Call(generate_dataset, cid='19a...', hid='c3f...') | Call(train_model, cid='3be...', hid='e60...') | 0.83 | RandomForestClassifier(max_depth=2, n_estimators=80) | Call(eval_model, cid='137...', hid='d32...') | 0.82 |
| 1 | 40 | 42 | Call(generate_dataset, cid='19a...', hid='c3f...') | Call(train_model, cid='5af...', hid='514...') | 0.82 | RandomForestClassifier(max_depth=2, n_estimators=40) | Call(eval_model, cid='38f...', hid='5d3...') | 0.81 |
| 2 | 20 | 42 | Call(generate_dataset, cid='19a...', hid='c3f...') | Call(train_model, cid='204...', hid='c55...') | 0.8 | RandomForestClassifier(max_depth=2, n_estimators=20) | | nan |
| 3 | 10 | 42 | Call(generate_dataset, cid='19a...', hid='c3f...') | Call(train_model, cid='c4f...', hid='5f7...') | 0.74 | RandomForestClassifier(max_depth=2, n_estimators=10) | | nan |
Found variables {'var_2', 'var_0', 'var_1'} containing final elements
For variable var_1, found dependencies in nodes Index(['X_train', 'n_estimators', 'var_1', 'y_train', 'random_seed',
'train_model', 'generate_dataset'],
dtype='object')
For variable var_0, found dependencies in nodes Index(['X_train', 'n_estimators', 'var_0', 'y_train', 'random_seed',
'train_model', 'generate_dataset'],
dtype='object')
For variable var_2, found dependencies in nodes Index(['var_2', 'X_train', 'n_estimators', 'y_train', 'random_seed', 'var_0',
'train_model', 'generate_dataset', 'eval_model'],
dtype='object')
Merging history for the variable var_0 on columns: {'X_train', 'train_model', 'n_estimators', 'y_train', 'random_seed', 'generate_dataset'}
Merging history for the variable var_2 on columns: {'X_train', 'train_model', 'n_estimators', 'y_train', 'random_seed', 'generate_dataset', 'var_0'}
| | random_seed | generate_dataset | n_estimators | train_model | var_1 | var_0 | eval_model | var_2 |
|---:|--------------:|:---------------------------------------------------|---------------:|:----------------------------------------------|--------:|:-----------------------------------------------------|:---------------------------------------------|--------:|
| 0 | 42 | Call(generate_dataset, cid='19a...', hid='c3f...') | 40 | Call(train_model, cid='5af...', hid='514...') | 0.82 | RandomForestClassifier(max_depth=2, n_estimators=40) | Call(eval_model, cid='38f...', hid='5d3...') | 0.81 |
| 1 | 42 | Call(generate_dataset, cid='19a...', hid='c3f...') | 10 | Call(train_model, cid='c4f...', hid='5f7...') | 0.74 | RandomForestClassifier(max_depth=2, n_estimators=10) | | nan |
| 2 | 42 | Call(generate_dataset, cid='19a...', hid='c3f...') | 20 | Call(train_model, cid='204...', hid='c55...') | 0.8 | RandomForestClassifier(max_depth=2, n_estimators=20) | | nan |
| 3 | 42 | Call(generate_dataset, cid='19a...', hid='c3f...') | 80 | Call(train_model, cid='3be...', hid='e60...') | 0.83 | RandomForestClassifier(max_depth=2, n_estimators=80) | Call(eval_model, cid='137...', hid='d32...') | 0.82 |
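Since the result of `cf.df()` appears to be an ordinary pandas `DataFrame` (it is printed via `.to_markdown()` and trimmed with `.drop` above), the partially-computed rows, where `eval_model` never ran and `var_2` is `nan`, can be filtered with standard pandas calls. A small sketch, assuming the `cf` object from this section:

```python
# Keep only the rows whose computation reached eval_model (non-NaN var_2).
df = cf.df().drop(columns=['X_train', 'y_train'])
evaluated = df.dropna(subset=['var_2'])
print(evaluated[['n_estimators', 'var_1', 'var_2']].to_markdown())
```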


Importantly, we see that some computations only partially follow the full