Rewrite index page
algal committed Jan 27, 2025
1 parent 15a5fa8 commit 0b3911c
Showing 3 changed files with 260 additions and 256 deletions.
242 changes: 140 additions & 102 deletions README.md
@@ -23,10 +23,20 @@ $ pip install fasttransform
### Transform

Transform is a class that lets you create reusable data transformations.
It behaves like a function, you can call it to `encode` your data. In
addition it has an optional `decode` method that will reverse the
function. And an optional `setup` method that can initialize some inner
state.
You initialize a Transform by passing in or decorating a raw function.
The Transform then provides an enhanced version of that function via
`Transform.encodes`, which can be used in your data pipeline.

It provides various conveniences:

- **Reversibility**. You can collect the raw function and its inverse
into one transform object.
- **Customized initialization**. You can customize the exact behavior of
a transform function on initialization.
- **Type-based multiple dispatch**. Transforms can specialize their
behavior based on the runtime types of their arguments.
- **Type conversion/preservation**. Transforms help you maintain desired
return types.

The simplest way to create a Transform is by decorating a function:

@@ -36,7 +46,7 @@ from fasttransform import Transform, Pipeline

``` python
@Transform
def add_one(x: int):
def add_one(x):
    return x + 1

# Usage
@@ -45,31 +55,12 @@ add_one(2)

3

Transforms are **flexible**. You can specify multiple transforms with
different type annotations and it will automatically pick up the correct
one.

``` python
def inc1(x:int): return x+1
def inc2(x:str): return x+"a"

t = Transform(enc=(inc1,inc2))

t(5), t('b')
```
### Reversibility

(6, 'ba')

If an input type does not match any of the type annotations then the
original input is returned.

``` python
add_one(2.0)
```

2.0

Transforms are **reversible**, if you provide a `decode` function.
To make a transform reversible, you provide the raw function and its
inverse. This is useful in data pipelines where, for instance, you might
want to normalize and then de-normalize numerical values, or encode to
category indexes and then decode back to categories.
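
The category round trip mentioned above can be sketched in plain Python, without fasttransform. This is only an illustration of what an encode/decode pair looks like; `vocab`, `encode_category`, and `decode_category` are hypothetical names, not part of the library.

```python
# Hypothetical vocabulary; any ordered list of unique categories would work.
vocab = ["cat", "dog", "bird"]
index_of = {c: i for i, c in enumerate(vocab)}

def encode_category(c):
    # category -> integer index
    return index_of[c]

def decode_category(i):
    # integer index -> category (the inverse of encode_category)
    return vocab[i]

# Encoding and then decoding returns the original category.
assert decode_category(encode_category("dog")) == "dog"
```

A pair of this shape is what the text means by a raw function and its inverse.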

``` python
def enc(x): return x*2
@@ -82,148 +73,195 @@ t(2), t.decode(2), t.decode(t(2))

(4, 1, 2)

Transforms can be **stateful**, you can initialize them with the `setup`
method. This may be useful when you want to set scaling parameters based
on your training split in your machine learning pipeline.
### Customized initialization

You can customize an individual Transform instance at initialization
time, so that it can depend on aggregate properties of the data set.

Here we define a z-score normalization Transform by defining `encodes`
and `decodes` methods directly:

``` python
import statistics

class NormalizeMean(Transform):
    def setups(self, items):
        self.mean = sum(items) / len(items)
        self.mean = statistics.mean(items)
        self.std = statistics.stdev(items)

    def encodes(self, x):
        return x - self.mean
        return (x - self.mean) / self.std

    def decodes(self, x):
        return x + self.mean
        return x * self.std + self.mean

normalize = NormalizeMean()
normalize.setup([1, 2, 3, 4, 5])
normalize.mean
```

3.0
3

``` python
normalize(3.0)
```
### Type-based multiple dispatch

0.0
Instead of providing one raw function, you can provide multiple raw
functions which differ in their parameter types. Transform will use
type-based dispatch to automatically execute the correct function.

Transforms are **extensible**, which may be useful when you want to
create one Transform that can handle different data types.
This is handy when your inputs come in different types (e.g., different
image formats, different numerical types).
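
For readers new to type-based dispatch, the idea can be sketched with the standard library's `functools.singledispatch`. This is an analogy only and is not how fasttransform implements dispatch; the function name `inc` is an assumption made for illustration.

```python
from functools import singledispatch

@singledispatch
def inc(x):
    # Fallback for types with no registered implementation:
    # return the input unchanged.
    return x

@inc.register
def _(x: int):
    return x + 1

@inc.register
def _(x: str):
    return x + "a"

# int and str each reach their own implementation; float falls through.
print(inc(5), inc("b"), inc(2.0))  # 6 ba 2.0
```

The implementation is chosen from the runtime type of the argument, and unregistered types reach the fallback.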

``` python
@NormalizeMean
def encodes(self, x:float): return x + self.mean + 5
def inc1(x:int): return x+1
def inc2(x:str): return x+"a"

@NormalizeMean
def decodes(self, x:float): return x + self.mean + 5
t = Transform(enc=(inc1,inc2))

normalize(2.0)
t(5), t('b')
```

10.0

Transforms try to be **type preserving** in the following order:

1. your function’s return type annotation
2. your function’s actual input type, if it was a subtype of the return
value
3. if None is the return type annotation then no conversion will be
done
(6, 'ba')

Let’s illustrate this with an example of a custom `float` subtype:
If an input type does not match any of the type annotations then the
original input is returned.

``` python
class FS(float):
    def __repr__(self): return f'FS({float(self)})'
add_one(2.0)
```

By default multiplying such a subtype with a regular `float` returns a
`float`.
3.0

``` python
FS(5.0) * 5.0
normalize(3.0)
```

25.0
0.0

### Type conversion/preservation

You initialize a Transform by passing in or decorating a raw function.

A Transform’s `encodes` or `decodes` will note the return type of its raw
function, which may be defined explicitly or implicitly, and enhance
type-handling behavior in three ways:

1. **Guaranteed return type**. It will always return the return type of
the raw function, promoting values if necessary.

2. **Type Preservation**. It will return the runtime type of its
argument, whenever that is a subtype of the return type.

3. **Opt-out conversion**. If you explicitly mark the raw function’s
return type as `None`, then it will not perform any type conversion
or preservation.

Examples help make this clear:

However, in Transform you can change this behavior with type
annotations.
#### Guaranteed return type

Illustration of case 1:
Say you define `FS`, a subclass of `float`. The usual Python type
promotion behavior means that an `FS` times a `float` is still a
`float`:

``` python
def enc(x)->FS: return x*2
t = Transform(enc)
t(1)
class FS(float):
    def __repr__(self): return f'FS({float(self)})'

f1 = float(1)
FS2 = FS(2)

val = f1 * FS2
type(val) # => float
```

FS(2.0)
float

Illustration of case 2:
With Transform, you can define a new multiplication operation which will
be guaranteed to return a `FS`, because Transform reads the required raw
function’s annotated return type:

``` python
def enc(x): return x*2
t = Transform(enc)
t(FS(1))
def double_FS(x)->FS: return FS(2)*x
t = Transform(double_FS)
val = t(1)
assert isinstance(val,FS)
val
```

FS(2.0)

Note that in the case below, where the input is a `float` and the return
type is `FS` there’s no conversion. The reason is: we can’t make sure
some special information about `FS` is lost when converting to its
parent class `float`.
#### Type preservation

Let us say that we define a transform *without* any return type
annotation, so that the raw function is defined only by the behavior of
multiplying its argument by the float 2.0.

Multiplying the subtype `FS` with the float value 2 would normally
return a `float`. However, Transform’s `encodes` will *preserve the
runtime type of its argument*, so that it returns `FS`:
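
To make the idea concrete, here is a minimal plain-Python sketch of type preservation. The helper `preserve_type` is hypothetical and is not fasttransform's implementation; it only illustrates re-wrapping a result in the argument's type when that type is a subclass of the result's type.

```python
class FS(float):
    def __repr__(self): return f'FS({float(self)})'

def preserve_type(result, original):
    # Hypothetical helper: if the argument's type is a strict subclass of
    # the result's type, convert the result back to the argument's type.
    if type(result) is not type(original) and isinstance(original, type(result)):
        return type(original)(result)
    return result

fs1 = FS(1)
raw = fs1 * 2.0                  # plain Python arithmetic yields a float
kept = preserve_type(raw, fs1)   # re-wrapped as FS

assert type(raw) is float
assert type(kept) is FS
```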

``` python
def enc(x): return FS(x*2)
t = Transform(enc)
t(1.0)
def double(x): return x*2.0 # no type annotation
t = Transform(double)
fs1 = FS(1)
val = t(fs1)
assert isinstance(val,FS)
val # => FS(2), an FS value of 2
```

FS(2.0)

Illustration of case 3:
#### Opt-out conversion

Sometimes you don’t want Transform to do any type-based logic. You can
opt-out of this system by declaring that your raw function’s return type
is `None`:

``` python
def enc(x)->None: return x*2
t = Transform(enc)
t(FS(1))
def double_none(x) -> None: return x*2.0 # "None" return type means "no conversion"
t = Transform(double_none)
fs1 = FS(1)
val = t(fs1)
assert isinstance(val,float)
val # => 2.0, a float of 2, because of fallback to standard Python type logic
```

2.0

In the last case we see a `float` because a multiplication of `FS` with a
`float` returns a `float` and no additional type conversion is done.

### Pipelines

Transforms can be combined into larger **Pipelines**:

``` python
p = Pipeline((t, normalize))
def double(x): return x*2.0
def halve(x): return x/2.0
dt = Transform(double,halve)

p(5) # 5 * 2 - 3
```
class NormalizeMean(Transform):
    def setups(self, items):
        self.mean = statistics.mean(items)
        self.std = statistics.stdev(items)

    def encodes(self, x):
        return (x - self.mean) / self.std

    def decodes(self, x):
        return x * self.std + self.mean

7.0

``` python
p.decode(7) # (7 + 3) / 2
p = Pipeline((dt, normalize))

v = p(5)
v
```

10.0
4.427188724235731

If you’re wondering why the types are changing from `int` to `float` in this
case:
``` python
p.decode(v)
```

`self.mean` in the `NormalizeMean` transform is a `float`. So the
automatic type conversion does not trigger here, as `float` is not a
subtype of `int`. And that’s probably a good thing, because otherwise we
might lose some information here whenever `self.mean` has some decimal
value.
5.0
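
The round trip above can be reproduced in plain Python. This sketch assumes the pipeline applies each stage's encode in order and the decodes in reverse order, which is consistent with the outputs shown; `stages`, `run_encode`, and `run_decode` are hypothetical names and not part of fasttransform.

```python
import math
import statistics

data = [1, 2, 3, 4, 5]
mean, std = statistics.mean(data), statistics.stdev(data)

# Each stage is an (encode, decode) pair of plain functions.
stages = [
    (lambda x: x * 2.0, lambda x: x / 2.0),                  # double / halve
    (lambda x: (x - mean) / std, lambda x: x * std + mean),  # z-score / inverse
]

def run_encode(x):
    # Apply every encode in order.
    for enc, _ in stages:
        x = enc(x)
    return x

def run_decode(x):
    # Apply every decode in reverse order to undo the encodes.
    for _, dec in reversed(stages):
        x = dec(x)
    return x

v = run_encode(5)
assert math.isclose(run_decode(v), 5.0)
```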

### Documentation
