
Commit 145466c

AdamGlustein authored and wrieg123 committed
Rewrite documentation to be less focused on finance examples; fix a wide set of issues and missing parts in the docs (Point72#338)
Signed-off-by: Adam Glustein <[email protected]>
1 parent f653548 commit 145466c

File tree

9 files changed

+200
-232
lines changed

docs/wiki/api-references/csp.profiler-API.md

Lines changed: 1 addition & 0 deletions
@@ -2,6 +2,7 @@
22

33
Users can simply run graphs under a `Profiler()` context to extract profiling information.
44
The code snippet below runs a graph in profile mode and extracts the profiling data by calling `results()`.
5+
Note that profiling can also be done in real-time with live updating visuals: see the [how-to](Profile-CSP-Code#profiling-a-real-time-cspgraph) guide here.
56

67
```python
78
from csp import profiler

docs/wiki/concepts/CSP-Graph.md

Lines changed: 56 additions & 50 deletions
@@ -8,107 +8,113 @@
88

99
## Anatomy of a `csp.graph`
1010

11-
To reiterate, `csp.graph` methods are called in order to construct the graph and are only executed before the engine is run.
12-
`csp.graph` methods don't do anything special, they are essentially regular python methods, but they can be defined to accept inputs and generate outputs similar to `csp.nodes`.
13-
This is solely used for type checking.
11+
`csp.graph` methods are called in order to construct the graph and are only executed before the engine is run. A graph is a collection of nodes and adapters which can either be executed as an argument to `csp.run` or composed into a larger graph.
12+
The `csp.graph` decorator is only used for type validation and it is optional when creating a CSP program. A standard Python function without the decorator can also be passed as an argument to `csp.run` if type validation is not required.
1413
`csp.graph` methods can be created to encapsulate components of a graph, and can be called from other `csp.graph` methods in order to help facilitate graph building.
1514

1615
Simple example:
1716

1817
```python
1918
@csp.graph
20-
def calc_symbol_pnl(symbol: str, trades: ts[Trade]) -> ts[float]:
21-
    # sub-graph code needed to compute pnl for given symbol and symbol's trades
22-
    # sub-graph can subscribe to market data for the symbol as needed
23-
    ...
19+
def calc_user_time(session_data: ts[UserSession]) -> ts[float]:
20+
    # sub-graph code needed to compute the time a user spends on a website
21+
    session_time = session_data.logout_time - session_data.login_time
22+
    time_online = csp.stats.sum(session_time)
23+
    return time_online
2424

2525

2626
@csp.graph
27-
def calc_portfolio_pnl(symbols: [str]) -> ts[float]:
28-
    symbol_pnl = []
29-
    for symbol in symbols:
30-
        symbol_trades = trade_adapter.subscribe(symbol)
31-
        symbol_pnl.append(calc_symbol_pnl(symbol, symbol_trades))
27+
def calc_site_traffic(users: List[str]) -> ts[float]:
28+
    user_time = []
29+
    for user in users:
30+
        user_sessions = get_session(user)
31+
        user_time.append(calc_user_time(user_sessions))
3232

33-
    return csp.sum(symbol_pnl)
33+
    return csp.sum(user_time)
3434
```
3535

36-
In this simple example we have a `csp.graph` component `calc_symbol_pnl` which encapsulates computing pnl for a single symbol.
37-
`calc_portfolio_pnl` is a graph that computes portfolio level pnl, it invokes the symbol-level pnl calc for every symbol, then sums up the results for the portfolio level pnl.
36+
In this simple example we compute the total time all users spend on a website. We have a `csp.graph` subcomponent `calc_user_time` which computes the time a single user spends on the site throughout the run.
37+
Then, in `calc_site_traffic` we compute the total user traffic by creating the user-level subgraph for each account and aggregating the results.
3838

39-
## Graph Propagation and Single-dispatch
39+
## Graph Propagation and Single-Dispatch
4040

41-
The CSP graph propagation algorithm ensures that all nodes are executed *once* per engine cycle, and in the correct order.
42-
Correct order means, that all input dependencies of a given node are guaranteed to have been evaluated before a given node is executed.
43-
Take this graph for example:
41+
The CSP graph propagation algorithm ensures that all nodes are executed *after* any of their dependencies on a given engine cycle.
42+
43+
> \[!IMPORTANT\]
44+
> An *engine cycle* refers to a single execution of a CSP graph. There can be multiple engine cycles at the same *timestamp*; for example, a single data source may have two events both at `2020-01-01 00:00:00`. These events will be executed in two *cycles* that both occur at the same timestamp. Another case where multiple cycles can occur is [csp.feedback](Add-Cycles-in-Graphs).
45+
46+
For example, consider the graph below:
4447

4548
![359407953](https://github.com/Point72/csp/assets/3105306/d9416353-6755-4e37-8467-01da516499cf)
4649

47-
On a given cycle lets say the `bid` input ticks.
48-
The CSP engine will ensure that **`mid`** is executed, followed by **`spread`** and only once **`spread`**'s output is updated will **`quote`** be called.
49-
When **`quote`** executes it will have the latest values of the `mid` and `spread` calc for this cycle.
50+
Individual nodes are executed in *rank order*, where the rank of a node is defined as the longest path between the node and an input adapter. The "mid" node is at rank 1, "spread" is at rank 2, and "quote" is at rank 3. Therefore, if "bid" ticks on a given engine cycle, "mid" will be executed before "spread", and "spread" before "quote". Note that the order of node execution *within* a rank is undefined, and users should never rely on the execution order of nodes at the same rank.
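The rank rule above can be sketched in plain Python. This is only an illustration, not CSP's implementation: the `DEPENDENCIES` map, `rank`, and `execution_order` are hypothetical names, and the edges are an assumption inferred from the ranks quoted in the text rather than read from the figure.

```python
from functools import lru_cache

# Hypothetical dependency map: node -> the nodes it reads from.
# The edges are assumed so that the ranks match those quoted in the text.
DEPENDENCIES = {
    "bid": [],
    "ask": [],
    "mid": ["bid", "ask"],
    "spread": ["mid"],
    "quote": ["mid", "spread"],
}


@lru_cache(maxsize=None)
def rank(node: str) -> int:
    """Length of the longest path from any input (rank 0) to `node`."""
    inputs = DEPENDENCIES[node]
    if not inputs:
        return 0
    return 1 + max(rank(dep) for dep in inputs)


def execution_order(ticked: str) -> list:
    """Nodes downstream of `ticked`, sorted by ascending rank."""
    downstream = set()
    frontier = [ticked]
    while frontier:
        current = frontier.pop()
        for node, deps in DEPENDENCIES.items():
            if current in deps and node not in downstream:
                downstream.add(node)
                frontier.append(node)
    return sorted(downstream, key=rank)


print({name: rank(name) for name in DEPENDENCIES})
print(execution_order("bid"))
```

Because "quote" reads from both "mid" and "spread", its rank is one more than the larger of the two, which is what guarantees that it sees the updated "spread" value within the same cycle.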
5051

5152
## Graph Pruning
5253

53-
One should note a subtle optimization technique in CSP graphs.
54-
Any part of a graph that is created at graph building time, but is NOT connected to any output nodes, will be pruned from the graph and will not exist during runtime.
54+
Any node in a graph that is not connected to an output will be pruned from the graph and will not exist during runtime.
5555
An output is defined as either an output adapter or a `csp.node` without any outputs of its own.
56-
The idea here is that we can avoid doing work if it doesn't result in any output being generated.
57-
In general its best practice for all `csp.nodes` to be \***side-effect free**, in other words they shouldn't mutate any state outside of the node.
58-
Assuming all nodes are side-effect free, pruning the graph would not have any noticeable effects.
56+
Pruning is an optimization which avoids executing nodes whose result will be discarded.
57+
As a result, it's best practice for every `csp.node` to be **side-effect free**: a node shouldn't mutate any state outside of itself.
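To make the pruning rule concrete, here is a minimal sketch in plain Python. It is not CSP's internal algorithm; the graph, the node names, and the `live_nodes` helper are all hypothetical. A node survives only if at least one output is reachable from it.

```python
# Hypothetical graph: node -> the nodes that consume its output.
CONSUMERS = {
    "source": ["parse", "debug_stats"],
    "parse": ["writer"],
    "debug_stats": ["unused_average"],
    "unused_average": [],  # assumed to produce a time series that nobody reads
    "writer": [],          # assumed to be an output adapter
}
OUTPUTS = {"writer"}


def live_nodes(consumers: dict, outputs: set) -> set:
    """Return every node from which at least one output is reachable."""
    # Invert the edges so we can walk from the outputs back to their producers.
    producers = {node: [] for node in consumers}
    for node, downstream in consumers.items():
        for consumer in downstream:
            producers[consumer].append(node)

    live = set()
    stack = list(outputs)
    while stack:
        node = stack.pop()
        if node in live:
            continue
        live.add(node)
        stack.extend(producers[node])
    return live


kept = live_nodes(CONSUMERS, OUTPUTS)
pruned = set(CONSUMERS) - kept
print(sorted(kept))
print(sorted(pruned))
```

In this sketch `debug_stats` and `unused_average` are dropped. If either of them wrote to a global variable or a file as a side effect, that write would silently never happen, which is why side-effect-free nodes are recommended.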
58+
59+
## Executing a Graph
60+
61+
Graphs can be executed using the `csp.run` function. Execution takes place in either real-time or historical mode (see [Execution Modes](Execution-Modes)) depending on the `realtime` argument. Graph execution begins at a `starttime` and ends at an `endtime`; the `endtime` argument can be either a `datetime` later than the start *or* a `timedelta` giving the duration of the run. For example, to run our `calc_site_traffic` graph over one week of historical data, we can execute it with:
62+
63+
```python
64+
csp.run(calc_site_traffic, users=['alice', 'bob'], starttime=start, endtime=timedelta(weeks=1), realtime=False)
65+
```
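The two accepted forms of `endtime` can be illustrated with a small standalone helper. This is a sketch of the semantics described above, not CSP code: `resolve_endtime` is a hypothetical name, and rejecting an end before the start is an assumption about how such input would be handled.

```python
from datetime import datetime, timedelta
from typing import Union


def resolve_endtime(starttime: datetime, endtime: Union[datetime, timedelta]) -> datetime:
    """Turn an absolute or relative end into an absolute datetime."""
    if isinstance(endtime, timedelta):
        # A timedelta is interpreted as the duration of the run.
        return starttime + endtime
    if endtime < starttime:
        # Assumption: an end before the start is treated as an error.
        raise ValueError("endtime must not be earlier than starttime")
    return endtime


start = datetime(2020, 1, 1)
print(resolve_endtime(start, timedelta(weeks=1)))
print(resolve_endtime(start, datetime(2020, 1, 8)))
```

Both calls describe the same one-week run, so passing `endtime=timedelta(weeks=1)` is simply a shorthand for computing the absolute end yourself.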
5966

6067
## Collecting Graph Outputs
6168

62-
If the `csp.graph` passed to `csp.run` has outputs, the full timeseries will be returned from `csp.run` like so:
69+
There are multiple methods of getting in-process outputs after executing a `csp.graph`. If the graph returns one or more time-series, the full history of those values will be returned from `csp.run`.
6370

64-
**outputs example**
71+
**return example**
6572

6673
```python
6774
import csp
6875
from datetime import datetime, timedelta
6976

7077
@csp.graph
7178
def my_graph() -> ts[int]:
72-
    return csp.merge(csp.const(1), csp.const(2, timedelta(seconds=1)))
79+
    return csp.merge(csp.const(1), csp.const(2, delay=timedelta(seconds=1)))
7380

74-
if __name__ == '__main__':
75-
    res = csp.run(my_graph, starttime=datetime(2021,11,8))
76-
    print(res)
81+
res = csp.run(my_graph, starttime=datetime(2021,11,8))
7782
```
7883

79-
result:
84+
res:
8085

8186
```raw
8287
{0: [(datetime.datetime(2021, 11, 8, 0, 0), 1), (datetime.datetime(2021, 11, 8, 0, 0, 1), 2)]}
8388
```
8489

85-
Note that the result is a list of `(datetime, value)` tuples.
90+
Note that the result is a list of `(time, value)` tuples. You can have the result returned as two separate NumPy arrays, one for the times and one for the values, by setting `output_numpy=True` in the `run` call.
8691

87-
You can also use [csp.add_graph_output](Base-Adapters-API#cspadd_graph_output) to add outputs.
88-
These do not need to be in the top-level graph called directly from `csp.run`.
92+
```python
93+
res = csp.run(my_graph, starttime=datetime(2021,11,8), output_numpy=True)
94+
```
95+
96+
res:
8997

90-
This gives the same result:
98+
```raw
99+
{0: (array(['2021-11-08T00:00:00.000000000', '2021-11-08T00:00:01.000000000'], dtype='datetime64[ns]'), array([1, 2], dtype=int64))}
100+
```
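The relationship between the two output shapes can be shown without running CSP at all. The sketch below assumes only NumPy; `to_numpy_pair` is a hypothetical helper that converts the list-of-tuples form into a pair of arrays, and it is not how `csp.run` produces its NumPy output internally.

```python
from datetime import datetime

import numpy as np


def to_numpy_pair(ticks):
    """Split [(time, value), ...] into a (times, values) pair of arrays."""
    times = np.array([time for time, _ in ticks], dtype="datetime64[ns]")
    values = np.array([value for _, value in ticks])
    return times, values


# The same ticks as in the list-of-tuples result shown earlier.
ticks = [
    (datetime(2021, 11, 8, 0, 0), 1),
    (datetime(2021, 11, 8, 0, 0, 1), 2),
]
times, values = to_numpy_pair(ticks)
print(times)
print(values)
```

The array form is convenient when the output feeds straight into vectorized analysis, since the times and values can each be sliced or passed to NumPy functions directly.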
101+
102+
You can also use [csp.add_graph_output](Base-Adapters-API#cspadd_graph_output) to add outputs.
103+
These do not need to be in the top-level graph called directly from `csp.run`. Users can also specify how much history is stored in the output using the `tick_count` and `tick_history` arguments to `add_graph_output`. For example, if only the last value needs to be stored, set `tick_count=1`.
91104

92105
**add_graph_output example**
93106

94107
```python
95108
@csp.graph
96109
def my_graph():
97-
    csp.add_graph_output('a', csp.merge(csp.const(1), csp.const(2, timedelta(seconds=1))))
98-
```
110+
    same_thing = csp.merge(csp.const(1), csp.const(2, delay=timedelta(seconds=1)))
111+
    csp.add_graph_output('my_name', same_thing)
99112

100-
In addition to python outputs like above, you can set the optional `csp.run` argument `output_numpy` to `True` to get outputs as numpy arrays:
101-
102-
**numpy outputs**
103-
104-
```python
105-
result = csp.run(my_graph, starttime=datetime(2021,11,8), output_numpy=True)
113+
res = csp.run(my_graph, starttime=datetime(2021,11,8))
106114
```
107115

108-
result:
116+
res:
109117

110118
```raw
111-
{0: (array(['2021-11-08T00:00:00.000000000', '2021-11-08T00:00:01.000000000'], dtype='datetime64[ns]'), array([1, 2], dtype=int64))}
119+
{'my_name': [(datetime.datetime(2021, 11, 8, 0, 0), 1), (datetime.datetime(2021, 11, 8, 0, 0, 1), 2)]}
112120
```
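The effect of limiting stored history with `tick_count` can be pictured as a bounded buffer. The following is a plain-Python analogy under that assumption, not the `add_graph_output` implementation; `buffered_output` is a hypothetical helper.

```python
from collections import deque
from datetime import datetime


def buffered_output(ticks, tick_count=None):
    """Keep at most `tick_count` of the most recent ticks (None keeps all)."""
    buffer = deque(maxlen=tick_count)
    for tick in ticks:
        # Once the buffer is full, appending drops the oldest tick.
        buffer.append(tick)
    return list(buffer)


ticks = [
    (datetime(2021, 11, 8, 0, 0), 1),
    (datetime(2021, 11, 8, 0, 0, 1), 2),
]
print(buffered_output(ticks))                # full history
print(buffered_output(ticks, tick_count=1))  # only the last tick
```

Bounding the buffer keeps memory use constant on long runs where only the final value matters.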
113-
114-
Note that the result there is a tuple per output, containing two numpy arrays, one with the datetimes and one with the values.

docs/wiki/concepts/CSP-Node.md

Lines changed: 6 additions & 3 deletions
@@ -3,7 +3,7 @@
33
- [Table of Contents](#table-of-contents)
44
- [Anatomy of a `csp.node`](#anatomy-of-a-cspnode)
55
- [Basket inputs](#basket-inputs)
6-
- [**Node Outputs**](#node-outputs)
6+
- [Node Outputs](#node-outputs)
77
- [Basket Outputs](#basket-outputs)
88
- [Generic Types](#generic-types)
99

@@ -21,7 +21,7 @@ They may (or may not) generate an output as a result of an input tick.
2121
```python
2222
from datetime import timedelta
2323

24-
@csp.node # 1
24+
@csp.node(name='my_node') # 1
2525
def demo_node(n: int, xs: ts[float], ys: ts[float]) -> ts[float]: # 2
2626
    with csp.alarms(): # 3
2727
        # Define an alarm time-series of type bool # 4
@@ -52,7 +52,7 @@ def demo_node(n: int, xs: ts[float], ys: ts[float]) -> ts[float]: # 2
5252

5353
Let's review line by line
5454

55-
1\) Every CSP node must start with the **`@csp.node`** decorator
55+
1\) Every CSP node must start with the **`@csp.node`** decorator. The name of the node will be the name of the function, unless a `name` argument is provided. The name is used when visualizing a graph with `csp.show_graph` or profiling with CSP's builtin [`profiler`](#Profile-csp-code).
5656

5757
2\) CSP nodes are fully typed and type-checking is strictly enforced.
5858
All arguments must be typed, as well as all outputs.
@@ -269,3 +269,6 @@ This allows us to pass in a `ts[int]` for example, and get a `ts[int]` as an out
269269

270270
`const` takes value as an *instance* of type `T`, and returns a timeseries of type `T`.
271271
So we can call `const(5)` and get a `ts[int]` output, or `const('hello!')` and get a `ts[str]` output, etc...
272+
273+
If a value is provided rather than an explicit type argument (for example, to `const`) then CSP resolves the type using internal logic. In some cases, it may be easier to override the automatic type inference.
274+
Users can force a type variable to be a specific value with the `.using` function. For example, `csp.const(1)` will be resolved to a `ts[int]`; if you want to instead force the type to be `float`, do `csp.const.using(T=float)(1)`.
