This Python project generates forward-looking probability density functions (PDFs) and cumulative distribution functions (CDFs) for the prices of publicly traded securities using options data. The output is visualized with matplotlib, and the project also includes a user-friendly web-based dashboard interface built with Streamlit.
- Clone the repo
git clone https://github.com/jmholzer/probabilistic-pdfs.git
- Navigate to the project directory
cd probabilistic-pdfs
- Install Python dependencies
pip install -r requirements.txt
- Install the project
pip install .
Please note that this project requires Python 3.10 or later.
Option 1: To start the web-based dashboard, run the following command:
probabilistic
This will start a local web server, and you should be able to access the dashboard in your web browser at localhost:8501.
The user will need to provide their own options data in a CSV file with the columns 'strike' and 'last_price'. Sample data for SPY can be found in the data folder.
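Before running the dashboard, it can help to check that a CSV has the two required columns. The following is a minimal sketch using only the standard library; `load_options_csv` is a hypothetical helper written for illustration (it is not part of this project), and the sample rows are made-up numbers, not market data.

```python
import csv
import os
import tempfile

# Column names taken from the README; everything else here is illustrative.
REQUIRED_COLUMNS = {"strike", "last_price"}


def load_options_csv(path):
    """Return (strike, last_price) pairs, raising ValueError if a column is missing."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
        if missing:
            raise ValueError(f"missing required columns: {sorted(missing)}")
        return [(float(row["strike"]), float(row["last_price"])) for row in reader]


# Write a tiny example file with made-up values, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "options.csv")
    with open(sample_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["strike", "last_price"])
        writer.writerows([[180, 9.5], [185, 6.2], [190, 3.8]])
    rows = load_options_csv(sample_path)
```

Extra columns in the file are ignored by this check; only the absence of 'strike' or 'last_price' raises an error.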
Option 2: To use probabilistic from within Python, see example_script.py for a demo.
The user will need to specify 4 arguments:
- input_csv_path: a string containing the file path of the options data in a CSV, with the columns 'strike' and 'last_price'
- current_price: a number giving the underlying asset's current price
- days_forward: a number giving the days between the current date and the options' expiry date
- output_csv_path: a string containing the file path where the user wishes to save the results

The output will be a CSV file containing 3 columns: price, probability density, cumulative probability.
from probabilistic import cli
input_csv_path = "data/AAPL_currentdateNov14_callMar15_currentprice18480_CLEAN.csv"
current_price = 184.8
days_forward = 123
output_csv_path = "/Users/username/Downloads/results.csv"
cli.csv_runner.run(input_csv_path, float(current_price), int(days_forward), output_csv_path)
An option is a financial derivative that gives the holder the right, but not the obligation, to buy or sell an asset at a specified price (strike price) on a certain date in the future. Intuitively, the value of an option depends on the probability that it will be profitable or "in-the-money" at expiration.
Why? Consider this scenario: You possess an option to sell a stock for $100 tomorrow, and as of the market's close today, the stock's price stands at $10. Intuitively, this option appears to hold significant value due to the high likelihood of its exercise. However, if it were certain that the stock's price would surge to $200 at the opening bell tomorrow, the chance of exercising your option profitably drops to zero. Consequently, the option's value evaporates. This illustrates how the price of an option is linked to the probability of its being in the money, that is, the likelihood that the option can be exercised at a profit. Consequently, by knowing the prices of options, we can work backwards to calculate the market's consensus probabilities for the future price of the underlying asset.
To recap, the price of an option reflects the market's collective expectation about the future price of the underlying asset, and is inherently tied to the probability of its outcome (the option being in-the-money) occurring. By working backwards, we can solve for the probability of outcomes occurring along a continuum of strike prices, and thus generate a PDF of the market's collective expectation of the future price of the underlying asset.
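The link between option prices and probabilities can be checked numerically. The sketch below is illustrative only and is not this project's code: it assumes Black-Scholes call prices on a non-dividend-paying asset, and the inputs (`S`, `K`, `r`, `sigma`, `tau`) are arbitrary example values. It estimates the probability that the asset finishes above a strike from the slope of call prices across neighbouring strikes, and compares the result with the closed-form Black-Scholes value N(d2).

```python
import math


def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(S, K, r, sigma, tau):
    """Black-Scholes price of a European call (no dividends)."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return S * norm_cdf(d1) - K * math.exp(-r * tau) * norm_cdf(d2)


def prob_above(call_price, K, r, tau, h=0.01):
    """Risk-neutral P(S_T > K), estimated as -e^{r*tau} * dC/dK (central difference)."""
    slope = (call_price(K + h) - call_price(K - h)) / (2.0 * h)
    return -math.exp(r * tau) * slope


# Arbitrary example inputs (assumptions, not market data).
S, K, r, sigma, tau = 100.0, 105.0, 0.05, 0.2, 0.25

estimate = prob_above(lambda k: bs_call(S, k, r, sigma, tau), K, r, tau)

# Closed-form check: under Black-Scholes, P(S_T > K) = N(d2).
d2 = (math.log(S / K) + (r - 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))
exact = norm_cdf(d2)
```

Here the probability comes from how quickly call prices fall as the strike rises. Differentiating once more with respect to the strike turns these tail probabilities into a density.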
For a simplified worked example, see this excellent blog post. For a complete reading of the financial theory, see this paper.
The process of generating the PDFs and CDFs is as follows:
- For an underlying asset, options data along the full range of strike prices are read from a CSV file to create a DataFrame. This gives us a table of strike prices along with the last price[1] each option sold for
- Using the Black-Scholes formula, we convert the option price at each strike into an implied volatility (IV)[2]
- Using a B-spline, we fit a curve-of-best-fit onto the discrete observations of IV over the full range of strike prices[3]. Thus, we have extracted a continuous model from discrete IV observations. This is called the volatility smile
- From the volatility smile, we use Black-Scholes to convert IVs back to prices. Thus, we arrive at a continuous curve of options prices along the full range of strike prices
- From the continuous price curve, we use numerical differentiation to get the first derivative of prices with respect to the strike price. Then we numerically differentiate again to get the second derivative of prices. The second derivative of prices, multiplied by a discount factor $e^{r\tau}$, results in the probability density function[4]
- Once we have the PDF, we can calculate the CDF
- Quartiles (25th, 50th, and 75th percentiles) of each distribution are also derived
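The steps above can be sketched end to end with NumPy and SciPy. This is a minimal illustration, not the project's implementation: the function names, the interest rate `r`, the bracketing root finder for IV, and the synthetic flat-volatility input are all assumptions made for the example. Synthetic prices are used so that the recovered density can be compared with the known lognormal answer.

```python
import numpy as np
from scipy.integrate import cumulative_trapezoid
from scipy.interpolate import splev, splrep
from scipy.optimize import brentq
from scipy.stats import norm


def bs_call(S, K, r, sigma, tau):
    """Black-Scholes price of a European call on a non-dividend-paying asset."""
    d1 = (np.log(S / K) + (r + 0.5 * sigma**2) * tau) / (sigma * np.sqrt(tau))
    d2 = d1 - sigma * np.sqrt(tau)
    return S * norm.cdf(d1) - K * np.exp(-r * tau) * norm.cdf(d2)


def implied_vol(price, S, K, r, tau):
    """Find the volatility at which Black-Scholes reproduces the observed price."""
    return brentq(lambda sigma: bs_call(S, K, r, sigma, tau) - price, 1e-4, 5.0)


def density_from_calls(strikes, last_prices, S, r, tau, n_grid=400, smoothing=0.0):
    # Price-space -> IV-space: one root-find per observed strike.
    ivs = np.array([implied_vol(p, S, k, r, tau) for k, p in zip(strikes, last_prices)])
    # Cubic B-spline over the IVs. smoothing=0.0 interpolates exactly;
    # a positive value would smooth noisy quotes.
    tck = splrep(strikes, ivs, k=3, s=smoothing)
    grid = np.linspace(strikes.min(), strikes.max(), n_grid)
    smile = splev(grid, tck)
    # IV-space -> price-space on the dense strike grid.
    prices = bs_call(S, grid, r, smile, tau)
    # Differentiate twice with respect to strike, then scale by e^{r*tau}.
    first = np.gradient(prices, grid)
    second = np.gradient(first, grid)
    pdf = np.exp(r * tau) * second
    # Integrate the PDF. The CDF starts at 0 at the lowest strike, so any
    # probability mass below that strike is not counted.
    cdf = cumulative_trapezoid(pdf, grid, initial=0.0)
    return grid, pdf, cdf


def quartiles(grid, cdf):
    """Prices at which the CDF crosses 25%, 50% and 75% (linear interpolation)."""
    return np.interp([0.25, 0.5, 0.75], cdf, grid)


# Synthetic input (assumed values): call prices generated from a flat 25% volatility.
S, r, tau = 184.8, 0.05, 123 / 365
strikes = np.linspace(140.0, 240.0, 41)
last_prices = bs_call(S, strikes, r, 0.25, tau)

grid, pdf, cdf = density_from_calls(strikes, last_prices, S, r, tau)
q25, q50, q75 = quartiles(grid, cdf)
```

With real, noisy quotes, `smoothing` would need to be greater than zero, and the first and last few grid points should be treated with caution because `np.gradient` falls back to one-sided differences at the edges.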
An example of the input and output for the sample AAPL options chain data for the expiry date of Mar 15 2024 (taken on Nov 14 2023) included in data/ is:
This project is a preview and is not currently licensed. Not financial advice.
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Footnotes
1. We chose to use last price instead of calculating the mid-price given the bid-ask spread. This is because Yahoo Finance, a common source for options chain data, often lacks bid-ask data. See for example Apple options
2. We convert from price-space to IV-space, and then back to price-space as described in step 4. See this blog post for a breakdown of why we do this double conversion
3. See this paper for more details. In summary, options markets contain noise. Therefore, generating a volatility smile through simple interpolation will result in a noisy smile function. Converting back to price-space will then result in a noisy price curve. Finally, when we numerically differentiate the price curve twice, the noise will be amplified and the resulting PDF will be meaningless. Thus, we need either a parametric or non-parametric model to try to extract the true relationship between IV and strike price from the noisy observations. The paper suggests a 3rd order B-spline as a possible model choice