Fix doctests
Saransh-cpp committed Jul 14, 2022
1 parent 1ef9a96 commit f3f6933
Showing 3 changed files with 263 additions and 1 deletion.
2 changes: 1 addition & 1 deletion docs/src/getting_started/linear_regression.md
@@ -191,7 +191,7 @@ julia> dLdW, dLdb, _, _ = gradient(loss, W, b, x, y)

We can now update the parameters, following the gradient descent algorithm -

```jldoctest linear_regression; filter = r"[+-]?([0-9]*[.])?[0-9]+"
```jldoctest linear_regression_simple; filter = r"[+-]?([0-9]*[.])?[0-9]+"
julia> W .= W .- 0.1 .* dLdW
1-element Vector{Float32}:
1.8144473
262 changes: 262 additions & 0 deletions docs/src/getting_started/logistic_regression.md
@@ -0,0 +1,262 @@
# Logistic Regression

The following page contains a step-by-step walkthrough of the logistic regression algorithm in `Julia` using `Flux`! We will first build a simple logistic regression model without using `Flux`, and then compare its different working parts with `Flux`'s implementation.

Let's start by importing the required `Julia` packages!

```julia logistic_regression
julia> using Flux

julia> using Statistics

julia> using MLDatasets

julia> using DataFrames
```

## Dataset
Let's start by importing a dataset from `MLDatasets.jl`! We will be using the `Iris` dataset, which contains data for three different `Iris` species. The data consists of 150 data points (`x`s), each having 4 features. Each of these `x`s is mapped to a `y`, the name of a particular `Iris` species (a class or a label).

```julia logistic_regression
julia> Iris()
dataset Iris:
metadata => Dict{String, Any} with 4 entries
features => 150×4 DataFrame
targets => 150×1 DataFrame
dataframe => 150×5 DataFrame

julia> x, y = Iris(as_df=false)[:]
(features = [5.1 4.9 6.2 5.9; 3.5 3.0 3.4 3.0; 1.4 1.4 5.4 5.1; 0.2 0.2 2.3 1.8], targets = InlineStrings.String15["Iris-setosa" "Iris-setosa" "Iris-virginica" "Iris-virginica"])
```

Our next step would be to convert this data into a form that can be fed to a machine learning model. The `x` values are already arranged in a matrix and thus don't need any alteration, but the labels must be one hot encoded. [Here](https://discourse.julialang.org/t/all-the-ways-to-do-one-hot-encoding/64807) is a great Discourse thread on different techniques that can be used to one hot encode data, with or without using an external `Julia` package.

```julia logistic_regression
julia> y_r = reshape(y, (150, 1));

julia> custom_y_onehot = unique(y_r) .== permutedims(y_r)
3×150 BitMatrix:
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
```
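
Each column of this matrix corresponds to one data point and marks its class, so every column should contain exactly one `1`; a quick check confirms this -

```julia logistic_regression
julia> all(sum(custom_y_onehot, dims=1) .== 1)
true
```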

This same operation can also be performed using `Flux`'s `onehotbatch` function! We will use both of these outputs in parallel to show how intuitive `Flux` is!

```julia logistic_regression
julia> flux_y_onehot = Flux.onehotbatch(y_r, ["Iris-setosa", "Iris-virginica", "Iris-versicolor"])
3×150×1 OneHotArray(::Matrix{UInt32}) with eltype Bool:
[:, :, 1] =
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
```

Our data is ready! The next step would be to build a classifier for the same.

## Building a model

A logistic regression model is mathematically defined as -

```math
model(x) = σ(Wx + b)
```

where `W` is the weight matrix, `b` is the bias vector, and `σ` is any activation function. For our case, let's use the `softmax` activation function, as we will be performing a multiclass classification task. We can define our model in `Julia` using the exact same notation!

```julia logistic_regression
julia> m(x) = W*x .+ b
m (generic function with 1 method)
```

Note that this model lacks an activation function right now, but we will come back to that.

We can now move ahead to initialize the parameters of our model. To keep it simple, let's use `Julia`'s `rand` function to initialize the weights, and let's initialize the biases as `0`. Given that our model has 4 inputs (4 features in every data point) and 3 outputs (3 different classes), the parameters can be initialized in the following way -

```julia logistic_regression; filter = r"[+-]?([0-9]*[.])?[0-9]+"
julia> W = rand(Float32, 3, 4)
3×4 Matrix{Float32}:
0.660353 0.474309 0.170792 0.239653
0.790627 0.15147 0.707435 0.923513
0.3684 0.20105 0.399129 0.17404

julia> b = [0.0f0, 0.0f0, 0.0f0]
3-element Vector{Float32}:
0.0
0.0
0.0
```

Now our model is capable of taking in the complete data and predicting the class of each `x` in one go! But we need to make sure that our model outputs the probability of an input belonging to each class. As our model has 3 outputs, each one of them denotes the probability of the input belonging to that particular class.

To map our outputs to a probability value, we will use an activation function. It would make sense to use a `softmax` activation function here, which is mathematically described as -

```math
σ(\vec{x})_i = \frac{e^{x_i}}{\sum_{j=1}^{k} e^{x_j}}
```

The `softmax` function maps the raw outputs to probability values such that the sum of all the final outputs is equal to `1`. Let's implement this in `Julia`!

```julia logistic_regression
julia> custom_softmax(x) = exp.(x) ./ sum(exp.(x), dims=1)
custom_softmax (generic function with 1 method)
```

The implementation looks straightforward enough! Note that we specify `dims=1` in the `sum` function to calculate the sum of probabilities down each column. Remember, the output of our model will be a `3×150` matrix (predicted `y`s), where each column maps to a column in the `x` matrix. For the activation function to work, the three probability values in each column (that is, for each data point) must sum to `1`; hence `dims=1`.
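
For instance, applying `custom_softmax` to a small vector of three raw scores returns three probabilities that add up to `1` -

```julia logistic_regression
julia> custom_softmax([1.0, 2.0, 3.0])
3-element Vector{Float64}:
 0.09003057317038046
 0.24472847105479764
 0.6652409557748219
```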

Let's combine this `softmax` function with our model to construct the complete `custom_model`.

```julia logistic_regression
julia> custom_model(x) = m(x) |> custom_softmax
custom_model (generic function with 1 method)
```

Let's check if our model works.

```julia logistic_regression
julia> custom_model(x) |> size
(3, 150)
```

It works! Let's check if the `softmax` function is working as it should.

```julia logistic_regression
julia> all(custom_model(x) .< 1.0f0 .&& custom_model(x) .> 0.0f0)
true

julia> sum(custom_model(x), dims=1)
1×150 Matrix{Float64}:
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
```

Every output value is between `0` and `1`, and every column adds to `1`!

`Flux` provides users with a very simple API that almost feels like writing your own code! Let's convert our `custom_model` to a `Flux` model.

```julia logistic_regression
julia> flux_model = Chain(Dense(4 => 3), softmax)
Chain(
  Dense(4 => 3),                        # 15 parameters
  NNlib.softmax,
)
```

A [`Dense(4 => 3)`](@ref Dense) layer denotes a layer with four inputs (four features in every data point) and three outputs (three classes or labels). This layer is exactly the same as the mathematical model defined by us above! Under the hood, `Flux` too calculates the output using the same expression, but we don't have to initialize the parameters ourselves this time; `Flux` does it for us.

```julia logistic_regression
julia> flux_model[1].weight |> size
(3, 4)

julia> flux_model[1].bias
3-element Vector{Float32}:
 0.0
 0.0
 0.0
```

Now we can check if our model is acting right. We can pass the complete data in one go, with each data point having four features (four inputs) -

```julia logistic_regression
julia> flux_model(x) |> size
(3, 150)
```

## Loss and accuracy

Our next step is to define a loss function for the classification task. We will use the cross entropy loss, first implementing it ourselves with `logsoftmax`, and then wrapping it in a `custom_loss` function that runs the model and compares its output against the one hot encoded labels -

```julia
julia> custom_logitcrossentropy(ŷ, y) = mean(.-sum(y .* logsoftmax(ŷ; dims = 1); dims = 1))
custom_logitcrossentropy (generic function with 1 method)

julia> function custom_loss(x, y)
           ŷ = custom_model(x)
           custom_logitcrossentropy(ŷ, y)
       end
custom_loss (generic function with 1 method)

julia> custom_loss(x, custom_y_onehot)
1.0606989738802028
```
`Flux` provides a built-in `Flux.logitcrossentropy`, so the loss for the `Flux` model can be written the same way -

```julia
julia> function loss(x, y)
           ŷ = flux_model(x)
           Flux.logitcrossentropy(ŷ, y)
       end
loss (generic function with 1 method)

julia> loss(x, flux_y_onehot)
1.092375933564367
```
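
The two loss values differ only because the two models start from different randomly initialized weights; the loss functions themselves agree. As a quick check, evaluating both on the same predictions and labels gives identical results -

```julia
julia> custom_logitcrossentropy(custom_model(x), custom_y_onehot) ≈ Flux.logitcrossentropy(custom_model(x), custom_y_onehot)
true
```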

To compute the accuracy of our models, we need to decode the one hot encoded outputs back into class labels. Let's first do this by hand with `findmax`, and then confirm that `Flux`'s `onecold` function gives the same result -

```julia
julia> findmax(flux_y_onehot, dims=1)
([1 1 1 1;;;], [CartesianIndex(1, 1, 1) CartesianIndex(1, 2, 1) CartesianIndex(2, 149, 1) CartesianIndex(2, 150, 1);;;])

julia> mxidx = findmax(flux_y_onehot, dims=1)[2]
1×150×1 Array{CartesianIndex{3}, 3}:
[:, :, 1] =
 CartesianIndex(1, 1, 1)  CartesianIndex(1, 2, 1)  CartesianIndex(1, 3, 1)  CartesianIndex(1, 4, 1)  CartesianIndex(2, 148, 1)  CartesianIndex(2, 149, 1)  CartesianIndex(2, 150, 1)

julia> mxidx[1]
CartesianIndex(1, 1, 1)

julia> mxidx[1].I
(1, 1, 1)

julia> mxidx[1].I[1]
1

julia> y_cold = Vector{String}(undef, 150);

julia> for i = 1:150
           if mxidx[i].I[1] == 1
               y_cold[i] = "Iris-setosa"
           elseif mxidx[i].I[1] == 2
               y_cold[i] = "Iris-virginica"
           elseif mxidx[i].I[1] == 3
               y_cold[i] = "Iris-versicolor"
           end
       end

julia> istrue = Flux.onecold(flux_y_onehot, ["Iris-setosa", "Iris-virginica", "Iris-versicolor"]) .== y_cold;

julia> all(istrue)
true
```
We can now define an accuracy function, which decodes the predictions and the labels with `Flux.onecold` and measures the fraction of data points on which they match. With randomly initialized parameters, the custom model gets about a third of the points right -

```julia
julia> custom_accuracy(x, y) = mean(Flux.onecold(custom_model(x), ["Iris-setosa", "Iris-virginica", "Iris-versicolor"]) .== Flux.onecold(custom_y_onehot, ["Iris-setosa", "Iris-virginica", "Iris-versicolor"]))
custom_accuracy (generic function with 1 method)

julia> custom_accuracy(x, y)
0.3333333333333333
```
The same accuracy function for the `Flux` model -

```julia
julia> accuracy(x, y) = mean(Flux.onecold(flux_model(x), ["Iris-setosa", "Iris-virginica", "Iris-versicolor"]) .== Flux.onecold(flux_y_onehot, ["Iris-setosa", "Iris-virginica", "Iris-versicolor"]))
accuracy (generic function with 1 method)

julia> accuracy(x, y)
0.3333333333333333
```

## Training the model

Let's train our custom model using plain gradient descent with a learning rate of `0.1`. `Flux.train!` takes the loss function, the parameters to update, the data, and the optimiser, and performs one pass of gradient descent per call -

```julia
julia> opt = Descent(0.1)
Descent(0.1)

julia> params = Flux.params(W, b)
Params([Float32[0.6603528 0.47430867 0.17079216 0.23965251; 0.7906274 0.15146977 0.7074347 0.92351294; 0.3684004 0.20104975 0.39912927 0.17404026], Float32[0.0, 0.0, 0.0]])

julia> for i = 1:100
           Flux.train!(custom_loss, params, [(x, custom_y_onehot)], opt)
           @show custom_accuracy(x, y)
       end
```
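
Under the hood, each `Flux.train!` call above does roughly the following: compute the gradients of `custom_loss` with respect to `W` and `b`, and move both parameters a step of size `0.1` against their gradients. A sketch of a single such step using the implicit-parameters `gradient` API (not `Flux`'s exact internals) -

```julia
julia> gs = gradient(() -> custom_loss(x, custom_y_onehot), Flux.params(W, b));

julia> W .-= 0.1 .* gs[W];

julia> b .-= 0.1 .* gs[b];
```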

We can follow a similar procedure for the `Flux` model, collecting its parameters with `Flux.params` -

```julia
julia> params = Flux.params(flux_model)
Params([Float32[0.55286723 -0.030403392 0.41436023 -0.2771595; -0.09287064 0.38187975 0.42391905 0.037785027; 0.14706837 0.29528287 0.2445691 0.3731384], Float32[0.0, 0.0, 0.0]])
```

The `Flux` model can be trained with the same loop; here we also stop early once the accuracy crosses `0.98` -

```julia
julia> for i = 1:100
           Flux.train!(loss, params, [(x, flux_y_onehot)], opt)
           if accuracy(x, y) >= 0.98
               break
           end
       end

julia> @show accuracy(x, y)
accuracy(x, y) = 0.98
```
Binary file added xy.jld2
Binary file not shown.
