For a 1.3 release #977

Merged (31 commits, May 6, 2024)

Commits (31)
- b6056dc: Add prompt to REPL example (abhro, Mar 30, 2024)
- 7ae5821: annotate type for old_model field of Machine type (ablaom, Apr 8, 2024)
- 190de70: annotate type of operation field in Node type (ablaom, Apr 8, 2024)
- 54ed311: Merge pull request #968 from abhro/patch-1 (ablaom, Apr 9, 2024)
- 2752e08: Update docstring examples and code (abhro, Apr 22, 2024)
- d0002b3: make test of iterator(...) more robust (ablaom, Apr 23, 2024)
- f8b4c2c: Merge pull request #972 from JuliaAI/fix-test (ablaom, Apr 23, 2024)
- 6e77d6a: Merge pull request #969 from JuliaAI/predict-type-instability (ablaom, Apr 24, 2024)
- 326f9b5: add CompactPerformanceEvaluation type (ablaom, Apr 24, 2024)
- 5356cd3: add test (ablaom, Apr 24, 2024)
- b0818a3: bump 1.3 (ablaom, Apr 24, 2024)
- 2bc3bec: adpat Resampler to adjusted evaluate signature (ablaom, Apr 24, 2024)
- 686741e: Remove method-less definition for _recursive_show (abhro, Apr 24, 2024)
- dbba742: Collapse docstring for holdout (abhro, Apr 24, 2024)
- c0c7f8a: add InSample resampling strategy to close #967 (ablaom, Apr 24, 2024)
- bc55ffd: put back per_fold data into display, but with better layout (ablaom, Apr 25, 2024)
- 5bd8f82: update evaluate! docstring (ablaom, Apr 25, 2024)
- 43b48cc: put back the train_test_rows, as needed by update(::Resampler) (ablaom, Apr 26, 2024)
- d8bdc7d: fix typo creeping into unrelated test and fix display (ablaom, Apr 26, 2024)
- d4f744d: add `caches_data(::Machine)` accessor function (ablaom, Apr 26, 2024)
- b91db0a: found a way to remove train_test_rows from CompactPerformanceEvaluation (ablaom, Apr 26, 2024)
- 8ed15ce: move SE column in evaluation display to second table (ablaom, Apr 26, 2024)
- 8cb6f26: drop invalid smoke test (ablaom, Apr 27, 2024)
- 45cb77c: add coverage (ablaom, Apr 28, 2024)
- c83cae9: try updating codecov action to @v3 (ablaom, Apr 29, 2024)
- f811dc3: Merge pull request #973 from JuliaAI/compact-performance-evaluations (ablaom, Apr 30, 2024)
- 0e6db81: Merge branch 'dev' into insample-evaluations (ablaom, Apr 30, 2024)
- 2c85c30: typo identified in review (ablaom, May 6, 2024)
- d6b1930: Merge pull request #975 from JuliaAI/insample-evaluations (ablaom, May 6, 2024)
- e0ca155: Merge branch 'docstring-patch-1' of https://github.com/abhro/MLJBase.… (ablaom, May 6, 2024)
- af10ff2: Merge branch 'abhro-docstring-patch-1' into dev (ablaom, May 6, 2024)
2 changes: 1 addition & 1 deletion .github/workflows/ci.yml
@@ -49,7 +49,7 @@ jobs:
env:
JULIA_NUM_THREADS: 2
- uses: julia-actions/julia-processcoverage@v1
- uses: codecov/codecov-action@v1
- uses: codecov/codecov-action@v3
with:
file: lcov.info
docs:
2 changes: 1 addition & 1 deletion Project.toml
@@ -1,7 +1,7 @@
name = "MLJBase"
uuid = "a7f614a8-145f-11e9-1d2a-a57a1082229d"
authors = ["Anthony D. Blaom <[email protected]>"]
version = "1.2.1"
version = "1.3"

[deps]
CategoricalArrays = "324d7699-5711-5eae-9e2f-1d82baa6b597"
4 changes: 2 additions & 2 deletions src/MLJBase.jl
@@ -291,8 +291,8 @@ export machines, sources, Stack,
export TransformedTargetModel

# resampling.jl:
export ResamplingStrategy, Holdout, CV, StratifiedCV, TimeSeriesCV,
evaluate!, Resampler, PerformanceEvaluation
export ResamplingStrategy, InSample, Holdout, CV, StratifiedCV, TimeSeriesCV,
evaluate!, Resampler, PerformanceEvaluation, CompactPerformanceEvaluation

# `MLJType` and the abstract `Model` subtypes are exported from within
# src/composition/abstract_types.jl
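
The two new exports are the user-facing additions in this release: `InSample` resampling and compact performance evaluations. Below is a minimal sketch of how they might be used together; it assumes `evaluate` gained a `compact` keyword in this release and that a classifier such as `DecisionTreeClassifier` (via MLJDecisionTreeInterface) is available, neither of which is visible in this diff.

```julia
using MLJ  # assumes an MLJ version that includes MLJBase 1.3

# Hypothetical model choice; any probabilistic classifier would do.
Tree = @load DecisionTreeClassifier pkg=DecisionTree verbosity=0

X, y = make_blobs(100)            # synthetic table and target from MLJBase

# `InSample()` trains and tests on the same rows.
e = evaluate(Tree(), X, y;
             resampling = InSample(),
             measure    = log_loss,
             compact    = true)   # assumed keyword returning the slimmer type

e isa CompactPerformanceEvaluation  # expected to be true
```
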
32 changes: 17 additions & 15 deletions src/composition/learning_networks/nodes.jl
@@ -27,9 +27,9 @@ See also [`node`](@ref), [`Source`](@ref), [`origins`](@ref),
[`sources`](@ref), [`fit!`](@ref).

"""
struct Node{T<:Union{Machine, Nothing}} <: AbstractNode
struct Node{T<:Union{Machine, Nothing},Oper} <: AbstractNode

operation # eg, `predict` or a static operation, such as `exp`
operation::Oper # eg, `predict` or a static operation, such as `exp`
machine::T # is `nothing` for static operations

# nodes called to get args for `operation(model, ...) ` or
@@ -43,9 +43,11 @@ struct Node{T<:Union{Machine, Nothing}} <: AbstractNode
# order consistent with extended graph, excluding self
nodes::Vector{AbstractNode}

function Node(operation,
machine::T,
args::AbstractNode...) where T<:Union{Machine, Nothing}
function Node(
operation::Oper,
machine::T,
args::AbstractNode...,
) where {T<:Union{Machine, Nothing}, Oper}

# check the number of arguments:
# if machine === nothing && isempty(args)
@@ -70,7 +72,7 @@ struct Node{T<:Union{Machine, Nothing}} <: AbstractNode
vcat(nodes_, (nodes(n) for n in machine.args)...) |> unique
end

return new{T}(operation, machine, args, origins_, nodes_)
return new{T,Oper}(operation, machine, args, origins_, nodes_)
end
end

@@ -407,14 +409,14 @@ of nodes, sources and other arguments.

### Examples

```
X = source(π)
W = @node sin(X)
```julia-repl
julia> X = source(π)
julia> W = @node sin(X)
julia> W()
0

X = source(1:10)
Y = @node selectrows(X, 3:4)
julia> X = source(1:10)
julia> Y = @node selectrows(X, 3:4)
julia> Y()
3:4

@@ -423,10 +425,10 @@ julia> Y(["one", "two", "three", "four"])
"three"
"four"

X1 = source(4)
X2 = source(5)
add(a, b, c) = a + b + c
N = @node add(X1, 1, X2)
julia> X1 = source(4)
julia> X2 = source(5)
julia> add(a, b, c) = a + b + c
julia> N = @node add(X1, 1, X2)
julia> N()
10

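
The `Oper` parameter added above moves the operation's type into the `Node` type itself. Here is a toy sketch (hypothetical types, not MLJBase internals) of why storing a callable in a typed field helps inference:

```julia
# Toy illustration only; `LooseNode` and `TightNode` are hypothetical stand-ins.

struct LooseNode                # operation stored untyped, as before this PR
    operation
end

struct TightNode{Oper}          # operation's type carried by the struct, as after
    operation::Oper
end

apply_op(n, x) = n.operation(x)

# With the untyped field the compiler cannot tell what `n.operation` is, so the
# call is inferred as `Any`; with the parameterised field it infers `Float64`:
#
#   @code_warntype apply_op(LooseNode(sin), 1.0)   # Body::Any
#   @code_warntype apply_op(TightNode(sin), 1.0)   # Body::Float64
```
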
18 changes: 9 additions & 9 deletions src/composition/learning_networks/signatures.jl
@@ -8,10 +8,10 @@

**Private method.**

Return a dictionary of machines, keyed on model, for the all machines in the completed
learning network for which `node` is the greatest lower bound. Only machines bound to
symbolic models are included. Values are always vectors, even if they contain only a
single machine.
Return a dictionary of machines, keyed on model, for the all machines in the
completed learning network for which `node` is the greatest lower bound. Only
machines bound to symbolic models are included. Values are always vectors,
even if they contain only a single machine.

"""
function machines_given_model(node::AbstractNode)
@@ -35,14 +35,14 @@ attempt_scalarize(v) = length(v) == 1 ? v[1] : v

**Private method.**

Given a dictionary of machine vectors, keyed on model names (symbols), broadcast `f` over
each vector, and make the result, in the returned named tuple, the value associated with
the corresponding model name as key.
Given a dictionary of machine vectors, keyed on model names (symbols), broadcast
`f` over each vector, and make the result, in the returned named tuple, the
value associated with the corresponding model name as key.

Singleton vector values are scalarized, unless `scalarize = false`.

If a value in the computed named tuple is `nothing`, or a vector of `nothing`s, then the
entry is dropped from the tuple, unless `drop_nothings=false`.
If a value in the computed named tuple is `nothing`, or a vector of `nothing`s,
then the entry is dropped from the tuple, unless `drop_nothings=false`.

"""
function tuple_keyed_on_model(f, machines_given_model; scalarize=true, drop_nothings=true)
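
These are private helpers and the diff only rewraps their docstrings. For readers skimming the PR, here is an illustrative sketch of the behaviour the `tuple_keyed_on_model` docstring describes, using plain values in place of machines; it is a reimplementation of the documented semantics, not the MLJBase method itself.

```julia
# Sketch of the documented semantics, with plain values standing in for machines.
function tuple_keyed_on_model_sketch(f, dict; scalarize=true, drop_nothings=true)
    names, values = Symbol[], Any[]
    for (name, vec) in pairs(dict)          # note: `Dict` iteration order is arbitrary
        value = f.(vec)                     # broadcast `f` over each machine vector
        drop_nothings && all(isnothing, value) && continue
        scalarize && length(value) == 1 && (value = only(value))
        push!(names, name)
        push!(values, value)
    end
    return NamedTuple{Tuple(names)}(Tuple(values))
end

f(x) = x isa Number ? 10x : nothing
tuple_keyed_on_model_sketch(f, Dict(:ridge => [1, 2], :tree => [3], :noop => [nothing]))
# e.g. (tree = 30, ridge = [10, 20])   # `:noop` dropped, singleton scalarized
```
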
17 changes: 8 additions & 9 deletions src/composition/models/stacking.jl
@@ -337,12 +337,12 @@ internal_stack_report(
) = NamedTuple{}()

"""
internal_stack_report(
m::Stack,
verbosity::Int,
y::AbstractNode,
folds_evaluations::Vararg{AbstractNode},
)
internal_stack_report(
m::Stack,
verbosity::Int,
y::AbstractNode,
folds_evaluations::Vararg{AbstractNode},
)

When measure/measures is provided, the folds_evaluation will have been filled by
`store_for_evaluation`. This function is not doing any heavy work (not constructing nodes
Expand Down Expand Up @@ -518,7 +518,7 @@ function oos_set(m::Stack{modelnames}, Xs::Source, ys::Source, tt_pairs) where m
end

#######################################
################# Prefit #################
################# Prefit ##############
#######################################

function prefit(m::Stack{modelnames}, verbosity::Int, X, y) where modelnames
@@ -564,8 +564,7 @@ const DOC_STACK =
Stack(; metalearner=nothing, name1=model1, name2=model2, ..., keyword_options...)

Implements the two-layer generalized stack algorithm introduced by
[Wolpert
(1992)](https://www.sciencedirect.com/science/article/abs/pii/S0893608005800231)
[Wolpert (1992)](https://www.sciencedirect.com/science/article/abs/pii/S0893608005800231)
and generalized by [Van der Laan et al
(2007)](https://biostats.bepress.com/ucbbiostat/paper222/). Returns an
instance of type `ProbabilisticStack` or `DeterministicStack`,
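
Only docstring wrapping and a comment banner change in this file. For orientation, a hedged usage sketch of the `Stack` constructor the docstring documents; the model choices are arbitrary and assume the MLJLinearModels and DecisionTree interface packages are installed.

```julia
using MLJ  # assumes MLJLinearModels and MLJDecisionTreeInterface are installed

RidgeRegressor = @load RidgeRegressor pkg=MLJLinearModels verbosity=0
TreeRegressor  = @load DecisionTreeRegressor pkg=DecisionTree verbosity=0

# Two-layer stack: out-of-sample predictions of the named base models become
# the features seen by the metalearner.
stack = Stack(;
    metalearner = RidgeRegressor(),
    ridge       = RidgeRegressor(),
    tree        = TreeRegressor(),
    resampling  = CV(nfolds=3),
)

X, y = make_regression(200, 5)        # synthetic data from MLJBase
mach = machine(stack, X, y) |> fit!
predict(mach, X)
```
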
6 changes: 3 additions & 3 deletions src/composition/models/transformed_target_model.jl
@@ -61,7 +61,7 @@ const ERR_MODEL_UNSPECIFIED = ArgumentError(
"Expecting atomic model as argument. None specified. "
)
const ERR_TRANSFORMER_UNSPECIFIED = ArgumentError(
"You must specify `transformer=...`. ."
"You must specify `transformer=...`. ."
)
const ERR_TOO_MANY_ARGUMENTS = ArgumentError(
"At most one non-keyword argument, a model, allowed. "
@@ -123,7 +123,7 @@ y -> mode.(y))`.
A model that normalizes the target before applying ridge regression,
with predictions returned on the original scale:

```
```julia
@load RidgeRegressor pkg=MLJLinearModels
model = RidgeRegressor()
tmodel = TransformedTargetModel(model, transformer=Standardizer())
@@ -132,7 +132,7 @@ tmodel = TransformedTargetModel(model, transformer=Standardizer())
A model that applies a static `log` transformation to the data, again
returning predictions to the original scale:

```
```julia
tmodel2 = TransformedTargetModel(model, transformer=y->log.(y), inverse=z->exp.(z))
```

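
The docstring above shows construction only. A short hedged sketch of the wrapper end to end, assuming MLJLinearModels is installed; data and model are arbitrary and follow the docstring's own transformer/inverse idiom.

```julia
using MLJ  # assumes MLJLinearModels is installed

RidgeRegressor = @load RidgeRegressor pkg=MLJLinearModels verbosity=0

X, y = make_regression(100, 3)
y = exp.(y)                       # strictly positive target so that `log` applies

# The target is log-transformed before fitting and predictions are exp'd back,
# so `predict` returns values on the original (positive) scale of `y`.
tmodel = TransformedTargetModel(RidgeRegressor(),
                                transformer = v -> log.(v),
                                inverse     = v -> exp.(v))
mach = machine(tmodel, X, y) |> fit!
predict(mach, X)
```
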
49 changes: 30 additions & 19 deletions src/data/data.jl
@@ -104,23 +104,28 @@ corresponding `fractions` of `length(nrows(X))`, where valid fractions
are floats between 0 and 1 whose sum is less than one. The last
fraction is not provided, as it is inferred from the preceding ones.

For "synchronized" partitioning of multiple objects, use the
`multi=true` option described below.
For synchronized partitioning of multiple objects, use the
`multi=true` option.

julia> partition(1:1000, 0.8)
([1,...,800], [801,...,1000])
```julia-repl
julia> partition(1:1000, 0.8)
([1,...,800], [801,...,1000])

julia> partition(1:1000, 0.2, 0.7)
([1,...,200], [201,...,900], [901,...,1000])
julia> partition(1:1000, 0.2, 0.7)
([1,...,200], [201,...,900], [901,...,1000])

julia> partition(reshape(1:10, 5, 2), 0.2, 0.4)
([1 6], [2 7; 3 8], [4 9; 5 10])
julia> partition(reshape(1:10, 5, 2), 0.2, 0.4)
([1 6], [2 7; 3 8], [4 9; 5 10])

X, y = make_blobs() # a table and vector
Xtrain, Xtest = partition(X, 0.8, stratify=y)
julia> X, y = make_blobs() # a table and vector
julia> Xtrain, Xtest = partition(X, 0.8, stratify=y)
```

(Xtrain, Xtest), (ytrain, ytest) = partition((X, y), 0.8, rng=123, multi=true)
Here's an example of synchronized partitioning of multiple objects:

```julia-repl
julia> (Xtrain, Xtest), (ytrain, ytest) = partition((X, y), 0.8, rng=123, multi=true)
```

## Keywords

@@ -209,7 +214,7 @@ Returns a tuple of tables/vectors with length one greater than the
number of supplied predicates, with the last component including all
previously unselected columns.

```
```julia-repl
julia> table = DataFrame(x=[1,2], y=['a', 'b'], z=[10.0, 20.0], w=["A", "B"])
2×4 DataFrame
Row │ x y z w
Expand All @@ -218,7 +223,7 @@ julia> table = DataFrame(x=[1,2], y=['a', 'b'], z=[10.0, 20.0], w=["A", "B"])
1 │ 1 a 10.0 A
2 │ 2 b 20.0 B

Z, XY, W = unpack(table, ==(:z), !=(:w))
julia> Z, XY, W = unpack(table, ==(:z), !=(:w));
julia> Z
2-element Vector{Float64}:
10.0
@@ -300,9 +305,11 @@ The method is curried, so that `restrict(folds, i)` is the operator
on data defined by `restrict(folds, i)(X) = restrict(X, folds, i)`.

### Example

folds = ([1, 2], [3, 4, 5], [6,])
restrict([:x1, :x2, :x3, :x4, :x5, :x6], folds, 2) # [:x3, :x4, :x5]
```julia
folds = ([1, 2], [3, 4, 5], [6,])
restrict([:x1, :x2, :x3, :x4, :x5, :x6], folds, 2) # [:x3, :x4, :x5]
```

See also [`corestrict`](@ref)

@@ -322,7 +329,9 @@ all elements of `folds`. Here `folds` is a vector or tuple of integer
vectors, typically representing row indices or a vector, matrix or
table.

complement(([1,2], [3,], [4, 5]), 2) # [1 ,2, 4, 5]
```julia
complement(([1,2], [3,], [4, 5]), 2) # [1 ,2, 4, 5]
```

"""
complement(f, i) = reduce(vcat, collect(f)[Not(i)])
@@ -345,8 +354,10 @@ on data defined by `corestrict(folds, i)(X) = corestrict(X, folds, i)`.

### Example

folds = ([1, 2], [3, 4, 5], [6,])
corestrict([:x1, :x2, :x3, :x4, :x5, :x6], folds, 2) # [:x1, :x2, :x6]
```julia
folds = ([1, 2], [3, 4, 5], [6,])
corestrict([:x1, :x2, :x3, :x4, :x5, :x6], folds, 2) # [:x1, :x2, :x6]
```

"""
corestrict(f::NTuple{N}, i) where N = FoldComplementRestrictor{i,N}(f)
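
The docstrings above already carry their own examples; the following hedged snippet just ties `restrict`, `corestrict`, and `complement` together, assuming all three are exported as documented.

```julia
using MLJBase  # assumes restrict, corestrict and complement are all exported

folds = ([1, 2], [3, 4, 5], [6])
v = [:x1, :x2, :x3, :x4, :x5, :x6]

restrict(v, folds, 2)      # [:x3, :x4, :x5]  (rows of fold 2)
corestrict(v, folds, 2)    # [:x1, :x2, :x6]  (rows of every other fold)
complement(folds, 2)       # [1, 2, 6]        (the indices behind `corestrict`)

# The two views are complementary and together recover every row:
sort(vcat(complement(folds, 2), folds[2])) == 1:6   # true
```
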
2 changes: 1 addition & 1 deletion src/data/datasets.jl
@@ -158,7 +158,7 @@ const COERCE_SUNSPOTS = (
(:sunspot_number=>Continuous),)

"""
load_dataset(fpath, coercions)
load_dataset(fpath, coercions)

Load one of standard dataset like Boston etc assuming the file is a
comma separated file with a header.
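
This hunk is only an indentation fix in the `load_dataset` docstring. For context, a hedged sketch of the public loaders that wrap it; the helper names are assumed from the surrounding constants (for example `load_sunspots` for `COERCE_SUNSPOTS`) and should be checked against the MLJBase exports.

```julia
using MLJBase

# Public wrappers around `load_dataset` with the path and coercions baked in
# (names assumed; consult the MLJBase exports for the full list):
sunspots = load_sunspots()    # table with a Continuous :sunspot_number column
iris     = load_iris()        # the classic iris table

# Macro variants additionally unpack into features and target:
X, y = @load_iris
```
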