IPM/HSD: for BigFloat optimization, computing ξg_ in solve_newton_system! causes a big portion of total allocations #122

Open
nsajko opened this issue Apr 20, 2022 · 4 comments

Comments

@nsajko
Contributor

nsajko commented Apr 20, 2022

This line causes a lot of allocation, reported by --track-allocation=all:

@timeit hsd.timer "ξg_" ξg_ = (ξg + ξtk / pt.τ

So there's possibly room for improvement at or around that line.
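
For context, a generic recipe for obtaining such a per-line allocation report (the file name below is a placeholder, not the exact command used here):

```sh
# Run the workload with per-line allocation tracking enabled. After Julia
# exits, every loaded source file gets a companion "*.mem" file whose lines
# are prefixed with the number of bytes allocated on that line.
julia --track-allocation=all my_workload.jl

# The largest offenders can then be located with, e.g.:
grep -rn --include='*.mem' '^ *[0-9]\{6,\}' .
```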

@mtanneau
Member

I tried a couple of different syntaxes, and it looks like, when using BigFloat arithmetic, things like

dot((ξxzl ./ pt.xl) .* dat.lflag, dat.l .* dat.lflag)

allocate a lot of memory indeed.
IIRC, I chose that syntax because then I didn't have to worry about dimensions (everything has the same size, and what needs to be zero is always zero).

Based on your comment, I think a better design choice would be to drop these `.* flag` multiplications and slice the arrays instead.
Something like

dot((ξxzl ./ pt.xl)[dat.lflag], dat.l[dat.lflag])
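
To make the trade-off concrete, here is a small self-contained sketch (with generic names, not Tulip's actual variables) comparing the two ways of computing the masked dot product in BigFloat arithmetic:

```julia
using LinearAlgebra

n = 10_000
a = rand(BigFloat, n)
b = rand(BigFloat, n)
flag = rand(Bool, n)     # plays the role of dat.lflag

# Current style: the broadcasts materialize two temporary length-n BigFloat
# vectors, and every Bool-BigFloat product allocates a fresh BigFloat.
d1 = dot(a .* flag, b .* flag)

# Suggested style: logical indexing copies only the selected entries, so the
# number of temporaries is roughly proportional to count(flag) instead of n.
d2 = dot(a[flag], b[flag])

d1 == d2    # identical here, since all masked-out entries are finite
```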

@nsajko
Contributor Author

nsajko commented Apr 21, 2022

A quick thought, maybe useful: I think part of the reason for the current inefficiency is the bool-float multiplication method from base/bool.jl, `Base.:*(x::Bool, y::T)::promote_type(Bool,T) where T<:AbstractFloat`, which takes special care of the case where the float value is NaN or infinity. I'm not sure how Tulip handles NaN internally, but an idea that comes to mind is this:
If it's known that y (of floating-point type) is finite and not NaN, it would probably be faster to convert x from Bool to UInt8 or UInt before multiplying.
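
To illustrate why that special care exists and what the suggested conversion would change (a REPL sketch with Float64; the same `Bool * AbstractFloat` method also covers BigFloat):

```julia
# Base's Bool * AbstractFloat method zeroes out non-finite values:
false * Inf          # 0.0
false * NaN          # 0.0

# Plain integer-float promotion does not:
UInt8(false) * Inf   # NaN
UInt8(false) * NaN   # NaN

# So converting the Bool first is only safe when the float operand is known
# to be finite, in which case both forms agree:
false * 3.5          # 0.0
UInt8(false) * 3.5   # 0.0
```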

@mtanneau
Member

The main reason for using bool-float multiplication was that some of these values can be Inf or NaN, yes. They get multiplied by `false` and reset to zero, which makes the math work.

I did not realize at the time it could cause big performance issues.

@nsajko
Contributor Author

nsajko commented Jan 6, 2023

I started optimizing IPM/HSD using MutableArithmetics, with good results. However, I wonder which arithmetic types are supposed to be supported for HSD? It's likely that I'll break some non-BigFloat arithmetic while developing, so I'm wondering how well the current tests cover this.
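
For reference, the kind of rewrite this involves, as a minimal sketch with made-up values (using the public MutableArithmetics API, not Tulip's actual code):

```julia
import MutableArithmetics as MA

acc = big"0.0"
x, y = big"1.5", big"2.5"

# Allocating style: `acc + x * y` creates two fresh BigFloats per update.
acc = acc + x * y

# In-place style: operate!!(add_mul, ...) mutates `acc` when its type supports
# mutation (BigFloat does), so repeated updates stop allocating.
acc = MA.operate!!(MA.add_mul, acc, x, y)
```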

nsajko added a commit to nsajko/Tulip.jl that referenced this issue Jan 12, 2023
We now do basically this for a logical vector l: `dot(a[l], b[l])`,
instead of `dot(a .* l, b .* l)` as before.

This is as suggested here:
ds4dm#122 (comment)

Helps with the performance of the BigFloat arithmetic.

This is the output with the script from the previous commit:
```
exp = 6: 27.391877 seconds (142.35 M allocations: 21.646 GiB, 11.38% gc time, 23.70% compilation time)
elapsed time (ns):  27391877044
gc time (ns):       3116697350
bytes allocated:    23242616545
pool allocs:        142136686
non-pool GC allocs: 9144
malloc() calls:     118007
realloc() calls:    88649
free() calls:       116981
minor collections:  506
full collections:   0

exp = 9: 289.050404 seconds (1.12 G allocations: 175.440 GiB, 29.73% gc time)
elapsed time (ns):  289050404133
gc time (ns):       85940338982
bytes allocated:    188377360520
pool allocs:        1117528611
non-pool GC allocs: 75494
malloc() calls:     728937
realloc() calls:    554273
free() calls:       743507
minor collections:  4186
full collections:   3
```

Fixes ds4dm#122
nsajko added a commit to nsajko/Tulip.jl that referenced this issue Apr 16, 2023
We now do basically this for a logical vector l: `dot(a[l], b[l])`,
instead of `dot(a .* l, b .* l)` as before.

This is as suggested here:
ds4dm#122 (comment)

Helps with the performance of the BigFloat arithmetic.

A Julia script and a Unix shell script were used to conduct an
experiment for assessing the impact of this commit and the previous
commit on performance. The scripts and the resulting CSV file follow.

The benchmark experiment is conducted and the CSV created by removing
sources of system load on a computer and running the shell script four
times: once with the `init_csv` command and three times with the `run`
command, each time checking out a different commit in the Tulip git
repo (a hypothetical invocation is sketched after the shell script below).

Unix (Bourne) shell script:
```sh

set -u

command=$1

julia_opts='-O3 --min-optlevel=3 --heap-size-hint=5G --depwarn=error --warn-overwrite=yes'

script=tulip_benchmark.jl

case "$command" in

init_csv)
  printf '%s,%s,%s,%s\n' 'Tulip version' estimator 'measurement type' value
  ;;

run)
  tulip_version=$2
  $PATH_TO_JULIA_BIN $julia_opts "$script" "$tulip_version"
  ;;

*)
  printf '%s\n' error >&2
  exit 1
  ;;

esac
```
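
A hypothetical invocation of the above (the script file name, CSV file name, and refs are placeholders):

```sh
export PATH_TO_JULIA_BIN=/usr/bin/julia       # consumed by the script above
sh benchmark.sh init_csv > results.csv        # write the CSV header once
for ref in v0.9.5 commit_A commit_B; do       # the three Tulip commits compared
  git -C Tulip.jl checkout "$ref"
  sh benchmark.sh run "$ref" >> results.csv
done
```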

Julia script:
```julia
const benchmark_seconds = 500
const polynomial_degree = 20
setprecision(BigFloat, 12 * 2^7)
using BenchmarkTools
import
  FindMinimaxPolynomial,  # v0.2.3
  Tulip,
  MathOptInterface
const FMP = FindMinimaxPolynomial
const MMX = FMP.Minimax
const PPTI = FMP.PolynomialPassingThroughIntervals
const NE = FMP.NumericalErrorTypes
const to_poly = FMP.ToSparsePolynomial.to_sparse_polynomial
const mmx = MMX.minimax_polynomial
const error_type_relative = NE.RelativeError()
const MOI = MathOptInterface
const itv_max_err = FMP.ApproximateInfinityNorm.interval_max_err

function make_lp()
  lp = Tulip.Optimizer{BigFloat}()

  # Remove iteration limit just in case
  MOI.set(lp, MOI.RawOptimizerAttribute("IPM_IterationsLimit"), 2000)

  # Disable presolve, speeds things up
  #MOI.set(lp, MOI.RawOptimizerAttribute("Presolve_Level"), 0)

  lp
end

const itv = (-big"2.0"^-3, big"45.0")

odd_monomials(n::Int) = 1:2:n

sind_mmx(n::Int) =
  mmx(
    make_lp,
    sind,
    (itv,),
    odd_monomials(n),

    # Small factor to have less variance in the results
    initial_perturb_factor = 1//(2^20),

    # We're benchmarking LP, so disable other stuff
    worst_segments_density = 5,
    worst_segments_breadth_limit = 2,
    worst_segments_depth_ratio = 1/2,

    # Exit right after the first step
    exit_condition = true,
  )

function report(estimator; benchmark, benchmark_name)
  b = estimator(benchmark)
  println("$benchmark_name,$estimator,time,$(b.time)")
  println("$benchmark_name,$estimator,gctime,$(b.gctime)")
  println("$benchmark_name,$estimator,memory,$(b.memory)")
  println("$benchmark_name,$estimator,allocs,$(b.allocs)")
end

function report(;benchmark, benchmark_name)
  for quantile in (minimum, median, maximum)
    report(
      quantile,
      benchmark = benchmark,
      benchmark_name = benchmark_name,
    )
  end
end

report(
  benchmark = (@benchmark sind_mmx(polynomial_degree) seconds=benchmark_seconds),
  benchmark_name = first(ARGS),
)
```

CSV results:
```csv
Tulip version,estimator,measurement type,value
v0.9.5,minimum,time,2.63759609e8
v0.9.5,minimum,gctime,3.4505713e7
v0.9.5,minimum,memory,495299296
v0.9.5,minimum,allocs,3629874
v0.9.5,median,time,4.2594161e8
v0.9.5,median,gctime,1.3250321e8
v0.9.5,median,memory,495299296
v0.9.5,median,allocs,3629874
v0.9.5,maximum,time,4.55935021e8
v0.9.5,maximum,gctime,1.40426286e8
v0.9.5,maximum,memory,495299296
v0.9.5,maximum,allocs,3629874
MutableArithmetics for IPM/HSD,minimum,time,2.57993117e8
MutableArithmetics for IPM/HSD,minimum,gctime,2.9403466e7
MutableArithmetics for IPM/HSD,minimum,memory,442052896
MutableArithmetics for IPM/HSD,minimum,allocs,3238720
MutableArithmetics for IPM/HSD,median,time,4.22323273e8
MutableArithmetics for IPM/HSD,median,gctime,1.282365305e8
MutableArithmetics for IPM/HSD,median,memory,442052896
MutableArithmetics for IPM/HSD,median,allocs,3238720
MutableArithmetics for IPM/HSD,maximum,time,4.56330849e8
MutableArithmetics for IPM/HSD,maximum,gctime,1.57061172e8
MutableArithmetics for IPM/HSD,maximum,memory,442052896
MutableArithmetics for IPM/HSD,maximum,allocs,3238720
IPM/HSD: use logical slicing ...,minimum,time,2.40996648e8
IPM/HSD: use logical slicing ...,minimum,gctime,2.5335783e7
IPM/HSD: use logical slicing ...,minimum,memory,386588512
IPM/HSD: use logical slicing ...,minimum,allocs,2833356
IPM/HSD: use logical slicing ...,median,time,3.76039574e8
IPM/HSD: use logical slicing ...,median,gctime,1.06930941e8
IPM/HSD: use logical slicing ...,median,memory,386588512
IPM/HSD: use logical slicing ...,median,allocs,2833356
IPM/HSD: use logical slicing ...,maximum,time,4.00347376e8
IPM/HSD: use logical slicing ...,maximum,gctime,1.27260987e8
IPM/HSD: use logical slicing ...,maximum,memory,386588512
IPM/HSD: use logical slicing ...,maximum,allocs,2833356
```

Fixes ds4dm#122
nsajko added a commit to nsajko/Tulip.jl that referenced this issue Apr 20, 2023
nsajko added a commit to nsajko/Tulip.jl that referenced this issue Apr 20, 2023