[WIP] Do not store histogram on SplitInfo #36

ogrisel · 2018-11-05T13:52:15Z

Tentative fix for #31.

ogrisel · 2018-11-05T13:53:19Z

At this time some tests are broken (need to be updated) and ~~the code runs slightly slower than on master for some reason I do not understand yet~~ (on my laptop the speed is the same as on master).

The memory leak issue should be fixed though.

ogrisel · 2018-11-05T14:41:34Z

We observe a slowdown when the packed histograms array is updated inside the parallel for loop, but not on master, where the same array is filled sequentially. This might be a case of false sharing.

ogrisel · 2018-11-05T15:26:37Z

@NicolasHug noted that false sharing might not be a problem for a write-only data structure updated in a parallel for loop. I don't know.
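To make the false-sharing question more concrete, here is a rough sketch that counts how many cache lines would hold bytes from two different per-feature rows of a packed histogram array. The 12 bytes per bin, the 64-byte cache line, the back-to-back row layout, the aligned start, and the 28 rows are all assumptions chosen for illustration, not values taken from pygbm's code. The function name is hypothetical.

```python
def cache_lines_shared_between_rows(n_rows, row_bytes, line_bytes=64):
    """Count cache lines holding bytes from more than one row.

    Assumes rows are stored back to back and that the array starts on a
    cache-line boundary (both are assumptions of this sketch).
    """
    shared_lines = set()
    for row in range(1, n_rows):
        boundary = row * row_bytes
        # A boundary strictly inside a line means the tail of the previous
        # row and the head of this row live in the same cache line.
        if boundary % line_bytes != 0:
            shared_lines.add(boundary // line_bytes)
    return len(shared_lines)


# Hypothetical layout: 28 feature rows, 12 bytes per bin.
for n_bins in (255, 256):
    shared = cache_lines_shared_between_rows(28, n_bins * 12)
    print(f"{n_bins} bins per feature: {shared} shared cache lines")
```

Under these assumptions only the lines at row boundaries can be contended, so if each thread writes a whole feature row, false sharing could affect only a small fraction of the array. This is a layout calculation, not a measurement.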

What I observe, though, is that on a 12-core machine the code runs 2x faster with "tbb" as the `numba.config.THREADING_LAYER` than with "workqueue". The performance of "omp" is closer to "tbb" than to "workqueue". But even with "tbb", LightGBM is significantly faster than pygbm on this machine.

ogrisel · 2018-11-05T15:34:02Z

Actually, I tried again with master and the various threading backends, and this PR is either as fast as or faster than master. I must have done something wrong when I reported the initial slowdown.

In any case, tbb is significantly faster than the workqueue backend when the number of cores is large (e.g. 12 in my case).
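For reference, a minimal sketch of how the threading layer could be selected and checked when comparing backends. The `NUMBA_THREADING_LAYER` environment variable and `numba.threading_layer()` are part of numba; the two helper names below are made up for this example, and "tbb" can only be selected if the tbb package is installed.

```python
import os

# Layer names as quoted in the comments above.
KNOWN_LAYERS = ("tbb", "omp", "workqueue")


def request_threading_layer(name):
    """Request a numba threading layer through its environment variable.

    Call this before numba is imported so the setting is picked up.
    """
    if name not in KNOWN_LAYERS:
        raise ValueError(f"unknown threading layer: {name!r}")
    os.environ["NUMBA_THREADING_LAYER"] = name
    return name


def report_threading_layer():
    """Run a tiny parallel kernel, then return the layer numba selected."""
    import numba
    from numba import njit, prange

    @njit(parallel=True)
    def parallel_sum(n):
        total = 0.0
        for i in prange(n):
            total += i
        return total

    # The layer is only known once a parallel function has executed.
    parallel_sum(1000)
    return numba.threading_layer()
```

A benchmark run would call `request_threading_layer("tbb")` at the very top of the script and `report_threading_layer()` at the end, repeating the run once per layer to compare timings.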

ogrisel · 2018-11-05T15:34:23Z

@NicolasHug I have to go, feel free to update the failing tests and merge this PR.

codecov · 2018-11-05T15:52:06Z

Codecov Report

Merging #36 into master will increase coverage by 0.02%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master      #36      +/-   ##
==========================================
+ Coverage   94.34%   94.36%   +0.02%     
==========================================
  Files           8        8              
  Lines         778      781       +3     
==========================================
+ Hits          734      737       +3     
  Misses         44       44

| Impacted Files | Coverage Δ |
|---|---|
| pygbm/splitting.py | `99.47% <100%> (ø)` ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 154def0...0960a2f.

NicolasHug · 2018-11-05T16:25:45Z

Here are the same plots from #31 (comment) now. I don't know how I feel about the 1e7 case.

We now have the following results with the benchmark from #30 (comment) (numba is pre-compiled here for a fair comparison):

Laptop with 8GB RAM, i5 7th gen.

Lightgbm: 75.408s, ROC AUC: 0.8293
Pygbm: 83.022s, ROC AUC: 0.8156
No VIRT explosion

😄

Code:

from urllib.request import urlretrieve
import os
from gzip import GzipFile
from time import time
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from joblib import Memory
from pygbm import GradientBoostingMachine
from lightgbm import LGBMRegressor
import numba
import gc

HERE = os.path.dirname(__file__)
URL = ("https://archive.ics.uci.edu/ml/machine-learning-databases/00280/"
       "HIGGS.csv.gz")
m = Memory(location='/tmp', mmap_mode='r')

@m.cache
def load_data():
    filename = os.path.join(HERE, URL.rsplit('/', 1)[-1])
    if not os.path.exists(filename):
        print(f"Downloading {URL} to {filename} (2.6 GB)...")
        urlretrieve(URL, filename)
        print("done.")

    print(f"Parsing {filename}...")
    tic = time()
    with GzipFile(filename) as f:
        df = pd.read_csv(f, header=None, dtype=np.float32)
    toc = time()
    print(f"Loaded {df.values.nbytes / 1e9:0.3f} GB in {toc - tic:0.3f}s")
    return df

df = load_data()

n_leaf_nodes = 255
n_trees = 500
lr = 0.05
max_bins = 255
subsample = 1000000 # Change this to 10000000 if you wish, or to None

target = df.values[:, 0]
data = np.ascontiguousarray(df.values[:, 1:])
data_train, data_test, target_train, target_test = train_test_split(
    data, target, test_size=50000, random_state=0)

if subsample is not None:
    data_train, target_train = data_train[:subsample], target_train[:subsample]

n_samples, n_features = data_train.shape
print(f"Training set with {n_samples} records with {n_features} features.")

gc.collect()

print("Compiling pygbm...")
tic = time()
pygbm_model = GradientBoostingMachine(learning_rate=lr, max_iter=n_trees,
                                      max_bins=max_bins,
                                      max_leaf_nodes=n_leaf_nodes,
                                      random_state=0, scoring=None,
                                      verbose=0, validation_split=None)
pygbm_model.fit(data_train[:100], target_train[:100])
toc = time()
predicted_test = pygbm_model.predict(data_test)
roc_auc = roc_auc_score(target_test, predicted_test)
print(f"done in {toc - tic:.3f}s, ROC AUC: {roc_auc:.4f}")
del pygbm_model
del predicted_test

print("Fitting a LightGBM model...")
tic = time()
lightgbm_model = LGBMRegressor(n_estimators=n_trees, num_leaves=n_leaf_nodes,
                               learning_rate=lr, silent=False)
lightgbm_model.fit(data_train, target_train)
toc = time()
predicted_test = lightgbm_model.predict(data_test)
roc_auc = roc_auc_score(target_test, predicted_test)
print(f"done in {toc - tic:.3f}s, ROC AUC: {roc_auc:.4f}")
del lightgbm_model
del predicted_test
gc.collect()

print("Fitting a pygbm model...")
tic = time()
pygbm_model = GradientBoostingMachine(learning_rate=lr, max_iter=n_trees,
                                      max_bins=max_bins,
                                      max_leaf_nodes=n_leaf_nodes,
                                      random_state=0, scoring=None,
                                      verbose=1, validation_split=None)
pygbm_model.fit(data_train, target_train)
toc = time()
predicted_test = pygbm_model.predict(data_test)
roc_auc = roc_auc_score(target_test, predicted_test)
print(f"done in {toc - tic:.3f}s, ROC AUC: {roc_auc:.4f}")
del pygbm_model
del predicted_test
gc.collect()


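if hasattr(numba, 'threading_layer'):
    # Report which threading layer numba picked for the parallel loops.
    print("Threading layer chosen: " + numba.threading_layer())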

Log:

[LightGBM] [Info] Total Bins 6143
[LightGBM] [Info] Number of data: 1000000, number of used features: 28
[LightGBM] [Info] Start training from score 0.529479
Training set with 1000000 records with 28 features.
Compiling pygbm...
done in 9.049s, ROC AUC: 0.5340
Fitting a LightGBM model...
done in 75.408s, ROC AUC: 0.8293
Fitting a pygbm model...
Binning 0.112 GB of data: 1.003 s (111.671 MB/s)
Fitting gradient boosted rounds:
[0/500]
[1/500] 255 leaf nodes, max depth 13 in 0.470s
[2/500] 255 leaf nodes, max depth 17 in 0.403s
[3/500] 255 leaf nodes, max depth 15 in 0.376s
[4/500] 255 leaf nodes, max depth 19 in 0.363s
[5/500] 255 leaf nodes, max depth 17 in 0.358s
[6/500] 255 leaf nodes, max depth 15 in 0.351s
[7/500] 255 leaf nodes, max depth 17 in 0.348s
[8/500] 255 leaf nodes, max depth 17 in 0.344s
[9/500] 255 leaf nodes, max depth 15 in 0.340s
[10/500] 255 leaf nodes, max depth 17 in 0.338s
[11/500] 255 leaf nodes, max depth 15 in 0.336s
[12/500] 255 leaf nodes, max depth 16 in 0.333s
[13/500] 255 leaf nodes, max depth 17 in 0.331s
[14/500] 255 leaf nodes, max depth 21 in 0.332s
[15/500] 255 leaf nodes, max depth 20 in 0.331s
[16/500] 255 leaf nodes, max depth 20 in 0.331s
[17/500] 255 leaf nodes, max depth 18 in 0.329s
[18/500] 255 leaf nodes, max depth 20 in 0.329s
[19/500] 255 leaf nodes, max depth 15 in 0.327s
[20/500] 255 leaf nodes, max depth 21 in 0.326s
[21/500] 255 leaf nodes, max depth 22 in 0.326s
[22/500] 255 leaf nodes, max depth 17 in 0.325s
[23/500] 255 leaf nodes, max depth 18 in 0.324s
[24/500] 255 leaf nodes, max depth 20 in 0.324s
[25/500] 255 leaf nodes, max depth 18 in 0.324s
[26/500] 255 leaf nodes, max depth 19 in 0.324s
[27/500] 255 leaf nodes, max depth 19 in 0.323s
[28/500] 255 leaf nodes, max depth 18 in 0.322s
[29/500] 255 leaf nodes, max depth 19 in 0.322s
[30/500] 255 leaf nodes, max depth 19 in 0.321s
[31/500] 255 leaf nodes, max depth 18 in 0.321s
[32/500] 255 leaf nodes, max depth 19 in 0.320s
[33/500] 255 leaf nodes, max depth 21 in 0.320s
[34/500] 255 leaf nodes, max depth 19 in 0.320s
[35/500] 255 leaf nodes, max depth 17 in 0.319s
[36/500] 255 leaf nodes, max depth 17 in 0.318s
[37/500] 255 leaf nodes, max depth 16 in 0.318s
[38/500] 255 leaf nodes, max depth 18 in 0.317s
[39/500] 255 leaf nodes, max depth 23 in 0.317s
[40/500] 255 leaf nodes, max depth 21 in 0.316s
[41/500] 255 leaf nodes, max depth 19 in 0.316s
[42/500] 255 leaf nodes, max depth 19 in 0.316s
[43/500] 255 leaf nodes, max depth 19 in 0.315s
[44/500] 255 leaf nodes, max depth 14 in 0.315s
[45/500] 255 leaf nodes, max depth 18 in 0.314s
[46/500] 255 leaf nodes, max depth 23 in 0.314s
[47/500] 255 leaf nodes, max depth 20 in 0.314s
[48/500] 255 leaf nodes, max depth 16 in 0.313s
[49/500] 255 leaf nodes, max depth 24 in 0.313s
[50/500] 255 leaf nodes, max depth 20 in 0.313s
[51/500] 255 leaf nodes, max depth 20 in 0.313s
[52/500] 255 leaf nodes, max depth 26 in 0.312s
[53/500] 255 leaf nodes, max depth 20 in 0.312s
[54/500] 255 leaf nodes, max depth 26 in 0.312s
[55/500] 255 leaf nodes, max depth 20 in 0.311s
[56/500] 255 leaf nodes, max depth 16 in 0.311s
[57/500] 255 leaf nodes, max depth 18 in 0.310s
[58/500] 255 leaf nodes, max depth 21 in 0.310s
[59/500] 255 leaf nodes, max depth 19 in 0.310s
[60/500] 255 leaf nodes, max depth 20 in 0.310s
[61/500] 255 leaf nodes, max depth 17 in 0.309s
[62/500] 255 leaf nodes, max depth 21 in 0.309s
[63/500] 255 leaf nodes, max depth 20 in 0.308s
[64/500] 255 leaf nodes, max depth 26 in 0.308s
[65/500] 255 leaf nodes, max depth 18 in 0.308s
[66/500] 255 leaf nodes, max depth 19 in 0.308s
[67/500] 255 leaf nodes, max depth 24 in 0.308s
[68/500] 255 leaf nodes, max depth 25 in 0.308s
[69/500] 255 leaf nodes, max depth 16 in 0.307s
[70/500] 255 leaf nodes, max depth 20 in 0.307s
[71/500] 255 leaf nodes, max depth 21 in 0.307s
[72/500] 255 leaf nodes, max depth 20 in 0.306s
[73/500] 255 leaf nodes, max depth 19 in 0.306s
[74/500] 255 leaf nodes, max depth 27 in 0.306s
[75/500] 255 leaf nodes, max depth 26 in 0.306s
[76/500] 255 leaf nodes, max depth 19 in 0.305s
[77/500] 255 leaf nodes, max depth 24 in 0.305s
[78/500] 255 leaf nodes, max depth 20 in 0.305s
[79/500] 255 leaf nodes, max depth 23 in 0.304s
[80/500] 255 leaf nodes, max depth 22 in 0.303s
[81/500] 255 leaf nodes, max depth 22 in 0.303s
[82/500] 255 leaf nodes, max depth 18 in 0.303s
[83/500] 255 leaf nodes, max depth 18 in 0.303s
[84/500] 255 leaf nodes, max depth 23 in 0.302s
[85/500] 255 leaf nodes, max depth 22 in 0.302s
[86/500] 255 leaf nodes, max depth 22 in 0.302s
[87/500] 255 leaf nodes, max depth 18 in 0.301s
[88/500] 255 leaf nodes, max depth 21 in 0.301s
[89/500] 255 leaf nodes, max depth 23 in 0.301s
[90/500] 255 leaf nodes, max depth 25 in 0.300s
[91/500] 255 leaf nodes, max depth 18 in 0.300s
[92/500] 255 leaf nodes, max depth 22 in 0.300s
[93/500] 255 leaf nodes, max depth 30 in 0.300s
[94/500] 255 leaf nodes, max depth 17 in 0.300s
[95/500] 255 leaf nodes, max depth 19 in 0.299s
[96/500] 255 leaf nodes, max depth 24 in 0.299s
[97/500] 255 leaf nodes, max depth 19 in 0.298s
[98/500] 255 leaf nodes, max depth 26 in 0.298s
[99/500] 255 leaf nodes, max depth 17 in 0.298s
[100/500] 255 leaf nodes, max depth 21 in 0.298s
[101/500] 255 leaf nodes, max depth 21 in 0.297s
[102/500] 255 leaf nodes, max depth 23 in 0.297s
[103/500] 255 leaf nodes, max depth 24 in 0.297s
[104/500] 255 leaf nodes, max depth 29 in 0.297s
[105/500] 255 leaf nodes, max depth 21 in 0.297s
[106/500] 255 leaf nodes, max depth 24 in 0.297s
[107/500] 255 leaf nodes, max depth 19 in 0.296s
[108/500] 255 leaf nodes, max depth 25 in 0.296s
[109/500] 255 leaf nodes, max depth 19 in 0.296s
[110/500] 255 leaf nodes, max depth 26 in 0.295s
[111/500] 255 leaf nodes, max depth 19 in 0.295s
[112/500] 255 leaf nodes, max depth 18 in 0.294s
[113/500] 255 leaf nodes, max depth 26 in 0.294s
[114/500] 255 leaf nodes, max depth 23 in 0.294s
[115/500] 255 leaf nodes, max depth 24 in 0.293s
[116/500] 255 leaf nodes, max depth 27 in 0.293s
[117/500] 255 leaf nodes, max depth 19 in 0.293s
[118/500] 255 leaf nodes, max depth 19 in 0.293s
[119/500] 255 leaf nodes, max depth 23 in 0.292s
[120/500] 255 leaf nodes, max depth 19 in 0.292s
[121/500] 255 leaf nodes, max depth 19 in 0.291s
[122/500] 255 leaf nodes, max depth 20 in 0.291s
[123/500] 255 leaf nodes, max depth 24 in 0.291s
[124/500] 255 leaf nodes, max depth 19 in 0.290s
[125/500] 255 leaf nodes, max depth 20 in 0.290s
[126/500] 255 leaf nodes, max depth 18 in 0.290s
[127/500] 255 leaf nodes, max depth 23 in 0.290s
[128/500] 255 leaf nodes, max depth 23 in 0.289s
[129/500] 255 leaf nodes, max depth 16 in 0.289s
[130/500] 255 leaf nodes, max depth 20 in 0.288s
[131/500] 255 leaf nodes, max depth 18 in 0.288s
[132/500] 255 leaf nodes, max depth 22 in 0.287s
[133/500] 255 leaf nodes, max depth 32 in 0.287s
[134/500] 255 leaf nodes, max depth 19 in 0.287s
[135/500] 255 leaf nodes, max depth 17 in 0.286s
[136/500] 255 leaf nodes, max depth 18 in 0.286s
[137/500] 255 leaf nodes, max depth 20 in 0.285s
[138/500] 255 leaf nodes, max depth 16 in 0.285s
[139/500] 255 leaf nodes, max depth 23 in 0.284s
[140/500] 255 leaf nodes, max depth 21 in 0.283s
[141/500] 255 leaf nodes, max depth 23 in 0.283s
[142/500] 255 leaf nodes, max depth 16 in 0.283s
[143/500] 255 leaf nodes, max depth 21 in 0.282s
[144/500] 255 leaf nodes, max depth 21 in 0.281s
[145/500] 255 leaf nodes, max depth 21 in 0.280s
[146/500] 255 leaf nodes, max depth 26 in 0.280s
[147/500] 255 leaf nodes, max depth 19 in 0.279s
[148/500] 255 leaf nodes, max depth 18 in 0.278s
[149/500] 255 leaf nodes, max depth 16 in 0.277s
[150/500] 255 leaf nodes, max depth 18 in 0.277s
[151/500] 255 leaf nodes, max depth 21 in 0.276s
[152/500] 255 leaf nodes, max depth 20 in 0.275s
[153/500] 255 leaf nodes, max depth 21 in 0.274s
[154/500] 255 leaf nodes, max depth 20 in 0.273s
[155/500] 255 leaf nodes, max depth 18 in 0.273s
[156/500] 255 leaf nodes, max depth 20 in 0.272s
[157/500] 255 leaf nodes, max depth 22 in 0.272s
[158/500] 255 leaf nodes, max depth 20 in 0.271s
[159/500] 255 leaf nodes, max depth 19 in 0.270s
[160/500] 255 leaf nodes, max depth 17 in 0.270s
[161/500] 255 leaf nodes, max depth 16 in 0.269s
[162/500] 255 leaf nodes, max depth 21 in 0.268s
[163/500] 255 leaf nodes, max depth 20 in 0.268s
[164/500] 255 leaf nodes, max depth 22 in 0.268s
[165/500] 255 leaf nodes, max depth 18 in 0.267s
[166/500] 255 leaf nodes, max depth 22 in 0.266s
[167/500] 255 leaf nodes, max depth 18 in 0.266s
[168/500] 255 leaf nodes, max depth 23 in 0.266s
[169/500] 255 leaf nodes, max depth 21 in 0.266s
[170/500] 255 leaf nodes, max depth 20 in 0.265s
[171/500] 255 leaf nodes, max depth 20 in 0.264s
[172/500] 255 leaf nodes, max depth 18 in 0.264s
[173/500] 255 leaf nodes, max depth 20 in 0.264s
[174/500] 255 leaf nodes, max depth 20 in 0.263s
[175/500] 255 leaf nodes, max depth 19 in 0.264s
[176/500] 255 leaf nodes, max depth 24 in 0.263s
[177/500] 255 leaf nodes, max depth 22 in 0.262s
[178/500] 255 leaf nodes, max depth 20 in 0.262s
[179/500] 255 leaf nodes, max depth 21 in 0.262s
[180/500] 255 leaf nodes, max depth 19 in 0.261s
[181/500] 255 leaf nodes, max depth 22 in 0.260s
[182/500] 255 leaf nodes, max depth 25 in 0.259s
[183/500] 255 leaf nodes, max depth 28 in 0.258s
[184/500] 255 leaf nodes, max depth 22 in 0.257s
[185/500] 255 leaf nodes, max depth 20 in 0.256s
[186/500] 255 leaf nodes, max depth 21 in 0.255s
[187/500] 255 leaf nodes, max depth 26 in 0.254s
[188/500] 255 leaf nodes, max depth 18 in 0.253s
[189/500] 255 leaf nodes, max depth 25 in 0.252s
[190/500] 255 leaf nodes, max depth 18 in 0.251s
[191/500] 255 leaf nodes, max depth 19 in 0.251s
[192/500] 255 leaf nodes, max depth 20 in 0.251s
[193/500] 255 leaf nodes, max depth 27 in 0.250s
[194/500] 255 leaf nodes, max depth 21 in 0.249s
[195/500] 255 leaf nodes, max depth 21 in 0.249s
[196/500] 255 leaf nodes, max depth 27 in 0.248s
[197/500] 255 leaf nodes, max depth 21 in 0.247s
[198/500] 255 leaf nodes, max depth 20 in 0.247s
[199/500] 255 leaf nodes, max depth 22 in 0.247s
[200/500] 255 leaf nodes, max depth 25 in 0.246s
[201/500] 255 leaf nodes, max depth 19 in 0.246s
[202/500] 255 leaf nodes, max depth 18 in 0.246s
[203/500] 255 leaf nodes, max depth 17 in 0.245s
[204/500] 255 leaf nodes, max depth 20 in 0.245s
[205/500] 255 leaf nodes, max depth 17 in 0.244s
[206/500] 255 leaf nodes, max depth 17 in 0.244s
[207/500] 255 leaf nodes, max depth 21 in 0.244s
[208/500] 255 leaf nodes, max depth 17 in 0.243s
[209/500] 255 leaf nodes, max depth 18 in 0.243s
[210/500] 255 leaf nodes, max depth 18 in 0.243s
[211/500] 255 leaf nodes, max depth 22 in 0.243s
[212/500] 255 leaf nodes, max depth 16 in 0.243s
[213/500] 255 leaf nodes, max depth 18 in 0.242s
[214/500] 255 leaf nodes, max depth 23 in 0.242s
[215/500] 255 leaf nodes, max depth 20 in 0.242s
[216/500] 255 leaf nodes, max depth 20 in 0.241s
[217/500] 255 leaf nodes, max depth 19 in 0.241s
[218/500] 255 leaf nodes, max depth 15 in 0.241s
[219/500] 255 leaf nodes, max depth 23 in 0.240s
[220/500] 255 leaf nodes, max depth 19 in 0.240s
[221/500] 255 leaf nodes, max depth 24 in 0.239s
[222/500] 255 leaf nodes, max depth 22 in 0.238s
[223/500] 255 leaf nodes, max depth 18 in 0.238s
[224/500] 255 leaf nodes, max depth 23 in 0.237s
[225/500] 255 leaf nodes, max depth 21 in 0.237s
[226/500] 255 leaf nodes, max depth 27 in 0.236s
[227/500] 255 leaf nodes, max depth 22 in 0.235s
[228/500] 255 leaf nodes, max depth 30 in 0.235s
[229/500] 255 leaf nodes, max depth 18 in 0.234s
[230/500] 255 leaf nodes, max depth 20 in 0.233s
[231/500] 255 leaf nodes, max depth 19 in 0.233s
[232/500] 255 leaf nodes, max depth 18 in 0.232s
[233/500] 255 leaf nodes, max depth 17 in 0.232s
[234/500] 255 leaf nodes, max depth 22 in 0.231s
[235/500] 255 leaf nodes, max depth 25 in 0.230s
[236/500] 255 leaf nodes, max depth 21 in 0.230s
[237/500] 255 leaf nodes, max depth 26 in 0.229s
[238/500] 255 leaf nodes, max depth 20 in 0.228s
[239/500] 255 leaf nodes, max depth 26 in 0.228s
[240/500] 255 leaf nodes, max depth 26 in 0.227s
[241/500] 255 leaf nodes, max depth 21 in 0.227s
[242/500] 255 leaf nodes, max depth 19 in 0.226s
[243/500] 255 leaf nodes, max depth 18 in 0.226s
[244/500] 255 leaf nodes, max depth 19 in 0.226s
[245/500] 255 leaf nodes, max depth 21 in 0.225s
[246/500] 255 leaf nodes, max depth 25 in 0.225s
[247/500] 255 leaf nodes, max depth 21 in 0.224s
[248/500] 255 leaf nodes, max depth 22 in 0.223s
[249/500] 255 leaf nodes, max depth 27 in 0.223s
[250/500] 255 leaf nodes, max depth 33 in 0.222s
[251/500] 255 leaf nodes, max depth 29 in 0.222s
[252/500] 255 leaf nodes, max depth 22 in 0.221s
[253/500] 255 leaf nodes, max depth 23 in 0.220s
[254/500] 255 leaf nodes, max depth 21 in 0.220s
[255/500] 255 leaf nodes, max depth 21 in 0.219s
[256/500] 255 leaf nodes, max depth 27 in 0.219s
[257/500] 255 leaf nodes, max depth 24 in 0.219s
[258/500] 255 leaf nodes, max depth 22 in 0.219s
[259/500] 255 leaf nodes, max depth 20 in 0.218s
[260/500] 255 leaf nodes, max depth 24 in 0.218s
[261/500] 255 leaf nodes, max depth 24 in 0.217s
[262/500] 255 leaf nodes, max depth 19 in 0.217s
[263/500] 255 leaf nodes, max depth 23 in 0.217s
[264/500] 255 leaf nodes, max depth 23 in 0.216s
[265/500] 255 leaf nodes, max depth 21 in 0.216s
[266/500] 255 leaf nodes, max depth 21 in 0.216s
[267/500] 255 leaf nodes, max depth 22 in 0.215s
[268/500] 255 leaf nodes, max depth 23 in 0.215s
[269/500] 255 leaf nodes, max depth 24 in 0.214s
[270/500] 255 leaf nodes, max depth 19 in 0.214s
[271/500] 255 leaf nodes, max depth 16 in 0.213s
[272/500] 255 leaf nodes, max depth 18 in 0.213s
[273/500] 255 leaf nodes, max depth 17 in 0.213s
[274/500] 255 leaf nodes, max depth 22 in 0.212s
[275/500] 255 leaf nodes, max depth 24 in 0.212s
[276/500] 255 leaf nodes, max depth 19 in 0.211s
[277/500] 255 leaf nodes, max depth 24 in 0.211s
[278/500] 255 leaf nodes, max depth 25 in 0.210s
[279/500] 255 leaf nodes, max depth 16 in 0.210s
[280/500] 255 leaf nodes, max depth 16 in 0.210s
[281/500] 255 leaf nodes, max depth 18 in 0.210s
[282/500] 255 leaf nodes, max depth 24 in 0.209s
[283/500] 255 leaf nodes, max depth 22 in 0.209s
[284/500] 255 leaf nodes, max depth 19 in 0.209s
[285/500] 255 leaf nodes, max depth 27 in 0.209s
[286/500] 255 leaf nodes, max depth 23 in 0.209s
[287/500] 255 leaf nodes, max depth 22 in 0.208s
[288/500] 255 leaf nodes, max depth 25 in 0.208s
[289/500] 255 leaf nodes, max depth 21 in 0.207s
[290/500] 255 leaf nodes, max depth 21 in 0.207s
[291/500] 255 leaf nodes, max depth 28 in 0.207s
[292/500] 255 leaf nodes, max depth 24 in 0.206s
[293/500] 255 leaf nodes, max depth 23 in 0.206s
[294/500] 255 leaf nodes, max depth 25 in 0.205s
[295/500] 255 leaf nodes, max depth 23 in 0.205s
[296/500] 255 leaf nodes, max depth 24 in 0.205s
[297/500] 255 leaf nodes, max depth 24 in 0.204s
[298/500] 255 leaf nodes, max depth 30 in 0.204s
[299/500] 255 leaf nodes, max depth 22 in 0.203s
[300/500] 255 leaf nodes, max depth 22 in 0.203s
[301/500] 255 leaf nodes, max depth 20 in 0.203s
[302/500] 255 leaf nodes, max depth 19 in 0.203s
[303/500] 255 leaf nodes, max depth 20 in 0.203s
[304/500] 255 leaf nodes, max depth 25 in 0.202s
[305/500] 255 leaf nodes, max depth 24 in 0.202s
[306/500] 255 leaf nodes, max depth 24 in 0.202s
[307/500] 255 leaf nodes, max depth 17 in 0.201s
[308/500] 255 leaf nodes, max depth 25 in 0.201s
[309/500] 255 leaf nodes, max depth 20 in 0.201s
[310/500] 255 leaf nodes, max depth 19 in 0.200s
[311/500] 255 leaf nodes, max depth 19 in 0.200s
[312/500] 255 leaf nodes, max depth 18 in 0.200s
[313/500] 255 leaf nodes, max depth 22 in 0.199s
[314/500] 255 leaf nodes, max depth 19 in 0.199s
[315/500] 255 leaf nodes, max depth 17 in 0.199s
[316/500] 255 leaf nodes, max depth 18 in 0.198s
[317/500] 255 leaf nodes, max depth 17 in 0.198s
[318/500] 255 leaf nodes, max depth 19 in 0.198s
[319/500] 255 leaf nodes, max depth 20 in 0.198s
[320/500] 255 leaf nodes, max depth 17 in 0.197s
[321/500] 255 leaf nodes, max depth 17 in 0.197s
[322/500] 255 leaf nodes, max depth 20 in 0.197s
[323/500] 255 leaf nodes, max depth 17 in 0.197s
[324/500] 255 leaf nodes, max depth 19 in 0.197s
[325/500] 255 leaf nodes, max depth 22 in 0.197s
[326/500] 255 leaf nodes, max depth 21 in 0.196s
[327/500] 255 leaf nodes, max depth 18 in 0.196s
[328/500] 255 leaf nodes, max depth 17 in 0.196s
[329/500] 255 leaf nodes, max depth 18 in 0.196s
[330/500] 255 leaf nodes, max depth 20 in 0.196s
[331/500] 255 leaf nodes, max depth 18 in 0.195s
[332/500] 255 leaf nodes, max depth 20 in 0.195s
[333/500] 255 leaf nodes, max depth 20 in 0.195s
[334/500] 255 leaf nodes, max depth 18 in 0.195s
[335/500] 255 leaf nodes, max depth 20 in 0.194s
[336/500] 255 leaf nodes, max depth 20 in 0.194s
[337/500] 255 leaf nodes, max depth 22 in 0.194s
[338/500] 255 leaf nodes, max depth 19 in 0.194s
[339/500] 255 leaf nodes, max depth 21 in 0.193s
[340/500] 255 leaf nodes, max depth 20 in 0.193s
[341/500] 255 leaf nodes, max depth 18 in 0.193s
[342/500] 255 leaf nodes, max depth 19 in 0.192s
[343/500] 255 leaf nodes, max depth 20 in 0.192s
[344/500] 255 leaf nodes, max depth 19 in 0.192s
[345/500] 255 leaf nodes, max depth 21 in 0.191s
[346/500] 255 leaf nodes, max depth 23 in 0.191s
[347/500] 255 leaf nodes, max depth 19 in 0.191s
[348/500] 255 leaf nodes, max depth 18 in 0.191s
[349/500] 255 leaf nodes, max depth 18 in 0.190s
[350/500] 255 leaf nodes, max depth 18 in 0.191s
[351/500] 255 leaf nodes, max depth 21 in 0.191s
[352/500] 255 leaf nodes, max depth 26 in 0.191s
[353/500] 255 leaf nodes, max depth 28 in 0.190s
[354/500] 255 leaf nodes, max depth 26 in 0.190s
[355/500] 255 leaf nodes, max depth 36 in 0.190s
[356/500] 255 leaf nodes, max depth 29 in 0.189s
[357/500] 255 leaf nodes, max depth 32 in 0.189s
[358/500] 255 leaf nodes, max depth 25 in 0.189s
[359/500] 255 leaf nodes, max depth 20 in 0.189s
[360/500] 255 leaf nodes, max depth 25 in 0.189s
[361/500] 255 leaf nodes, max depth 23 in 0.188s
[362/500] 255 leaf nodes, max depth 23 in 0.188s
[363/500] 255 leaf nodes, max depth 27 in 0.188s
[364/500] 255 leaf nodes, max depth 23 in 0.187s
[365/500] 255 leaf nodes, max depth 23 in 0.187s
[366/500] 255 leaf nodes, max depth 32 in 0.187s
[367/500] 255 leaf nodes, max depth 20 in 0.187s
[368/500] 255 leaf nodes, max depth 21 in 0.187s
[369/500] 255 leaf nodes, max depth 20 in 0.187s
[370/500] 255 leaf nodes, max depth 18 in 0.186s
[371/500] 255 leaf nodes, max depth 32 in 0.186s
[372/500] 255 leaf nodes, max depth 32 in 0.186s
[373/500] 255 leaf nodes, max depth 21 in 0.186s
[374/500] 255 leaf nodes, max depth 23 in 0.185s
[375/500] 255 leaf nodes, max depth 22 in 0.185s
[376/500] 255 leaf nodes, max depth 23 in 0.185s
[377/500] 255 leaf nodes, max depth 25 in 0.185s
[378/500] 255 leaf nodes, max depth 24 in 0.184s
[379/500] 255 leaf nodes, max depth 23 in 0.184s
[380/500] 255 leaf nodes, max depth 28 in 0.184s
[381/500] 255 leaf nodes, max depth 21 in 0.184s
[382/500] 255 leaf nodes, max depth 24 in 0.183s
[383/500] 255 leaf nodes, max depth 24 in 0.183s
[384/500] 255 leaf nodes, max depth 21 in 0.183s
[385/500] 255 leaf nodes, max depth 24 in 0.183s
[386/500] 255 leaf nodes, max depth 26 in 0.182s
[387/500] 255 leaf nodes, max depth 24 in 0.182s
[388/500] 255 leaf nodes, max depth 32 in 0.182s
[389/500] 255 leaf nodes, max depth 29 in 0.182s
[390/500] 255 leaf nodes, max depth 19 in 0.181s
[391/500] 255 leaf nodes, max depth 20 in 0.181s
[392/500] 255 leaf nodes, max depth 23 in 0.181s
[393/500] 255 leaf nodes, max depth 25 in 0.181s
[394/500] 255 leaf nodes, max depth 29 in 0.181s
[395/500] 255 leaf nodes, max depth 25 in 0.180s
[396/500] 255 leaf nodes, max depth 20 in 0.180s
[397/500] 255 leaf nodes, max depth 23 in 0.180s
[398/500] 255 leaf nodes, max depth 23 in 0.180s
[399/500] 255 leaf nodes, max depth 18 in 0.180s
[400/500] 255 leaf nodes, max depth 22 in 0.179s
[401/500] 255 leaf nodes, max depth 19 in 0.179s
[402/500] 255 leaf nodes, max depth 22 in 0.179s
[403/500] 255 leaf nodes, max depth 25 in 0.179s
[404/500] 255 leaf nodes, max depth 29 in 0.178s
[405/500] 255 leaf nodes, max depth 25 in 0.178s
[406/500] 255 leaf nodes, max depth 26 in 0.178s
[407/500] 255 leaf nodes, max depth 25 in 0.178s
[408/500] 255 leaf nodes, max depth 30 in 0.177s
[409/500] 255 leaf nodes, max depth 30 in 0.177s
[410/500] 255 leaf nodes, max depth 25 in 0.177s
[411/500] 255 leaf nodes, max depth 30 in 0.177s
[412/500] 255 leaf nodes, max depth 23 in 0.177s
[413/500] 255 leaf nodes, max depth 24 in 0.176s
[414/500] 255 leaf nodes, max depth 23 in 0.176s
[415/500] 255 leaf nodes, max depth 24 in 0.176s
[416/500] 255 leaf nodes, max depth 26 in 0.176s
[417/500] 255 leaf nodes, max depth 27 in 0.176s
[418/500] 255 leaf nodes, max depth 21 in 0.175s
[419/500] 255 leaf nodes, max depth 27 in 0.175s
[420/500] 255 leaf nodes, max depth 26 in 0.175s
[421/500] 255 leaf nodes, max depth 28 in 0.175s
[422/500] 255 leaf nodes, max depth 26 in 0.175s
[423/500] 255 leaf nodes, max depth 25 in 0.174s
[424/500] 255 leaf nodes, max depth 32 in 0.174s
[425/500] 255 leaf nodes, max depth 25 in 0.174s
[426/500] 255 leaf nodes, max depth 31 in 0.174s
[427/500] 255 leaf nodes, max depth 35 in 0.173s
[428/500] 255 leaf nodes, max depth 29 in 0.173s
[429/500] 255 leaf nodes, max depth 28 in 0.173s
[430/500] 255 leaf nodes, max depth 32 in 0.173s
[431/500] 255 leaf nodes, max depth 25 in 0.173s
[432/500] 255 leaf nodes, max depth 27 in 0.172s
[433/500] 255 leaf nodes, max depth 31 in 0.172s
[434/500] 255 leaf nodes, max depth 32 in 0.172s
[435/500] 255 leaf nodes, max depth 24 in 0.172s
[436/500] 255 leaf nodes, max depth 29 in 0.172s
[437/500] 255 leaf nodes, max depth 25 in 0.171s
[438/500] 255 leaf nodes, max depth 22 in 0.171s
[439/500] 255 leaf nodes, max depth 25 in 0.171s
[440/500] 255 leaf nodes, max depth 23 in 0.171s
[441/500] 255 leaf nodes, max depth 21 in 0.171s
[442/500] 255 leaf nodes, max depth 23 in 0.171s
[443/500] 255 leaf nodes, max depth 23 in 0.170s
[444/500] 255 leaf nodes, max depth 21 in 0.170s
[445/500] 255 leaf nodes, max depth 34 in 0.170s
[446/500] 255 leaf nodes, max depth 24 in 0.170s
[447/500] 255 leaf nodes, max depth 22 in 0.170s
[448/500] 255 leaf nodes, max depth 19 in 0.170s
[449/500] 255 leaf nodes, max depth 26 in 0.170s
[450/500] 255 leaf nodes, max depth 26 in 0.170s
[451/500] 255 leaf nodes, max depth 24 in 0.169s
[452/500] 255 leaf nodes, max depth 24 in 0.169s
[453/500] 255 leaf nodes, max depth 21 in 0.169s
[454/500] 255 leaf nodes, max depth 20 in 0.169s
[455/500] 255 leaf nodes, max depth 21 in 0.169s
[456/500] 255 leaf nodes, max depth 23 in 0.169s
[457/500] 255 leaf nodes, max depth 20 in 0.169s
[458/500] 255 leaf nodes, max depth 25 in 0.169s
[459/500] 255 leaf nodes, max depth 22 in 0.168s
[460/500] 255 leaf nodes, max depth 27 in 0.168s
[461/500] 255 leaf nodes, max depth 22 in 0.168s
[462/500] 255 leaf nodes, max depth 21 in 0.168s
[463/500] 255 leaf nodes, max depth 25 in 0.168s
[464/500] 255 leaf nodes, max depth 26 in 0.168s
[465/500] 255 leaf nodes, max depth 24 in 0.168s
[466/500] 255 leaf nodes, max depth 21 in 0.167s
[467/500] 255 leaf nodes, max depth 29 in 0.167s
[468/500] 255 leaf nodes, max depth 19 in 0.167s
[469/500] 255 leaf nodes, max depth 17 in 0.167s
[470/500] 255 leaf nodes, max depth 22 in 0.167s
[471/500] 255 leaf nodes, max depth 28 in 0.167s
[472/500] 255 leaf nodes, max depth 19 in 0.166s
[473/500] 255 leaf nodes, max depth 20 in 0.167s
[474/500] 255 leaf nodes, max depth 24 in 0.167s
[475/500] 255 leaf nodes, max depth 20 in 0.166s
[476/500] 255 leaf nodes, max depth 21 in 0.166s
[477/500] 255 leaf nodes, max depth 18 in 0.166s
[478/500] 255 leaf nodes, max depth 19 in 0.166s
[479/500] 255 leaf nodes, max depth 21 in 0.166s
[480/500] 255 leaf nodes, max depth 21 in 0.166s
[481/500] 255 leaf nodes, max depth 21 in 0.166s
[482/500] 255 leaf nodes, max depth 24 in 0.166s
[483/500] 255 leaf nodes, max depth 23 in 0.166s
[484/500] 255 leaf nodes, max depth 27 in 0.166s
[485/500] 255 leaf nodes, max depth 20 in 0.165s
[486/500] 255 leaf nodes, max depth 21 in 0.165s
[487/500] 255 leaf nodes, max depth 24 in 0.165s
[488/500] 255 leaf nodes, max depth 21 in 0.165s
[489/500] 255 leaf nodes, max depth 22 in 0.165s
[490/500] 255 leaf nodes, max depth 23 in 0.165s
[491/500] 255 leaf nodes, max depth 23 in 0.165s
[492/500] 255 leaf nodes, max depth 25 in 0.165s
[493/500] 255 leaf nodes, max depth 20 in 0.165s
[494/500] 255 leaf nodes, max depth 22 in 0.164s
[495/500] 255 leaf nodes, max depth 27 in 0.164s
[496/500] 255 leaf nodes, max depth 21 in 0.164s
[497/500] 255 leaf nodes, max depth 21 in 0.164s
[498/500] 255 leaf nodes, max depth 26 in 0.164s
[499/500] 255 leaf nodes, max depth 24 in 0.164s
[500/500] 255 leaf nodes, max depth 23 in 0.164s
Fit 500 trees in 83.011 s, (127500 total leaf nodes)
Time spent finding best splits:  57.967s
Time spent applying splits:      11.217s
Time spent predicting:           4.851s
done in 83.022s, ROC AUC: 0.8156
Threading layer chosen: tbb
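As a quick reading aid, the timings in the log can be turned into shares of the total fit time. The sketch below only redoes arithmetic on the figures printed above; the helper name and the "other" bucket (whatever the three reported phases do not cover) are introduced here for illustration.

```python
# Figures copied from the log above (seconds).
total_fit = 83.011
phases = {
    "finding best splits": 57.967,
    "applying splits": 11.217,
    "predicting": 4.851,
}


def phase_shares(phases, total):
    """Return each phase's fraction of the total, plus the unreported rest."""
    shares = {name: seconds / total for name, seconds in phases.items()}
    shares["other"] = (total - sum(phases.values())) / total
    return shares


for name, share in phase_shares(phases, total_fit).items():
    print(f"{name:>20}: {share:6.1%}")

# End-to-end wall time ratio between the two libraries on this run.
print(f"pygbm / LightGBM: {83.022 / 75.408:.2f}x")
```

On this run the split search accounts for roughly 70% of the fit time, and pygbm's wall time is about 1.10x that of LightGBM.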

Also, I tried to reproduce the leak with a minimal example and this is what I came up with. It's pretty weird and I'd like to know if you can reproduce it before submitting it to numba because I feel like I'm tripping:

import numpy as np
import psutil
from numba import (njit, jitclass, prange, float32, uint8, uint32, typeof,
                   optional)

@jitclass([
    ('attr', uint32),
])
class JitClass:
    def __init__(self):
        self.attr = 3


@njit
def f():
    cs = [JitClass() for i in range(1000)]
    array = np.empty(shape=10, dtype=np.uint32)

    # If I remove this loop, there is no leak
    for i, c in enumerate(cs):
        c.attr[i] = array[i]  # this should not even pass!
        # c.attr = array[i]  # <-- leak still here if we do this instead.

    # this should not work either!
    something_that_should_not_compile
    blahblahblah
    why_is_this_passing

    # a = a + 3  <-- this produces an error, as expected (a is not defined)

    return array 


class C:
    def g(self):
        self.array = f()

    
p = psutil.Process()
for _ in range(10000):
    o = C()
    o.g()
    del o
    # leak proportional to the size of cs, independent of the size of array
    print(f"{p.memory_info().rss / 1e6} MB")

@ogrisel I'm happy to merge as is, but I'd like your input on the 1e7 case first.

ogrisel · 2018-11-05T17:30:39Z

Cool, let's merge as it's already a net improvement.

About the minimal reproduction case, I confirm I get the leak with your code, without any error message or exception. Just memory usage increasing as reported by psutil.
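To put a number on the leak rather than eyeballing the printed values, something like the following sketch could be used. The helper name and its parameters are hypothetical; `read_rss` is any zero-argument callable returning the resident set size in bytes, so the psutil call from the reproduction script can be passed in.

```python
import gc


def growth_per_call(fn, n_calls, read_rss, warmup=1):
    """Return the average memory growth in bytes per call of ``fn``.

    ``read_rss`` is a zero-argument callable returning the current
    resident set size in bytes.
    """
    for _ in range(warmup):
        fn()  # the first calls pay one-off costs such as JIT compilation
    gc.collect()
    before = read_rss()
    for _ in range(n_calls):
        fn()
    gc.collect()
    after = read_rss()
    return (after - before) / n_calls


# Example with the reproduction script above (requires psutil):
#   p = psutil.Process()
#   growth_per_call(lambda: C().g(), 1000, lambda: p.memory_info().rss)
```

A value that stays clearly above zero as `n_calls` grows would point to a leak, while a value close to zero would suggest the memory is being released.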

ogrisel · 2018-11-05T17:31:35Z

We still have a discrepancy in terms of results with LightGBM though. But there is another issue for that.

NicolasHug · 2018-11-05T19:54:36Z

I have opened numba/numba#3473 and numba/numba#3472 regarding the leak and some other weird stuff I found.

Do not store histogram on SplitInfo

c7547c2

ogrisel mentioned this pull request Nov 5, 2018

Excessive memory usage when the number of trees is large (e.g. 500 trees) #31

Closed

Fixed tests

0960a2f

ogrisel merged commit f0e3409 into master Nov 5, 2018

ogrisel deleted the fix-histogram-memory-usage branch November 5, 2018 17:30

ogrisel mentioned this pull request Nov 6, 2018

Benchmark results with better parameters #30

Closed
