[WIP] Do not store histogram on SplitInfo #36

Merged: ogrisel merged 2 commits into master from fix-histogram-memory-usage on Nov 5, 2018

Conversation

ogrisel
Owner

@ogrisel ogrisel commented Nov 5, 2018

Tentative fix for #31.

@ogrisel
Owner Author

ogrisel commented Nov 5, 2018

At this time some tests are broken (they need to be updated) and the code runs slightly slower than on master for a reason I do not understand yet (although on my laptop the speed is the same as on master).

The memory leak issue should be fixed though.

@ogrisel
Owner Author

ogrisel commented Nov 5, 2018

We observe a slowdown when the packed histograms array is updated inside the parallel for loop, but not on master, where that array is filled sequentially. This might be a case of false sharing.

@NicolasHug noted that false sharing might not be a problem in a write-only data structure updated in a parallel for loop. I don't know.
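To put a rough number on the false sharing hypothesis, here is a minimal sketch that counts how many cache lines would be shared between adjacent per-feature histograms in a C-contiguous packed array. The 64-byte cache line, the 12-byte bin size and the assumption that the array starts on a cache-line boundary are illustrative guesses, not values taken from pygbm; the 28 features and 255 bins match the benchmark in this thread.

```python
CACHE_LINE = 64  # bytes; typical on x86-64, assumed here


def shared_cache_lines(n_rows, row_nbytes, line=CACHE_LINE):
    """Count cache lines that hold bytes from two or more rows.

    Assumes a C-contiguous array whose first byte sits on a cache-line
    boundary (an assumption, real allocations may be offset).
    """
    owners = {}
    for row in range(n_rows):
        start = row * row_nbytes
        end = start + row_nbytes - 1  # last byte belonging to this row
        for line_idx in range(start // line, end // line + 1):
            owners.setdefault(line_idx, set()).add(row)
    return sum(1 for rows in owners.values() if len(rows) > 1)


def padded_row_nbytes(row_nbytes, line=CACHE_LINE):
    """Round a row size up to the next multiple of the cache-line size."""
    return -(-row_nbytes // line) * line


# Hypothetical layout: one row per feature, 12 bytes per bin (assumed).
n_features, n_bins, bin_nbytes = 28, 255, 12
row = n_bins * bin_nbytes  # 3060 bytes, not a multiple of 64

unpadded_shared = shared_cache_lines(n_features, row)
padded_shared = shared_cache_lines(n_features, padded_row_nbytes(row))
print(f"shared lines without padding: {unpadded_shared}")
print(f"shared lines with padding:    {padded_shared}")
```

Under these assumptions only the lines that straddle a row boundary are shared, which is a small fraction of each row, so false sharing alone would be a surprising explanation for a large slowdown. Padding each row to a multiple of the cache-line size removes the shared lines entirely.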

What I observe, though, is that on a 12-core machine the code runs 2x faster with "tbb" as the numba.config.THREADING_LAYER than with "workqueue". The performance of "omp" is closer to "tbb" than to "workqueue". But even with "tbb", LightGBM is significantly faster than pygbm on this machine.

@ogrisel
Owner Author

ogrisel commented Nov 5, 2018

Actually I tried again with master and the various threading backends, and this PR is as fast as or faster than master. I must have done something wrong when I reported the initial slowdown.

In any case, tbb is significantly faster than the workqueue backend when the number of cores is large (e.g. 12 in my case).

@ogrisel
Owner Author

ogrisel commented Nov 5, 2018

@NicolasHug I have to go, feel free to update the failing tests and merge this PR.

@codecov

codecov bot commented Nov 5, 2018

Codecov Report

Merging #36 into master will increase coverage by 0.02%.
The diff coverage is 100%.


@@            Coverage Diff             @@
##           master      #36      +/-   ##
==========================================
+ Coverage   94.34%   94.36%   +0.02%     
==========================================
  Files           8        8              
  Lines         778      781       +3     
==========================================
+ Hits          734      737       +3     
  Misses         44       44
| Impacted Files | Coverage Δ | |
|---|---|---|
| pygbm/splitting.py | 99.47% <100%> (ø) | ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 154def0...0960a2f.

@NicolasHug
Collaborator

NicolasHug commented Nov 5, 2018

Here are the same plots from #31 (comment) now. I don't know how I feel about the 1e7 case.

[Images: leak, leak2]

We now have the following results with the benchmark from #30 (comment) (numba is pre-compiled here for a fair comparison):

Laptop with 8GB RAM, i5 7th gen.

Lightgbm: 75.408s, ROC AUC: 0.8293
Pygbm: 83.022s, ROC AUC: 0.8156
No VIRT explosion

😄
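As a quick sanity check on the remaining gap, the two reported fit times and ROC AUC scores can be compared directly. This is plain arithmetic on the numbers above; the variable names are only for illustration.

```python
# Figures reported above (1e6 training samples, 500 trees).
lightgbm_s, pygbm_s = 75.408, 83.022
lightgbm_auc, pygbm_auc = 0.8293, 0.8156

slowdown = pygbm_s / lightgbm_s      # > 1 means pygbm is slower
auc_gap = lightgbm_auc - pygbm_auc   # > 0 means LightGBM scores higher

print(f"pygbm / LightGBM fit time: {slowdown:.3f}x")
print(f"ROC AUC gap: {auc_gap:.4f}")
```

On this laptop pygbm is therefore roughly 10% slower than LightGBM, with a ROC AUC about 0.014 lower.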

Code:

from urllib.request import urlretrieve
import os
from gzip import GzipFile
from time import time
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from joblib import Memory
from pygbm import GradientBoostingMachine
from lightgbm import LGBMRegressor
import numba
import gc

HERE = os.path.dirname(__file__)
URL = ("https://archive.ics.uci.edu/ml/machine-learning-databases/00280/"
       "HIGGS.csv.gz")
m = Memory(location='/tmp', mmap_mode='r')

@m.cache
def load_data():
    filename = os.path.join(HERE, URL.rsplit('/', 1)[-1])
    if not os.path.exists(filename):
        print(f"Downloading {URL} to {filename} (2.6 GB)...")
        urlretrieve(URL, filename)
        print("done.")

    print(f"Parsing {filename}...")
    tic = time()
    with GzipFile(filename) as f:
        df = pd.read_csv(f, header=None, dtype=np.float32)
    toc = time()
    print(f"Loaded {df.values.nbytes / 1e9:0.3f} GB in {toc - tic:0.3f}s")
    return df

df = load_data()

n_leaf_nodes = 255
n_trees = 500
lr = 0.05
max_bins = 255
subsample = 1000000 # Change this to 10000000 if you wish, or to None

target = df.values[:, 0]
data = np.ascontiguousarray(df.values[:, 1:])
data_train, data_test, target_train, target_test = train_test_split(
    data, target, test_size=50000, random_state=0)

if subsample is not None:
    data_train, target_train = data_train[:subsample], target_train[:subsample]

n_samples, n_features = data_train.shape
print(f"Training set with {n_samples} records with {n_features} features.")

gc.collect()

print("Compiling pygbm...")
tic = time()
pygbm_model = GradientBoostingMachine(learning_rate=lr, max_iter=n_trees,
                                      max_bins=max_bins,
                                      max_leaf_nodes=n_leaf_nodes,
                                      random_state=0, scoring=None,
                                      verbose=0, validation_split=None)
pygbm_model.fit(data_train[:100], target_train[:100])
toc = time()
predicted_test = pygbm_model.predict(data_test)
roc_auc = roc_auc_score(target_test, predicted_test)
print(f"done in {toc - tic:.3f}s, ROC AUC: {roc_auc:.4f}")
del pygbm_model
del predicted_test

print("Fitting a LightGBM model...")
tic = time()
lightgbm_model = LGBMRegressor(n_estimators=n_trees, num_leaves=n_leaf_nodes,
                               learning_rate=lr, silent=False)
lightgbm_model.fit(data_train, target_train)
toc = time()
predicted_test = lightgbm_model.predict(data_test)
roc_auc = roc_auc_score(target_test, predicted_test)
print(f"done in {toc - tic:.3f}s, ROC AUC: {roc_auc:.4f}")
del lightgbm_model
del predicted_test
gc.collect()

print("Fitting a pygbm model...")
tic = time()
pygbm_model = GradientBoostingMachine(learning_rate=lr, max_iter=n_trees,
                                      max_bins=max_bins,
                                      max_leaf_nodes=n_leaf_nodes,
                                      random_state=0, scoring=None,
                                      verbose=1, validation_split=None)
pygbm_model.fit(data_train, target_train)
toc = time()
predicted_test = pygbm_model.predict(data_test)
roc_auc = roc_auc_score(target_test, predicted_test)
print(f"done in {toc - tic:.3f}s, ROC AUC: {roc_auc:.4f}")
del pygbm_model
del predicted_test
gc.collect()


if hasattr(numba, 'threading_layer'):
    print("Threading layer chosen: " + numba.threading_layer())

Log:

[LightGBM] [Info] Total Bins 6143
[LightGBM] [Info] Number of data: 1000000, number of used features: 28
[LightGBM] [Info] Start training from score 0.529479
Training set with 1000000 records with 28 features.
Compiling pygbm...
done in 9.049s, ROC AUC: 0.5340
Fitting a LightGBM model...
done in 75.408s, ROC AUC: 0.8293
Fitting a pygbm model...
Binning 0.112 GB of data: 1.003 s (111.671 MB/s)
Fitting gradient boosted rounds:
[0/500]
[1/500] 255 leaf nodes, max depth 13 in 0.470s
[2/500] 255 leaf nodes, max depth 17 in 0.403s
[3/500] 255 leaf nodes, max depth 15 in 0.376s
[4/500] 255 leaf nodes, max depth 19 in 0.363s
[5/500] 255 leaf nodes, max depth 17 in 0.358s
[6/500] 255 leaf nodes, max depth 15 in 0.351s
[7/500] 255 leaf nodes, max depth 17 in 0.348s
[8/500] 255 leaf nodes, max depth 17 in 0.344s
[9/500] 255 leaf nodes, max depth 15 in 0.340s
[10/500] 255 leaf nodes, max depth 17 in 0.338s
[11/500] 255 leaf nodes, max depth 15 in 0.336s
[12/500] 255 leaf nodes, max depth 16 in 0.333s
[13/500] 255 leaf nodes, max depth 17 in 0.331s
[14/500] 255 leaf nodes, max depth 21 in 0.332s
[15/500] 255 leaf nodes, max depth 20 in 0.331s
[16/500] 255 leaf nodes, max depth 20 in 0.331s
[17/500] 255 leaf nodes, max depth 18 in 0.329s
[18/500] 255 leaf nodes, max depth 20 in 0.329s
[19/500] 255 leaf nodes, max depth 15 in 0.327s
[20/500] 255 leaf nodes, max depth 21 in 0.326s
[21/500] 255 leaf nodes, max depth 22 in 0.326s
[22/500] 255 leaf nodes, max depth 17 in 0.325s
[23/500] 255 leaf nodes, max depth 18 in 0.324s
[24/500] 255 leaf nodes, max depth 20 in 0.324s
[25/500] 255 leaf nodes, max depth 18 in 0.324s
[26/500] 255 leaf nodes, max depth 19 in 0.324s
[27/500] 255 leaf nodes, max depth 19 in 0.323s
[28/500] 255 leaf nodes, max depth 18 in 0.322s
[29/500] 255 leaf nodes, max depth 19 in 0.322s
[30/500] 255 leaf nodes, max depth 19 in 0.321s
[31/500] 255 leaf nodes, max depth 18 in 0.321s
[32/500] 255 leaf nodes, max depth 19 in 0.320s
[33/500] 255 leaf nodes, max depth 21 in 0.320s
[34/500] 255 leaf nodes, max depth 19 in 0.320s
[35/500] 255 leaf nodes, max depth 17 in 0.319s
[36/500] 255 leaf nodes, max depth 17 in 0.318s
[37/500] 255 leaf nodes, max depth 16 in 0.318s
[38/500] 255 leaf nodes, max depth 18 in 0.317s
[39/500] 255 leaf nodes, max depth 23 in 0.317s
[40/500] 255 leaf nodes, max depth 21 in 0.316s
[41/500] 255 leaf nodes, max depth 19 in 0.316s
[42/500] 255 leaf nodes, max depth 19 in 0.316s
[43/500] 255 leaf nodes, max depth 19 in 0.315s
[44/500] 255 leaf nodes, max depth 14 in 0.315s
[45/500] 255 leaf nodes, max depth 18 in 0.314s
[46/500] 255 leaf nodes, max depth 23 in 0.314s
[47/500] 255 leaf nodes, max depth 20 in 0.314s
[48/500] 255 leaf nodes, max depth 16 in 0.313s
[49/500] 255 leaf nodes, max depth 24 in 0.313s
[50/500] 255 leaf nodes, max depth 20 in 0.313s
[51/500] 255 leaf nodes, max depth 20 in 0.313s
[52/500] 255 leaf nodes, max depth 26 in 0.312s
[53/500] 255 leaf nodes, max depth 20 in 0.312s
[54/500] 255 leaf nodes, max depth 26 in 0.312s
[55/500] 255 leaf nodes, max depth 20 in 0.311s
[56/500] 255 leaf nodes, max depth 16 in 0.311s
[57/500] 255 leaf nodes, max depth 18 in 0.310s
[58/500] 255 leaf nodes, max depth 21 in 0.310s
[59/500] 255 leaf nodes, max depth 19 in 0.310s
[60/500] 255 leaf nodes, max depth 20 in 0.310s
[61/500] 255 leaf nodes, max depth 17 in 0.309s
[62/500] 255 leaf nodes, max depth 21 in 0.309s
[63/500] 255 leaf nodes, max depth 20 in 0.308s
[64/500] 255 leaf nodes, max depth 26 in 0.308s
[65/500] 255 leaf nodes, max depth 18 in 0.308s
[66/500] 255 leaf nodes, max depth 19 in 0.308s
[67/500] 255 leaf nodes, max depth 24 in 0.308s
[68/500] 255 leaf nodes, max depth 25 in 0.308s
[69/500] 255 leaf nodes, max depth 16 in 0.307s
[70/500] 255 leaf nodes, max depth 20 in 0.307s
[71/500] 255 leaf nodes, max depth 21 in 0.307s
[72/500] 255 leaf nodes, max depth 20 in 0.306s
[73/500] 255 leaf nodes, max depth 19 in 0.306s
[74/500] 255 leaf nodes, max depth 27 in 0.306s
[75/500] 255 leaf nodes, max depth 26 in 0.306s
[76/500] 255 leaf nodes, max depth 19 in 0.305s
[77/500] 255 leaf nodes, max depth 24 in 0.305s
[78/500] 255 leaf nodes, max depth 20 in 0.305s
[79/500] 255 leaf nodes, max depth 23 in 0.304s
[80/500] 255 leaf nodes, max depth 22 in 0.303s
[81/500] 255 leaf nodes, max depth 22 in 0.303s
[82/500] 255 leaf nodes, max depth 18 in 0.303s
[83/500] 255 leaf nodes, max depth 18 in 0.303s
[84/500] 255 leaf nodes, max depth 23 in 0.302s
[85/500] 255 leaf nodes, max depth 22 in 0.302s
[86/500] 255 leaf nodes, max depth 22 in 0.302s
[87/500] 255 leaf nodes, max depth 18 in 0.301s
[88/500] 255 leaf nodes, max depth 21 in 0.301s
[89/500] 255 leaf nodes, max depth 23 in 0.301s
[90/500] 255 leaf nodes, max depth 25 in 0.300s
[91/500] 255 leaf nodes, max depth 18 in 0.300s
[92/500] 255 leaf nodes, max depth 22 in 0.300s
[93/500] 255 leaf nodes, max depth 30 in 0.300s
[94/500] 255 leaf nodes, max depth 17 in 0.300s
[95/500] 255 leaf nodes, max depth 19 in 0.299s
[96/500] 255 leaf nodes, max depth 24 in 0.299s
[97/500] 255 leaf nodes, max depth 19 in 0.298s
[98/500] 255 leaf nodes, max depth 26 in 0.298s
[99/500] 255 leaf nodes, max depth 17 in 0.298s
[100/500] 255 leaf nodes, max depth 21 in 0.298s
[101/500] 255 leaf nodes, max depth 21 in 0.297s
[102/500] 255 leaf nodes, max depth 23 in 0.297s
[103/500] 255 leaf nodes, max depth 24 in 0.297s
[104/500] 255 leaf nodes, max depth 29 in 0.297s
[105/500] 255 leaf nodes, max depth 21 in 0.297s
[106/500] 255 leaf nodes, max depth 24 in 0.297s
[107/500] 255 leaf nodes, max depth 19 in 0.296s
[108/500] 255 leaf nodes, max depth 25 in 0.296s
[109/500] 255 leaf nodes, max depth 19 in 0.296s
[110/500] 255 leaf nodes, max depth 26 in 0.295s
[111/500] 255 leaf nodes, max depth 19 in 0.295s
[112/500] 255 leaf nodes, max depth 18 in 0.294s
[113/500] 255 leaf nodes, max depth 26 in 0.294s
[114/500] 255 leaf nodes, max depth 23 in 0.294s
[115/500] 255 leaf nodes, max depth 24 in 0.293s
[116/500] 255 leaf nodes, max depth 27 in 0.293s
[117/500] 255 leaf nodes, max depth 19 in 0.293s
[118/500] 255 leaf nodes, max depth 19 in 0.293s
[119/500] 255 leaf nodes, max depth 23 in 0.292s
[120/500] 255 leaf nodes, max depth 19 in 0.292s
[121/500] 255 leaf nodes, max depth 19 in 0.291s
[122/500] 255 leaf nodes, max depth 20 in 0.291s
[123/500] 255 leaf nodes, max depth 24 in 0.291s
[124/500] 255 leaf nodes, max depth 19 in 0.290s
[125/500] 255 leaf nodes, max depth 20 in 0.290s
[126/500] 255 leaf nodes, max depth 18 in 0.290s
[127/500] 255 leaf nodes, max depth 23 in 0.290s
[128/500] 255 leaf nodes, max depth 23 in 0.289s
[129/500] 255 leaf nodes, max depth 16 in 0.289s
[130/500] 255 leaf nodes, max depth 20 in 0.288s
[131/500] 255 leaf nodes, max depth 18 in 0.288s
[132/500] 255 leaf nodes, max depth 22 in 0.287s
[133/500] 255 leaf nodes, max depth 32 in 0.287s
[134/500] 255 leaf nodes, max depth 19 in 0.287s
[135/500] 255 leaf nodes, max depth 17 in 0.286s
[136/500] 255 leaf nodes, max depth 18 in 0.286s
[137/500] 255 leaf nodes, max depth 20 in 0.285s
[138/500] 255 leaf nodes, max depth 16 in 0.285s
[139/500] 255 leaf nodes, max depth 23 in 0.284s
[140/500] 255 leaf nodes, max depth 21 in 0.283s
[141/500] 255 leaf nodes, max depth 23 in 0.283s
[142/500] 255 leaf nodes, max depth 16 in 0.283s
[143/500] 255 leaf nodes, max depth 21 in 0.282s
[144/500] 255 leaf nodes, max depth 21 in 0.281s
[145/500] 255 leaf nodes, max depth 21 in 0.280s
[146/500] 255 leaf nodes, max depth 26 in 0.280s
[147/500] 255 leaf nodes, max depth 19 in 0.279s
[148/500] 255 leaf nodes, max depth 18 in 0.278s
[149/500] 255 leaf nodes, max depth 16 in 0.277s
[150/500] 255 leaf nodes, max depth 18 in 0.277s
[151/500] 255 leaf nodes, max depth 21 in 0.276s
[152/500] 255 leaf nodes, max depth 20 in 0.275s
[153/500] 255 leaf nodes, max depth 21 in 0.274s
[154/500] 255 leaf nodes, max depth 20 in 0.273s
[155/500] 255 leaf nodes, max depth 18 in 0.273s
[156/500] 255 leaf nodes, max depth 20 in 0.272s
[157/500] 255 leaf nodes, max depth 22 in 0.272s
[158/500] 255 leaf nodes, max depth 20 in 0.271s
[159/500] 255 leaf nodes, max depth 19 in 0.270s
[160/500] 255 leaf nodes, max depth 17 in 0.270s
[161/500] 255 leaf nodes, max depth 16 in 0.269s
[162/500] 255 leaf nodes, max depth 21 in 0.268s
[163/500] 255 leaf nodes, max depth 20 in 0.268s
[164/500] 255 leaf nodes, max depth 22 in 0.268s
[165/500] 255 leaf nodes, max depth 18 in 0.267s
[166/500] 255 leaf nodes, max depth 22 in 0.266s
[167/500] 255 leaf nodes, max depth 18 in 0.266s
[168/500] 255 leaf nodes, max depth 23 in 0.266s
[169/500] 255 leaf nodes, max depth 21 in 0.266s
[170/500] 255 leaf nodes, max depth 20 in 0.265s
[171/500] 255 leaf nodes, max depth 20 in 0.264s
[172/500] 255 leaf nodes, max depth 18 in 0.264s
[173/500] 255 leaf nodes, max depth 20 in 0.264s
[174/500] 255 leaf nodes, max depth 20 in 0.263s
[175/500] 255 leaf nodes, max depth 19 in 0.264s
[176/500] 255 leaf nodes, max depth 24 in 0.263s
[177/500] 255 leaf nodes, max depth 22 in 0.262s
[178/500] 255 leaf nodes, max depth 20 in 0.262s
[179/500] 255 leaf nodes, max depth 21 in 0.262s
[180/500] 255 leaf nodes, max depth 19 in 0.261s
[181/500] 255 leaf nodes, max depth 22 in 0.260s
[182/500] 255 leaf nodes, max depth 25 in 0.259s
[183/500] 255 leaf nodes, max depth 28 in 0.258s
[184/500] 255 leaf nodes, max depth 22 in 0.257s
[185/500] 255 leaf nodes, max depth 20 in 0.256s
[186/500] 255 leaf nodes, max depth 21 in 0.255s
[187/500] 255 leaf nodes, max depth 26 in 0.254s
[188/500] 255 leaf nodes, max depth 18 in 0.253s
[189/500] 255 leaf nodes, max depth 25 in 0.252s
[190/500] 255 leaf nodes, max depth 18 in 0.251s
[191/500] 255 leaf nodes, max depth 19 in 0.251s
[192/500] 255 leaf nodes, max depth 20 in 0.251s
[193/500] 255 leaf nodes, max depth 27 in 0.250s
[194/500] 255 leaf nodes, max depth 21 in 0.249s
[195/500] 255 leaf nodes, max depth 21 in 0.249s
[196/500] 255 leaf nodes, max depth 27 in 0.248s
[197/500] 255 leaf nodes, max depth 21 in 0.247s
[198/500] 255 leaf nodes, max depth 20 in 0.247s
[199/500] 255 leaf nodes, max depth 22 in 0.247s
[200/500] 255 leaf nodes, max depth 25 in 0.246s
[201/500] 255 leaf nodes, max depth 19 in 0.246s
[202/500] 255 leaf nodes, max depth 18 in 0.246s
[203/500] 255 leaf nodes, max depth 17 in 0.245s
[204/500] 255 leaf nodes, max depth 20 in 0.245s
[205/500] 255 leaf nodes, max depth 17 in 0.244s
[206/500] 255 leaf nodes, max depth 17 in 0.244s
[207/500] 255 leaf nodes, max depth 21 in 0.244s
[208/500] 255 leaf nodes, max depth 17 in 0.243s
[209/500] 255 leaf nodes, max depth 18 in 0.243s
[210/500] 255 leaf nodes, max depth 18 in 0.243s
[211/500] 255 leaf nodes, max depth 22 in 0.243s
[212/500] 255 leaf nodes, max depth 16 in 0.243s
[213/500] 255 leaf nodes, max depth 18 in 0.242s
[214/500] 255 leaf nodes, max depth 23 in 0.242s
[215/500] 255 leaf nodes, max depth 20 in 0.242s
[216/500] 255 leaf nodes, max depth 20 in 0.241s
[217/500] 255 leaf nodes, max depth 19 in 0.241s
[218/500] 255 leaf nodes, max depth 15 in 0.241s
[219/500] 255 leaf nodes, max depth 23 in 0.240s
[220/500] 255 leaf nodes, max depth 19 in 0.240s
[221/500] 255 leaf nodes, max depth 24 in 0.239s
[222/500] 255 leaf nodes, max depth 22 in 0.238s
[223/500] 255 leaf nodes, max depth 18 in 0.238s
[224/500] 255 leaf nodes, max depth 23 in 0.237s
[225/500] 255 leaf nodes, max depth 21 in 0.237s
[226/500] 255 leaf nodes, max depth 27 in 0.236s
[227/500] 255 leaf nodes, max depth 22 in 0.235s
[228/500] 255 leaf nodes, max depth 30 in 0.235s
[229/500] 255 leaf nodes, max depth 18 in 0.234s
[230/500] 255 leaf nodes, max depth 20 in 0.233s
[231/500] 255 leaf nodes, max depth 19 in 0.233s
[232/500] 255 leaf nodes, max depth 18 in 0.232s
[233/500] 255 leaf nodes, max depth 17 in 0.232s
[234/500] 255 leaf nodes, max depth 22 in 0.231s
[235/500] 255 leaf nodes, max depth 25 in 0.230s
[236/500] 255 leaf nodes, max depth 21 in 0.230s
[237/500] 255 leaf nodes, max depth 26 in 0.229s
[238/500] 255 leaf nodes, max depth 20 in 0.228s
[239/500] 255 leaf nodes, max depth 26 in 0.228s
[240/500] 255 leaf nodes, max depth 26 in 0.227s
[241/500] 255 leaf nodes, max depth 21 in 0.227s
[242/500] 255 leaf nodes, max depth 19 in 0.226s
[243/500] 255 leaf nodes, max depth 18 in 0.226s
[244/500] 255 leaf nodes, max depth 19 in 0.226s
[245/500] 255 leaf nodes, max depth 21 in 0.225s
[246/500] 255 leaf nodes, max depth 25 in 0.225s
[247/500] 255 leaf nodes, max depth 21 in 0.224s
[248/500] 255 leaf nodes, max depth 22 in 0.223s
[249/500] 255 leaf nodes, max depth 27 in 0.223s
[250/500] 255 leaf nodes, max depth 33 in 0.222s
[251/500] 255 leaf nodes, max depth 29 in 0.222s
[252/500] 255 leaf nodes, max depth 22 in 0.221s
[253/500] 255 leaf nodes, max depth 23 in 0.220s
[254/500] 255 leaf nodes, max depth 21 in 0.220s
[255/500] 255 leaf nodes, max depth 21 in 0.219s
[256/500] 255 leaf nodes, max depth 27 in 0.219s
[257/500] 255 leaf nodes, max depth 24 in 0.219s
[258/500] 255 leaf nodes, max depth 22 in 0.219s
[259/500] 255 leaf nodes, max depth 20 in 0.218s
[260/500] 255 leaf nodes, max depth 24 in 0.218s
[261/500] 255 leaf nodes, max depth 24 in 0.217s
[262/500] 255 leaf nodes, max depth 19 in 0.217s
[263/500] 255 leaf nodes, max depth 23 in 0.217s
[264/500] 255 leaf nodes, max depth 23 in 0.216s
[265/500] 255 leaf nodes, max depth 21 in 0.216s
[266/500] 255 leaf nodes, max depth 21 in 0.216s
[267/500] 255 leaf nodes, max depth 22 in 0.215s
[268/500] 255 leaf nodes, max depth 23 in 0.215s
[269/500] 255 leaf nodes, max depth 24 in 0.214s
[270/500] 255 leaf nodes, max depth 19 in 0.214s
[271/500] 255 leaf nodes, max depth 16 in 0.213s
[272/500] 255 leaf nodes, max depth 18 in 0.213s
[273/500] 255 leaf nodes, max depth 17 in 0.213s
[274/500] 255 leaf nodes, max depth 22 in 0.212s
[275/500] 255 leaf nodes, max depth 24 in 0.212s
[276/500] 255 leaf nodes, max depth 19 in 0.211s
[277/500] 255 leaf nodes, max depth 24 in 0.211s
[278/500] 255 leaf nodes, max depth 25 in 0.210s
[279/500] 255 leaf nodes, max depth 16 in 0.210s
[280/500] 255 leaf nodes, max depth 16 in 0.210s
[281/500] 255 leaf nodes, max depth 18 in 0.210s
[282/500] 255 leaf nodes, max depth 24 in 0.209s
[283/500] 255 leaf nodes, max depth 22 in 0.209s
[284/500] 255 leaf nodes, max depth 19 in 0.209s
[285/500] 255 leaf nodes, max depth 27 in 0.209s
[286/500] 255 leaf nodes, max depth 23 in 0.209s
[287/500] 255 leaf nodes, max depth 22 in 0.208s
[288/500] 255 leaf nodes, max depth 25 in 0.208s
[289/500] 255 leaf nodes, max depth 21 in 0.207s
[290/500] 255 leaf nodes, max depth 21 in 0.207s
[291/500] 255 leaf nodes, max depth 28 in 0.207s
[292/500] 255 leaf nodes, max depth 24 in 0.206s
[293/500] 255 leaf nodes, max depth 23 in 0.206s
[294/500] 255 leaf nodes, max depth 25 in 0.205s
[295/500] 255 leaf nodes, max depth 23 in 0.205s
[296/500] 255 leaf nodes, max depth 24 in 0.205s
[297/500] 255 leaf nodes, max depth 24 in 0.204s
[298/500] 255 leaf nodes, max depth 30 in 0.204s
[299/500] 255 leaf nodes, max depth 22 in 0.203s
[300/500] 255 leaf nodes, max depth 22 in 0.203s
[301/500] 255 leaf nodes, max depth 20 in 0.203s
[302/500] 255 leaf nodes, max depth 19 in 0.203s
[303/500] 255 leaf nodes, max depth 20 in 0.203s
[304/500] 255 leaf nodes, max depth 25 in 0.202s
[305/500] 255 leaf nodes, max depth 24 in 0.202s
[306/500] 255 leaf nodes, max depth 24 in 0.202s
[307/500] 255 leaf nodes, max depth 17 in 0.201s
[308/500] 255 leaf nodes, max depth 25 in 0.201s
[309/500] 255 leaf nodes, max depth 20 in 0.201s
[310/500] 255 leaf nodes, max depth 19 in 0.200s
[311/500] 255 leaf nodes, max depth 19 in 0.200s
[312/500] 255 leaf nodes, max depth 18 in 0.200s
[313/500] 255 leaf nodes, max depth 22 in 0.199s
[314/500] 255 leaf nodes, max depth 19 in 0.199s
[315/500] 255 leaf nodes, max depth 17 in 0.199s
[316/500] 255 leaf nodes, max depth 18 in 0.198s
[317/500] 255 leaf nodes, max depth 17 in 0.198s
[318/500] 255 leaf nodes, max depth 19 in 0.198s
[319/500] 255 leaf nodes, max depth 20 in 0.198s
[320/500] 255 leaf nodes, max depth 17 in 0.197s
[321/500] 255 leaf nodes, max depth 17 in 0.197s
[322/500] 255 leaf nodes, max depth 20 in 0.197s
[323/500] 255 leaf nodes, max depth 17 in 0.197s
[324/500] 255 leaf nodes, max depth 19 in 0.197s
[325/500] 255 leaf nodes, max depth 22 in 0.197s
[326/500] 255 leaf nodes, max depth 21 in 0.196s
[327/500] 255 leaf nodes, max depth 18 in 0.196s
[328/500] 255 leaf nodes, max depth 17 in 0.196s
[329/500] 255 leaf nodes, max depth 18 in 0.196s
[330/500] 255 leaf nodes, max depth 20 in 0.196s
[331/500] 255 leaf nodes, max depth 18 in 0.195s
[332/500] 255 leaf nodes, max depth 20 in 0.195s
[333/500] 255 leaf nodes, max depth 20 in 0.195s
[334/500] 255 leaf nodes, max depth 18 in 0.195s
[335/500] 255 leaf nodes, max depth 20 in 0.194s
[336/500] 255 leaf nodes, max depth 20 in 0.194s
[337/500] 255 leaf nodes, max depth 22 in 0.194s
[338/500] 255 leaf nodes, max depth 19 in 0.194s
[339/500] 255 leaf nodes, max depth 21 in 0.193s
[340/500] 255 leaf nodes, max depth 20 in 0.193s
[341/500] 255 leaf nodes, max depth 18 in 0.193s
[342/500] 255 leaf nodes, max depth 19 in 0.192s
[343/500] 255 leaf nodes, max depth 20 in 0.192s
[344/500] 255 leaf nodes, max depth 19 in 0.192s
[345/500] 255 leaf nodes, max depth 21 in 0.191s
[346/500] 255 leaf nodes, max depth 23 in 0.191s
[347/500] 255 leaf nodes, max depth 19 in 0.191s
[348/500] 255 leaf nodes, max depth 18 in 0.191s
[349/500] 255 leaf nodes, max depth 18 in 0.190s
[350/500] 255 leaf nodes, max depth 18 in 0.191s
[351/500] 255 leaf nodes, max depth 21 in 0.191s
[352/500] 255 leaf nodes, max depth 26 in 0.191s
[353/500] 255 leaf nodes, max depth 28 in 0.190s
[354/500] 255 leaf nodes, max depth 26 in 0.190s
[355/500] 255 leaf nodes, max depth 36 in 0.190s
[356/500] 255 leaf nodes, max depth 29 in 0.189s
[357/500] 255 leaf nodes, max depth 32 in 0.189s
[358/500] 255 leaf nodes, max depth 25 in 0.189s
[359/500] 255 leaf nodes, max depth 20 in 0.189s
[360/500] 255 leaf nodes, max depth 25 in 0.189s
[361/500] 255 leaf nodes, max depth 23 in 0.188s
[362/500] 255 leaf nodes, max depth 23 in 0.188s
[363/500] 255 leaf nodes, max depth 27 in 0.188s
[364/500] 255 leaf nodes, max depth 23 in 0.187s
[365/500] 255 leaf nodes, max depth 23 in 0.187s
[366/500] 255 leaf nodes, max depth 32 in 0.187s
[367/500] 255 leaf nodes, max depth 20 in 0.187s
[368/500] 255 leaf nodes, max depth 21 in 0.187s
[369/500] 255 leaf nodes, max depth 20 in 0.187s
[370/500] 255 leaf nodes, max depth 18 in 0.186s
[371/500] 255 leaf nodes, max depth 32 in 0.186s
[372/500] 255 leaf nodes, max depth 32 in 0.186s
[373/500] 255 leaf nodes, max depth 21 in 0.186s
[374/500] 255 leaf nodes, max depth 23 in 0.185s
[375/500] 255 leaf nodes, max depth 22 in 0.185s
[376/500] 255 leaf nodes, max depth 23 in 0.185s
[377/500] 255 leaf nodes, max depth 25 in 0.185s
[378/500] 255 leaf nodes, max depth 24 in 0.184s
[379/500] 255 leaf nodes, max depth 23 in 0.184s
[380/500] 255 leaf nodes, max depth 28 in 0.184s
[381/500] 255 leaf nodes, max depth 21 in 0.184s
[382/500] 255 leaf nodes, max depth 24 in 0.183s
[383/500] 255 leaf nodes, max depth 24 in 0.183s
[384/500] 255 leaf nodes, max depth 21 in 0.183s
[385/500] 255 leaf nodes, max depth 24 in 0.183s
[386/500] 255 leaf nodes, max depth 26 in 0.182s
[387/500] 255 leaf nodes, max depth 24 in 0.182s
[388/500] 255 leaf nodes, max depth 32 in 0.182s
[389/500] 255 leaf nodes, max depth 29 in 0.182s
[390/500] 255 leaf nodes, max depth 19 in 0.181s
[391/500] 255 leaf nodes, max depth 20 in 0.181s
[392/500] 255 leaf nodes, max depth 23 in 0.181s
[393/500] 255 leaf nodes, max depth 25 in 0.181s
[394/500] 255 leaf nodes, max depth 29 in 0.181s
[395/500] 255 leaf nodes, max depth 25 in 0.180s
[396/500] 255 leaf nodes, max depth 20 in 0.180s
[397/500] 255 leaf nodes, max depth 23 in 0.180s
[398/500] 255 leaf nodes, max depth 23 in 0.180s
[399/500] 255 leaf nodes, max depth 18 in 0.180s
[400/500] 255 leaf nodes, max depth 22 in 0.179s
[401/500] 255 leaf nodes, max depth 19 in 0.179s
[402/500] 255 leaf nodes, max depth 22 in 0.179s
[403/500] 255 leaf nodes, max depth 25 in 0.179s
[404/500] 255 leaf nodes, max depth 29 in 0.178s
[405/500] 255 leaf nodes, max depth 25 in 0.178s
[406/500] 255 leaf nodes, max depth 26 in 0.178s
[407/500] 255 leaf nodes, max depth 25 in 0.178s
[408/500] 255 leaf nodes, max depth 30 in 0.177s
[409/500] 255 leaf nodes, max depth 30 in 0.177s
[410/500] 255 leaf nodes, max depth 25 in 0.177s
[411/500] 255 leaf nodes, max depth 30 in 0.177s
[412/500] 255 leaf nodes, max depth 23 in 0.177s
[413/500] 255 leaf nodes, max depth 24 in 0.176s
[414/500] 255 leaf nodes, max depth 23 in 0.176s
[415/500] 255 leaf nodes, max depth 24 in 0.176s
[416/500] 255 leaf nodes, max depth 26 in 0.176s
[417/500] 255 leaf nodes, max depth 27 in 0.176s
[418/500] 255 leaf nodes, max depth 21 in 0.175s
[419/500] 255 leaf nodes, max depth 27 in 0.175s
[420/500] 255 leaf nodes, max depth 26 in 0.175s
[421/500] 255 leaf nodes, max depth 28 in 0.175s
[422/500] 255 leaf nodes, max depth 26 in 0.175s
[423/500] 255 leaf nodes, max depth 25 in 0.174s
[424/500] 255 leaf nodes, max depth 32 in 0.174s
[425/500] 255 leaf nodes, max depth 25 in 0.174s
[426/500] 255 leaf nodes, max depth 31 in 0.174s
[427/500] 255 leaf nodes, max depth 35 in 0.173s
[428/500] 255 leaf nodes, max depth 29 in 0.173s
[429/500] 255 leaf nodes, max depth 28 in 0.173s
[430/500] 255 leaf nodes, max depth 32 in 0.173s
[431/500] 255 leaf nodes, max depth 25 in 0.173s
[432/500] 255 leaf nodes, max depth 27 in 0.172s
[433/500] 255 leaf nodes, max depth 31 in 0.172s
[434/500] 255 leaf nodes, max depth 32 in 0.172s
[435/500] 255 leaf nodes, max depth 24 in 0.172s
[436/500] 255 leaf nodes, max depth 29 in 0.172s
[437/500] 255 leaf nodes, max depth 25 in 0.171s
[438/500] 255 leaf nodes, max depth 22 in 0.171s
[439/500] 255 leaf nodes, max depth 25 in 0.171s
[440/500] 255 leaf nodes, max depth 23 in 0.171s
[441/500] 255 leaf nodes, max depth 21 in 0.171s
[442/500] 255 leaf nodes, max depth 23 in 0.171s
[443/500] 255 leaf nodes, max depth 23 in 0.170s
[444/500] 255 leaf nodes, max depth 21 in 0.170s
[445/500] 255 leaf nodes, max depth 34 in 0.170s
[446/500] 255 leaf nodes, max depth 24 in 0.170s
[447/500] 255 leaf nodes, max depth 22 in 0.170s
[448/500] 255 leaf nodes, max depth 19 in 0.170s
[449/500] 255 leaf nodes, max depth 26 in 0.170s
[450/500] 255 leaf nodes, max depth 26 in 0.170s
[451/500] 255 leaf nodes, max depth 24 in 0.169s
[452/500] 255 leaf nodes, max depth 24 in 0.169s
[453/500] 255 leaf nodes, max depth 21 in 0.169s
[454/500] 255 leaf nodes, max depth 20 in 0.169s
[455/500] 255 leaf nodes, max depth 21 in 0.169s
[456/500] 255 leaf nodes, max depth 23 in 0.169s
[457/500] 255 leaf nodes, max depth 20 in 0.169s
[458/500] 255 leaf nodes, max depth 25 in 0.169s
[459/500] 255 leaf nodes, max depth 22 in 0.168s
[460/500] 255 leaf nodes, max depth 27 in 0.168s
[461/500] 255 leaf nodes, max depth 22 in 0.168s
[462/500] 255 leaf nodes, max depth 21 in 0.168s
[463/500] 255 leaf nodes, max depth 25 in 0.168s
[464/500] 255 leaf nodes, max depth 26 in 0.168s
[465/500] 255 leaf nodes, max depth 24 in 0.168s
[466/500] 255 leaf nodes, max depth 21 in 0.167s
[467/500] 255 leaf nodes, max depth 29 in 0.167s
[468/500] 255 leaf nodes, max depth 19 in 0.167s
[469/500] 255 leaf nodes, max depth 17 in 0.167s
[470/500] 255 leaf nodes, max depth 22 in 0.167s
[471/500] 255 leaf nodes, max depth 28 in 0.167s
[472/500] 255 leaf nodes, max depth 19 in 0.166s
[473/500] 255 leaf nodes, max depth 20 in 0.167s
[474/500] 255 leaf nodes, max depth 24 in 0.167s
[475/500] 255 leaf nodes, max depth 20 in 0.166s
[476/500] 255 leaf nodes, max depth 21 in 0.166s
[477/500] 255 leaf nodes, max depth 18 in 0.166s
[478/500] 255 leaf nodes, max depth 19 in 0.166s
[479/500] 255 leaf nodes, max depth 21 in 0.166s
[480/500] 255 leaf nodes, max depth 21 in 0.166s
[481/500] 255 leaf nodes, max depth 21 in 0.166s
[482/500] 255 leaf nodes, max depth 24 in 0.166s
[483/500] 255 leaf nodes, max depth 23 in 0.166s
[484/500] 255 leaf nodes, max depth 27 in 0.166s
[485/500] 255 leaf nodes, max depth 20 in 0.165s
[486/500] 255 leaf nodes, max depth 21 in 0.165s
[487/500] 255 leaf nodes, max depth 24 in 0.165s
[488/500] 255 leaf nodes, max depth 21 in 0.165s
[489/500] 255 leaf nodes, max depth 22 in 0.165s
[490/500] 255 leaf nodes, max depth 23 in 0.165s
[491/500] 255 leaf nodes, max depth 23 in 0.165s
[492/500] 255 leaf nodes, max depth 25 in 0.165s
[493/500] 255 leaf nodes, max depth 20 in 0.165s
[494/500] 255 leaf nodes, max depth 22 in 0.164s
[495/500] 255 leaf nodes, max depth 27 in 0.164s
[496/500] 255 leaf nodes, max depth 21 in 0.164s
[497/500] 255 leaf nodes, max depth 21 in 0.164s
[498/500] 255 leaf nodes, max depth 26 in 0.164s
[499/500] 255 leaf nodes, max depth 24 in 0.164s
[500/500] 255 leaf nodes, max depth 23 in 0.164s
Fit 500 trees in 83.011 s, (127500 total leaf nodes)
Time spent finding best splits:  57.967s
Time spent applying splits:      11.217s
Time spent predicting:           4.851s
done in 83.022s, ROC AUC: 0.8156
Threading layer chosen: tbb
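The timing summary at the end of the log can be broken down to see where the fit time goes. The sketch below only does arithmetic on the logged numbers; the "unaccounted" bucket is simply the difference between the total and the three reported phases, and the log does not say what it contains.

```python
# Timings taken from the pygbm log above (seconds).
total_fit_s = 83.011
phases = {
    "finding best splits": 57.967,
    "applying splits": 11.217,
    "predicting": 4.851,
}

accounted = sum(phases.values())
unaccounted = total_fit_s - accounted
shares = {name: s / total_fit_s for name, s in phases.items()}

for name, share in shares.items():
    print(f"{name:>20}: {share:6.1%}")
print(f"{'unaccounted':>20}: {unaccounted / total_fit_s:6.1%}")
```

Split finding accounts for about 70% of the fit time, so that is where further optimization is most likely to pay off.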

Also, I tried to reproduce the leak with a minimal example and this is what I came up with. It's pretty weird and I'd like to know if you can reproduce it before submitting it to numba because I feel like I'm tripping:

import numpy as np
import psutil
from numba import (njit, jitclass, prange, float32, uint8, uint32, typeof,
                   optional)

@jitclass([
    ('attr', uint32),
])
class JitClass:
    def __init__(self):
        self.attr = 3


@njit
def f():
    cs = [JitClass() for i in range(1000)]
    array = np.empty(shape=10, dtype=np.uint32)

    # If I remove this loop, there is no leak
    for i, c in enumerate(cs):
        c.attr[i] = array[i]  # this should not even pass!
        # c.attr = array[i]  # <-- leak still here if we do this instead.

    # this should not work either!
    something_that_should_not_compile
    blahblahblah
    why_is_this_passing

    # a = a + 3  <-- this produces an error, as expected (a is not defined)

    return array 


class C:
    def g(self):
        self.array = f()

    
p = psutil.Process()
for _ in range(10000):
    o = C()
    o.g()
    del o
    # leak proportional to the size of cs, independent to the size of array
    print(f"{p.memory_info().rss / 1e6} MB")

@ogrisel I'm happy to merge as is, but I'd like your input on the 1e7 case first.

@ogrisel
Owner Author

ogrisel commented Nov 5, 2018

Cool, let's merge as it's already a net improvement.

About the minimal reproduction case, I confirm I get the leak with your code, without any error message or exception. Just memory usage increasing as reported by psutil.
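For checking this kind of leak without psutil, a rough stand-in using only the standard library might look like the sketch below. It samples the peak resident set size after each call, which is not the same as the current RSS that psutil reports, and the `resource` module is Unix-only. `allocate_and_drop` is a placeholder workload and should be replaced with the function under suspicion.

```python
import resource
import sys


def peak_rss_mb():
    """Approximate peak resident set size of this process in MB (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1e6 if sys.platform == "darwin" else 1e3
    return peak / divisor


def rss_after_each_call(fn, n_calls):
    """Call fn repeatedly and record the peak RSS after each call."""
    samples = []
    for _ in range(n_calls):
        fn()
        samples.append(peak_rss_mb())
    return samples


def allocate_and_drop():
    # Placeholder workload: memory is released when the function returns.
    data = [0] * 100_000
    return len(data)


samples = rss_after_each_call(allocate_and_drop, 5)
growth = samples[-1] - samples[0]
print(f"peak RSS grew by {growth:.3f} MB over {len(samples)} calls")
```

A workload that frees its memory should show the peak flattening out after the first few calls, whereas a leak like the one reported here would keep pushing it up roughly linearly with the number of calls.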

@ogrisel ogrisel merged commit f0e3409 into master Nov 5, 2018
@ogrisel ogrisel deleted the fix-histogram-memory-usage branch November 5, 2018 17:30
@ogrisel
Owner Author

ogrisel commented Nov 5, 2018

We still have a discrepancy in terms of results with LightGBM though. But there is another issue for that.

@NicolasHug
Collaborator

I have opened numba/numba#3473 and numba/numba#3472 regarding the leak and some other weird stuff I found.
