First, thank you very much for this wonderful software!
I notice that for the same number of samples and features, where the only
difference is the label type (so one problem is classification and the other is
regression), constructing the regression forest takes considerably longer than
constructing the classification forest (using the default parameters for msplit
and keeping ntrees the same; we also estimate variable importance along the
way). Is there any reason behind this?
Thanks a lot!
Original issue reported on code.google.com by [email protected] on 27 Sep 2012 at 8:15
Hi Kang,
Yes, there is a difference between the regression and classification code. When
creating a tree you need to split the data, but before splitting you need to sort
the data falling into a node. The classification code uses a pre-sorted array,
which makes it scale as O(number of examples), whereas the regression code sorts
on the fly, which makes it scale as O(n log n), the best scaling a
comparison-based sort can achieve.
I am guessing you have lots of examples, and that is one reason regression might
be slower.
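To make the scaling argument concrete, here is a minimal Python sketch of one common way a presorted index is exploited (Python is used only for illustration; this is not the package's actual code, and the helper names `presort`, `partition_presorted` and `sort_on_the_fly` are hypothetical). Each feature is sorted once before any tree is grown; after a node is split, a stable linear pass divides each feature's sorted index list into the two children, which stay sorted without any key comparisons (O(n) per feature for a node with n examples). The on-the-fly alternative sorts the node's examples from scratch at every node (O(n log n)). The toy split (feature 0 at 0.5) is an assumption for the demo.

```python
import random


def presort(X):
    """Sort example indices once per feature (done once, before any tree is grown)."""
    n_examples, n_features = len(X), len(X[0])
    # Ties are broken by example index so the ordering is deterministic.
    return [sorted(range(n_examples), key=lambda i: (X[i][f], i))
            for f in range(n_features)]


def partition_presorted(sorted_idx, left_set):
    """Split a node's per-feature sorted index lists into its two children.

    A single stable pass per feature keeps both children sorted, so no key
    comparisons are needed: O(n) per feature for a node with n examples.
    """
    left, right = [], []
    for idx in sorted_idx:
        left.append([i for i in idx if i in left_set])
        right.append([i for i in idx if i not in left_set])
    return left, right


def sort_on_the_fly(X, node_examples, f):
    """Alternative: sort the node's examples by feature f from scratch, O(n log n)."""
    return sorted(node_examples, key=lambda i: (X[i][f], i))


random.seed(0)
X = [[random.random(), random.random()] for _ in range(1000)]
root = presort(X)

# Hypothetical root split: feature 0 at threshold 0.5.
left_set = {i for i in range(len(X)) if X[i][0] <= 0.5}
right_set = set(range(len(X))) - left_set
left, right = partition_presorted(root, left_set)

# Both routes give the same per-feature orderings for each child.
for f in range(2):
    assert left[f] == sort_on_the_fly(X, left_set, f)
    assert right[f] == sort_on_the_fly(X, right_set, f)
```

The presorting cost is paid once per forest; after that, each node only does linear work, which is why the gap grows with the number of examples.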
The other reason might be that the regression trees are grown fully (i.e., leaf
nodes have the minimum number of examples), whereas your classification trees
might be much simpler (a low VC dimension).
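A toy Python sketch of why tree size can differ so much (illustrative only; `count_nodes`, `gini_total` and `sse` are hypothetical helpers, and this ignores bootstrapping and random feature subsets). The stopping rules are assumptions chosen to mimic the behaviour described above: a node becomes a leaf once it is pure (classification) or its size drops to a minimum leaf size (5 here for regression, 1 for classification). With two cleanly separated classes the classification tree stops after one split, while a continuous response keeps splitting until the leaf-size limit.

```python
def gini_total(labels):
    """Size-weighted Gini impurity: n * (1 - sum_k p_k^2)."""
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return n * (1.0 - sum((c / n) ** 2 for c in counts.values()))


def sse(values):
    """Sum of squared errors around the node mean (regression impurity)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def count_nodes(xs, ys, impurity, min_leaf):
    """Grow a 1-D tree greedily and return its total number of nodes."""
    # Leaf if the node is small enough or all responses are identical.
    if len(ys) <= min_leaf or len(set(ys)) == 1:
        return 1
    pairs = sorted(zip(xs, ys))
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    best_k, best_score = None, None
    for k in range(1, len(xs)):
        if xs[k] == xs[k - 1]:
            continue  # no threshold separates equal x values
        score = impurity(ys[:k]) + impurity(ys[k:])
        if best_score is None or score < best_score:
            best_k, best_score = k, score
    if best_k is None:
        return 1
    return (1 + count_nodes(xs[:best_k], ys[:best_k], impurity, min_leaf)
              + count_nodes(xs[best_k:], ys[best_k:], impurity, min_leaf))


xs = list(range(64))
y_class = [0 if x < 32 else 1 for x in xs]   # two cleanly separated classes
y_reg = [float(x) for x in xs]               # continuous response, all distinct

print(count_nodes(xs, y_class, gini_total, min_leaf=1))  # 3
print(count_nodes(xs, y_reg, sse, min_leaf=5))           # 31
```

On the same 64 examples the regression tree has roughly ten times as many nodes, and every extra node means another round of split searching.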
Calculate the mean number of nodes per tree in the model you created; that might
give you some more idea:
mean(modelRf.ndbigtree)   % classification
mean(modelRf.ndtree)      % regression
Original comment by abhirana on 27 Sep 2012 at 10:41