[ENH] PCA transformation speedup #1539
Conversation
Current coverage is 88.30% (diff: 97.05%)

```
@@            master   #1539    diff @@
=======================================
  Files           77      77
  Lines         7629    7654     +25
  Methods          0       0
  Messages        0       0
  Branches        0       0
=======================================
+ Hits          6735    6759     +24
- Misses         894     895      +1
  Partials        0       0
```
```diff
@@ -28,3 +28,27 @@ def scale(values, min=0, max=1):
     if ptp == 0:
         return np.clip(values, min, max)
     return (-minval + values) / ptp * (max - min) + min
+
+
+class ComputeValueWCommon:
```
Does not belong here. Put it in pca.py, where it's used.
This will be used for other transformations. It belongs in some base Orange classes, as it is referenced in table.py.
I don't know. Will it be used for other transformations as part of this PR? Utils are tools, like screwdrivers and wrenches. This doesn't look like a tool, rather like a very specific kind of plug that I, unrelatedly, happen to not appreciate at all. 😃
No, not as part of this PR, but I'll definitely use it a lot in an add-on. This thing tries to at least partially solve the problem where a lot of features are based on the same computation. This also frequently occurs in text mining; @nikicc was questioning it once, too.
Or do you perhaps suggest putting this in table.py or variable.py?
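To make the idea concrete, here is a minimal sketch of the shared-computation pattern (illustrative names and signature, not necessarily the exact code in this PR): many derived features hold the same expensive callable, and a caller that recognizes this can compute the shared part once and pass it in.

```python
class ComputeValueWCommon:
    """Sketch of a compute_value whose expensive part is shared.

    compute_shared runs the costly computation (e.g. the full PCA
    projection) once; each instance then slices out its own column.
    """
    def __init__(self, compute_shared, feature):
        self.compute_shared = compute_shared
        self.feature = feature

    def __call__(self, data, shared_data=None):
        # A consumer that has already computed the shared part passes it
        # in; a naive caller still gets correct (but slower) behaviour.
        if shared_data is None:
            shared_data = self.compute_shared(data)
        return shared_data[:, self.feature]
```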
While I enjoy the performance increase, I don't like how the change introduces considerable branching on the consumer end.
```python
def __call__(self, data):
    if data.domain != self.projection.pre_domain:
        data = data.from_table(self.projection.pre_domain, data)
    return self.projection.transform(data.X)[:, self.feature]
```
The result would be pretty much the same if you just somehow intelligently memoized this self.projection.transform() call?
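For comparison, a rough sketch of the memoization alternative being suggested; MemoizedTransform is hypothetical, and it assumes tables are hashable and support weak references, so cache entries vanish with the table rather than at an explicit end-of-transformation point.

```python
import weakref


class MemoizedTransform:
    """Sketch: memoize projection.transform per input table."""
    def __init__(self, projection):
        self.projection = projection
        # Entries disappear once the table itself is garbage-collected.
        self._cache = weakref.WeakKeyDictionary()

    def __call__(self, data, feature):
        if data not in self._cache:
            self._cache[data] = self.projection.transform(data.X)
        return self._cache[data][:, feature]
```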
This could of course be memoized and shared between different compute_value functions for the same domain. But to keep memory use low, that cache would have to be invalidated after the transformation. Therefore, I did it so that the common part is exposed to the transformer, which can discard the intermediate results after the transformation.
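A hedged sketch of what the consumer side (the table-conversion code) could look like under this design; compute_columns is a made-up helper for illustration, not the actual code in table.py.

```python
import numpy as np


def compute_columns(variables, data):
    """Hypothetical helper: one column per variable, expensive parts shared."""
    shared = {}  # one entry per distinct compute_shared callable
    columns = []
    for var in variables:
        cv = var.compute_value
        if hasattr(cv, "compute_shared"):
            if cv.compute_shared not in shared:
                # The expensive computation runs once per group of
                # variables, not once per variable.
                shared[cv.compute_shared] = cv.compute_shared(data)
            columns.append(cv(data, shared[cv.compute_shared]))
        else:
            columns.append(cv(data))
    # `shared` is dropped when this function returns, so the intermediate
    # results are freed as soon as the transformation ends.
    return np.column_stack(columns)
```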
I don't currently have a better idea. 😕 I just strongly feel (know, even) that this isn't something a user of the projector should worry about.
My line of reasoning: I would like to clear the cache after the transformation. The only thing that knows when the transformation ends is the user. Therefore the user has to be aware of it.
An alternative solution would be to call something like clear_cache() on all the variables after the transformation, but this has the same awareness drawback while being less direct. Also, theoretically, such caches would have problems with concurrent transforms...
Speedup of transformations into the PCA space on my laptop:
- adult: 6.07 s -> 0.21 s
- an infrared dataset with shape (16384, 1608): more than 40 minutes (cancelled) -> 6.69 s
```
Parameters
----------
compute_shared: a callable that takes an Orange.data.Table
```
Callable[[Orange.data.Table], object], per PEP 484.
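Spelled out, the suggested annotation might look like this (a sketch, not the PR's actual signature):

```python
from typing import Callable

from Orange.data import Table


class ComputeValueWCommon:
    def __init__(self, compute_shared: Callable[[Table], object],
                 feature: int) -> None:
        self.compute_shared = compute_shared
        self.feature = feature
```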