Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

factor analysis: new widget #251

Open
wants to merge 3 commits into
base: master
Choose a base branch
from
Open
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
188 changes: 188 additions & 0 deletions orangecontrib/prototypes/widgets/owfactoranalysis.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,188 @@
import numpy as np

from sklearn.decomposition import FactorAnalysis

from AnyQt.QtCore import Qt
from AnyQt.QtWidgets import QTableView

from Orange.data import Table, Domain
from Orange.widgets import settings
from Orange.widgets.widget import OWWidget
from orangewidget.widget import Input, Output
from orangewidget.utils.widgetpreview import WidgetPreview
from Orange.widgets.utils.slidergraph import SliderGraph
from orangewidget import gui

from pyqtgraph import mkPen, TextItem
from AnyQt.QtGui import QColor


class Rotation:
NoRotation, Varimax, Quartimax = 0, 1, 2

@staticmethod
def items():
return ["NoRotation", "Varimax", "Quartimax"]


class OWFactorAnalysis(OWWidget):
name = "Factor Analysis"
description = "Randomly selects a subset of instances from the dataset."
icon = "icons/DataSamplerB.svg"
priority = 20

class Inputs:
data = Input("Data", Table)

class Outputs:
sample = Output("Sampled Data", Table)

suriikata marked this conversation as resolved.
Show resolved Hide resolved
n_components = settings.ContextSetting(1)
setting_for_rotation = settings.Setting(Rotation.NoRotation)
suriikata marked this conversation as resolved.
Show resolved Hide resolved
autocommit = settings.Setting(True)

def __init__(self):
super().__init__() # since OWFactorAnalysis is a derivative of OWWidget, first intialize OWFA
suriikata marked this conversation as resolved.
Show resolved Hide resolved
self.dataset = None

# Main area settings
self.mainArea.setVisible(True)
suriikata marked this conversation as resolved.
Show resolved Hide resolved

self.attr_box = gui.hBox(self.mainArea, margin=0)

gui.spin(
self.attr_box, self, "n_components", label="Number of components:",
minv=1, maxv=100, step=1, controlWidth=30,
callback=[self.factor_analysis, self.commit.deferred], # deferred = zapoznelo
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is OK, but I don't think any new widget uses it. It is cleaner to write a method like this:

def _n_components_changed(self):
    self.factor_analysis()
    self.commit.deferred()

and then use callback=_n_components_changed.

Allowing specifying a list as callback was one of many bad ideas I had 15 years ago.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is it a bad idea?🙃

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't put it into words. :) The code is usually cleaner if the action that needs to occur in response to, say, a changed value of spin, is described in a function.

Perhaps one reason is also that you can give the function a descriptive name, like _on_components_spin_changed. Aleš, whose code is a few levels above any of ours, does it this way and it looks really neat. In his code, there is a section with handlers for various events, and you can just go there and see everything. If you just put it into lists, it soon becomes messy.

I recently tried to imitate this here: https://github.com/biolab/orange3-timeseries/blob/master/orangecontrib/timeseries/widgets/owmovingtransform.py#L337.

)

gui.comboBox(
self.attr_box, self, "setting_for_rotation", label="Rotation:", labelWidth=50,
items=Rotation.items(), orientation=Qt.Horizontal,
contentsLength=12, callback=self.factor_analysis
)

gui.auto_commit(
self.attr_box, self, 'autocommit', 'Commit',
orientation=Qt.Horizontal
)
gui.separator(self.mainArea) # do i need it? >>> the first element in mainArea
suriikata marked this conversation as resolved.
Show resolved Hide resolved

self.plot = SliderGraph("Factor 1", "Factor 2", self.prazna_funkcija)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You won't use SliderGraph, so this is moot, but: if you need a no-op function, use lambda x: None (this would be a function that accepts one argument and does nothing).

By the way, the whole point of SliderGraph is to call this function when the slider moves. Which proves that you don't need SliderGraph here. :)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i replaced self.prazna_funkcija, but i will leave this unresolved until i figure out how to replace the SliderGraph🤔

self.mainArea.layout().addWidget(self.plot)

def prazna_funkcija(self): # bc _init_ Slidergrapha requires "callback"
pass

def get_range(self, factor):
max_value = factor[0]
for i in range(len(factor)):
if factor[i] > max_value:
max_value = factor[i]
suriikata marked this conversation as resolved.
Show resolved Hide resolved

min_value = factor[0]
for i in range(len(factor)):
if factor[i] < min_value:
min_value = factor[i]

# adjust and scale by 0.1
min_value = min_value - 0.1 * abs(min_value)
max_value = max_value + 0.1 * abs(max_value)

# return the abs value of maximum
return max(abs(min_value), abs(max_value))

def set_range(self):
factor1_range = self.get_range(self.factor1)
factor2_range = self.get_range(self.factor2)
self.plot.setRange(xRange=(-factor1_range, factor1_range), yRange=(-factor2_range, factor2_range))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder ... this creates a different scaling for x and y axes. Maybe you'd have to use the same (that is the larger) factor for both?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my concern is that, if one factor is much larger than the other, having the same scale would make the smaller one difficult to see.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would be correct. If one is negligible in comparison with another, it should also look negligible.


def setup_plot(self):
self.plot.clear_plot()
if self.n_components == 1:
return

self.factor1 = self.result.X[0]
self.factor2 = self.result.X[1]

self.set_range()

foreground = self.plot.palette().text().color()
foreground.setAlpha(128)

names = []
for i in range(len(self.dataset.domain.attributes)):
name = self.dataset.domain.attributes[i].name
names.append(name)
suriikata marked this conversation as resolved.
Show resolved Hide resolved

for x, y, n in zip(self.factor1, self.factor2, names):
suriikata marked this conversation as resolved.
Show resolved Hide resolved
x_vektor, y_vektor = [0, x], [0, y]
suriikata marked this conversation as resolved.
Show resolved Hide resolved
self.plot.plot(x_vektor, y_vektor, pen=mkPen(QColor(Qt.red), width=1), antialias=True)
suriikata marked this conversation as resolved.
Show resolved Hide resolved

if n is not None:
suriikata marked this conversation as resolved.
Show resolved Hide resolved
label = TextItem(
text=n, anchor=(0, 1), color=foreground)
label.setPos(x_vektor[-1], y_vektor[-1])
self.plot.x = x_vektor
self.plot._set_anchor(label, len(x_vektor) - 1, True)
self.plot.addItem(label)

""" TABELA TODO: factor loadings po rotaciji
box = gui.vBox(self.mainArea, box = "Eigenvalue Scores")
self.left_side.setContentsMargins(0,0,0,0)
table = self.table_view = QTableView(self.mainArea)
#table.setModel(self.table_model)
table.setSelectionMode(QTableView.SingleSelection)
table.setSelectionBehavior(QTableView.SelectRows)
table.setItemDelegate(gui.ColoredBarItemDelegate(self, color=Qt.cyan))
#table.selectionModel().selectionChanged.connect(self.select_row)
table.setMaximumWidth(300)
table.horizontalHeader().setStretchLastSection(True)
table.horizontalHeader().hide()
table.setShowGrid(False)
box.layout().addWidget(table)
"""

@Inputs.data
def set_data(self, dataset):
# self.closeContext()
if dataset is None:
self.sample = None
suriikata marked this conversation as resolved.
Show resolved Hide resolved
else:
# self.openContext(dataset.domain) # what is the function of context?
pass

self.dataset = dataset
self.commit.now()

def factor_analysis(self):
# with chosen n_components and depending on the user-selected rotation, calculate the FA on self.dataset
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A nicer way to organize this function would be to first set the FA method and then call it (because the call is same for all).

        if self.setting_for_rotation == 0:
            fa = FactorAnalysis(self.n_components)
        elif self.setting_for_rotation == 1:
            fa = FactorAnalysis(self.n_components, rotation="varimax")
        elif self.setting_for_rotation == 2:
            fa = FactorAnalysis(self.n_components, rotation="quartimax")
        result = fa.fit(self.dataset.X)

Also, you have defined the constants for rotations, so you can use NoRotation instead of 0 etc.

However, all that differs is the value of the argument rotation. You can get rid of if-s, for instance like this:

rotation = [None, "varimax", "quartimax"][self.setting_for_rotation]
result = FactorAnalysis(self.n_components, rotation=rotation).fit(self.dataset.X)

if self.setting_for_rotation == 0:
result = FactorAnalysis(self.n_components).fit(self.dataset.X)
suriikata marked this conversation as resolved.
Show resolved Hide resolved
elif self.setting_for_rotation == 1:
result = FactorAnalysis(self.n_components, rotation="varimax").fit(self.dataset.X)
elif self.setting_for_rotation == 2:
result = FactorAnalysis(self.n_components, rotation="quartimax").fit(self.dataset.X)
else:
print("Error:")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has no effect, nobody sees it. In this case, you can assume this won't happen, so you can just change the last elif to else (well, you don't have to because you don't need if-s anyway).

There is however a pattern you can sometimes use for cases where something shouldn't happen and probably doesn't, but you'd like to check it and are OK with widget crashing if it does happen (because the widget, for instance, can't continue). In your case, if you'd like to do this, you'd put

assert self.setting_for_rotation <= 2 before if-s. If the condition is not true, the widget will fail with assertion error.

That said, don't do this here, because setting_for_rotation will always be correct.

For a good example, OWPredictions attribute self.data. If self.data is not None, it constructs ... some model (whatever). Now, a method _commit_predictions immediately exits if there is no data:

        if not self.data:
            return

Later in this function, you can see

datamodel = self.dataview.model()

This datamodel is the model, which must and does and will exist if self.data exists. So datamodel cannot be None. But this may not be immediately obvious to people who read this code, so the code continues with

assert datamodel is not None  # because we have data

First, this tells the reader: don't worry, we're not checking that datamodel is not None, because it can't be, and the comment tells why.

Second, if there is a bug in the code, the assertion will fail with AssertionError. This doesn't help the user at all -- but the widget would fail later anyway, and failing here makes it easier to find and understand the problem if it occurs.

Gosh, you're getting an advanced course in programming in these comments. :)

Anyway, using assertions is good and we should use them more often.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the in depth explanation, very useful!


# from result variable (instance of class) only extract the table we are interested in (components)
calculated_components = result.components_

# transform the table back to Orange.data.Table
self.result = Table.from_numpy(Domain(self.dataset.domain.attributes),
suriikata marked this conversation as resolved.
Show resolved Hide resolved
calculated_components)

@gui.deferred
def commit(self):
if self.dataset is None:
self.Outputs.sample.send(None)
else:
self.factor_analysis()
# send self.result in Outputs channel
self.Outputs.sample.send(self.result)
self.setup_plot()


if __name__ == "__main__":
WidgetPreview(OWFactorAnalysis).run(Table("iris"))