linear_regression_data_generator.m #6

Open
indra-ipd opened this issue Nov 28, 2018 · 2 comments
indra-ipd commented Nov 28, 2018

Hello,

@hiroyuki-kasai Thank you for creating this project with a wide variety of algorithms.

I went through the code in linear_regression_data_generator.m, but it was not clear to me how the data is generated. Also, when I run the code, all of my rows contain the same number. Can you explain how the dataset is generated for linear regression?

% set number of dimensions
d = 50;
% set number of samples
n = 7000;
% generate data
std = 0.25;
data = linear_regression_data_generator(n, d, std);

Attached below is the data(x_train) and label(y_train) generated

data.xlsx
label.xlsx

Thank you!

@hiroyuki-kasai
Owner

Hi,

Thank you for your interest in my code.

As you can see in linear_regression_data_generator.m, the weight vector w (= w_opt) that the regression problem should recover is set as

w_opt = 0.5 * ones(d+1, 1);

If d = 1, which corresponds to the 2-dimensional case, y = [w_1; w_2]' * [x_1; x_2 (= 1)], where w_1 is the slope of the line and w_2 is the intercept on the y-axis. Therefore, this case is exactly

y = 1/2*x + 1/2.

When d = 2, we get

z = 1/2*x + 1/2*y + 1/2.

You can check this case as follows:


close all
clear
clc

n = 1000;
d = 2;
std = 0.1;
data = linear_regression_data_generator(n, d, std);

x = data.x_train(1,:);
y = data.x_train(2,:);
z = data.y_train;

figure
% plot z = 1/2x + 1/2y + 0.5;
scatter3(x, y, z); hold on

% plot the intersection point (0, 0, 0.5)
plot3(0, 0, 0.5, 'ro','MarkerSize', 20, 'MarkerFaceColor', 'red'); hold off

xlabel('x')
ylabel('y')
zlabel('z')
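Independently of the plot, the algebra for d = 2 can be checked numerically without calling the generator at all. The snippet is only a sketch: the sample points are made up for illustration, and the trailing row of ones (the constant input that multiplies the intercept) is an assumption about the data layout, not something taken from linear_regression_data_generator.m.

```matlab
% Sketch only: verifies the d = 2 formula without calling the generator.
d = 2;
w_opt = 0.5 * ones(d+1, 1);            % same setting of w_opt as quoted above

% Hypothetical sample points (x, y), chosen by hand for illustration.
x = [0, 1, -2,  4];
y = [0, 3,  2, -4];

% Assumed layout: one feature per row, plus a last row of ones for the intercept.
X = [x; y; ones(1, numel(x))];

z_matrix  = w_opt' * X;                % matrix form: w_opt' * [x; y; 1]
z_formula = 0.5*x + 0.5*y + 0.5;       % expanded form of the same expression

% The two forms should agree up to floating-point rounding.
max_diff = max(abs(z_matrix - z_formula));
disp(max_diff);
```

The first point, (0, 0), maps to z = 0.5, which is the intercept marked in red in the plot above.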

That is why all the rows are the same except the last one, which corresponds to the intercept.

This behavior comes from how w_opt is set. You can change w_opt to whatever ideal value you like, and you will then get a different dataset.
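To illustrate the effect of a different w_opt, here is a minimal standalone sketch. It is a hypothetical stand-in and not the code inside linear_regression_data_generator.m: the Gaussian inputs, the additive Gaussian noise, the appended row of ones, and the names noise_std, X, and w_hat are all assumptions made for this example.

```matlab
% Hypothetical stand-in for the generator, NOT the repository's implementation.
n = 1000;                              % number of samples
d = 2;                                 % number of features
noise_std = 0.1;                       % assumed standard deviation of additive noise

% Illustrative choice: slopes 2 and -1, intercept 0.5 (instead of all 0.5).
w_opt = [2; -1; 0.5];

% Assumed layout: features in rows, samples in columns, last row all ones.
X = [randn(d, n); ones(1, n)];
y = w_opt' * X + noise_std * randn(1, n);

% Least-squares fit; backslash on the tall matrix X' solves min ||X'*w - y'||.
w_hat = X' \ y';

% Columns: the chosen w_opt and the weights recovered from the data.
disp([w_opt, w_hat]);
```

With this choice the data lie near the plane z = 2*x - y + 0.5, and the recovered weights should land close to the chosen w_opt, with the gap depending on the noise level and the number of samples.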

I hope this helps.

Best regards,

Hiro

@indra-ipd
Author

Thank you very much for the explanation.

Regards,
Indra
