Is it parallel? #74

Open
marchezinixd opened this issue Sep 13, 2018 · 11 comments

Comments

@marchezinixd

I'm running it on a really large dataset and checking the resource usage. Apparently it is using only one core. Is it possible to set it to use all cores and make it faster?

@chirayukong
Collaborator

It should be. Which algorithm are you running? I'll take a look.

@marchezinixd
Author

The parameters I used were:
FGES
Sem-Bic
Sem-Bic
Penalty: 100

I have 4 cores, and it is using 100% of one and none of the others.
The dataset has 105 features and 2.6 million rows.
Memory is fine: it is using 14 GB and I have 32 GB in total.

@jdramsey
Contributor

jdramsey commented Sep 14, 2018 via email

@marchezinixd
Author

Well, I checked it.
I reduced the penalty to 25 and ran it again. The attached image shows how it behaves. Basically, there was a peak of a few seconds during which it used all cores.
The second graph shows that it uses just one core at a time, but it moves through all of the cores sequentially.
[Image: performance]
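
For anyone who wants numbers rather than a graph, here is a minimal sketch of how per-core utilization could be computed. It assumes a Linux machine, where /proc/stat exposes one cumulative counter line per core; the function names are hypothetical and are not part of pycausal or causal-cmd.

```python
def parse_proc_stat(text):
    """Return {core_name: (busy_ticks, total_ticks)} from /proc/stat content."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        # Per-core lines look like "cpu0 ..."; the aggregate line is just "cpu".
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        # Columns: user nice system idle iowait irq softirq steal
        ticks = [int(value) for value in fields[1:9]]
        idle = ticks[3] + (ticks[4] if len(ticks) > 4 else 0)
        total = sum(ticks)
        counters[fields[0]] = (total - idle, total)
    return counters


def per_core_utilization(before, after):
    """Return {core_name: busy fraction} between two parsed snapshots."""
    usage = {}
    for core, (busy_after, total_after) in after.items():
        busy_before, total_before = before.get(core, (0, 0))
        elapsed = total_after - total_before
        usage[core] = (busy_after - busy_before) / elapsed if elapsed > 0 else 0.0
    return usage
```

To use it, read /proc/stat twice a few seconds apart while the search is running and pass both parsed snapshots to per_core_utilization. One core near 1.0 with the rest near 0.0 would match the single-core pattern described above.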

@chirayukong
Collaborator

Try it with the causal-cmd CLI; its distribution is attached. Run it with java -Xmx14G -jar causal-cmd-0.4.0-SNAPSHOT-jar-with-dependencies.jar --algorithm fges --data-type <discrete|continuous> --delimiter <comma|tab> --dataset <your_dataset> --score sem-bic --test sem-bic --penaltyDiscount 100 --json-graph. More about causal-cmd.
causal-cmd-0.4.0-SNAPSHOT-distribution.zip
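
If it helps to launch that command from a script, the following is a rough sketch that assembles the same arguments with Python's subprocess module. The flags are copied from the command above; the function names and the default values for the data type, delimiter, and heap size are assumptions for illustration and should be adjusted to the dataset.

```python
import subprocess


def build_causal_cmd(jar_path, dataset_path, data_type="continuous",
                     delimiter="comma", penalty_discount=100, max_heap="14G"):
    """Assemble the causal-cmd FGES invocation as an argument list."""
    return [
        "java", f"-Xmx{max_heap}", "-jar", jar_path,
        "--algorithm", "fges",
        "--data-type", data_type,    # "discrete" or "continuous"
        "--delimiter", delimiter,    # "comma" or "tab"
        "--dataset", dataset_path,
        "--score", "sem-bic",
        "--test", "sem-bic",
        "--penaltyDiscount", str(penalty_discount),
        "--json-graph",
    ]


def run_causal_cmd(command):
    """Run the command in a separate JVM; raises CalledProcessError on failure."""
    return subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems with dataset paths that contain spaces.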

@marchezinixd
Author

Hello @chirayukong, sorry for taking so long to answer; I was having trouble with the dataset and with handling its full size. I ran the test: with the cmd it finished in ~5 minutes and behaved as shown in the attached image. With Python it took ~1 hour and behaved the same as in the previous images. Apparently the new jar performs better and parallelizes more than the Python one. Is it possible to update pycausal?

[Image: screenshot from 2018-09-25 16-45-24 (CMD test)]

@chirayukong
Collaborator

The jar file is updated. Please try it. @marchezinixd

@marchezinixd
Author

marchezinixd commented Sep 26, 2018

The beginning was a little different, but it still follows the same old pattern: while the jar ran in 4 minutes, the Python version has been running for 20 minutes and does not look like it will finish soon.
Apparently it is a Python problem. Maybe it is the way it handles parallelism?
[Image: python]

@chirayukong
Collaborator

Maybe it's a problem in the javabridge library, which I don't know how to fix. You can run it with causal-cmd and load the JSON result back into Python.
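
Loading the result back could look roughly like the sketch below. It uses only the standard json module and makes no assumption about the structure of the file that causal-cmd writes, so it just parses the file and reports the top-level keys for inspection. The function names are hypothetical, and the output path has to be supplied by the caller.

```python
import json
from pathlib import Path


def load_graph_json(path):
    """Parse a JSON file written by an earlier causal-cmd run."""
    with Path(path).open(encoding="utf-8") as handle:
        return json.load(handle)


def top_level_summary(parsed):
    """Return the sorted top-level keys of a JSON object, or the type name otherwise."""
    if isinstance(parsed, dict):
        return sorted(parsed.keys())
    return type(parsed).__name__
```

Inspecting the top-level keys first is a cautious way to find out where the nodes and edges live before writing any conversion code.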

@chirayukong
Collaborator

This is the latest one.
causal-cmd-0.4.0-SNAPSHOT-distribution.zip

@marchezinixd
Author

Well, I'll do that for now. I'll leave the issue open in case you have any ideas on how to solve the Python problem.
Thank you.
