From a58e7c5594fb8a46c4703fe2c8657f367be8ce11 Mon Sep 17 00:00:00 2001
From: Lennart Rustige
Date: Fri, 6 Apr 2018 16:36:03 +0200
Subject: [PATCH] abiding a bit more by the structure

---
 README.md            | 57 +++++++++++++++++++++++++-------------------
 benchmarks/README.md | 24 +++++++++++++++++++
 2 files changed, 57 insertions(+), 24 deletions(-)
 create mode 100644 benchmarks/README.md

diff --git a/README.md b/README.md
index 9b84782..a63f073 100644
--- a/README.md
+++ b/README.md
@@ -1,24 +1,33 @@
-# ROOT-C++-Python - Benchmarking, comparing, best practices
-
-## problem to solve
-More and more people see benefits in using ML techniques and in addition to that (or apart from that) they see the benefits from taking advantage of the large data science ecosystem around (scipy, numpy, pandas, matplotlib and many many more) in addition to their ROOT based analyses. But instead of then using these tools there seems to be a high level of caution mainly due to:
-
-- people are not necessarily aware of easy ways to connect ROOT based data with python data science tools
-- people fear that using python will be _significantly_ slower than the ROOT based approach
-  - strongly connected to that: people are not necessarily aware of how to parallelise in python
-
-## desired outcome
-The best case scenario would be to come out of this hackathon with a comprehensive but simple presentation (mini-tutorial), that shows best practices on how to integrate non-ROOT-tools in an overall ROOT based analysis, how to transfer data between the ecosystems and which includes some performance comparisons between the different approaches.
-So basically a talk that can be used to mitigate the fears of stepping outside of a purely ROOT based analysis and that gives actual starting points on how to do that.
-
-I think that the workload of this project would be threefold:
-- actually compute some performance comparisons
-- search for performance comparisons, tutorials, talks about the topic that already exist.. and add them to the repository (for the latter see e.g. https://github.com/ChristosChristofidis/awesome-deep-learning )
-- compile a talk (maybe a notebook, maybe something else) with a high pedagogical value :)
-
-## skills / knowledge needed (for the project, not per person)
-- didactic skills
-- literature research skills
-- some programming skills
-- ROOT
-- Other data storage solutions
+# 2018 IML workshop hackathon
+
+[![twitter][twitter_badge]][hashtag_link]
+[![mattermost][mattermost_badge]][mattermost_link]
+[![indico][indico_badge]][indico_link]
+
+## purpose of this repository
+
+We would like the hackathon to work with little moderation from the IML
+coordinators side. E.g. hacking projects should not be submitted to and then
+approved by us, proposals should be visible to all participants to give
+feedback, make suggestions, etc.
+
+## how to propose a project
+
+Our Idea is to have projects submitted as pull requests. ([How to create pull requests][prhowto])The actual content of
+the pull request is not so important at this stage (just create a subdirectory
+and put useful files inside) the main discussion and attraction of participants
+should happen in the pull request discussion.
+
+There is a template for pull request descriptions in place, the general idea is to describe:
+
+ - What do you want to do? What is the project about?
+ - Will you work on the project yourself or is it a suggestion for somebody to pick up?
+ - What prerequisites should/could participants bring with them? / What kind of know-how are you lacking that ideally a participant would contribute (Maybe you want to add a functionality to your favourite ML library and it would be good to know your way around its source code?)
+ - Are there previous works to build upon? Is there something that can be prepared in advance (set up software installation / download dataset)?
+
+## outcome
+
+Whatever it is your project aims for (a study, a learning experience, a new
+tool, a feature to an existing tool) we would be happy if you can present
+your achievement in one of the upcoming IML meetings.
+
diff --git a/benchmarks/README.md b/benchmarks/README.md
new file mode 100644
index 0000000..9b84782
--- /dev/null
+++ b/benchmarks/README.md
@@ -0,0 +1,24 @@
+# ROOT-C++-Python - Benchmarking, comparing, best practices
+
+## problem to solve
+More and more people see benefits in using ML techniques and in addition to that (or apart from that) they see the benefits from taking advantage of the large data science ecosystem around (scipy, numpy, pandas, matplotlib and many many more) in addition to their ROOT based analyses. But instead of then using these tools there seems to be a high level of caution mainly due to:
+
+- people are not necessarily aware of easy ways to connect ROOT based data with python data science tools
+- people fear that using python will be _significantly_ slower than the ROOT based approach
+  - strongly connected to that: people are not necessarily aware of how to parallelise in python
+
+## desired outcome
+The best case scenario would be to come out of this hackathon with a comprehensive but simple presentation (mini-tutorial), that shows best practices on how to integrate non-ROOT-tools in an overall ROOT based analysis, how to transfer data between the ecosystems and which includes some performance comparisons between the different approaches.
+So basically a talk that can be used to mitigate the fears of stepping outside of a purely ROOT based analysis and that gives actual starting points on how to do that.
+
+I think that the workload of this project would be threefold:
+- actually compute some performance comparisons
+- search for performance comparisons, tutorials, talks about the topic that already exist.. and add them to the repository (for the latter see e.g. https://github.com/ChristosChristofidis/awesome-deep-learning )
+- compile a talk (maybe a notebook, maybe something else) with a high pedagogical value :)
+
+## skills / knowledge needed (for the project, not per person)
+- didactic skills
+- literature research skills
+- some programming skills
+- ROOT
+- Other data storage solutions