To-Do list associated with streamlining code and improving quality #36

Open
28 tasks
MoAly98 opened this issue Jun 30, 2022 · 1 comment
Labels
help wanted Extra attention is needed

Comments

@MoAly98
Owner

MoAly98 commented Jun 30, 2022

The following is a to-do list of work that needs to be done for the pre-processing and histogramming codes to interface nicely and provide a consistent UX:

For sklimming:

  • Sklimming should replace the Branch objects with Variable or Observable.
  • The idea of Functor sub-class called Builder for a Variable is much neater than the new_branches dictionary.
  • Remove the arg_types argument for Variable/Branch objects and replace it with an optional list_str_arg, where the default assumption when a str is provided as an argument becomes that it is another Branch
  • Sklimming back-end is too messy -- especially the organisation of the different types of branches (new, tmp, on, off)
  • Dask distribution for the reading in the backend in a similar way to the histogramming code.
  • Consistent way of outputting yields using a given branch
  • Config to use a schema in a similar way to histogramming
  • reader code should be a processor code
  • Argparser to select branches and samples in steering script
  • Look into a coffea backend
  • Make variables/branches optional in Sample constructor -- in case the sample is only needed at histogramming
  • Remove or actually use job name in general settings
  • indirs in general_settings to be used as the default for samples if no where is given
  • Branch/Variable aware of tree rather than using a dict? This way the user can just specify which trees to use when
    looking for this variable without repetition
  • Better multiple-tree support
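
To make the list_str_arg item above more concrete, here is a rough sketch of how the argument resolution could behave. It is only an illustration: `resolve_args` and the `branches` dictionary are hypothetical stand-ins, not the existing Branch/Variable API, and I am assuming `list_str_arg` holds the positions of arguments that should stay plain strings.

```python
from typing import Any, Dict, Sequence


def resolve_args(
    args: Sequence[Any],
    branches: Dict[str, Any],
    list_str_arg: Sequence[int] = (),
) -> list:
    """Replace str arguments with branch data unless flagged as literal strings.

    `list_str_arg` holds the positions of arguments that are plain strings
    (assumed semantics); every other str is looked up as a branch name.
    """
    resolved = []
    for position, arg in enumerate(args):
        if isinstance(arg, str) and position not in list_str_arg:
            if arg not in branches:
                raise KeyError(f"Argument {position} refers to unknown branch '{arg}'")
            resolved.append(branches[arg])
        else:
            resolved.append(arg)
    return resolved


# Toy data standing in for arrays read from a tree.
branches = {"jet_pt": [50.0, 32.5], "jet_eta": [0.1, -1.2]}

# "jet_pt" is treated as a branch, 25.0 is passed through, "GeV" stays a literal.
print(resolve_args(["jet_pt", 25.0, "GeV"], branches, list_str_arg=[2]))
```

With this default the user never has to spell out argument types; only the rare literal string needs to be flagged.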

For histogramming:

  • Provide an easy way for the user to change sample settings -- does the user really need anything other than to set a regex (i.e. tag) and specify if a sample isdata?
  • The input_paths finder in InputManager should be able to handle user-provided methods of finding paths, possibly via a decorator in the config? This is for users who have NTuples that they want to histogram without running through pre-processing
  • Data rendering before passing awkward arrays to boost_histograms. This probably means handling masked awkward arrays, since it seems like boost_histograms do not deal well with None in the awkward/numpy arrays (they get dumped in the underflow); see the boost-histogram issue
  • Regions should not be required -- only Observables should be (inclusive sample can be selected by a dummy function)
  • Overall Systematics
  • Observable.fromFunc(): does it really need args? Can we not just replace args with var?
  • access to samples if specified sklimconfig in settings, or no?
  • Auto binning (uniform between min and max if the user has not provided a binning -- safeguard for the user? warn them? flag to tell us it's acceptable not to have binning?)
  • Do we need general.from_hists flag to allow retrieving histograms from Histogram files?
  • Histo name goes to :: or __ between different XP components
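
For the auto-binning item above, one possible behaviour is sketched below: uniform bins between min and max, with a warning unless the user has said that a missing binning is acceptable. The function name `resolve_binning`, the `allow_auto` flag and the default of 10 bins are all assumptions made for illustration, not existing settings.

```python
import warnings
from typing import List, Optional, Sequence


def resolve_binning(
    values: Sequence[float],
    binning: Optional[Sequence[float]] = None,
    nbins: int = 10,
    allow_auto: bool = False,
) -> List[float]:
    """Return bin edges, falling back to uniform bins over the data range."""
    if binning is not None:
        return list(binning)
    if len(values) == 0:
        raise ValueError("Cannot derive a binning from an empty set of values")
    if not allow_auto:
        # Hypothetical safeguard: tell the user a binning was made up for them.
        warnings.warn("No binning provided; using uniform bins between min and max", UserWarning)
    low, high = min(values), max(values)
    if low == high:
        # Degenerate range: widen it so that the bins have non-zero width.
        low, high = low - 0.5, high + 0.5
    width = (high - low) / nbins
    edges = [low + i * width for i in range(nbins)]
    edges.append(high)  # Pin the last edge to avoid floating-point drift.
    return edges


print(resolve_binning([0.0, 2.5, 10.0], nbins=5, allow_auto=True))
```

Whether the fallback should warn, raise, or stay silent is exactly the open question in the item; the flag just makes the choice explicit.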

For both sklimming and histogramming:

  • Remote distribution of jobs with Dask (e.g. HTCondorCluster)
  • Can Functor fromStr, as it is, parse slicing syntax?
  • Test functions for all features
  • Variable and Observable can maybe inherit from a parent class -- what benefit does a user get if they have to specify the binning variable by variable? If instead we support no binning and just do a uniform binning on behalf of the user, then they can simply import their variables from sklimming into histogramming.
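
A minimal sketch of the shared-parent idea from the last item, assuming binning becomes optional. `BaseVariable` and `from_variable` are hypothetical names, and the fields shown are placeholders rather than the real attributes of Variable or Observable.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class BaseVariable:
    """Hypothetical parent holding what both stages need: a name and its inputs."""
    name: str
    args: Tuple[str, ...] = ()


@dataclass(frozen=True)
class Variable(BaseVariable):
    """Sklimming-side variable; no binning is needed at this stage."""


@dataclass(frozen=True)
class Observable(BaseVariable):
    """Histogramming-side variable; binning stays optional."""
    binning: Optional[Tuple[float, ...]] = None

    @classmethod
    def from_variable(
        cls, var: BaseVariable, binning: Optional[Sequence[float]] = None
    ) -> "Observable":
        # Reuse the sklimming definition and only add histogram-specific settings.
        edges = tuple(binning) if binning is not None else None
        return cls(name=var.name, args=var.args, binning=edges)


jet_pt = Variable("jet_pt", args=("jet_pt",))
obs = Observable.from_variable(jet_pt)
print(obs)
```

The point is that a variable defined once for sklimming can be reused in histogramming without restating it, and a binning is attached only when the user has one.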
@MoAly98 MoAly98 added the help wanted Extra attention is needed label Jun 30, 2022
@MoAly98
Owner Author

MoAly98 commented Jun 30, 2022

Mentioning the relevant pull request for which these changes are needed: #35
