Refactor output classes #18

Open
7 tasks
wtraylor opened this issue Mar 31, 2020 · 1 comment
Labels
performance (Performance improvement), refactoring (Restructuring or rewriting of code)

Comments

@wtraylor
Owner

These are some ideas I had for improving performance and flexibility of output classes.

Collect output in a struct of arrays instead of recalculating the average with every new datum, i.e. shift the paradigm from “array of structs” to “struct of arrays”. Data are stored in a std::unordered_map<Output::Variable, std::vector<double> >, where Variable is an enum class. Make sure that no new elements can be added to the map after construction; only new values may be appended to the existing vectors.
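A minimal sketch of such a container, to illustrate how the key set could be frozen after construction. The class name `Series` and the two enumerators are placeholders that do not appear in the issue, and the code assumes C++14 or later, where `std::hash` supports enumeration types.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <stdexcept>
#include <unordered_map>
#include <vector>

namespace Output {

// Hypothetical variables; the issue does not list the real ones.
enum class Variable { MassDensity, IndividualDensity };

// Struct of arrays: one vector of values per output variable.
// "Series" is an illustrative name, not part of the proposal.
class Series {
 public:
  // All keys are created here and nowhere else.
  explicit Series(std::initializer_list<Variable> variables) {
    assert(variables.size() > 0 && "need at least one variable");
    for (const Variable v : variables)
      data_.emplace(v, std::vector<double>());
  }

  // Append one value. at() throws std::out_of_range for an unknown key
  // instead of inserting it, so the map cannot grow after construction.
  void append(const Variable v, const double value) {
    data_.at(v).push_back(value);
  }

  // Read-only access; also refuses unknown keys.
  const std::vector<double>& values(const Variable v) const {
    return data_.at(v);
  }

  std::size_t variable_count() const { return data_.size(); }

 private:
  std::unordered_map<Variable, std::vector<double>> data_;
};

}  // namespace Output
```

Using `at()` rather than `operator[]` is the point of the sketch: `operator[]` would silently create a missing key, which is exactly what the requirement above rules out.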

  • Create enum class Output::Variable for all possible output variables.
    • Or should we have one HabitatVariable and one HerbivoreVariable type?
  • Create class Fauna::Output::AggUnitDatum:
    • contains one HerbivoreData for each HFT plus one HabitatData + name of aggregation unit.
    • Each tuple is for one day and one habitat.
    • The AggUnitDatum::habitat_data member variable should be the only HabitatData object that is ever created for this aggregation unit. No copying of data!
    • AggUnitDatum::retrieve() returns the mean of each variable as one single datum and then resets all arrays.
    • Reserve the data vectors with a precalculated size:
      • Size for HerbivoreData: HFTs × habitats × days
      • Size for HabitatData: habitats × days
      • The count should be calculated by a framework function (which knows parameters and HFT list).
      • The number of habitats in each aggregation unit must be estimated based on the maximum number per aggregation unit encountered so far.
      • The number of days is simply based on the output.interval option.
  • Create class Fauna::Output::HabitatData:
    • contains an array for each value that is stored in the old HabitatData.
    • All arrays are guaranteed to be always of the same length.
    • HabitatData::aggregate() is for both spatial and temporal aggregation.
  • Create class Fauna::Output::HerbivoreData: Parallel to the new HabitatData.
    • HerbivoreData::aggregate_within_habitat() calculates the average spatially within one habitat.
    • HerbivoreData::aggregate() is for both spatial (between habitats) and temporal (between days) aggregation.
  • Replace HerbivoreInterface::get_output() with HerbivoreInterface::append_output(Output::HerbivoreData&)
    • This adds output from the herbivore to the struct of arrays (std::unordered_map)
    • Only append to those variables that are required for output.
  • Do the same for Habitat::append_output(Output::HabitatData&).
  • From the framework, call HerbivoreInterface::append_output() within each habitat in each day and then append the result to the AggUnitDatum.
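The task list above could be sketched roughly as follows. This is a simplified illustration under several assumptions, not the planned implementation: it uses two plain member vectors instead of the `std::unordered_map`, handles a single HFT, omits `HabitatData` and the aggregation unit name, and the names `massdens`, `inddens`, `Means`, `herbivore_capacity()` and the stand-in `Herbivore` class are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

namespace Fauna {
namespace Output {

// Struct of arrays for herbivore output (placeholder variables).
struct HerbivoreData {
  std::vector<double> massdens;
  std::vector<double> inddens;

  // Reserve once so that appending does not reallocate.
  void reserve(const std::size_t n) {
    massdens.reserve(n);
    inddens.reserve(n);
  }
  // All arrays must always have the same length.
  std::size_t size() const {
    assert(massdens.size() == inddens.size());
    return massdens.size();
  }
  void clear() {
    massdens.clear();
    inddens.clear();
  }
};

// Expected number of entries: HFTs x habitats x days.
inline std::size_t herbivore_capacity(const std::size_t hft_count,
                                      const std::size_t max_habitats,
                                      const std::size_t interval_days) {
  return hft_count * max_habitats * interval_days;
}

// Arithmetic mean; returning 0 for an empty series is an arbitrary choice.
inline double mean(const std::vector<double>& v) {
  if (v.empty()) return 0.0;
  return std::accumulate(v.begin(), v.end(), 0.0) /
         static_cast<double>(v.size());
}

// One mean value per variable.
struct Means {
  double massdens;
  double inddens;
};

class AggUnitDatum {
 public:
  explicit AggUnitDatum(const std::size_t capacity) {
    herbivore_data_.reserve(capacity);
  }
  // The only HerbivoreData object; callers append to it by reference.
  HerbivoreData& herbivore_data() { return herbivore_data_; }

  // Compute the means, then reset the arrays. clear() keeps the
  // reserved capacity, so the next interval appends without reallocating.
  Means retrieve() {
    const Means m = {mean(herbivore_data_.massdens),
                     mean(herbivore_data_.inddens)};
    herbivore_data_.clear();
    return m;
  }

 private:
  HerbivoreData herbivore_data_;
};

}  // namespace Output

// Stand-in for an implementation of HerbivoreInterface.
class Herbivore {
 public:
  Herbivore(const double massdens, const double inddens)
      : massdens_(massdens), inddens_(inddens) {}
  // Appends this herbivore's values instead of returning a new object.
  void append_output(Output::HerbivoreData& out) const {
    out.massdens.push_back(massdens_);
    out.inddens.push_back(inddens_);
  }

 private:
  double massdens_;
  double inddens_;
};

}  // namespace Fauna
```

In this sketch the framework would call `append_output()` for every herbivore on every day and call `retrieve()` once per output interval, so the only division happens at the end of the interval.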
@wtraylor
Owner Author

wtraylor commented Apr 3, 2020

The output data classes are currently always in a consistent state. With every new datum, the total average is recalculated.
This is a great waste of computing power.
It would be a lot more efficient to first gather a long series of data, and finally calculate the mean or sum.
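
To make the contrast concrete, here is a small sketch of the two strategies. Both classes are generic illustrations with made-up names; `RunningMean` is not the existing output code, only a stand-in for any approach that keeps the average up to date after every datum.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Eager: the mean is consistent after every datum,
// at the cost of one division per added value.
class RunningMean {
 public:
  void add(const double x) {
    ++count_;
    mean_ += (x - mean_) / static_cast<double>(count_);
  }
  double mean() const { return mean_; }

 private:
  double mean_ = 0.0;
  std::size_t count_ = 0;
};

// Deferred: adding a value is a plain push_back;
// the mean is calculated once, when it is requested.
class DeferredMean {
 public:
  explicit DeferredMean(const std::size_t expected_count) {
    values_.reserve(expected_count);
  }
  void add(const double x) { values_.push_back(x); }
  double mean() const {
    assert(!values_.empty());
    return std::accumulate(values_.begin(), values_.end(), 0.0) /
           static_cast<double>(values_.size());
  }

 private:
  std::vector<double> values_;
};
```

The deferred variant trades memory for time: it stores every value until the mean is requested, which is why reserving the vectors with a good size estimate matters.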
