Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a planar digitisation algorithm #1

Merged
merged 24 commits into from
Jun 24, 2024
Merged

Add a planar digitisation algorithm #1

merged 24 commits into from
Jun 24, 2024

Conversation

jmcarcell
Copy link
Contributor

@jmcarcell jmcarcell commented Oct 3, 2023

BEGINRELEASENOTES

ENDRELEASENOTES

Builds on top of key4hep/k4FWCore#173

Needs Gaudi v38 for using the RootHistograms from Gaudi

Validation has been done with a file with 25 events (Z->qq). On the Marlin side the Overlay has been disabled to make sure the input collections are not modified. Then, I ran the python file in this PR over that file. The parameters
are taken from https://github.com/iLCSoft/CLICPerformance/blob/master/clicConfig/clicReconstruction.xml.

So far everything I have checked looks good, these are the plots that both the processor and this algorithm make that I haven't changed (except the binning for diffu and diffv), the order is first the plot from the processor and then the plot from this algorithm.

This is the change in the local coordinate u for the accepted hits; the resolution is set to 0.003 mm which is what we get in the resolution.
diffu_original
diffu

Same as above but for v
diffv_original
diffv

This is the energy of the hit, should be identical in both cases. Note that there are more entries since this plot is done for every hit, including the ones that are not accepted.
hitE_original
hitE

This is the percentage of hits accepted for each event
hitsAccepted_original
hitsAccepted

This is the smearing in u but divided by the resolution; i.e. normalized to have sigma 1
hu_original
hu

Same as above but with v
hv_original
hv

Note that the number of accepted hits is the same in both cases but this isn't necessary since the random numbers generated for the smearing are different in each case.

@jmcarcell
Copy link
Contributor Author

This PR has been updated and it's ready for comments

CMakeLists.txt Outdated Show resolved Hide resolved
k4Reco/DDPlanarDigi/components/DDPlanarDigi.h Outdated Show resolved Hide resolved
std::string m_collName;

// inline static thread_local std::mt19937 m_engine;
inline static thread_local std::mt19937 m_engine;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the 32 bit version of the MT enough here? Or do we need the mt19937_64 64 bit version?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why wouldn't be enough? It's >10^9 values for just drawing from a normal

auto seed = m_uidSvc->getUniqueID(headers[0].getEventNumber(), headers[0].getRunNumber(), this->name());
info() << "Using seed " << seed << " for event " << headers[0].getEventNumber() << " and run "
<< headers[0].getRunNumber() << endmsg;
m_engine.seed(seed);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the size of the seed in this case? The mersenne twister needs a comparatively large seed to properly initialize it's internal state. See, e.g. https://codereview.stackexchange.com/a/109266

I am not entirely sure that we have to do this for all events, but maybe we should initialize it properly (and reproducibly) once in initialize.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The seed is uint64_t. But, as long as the seeds for every event are different then it shouldn't matter which seeds are being used . In the link I only see that if you have 2^32 seeds you have at most 2^32 choices for the first generated number so of course you can find them. For all events or not is a different discussion. Even when compiling without optimizations seeding a million times takes a few seconds so I don't think it's much of a problem. The idea of the UniqueIDGenSvc was certainly to do this for each event and in the original processor it was being done for each event.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with an improperly seeded MT (as far as I understand it), is that it will not be possible to reach all internal states (and thus random outputs?). So the actual amount of unique random numbers that you get might be much smaller than what is theoretically possible. I am mainly raising this because the Marlin processor used the GSL random library, specifically gsl_rng_ranlxs2, which seems to be "more random" than the MT. I have://www.gnu.org/software/gsl/doc/html/rng.html#c.gsl_rng_ranlxs2), which seems to be "more random" than the MT. I am not sure any of this matters in the end, but it might be the source of somewhat subtle bugs since the random engine changes from Marlin to Gaudi.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jmcarcell @tmadlener choosing a random engine is not easy... Geant4 default random engine is the so-called MinMax, described here, where they also comment about the MT the following:

In this sense, the underlying dynamical system of the Mersenne Twister has one order of magnitude less entropy than RCARRY. This is related to the singular flaw acknowledged by the authors of the Mersenne Twister, which is that MT has a very long recovery time when it is seeded by a vector containing mostly zeros: the output is observed to be non-random even after outputting a million values. In this instance, the divergence of some trajectory is observed away from the origin. In fact, from the dynamical system point of view this is not merely a manifestation of an unlucky initialization, since any two nearby trajectories of the MT diverge very slowly and this is ultimately what causes the failure of MT in the statistical tests. The total entropy of the MT system is approximately h ≈ 4.8. On the basis of the known behaviour of RCARRY [10,11] and our investigation of the MIXMAX, it appears that an entropy of at least h & 50 is required for a generator to have a sufficiently random trajectory. In light of this, it is perhaps not surprising that the Mersenne Twister fails many tests in its pure form, and still fails some tests even when some additional tempering of the output is applied

Copy link

@atolosadelgado atolosadelgado May 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tmadlener why not using the ROOT wrapper for the GSL random engine? Edit, I just found a bug that prevents its use at the moment...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tmadlener why not using the ROOT wrapper for the GSL random engine? Edit, I just found a bug that prevents its use at the moment...

Most likely because this is a port from the corresponding Marlin processor that did not have a ROOT dependency.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oki, fair enough. In case you want to avoid depending on GSL random engines (because of software license), boost offers the same engines, link.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is using now TRandom2. I will merge this shortly since there hasn't been much discussion other than the random number generator, and the validation should already be done, although I will try to run again.

k4Reco/DDPlanarDigi/components/DDPlanarDigi.cpp Outdated Show resolved Hide resolved
k4Reco/DDPlanarDigi/components/DDPlanarDigi.h Outdated Show resolved Hide resolved
@jmcarcell jmcarcell merged commit a8056b7 into main Jun 24, 2024
5 of 11 checks passed
@jmcarcell jmcarcell deleted the planar-digi branch June 24, 2024 17:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants