Write out simulation truths to jsonl files #70

Open
maxnoe opened this issue Feb 15, 2018 · 9 comments

Comments

@maxnoe
Member

maxnoe commented Feb 15, 2018

Important simulation truths are missing from the output:

  • energy
  • source position
  • pointing position

At least these three are needed to use the files for training an energy estimator or a source reconstruction algorithm.
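
As a rough illustration of the request, a JSON line could carry the truth values next to the event data. Every key name below (`simulation_truth`, `energy_gev`, `source_zenith_deg`, and so on), the units, and the values are invented for this sketch; none of them is part of the photon-stream format.

```python
import json

# Hypothetical event line. The keys under "simulation_truth", their units
# and their values are assumptions made up for this sketch.
line = json.dumps({
    "run": 1,
    "event": 1,
    "simulation_truth": {
        "energy_gev": 250.0,
        "source_zenith_deg": 10.0,
        "source_azimuth_deg": 180.0,
        "pointing_zenith_deg": 10.5,
        "pointing_azimuth_deg": 180.0,
    },
})


def read_truth(jsonl_line):
    """Return the truth dict of one JSON line, or None if it is absent."""
    return json.loads(jsonl_line).get("simulation_truth")


truth = read_truth(line)
print(truth["energy_gev"])
```

With a layout like this, a reader needs only the standard `json` module and no knowledge of the CORSIKA file format.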

@relleums
Member

The simulation truth of CORSIKA is kept in separate files, which have to contain at least the headers of the CORSIKA output. This was done on purpose, to avoid mental mapping and so that no simulation truth is removed merely because it is not used at the moment.
There is an example in the readme.

import photon_stream as ps
import pandas as pd

sim_reader = ps.SimulationReader(
    photon_stream_path='tests/resources/011014.phs.jsonl.gz',
    mmcs_corsika_path='tests/resources/011014.ch'
)

for event in sim_reader:
    # process event ...
    # extract Hillas and other features ....
    # do deep learning ...
    pass

thrown_events = pd.DataFrame(sim_reader.thrown_events())

There is a reader which merges this information:

reader = photon_stream.SimulationReader(photon_stream_path, mmcs_corsika_path)

@maxnoe
Member Author

maxnoe commented Feb 15, 2018

I know this!

I think it is a huge advantage not to have to give people the CORSIKA files so they can do event reconstruction. That's the point.

> This was done on purpose to avoid mental-mapping and not remove any simulation-truth only it is not used at the moment.

A little mental mapping for us is much less work than explaining to every new bachelor student what CORSIKA is, why these strange other files exist, why things are called phi and theta and not azimuth and zenith, and why they cannot simply read the JSON lines.

@relleums
Member

The CORSIKA files are tiny when the photon-blocks are removed, as I did for the simulation sample here: https://ihp-pc41.ethz.ch/public/phs/sim/. So I do not see a problem with giving the CORSIKA files to the users.
This way we avoid mental mapping and keep all the information from CORSIKA. The users can even reproduce the air-showers based on the run-header.
Is it because one needs multiple files to achieve one task? In that case the tape archive (tar) is your friend. Personally I prefer to map hierarchy in the file system. If you have bad feelings because of the DRS-file mayhem in FACT, then I agree that it is very bad. But having two files here is of a very different quality, as they have the same name and differ only in their suffix.
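
A minimal sketch of the tape-archive idea, using Python's standard `tarfile` module. The helper name `bundle_run` is hypothetical, and the two suffixes are simply those of the test resources quoted above; this is an illustration, not something the photon-stream package provides.

```python
import tarfile
from pathlib import Path


def bundle_run(run_stem, out_path, suffixes=(".phs.jsonl.gz", ".ch")):
    """Pack all files '<run_stem><suffix>' into one uncompressed tar archive.

    bundle_run is a hypothetical helper written for this sketch.
    """
    with tarfile.open(out_path, "w") as tar:
        for suffix in suffixes:
            path = Path(f"{run_stem}{suffix}")
            # Store only the file name so the archive unpacks flat.
            tar.add(str(path), arcname=path.name)
    return out_path


# Example call, assuming both files of the run exist:
# bundle_run("tests/resources/011014", "011014.tar")
```

The photon-stream is already gzipped, so the archive itself is left uncompressed.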

@maxnoe
Member Author

maxnoe commented Feb 15, 2018

No, I just spent an hour explaining to someone why these files exist, what theta and phi mean, why there are different coordinate systems, what one has to do to convert between them, and many other things.

How is that easier than just providing 5 additional numbers in the jsonl????

@relleums
Member

relleums commented Feb 15, 2018

It is not easier, and yes I know that it takes hours. But solving this is beyond the scope of the photon-stream. For a specific task, five numbers with key-names known by two or three people are fine. But in general we want to have the full simulation truth. We do not know what piece of the simulation truth might be relevant for the user. The user might know the CORSIKA manual. For our stuff, there does not even exist a manual.
However, I agree that additional functions which transform between the reference-frames on the fly make sense. These functions map our knowledge about the different reference-frames of CERES and CORSIKA. But these functions do not belong to the photon-stream I fear. This involves CERES a lot. This belongs to a level above CORSIKA, CERES and the photon-stream. For instance, I have simulated a lot of FACT events for my thesis recently with a different tool than CERES which did not introduce a new reference-frame but kept the reference-frame of CORSIKA.
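
Such an on-the-fly transformation could look roughly like the sketch below. It assumes that theta is the angle from the vertical and phi the azimuthal angle, both in radians. The rotation `azimuth_offset_deg` and the orientation flag `clockwise` are placeholders: the actual relation between the CORSIKA and CERES reference frames is not stated here and has to be filled in per simulation tool.

```python
import math


def to_zenith_azimuth_deg(theta_rad, phi_rad,
                          azimuth_offset_deg=0.0, clockwise=False):
    """Convert polar angles (theta, phi) in radians to (zenith, azimuth) in degrees.

    Assumption: theta is measured from the vertical. azimuth_offset_deg and
    clockwise are placeholders describing how the target frame is rotated
    and oriented relative to the source frame.
    """
    zenith_deg = math.degrees(theta_rad)
    phi_deg = math.degrees(phi_rad)
    if clockwise:
        # Flip the sense of rotation before applying the offset.
        phi_deg = -phi_deg
    # Wrap the azimuth into the interval [0, 360).
    azimuth_deg = (phi_deg + azimuth_offset_deg) % 360.0
    return zenith_deg, azimuth_deg
```

A tool that keeps the CORSIKA frame would use the defaults, so only the conversion from radians to degrees is applied.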

@kbruegge
Member

> This way we avoid mental mapping and keep all the information from CORSIKA. The users can even reproduce the air-showers based on the run-header.

I agree. Providing some method to perform the transformation might be helpful. Or even do it 'on-the-fly' as you mention.

@relleums
Member

See issue #71 for the discussion on the different reference-frames.

@maxnoe changed the title from "Write out simulation truth's to jsonl files" to "Write out simulation truths to jsonl files" on Feb 15, 2018
@maxnoe
Member Author

maxnoe commented Feb 15, 2018

I whole-heartedly disagree.

Having energy, pointing direction and source direction directly at hand and in a well-defined coordinate system is such a huge usability boost that we shouldn't say

> It's in those other binary files, in another coordinate system. Here is some code to read it and to convert it.

That's insane.

@relleums
Member

Can we compromise that we agree to find ways to provide all pointing in one 'well defined' reference-frame, but that we will not put it into the 'phs' files? Can we decouple the reference-frame issue from the format-issue?
