
Integration with pyhf JSON workspaces #98

Open
kratsg opened this issue Jun 17, 2019 · 7 comments



kratsg commented Jun 17, 2019

/cc @matthewfeickert @lukasheinrich -- we should probably file an issue here to investigate the possibility of getting HEPData to handle pyhf JSON specifications (additionally teaching it to export a given specification to root+xml as well, if needed).

I'm hoping to use this issue as a place to hold discussion on this. For reference, we do have a JSON schema that fully specifies the workspace, and we will shortly be releasing a pyhf version on PyPI that contains v1.0.0 of this schema.
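For context, loading such a specification might look roughly like the sketch below (workspace.json is a placeholder filename; the sketch assumes that constructing pyhf.Workspace validates the document against the workspace schema, as the pyhf releases around this time do):

```python
import json

import pyhf

# Load the single-document JSON workspace specification.
with open("workspace.json") as spec_file:
    spec = json.load(spec_file)

# Constructing a Workspace checks the spec against the pyhf
# workspace JSON schema and raises if the document is invalid.
workspace = pyhf.Workspace(spec)
print(workspace.channels)
```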

@lukasheinrich

Yes, this has been a long-term (read: years) project, and I initially came up with some code that reads in ROOT workspaces and spits out HEPData records:

https://github.com/lukasheinrich/hf2hd-demo

but we should absolutely revisit this. (Though arguably, just uploading the likelihood is sufficient if all the HEPData records can be fully generated from it.)


kratsg commented Jun 25, 2019

Just a quick note that we do have a very nice feature of pyhf that allows you to produce summaries of the JSON specifications. See diana-hep/pyhf#443 for details. We currently provide a (beta) pyhf inspect command-line tool that pretty-prints a summary of the JSON specification in a human-readable format. It could (and probably should) emit the summary as JSON as well, so it can be consumed in an automated fashion. Is this something of interest for HEPData to use?
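For illustration, a machine-readable summary of that kind can be assembled directly from the workspace document using only the standard keys of the pyhf workspace schema (channels, samples, measurements); this is just a sketch of the idea, not the actual pyhf inspect implementation:

```python
import json

with open("workspace.json") as spec_file:
    spec = json.load(spec_file)

# Summarise the workspace: per-channel sample counts, the set of all
# sample names, and the declared measurements.
summary = {
    "channels": [
        {"name": channel["name"], "n_samples": len(channel["samples"])}
        for channel in spec["channels"]
    ],
    "samples": sorted(
        {sample["name"] for channel in spec["channels"] for sample in channel["samples"]}
    ),
    "measurements": [measurement["name"] for measurement in spec.get("measurements", [])],
}
print(json.dumps(summary, indent=2))
```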

@clelange (Collaborator)

Hi @kratsg - mind that this library is mainly meant for converting input into a format that can be ingested by HEPData. Once that is the case for pyhf workspaces (my understanding is that it currently is not), it'd be great if you added this to hepdata_lib. For discussion on what can be added to HEPData and how, you will probably have to communicate with the HEPData developers/maintainers directly (preferably by email, I guess).


kratsg commented Jun 27, 2019

Hi @clelange, I did not realize the two were somewhat separate. Should hepdata_lib effectively support something like yaml.dump(json.load(open('workspace.json')))? Really, that's most of the work, as the entire specification is in a single JSON document.

So HEPData needs to support this first, before hepdata_lib can provide a converter for it?
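As a rough sketch of that one-liner spelled out (the filenames are placeholders and PyYAML is assumed as a dependency):

```python
import json

import yaml

# Read the pyhf workspace (a single JSON document) and re-serialise it
# as YAML, e.g. for inclusion as additional material in a submission.
with open("workspace.json") as json_file:
    spec = json.load(json_file)

with open("workspace.yaml", "w") as yaml_file:
    yaml.safe_dump(spec, yaml_file, default_flow_style=False)
```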

@clelange (Collaborator)

I think I wasn't reading carefully, sorry. If you contribute code similar to https://github.com/lukasheinrich/hf2hd-demo that converts the workspace.json into the YAML format understood by HEPData as part of the submission.tar.gz (which is effectively what hepdata_lib already does for other formats such as ROOT histograms), that is perfectly fine. Do I understand correctly that this is your plan?
I'm not sure I understand why additional exports to root+xml are needed.
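A minimal sketch of such a converter, assuming hepdata_lib's documented Submission/Table/Variable interface and picking a purely illustrative layout of one observed-counts table per channel (table naming and bin indexing are placeholders):

```python
import json

from hepdata_lib import Submission, Table, Variable

with open("workspace.json") as spec_file:
    spec = json.load(spec_file)

submission = Submission()

# One illustrative table per channel: observed event counts per bin,
# taken from the "observations" block of the pyhf workspace.
for observation in spec["observations"]:
    table = Table(f"Observed counts, {observation['name']}")
    bins = Variable("Bin index", is_independent=True, is_binned=False)
    bins.values = list(range(len(observation["data"])))
    counts = Variable("Observed events", is_independent=False, is_binned=False)
    counts.values = observation["data"]
    table.add_variable(bins)
    table.add_variable(counts)
    submission.add_table(table)

# Write the HEPData YAML submission files for upload.
submission.create_files("output")
```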


kratsg commented Jun 27, 2019

> I'm not sure I understand why additional exports to root+xml are needed.

This is usually because "ROOT+XML" is what people already use (the HistFactory workspace), and I think these have, in some cases, already been uploaded for an analysis or two in the past (but I'm really not sure here). The fact that this functionality is possible means it could be useful to have the likelihood exported into different formats depending on what you want, but I don't know whether this is something HEPData wants to do or not.

> I think I wasn't reading carefully, sorry. If you contribute code similar to https://github.com/lukasheinrich/hf2hd-demo that converts the workspace.json into the YAML format understood by HEPData as part of the submission.tar.gz (which is effectively what hepdata_lib already does for other formats such as ROOT histograms), that is perfectly fine. Do I understand correctly that this is your plan?

Yeah, that should be roughly what we want :)

@lukasheinrich

Just note that a conversion into HEPData YAML will always be lossy. The full likelihood will probably require uploading the full spec to HEPData (either as auxiliary material or as a native integration, as @GraemeWatt suggested). But a lossy projection can still be useful: the generated HEPData tables can be, e.g., the equivalent of the pre/post-fit plots we usually produce.
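As one example of such a lossy projection, the pre-fit expected yields can be tabulated directly from the likelihood; this sketch assumes the pyhf Workspace/Model interface (model(), config.suggested_init(), expected_actualdata()) and a placeholder workspace.json:

```python
import json

import pyhf

with open("workspace.json") as spec_file:
    workspace = pyhf.Workspace(json.load(spec_file))

# Build the HistFactory model and evaluate the expected per-bin event
# rates at the suggested initial parameter values (the pre-fit model).
model = workspace.model()
prefit_parameters = model.config.suggested_init()
expected_yields = model.expected_actualdata(prefit_parameters)

# These per-bin yields (together with the observed counts) are the kind
# of information a pre-fit-style HEPData table could carry.
print(expected_yields)
```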
