From 300c8657850690e683d18b5b11230104c0e07c2a Mon Sep 17 00:00:00 2001 From: juacrumar Date: Fri, 6 Dec 2024 12:57:15 +0100 Subject: [PATCH] update docs ; update versions at the end of the fit to include the backend --- .../source/get-started/nnpdfmodules.rst | 2 +- doc/sphinx/source/n3fit/index.rst | 3 +- doc/sphinx/source/n3fit/methodology.rst | 71 +++++++++---------- doc/sphinx/source/n3fit/runcard_detailed.rst | 10 +-- doc/sphinx/source/tutorials/run-fit.rst | 35 ++++----- n3fit/src/n3fit/io/writer.py | 27 +++---- 6 files changed, 72 insertions(+), 76 deletions(-) diff --git a/doc/sphinx/source/get-started/nnpdfmodules.rst b/doc/sphinx/source/get-started/nnpdfmodules.rst index 054fefaa53..0643b43570 100644 --- a/doc/sphinx/source/get-started/nnpdfmodules.rst +++ b/doc/sphinx/source/get-started/nnpdfmodules.rst @@ -14,7 +14,7 @@ for an NNPDF fit is displayed in the figure below. The :ref:`n3fit ` fitting code -------------------------------------------------------------------------------- This module implements the core fitting methodology as implemented through -the ``TensorFlow`` framework. The ``n3fit`` library allows +the ``Keras`` framework. The ``n3fit`` library allows for a flexible specification of the neural network model adopted to parametrise the PDFs, whose settings can be selected automatically via the built-in :ref:`hyperoptimization algorithm `. These diff --git a/doc/sphinx/source/n3fit/index.rst b/doc/sphinx/source/n3fit/index.rst index 5d5705ba97..3225fa529a 100644 --- a/doc/sphinx/source/n3fit/index.rst +++ b/doc/sphinx/source/n3fit/index.rst @@ -6,8 +6,7 @@ Fitting code: ``n3fit`` - ``n3fit`` is the next generation fitting code for NNPDF developed by the N3PDF team :cite:p:`Carrazza:2019mzf` - ``n3fit`` is responsible for fitting PDFs from NNPDF4.0 onwards. -- The code is implemented in python using `Tensorflow `_ - and `Keras `_. 
+- The code is implemented in python using `Keras `_ and can run with `Tensorflow `_ (default) or `pytorch `_ (with the environment variable ``KERAS_BACKEND=torch``). - The sections below are an overview of the ``n3fit`` design. diff --git a/doc/sphinx/source/n3fit/methodology.rst b/doc/sphinx/source/n3fit/methodology.rst index 8380d5526d..1fdc2365f6 100644 --- a/doc/sphinx/source/n3fit/methodology.rst +++ b/doc/sphinx/source/n3fit/methodology.rst @@ -8,8 +8,8 @@ different in comparison to the latest NNPDF (i.e. `NNPDF3.1 `_. .. note:: @@ -90,7 +90,7 @@ random numbers used in training-validation, ``nnseed`` for the neural network in Neural network architecture --------------------------- -The main advantage of using a modern deep learning backend such as Keras/Tensorflow consists in the +The main advantage of using a modern deep learning backend such as Keras consists in the possibility to change the neural network architecture quickly as the developer is not forced to fine tune the code in order to achieve efficient memory management and PDF convolution performance. @@ -132,41 +132,36 @@ See the `Keras documentation `_. +It is possible to inspect the ``n3fit`` code using `TensorBoard `_ when running with the tensorflow backend. In order to enable the TensorBoard callback in ``n3fit`` it is enough with adding the following options in the runcard: @@ -333,7 +333,7 @@ top-level option: parallel_models: true Note that currently, in order to run with parallel models, one has to set ``savepseudodata: false`` -in the ``fitting`` section of the runcard. Once this is done, the user can run ``n3fit`` with a +in the ``fitting`` section of the runcard. Once this is done, the user can run ``n3fit`` with a replica range to be parallelized (in this case from replica 1 to replica 4). .. 
code-block:: bash @@ -346,8 +346,8 @@ should run by setting the environment variable ``CUDA_VISIBLE_DEVICES`` to the right index (usually ``0, 1, 2``) or leaving it explicitly empty to avoid running on GPU: ``export CUDA_VISIBLE_DEVICES=""`` -Note that in order to run the replicas in parallel using the GPUs of an Apple Silicon computer (like M1 Mac), it is necessary to also install -the following packages: +Note that in order to run the replicas in parallel using the GPUs of an Apple Silicon computer (like M1 Mac), it is necessary to also install +extra packages. At the time of writing, this worked with ``tensorflow`` 2.13. .. code-block:: bash diff --git a/doc/sphinx/source/tutorials/run-fit.rst b/doc/sphinx/source/tutorials/run-fit.rst index 4293563fb2..40cf0de87e 100644 --- a/doc/sphinx/source/tutorials/run-fit.rst +++ b/doc/sphinx/source/tutorials/run-fit.rst @@ -51,7 +51,7 @@ example of the ``parameter`` dictionary that defines the Machine Learning framew dropout: 0.0 ... -The runcard system is designed such that the user can utilize the program +The runcard system is designed such that the user can utilize the program without having to tinker with the codebase. One can simply modify the options in ``parameters`` to specify the desired architecture of the Neural Network as well as the settings for the optimization algorithm. @@ -164,7 +164,7 @@ folder, which contains a number of files: - ``runcard.exportgrid``: a file containing the PDF grid. - ``runcard.json``: Includes information about the fit (metadata, parameters, times) in json format. -.. note:: +.. note:: The reported χ² refers always to the actual χ², i.e., without positivity loss or other penalty terms. 
@@ -184,25 +184,26 @@ After obtaining the fit you can proceed with the fit upload and analisis by: Performance of the fit ---------------------- -The ``n3fit`` framework is currently based on `Tensorflow `_ and as such, to -first approximation, anything that makes Tensorflow faster will also make ``n3fit`` faster. - -.. note:: - - Tensorflow only supports the installation via pip. Note, however, that the TensorFlow - pip package has been known to break third party packages. Install it at your own risk. - Only the conda tensorflow-eigen package is tested by our CI systems. - -When you install the nnpdf conda package, you get the -`tensorflow-eigen `_ package, -which is not the default. This is due to a memory explosion found in some of +The ``n3fit`` framework is currently based on `Keras `_ +and it is tested to run with the `Tensorflow `_ +and `pytorch `_ backends. +This also means that anything that makes any of these packages faster will also +make ``n3fit`` faster. +Note that at the time of writing, ``TensorFlow`` is approximately 4 times faster than ``pytorch``. + +The default backend for ``keras`` is ``tensorflow``. +In order to change the backend, the environment variable ``KERAS_BACKEND`` needs to be set (e.g., ``KERAS_BACKEND=torch``). + +The best results are obtained with ``tensorflow[and-cuda]`` installed from pip. +When you install the nnpdf conda package, you get the +`tensorflow-eigen `_ package, +which is not the default. This is due to a memory explosion found in some of the conda mkl builds. -If you want to disable MKL without installing ``tensorflow-eigen`` you can always +If you want to disable MKL without installing ``tensorflow-eigen`` you can always set the environment variable ``TF_DISABLE_MKL=1`` before running ``n3fit``. When running ``n3fit`` all versions of the package show similar performance. 
- When using the MKL version of tensorflow you gain more control of the way Tensorflow will use the multithreading capabilities of the machine by using the following environment variables: @@ -214,7 +215,7 @@ the multithreading capabilities of the machine by using the following environmen These are the best values found for ``n3fit`` when using the mkl version of Tensorflow from conda and were found for TF 2.1 as the default values were suboptimal. For a more detailed explanation on the effects of ``KMP_AFFINITY`` on the performance of -the code please see +the code please see `here `_. By default, ``n3fit`` will try to use as many cores as possible, but this behaviour can be overriden diff --git a/n3fit/src/n3fit/io/writer.py b/n3fit/src/n3fit/io/writer.py index 053fc6d229..7b6e40e140 100644 --- a/n3fit/src/n3fit/io/writer.py +++ b/n3fit/src/n3fit/io/writer.py @@ -394,23 +394,24 @@ def jsonfit( def version(): """Generates a dictionary with misc version info for this run""" versions = {} + try: - # Wrap tf in try-except block as it could possible to run n3fit without tf - import tensorflow as tf - from tensorflow.python.framework import test_util - - versions["keras"] = tf.keras.__version__ - mkl = test_util.IsMklEnabled() - versions["tensorflow"] = f"{tf.__version__}, mkl={mkl}" - except ImportError: - versions["tensorflow"] = "Not available" - versions["keras"] = "Not available" - except AttributeError: - # Check for MKL was only recently introduced and is not part of the official API - versions["tensorflow"] = f"{tf.__version__}, mkl=??" 
+        import keras
+
+        versions["keras"] = f"{keras.__version__} backend={keras.backend.backend()}"
+
+        if keras.backend.backend() == "tensorflow":
+            import tensorflow as tf
+
+            versions["tensorflow"] = tf.__version__
+        elif keras.backend.backend() == "torch":
+            import torch
+
+            versions["torch"] = torch.__version__
     except:
         # We don't want _any_ uncaught exception to crash the whole program at this point
         pass
+
     versions["numpy"] = np.__version__
     versions["nnpdf"] = n3fit.__version__
     try:
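The backend-aware version lookup in the ``writer.py`` hunk can be tried outside of ``n3fit`` with a small standalone sketch. This is not the function from the patch: the name ``collect_versions``, the ``backend_modules`` mapping and the ``"python"`` entry are hypothetical additions for illustration, and the bare ``except:`` is narrowed to ``except Exception`` so that ``KeyboardInterrupt`` still propagates. The only Keras call assumed is ``keras.backend.backend()``, which returns the active backend name as a string.

```python
import importlib
import sys


def collect_versions(backend_modules=None):
    """Return a dict of version strings without raising on missing packages."""
    # Hypothetical mapping: Keras backend name -> module that provides it
    if backend_modules is None:
        backend_modules = {"tensorflow": "tensorflow", "torch": "torch"}

    # Illustrative extra entry, not part of the patch
    versions = {"python": sys.version.split()[0]}

    try:
        import keras

        # keras.backend is a module; backend() returns e.g. "tensorflow" or "torch"
        backend_name = keras.backend.backend()
        versions["keras"] = f"{keras.__version__} backend={backend_name}"
    except Exception:
        # Keras missing or broken: record that and stop here
        versions["keras"] = "Not available"
        return versions

    module_name = backend_modules.get(backend_name)
    if module_name is not None:
        try:
            module = importlib.import_module(module_name)
            versions[module_name] = module.__version__
        except Exception:
            versions[module_name] = "Not available"
    return versions


if __name__ == "__main__":
    for name, value in collect_versions().items():
        print(f"{name}: {value}")
```

Running the script with ``KERAS_BACKEND=torch`` set in the environment should report ``torch`` instead of ``tensorflow``, provided both Keras and the chosen backend are installed.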