Merge pull request #1819 from pace-neutrons/1818_filebacked_info
Better documentation on filebacked objects
abuts authored Jan 29, 2025
2 parents 494b1e1 + 6a19646 commit 2c89878
Showing 5 changed files with 165 additions and 45 deletions.
3 changes: 2 additions & 1 deletion documentation/release_notes/v4.0.2.md
@@ -17,4 +17,5 @@ various small changes to documentation.
- #1781 Epic: issues related with large number of small cuts,
changes in design of containers, code base and bugfixes
allowing working with hashable objects.
Issues #1788,#1790,(PR #1801 with quick-fix for #1790),#1808 and #1811
Issues #1788,#1790,(PR #1801 with quick-fix for #1790),#1808 and #1811
- #1818 Improved documentation on filebacked objects.
91 changes: 62 additions & 29 deletions documentation/user_docs/docs/manual/Changing_Horace_settings.rst
@@ -27,7 +27,7 @@ All configurations have the following settings:
class_name: 'parallel_config'
saveable: 1
returns_defaults: 0
config_folder: '/home/jacob/.matlab/mprogs_config'
config_folder: '/home/UserName/.matlab/mprogs_config_v4'

- ``class_name``: is the name of the current configuration.
- ``saveable``: Sets whether any changes to the config will be written to file
@@ -44,55 +44,63 @@ The Horace config (``hor_config``) manages configuration features of the Horace
library functions, such as how functions handle ``NaN`` and ``inf``, the
verbosity of the code and whether to use compiled C++ accelerator codes. It also
contains references to the ``hpc_config`` to manage high-performance
functionality.
functionality and ``parallel_config`` to control the parameters of parallel jobs.

::

hor_config with properties:
hor_config with properties:

mem_chunk_size: 10000000
fb_scale_factor: 10
ignore_nan: 1
ignore_inf: 0
log_level: 10
use_mex: 1
delete_tmp: 1
working_directory: '/temp/Horace_4.0.0.f2f508726'
force_mex_if_use_mex: 0
hpc_config: [1×1 hpc_config]
parallel_config: [1×1 parallel_config]
-----------
init_tests: 0
unit_test_folder: ''
class_name: 'hor_config'
saveable: 1
returns_defaults: 0
config_folder: '/home/UserName/.matlab/mprogs_config_v4'

mem_chunk_size: 10000000
fb_scale_factor: 3
ignore_nan: 1
ignore_inf: 0
log_level: 1
use_mex: 1
delete_tmp: 1
working_directory: '/tmp/'
force_mex_if_use_mex: 0
high_perf_config_info: [1x1 hpc_config]
class_name: 'hor_config'
saveable: 1
returns_defaults: 0
config_folder: '/home/jacob/.matlab/mprogs_config'


- ``mem_chunk_size`` : (Advanced) The volume (in pixels) that are read into
memory at a time during cuts.
- ``fb_scale_factor`` : (Advanced) Number of "pages" (of ``mem_chunk_size``) in
an ``sqw`` to memory-back before falling back to file-backed.
an ``sqw`` to memory-back before falling back to file-backed ``sqw`` object.
For more information about filebacked and memory-based objects, see
:ref:`manual/Cutting_data_of_interest_from_SQW_files_and_objects:File- and memory-backed cuts`.
- ``ignore_nan`` : Whether binning treats ``NaN`` as a value or simply filters
the values before computing the new bins.
- ``ignore_inf`` : Whether binning treats ``inf`` as a value or simply filters
the values before computing the new bins.
- ``log_level`` : How verbose the code should be:

- -1 : No output is produced.

- 0 : Major notifications are printed.

- 1 : Minor notifications are printed.

- 2 : Runs are timed and this information is printed too.
- 0 : Major notifications are printed.
- 1 : Minor notifications are printed.
- 2 : Runs are timed and this information is printed too.

- ``use_mex`` : Whether to use compiled C++ accelerator MEX code to speed up key
Horace operations.
- ``force_mex_if_use_mex`` : (Advanced) If MEX fails for whatever reason, fail
the calculation instead of falling back to Matlab code.
the calculation instead of falling back to MATLAB code.
- ``delete_tmp`` : Whether to automatically delete temporary files after
generating SQW files.
- ``working_directory`` : The directory to which temporary files are written
- ``high_perf_config_info`` : Reference to the HPC configuration (see below)
- ``hpc_config`` : Reference to the HPC configuration (:ref:`see below <HPC config>`)
- ``parallel_config`` : Reference to the settings used to run parallel jobs (:ref:`see below <Parallel Config>`)

The properties listed after ``parallel_config`` are service information, described :ref:`below <Service info>`.
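
As a rough illustration of how ``mem_chunk_size`` and ``fb_scale_factor`` combine, the sketch below computes the default in-memory limit as their product, following the description in the ``hor_config`` class comments. It is plain MATLAB arithmetic and does not call Horace. The two settings are copied from the example display above, and ``n_pixels`` is a hypothetical pixel count chosen only for illustration.

```matlab
% Values copied from the example hor_config display above
mem_chunk_size  = 10000000;  % pixels read into memory at a time
fb_scale_factor = 10;        % number of such "pages" kept in memory

% Largest pixel count that is still held in memory by default
max_pixels_in_memory = mem_chunk_size * fb_scale_factor;

% n_pixels is a hypothetical pixel count, not taken from any real file
n_pixels = 250000000;
becomes_filebacked = n_pixels > max_pixels_in_memory;

fprintf('In-memory limit: %d pixels\n', max_pixels_in_memory);
fprintf('Object with %d pixels is filebacked by default: %d\n', ...
    n_pixels, becomes_filebacked);
```

With these settings an object stays in memory only while its pixel count does not exceed the product of the two values, so raising either setting raises the limit.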

.. _HPC Config:

HPC Config
==========
@@ -115,18 +123,19 @@ the ``parallel_config`` as well as a direct reference to the config itself.
parallel_cluster: 'herbert'
parallel_configuration: [1x1 parallel_config]
hpc_options: {1x5 cell}
------------
class_name: 'hpc_config'
saveable: 1
returns_defaults: 0
config_folder: '/home/jacob/.matlab/mprogs_config'
config_folder: '/home/UserName/.matlab/mprogs_config'



- ``build_sqw_in_parallel`` : Whether to use parallel algorithms to generate and
combine SQW objects
- ``combine_sqw_using`` : Determines the algorithm to use for SQW combination

- ``matlab`` : this mode uses Matlab code to combine files. Slowest but most
- ``matlab`` : this mode uses MATLAB code to combine files. Slowest but most
reliable method.

- ``mex_code`` : Uses multi-threaded compiled C++ MEX code to combine
@@ -157,6 +166,9 @@ more info.
- ``parallel_cluster``
- ``parallel_configuration``

The properties listed after ``hpc_options`` are service information, described :ref:`below <Service info>`.

.. _Parallel Config:

Parallel Config
===============
@@ -185,10 +197,11 @@ cluster is set up along with threading.
external_mpiexec: ''
slurm_commands: [0x1 containers.Map]
n_cores: 8
--------
class_name: 'parallel_config'
saveable: 1
returns_defaults: 0
config_folder: '/home/jacob/.matlab/mprogs_config'
config_folder: '/home/UserName/.matlab/mprogs_config_v4'

- ``worker``: (Advanced) Parallel worker script to run on instantiating parallel
jobs.
@@ -233,3 +246,23 @@ cluster is set up along with threading.
submission jobs (if ``parallel_cluster`` is ``slurm_mpi``)
- ``n_cores`` : Quick readout of MATLAB's estimate of the number of cores on the local
machine.

The properties listed after ``n_cores`` are service information, described :ref:`below <Service info>`.

.. _Service info:

Developer and service information present in configurations
--------------------------------------------------------------

- ``init_tests`` : False by default. If set to true, Horace tries to locate the Horace unit tests and the Horace unit test framework and add them to the MATLAB search path. Unit tests are present only in Horace distributions cloned from the repository. If the unit tests are absent, an attempt to set this property to true is ignored. The Horace unit test framework shadows MATLAB's native unit test framework, so you need to switch this property on and off if you want to use both.
- ``unit_test_folder`` : The folder where the Horace unit tests are located. Applicable only to Horace versions downloaded from the repository; it becomes available when the ``init_tests`` property is set to true.
- ``class_name`` : Read-only helper property which repeats the name of the configuration class.
- ``saveable`` : If true, changes applied to the configuration are saved to disk and restored in the next MATLAB session. If false, the values remain in memory and are lost when the MATLAB session is closed.
- ``returns_defaults`` : False by default. Setting this property to true allows one to retrieve the default configuration values.
- ``config_folder`` : The location where the configuration data are stored so that they can be restored in the next MATLAB session.
@@ -190,8 +190,8 @@ Each can independently have one of four different forms below.

.. note::

A value of ``[0]`` is equivalent to ``[]`` using the bin size
of the corresponding axis in the source image.
A value of ``[0]`` is equivalent to ``[]`` using the bin size
of the corresponding axis in the source image.


- ``[lo,hi]`` Integration axis in binning direction.
@@ -259,6 +259,7 @@ Each can independently have one of four different forms below.
integration ranges for three cuts, the first cut integrates the axis over
``105-107``, the second over ``109-111`` and the third ``113-115``.

.. _File_and_memory-backed_cuts:

File- and memory-backed cuts
----------------------------
73 changes: 70 additions & 3 deletions documentation/user_docs/docs/manual/Save_and_load.rst
@@ -50,16 +50,83 @@ full sqw dataset, then only the binned data will be read.
The returned variable is an ``dnd`` object.


Constructing ``sqw`` object from filename
=========================================

Calling the ``sqw`` constructor with the name of a binary sqw file is equivalent to invoking the ``read_sqw`` function.

.. code-block:: matlab
output = sqw(sqw_filename);
##############################################################
Saving sqw objects from memory and creating filebacked objects
##############################################################

save
====

Saves the ``sqw`` or ``dnd`` object from the MATLAB workspace to the file
specified by ``filename``.
There are two ways of saving ``sqw`` or ``dnd`` objects to files on disk.

The first is the MATLAB ``save`` command, which
saves objects from memory into MATLAB ``.mat`` files:

.. code-block:: matlab
save(object, filename)
save('filename','variable1_name','variable2_name',...);
The benefit of this approach is that multiple objects can be stored in a single ``.mat`` file.

Note that this method works on objects in memory, so using it to save filebacked ``sqw`` objects will probably give
unexpected results, because the main part of a filebacked ``sqw`` object is not loaded in memory.

.. warning::
**Saving filebacked objects using the MATLAB save command is dangerous!**

Filebacked objects come in two kinds. "Primary" filebacked objects are built over existing ``.sqw`` files or have been saved under a permanent file name (see below). They are backed by permanent ``.sqw`` files, which stay on disk after the objects are deleted from the MATLAB session. "Secondary" filebacked objects are produced by operations on filebacked ``sqw`` objects: large cuts which do not fit in memory (see :ref:`manual/Cutting_data_of_interest_from_SQW_files_and_objects:File- and memory-backed cuts` for more details about filebacked cuts), results of unary or binary operations on "primary" filebacked ``sqw`` objects, results of filebacked ``PageOp`` algorithms, and so on.
These objects are backed by temporary files with the extension ``.tmp_XXXXXXX``, and the temporary files are deleted when such objects go out of scope.


The MATLAB ``save`` command saves only the part of a filebacked object that is held in memory. This includes the reference to the file containing the pixels, that is, the name and path of the file backing the object. Because the temporary file (``.tmp_XXXXXXX``)
backing a "secondary" filebacked object is deleted, the ``.mat`` file saved for such an object will contain a reference to a missing file. The ``.mat`` file for a "primary" filebacked object will contain a reference to an existing ``.sqw`` file, so the object can be restored as long as that file exists and has not been moved or renamed. This way of saving filebacked ``sqw`` objects is therefore not reliable either.
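
The dangling-reference problem can be reproduced by analogy in plain MATLAB, without Horace. In the sketch below a struct named ``fake_fb_obj`` (a hypothetical stand-in, not a Horace class) holds only the path to a temporary data file, in the same way that the in-memory part of a filebacked object holds only a reference to its pixel file. The file names and the ``pix_file`` field are assumptions made for the illustration.

```matlab
% Stand-in for a "secondary" filebacked object: a struct that only
% holds the path to its backing file (hypothetical simplification).
backing_file = [tempname(), '.tmp_0000001'];
fid = fopen(backing_file, 'w');
fwrite(fid, rand(1, 100), 'double');
fclose(fid);
fake_fb_obj = struct('pix_file', backing_file);

% MATLAB save writes the in-memory part only, i.e. the path.
mat_file = [tempname(), '.mat'];
save(mat_file, 'fake_fb_obj');

% Mimic the temporary backing file being removed when the
% object goes out of scope.
delete(backing_file);

% Reloading restores the path, but the data it points to is gone.
restored = load(mat_file);
reference_is_valid = exist(restored.fake_fb_obj.pix_file, 'file') == 2;

% Clean up the .mat file created by this sketch.
delete(mat_file);
```

After the reload, ``reference_is_valid`` is false: the ``.mat`` file still carries the path, but nothing exists at that location any more.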

The only reliable way of saving a filebacked ``sqw`` object is the Horace ``save`` command, which stores the ``sqw`` object in the binary Horace ``.sqw`` file format.
The command for this is:

.. code-block:: matlab
save(sqw_object, filename);
This method saves a single object into a Horace binary file with the extension ``.sqw``, so a filebacked ``sqw`` object is written correctly
and can be restored later from the corresponding ``.sqw`` file. If the filebacked object is backed by a temporary file, the pixel data are not rewritten (which would be a long operation), because the major part of the object is already on disk. Instead, the file contents are synchronized with the data in memory and the temporary file is renamed to the name provided as the second input of the ``save`` command.

You can, of course, also use the Horace ``save`` command to create Horace binary ``.sqw`` files from ``sqw/dnd`` objects in memory.

See :ref:`manual/Cutting_data_of_interest_from_SQW_files_and_objects:File- and memory-backed cuts` to read more about filebacked and memory-based cuts, and :ref:`manual/Changing_Horace_settings:Horace Config` for information on how to set the size limit of memory-based objects.

Create filebacked objects from data on disk
===========================================

If your ``sqw`` file is big enough (see :ref:`mem_chunk_size and fb_scale_factor from "hor_config" class <manual/Changing_Horace_settings:Horace Config>` for the numerical meaning of "big enough"), the command:

.. code-block:: matlab
fb_obj = sqw('filename');
will create a filebacked object ``fb_obj``. You can operate on a filebacked object exactly as on a memory-based object, but many operations that involve pixels will be slower. Alternatively, you can create a filebacked object regardless of its size using the command:

.. code-block:: matlab
fb_obj = read_sqw('filename','-filebacked');
Note that this command invoked without ``-filebacked`` is equivalent to ``sqw('filename')``, and

.. code-block:: matlab
mb_obj = read_sqw('filename','-force_pix_location');
will try to load the ``sqw`` object into memory regardless of its size on disk, so it will fail if the object is too big to fit in memory.

Filebacked objects created this way, unlike those created as the result of operations on filebacked objects or large ``cut`` operations, are backed by permanent files, which are not deleted when the object in memory is deleted.
38 changes: 28 additions & 10 deletions horace_core/configuration/@hor_config/hor_config.m
@@ -21,23 +21,31 @@
% -----------
% mem_chunk_size - Maximum length of buffer array to accumulate pixels
% from an input file.
% fb_scale_factor - the product of fb_scale_factor and
% mem_chunk_size defines maximal number of pixels
% to put in memory. If number of pixels in sqw
% object exceeds this product sqw object by default
% becomes filebacked.
% ignore_nan - Ignore NaN data when making cuts
% ignore_inf - Ignore Inf data when making cuts.
% log_level - Set verbosity of informational output.
% use_mex - Use mex files for time-consuming operation, if available
% delete_tmp - Automatically delete temporary files after generating sqw files
% working_directory - The folder to write tmp files.
% --
% high_perf_config_info - helper/compatibility property to access high performance
% computing settings. Use hpc_config to modify hpc
% settings itself.
% hpc_config - helper/compatibility property to access high
% performance computing settings. Use "hpc_config"
% to modify hpc settings themselves.
% parallel_config - helper/compatibility property to access
% parallel computing settings. Use
% "parallel_config" to modify parallel computing
% settings themselves.
%
% force_mex_if_use_mex - Fail if mex can not be used. Used in mex files debugging
%--
% hpc_config - an interface, displaying high performance computing settings.
% Use hpc_config class directly to modify these
% settings.
% init_tests - Enable the unit test functions
% init_tests - Enable Horace specific unit test functions and
% access to Horace unit tests folders. Works for
% Horace downloaded as github repository only.
%
%
properties(Dependent)
@@ -97,8 +105,14 @@
% It is provided here for information only; changes to this
% property should be made through hpc_config class setters directly.
hpc_config;

% add unit test folders to search path (option for testing)
% The property exposes access to Horace parallel computing
% settings. It is provided here for information only; changes to
% these settings should be made through parallel_config class
% setters directly.
parallel_config;
% Enable Horace unit test framework and add unit test folders to
% search path (option for Horace testing). Works for Horace
% retrieved from repository only.
init_tests;
end

@@ -232,8 +246,12 @@
end

function hpcc = get.hpc_config(~)
hpcc = hpc_config;
hpcc = hpc_config();
end
function parcc = get.parallel_config(~)
parcc = parallel_config();
end


%-----------------------------------------------------------------
% overloaded setters