Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make LSODE integrator thread-safe in OpenMP parallel loops by adding THREADPRIVATE statements #121

Open
wants to merge 5 commits into
base: dev
Choose a base branch
from

Conversation

yantosca
Copy link
Contributor

Overview

This is the companion PR to #118. We have now made the LSODE integrator thread-safe by adding $OMP THREADPRIVATE statements to the following variables:

  1. RWORK in routine KppLsode (in int/lsode.f90);

    • This was also declared with SAVE to make it a global variable
  2. IWORK in routine KppLsode (in int/lsode.f90)

    • This was also declared with SAVE to make it a global variable
  3. All instances of common block /DLS001/ throughout int/lsode.f90

Validation

I used our fork of the KPP-Standalone, and modified the kpp_standalone.F90 driver program to include an OpenMP loop:

    !$OMP PARALLEL DO             &
    !$OMP DEFAULT( SHARED       ) &
    !$OMP PRIVATE( N, TIN, TOUT )
    DO N = 1, 10

       !$OMP CRITICAL
       print*, '### instance ', N
       !$OMP END CRITICAL

       ! Set ENV
       TIN          = T
       TOUT         = T + OperatorTimestep

       ! Set initial concentrations (C) and reacton rates (RCONST)
       ! to values read from the input file
       C            = Cinit
       RCONST       = R

       ! Integrate the mechanism for an operator timestep
       CALL Integrate( TIN, TOUT, ICNTRL, RCNTRL, ISTATUS, RSTATE, IERR )

    ENDDO
    !$OMP END PARALLEL DO

This will perform an integration on the same initial conditions 10 times. The C array is overwritten by each integration. This is sufficient to test if there is a parallelization error.

When compiled with OpenMP parallelization turned off, the integrations are done in order from 1..10 and we get this screen output:

$ ./kpp_standalone.exe Beijing_L1_20190701_0040.txt ref.log
 Processing sample: Beijing_L1_20190701_0040.txt
 Output file: ref.log
 ### instance            1
 ### instance            2
 ### instance            3
 ### instance            4
 ### instance            5
 ### instance            6
 ### instance            7
 ### instance            8
 ### instance            9
 ### instance           10
 Number of internal timesteps (from 3D run):    12
 Number of internal timesteps ( standalone):    92
 Hexit (from 3D run):     497.80
 Hexit ( standalone):      80.40
Warning: Number of internal steps do not match 3D grid cell
Warning: final timestep does not match 3D grid cell within 0.1%

final concentrations are written to the ref.log file. The difference in Hexit is because the initial conditions came from a run using the Rosenbrock integrator.

Then I recompiled the executable with -fopenmp , and we get this screen output:

$ ./kpp_standalone.exe Beijing_L1_20190701_0040.txt devmp.log
 Processing sample: Beijing_L1_20190701_0040.txt
 Output file: devmp.log
 ### instance           10
 ### instance            5
 ### instance            6
 ### instance            2
 ### instance            4
 ### instance            3
 ### instance            9
 ### instance            7
 ### instance            1
 ### instance            8
 Number of internal timesteps (from 3D run):    12
 Number of internal timesteps ( standalone):    92
 Hexit (from 3D run):     497.80
 Hexit ( standalone):      80.40
Warning: Number of internal steps do not match 3D grid cell
Warning: final timestep does not match 3D grid cell within 0.1%

Here you can see the instances are being done in non-sequential order, as the parallel loop is active. We get identical Hexit as the run with parallelization turned off. Also, the output in devmp.log was identical to ref.log as determined by a diff command.

int/lsode.f90
- Now declare work arrays RWORK and IWORK as SAVEd and THREADPRIVATE
  in order to protect these variables from being overwritten by calls
  to LSODE on other threads in an OpenMP parallel loop.
- Add THREADPRIVATE statements to declare common block /DLS001/
  as private in an OpenMP parallel loop;.  Likewise, this will
  prevent the common block variables from being overwritten by
  other threads.

CHANGELOG.md
- Updated accordingly

Signed-off-by: Bob Yantosca <[email protected]>
@yantosca yantosca added integrators Related to numerical integrators bugfix Fixes a bug or a technical issue labels Dec 20, 2024
@yantosca yantosca added this to the 3.2.0 milestone Dec 20, 2024
@yantosca yantosca self-assigned this Dec 20, 2024
Copy link
Contributor

@RolfSander RolfSander left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for asking me to review this. However, I don't have much
experience with OpenMP and "thread-safe" coding, so I will not be able
to contribute much here.

@yantosca yantosca linked an issue Dec 20, 2024 that may be closed by this pull request
@yantosca
Copy link
Contributor Author

I spoke too soon. Compiling with GNU 12.2.0 resulted in this error:

[ 68%] Building Fortran object src/GEOS-Chem/KPP/fullchem/CMakeFiles/KPP.dir/gckpp_Integrator.F90.o
/n/netscratch/jacob_lab/Lab/ryantosca/tests/kpptests/lsode/test_lsode/CodeDir/src/GEOS-Chem/KPP/fullchem/gckpp_Integrator.F90:254:27:

  254 |       INTEGER :: IWORK(LIW), IPAR(1), ITOL, ITASK,         &
      |                           ^
Error: ‘iwork’ causes a section type conflict with ‘rwork’
/n/netscratch/jacob_lab/Lab/ryantosca/tests/kpptests/lsode/test_lsode/CodeDir/src/GEOS-Chem/KPP/fullchem/gckpp_Integrator.F90:253:33:

  253 |       REAL(kind=dp) :: RWORK(LRW), RPAR(1)
      |                                 ^
note: ‘rwork’ was declared here

I will look into this further.

Copy link
Member

@jimmielin jimmielin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @yantosca for these changes (and the helpful comments)! I am not sure about the section type conflict error or what could be causing it, so unfortunately I cannot help on that.

int/lsode.f90
- Added derived type declaration DLS001
- Added global variable CMN of type DLS001, which is threadprivate

Work to be continued in subsequent commits

Signed-off-by: Bob Yantosca <[email protected]>
int/lsode.f90
- Added more common variables to the CMN object, in order to hold
  these THREADPRIVATE for the OpenMP loop

Signed-off-by: Bob Yantosca <[email protected]>
int/lsode.f90
- Removed unused SDIRK coefficients
- Moved parameters LIW, LRW out of KppLsode to top-of-module,
  thus making these global parameters.
- Added explanatory comments about the threadprivate CMN object
  replacing common block /DLS001/
- Bug fix: removed DLS from arg list to DLSODE routine
- Restored variable declarations that were inadvertently removed
  during the removal of common block /DLS001/
- Trimmed trailing whitespace

CHANGELOG.md
- Updated accordingly

Signed-off-by: Bob Yantosca <[email protected]>
int/lsode.f90
- The INIT, MXSTEP, MXHNIL, NHNIL, NSLAST, NYH variables were in
  common block /DLS001/ but were not added to the derived type
  DLS001 and object CMN.  This is now fixed.
- Added routine Zero_CMN_Fields to zero the fields of the CMN
  object before calling DLSODE
- Updated comments
- Trimmed trailing whitespace

CHANGELOG.md
- Updated accordingly

Signed-off-by: Bob Yantosca <[email protected]>
@yantosca
Copy link
Contributor Author

yantosca commented Jan 2, 2025

@jimmielin: I wasn't able to make much headway with the section-type error. I decided to replace the COMMON /DLS001/ block with a derived-type object that is held !$OMP THREADPRIVATE. Still shaking down some more issues but at least this approach doesn't incur the section-type error. Will request a re-review once I get it working.

@yantosca
Copy link
Contributor Author

yantosca commented Jan 7, 2025

@jimmielin @RolfSander: I have been coming up against a wall trying to resolve the Gfortran "section type conflict" error. I've tried removing the COMMON /DLS001/ block and moving those variables into internal modules to make them threadprivate but that hasn't solved it.

I think the problem might be that the RWORK array is too large to fit in stack memory (where THREADPRIVATE arrays must go). LSODE uses a pretty large dimension for RWORK:

KPP/int/lsode.f90

Lines 217 to 221 in cd03fe0

INTEGER, PARAMETER :: LRW = 25 + 9*NVAR+2*NVAR*NVAR, &
LIW = 32 + NVAR
KPP_REAL :: RWORK(LRW), RPAR(1)
INTEGER :: IWORK(LIW), IPAR(1), ITOL, ITASK, &
IERR, IOPT, MF

For the GEOS-Chem mechanism NVAR is ~320 so LRW would be ~207k elements, which would be 1.656 GB.

I may need to put aside this PR for the time being. @jimmielin, do you have any other ideas I could try?

@yantosca
Copy link
Contributor Author

yantosca commented Jan 7, 2025

@jimmielin: I should say that LSODE fails when I run it in GEOS-Chem Classic but not in the standalone box model. I think that is due to memory. I can try to replicate the issue in the box model by increasing the number of iterations.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugfix Fixes a bug or a technical issue integrators Related to numerical integrators
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG/ISSUE] LSODE integrator is not thread-safe
3 participants