File size reduction #9

Draft
wants to merge 1 commit into
base: develop-for-jedi

junmeiban
Collaborator

Currently, we use restart files for MPAS-JEDI cycling. As we move to higher-resolution cycling experiments, we face challenges in both memory usage and disk storage. This PR mainly addresses the disk storage issue.

The idea is to replace restart files with an init.nc-type background/analysis file that keeps only the necessary fields, and to split the static fields out into a separate file. This capability already exists in MPAS v6 (Soyoung and Bill: skamaroc@096b5d3); here we adapt it to v7 to suit our needs.

Technical details:

  • static stream: Includes the mesh, some of the sfc_input variables (landmask, shdmin, albedo12m, etc.), and the parameters for gravity wave drag over orography.

  • da_state stream: Fields are specified in the MPAS-Atmosphere Registry. Includes the fields defined in Jake’s doc maps-jedi-da.pdf (section 3b), plus soil moisture, soil temperature, etc. (these fields are not needed for model initialization but will be used in CRTM).

  • For cold start, both the static stream file and the input stream file should be set to the “init.nc” file produced by the init_atmosphere core;

  • For a cycling run (FC), the input stream file should be the new da_state stream file that was previously written by the model (and modified/updated in the DA cycle). Both the static and da_state streams need to be specified in the streams.atmosphere file.

  • For a cycling run (DA): use static.nc to read in the mesh fields (specified in the streams.atmosphere file) and set config_do_restart = false.

  • How to use this capability:
    scripts: /glade/p/mmm/parc/jban/test/pandac/work/JB_fileSizeReduction
    results: /glade/scratch/jban/pandac/JB_fileSizeReduction

  • After implementing this capability, the 120-km init.nc-type background/analysis file is about 430 MB with ~50 variables (the original restart file is about 2 GB). Memory usage also decreased when we separated the static stream:

[Figure: memory_usage]
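
For reference, the stream setup described in the list above could look roughly like this in streams.atmosphere for a cycling forecast (FC). This is only a sketch: the filenames, the output interval, and the use of immutable_stream elements for all three streams are assumptions for illustration, not copied from this branch.

```xml
<!-- Sketch only: filenames and intervals below are placeholders, not taken from this branch -->
<streams>

<!-- Time-invariant fields (mesh, selected sfc_input variables, GWDO parameters), read once -->
<immutable_stream name="static"
                  type="input"
                  filename_template="static.nc"
                  input_interval="initial_only"/>

<!-- Background/analysis state written by the previous cycle (placeholder filename) -->
<immutable_stream name="input"
                  type="input"
                  filename_template="da_state_previous_cycle.nc"
                  input_interval="initial_only"/>

<!-- Reduced state for the next DA cycle; assumed here to be defined in the Registry -->
<immutable_stream name="da_state"
                  type="output"
                  filename_template="mpasout.$Y-$M-$D_$h.$m.$s.nc"
                  output_interval="6:00:00"
                  clobber_mode="overwrite"/>

</streams>
```

For a cold start, the filename_template of both the static and input streams would instead point to the init.nc file produced by the init_atmosphere core.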

Comment on lines +506 to +524
<var_array name="scalars"/>
<var name="initial_time"/>
<var name="xtime"/>
<var name="cldfrac"/>
<var name="re_cloud" packages="mp_thompson_in;mp_wsm6_in"/>
<var name="re_ice" packages="mp_thompson_in;mp_wsm6_in"/>
<var name="re_snow" packages="mp_thompson_in;mp_wsm6_in"/>
<var name="u"/>
<var name="w"/>
<var name="pressure_p"/>
<var name="pressure_base"/>
<var name="rho"/>
<var name="rho_base"/>
<var name="theta"/>
<var name="theta_base"/>
<var name="relhum"/>
<var name="uReconstructZonal"/>
<var name="uReconstructMeridional"/>
<var name="surface_pressure"/>
Collaborator

Looks like there is one column misalignment, which leads to spurious diffs. Need to re-align those lines.

Comment on lines +92 to +98
call MPAS_stream_mgr_read(domain % streamManager, streamID='static', whence=MPAS_STREAM_NEAREST, ierr=ierr)
if (ierr /= MPAS_STREAM_MGR_NOERR) then
    call mpas_log_write('********************************************************************************', messageType=MPAS_LOG_ERR)
    call mpas_log_write('Error reading static fields', messageType=MPAS_LOG_ERR)
    call mpas_log_write('********************************************************************************', messageType=MPAS_LOG_CRIT)
end if
call MPAS_stream_mgr_reset_alarms(domain % streamManager, streamID='static', direction=MPAS_STREAM_INPUT, ierr=ierr)
Collaborator

I think we should make this static stream read optional for now (e.g., through an extra namelist parameter 'separate_init_stream' or something more meaningful?). The single-file restart/init initialization capability should remain unchanged. That may make the change more acceptable to the model group if we want to commit it to the official MPAS model repo.
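
As a rough sketch of the namelist side of this suggestion, such a switch could be declared in the atmosphere core Registry along these lines. The option name config_separate_init_stream, its placement in the restart record, and its default value are all hypothetical; nothing like it exists in this PR yet.

```xml
<!-- Hypothetical option: name, record placement and default are assumptions -->
<nml_record name="restart" in_defaults="true">
        <nml_option name="config_separate_init_stream" type="logical" default_value="false"
                    units="-"
                    description="Whether to read time-invariant fields from a separate static stream"
                    possible_values=".true. or .false."/>
</nml_record>
```

With a default of false, existing single-file restart/init workflows would keep their current behavior, and the static stream read would only be attempted when the option is switched on.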

Comment on lines -309 to +307
call mpas_pool_get_config(configs, 'config_do_restart', config_do_restart)

if (.not. associated(config_do_restart)) then
    call mpas_log_write('config_do_restart was not found when defining mesh stream.', messageType=MPAS_LOG_ERR)
    ierr = 1
else if (config_do_restart) then
    write(stream,'(a)') 'restart'
else
    write(stream,'(a)') 'input'
end if
write(stream,'(a)') 'static'
Collaborator

Same 'optional' comment as above.

@jjguerrette
Owner

What is the status of this work? @liujake, I saw in the weekly meeting notes that you have another branch that looks to include these changes.

@junmeiban
Collaborator Author

> What is the status of this work? @liujake, I saw in the weekly meeting notes that you have another branch that looks to include these changes.

Hi @jjguerrette, I will update the code soon according to Jake's suggestions. If you want to use the smaller files to run your experiments, you can use Jake’s branch now.

@jjguerrette
Owner

Thank you for the info @junmeiban. I will wait until we merge these changes in before using the small file sizes. It sounds like that should not take long.

@liujake
Collaborator

liujake commented Aug 25, 2020

Supporting both the restart and mpasout workflows as options in a single code base is low priority now, given that we have other, more pressing work to do. For now, just use JJ's model branch for restart cycling and my branch for mpasout cycling (already tested with cycling experiments at both 120-km and 30-km).

@liujake
Collaborator

liujake commented Aug 25, 2020

@jjguerrette As you will run/save ensemble analyses/forecasts for EDA, the large restart file size for those ensembles could become a burden when running more cycles, even if disk space is not a problem for you yet. This two-stream workflow should be beneficial to you.
