@Copilot Copilot AI commented Aug 22, 2025

This PR implements efficient trajectory frame reading for LAMMPS dump files, allowing users to read only specific frames instead of loading entire trajectories. This addresses the performance problem in workflows that need only a subset of frames but previously had to load and process the complete trajectory.

Key Features

1. Selective Frame Reading

Added an f_idx parameter to dpdata.System() for loading only the specified frames:

import dpdata

# Load only frames 23, 56, and 78 from a trajectory
system = dpdata.System(
    'trajectory.dump',
    fmt='lammps/dump',
    type_map=['O', 'H'],
    f_idx=[23, 56, 78]
)

2. Multi-Trajectory Pattern

Implemented the exact frames_dict pattern requested in the issue:

import dpdata.lammps.dump as dump

frames_dict = {
    'trajectory1.dump': [23, 56, 78],
    'trajectory2.dump': [22],
}

# Efficiently load specified frames from multiple trajectories
data = dump.load_frames_from_trajectories(frames_dict, type_map=['O', 'H'])

3. Efficient Block-Based Reading

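The implementation uses itertools.zip_longest(*[f] * nlines) to read frames in blocks and skip unwanted frames, as suggested in the issue. Because the same file iterator is repeated nlines times, each tuple holds nlines consecutive lines, which is exactly one frame when every frame has the same line count. Unwanted frames are still consumed from the file but are never parsed or stored, which saves the most time and memory on large trajectories where only a few frames are needed.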
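
As a rough illustration of the zip_longest idiom named above, here is a minimal sketch on toy data. The function name read_selected_blocks and the early stop after the last requested block are assumptions made for this example, not the code in this PR.

```python
import io
from itertools import zip_longest


def read_selected_blocks(f, nlines, wanted):
    """Yield (index, lines) for each nlines-sized block whose index is in wanted.

    Hypothetical helper for illustration only; not part of dpdata.
    """
    wanted = set(wanted)
    if not wanted:
        return
    last = max(wanted)
    # [f] * nlines repeats the *same* iterator, so each tuple produced by
    # zip_longest holds nlines consecutive lines. A short final block is
    # padded with None.
    for idx, block in enumerate(zip_longest(*[f] * nlines)):
        if idx in wanted:
            yield idx, [line for line in block if line is not None]
        if idx >= last:
            break  # nothing requested beyond this block, so stop reading


# Toy input: 4 blocks of 3 lines each.
text = "".join(f"block{b} line{i}\n" for b in range(4) for i in range(3))
selected = dict(read_selected_blocks(io.StringIO(text), nlines=3, wanted=[1, 3]))
```

Skipped blocks are still pulled from the file object, but their lines are discarded immediately instead of being accumulated and parsed.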

Technical Implementation

  • Frame Structure Detection: get_frame_nlines() automatically determines the number of lines per frame
  • Efficient Reading: read_frames() uses block-based reading to skip unwanted frames entirely
  • Enhanced API: Extended load_file() to support both traditional begin/step and new f_idx parameters
  • Seamless Integration: Works with existing system_data() pipeline and dpdata workflow
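
For the frame structure detection step, one simple approach is to count lines until the second "ITEM: TIMESTEP" header. The sketch below is an assumption about how such a helper could look (the name count_frame_lines is hypothetical and this is not the PR's get_frame_nlines()); it also assumes every frame has the same number of atoms, which fixed-size block reading requires.

```python
import io


def count_frame_lines(f):
    """Return the number of lines in the first frame of a LAMMPS dump stream.

    Hypothetical helper for illustration; assumes each frame starts with an
    "ITEM: TIMESTEP" header line.
    """
    nlines = 0
    for line in f:
        # The second TIMESTEP header marks the start of the next frame.
        if line.startswith("ITEM: TIMESTEP") and nlines > 0:
            break
        nlines += 1
    return nlines


# A toy two-atom frame: 9 header lines plus one line per atom.
frame = (
    "ITEM: TIMESTEP\n"
    "0\n"
    "ITEM: NUMBER OF ATOMS\n"
    "2\n"
    "ITEM: BOX BOUNDS pp pp pp\n"
    "0.0 10.0\n"
    "0.0 10.0\n"
    "0.0 10.0\n"
    "ITEM: ATOMS id type x y z\n"
    "1 1 0.0 0.0 0.0\n"
    "2 2 1.0 1.0 1.0\n"
)
nlines = count_frame_lines(io.StringIO(frame * 2))
```

The caller would need to rewind or reopen the file after this scan, since the stream has been advanced past the first frame.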

Performance Benefits

  • Memory Efficiency: Only loads requested frames into memory
  • I/O Efficiency: Skips unwanted frames during file reading without processing
  • Processing Efficiency: No need to load everything then filter

Backward Compatibility

The implementation maintains complete backward compatibility:

  • Existing code using begin and step parameters continues to work unchanged
  • All existing tests pass without modification
  • The f_idx parameter is optional and defaults to None
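
One way the two selection modes could coexist is to reduce both to a single per-frame predicate. This sketch assumes that f_idx, when given, takes precedence over begin and step; the PR text does not say how the two interact, and make_frame_filter is a hypothetical name.

```python
def make_frame_filter(begin=0, step=1, f_idx=None):
    """Return a predicate telling whether frame index i should be loaded.

    Hypothetical helper; the precedence of f_idx over begin/step is an
    assumption made for this example.
    """
    if f_idx is not None:
        wanted = set(f_idx)
        return lambda i: i in wanted
    # Traditional selection: frames begin, begin + step, begin + 2 * step, ...
    return lambda i: i >= begin and (i - begin) % step == 0


strided = [i for i in range(10) if make_frame_filter(begin=2, step=3)(i)]
explicit = [i for i in range(10) if make_frame_filter(f_idx=[1, 4])(i)]
default = [i for i in range(4) if make_frame_filter()(i)]
```

With f_idx left at None the predicate reproduces plain begin/step striding, so callers that never pass f_idx see no change.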

Testing

Added a comprehensive test suite with 22 test cases covering:

  • Basic functionality and API usage
  • Edge cases (empty lists, out-of-range indices, duplicates)
  • Performance validation and correctness verification
  • Backward compatibility with existing workflows
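
The edge cases listed above need some policy for normalizing the requested indices. The description does not state which policy the PR uses, so the following is only one possible choice (sort, drop duplicates, reject out-of-range indices), written as a hypothetical helper named normalize_frame_indices.

```python
def normalize_frame_indices(f_idx, nframes):
    """Return sorted, de-duplicated frame indices, rejecting invalid ones.

    Hypothetical policy for illustration; the PR may instead ignore
    out-of-range indices or preserve the caller's order.
    """
    unique = sorted(set(f_idx))
    bad = [i for i in unique if not 0 <= i < nframes]
    if bad:
        raise IndexError(f"frame indices out of range for {nframes} frames: {bad}")
    return unique


deduped = normalize_frame_indices([5, 1, 5], nframes=10)
empty = normalize_frame_indices([], nframes=3)
```

Sorting up front lets a single forward pass over the file serve every requested frame.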

Fixes #367.


@Copilot Copilot AI changed the title from "[WIP] A more efficient way of reading MD trajectory" to "Implement efficient trajectory frame reading for LAMMPS dump files" Aug 22, 2025
@Copilot Copilot AI requested a review from njzjz August 22, 2025 15:30
Copilot finished work on behalf of njzjz August 22, 2025 15:30
Development

Successfully merging this pull request may close these issues.

A more efficient way of reading MD trajectory