-
Notifications
You must be signed in to change notification settings - Fork 137
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add convertion for small type_map in deepmd system #676
Conversation
WalkthroughWalkthroughThe Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant sort_atom_names
participant TypeMapHandler
participant DataUpdater
User->>sort_atom_names: Call function with data and type_map
sort_atom_names->>TypeMapHandler: Determine which types to delete based on atom numbers
TypeMapHandler-->>sort_atom_names: Return updated types
sort_atom_names->>DataUpdater: Update atom names and types
DataUpdater-->>sort_atom_names: Return updated atom names and types
sort_atom_names-->>User: Return sorted atom names and updated types
Recent review detailsConfiguration used: CodeRabbit UI Files selected for processing (1)
Additional context usedRuff
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (invoked as PR comments)
Additionally, you can add CodeRabbit Configration File (
|
CodSpeed Performance ReportMerging #676 will not alter performanceComparing Summary
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
Outside diff range and nitpick comments (2)
dpdata/utils.py (2)
Line range hint
6-6
: Remove outdated version block.The version block checking for Python 3.8 is outdated and should be removed, as the minimum supported Python version is now higher.
- if sys.version_info >= (3, 8): - from typing import Literal - else: - from typing_extensions import Literal + from typing import LiteralThis change removes unnecessary code and simplifies the import statements.
Line range hint
125-125
: Rename unused loop control variable.In the
uniq_atom_names
function, the loop control variableidx
is not used and should be renamed to_
to indicate its purpose.- for idx, ii in enumerate(data["atom_names"]): + for _, ii in enumerate(data["atom_names"]):This change makes it clear that the loop index is not used within the loop body.
if not set(data["atom_names"]).issubset(set(type_map)): | ||
# delete the types where atom_numbs == 0 | ||
real_index = np.array(data["atom_numbs"]) > 0 | ||
real_map = list(np.array(data["atom_names"])[real_index]) | ||
assert set(real_map).issubset(set(type_map)) | ||
data["atom_numbs"] = list(np.array(data["atom_numbs"])[real_index]) | ||
idx = np.zeros(len(data["atom_names"]), dtype=int) | ||
idx[real_index] = np.arange(len(real_map)) | ||
data["atom_names"] = real_map | ||
data["atom_types"] = idx[data["atom_types"]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Refactor the type filtering logic for clarity and performance.
The logic to filter out atom types where atom_numbs
equals zero is complex and may be hard to follow. Consider refactoring for better clarity and potentially improved performance.
- real_index = np.array(data["atom_numbs"]) > 0
- real_map = list(np.array(data["atom_names"])[real_index])
- assert set(real_map).issubset(set(type_map))
- data["atom_numbs"] = list(np.array(data["atom_numbs"])[real_index])
- idx = np.zeros(len(data["atom_names"]), dtype=int)
- idx[real_index] = np.arange(len(real_map))
- data["atom_names"] = real_map
- data["atom_types"] = idx[data["atom_types"]]
+ filtered_atoms = [(name, num) for name, num in zip(data["atom_names"], data["atom_numbs"]) if num > 0]
+ assert all(name in type_map for name, _ in filtered_atoms)
+ data["atom_names"], data["atom_numbs"] = zip(*filtered_atoms) if filtered_atoms else ([], [])
+ data["atom_types"] = np.arange(len(data["atom_names"]))[data["atom_types"]]
This refactoring uses list comprehension for filtering and asserts, making the code more Pythonic and potentially faster by reducing the need for multiple numpy array conversions.
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
if not set(data["atom_names"]).issubset(set(type_map)): | |
# delete the types where atom_numbs == 0 | |
real_index = np.array(data["atom_numbs"]) > 0 | |
real_map = list(np.array(data["atom_names"])[real_index]) | |
assert set(real_map).issubset(set(type_map)) | |
data["atom_numbs"] = list(np.array(data["atom_numbs"])[real_index]) | |
idx = np.zeros(len(data["atom_names"]), dtype=int) | |
idx[real_index] = np.arange(len(real_map)) | |
data["atom_names"] = real_map | |
data["atom_types"] = idx[data["atom_types"]] | |
if not set(data["atom_names"]).issubset(set(type_map)): | |
# delete the types where atom_numbs == 0 | |
filtered_atoms = [(name, num) for name, num in zip(data["atom_names"], data["atom_numbs"]) if num > 0] | |
assert all(name in type_map for name, _ in filtered_atoms) | |
data["atom_names"], data["atom_numbs"] = zip(*filtered_atoms) if filtered_atoms else ([], []) | |
data["atom_types"] = np.arange(len(data["atom_names"]))[data["atom_types"]] |
Optimize the handling of new atoms.
The current method for handling new atoms is efficient but could be further optimized by directly updating data
within the add_atom_names
function, thus avoiding potential issues with data consistency.
- new_atoms = [e for e in type_map if e not in data["atom_names"]]
- if new_atoms:
- data = add_atom_names(data, new_atoms)
+ data = add_atom_names(data, [e for e in type_map if e not in data["atom_names"]])
This change ensures that data
is always updated regardless of whether new atoms are added, simplifying the control flow and potentially reducing the risk of bugs.
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
if not set(data["atom_names"]).issubset(set(type_map)): | |
# delete the types where atom_numbs == 0 | |
real_index = np.array(data["atom_numbs"]) > 0 | |
real_map = list(np.array(data["atom_names"])[real_index]) | |
assert set(real_map).issubset(set(type_map)) | |
data["atom_numbs"] = list(np.array(data["atom_numbs"])[real_index]) | |
idx = np.zeros(len(data["atom_names"]), dtype=int) | |
idx[real_index] = np.arange(len(real_map)) | |
data["atom_names"] = real_map | |
data["atom_types"] = idx[data["atom_types"]] | |
if not set(data["atom_names"]).issubset(set(type_map)): | |
# delete the types where atom_numbs == 0 | |
real_index = np.array(data["atom_numbs"]) > 0 | |
real_map = list(np.array(data["atom_names"])[real_index]) | |
assert set(real_map).issubset(set(type_map)) | |
data["atom_numbs"] = list(np.array(data["atom_numbs"])[real_index]) | |
idx = np.zeros(len(data["atom_names"]), dtype=int) | |
idx[real_index] = np.arange(len(real_map)) | |
data["atom_names"] = real_map | |
data["atom_types"] = idx[data["atom_types"]] | |
data = add_atom_names(data, [e for e in type_map if e not in data["atom_names"]]) |
Summary by CodeRabbit