Fixed invalid JSON format from the logs #588


Open — Fdms-3741 wants to merge 2 commits into master

Conversation


@Fdms-3741 Fdms-3741 commented Aug 11, 2025

The previous method saved data in an invalid JSON format, making it incompatible with any JSON parser. This PR changes the save and load functionality so that it writes and reads valid JSON.

Warning: This will break imports of previously saved data. The data itself will not be altered; thus, manual modification is required to turn the old log files into valid JSON files.
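For migrating old files, a possible one-off conversion — assuming the old format wrote one JSON object per line rather than a single array (the function name and sample data below are illustrative, not part of the library):

```python
import json

def convert_log_to_json_array(text: str) -> list:
    """Parse one JSON object per non-empty line and collect them into a list."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]

# Two old-style entries, one object per line (invalid as a single JSON document)
old_log = '{"target": 1.0, "params": {"x": 2.0}}\n{"target": 0.5, "params": {"x": 3.0}}\n'

entries = convert_log_to_json_array(old_log)
new_log = json.dumps(entries)  # a single valid JSON array
```

Writing `new_log` back to disk would produce a file that any JSON parser (and the new loader) can read.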

Summary by CodeRabbit

  • New Features

    • Added a utility to load observations from one or multiple JSON log files into an existing optimizer. Supports file paths and iterables, skips duplicates when configured, respects constraints, and returns the updated optimizer.
  • Documentation

    • Updated logging function documentation with clearer formatting.


coderabbitai bot commented Aug 11, 2025

Walkthrough

Updates include a docstring edit in bayes_opt/logger.py and the addition of a new utility function load_logs in bayes_opt/util.py to import observations from JSON log files into a BayesianOptimization instance.

Changes

Cohort / File(s) Summary
Documentation update (logger)
bayes_opt/logger.py
Expanded log_optimization_step docstring with a JSON read/append/write snippet and minor type-annotation formatting; no executable code changes.
Log loading utility (util)
bayes_opt/util.py
Added load_logs to read one or more JSON log files, handle JSON errors, de-duplicate based on optimizer settings, and register observations (including constraint values when applicable). Included supporting imports.

Sequence Diagram(s)

sequenceDiagram
  participant User
  participant Util as util.load_logs
  participant FS as FileSystem
  participant JSON as json
  participant BO as BayesianOptimization

  User->>Util: load_logs(optimizer, logs)
  loop For each path in logs
    Util->>FS: open(path)
    FS-->>Util: file handle
    Util->>JSON: json.load(file)
    alt JSON decode error
      JSON-->>Util: error
      Util-->>Util: skip file
    else OK
      loop For each iteration in data
        Util->>BO: check duplicate (params)
        alt duplicate and not allowed
          BO-->>Util: exists -> skip
        else not duplicate
          Util->>BO: register(params, target, constraint_value?)
        end
      end
    end
  end
  Util-->>User: return optimizer
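The flow above can be sketched in code with a minimal stand-in for the optimizer; the stub class, the simplified load_logs body, and the file contents below are illustrative, not the library's actual implementation:

```python
import json
import tempfile
from pathlib import Path

class StubOptimizer:
    """Minimal stand-in mimicking the parts of BayesianOptimization that load_logs touches."""
    def __init__(self, allow_duplicate_points=False, is_constrained=False):
        self._allow_duplicate_points = allow_duplicate_points
        self.is_constrained = is_constrained
        self.observations = []

    def __contains__(self, params):
        return any(obs["params"] == params for obs in self.observations)

    def register(self, params, target, constraint_value=None):
        self.observations.append(
            {"params": params, "target": target, "constraint": constraint_value}
        )

def load_logs(optimizer, logs):
    """Sketch of the utility: read JSON arrays of iterations and register them."""
    if isinstance(logs, (str, Path)):
        logs = [logs]
    for log in logs:
        try:
            with Path(log).open("r") as f:
                file_data = json.load(f)
        except json.JSONDecodeError:
            continue  # skip files that are not valid JSON
        for iteration in file_data:
            if not optimizer._allow_duplicate_points and iteration["params"] in optimizer:
                continue  # skip duplicates when they are not allowed
            optimizer.register(
                params=iteration["params"],
                target=iteration["target"],
                constraint_value=iteration["constraint"] if optimizer.is_constrained else None,
            )
    return optimizer

# Write a small valid-JSON log and load it; the duplicate entry gets skipped
with tempfile.TemporaryDirectory() as tmp:
    log_path = Path(tmp) / "log.json"
    log_path.write_text(json.dumps([
        {"target": 1.0, "params": {"x": 2.0}},
        {"target": 1.0, "params": {"x": 2.0}},  # duplicate
    ]))
    opt = load_logs(StubOptimizer(), log_path)
```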

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Poem

A carrot of logs, a nibble of bytes,
I hop through JSON on moonlit nights.
New trails of params, targets in tow,
I stash them neatly, row by row.
With gentle thumps, I mark the way—
No duplicates here, just data at play. 🥕✨


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (1)
bayes_opt/util.py (1)

18-32: Clarify and correct the docstring (class name, accepted types, and return type)

The docstring is incomplete and references “BayesianOptimizer”. Tighten it up and describe Iterable inputs.

-    """Load previous ...
+    """Load observations from one or more JSON log files into an optimizer.
@@
-    optimizer : BayesianOptimizer
-        Optimizer the register the previous observations with.
+    optimizer : BayesianOptimization
+        Optimizer to register the previous observations with.
@@
-    logs : str or os.PathLike
-        File to load the logs from.
+    logs : str | os.PathLike | Iterable[str | os.PathLike]
+        Path or iterable of paths to JSON log files to load.
@@
-    The optimizer with the state loaded.
+    BayesianOptimization
+        The optimizer after loading observations.
     """
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 14786ff and 1ed36e9.

📒 Files selected for processing (2)
  • bayes_opt/logger.py (1 hunks)
  • bayes_opt/util.py (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
bayes_opt/util.py (2)
bayes_opt/bayesian_optimization.py (2)
  • BayesianOptimization (42-445)
  • register (178-208)
bayes_opt/target_space.py (3)
  • register (424-518)
  • params (149-156)
  • target (159-166)
🪛 Ruff (0.12.2)
bayes_opt/util.py

16-16: Undefined name BayesianOptimization

(F821)


17-17: Undefined name BayesianOptimization

(F821)

Comment on lines +274 to +285
            # Read current data
            with self._path.open("r") as f:
                fileData = json.load(f)

            # Append next data point
            fileData.append(data)

            # Writes content back to a file
            with self._path.open("w") as f:
                json.dumps(fileData)

    params_config: Mapping[str, ParamsType]

🛠️ Refactor suggestion

Remove stray JSON I/O pseudo-code from the docstring; it’s misleading and incorrect

This block references undefined identifiers (self._path, data), and json.dumps() is called without writing to the file. It also interrupts the Parameters section. If you want to document JSON logging, put it under the appropriate file-logger class, not ScreenLogger.

Apply this diff to drop the block:

-            # Read current data
-            with self._path.open("r") as f:
-                fileData = json.load(f)
-
-            # Append next data point
-            fileData.append(data)
-
-            # Writes content back to a file
-            with self._path.open("w") as f:
-                json.dumps(fileData)
-
🤖 Prompt for AI Agents
In bayes_opt/logger.py around lines 274 to 285, remove the stray JSON I/O
pseudo-code block (references to self._path, data, and the incorrect
json.dumps() call) from the ScreenLogger docstring so it no longer interrupts
the Parameters section; if JSON logging needs documenting, add a concise,
correct example under the dedicated file-logger class or module docs and, when
showing real code, use json.dump(fileData, f) to write to the file.
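As a hedged illustration of the corrected pattern (with a temporary file standing in for self._path), the write step should use json.dump, which serializes directly to the file handle:

```python
import json
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "log.json"  # stand-in for self._path
    path.write_text("[]")          # start from an empty JSON array

    data = {"target": 1.0, "params": {"x": 2.0}}  # illustrative data point

    # Read current data
    with path.open("r") as f:
        file_data = json.load(f)

    # Append next data point
    file_data.append(data)

    # Write content back to the file (json.dump, not json.dumps)
    with path.open("w") as f:
        json.dump(file_data, f)

    result = json.loads(path.read_text())
```

Note that json.dumps(file_data), as written in the docstring snippet, returns a string and discards it; nothing would ever reach the file.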

Comment on lines +5 to +13
import json
from collections.abc import Iterable
from os import PathLike
from pathlib import Path

import numpy as np

# from bayes_opt.bayesian_optimization import BayesianOptimization


⚠️ Potential issue

Fix undefined type reference and imports; address Ruff F821 and future-proof typing

Ruff flags BayesianOptimization as undefined. Import it under TYPE_CHECKING to avoid runtime deps/cycles. Also import NotUniqueError for duplicate handling and warnings for diagnostics.

+import warnings
 import json
 from collections.abc import Iterable
 from os import PathLike
 from pathlib import Path
 
 import numpy as np
-# from bayes_opt.bayesian_optimization import BayesianOptimization
+from typing import TYPE_CHECKING
+from bayes_opt.target_space import NotUniqueError
+
+if TYPE_CHECKING:
+    from bayes_opt.bayesian_optimization import BayesianOptimization
🤖 Prompt for AI Agents
In bayes_opt/util.py around lines 5 to 13, Ruff reports an undefined reference
to BayesianOptimization and you also need NotUniqueError and warnings for
duplicate handling/diagnostics; add "from typing import TYPE_CHECKING" and move
the BayesianOptimization import into an if TYPE_CHECKING: block so it’s only
used for type hints (avoids runtime deps/cycles), and add imports for
NotUniqueError (from bayes_opt.target_space) and warnings at
the top of the file so duplicate errors and warnings can be handled.

Comment on lines +33 to +55
    if isinstance(logs, (str, PathLike)):
        logs = [logs]

    for log in logs:
        try:
            with Path(log).open("r") as fil:
                fileData = json.load(fil)
        except json.JSONDecodeError:
            print(f"ERROR: JSON decode error when decoding '{log}'")
            continue

        for iteration in fileData:
            # Prevents duplicate points being registered when an exception can be raised
            if not optimizer._allow_duplicate_points and iteration["params"] in optimizer:
                continue

            optimizer.register(
                params=iteration["params"],
                target=iteration["target"],
                constraint_value=(iteration["constraint"] if optimizer.is_constrained else None),
            )

    return optimizer

🛠️ Refactor suggestion

Refactor duplicate detection and error handling; avoid invalid membership test and improve robustness

  • iteration["params"] in optimizer is unsafe unless BayesianOptimization implements __contains__. Rely on TargetSpace.register to detect duplicates and catch NotUniqueError.
  • Handle missing keys and constraint values gracefully.
  • Prefer warnings over prints; handle file I/O errors and non-list JSON roots.
  • Use snake_case variable names and explicit UTF-8.
     if isinstance(logs, (str, PathLike)):
         logs = [logs]
 
     for log in logs:
-        try:
-            with Path(log).open("r") as fil:
-                fileData = json.load(fil)
-        except json.JSONDecodeError:
-            print(f"ERROR: JSON decode error when decoding '{log}'")
-            continue
-
-        for iteration in fileData:
-            # Prevents duplicate points being registered when an exception can be raised
-            if not optimizer._allow_duplicate_points and iteration["params"] in optimizer:
-                continue
-
-            optimizer.register(
-                params=iteration["params"],
-                target=iteration["target"],
-                constraint_value=(iteration["constraint"] if optimizer.is_constrained else None),
-            )
+        path = Path(log)
+        try:
+            with path.open("r", encoding="utf-8") as fil:
+                file_data = json.load(fil)
+        except json.JSONDecodeError:
+            warnings.warn(f"Skipping '{path}': invalid JSON", RuntimeWarning)
+            continue
+        except OSError as e:
+            warnings.warn(f"Skipping '{path}': {e}", RuntimeWarning)
+            continue
+
+        if not isinstance(file_data, list):
+            warnings.warn(f"Skipping '{path}': expected a JSON array of iterations", RuntimeWarning)
+            continue
+
+        for iteration in file_data:
+            try:
+                params = iteration["params"]
+                target = iteration["target"]
+                constraint_value = iteration.get("constraint") if optimizer.is_constrained else None
+                optimizer.register(params=params, target=target, constraint_value=constraint_value)
+            except KeyError as e:
+                warnings.warn(f"Skipping malformed iteration in '{path}': missing key {e}", RuntimeWarning)
+                continue
+            except NotUniqueError:
+                # Duplicate point; skip when duplicates are not allowed
+                continue
+            except ValueError as e:
+                # e.g., missing constraint value when constrained
+                warnings.warn(f"Skipping iteration from '{path}': {e}", RuntimeWarning)
+                continue
 
     return optimizer


codecov bot commented Aug 11, 2025

Codecov Report

❌ Patch coverage is 26.31579% with 14 lines in your changes missing coverage. Please review.
✅ Project coverage is 96.68%. Comparing base (14786ff) to head (1ed36e9).

Files with missing lines    Patch %    Lines
bayes_opt/util.py           26.31%     14 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #588      +/-   ##
==========================================
- Coverage   97.80%   96.68%   -1.13%     
==========================================
  Files          10       10              
  Lines        1186     1205      +19     
==========================================
+ Hits         1160     1165       +5     
- Misses         26       40      +14     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@till-m (Member) commented Aug 12, 2025

Hey @Fdms-3741,

thanks for opening a PR, contributions are very welcome in this repository.

That being said, I'm a bit confused. Is this LLM-generated code? I'm reasonably sure that the state-saving produces valid JSON, however, even if not, this would not be the right way to fix the problem.
