Skip to content

Conversation

@seashoo
Copy link

@seashoo seashoo commented May 5, 2025

Motivation

Hi! I'm Sahran Ashoor, an undergraduate research assistant working for the Uncertainty Quantification Lab at the University of Houston. I work under Dr. Ruda Zhang and Taiwo Adebiyi, both of whom having already spoken with Max Balandat regarding incorporating a rebase of botorch/sampling/pathwise (Largely written by James. T. Wilson). The changes included in this pull request are my best attempt at faithfully completing the change logs I was provided (product_kernel_diff.txt).

Have you read the Contributing Guidelines on pull requests?

Yes!

Project Overview

The primary goal was to make the original codebase by Wilson compatible with the latest BoTorch version. To achieve this, we used the original source codes and test suites, which initially revealed several incompatibility issues. Our main contribution involved carefully rebasing Wilson's code while preserving the logic for pathwise sampling. Importantly, all changes were confined to the botorch/sampling/pathwise directory to ensure a seamless integration, passing both local pathwise test suites and BoTorch's global test suites via GitHub workflows.

In terms of code logic, we relied on Wilson's unit tests for prior, updates, and posterior sampling, which we believe are sufficient to validate the correctness of the implementation. However, we welcome your feedback on this approach, and would appreciate any suggestions for additional tests or example scripts to further confirm the robustness of the changes. We are open to collaborating further on this effort.

Test Plan

(Write your test plan here. If you changed any code, please provide us with clear instructions on how you verified your changes work. Bonus points for screenshots and videos!)

The entirety of the testing suite was ran through pytest. Through additional verification we've found that the logic may be offset, but we're hoping to work with you all and further validate these changes under the discretion of Dr. Zhang. Expect further communications directly from my lab that will provide more insight into the rebase.

Related PRs

(If this PR adds or changes functionality, please take some time to update the docs at https://github.com/pytorch/botorch, and link to your PR here.)

N/A

@facebook-github-bot facebook-github-bot added the CLA Signed Do not delete this pull request or issue due to inactivity. label May 5, 2025
@codecov
Copy link

codecov bot commented May 5, 2025

Codecov Report

❌ Patch coverage is 99.62217% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 99.93%. Comparing base (0290e3d) to head (ebe03af).
⚠️ Report is 12 commits behind head on main.

Files with missing lines Patch % Lines
botorch/utils/types.py 66.66% 3 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##              main    #2838      +/-   ##
===========================================
- Coverage   100.00%   99.93%   -0.07%     
===========================================
  Files          216      219       +3     
  Lines        20211    20722     +511     
===========================================
+ Hits         20211    20709     +498     
- Misses           0       13      +13     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@Balandat
Copy link
Contributor

Balandat commented May 6, 2025

Thanks @seashoo for the PR - this is a big one! It'll take me a bit of time to review this in detail, I plan to do a first higher-level pass this week.

Through additional verification we've found that the logic may be offset

What exactly does this mean?

@TaiwoAdebiyi23
Copy link

Thanks @seashoo for the PR - this is a big one! It'll take me a bit of time to review this in detail, I plan to do a first higher-level pass this week.

Through additional verification we've found that the logic may be offset

What exactly does this mean?

Hi @Balandat,

Thanks for the response! We've included a more detailed Project Overview section in the pull request description to clarify our validation approach. Specifically, we utilized the existing unit test files, which cover prior, updates, and posterior sampling, and ensured that all tests passed as part of this rebase. While these tests are comprehensive, we welcome any additional guidance you might have on further validating the code's robustness.

@Balandat Balandat changed the title Rebase for botorch/sampling/pathwise - Dr. Ruda Zhang, Taiwo Adebiyi, Sahran Ashoor - University of Houson Refactor botorch/sampling/pathwise and add support for product kernels May 18, 2025
Copy link
Contributor

@Balandat Balandat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went over the main code in the PR in detail; overall this looks great, thanks for the effort and patching some gaps (e.g. _gaussian_update_ModelListGP). I have not reviewed the testing code in detail, but can do that after the next pass.

The key things to address are:

  1. Some additions in the patch file were not included here - curious to understand why (and if this was an oversight let's add them in - I pointed out which ones).
  2. Currently the tests still have some coverage gaps based on the codecov report here. Please add some test cases to also cover the currently uncovered lines.

Comment on lines +164 to +168
task_index = (
num_inputs + model._task_feature
if model._task_feature < 0
else model._task_feature
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better to do this so we can always assume that it's positive and we don't have to do this custom handling. But I think this is ok for now as is, that change is beyond the scope of this PR.

Suggested change
task_index = (
num_inputs + model._task_feature
if model._task_feature < 0
else model._task_feature
)
# TODO: Changed `MultiTaskGP` to normalize the task feature in its constructor.
task_index = (
num_inputs + model._task_feature
if model._task_feature < 0
else model._task_feature
)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great suggestion- I would be willing to come back to this in a separate PR!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be resolved in #2908!

@seashoo
Copy link
Author

seashoo commented Jul 29, 2025

@Balandat, thank you for the comprehensive review and detailed feedback on Wilson's product_kernel.diff implementation! My apologies for the time it's taken to get back to you- some of the implementations took quite some time to fully realize and I've been balancing the work alongside my current internship.

I'm excited to collaborate with you at a faster pace now that I've freed up time! If you have questions regarding any of my specific implementations, feel free to ask in a reply to any of the comments here- I'll be able to communicate much more swiftly now. I've went ahead and resolved some of the upstream merge conflicts that appeared while I was away, and I've also filled up the code coverage gaps as you've asked.

Here's a quick summary of the major changes implemented to address your concerns:

Mathematical Issues Resolved

Product Kernel Implementation

Completely redesigned the product_feature_generator which was failing with relative errors around 0.75 vs tolerance ~0.094

  • Separated finite-dimensional from infinite-dimensional sub-kernels
  • Used Hadamard products for infinite-dimensional kernel combinations
  • Applied outer products to merge finite-dimensional with combined infinite-dimensional kernels
  • Automatically enabled cosine_only=True for multiple infinite-dimensional kernels to avoid problematic tensor products

Transform System Overhaul

Fixed scaling and coordination issues across the transform pipeline

  • SineCosineTransform: Added conditional logic instead of rigid constant scaling
  • Parameter Passing: Improved explicit num_ambient_inputs handling vs complex kwargs manipulation
  • Transform Chaining: Enhanced append_transform utility to properly handle None cases

Architectural Improvements

Feature Map Redesign

Built comprehensive feature map architecture

  • DirectSumFeatureMap: Rewrote raw_output_shape method to handle mixed dimensionality without MagicMock detection
  • SparseDirectSumFeatureMap: Implemented for completeness, available for manual use
  • HadamardProductFeatureMap/OuterProductFeatureMap: Enhanced for proper kernel composition

Code Organization

Restructured from monolithic utils.py into modular package

  • Split into dedicated helpers, mixins, and transforms modules
  • Improved dispatcher system with specific return types (MultitaskKernelFeatureMap, DirectSumFeatureMap, etc.)
  • Fixed sub_kernels parameter usage for LCM kernel compatibility

Code Quality Enhancements

Modern Python Standards

  • Adopted PEP 604 union syntax (A | B vs Union[A, B]) throughout pathwise directory
  • Replaced large comment blocks with Google-style docstrings
  • Cleaned up redundant imports and unused attributes

Type Safety

Enhanced type annotations and return type specificity for better IDE support

Technical details regarding the issues + approaches taken in response to your suggestions are further addressed in my replies!

@Balandat
Copy link
Contributor

Balandat commented Aug 8, 2025

Thanks for all the updates, @seashoo. Sorry for the delay on the review; I've been traveling for a conference and taking some time off. I should be able to provide a detailed review next week.

@facebook-github-bot
Copy link
Contributor

@hvarfner has imported this pull request. If you are a Meta employee, you can view this in D80169068.

Copy link
Contributor

@hvarfner hvarfner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @seashoo for the PR! I added some comments and questions too.

num_random_features: int,
num_inputs: int | None = None,
random_feature_scale: float | None = None,
cosine_only: bool = False,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not aware of the ProductKernel dynamic with cosines here. @seashoo do you have a reference to share on it? It'd be much appreciated!

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @hvarfner, I believe @seashoo followed the direct implementation from the original patch provided by Balandat, likely implemented by Wilson to make computation tractable for product of infinite dimensional kernels. By enforcing cosine_only == True, we ensure that the feature map takes the element-wise product of the features. Otherwise, we would need to take the tensor product of each pair of sine and cosine features.

I don't think we have a specific reference that comments on this edge case, but as you already know, cosine_only specifies whether to use cosine features with a random phase, instead of paired sine and cosine features, as described in Rahimi & Recht (2007) on Random Fourier Features (RFF) and Sutherland (2015) on RFF error bounds. I'll let you know if I come across any further references that clarify this implementation.

self.input_transform = input_transform
self.output_transform = output_transform

def forward(self, x: Tensor, **kwargs: Any) -> Tensor:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This forward looks really impressive, but I honestly cannot tell what is going on. Can you add some comments? Specifically L133 onward.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback- I’ve added in-line comments to DirectSumFeatureMap.forward (from the tiling logic through the final concatenation) to clarify:

  • Why I collect/scale individual feature blocks.
  • How the tiling & broadcasting works for lower-dimensional feature maps.
  • The rationale behind the rescaling.
  • How the multi_index trick avoids extra allocations.

Let me know if any sections still feel unclear or if you’d like similar commentary elsewhere.


@property
def raw_output_shape(self) -> Size:
if not self:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When does this occur?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not self only happens when the DirectSumFeatureMap is empty. That can be because you:

  • Purposely start with an empty container and plan to append feature-maps later, or
  • Deleted the last entry and the list is now length-zero.

We hit this in unit tests that build DirectSumFeatureMap([]) to make sure the class doesn’t crash in that edge case. Returning Size([]) just keeps the object in a sane, queryable state (so output_shape, batch_shape, etc. still work) until real feature maps are added. I’ve added a clarifying comment!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Terrific, thanks!

return torch.concat(blocks, dim=-1)

@property
def raw_output_shape(self) -> Size:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if I'm reading this correctly, raw_output_shape returns the largest broadcastable shape across all sub-kernels? How does it differ from torch.broadcast_shapes?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

raw_output_shape does two things in one go:

  1. For every sub-map it first figures out the shape you would get after tiling / broadcasting inside forward (e.g. when a 1-D feature map has to be expanded to align with a 2-D one).
  2. It then concatenates those shapes along the feature dimension, so the final size of the last axis is the sum of the sub-maps’ feature counts.

torch.broadcast_shapes by itself only answers “what common shape can all these tensors be viewed as without copying?”. It never alters the size of any dimension- especially not the last one.

In our case we need to grow the last dimension because we’re gluing feature vectors together; that’s why raw_output_shape can’t be expressed as a single torch.broadcast_shapes call.

block = block.to_dense() if isinstance(block, LinearOperator) else block
block = block if block.is_sparse else block.to_sparse()
else:
multi_index = (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A comment or two would be really helpful here too!

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added comments to the sparse branch!

# containing data_covar_module * task_covar_module
from gpytorch.kernels import ProductKernel

if isinstance(model.covar_module, ProductKernel):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The MTGPs (including the fully Bayesian variant) now use a ProductKernel by definition, so this check and the conditional logic should be obsolete.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And if there are some edge cases where we don't adhere to the standard ProductKernel(IndexKernel, SomeOtherKernel) where both have the expected active_dims, I think it's best to raise an error!

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the nudge- both spots (prior_samplers.py and update_strategies.py) now follow the MTGP contract precisely. If a user hand-builds a ProductKernel and forgets active_dims on the data part, we add them so downstream helpers don’t error.

Comment on lines +164 to +168
task_index = (
num_inputs + model._task_feature
if model._task_feature < 0
else model._task_feature
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be resolved in #2908!

# containing data_covar_module * task_covar_module
from gpytorch.kernels import ProductKernel

if isinstance(model.covar_module, ProductKernel):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once again, this conditional logic should not be needed anymore!

INF_DIM_KERNELS: Tuple[Type[Kernel], ...] = (
kernels.MaternKernel,
kernels.RBFKernel,
kernels.MultitaskKernel,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is MultiTaskKernel necessarily in here? Shouldn't IndexKernel be here too in that case?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

INF_DIM_KERNELS is the “shortcut” list for kernels that are always treated as infinite-dimensional when deciding whether we need random Fourier features etc. RBFKernel and MaternKernel are on the list because their exact feature map is infinite. IndexKernel itself is finite-dimensional (exactly num_tasks features), so we deliberately keep it out of the list.

@seashoo
Copy link
Author

seashoo commented Oct 15, 2025

Hello! I just wanted to follow up on the progress of this PR. I have added a few responses to @hvarfner's comments regarding clarifications + adjusting our logic in respect to the MTGP's use of ProductKernal as a definition.

There's a small merge conflict listed on the PR, but merging with BoTorch's upstream is causing dependency issues with GPyTorch imports on my local. I will investigate this issue further and commit the working upstream in the coming days.

@Balandat
Copy link
Contributor

Thanks!

merging with BoTorch's upstream is causing dependency issues with GPyTorch imports on my local. I will investigate this issue further and commit the working upstream in the coming days.

@seashoo when you say dependency issues, do you mean in terms of installing the package or in terms of causing (e.g. import) failures? There has been a new gpytorch release just recently that you may need to update to: https://github.com/cornellius-gp/gpytorch/releases/tag/v1.14.2

Copy link
Contributor

@hvarfner hvarfner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed the changes, and these look good in isolation - just some minor nits. I'll go through thing again, given that it was a while ago that we last looked at it.

import warnings
warnings.warn(
f"MultiTaskGP with non-ProductKernel detected ({type(model.covar_module)}). "
"Consider using ProductKernel(IndexKernel, SomeOtherKernel) for better compatibility.",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't look like the ordering matters for the implementation, but we reverse the order of the two: ProductKernel(SomeKernel, IndexKernel)

import warnings
warnings.warn(
f"MultiTaskGP with non-ProductKernel detected ({type(model.covar_module)}). "
"Consider using ProductKernel(IndexKernel, SomeOtherKernel) for better compatibility.",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here. Can you change the order of the kernels in the error MSG to avoid confusion?

if data_kernel is None:
raise ValueError(
f"Could not identify data kernel from ProductKernel. "
"MTGPs should follow the standard ProductKernel(IndexKernel, SomeOtherKernel) pattern."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And here

Copy link
Contributor

@hvarfner hvarfner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great, extraordinary work. Seems like the branch is not entirely in sync, given the changes in optim. Thanks a lot.

I think the get_train_inputs and get_train_targets methods and dispatching logic is helpful in general, and should probably be located models/utils/helpers.py. If it's not too much work, it would be nice to have it there instead.

return next(iter(dtypes)) if dtypes else None


class DirectSumFeatureMap(FeatureMap, ModuleListMixin[FeatureMap]):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's fine to leave as is!

self.output_transform = output_transform

def forward(self, x: Tensor) -> Tensor:
out = x @ self.weight.transpose(-2, -1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I missed this initially. Why is the unsqueeze now needed?

def batch_shape(self) -> Size:
return self.kernel.batch_shape
def raw_output_shape(self) -> Size:
return self.kernel.raw_var.shape[-1:]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could see the raw_var not always being on the IndexKernel, or have it be optional - we're looking into that ATM. Can you make it depend on the covar_matrix instead?

> output = (features @ self.weight.unsqueeze(-1)).squeeze(-1)
> ndim = len(self.feature_map.output_shape)
> if ndim > 1: # sum over the remaining feature dimensions
! output = einsum(f"...{ascii_letters[:ndim - 1]}->...", output)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really cool way of doing the summing, but I had issues understanding what was going on. Please go with
output = output.sum(dim=list(range(-ndim + 1, 0)))

or similar.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed Do not delete this pull request or issue due to inactivity.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants