Implement load_and_transform_depth_data #134

Open
OlafBraakman opened this issue Feb 4, 2025 · 0 comments

OlafBraakman commented Feb 4, 2025

Issues #122, #14, #69, and #121 report that the load_and_transform_depth function is not implemented.

I am raising this issue to implement the following data preprocessing steps in a PR, as they yield the reported 35% zero-shot classification accuracy for SUN-RGBD depth-only.

Important details for the scene classification task for SUNRGBD:

Scene subset:
The classification task only considers the following classes:

```python
SCENES = [
    'bathroom', 'bedroom', 'classroom', 'computer_room', 'conference_room',
    'corridor', 'dining_area', 'dining_room', 'discussion_area',
    'furniture_store', 'home_office', 'kitchen', 'lab', 'lecture_theatre',
    'library', 'living_room', 'office', 'rest_space', 'study_space',
]
```
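For context, zero-shot scene classification over this subset amounts to nearest-neighbour matching between depth embeddings and text-prompt embeddings. A minimal sketch, assuming the embeddings have already been extracted (the function name and tensor shapes here are my own, not from the codebase):

```python
import torch
import torch.nn.functional as F


def zero_shot_accuracy(depth_emb: torch.Tensor,
                       text_emb: torch.Tensor,
                       labels: torch.Tensor) -> float:
    """depth_emb: (N, D) depth embeddings; text_emb: (C, D) embeddings of
    one prompt per class in SCENES; labels: (N,) ground-truth class indices."""
    depth_emb = F.normalize(depth_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    # Predict the class whose prompt embedding has the highest cosine similarity
    preds = (depth_emb @ text_emb.T).argmax(dim=-1)
    return (preds == labels).float().mean().item()
```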

To reproduce the SUNRGBD results, one has to convert the raw depth data to standardized disparity in the following steps:

1. Convert raw depth (uint16) to meters following the official SUN RGBD toolbox (read3dPoints.m):

```python
import cv2
import numpy as np

depth = cv2.imread(depth_file, cv2.IMREAD_UNCHANGED)
# Circular 3-bit right shift of the uint16 values, then millimeters -> meters
depth = ((depth >> 3) | (depth << 13)).astype(np.float32) / 1000.0
depth[depth > 8] = 8  # clip depth at 8 meters
```
2. Convert depth to disparity using the correct camera intrinsics, following the response of @imisra, with a different baseline for each camera. The focal length for each sample can be obtained from the intrinsics.txt file.

```python
from pathlib import Path  # optional, I just used pathlib


def get_baseline(path: str) -> float:
    """Stereo baseline (in meters) for each SUN RGB-D sensor."""
    if "kv1" in path:
        return 0.075
    elif "kv2" in path:
        return 0.075
    elif "realsense" in path:
        return 0.095
    elif "xtion" in path:
        return 0.095  # guessed based on the 18 cm length of the ASUS Xtion v1
    else:
        raise Exception(f"No baseline found for path: {path}")


focal_path = Path(depth_file).parents[1] / "intrinsics.txt"
focal_length = float(focal_path.read_text().strip().split()[0])
baseline = get_baseline(depth_file)
disparity = baseline * focal_length / depth
```
3. Standardize the disparity by computing the mean and std of the disparity values across the training split. I found these values with the compute_depth_mean_std implementation from RGBD-Seg dataset_base.py; a sketch of that computation follows below.
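
For completeness, this is roughly what that computation does (my own sketch, which may differ in detail from the RGBD-Seg implementation; `disparities` stands in for an iterable over the training-split disparity maps):

```python
import numpy as np


def compute_disparity_mean_std(disparities):
    """Mean/std over all valid (finite, non-zero) disparity pixels.
    `disparities` is an iterable of float32 arrays from the training split."""
    total = total_sq = count = 0.0
    for disp in disparities:
        valid = disp[np.isfinite(disp) & (disp > 0)].astype(np.float64)
        total += valid.sum()
        total_sq += (valid ** 2).sum()
        count += valid.size
    mean = total / count
    std = np.sqrt(total_sq / count - mean ** 2)
    return mean, std
```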

This yields the following mean and std values:
mean: 24.82968
std: 14.40078

These can be used for normalization (depending on the raw or refined depth mode) as follows (based on preprocessing.py Normalize):

```python
import torchvision

if self._depth_mode == 'raw':
    depth_0 = depth == 0
    depth = torchvision.transforms.Normalize(
        mean=24.82968, std=14.40078)(depth)
    # set invalid values back to zero again
    depth[depth_0] = 0
else:
    depth = torchvision.transforms.Normalize(
        mean=24.82968, std=14.40078)(depth)
```

Evaluating over the test split with the above approach yields 35.2% depth-only accuracy.
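
Putting the three steps together, a rough sketch of what load_and_transform_depth_data could look like (this is my own composition of the snippets above, reusing get_baseline from step 2; resizing/cropping to the model input size is omitted):

```python
from pathlib import Path

import cv2
import numpy as np
import torch

DISPARITY_MEAN = 24.82968
DISPARITY_STD = 14.40078


def load_and_transform_depth_data(depth_file: str) -> torch.Tensor:
    # Step 1: raw uint16 depth -> meters (circular 3-bit shift, clip at 8 m)
    depth = cv2.imread(depth_file, cv2.IMREAD_UNCHANGED)
    depth = ((depth >> 3) | (depth << 13)).astype(np.float32) / 1000.0
    depth[depth > 8] = 8

    # Step 2: depth -> disparity with per-sensor baseline and focal length
    focal_path = Path(depth_file).parents[1] / "intrinsics.txt"
    focal_length = float(focal_path.read_text().strip().split()[0])
    invalid = depth == 0
    with np.errstate(divide="ignore"):
        disparity = get_baseline(depth_file) * focal_length / depth

    # Step 3: standardize with the training-split statistics ('raw' mode:
    # invalid pixels are set back to zero after normalization)
    disparity = (disparity - DISPARITY_MEAN) / DISPARITY_STD
    disparity[invalid] = 0
    return torch.from_numpy(disparity).unsqueeze(0)  # (1, H, W)
```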

TODO: Create a PR

OlafBraakman changed the title from Implement load_and_transform_depth to Implement load_and_transform_depth_data on Feb 4, 2025