A path planning system that uses monocular depth estimation to generate navigable paths for autonomous rovers, implemented on a Jetson Nano platform.
This project implements a vision-based path planning system using a single camera input, optimized for deployment on the NVIDIA Jetson Nano. The system processes video frames to generate 3D point clouds, analyzes terrain traversability, and plans optimal navigation paths while avoiding obstacles.
- Real-time depth estimation from a monocular camera on the Jetson Nano
- 3D point cloud generation and processing
- Cost-based terrain analysis
- A* path planning algorithm implementation
- Optimized for embedded deployment
The system consists of three main components:

1. **Depth Estimation and Point Cloud Generation**
   - Uses DepthAnything's ViT-S model for depth inference
   - Converts 2D depth maps to 3D point clouds using camera intrinsics
   - Filters and optimizes the point cloud for quality
   - Handles coordinate transformations and spatial mapping

2. **Terrain Analysis**
   - Implements grid-based terrain segmentation
   - Calculates traversability costs based on:
     - Slope climbing requirements
     - Obstacle height assessment
   - Combines both metrics into a single cost for path optimization
   - Generates comprehensive terrain accessibility maps

3. **Path Planning**
   - A* algorithm implementation for optimal path finding (see the sketch after this list)
   - Integrates terrain cost analysis for path selection
   - Avoids obstacles while minimizing energy costs
   - Provides efficient route planning in real-world environments
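As a rough illustration of how A* can combine geometric step length with the terrain costs described above, here is a minimal, self-contained sketch; the 8-connected grid, the cost convention, and the function name are illustrative assumptions, not the actual interface of `path_planner.py`:

```python
import heapq
import itertools
import math

def astar(cost_grid, start, goal):
    """A* search over a 2D terrain-cost grid (illustrative sketch).

    cost_grid[r][c] is the traversal cost of a cell (math.inf marks an
    impassable cell); start and goal are (row, col) tuples. Returns the
    path as a list of cells, or None if the goal is unreachable.
    """
    rows, cols = len(cost_grid), len(cost_grid[0])
    # 8-connected moves, each paired with its geometric step length.
    moves = [(dr, dc, math.hypot(dr, dc))
             for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]

    def h(cell):
        # Euclidean distance is admissible here: every step costs at least
        # its geometric length, since terrain costs are non-negative.
        return math.hypot(cell[0] - goal[0], cell[1] - goal[1])

    tie = itertools.count()  # tiebreaker so the heap never compares cells
    frontier = [(h(start), next(tie), 0.0, start, None)]
    best_g = {start: 0.0}
    parents = {}
    while frontier:
        _, _, g, cell, parent = heapq.heappop(frontier)
        if cell in parents:
            continue  # already expanded via a cheaper route
        parents[cell] = parent
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = parents[cell]
            return path[::-1]
        for dr, dc, length in moves:
            nxt = (cell[0] + dr, cell[1] + dc)
            if not (0 <= nxt[0] < rows and 0 <= nxt[1] < cols):
                continue
            # Step cost = geometric length + terrain cost of the next cell.
            ng = g + length + cost_grid[nxt[0]][nxt[1]]
            if math.isfinite(ng) and ng < best_g.get(nxt, math.inf):
                best_g[nxt] = ng
                heapq.heappush(frontier, (ng + h(nxt), next(tie), ng, nxt, cell))
    return None  # no traversable path exists
```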
The project is currently implemented as two separate processes:

1. **Point Cloud Generation** (`image_to_pointcloud.py`)
   - Handles image capture and processing
   - Performs depth estimation
   - Generates and saves point cloud data

2. **Path Planning** (`path_planner.py`)
   - Loads the processed point cloud data
   - Performs terrain analysis
   - Executes the path planning algorithm

Future work includes integrating these processes into a single pipeline for real-time operation.
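A rough sketch of how such a file-based handoff between the two scripts might look; the `.ply` filename and the use of Open3D here are assumptions for illustration, not necessarily what the scripts actually do:

```python
import numpy as np
import open3d as o3d

# --- end of image_to_pointcloud.py: save the generated cloud ---
points = np.random.rand(1000, 3)  # placeholder for the real XYZ points
cloud = o3d.geometry.PointCloud()
cloud.points = o3d.utility.Vector3dVector(points)
o3d.io.write_point_cloud("pointcloud.ply", cloud)  # assumed filename

# --- start of path_planner.py: load it back for terrain analysis ---
cloud = o3d.io.read_point_cloud("pointcloud.ply")
points = np.asarray(cloud.points)
```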
The system processes each video frame through these steps:

1. Depth estimation using the DepthAnything model
2. Point cloud generation using camera intrinsics (a back-projection sketch follows below)
3. Filtering and optimization of the point cloud data
*Figure: comparison between the original camera input and the generated depth map.*
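A minimal sketch of the back-projection step, using the standard pinhole camera model; the function name is hypothetical and the intrinsics in the usage note are illustrative placeholders, not this project's calibration. Note that monocular models such as DepthAnything predict relative depth, so a metric scale must be recovered before the resulting cloud is metrically meaningful:

```python
import numpy as np

def depth_to_pointcloud(depth, fx, fy, cx, cy):
    """Back-project an HxW depth map into an (N, 3) array of XYZ points."""
    h, w = depth.shape
    # Pixel coordinate grids: u runs along columns, v along rows.
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx   # pinhole model: X = (u - cx) * Z / fx
    y = (v - cy) * depth / fy   # pinhole model: Y = (v - cy) * Z / fy
    points = np.stack([x, y, depth], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]  # drop invalid zero-depth pixels

# Illustrative intrinsics for a 640x480 camera (not this project's values):
# cloud = depth_to_pointcloud(depth_map, fx=525.0, fy=525.0, cx=319.5, cy=239.5)
```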
The terrain analysis module implements two key cost functions to evaluate traversability (a code sketch follows the list):

**Slope cost**
- Fits a plane to each grid cell using least-squares regression
- Calculates the slope angle θ from the fitted plane
- If θ > θ_max (the maximum climbable angle), cost = infinity
- Otherwise, cost = M * g * L * sin(θ), where:
  - M = rover mass
  - g = gravitational acceleration
  - L = length of the fitted plane

**Obstacle cost**
- Calculates l_obs (the maximum obstacle height) in each cell
- If l_obs > l_max (the maximum traversable height), cost = infinity
- Otherwise, cost = M * g * l_obs
- Helps identify impassable obstacles while allowing traversal of minor terrain variations

The total cost for each cell is the sum of these two functions, creating a comprehensive traversability map.
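A compact sketch of how these two costs can be computed per grid cell. The mass and threshold values are placeholders, and reading l_obs as the tallest residual above the fitted plane is one plausible interpretation, not necessarily the project's exact definition:

```python
import numpy as np

G = 9.81                       # gravitational acceleration (m/s^2)
M = 10.0                       # rover mass in kg (assumed value)
THETA_MAX = np.radians(30.0)   # maximum climbable slope (assumed)
L_OBS_MAX = 0.15               # max traversable obstacle height in m (assumed)

def cell_cost(points):
    """Traversability cost of one grid cell from its (N, 3) XYZ points."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    # Least-squares plane fit: z = a*x + b*y + c.
    A = np.column_stack([x, y, np.ones_like(x)])
    coeffs, *_ = np.linalg.lstsq(A, z, rcond=None)
    a, b = coeffs[:2]

    # Slope cost: the plane's tilt from horizontal is arctan(sqrt(a^2 + b^2)).
    theta = np.arctan(np.hypot(a, b))
    if theta > THETA_MAX:
        return np.inf                        # steeper than the rover can climb
    L = np.ptp(points[:, :2], axis=0).max()  # horizontal extent of the cell
    slope_cost = M * G * L * np.sin(theta)

    # Obstacle cost: height of the tallest point above the fitted plane.
    l_obs = np.max(z - A @ coeffs)
    if l_obs > L_OBS_MAX:
        return np.inf                        # obstacle too tall to drive over
    return slope_cost + M * G * l_obs
```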
The system generates detailed 3D point clouds from the depth data:

*Figure: 3D point cloud visualization showing the spatial mapping of the environment.*

*Figure: overhead view of the point cloud.*

*Figure: labeled point cloud visualization.*
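Visualizations like these can be reproduced with a few lines of Open3D, assuming the point cloud was saved to a `.ply` file as in the handoff sketch above:

```python
import open3d as o3d

# "pointcloud.ply" is the assumed filename from the earlier sketch.
cloud = o3d.io.read_point_cloud("pointcloud.ply")
o3d.visualization.draw_geometries([cloud])  # opens an interactive viewer
```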
The system was evaluated using:
- KITTI dataset for depth estimation accuracy
- Real-world testing on various terrain types
- Performance benchmarking on Jetson Nano
Performance:

- Optimized for real-time processing on embedded hardware
- Balances accuracy with computational efficiency

Limitations:

- Limited by monocular depth estimation accuracy
- Processing speed constraints on the Jetson Nano
- Potential for improvement in extreme lighting conditions
- Based on research by Chen et al. (2023)
- Uses DepthAnything's ViT-S model
- Developed during a research internship at [Institution Name]