6D Pose Estimation with YOLO + Realsense on Jetson
Objective
In this lesson we move from 3D pose estimation (only position) to 6D pose estimation, meaning:
- Position → X, Y, Z
- Orientation → quaternion (rotation in space)
The goal is to estimate not only where the capsule is, but also how it is oriented in space.
We do this by combining:
- YOLO → object detection (2D bounding box)
- RealSense → depth
- PCL (Point Cloud Library) → plane segmentation
- RANSAC → robust plane fitting
At the end of the lesson you will:
- understand the architecture of the pipeline
- understand how plane segmentation gives orientation
- know how to publish 6D poses in ROS2
- be able to adapt this pipeline to your own application
This is a critical step if you want to go from detection to manipulation.
Code Explanation — What / Why / How
WHAT — System architecture
We now have two ROS2 packages inside the same workspace:
- `inference_running` → YOLO inference
- `sixd_pose_pcl` → 6D pose estimation using the point cloud
You must clone:
```bash
cd ~/ros2_ws/src
git clone https://github.com/LearnRoboticsWROS/sixd_pose_pcl.git
```
Pipeline:
```
RGB Image
   ↓
YOLO (inference_running)
   ↓
2D Bounding Boxes (/detected_objects/detections_2d)
   ↓  + Depth Image + Camera Info
sixd_pose_pcl (this node)
   ↓
Plane segmentation (RANSAC)
   ↓
6D Pose (PoseArray)
   ↓
Visualization (MarkerArray)
```
Key idea:
We use the 2D bounding box to extract a 3D region, then we estimate a plane inside that region, and the plane normal gives us orientation.
WHY — Why this approach works
The capsule has a flat surface.
If you can:
- extract 3D points from that surface
- fit a plane
- compute the normal
Then:
- centroid → gives position
- normal → gives orientation
This is much more robust than trying to estimate orientation directly from RGB.
This approach is widely used in industrial robotics.
HOW — Node structure
The node:
class SixDPoseEstimationPublisher : public rclcpp::Node
It subscribes to:
- /detected_objects/detections_2d (YOLO bounding boxes)
- /camera/aligned_depth_to_color/image_raw
- /camera/aligned_depth_to_color/camera_info
It publishes:
- /detected_objects/poses_6d (PoseArray)
- /detected_objects/axes_markers (RViz visualization)
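A minimal sketch of how such a node can wire up its I/O, using the topic names above. This is illustrative, not the package's actual code: the `Detection2DArray` message type from `vision_msgs` and the empty callback bodies are assumptions.

```cpp
#include <rclcpp/rclcpp.hpp>
#include <sensor_msgs/msg/image.hpp>
#include <sensor_msgs/msg/camera_info.hpp>
#include <vision_msgs/msg/detection2_d_array.hpp>
#include <geometry_msgs/msg/pose_array.hpp>
#include <visualization_msgs/msg/marker_array.hpp>

class SixDPoseEstimationPublisher : public rclcpp::Node {
public:
  SixDPoseEstimationPublisher() : Node("sixd_pose_estimation_publisher") {
    sub_det_ = create_subscription<vision_msgs::msg::Detection2DArray>(
        "/detected_objects/detections_2d", 10,
        [this](const vision_msgs::msg::Detection2DArray::SharedPtr msg) {
          (void)msg;  // per-detection processing happens here
        });
    sub_depth_ = create_subscription<sensor_msgs::msg::Image>(
        "/camera/aligned_depth_to_color/image_raw", 10,
        [this](const sensor_msgs::msg::Image::SharedPtr msg) {
          (void)msg;  // cache the latest depth image
        });
    sub_info_ = create_subscription<sensor_msgs::msg::CameraInfo>(
        "/camera/aligned_depth_to_color/camera_info", 10,
        [this](const sensor_msgs::msg::CameraInfo::SharedPtr msg) {
          (void)msg;  // cache the intrinsics fx, fy, cx, cy
        });
    pub_pose6d_ = create_publisher<geometry_msgs::msg::PoseArray>(
        "/detected_objects/poses_6d", 10);
    pub_markers_ = create_publisher<visualization_msgs::msg::MarkerArray>(
        "/detected_objects/axes_markers", 10);
  }

private:
  rclcpp::Subscription<vision_msgs::msg::Detection2DArray>::SharedPtr sub_det_;
  rclcpp::Subscription<sensor_msgs::msg::Image>::SharedPtr sub_depth_;
  rclcpp::Subscription<sensor_msgs::msg::CameraInfo>::SharedPtr sub_info_;
  rclcpp::Publisher<geometry_msgs::msg::PoseArray>::SharedPtr pub_pose6d_;
  rclcpp::Publisher<visualization_msgs::msg::MarkerArray>::SharedPtr pub_markers_;
};
```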
Core Blocks Explained
1. Bounding box → ROI
```cpp
BBoxXYXY bbox_xyxy;
bbox_xyxy.x1 = bx - 0.5 * bw;
bbox_xyxy.y1 = by - 0.5 * bh;
bbox_xyxy.x2 = bx + 0.5 * bw;
bbox_xyxy.y2 = by + 0.5 * bh;
```
We convert YOLO's center + width/height output into corner pixel coordinates (x1, y1, x2, y2).
This defines the Region Of Interest (ROI) where we will extract 3D points.
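In practice the ROI also has to be clamped to the image bounds, otherwise boxes touching the image border produce invalid pixel indices. A hypothetical helper (not from the package):

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical: corner-form box as above, and the result in integer pixels.
struct RoiPx { int x1, y1, x2, y2; };

RoiPx clampRoi(double x1, double y1, double x2, double y2,
               int width, int height) {
  RoiPx r;
  r.x1 = std::clamp(static_cast<int>(std::floor(x1)), 0, width - 1);
  r.y1 = std::clamp(static_cast<int>(std::floor(y1)), 0, height - 1);
  r.x2 = std::clamp(static_cast<int>(std::ceil(x2)), 0, width - 1);
  r.y2 = std::clamp(static_cast<int>(std::ceil(y2)), 0, height - 1);
  return r;
}
```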
2. Build point cloud from depth
auto cloud = buildCloudFromROI(...)
Inside:
```cpp
const double Z = 0.001 * dmm;
const double X = (u - cx) * Z / fx;
const double Y = (v - cy) * Z / fy;
```
This converts:
pixel + depth → 3D point
So we build a point cloud only inside the bounding box.
Important:
- we filter the depth range
- we subsample using `grid_step`
- we reject invalid depth
This step is equivalent to creating a local point cloud of the object.
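To make this concrete, here is a hedged sketch of what a `buildCloudFromROI`-style function can look like. The signature and parameter names are illustrative, not the package's actual API; the depth image is assumed to be 16-bit millimeters, as the RealSense aligned depth topic publishes.

```cpp
#include <pcl/point_cloud.h>
#include <pcl/point_types.h>
#include <opencv2/core.hpp>

// Illustrative sketch: back-project the depth pixels inside an ROI.
pcl::PointCloud<pcl::PointXYZ>::Ptr buildCloudFromROI(
    const cv::Mat& depth_mm,            // CV_16UC1, depth in millimeters
    int x1, int y1, int x2, int y2,     // ROI in pixel coordinates
    double fx, double fy, double cx, double cy,
    int grid_step, double z_min, double z_max) {
  pcl::PointCloud<pcl::PointXYZ>::Ptr cloud(new pcl::PointCloud<pcl::PointXYZ>);
  for (int v = y1; v <= y2; v += grid_step) {
    for (int u = x1; u <= x2; u += grid_step) {
      const uint16_t dmm = depth_mm.at<uint16_t>(v, u);
      if (dmm == 0) continue;                    // reject invalid depth
      const double Z = 0.001 * dmm;              // mm -> m
      if (Z < z_min || Z > z_max) continue;      // filter depth range
      pcl::PointXYZ p;                           // pinhole back-projection
      p.x = static_cast<float>((u - cx) * Z / fx);
      p.y = static_cast<float>((v - cy) * Z / fy);
      p.z = static_cast<float>(Z);
      cloud->points.push_back(p);
    }
  }
  cloud->width = cloud->points.size();
  cloud->height = 1;
  cloud->is_dense = false;
  return cloud;
}
```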
3. Adaptive sampling
```cpp
if (cloud->size() < min_pts_adaptive) {
  int denser = grid_step / 2;
  cloud = buildCloudFromROI(...);
}
```
If we don’t have enough points:
→ we sample more densely
Why?
Because RANSAC needs enough points to fit a plane reliably.
4. Plane segmentation (RANSAC)
bool ok = fitPlaneRansac(...)
RANSAC tries to fit a plane:
ax + by + cz + d = 0
Output:
coeff->values[0..3] → plane coefficients
If RANSAC fails:
estimateNormalPCA(...)
Fallback using PCA.
Why PCA?
Because even if RANSAC fails, PCA can estimate the dominant direction.
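A sketch of both steps using standard PCL APIs. Parameter values such as the 5 mm inlier threshold are illustrative, not the package's actual settings:

```cpp
#include <pcl/ModelCoefficients.h>
#include <pcl/PointIndices.h>
#include <pcl/common/pca.h>
#include <pcl/point_types.h>
#include <pcl/segmentation/sac_segmentation.h>

// Illustrative RANSAC plane fit. Caller allocates coeff and inliers, e.g.:
//   pcl::ModelCoefficients::Ptr coeff(new pcl::ModelCoefficients);
//   pcl::PointIndices::Ptr inliers(new pcl::PointIndices);
bool fitPlaneRansac(const pcl::PointCloud<pcl::PointXYZ>::Ptr& cloud,
                    pcl::ModelCoefficients::Ptr& coeff,
                    pcl::PointIndices::Ptr& inliers) {
  pcl::SACSegmentation<pcl::PointXYZ> seg;
  seg.setOptimizeCoefficients(true);
  seg.setModelType(pcl::SACMODEL_PLANE);
  seg.setMethodType(pcl::SAC_RANSAC);
  seg.setDistanceThreshold(0.005);   // 5 mm inlier band, tune for your sensor
  seg.setMaxIterations(200);
  seg.setInputCloud(cloud);
  seg.segment(*inliers, *coeff);     // coeff->values = {a, b, c, d}
  return !inliers->indices.empty();
}

// PCA fallback: the eigenvector with the smallest eigenvalue spans the
// direction of least variance, i.e. the plane normal.
Eigen::Vector3f estimateNormalPCA(
    const pcl::PointCloud<pcl::PointXYZ>::Ptr& cloud) {
  pcl::PCA<pcl::PointXYZ> pca;
  pca.setInputCloud(cloud);
  return pca.getEigenVectors().col(2);  // columns sorted by decreasing eigenvalue
}
```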
5. Normal direction
if (n.z() > 0.0) n = -n;
This is critical.
We enforce that the normal points toward the camera.
Why?
Because plane normals have two possible directions:
n or -n
Without this constraint, orientation would flip randomly.
6. Build reference frame
```
Z = normal
X = projection of camera X on plane
Y = Z × X
```
```cpp
R.col(0) = x_axis;
R.col(1) = y_axis;
R.col(2) = z_axis;
```
This creates a full 3D coordinate frame.
Then:
Eigen::Quaterniond q_new(R);
This gives orientation.
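Putting steps 5 and 6 together, a hedged sketch of the frame construction with Eigen (the function name is hypothetical):

```cpp
#include <Eigen/Dense>

// Illustrative: build a right-handed frame whose Z axis is the plane
// normal, using Gram-Schmidt against the camera X axis.
Eigen::Quaterniond frameFromNormal(const Eigen::Vector3d& n_in) {
  Eigen::Vector3d z_axis = n_in.normalized();
  if (z_axis.z() > 0.0) z_axis = -z_axis;        // point toward the camera

  // Project the camera X axis onto the plane to get a stable X.
  Eigen::Vector3d x_ref(1.0, 0.0, 0.0);
  Eigen::Vector3d x_axis = x_ref - z_axis * z_axis.dot(x_ref);
  if (x_axis.norm() < 1e-6)                      // normal parallel to camera X
    x_axis = Eigen::Vector3d(0.0, 1.0, 0.0);
  x_axis.normalize();

  Eigen::Vector3d y_axis = z_axis.cross(x_axis); // completes the frame (Z × X = Y)

  Eigen::Matrix3d R;
  R.col(0) = x_axis;
  R.col(1) = y_axis;
  R.col(2) = z_axis;
  return Eigen::Quaterniond(R);
}
```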
7. Pose creation
```
pose_raw.position = centroid
pose_raw.orientation = quaternion
```
So now we have a full 6D pose.
8. Stabilization
Same philosophy as the previous lesson, but extended to rotation. The relevant parameters:
- pos_alpha_
- rot_alpha_
- z_jump_thresh_
- hold_last_secs_
Position smoothing:
out.position = alpha * new + (1-alpha) * old
Rotation smoothing:
q_out = q_prev.slerp(rot_alpha_, q_meas)
SLERP = smooth interpolation between quaternions.
This is essential to avoid jitter in orientation.
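A sketch of one smoothing step combining both filters. The function name and signature are illustrative; `pos_alpha` and `rot_alpha` are assumed to live in [0, 1], where higher means trusting the new measurement more.

```cpp
#include <Eigen/Geometry>
#include <geometry_msgs/msg/pose.hpp>

geometry_msgs::msg::Pose smoothPose(const geometry_msgs::msg::Pose& prev,
                                    const geometry_msgs::msg::Pose& meas,
                                    double pos_alpha, double rot_alpha) {
  geometry_msgs::msg::Pose out;
  // Exponential moving average on position.
  out.position.x = pos_alpha * meas.position.x + (1.0 - pos_alpha) * prev.position.x;
  out.position.y = pos_alpha * meas.position.y + (1.0 - pos_alpha) * prev.position.y;
  out.position.z = pos_alpha * meas.position.z + (1.0 - pos_alpha) * prev.position.z;

  // SLERP on orientation.
  Eigen::Quaterniond q_prev(prev.orientation.w, prev.orientation.x,
                            prev.orientation.y, prev.orientation.z);
  Eigen::Quaterniond q_meas(meas.orientation.w, meas.orientation.x,
                            meas.orientation.y, meas.orientation.z);
  if (q_prev.dot(q_meas) < 0.0)                 // take the shortest arc
    q_meas.coeffs() = -q_meas.coeffs();
  const Eigen::Quaterniond q_out = q_prev.slerp(rot_alpha, q_meas);
  out.orientation.w = q_out.w();
  out.orientation.x = q_out.x();
  out.orientation.y = q_out.y();
  out.orientation.z = q_out.z();
  return out;
}
```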
9. Tracking
match_or_create_track_unique(...)
We use IoU to track objects across frames.
Each object gets a stable ID.
Why?
Because stabilization must be applied per object.
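IoU (Intersection over Union) measures how much two boxes overlap. A minimal sketch, reusing the corner-form `BBoxXYXY` layout from above (the double fields are an assumption):

```cpp
#include <algorithm>

// Corner-form box as used above: (x1, y1) top-left, (x2, y2) bottom-right.
struct BBoxXYXY { double x1, y1, x2, y2; };

double iou(const BBoxXYXY& a, const BBoxXYXY& b) {
  const double iw = std::max(0.0, std::min(a.x2, b.x2) - std::max(a.x1, b.x1));
  const double ih = std::max(0.0, std::min(a.y2, b.y2) - std::max(a.y1, b.y1));
  const double inter = iw * ih;                       // overlap area
  const double area_a = (a.x2 - a.x1) * (a.y2 - a.y1);
  const double area_b = (b.x2 - b.x1) * (b.y2 - b.y1);
  return inter / (area_a + area_b - inter + 1e-9);    // in [0, 1]
}
```

A detection is assigned to the existing track with the highest IoU above a threshold; otherwise a new track (and ID) is created.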
10. Publishing PoseArray
pub_pose6d_->publish(pose_arr);
Each detection → one Pose
So:
PoseArray = multiple objects
Each pose contains:
- position (X, Y, Z)
- orientation (quaternion)
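A hedged sketch of the publishing step (the helper signature and the frame_id are assumptions, not the package's actual values):

```cpp
#include <vector>
#include <rclcpp/rclcpp.hpp>
#include <geometry_msgs/msg/pose_array.hpp>

// Illustrative: collect one stabilized Pose per tracked object and publish.
void publishPoses(
    const rclcpp::Publisher<geometry_msgs::msg::PoseArray>::SharedPtr& pub,
    const std::vector<geometry_msgs::msg::Pose>& poses,
    const rclcpp::Time& stamp) {
  geometry_msgs::msg::PoseArray pose_arr;
  pose_arr.header.stamp = stamp;
  pose_arr.header.frame_id = "camera_color_optical_frame";  // assumed frame
  pose_arr.poses = poses;  // one Pose (position + quaternion) per object
  pub->publish(pose_arr);
}
```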
11. Visualization markers
appendOrientedAxesMarkers(...)
We publish 3 arrows:
- red → X
- green → Y
- blue → Z (normal)
This is extremely useful to visually verify orientation.
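RViz ARROW markers point along their local +X axis, so a helper in the spirit of `appendOrientedAxesMarkers` draws each axis by rotating the object pose accordingly. An illustrative sketch for a single arrow (frame_id and scales are assumptions):

```cpp
#include <geometry_msgs/msg/pose.hpp>
#include <visualization_msgs/msg/marker.hpp>

visualization_msgs::msg::Marker makeAxisArrow(
    const geometry_msgs::msg::Pose& pose, int id,
    float r, float g, float b) {
  visualization_msgs::msg::Marker m;
  m.header.frame_id = "camera_color_optical_frame";  // assumed frame
  m.ns = "axes";
  m.id = id;
  m.type = visualization_msgs::msg::Marker::ARROW;
  m.action = visualization_msgs::msg::Marker::ADD;
  m.pose = pose;        // arrow extends along this pose's local +X axis
  m.scale.x = 0.08;     // arrow length (m)
  m.scale.y = 0.008;    // arrow width
  m.scale.z = 0.008;    // arrow height
  m.color.r = r; m.color.g = g; m.color.b = b; m.color.a = 1.0f;
  return m;
}
```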
Demo 1 — Full pipeline execution
Step 1 — Run container with PCL support
Use your custom Docker image:
```bash
docker run -it --rm \
  --runtime=nvidia \
  --net=host \
  --ipc=host \
  -e NVIDIA_DRIVER_CAPABILITIES=all \
  -e ROS_DOMAIN_ID=10 \
  -v ~/models:/models \
  -v ~/runs:/runs \
  -v ~/ros2_ws:/workspace/ros2_ws \
  learnroboticswros/ros2-ultralytics-trt:pcl-snapshot
```
This image includes:
- ROS2
- Ultralytics
- OpenCV
- PCL
Without PCL, this node cannot run.
Step 2 — Run YOLO node
```bash
ros2 run inference_running three_pose_estimation_bbox_publisher --ros-args \
  -p output_topic:=/camera/color/pose_estimation_stable \
  -p detection_topic:=/detected_objects/detections_2d \
  -p model_path:=/models/capsules/best.pt
```
This publishes:
/detected_objects/detections_2d
Step 3 — Run 6D node
```bash
ros2 run sixd_pose_pcl sixd_pose_estimation_publisher --ros-args \
  -p detections_topic:=/detected_objects/detections_2d \
  -p depth_topic:=/camera/aligned_depth_to_color/image_raw \
  -p cam_info_topic:=/camera/aligned_depth_to_color/camera_info \
  -p pose6d_topic:=/detected_objects/poses_6d \
  -p markers_topic:=/detected_objects/axes_markers
```
Step 4 — Visualize
Use RViz or:
ros2 topic echo /detected_objects/poses_6d
You will see:
position + quaternion
In RViz you will see axes:
- orientation changes as capsule rotates
- Z axis aligns with plane normal
Key observation:
The normal always points toward the camera, which confirms that the orientation is consistent.
Demo 2 — Real setup and constraints
Now we move to the real setup.
Key requirements:
1. Camera must be fixed
If the camera moves:
→ pose estimation becomes inconsistent
So the camera must be rigidly mounted relative to base_link.
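One common way to express that fixed mounting in ROS2 is a static transform from base_link to the camera frame; for example (all numbers are placeholders for your actual calibration):

```bash
ros2 run tf2_ros static_transform_publisher \
  --x 0.5 --y 0.0 --z 0.8 \
  --roll 0.0 --pitch 1.57 --yaw 0.0 \
  --frame-id base_link --child-frame-id camera_link
```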
2. Correct working distance
If you move too close:
- depth becomes invalid
- plane segmentation fails
If too far:
- resolution decreases
So you must find a working range.
3. Lighting matters
Industrial environments introduce:
- reflections
- shadows
- changing illumination
These affect:
- YOLO detection
- depth quality
4. Orientation behavior
When you rotate the capsule:
- the plane normal changes
- the axes rotate accordingly
This is the key validation:
if axes rotate correctly → 6D pose estimation is working
Key Takeaways
You extended your perception pipeline from 3D to 6D.
You learned how to:
- use YOLO for 2D detection
- extract ROI from bounding box
- build a point cloud from depth
- use RANSAC to fit a plane
- extract normal and centroid
- build a full 3D orientation
- stabilize position and rotation
- publish PoseArray in ROS2
- visualize orientation with markers
Most important insight:
6D pose estimation is not magic. It is geometry + filtering + good assumptions about the object.
You also learned how the architecture is modular:
- detection node
- 6D estimation node
- downstream robot node
This separation is fundamental if you want to scale your system.
Now you have a perception pipeline that can be directly connected to:
- inverse kinematics
- motion planning
- pick-and-place execution
In the next step, you can transform these poses into the robot frame and close the loop with manipulation.