DoughNet 🍩

A Visual Predictive Model for Topological Manipulation of Deformable Objects

Dominik Bauer 1, Zhenjia Xu 1,2, Shuran Song 1,2
1 Columbia University, 2 Stanford University

tl;dr Our predictive model enables planning of robotic manipulation for geometrical deformation and topological change of elastoplastic objects; taking only a single RGBD observation as input, we select the best tool, its pose and opening width to recreate the desired robot- or human-made goal shape.

Paper · Code · Dataset


Planning Topological Manipulation

Given a partial RGBD observation of (held-out) objects and a set of (held-out) tools that the robot may use, our approach selects (1) the best-suited tool, (2) its in-plane pose, and (3) its final opening width to satisfy both the geometry and the topology of a given goal state.
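The selection described above can be read as a search over candidate actions. The following is a minimal sketch of that idea, not the planner used in the paper: it assumes an exhaustive search over a small candidate grid, a Chamfer distance as the geometry term, and a component-count mismatch as the topology term. The names `plan`, `chamfer`, `predict`, and `topo_weight` are hypothetical, and `predict` stands in for any model that maps a candidate action to predicted points and a component count.

```python
from itertools import product

import numpy as np


def chamfer(a, b):
    """Symmetric Chamfer distance between point sets a (N, 3) and b (M, 3)."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return d.min(axis=1).mean() + d.min(axis=0).mean()


def plan(predict, tools, poses, widths, goal_points, goal_components,
         topo_weight=10.0):
    """Return the (tool, pose, width) candidate with the lowest combined cost.

    `predict(tool, pose, width)` is a hypothetical callable returning
    (predicted_points, predicted_number_of_components).
    """
    best, best_cost = None, float("inf")
    for tool, pose, width in product(tools, poses, widths):
        points, n_components = predict(tool, pose, width)
        geometry_cost = chamfer(points, goal_points)
        topology_cost = abs(n_components - goal_components)
        cost = geometry_cost + topo_weight * topology_cost
        if cost < best_cost:
            best, best_cost = (tool, pose, width), cost
    return best, best_cost


def toy_predict(tool, pose, width):
    """Toy stand-in for a learned model (assumption, for illustration only)."""
    x, y = pose
    points = np.array([[x - width / 2, y, 0.0], [x + width / 2, y, 0.0]])
    # In this toy, only a fully closed narrow tool splits the object.
    n_components = 2 if (tool == "narrow" and width == 0.0) else 1
    return points, n_components
```

In this toy setup, a goal created with the fully closed narrow tool is only matched by the same tool: the wide tool reaches the same geometry but is penalized for the wrong number of components.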

Below, we provide the last frame in the left video as goal state and the first frame in the right video as initial state to our method. The left video shows the creation of the goal state; the right video shows the execution of our plan to achieve this goal state (from a similar initialization).

Robot-defined goal, splitting a doughnut shape by fully closing a narrow end-effector tool. This creates two smaller doughnut shapes rather than a single figure-8 shape, as validated by pulling the two components apart.
Our planned topological manipulation, deforming and merging towards a figure-8-shape, then splitting it into two doughnut shapes. Our approach selects the same tool used to create the goal.
Human-defined goal, pinching two roll shapes together with index finger and thumb. This creates a single X-shape; not just two deformed rolls. Note that the deformation is wider than the similar available tool.
Our planned topological manipulation, merging two components into one and deforming them towards the desired X-shape, while avoiding an erroneous split that closing the tool further would cause. Our approach selects a square tool to achieve the effect of pinching fingers.
Human-defined goal, squeezing a doughnut shape together between both palms. This creates a single roll shape. Note the irregular (contact) surface of the goal as compared to the available flat-sided tools.
Our planned topological manipulation, deforming the doughnut towards a roll such that it self-merges, closing the hole. The width of the roll matches the widest part of the irregular human-defined goal. Our approach selects a wide tool to achieve the effect of squeezing palms.

Predicting Deformation and Topological Change

The topological manipulation shown above is enabled by our learned predictive model. It consists of two components: (1) a denoising autoencoder that embeds and completes partial point-cloud observations in a topology-aware latent space; (2) an autoregressive set-to-set model that takes such a representation of the current object state and the desired motion of the tool, and predicts the next latent state.
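The interplay of the two components can be sketched as an autoregressive rollout: the observation is encoded once, and every later state is predicted from the model's own previous output. This is an illustrative sketch only; `rollout`, `encode`, and `step` are hypothetical names, and the toy functions below are placeholders, not the DoughNet networks.

```python
import numpy as np


def rollout(encode, step, partial_points, tool_motions):
    """Predict a sequence of latent states from a single partial observation.

    `encode(points)` and `step(latent, motion)` are hypothetical callables
    standing in for the autoencoder and the set-to-set dynamics model.
    """
    latent = encode(partial_points)      # embed the observation once
    states = [latent]
    for motion in tool_motions:
        latent = step(latent, motion)    # feed back the previous prediction
        states.append(latent)
    return states


def toy_encode(points):
    """Toy encoder (assumption): the latent set is a copy of the points."""
    return points.copy()


def toy_step(latent, motion):
    """Toy dynamics (assumption): squeeze the x-coordinates by `motion`."""
    nxt = latent.copy()
    nxt[:, 0] *= 1.0 - motion
    return nxt
```

Calling `rollout(toy_encode, toy_step, points, [0.5, 0.5])` returns three states: the encoded observation and one prediction per tool motion, each computed from the state before it rather than from a new observation.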

In the examples below, the top row shows color and depth observations of a real-world manipulation trajectory. The bottom row shows the partial point cloud we may extract from these observations, together with the predictions of DoughNet, which takes the partial observation in the left column as input and predicts subsequent completed states from its own previous output. Different colors indicate the assignment to different components. The graph in the lower-right corner is a visualization of the predicted topology.

[Figure: three columns, each showing Observed Color, Observed Depth, and Observed Points; the left column adds Our Reconstruction, the middle and right columns add Our Prediction.]
Merging two components. In the middle column, DoughNet correctly predicts two separate components; they are only touching and would separate again if perturbed. When squeezed further together, conditioned on the objects (states) and tool used, DoughNet predicts a single merged component. If we, for example, want to minimize geometrical deformation, we may stop manipulation as soon as this topological change occurs.
[Figure: three columns, each showing Observed Color, Observed Depth, and Observed Points; the left column adds Our Reconstruction, the middle and right columns add Our Prediction.]
Self-merging and splitting. In the right column, DoughNet predicts a split into two components where self-merging reduced the genus of the smaller component ("no hole"). Note that we predict the complete object whereas the observation is severely occluded by the closed tool, entirely missing the larger component.

Synthesizing a Topological Manipulation Dataset

To determine whether a topological change has occurred, objects need to be perturbed. In the real world, however, this perturbation is destructive and introduces unwanted geometrical deformation. Instead, we use an MPM-based simulation to create a synthetic dataset of topological manipulations by employing two checking operations: (1) pulling previously disconnected components apart (opposite to the tool direction) to test for (self-)merging, and (2) pulling previously connected components apart (orthogonal to the tool direction) to test for splitting.

(Self-)Merge check. At each keyframe, the regions that might merge (green) are pulled apart. If they remain connected at the end of this check, a merge is recorded.
Split check. At each keyframe, the regions that might split are pulled apart. If they are not connected in the radius graph at the end of this check, a split is recorded.
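The connectivity test at the end of each check can be sketched with a radius graph over the simulated particles: two particles are linked if they lie within a given distance, and two regions count as connected if they share a graph component. This is a minimal sketch under stated assumptions; the function names, the `radius` value, and the region index lists are hypothetical, and the pull itself (done in the MPM simulation) is not modeled here.

```python
import numpy as np


def radius_graph_components(points, radius):
    """Label connected components of the graph linking points within `radius`."""
    n = len(points)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    adjacent = dist <= radius
    labels = np.full(n, -1, dtype=int)
    current = 0
    for seed in range(n):
        if labels[seed] != -1:
            continue                      # already assigned to a component
        labels[seed] = current
        stack = [seed]
        while stack:                      # depth-first flood fill
            i = stack.pop()
            for j in np.flatnonzero(adjacent[i] & (labels == -1)):
                labels[j] = current
                stack.append(j)
        current += 1
    return labels


def is_split(points_after_pull, region_a, region_b, radius):
    """True if the two index sets share no component after the pull."""
    labels = radius_graph_components(points_after_pull, radius)
    return set(labels[region_a]).isdisjoint(labels[region_b])
```

Under these assumptions, a split would be recorded when `is_split(...)` is true at the end of the split check, and a merge when it is false at the end of the merge check.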

BibTeX

@article{bauer2024doughnet,
  title={DoughNet: A Visual Predictive Model for Topological Manipulation of Deformable Objects},
  author={Bauer, Dominik and Xu, Zhenjia and Song, Shuran},
  journal={European Conference on Computer Vision (ECCV)},
  year={2024}
}

Contact

If you have any questions, please contact Dominik Bauer.