Carl Qi*, Yilin Wu*, Lifan Yu, Haoyue Liu, Bowen Jiang, Xingyu Lin†, David Held†
International Conference on Intelligent Robots and Systems (IROS), 2024
@inproceedings{qitooluse2024,
title={Learning Generalizable Tool-use Skills through Trajectory Generation},
author={Qi, Carl and Wu, Yilin and Yu, Lifan and Liu, Haoyue and Jiang, Bowen and Lin, Xingyu and Held, David},
booktitle={IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
year={2024}
}
Autonomous systems that efficiently utilize tools can assist humans in completing many common tasks such as cooking and cleaning. However, current systems fall short of human-level intelligence when adapting to novel tools. Prior affordance-based works often make strong assumptions about the environment and cannot scale to more complex, contact-rich tasks. In this work, we tackle this challenge and explore how agents can learn to use previously unseen tools to manipulate deformable objects. We propose to learn a generative model of tool-use trajectories as sequences of tool point clouds, which generalizes across tool shapes. Given any novel tool, we first generate a tool-use trajectory and then optimize the sequence of tool poses to align with the generated trajectory. We train a single model on four challenging deformable object manipulation tasks, using demonstration data from only one tool per task. The model generalizes to various novel tools, significantly outperforming baselines. We further test our trained policy in the real world with unseen tools, where it achieves performance comparable to that of a human.
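A minimal sketch of the pose-fitting step described above, assuming the generated trajectory is a sequence of tool point clouds that keeps point-wise correspondence with the tool's observed point cloud; the function names are illustrative, not from the paper's code:

import numpy as np

def fit_rigid_transform(src, dst):
    # Least-squares rigid transform (Kabsch) mapping src onto dst.
    # src, dst: (N, 3) arrays with row-wise correspondence.
    # Returns R, t such that dst ~= src @ R.T + t.
    src_c = src - src.mean(axis=0)
    dst_c = dst - dst.mean(axis=0)
    U, _, Vt = np.linalg.svd(src_c.T @ dst_c)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:      # guard against reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = dst.mean(axis=0) - src.mean(axis=0) @ R.T
    return R, t

def fit_pose_sequence(tool_points, generated_trajectory):
    # One rigid tool pose per generated tool point cloud.
    return [fit_rigid_transform(tool_points, step) for step in generated_trajectory]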
Bowen Jiang*, Yilin Wu*, Wenxuan Zhou, Chris Paxton, David Held
Robotics: Science and Systems (RSS), 2024
@inproceedings{jiang2024hacman++,
title={HACMan++: Spatially-Grounded Motion Primitives for Manipulation},
author={Jiang, Bowen and Wu, Yilin and Zhou, Wenxuan and Paxton, Chris and Held, David},
booktitle={Robotics: Science and Systems (RSS)},
year={2024}
}
We present HACMan++, a reinforcement learning framework using a novel action space of spatially-grounded parameterized motion primitives for manipulation tasks.
Zhanyi Sun*, Yufei Wang*, David Held†, Zackory Erickson†
Robotics and Automation Letters (RAL), 2024
@article{sun2024force,
title={Force-Constrained Visual Policy: Safe Robot-Assisted Dressing via Multi-Modal Sensing},
author={Sun, Zhanyi and Wang, Yufei and Held, David and Erickson, Zackory},
journal={IEEE Robotics and Automation Letters},
year={2024}
}
Robot-assisted dressing could profoundly enhance the quality of life of adults with physical disabilities. To achieve this, a robot can benefit from both visual and force sensing. The former enables the robot to ascertain human body pose and garment deformations, while the latter helps maintain safety and comfort during the dressing process. In this paper, we introduce a new technique that leverages both vision and force modalities for this assistive task. Our approach first trains a vision-based dressing policy using reinforcement learning in simulation with varying body sizes, poses, and types of garments. We then learn a force dynamics model for action planning to ensure safety. Due to limitations of simulating accurate force data when deformable garments interact with the human body, we learn a force dynamics model directly from real-world data. Our proposed method combines the vision-based policy, trained in simulation, with the force dynamics model, learned in the real world, by solving a constrained optimization problem to infer actions that facilitate the dressing process without applying excessive force on the person. We evaluate our system in simulation and in a real-world human study with 10 participants across 240 dressing trials, showing it greatly outperforms prior baselines. Video demonstrations are available on our project website.
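The action-selection idea above (combining the vision-based policy with the real-world force dynamics model) can be caricatured with a sampling-based approximation of the constrained optimization; force_model and the candidate-sampling scheme below are assumptions for illustration, not the paper's solver:

import numpy as np

def select_safe_action(vision_action, force_model, state, force_limit,
                       n_samples=64, noise=0.02):
    # Sample candidates around the vision policy's action and keep the one
    # closest to it whose predicted force stays under the limit.
    candidates = vision_action + noise * np.random.randn(n_samples, vision_action.shape[-1])
    candidates = np.vstack([vision_action, candidates])
    predicted = np.array([force_model(state, a) for a in candidates])
    feasible = candidates[predicted <= force_limit]
    if len(feasible) == 0:
        return None          # no safe action found; fall back to pausing
    dists = np.linalg.norm(feasible - vision_action, axis=1)
    return feasible[np.argmin(dists)]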
@inproceedings{zhang2023fbpp,
title={FlowBot++: Learning Generalized Articulated Objects Manipulation via Articulation Projection},
author={Zhang, Harry and Eisner, Ben and Held, David},
booktitle={Conference on Robot Learning (CoRL)},
year={2023}
}
Understanding and manipulating articulated objects, such as doors and drawers, is crucial for robots operating in human environments. We wish to develop a system that can learn to articulate novel objects with no prior interaction, after training on other articulated objects. Previous approaches for articulated object manipulation rely either on modular methods, which are brittle, or on end-to-end methods, which lack generalizability. This paper presents FlowBot++, a deep 3D vision-based robotic system that predicts dense per-point motion and dense articulation parameters of articulated objects to assist in downstream manipulation tasks. FlowBot++ introduces a novel per-point representation of the articulated motion and articulation parameters that are combined to produce a more accurate estimate than either representation on its own. Simulated experiments on the PartNet-Mobility dataset validate the performance of our system in articulating a wide range of objects, while real-world experiments on real objects' point clouds and a Sawyer robot demonstrate the generalizability and feasibility of our system in real-world scenarios.
Wenxuan Zhou, Bowen Jiang, Fan Yang, Chris Paxton*, David Held*
Conference on Robot Learning (CoRL), 2023
@inproceedings{zhou2023hacman,
title={HACMan: Learning Hybrid Actor-Critic Maps for 6D Non-Prehensile Manipulation},
author={Zhou, Wenxuan and Jiang, Bowen and Yang, Fan and Paxton, Chris and Held, David},
booktitle={Conference on Robot Learning (CoRL)},
year={2023},
}
Manipulating objects without grasping them is an essential component of human dexterity, referred to as non-prehensile manipulation. Non-prehensile manipulation may enable more complex interactions with the objects, but also presents challenges in reasoning about gripper-object interactions. In this work, we introduce Hybrid Actor-Critic Maps for Manipulation (HACMan), a reinforcement learning approach for 6D non-prehensile manipulation of objects using point cloud observations. HACMan proposes a temporally-abstracted and spatially-grounded object-centric action representation that consists of selecting a contact location from the object point cloud and a set of motion parameters describing how the robot will move after making contact. We modify an existing off-policy RL algorithm to learn in this hybrid discrete-continuous action representation. We evaluate HACMan on a 6D object pose alignment task in both simulation and the real world. On the hardest version of our task, with randomized initial poses, randomized 6D goals, and diverse object categories, our policy demonstrates strong generalization to unseen object categories without a performance drop, achieving an 89% success rate on unseen objects in simulation and a 50% success rate with zero-shot transfer in the real world. Compared to alternative action representations, HACMan achieves a success rate more than three times higher than the best baseline. With zero-shot sim2real transfer, our policy can successfully manipulate unseen objects in the real world for challenging non-planar goals, using dynamic and contact-rich non-prehensile skills.
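As a rough illustration of the hybrid discrete-continuous action described above, the snippet below selects a contact point and its motion parameters from per-point network outputs; the per-point critic values and actor parameters are assumed inputs (the actual networks and training procedure are not shown):

import numpy as np

def select_hybrid_action(per_point_q, per_point_params, epsilon=0.0):
    # per_point_q:      (N,) critic value for contacting each object point.
    # per_point_params: (N, K) continuous motion parameters per point.
    # Returns the chosen point index and its motion-parameter vector.
    if np.random.rand() < epsilon:            # optional epsilon-greedy exploration
        idx = np.random.randint(len(per_point_q))
    else:
        idx = int(np.argmax(per_point_q))
    return idx, per_point_params[idx]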
Yufei Wang, Zhanyi Sun, Zackory Erickson*, David Held*
Robotics: Science and Systems (RSS), 2023
@inproceedings{Wang2023One,
title={One Policy to Dress Them All: Learning to Dress People with Diverse Poses and Garments},
author={Wang, Yufei and Sun, Zhanyi and Erickson, Zackory and Held, David},
booktitle={Robotics: Science and Systems (RSS)},
year={2023}
}
Robot-assisted dressing could benefit the lives of many people such as older adults and individuals with disabilities. Despite such potential, robot-assisted dressing remains a challenging task for robotics as it involves complex manipulation of deformable cloth in 3D space. Many prior works aim to solve the robot-assisted dressing task, but they make certain assumptions such as a fixed garment and a fixed arm pose that limit their ability to generalize. In this work, we develop a robot-assisted dressing system that is able to dress different garments on people with diverse poses from partial point cloud observations, based on a learned policy. We show that with proper design of the policy architecture and Q function, reinforcement learning (RL) can be used to learn effective policies with partial point cloud observations that work well for dressing diverse garments. We further leverage policy distillation to combine multiple policies trained on different ranges of human arm poses into a single policy that works over a wide range of different arm poses. We conduct comprehensive real-world evaluations of our system with 510 dressing trials in a human study with 17 participants with different arm poses and dressed garments. Our system is able to dress 86% of the length of the participants' arms on average. Videos can be found on the anonymized project webpage: https://sites.google.com/view/one-policy-dress.
@inproceedings{zhou2022ungraspable,
title={{Learning to Grasp the Ungraspable with Emergent Extrinsic Dexterity}},
author={Zhou, Wenxuan and Held, David},
booktitle={Conference on Robot Learning (CoRL)},
year={2022}
}
A simple gripper can solve more complex manipulation tasks if it can utilize the external environment, such as pushing the object against the table or a vertical wall, known as "Extrinsic Dexterity." Previous work in extrinsic dexterity usually makes careful assumptions about contacts, which impose restrictions on robot design, robot motions, and the variation of physical parameters. In this work, we develop a system based on reinforcement learning (RL) to address these limitations. We study the task of “Occluded Grasping,” which aims to grasp the object in configurations that are initially occluded; the robot needs to move the object into a configuration from which these grasps can be achieved. We present a system with model-free RL that successfully achieves this task using a simple gripper with extrinsic dexterity. The policy learns emergent behaviors of pushing the object against the wall to rotate and then grasp it, without additional reward terms on extrinsic dexterity. We discuss important components of the system, including the design of the RL problem, multi-grasp training and selection, and policy generalization with automatic curriculum. Most importantly, the policy trained in simulation is zero-shot transferred to a physical robot. It demonstrates dynamic and contact-rich motions with a simple gripper and generalizes across objects of various sizes, densities, surface frictions, and shapes with a 78% success rate.
Chuer Pan*, Brian Okorn*, Harry Zhang*, Ben Eisner*, David Held
Conference on Robot Learning (CoRL), 2022
@inproceedings{pan2022tax,
title={TAX-Pose: Task-Specific Cross-Pose Estimation for Robot Manipulation},
author={Pan, Chuer and Okorn, Brian and Zhang, Harry and Eisner, Ben and Held, David},
booktitle={Conference on Robot Learning (CoRL)},
year={2022}
}
How do we imbue robots with the ability to efficiently manipulate unseen objects and transfer relevant skills based on demonstrations? End-to-end learning methods often fail to generalize to novel objects or unseen configurations. Instead, we focus on the task-specific pose relationship between relevant parts of interacting objects. We conjecture that this relationship is a generalizable notion of a manipulation task that can transfer to new objects in the same category; examples include the relationship between the pose of a pan relative to an oven or the pose of a mug relative to a mug rack. We call this task-specific pose relationship “cross-pose” and provide a mathematical definition of this concept. We propose a vision-based system that learns to estimate the cross-pose between two objects for a given manipulation task using learned cross-object correspondences. The estimated cross-pose is then used to guide a downstream motion planner to manipulate the objects into the desired pose relationship (placing a pan into the oven or the mug onto the mug rack). We demonstrate our method’s capability to generalize to unseen objects, in some cases after training on only 10 demonstrations in the real world. Results show that our system achieves state-of-the-art performance in both simulated and real-world experiments across a number of tasks.
Daniel Seita, Yufei Wang†, Sarthak J Shetty†, Edward Yao Li†, Zackory Erickson, David Held
Conference on Robot Learning (CoRL), 2022
@inproceedings{Seita2022toolflownet,
title={{ToolFlowNet: Robotic Manipulation with Tools via Predicting Tool Flow from Point Clouds}},
author={Seita, Daniel and Wang, Yufei and Shetty, Sarthak and Li, Edward and Erickson, Zackory and Held, David},
booktitle={Conference on Robot Learning (CoRL)},
year={2022}
}
Point clouds are a widely available and canonical data modality which conveys the 3D geometry of a scene. Despite significant progress in classification and segmentation from point clouds, policy learning from such a modality remains challenging, and most prior works in imitation learning focus on learning policies from images or state information. In this paper, we propose a novel framework for learning policies from point clouds for robotic manipulation with tools. We use a novel neural network, ToolFlowNet, which predicts dense per-point flow on the tool that the robot controls, and then uses the flow to derive the transformation that the robot should execute. We apply this framework to imitation learning of challenging deformable object manipulation tasks with continuous movement of tools, including scooping and pouring, and demonstrate significantly improved performance over baselines which do not use flow. We perform 50 physical scooping experiments with ToolFlowNet and attain 82% scooping success. See https://tinyurl.com/toolflownet for supplementary material.
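A compact sketch of how a rigid transform can be recovered from dense tool flow, assuming the predicted flow is approximately rigid; this is the standard SVD-based least-squares alignment, shown here for illustration rather than as ToolFlowNet's exact implementation:

import numpy as np

def transform_from_flow(tool_points, predicted_flow):
    # The target point set is tool_points + predicted_flow; the executed motion
    # is the least-squares rigid transform between the two point sets.
    src = tool_points
    dst = tool_points + predicted_flow
    src_c, dst_c = src - src.mean(axis=0), dst - dst.mean(axis=0)
    U, _, Vt = np.linalg.svd(src_c.T @ dst_c)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:      # keep a proper rotation
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = dst.mean(axis=0) - src.mean(axis=0) @ R.T
    return R, t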
Xingyu Lin*, Carl Qi*, Yunchu Zhang, Zhiao Huang, Katerina Fragkiadaki, Yunzhu Li, Chuang Gan, David Held
Conference on Robot Learning (CoRL), 2022
@inproceedings{lin2022planning,
title={Planning with Spatial-Temporal Abstraction from Point Clouds for Deformable Object Manipulation},
author={Xingyu Lin and Carl Qi and Yunchu Zhang and Yunzhu Li and Zhiao Huang and Katerina Fragkiadaki and Chuang Gan and David Held},
booktitle={6th Annual Conference on Robot Learning},
year={2022},
url={https://openreview.net/forum?id=tyxyBj2w4vw}
}
Effective planning of long-horizon deformable object manipulation requires suitable abstractions at both the spatial and temporal levels. Previous methods typically either focus on short-horizon tasks or make strong assumptions that full-state information is available, which prevents their use on deformable objects. In this paper, we propose PlAnning with Spatial-Temporal Abstraction (PASTA), which incorporates both spatial abstraction (reasoning about objects and their relations to each other) and temporal abstraction (reasoning over skills instead of low-level actions). Our framework maps high-dimensional 3D observations such as point clouds into a set of latent vectors and plans over skill sequences on top of the latent set representation. We show that our method can effectively perform challenging sequential deformable object manipulation tasks in the real world, which require combining multiple tool-use skills such as cutting with a knife, pushing with a pusher, and spreading dough with a roller.
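To make "plans over skill sequences on top of the latent set representation" concrete, here is a deliberately simplified brute-force planner; dynamics and cost stand in for the learned latent-set dynamics and goal-distance models and are assumptions, as is the exhaustive search (the paper's planner is more sophisticated):

import itertools
import numpy as np

def plan_skill_sequence(z_obs, z_goal, skills, dynamics, cost, horizon=3):
    # z_obs, z_goal: latent set encodings of the current and goal scenes.
    # skills: skill identifiers, e.g. ["cut", "push", "roll"].
    # dynamics(z, skill) -> predicted next latent set (assumed learned model).
    # cost(z, z_goal)    -> scalar distance to goal (assumed learned model).
    best_seq, best_cost = None, np.inf
    for seq in itertools.product(skills, repeat=horizon):
        z = z_obs
        for skill in seq:
            z = dynamics(z, skill)
        c = cost(z, z_goal)
        if c < best_cost:
            best_seq, best_cost = seq, c
    return best_seq, best_cost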
Sashank Tirumala*, Thomas Weng*, Daniel Seita*, Oliver Kroemer, Zeynep Temel, David Held
International Conference on Intelligent Robots and Systems (IROS), 2022
@inproceedings{tirumala2022reskin,
author={Tirumala, Sashank and Weng, Thomas and Seita, Daniel and Kroemer, Oliver and Temel, Zeynep and Held, David},
booktitle={2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
title={Learning to Singulate Layers of Cloth using Tactile Feedback},
year={2022},
volume={},
number={},
pages={7773-7780},
doi={10.1109/IROS47612.2022.9981341}
}
Robotic manipulation of cloth has applications ranging from fabrics manufacturing to handling blankets and laundry. Cloth manipulation is challenging for robots largely due to cloth's high degrees of freedom, complex dynamics, and severe self-occlusions when in folded or crumpled configurations. Prior work on robotic manipulation of cloth relies primarily on vision sensors alone, which may pose challenges for fine-grained manipulation tasks such as grasping a desired number of cloth layers from a stack of cloth. In this paper, we propose to use tactile sensing for cloth manipulation; we attach a tactile sensor (ReSkin) to one of the two fingertips of a Franka robot and train a classifier to determine whether the robot is grasping a specific number of cloth layers. During test-time experiments, the robot uses this classifier as part of its policy to grasp one or two cloth layers using tactile feedback to determine suitable grasping points. Experimental results over 180 physical trials suggest that the proposed method outperforms baselines that do not use tactile feedback and generalizes better to unseen fabrics than methods that use image classifiers.
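A sketch of the layer-classification idea under stated assumptions: the file names, feature layout, and classifier choice below are illustrative (the paper trains its own classifier on ReSkin signals), not the released pipeline:

import numpy as np
from sklearn.neural_network import MLPClassifier

# Assumed data files: rows are flattened ReSkin magnetometer readings recorded
# during pinch grasps; labels are the number of cloth layers held (0, 1, or 2).
X_train = np.load("reskin_features.npy")
y_train = np.load("layer_labels.npy")

clf = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=500)
clf.fit(X_train, y_train)

def layers_grasped(reskin_reading):
    # Classify how many layers the gripper currently holds from one reading.
    return int(clf.predict(reskin_reading.reshape(1, -1))[0])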
Robotics and Automation Letters (RAL) with presentation at the International Conference on Intelligent Robots and Systems (IROS), 2022
@article{qi2022dough,
author={Qi, Carl and Lin, Xingyu and Held, David},
journal={IEEE Robotics and Automation Letters},
title={Learning Closed-Loop Dough Manipulation Using a Differentiable Reset Module},
year={2022},
volume={7},
number={4},
pages={9857-9864},
doi={10.1109/LRA.2022.3191239}
}
Deformable object manipulation has many applications such as cooking and laundry folding in our daily lives. Manipulating elastoplastic objects such as dough is particularly challenging because dough lacks a compact state representation and requires contact-rich interactions. We consider the task of flattening a piece of dough into a specific shape from RGB-D images. While the task is seemingly intuitive for humans, there exist local optima for common approaches such as naive trajectory optimization. We propose a novel trajectory optimizer that optimizes through a differentiable "reset" module, transforming a single-stage, fixed-initialization trajectory into a multistage, multi-initialization trajectory where all stages are optimized jointly. We then train a closed-loop policy on the demonstrations generated by our trajectory optimizer. Our policy receives partial point clouds as input, allowing ease of transfer from simulation to the real world. We show that our policy can perform real-world dough manipulation, flattening a ball of dough into a target shape.
Robotics and Automation Letters (RAL) with presentation at the International Conference on Intelligent Robots and Systems (IROS), 2022
@article{wang2022visual,
title={Visual Haptic Reasoning: Estimating Contact Forces by Observing Deformable Object Interactions},
author={Wang, Yufei and Held, David and Erickson, Zackory},
journal={IEEE Robotics and Automation Letters},
volume={7},
number={4},
pages={11426--11433},
year={2022},
publisher={IEEE}
}
Robotic manipulation of highly deformable cloth presents a promising opportunity to assist people with several daily tasks, such as washing dishes; folding laundry; or dressing, bathing, and hygiene assistance for individuals with severe motor impairments. In this work, we introduce a formulation that enables a collaborative robot to perform visual haptic reasoning with cloth -- the act of inferring the location and magnitude of applied forces during physical interaction. We present two distinct model representations, trained in physics simulation, that enable haptic reasoning using only visual and robot kinematic observations. We conducted quantitative evaluations of these models in simulation for robot-assisted dressing, bathing, and dish washing tasks, and demonstrate that the trained models can generalize across different tasks with varying interactions, human body sizes, and object shapes. We also present results with a real-world mobile manipulator, which used our simulation-trained models to estimate applied contact forces while performing physically assistive tasks with cloth.
@inproceedings{EisnerZhang2022FLOW,
title={FlowBot3D: Learning 3D Articulation Flow to Manipulate Articulated Objects},
author={Eisner*, Ben and Zhang*, Harry and Held, David},
booktitle={Robotics: Science and Systems (RSS)},
year={2022}
}
We explore a novel method to perceive and manipulate 3D articulated objects that generalizes to enable a robot to articulate unseen classes of objects. We propose a vision-based system that learns to predict the potential motions of the parts of a variety of articulated objects to guide downstream motion planning of the system to articulate the objects. To predict the object motions, we train a neural network to output a dense vector field representing the point-wise motion direction of the points in the point cloud under articulation. We then deploy an analytical motion planner based on this vector field to achieve a policy that yields maximum articulation. We train the vision system entirely in simulation, and we demonstrate the capability of our system to generalize to unseen object instances and novel categories in both simulation and the real world, deploying our policy on a Sawyer robot with no finetuning. Results show that our system achieves state-of-the-art performance in both simulated and real-world experiments.
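A simplified reading of the flow-based policy above: contact the point with the largest predicted articulation flow and move along its flow direction. The real system adds grasp selection and replanning, so treat this only as a sketch:

import numpy as np

def flow_based_action(points, predicted_flow, step_size=0.01):
    # points, predicted_flow: (N, 3). Pick the max-flow point and return a
    # small end-effector displacement along the (normalized) flow there.
    mags = np.linalg.norm(predicted_flow, axis=1)
    idx = int(np.argmax(mags))
    direction = predicted_flow[idx] / (mags[idx] + 1e-8)
    return points[idx], step_size * direction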
Gautham Narayan Narasimhan, Kai Zhang, Ben Eisner, Xingyu Lin, David Held
International Conference on Robotics and Automation (ICRA), 2022
@inproceedings{icra2022pouring,
title={Self-supervised Transparent Liquid Segmentation for Robotic Pouring},
author={Gautham Narayan Narasimhan and Kai Zhang and Ben Eisner and Xingyu Lin and David Held},
booktitle={International Conference on Robotics and Automation (ICRA)},
year={2022}}
Liquid state estimation is important for robotics tasks such as pouring; however, estimating the state of transparent liquids is a challenging problem. We propose a novel segmentation pipeline that can segment transparent liquids such as water from a static, RGB image without requiring any manual annotations or heating of the liquid for training. Instead, we use a generative model that is capable of translating images of colored liquids into synthetically generated transparent liquid images, trained only on an unpaired dataset of colored and transparent liquid images. Segmentation labels of colored liquids are obtained automatically using background subtraction. Our experiments show that we are able to accurately predict a segmentation mask for transparent liquids without requiring any manual annotations. We demonstrate the utility of transparent liquid segmentation in a robotic pouring task that controls pouring by perceiving the liquid height in a transparent cup. Accompanying video and supplementary materials can be found on our project page.
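The automatic labeling of colored liquids can be sketched with plain background subtraction, assuming a static camera and an image of the empty scene; the threshold and morphology below are illustrative defaults, not the paper's exact settings:

import cv2
import numpy as np

def colored_liquid_mask(background_bgr, liquid_bgr, thresh=30):
    # Pixels that differ from the empty-scene image by more than `thresh`
    # are labeled as (colored) liquid; a morphological opening removes specks.
    diff = cv2.absdiff(liquid_bgr, background_bgr)
    gray = cv2.cvtColor(diff, cv2.COLOR_BGR2GRAY)
    _, mask = cv2.threshold(gray, thresh, 255, cv2.THRESH_BINARY)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
    return mask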
Xingyu Lin*, Yufei Wang*, Zixuan Huang, David Held
Conference on Robot Learning (CoRL), 2021
@inproceedings{lin2021VCD,
title={Learning Visible Connectivity Dynamics for Cloth Smoothing},
author={Lin, Xingyu and Wang, Yufei and Huang, Zixuan and Held, David},
booktitle={Conference on Robot Learning},
year={2021}}
Robotic manipulation of cloth remains challenging due to the complex dynamics of cloth, the lack of a low-dimensional state representation, and self-occlusions. In contrast to previous model-based approaches that learn a pixel-based dynamics model or a compressed latent vector dynamics, we propose to learn a particle-based dynamics model from a partial point cloud observation. To overcome the challenges of partial observability, we infer which visible points are connected on the underlying cloth mesh. We then learn a dynamics model over this visible connectivity graph. Compared to previous learning-based approaches, our model imposes a strong inductive bias with its particle-based representation for learning the underlying cloth physics; it is invariant to visual features; and its predictions can be more easily visualized. We show that our method greatly outperforms previous state-of-the-art model-based and model-free reinforcement learning methods in simulation. Furthermore, we demonstrate zero-shot sim-to-real transfer, where we deploy the model trained in simulation on a Franka arm and show that the model can successfully smooth different types of cloth from crumpled configurations. Videos can be found on our project website.
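Construction of candidate edges for the visible connectivity graph can be sketched as a radius query over the observed points; in the paper the final connectivity is predicted by a learned edge classifier, so the radius heuristic below is only an assumed stand-in:

import numpy as np
from scipy.spatial import cKDTree

def candidate_connectivity_edges(points, radius=0.015):
    # points: (N, 3) partial point cloud of the cloth.
    # Returns an (E, 2) array of undirected candidate edges between points
    # that lie within `radius` of each other.
    tree = cKDTree(points)
    pairs = tree.query_pairs(r=radius)
    return np.array(sorted(pairs), dtype=int)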
Thomas Weng, Sujay Bajracharya, Yufei Wang, David Held
Conference on Robot Learning (CoRL), 2021
@inproceedings{weng2021fabricflownet,
title={FabricFlowNet: Bimanual Cloth Manipulation with a Flow-based Policy},
author={Weng, Thomas and Bajracharya, Sujay and Wang, Yufei and Agrawal, Khush and Held, David},
booktitle={Conference on Robot Learning},
year={2021}
}
We address the problem of goal-directed cloth manipulation, a challenging task due to the deformability of cloth. Our insight is that optical flow, a technique normally used for motion estimation in video, can also provide an effective representation for corresponding cloth poses across observation and goal images. We introduce FabricFlowNet (FFN), a cloth manipulation policy that leverages flow as both an input and as an action representation to improve performance. FabricFlowNet also elegantly switches between dual-arm and single-arm actions based on the desired goal. We show that FabricFlowNet significantly outperforms state-of-the-art model-free and model-based cloth manipulation policies. We also present real-world experiments on a bimanual system, demonstrating effective sim-to-real transfer. Finally, we show that our method, trained only on a single square cloth, generalizes to other cloth shapes, such as T-shirts and rectangular cloths.
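As a rough single-arm illustration of using flow as an action representation, one can pick where the predicted flow is largest and place at that point displaced by its flow; FabricFlowNet's actual policy (including its bimanual logic) is learned, so this heuristic is only a sketch:

import numpy as np

def pick_and_place_from_flow(points_2d, flow_2d):
    # points_2d: (N, 2) pixel coordinates of cloth points in the observation.
    # flow_2d:   (N, 2) predicted flow toward the goal image.
    mags = np.linalg.norm(flow_2d, axis=1)
    idx = int(np.argmax(mags))
    pick = points_2d[idx]
    place = pick + flow_2d[idx]
    return pick, place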
@inproceedings{PLAS_corl2020,
title={PLAS: Latent Action Space for Offline Reinforcement Learning},
author={Zhou, Wenxuan and Bajracharya, Sujay and Held, David},
booktitle={Conference on Robot Learning},
year={2020}
}
The goal of offline reinforcement learning is to learn a policy from a fixed dataset, without further interactions with the environment. This setting is an increasingly important paradigm for real-world applications of reinforcement learning such as robotics, in which data collection is slow and potentially dangerous. Existing off-policy algorithms have limited performance on static datasets due to extrapolation errors from out-of-distribution actions. This leads to the challenge of constraining the policy to select actions within the support of the dataset during training. We propose to simply learn the Policy in the Latent Action Space (PLAS) such that this requirement is naturally satisfied. We evaluate our method on continuous control benchmarks in simulation and a deformable object manipulation task with a physical robot. We demonstrate that our method provides competitive performance consistently across various continuous control tasks and different types of datasets, outperforming existing offline reinforcement learning methods with explicit constraints.
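The core idea of acting through a learned latent action space can be sketched as follows; policy and decoder are assumed callables (a deterministic latent policy and the decoder of a generative model trained on the offline dataset's state-action pairs), not the released implementation:

import numpy as np

def plas_act(policy, decoder, state, max_latent=2.0):
    # policy(state)     -> latent action z (assumed learned policy).
    # decoder(z, state) -> environment action (assumed learned decoder).
    # Bounding z keeps decoded actions within the support of the dataset.
    z = np.clip(policy(state), -max_latent, max_latent)
    return decoder(z, state)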
Jianing Qian*, Thomas Weng*, Luxin Zhang, Brian Okorn, David Held
International Conference on Intelligent Robots and Systems (IROS), 2020
@inproceedings{Qian_2020_IROS,
author={Qian, Jianing and Weng, Thomas and Zhang, Luxin and Okorn, Brian and Held, David},
booktitle={2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
title={Cloth Region Segmentation for Robust Grasp Selection},
year={2020},
pages={9553-9560},
doi={10.1109/IROS45743.2020.9341121}
}
Cloth detection and manipulation is a common task in domestic and industrial settings, yet such tasks remain a challenge for robots due to cloth deformability. Furthermore, in many cloth-related tasks like laundry folding and bed making, it is crucial to manipulate specific regions like edges and corners, as opposed to folds. In this work, we focus on the problem of segmenting and grasping these key regions. Our approach trains a network to segment the edges and corners of a cloth from a depth image, distinguishing such regions from wrinkles or folds. We also provide a novel algorithm for estimating the grasp location, direction, and directional uncertainty from the segmentation. We demonstrate our method on a real robot system and show that it outperforms baseline methods on grasping success. Video and other supplementary materials are available at:
Thomas Weng, Amith Pallankize, Yimin Tang, Oliver Kroemer, David Held
Robotics and Automation Letters (RAL) with presentation at the International Conference on Robotics and Automation (ICRA), 2020
@ARTICLE{9001238,
author={Thomas Weng and Amith Pallankize and Yimin Tang and Oliver Kroemer and David Held},
journal={IEEE Robotics and Automation Letters},
title={Multi-Modal Transfer Learning for Grasping Transparent and Specular Objects},
year={2020},
volume={5},
number={3},
pages={3791-3798},
doi={10.1109/LRA.2020.2974686}}
State-of-the-art object grasping methods rely on depth sensing to plan robust grasps, but commercially available depth sensors fail to detect transparent and specular objects. To improve grasping performance on such objects, we introduce a method for learning a multi-modal perception model by bootstrapping from an existing uni-modal model. This transfer learning approach requires only a pre-existing uni-modal grasping model and paired multi-modal image data for training, forgoing the need for ground-truth grasp success labels or real grasp attempts. Our experiments demonstrate that our approach is able to reliably grasp transparent and reflective objects. Video and supplementary material are available at
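The bootstrapping step can be illustrated as pseudo-labeling: use the existing uni-modal (depth-based) grasp model to label paired images, then train the multi-modal model on those labels. depth_model and its output format are assumed interfaces for this sketch, not the paper's code:

def bootstrap_multimodal_labels(depth_model, paired_rgb, paired_depth):
    # depth_model(depth_image) -> grasp-quality prediction (assumed interface).
    # Returns (rgb, label) pairs for supervising a multi-modal grasp model,
    # with no ground-truth grasp labels and no real grasp attempts.
    dataset = []
    for rgb, depth in zip(paired_rgb, paired_depth):
        label = depth_model(depth)      # teacher prediction from depth alone
        dataset.append((rgb, label))
    return dataset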