This project is part of a multi-university collaboration funded by the Amazon Robotics Greater Boston Tech Initiative. More information is available on our project website: https://sites.google.com/view/gbti-supervisory-control-pick/home
This project focuses on picking challenging objects in cluttered scenes. Given an image of the scene, the system must decide when and how to apply a set of dexterous picking skills. Choosing the right skill for a target object certainly depends on its shape, but shape is not the only factor: the context the object is in, i.e., the relative positions of the other objects, and the configuration of the environment (e.g., the existence and locations of table edges and vertical walls) both play essential roles. Two examples illustrate this. When a cylindrical object lies on a table, well separated from other items, it can be manipulated with simple picking methods; if the object is adjacent to a wall, the push-to-vertical skill becomes particularly effective, allowing the object to be cornered and securely grasped. Similarly, to pick a flat plate from a tabletop, the system benefits from sliding the plate to the edge of the table. However, if an obstacle lies between the plate and the table’s edge, or if another object is resting on the plate, the system must first remove these obstructions before attempting to slide the plate. Also, while sliding the plate to the edge, the system must determine how to apply the skill, i.e., where to establish contact with the object so that the skill can be executed successfully.
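To make this context dependence concrete, the toy Python sketch below tabulates the scenarios above as shape/context pairs. It is purely illustrative: the skill names are paraphrased from the examples, and the actual system learns this mapping with neural networks rather than hand-written rules.

```python
# Toy illustration only: the project does NOT use hand-written rules; it
# learns skill selection from data. Skill names are paraphrased from the text.
ILLUSTRATIVE_CASES = [
    # (object shape, scene context)                      -> plausible skill
    (("cylinder",   "isolated on open tabletop"),           "simple_pick"),
    (("cylinder",   "adjacent to a vertical wall"),         "push_to_vertical"),
    (("flat_plate", "clear path to table edge"),            "slide_to_edge"),
    (("flat_plate", "obstacle in path or object on top"),   "remove_obstruction_first"),
]

for (shape, context), skill in ILLUSTRATIVE_CASES:
    print(f"{shape:10s} | {context:35s} -> {skill}")
```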
In this work, we provide a system that makes precisely these kinds of decisions. Summarizing our contributions: 1) We developed a Skill Detection Module that leverages deep learning to predict the most suitable picking skill for each object in a multi-object scene, along with a confidence level for each prediction. Given an RGB-D image of the scene, our algorithm decides which object should be picked first (based on the confidence value) along with the skill that should be applied. The algorithm selects the appropriate skill by taking into account both the object’s shape and its context (including the positions of nearby objects and the overall environmental setup). 2) We developed a Skill Location Module, based on an attention gate-based neural network, that identifies where the chosen skill should be applied. Both models were trained on datasets of manually labeled images from simulated environments and real-world settings, in approximately a 70:30 ratio. 3) We present an end-to-end pipeline that applies these neural networks to the task of clearing objects from a tabletop. The effectiveness of this system was thoroughly evaluated through 45 real-world tests, encompassing more than 150 instances of object grasping. As supplementary data \cite{supplementary_material}, we provide detailed information, including accounts of each experiment and their corresponding video recordings.
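The sketch below (Python) shows how the two modules could be composed into the tabletop-clearing loop described above, under our own naming assumptions: SkillDetectionModule, SkillLocationModule, camera, robot, and execute_skill are hypothetical placeholders for the trained networks and the robot interface, not the project’s actual API.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SkillPrediction:
    object_id: int
    skill: str          # e.g. "simple_pick", "push_to_vertical", "slide_to_edge"
    confidence: float   # confidence reported by the Skill Detection Module


class SkillDetectionModule:
    """Placeholder for the learned model that maps an RGB-D image of the scene
    to a picking skill and a confidence value for every detected object."""

    def predict(self, rgbd_image) -> List[SkillPrediction]:
        raise NotImplementedError("replace with the trained network")


class SkillLocationModule:
    """Placeholder for the attention gate-based network that outputs where on
    the object the chosen skill should be applied (an image/contact location)."""

    def predict(self, rgbd_image, prediction: SkillPrediction) -> Tuple[int, int]:
        raise NotImplementedError("replace with the trained network")


def clear_tabletop(camera, robot, detector: SkillDetectionModule,
                   locator: SkillLocationModule) -> None:
    """Repeatedly act on the object with the highest-confidence skill
    prediction until no objects remain in view."""
    while True:
        rgbd_image = camera.capture()                        # current scene
        predictions = detector.predict(rgbd_image)
        if not predictions:
            break                                            # table is clear
        best = max(predictions, key=lambda p: p.confidence)  # object to pick first
        contact = locator.predict(rgbd_image, best)          # where to apply the skill
        robot.execute_skill(best.skill, contact)             # execute the chosen skill
```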
Published papers:
Leveraging Dexterous Picking Skills for Complex Multi-Object Scenes
Anagha Rajendra Dangle; Mihir Pradeep Deshmukh; Denny Boby; Berk Calli
IEEE-RAS International Conference on Humanoid Robots, 2024
[Paper]