
Learning to See Transparent Objects

ClearGrasp uses three neural networks: one to estimate surface normals, one to detect occlusion boundaries (depth discontinuities), and one to mask transparent objects.

Optical 3D range sensors, like RGB-D cameras and LIDAR, have found widespread use in robotics, from self-driving cars to autonomous manipulators, to generate rich and accurate 3D maps of the environment. However, despite the ubiquity of these complex robotic systems, transparent objects (like a glass container) can confound even a suite of expensive, commonly used sensors. This is because optical 3D sensors are driven by algorithms that assume all surfaces are Lambertian, i.e., that they reflect light evenly in all directions, resulting in uniform surface brightness from every viewing angle. Transparent objects violate this assumption, since their surfaces both refract and reflect light. As a result, most of the depth data from transparent objects is invalid or contains unpredictable noise.
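To make the Lambertian assumption concrete, here is a minimal shading sketch (illustrative only, not ClearGrasp code): the brightness of an ideal diffuse surface point depends only on the angle between its normal and the light direction, never on the viewpoint, which is precisely the property that refracting and reflecting transparent surfaces break.

```python
import numpy as np

def lambertian_brightness(normal, light_dir, albedo=0.8):
    """Ideal diffuse (Lambertian) shading of a single surface point.

    Note that the viewing direction never enters the computation: the
    point looks equally bright from every angle, which is the assumption
    optical depth sensors rely on and transparent surfaces violate.
    """
    n = normal / np.linalg.norm(normal)
    l = light_dir / np.linalg.norm(light_dir)
    return albedo * max(0.0, float(n @ l))

# Same brightness no matter where the camera is placed.
print(lambertian_brightness(np.array([0.0, 0.0, 1.0]),
                            np.array([0.5, 0.0, 1.0])))
```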


Enabling machines to better sense transparent surfaces would not only improve safety, but could also open up a range of new interactions in unstructured applications — from robots handling kitchenware or sorting plastics for recycling, to navigating indoor environments or generating AR visualizations on glass tabletops.

To overcome this issue, we created our own large-scale dataset of transparent objects that contains more than 50,000 photorealistic renders with corresponding surface normals (representing the surface curvature), segmentation masks, edges, and depth, useful for training a variety of 2D and 3D detection tasks. Each image contains up to five transparent objects, either on a flat ground plane or inside a tote, with various backgrounds and lighting.
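As a rough sketch of what one training sample bundles together, the loader below uses hypothetical file names and encodings; the actual dataset layout is documented on the dataset page linked at the end.

```python
import cv2  # opencv-python
import numpy as np

def load_sample(stem):
    """Load one synthetic training sample.

    The file names and encodings here are illustrative stand-ins,
    not the dataset's exact format.
    """
    rgb = cv2.imread(f"{stem}-rgb.png")                                # photorealistic render
    depth = cv2.imread(f"{stem}-depth.png",
                       cv2.IMREAD_ANYDEPTH).astype(np.float32)         # ground-truth depth map
    normals = cv2.imread(f"{stem}-normals.png").astype(np.float32)
    normals = normals / 127.5 - 1.0                                    # decode RGB to unit vectors
    mask = cv2.imread(f"{stem}-mask.png", cv2.IMREAD_GRAYSCALE) > 0    # transparent-object mask
    edges = cv2.imread(f"{stem}-edges.png", cv2.IMREAD_GRAYSCALE) > 0  # occlusion boundaries
    return rgb, depth, normals, mask, edges
```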

Some example data of transparent objects from the ClearGrasp synthetic dataset.

We also include a test set of 286 real-world images with corresponding ground-truth depth. The real-world images were captured through a painstaking process of replacing each transparent object in the scene with an identically posed, painted duplicate. The images span a number of different indoor lighting conditions, use various cloth and veneer backgrounds, and contain random opaque objects scattered around the scene. They include both known objects, present in the synthetic training set, and novel objects.

Left: The real-world image capturing setup. Middle: A custom user interface enables precisely replacing each transparent object with a spray-painted duplicate. Right: Example of captured data.

The Challenge

While the distorted view of the background seen through transparent objects confounds typical depth estimation approaches, there are clues that hint at the objects' shape. Transparent surfaces exhibit specular reflections: mirror-like reflections that show up as bright spots in a well-lit environment. Because these visual cues are prominent in RGB images and are influenced primarily by the shape of the objects, convolutional neural networks can use them to infer accurate surface normals, which can then be used for depth estimation.

Specular reflections on transparent objects create distinct features that vary with object shape and provide strong visual cues for estimating surface normals.

Most machine learning algorithms try to estimate depth directly from a monocular RGB image. However, monocular depth estimation is an ill-posed task, even for humans. We observed large errors in estimating the depth of flat background surfaces, which compounds the error in depth estimates for the transparent objects resting atop them. Therefore, rather than directly estimating the depth of all geometry, we conjectured that correcting the initial depth estimates from an RGB-D camera is more practical: it would let us use the depth of the non-transparent surfaces to inform the depth of the transparent ones.
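This depth-correction idea can be written down as a sparse linear least-squares problem: keep the observed depth wherever the sensor returned valid readings, and elsewhere require the depth gradients to agree with the predicted surface normals, dropping that constraint across predicted occlusion boundaries. The solver below is a deliberately simplified, screen-space sketch of that idea, not the actual ClearGrasp global-optimization code.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import lsqr

def complete_depth(depth0, valid, normals, boundaries,
                   w_data=1.0, w_normal=0.5):
    """Simplified depth completion as sparse least squares.

    depth0:     initial sensor depth (H, W); unreliable on transparent pixels
    valid:      bool (H, W), True where depth0 is trustworthy
    normals:    predicted unit normals (H, W, 3), camera frame
    boundaries: bool (H, W), True on predicted occlusion boundaries

    Didactic approximation: uses screen-space gradients and ignores
    camera intrinsics, unlike a real global-optimization solver.
    """
    h, w = depth0.shape
    idx = np.arange(h * w).reshape(h, w)
    rows, cols, vals, rhs = [], [], [], []
    r = 0
    # Data term: stay close to the observed depth on valid pixels.
    for y, x in zip(*np.nonzero(valid)):
        rows.append(r); cols.append(idx[y, x]); vals.append(w_data)
        rhs.append(w_data * depth0[y, x]); r += 1
    # Normal term: for a surface z(x, y) with normal (nx, ny, nz),
    # dz/dx ~ -nx/nz and dz/dy ~ -ny/nz (up to sign conventions).
    nz = np.clip(normals[..., 2], 1e-3, None)
    gx = -normals[..., 0] / nz
    gy = -normals[..., 1] / nz
    for y in range(h):
        for x in range(w - 1):
            if boundaries[y, x]:  # don't constrain across depth edges
                continue
            rows += [r, r]; cols += [idx[y, x + 1], idx[y, x]]
            vals += [w_normal, -w_normal]
            rhs.append(w_normal * gx[y, x]); r += 1
    for y in range(h - 1):
        for x in range(w):
            if boundaries[y, x]:
                continue
            rows += [r, r]; cols += [idx[y + 1, x], idx[y, x]]
            vals += [w_normal, -w_normal]
            rhs.append(w_normal * gy[y, x]); r += 1
    A = sparse.csr_matrix((vals, (rows, cols)), shape=(r, h * w))
    z = lsqr(A, np.asarray(rhs, dtype=np.float64))[0]
    return z.reshape(h, w)
```

Cutting the normal constraint at occlusion boundaries is what lets the reconstruction keep sharp depth discontinuities between objects and the background instead of smoothing across them.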


Google research: https://ai.googleblog.com/2020/02/learning-to-see-transparent-objects.html

Code: https://github.com/Shreeyak/cleargrasp

Dataset: https://sites.google.com/view/transparent-objects

3D Shape Estimation of Transparent Objects for Manipulation: https://sites.google.com/view/cleargrasp