Embodied Intelligence

Introduction

Embodied Intelligence: Technical Roadmap

Scene understanding, segmentation, and detection:

SAM
Open-Voc Detection
SAM3D
Open-Voc Detection in Point Cloud

GOAT: GO to Any Thing

This is a navigation task.

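Uses Mask R-CNN instance segmentation for object detection and pixel-level segmentation
Uses MiDaS monocular depth estimation to repair the RGB-D sensor data
Projects the segmented RGB-D data into a semantic map to map the environment
Uses SuperGlue for image matching
Uses CLIP for text-image matching
Uses Mistral 7B to extract the object category from complex instructions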
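
The mapping step above (projecting segmented RGB-D pixels into a semantic map) can be illustrated with a minimal sketch. It is not the GOAT implementation: the function name `project_to_semantic_map`, the pinhole parameters `fx` and `cx`, the grid resolution `cell_size`, and the majority-vote rule per cell are all assumptions made for illustration, and the camera is assumed to sit at the map origin looking along the depth axis.

```python
from collections import Counter, defaultdict


def project_to_semantic_map(depth, labels, fx, cx, cell_size=0.05, max_depth=5.0):
    """Project labelled depth pixels into a top-down grid (hypothetical helper).

    depth:  2D list of metric depths; 0 or None marks a missing reading
    labels: 2D list of class names (None = background), same shape as depth
    Returns a dict mapping (lateral_cell, forward_cell) to the most frequent class.
    """
    votes = defaultdict(Counter)
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            label = labels[v][u]
            # Skip background pixels, missing depth, and readings beyond range.
            if label is None or not z or z > max_depth:
                continue
            # Pinhole back-projection: lateral offset in the camera frame (metres).
            x = (u - cx) * z / fx
            cell = (int(x // cell_size), int(z // cell_size))
            votes[cell][label] += 1
    # Each cell takes the class that received the most pixel votes.
    return {cell: counts.most_common(1)[0][0] for cell, counts in votes.items()}


depth = [[1.0, 1.0], [0, 2.0]]
labels = [["chair", "chair"], ["chair", None]]
semantic_map = project_to_semantic_map(depth, labels, fx=1.0, cx=0.0, cell_size=0.5)
```

A real system would also apply the camera pose to place each observation in a global frame and would accumulate votes across many frames; this sketch handles a single frame only.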

2024 GOAT-Bench

2024 OK-Robot

[1] P. Liu, Y. Orru, J. Vakil, C. Paxton, N. M. M. Shafiullah, and L. Pinto, OK-Robot: What Really Matters in Integrating Open-Knowledge Models for Robotics, Feb. 29, 2024, arXiv:2401.12202.

https://ok-robot.github.io

Implements navigation plus grasping in open environments.

Uses AnyGrasp to generate grasp candidates
Uses Lang-SAM to segment the mask of the object named in the text query
Selects the final grasp pose within the mask using rules
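
The three steps above can be sketched as a small filter-then-rank routine. This is only an illustration under stated assumptions: the `GraspCandidate` type and `select_grasp` function are hypothetical and do not come from AnyGrasp, Lang-SAM, or OK-Robot, and the rule used here (keep candidates whose projected point falls inside the mask, then take the highest score) is a stand-in for whatever rules the paper actually applies.

```python
from dataclasses import dataclass


@dataclass
class GraspCandidate:
    u: int        # pixel column of the grasp point projected into the image
    v: int        # pixel row of the grasp point projected into the image
    score: float  # quality score assigned by the grasp generator


def select_grasp(candidates, mask):
    """Return the best-scoring candidate inside the mask, or None (hypothetical rule).

    mask is a 2D list of truthy/falsy values indexed as mask[row][col].
    """
    inside = [
        g for g in candidates
        if 0 <= g.v < len(mask) and 0 <= g.u < len(mask[0]) and mask[g.v][g.u]
    ]
    if not inside:
        return None
    return max(inside, key=lambda g: g.score)


mask = [[0, 1],
        [0, 1]]
candidates = [
    GraspCandidate(u=0, v=0, score=0.9),  # best score, but outside the mask
    GraspCandidate(u=1, v=0, score=0.5),
    GraspCandidate(u=1, v=1, score=0.7),
]
best = select_grasp(candidates, mask)
```

Filtering before ranking means a high-scoring grasp on a neighbouring object is never chosen over a lower-scoring grasp on the requested one.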

2024 An Embodied Generalist Agent in 3D World

2023 Vid2Robot


2024 MimicPlay

[1] C. Wang et al., MimicPlay: Long-Horizon Imitation Learning by Watching Human Play, Oct. 13, 2023, arXiv:2302.12422. [Online]. Available: http://arxiv.org/abs/2302.12422

Embodied Intelligence

Introduction

GOAT: GO to Any Thing

2024 GOAT-Bench

2024 OK-Robot

2024 An Embodied Generalist Agent in 3D World

2023 Vid2Robot

2024 MimicPlay

2023 ManipLLM

2024 ManipVQA

Look Before You Leap

HumanPlus

2024 3D Diffuser Actor

References
