A Survey of LLM-Based Works in Robotics

  1. Robot Manipulation
  2. MultiModal Perception

Robot Manipulation

Text2Motion: From Natural Language Instructions to Feasible Plans (arxiv.org)

Text2Motion is a language-based planning framework for sequential manipulation: it pairs LLM task planning with learned skill feasibility checks, so that a generated plan is verified to be geometrically executable before the robot acts.

microsoft/ChatGPT-Robot-Manipulation-Prompts (github.com)

This repository provides a set of prompts for OpenAI's ChatGPT that enable natural language communication between humans and robots for task execution. The prompts let ChatGPT translate natural language instructions into a sequence of executable robot actions, with a focus on manipulation tasks, and are designed to be easy to customize and to integrate with existing robot control and visual recognition systems. For more details, see the authors' blog post and their paper, ChatGPT Empowered Long-Step Robot Control in Various Environments: A Case Application.
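As a rough illustration of this instruction-to-action pattern (not the repository's actual prompts), the sketch below asks a chat model to translate an instruction into a JSON action sequence drawn from a fixed vocabulary. The system prompt, action names, and output schema are hypothetical placeholders.

```python
import json
from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical system prompt in the spirit of the repository: constrain the
# model to a fixed action vocabulary and a machine-readable output format.
SYSTEM_PROMPT = """You are a robot task planner.
Translate the user's instruction into a JSON array of actions.
Allowed actions: move_to(object), grasp(object), release(), open(object), close(object).
Respond with JSON only, e.g. [{"action": "move_to", "args": ["apple"]}].
"""

def instruction_to_actions(instruction: str) -> list[dict]:
    """Convert a natural language instruction into a robot action sequence."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": instruction},
        ],
    )
    return json.loads(response.choices[0].message.content)

print(instruction_to_actions("Put the apple in the fridge."))
# Expected shape (actual model output will vary):
# [{"action": "open", "args": ["fridge"]}, {"action": "move_to", "args": ["apple"]}, ...]
```

In practice the returned sequence would be validated against the robot's skill library and the scene state from the visual recognition system before execution, which is the role the repository's prompt design plays.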

MultiModal Perception

facebookresearch/ImageBind: ImageBind One Embedding Space to Bind Them All (github.com)

ImageBind learns a joint embedding across six modalities: images, text, audio, depth, thermal, and IMU data. It enables novel emergent applications "out of the box", including cross-modal retrieval, composing modalities with arithmetic, and cross-modal detection and generation.
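The sketch below follows the usage example from the ImageBind README: each modality is preprocessed by its own transform, embedded into the shared space in a single forward pass, and compared by dot product for cross-modal retrieval. The asset file paths are placeholders.

```python
import torch
from imagebind import data
from imagebind.models import imagebind_model
from imagebind.models.imagebind_model import ModalityType

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the pretrained imagebind_huge checkpoint (downloads on first use).
model = imagebind_model.imagebind_huge(pretrained=True)
model.eval()
model.to(device)

# Placeholder inputs: one text, image, and audio sample per concept.
text_list = ["a dog", "a car", "a bird"]
image_paths = ["assets/dog_image.jpg", "assets/car_image.jpg", "assets/bird_image.jpg"]
audio_paths = ["assets/dog_audio.wav", "assets/car_audio.wav", "assets/bird_audio.wav"]

# Each modality gets its own preprocessing, but all are embedded jointly.
inputs = {
    ModalityType.TEXT: data.load_and_transform_text(text_list, device),
    ModalityType.VISION: data.load_and_transform_vision_data(image_paths, device),
    ModalityType.AUDIO: data.load_and_transform_audio_data(audio_paths, device),
}

with torch.no_grad():
    embeddings = model(inputs)

# Cross-modal retrieval: similarity between image and text embeddings.
sim = torch.softmax(
    embeddings[ModalityType.VISION] @ embeddings[ModalityType.TEXT].T, dim=-1
)
print(sim)  # rows: images, columns: texts; diagonal should dominate
```

Because every modality lands in the same space, the same dot-product comparison works for any pair (e.g. audio against text), which is what makes the emergent cross-modal applications possible without retraining.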