AdaptBot: Combining LLM with Knowledge Graphs and Human Input for Generic-to-Specific Task Decomposition and Knowledge Refinement

¹Robotics Research Center, IIIT Hyderabad, India · ²TCS Research, Tata Consultancy Services, India · ³School of Informatics, University of Edinburgh, UK
Teaser Image

For any given task, an LLM provides a generic sequence of abstract actions that is refined using the domain-specific knowledge in a KG. If the sequence refers to objects, attributes, or actions that cannot be resolved using the KG, or leads to unexpected outcomes, human input helps refine or expand the KG.
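To make this concrete, below is a minimal, hypothetical sketch of the kind of check a KG enables: each abstract action proposed by an LLM is validated against domain knowledge of objects, their attributes, and action preconditions, and anything that cannot be resolved is flagged for refinement. The `KnowledgeGraph` class and `resolve_action` method are illustrative names, not the paper's implementation.

```python
# Hypothetical sketch: validating LLM-proposed abstract actions against a small
# domain KG. All names and contents here are illustrative only.

class KnowledgeGraph:
    def __init__(self):
        # object -> attributes it has; action -> attributes it requires
        self.objects = {"tomato": {"choppable", "cookable"},
                        "egg": {"crackable", "cookable"},
                        "pan": {"heatable", "container"}}
        self.actions = {"chop": {"choppable"}, "crack": {"crackable"},
                        "cook": {"cookable"}}

    def resolve_action(self, action, obj):
        """Return None if (action, object) is consistent with the KG,
        otherwise a string describing the mismatch."""
        if obj not in self.objects:
            return f"unknown object: {obj}"
        if action not in self.actions:
            return f"unknown action: {action}"
        missing = self.actions[action] - self.objects[obj]
        return f"{obj} lacks attributes {missing}" if missing else None


kg = KnowledgeGraph()
plan = [("chop", "tomato"), ("crack", "egg"), ("chop", "pan")]  # sequence from the LLM
for action, obj in plan:
    error = kg.resolve_action(action, obj)
    print(action, obj, "->", error or "ok")   # "chop pan" is flagged for refinement
```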

Abstract

An embodied agent assisting humans is often asked to complete new tasks, and there may not be sufficient time or labeled examples to train the agent to perform these new tasks. Large Language Models (LLMs) trained on considerable knowledge across many domains can be used to predict a sequence of abstract actions for completing such tasks, although the agent may not be able to execute this sequence due to task-, agent-, or domain-specific constraints. Our framework addresses these challenges by leveraging the generic predictions provided by an LLM and the prior domain knowledge encoded in a Knowledge Graph (KG), enabling an agent to quickly adapt to new tasks. The agent also solicits and uses human input as needed to refine its existing knowledge. Based on experimental evaluation in the context of cooking and cleaning tasks in simulation domains, we demonstrate that the interplay between the LLM, KG, and human input leads to substantial performance gains compared with just using the LLM.

Baselines

Baseline Model 1
Baseline Model 2

Our Model

Our Pipeline
Block Diagram of Our Pipeline
Framework overview for cooking tasks: (a) The input Chain-of-Thought (CoT) prompt contains the target dish, the available ingredients, and an example input and output action sequence (for the task of making coffee), and is used to obtain an output action sequence; (b) Any mismatches (e.g., in object classes, actions) between the LLM output and the KG are identified, and the action sequence is revised if possible; (c) The agent attempts to resolve any remaining errors or unexpected outcomes by re-prompting the LLM, with errors that persist being addressed by soliciting human input and updating the KG; (d) The revised/corrected action sequence is executed.
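The self-contained sketch below traces steps (a)-(d) as a single control loop. It is only illustrative: the LLM and the human are simulated by stubs, and names such as `query_llm`, `validate_with_kg`, and `ask_human` are assumptions rather than the framework's actual API.

```python
# Hypothetical control-flow sketch of the pipeline in the figure above.

def query_llm(prompt, feedback=None):
    # Stub: a real call would send the CoT prompt and parse the LLM's reply.
    if feedback:                                  # re-prompt with detected errors
        return [("crack", "egg"), ("cook", "egg")]
    return [("crack", "egg"), ("chop", "egg")]    # initial (flawed) plan

def validate_with_kg(plan, kg):
    # Flag steps whose action/object pair is not supported by the KG.
    errors = [(a, o) for a, o in plan
              if a not in kg["actions"] or o not in kg["objects"]
              or not kg["actions"][a] <= kg["objects"][o]]
    return plan, errors

def ask_human(errors):
    # Stub: a real agent would describe the unresolved steps and record the answer.
    return {"egg": {"crackable", "cookable", "choppable"}}

kg = {"objects": {"egg": {"crackable", "cookable"}},
      "actions": {"crack": {"crackable"}, "cook": {"cookable"}, "chop": {"choppable"}}}

# (a) Prompt the LLM with the task, ingredients, and a worked example.
plan = query_llm("make an omelette with: egg, tomato; example: make coffee")
for _ in range(2):
    # (b) Identify mismatches between the LLM output and the KG.
    plan, errors = validate_with_kg(plan, kg)
    if not errors:
        break
    # (c) Re-prompt the LLM with the errors that were found.
    plan = query_llm("...", feedback=errors)
else:
    # Persistent errors: solicit human input and use it to expand the KG.
    kg["objects"].update(ask_human(errors))

print("executable plan:", plan)   # (d) hand the corrected sequence to the agent
```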

Video

Tasks performed in the Simulation

The following are some examples of tasks performed in the simulation environment.

Preparing Breakfast

The agent prepares an omelette, toast, and coffee, and serves them on the table.
Cooking Tomato Omelette

Here, the agent is shown making a tomato omelette.
Cleaning Dishes

Here, the agent cleans a dish using a dish sponge and places it back on the countertop.
Rearranging Toys

The agent returns the toy dog and the ball to their respective locations.