Disentangling perception and reasoning for improving data efficiency in learning cloth manipulation without demonstrations

1University of Genova, 2Istituto Italiano di Tecnologia, 3Chinese University of Hong Kong

Abstract

Cloth manipulation is a ubiquitous task in everyday life, but it remains an open challenge for robotics. The difficulty of developing cloth manipulation policies is attributed to the high-dimensional state space, complex dynamics, and high propensity for self-occlusion exhibited by fabrics. As analytical methods have not been able to provide robust and general manipulation policies, reinforcement learning (RL) is seen as a promising approach to these problems.

However, to address the large state space and the complex dynamics, data-driven methods usually rely on large models and long training times. The resulting computational cost significantly hampers the development and adoption of these methods. Additionally, because robust state estimation is challenging, garment manipulation policies usually adopt an end-to-end learning approach with workspace images as input. While this approach allows for a conceptually trivial sim-to-real transfer via real-world fine-tuning, it also imposes a significant computational cost by training agents on a very lossy representation of the environment state.

This paper questions this common design choice by exploring an efficient and modular approach to RL for cloth manipulation. We show that, through careful design choices, we can significantly reduce model size and training time when learning in simulation. Additionally, we show how our optimal simulation-based model can be transferred to the real world. We evaluate our work on the SoftGym benchmark and achieve significant performance improvements over available baselines on our task, with a significantly smaller model.

BibTeX

@misc{delehelle2026disentanglingperceptionreasoningimproving,
      title={Disentangling perception and reasoning for improving data efficiency in learning cloth manipulation without demonstrations}, 
      author={Donatien Delehelle and Fei Chen and Darwin Caldwell},
      year={2026},
      eprint={2601.21713},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2601.21713}, 
}