Waterloo | Jun 15, 2026 | arXiv:2606.16870

Latent Space Reinforcement Learning for Inverse Material Estimation in Food Fracture Simulation

Adrian Ramlal, Yuhao Chen, John S. Zelek

AI Summary

This study addresses the estimation of material parameters for food fracture simulation, using orange peeling as a case study. The authors train a neural surrogate on 2,000 forward simulations and then train a goal-conditioned Proximal Policy Optimization (PPO) policy against it. Operating in a normalizing flow latent space, the policy achieves 0.642 recovery when validated through the simulator, a 23% improvement over operating in the original 9-dimensional parameter space. Because the policy is conditioned on the target, it can estimate parameters for new targets, such as oranges with different material properties, without retraining.

Key Contribution

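A goal-conditioned PPO policy operating in a normalizing flow latent space improves material parameter recovery by 23% over the original parameter space, and warm-starting CMA-ES refinement from the policy's output raises recovery further to 0.828 with 540 evaluations.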

Abstract

Realistic visual simulation of food manipulation requires accurate material parameters, yet these are difficult to measure directly and vary across the heterogeneous regions of a single food item. We address the inverse problem of estimating material parameters from a target description of fracture behavior in a non-differentiable continuum damage mechanics simulator. Using orange peeling as a test case, we train a neural surrogate on 2,000 forward simulations and compare Covariance Matrix Adaptation Evolution Strategy (CMA-ES, a gradient-free evolutionary optimizer) with Proximal Policy Optimization (PPO, a reinforcement learning algorithm) across the original 9-dimensional parameter space and two learned 4-dimensional latent representations. Since different oranges have different material properties, a practical inverse system must handle arbitrary targets without retraining. We train a goal-conditioned PPO policy that learns a general inverse mapping: given any target description of peeling behavior, the policy produces a material parameter estimate in a single forward pass (8 surrogate evaluations, approximately 10ms). Operating in a normalizing flow latent space with a shared surrogate evaluator, the goal-conditioned policy achieves 0.642 actual recovery when validated through the simulator, outperforming the original parameter space by 23%. A warm-start extension that initializes CMA-ES refinement from the policy's output further improves recovery to 0.828 with 540 evaluations. These findings provide a practical framework for inverse food physics and lay groundwork for vision-driven material identification from video observations of food manipulation.
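The warm-start idea in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `toy_surrogate` is a hypothetical stand-in for the neural surrogate, `policy_guess` is a hypothetical stand-in for the goal-conditioned policy's output, squared descriptor error replaces the paper's recovery metric (which the abstract does not define), and a simplified isotropic evolution strategy is swapped in for CMA-ES. The search also runs directly in a 9-dimensional parameter space rather than in a learned latent space.

```python
import random


def toy_surrogate(params):
    """Hypothetical surrogate: maps 9 material parameters to 3 behavior descriptors."""
    a = sum(params[0:3]) / 3.0
    b = sum(p * p for p in params[3:6]) / 3.0
    c = params[6] * params[7] + params[8]
    return [a, b, c]


def descriptor_error(params, target):
    """Squared error between predicted and target descriptors (assumed metric)."""
    pred = toy_surrogate(params)
    return sum((p - t) ** 2 for p, t in zip(pred, target))


def warm_start_es(start, target, budget=540, pop_size=12, sigma=0.1, seed=0):
    """Simplified evolution strategy (not CMA-ES) initialized at `start`.

    Returns the best parameters found, their error, and the number of
    surrogate evaluations used, which never exceeds `budget`.
    """
    rng = random.Random(seed)
    mean = list(start)
    best, best_err = list(start), descriptor_error(start, target)
    evals = 1
    while evals + pop_size <= budget:
        population = []
        for _ in range(pop_size):
            # Sample around the current mean and clip to the unit box.
            cand = [min(1.0, max(0.0, m + sigma * rng.gauss(0.0, 1.0))) for m in mean]
            population.append((descriptor_error(cand, target), cand))
        evals += pop_size
        population.sort(key=lambda item: item[0])
        # Move the mean to the average of the best quarter of the population.
        elite = [cand for _, cand in population[: pop_size // 4]]
        mean = [sum(column) / len(elite) for column in zip(*elite)]
        if population[0][0] < best_err:
            best_err, best = population[0]
        sigma *= 0.97  # slowly shrink the search radius
    return best, best_err, evals


# Hypothetical ground truth, used only to build a target description.
true_params = [0.2, 0.5, 0.8, 0.3, 0.6, 0.4, 0.7, 0.1, 0.9]
target = toy_surrogate(true_params)

# Hypothetical policy output used as the warm start.
policy_guess = [0.5] * 9
initial_err = descriptor_error(policy_guess, target)
refined, refined_err, used = warm_start_es(policy_guess, target)
print(f"error before: {initial_err:.6f}, after: {refined_err:.6f}, evaluations: {used}")
```

The design point the sketch tries to capture is that refinement starts from the policy's estimate rather than from a random point, so the evaluation budget is spent improving an already plausible guess. The default budget of 540 mirrors the figure quoted in the abstract; the population size, step size, and decay rate are arbitrary choices.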

Robotics & Embodied AI | Scientific Discovery & Drug Design

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
