
Video-Action Models: Are video model backbones the future of VLAs?
This blog post is about mimic-video, our latest mimic release, in which we instantiate a new class of Video-Action Models (VAMs) that ground robotic policies in pretrained video models. We argue that video model backbones can be a much more natural choice than VLM backbones for pre-training robotics foundation models.

Do you want to work with us on this and more? We just raised $16M and are actively hiring!

Preface

Vision-Language-Action Models (VLAs) have taken the robotics world by storm over the past two years....
