Ego2Allo

Developed a training and reasoning framework that strengthens visual perspective-taking in vision-language models (VLMs) by combining supervised fine-tuning (SFT) with reinforcement learning via GRPO, achieving a 10% accuracy improvement on spatial reasoning benchmarks. Further integrated a ReAct-based, function-calling agent pipeline that handles perspective changes through explicit, structured reasoning steps, yielding an additional 3% accuracy gain and more reliable viewpoint-aware predictions.
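
As an illustration of the kind of perspective change such an agent could delegate to a function call, the sketch below re-expresses a point from the camera (egocentric) frame in the frame of a reference entity (allocentric) on a 2D ground plane. It is a minimal sketch under stated assumptions, not the project's actual tool: the names `Pose2D`, `ego_to_allo`, and `relation`, the 2D simplification, and the convention that the camera frame has x to the right and y pointing away from the camera are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Pose2D:
    """Hypothetical reference entity: position and facing direction in the camera frame."""
    x: float
    y: float
    heading: float  # radians, counterclockwise from the camera frame's +x axis


def ego_to_allo(point, ref):
    """Return (forward, left) coordinates of a camera-frame point in the reference's frame."""
    dx = point[0] - ref.x
    dy = point[1] - ref.y
    c, s = math.cos(ref.heading), math.sin(ref.heading)
    forward = c * dx + s * dy   # projection onto the reference's facing direction
    left = -s * dx + c * dy     # projection onto the reference's left-hand direction
    return forward, left


def relation(point, ref, eps=1e-9):
    """Describe where the point lies from the reference's own viewpoint."""
    forward, left = ego_to_allo(point, ref)
    side = "left" if left > eps else "right" if left < -eps else "center"
    depth = "front" if forward > eps else "behind" if forward < -eps else "level"
    return side, depth


if __name__ == "__main__":
    # A reference entity at the origin that faces away from the camera (+y).
    ref = Pose2D(0.0, 0.0, math.pi / 2)
    print(relation((1.0, 0.0), ref))
```

In a ReAct-style loop, a function like `relation` would be exposed as a callable tool, so the model reasons about which reference viewpoint to adopt and leaves the geometry to deterministic code.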