I build multimodal AI systems for general intelligence.

Former Chief Scientist at Luma AI. My work focuses on foundation models that reason across modalities, generate rich worlds, and turn human intent into controllable outputs.

Google Scholar | GitHub | X / Twitter | Email

Recent Work

Uni-1

A multimodal reasoning model for image generation and editing. Uni-1 is built around understanding intention, following directions, spatial reasoning, reference-guided generation, and culture-aware visual creation.

Ray

Luma's family of video generation models focused on coherent motion, camera-aware generation, and creative control.

Diffusion Models

I created DDIM, an early accelerated sampler for diffusion models, and worked on image-to-image generation, inverse problems, and controllable generative modeling.

Background

At Luma, I led research through the pivots from 3D generation models with Genie to video generation models with Ray, and then to unified multimodal models with Uni-1. Throughout those shifts, I worked on the full modeling stack, including model architectures, training infrastructure, and data processing.

Before Luma, I was a research scientist at NVIDIA Research, working on multimodal generation and AI foundation model projects. Before that, I was a postdoc at Stanford with Stefano Ermon; I also completed my Ph.D. in Computer Science at Stanford.

For the complete publication list, please see Google Scholar.

Writing

Inference-Time Scaling for Generative Pre-Training

A short note on the false dichotomy between autoregression and diffusion, why flow maps help, and what this might mean for language models.