Kaan Akan

Adil Kaan Akan

I am a Research Scientist on the post-training team at fal.ai, where I work on large-scale diffusion models for image and video generation.

I received my PhD and MSc from Koç University (PhD Thesis, MSc Thesis), with my doctoral work on object-centric learning and compositional image and video generation. I completed my Bachelor's at Middle East Technical University.

Email / CV / Google Scholar / GitHub / LinkedIn / X (Twitter)

Research

My research centers on diffusion models and generative AI for image and video generation, spanning controllability, efficiency, and post-training.

	Exact Posterior Score Estimation for Solving Linear Inverse Problems Abbas Mammadov, Ozgur Kara, Kaan Oktay, Iskander Azangulov, Adil Kaan Akan, Hyungjin Chung, James Matthew Rehg, Yee Whye Teh arXiv preprint, 2026 Preprint / Project Website / Code / bibtex We derive the exact posterior score in closed form for linear Gaussian inverse problems and turn it into EPS, a denoising training objective that preserves the input/output structure of standard pretraining. At inference, EPS reuses the backbone's sampler with no likelihood gradients or projections, outperforming training-free and training-based baselines while using roughly an order of magnitude fewer denoiser evaluations.
	VideoMLA: Low-Rank Latent KV Cache for Minute-Scale Autoregressive Video Diffusion Hidir Yesiltepe, Jiazhen Hu, Tuna Han Salih Meral, Adil Kaan Akan, Kaan Oktay, Hoda Eldardiry, Pinar Yanardag arXiv preprint, 2026 Preprint / Project Website / Code / bibtex We present the first study of Multi-Head Latent Attention (MLA) in video diffusion. VideoMLA replaces per-head keys and values with a shared low-rank content latent and a decoupled 3D-RoPE positional key, cutting per-token KV memory by 92.7% at every cached layer. On VBench it matches short-horizon streaming baselines, achieves the best overall score at long horizons among evaluated methods, and improves throughput by 1.23x.
	Aligning Latent Geometry for Spherical Flow Matching in Image Generation Tuna Han Salih Meral, Kaan Oktay, Hidir Yesiltepe, Adil Kaan Akan, Pinar Yanardag arXiv preprint, 2026 Preprint / Project Website / bibtex We show that VAE latents and Gaussian noise both concentrate on thin spherical shells, where decoded content is carried predominantly by direction rather than radius. By projecting latents onto a fixed token radius and replacing linear interpolation with spherical interpolation, our geodesic flow-matching paths stay on the sphere at every timestep and consistently improve class-conditional ImageNet-256 FID without changing the diffusion architecture.
	Infinity-RoPE: Action-Controllable Infinite Video Generation Emerges From Autoregressive Self-Rollout Hidir Yesiltepe, Tuna Han Salih Meral, Adil Kaan Akan, Kaan Oktay, Pinar Yanardag CVPR 2026 Preprint / Project Website / Code / bibtex ∞-RoPE enables infinite-horizon, action-controllable video generation through Block-Relativistic RoPE, KV Flush, and RoPE Cut—training-free techniques that overcome temporal horizon limits, enable fine-grained action control, and support cinematic scene transitions within a single autoregressive rollout.
	Compositional Video Synthesis by Temporal Object-Centric Learning Adil Kaan Akan, Yücel Yemez Under Review at IEEE TPAMI Preprint / Project Website / bibtex We extend SlotAdapt to video by learning temporally consistent object-centric slots and conditioning them on pretrained diffusion models for compositional video synthesis. Our approach enables intuitive editing capabilities like object insertion, deletion, or replacement while maintaining consistent identities across frames. Experiments demonstrate superior video generation quality and temporal consistency, uniquely integrating segmentation with robust generative performance.
	Slot-Guided Adaptation of Pre-trained Diffusion Models for Object-Centric Learning and Compositional Generation Adil Kaan Akan, Yücel Yemez ICLR 2025 Preprint / Project Website / bibtex SlotAdapt combines slot attention with pretrained diffusion models through adapters for slot-based conditioning. By adding a guidance loss to align cross-attention with slot attention, our model better identifies objects without external supervision. Experiments show superior performance in object discovery and image generation, particularly on complex real-world images.
	ADAPT: Efficient Multi-Agent Trajectory Prediction with Adaptation Gorkay Aydemir, Adil Kaan Akan, Fatma Guney ICCV 2023 Preprint / Project Website / Code / bibtex We propose a novel trajectory prediction method that adapts to every agent in the shared scene. We exploit dynamic weight learning to adapt to each agent's state separately, predicting all future trajectories simultaneously without rotating and normalizing the scene frame. Our method achieves state-of-the-art performance on the Argoverse and INTERACTION datasets with impressive runtime performance.
	StretchBEV: Stretching Future Instance Prediction Spatially and Temporally Adil Kaan Akan, Fatma Guney ECCV 2022 Preprint / Project Website / Code / bibtex We propose a novel method for future instance segmentation in bird's-eye view space. We exploit state-space models for future state prediction, encoding 3D scene structure and decoding future instance segmentations. Our method achieves state-of-the-art performance on the NuScenes dataset by a large margin.
	Trajectory Forecasting on Temporal Graphs Gorkay Aydemir, Adil Kaan Akan, Fatma Guney arXiv preprint Preprint / Project Website / Code / bibtex We propose a novel method for trajectory prediction that uses Temporal Graph Networks to learn dynamically evolving agent features. Our results reach state-of-the-art performance on the Argoverse Forecasting dataset.
	Stochastic Video Prediction with Structure and Motion Adil Kaan Akan, Sadra Safadoust, Fatma Guney arXiv preprint Preprint / bibtex We decompose the scene into static and dynamic parts by encoding it into ego-motion and optical flow. We first factorize out scene structure and ego-motion, then, conditioned on these, predict the residual flow in the scene for independently moving objects.
	SLAMP: Stochastic Latent Appearance and Motion Prediction Adil Kaan Akan, Erkut Erdem, Aykut Erdem, Fatma Guney ICCV 2021 Preprint / Project Website / Code / bibtex We propose a novel approach to stochastic video prediction that decomposes the scene into static and dynamic parts. We reason about appearance and motion in the video stochastically by predicting the future based on the motion history.
	Just Noticeable Difference for Machine Perception and Generation of Regularized Adversarial Images with Minimal Perturbation Adil Kaan Akan, Emre Akbas, Fatos T. Yarman-Vural Signal, Image and Video Processing Preprint / bibtex We provide a theoretical understanding of the Just Noticeable Difference (JND) concept for machine perception and conduct further analyses and comparisons with other state-of-the-art methods. This paper extends our ICIP 2020 paper.
	Just Noticeable Difference for Machines to Generate Adversarial Images Adil Kaan Akan, Mehmet Ali Genc, Fatos T. Yarman-Vural ICIP 2020 Preprint / bibtex We propose a new concept for adversarial example generation. Inspired by experimental psychology, we use the concept of Just Noticeable Difference to generate natural-looking adversarial images.

Teaching

COMP302: Software Engineering, Koç University
COMP100: Introduction to Computer Science and Programming, Koç University
CENG223: Discrete Computational Structures, Middle East Technical University
CENG230: C Programming, Middle East Technical University

Website format from Jon Barron.