I am a Research Scientist at fal.ai, working on generative AI and diffusion models.
I completed my PhD at Koç University under the supervision of Prof. Yücel Yemez, focusing on object-centric learning and compositional image and video generation (PhD Thesis). Previously, I received my Master's degree (Thesis) from Koç University and my Bachelor's degree from Middle East Technical University. I am a recipient of the Academic Excellence Award at Koç University.
∞-RoPE enables infinite-horizon, action-controllable video generation through three training-free techniques: Block-Relativistic RoPE, KV Flush, and RoPE Cut. Together, they overcome temporal horizon limits, enable fine-grained action control, and support cinematic scene transitions within a single autoregressive rollout.
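As a rough sketch of the re-anchoring idea behind Block-Relativistic RoPE (the helper names and block scheme below are illustrative assumptions, not the paper's exact formulation), positions can be measured relative to the current block so rotary angles stay bounded over an arbitrarily long rollout:

```python
import torch

def rope_angles(positions, dim, base=10000.0):
    # Standard RoPE: one rotation angle per (position, frequency) pair.
    freqs = base ** (-torch.arange(0, dim, 2, dtype=torch.float32) / dim)
    return positions[:, None].float() * freqs[None, :]

def block_relative_positions(frame_idx, block_start):
    # Hypothetical re-anchoring: measure positions from the current block's
    # start so they stay bounded no matter how long the rollout runs.
    return frame_idx - block_start

# At rollout frame 10_000, a block anchored at frame 9_984 sees positions 0..31,
# keeping the rotary angles in the range the model was trained on.
pos = block_relative_positions(torch.arange(9_984, 10_016), block_start=9_984)
angles = rope_angles(pos, dim=64)
print(angles.shape)  # torch.Size([32, 32])
```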
We extend SlotAdapt to video by learning temporally consistent object-centric slots and conditioning pretrained diffusion models on them for compositional video synthesis. Our approach enables intuitive editing operations such as object insertion, deletion, and replacement while maintaining consistent identities across frames. Experiments demonstrate superior video generation quality and temporal consistency, and our model uniquely integrates segmentation with strong generative performance.
SlotAdapt combines slot attention with pretrained diffusion models through adapters for slot-based conditioning. By adding a guidance loss to align cross-attention with slot attention, our model better identifies objects without external supervision. Experiments show superior performance in object discovery and image generation, particularly on complex real-world images.
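A minimal sketch of such a guidance term (the shapes and the squared-error distance are assumptions for illustration; the paper defines its own formulation):

```python
import torch
import torch.nn.functional as F

def attention_guidance_loss(cross_attn, slot_attn):
    """Align the diffusion model's cross-attention maps with the
    slot-attention masks, so each slot's conditioning token attends
    to 'its' object.

    cross_attn: (B, S, N) attention from N image tokens to S slot tokens.
    slot_attn:  (B, S, N) slot-attention assignment over the same N tokens.
    """
    # Normalize both maps into distributions over image locations per slot.
    cross = cross_attn / cross_attn.sum(dim=-1, keepdim=True).clamp_min(1e-8)
    slots = slot_attn / slot_attn.sum(dim=-1, keepdim=True).clamp_min(1e-8)
    # Treat the slot-attention masks as fixed targets for the guidance signal.
    return F.mse_loss(cross, slots.detach())
```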
We propose a novel trajectory prediction method that adapts itself to every agent in a shared scene. We exploit dynamic weight learning to process each agent's state separately and predict all agents' future trajectories simultaneously, without rotating or normalizing the scene frame. Our method achieves state-of-the-art performance on the Argoverse and INTERACTION datasets with impressive runtime performance.
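As an illustration of the dynamic-weight idea (the hypernetwork layout and all sizes below are hypothetical, not the paper's architecture), a small network can generate a separate decoder for each agent from its own state:

```python
import torch
import torch.nn as nn

class DynamicAgentHead(nn.Module):
    """Toy dynamic weight learning: a hypernetwork maps each agent's state
    to the weights of a small linear predictor, so every agent in the scene
    gets its own adapted decoder without rotating/normalizing the frame."""

    def __init__(self, state_dim=64, horizon=30):
        super().__init__()
        self.state_dim, self.horizon = state_dim, horizon
        # Produces a (state_dim -> 2*horizon) weight matrix plus bias per agent.
        self.hyper = nn.Linear(state_dim, state_dim * 2 * horizon + 2 * horizon)

    def forward(self, agent_states):          # (B, A, state_dim)
        B, A, D = agent_states.shape
        params = self.hyper(agent_states)     # (B, A, D*2H + 2H)
        W = params[..., : D * 2 * self.horizon].view(B, A, D, 2 * self.horizon)
        b = params[..., D * 2 * self.horizon :]
        # Per-agent linear decode: each agent uses its own generated weights.
        traj = torch.einsum("bad,badh->bah", agent_states, W) + b
        return traj.view(B, A, self.horizon, 2)  # (x, y) per future step
```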
We propose a novel method for future instance segmentation in bird's-eye-view space. We exploit state-space models for future state prediction, encoding the 3D scene structure and decoding future instance segmentations. Our results achieve state-of-the-art performance on the NuScenes dataset by a large margin.
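A toy version of the rollout (a generic learned transition with injected noise standing in for the paper's state-space model; all names and sizes are assumptions):

```python
import torch
import torch.nn as nn

class LatentFutureRollout(nn.Module):
    """Toy stand-in for a state-space future predictor: the encoded BEV
    scene is the initial latent state, a learned transition with injected
    noise advances it step by step, and a decoder maps each state to
    coarse instance-segmentation logits."""

    def __init__(self, state_dim=128, grid=32, n_out=2):
        super().__init__()
        self.transition = nn.Linear(2 * state_dim, state_dim)
        self.decoder = nn.Linear(state_dim, grid * grid * n_out)
        self.grid, self.n_out = grid, n_out

    def forward(self, state, horizon=4):       # state: (B, state_dim)
        frames = []
        for _ in range(horizon):
            noise = torch.randn_like(state)    # stochastic future states
            state = torch.tanh(self.transition(torch.cat([state, noise], -1)))
            logits = self.decoder(state)
            frames.append(logits.view(-1, self.n_out, self.grid, self.grid))
        return torch.stack(frames, dim=1)      # (B, horizon, n_out, H, W)
```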
We propose a novel trajectory prediction method that uses Temporal Graph Networks to learn dynamically evolving agent features. Our results reach state-of-the-art performance on the Argoverse Forecasting dataset.
We decompose the scene into static and dynamic parts by encoding it into ego-motion and optical flow. We first factor out the scene structure and ego-motion, then, conditioned on these, predict the residual flow for independently moving objects.
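The rigid part of this decomposition can be made concrete: given depth and the estimated ego-motion, back-project, transform, and reproject each pixel to get the flow that camera motion alone explains (the function and variable names below are illustrative, not the paper's code):

```python
import torch

def rigid_flow_from_ego_motion(depth, K, T):
    """Project pixels into 3D with depth, apply the ego-motion transform T,
    and reproject: the displacement is the flow induced by camera motion.
    depth: (H, W); K: (3, 3) intrinsics; T: (4, 4) ego-motion."""
    H, W = depth.shape
    v, u = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([u, v, torch.ones_like(u)], dim=-1).float()    # (H, W, 3)
    cam = (pix @ torch.linalg.inv(K).T) * depth[..., None]           # back-project
    cam_h = torch.cat([cam, torch.ones_like(depth)[..., None]], -1)  # homogeneous
    moved = cam_h @ T.T                                              # ego-motion
    proj = moved[..., :3] @ K.T
    proj = proj[..., :2] / proj[..., 2:3].clamp_min(1e-6)
    # Residual flow for moving objects = observed flow - this rigid flow.
    return proj - pix[..., :2]
```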
We propose a novel approach to stochastic video prediction that decomposes the scene into static and dynamic parts. We reason stochastically about appearance and motion in the video, predicting the future based on the motion history.
We propose a theoretical understanding of the JND concept for machine perception and conduct further analyses and comparisons with other state-of-the-art methods.
We propose a new concept for adversarial example generation. Inspired by experimental psychology, we use the concept of Just Noticeable Difference to generate natural-looking adversarial images.
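A minimal sketch of the idea (the JND map is assumed precomputed, e.g. from a luminance/contrast-masking model; the paper's exact procedure differs):

```python
import torch

def jnd_bounded_perturbation(image, grad, jnd_map, step=1.0):
    """Move the image along the loss-gradient sign, but cap the per-pixel
    change by a just-noticeable-difference map so the perturbation stays
    below the human visibility threshold. `jnd_map` has the same shape
    as `image` and is assumed given."""
    delta = step * grad.sign()
    # Per-pixel clamp: never perturb more than the JND threshold allows.
    delta = torch.clamp(delta, -jnd_map, jnd_map)
    return torch.clamp(image + delta, 0.0, 1.0)
```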
Teaching
COMP302: Software Engineering, Koç University
COMP100: Introduction to Computer Science and Programming, Koç University
CENG223: Discrete Computational Structures, Middle East Technical University
CENG230: C Programming, Middle East Technical University