Hello, I’m Yifan (Jack) 👋

Welcome to my corner of the internet! I’m a computer vision researcher at Michigan State University, passionate about embodied AI and artificial general intelligence. I am currently working on visual large language models (VLLMs), efficient vision transformer training, and embodied question answering.

反者道之动，弱者道之用。 ——《道德经》

The Tao moves through reversal; its power lies in yielding. —— Dao De Jing

🎓 Background

I am currently a third-year PhD student at Michigan State University, where I:

Am supervised by Prof. Yu Kong
Study computer science and engineering
Conduct research on efficient visual foundation models and embodied AI

Before that, I graduated from the Institute of Computing Technology, Chinese Academy of Sciences (ICT, CAS) with a Master of Engineering degree, where I:

Was supervised by Prof. Hu Han and worked with Prof. Shiguang Shan
Studied computer engineering
Conducted research on facial affective behavior analysis (FABA) and learning from noisy labels (LNL)

💼 Work Experience

Adobe — Research Scientist / Engineering Intern

2025.05 – Present. Collaborating with Trung H. Bui, David Seunghyun Yoon, Franck Dernoncourt, and Jason Kuen
Developing an efficient vision transformer with linear attention

Bosch — Research Intern

2024.05 – 2025.08. Collaborated with Xin Li, Wenbin He, Tianqin Li, and Liu Ren
Developed ViT-Split, an efficient adapter for vision foundation models such as DINOv2 and CLIP (accepted at ICCV 2025)

🚀 Projects

Daily ArXiv Assistant: Uses GPT-4o to select interesting papers from arXiv and emails them to designated recipients every day
Visual Large Language Model Applications: A comprehensive collection of VLLM applications across different domains
EmoLA: Trains a facial affective behavior analysis model via instruction tuning

📖 Selected Publications

IndustryNav: Exploring Spatial Reasoning of Embodied Agents in Dynamic Industrial Navigation
Yifan Li, Lichi Li, Anh Dao*, Xinyu Zhou, Yicheng Qiao, Zheda Mai, Daeun Lee, Zichen Chen, Zhen Tan, Mohit Bansal, Yu Kong. arXiv 2025
IndustryEQA: Pushing the Frontiers of Embodied Question Answering in Industrial Scenarios
Yifan Li, Yuhang Chen, Anh Dao*, Lichi Li, Zhongyi Cai, Zhen Tan, Tianlong Chen, Yu Kong. NeurIPS D&B 2025
ViT-Split: Unleashing the Power of Vision Foundation Models via Efficient Splitting Heads
Yifan Li, Xin Li, Tianqin Li, Wenbin He, Yu Kong, Liu Ren. ICCV 2025
Facial Affective Behavior Analysis with Instruction Tuning
Yifan Li, Anh Dao, Wentao Bao, Zhen Tan, Tianlong Chen, Huan Liu, Yu Kong. ECCV 2024
DISC: Learning from Noisy Labels via Dynamic Instance Specific Selection and Correction
Yifan Li, Hu Han, Shiguang Shan, Xilin Chen. CVPR 2023
ReCoT: Regularized Co-Training for Facial Action Unit Recognition
Yifan Li, Hu Han, Shiguang Shan, Zhilong Ji, Jinfeng Bai, Xilin Chen. BMVC 2023

🛠️ Skills & Tools

Languages: Python · C/C++ · MATLAB · TeX
Frameworks: PyTorch · NumPy · SciPy …
DevOps: Docker · Git