Hello, I'm Yifan (Jack)
Welcome to my corner of the internet! I'm a computer vision researcher at Michigan State University, passionate about embodied AI and artificial general intelligence. I am currently working on visual large language models (VLLMs), efficient vision transformer training, and embodied question answering.
反者道之动，弱者道之用。(《道德经》)
The Tao moves through reversal; its power lies in yielding. (Dao De Jing)
Background
I am currently a third-year PhD student at Michigan State University, where I:
- Am supervised by Prof. Yu Kong
- Study computer science and engineering
- Conduct research on efficient visual foundation models and embodied AI
Before that, I graduated from the Institute of Computing Technology, Chinese Academy of Sciences (ICT, CAS) with a Master of Engineering degree. There, I:
- Was supervised by Prof. Hu Han and worked with Prof. Shiguang Shan
- Studied computer engineering
- Conducted research on facial affective behavior analysis (FABA) and learning from noisy labels (LNL)
Work Experience
Adobe - Research Scientist / Engineering Intern
- 2025.05 - Present. Collaborating with Trung H. Bui, David Seunghyun Yoon, Franck Dernoncourt, and Jason Kuen
- Developing an efficient vision transformer with linear attention
Bosch - Research Intern
- 2024.05 - 2025.08. Collaborated with Xin Li, Wenbin He, Tianqin Li, and Liu Ren
- Developed ViT-Split, an efficient adapter for vision foundation models such as DINOv2 and CLIP (accepted to ICCV 2025)
Projects
- Daily ArXiv Assistant: Uses GPT-4o to select interesting papers from arXiv and emails them to designated recipients every day
- Visual Large Language Model Applications: Comprehensive VLLM applications across different domains
- EmoLA: Trains a facial affective behavior analysis model via instruction tuning
Selected Publications
IndustryNav: Exploring Spatial Reasoning of Embodied Agents in Dynamic Industrial Navigation
Yifan Li, Lichi Li, Anh Dao*, Xinyu Zhou, Yicheng Qiao, Zheda Mai, Daeun Lee, Zichen Chen, Zhen Tan, Mohit Bansal, Yu Kong. arXiv 2025
IndustryEQA: Pushing the Frontiers of Embodied Question Answering in Industrial Scenarios
Yifan Li, Yuhang Chen, Anh Dao*, Lichi Li, Zhongyi Cai, Zhen Tan, Tianlong Chen, Yu Kong. NeurIPS D&B 2025
ViT-Split: Unleashing the Power of Vision Foundation Models via Efficient Splitting Heads
Yifan Li, Xin Li, Tianqin Li, Wenbin He, Yu Kong, Liu Ren. ICCV 2025
Facial Affective Behavior Analysis with Instruction Tuning
Yifan Li, Anh Dao, Wentao Bao, Zhen Tan, Tianlong Chen, Huan Liu, Yu Kong. ECCV 2024
DISC: Learning from Noisy Labels via Dynamic Instance Specific Selection and Correction
Yifan Li, Hu Han, Shiguang Shan, Xilin Chen. CVPR 2023
ReCoT: Regularized Co-Training for Facial Action Unit Recognition
Yifan Li, Hu Han, Shiguang Shan, Zhilong Ji, Jinfeng Bai, Xilin Chen. BMVC 2023
Skills & Tools
- Languages: Python · C/C++ · MATLAB · TeX
- Frameworks: PyTorch · NumPy · SciPy …
- DevOps: Docker · Git