Jiarui Xu

Hi, I am Jiarui, a fourth-year PhD student at UC San Diego. I'm honored to be advised by Professor Xiaolong Wang. I received my Bachelor's degree in Computer Science from the Hong Kong University of Science and Technology (HKUST). I'm currently a part-time Research Intern at NVIDIA Research. Previously, I interned at Google Research, NVIDIA Research, Microsoft Research, and OpenMMLab.


  • New 03/2024: PixelLLM is accepted by CVPR 2024.
  • 02/2023: ODISE is accepted by CVPR 2023 as a highlight.
  • 01/2023: GPViT is accepted by ICLR 2023 as a spotlight presentation.
  • 03/2022: GroupViT is accepted by CVPR 2022.
  • 07/2021: VFS is accepted by ICCV 2021 as an oral presentation.

Publications

(*: Equal Contribution)

Pixel Aligned Language Models
Jiarui Xu, Xingyi Zhou, Shen Yan, Xiuye Gu, Anurag Arnab, Chen Sun, Xiaolong Wang, Cordelia Schmid
Conference on Computer Vision and Pattern Recognition (CVPR), 2024

project page / arXiv / code

IMProv: Inpainting-based Multimodal Prompting for Computer Vision Tasks
Jiarui Xu, Yossi Gandelsman, Amir Bar, Jianwei Yang, Jianfeng Gao, Trevor Darrell, Xiaolong Wang
arXiv, 2023

project page / arXiv / code

Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models
Jiarui Xu, Sifei Liu*, Arash Vahdat*, Wonmin Byeon, Xiaolong Wang, Shalini De Mello
Conference on Computer Vision and Pattern Recognition (CVPR), 2023 (Highlight)

project page / arXiv / code / demo / video

GPViT: A High Resolution Non-Hierarchical Vision Transformer with Group Propagation
Chenhongyi Yang*, Jiarui Xu*, Shalini De Mello, Elliot J. Crowley, Xiaolong Wang
International Conference on Learning Representations (ICLR), 2023 (Spotlight)

project page / arXiv / code

Learning Implicit Feature Alignment Function for Semantic Segmentation
Hanzhe Hu*, Yinbo Chen*, Jiarui Xu, Shubhankar Borse, Hong Cai, Fatih Porikli, Xiaolong Wang
European Conference on Computer Vision (ECCV), 2022

arXiv / code

GroupViT: Semantic Segmentation Emerges from Text Supervision
Jiarui Xu, Shalini De Mello, Sifei Liu, Wonmin Byeon, Thomas Breuel, Jan Kautz, Xiaolong Wang
Conference on Computer Vision and Pattern Recognition (CVPR), 2022

project page / arXiv / code / models / demo / video

Rethinking Self-Supervised Correspondence Learning: A Video Frame-level Similarity Perspective
Jiarui Xu, Xiaolong Wang
International Conference on Computer Vision (ICCV), 2021 (Oral)

project page / arXiv / code / video

Semi-Supervised 3D Hand-Object Poses Estimation with Interactions in Time
Shaowei Liu*, Hanwen Jiang*, Jiarui Xu, Sifei Liu, Xiaolong Wang
Conference on Computer Vision and Pattern Recognition (CVPR), 2021

project page / arXiv

Fast Video Object Segmentation With Temporal Aggregation Network and Dynamic Template Matching
Xuhua Huang*, Jiarui Xu*, Yu-Wing Tai, Chi-Keung Tang
Conference on Computer Vision and Pattern Recognition (CVPR), 2020

project page / arXiv / video

Learning to Group: A Bottom-Up Framework for 3D Part Discovery in Unseen Categories
Tiange Luo, Kaichun Mo, Zhiao Huang, Jiarui Xu, Siyu Hu, Liwei Wang, Hao Su
International Conference on Learning Representations (ICLR), 2020

project page / arXiv / code

GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond
Yue Cao*, Jiarui Xu*, Steve Lin, Fangyun Wei, Han Hu
International Conference on Computer Vision Workshop (ICCVW), 2019 (Best Paper Award)
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2020

arXiv / code / @mmdet / post / journal

Spatial-Temporal Relation Networks for Multi-Object Tracking
Jiarui Xu, Yue Cao, Zheng Zhang, Han Hu
International Conference on Computer Vision (ICCV), 2019


Deep High Dynamic Range Imaging with Large Foreground Motions
Shangzhe Wu, Jiarui Xu, Yu-Wing Tai, Chi-Keung Tang
European Conference on Computer Vision (ECCV), 2018

project page / arXiv / code