Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Hulk: A Universal Knowledge Translator for Human-Centric Tasks

Created by
  • Haebom

Authors

Yizhou Wang, Yixuan Wu, Weizhen He, Xun Guo, Feng Zhu, Lei Bai, Rui Zhao, Jian Wu, Tong He, Wanli Ouyang, Shixiang Tang

Outline

Hulk is the first multimodal human-centric generalist model capable of handling diverse human-centric perception tasks, including 2D vision, 3D vision, skeleton-based, and vision-language tasks. Existing human-centric models have limitations, such as an inability to handle 3D and vision-language tasks and a reliance on task-specific fine-tuning. To address these challenges, Hulk condenses diverse task-specific heads into two general heads: one for discrete representations (e.g., language) and one for continuous representations (e.g., coordinates). This unified representation lets Hulk treat diverse human-centric tasks as modality translation and integrate knowledge across a wide range of tasks. A comprehensive evaluation on 12 benchmarks covering eight human-centric tasks demonstrates the superiority of the proposed method, which achieves state-of-the-art performance on 11 of them. The code is available at https://github.com/OpenGVLab/Hulk.
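The two-head idea can be sketched roughly as follows. This is a minimal illustrative example under assumed names and dimensions, not the actual Hulk implementation: a shared encoder output is routed either to a head that decodes discrete tokens (e.g., language or labels) or to a head that regresses continuous values (e.g., coordinates).

```python
# Minimal sketch of a two-head decoder (illustrative only; class name,
# dimensions, and routing are assumptions, not the OpenGVLab/Hulk code).
import torch
import torch.nn as nn

class TwoHeadDecoder(nn.Module):
    def __init__(self, hidden_dim=256, vocab_size=30522, coord_dim=3):
        super().__init__()
        # Discrete head: projects shared features onto a token vocabulary.
        self.discrete_head = nn.Linear(hidden_dim, vocab_size)
        # Continuous head: regresses real-valued outputs such as 2D/3D coordinates.
        self.continuous_head = nn.Linear(hidden_dim, coord_dim)

    def forward(self, features, modality):
        # features: (batch, tokens, hidden_dim) produced by a shared encoder
        if modality == "discrete":
            return self.discrete_head(features)   # logits over tokens
        return self.continuous_head(features)     # predicted coordinates

# Usage sketch: the same backbone features are routed to whichever head
# matches the output modality of the current task.
decoder = TwoHeadDecoder()
features = torch.randn(2, 16, 256)
token_logits = decoder(features, "discrete")
coords = decoder(features, "continuous")
```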

Takeaways, Limitations

Takeaways:
  • Presents the first multimodal model capable of handling diverse human-centric perception tasks (2D/3D vision, skeleton-based, and vision-language) without task-specific fine-tuning.
  • A unified representation through two general heads enables knowledge integration and modality translation across tasks.
  • Achieves state-of-the-art performance on 11 of 12 benchmarks.
  • The open-source code release broadens follow-up research and improves usability.
Limitations:
  • Generalization to tasks beyond the presented benchmarks still needs to be verified.
  • Further analysis of model size and computational cost is needed.
  • Further research is needed to optimize performance on specific tasks.