
Unleashing Low-Bit Inference on Ascend NPUs: A Comprehensive Evaluation of HiFloat Formats

Created by
  • Haebom

Authors

Pengxiang Zhao, Hui-Ling Zhen, Xing Li, Han Bao, Weizhe Lin, Zhiyuan Yang, Ziwei Yu, Xin Wang, Mingxuan Yuan, Xianzhi Yu, Zhenhua Dong

💡 Overview

This paper proposes and evaluates the HiFloat formats (HiF8 and HiF4) to maximize the efficiency of low-bit floating-point inference on Ascend NPUs. Across a range of tasks, how HiFloat compares with INT8 depends on the variability of the data. At 4 bits in particular, HiF4 effectively prevents the accuracy degradation that occurs with integer formats. The proposed HiFloat formats are compatible with state-of-the-art post-training quantization (PTQ) frameworks, offering a solution for high-efficiency LLM inference.
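The dependence on data variability can be illustrated with a small sketch. The summary does not describe the HiF8 bit layout, so the code below uses a generic toy float (3 fractional mantissa bits, unlimited exponent range) as a stand-in. The function names and sample values are assumptions made for illustration only, not the paper's method or data.

```python
import math

def quantize_int8(values):
    """Symmetric per-tensor INT8: one scale, set by the largest magnitude."""
    scale = max(abs(v) for v in values) / 127.0
    return [round(v / scale) * scale for v in values]

def quantize_toy_float(values, mantissa_bits=3):
    """Round each value to `mantissa_bits` fractional mantissa bits.

    Toy model only: it ignores exponent range limits, subnormals and
    special values, so it is NOT the HiF8 encoding.
    """
    steps = 2 ** (mantissa_bits + 1)
    out = []
    for v in values:
        if v == 0.0:
            out.append(0.0)
            continue
        m, e = math.frexp(v)  # v == m * 2**e with 0.5 <= |m| < 1
        out.append(math.ldexp(round(m * steps) / steps, e))
    return out

def mean_rel_error(original, quantized):
    """Mean relative error; assumes no original value is zero."""
    return sum(abs(o - q) / abs(o) for o, q in zip(original, quantized)) / len(original)

# Hypothetical sample data: a narrow-range tensor and one with a large outlier.
narrow = [0.9, -0.7, 0.55, 0.8, -0.95, 0.6]
wide = [0.01, 0.02, -0.015, 0.03, 50.0]

for name, data in (("narrow", narrow), ("wide", wide)):
    e_int = mean_rel_error(data, quantize_int8(data))
    e_flt = mean_rel_error(data, quantize_toy_float(data))
    print(f"{name}: int8={e_int:.4f} toy_float={e_flt:.4f}")
```

On the narrow-range data the uniform INT8 grid is finer than the toy float's grid, so INT8 has the lower error. On the data with an outlier, the single INT8 scale rounds every small value to zero, while the toy float keeps a bounded relative error per value.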

🔑 Implications and Limitations

•
Low-bit floating-point formats can effectively represent data that spans a wide range of magnitudes, and they outperform INT8 in particular on high-variance data.
•
In 4-bit inference, the HiF4 format uses hierarchical scaling to resolve the accuracy drop seen with conventional integer-based formats.
•
The HiFloat formats are fully compatible with current state-of-the-art post-training quantization (PTQ) frameworks, which improves their practical applicability.
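The idea behind hierarchical scaling in the second point can be sketched as follows. This is a minimal illustration under stated assumptions, not the HiF4 definition: the summary does not give HiF4's layout, so the sketch uses signed 4-bit integer codes, one tensor-level scale, and a hypothetical per-group power-of-two refinement. The group size and sample values are likewise made up for the example.

```python
import math

QMAX = 7  # largest magnitude of a symmetric signed 4-bit code

def _to_code(v, scale):
    """Round to the nearest 4-bit code, clamp, and map back to a real value."""
    return max(-QMAX, min(QMAX, round(v / scale))) * scale

def quantize_flat(values):
    """4-bit codes with a single scale for the whole tensor."""
    scale = max(abs(v) for v in values) / QMAX
    return [_to_code(v, scale) for v in values]

def quantize_two_level(values, group_size=4):
    """4-bit codes: a tensor-level scale refined by a power-of-two factor per group."""
    global_max = max(abs(v) for v in values)
    top_scale = global_max / QMAX
    out = []
    for start in range(0, len(values), group_size):
        group = values[start:start + group_size]
        group_max = max(abs(v) for v in group)
        if group_max == 0.0:
            out.extend(0.0 for _ in group)
            continue
        # Largest power-of-two shrink that still covers the group's range.
        shift = math.floor(math.log2(global_max / group_max))
        scale = top_scale / (2 ** shift)
        out.extend(_to_code(v, scale) for v in group)
    return out

def mean_abs_error(original, quantized):
    return sum(abs(o - q) for o, q in zip(original, quantized)) / len(original)

# Hypothetical tensor: one group of small values, one group of large values.
values = [0.05, -0.02, 0.04, 0.01, 6.0, -3.0, 2.0, 7.0]

print("flat     :", mean_abs_error(values, quantize_flat(values)))
print("two-level:", mean_abs_error(values, quantize_two_level(values)))
```

With a single scale, the group of small values collapses to zero because the scale is dictated by the largest element. The per-group factor restores resolution for that group at the cost of storing one small shift value per group.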