All
VLN
Survey
Dataset
EmbodiedAI
LLM
Useful
Book
NLP
Lecture
AD
LLM4Drive: A Survey of Large Language Models for Autonomous Driving
🚗 Integrating LLMs with vision models: combines LLMs and vision models to strengthen scene understanding and decision-making in autonomous driving.
🔍 Assessing progress and identifying challenges: evaluates the current state of the technology and clarifies the open problems in autonomous driving.
📚 Latest research trends and open-source resources: continuously updated with recent developments and related open-source material for researchers.

Introduction
Autonomous driving has traditionally been built as a modular system, but the field is shifting toward end-to-end systems. In a modular pipeline, errors from each module accumulate and can degrade overall performance, whereas a data-driven end-to-end system generalizes better. → Because LLMs understand context and reason well, they are attracting a lot of attention in autonomous driving.

LLM
LLMs carry general commonsense learned from Internet AI and the ability to recognize situations. → How can an LLM contribute to an autonomous driving system?

Ways to improve an LLM's driving ability:
  1. Training in a simulator (closed-loop setting)
  2. Training on an offline dataset (open-loop setting)
However, the Sim2Real gap plus the difficulty of collecting real-world data make it hard to raise driving skill to expert level. → Could the commonsense already embedded in LLMs solve this to some extent?

Methods
  1. EmbodiedAI
  2. AD
In Progress
  • T
    TikaToka
CS224N: Natural Language Processing with Deep Learning
Assignment 1 Assignment 2
  1. NLP
  2. Lecture
In Progress
  • T
    TikaToka
Stanford CS25: V4 I Overview of Transformers
The CS25 course aims to give an overview of Transformers: their applications, new research directions, emerging techniques, and future work. Since I see Transformers (multimodal LLMs in particular) as a good stepping stone toward Embodied AI, I started this course to learn about these topics in more detail.
  1. LLM
  2. Lecture
In Progress
  • T
    TikaToka
Positional Encoding
์ฒ˜์Œ Transformer๊ฐ€ ๋‚˜์™”์„ ๋•Œ๋Š” Cos, Sin์„ ์ด์šฉํ•˜์—ฌ ์œ„์น˜๋ฅผ ๋‚˜ํƒ€๋‚ด์—ˆ๋‹ค. ์ตœ๊ทผ์—๋Š” ์ข€ ๋” ๋‹ค์–‘ํ•œ ๋ฐฉ๋ฒ•๋ก ์ด ์กด์žฌํ•œ๋‹ค. Rotary Position Embedding (RoPE) RoPE๋Š” ํšŒ์ „ ํ–‰๋ ฌ์„ ์‚ฌ์šฉํ•˜์—ฌ ์ ˆ๋Œ€์  ์œ„์น˜๋ฅผ ์ธ์ฝ”๋”ฉํ•˜๊ณ , ์…€ํ”„ ์–ดํ…์…˜ ๊ณต์‹์— ์ƒ๋Œ€์  ์œ„์น˜ ์ข…์†์„ฑ์„ ํ†ตํ•ฉํ•˜๋Š” ๋ฐฉ๋ฒ•์ž…๋‹ˆ๋‹ค. ์ด ๋ฐฉ๋ฒ•์€ ์‹œํ€€์Šค ๊ธธ์ด ์œ ์—ฐ์„ฑ์„ ๊ฐ€์ง€๊ณ  ์žˆ์œผ๋ฉฐ, ์ƒ๋Œ€์  ๊ฑฐ๋ฆฌ๊ฐ€ ์ฆ๊ฐ€ํ•จ์— ๋”ฐ๋ผ ํ† ํฐ ๊ฐ„ ์ข…์†์„ฑ์ด ๊ฐ์†Œํ•ฉ๋‹ˆ๋‹ค. RoFormer์™€ ๊ฐ™์€ ๋ชจ๋ธ์—์„œ RoPE๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ๋‹ค์–‘ํ•œ ์žฅ๋ฌธ ํ…์ŠคํŠธ ๋ถ„๋ฅ˜ ๋ฒค์น˜๋งˆํฌ์—์„œ ๋” ๋‚˜์€ ์„ฑ๋Šฅ์„ ๋ณด์—ฌ์ฃผ์—ˆ์Šต๋‹ˆ๋‹ค. Contextual Position Encoding (CoPE) CoPE๋Š” ์œ„์น˜๋ฅผ ๋ฌธ๋งฅ์— ๋”ฐ๋ผ ์กฐ๊ฑด๋ถ€๋กœ ์„ค์ •ํ•˜๋Š” ์ƒˆ๋กœ์šด ๋ฐฉ๋ฒ•์ž…๋‹ˆ๋‹ค. ์ด ๋ฐฉ๋ฒ•์€ ํŠน์ • ํ† ํฐ์—์„œ๋งŒ ์œ„์น˜๋ฅผ ์ฆ๊ฐ€์‹œ์ผœ ๋ฌธ๋งฅ์— ๋”ฐ๋ผ ์œ„์น˜๋ฅผ ์กฐ์ ˆํ•  ์ˆ˜ ์žˆ์œผ๋ฉฐ, ์ด๋Š” ์–ธ์–ด ๋ชจ๋ธ๋ง๊ณผ ์ฝ”๋”ฉ ์ž‘์—…์˜ ๋ณต์žก์„ฑ์„ ๊ฐœ์„ ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. CoPE๋Š” ์ผ๋ฐ˜์ ์ธ ์œ„์น˜ ์ธ์ฝ”๋”ฉ ๋ฐฉ๋ฒ•๋ณด๋‹ค ๋” ๋†’์€ ์ˆ˜์ค€์˜ ์œ„์น˜ ์ถ”์ƒํ™”๋ฅผ ๋‚˜ํƒ€๋‚ผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. Conditional Positional Encoding (CPE) CPE๋Š” ๋น„์ „ ํŠธ๋žœ์Šคํฌ๋จธ(Vision Transformers)์—์„œ ์‚ฌ์šฉ๋˜๋Š” ์ƒˆ๋กœ์šด ์ ‘๊ทผ ๋ฐฉ์‹์ž…๋‹ˆ๋‹ค. ์ด ๋ฐฉ๋ฒ•์€ ํŠธ๋ ˆ์ด๋‹๊ณผ ์ธํผ๋Ÿฐ์Šค์—์„œ ํ•ด์ƒ๋„ ํฌ๊ธฐ๊ฐ€ ๋‹ฌ๋ผ๋„ ๋™์ ์œผ๋กœ ์ ์‘ํ•  ์ˆ˜ ์žˆ์œผ๋ฉฐ, ์ƒ๋Œ€์  ์œ„์น˜ ์ธ์ฝ”๋”ฉ์˜ ์ด๋ก ์„ ๊ฐ€์ ธ๊ฐ€๋ฉด์„œ๋„ ์ ˆ๋Œ€์  ์œ„์น˜์˜ ๋Šฅ๋ ฅ์„ ์œ ์ง€ํ•ฉ๋‹ˆ๋‹ค. ์ด๋Š” ์ด๋ฏธ์ง€ ํ•ด์ƒ๋„ ๋ณ€๊ฒฝ์— ๋”ฐ๋ฅธ ๋™์  ๋ณ€ํ™”๋ฅผ ์ฒ˜๋ฆฌํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. Relative Position Encoding Relative Position Encoding์€ ์ƒ๋Œ€์  ์œ„์น˜ ์ •๋ณด๋ฅผ ํ•™์Šต ๊ฐ€๋Šฅํ•œ ์ž„๋ฒ ๋”ฉ ์ธต์„ ํ†ตํ•ด ์ฒ˜๋ฆฌํ•˜๋Š” ๋ฐฉ๋ฒ•์ž…๋‹ˆ๋‹ค. ์ด ๋ฐฉ๋ฒ•์€ ๊ฐ ์œ„์น˜ ์ •๋ณด๋ฅผ ํ•™์Šตํ•˜์—ฌ lookup table ํ˜•์‹์œผ๋กœ ์‹ ๊ฒฝ๋ง์— ์ „๋‹ฌํ•˜๋ฉฐ, ์ƒ๋Œ€์  ์œ„์น˜๋ฅผ ์•Œ๊ฒŒ ๋œ ์ •๋ณด๋ฅผ ์ด์šฉํ•˜์—ฌ ์ž„๋ฒ ๋”ฉ ํ…Œ์ด๋ธ”์„ ๋งŒ๋“ค์–ด์ค๋‹ˆ๋‹ค.
  1. LLM
Done
  • T
    TikaToka
๋ฐ‘๋ฐ”๋‹ฅ ๋ถ€ํ„ฐ ์‹œ์ž‘ํ•˜๋Š” ๋”ฅ๋Ÿฌ๋‹
Volume 1: Introduction to Python (skipped)
  1. Book
In Progress
  • T
    TikaToka
How to Read a Paper
Author: S. Keshav
Conference / Journal: N/A
PDF: https://web.stanford.edu/class/ee384m/Handouts/HowtoReadPaper.pdf

tl;dr
No summary needed for this article.

Introduction
Students are rarely taught how to read a paper, so many of them struggle with it. The author hopes this article helps.

Method
A paper is read in three passes.

First pass
  1. Read the title, abstract, and introduction.
  2. Read the section and subsection headings.
  3. Read the conclusion.
  4. Skim the references and check which ones you have already read.
→ This pass lets you evaluate the paper on five criteria:
  Category: What type of paper is it? A measurement paper? An analysis of an existing system? A research prototype?
  Context: Which other work is this research related to? Which theoretical bases were used to solve the problem?
  Correctness: Are the paper's assumptions valid?
  1. Useful
In Progress
  • T
    TikaToka
CANVAS: Commonsense-Aware Navigation System for Intuitive Human-Robot Interaction
I became interested in this paper after attending a recent recruiting session by the WoRV team, so I decided to give it a read.
  1. EmbodiedAI
Not Started
  • T
    TikaToka
A Survey of Large Language Models
I plan to read through this at a leisurely pace and summarize it once I am between jobs or have landed one...
  1. LLM
  2. Survey
Not Started
  • T
    TikaToka
Navigating the Nuances: A Fine-grained Evaluation of Vision-Language Navigation
Author: Zehao Wang, Minye Wu, Yixin Cao, Yubo Ma, Meiqi Chen, Tinne Tuytelaars
Conference / Journal: EMNLP 2024 Findings
PDF: https://arxiv.org/pdf/2409.17313
Code: https://github.com/zehao-wang/navnuances

tl;dr
Proposes a framework for fine-grained evaluation of VLN models across different instruction types. The evaluation exposes performance gaps in numerical comprehension and the recognition of specific directions. Based on these findings, the paper offers concrete directions for improving VLN models.

Motivation
Limits of existing VLN models: the ability of VLN models to properly understand and execute complex navigation instructions may have been overestimated.
Need for fine-grained evaluation: the VLN task should be broken into smaller units so performance can be evaluated per instruction type. →
Need for an LLM-based evaluation framework: the paper proposes a new framework that composes VLN instructions with an LLM and enables fine-grained evaluation.

Method
Context-Free Grammar (CFG)
To define the structure of VLN instructions systematically, a CFG (built with the help of an LLM) expresses the various instruction types:
  N: non-terminal symbols (direction, object, action)
  T: terminal symbols (direction words, object names)
  P: production rules (how each N can be rewritten into T or other N, e.g. N → T, N → N)
  S: start symbol
  1. VLN
  2. Dataset
  3. EmbodiedAI
Done
  • T
    TikaToka