Daily Arxiv

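This page curates artificial intelligence papers published around the world.
Summaries are generated with Google Gemini, and the page is run on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, simply cite the source.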

SystolicAttention: Fusing FlashAttention within a Single Systolic Array

Automated Novelty Evaluation of Academic Paper: A Collaborative Approach Integrating Human and Large Language Model Knowledge

"Is it always watching? Is it always listening?" Exploring Contextual Privacy and Security Concerns Toward Domestic Social Robots

A Group Theoretic Analysis of the Symmetries Underlying Base Addition and Their Learnability by Neural Networks

GHPO: Adaptive Guidance for Stable and Efficient LLM Reinforcement Learning

Extension OL-MDISF: Online Learning from Mix-Typed, Drifted, and Incomplete Streaming Features

When and Where do Data Poisons Attack Textual Inversion?

Truth Sleuth and Trend Bender: AI Agents to fact-check YouTube videos and influence opinions

NLP Meets the World: Toward Improving Conversations With the Public About Natural Language Processing Research

Accurate generation of chemical reaction transition states by conditional flow matching

Benchmarking and Evaluation of AI Models in Biology: Outcomes and Recommendations from the CZI Virtual Cells Workshop

A PBN-RL-XAI Framework for Discovering a "Hit-and-Run" Therapeutic Strategy in Melanoma

NeuTSFlow: Modeling Continuous Functions Behind Time Series Forecasting

THOR: Transformer Heuristics for On-Demand Retrieval

Towards Agentic RAG with Deep Reasoning: A Survey of RAG-Reasoning Systems in LLMs

Bridging Literature and the Universe Via A Multi-Agent Large Language Model System

Magneto-radiative modelling and artificial neural network optimization of biofluid flow in a stenosed arterial domain

Symbiosis: Multi-Adapter Inference and Fine-Tuning

Rethinking Data Protection in the (Generative) Artificial Intelligence Era

SoK: Semantic Privacy in Large Language Models

FedRef: Communication-Efficient Bayesian Fine Tuning with Reference Model

Predictable Scale: Part II, Farseer: A Refined Scaling Law in Large Language Models

Position Prediction Self-Supervised Learning for Multimodal Satellite Imagery Semantic Segmentation

ScaleRTL: Scaling LLMs with Reasoning Data and Test-Time Compute for Accurate RTL Code Generation

HueManity: Probing Fine-Grained Visual Perception in MLLMs

AKReF: An argumentative knowledge representation framework for structured argumentation

Large Language Models Often Know When They Are Being Evaluated

Dynamic Risk Assessments for Offensive Cybersecurity Agents

How Hungry is AI? Benchmarking Energy, Water, and Carbon Footprint of LLM Inference

Diffused Responsibility: Analyzing the Energy Consumption of Generative Text-to-Audio Diffusion Models

Flow-GRPO: Training Flow Matching Models via Online RL

On the Need for a Statistical Foundation in Scenario-Based Testing of Autonomous Vehicles

What's Pulling the Strings? Evaluating Integrity and Attribution in AI Training and Inference through Concept Shift

TD-EVAL: Revisiting Task-Oriented Dialogue Evaluation by Combining Turn-Level Precision with Dialogue-Level Comparisons

MobileCity: An Efficient Framework for Large-Scale Urban Behavior Simulation

Semantic Adapter for Universal Text Embeddings: Diagnosing and Mitigating Negation Blindness to Enhance Universality

Leveraging LLMs for User Stories in AI Systems: UStAI Dataset

Large Language Models are Unreliable for Cyber Threat Intelligence

AnnoPage Dataset: Dataset of Non-Textual Elements in Documents with Fine-Grained Categorization

A Thorough Assessment of the Non-IID Data Impact in Federated Learning

Visual Position Prompt for MLLM based Visual Grounding

Neurons: Emulating the Human Visual Cortex Improves Fidelity and Interpretability in fMRI-to-Video Reconstruction

FADE: Why Bad Descriptions Happen to Good Features

FlipConcept: Tuning-Free Multi-Concept Personalization for Text-to-Image Generation

LUMINA-Net: Low-light Upgrade through Multi-stage Illumination and Noise Adaptation Network for Image Enhancement

Towards Geo-Culturally Grounded LLM Generations

Learning to Reason at the Frontier of Learnability

Flexible and Efficient Grammar-Constrained Decoding

PATCH: a deep learning method to assess heterogeneity of artistic practice in historical paintings

The Impact of Modern AI in Metadata Management

Learning an Effective Premise Retrieval Model for Efficient Mathematical Formalization

ChipAlign: Instruction Alignment in Large Language Models for Chip Design via Geodesic Interpolation

Many Objective Problems Where Crossover is Provably Essential

Patherea: Cell Detection and Classification for the 2020s

ViTally Consistent: Scaling Biological Representation Learning for Cell Microscopy

TextDestroyer: A Training- and Annotation-Free Diffusion Method for Destroying Anomal Text from Images

Quantifying calibration error in modern neural networks through evidence based theory

Multi-view biomedical foundation models for molecule-target and property prediction

Reinforced Imitative Trajectory Planning for Urban Automated Driving

Distilling Invariant Representations with Dual Augmentation

Tackling the Abstraction and Reasoning Corpus with Vision Transformers: the Importance of 2D Representation, Positions, and Objects

Linearly-Interpretable Concept Embedding Models for Text Analysis

Towards Understanding Link Predictor Generalizability Under Distribution Shifts

StreakNet-Arch: An Anti-scattering Network-based Architecture for Underwater Carrier LiDAR-Radar Imaging

Enhancing Trust in Autonomous Agents: An Architecture for Accountability and Explainability through Blockchain and Large Language Models

On the Statistical Properties of Generative Adversarial Models for Low Intrinsic Data Dimension

Programming Distributed Collective Processes in the eXchange Calculus

Holistic analysis on the sustainability of Federated Learning across AI product lifecycle

Mathematical Introduction to Deep Learning: Methods, Implementations, and Theory

Epic-Sounds: A Large-scale Dataset of Actions That Sound

From Semantic Web and MAS to Agentic AI: A Unified Narrative of the Web of Agents

On Gradual Semantics for Assumption-Based Argumentation

The Challenge of Teaching Reasoning to LLMs Without RL or Distillation

Continuous Classification Aggregation

Can Prompt Difficulty be Online Predicted for Accelerating RL Finetuning of Reasoning Models?

macOSWorld: A Multilingual Interactive Benchmark for GUI Agents

GeoChain: Multimodal Chain-of-Thought for Geographic Reasoning

Lost in Transmission: When and Why LLMs Fail to Reason Globally

A Survey of Frontiers in LLM Reasoning: Inference Scaling, Learning to Reason, and Agentic Systems

System 0/1/2/3: Quad-process theory for multi-timescale embodied collective cognitive systems

Practical Principles for AI Cost and Compute Accounting

Generative Emergent Communication: Large Language Model is a Collective World Model

Proactive Agents for Multi-Turn Text-to-Image Generation Under Uncertainty

Learning Lifted STRIPS Models from Action Traces Alone: A Simple, General, and Scalable Solution

Hallucination Detox: Sensitivity Dropout (SenD) for Large Language Model Training

Life, uh, Finds a Way: Hyperadaptability by Behavioral Search

Governance of Generative Artificial Intelligence for Companies

RACER: Rational Artificial Intelligence Car-following-model Enhanced by Reality

Artificial Intelligence Governance for Businesses

Interpreting Radiologist's Intention from Eye Movements in Chest X-ray Diagnosis

S2WTM: Spherical Sliced-Wasserstein Autoencoder for Topic Modeling

LLM-Based Config Synthesis requires Disambiguation

Characterizing State Space Model (SSM) and SSM-Transformer Hybrid Language Model Performance with Long Context Length

EgoVLA: Learning Vision-Language-Action Models from Egocentric Human Videos

Can We Predict Alignment Before Models Finish Thinking? Towards Monitoring Misaligned Reasoning Models

Unit-Based Histopathology Tissue Segmentation via Multi-Level Feature Representation

Advancing Retrieval-Augmented Generation for Structured Enterprise and Internal Data

Mixture of Raytraced Experts

QuRe: Query-Relevant Retrieval through Hard Negative Sampling in Composed Image Retrieval

AutoVDC: Automated Vision Data Cleaning Using Vision-Language Models

RaceVLA: VLA-based Racing Drone Navigation with Human-like Behaviour

Posted by

Haebom

Authors

Valerii Serpiva, Artem Lykov, Artyom Myshlyaev, Muhammad Haris Khan, Ali Alridha Abdulkarim, Oleg Sautenkov, Dzmitry Tsetserukou

Overview

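RaceVLA presents an innovative approach to autonomous racing drone navigation that uses a Vision-Language-Action (VLA) model to emulate human-like behavior. The work explores the integration of advanced algorithms that let a drone adapt its navigation strategy to real-time environmental feedback, imitating the decision-making process of a human pilot. Fine-tuned on a collected racing drone dataset, the model generalizes strongly despite the complexity of drone racing environments. RaceVLA outperforms OpenVLA in motion generalization (75.0 vs. 60.0) and semantic generalization (45.5 vs. 36.3), benefiting from a dynamic camera and simplified motion tasks. However, visual generalization (79.6 vs. 87.0) and physical generalization (50.0 vs. 76.7) are slightly lower, owing to the difficulty of maneuvering in dynamic environments with objects of varying sizes. RaceVLA also surpasses RT-2 on every axis (visual: 79.6 vs. 52.0, motion: 75.0 vs. 55.0, physical: 50.0 vs. 26.7, semantic: 45.5 vs. 38.8), demonstrating robustness for real-time adjustments in complex environments. Experiments show an average speed of 1.04 m/s and a maximum speed of 2.02 m/s with consistent maneuverability, indicating that RaceVLA can handle high-speed scenarios effectively. These results highlight RaceVLA's potential for high-performance navigation in competitive racing environments. The RaceVLA codebase, pretrained weights, and dataset are available at https://racevla.github.io/.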
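The per-axis comparison quoted in the overview is easier to read as differences. The following is a minimal sketch that only restates the reported scores and subtracts them; the dictionary layout and the `gaps` helper are illustrative assumptions, not part of the RaceVLA codebase.

```python
# Generalization scores as quoted in the overview.
# The data layout and the helper below are hypothetical, for illustration only.
SCORES = {
    "RaceVLA": {"visual": 79.6, "motion": 75.0, "physical": 50.0, "semantic": 45.5},
    "OpenVLA": {"visual": 87.0, "motion": 60.0, "physical": 76.7, "semantic": 36.3},
    "RT-2": {"visual": 52.0, "motion": 55.0, "physical": 26.7, "semantic": 38.8},
}


def gaps(model, baseline, scores=SCORES):
    """Return model-minus-baseline score per axis, rounded to one decimal."""
    return {
        axis: round(scores[model][axis] - scores[baseline][axis], 1)
        for axis in scores[model]
    }


if __name__ == "__main__":
    for baseline in ("OpenVLA", "RT-2"):
        # Positive values mean RaceVLA scored higher on that axis.
        print(baseline, gaps("RaceVLA", baseline))
```

A positive value means RaceVLA scored higher than the baseline on that axis. Against OpenVLA the sign differs by axis, while against RT-2 every difference is positive, which matches the claims in the text.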

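Implications and Limitations

• Implications:
  ◦ Shows that a VLA-based approach can achieve human-like autonomous drone navigation.
  ◦ Outperforms RT-2 on all four generalization axes and OpenVLA on motion and semantic generalization, supporting effective navigation in fast, complex environments.
  ◦ The released codebase, pretrained weights, and dataset open the way to further research and development.

• Limitations:
  ◦ Visual and physical generalization degrade in dynamic environments with objects of varying sizes.
  ◦ The average speed (1.04 m/s) and maximum speed (2.02 m/s) are relatively low, leaving room for improvement.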
