Daily Arxiv

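This page collects artificial intelligence papers published around the world.
Summaries are generated with Google Gemini, and the page is run on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions; when sharing, simply cite the source.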

Sekai: A Video Dataset towards World Exploration

Federated Learning for MRI-based BrainAGE: a multicenter study on post-stroke functional outcome prediction

One-Step Diffusion for Detail-Rich and Temporally Consistent Video Super-Resolution

Capturing Polysemanticity with PRISM: A Multi-Concept Feature Description Framework

Efficient Retail Video Annotation: A Robust Key Frame Generation Approach for Product and Customer Interaction Analysis

WebXAII: an open-source web framework to study human-XAI interaction

Refining music sample identification with a self-supervised graph neural network

Decoupled Classifier-Free Guidance for Counterfactual Diffusion Models

Essential-Web v1.0: 24T tokens of organized web data

Adaptive Guidance Accelerates Reinforcement Learning of Reasoning Models

LeVERB: Humanoid Whole-Body Control with Latent Vision-Language Instruction

Screen Hijack: Visual Poisoning of VLM Agents in Mobile Environments

SP-VLA: A Joint Model Scheduling and Token Pruning Approach for VLA Model Acceleration

Serving Large Language Models on Huawei CloudMatrix384

Machine Learning Methods for Small Data and Upstream Bioprocessing Applications: A Comprehensive Review

Med-U1: Incentivizing Unified Medical Reasoning in LLMs via Large-scale Reinforcement Learning

Two Heads Are Better than One: Simulating Large Transformers with Small Ones

BreastDCEDL: Curating a Comprehensive DCE-MRI Dataset and developing a Transformer Implementation for Breast Cancer Treatment Response Prediction

Semantic Preprocessing for LLM-based Malware Analysis

A Minimalist Method for Fine-tuning Text-to-Image Diffusion Models

Human-like Forgetting Curves in Deep Neural Networks

Convergent Linear Representations of Emergent Misalignment

LearnAlign: Reasoning Data Selection for Reinforcement Learning in Large Language Models Based on Improved Gradient Alignment

TARDIS STRIDE: A Spatio-Temporal Road Image Dataset and World Model for Autonomy

Autonomous Computer Vision Development with Agentic AI

The Memory Paradox: Why Our Brains Need Knowledge in an Age of AI

SWE-Factory: Your Automated Factory for Issue Resolution Training Data and Evaluation Benchmarks

Using Language and Road Manuals to Inform Map Reconstruction for Autonomous Driving

Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing

PlantBert: An Open Source Language Model for Plant Science

Info-Coevolution: An Efficient Framework for Data Model Coevolution

SDE-SQL: Enhancing Text-to-SQL Generation in Large Language Models via Self-Driven Exploration with SQL Probes

SafeGenBench: A Benchmark Framework for Security Vulnerability Detection in LLM-Generated Code

UniWorld-V1: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation

KVCache Cache in the Wild: Characterizing and Optimizing KVCache Cache at a Large Cloud Provider

GraphRAG-Bench: Challenging Domain-Specific Reasoning for Evaluating Graph Retrieval-Augmented Generation

Towards Efficient Few-shot Graph Neural Architecture Search via Partitioning Gradient Contribution

Optimizing Sensory Neurons: Nonlinear Attention Mechanisms for Accelerated Convergence in Permutation-Invariant Neural Networks for Reinforcement Learning

CryoCCD: Conditional Cycle-consistent Diffusion with Biophysical Modeling for Cryo-EM Synthesis

More Thinking, Less Seeing? Assessing Amplified Hallucination in Multimodal Reasoning Models

Calibrating Pre-trained Language Classifiers on LLM-generated Noisy Labels via Iterative Refinement

Dynamic Risk Assessments for Offensive Cybersecurity Agents

SSR-Zero: Simple Self-Rewarding Reinforcement Learning for Machine Translation

Robo2VLM: Visual Question Answering from Large-Scale In-the-Wild Robot Manipulation Datasets

Breaking the Compression Ceiling: Data-Free Pipeline for Ultra-Efficient Delta Compression

Representation Learning with Mutual Influence of Modalities for Node Classification in Multi-Modal Heterogeneous Networks

Learning Dynamics in Continual Pre-Training for Large Language Models

Mask-PINNs: Regulating Feature Distributions in Physics-Informed Neural Networks

Assessing Tenstorrent's RISC-V MatMul Acceleration Capabilities

SPIN-ODE: Stiff Physics-Informed Neural ODE for Chemical Reaction Rate Estimation

PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding

DeepSelective: Interpretable Prognosis Prediction via Feature Selection and Compression in EHR Data

Boosting multi-demographic federated learning for chest radiograph analysis using general-purpose self-supervised representations

AerialVG: A Challenging Benchmark for Aerial Visual Grounding by Exploring Positional Relations

PR-Attack: Coordinated Prompt-RAG Attacks on Retrieval-Augmented Generation in Large Language Models via Bilevel Optimization

TALE: A Tool-Augmented Framework for Reference-Free Evaluation of Large Language Models

Decentralized Collective World Model for Emergent Communication and Coordination

RL2Grid: Benchmarking Reinforcement Learning in Power Grid Operations

Efficient but Vulnerable: Benchmarking and Defending LLM Batch Prompting Attack

LLMs in Disease Diagnosis: A Comparative Study of DeepSeek-R1 and O3 Mini Across Chronic Health Conditions

Can We Detect Failures Without Failure Data? Uncertainty-Aware Runtime Failure Detection for Imitation Learning Policies

QG-SMS: Enhancing Test Item Analysis via Student Modeling and Simulation

Dynamic Knowledge Integration for Evidence-Driven Counter-Argument Generation with Large Language Models

AlignDistil: Token-Level Language Model Alignment as Adaptive Policy Distillation

Eau De $Q$-Network: Adaptive Distillation of Neural Networks in Deep Reinforcement Learning

Hierarchical and Modular Network on Non-prehensile Manipulation in General Environments

Selective Use of Yannakakis' Algorithm to Improve Query Performance: Machine Learning to the Rescue

FRIDA to the Rescue! Analyzing Synthetic Data Effectiveness in Object-Based Common Sense Reasoning for Disaster Response

Batayan: A Filipino NLP benchmark for evaluating Large Language Models

From RAG to Memory: Non-Parametric Continual Learning for Large Language Models

Conformal Inference under High-Dimensional Covariate Shifts via Likelihood-Ratio Regularization

ShapeLib: Designing a library of programmatic 3D shape abstractions with Large Language Models

Rethinking Fine-Tuning when Scaling Test-Time Compute: Limiting Confidence Improves Mathematical Reasoning

Guaranteed prediction sets for functional surrogate models

FDLLM: A Dedicated Detector for Black-Box LLMs Fingerprinting

Each Rank Could be an Expert: Single-Ranked Mixture of Experts LoRA for Multi-Task Learning

MonoSOWA: Scalable monocular 3D Object detector Without human Annotations

Representation Learning of Point Cloud Upsampling in Global and Local Inputs

Multi-Preference Optimization: Generalizing DPO via Set-Level Contrasts

Incivility and Rigidity: The Risks of Fine-Tuning LLMs for Political Argumentation

Song Form-aware Full-Song Text-to-Lyrics Generation with Multi-Level Granularity Syllable Count Control

Learning Multi-Branch Cooperation for Enhanced Click-Through Rate Prediction at Taobao

On the Limits of Language Generation: Trade-Offs Between Hallucination and Mode Collapse

Web Archives Metadata Generation with GPT-4o: Challenges and Insights

Cyclic Vision-Language Manipulator: Towards Reliable and Fine-Grained Image Interpretation for Automated Report Generation

A Implies B: Circuit Analysis in LLMs for Propositional Logical Reasoning

Adapting While Learning: Grounding LLMs for Scientific Problems with Intelligent Tool Usage Adaptation

FALCON: Feedback-driven Adaptive Long/short-term memory reinforced Coding Optimization system

Efficient Mixture-of-Expert for Video-based Driver State and Physiological Multi-task Estimation in Conditional Autonomous Driving

ALTA: Compiler-Based Analysis of Transformers

Learning to Route LLMs with Confidence Tokens

Core Knowledge Deficits in Multi-Modal Language Models

AssistantX: An LLM-Powered Proactive Assistant in Collaborative Human-Populated Environment

MaPPER: Multimodal Prior-guided Parameter Efficient Tuning for Referring Expression Comprehension

Can Large Language Models Replace Human Subjects? A Large-Scale Replication of Scenario-Based Experiments in Psychology and Management

LogProber: Disentangling confidence from contamination in LLM responses

V2X-VLM: End-to-End V2X Cooperative Autonomous Driving Through Large Vision-Language Models

xGen-MM (BLIP-3): A Family of Open Large Multimodal Models

Understanding and Reducing the Class-Dependent Effects of Data Augmentation with A Two-Player Game Approach

PromptDSI: Prompt-based Rehearsal-free Instance-wise Incremental Learning for Document Retrieval

A Implies B: Circuit Analysis in LLMs for Propositional Logical Reasoning

Created by Haebom

Authors

Guan Zhe Hong, Nishanth Dikkala, Enming Luo, Cyrus Rashtchian, Xin Wang, Rina Panigrahy

Overview

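This paper studies Mistral and Gemma models (up to 27B parameters) on a minimal propositional logic problem in order to understand the reasoning abilities of large language models (LLMs). Using causal mediation analysis, it identifies the pathways and components of the models' reasoning process and offers fine-grained insight into the function of attention heads at each layer. The analysis uncovers a sparse circuit that computes the answer and decomposes it into four sub-circuits with independent, modular roles. Finally, it shows that three models (Mistral-7B, Gemma-2-9B, and Gemma-2-27B) contain analogous but not identical mechanisms.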
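As a rough illustration of the causal mediation idea mentioned in the overview, the sketch below patches activations in a toy model. It is not the paper's code: the component names (`head_a`, `head_b`, `head_c`), the additive toy model, and the 1.0 threshold are all hypothetical. The general recipe is to run the model on a clean input and a corrupted input, swap one component's activation from the corrupted run into the clean run, and record how much the output changes. Components with a large effect are candidates for the circuit.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical toy "model": every component reads the input and contributes
# one scalar; the model output is the sum of all contributions.
Component = Callable[[List[float]], float]

COMPONENTS: Dict[str, Component] = {
    "head_a": lambda x: 2.0 * x[0],  # strongly input-dependent
    "head_b": lambda x: 0.5 * x[1],  # weakly input-dependent
    "head_c": lambda x: 1.0,         # ignores the input entirely
}


def run(
    x: List[float], patch: Optional[Dict[str, float]] = None
) -> Tuple[float, Dict[str, float]]:
    """Run the toy model; `patch` forces chosen components to given activations."""
    patch = patch or {}
    cache: Dict[str, float] = {}
    for name, fn in COMPONENTS.items():
        cache[name] = patch[name] if name in patch else fn(x)
    return sum(cache.values()), cache


def patching_effects(clean: List[float], corrupted: List[float]) -> Dict[str, float]:
    """Effect of replacing each component's clean activation with its corrupted one."""
    clean_out, _ = run(clean)
    _, corrupted_cache = run(corrupted)
    effects: Dict[str, float] = {}
    for name in COMPONENTS:
        patched_out, _ = run(clean, patch={name: corrupted_cache[name]})
        effects[name] = clean_out - patched_out
    return effects


effects = patching_effects(clean=[1.0, 1.0], corrupted=[0.0, 0.0])
# Keep only components whose patching effect passes an (arbitrary) threshold.
circuit = [name for name, effect in effects.items() if abs(effect) >= 1.0]
print(effects, circuit)
```

In a real transformer the components are attention heads and MLP blocks whose outputs feed later layers, so a patch also changes downstream activations. The toy model above has no such interactions and only shows the bookkeeping.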

Implications and Limitations

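• Implications:
  ◦ Presents a new approach to understanding LLM reasoning: a minimal propositional logic problem combined with causal mediation analysis.
  ◦ Identifies the key components and the sparse circuit that LLMs use to solve the reasoning problem.
  ◦ Provides a fine-grained analysis of the modularity of the reasoning process and the function of each sub-circuit.
  ◦ Confirms that LLMs of different sizes contain analogous but not identical mechanisms.

• Limitations:
  ◦ The propositional logic problem studied is a minimal example, so generalization to more complex problems is limited.
  ◦ The analysis covers only Mistral and Gemma models, so generalization to other LLM architectures is limited.
  ◦ The interpretation of the causal mediation analysis may involve some subjectivity.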
