Daily Arxiv

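This page collects artificial intelligence papers published around the world.
Summaries on this page are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, simply cite the source.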

Best Practices for Biorisk Evaluations on Open-Weight Bio-Foundation Models

Dual-level Progressive Hardness-Aware Reweighting for Cross-View Geo-Localization

How Similar Are Grokipedia and Wikipedia? A Multi-Dimensional Textual and Structural Comparison

Implicit Bias of Per-sample Adam on Separable Data: Departure from the Full-batch Regime

The Narrative Continuity Test: A Conceptual Framework for Evaluating Identity Persistence in AI Systems

DMVFC: Deep Learning Based Functionally Consistent Tractography Fiber Clustering Using Multimodal Diffusion MRI and Functional MRI

Rethinking Visual Intelligence: Insights from Video Pretraining

Key and Value Weights Are Probably All You Need: On the Necessity of the Query, Key, Value weight Triplet in Decoder-Only Transformers

Flight Delay Prediction via Cross-Modality Adaptation of Large Language Models and Aircraft Trajectory Representation

GroupSHAP-Guided Integration of Financial News Keywords and Technical Indicators for Stock Price Prediction

Knowledge-guided Continual Learning for Behavioral Analytics Systems

A Survey on Cache Methods in Diffusion Models: Toward Efficient Multi-Modal Generation

ADPO: Anchored Direct Preference Optimization

Bridging Symmetry and Robustness: On the Role of Equivariance in Enhancing Adversarial Robustness

Exploring the Synergy of Quantitative Factors and Newsflow Representations from Large Language Models for Stock Return Prediction

Automotive Crash Dynamics Modeling Accelerated with Machine Learning

RL-100: Performant Robotic Manipulation with Real-World Reinforcement Learning

Readers Prefer Outputs of AI Trained on Copyrighted Books over Expert Human Writers

MedREK: Retrieval-Based Editing for Medical LLMs with Key-Aware Prompts

KVCOMM: Online Cross-context KV-cache Communication for Efficient LLM-based Multi-agent Systems

DITTO: A Spoofing Attack Framework on Watermarked LLMs via Knowledge Distillation

Dynamic Topic Evolution with Temporal Decay and Attention in Large Language Models

Debiasing LLMs by Masking Unfairness-Driving Attention Heads

What Makes Looped Transformers Perform Better Than Non-Recursive Ones (Provably)

Uncovering Representation Bias for Investment Decisions in Open-Source Large Language Models

Inoculation Prompting: Eliciting traits from LLMs during training can suppress them at test-time

Measuring Algorithmic Partisanship via Zero-Shot Classification and Its Implications on Political Discourse

Free Draft-and-Verification: Toward Lossless Parallel Decoding for Diffusion Large Language Models

PrunedLoRA: Robust Gradient-Based structured pruning for Low-rank Adaptation in Fine-tuning

From Superficial Outputs to Superficial Learning: Risks of Large Language Models in Education

Does FLUX Already Know How to Perform Physically Plausible Image Composition?

EmbeddingGemma: Powerful and Lightweight Text Representations

A Generalized Bisimulation Metric of State Similarity between Markov Decision Processes: From Theoretical Propositions to Applications

Lattice Boltzmann Model for Learning Real-World Pixel Dynamicity

Beyond Pointwise Scores: Decomposed Criteria-Based Evaluation of LLM Responses

Adversarial Distilled Retrieval-Augmented Guarding Model for Online Malicious Intent Detection

DSpAST: Disentangled Representations for Spatial Audio Reasoning with Large Language Models

Teaching According to Talents! Instruction Tuning LLMs with Competence-Aware Curriculum Learning

RL Fine-Tuning Heals OOD Forgetting in SFT

3DViT-GAT: A Unified Atlas-Based 3D Vision Transformer and Graph Learning Framework for Major Depressive Disorder Detection Using Structural MRI Data

Beyond Autoregression: An Empirical Study of Diffusion Large Language Models for Code Generation

Language Native Lightly Structured Databases for Large Language Model Driven Composite Materials Research

EndoGMDE: Generalizable Monocular Depth Estimation with Mixture of Low-Rank Experts for Diverse Endoscopic Scenes

Multi-Focused Video Group Activities Hashing

OpinioRAG: Towards Generating User-Centric Opinion Highlights from Large-scale Online Reviews

Evaluating Federated Learning for At-Risk Student Prediction: A Comparative Analysis of Model Complexity and Data Balancing

Zero-knowledge LLM hallucination detection and mitigation through fine-grained cross-model consistency

Learning to Steer: Input-dependent Steering for Multimodal LLMs

Combinative Matching for Geometric Shape Assembly

Identity Increases Stability in Neural Cellular Automata

Why Attention Fails: A Taxonomy of Faults in Attention-Based Neural Networks

A DbC Inspired Neurosymbolic Layer for Trustworthy Agent Design

Dynamic Forgetting and Spatio-Temporal Periodic Interest Modeling for Local-Life Service Recommendation

Recognising, Anticipating, and Mitigating LLM Pollution of Online Behavioural Research

Music Arena: Live Evaluation for Text-to-Music

A Self-Evolving AI Agent System for Climate Science

H-NeiFi: Non-Invasive and Consensus-Efficient Multi-Agent Opinion Guidance

MindJourney: Test-Time Scaling with World Models for Spatial Reasoning

CoralVQA: A Large-Scale Visual Question Answering Dataset for Coral Reef Image Understanding

Chain of Retrieval: Multi-Aspect Iterative Search Expansion and Post-Order Search Aggregation for Full Paper Retrieval

Through the River: Understanding the Benefit of Schedule-Free Methods for Language Model Training

A Collectivist, Economic Perspective on AI

On the Bias of Next-Token Predictors Toward Systematically Inefficient Reasoning: A Shortest-Path Case Study

Context Tuning for In-Context Optimization

AI-Generated Video Detection via Perceptual Straightening

PPMI: Privacy-Preserving LLM Interaction with Socratic Chain-of-Thought Reasoning and Homomorphically Encrypted Vector Databases

TabArena: A Living Benchmark for Machine Learning on Tabular Data

Over-squashing in Spatiotemporal Graph Neural Networks

Flat Channels to Infinity in Neural Loss Landscapes

Balancing Caregiving and Self-Care: Exploring Mental Health Needs of Alzheimer's and Dementia Caregivers

Where and How to Perturb: On the Design of Perturbation Guidance in Diffusion and Flow Models

ConTextTab: A Semantics-Aware Tabular In-Context Learner

Non-Contact Health Monitoring During Daily Personal Care Routines

MultiMatch: Multihead Consistency Regularization Matching for Semi-Supervised Text Classification

LiteVLM: A Low-Latency Vision-Language Model Inference Pipeline for Resource-Constrained Environments

Curriculum Reinforcement Learning from Easy to Hard Tasks Improves LLM Reasoning

A Tale of Two Symmetries: Exploring the Loss Landscape of Equivariant Models

A Closer Look at Bias and Chain-of-Thought Faithfulness of Large (Vision) Language Models

Elicit and Enhance: Advancing Multimodal Reasoning in Medical Scenarios

In Dialogue with Intelligence: Rethinking Large Language Models as Collective Knowledge

Spatial Knowledge Graph-Guided Multimodal Synthesis

Policy Optimized Text-to-Image Pipeline Design

Breaking the Performance Ceiling in Reinforcement Learning requires Inference Strategies

Exploring the Hidden Capacity of LLMs for One-Step Text Generation

KIT's Low-resource Speech Translation Systems for IWSLT2025: System Enhancement with Synthetic Data and Model Regularization

PromptWise: Online Learning for Cost-Aware Prompt Assignment in Generative Models

Exploring the limits of strong membership inference attacks on large language models

Autocomp: A Powerful and Portable Code Optimizer for Tensor Accelerators

Follow the Energy, Find the Path: Riemannian Metrics from Energy-Based Models

Deep Video Discovery: Agentic Search with Tool Use for Long-form Video Understanding

Forging Time Series with Language: A Large Language Model Approach to Synthetic Data Generation

Towards Robust Evaluation of STEM Education: Leveraging MLLMs in Project-Based Learning

Words That Unite The World: A Unified Framework for Deciphering Central Bank Communications Globally

Multi-head Temporal Latent Attention

New Encoders for German Trained from Scratch: Comparing ModernGBERT with Converted LLM2Vec Models

Learning Repetition-Invariant Representations for Polymer Informatics

Evaluating Simplification Algorithms for Interpretability of Time Series Classification

UniVLA: Learning to Act Anywhere with Task-centric Latent Actions

Memory Assisted LLM for Personalized Recommendation System

Accelerating Volumetric Medical Image Annotation via Short-Long Memory SAM 2

A Closer Look at Bias and Chain-of-Thought Faithfulness of Large (Vision) Language Models

Created by

Haebom

Authors

Sriram Balasubramanian, Samyadeep Basu, Soheil Feizi

Overview

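This paper notes that chain-of-thought (CoT) reasoning improves the performance of large vision-language models (LVLMs), questions whether that reasoning faithfully reflects the models' internal processes, and studies CoT faithfulness. In particular, it investigates how text-based and image-based biases affect reasoning and bias articulation, and introduces a new fine-grained evaluation pipeline for analyzing CoT reasoning precisely. The framework yields new insights into how models respond to different types of bias and reveals that models can exhibit "inconsistent" reasoning. The same evaluation pipeline is also used to revisit the CoT faithfulness of LLMs across varying levels of implicit cues.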

Implications and Limitations

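• In LVLMs, subtle image-based biases are rarely articulated compared with explicit text-based biases.

• Many models exhibit "inconsistent" reasoning, which can serve as an indicator for detecting biased reasoning.

• Current language-only reasoning models struggle to articulate cues that are not explicitly stated.

• The new fine-grained evaluation pipeline enables more accurate analysis of CoT reasoning.

• The paper highlights the need for further research on LVLM CoT faithfulness.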
