Daily Arxiv

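This page curates artificial intelligence papers published around the world.
Summaries on this page are generated with Google Gemini, and the page is run on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, simply cite the source.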

VerilogLAVD: LLM-Aided Rule Generation for Vulnerability Detection in Verilog

HRS: Hybrid Representation Framework with Scheduling Awareness for Time Series Forecasting in Crowdsourced Cloud-Edge Platforms

Atom-Searcher: Enhancing Agentic Deep Research via Fine-Grained Atomic Thought Reward

Vehicle detection from GSV imagery: Predicting travel behaviour for cycling and motorcycling using Computer Vision

CRED-SQL: Enhancing Real-world Large Scale Database Text-to-SQL Parsing through Cluster Retrieval and Execution Description

Robust Federated Learning under Adversarial Attacks via Loss-Based Client Clustering

SRMA-Mamba: Spatial Reverse Mamba Attention Network for Pathological Liver Segmentation in MRI Volumes

MedKGent: A Large Language Model Agent Framework for Constructing Temporally Evolving Medical Knowledge Graph

Fortifying the Agentic Web: A Unified Zero-Trust Architecture Against Logic-layer Threats

Generalized invariants meet constitutive neural networks: A novel framework for hyperelastic materials

What Matters for Bioacoustic Encoding

Age-Normalized HRV Features for Non-Invasive Glucose Prediction: A Pilot Sleep-Aware Machine Learning Study

Comparative Analysis of Time Series Foundation Models for Demographic Forecasting: Enhancing Predictive Accuracy in US Population Dynamics

FreeGAD: A Training-Free yet Effective Approach for Graph Anomaly Detection

An Explainable AI based approach for Monitoring Animal Health

Exploring Content and Social Connections of Fake News with Explainable Text and Graph Learning

Can AI Keep a Secret? Contextual Integrity Verification: A Provable Security Architecture for LLMs

Fine-Grained Safety Neurons with Training-Free Continual Projection to Reduce LLM Fine Tuning Risks

To Theoretically Understand Transformer-Based In-Context Learning for Optimizing CSMA

MAViS: A Multi-Agent Framework for Long-Sequence Video Storytelling

Towards Theoretical Understanding of Transformer Test-Time Computing: Investigation on In-Context Linear Regression

Early Detection of Pancreatic Cancer Using Multimodal Learning on Electronic Health Records

A Survey of LLM-based Deep Search Agents: Paradigm, Optimization, Evaluation, and Challenges

FlowState: Sampling Rate Invariant Time Series Forecasting

Automatic Image Colorization with Convolutional Neural Networks and Generative Adversarial Networks

Unlocking the Potential of MLLMs in Referring Expression Segmentation via a Light-weight Mask Decoder

SlotMatch: Distilling Temporally Consistent Object-Centric Representations for Unsupervised Video Segmentation

SBP-YOLO:A Lightweight Real-Time Model for Detecting Speed Bumps and Potholes

Dataset Condensation with Color Compensation

A Versatile Pathology Co-pilot via Reasoning Enhanced Multimodal Large Language Model

Spatial-Temporal Transformer with Curriculum Learning for EEG-Based Emotion Recognition

RAPNet: A Receptive-Field Adaptive Convolutional Neural Network for Pansharpening

Penalizing Infeasible Actions and Reward Scaling in Reinforcement Learning with Offline Data

Identify, Isolate, and Purge: Mitigating Hallucinations in LVLMs via Self-Evolving Distillation

Tensor Program Optimization for the RISC-V Vector Extension Using Probabilistic Programs

Segment Anything in Pathology Images with Natural Language

Neural Cellular Automata for ARC-AGI

Scaling Intelligence: Designing Data Centers for Next-Gen Language Models

ConTextTab: A Semantics-Aware Tabular In-Context Learner

PlantDeBERTa: An Open Source Language Model for Plant Science

Recipes for Pre-training LLMs with MXFP8

Enhancing Visual Reliance in Text Generation: A Bayesian Perspective on Mitigating Hallucination in Large Vision-Language Models

G1: Teaching LLMs to Reason on Graphs with Reinforcement Learning

Sample Complexity of Diffusion Model Training Without Empirical Risk Minimizer Access

"Haet Bhasha aur Diskrimineshun": Phonetic Perturbations in Code-Mixed Hinglish to Red-Team LLMs

VoiceCloak: A Multi-Dimensional Defense Framework against Unauthorized Diffusion-based Voice Cloning

LaDi-WM: A Latent Diffusion-based World Model for Predictive Manipulation

Position: We Need Responsible, Application-Driven (RAD) AI Research

Blending 3D Geometry and Machine Learning for Multi-View Stereopsis

Harnessing Structured Knowledge: A Concept Map-Based Approach for High-Quality Multiple Choice Question Generation with Effective Distractors

Hallucinations and Key Information Extraction in Medical Texts: A Comprehensive Assessment of Open-Source Large Language Models

POPri: Private Federated Learning using Preference-Optimized Synthetic Data

Parameter-Efficient Continual Fine-Tuning: A Survey

Unleashing the Power of LLMs in Dense Retrieval with Query Likelihood Modeling

Augmented Adversarial Trigger Learning

Setup Once, Secure Always: A Single-Setup Secure Federated Learning Aggregation Protocol with Forward and Backward Secrecy for Dynamic Users

DDD-GenDT: Dynamic Data-driven Generative Digital Twin Framework

Understanding and Mitigating Memorization in Generative Models via Sharpness of Probability Landscapes

Script-Strategy Aligned Generation: Aligning LLMs with Expert-Crafted Dialogue Scripts and Therapeutic Strategies for Psychotherapy

SSD-TS: Exploring the Potential of Linear State Space Models for Diffusion Models in Time Series Imputation

Vision Backbone Efficient Selection for Image Classification in Low-Data Regimes

Boolean Matrix Logic Programming on the GPU

Iterative Utility Judgment Framework via LLMs Inspired by Relevance in Philosophy

iTBLS: A Dataset of Interactive Conversations Over Tabular Information

Fusing Echocardiography Images and Medical Records for Continuous Patient Stratification

Joint Problems in Learning Multiple Dynamical Systems

Radio Map Estimation: Empirical Validation and Analysis

"I see models being a whole other thing": An Empirical Study of Pre-Trained Model Naming Conventions and A Tool for Enhancing Naming Consistency

LEGO: Learning and Graph-Optimized Modular Tracker for Online Multi-Object Tracking with Point Clouds

Adaptation and Optimization of Automatic Speech Recognition (ASR) for the Maritime Domain in the Field of VHF Communication

PC-Sampler: Position-Aware Calibration of Decoding Bias in Masked Diffusion Models

Modeling Relational Logic Circuits for And-Inverter Graph Convolutional Network

FutureX: An Advanced Live Benchmark for LLM Agents in Future Prediction

Scaling Up without Fading Out: Goal-Aware Sparse GNN for RL-based Generalized Planning

DualSG: A Dual-Stream Explicit Semantic-Guided Multivariate Time Series Forecasting Framework

Modeling Uncertainty: Constraint-Based Belief States in Imperfect-Information Games

Data-Efficient Safe Policy Improvement Using Parametric Structure

Towards Urban Planing AI Agent in the Age of Agentic AI

Dispositions and Roles of Generically Dependent Entities

Efficient Network Automatic Relevance Determination

Modeling the Diachronic Evolution of Legal Norms: An LRMoo-Based, Component-Level Approach

Language-Guided Multi-Agent Learning in Simulations: A Unified Framework and Evaluation

It's the Thought that Counts: Evaluating the Attempts of Frontier LLMs to Persuade on Harmful Topics

Hierarchical Reinforcement Learning in Multi-Goal Spatial Navigation with Autonomous Mobile Robots

Trust, but verify

Two Heads are Better Than One: Test-time Scaling of Multi-agent Collaborative Reasoning

Hawkeye:Efficient Reasoning with Model Collaboration

GoAI: Enhancing AI Students' Learning Paths and Idea Generation via Graph of AI Ideas

The StudyChat Dataset: Student Dialogues With ChatGPT in an Artificial Intelligence Course

VRoPE: Rotary Position Embedding for Video Large Language Models

Where to Go Next Day: Multi-scale Spatial-Temporal Decoupled Model for Mid-term Human Mobility Prediction

LEGO-GraphRAG: Modularizing Graph-based Retrieval-Augmented Generation for Design Space Exploration

GeoSAM2: Unleashing the Power of SAM2 for 3D Part Segmentation

Unintended Misalignment from Agentic Fine-Tuning: Risks and Mitigation

Ask Good Questions for Large Language Models

Efficient Knowledge Graph Unlearning with Zeroth-order Information

Evaluating Identity Leakage in Speaker De-Identification Systems

ASDFormer: A Transformer with Mixtures of Pooling-Classifier Experts for Robust Autism Diagnosis and Biomarker Discovery

Embodied-R1: Reinforced Embodied Reasoning for General Robotic Manipulation

Chunks as Arms: Multi-Armed Bandit-Guided Sampling for Long-Context LLM Preference Optimization

Enhancing Visual Reliance in Text Generation: A Bayesian Perspective on Mitigating Hallucination in Large Vision-Language Models

Created by

Haebom

Authors

Nanxing Hu, Xiaoyue Duan, Jinchao Zhang, Guoliang Kang

Overview

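This paper proposes a new method for mitigating hallucination in large vision-language models (LVLMs). LVLMs generate contextually coherent text but exhibit hallucinations that do not match the visual input, which hinders real-world deployment. Prior work has focused on improving the features or outputs of a single modality (visual or textual), but has not explicitly and systematically enhanced visual reliance. From a Bayesian perspective, the paper comprehensively investigates the factors that degrade visual reliance during text generation in LVLMs and, based on this analysis, presents a three-pronged method for mitigating hallucination. First, because not all visual tokens are informative for generating meaningful text, it removes redundant visual tokens to prevent interference. Second, because an LVLM may encode an inappropriate prior that leads it to generate unexpected words, it corrects the prior from a Bayesian perspective. Third, because from a certain step onward the posterior of next-token prediction conditioned on visual tokens may collapse to a prior distribution that does not depend on any informative visual tokens, it stops further text generation to avoid hallucination. Extensive experiments on three benchmarks, POPE, CHAIR, and MME, show that the proposed method consistently mitigates hallucination in LVLMs and outperforms existing state-of-the-art approaches.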
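To make the three ideas in the overview concrete, here is a minimal Python sketch. It is not the authors' implementation: the summary does not give the paper's actual scoring rule, correction formula, or stopping criterion. The function names, the top-k pruning by an importance score, the `alpha` exponent, and the KL threshold are all assumptions chosen for illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def prune_visual_tokens(scores, keep_ratio=0.5):
    """Assumed pruning rule: keep the highest-scoring visual tokens.

    `scores` is a hypothetical per-token importance score
    (for example, attention received from text tokens).
    Returns the kept indices in their original order.
    """
    k = max(1, int(len(scores) * keep_ratio))
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top)

def correct_prior(posterior, prior, alpha=1.0):
    """Assumed prior correction: p(y | v, x) / p(y | x)^alpha, renormalised.

    `posterior` is the image-conditioned next-token distribution and
    `prior` the text-only one; all probabilities must be strictly positive.
    """
    logits = [math.log(p) - alpha * math.log(q) for p, q in zip(posterior, prior)]
    return softmax(logits)

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with q > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def should_stop(posterior, prior, threshold=0.01):
    """Assumed collapse test: stop once the image-conditioned
    distribution is nearly indistinguishable from the text-only prior."""
    return kl_divergence(posterior, prior) < threshold

# Toy usage with made-up numbers.
kept = prune_visual_tokens([0.9, 0.1, 0.5, 0.05], keep_ratio=0.5)
posterior = [0.7, 0.2, 0.1]   # hypothetical p(y | visual tokens, text)
prior = [0.5, 0.3, 0.2]       # hypothetical p(y | text only)
corrected = correct_prior(posterior, prior)
collapsed = should_stop(prior, prior)
still_grounded = not should_stop(posterior, prior)
```

In this toy setup, dividing by the text-only prior boosts tokens that the image makes more likely than text alone would, and the KL test flags the point where the image no longer changes the prediction. A real system would obtain both distributions from forward passes of the model with and without the visual tokens.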

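Implications and Limitations

• Implications:

◦ Systematically analyzes the LVLM hallucination problem from a Bayesian perspective and presents an effective method for enhancing visual reliance.

◦ Contributes to mitigating hallucination through three complementary measures: removing redundant visual information, correcting the prior, and stopping generation.

◦ Demonstrates practical effectiveness by outperforming existing state-of-the-art models on three benchmarks.

• Limitations:

◦ The effectiveness of the proposed method may be limited to the specific benchmark datasets used. Further experiments on a wider range of datasets and LVLM architectures are needed.

◦ The Bayesian analysis and methodology may be computationally expensive. Efficient implementation strategies are needed for real-time applications.

◦ The analysis of the causes of hallucination is confined to the Bayesian perspective, so analysis from other perspectives may also be needed.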
