Daily Arxiv

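This page collects artificial intelligence papers published around the world.
Summaries on this page are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, simply cite the source.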

Towards Autonomous UAV Visual Object Search in City Space: Benchmark and Agentic Methodology

Hakim: Farsi Text Embedding Model

Fast Text-to-Audio Generation with Adversarial Post-Training

Fusing Bidirectional Chains of Thought and Reward Mechanisms A Method for Enhancing Question-Answering Capabilities of Large Language Models for Chinese Intangible Cultural Heritage

Intelligent Product 3.0: Decentralised AI Agents and Web3 Intelligence Standards

Neural Brain: A Neuroscience-inspired Framework for Embodied Agents

Decoding Futures Price Dynamics: A Regularized Sparse Autoencoder for Interpretable Multi-Horizon Forecasting and Factor Discovery

Steerable Chatbots: Personalizing LLMs with Preference-Based Activation Steering

SafeNav: Safe Path Navigation using Landmark Based Localization in a GPS-denied Environment

Don't be lazy: CompleteP enables compute-efficient deep transformers

Llama-Nemotron: Efficient Reasoning Models

Rethinking Time Encoding via Learnable Transformation Functions

CrashFixer: A crash resolution agent for the Linux kernel

Harden and Catch for Just-in-Time Assured LLM-Based Software Testing: Open Research Challenges

Multi-Agent Reinforcement Learning Simulation for Environmental Policy Synthesis

Energy Matching: Unifying Flow Matching and Energy-Based Models for Generative Modeling

Beyond Single-Turn: A Survey on Multi-Turn Interactions with Large Language Models

Evaluating Clinical Competencies of Large Language Models with a General Practice Benchmark

Reinforcement Learning Outperforms Supervised Fine-Tuning: A Case Study on Audio Question Answering

The Problem of the Priors, or Posteriors?

Simulating and Analysing Human Survey Responses with Large Language Models: A Case Study in Energy Stated Preference

Learning Autonomy: Off-Road Navigation Enhanced by Human Input

InductionBench: LLMs Fail in the Simplest Complexity Class

Activation Steering in Neural Theorem Provers

PropNet: a White-Box and Human-Like Network for Sentence Representation

Principled Data Selection for Alignment: The Hidden Risks of Difficult Examples

FAS: Fast ANN-SNN Conversion for Spiking Large Language Models

Learning Traffic Anomalies from Generative Models on Real-Time Observations

High-temperature superconductivity in Li$_2$AuH$_6$ mediated by strong electron-phonon coupling under ambient pressure

A Bio-Inspired Research Paradigm of Collision Perception Neurons Enabling Neuro-Robotic Integration: The LGMD Case

ThreatModeling-LLM: Automating Threat Modeling using Large Language Models for Banking System

Is Linear Feedback on Smoothed Dynamics Sufficient for Stabilizing Contact-Rich Plans?

What Features in Prompts Jailbreak LLMs? Investigating the Mechanisms Behind Attacks

FAMMA: A Benchmark for Financial Domain Multilingual Multimodal Question Answering

Deep Signature: Characterization of Large-Scale Molecular Dynamics

State-of-the-Art Periorbital Distance Prediction and Disease Classification Using Periorbital Features

A Deep Learning Approach for Pixel-level Material Classification via Hyperspectral Imaging

Fragment-Masked Diffusion for Molecular Optimization

Public Constitutional AI

Cognitive Insights and Stable Coalition Matching for Fostering Multi-Agent Cooperation

DACAD: Domain Adaptation Contrastive Learning for Anomaly Detection in Multivariate Time Series

Construction and Application of Materials Knowledge Graph in Multidisciplinary Materials Science via Large Language Model

Detecting Multimedia Generated by Large AI Models: A Survey

Efficient approximation of Earth Mover's Distance Based on Nearest Neighbor Search

EiHi Net: Out-of-Distribution Generalization Paradigm

Agent RL Scaling Law: Agent RL with Spontaneous Code Execution for Mathematical Problem Solving

RefPentester: A Knowledge-Informed Self-Reflective Penetration Testing Framework Based on Large Language Models

CAT Merging: A Training-Free Approach for Resolving Conflicts in Model Merging

SafeMate: A Modular RAG-Based Agent for Context-Aware Emergency Guidance

UI-R1: Enhancing Efficient Action Prediction of GUI Agents by Reinforcement Learning

AdaWorld: Learning Adaptable World Models with Latent Actions

Reinforcement Learning-based Heuristics to Guide Domain-Independent Dynamic Programming

An Analytical Emotion Framework of Rumour Threads on Social Media

Deontic Temporal Logic for Formal Verification of AI Ethics

PSPO*: An Effective Process-supervised Policy Optimization for Reasoning Alignment

Think Smart, Act SMARL! Analyzing Probabilistic Logic Shields for Multi-Agent Reinforcement Learning

Exploiting Uncertainty for Querying Inconsistent Description Logics Knowledge Bases

Learning to Be Cautious

Customizing a Large Language Model for VHDL Design of High-Performance Microprocessors

How Hungry is AI? Benchmarking Energy, Water, and Carbon Footprint of LLM Inference

WorldView-Bench: A Benchmark for Evaluating Global Cultural Perspectives in Large Language Models

Variational Visual Question Answering

Ethics and Persuasion in Reinforcement Learning from Human Feedback: A Procedural Rhetorical Approach

BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset

Meta-learning Slice-to-Volume Reconstruction in Fetal Brain MRI using Implicit Neural Representations

Learning Long-Context Diffusion Policies via Past-Token Prediction

WavReward: Spoken Dialogue Models With Generalist Reward Evaluators

Flash-VL 2B: Optimizing Vision-Language Model Performance for Ultra-Low Latency and High Throughput

Preserving Plasticity in Continual Learning with Adaptive Linearity Injection

Deploying Foundation Model-Enabled Air and Ground Robots in the Field: Challenges and Opportunities

A 2D Semantic-Aware Position Encoding for Vision Transformers

Quantum state-agnostic work extraction (almost) without dissipation

Evaluating GPT- and Reasoning-based Large Language Models on Physics Olympiad Problems: Surpassing Human Performance and Implications for Educational Assessment

CXMArena: Unified Dataset to benchmark performance in realistic CXM Scenarios

Endo-CLIP: Progressive Self-Supervised Pre-training on Raw Colonoscopy Records

Multilingual Machine Translation with Quantum Encoder Decoder Attention-based Convolutional Variational Circuits

Quantum-Enhanced Parameter-Efficient Learning for Typhoon Trajectory Forecasting

UMotion: Uncertainty-driven Human Motion Estimation from Inertial and Ultra-wideband Units

FedSaaS: Class-Consistency Federated Semantic Segmentation via Global Prototype Supervision and Local Adversarial Harmonization

The Voice Timbre Attribute Detection 2025 Challenge Evaluation Plan

Examining Deployment and Refinement of the VIOLA-AI Intracranial Hemorrhage Model Using an Interactive NeoMedSys Platform

TensorRL-QAS: Reinforcement learning with tensor networks for scalable quantum architecture search

GreenFactory: Ensembling Zero-Cost Proxies to Estimate Performance of Neural Networks

Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures

Evaluating the Robustness of Adversarial Defenses in Malware Detection Systems

BioVFM-21M: Benchmarking and Scaling Self-Supervised Vision Foundation Models for Biomedical Image Analysis

Neural Video Compression using 2D Gaussian Splatting

Toward Fair Federated Learning under Demographic Disparities and Data Imbalance

MetaUAS: Universal Anomaly Segmentation with One-Prompt Meta-Learning

Learning to Detect Multi-class Anomalies with Just One Normal Image Prompt

Few-Shot Anomaly-Driven Generation for Anomaly Classification and Segmentation

EDBench: Large-Scale Electron Density Data for Molecular Modeling

Focus, Merge, Rank: Improved Question Answering Based on Semi-structured Knowledge Bases

Educational impacts of generative artificial intelligence on learning and performance of engineering students in China

InvDesFlow-AL: Active Learning-based Workflow for Inverse Design of Functional Materials

DRRNet: Macro-Micro Feature Fusion and Dual Reverse Refinement for Camouflaged Object Detection

An Initial Exploration of Default Images in Text-to-Image Generation

A Multi-Task Foundation Model for Wireless Channel Representation Using Contrastive and Masked Autoencoder Learning

ELIS: Efficient LLM Iterative Scheduling System with Response Length Predictor

Fair Clustering via Alignment

WorldView-Bench: A Benchmark for Evaluating Global Cultural Perspectives in Large Language Models

Created by

Haebom

Authors

Abdullah Mushtaq, Imran Taj, Rafay Naeem, Ibrahim Ghaznavi, Junaid Qadir

Overview

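This paper points out that large language models (LLMs) are predominantly trained and aligned in ways that reinforce Western-centric epistemologies and sociocultural norms, which leads to cultural homogenization and limits their ability to reflect the diversity of world civilizations. Existing benchmarking frameworks fail to capture this bias adequately because they rely on rigid evaluation methods that overlook the complexity of cultural inclusivity. To address this, the paper presents WorldView-Bench, a benchmark designed to evaluate the Global Cultural Inclusivity (GCI) of LLMs by analyzing their ability to accommodate diverse worldviews. Building on the Multiplex Worldview proposed by Senturk et al., it distinguishes Uniplex models, which reinforce cultural homogenization, from Multiplex models, which integrate diverse perspectives. It measures cultural polarization, defined as the exclusion of alternative perspectives, through free-form generative evaluation rather than conventional categorical benchmarks. Applied multiplexity is implemented through two intervention strategies: Contextually-Implemented Multiplex LLMs and Multi-Agent System (MAS)-Implemented Multiplex LLMs. With MAS-Implemented Multiplex LLMs, Perspective Distribution Score (PDS) entropy rises from a baseline of 13% to 94%, accompanied by a shift toward positive sentiment (67.7%) and improved cultural balance. These results highlight the potential of multiplex-aware AI evaluation to mitigate cultural bias in LLMs and to pave the way for more inclusive and ethically aligned AI systems.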
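To make the entropy figures above more concrete, the following is a minimal sketch of how an entropy-based perspective distribution score could be computed. The summary does not give the paper's exact PDS formula, so this assumes a normalized Shannon entropy over a fixed set of perspective categories; the function name `normalized_entropy` and the category labels are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def normalized_entropy(labels, categories):
    """Shannon entropy of the label distribution, scaled to [0, 1].

    Assumption: each model response has been tagged with one perspective
    label. This is an illustrative stand-in, not the paper's PDS definition.
    """
    if len(categories) < 2:
        raise ValueError("need at least two categories to normalize entropy")

    counts = Counter(labels)
    # Only count labels that belong to the declared category set.
    total = sum(counts[c] for c in categories)
    if total == 0:
        return 0.0

    entropy = 0.0
    for c in categories:
        p = counts[c] / total
        if p > 0:
            entropy -= p * math.log(p)

    # Dividing by log(K) maps a uniform distribution over K categories to 1.
    return entropy / math.log(len(categories))


# Hypothetical perspective categories, for illustration only.
categories = ["western", "islamic", "east_asian", "african"]

# All responses reflect a single perspective (uniplex-like behaviour).
single_view = ["western"] * 8

# Responses spread evenly across perspectives (multiplex-like behaviour).
balanced_view = categories * 2

print(normalized_entropy(single_view, categories))
print(normalized_entropy(balanced_view, categories))
```

Under this assumed definition, responses concentrated on one perspective score 0, while responses spread evenly across all categories score close to 1, which matches the direction of the reported move from 13% to 94%.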

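Implications and Limitations

• Implications:

◦ Presents WorldView-Bench, a new benchmark for evaluating cultural bias in LLMs.

◦ Proposes strategies for evaluating and improving LLMs based on the Multiplex Worldview concept.

◦ Shows that a multi-agent system (MAS) can improve the cultural inclusivity of LLMs.

◦ Offers a new approach to mitigating cultural bias in LLMs and suggests directions for future research.

• Limitations:

◦ Further research is needed on how well WorldView-Bench generalizes and whether it applies to other cultural contexts.

◦ Further research is needed on the complexity and efficiency of the multi-agent system (MAS).

◦ Cultural nuances and subtle biases may exist that the current benchmark does not capture.

◦ The influence of evaluator subjectivity across diverse cultural backgrounds on the results needs to be considered.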
