Daily Arxiv
This page curates AI-related papers published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, simply credit the source.
Learned Collusion
Created by
Haebom
Author
Olivier Compte (Paris School of Economics)
Overview
This paper studies how introducing a bias toward a particular action (e.g., cooperation) into Q-learning-based automated systems can promote cooperation or collusion. In contrast to standard Q-learning, which simply selects the action with the highest Q-value, the paper proposes policies that systematically favor a given action, and uses logit/best-response dynamics to find stable equilibrium biases to which play converges. These biases robustly promote collusion or cooperation across a variety of payoff and monitoring structures, independently of the initial Q-values.
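As a rough illustration of the mechanism, the toy sketch below (an assumption-laden simplification, not the paper's actual model) has two stateless, bandit-style Q-learners play a repeated prisoner's dilemma, where each agent adds a fixed bias to the Q-value of the cooperative action before taking the greedy argmax. All payoff values, parameter settings, and the `select`/`run` helpers are illustrative choices of this sketch.

```python
import random

C, D = 0, 1  # actions: cooperate, defect
# One-shot prisoner's dilemma payoffs (illustrative values, not from the paper)
PAYOFF = {(C, C): (3, 3), (C, D): (0, 4), (D, C): (4, 0), (D, D): (1, 1)}

def select(q, bias):
    """Greedy action choice, with a fixed bias added to cooperation's Q-value."""
    scores = [q[C] + bias, q[D]]
    return scores.index(max(scores))

def run(bias=2.0, alpha=0.1, eps=0.1, episodes=5000, seed=0):
    """Two stateless (bandit-style) Q-learners play the repeated game."""
    rng = random.Random(seed)
    q1, q2 = [0.0, 0.0], [0.0, 0.0]

    def act(q):
        # epsilon-greedy exploration around the biased greedy rule
        return rng.randrange(2) if rng.random() < eps else select(q, bias)

    for _ in range(episodes):
        a1, a2 = act(q1), act(q2)
        r1, r2 = PAYOFF[(a1, a2)]
        q1[a1] += alpha * (r1 - q1[a1])  # incremental Q-value update
        q2[a2] += alpha * (r2 - q2[a2])
    # report the greedy (exploration-free) play after learning
    return select(q1, bias), select(q2, bias)

print("with bias:   ", run(bias=2.0))
print("without bias:", run(bias=0.0))
```

The point of the contrast: exploration still teaches each agent that defection has the higher one-shot payoff against a cooperator, yet the biased comparison q[C] + b vs. q[D] keeps cooperation greedy, whereas with b = 0 the same dynamics collapse to mutual defection. This loosely mirrors the role the stable equilibrium biases play in the paper.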
Implications and Limitations
• Implications:
◦ Presents a new method for effectively inducing cooperation or collusion in Q-learning-based systems.
◦ Identifies stable equilibrium biases that are independent of the initial Q-values.
◦ Confirms applicability across a variety of payoff and monitoring structures.
◦ Suggests that efficient learning is possible via logit/best-response dynamics.
• Limitations:
◦ Further research is needed on how well the proposed bias mechanism generalizes.
◦ The problems and limits that may arise when applying it to real systems remain to be analyzed.
◦ Comparative analysis against other learning algorithms is lacking.
◦ Further work is needed on how to optimally set the bias toward a particular action.
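The logit choice rule invoked in the overview can be sketched as a softmax over biased Q-values; this is a hypothetical reading of that rule for illustration only, and the paper's exact specification may differ. Here `lam` is an assumed precision parameter: as it grows, the rule approaches the biased argmax.

```python
import math
import random

def logit_choice(q, bias, lam, rng=random):
    """Sample an action with probability proportional to exp(lam * biased score).

    Index 0 is the favored (cooperative) action, whose score is shifted by
    `bias`; index 1 is the alternative. `lam` controls how sharply the
    distribution concentrates on the higher biased score.
    """
    scores = [q[0] + bias, q[1]]
    m = max(lam * s for s in scores)               # subtract max for numerical stability
    weights = [math.exp(lam * s - m) for s in scores]
    r = rng.random() * sum(weights)
    return 0 if r < weights[0] else 1
```

With a large `lam` and equal Q-values, the bias makes cooperation almost certain; with `lam` near zero, play is close to uniform, so the bias only tilts (rather than dictates) behavior.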
View PDF