Daily Arxiv
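This page collects artificial intelligence papers published around the world.
Summaries on this page are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their affiliated institutions; please cite the source when sharing.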
EgoVLA: Learning Vision-Language-Action Models from Egocentric Human Videos
Compositional Discrete Latent Code for High Fidelity, Productive Diffusion Models
MERA Code: A Unified Framework for Evaluating Code Generation Across Tasks
Site-Level Fine-Tuning with Progressive Layer Freezing: Towards Robust Prediction of Bronchopulmonary Dysplasia from Day-1 Chest Radiographs in Extremely Preterm Infants
A Roadmap for Climate-Relevant Robotics Research
Fairness Is Not Enough: Auditing Competence and Intersectional Bias in AI-powered Resume Screening
MMOne: Representing Multiple Modalities in One Scene
SWE-MERA: A Dynamic Benchmark for Agenticly Evaluating Large Language Models on Software Engineering Tasks
CodeAssistBench (CAB): Dataset & Benchmarking for Multi-turn Chat-Based Code Assistance
(Almost) Free Modality Stitching of Foundation Models
A Brain Tumor Segmentation Method Based on CLIP and 3D U-Net with Cross-Modal Semantic Guidance and Multi-Level Feature Fusion
KEN: Knowledge Augmentation and Emotion Guidance Network for Multimodal Fake News Detection
THOR: Transformer Heuristics for On-Demand Retrieval
SEALGuard: Safeguarding the Multilingual Conversations in Southeast Asian Languages for LLM Software Systems
KeyRe-ID: Keypoint-Guided Person Re-Identification using Part-Aware Representation in Videos
Prompt Perturbations Reveal Human-Like Biases in LLM Survey Responses
Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities
Fast Bilateral Teleoperation and Imitation Learning Using Sensorless Force Control via Accurate Dynamics Model
Task-Specific Generative Dataset Distillation with Difficulty-Guided Sampling
VIDEE: Visual and Interactive Decomposition, Execution, and Evaluation of Text Analytics with Intelligent Agents
ReCode: Updating Code API Knowledge with Reinforcement Learning
Cross-Layer Discrete Concept Discovery for Interpreting Language Models
Semantic Structure-Aware Generative Attacks for Enhanced Adversarial Transferability
MEM1: Learning to Synergize Memory and Reasoning for Efficient Long-Horizon Agents
Multiple-Frequencies Population-Based Training
Critique-GRPO: Advancing LLM Reasoning with Natural Language and Numerical Feedback
Fine-Tune an SLM or Prompt an LLM? The Case of Generating Low-Code Workflows
ContextQFormer: A New Context Modeling Method for Multi-Turn Multi-Modal Conversations
GPU Performance Portability needs Autotuning
Generating Synthetic Data via Augmentations for Improved Facial Resemblance in DreamBooth and InstantID
Coral Protocol: Open Infrastructure Connecting The Internet of Agents
MAC-Tuning: LLM Multi-Compositional Problem Reasoning with Enhanced Knowledge Boundary Awareness
Federated Learning: A Survey on Privacy-Preserving Collaborative Intelligence
ConTextual: Improving Clinical Text Summarization in LLMs with Context-preserving Token Filtering and Knowledge Graphs
Task-Circuit Quantization: Leveraging Knowledge Localization and Interpretability for Compression
JailDAM: Jailbreak Detection with Adaptive Memory for Vision-Language Model
KP Quantum Neural Networks
VectorFit : Adaptive Singular & Bias Vector Fine-Tuning of Pre-trained Foundation Models
Data-Efficient Deep Operator Network for Unsteady Flow: A Multi-Fidelity Approach with Physics-Guided Subsampling
Learning Universal Human Mobility Patterns with a Foundation Model for Cross-domain Data Fusion
GeoFlow-SLAM: A Robust Tightly-Coupled RGBD-Inertial and Legged Odometry Fusion SLAM for Dynamic Legged Robotics
A Multi-Stage Framework with Taxonomy-Guided Reasoning for Occupation Classification Using Large Language Models
Multi-View Node Pruning for Accurate Graph Representation
V-Max: A Reinforcement Learning Framework for Autonomous Driving
Interpretable Transformation and Analysis of Timelines through Learning via Surprisability
AI Governance InternationaL Evaluation Index (AGILE Index) 2024
UPCORE: Utility-Preserving Coreset Selection for Balanced Unlearning
Improving Transformer World Models for Data-Efficient RL
LLM-RecG: A Semantic Bias-Aware Framework for Zero-Shot Sequential Recommendation
SIDDA: SInkhorn Dynamic Domain Adaptation for Image Classification with Equivariant Neural Networks
Determination of galaxy photometric redshifts using Conditional Generative Adversarial Networks (CGANs)
Speech-Forensics: Towards Comprehensive Synthetic Speech Dataset Establishment and Analysis
MRGen: Segmentation Data Engine for Underrepresented MRI Modalities
IOPO: Empowering LLMs with Complex Instruction Following via Input-Output Preference Optimization
Out-of-Distribution Recovery with Object-Centric Keypoint Inverse Policy for Visuomotor Imitation Learning
Dataset resulting from the user study on comprehensibility of explainable AI algorithms
Unified Triplet-Level Hallucination Evaluation for Large Vision-Language Models
LoRA Done RITE: Robust Invariant Transformation Equilibration for LoRA Optimization
Identifying Task Groupings for Multi-Task Learning Using Pointwise V-Usable Information
DeFine: Decision-Making with Analogical Reasoning over Factor Profiles
Benchmarking Sub-Genre Classification For Mainstage Dance Music
Risks of ignoring uncertainty propagation in AI-augmented security pipelines
MedPix 2.0: A Comprehensive Multimodal Biomedical Data set for Advanced AI Applications with Retrieval Augmented Generation and Knowledge Graphs
Leveraging Quantum Superposition to Infer the Dynamic Behavior of a Spatial-Temporal Neural Network Signaling Model
Bounding the Worst-class Error: A Boosting Approach
TBDetector:Transformer-Based Detector for Advanced Persistent Threats with Provenance Graph
Machine Learning Systems: A Survey from a Data-Oriented Perspective
Aime: Towards Fully-Autonomous Multi-Agent Framework
SmartThinker: Learning to Compress and Preserve Reasoning by Step-Level Length Control
Ready Jurist One: Benchmarking Language Agents for Legal Intelligence in Dynamic Environments
NTRL: Encounter Generation via Reinforcement Learning for Dynamic Difficulty Adjustment in Dungeons and Dragons
Judging with Many Minds: Do More Perspectives Mean Less Prejudice? On Bias Amplifications and Resistance in Multi-Agent Based LLM-as-Judge
ActionStudio: A Lightweight Framework for Data and Training of Large Action Models
BEARCUBS: A benchmark for computer-using web agents
Demystifying MuZero Planning: Interpreting the Learned Model
LLM-Enhanced User-Item Interactions: Leveraging Edge Information for Optimized Recommendations
VideoITG: Multimodal Video Understanding with Instructed Temporal Grounding
VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning
Imbalance in Balance: Online Concept Balancing in Generation Models
Latent Policy Steering with Embodiment-Agnostic Pretrained World Models
Vision-and-Language Training Helps Deploy Taxonomic Knowledge but Does Not Fundamentally Alter It
Revisiting Reliability in the Reasoning-based Pose Estimation Benchmark
AbGen: Evaluating Large Language Models in Ablation Study Design and Evaluation for Scientific Research
Towards Formal Verification of LLM-Generated Code from Natural Language Prompts
Evaluating Reinforcement Learning Algorithms for Navigation in Simulated Robotic Quadrupeds: A Comparative Study Inspired by Guide Dog Behaviour
Overview of the TalentCLEF 2025: Skill and Job Title Intelligence for Human Capital Management
QuestA: Expanding Reasoning Capacity in LLMs via Question Augmentation
Voxtral
Merge Kernel for Bayesian Optimization on Permutation Space
Efficient Adaptation of Pre-trained Vision Transformer underpinned by Approximately Orthogonal Fine-Tuning Strategy
Automating Steering for Safe Multimodal Large Language Models
HATS: Hindi Analogy Test Set for Evaluating Reasoning in Large Language Models
VITA: Vision-to-Action Flow Matching Policy
$S^2M^2$: Scalable Stereo Matching Model for Reliable Depth Estimation
Synthesizing Reality: Leveraging the Generative AI-Powered Platform Midjourney for Construction Worker Detection
Aligning Humans and Robots via Reinforcement Learning from Implicit Human Feedback
SHIELD: A Secure and Highly Enhanced Integrated Learning for Robust Deepfake Detection against Adversarial Attacks
Prompt Injection 2.0: Hybrid AI Threats
Orbis: Overcoming Challenges of Long-Horizon Prediction in Driving World Models
Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities
Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities
Created by
Haebom
Authors
Hao Sun, Mihaela van der Schaar
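Overview
This paper comprehensively reviews recent research on the alignment of large language models (LLMs) from the perspective of inverse reinforcement learning (IRL). It covers recent developments in IRL for alignment, the main challenges and opportunities, and practical aspects such as datasets, benchmarks, evaluation metrics, infrastructure, and computationally efficient training and inference techniques. It also aims to propose promising future directions for improving LLM alignment with IRL techniques.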
Takeaways, Limitations
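• Takeaways:
  ◦ Provides a comprehensive review of recent advances in IRL for LLM alignment.
  ◦ Clarifies how reinforcement learning for LLM alignment differs from conventional reinforcement learning.
  ◦ Emphasizes the importance of constructing neural reward models from human data.
  ◦ Examines practical aspects such as datasets, benchmarks, evaluation metrics, and infrastructure.
  ◦ Suggests future research directions that draw on work in sparse-reward reinforcement learning.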
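• Limitations:
  ◦ The paper is a preprint that has not yet been formally published, so its findings still need to be verified.
  ◦ It synthesizes a wide range of research results, but may not discuss the limitations of individual studies in detail.
  ◦ It may reflect a biased perspective on particular IRL techniques or LLM alignment methods.
  ◦ Because the field is developing quickly, results that appear after publication may make parts of the discussion outdated.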
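The takeaway about constructing neural reward models from human data can be made concrete with a small sketch. The summary does not say which formulation the paper uses, so the following assumes the common Bradley-Terry pairwise-preference objective, and it swaps the neural network for a linear model to stay dependency-free. All function names and the toy data generator are hypothetical and chosen only for illustration.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def reward(w, features):
    """Linear reward r(x) = w . x (a stand-in for a neural reward model)."""
    return sum(wi * xi for wi, xi in zip(w, features))


def train_reward_model(pairs, dim, lr=0.1, epochs=200):
    """Fit w by stochastic gradient descent on the Bradley-Terry loss.

    pairs: list of (chosen_features, rejected_features) tuples.
    Per-pair loss: -log sigmoid(r(chosen) - r(rejected)).
    """
    w = [0.0] * dim
    for _ in range(epochs):
        for chosen, rejected in pairs:
            margin = reward(w, chosen) - reward(w, rejected)
            # Gradient of the loss w.r.t. w is
            # -(1 - sigmoid(margin)) * (chosen - rejected); step against it.
            coeff = 1.0 - sigmoid(margin)
            for i in range(dim):
                w[i] += lr * coeff * (chosen[i] - rejected[i])
    return w


def make_toy_pairs(n, seed=0):
    """Synthetic preferences (assumption): the simulated annotator always
    prefers the response whose first feature is larger."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        a = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        b = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        pairs.append((a, b) if a[0] > b[0] else (b, a))
    return pairs


def pairwise_accuracy(w, pairs):
    """Fraction of pairs where the chosen response gets the higher reward."""
    return sum(reward(w, c) > reward(w, r) for c, r in pairs) / len(pairs)


if __name__ == "__main__":
    data = make_toy_pairs(200)
    weights = train_reward_model(data, dim=2)
    print("learned weights:", weights)
    print("pairwise accuracy:", pairwise_accuracy(weights, data))
```

In this toy setup the learned weight on the first feature should come out positive, since that feature is what the simulated annotator rewards. A real alignment pipeline would replace the linear model with a network over LLM representations and the synthetic pairs with human preference data.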