Daily Arxiv
A page that collects artificial-intelligence papers published around the world.
This page is summarized using Google Gemini and is operated on a non-profit basis.
Copyright for the papers belongs to the authors and their institutions; please cite the source when sharing.
CEHR-XGPT: A Scalable Multi-Task Foundation Model for Electronic Health Records
Unveiling the Response of Large Vision-Language Models to Visually Absent Tokens
Adaptive Learning Strategies for Mitotic Figure Classification in MIDOG2025 Challenge
MitoDetect++: A Domain-Robust Pipeline for Mitosis Detection and Atypical Subtyping
Align-Then-stEer: Adapting the Vision-Language Action Models through Unified Latent Guidance
Fantastic Pretraining Optimizers and Where to Find Them
Towards Interpretable Geo-localization: a Concept-Aware Global Image-GPS Alignment Framework
TECP: Token-Entropy Conformal Prediction for LLMs
The Complexity Trap: Simple Observation Masking Is as Efficient as LLM Summarization for Agent Context Management
Train-Once Plan-Anywhere Kinodynamic Motion Planning via Diffusion Trees
Skill-Aligned Fairness in Multi-Agent Learning for Collaboration in Healthcare
Mitigating Hallucinations in LM-Based TTS Models via Distribution Alignment Using GFlowNets
AgentArmor: Enforcing Program Analysis on Agent Runtime Trace to Defend Against Prompt Injection
HuggingGraph: Understanding the Supply Chain of LLM Ecosystem
Food safety trends across Europe: insights from the 392-million-entry CompreHensive European Food Safety (CHEFS) database
Simple Yet Effective: An Information-Theoretic Approach to Multi-LLM Uncertainty Quantification
BayesSDF: Surface-Based Laplacian Uncertainty Estimation for 3D Geometry with Neural Signed Distance Fields
Empowering Bridge Digital Twins by Bridging the Data Gap with a Unified Synthesis Framework
The Features at Convergence Theorem: a first-principles alternative to the Neural Feature Ansatz for how networks learn representations
AI-Assisted Rapid Crystal Structure Generation Towards a Target Local Environment
First Steps Towards Overhearing LLM Agents: A Case Study With Dungeons & Dragons Gameplay
TokUR: Token-Level Uncertainty Estimation for Large Language Model Reasoning
Cutting Through Privacy: A Hyperplane-Based Data Reconstruction Attack in Federated Learning
AutoPDL: Automatic Prompt Optimization for LLM Agents
RailGoerl24: Görlitz Rail Test Center CV Dataset 2024
Revealing higher-order neural representations of uncertainty with the Noise Estimation through Reinforcement-based Diffusion (NERD) model
PromptGuard: Soft Prompt-Guided Unsafe Content Moderation for Text-to-Image Models
Spoof Trace Discovery for Deep Learning Based Explainable Face Anti-Spoofing
The Information Security Awareness of Large Language Models
Automatically Detecting Online Deceptive Patterns
HyperAgent: Generalist Software Engineering Agents to Solve Coding Tasks at Scale
Automated detection of underdiagnosed medical conditions via opportunistic imaging
Selective Preference Optimization via Token-Level Reward Function Estimation
ATHAR: A High-Quality and Diverse Dataset for Classical Arabic to English Translation
PersonaGym: Evaluating Persona Agents and LLMs
CFaults: Model-Based Diagnosis for Fault Localization in C Programs with Multiple Test Cases
From Frege to chatGPT: Compositionality in language, cognition, and deep neural networks
AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling
Demystifying Chains, Trees, and Graphs of Thoughts
Survival Analysis with Adversarial Regularization
Net2Brain: A Toolbox to compare artificial vision models with human brain responses
The Personality Illusion: Revealing Dissociation Between Self-Reports & Behavior in LLMs
PersonaTeaming: Exploring How Introducing Personas Can Improve Automated AI Red-Teaming
UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning
Dynamic Speculative Agent Planning
AI-SearchPlanner: Modular Agentic Search via Pareto-Optimal Multi-Objective Reinforcement Learning
Graph RAG as Human Choice Model: Building a Data-Driven Mobility Agent with Preference Chain
MHSNet: An MoE-based Hierarchical Semantic Representation Network for Accurate Duplicate Resume Detection with Large Language Model
FutureX: An Advanced Live Benchmark for LLM Agents in Future Prediction
MeLA: A Metacognitive LLM-Driven Architecture for Automatic Heuristic Design
Conversational Education at Scale: A Multi-LLM Agent Workflow for Procedural Learning and Pedagogic Quality Assessment
DiMo-GUI: Advancing Test-time Scaling in GUI Grounding via Modality-Aware Visual Reasoning
Don't Make It Up: Preserving Ignorance Awareness in LLM Fine-Tuning
Translating Federated Learning Algorithms in Python into CSP Processes Using ChatGPT
ArtRAG: Retrieval-Augmented Generation with Structured Context for Visual Art Understanding
Epistemic Skills: Reasoning about Knowledge and Oblivion
Advancing Mobile GUI Agents: A Verifier-Driven Approach to Practical Deployment
GUI Agents: A Survey
Neural Network Verification with PyRAT
Antidote: Post-fine-tuning Safety Alignment for Large Language Models against Harmful Fine-tuning
Low-Dimensional Federated Knowledge Graph Embedding via Knowledge Distillation
MMoE: Robust Spoiler Detection with Multi-modal Information and Domain-aware Mixture-of-Experts
WinT3R: Window-Based Streaming Reconstruction with Camera Token Pool
Crosscoding Through Time: Tracking Emergence & Consolidation Of Linguistic Representations Throughout LLM Pretraining
SpikingBrain Technical Report: Spiking Brain-inspired Large Models
Scaling Performance of Large Language Model Pretraining
Recomposer: Event-roll-guided generative audio editing
COGITAO: A Visual Reasoning Framework To Study Compositionality & Generalization
Uncertain but Useful: Leveraging CNN Variability into Data Augmentation
CURE: Controlled Unlearning for Robust Embeddings - Mitigating Conceptual Shortcuts in Pre-Trained Language Models
HoPE: Hyperbolic Rotary Positional Encoding for Stable Long-Range Dependency Modeling in Large Language Models
RapidGNN: Energy and Communication-Efficient Distributed Training on Large-Scale Graph Neural Networks
Enhancing 3D Point Cloud Classification with ModelNet-R and Point-SkipNet
AI Agents for Web Testing: A Case Study in the Wild
Accuracy-Constrained CNN Pruning for Efficient and Reliable EEG-Based Seizure Detection
Exploring Situated Stabilities of a Rhythm Generation System through Variational Cross-Examination
GenAI-based test case generation and execution in SDV platform
ICR: Iterative Clarification and Rewriting for Conversational Search
ToM-SSI: Evaluating Theory of Mind in Situated Social Interactions
Towards Efficient Pixel Labeling for Industrial Anomaly Detection and Localization
Pointing-Guided Target Estimation via Transformer-Based Attention
Adversarial Augmentation and Active Sampling for Robust Cyber Anomaly Detection
LLM Enabled Multi-Agent System for 6G Networks: Framework and Method of Dual-Loop Edge-Terminal Collaboration
High-Resolution Global Land Surface Temperature Retrieval via a Coupled Mechanism-Machine Learning Framework
Exploring an implementation of quantum learning pipeline for support vector machines
DeGuV: Depth-Guided Visual Reinforcement Learning for Generalization and Interpretability in Manipulation
Artificial intelligence for representing and characterizing quantum systems
PLaMo 2 Technical Report
SpiderNets: Estimating Fear Ratings of Spider-Related Images with Vision Models
The Paradox of Doom: Acknowledging Extinction Risk Reduces the Incentive to Prevent It
A Knowledge-Driven Diffusion Policy for End-to-End Autonomous Driving Based on Expert Routing
REMOTE: A Unified Multimodal Relation Extraction Framework with Multilevel Optimal Transport and Mixture-of-Experts
PropVG: End-to-End Proposal-Driven Visual Grounding with Multi-Granularity Discrimination
Exploring Non-Local Spatial-Angular Correlations with a Hybrid Mamba-Transformer Framework for Light Field Super-Resolution
AI-Driven Fronthaul Link Compression in Wireless Communication Systems: Review and Method Design
Toward Accessible Dermatology: Skin Lesion Classification Using Deep Learning Models on Mobile-Acquired Images
Graph Unlearning: Efficient Node Removal in Graph Neural Networks
Enhancing Diversity in Large Language Models via Determinantal Point Processes
VARMA-Enhanced Transformer for Time Series Forecasting
The LLM Has Left The Chat: Evidence of Bail Preferences in Large Language Models
Selective Preference Optimization via Token-Level Reward Function Estimation
Created by
Haebom
Authors
Kailai Yang, Zhiwei Liu, Qianqian Xie, Jimin Huang, Erxue Min, Sophia Ananiadou
Overview
This paper proposes Selective Preference Optimization (SePO), a new selective alignment strategy for large language models. Unlike existing token-level alignment methods, which either optimize all tokens or rely on complex and expensive key-token selection strategies, SePO centers on efficient key-token selection. SePO presents the first token-selection method based on Direct Preference Optimization (DPO): it trains an oracle model to estimate a token-level reward function on the target data. The method applies to any existing alignment dataset with response-level annotations, and enables cost-effective token selection with a small oracle model and a small amount of training data. The estimated reward function is then used to score all tokens in the target dataset, and only the key tokens are selected to supervise the target policy model with a reference-model-free contrastive objective. Extensive experiments on three public evaluation benchmarks show that SePO significantly outperforms competitive baseline methods while optimizing only 30% of the key tokens in the target dataset. Applying SePO to weak-to-strong generalization shows that weak oracle models can effectively supervise strong policy models with up to 16.8x more parameters. SePO also effectively selects key tokens from out-of-distribution data, improving strong policy models and mitigating over-fitting.
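The scoring-and-selection pipeline described above — estimate a token-level reward with the DPO-trained oracle, then keep only the top ~30% of tokens as key tokens — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the log-probability-ratio reward form, and the `beta` scale are our assumptions.

```python
def token_rewards(oracle_logprobs, ref_logprobs, beta=1.0):
    """Estimate per-token rewards as scaled log-prob ratios between the
    DPO-trained oracle model and its reference model (one common reading
    of a DPO-style token-level reward; illustrative only)."""
    return [beta * (o - r) for o, r in zip(oracle_logprobs, ref_logprobs)]

def select_key_tokens(rewards, ratio=0.3, largest=True):
    """Return the indices of the top (or bottom) `ratio` fraction of
    tokens by estimated reward. Only these key tokens would enter the
    contrastive training objective; `largest=False` would be the natural
    choice for a dispreferred response (our reading, not stated here)."""
    k = max(1, int(len(rewards) * ratio))
    order = sorted(range(len(rewards)), key=lambda i: rewards[i], reverse=largest)
    return sorted(order[:k])

# Toy example: 5 response tokens, keep the 2 highest-reward ones.
rewards = token_rewards([-1.0, -0.2, -2.5, -0.1, -1.8],
                        [-1.1, -2.2, -2.0, -3.1, -2.0])
key = select_key_tokens(rewards, ratio=0.4)
```

Because selection happens offline against a fixed oracle, the expensive policy-model update only ever touches the selected subset of tokens, which is where the claimed cost savings come from.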
Takeaways, Limitations
• Takeaways:
◦ Solves the inefficiency and noise problems of existing token-level alignment methods through efficient key-token selection.
◦ Presents a new DPO-based token-selection method that requires only response-level annotations, ensuring applicability to a wide range of datasets.
◦ Enables cost-effective token selection with a small oracle model and limited training data.
◦ Experimentally demonstrates that weak oracle models can effectively supervise strong policy models.
◦ Improves strong policy models and mitigates over-fitting by selecting key tokens from out-of-distribution data.
◦ Experimentally verifies performance gains over competitive methods.
• Limitations:
◦ High dependence on the performance of the DPO-based oracle model; if the oracle model performs poorly, SePO's performance may degrade as well.
◦ Further study is needed on the generalization of the key-token selection strategy, which may become over-optimized for specific datasets or tasks.
◦ Further research is needed on the scalability of the proposed method and its applicability to diverse model architectures.
View PDF