Daily Arxiv
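This page collects artificial intelligence papers published around the world.
Summaries on this page are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions; please cite the source when sharing.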
HoPE: Hyperbolic Rotary Positional Encoding for Stable Long-Range Dependency Modeling in Large Language Models
Comparative Analysis of Transformer Models in Disaster Tweet Classification for Public Safety
Emergent Social Dynamics of LLM Agents in the El Farol Bar Problem
The Good, the Bad and the Constructive: Automatically Measuring Peer Review's Utility for Authors
Energy Landscapes Enable Reliable Abstention in Retrieval-Augmented Large Language Models for Healthcare
DEXOP: A Device for Robotic Transfer of Dexterous Human Manipulation
Reinforcement Learning for Robust Ageing-Aware Control of Li-ion Battery Systems with Data-Driven Formal Verification
RepoDebug: Repository-Level Multi-Task and Multi-Language Debugging Evaluation of Large Language Models
Gravity Well Echo Chamber Modeling With An LLM-Based Confirmation Bias Model
Insights from Gradient Dynamics: Gradient Autoscaled Normalization
Efficient Virtuoso: A Latent Diffusion Transformer Model for Goal-Conditioned Trajectory Planning
MoSEs: Uncertainty-Aware AI-Generated Text Detection via Mixture of Stylistics Experts with Conditional Thresholds
DCPO: Dynamic Clipping Policy Optimization
DSDE: Dynamic Speculative Decoding with KLD Stability for Real-World Serving
Can AI be Auditable?
Robotic Fire Risk Detection based on Dynamic Knowledge Graph Reasoning: An LLM-Driven Approach with Graph Chain-of-Thought
Navigating the EU AI Act: Foreseeable Challenges in Qualifying Deep Learning-Based Automated Inspections of Class III Medical Devices
Complementary Learning System Empowers Online Continual Learning of Vehicle Motion Forecasting in Smart Cities
MultiPL-MoE: Multi-Programming-Lingual Extension of Large Language Models through Hybrid Mixture-of-Experts
QuadKAN: KAN-Enhanced Quadruped Motion Control via End-to-End Reinforcement Learning
MovieCORE: COgnitive REasoning in Movies
Automatic Prompt Optimization with Prompt Distillation
Membership Inference Attacks on LLM-based Recommender Systems
Leveraging Large Language Models for Accurate Sign Language Translation in Low-Resource Scenarios
Group Expectation Policy Optimization for Heterogeneous Reinforcement Learning
Convergence and Generalization of Anti-Regularization for Parametric Models
Jet-Nemotron: Efficient Language Model with Post Neural Architecture Search
CARFT: Boosting LLM Reasoning via Contrastive Learning with Annotated Chain-of-Thought-based Reinforced Fine-Tuning
Bridging Generalization and Personalization in Human Activity Recognition via On-Device Few-Shot Learning
FinAgentBench: A Benchmark Dataset for Agentic Retrieval in Financial Question Answering
Using Artificial Intuition in Distinct, Minimalist Classification of Scientific Abstracts for Management of Technology Portfolios
Semantic Discrepancy-aware Detector for Image Forgery Identification
Quantum-Efficient Reinforcement Learning Solutions for Last-Mile On-Demand Delivery
BadPromptFL: A Novel Backdoor Threat to Prompt-based Federated Learning in Multimodal Models
Uncertainty-Driven Reliability: Selective Prediction and Trustworthy Deployment in Modern Machine Learning
Real-Time Analysis of Unstructured Data with Machine Learning on Heterogeneous Architectures
VSI: Visual Subtitle Integration for Keyframe Selection to enhance Long Video Understanding
SGDFuse: SAM-Guided Diffusion for High-Fidelity Infrared and Visible Image Fusion
An Efficient Continuous-Time MILP for Integrated Aircraft Hangar Scheduling and Layout
DIRF: A Framework for Digital Identity Protection and Clone Governance in Agentic AI Systems
COLLAGE: Adaptive Fusion-based Retrieval for Augmented Policy Learning
Dynamically Adaptive Reasoning via LLM-Guided MCTS for Efficient and Context-Aware KGQA
Nested Graph Pseudo-Label Refinement for Noisy Label Domain Adaptation Learning
LanternNet: A Hub-and-Spoke System to Seek and Suppress Spotted Lanternfly Populations
RecPS: Privacy Risk Scoring for Recommender Systems
Supervised Fine Tuning on Curated Data is Reinforcement Learning (and can be improved)
Role-Playing LLM-Based Multi-Agent Support Framework for Detecting and Addressing Family Communication Bias
PLAME: Lightweight MSA Design Advances Protein Folding From Evolutionary Embeddings
Driver-Net: Multi-Camera Fusion for Assessing Driver Take-Over Readiness in Automated Vehicles
Leveraging Out-of-Distribution Unlabeled Images: Semi-Supervised Semantic Segmentation with an Open-Vocabulary Model
Visual Structures Helps Visual Reasoning: Addressing the Binding Problem in VLMs
Precise Bayesian Neural Networks
Transit for All: Mapping Equitable Bike2Subway Connection using Region Representation Learning
Scaling Intelligence: Designing Data Centers for Next-Gen Language Models
Image Segmentation with Large Language Models: A Survey with Perspectives for Intelligent Transportation Systems
SAIL: Faster-than-Demonstration Execution of Imitation Learning Policies
Persona-driven Simulation of Voting Behavior in the European Parliament with Large Language Models
Bipedal Balance Control with Whole-body Musculoskeletal Standing and Falling Simulations
Scaling Laws of Motion Forecasting and Planning - Technical Report
Efficient $Q$-Learning and Actor-Critic Methods for Robust Average Reward Reinforcement Learning
Who Gets Credit or Blame? Attributing Accountability in Modern AI Systems
Unsupervised Evolutionary Cell Type Matching via Entropy-Minimized Optimal Transport
Multi-output Classification using a Cross-talk Architecture for Compound Fault Diagnosis of Motors in Partially Labeled Condition
SoloSpeech: Enhancing Intelligibility and Quality in Target Speech Extraction through a Cascaded Generative Pipeline
Steering LLM Reasoning Through Bias-Only Adaptation
MetaSTH-Sleep: Towards Effective Few-Shot Sleep Stage Classification for Health Management with Spatial-Temporal Hypergraph Enhanced Meta-Learning
InterFeat: A Pipeline for Finding Interesting Scientific Features
HumaniBench: A Human-Centric Framework for Large Multimodal Models Evaluation
Advancing Scientific Text Classification: Fine-Tuned Models with Dataset Expansion and Hard-Voting
Test It Before You Trust It: Applying Software Testing for Trustworthy In-context Learning
Action Flow Matching for Continual Robot Learning
Addressing Concept Mislabeling in Concept Bottleneck Models Through Preference Optimization
Nemotron-H: A Family of Accurate and Efficient Hybrid Mamba-Transformer Models
Byzantine-Robust Federated Learning Using Generative Adversarial Networks
Beyond SHAP and Anchors: A large-scale experiment on how developers struggle to design meaningful end-user explanations
VIPER: Visual Perception and Explainable Reasoning for Sequential Decision-Making
DistJoin: A Decoupled Join Cardinality Estimator based on Adaptive Neural Predicate Modulation
Low-Confidence Gold: Refining Low-Confidence Samples for Efficient Instruction Tuning
Assistance or Disruption? Exploring and Evaluating the Design and Trade-offs of Proactive AI Programming Support
Soft Token Attacks Cannot Reliably Audit Unlearning in Large Language Models
CHIRLA: Comprehensive High-resolution Identification and Re-identification for Large-scale Analysis
Kolmogorov-Arnold Fourier Networks
Position: LLMs Can be Good Tutors in English Education
Predicting Steady-State Behavior in Complex Networks with Graph Neural Networks
Separate Motion from Appearance: Customizing Motion via Customizing Text-to-Video Diffusion Models
Motion-enhanced Cardiac Anatomy Segmentation via an Insertable Temporal Attention Module
Bias in Decision-Making for AI's Ethical Dilemmas: A Comparative Study of ChatGPT and Claude
OmniThink: Expanding Knowledge Boundaries in Machine Writing through Thinking
DispFormer: A Pretrained Transformer Incorporating Physical Constraints for Dispersion Curve Inversion
Integrating Evidence into the Design of XAI and AI-based Decision Support Systems: A Means-End Framework for End-users in Construction
Revealing the impact of synthetic native samples and multi-tasking strategies in Hindi-English code-mixed humour and sarcasm detection
Neural Port-Hamiltonian Differential Algebraic Equations for Compositional Learning of Electrical Networks
Sequential Controlled Langevin Diffusions
Privacy-Preserving Federated Learning via Homomorphic Adversarial Networks
CAREL: Instruction-guided reinforcement learning with cross-modal auxiliary objectives
Lessons from Studying Two-Hop Latent Reasoning
HierTOD: A Task-Oriented Dialogue System Driven by Hierarchical Goals
Flexible Coded Distributed Convolution Computing for Enhanced Straggler Resilience and Numerical Stability in Distributed CNNs
FACEGroup: Feasible and Actionable Counterfactual Explanations for Group Fairness
ETF: An Entity Tracing Framework for Hallucination Detection in Code Summaries
Low-Confidence Gold: Refining Low-Confidence Samples for Efficient Instruction Tuning
Created by
Haebom
Authors
Hongyi Cai, Jie Li, Mohammad Mahdinur Rahman, Wenzhen Dong
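Abstract
This paper proposes Low-Confidence Gold (LCG), a new filtering framework for making instruction tuning of large language models more efficient. LCG uses centroid-based clustering and confidence-based selection to identify valuable instruction pairs. Through a semi-supervised approach that uses a lightweight classifier, it builds a high-quality subset while preserving data diversity. In experiments, models fine-tuned on 6K LCG-filtered samples outperformed existing methods, with substantial gains on MT-bench and consistent gains across the comprehensive evaluation metrics. That the framework raises efficiency while maintaining model performance points to a promising direction for efficient instruction tuning.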
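The summary does not give implementation details, so the following is only a minimal sketch of the general idea: cluster the samples around centroids, score each sample with a confidence value, and keep a few low-confidence samples from every cluster so that the subset stays diverse. Everything here is an assumption made for illustration: the function names (`kmeans`, `confidences`, `select_low_confidence`), the toy 2-D vectors standing in for instruction embeddings, and the softmax over negative centroid distances that stands in for the lightweight classifier. Keeping the lowest-confidence samples is inferred from the paper's title, not from a stated procedure.

```python
import math
import random


def nearest(point, centroids):
    """Index of the centroid closest to `point` (ties go to the lower index)."""
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))


def kmeans(points, k, iters=20, seed=0):
    """Plain k-means. Returns (centroids, cluster index for each point)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        assign = [nearest(p, centroids) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # leave an empty cluster's centroid where it is
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, [nearest(p, centroids) for p in points]


def confidences(points, centroids):
    """Hypothetical confidence score: the largest softmax probability over
    negative centroid distances. A stand-in for a trained classifier."""
    scores = []
    for p in points:
        logits = [-math.dist(p, c) for c in centroids]
        top = max(logits)
        exps = [math.exp(x - top) for x in logits]
        scores.append(max(exps) / sum(exps))
    return scores


def select_low_confidence(points, k, per_cluster, seed=0):
    """Keep the `per_cluster` lowest-confidence samples from every cluster."""
    centroids, assign = kmeans(points, k, seed=seed)
    conf = confidences(points, centroids)
    selected = []
    for c in range(k):
        members = [i for i, a in enumerate(assign) if a == c]
        members.sort(key=lambda i: conf[i])  # lowest confidence first
        selected.extend(members[:per_cluster])
    return sorted(selected)


if __name__ == "__main__":
    # Toy "embeddings": two well-separated groups of four points each.
    toy = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
    print(select_low_confidence(toy, k=2, per_cluster=2))
```

Selecting per cluster rather than globally is what preserves diversity in this sketch: every cluster contributes samples, even if its members all score higher than those of another cluster.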
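Takeaways, Limitations
• Takeaways:
◦ The LCG framework can improve the instruction-tuning performance of large language models using only a small amount of high-quality data.
◦ It offers a more efficient instruction-tuning approach than conventional fine-tuning on bulk data.
◦ It demonstrates the effectiveness of a new data filtering technique that combines centroid-based clustering with confidence-based selection.
◦ It achieves consistent gains across a range of evaluation metrics, including MT-bench.
• Limitations:
◦ LCG's performance may depend on the performance of the lightweight classifier.
◦ Experiments used a limited data size of 6K samples; how well the results generalize to larger datasets requires further study.
◦ The results may be biased toward particular types of instructions or datasets.
◦ The framework's generalizability needs further validation.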