Daily Arxiv

This page collects papers related to artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions; when sharing, please cite the source.

DISCO: Diversifying Sample Condensation for Efficient Model Evaluation

Created by
  • Haebom

Authors

Alexander Rubinstein, Benjamin Raible, Martin Gubri, Seong Joon Oh

Outline

Evaluating modern machine learning models is expensive. Diversifying Sample Condensation (DISCO) is a novel method that reduces evaluation cost by selecting a small set of samples that maximize the diversity of model responses, i.e., samples on which models disagree the most. It is conceptually simpler than existing methods and achieves the best performance-prediction results on the MMLU, Hellaswag, Winogrande, and ARC benchmarks.
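For intuition, here is a minimal Python sketch of selecting evaluation samples by model disagreement. The function names, array shapes, and the simple majority-vote disagreement score are illustrative assumptions, not the authors' exact procedure.

```python
# Minimal sketch: pick the benchmark samples on which a pool of
# reference models disagree the most. Illustrative only.
import numpy as np

def disagreement_scores(predictions: np.ndarray) -> np.ndarray:
    """predictions: (n_models, n_samples) array of predicted labels.
    Returns a per-sample disagreement score: 1 minus the fraction of
    models that agree with the most common prediction."""
    n_models, n_samples = predictions.shape
    scores = np.empty(n_samples)
    for j in range(n_samples):
        _, counts = np.unique(predictions[:, j], return_counts=True)
        scores[j] = 1.0 - counts.max() / n_models
    return scores

def select_condensed_set(predictions: np.ndarray, k: int) -> np.ndarray:
    """Return indices of the k highest-disagreement samples."""
    scores = disagreement_scores(predictions)
    return np.argsort(scores)[::-1][:k]

# Example: 5 models, 1000 samples, 4 answer choices (MMLU-style).
rng = np.random.default_rng(0)
preds = rng.integers(0, 4, size=(5, 1000))
anchor_idx = select_condensed_set(preds, k=50)
print(anchor_idx[:10])
```

Evaluating new models only on such a condensed set, then extrapolating to full-benchmark performance, is what makes the approach cheap.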

Takeaways, Limitations

Takeaways:
Reduces evaluation costs, making machine learning research more accessible.
Accelerates the pace of innovation and reduces environmental impact.
Achieves superior performance with a conceptually simple method.
Provides a theoretical proof that disagreement between models is the optimal selection rule.
Achieves SOTA performance prediction on the MMLU, Hellaswag, Winogrande, and ARC benchmarks.
Limitations:
The paper does not specify concrete criteria for anchor sample selection.
Further research is needed to establish the generalizability of DISCO.
More detail on specific experimental results and comparative analyses is needed.