Daily Arxiv

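This page curates artificial intelligence papers published around the world.
The summaries are produced with Google Gemini, and the page is run on a non-commercial basis.
Copyright in each paper belongs to its authors and their institutions; when sharing, you only need to cite the source.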

Benchmarking Foundation Models with Retrieval-Augmented Generation in Olympic-Level Physics Problem Solving

Mechanistic Interpretability as Statistical Estimation: A Variance Analysis of EAP-IG

Neural Diffusion Processes for Physically Interpretable Survival Prediction

Tenyidie Syllabification corpus creation and deep learning applications

On Predictability of Reinforcement Learning Dynamics for Large Language Models

EMR-AGENT: Automating Cohort and Feature Extraction from EMR Databases

MOSS-Speech: Towards True Speech-to-Speech Models Without Text Guidance

Normal-Abnormal Guided Generalist Anomaly Detection

Does Bigger Mean Better? Comparitive Analysis of CNNs and Biomedical Vision Language Modles in Medical Diagnosis

AbsTopK: Rethinking Sparse Autoencoders For Bidirectional Features

VRWKV-Editor: Reducing quadratic complexity in transformer-based video editing

More Thought, Less Accuracy? On the Dual Nature of Reasoning in Vision-Language Models

The AI Productivity Index (APEX)

Discontinuous Epitope Fragments as Sufficient Target Templates for Efficient Binder Design

Uncertainty-Aware Generative Oversampling Using an Entropy-Guided Conditional Variational Autoencoder

GeoSQL-Eval: First Evaluation of LLMs on PostGIS-Based NL2GeoSQL Queries

Segmentor-Guided Counterfactual Fine-Tuning for Locally Coherent and Targeted Image Synthesis

Causal-Adapter: Taming Text-to-Image Diffusion for Faithful Counterfactual Generation

Euclid's Gift: Enhancing Spatial Perception and Reasoning in Vision-Language Models via Geometric Surrogate Tasks

The Hidden Costs of Translation Accuracy: Distillation, Quantization, and Environmental Impact

IndexNet: Timestamp and Variable-Aware Modeling for Time Series Forecasting

An effective control of large systems of active particles: An application to evacuation problem

Discovering Software Parallelization Points Using Deep Neural Networks

SpeechWeave: Diverse Multilingual Synthetic Text & Audio Data Generation Pipeline for Training Text to Speech Models

Machines are more productive than humans until they aren't, and vice versa

Landcover classification and change detection using remote sensing and machine learning: a case study of Western Fiji

Investigating ReLoRA: Effects on the Learning Dynamics of Small Language Models

MOSAIC: A Multilingual, Taxonomy-Agnostic, and Computationally Efficient Approach for Radiological Report Classification

Forecasting the Ionosphere from Sparse GNSS Data with Temporal-Fusion Transformers

Towards Methane Detection Onboard Satellites

Tackling Federated Unlearning as a Parameter Estimation Problem

Automated Model Evaluation for Object Detection via Prediction Consistency and Reliability

Legal Knowledge Graph Foundations, Part I: URI-Addressable Abstract Works (LRMoo F1 to schema.org)

An Architecture for Spatial Networking

AlgoTune: Can Language Models Speed Up General-Purpose Numerical Programs?

VITA: Vision-to-Action Flow Matching Policy

Can LLMs Find Fraudsters? Multi-level LLM Enhanced Graph Fraud Detection

Model Parallelism With Subnetwork Data Parallelism

A Novel Approach for Estimating Largest Lyapunov Exponents in One-Dimensional Chaotic Time Series Using Machine Learning

PlaceFM: A Training-free Geospatial Foundation Model of Places using Large-Scale Point of Interest Data

MambAttention: Mamba with Multi-Head Attention for Generalizable Single-Channel Speech Enhancement

MS-DFTVNet:A Long-Term Time Series Prediction Method Based on Multi-Scale Deformable Convolution

Adaptive Batch-Wise Sample Scheduling for Direct Preference Optimization

Autoregressive Adversarial Post-Training for Real-Time Interactive Video Generation

WWAggr: A Window Wasserstein-based Aggregation for Ensemble Change Point Detection

Beyond Chunking: Discourse-Aware Hierarchical Retrieval for Long Document Question Answering

Localized Forest Fire Risk Prediction: A Department-Aware Approach for Operational Decision Support

CodeSense: a Real-World Benchmark and Dataset for Code Semantic Reasoning

Should I Share this Translation? Evaluating Quality Feedback for User Reliance on Machine Translation

Differential Information Distribution: A Bayesian Perspective on Direct Preference Optimization

Diagnosing and Addressing Pitfalls in KG-RAG Datasets: Toward More Reliable Benchmarking

Enhanced DACER Algorithm with High Diffusion Efficiency

What happens when generative AI models train recursively on each others' outputs?

Finite Sample Analysis of Linear Temporal Difference Learning with Arbitrary Features

PiCa: Parameter-Efficient Fine-Tuning with Column Space Projection

Search-Based Software Engineering and AI Foundation Models: Current Landscape and Future Roadmap

Deriving Strategic Market Insights with Large Language Models: A Benchmark for Forward Counterfactual Generation

Time-o1: Time-Series Forecasting Needs Transformed Label Alignment

MolLangBench: A Comprehensive Benchmark for Language-Prompted Molecular Structure Recognition, Editing, and Generation

ABBA-Adapters: Efficient and Expressive Fine-Tuning of Foundation Models

LEXam: Benchmarking Legal Reasoning on 340 Law Exams

scSiameseClu: A Siamese Clustering Framework for Interpreting single-cell RNA Sequencing Data

AI-Powered Inverse Design of Ku-Band SIW Resonant Structures by Iterative Residual Correction Network

Feature Representation Transferring to Lightweight Models via Perception Coherence

PlaceIt3D: Language-Guided Object Placement in Real 3D Scenes

FalconWing: An Ultra-Light Indoor Fixed-Wing UAV Platform for Vision-Based Autonomy

WebRollback: Enhancing Web Agents with Explicit Rollback Mechanisms

Towards Effective E-Participation of Citizens in the European Union: The Development of AskThePublic

Enhancing Personalized Multi-Turn Dialogue with Curiosity Reward

When Reasoning Meets Compression: Understanding the Effects of LLMs Compression on Large Reasoning Models

Boundless Byte Pair Encoding: Breaking the Pre-tokenization Barrier

Audio-Enhanced Vision-Language Modeling with Latent Space Broadening for High Quality Data Expansion

Knowledge-guided machine learning for county-level corn yield prediction under drought

Gaussian DP for Reporting Differential Privacy Guarantees in Machine Learning

FANS -- Formal Answer Selection for Natural Language Math Reasoning Using Lean4

What are You Looking at? Modality Contribution in Multimodal Medical Deep Learning

Interpretable Text Embeddings and Text Similarity Explanation: A Survey

CoLA: Compute-Efficient Pre-Training of LLMs via Low-Rank Activation

Forget Forgetting: Continual Learning in a World of Abundant Memory

Out-of-Distribution Detection using Synthetic Data Generation

Handling Heterophily in Recommender Systems with Wavelet Hypergraph Diffusion

Paper Quality Assessment based on Individual Wisdom Metrics from Open Peer Review

Diffusion Adversarial Post-Training for One-Step Video Generation

Unraveling Indirect In-Context Learning Using Influence Functions

Synergizing LLMs and Knowledge Graphs: A Novel Approach to Software Repository-Related Question Answering

VideoGen-of-Thought: Step-by-step generating multi-shot video with minimal manual intervention

Initialization using Update Approximation is a Silver Bullet for Extremely Efficient Low-Rank Fine-Tuning

Reasoning over User Preferences: Knowledge Graph-Augmented LLMs for Explainable Conversational Recommendations

Reliable Decision Making via Calibration Oriented Retrieval Augmented Generation

Faster LLM Inference using DBMS-Inspired Preemption and Cache Replacement Policies

There and Back Again: On the relation between Noise and Image Inversions in Diffusion Models

QSpec: Speculative Decoding with Complementary Quantization Schemes

Superficial Safety Alignment Hypothesis

AutoScale: Scale-Aware Data Mixing for Pre-Training LLMs

R2 v2: The Pareto-compliant R2 Indicator for Better Benchmarking in Bi-objective Optimization

Hierarchical place recognition with omnidirectional images and curriculum learning-based loss functions

Neural Network Parameter-optimization of Gaussian pmDAGs

Semantic Bridges Between First Order c-Representations and Cost-Based Semantics: An Initial Perspective

Rethinking Reward Models for Multi-Domain Test-Time Scaling

Communication-Efficient and Accurate Approach for Aggregation in Federated Low-Rank Adaptation

AlgoTune: Can Language Models Speed Up General-Purpose Numerical Programs?

Created by Haebom

Authors

Ori Press, Brandon Amos, Haoyu Zhao, Yikai Wu, Samuel K. Ainsworth, Dominik Krupke, Patrick Kidger, Touqir Sajed, Bartolomeo Stellato, Jisun Park, Nathanael Bosch, Eli Meril, Albert Steppi, Arman Zharmagambetov, Fangzhao Zhang, David Perez-Pineiro, Alberto Mercurio, Ni Zhan, Talor Abramovich, Kilian Lieret, Hanlin Zhang, Shirley Huang, Matthias Bethge, Ofir Press

AlgoTune: An Open-Ended Benchmark for Algorithm Design

Overview

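Despite progress in language model (LM) capabilities, evaluations so far have focused on tasks that humans have already solved, such as programming and mathematics. This work proposes AlgoTune, an open-ended benchmark that evaluates whether LMs can write code that efficiently solves computationally hard problems in computer science, physics, and mathematics. AlgoTune consists of 154 coding tasks collected from domain experts, together with a framework that validates and times the solution code the LMs generate. The authors also develop AlgoTuner, a baseline LM agent, and evaluate it with a range of frontier models. AlgoTuner uses a simple, budgeted loop: it edits code, compiles and runs it, profiles its performance, verifies correctness with tests, and keeps the fastest valid version. Against reference solvers built on libraries such as SciPy, scikit-learn, and CVXPY, AlgoTuner achieves an average speedup of 1.72x. However, current models favor surface-level optimizations and fail to discover algorithmic innovations. The authors hope AlgoTune will drive the development of LM agents that show creative problem solving beyond state-of-the-art human performance.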
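The validate-then-time loop that AlgoTuner is described as running can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the task (sorting an array), the pure-Python `reference_solver`, the helper names `is_valid`, `best_time`, and `tune`, and treating the budget as a cap on the number of candidate versions are all hypothetical. A fixed list of functions stands in for the code edits an LM would propose, and the compile step is omitted.

```python
import time
from typing import Callable, List, Optional, Tuple

import numpy as np

Solver = Callable[[np.ndarray], np.ndarray]


def reference_solver(x: np.ndarray) -> np.ndarray:
    # Hypothetical baseline: a pure-Python sort standing in for a reference solver.
    return np.array(sorted(x.tolist()))


def is_valid(out, expected: np.ndarray) -> bool:
    # Correctness gate: a candidate counts only if it reproduces the reference output.
    out = np.asarray(out)
    return out.shape == expected.shape and bool(np.allclose(out, expected))


def best_time(solver: Solver, x: np.ndarray, repeats: int = 5) -> float:
    # Minimum wall-clock time over several runs, to damp timing noise.
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        solver(x)
        timings.append(time.perf_counter() - start)
    return min(timings)


def tune(candidates: List[Solver], x: np.ndarray, budget: int) -> Tuple[Optional[Solver], float]:
    # Hypothetical budgeted loop: try at most `budget` candidate versions,
    # run and verify each one, profile the valid ones, and keep the fastest.
    expected = reference_solver(x)
    ref_time = best_time(reference_solver, x)
    best, best_speedup = None, 0.0
    for solver in candidates[:budget]:
        try:
            out = solver(x)
        except Exception:
            continue  # a candidate that crashes is discarded
        if not is_valid(out, expected):
            continue  # a wrong answer never counts, however fast it is
        speedup = ref_time / best_time(solver, x)
        if speedup > best_speedup:
            best, best_speedup = solver, speedup
    return best, best_speedup


# Usage: the fixed candidate list stands in for successive LM edits.
rng = np.random.default_rng(0)
data = rng.random(50_000)
candidates = [
    lambda x: x[::-1],                    # fast but wrong: rejected by is_valid
    lambda x: np.sort(x),                 # vectorized NumPy sort
    lambda x: np.sort(x, kind="stable"),  # another valid variant
]
winner, speedup = tune(candidates, data, budget=3)
print(f"best valid candidate: #{candidates.index(winner)}, speedup {speedup:.1f}x")
```

Checking correctness before timing is the key design choice here: an invalid version is dropped no matter how fast it runs, so any reported speedup always refers to a valid solution.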

Implications and Limitations

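• AlgoTune introduces a new open-ended benchmark for evaluating the algorithm-design capabilities of LMs.
• The AlgoTuner agent achieved a substantial speedup over the reference solvers (1.72x on average).
• Current LMs struggle to discover algorithmic innovations and instead favor surface-level optimizations.
• AlgoTune can spur further research on improving the algorithm-design capabilities of LMs.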
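The summary reports an average speedup of 1.72x but does not say how per-task speedups are combined. The short sketch below, which uses made-up per-task numbers rather than results from the paper, shows why that detail matters: the arithmetic, geometric, and harmonic means of the same speedups can differ widely. Which mean AlgoTune actually uses is not stated on this page and is left as an open assumption; `summarize` is a hypothetical helper.

```python
from statistics import fmean, geometric_mean, harmonic_mean


def summarize(speedups):
    # Per-task speedup = reference time / candidate time (values below 1 mean slower).
    return {
        "arithmetic": fmean(speedups),
        "geometric": geometric_mean(speedups),
        "harmonic": harmonic_mean(speedups),
    }


per_task = [4.0, 1.0, 0.5, 2.0]  # hypothetical speedups, not from the paper
for name, value in summarize(per_task).items():
    print(f"{name:>10}: {value:.3f}x")
```

With these hypothetical numbers the script prints 1.875x, 1.414x, and 1.067x. The arithmetic mean rewards one large win, while the harmonic mean is pulled down by the task that got slower.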
