Daily Arxiv

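This page collects artificial intelligence papers published around the world.
Summaries on this page are generated with Google Gemini, and the page is run on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, simply cite the source.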

Benchmarking Foundation Models with Retrieval-Augmented Generation in Olympic-Level Physics Problem Solving

Mechanistic Interpretability as Statistical Estimation: A Variance Analysis of EAP-IG

Neural Diffusion Processes for Physically Interpretable Survival Prediction

Tenyidie Syllabification corpus creation and deep learning applications

On Predictability of Reinforcement Learning Dynamics for Large Language Models

EMR-AGENT: Automating Cohort and Feature Extraction from EMR Databases

MOSS-Speech: Towards True Speech-to-Speech Models Without Text Guidance

Normal-Abnormal Guided Generalist Anomaly Detection

Does Bigger Mean Better? Comparitive Analysis of CNNs and Biomedical Vision Language Modles in Medical Diagnosis

AbsTopK: Rethinking Sparse Autoencoders For Bidirectional Features

VRWKV-Editor: Reducing quadratic complexity in transformer-based video editing

More Thought, Less Accuracy? On the Dual Nature of Reasoning in Vision-Language Models

The AI Productivity Index (APEX)

Discontinuous Epitope Fragments as Sufficient Target Templates for Efficient Binder Design

Uncertainty-Aware Generative Oversampling Using an Entropy-Guided Conditional Variational Autoencoder

GeoSQL-Eval: First Evaluation of LLMs on PostGIS-Based NL2GeoSQL Queries

Segmentor-Guided Counterfactual Fine-Tuning for Locally Coherent and Targeted Image Synthesis

Causal-Adapter: Taming Text-to-Image Diffusion for Faithful Counterfactual Generation

Euclid's Gift: Enhancing Spatial Perception and Reasoning in Vision-Language Models via Geometric Surrogate Tasks

The Hidden Costs of Translation Accuracy: Distillation, Quantization, and Environmental Impact

IndexNet: Timestamp and Variable-Aware Modeling for Time Series Forecasting

An effective control of large systems of active particles: An application to evacuation problem

Discovering Software Parallelization Points Using Deep Neural Networks

SpeechWeave: Diverse Multilingual Synthetic Text & Audio Data Generation Pipeline for Training Text to Speech Models

Machines are more productive than humans until they aren't, and vice versa

Landcover classification and change detection using remote sensing and machine learning: a case study of Western Fiji

Investigating ReLoRA: Effects on the Learning Dynamics of Small Language Models

MOSAIC: A Multilingual, Taxonomy-Agnostic, and Computationally Efficient Approach for Radiological Report Classification

Forecasting the Ionosphere from Sparse GNSS Data with Temporal-Fusion Transformers

Towards Methane Detection Onboard Satellites

Tackling Federated Unlearning as a Parameter Estimation Problem

Automated Model Evaluation for Object Detection via Prediction Consistency and Reliability

Legal Knowledge Graph Foundations, Part I: URI-Addressable Abstract Works (LRMoo F1 to schema.org)

An Architecture for Spatial Networking

AlgoTune: Can Language Models Speed Up General-Purpose Numerical Programs?

VITA: Vision-to-Action Flow Matching Policy

Can LLMs Find Fraudsters? Multi-level LLM Enhanced Graph Fraud Detection

Model Parallelism With Subnetwork Data Parallelism

A Novel Approach for Estimating Largest Lyapunov Exponents in One-Dimensional Chaotic Time Series Using Machine Learning

PlaceFM: A Training-free Geospatial Foundation Model of Places using Large-Scale Point of Interest Data

MambAttention: Mamba with Multi-Head Attention for Generalizable Single-Channel Speech Enhancement

MS-DFTVNet:A Long-Term Time Series Prediction Method Based on Multi-Scale Deformable Convolution

Adaptive Batch-Wise Sample Scheduling for Direct Preference Optimization

Autoregressive Adversarial Post-Training for Real-Time Interactive Video Generation

WWAggr: A Window Wasserstein-based Aggregation for Ensemble Change Point Detection

Beyond Chunking: Discourse-Aware Hierarchical Retrieval for Long Document Question Answering

Localized Forest Fire Risk Prediction: A Department-Aware Approach for Operational Decision Support

CodeSense: a Real-World Benchmark and Dataset for Code Semantic Reasoning

Should I Share this Translation? Evaluating Quality Feedback for User Reliance on Machine Translation

Differential Information Distribution: A Bayesian Perspective on Direct Preference Optimization

Diagnosing and Addressing Pitfalls in KG-RAG Datasets: Toward More Reliable Benchmarking

Enhanced DACER Algorithm with High Diffusion Efficiency

What happens when generative AI models train recursively on each others' outputs?

Finite Sample Analysis of Linear Temporal Difference Learning with Arbitrary Features

PiCa: Parameter-Efficient Fine-Tuning with Column Space Projection

Search-Based Software Engineering and AI Foundation Models: Current Landscape and Future Roadmap

Deriving Strategic Market Insights with Large Language Models: A Benchmark for Forward Counterfactual Generation

Time-o1: Time-Series Forecasting Needs Transformed Label Alignment

MolLangBench: A Comprehensive Benchmark for Language-Prompted Molecular Structure Recognition, Editing, and Generation

ABBA-Adapters: Efficient and Expressive Fine-Tuning of Foundation Models

LEXam: Benchmarking Legal Reasoning on 340 Law Exams

scSiameseClu: A Siamese Clustering Framework for Interpreting single-cell RNA Sequencing Data

AI-Powered Inverse Design of Ku-Band SIW Resonant Structures by Iterative Residual Correction Network

Feature Representation Transferring to Lightweight Models via Perception Coherence

PlaceIt3D: Language-Guided Object Placement in Real 3D Scenes

FalconWing: An Ultra-Light Indoor Fixed-Wing UAV Platform for Vision-Based Autonomy

WebRollback: Enhancing Web Agents with Explicit Rollback Mechanisms

Towards Effective E-Participation of Citizens in the European Union: The Development of AskThePublic

Enhancing Personalized Multi-Turn Dialogue with Curiosity Reward

When Reasoning Meets Compression: Understanding the Effects of LLMs Compression on Large Reasoning Models

Boundless Byte Pair Encoding: Breaking the Pre-tokenization Barrier

Audio-Enhanced Vision-Language Modeling with Latent Space Broadening for High Quality Data Expansion

Knowledge-guided machine learning for county-level corn yield prediction under drought

Gaussian DP for Reporting Differential Privacy Guarantees in Machine Learning

FANS -- Formal Answer Selection for Natural Language Math Reasoning Using Lean4

What are You Looking at? Modality Contribution in Multimodal Medical Deep Learning

Interpretable Text Embeddings and Text Similarity Explanation: A Survey

CoLA: Compute-Efficient Pre-Training of LLMs via Low-Rank Activation

Forget Forgetting: Continual Learning in a World of Abundant Memory

Out-of-Distribution Detection using Synthetic Data Generation

Handling Heterophily in Recommender Systems with Wavelet Hypergraph Diffusion

Paper Quality Assessment based on Individual Wisdom Metrics from Open Peer Review

Diffusion Adversarial Post-Training for One-Step Video Generation

Unraveling Indirect In-Context Learning Using Influence Functions

Synergizing LLMs and Knowledge Graphs: A Novel Approach to Software Repository-Related Question Answering

VideoGen-of-Thought: Step-by-step generating multi-shot video with minimal manual intervention

Initialization using Update Approximation is a Silver Bullet for Extremely Efficient Low-Rank Fine-Tuning

Reasoning over User Preferences: Knowledge Graph-Augmented LLMs for Explainable Conversational Recommendations

Reliable Decision Making via Calibration Oriented Retrieval Augmented Generation

Faster LLM Inference using DBMS-Inspired Preemption and Cache Replacement Policies

There and Back Again: On the relation between Noise and Image Inversions in Diffusion Models

QSpec: Speculative Decoding with Complementary Quantization Schemes

Superficial Safety Alignment Hypothesis

AutoScale: Scale-Aware Data Mixing for Pre-Training LLMs

R2 v2: The Pareto-compliant R2 Indicator for Better Benchmarking in Bi-objective Optimization

Hierarchical place recognition with omnidirectional images and curriculum learning-based loss functions

Neural Network Parameter-optimization of Gaussian pmDAGs

Semantic Bridges Between First Order c-Representations and Cost-Based Semantics: An Initial Perspective

Rethinking Reward Models for Multi-Domain Test-Time Scaling

Communication-Efficient and Accurate Approach for Aggregation in Federated Low-Rank Adaptation

Model Parallelism With Subnetwork Data Parallelism

Created by

Haebom

Authors

Vaibhav Singh, Zafir Khalid, Edouard Oyallon, Eugene Belilovsky

Overview

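To address the memory and communication costs of pre-training large neural networks, the paper proposes Subnetwork Data Parallelism (SDP), a distributed training framework that partitions the model across workers and trains it without exchanging activations. It studies two masking schemes: backward masking, which applies sparsity only in the backward pass and so keeps gradients unbiased, and forward masking, which also removes parameters in the forward pass, improving efficiency and providing regularization. It also explores two subnetwork construction strategies, neuron-level and block-level, applied to CNNs and transformers. In experiments with CNNs and transformers on CIFAR and ImageNet and with LLM pre-training on FineWeb, SDP reduces per-device memory usage by 30%-75% while maintaining or improving performance. Notably, forward masking can achieve better performance in FLOP-matched settings.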
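The difference between the two masking schemes can be illustrated on a single linear layer. The following is a minimal NumPy sketch, not the authors' implementation: the helper names (`neuron_mask`, `worker_step`), the random choice of kept output neurons, and the use of a precomputed upstream gradient `grad_y` are all assumptions made for illustration. In "backward" mode the forward pass uses the full weight matrix and only the weight gradient is masked; in "forward" mode the masked weights are also used to compute the output.

```python
import numpy as np

def neuron_mask(shape, keep_frac, rng):
    """Hypothetical neuron-level mask: keep a random subset of output columns."""
    n_out = shape[1]
    n_keep = max(1, int(round(keep_frac * n_out)))
    keep = rng.choice(n_out, size=n_keep, replace=False)
    mask = np.zeros(shape)
    mask[:, keep] = 1.0
    return mask

def worker_step(x, W, grad_y, mask, mode):
    """One worker's forward output and weight gradient for y = x @ W.

    mode="backward": full forward pass, sparsity applied only to the gradient.
    mode="forward":  masked weights are used in the forward pass as well.
    grad_y is the upstream gradient dL/dy (assumed given for this sketch).
    """
    if mode == "forward":
        y = x @ (W * mask)      # dropped neurons produce zero output
    elif mode == "backward":
        y = x @ W               # forward pass is unchanged
    else:
        raise ValueError(f"unknown mode: {mode}")
    grad_W = (x.T @ grad_y) * mask  # worker only keeps its subnetwork's entries
    return y, grad_W

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))        # batch of 4 inputs with 3 features
W = rng.normal(size=(3, 6))        # layer with 6 output neurons
grad_y = rng.normal(size=(4, 6))   # stand-in upstream gradient
mask = neuron_mask(W.shape, keep_frac=0.5, rng=rng)

y_bwd, g_bwd = worker_step(x, W, grad_y, mask, mode="backward")
y_fwd, g_fwd = worker_step(x, W, grad_y, mask, mode="forward")
```

In this sketch each worker would store gradients (and, in a real system, optimizer state) only for the entries where `mask` is 1, which is where a per-device memory saving would come from. How workers' masks are chosen and how their gradients are combined is not specified in the summary above and is left out here.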

Implications and Limitations

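• Implications:

◦ SDP substantially reduces memory usage when training large models.

◦ SDP improves efficiency with no loss in performance, and in some cases with better performance.

◦ Forward masking provides additional regularization and can perform better in FLOP-matched settings.

◦ SDP applies to CNNs and transformers, across a range of datasets and to LLM pre-training.

• Limitations:

◦ The paper does not explicitly state any limitations.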
