Daily Arxiv

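This page curates artificial intelligence papers published around the world.
Summaries on this page are generated with Google Gemini, and the page is run on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, simply cite the source.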

Stochastic Variational Propagation: Local, Scalable and Efficient Alternative to Backpropagation

Created by

Haebom

Authors

Bojian Yin, Federico Corradi

Overview

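Backpropagation (BP) is the foundation of deep learning, but its reliance on global gradient synchronization limits scalability and incurs substantial memory overhead. This paper proposes Stochastic Variational Propagation (SVP), a scalable alternative that reframes training as hierarchical variational inference. SVP treats layer activations as latent variables and optimizes local evidence lower bounds (ELBOs), enabling independent local updates while preserving global coherence. However, applying the KL divergence directly in the layer-wise ELBOs risks collapsing the inter-layer representations through excessive compression. To prevent this, SVP projects activations into a low-dimensional space through fixed random matrices, ensuring information preservation and representational diversity. Combined with a feature alignment loss for inter-layer consistency, SVP achieves accuracy competitive with BP across diverse architectures (MLPs, CNNs, Transformers) and datasets (from MNIST to ImageNet), reduces memory usage by up to 4x, and substantially improves scalability. More broadly, SVP brings a probabilistic perspective to deep representation learning, opening a path toward more modular and interpretable neural network design.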
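The idea of per-layer objectives computed on fixed random projections can be illustrated with a small forward-only sketch. This is not the authors' implementation. It assumes a softmax cross-entropy as a stand-in for the likelihood term of a local ELBO, omits the KL term entirely, and uses a mean squared difference between adjacent projections as a hypothetical alignment term. All function names and the `align_weight` parameter are invented for illustration.

```python
import math
import random


def make_fixed_projection(in_dim, out_dim, seed):
    """Build a fixed Gaussian random matrix (out_dim x in_dim); it is never trained."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(in_dim)
    return [[rng.gauss(0.0, scale) for _ in range(in_dim)] for _ in range(out_dim)]


def project(matrix, activation):
    """Map a layer activation to the low-dimensional space: z = R h."""
    return [sum(w * h for w, h in zip(row, activation)) for row in matrix]


def cross_entropy(logits, label):
    """Softmax cross-entropy, using the log-sum-exp trick for stability."""
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return log_norm - logits[label]


def alignment(z_a, z_b):
    """Hypothetical alignment term: mean squared difference of two projections."""
    return sum((a - b) ** 2 for a, b in zip(z_a, z_b)) / len(z_a)


def local_losses(activations, projections, label, align_weight=0.1):
    """Return one scalar loss per layer, using only that layer and its predecessor."""
    codes = [project(r, h) for r, h in zip(projections, activations)]
    losses = []
    for i, z in enumerate(codes):
        loss = cross_entropy(z, label)  # assumed stand-in for the local likelihood term
        if i > 0:
            loss += align_weight * alignment(z, codes[i - 1])
        losses.append(loss)
    return losses


# Toy example: three layers of widths 8, 6, 4, each projected to 3 dimensions.
rng = random.Random(0)
widths = [8, 6, 4]
activations = [[rng.uniform(-1.0, 1.0) for _ in range(w)] for w in widths]
projections = [make_fixed_projection(w, 3, seed=i) for i, w in enumerate(widths)]
losses = local_losses(activations, projections, label=1)
print(losses)
```

Because each entry of `losses` depends only on one layer's activation, its fixed projection, and the neighboring projection, a training loop built on this sketch could update each layer from its own loss without a global backward pass. Whether that matches the paper's exact objective is not something this summary confirms.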

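Implications and Limitations

• Implications:
  ◦ Presents a new method that addresses the scalability and memory-efficiency problems of backpropagation
  ◦ Proposes a scalable deep learning training framework based on hierarchical variational inference
  ◦ Achieves performance comparable to BP across diverse architectures and datasets
  ◦ Reduces memory usage by up to 4x and improves scalability
  ◦ Offers a probabilistic perspective on deep learning, with potential for greater modularity and interpretability

• Limitations:
  ◦ Further experiments are needed on the generalization performance of the proposed method
  ◦ Relying on fixed random matrices has limitations and leaves room for improvement
  ◦ Information may be lost between layers, which calls for further analysis
  ◦ Further study is needed on performance and efficiency in real large-scale applications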
