Daily Arxiv

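This page collects and summarizes artificial intelligence papers published around the world.
The summaries are produced with Google Gemini, and the page is run on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions; when sharing, simply cite the source.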

Optuna vs Code Llama: Are LLMs a New Paradigm for Hyperparameter Tuning?

Created by Haebom

Authors

Roman Kochnev, Arash Torabi Goodarzi, Zofia Antonina Bentyn, Dmitry Ignatov, Radu Timofte

Overview

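This paper addresses the problem of choosing hyperparameters that maximize the performance of complex neural networks. It does so with a large language model (LLM): Code Llama, fine-tuned in a parameter-efficient way with LoRA. The proposed method generates accurate, efficient hyperparameter recommendations tailored to a variety of neural network architectures. Conventional trial-and-error approaches such as Optuna search by repeated evaluation. The LLM-based method instead achieves competitive or better results in terms of RMSE while substantially reducing computational overhead. Comparisons against state-of-the-art techniques such as TPE validate the efficiency and performance of LLM-based optimization. They show it to be a promising alternative, particularly for rapid experimentation in resource-constrained environments. The method also delivers consistent performance and time savings across a range of tasks, which underscores its robustness and generalizability. All generated hyperparameters are included in the publicly available LEMUR neural network (NN) dataset, which serves as an open-source benchmark for hyperparameter optimization research.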
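The paragraph above contrasts two ways of getting hyperparameters:

- a fine-tuned LLM that proposes hyperparameters directly;
- trial-and-error tuners such as Optuna, whose default single-objective sampler is TPE (Tree-structured Parzen Estimator).

The overhead of the second approach comes from evaluating the objective once per trial, and each evaluation is a full training run. Below is a minimal, simplified 1-D TPE-style search in plain Python. It is not Optuna's implementation. It works in four steps:

1. Warm up with random trials.
2. Split past trials into the best `gamma` fraction ("good") and the rest ("bad").
3. Fit a Gaussian Parzen density to each group: l(x) for good, g(x) for bad.
4. Pick the candidate that maximizes l(x)/g(x).

The toy `objective`, the search range, and the values of `gamma`, bandwidth and candidate count are all hypothetical.

```python
import math
import random


def objective(log10_lr: float) -> float:
    """Hypothetical stand-in for 'train a network, return validation error'.

    The minimum at log10(lr) = -3 is an arbitrary illustrative choice.
    """
    return (log10_lr + 3.0) ** 2


def parzen_pdf(x: float, centers: list[float], bandwidth: float) -> float:
    """Density of a Gaussian Parzen estimator (mean of kernels at `centers`)."""
    if not centers:
        return 0.0
    norm = 1.0 / (bandwidth * math.sqrt(2.0 * math.pi))
    total = sum(math.exp(-0.5 * ((x - c) / bandwidth) ** 2) for c in centers)
    return norm * total / len(centers)


def tpe_minimize(f, low, high, n_trials=40, n_startup=10,
                 gamma=0.25, n_candidates=24, seed=0):
    """Simplified 1-D TPE: every loop iteration costs exactly one call to `f`."""
    rng = random.Random(seed)
    # Fixed bandwidth is a simplification; real TPE implementations adapt it.
    bandwidth = 0.1 * (high - low)
    history: list[tuple[float, float]] = []  # (x, loss) for each trial
    for t in range(n_trials):
        if t < n_startup:
            # Warm-up phase: plain random search.
            x = rng.uniform(low, high)
        else:
            # Split past trials into the best `gamma` fraction and the rest.
            ranked = sorted(history, key=lambda p: p[1])
            n_good = max(1, int(gamma * len(ranked)))
            good = [p[0] for p in ranked[:n_good]]
            bad = [p[0] for p in ranked[n_good:]]
            # Draw candidates around good points, clipped to the search range.
            candidates = [
                min(high, max(low, rng.choice(good) + rng.gauss(0.0, bandwidth)))
                for _ in range(n_candidates)
            ]
            # Choose the candidate with the largest ratio l(x) / g(x).
            x = max(candidates, key=lambda c: parzen_pdf(c, good, bandwidth)
                    / (parzen_pdf(c, bad, bandwidth) + 1e-12))
        history.append((x, f(x)))
    best = min(history, key=lambda p: p[1])
    return best, history


best, history = tpe_minimize(objective, low=-5.0, high=-1.0)
print(f"best log10(lr)={best[0]:.3f} loss={best[1]:.4f} after {len(history)} trials")
```

Each of the 40 trials calls `objective` once. If that toy function is replaced by real training, the cost of the search grows with `n_trials`. That per-trial cost is what the summarized approach aims to cut by having the LLM propose a configuration directly.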

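Implications and Limitations

• Implications:
◦ LLM-based hyperparameter optimization is more computationally efficient than conventional trial-and-error approaches while still delivering competitive performance.
◦ It is particularly effective in resource-constrained environments such as edge devices and mobile platforms.
◦ Its time savings and consistent performance across tasks make it highly practical.
◦ The publicly released LEMUR NN dataset can support further hyperparameter optimization research.

• Limitations:
◦ The paper does not state specific limitations. However, the LLM's performance depends on its training data, so generalization may degrade for certain types of neural networks or tasks. The resource cost of training the LLM itself and running inference with it is also not discussed in depth.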
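The limitation noted above about the LLM's own resource cost is easier to reason about with the numbers LoRA implies. LoRA keeps a pretrained weight matrix W frozen and learns a low-rank update, so an adapted layer computes h = Wx + (alpha/r)·B(Ax). The sketch below counts trainable parameters and runs that forward pass in plain Python. The 4096 × 4096 layer size and rank r = 8 are illustrative assumptions, not settings reported in the summary.

```python
def lora_param_counts(d_out: int, d_in: int, rank: int) -> tuple[int, int]:
    """Trainable parameters: full fine-tuning of W vs. the LoRA factors B and A."""
    full = d_out * d_in           # every entry of W is trained
    lora = rank * (d_out + d_in)  # B is d_out x r, A is r x d_in
    return full, lora


def matvec(m: list[list[float]], v: list[float]) -> list[float]:
    """Dense matrix-vector product on nested lists."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def lora_forward(W, A, B, x, alpha: float, rank: int) -> list[float]:
    """h = W x + (alpha / r) * B (A x); W stays frozen, only A and B are trained."""
    base = matvec(W, x)                 # frozen pretrained path
    low_rank = matvec(B, matvec(A, x))  # project down to r dims, then back up
    scale = alpha / rank
    return [b + scale * u for b, u in zip(base, low_rank)]


full, lora = lora_param_counts(4096, 4096, rank=8)  # illustrative sizes
print(full, lora, f"{lora / full:.2%}")  # 16777216 65536 0.39%

# With B initialized to zeros (as in the LoRA paper), the adapted layer
# initially reproduces the frozen layer exactly.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]    # r x d_in with r = 1
B = [[0.0], [0.0]]  # d_out x r, zero-initialized
print(lora_forward(W, A, B, [2.0, 3.0], alpha=1.0, rank=1))  # [2.0, 3.0]
```

LoRA shrinks only the trainable parameters and their optimizer state. The full frozen model still has to be loaded for both fine-tuning and inference, and that is the part of the cost the summary flags as under-discussed.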
