Daily Arxiv

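This page collects artificial intelligence papers published around the world.
Summaries on this page are generated with Google Gemini, and the page is run on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, simply credit the source.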

Interleaving Reasoning for Better Text-to-Image Generation

Barycentric Neural Networks and Length-Weighted Persistent Entropy Loss: A Green Geometric and Topological Framework for Function Approximation

Signal-Based Malware Classification Using 1D CNNs

Toward a Metrology for Artificial Intelligence: Hidden-Rule Environments and Reinforcement Learning

BranchGRPO: Stable and Efficient GRPO with Structured Branching in Diffusion Models

LM-Searcher: Cross-domain Neural Architecture Search with LLMs via Unified Numerical Encoding

No Thoughts Just AI: Biased LLM Hiring Recommendations Alter Human Decision Making and Limit Human Autonomy

What Fundamental Structure in Reward Functions Enables Efficient Sparse-Reward Learning?

HodgeFormer: Transformers for Learnable Operators on Triangular Meshes through Data-Driven Hodge Matrices

CAT: Causal Attention Tuning For Injecting Fine-grained Causal Knowledge into Large Language Models

Pilot Study on Generative AI and Critical Thinking in Higher Education Classrooms

zkLoRA: Fine-Tuning Large Language Models with Verifiable Security via Zero-Knowledge Proofs

EmbodiedOneVision: Interleaved Vision-Text-Action Pretraining for General Robot Control

Ultra-Low-Latency Spiking Neural Networks with Temporal-Dependent Integrate-and-Fire Neuron Model for Objects Detection

Attacking LLMs and AI Agents: Advertisement Embedding Attacks Against Large Language Models

A Survey of Threats Against Voice Authentication and Anti-Spoofing Systems

Trust but Verify! A Survey on Verification Design for Test-time Scaling

Research on Conversational Recommender System Considering Consumer Types

A Systematic Literature Review of Retrieval-Augmented Generation: Techniques, Metrics, and Challenges

Grid-Agent: An LLM-Powered Multi-Agent System for Power Grid Control

Enhancing Dialogue Annotation with Speaker Characteristics Leveraging a Frozen LLM

A Mixed User-Centered Approach to Enable Augmented Intelligence in Intelligent Tutoring Systems: The Case of MathAIde app

Meaning-infused grammar: Gradient Acceptability Shapes the Geometric Representations of Constructions in LLMs

MoRPI-PINN: A Physics-Informed Framework for Mobile Robot Pure Inertial Navigation

Conditional Video Generation for High-Efficiency Video Compression

Large Language Models for Crash Detection in Video: A Survey of Methods, Datasets, and Challenges

Grounding DINO-US-SAM: Text-Prompted Multi-Organ Segmentation in Ultrasound with LoRA-Tuned Vision-Language Models

Language Models Might Not Understand You: Evaluating Theory of Mind via Story Prompting

From Images to Insights: Explainable Biodiversity Monitoring with Plain Language Habitat Explanations

HueManity: Probing Fine-Grained Visual Perception in MLLMs

Understanding Behavioral Metric Learning: A Large-Scale Study on Distracting Reinforcement Learning Environments

Localizing Persona Representations in LLMs

Multi-output Classification using a Cross-talk Architecture for Compound Fault Diagnosis of Motors in Partially Labeled Condition

SCIZOR: A Self-Supervised Approach to Data Curation for Large-Scale Imitation Learning

Is Your LLM Overcharging You? Tokenization, Transparency, and Incentives

Towards Visuospatial Cognition via Hierarchical Fusion of Visual Experts

Visuospatial Cognitive Assistant

Overflow Prevention Enhances Long-Context Recurrent LLMs

GRADA: Graph-based Reranking against Adversarial Documents Attack

OBLIVIATE: Robust and Practical Machine Unlearning for Large Language Models

Comparative Analysis of Lightweight Deep Learning Models for Memory-Constrained Devices

Unlearning vs. Obfuscation: Are We Truly Removing Knowledge?

Llama-Nemotron: Efficient Reasoning Models

Tripartite-GraphRAG via Plugin Ontologies

DMS-Net:Dual-Modal Multi-Scale Siamese Network for Binocular Fundus Image Classification

Enhancing Traffic Incident Response through Sub-Second Temporal Localization with HybridMamba

Audio-centric Video Understanding Benchmark without Text Shortcut

The Model Hears You: Audio Language Model Deployments Should Consider the Principle of Least Privilege

Involution and BSConv Multi-Depth Distillation Network for Lightweight Image Super-Resolution

DistJoin: A Decoupled Join Cardinality Estimator based on Adaptive Neural Predicate Modulation

MIRROR: Multi-Modal Pathological Self-Supervised Representation Learning via Modality Alignment and Retention

Robust Adaptation of Large Multimodal Models for Retrieval Augmented Hateful Meme Detection

VINP: Variational Bayesian Inference with Neural Speech Prior for Joint ASR-Effective Speech Dereverberation and Blind RIR Identification

Cardiverse: Harnessing LLMs for Novel Card Game Prototyping

TrojanRobot: Physical-world Backdoor Attacks Against VLM-based Robotic Manipulation

Automatically Detecting Online Deceptive Patterns

TokenSelect: Efficient Long-Context Inference and Length Extrapolation for LLMs via Dynamic Token-Level KV Cache Selection

Solving Truly Massive Budgeted Monotonic POMDPs with Oracle-Guided Meta-Reinforcement Learning

CTourLLM: Enhancing LLMs with Chinese Tourism Knowledge

Directly Aligning the Full Diffusion Trajectory with Fine-Grained Human Preference

SFR-DeepResearch: Towards Effective Reinforcement Learning for Autonomously Reasoning Single Agents

MSRFormer: Road Network Representation Learning using Multi-scale Feature Fusion of Heterogeneous Spatial Interactions

Attention of a Kiss: Exploring Attention Maps in Video Diffusion for XAIxArts

EvoEmo: Towards Evolved Emotional Policies for LLM Agents in Multi-Turn Negotiation

AI-SearchPlanner: Modular Agentic Search via Pareto-Optimal Multi-Objective Reinforcement Learning

MaRVL-QA: A Benchmark for Mathematical Reasoning over Visual Landscapes

Benchmarking for Domain-Specific LLMs: A Case Study on Academia and Beyond

CountQA: How Well Do MLLMs Count in the Wild?

ASP-FZN: A Translation-based Constraint Answer Set Solver

MedGellan: LLM-Generated Medical Guidance to Support Physicians

Modeling the Diachronic Evolution of Legal Norms: An LRMoo-Based, Component-Level, Event-Centric Approach to Legal Knowledge Graphs

Addition in Four Movements: Mapping Layer-wise Information Trajectories in LLMs

GeoChain: Multimodal Chain-of-Thought for Geographic Reasoning

Automatic Reward Shaping from Confounded Offline Data

Visualizing Thought: Conceptual Diagrams Enable Robust Combinatorial Planning in LMMs

COMMA: A Communicative Multimodal Multi-Agent Benchmark

PIN: A Knowledge-Intensive Dataset for Paired and Interleaved Multimodal Documents

Understanding the Language Model to Solve the Symbolic Multi-Step Reasoning Problem from the Perspective of Buffer Mechanism

Self-Emotion-Mediated Exploration in Artificial Intelligence Mirrors: Findings from Cognitive Psychology

Mini-o3: Scaling Up Reasoning Patterns and Interaction Turns for Visual Search

ACE and Diverse Generalization via Selective Disagreement

Bringing Multi-Modal Multi-Task Federated Foundation Models to Education Domain: Prospects and Challenges

ImportSnare: Directed "Code Manual" Hijacking in Retrieval-Augmented Code Generation

Breaking Android with AI: A Deep Dive into LLM-Powered Exploitation

Accelerating Local AI on Consumer GPUs: A Hardware-Aware Dynamic Strategy for YOLOv10s

GENUINE: Graph Enhanced Multi-level Uncertainty Estimation for Large Language Models

Multimodal Contrastive Pretraining of CBCT and IOS for Enhanced Tooth Segmentation

Uncovering Scaling Laws for Large Language Models via Inverse Problems

Active Membership Inference Test (aMINT): Enhancing Model Auditability with Multi-Task Learning

Deep Learning-Based Burned Area Mapping Using Bi-Temporal Siamese Networks and AlphaEarth Foundation Datasets

Small Open Models Achieve Near Parity with Large Models in Low Resource Literary Translation at a Fraction of the Cost

Forecasting Russian Equipment Losses Using Time Series and Deep Learning Models

Enhanced SegNet with Integrated Grad-CAM for Interpretable Retinal Layer Segmentation in OCT Images

Individual utilities of life satisfaction reveal inequality aversion unrelated to political alignment

XSRD-Net: EXplainable Stroke Relapse Detection

Are LLMs Enough for Hyperpartisan, Fake, Polarized and Harmful Content Detection? Evaluating In-Context Learning vs. Fine-Tuning

What Were You Thinking? An LLM-Driven Large-Scale Study of Refactoring Motivations in Open-Source Projects

Spectral and Rhythm Feature Performance Evaluation for Category and Class Level Audio Classification with Deep Convolutional Neural Networks

Enhancing Online Learning by Integrating Biosensors and Multimodal Learning Analytics for Detecting and Predicting Student Behavior: A Review

Spectral Masking and Interpolation Attack (SMIA): A Black-box Adversarial Attack against Voice Authentication and Anti-Spoofing Systems

LM-Searcher: Cross-domain Neural Architecture Search with LLMs via Unified Numerical Encoding

Created by

Haebom

Authors

Yuxuan Hu, Jihao Liu, Ke Wang, Jinliang Zhen, Weikang Shi, Manyuan Zhang, Qi Dou, Rui Liu, Aojun Zhou, Hongsheng Li

Overview

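This paper proposes LM-Searcher, a neural architecture search (NAS) framework that uses large language models (LLMs) and applies across a range of tasks. Existing LLM-based NAS approaches rely heavily on prompt engineering and domain-specific tuning, whereas LM-Searcher optimizes neural architectures across different domains without domain-specific adaptation. To do so, it uses NCode, a universal numerical string representation for neural architectures, which enables cross-domain architecture encoding and search. It also reformulates NAS as a ranking task and trains the LLM to select high-performing architectures from a candidate pool, using instruction-tuning samples derived from a novel pruning-based subspace sampling strategy. A curated dataset covering a wide range of architecture-performance pairs encourages robust, transferable learning. Extensive experiments show that LM-Searcher achieves competitive performance on both in-domain tasks (e.g., CNNs for image classification) and out-of-domain tasks (e.g., LoRA configurations for segmentation and generation), establishing a new paradigm for flexible, generalizable LLM-based architecture search. The dataset and models will be released at https://github.com/Ashone3/LM-Searcher.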
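To make the two core ideas concrete (a numerical string encoding of architectures, and NAS posed as ranking over a candidate pool), here is a minimal sketch. It is not the paper's NCode format or prompt template, neither of which is specified in this summary. The search space, the index-per-decision encoding, and the names `SEARCH_SPACE`, `encode`, `decode`, and `build_ranking_prompt` are all hypothetical, and no LLM is actually called.

```python
from typing import Dict, List, Tuple

# Hypothetical search space (an assumption for illustration):
# every design decision has an ordered list of options.
SEARCH_SPACE: Dict[str, List[int]] = {
    "kernel_size": [3, 5, 7],
    "width": [16, 32, 64],
    "depth": [1, 2, 3, 4],
}


def encode(arch: Dict[str, int]) -> str:
    """Encode an architecture as a numeric string, one option index per decision."""
    return " ".join(str(SEARCH_SPACE[key].index(arch[key])) for key in SEARCH_SPACE)


def decode(code: str) -> Dict[str, int]:
    """Invert encode(): map each index back to the option it selects."""
    indices = [int(token) for token in code.split()]
    return {key: SEARCH_SPACE[key][i] for key, i in zip(SEARCH_SPACE, indices)}


def build_ranking_prompt(history: List[Tuple[str, float]], candidates: List[str]) -> str:
    """Format evaluated (code, score) pairs and unevaluated candidates as a ranking query."""
    lines = ["Previously evaluated architectures (code: score):"]
    # Show the best-scoring architectures first.
    for code, score in sorted(history, key=lambda pair: pair[1], reverse=True):
        lines.append(f"{code}: {score:.3f}")
    lines.append("Candidates:")
    lines.extend(f"[{i}] {code}" for i, code in enumerate(candidates))
    lines.append("Answer with the index of the candidate expected to perform best.")
    return "\n".join(lines)


arch = {"kernel_size": 5, "width": 64, "depth": 1}
code = encode(arch)
prompt = build_ranking_prompt([("0 0 0", 0.71), (code, 0.83)], ["2 2 3", "1 1 1"])
print(code)
print(prompt)
```

Because the encoding refers only to option indices, the same string format could describe a CNN or a LoRA configuration by swapping the search space, which is the kind of domain independence the summary attributes to NCode.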


Implications and Limitations

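• Implications:
  ◦ Presents a new LLM-based NAS framework that optimizes neural architectures across diverse domains without domain-specific adaptation.
  ◦ Uses NCode, a universal numerical string representation, to enable cross-domain architecture encoding and search.
  ◦ Enables efficient architecture search through a pruning-based subspace sampling strategy.
  ◦ Achieves competitive performance on both in-domain and out-of-domain tasks.
  ◦ Supports reproducibility through the planned public release of the dataset and models.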

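• Limitations:
  ◦ Further analysis is needed of how well the proposed NCode representation generalizes and where it breaks down.
  ◦ Further research is needed to broaden its applicability across tasks.
  ◦ The pruning-based subspace sampling strategy leaves room for optimization and improvement.
  ◦ Heavy dependence on the underlying LLM means that the LLM's limitations may carry over to LM-Searcher's performance.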
