Daily Arxiv

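This page collects artificial intelligence papers published around the world.
Summaries are generated with Google Gemini, and the page is run on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, simply credit the source.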

Accelerating Diffusion Large Language Models with SlowFast Sampling: The Three Golden Principles

Saturation Self-Organizing Map

PiPViT: Patch-based Visual Interpretable Prototypes for Retinal Image Analysis

Specification and Evaluation of Multi-Agent LLM Systems -- Prototype and Cybersecurity Applications

Intra-Trajectory Consistency for Reward Modeling

Foundation Models in Medical Imaging -- A Review and Outlook

Merging Smarter, Generalizing Better: Enhancing Model Merging on OOD Data

Diffuse Everything: Multimodal Diffusion Models on Arbitrary State Spaces

Improving Large Language Models with Concept-Aware Fine-Tuning

Consistent Video Editing as Flow-Driven Image-to-Video Generation

Evidential Spectrum-Aware Contrastive Learning for OOD Detection in Dynamic Graphs

Lingshu: A Generalist Foundation Model for Unified Multimodal Medical Understanding and Reasoning

DURA-CPS: A Multi-Role Orchestrator for Dependability Assurance in LLM-Enabled Cyber-Physical Systems

Cartridges: Lightweight and general-purpose long context representations via self-study

Fast-DataShapley: Neural Modeling for Training Data Valuation

NOBLE -- Neural Operator with Biologically-informed Latent Embeddings to Capture Experimental Variability in Biological Neuron Models

LLMEval-Med: A Real-world Clinical Benchmark for Medical LLMs with Physician Validation

Can AI Master Econometrics? Evidence from Econometrics AI Agent on Expert-Level Tasks

Improving the Calibration of Confidence Scores in Text Generation Using the Output Distribution's Characteristics

Training RL Agents for Multi-Objective Network Defense Tasks

Explainability of Large Language Models using SMILE: Statistical Model-agnostic Interpretability with Local Explanations

RSCF: Relation-Semantics Consistent Filter for Entity Embedding of Knowledge Graph

MoESD: Unveil Speculative Decoding's Potential for Accelerating Sparse MoE

Impact of Frame Rates on Speech Tokenizer: A Case Study on Mandarin and English

Can reasoning models comprehend mathematical problems in Chinese ancient texts? An empirical study based on data from Suanjing Shishu

Table-R1: Region-based Reinforcement Learning for Table Understanding

Spatiotemporal Field Generation Based on Hybrid Mamba-Transformer with Physics-informed Fine-tuning

Graph-Based Floor Separation Using Node Embeddings and Clustering of WiFi Trajectories

Enhancing Cooperative Multi-Agent Reinforcement Learning with State Modelling and Adversarial Exploration

Quantitative Analysis of Performance Drop in DeepSeek Model Quantization

PhysNav-DG: A Novel Adaptive Framework for Robust VLM-Sensor Fusion in Navigation Applications

FreshStack: Building Realistic Benchmarks for Evaluating Retrieval on Technical Documents

ColorBench: Can VLMs See and Understand the Colorful World? A Comprehensive Benchmark for Color Perception, Reasoning, and Robustness

"It's not a representation of me": Examining Accent Bias and Digital Exclusion in Synthetic AI Voice Services

Towards Personalized Conversational Sales Agents: Contextual User Profiling for Strategic Action

PIPO: Pipelined Offloading for Efficient Inference on Consumer Devices

Beyond the Visible: Multispectral Vision-Language Learning for Earth Observation

Ensemble Knowledge Distillation for Machine Learning Interatomic Potentials

Safer or Luckier? LLMs as Safety Evaluators Are Not Robust to Artifacts

V-Max: A Reinforcement Learning Framework for Autonomous Driving

PFDial: A Structured Dialogue Instruction Fine-tuning Method Based on UML Flowcharts

Naturalistic Computational Cognitive Science: Towards generalizable models and theories that capture the full range of natural behavior

MEDDxAgent: A Unified Modular Agent Framework for Explainable Automatic Differential Diagnosis

The Sharpness Disparity Principle in Transformers for Accelerating Language Model Pre-Training

Understanding the Emergence of Multimodal Representation Alignment

Conformal Inference under High-Dimensional Covariate Shifts via Likelihood-Ratio Regularization

Towards Understanding Fine-Tuning Mechanisms of LLMs via Circuit Analysis

TUMLU: A Unified and Native Language Understanding Benchmark for Turkic Languages

BalanceBenchmark: A Survey for Multimodal Imbalance Learning

AB-UPT: Scaling Neural CFD Surrogates for High-Fidelity Automotive Aerodynamics Simulations via Anchored-Branched Universal Physics Transformers

Vision-Language Models for Edge Networks: A Comprehensive Survey

Foundation Models for Anomaly Detection: Vision and Challenges

Unsafe LLM-Based Search: Quantitative Analysis and Mitigation of Safety Risks in AI Web Search

Capturing Temporal Dynamics in Large-Scale Canopy Tree Height Estimation

Graph of Attacks with Pruning: Optimizing Stealthy Jailbreak Prompt Generation for Enhanced LLM Content Moderation

Evaluating Sample Utility for Efficient Data Selection by Mimicking Model Weights

Discovering Hidden Visual Concepts Beyond Linguistic Input in Infant Learning

Bi-directional Mapping of Morphology Metrics and 3D City Blocks for Enhanced Characterization and Generation of Urban Form

One Diffusion to Generate Them All

Entropy Controllable Direct Preference Optimization

TrajAgent: An LLM-based Agent Framework for Automated Trajectory Modeling via Collaboration of Large and Small Models

Control Industrial Automation System with Large Language Model Agents

How Well Do Large Language Models Serve as End-to-End Secure Code Agents for Python?

Self-interpreting Adversarial Images

Evaluating Implicit Bias in Large Language Models by Attacking From a Psychometric Perspective

Ad Auctions for LLMs via Retrieval Augmented Generation

Dynamic and Adaptive Feature Generation with LLM

MiniMaxAD: A Lightweight Autoencoder for Feature-Rich Anomaly Detection

PLeak: Prompt Leaking Attacks against Large Language Model Applications

HandS3C: 3D Hand Mesh Reconstruction with State Space Spatial Channel Attention from RGB images

DiffTORI: Differentiable Trajectory Optimization for Deep Reinforcement and Imitation Learning

Evolution Guided Generative Flow Networks

Manipulating Feature Visualizations with Gradient Slingshots

Scientists' First Exam: Probing Cognitive Abilities of MLLM via Perception, Understanding, and Reasoning

Agent Semantics, Semantic Spacetime, and Graphical Reasoning

Decomposability-Guaranteed Cooperative Coevolution for Large-Scale Itinerary Planning

DeePoly: A High-Order Accuracy Scientific Machine Learning Framework for Function Approximation and Solving PDEs

Does Thinking More always Help? Understanding Test-Time Scaling in Reasoning Models

The Automated but Risky Game: Modeling Agent-to-Agent Negotiations and Transactions in Consumer Markets

ChemRxivQuest: A Curated Chemistry Question-Answer Database Extracted from ChemRxiv Preprints

Epistemic Artificial Intelligence is Essential for Machine Learning Models to Truly `Know When They Do Not Know'

Enhancing multimodal analogical reasoning with Logic Augmented Generation

Beating Transformers using Synthetic Cognition

HypRL: Reinforcement Learning of Control Policies for Hyperproperties

From Idea to Implementation: Evaluating the Influence of Large Language Models in Software Development -- An Opinion Paper

Bel Esprit: Multi-Agent Framework for Building AI Model Pipelines

Dynamic Policy Fusion for User Alignment Without Re-Interaction

Tell Me What You Don't Know: Enhancing Refusal Capabilities of Role-Playing Agents via Representation Space Analysis and Editing

An overview of domain-specific foundation model: key technologies, applications and challenges

Learning from Litigation: Graphs and LLMs for Retrieval and Reasoning in eDiscovery

Combining Deep Reinforcement Learning and Search with Generative Models for Game-Theoretic Opponent Modeling

EMLoC: Emulator-based Memory-efficient Fine-tuning with LoRA Correction

code_transformed: The Influence of Large Language Models on Code

Reimagining Dance: Real-time Music Co-creation between Dancers and AI

Upgrade or Switch: Do We Need a New Registry Architecture for the Internet of AI Agents?

VGR: Visual Grounded Reasoning

Technical Evaluation of a Disruptive Approach in Homomorphic AI

SAIL: Faster-than-Demonstration Execution of Imitation Learning Policies

Subjective Experience in AI Systems: What Do AI Researchers and the Public Believe?

Today's Cat Is Tomorrow's Dog: Accounting for Time-Based Changes in the Labels of ML Vulnerability Detection Approaches

EMLoC: Emulator-based Memory-efficient Fine-tuning with LoRA Correction

Created by

Haebom

Authors

Hsi-Che Lin, Yu-Chu Yu, Kai-Po Chang, Yu-Chiang Frank Wang

Overview

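Fine-tuning large foundation models for domain-specific or personalized tasks remains too costly for most users because of its high memory overhead. To address this, the paper proposes EMLoC (Emulator-based Memory-efficient fine-tuning framework with LoRA Correction), a framework that enables fine-tuning within the memory budget required for inference. EMLoC builds a lightweight, task-specific emulator by applying activation-aware singular value decomposition (SVD) on a small downstream calibration set, then fine-tunes that emulator with LoRA. Because the original model and the compressed emulator do not match exactly, the paper also proposes a new correction algorithm that adjusts the fine-tuned LoRA modules so they can be merged into the original model for inference. EMLoC supports flexible compression ratios and standard training pipelines, which makes it applicable to a wide range of applications. Extensive experiments show that EMLoC outperforms other baselines across multiple datasets and modalities. It also enables fine-tuning of a 38B model on a single 24GB consumer GPU without quantization, giving individual users an efficient and practical route to model adaptation.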
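The final step described above, merging a LoRA module into the original weights for inference, can be illustrated with a minimal pure-Python sketch. It shows only the standard LoRA merge (adding the low-rank product to a frozen weight matrix), not EMLoC's correction algorithm, which this summary does not detail. The function names, the `scale` parameter, and the toy matrices are all hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_b, lora_a, scale=1.0):
    """Return weight + scale * (lora_b @ lora_a) without modifying the inputs."""
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


def apply(weight, x):
    """Compute weight @ x for a vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


# Toy 2x3 frozen weight with a rank-1 adapter (all values are made up).
W = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]
B = [[0.5], [-1.0]]      # shape (2, r) with r = 1
A = [[1.0, 2.0, 0.0]]    # shape (r, 3)

W_merged = merge_lora(W, B, A)
x = [1.0, 1.0, 1.0]

# The merged weight should match the base path plus the adapter path.
base_plus_adapter = [y + d for y, d in zip(apply(W, x), apply(B, apply(A, x)))]
print(apply(W_merged, x), base_plus_adapter)
```

Once merged, the adapter adds no extra matrices at inference time, which is why the summary stresses that the corrected LoRA modules can be folded back into the original model.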

Implications and Limitations

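• Implications:

  ◦ Fine-tuning large foundation models was previously out of reach for individual users because of its memory requirements. By making it feasible at that level, EMLoC can encourage the development of personalized AI applications.

  ◦ The EMLoC framework, which combines activation-aware SVD with LoRA, offers an effective way to improve memory efficiency and performance together.

  ◦ Strong results across a variety of datasets and modalities demonstrate EMLoC's generality and practicality.

• Limitations:

  ◦ The reported performance rests on experiments with specific datasets and models, and may differ in other settings.

  ◦ The size and quality of the small downstream calibration set used to build the emulator can affect EMLoC's performance.

  ◦ The effectiveness of the new correction algorithm needs further validation across a wider range of models and tasks.

  ◦ Fine-tuning a 38B model on a 24GB GPU has been shown to work, but performance may degrade with larger models or under tighter memory limits.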
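A rough back-of-the-envelope calculation gives a sense of the 24GB and 38B figures. This sketch assumes 2 bytes per weight (16-bit precision, which the summary does not state) and counts weights only, ignoring activations, LoRA parameters, and optimizer state. The function names are hypothetical, and the resulting ratio is only an upper bound on how much of the weight memory could stay resident, not a compression ratio reported by the paper.

```python
def weight_memory_gb(num_params, bytes_per_param):
    """Memory needed to hold the weights alone, in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


def max_keep_ratio(num_params, bytes_per_param, budget_gb):
    """Largest fraction of weight memory that fits in budget_gb (capped at 1.0)."""
    return min(1.0, budget_gb / weight_memory_gb(num_params, bytes_per_param))


params_38b = 38_000_000_000
full_gb = weight_memory_gb(params_38b, 2)  # assumption: 2 bytes per weight (16-bit)
ratio = max_keep_ratio(params_38b, 2, 24)
print(f"16-bit weights: {full_gb:.0f} GB; fraction that fits in 24 GB: {ratio:.2f}")
```

Under these assumptions the uncompressed weights alone are roughly three times the size of the GPU's memory, which suggests why larger models or smaller memory budgets would call for more aggressive compression.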
