Daily Arxiv

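This page curates artificial intelligence papers published around the world.
Summaries are produced with Google Gemini, and the page is run on a non-profit basis.
Copyright in each paper belongs to its authors and their institutions; when sharing, simply credit the source.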

CAT Merging: A Training-Free Approach for Resolving Conflicts in Model Merging

Created by

Haebom

Authors

Wenju Sun, Qingyong Li, Yangli-ao Geng, Boyang Li

Overview

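This paper proposes CAT Merging (Conflict-Aware Task Merging), a new approach to multi-task model merging that combines several expert models into a single unified model without additional training. Existing methods such as Task Arithmetic merge models by accumulating task vectors, i.e. the parameter differences between a fine-tuned model and its pre-trained base, but knowledge conflicts between tasks can degrade the merged model's performance. CAT Merging is a training-free framework that resolves these conflicts by selectively removing conflict-prone components from the task vectors. It applies parameter-specific strategies: projection for linear weights, and masking for the scaling and shifting parameters of normalization layers. In extensive experiments on vision, language, and vision-language tasks, it outperforms state-of-the-art methods, improving average accuracy by up to 2.5% (ViT-B/32) and 2.0% (ViT-L/14).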

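Implications and Limitations

• Implications:
  ◦ Presents a new method for effectively merging multi-task models without additional training.
  ◦ Mitigates knowledge conflicts and thereby outperforms existing methods.
  ◦ Shows performance gains across vision, language, and vision-language tasks.
  ◦ Its parameter-specific strategies broaden applicability to diverse model architectures.

• Limitations:
  ◦ Further research is needed on how well the proposed parameter-specific strategies generalize.
  ◦ Effectiveness may be limited for certain types of knowledge conflicts.
  ◦ More comprehensive experiments across a wider range of model sizes and architectures are needed.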