Daily Arxiv

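This page collects and summarizes artificial intelligence papers published worldwide.
Summaries are produced with Google Gemini, and the page is run on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions; when sharing, simply cite the source.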

GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

Horus: A Protocol for Trustless Delegation Under Uncertainty

Reducing Variability of Multiple Instance Learning Methods for Digital Pathology

Positioning AI Tools to Support Online Harm Reduction Practice: Applications and Design Directions

DICE-BENCH: Evaluating the Tool-Use Capabilities of Large Language Models in Multi-Round, Multi-Party Dialogues

Visual Structures Helps Visual Reasoning: Addressing the Binding Problem in VLMs

BioPars: A Pretrained Biomedical Large Language Model for Persian Biomedical Text Mining

TRACED: Transition-aware Regret Approximation with Co-learnability for Environment Design

Towards Safety Evaluations of Theory of Mind in Large Language Models

Fair Algorithms with Probing for Multi-Agent Multi-Armed Bandits

GraphGSOcc: Semantic-Geometric Graph Transformer with Dynamic-Static Decoupling for 3D Gaussian Splatting-based Occupancy Prediction

Screen Hijack: Visual Poisoning of VLM Agents in Mobile Environments

15,500 Seconds: Lean UAV Classification Leveraging PEFT and Pre-Trained Networks

Tightly-Coupled LiDAR-IMU-Leg Odometry with Online Learned Leg Kinematics Incorporating Foot Tactile Information

Time Series Representations for Classification Lie Hidden in Pretrained Vision Transformers

BIS Reasoning 1.0: The First Large-Scale Japanese Benchmark for Belief-Inconsistent Syllogistic Reasoning

On the Fundamental Impossibility of Hallucination Control in Large Language Models

Adapting Rule Representation With Four-Parameter Beta Distribution for Learning Classifier Systems

Real-Time Blind Defocus Deblurring for Earth Observation: The IMAGIN-e Mission Approach

Scaling Up Liquid-Resistance Liquid-Capacitance Networks for Efficient Sequence Modeling

Self-reflective Uncertainties: Do LLMs Know Their Internal Answer Distribution?

FastMamba: A High-Speed and Efficient Mamba Accelerator on FPGA with Accurate Quantization

Pre-training Large Memory Language Models with Internal and External Knowledge

Towards Universal Semantics With Large Language Models

Aitomia: Your Intelligent Assistant for AI-Driven Atomistic and Quantum Chemical Simulations

Enhancing Robustness to Missing Modalities through Clustered Federated Learning

Perceiving Beyond Language Priors: Enhancing Visual Comprehension and Attention in Multimodal Models

LZ Penalty: An information-theoretic repetition penalty for autoregressive language models

Towards Cardiac MRI Foundation Models: Comprehensive Visual-Tabular Representations for Whole-Heart Assessment and Beyond

TARO: Timestep-Adaptive Representation Alignment with Onset-Aware Conditioning for Synchronized Video-to-Audio Synthesis

Recursive Training Loops in LLMs: How training data properties modulate distribution shift in generated data?

Real-is-Sim: Bridging the Sim-to-Real Gap with a Dynamic Digital Twin

Concat-ID: Towards Universal Identity-Preserving Video Synthesis

How Metacognitive Architectures Remember Their Own Thoughts: A Systematic Review

SFO: Piloting VLM Feedback for Offline RL

Towards Efficient Educational Chatbots: Benchmarking RAG Frameworks

KatFishNet: Detecting LLM-Generated Korean Text through Linguistic Feature Analysis

Distribution Matching for Self-Supervised Transfer Learning

A Baseline Method for Removing Invisible Image Watermarks using Deep Image Prior

SKIL: Semantic Keypoint Imitation Learning for Generalizable Data-efficient Manipulation

AirRadar: Inferring Nationwide Air Quality in China with Deep Neural Networks

A Framework for Mining Collectively-Behaving Bots in MMORPGs

Continual Learning with Strategic Selection and Forgetting for Network Intrusion Detection

A Survey on Uncertainty Quantification of Large Language Models: Taxonomy, Open Research Challenges, and Future Directions

A Review of Bayesian Uncertainty Quantification in Deep Probabilistic Image Segmentation

GenBFA: An Evolutionary Optimization Approach to Bit-Flip Attacks on LLMs

There and Back Again: On the relation between Noise and Image Inversions in Diffusion Models

Unified Triplet-Level Hallucination Evaluation for Large Vision-Language Models

Contrastive Learning and Adversarial Disentanglement for Privacy-Aware Task-Oriented Semantic Communication

Unsupervised Panoptic Interpretation of Latent Spaces in GANs Using Space-Filling Vector Quantization

NegMerge: Sign-Consensual Weight Merging for Machine Unlearning

Drug Discovery SMILES-to-Pharmacokinetics Diffusion Models with Deep Molecular Understanding

Sublinear Regret for a Class of Continuous-Time Linear-Quadratic Reinforcement Learning Problems

Backdooring Bias (B^2) into Stable Diffusion Models

Embodied Instruction Following in Unknown Environments

Improving Consistency Models with Generator-Augmented Flows

OralBBNet: Spatially Guided Dental Segmentation of Panoramic X-Rays with Bounding Box Priors

Divergent Creativity in Humans and Large Language Models

SpikeNAS: A Fast Memory-Aware Neural Architecture Search Framework for Spiking Neural Network-based Embedded AI Systems

Squat: Quant Small Language Models on the Edge

Dataset Distillation via the Wasserstein Metric

The Boolean Solution Problem from the Perspective of Predicate Logic -- Extended Version

Can Large Language Models Develop Strategic Reasoning? Post-training Insights from Learning Chess

World-aware Planning Narratives Enhance Large Vision-Language Model Planner

Reasoning about Uncertainty: Do Reasoning Models Know When They Don't Know?

MMLU-Reason: Benchmarking Multi-Task Multi-modal Language Understanding and Reasoning

Adapting Probabilistic Risk Assessment for AI

Beating Transformers using Synthetic Cognition

MedAgent-Pro: Towards Evidence-based Multi-modal Medical Diagnosis via Reasoning Agentic Workflow

Using Large Language Models to Categorize Strategic Situations and Decipher Motivations Behind Human Behaviors

ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation

MCCoder: Streamlining Motion Control with LLM-Assisted Code Generation and Rigorous Verification

DREAMS: A python framework for Training Deep Learning Models on EEG Data with Model Card Reporting for Medical Applications

Human Mobility Modeling with Household Coordination Activities under Limited Information via Retrieval-Augmented LLMs

Combating Confirmation Bias: A Unified Pseudo-Labeling Framework for Entity Alignment

AC-DiT: Adaptive Coordination Diffusion Transformer for Mobile Manipulation

Locality-aware Parallel Decoding for Efficient Autoregressive Image Generation

How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks

SpecCLIP: Aligning and Translating Spectroscopic Measurements for Stars

Adaptability of ASR Models on Low-Resource Language: A Comparative Study of Whisper and Wav2Vec-BERT on Bangla

Exploring a Hybrid Deep Learning Approach for Anomaly Detection in Mental Healthcare Provider Billing: Addressing Label Scarcity through Semi-Supervised Anomaly Detection

End-to-End Large Portfolio Optimization for Variance Minimization with Neural Networks through Covariance Cleaning

Gradient-Adaptive Policy Optimization: Towards Multi-Objective Alignment of Large Language Models

AI4Research: A Survey of Artificial Intelligence for Scientific Research

Towards Foundation Auto-Encoders for Time-Series Anomaly Detection

Bridging UI Design and chatbot Interactions: Applying Form-Based Principles to Conversational Agents

mGRADE: Minimal Recurrent Gating Meets Delay Convolutions for Lightweight Sequence Modeling

MILP-SAT-GNN: Yet Another Neural SAT Solver

Empowering Manufacturers with Privacy-Preserving AI Tools: A Case Study in Privacy-Preserving Machine Learning to Solve Real-World Problems

LoRA Fine-Tuning Without GPUs: A CPU-Efficient Meta-Generation Framework for LLMs

How Do Vision-Language Models Process Conflicting Information Across Modalities?

Are Vision Transformer Representations Semantically Meaningful? A Case Study in Medical Imaging

Probing Evaluation Awareness of Language Models

MuRating: A High Quality Data Selecting Approach to Multilingual Large Language Model Pretraining

BranchNet: A Neuro-Symbolic Learning Framework for Structured Multi-Class Classification

GPU-based complete search for nonlinear minimization subject to bounds

Enhanced Generative Model Evaluation with Clipped Density and Coverage

Tuning without Peeking: Provable Privacy and Generalization Bounds for LLM Post-Training

ECCV 2024 W-CODA: 1st Workshop on Multimodal Perception and Comprehension of Corner Cases in Autonomous Driving

Towards culturally-appropriate conversational AI for health in the majority world: An exploratory study with citizens and professionals in Latin America

Research on Low-Latency Inference and Training Efficiency Optimization for Graph Neural Network and Large Language Model-Based Recommendation Systems

Created by

Haebom

Authors

Yushang Zhao, Haotian Lyu, Yike Peng, Aijia Sun, Feng Jiang, Xinyue Han

Overview

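As online services grow, there is a need for fast, efficient recommender systems (ReS) that deliver real-time performance while handling complex user-item interactions. This paper responds to that need by optimizing the computational bottlenecks of hybrid Graph Neural Network (GNN) and Large Language Model (LLM)-based ReS to improve inference latency and training efficiency. Its methodology applies two kinds of optimization to an integrated hybrid GNN-LLM architecture in an R 4.4.2 environment: architectural optimization strategies (quantization, LoRA, and knowledge distillation) and hardware acceleration (FPGA and DeepSpeed). In the experiments, the optimized Hybrid + FPGA + DeepSpeed configuration improved accuracy by 13.6% (NDCG@10: 0.75) at a latency of 40-60 ms, and LoRA cut training time by 66% (3.8 hours). The results show that, regardless of domain, hardware-software co-design and parameter-efficient tuning let the hybrid model outperform GNN or LLM approaches implemented on their own. The authors recommend FPGA and LoRA for real-time deployment.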
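The accuracy figure above is reported as NDCG@10. The following is a minimal Python sketch of what that metric computes, using linear gains and a log2 rank discount. This is one common formulation. The summary does not say which gain function or relevance labels the paper used, and the function names `dcg_at_k` and `ndcg_at_k` are hypothetical.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain of the top-k items, in ranked order (linear gain)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k=10):
    """DCG divided by the DCG of the ideal ordering; 0.0 when nothing is relevant."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Hypothetical binary relevance of the 10 items ranked for one user (1 = relevant).
ranked = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
print(round(ndcg_at_k(ranked), 3))  # 0.92
```

A score of 1.0 means the top-10 ordering matches the ideal ordering. Placing the second relevant item at rank 3 instead of rank 2 costs about 8% here.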

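Quantization is one of the architectural optimizations listed in the overview. The sketch below shows symmetric per-tensor int8 quantization in plain Python, assuming a single scale factor for the whole tensor. The summary does not describe the paper's actual scheme (bit width, per-channel vs. per-tensor, calibration), and the helper names are hypothetical.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int8(q, scale):
    """Recover approximate floats from the integer codes."""
    return [v * scale for v in q]


weights = [0.5, -1.27, 0.01, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
print(q)  # [50, -127, 1, 100]
# Each restored value differs from the original by at most scale / 2.
print(max(abs(w - r) for w, r in zip(weights, restored)))
```

Storing 8-bit codes instead of 32-bit floats cuts weight storage by roughly 4x, at the cost of this bounded rounding error.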
Implications and Limitations

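• Implications:
  ◦ Shows that hardware-software co-design can substantially improve the performance of hybrid GNN-LLM recommendation systems when combined with architectural optimizations (quantization, knowledge distillation) and parameter-efficient tuning (LoRA).
  ◦ Demonstrates that hardware acceleration with FPGA and DeepSpeed can reduce inference latency and improve training efficiency.
  ◦ Confirms that the hybrid model outperforms standalone GNN or LLM approaches.
  ◦ Makes the case for FPGA and LoRA as practical components of real-time recommendation systems.
• Limitations:
  ◦ Scalability and privacy are not adequately addressed (left as future work).
  ◦ Federated learning and more advanced fusion architectures need further study.
  ◦ The results may be specific to the particular hardware and software environment used.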
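LoRA features in both the results and the recommendations. The following pure-Python sketch shows the general idea of a LoRA-adapted linear layer. The pretrained weight W stays frozen, and only a low-rank pair B (d×r) and A (r×k) is trained, so the trainable-parameter count for that layer drops from d·k to r·(d + k). The layer size, rank, and `alpha` scaling below are illustrative assumptions, not values from the paper, and the function names are hypothetical.

```python
def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def lora_forward(W, A, B, x, alpha, r):
    """y = W x + (alpha / r) * B (A x); W is frozen, only A and B are trainable."""
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    return [b + (alpha / r) * d for b, d in zip(base, delta)]


def trainable_params(d, k, r=None):
    """Parameters updated for one d x k layer: full fine-tuning if r is None, else LoRA."""
    return d * k if r is None else r * (d + k)


# Hypothetical 4096 x 4096 projection with LoRA rank 8.
full = trainable_params(4096, 4096)        # 16,777,216
lora = trainable_params(4096, 4096, r=8)   # 65,536
print(f"LoRA trains {lora / full:.2%} of the layer's parameters")  # 0.39%

# With B initialized to zero, the adapted layer initially reproduces the frozen layer.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]        # r x k, here r = 1
B = [[0.0], [0.0]]      # d x r, zero-initialized
print(lora_forward(W, A, B, [1.0, 2.0], alpha=1.0, r=1))  # [1.0, 2.0]
```

Fewer trainable parameters mean smaller gradients and optimizer state, which is one reason LoRA can shorten training. The 66% reduction reported for the paper is the authors' own measurement and is not derived from this count.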
