Daily Arxiv

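This page curates artificial intelligence papers published around the world.
Summaries are generated with Google Gemini, and the page is run on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, simply cite the source.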

Photonic Fabric Platform for AI Accelerators

Achieving Robust Channel Estimation Neural Networks by Designed Training Data

Can Mental Imagery Improve the Thinking Capabilities of AI Systems?

Characterizing State Space Model (SSM) and SSM-Transformer Hybrid Language Model Performance with Long Context Length

PGT-I: Scaling Spatiotemporal GNNs with Memory-Efficient Distributed Training

Robust 3D-Masked Part-level Editing in 3D Gaussian Splatting with Regularized Score Distillation Sampling

A Lightweight and Robust Framework for Real-Time Colorectal Polyp Detection Using LOF-Based Preprocessing and YOLO-v11n

HMID-Net: An Exploration of Masked Image Modeling and Knowledge Distillation in Hyperbolic Space

Synchronizing Task Behavior: Aligning Multiple Tasks during Test-Time Training

Resolving Token-Space Gradient Conflicts: Token Space Manipulation for Transformer-Based Multi-Task Learning

Fast Bilateral Teleoperation and Imitation Learning Using Sensorless Force Control via Accurate Dynamics Model

VisualSpeaker: Visually-Guided 3D Avatar Lip Synthesis

Reviving Cultural Heritage: A Novel Approach for Comprehensive Historical Document Restoration

Interaction-Merged Motion Planning: Effectively Leveraging Diverse Motion Datasets for Robust Planning

Learning Software Bug Reports: A Systematic Literature Review

Rethinking Data Protection in the (Generative) Artificial Intelligence Era

Frequency-Aligned Knowledge Distillation for Lightweight Spatiotemporal Forecasting

TopoStreamer: Temporal Lane Segment Topology Reasoning in Autonomous Driving

"Before, I Asked My Mom, Now I Ask ChatGPT": Visual Privacy Management with Generative AI for Blind and Low-Vision People

QLPro: Automated Code Vulnerability Discovery via LLM and Static Code Analysis Integration

FedWSQ: Efficient Federated Learning with Weight Standardization and Distribution-Aware Non-Uniform Quantization

Plan for Speed: Dilated Scheduling for Masked Diffusion Language Models

Bridging the Digital Divide: Small Language Models as a Pathway for Physics and Photonics Education in Underdeveloped Regions

DaMO: A Data-Efficient Multimodal Orchestrator for Temporal Reasoning with Video LLMs

Dynamic Context Tuning for Retrieval-Augmented Generation: Enhancing Multi-Turn Planning and Tool Adaptation

Specification and Evaluation of Multi-Agent LLM Systems -- Prototype and Cybersecurity Applications

PhysioWave: A Multi-Scale Wavelet-Transformer for Physiological Signal Representation

Draft-based Approximate Inference for LLMs

Label-semantics Aware Generative Approach for Domain-Agnostic Multilabel Classification

SemiOccam: A Robust Semi-Supervised Image Recognition Network Using Sparse Labels

Adversarial bandit optimization for approximately linear functions

Know Or Not: a library for evaluating out-of-knowledge base robustness

Leveraging Vision-Language Models for Visual Grounding and Analysis of Automotive UI

DualReal: Adaptive Joint Training for Lossless Identity-Motion Fusion in Video Customization

CoordField: Coordination Field for Agentic UAV Task Allocation In Low-altitude Urban Scenarios

Return Capping: Sample-Efficient CVaR Policy Gradient Optimisation

AnyTSR: Any-Scale Thermal Super-Resolution for UAV

Enhanced Pruning Strategy for Multi-Component Neural Architectures Using Component-Aware Graph Analysis

Executable Functional Abstractions: Inferring Generative Programs for Advanced Math Problems

Measuring Leakage in Concept-Based Methods: An Information Theoretic Approach

APIGen-MT: Agentic Pipeline for Multi-Turn Data Generation via Simulated Agent-Human Interplay

The Dual-Route Model of Induction

Detecting PTSD in Clinical Interviews: A Comparative Analysis of NLP Methods and Large Language Models

SWI: Speaking with Intent in Large Language Models

A Study of LLMs' Preferences for Libraries and Programming Languages

TruthLens: Explainable DeepFake Detection for Face Manipulated and Fully Synthetic Data

Sampling Decisions

Federated Continual Instruction Tuning

Fine-Tuning Diffusion Generative Models via Rich Preference Optimization

BriLLM: Brain-inspired Large Language Model

Studying Classifier(-Free) Guidance From a Classifier-Centric Perspective

RealGeneral: Unifying Visual Generation via Temporal In-Context Learning with Video Models

Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning

PLADIS: Pushing the Limits of Attention in Diffusion Models at Inference Time by Leveraging Sparsity

DynamicID: Zero-Shot Multi-ID Image Personalization with Flexible Facial Editability

Symbolic Mixture-of-Experts: Adaptive Skill-based Routing for Heterogeneous Reasoning

OMNISEC: LLM-Driven Provenance-based Intrusion Detection via Retrieval-Augmented Behavior Prompting

Too Much to Trust? Measuring the Security and Cognitive Impacts of Explainability in AI-Driven SOCs

Attend or Perish: Benchmarking Attention in Algorithmic Reasoning

Can Optical Denoising Clean Sonar Images? A Benchmark and Fusion Approach

Brain Foundation Models: A Survey on Advancements in Neural Signal Processing and Brain Discovery

Winning Big with Small Models: Knowledge Distillation vs. Self-Training for Reducing Hallucination in Product QA Agents

Detecting Benchmark Contamination Through Watermarking

MEMERAG: A Multilingual End-to-End Meta-Evaluation Benchmark for Retrieval Augmented Generation

Steering into New Embedding Spaces: Analyzing Cross-Lingual Alignment Induced by Model Interventions in Multilingual Language Models

Analyze the Neurons, not the Embeddings: Understanding When and Where LLM Representations Align with Humans

MKE-Coder: Multi-Axial Knowledge with Evidence Verification in ICD Coding for Chinese EMRs

An Overall Real-Time Mechanism for Classification and Quality Evaluation of Rice

Layerwise Recall and the Geometry of Interwoven Knowledge in LLMs

Learning in Strategic Queuing Systems with Small Buffers

BARNN: A Bayesian Autoregressive and Recurrent Neural Network

HEPPO-GAE: Hardware-Efficient Proximal Policy Optimization with Generalized Advantage Estimation

CGP-Tuning: Structure-Aware Soft Prompt Tuning for Code Vulnerability Detection

A recent evaluation on the performance of LLMs on radiation oncology physics using questions of randomly shuffled options

A Survey on Large Language Model-Based Social Agents in Game-Theoretic Scenarios

PEMF-VTO: Point-Enhanced Video Virtual Try-on via Mask-free Paradigm

Understanding the Design Decisions of Retrieval-Augmented Generation Systems

DOGR: Towards Versatile Visual Document Grounding and Referring

Ev2R: Evaluating Evidence Retrieval in Automated Fact-Checking

DualSwinUnet++: An Enhanced Swin-Unet Architecture With Dual Decoders For PTMC Segmentation

PerspectiveNet: Multi-View Perception for Dynamic Scene Understanding

AlphaDPO: Adaptive Reward Margin for Direct Preference Optimization

Continual Learning with Neuromorphic Computing: Foundations, Methods, and Emerging Applications

FlexiTex: Enhancing Texture Generation via Visual Guidance

ASMA: An Adaptive Safety Margin Algorithm for Vision-Language Drone Navigation via Scene-Aware Control Barrier Functions

The unknotting number, hard unknot diagrams, and reinforcement learning

Hierarchical Reinforcement Learning for Temporal Abstraction of Listwise Recommendation

Enhancing Natural Language Inference Performance with Knowledge Graph for COVID-19 Automated Fact-Checking in Indonesian Language

CVPT: Cross Visual Prompt Tuning

Proficient Graph Neural Network Design by Accumulating Knowledge on Large Language Models

Stimulating Imagination: Towards General-purpose "Something Something Placement"

Why Does New Knowledge Create Messy Ripple Effects in LLMs?

A Mathematical Framework and a Suite of Learning Techniques for Neural-Symbolic Systems

How to Leverage Predictive Uncertainty Estimates for Reducing Catastrophic Forgetting in Online Continual Learning

Towards the Next Frontier in Speech Representation Learning Using Disentanglement

Hierarchical Prompting Taxonomy: A Universal Evaluation Framework for Large Language Models Aligned with Human Cognitive Principles

Which Experiences Are Influential for RL Agents? Efficiently Estimating The Influence of Experiences

Oversmoothing Alleviation in Graph Neural Networks: A Survey and Unified View

OCK: Unsupervised Dynamic Video Prediction with Object-Centric Kinematics

Benchmarking Mobile Device Control Agents across Diverse Configurations

HEPPO-GAE: Hardware-Efficient Proximal Policy Optimization with Generalized Advantage Estimation

Created by

Haebom

Authors

Hazem Taha, Ameer M. S. Abdelhadi

Overview

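This paper introduces HEPPO-GAE, an FPGA-based accelerator designed to optimize the Generalized Advantage Estimation (GAE) stage of the Proximal Policy Optimization (PPO) algorithm. Unlike prior approaches that focus on trajectory collection and actor-critic updates, HEPPO-GAE addresses the computational demands of GAE with a parallel, pipelined architecture implemented on a single system-on-chip (SoC). The design allows tailored hardware accelerators to be applied to the different PPO stages. A strategic standardization technique, which combines dynamic reward standardization with block standardization of values, is paired with 8-bit uniform quantization. Together they improve training stability, enhance performance, and manage memory bottlenecks, reducing memory usage by 4x and increasing cumulative reward by 1.5x. The proposed solution runs on a single SoC device with programmable logic and embedded processors, delivering much higher throughput than conventional CPU-GPU systems while minimizing communication latency and throughput bottlenecks, which substantially improves PPO training efficiency. Experiments show a 30% increase in PPO speed and a significant reduction in memory access time, demonstrating the broad applicability of HEPPO-GAE to hardware-efficient reinforcement learning algorithms.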
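For readers unfamiliar with the stage being accelerated, the following is a minimal software sketch of the standard GAE recursion, in which each advantage is computed backward in time from the temporal-difference error. It is not the paper's hardware pipeline: the function name `compute_gae`, the default values of `gamma` and `lam`, and the omission of episode-termination handling are all assumptions made for illustration.

```python
from typing import List, Sequence


def compute_gae(
    rewards: Sequence[float],
    values: Sequence[float],
    gamma: float = 0.99,  # discount factor (illustrative default)
    lam: float = 0.95,    # GAE smoothing parameter (illustrative default)
) -> List[float]:
    """Standard GAE over one trajectory.

    `values` must hold one more entry than `rewards`: the last entry is the
    bootstrap value of the state reached after the final step.
    Episode boundaries inside the trajectory are not handled in this sketch.
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have len(rewards) + 1 entries")

    advantages = [0.0] * len(rewards)
    gae = 0.0
    # Walk backward so each step reuses the advantage of the next step.
    for t in reversed(range(len(rewards))):
        # One-step temporal-difference error.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # Exponentially weighted sum of future TD errors.
        gae = delta + gamma * lam * gae
        advantages[t] = gae
    return advantages
```

The backward loop carries a dependency from step t+1 to step t within a trajectory, while separate trajectories are independent of one another, which is one reason the stage lends itself to a parallel, pipelined design.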

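Implications and Limitations

• Implications:
  ◦ Shows that a single-SoC FPGA accelerator can effectively accelerate the GAE stage of the PPO algorithm.
  ◦ The proposed strategic standardization technique reduces memory usage and improves training stability.
  ◦ Delivers much higher throughput than conventional CPU-GPU systems and points to more efficient PPO training.
  ◦ Contributes to the development of hardware-efficient reinforcement learning algorithms.

• Limitations:
  ◦ The current implementation targets a single SoC, so its scalability requires further study.
  ◦ Generalization to other reinforcement learning algorithms and environments still needs to be evaluated.
  ◦ Optimal parameter settings for the proposed standardization technique require further study.
  ◦ Parts of the design may depend on a specific FPGA architecture.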
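The standardization and 8-bit uniform quantization described in the overview can be illustrated with a small sketch. This is a simplified stand-in, not the authors' method: it standardizes a single block with its own mean and standard deviation, then maps the result onto 256 evenly spaced levels. The helper names, the clipping range `clip=4.0`, and the 32-bit float baseline used for the memory comparison are all assumptions.

```python
from array import array
from statistics import fmean, pstdev
from typing import List, Sequence, Tuple


def standardize(xs: Sequence[float], eps: float = 1e-8) -> List[float]:
    """Shift and scale one block to zero mean and unit variance."""
    mean = fmean(xs)
    std = pstdev(xs)
    return [(x - mean) / (std + eps) for x in xs]


def quantize_uint8(xs: Sequence[float], clip: float = 4.0) -> Tuple[bytes, float]:
    """Uniformly quantize values in [-clip, clip] to 8-bit codes.

    Values outside the range saturate at code 0 or 255.
    """
    step = 2.0 * clip / 255.0
    codes = bytes(min(255, max(0, round((x + clip) / step))) for x in xs)
    return codes, step


def dequantize_uint8(codes: bytes, step: float, clip: float = 4.0) -> List[float]:
    """Map 8-bit codes back to approximate real values."""
    return [c * step - clip for c in codes]


# Usage: standardize a block of rewards, store it in 8 bits, and restore it.
rewards = [0.5, 1.0, 1.5, 2.0, 10.0]
z = standardize(rewards)
codes, step = quantize_uint8(z)
restored = dequantize_uint8(codes, step)

# Assumed baseline: 32-bit floats (4 bytes each) versus 1 byte per code.
float32_bytes = array("f", z).itemsize * len(z)
uint8_bytes = len(codes)
```

Standardizing first keeps most values inside a fixed range, so a single uniform grid can cover them. Under the assumed 32-bit baseline, one byte per value corresponds to the 4x memory reduction quoted in the overview.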
