haebom
Daily Arxiv
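This page curates artificial intelligence papers published around the world.
Summaries on this page are produced with Google Gemini, and the page is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions; when sharing, simply cite the source.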
Finding Dori: Memorization in Text-to-Image Diffusion Models Is Not Local
Created by Haebom
Authors
Antoni Kowalczuk, Dominik Hintersdorf, Lukas Struppek, Kristian Kersting, Adam Dziedzic, Franziska Boenisch
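Overview
This paper addresses training-data replication in text-to-image diffusion models and challenges the prevailing hypothesis that memorization is localized. It shows that memorization in these models is not local and, building on this finding, proposes a more robust mitigation technique.
Implications and Limitations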
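• Implications:
  ◦ Shows that data memorization in text-to-image diffusion models is not local.
  ◦ Points out that existing mitigation techniques built on the local-memorization hypothesis are fragile.
  ◦ Proposes a more robust memorization mitigation technique based on adversarial fine-tuning.
  ◦ Demonstrates the non-locality of memorization through three observations: replication triggers are spread across the text embedding space, model activations differ between triggers, and different pruning methods disagree on which weights to prune.
• Limitations:
  ◦ Further work is needed to evaluate the performance of the specific mitigation technique and its practical applicability.
  ◦ Further analysis is needed on how memorization patterns change with model architecture and dataset characteristics.
  ◦ Further work is needed on overfitting and generalization during adversarial fine-tuning.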
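The pruning-inconsistency argument can be made concrete with a simple agreement measure. The sketch below is an illustration only. It assumes each localization (pruning) method returns a set of (layer, neuron) indices that it blames for replicating a memorized image. The method names, indices, and the choice of Jaccard similarity are hypothetical and are not taken from the paper. If memorization were truly local, independent methods should largely agree on the same units. Low pairwise overlap is what a non-local picture would predict.

```python
from itertools import combinations

# Hypothetical outputs of three memorization-localization (pruning) methods.
# Each set holds (layer_index, neuron_index) pairs that a method flags as
# responsible for replicating one memorized training image.
# These values are illustrative only; they are NOT results from the paper.
FLAGGED_UNITS = {
    "method_a": {(3, 12), (3, 40), (7, 5), (9, 88)},
    "method_b": {(3, 12), (5, 71), (8, 19), (10, 3)},
    "method_c": {(2, 64), (6, 33), (7, 5), (11, 27)},
}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B|; defined as 1.0 for two empty sets."""
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


def pairwise_agreement(flagged: dict) -> dict:
    """Return the Jaccard overlap for every unordered pair of methods."""
    return {
        (m1, m2): jaccard(flagged[m1], flagged[m2])
        for m1, m2 in combinations(sorted(flagged), 2)
    }


if __name__ == "__main__":
    for (m1, m2), score in pairwise_agreement(FLAGGED_UNITS).items():
        print(f"{m1} vs {m2}: Jaccard = {score:.2f}")
```

With the toy sets above, every pair of methods shares at most one unit out of seven or more. This kind of check is only a rough diagnostic. A faithful replication would use the actual localization methods and models studied in the paper.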