Daily Arxiv
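This page collects artificial intelligence papers published around the world.
Summaries on this page are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their affiliated institutions; please cite the source when sharing.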
Merge Kernel for Bayesian Optimization on Permutation Space
Demographic-aware fine-grained classification of pediatric wrist fractures
Generative Multi-Target Cross-Domain Recommendation
ParaStudent: Generating and Evaluating Realistic Student Code by Teaching LLMs to Struggle
Modeling Open-World Cognition as On-Demand Synthesis of Probabilistic Models
EgoVLA: Learning Vision-Language-Action Models from Egocentric Human Videos
Inversion-DPO: Precise and Efficient Post-Training for Diffusion Models
A Simple Baseline for Stable and Plastic Neural Networks
WildFX: A DAW-Powered Pipeline for In-the-Wild Audio FX Graph Modeling
From KMMLU-Redux to KMMLU-Pro: A Professional Korean Benchmark Suite for LLM Evaluation
Agent KB: Leveraging Cross-Domain Experience for Agentic Problem Solving
How Not to Detect Prompt Injections with an LLM
Critiques of World Models
The role of large language models in UI/UX design: A systematic literature review
LearnLens: LLM-Enabled Personalised, Curriculum-Grounded Feedback with Educators in the Loop
STACK: Adversarial Attacks on LLM Safeguard Pipelines
ZonUI-3B: A Lightweight Vision-Language Model for Cross-Resolution GUI Grounding
Understanding Reasoning in Thinking Language Models via Steering Vectors
Agentic Neural Networks: Self-Evolving Multi-Agent Systems via Textual Backpropagation
EvolveNav: Self-Improving Embodied Reasoning for LLM-Based Vision-Language Navigation
TextDiffuser-RL: Efficient and Robust Text Layout Optimization for High-Fidelity Text-to-Image Synthesis
SpecMaskFoley: Steering Pretrained Spectral Masked Generative Transformer Toward Synchronized Video-to-audio Synthesis via ControlNet
Exploring Graph Representations of Logical Forms for Language Modeling
DeepSeek-Prover-V2: Advancing Formal Mathematical Reasoning via Reinforcement Learning for Subgoal Decomposition
ParaPO: Aligning Language Models to Reduce Verbatim Reproduction of Pre-training Data
DP2Unlearning: An Efficient and Guaranteed Unlearning Framework for LLMs
CDUPatch: Color-Driven Universal Adversarial Patch Attack for Dual-Modal Visible-Infrared Detectors
Hands-On: Segmenting Individual Signs from Continuous Sequences
Can we ease the Injectivity Bottleneck on Lorentzian Manifolds for Graph Neural Networks?
Align Your Rhythm: Generating Highly Aligned Dance Poses with Gating-Enhanced Rhythm-Aware Feature Representation
HoH: A Dynamic Benchmark for Evaluating the Impact of Outdated Information on Retrieval-Augmented Generation
AIvaluateXR: An Evaluation Framework for on-Device AI in XR with Benchmarking Results
An Empirical Risk Minimization Approach for Offline Inverse RL and Dynamic Discrete Choice Model
Evaluating link prediction: New perspectives and recommendations
Learning to Reason at the Frontier of Learnability
Stonefish: Supporting Machine Learning Research in Marine Robotics
Harmony in Divergence: Towards Fast, Accurate, and Memory-efficient Zeroth-order LLM Fine-tuning
On the Transfer of Knowledge in Quantum Algorithms
Code Readability in the Age of Large Language Models: An Industrial Case Study from Atlassian
Bias in Decision-Making for AI's Ethical Dilemmas: A Comparative Study of ChatGPT and Claude
ASTRID - An Automated and Scalable TRIaD for the Evaluation of RAG-based Clinical Question Answering Systems
Consistency of Responses and Continuations Generated by Large Language Models on Social Media
From Code to Compliance: Assessing ChatGPT's Utility in Designing an Accessible Webpage -- A Case Study
Temporal reasoning for timeline summarisation in social media
Invisible Textual Backdoor Attacks based on Dual-Trigger
Towards scientific discovery with dictionary learning: Extracting biological concepts from microscopy foundation models
Two-Stage Pretraining for Molecular Property Prediction in the Wild
Towards Practical Operation of Deep Reinforcement Learning Agents in Real-World Network Management at Open RAN Edges
An Approach for Auto Generation of Labeling Functions for Software Engineering Chatbots
Bridging Local and Global Knowledge via Transformer in Board Games
Entropy Loss: An Interpretability Amplifier of 3D Object Detection Network for Intelligent Driving
FBSDiff: Plug-and-Play Frequency Band Substitution of Diffusion Features for Highly Controllable Text-Driven Image Translation
On Pre-training of Multimodal Language Models Customized for Chart Understanding
Visual Grounding Methods for Efficient Interaction with Desktop Graphical User Interfaces
Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language Models via Instruction Tuning
Meta4XNLI: A Crosslingual Parallel Corpus for Metaphor Detection and Interpretation
SecurePose: Automated Face Blurring and Human Movement Kinematics Extraction from Videos Recorded in Clinical Settings
Improved DDIM Sampling with Moment Matching Gaussian Mixtures
Eye-tracked Virtual Reality: A Comprehensive Survey on Methods and Privacy Challenges
From Roots to Rewards: Dynamic Tree Reasoning with RL
Illuminating the Three Dogmas of Reinforcement Learning under Evolutionary Light
Instance space analysis of the capacitated vehicle routing problem
Multi-Agent LLMs as Ethics Advocates for AI-Based Systems
GATSim: Urban Mobility Simulation with Generative Agents
Reasoning about Uncertainty: Do Reasoning Models Know When They Don't Know?
The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity
Strategic Reflectivism In Intelligent Systems
SafeAgent: Safeguarding LLM Agents via an Automated Risk Simulator
What the F*ck Is Artificial General Intelligence?
Towards Reasoning Era: A Survey of Long Chain-of-Thought for Reasoning Large Language Models
From Words to Collisions: LLM-Guided Evaluation and Adversarial Generation of Safety-Critical Driving Scenarios
To Code or not to Code? Adaptive Tool Integration for Math Language Models via Expectation-Maximization
BLAST: A Stealthy Backdoor Leverage Attack against Cooperative Multi-Agent Deep Reinforcement Learning based Systems
UniEmoX: Cross-modal Semantic-Guided Large-Scale Pretraining for Universal Scene Emotion Perception
CorMulT: A Semi-supervised Modality Correlation-aware Multimodal Transformer for Sentiment Analysis
Toward Temporal Causal Representation Learning with Tensor Decomposition
Kolmogorov Arnold Networks (KANs) for Imbalanced Data - An Empirical Perspective
NoHumansRequired: Autonomous High-Quality Image Editing Triplet Mining
Lessons from the TREC Plain Language Adaptation of Biomedical Abstracts (PLABA) Track
Multi-Centre Validation of a Deep Learning Model for Scoliosis Assessment
The Emotion-Memory Link: Do Memorability Annotations Matter for Intelligent Systems?
DENSE: Longitudinal Progress Note Generation with Temporal Modeling of Heterogeneous Clinical Notes Across Hospital Visits
Edge Intelligence with Spiking Neural Networks
VLA-Mark: A cross modal watermark for large vision-language alignment model
Noradrenergic-inspired gain modulation attenuates the stability gap in joint training
A multi-strategy improved snake optimizer for 3-dimensional UAV path planning and engineering problems
Photonic Fabric Platform for AI Accelerators
OrthoInsight: Rib Fracture Diagnosis and Report Generation Based on Multi-Modal Large Models
CSD-VAR: Content-Style Decomposition in Visual Autoregressive Models
A segmented robot grasping perception neural network for edge AI
Bottom-up Domain-specific Superintelligence: A Reliable Knowledge Graph is What We Need
DUALRec: A Hybrid Sequential and Language Model Framework for Context-Aware Movie Recommendation
Exploiting Primacy Effect To Improve Large Language Models
Generalist Forecasting with Frozen Video Models via Latent Diffusion
Convergent transformations of visual representation in brains and models
Preprint: Did I Just Browse A Website Written by LLMs?
The Levers of Political Persuasion with Conversational AI
Political Leaning and Politicalness Classification of Texts
Self-supervised learning on gene expression data
Using LLMs to identify features of personal and professional skills in an open-response situational judgment test
Whose View of Safety? A Deep DIVE Dataset for Pluralistic Alignment of Text-to-Image Models
Created by
Haebom
Authors
Charvi Rastogi, Tian Huey Teh, Pushkar Mishra, Roma Patel, Ding Wang, Mark Díaz, Alicia Parrish, Aida Mostafazadeh Davani, Zoe Ashwood, Michela Paganini, Vinodkumar Prabhakaran, Verena Rieser, Lora Aroyo
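Abstract
This paper points out a limitation of existing text-to-image (T2I) models, which fail to account for diverse human experiences, and argues for "pluralistic alignment", in which a model understands and can be steered toward diverse and often conflicting human values. It makes three main contributions. First, it introduces DIVE (Diverse Intersectional Visual Evaluation), a new multimodal dataset. The dataset enables deep alignment to diverse safety perspectives through a large pool of demographically intersectional raters who provided extensive feedback on 1,000 prompts. Second, it empirically confirms that demographic characteristics are an important proxy for diverse viewpoints in this domain, and reveals substantial, context-dependent differences in harm perception that diverge from conventional evaluations. Third, it discusses implications for building aligned T2I models, including efficient data collection strategies, LLM judgment capabilities, and the steerability of models toward diverse perspectives. The work offers foundational tools for more equitable and aligned T2I systems.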
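Takeaways, Limitations
• Takeaways:
◦ Presents the concept of pluralistic alignment, which accounts for diverse human values, and emphasizes its importance.
◦ Provides DIVE (Diverse Intersectional Visual Evaluation), a new multimodal dataset.
◦ Empirically confirms that demographic characteristics are an important proxy variable in the safety evaluation of T2I models.
◦ Suggests directions for building improved T2I models through efficient data collection strategies, LLM judgment capabilities, and model steerability.
◦ Provides foundational tools for building more equitable and aligned T2I systems.
• Limitations:
◦ As the paper itself notes, it contains sensitive content that may be harmful.
◦ The scale and generalizability of the DIVE dataset need further validation.
◦ Further research is needed on applying the proposed methodology to actual T2I models and on its effectiveness.
◦ Specific technical details on LLM judgment capabilities and model steerability are lacking.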