Daily Arxiv

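This page collects artificial intelligence papers published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please cite the source when sharing.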

Process-aware and high-fidelity microstructure generation using stable diffusion

Created by

Haebom

Authors

Hoang Cuong Phan, Minh Tien Tran, Chihun Lee, Hoheok Kim, Sehyok Oh, Dong-Kyu Kim, Ho Won Lee

Abstract

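This paper addresses the synthesis of realistic microstructure images conditioned on process parameters, which is essential for understanding process-structure relationships in materials design. The task is difficult because training micrographs are scarce and the process variables are continuous. The study presents a new process-aware generative modeling approach that adapts Stable Diffusion 3.5 Large (SD3.5-Large), a state-of-the-art text-to-image diffusion model, to microstructure generation. It introduces numeric-aware embeddings that encode continuous variables (annealing temperature, time, and magnification) directly into the model's conditioning, which enables controlled image generation under specified process conditions and captures process-driven microstructural changes. To cope with data scarcity and computational constraints, only a fraction of the model weights is fine-tuned via DreamBooth and Low-Rank Adaptation (LoRA), efficiently transferring the pretrained model to the materials domain. Realism is validated with a semantic segmentation model built on a fine-tuned U-Net with a VGG16 encoder, which achieves 97.1% accuracy and 85.7% mean IoU and outperforms existing methods. Quantitative analysis using physical descriptors and spatial statistics shows strong agreement between synthetic and real microstructures; in particular, the two-point correlation and lineal-path errors stay below 2.1% and 0.6%, respectively. The method is the first application of SD3.5-Large to process-aware microstructure generation and offers a scalable approach to data-driven materials design.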
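The two-point correlation mentioned in the abstract can be illustrated with a short NumPy sketch. This is not the authors' evaluation code: the periodic-boundary assumption, the function names `two_point_correlation` and `mean_relative_error`, and the particular error definition are all assumptions made for illustration, since the summary does not say how the 2.1% figure was computed.

```python
import numpy as np

def two_point_correlation(phase: np.ndarray) -> np.ndarray:
    """Periodic two-point autocorrelation S2 of a binary phase map.

    S2[dy, dx] is the probability that two pixels separated by (dy, dx)
    both belong to the phase of interest (periodic boundaries assumed).
    """
    phase = phase.astype(float)
    spectrum = np.fft.fft2(phase)
    # Wiener-Khinchin: autocorrelation = inverse FFT of the power spectrum.
    autocorr = np.fft.ifft2(spectrum * np.conj(spectrum)).real
    return autocorr / phase.size

def mean_relative_error(real: np.ndarray, synthetic: np.ndarray) -> float:
    """Hypothetical error measure: mean |S2 difference| over mean real S2.

    Assumes the real image contains at least one pixel of the phase,
    otherwise the denominator is zero.
    """
    s2_real = two_point_correlation(real)
    s2_syn = two_point_correlation(synthetic)
    return float(np.mean(np.abs(s2_real - s2_syn)) / np.mean(s2_real))

# Toy data standing in for segmented real and generated micrographs.
rng = np.random.default_rng(0)
real = rng.random((64, 64)) < 0.3
synthetic = rng.random((64, 64)) < 0.3
s2 = two_point_correlation(real)
error = mean_relative_error(real, synthetic)
```

At zero shift, `s2[0, 0]` equals the phase volume fraction, which gives a quick sanity check on any implementation of this statistic.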

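Takeaways, Limitations

• Takeaways:
◦ Presents a new approach to process-aware microstructure generation based on Stable Diffusion 3.5 Large.
◦ Enables efficient model training even with limited data (using DreamBooth and LoRA).
◦ High realism of the generated microstructures (97.1% accuracy, 85.7% mean IoU).
◦ Confirms strong agreement with real microstructures through quantitative analysis using physical descriptors and spatial statistics.
◦ Offers a scalable approach to data-driven materials design.

• Limitations:
◦ The size and diversity of the dataset used are not clearly stated.
◦ Further research is needed on generalization to other process variables and material systems.
◦ The constraints of LoRA-based fine-tuning may make it difficult to generate highly complex microstructures.
◦ The physical phenomena behind microstructure formation are not sufficiently explained.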
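The LoRA points above (few trainable weights, but a constrained update) can be made concrete with a minimal NumPy sketch of the standard low-rank update for a single linear layer. The layer sizes, rank, and scaling value are hypothetical and are not taken from the paper, and this is not the authors' training code.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha):
    """Linear layer with a LoRA update: y = x W^T + (alpha / r) * x A^T B^T.

    W is the frozen pretrained weight; only A and B would be trained.
    """
    r = A.shape[0]
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

# Hypothetical sizes chosen only for illustration.
d_out, d_in, r, alpha = 64, 64, 4, 8.0
rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))      # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable down-projection
B = np.zeros((d_out, r))                    # trainable up-projection, zero-initialized

x = rng.standard_normal((4, d_in))
y = lora_forward(x, W, A, B, alpha)

trainable = A.size + B.size   # parameters updated during fine-tuning
frozen = W.size               # parameters left untouched
ratio = trainable / frozen
```

Because `B` starts at zero, the adapted layer initially reproduces the pretrained layer exactly. The weight change `B @ A` can never exceed rank `r`, which is one way to read the limitation that LoRA may restrict how complex the learned adaptation can be.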
