Daily Arxiv
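This page collects artificial intelligence papers published around the world.
Summaries on this page are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their affiliated institutions; please cite the source when sharing.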
Process-aware and high-fidelity microstructure generation using stable diffusion
Created by
Haebom
Authors
Hoang Cuong Phan, Minh Tien Tran, Chihun Lee, Hoheok Kim, Sehyok Oh, Dong-Kyu Kim, Ho Won Lee
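Summary
This paper focuses on synthesizing realistic microstructure images conditioned on processing parameters, a capability that is essential for understanding process-structure relationships in materials design. The task is difficult because training micrographs are scarce and processing variables are continuous. To address this, the authors present a process-aware generative modeling approach that adapts Stable Diffusion 3.5 Large (SD3.5-Large), a state-of-the-art text-to-image diffusion model, to microstructure generation. They introduce numeric-aware embeddings that encode continuous variables (annealing temperature, time, and magnification) directly into the model's conditioning, which enables controlled image generation under specified process conditions and captures process-driven microstructural changes. To cope with data scarcity and computational constraints, only a fraction of the model weights is fine-tuned via DreamBooth and Low-Rank Adaptation (LoRA), efficiently transferring the pre-trained model to the materials domain. Realism is validated with a semantic segmentation model built on a fine-tuned U-Net with a VGG16 encoder, which achieves 97.1% accuracy and 85.7% mean IoU and outperforms existing methods. Quantitative analysis using physical descriptors and spatial statistics shows strong agreement between synthetic and real microstructures; in particular, the two-point correlation and lineal-path errors stay below 2.1% and 0.6%, respectively. The authors describe the method as the first application of SD3.5-Large to process-aware microstructure generation, offering a scalable approach to data-driven materials design.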
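To make the spatial-statistics comparison more concrete, the following is a minimal sketch of a two-point correlation function computed on a binary micrograph and compared between a real and a synthetic image. Everything here is an illustrative assumption rather than the authors' code: the function names `two_point_correlation` and `mean_relative_error`, the periodic boundary handling, the restriction to the x-axis, the error definition, and the toy 4x4 images are all hypothetical, and the summary does not state how the paper defines its error metric.

```python
def two_point_correlation(image, max_r):
    """S2(r) for phase 1 along the x-axis, with periodic boundaries.

    S2(r) is the probability that two pixels separated by r columns
    both belong to phase 1. S2(0) equals the phase-1 area fraction.
    """
    rows, cols = len(image), len(image[0])
    s2 = []
    for r in range(max_r + 1):
        hits = 0
        for y in range(rows):
            for x in range(cols):
                # Both pixels must be phase 1 for the product to be 1.
                hits += image[y][x] * image[y][(x + r) % cols]
        s2.append(hits / (rows * cols))
    return s2


def mean_relative_error(real_curve, synthetic_curve):
    """Mean absolute gap between two curves, relative to the real curve's mean.

    This is one possible error definition, chosen here for illustration.
    """
    gap = sum(abs(a - b) for a, b in zip(real_curve, synthetic_curve)) / len(real_curve)
    return gap / (sum(real_curve) / len(real_curve))


# Toy binary micrographs (1 = phase of interest); illustrative data only.
real = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
synthetic = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

s2_real = two_point_correlation(real, max_r=2)
s2_syn = two_point_correlation(synthetic, max_r=2)
error = mean_relative_error(s2_real, s2_syn)
```

A small `error` indicates that the synthetic image reproduces the spatial arrangement of the phase, not just its overall area fraction. A lineal-path function could be compared in the same way by swapping the statistic.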
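Takeaways, Limitations
• Takeaways:
◦ Presents a new approach to process-aware microstructure generation based on Stable Diffusion 3.5 Large.
◦ Enables efficient model training even with limited data (using DreamBooth and LoRA).
◦ High realism of the generated microstructures (segmentation accuracy of 97.1%, mean IoU of 85.7%).
◦ Quantitative analysis with physical descriptors and spatial statistics confirms strong agreement with real microstructures.
◦ Provides a scalable approach to data-driven materials design.
• Limitations:
◦ The size and diversity of the dataset used are not clearly stated.
◦ Further research is needed on generalization to other process variables and material systems.
◦ Fine-tuning with LoRA may limit the ability to generate highly complex microstructures.
◦ The physical phenomena underlying microstructure formation are not sufficiently explained.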
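The accuracy and mean IoU figures cited in the takeaways are standard segmentation metrics. The sketch below shows one common way to compute them on label masks. The function names `pixel_accuracy` and `mean_iou`, the flattened mask layout, and the toy data are assumptions made for illustration; the summary does not specify the exact evaluation protocol used in the paper.

```python
def pixel_accuracy(pred, truth):
    """Fraction of pixels whose predicted label matches the ground truth."""
    matches = sum(p == t for p, t in zip(pred, truth))
    return matches / len(truth)


def mean_iou(pred, truth, num_classes):
    """Average per-class intersection-over-union.

    Classes that appear in neither mask are skipped so they do not
    distort the average.
    """
    ious = []
    for c in range(num_classes):
        intersection = sum(p == c and t == c for p, t in zip(pred, truth))
        union = sum(p == c or t == c for p, t in zip(pred, truth))
        if union > 0:
            ious.append(intersection / union)
    return sum(ious) / len(ious)


# Flattened toy label masks with two classes (0 and 1); illustrative data only.
truth = [0, 0, 1, 1, 1, 0, 0, 1]
pred = [0, 0, 1, 1, 0, 0, 1, 1]

acc = pixel_accuracy(pred, truth)
miou = mean_iou(pred, truth, num_classes=2)
```

Mean IoU is typically lower than pixel accuracy on the same prediction because it penalizes both missed and spurious pixels for every class, which is consistent with the gap between the two reported figures.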