haebom
Daily Arxiv
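This page collects artificial intelligence papers published around the world.
Summaries on this page are generated with Google Gemini, and the page is run on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please cite the source when sharing.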
WhaleVAD-BPN: Improving Baleen Whale Call Detection with Boundary Proposal Networks and Post-processing Optimisation
The Gray Zone of Faithfulness: Taming Ambiguity in Unfaithfulness Detection
Towards General Modality Translation with Contrastive and Predictive Latent Diffusion Bridge
BUSTED at AraGenEval Shared Task: A Comparative Study of Transformer-Based Models for Arabic AI-Generated Text Detection
Steering Evaluation-Aware Language Models to Act Like They Are Deployed
DB-FGA-Net: Dual Backbone Frequency Gated Attention Network for Multi-Class Brain Tumor Classification with Grad-CAM Interpretability
Assessing the Feasibility of Early Cancer Detection Using Routine Laboratory Data: An Evaluation of Machine Learning Approaches on an Imbalanced Dataset
On the Structure of Stationary Solutions to McKean-Vlasov Equations with Applications to Noisy Transformers
ShapeX: Shapelet-Driven Post Hoc Explanations for Time Series Classification Models
Using Non-Expert Data to Robustify Imitation Learning via Offline Reinforcement Learning
Imbalanced Gradients in RL Post-Training of Multi-Task LLMs
What Makes a Good Curriculum? Disentangling the Effects of Data Ordering on LLM Mathematical Reasoning
Noise-corrected GRPO: From Noisy Rewards to Unbiased Gradients
UNO-Bench: A Unified Benchmark for Exploring the Compositional Law Between Uni-modal and Omni-modal in OmniModels
ADPO: Anchored Direct Preference Optimization
Every Step Evolves: Scaling Reinforcement Learning for Trillion-Scale Thinking Model
MIN-Merging: Merge the Important Neurons for Model Merging
When Intelligence Fails: An Empirical Study on Why LLMs Struggle with Password Cracking
From Flows to Words: Can Zero-/Few-Shot LLMs Detect Network Intrusions? A Grammar-Constrained, Calibrated Evaluation on UNSW-NB15
SimBench: Benchmarking the Ability of Large Language Models to Simulate Human Behaviors
GOOD: Training-Free Guided Diffusion Sampling for Out-of-Distribution Detection
UNDREAM: Bridging Differentiable Rendering and Photorealistic Simulation for End-to-end Adversarial Attacks
The Chameleon Nature of LLMs: Quantifying Multi-Turn Stance Instability in Search-Enabled Language Models
ESCA: Contextualizing Embodied Agents via Scene-Graph Generation
Incomplete Multi-view Clustering via Hierarchical Semantic Alignment and Cooperative Completion
Deflanderization for Game Dialogue: Balancing Character Authenticity with Task Execution in LLM-based NPCs
Evidence Without Injustice: A New Counterfactual Test for Fair Algorithms
Beyond Seeing: Evaluating Multimodal LLMs on Tool-Enabled Image Perception, Transformation, and Reasoning
Automatic Music Sample Identification with Multi-Track Contrastive Learning
DiffHeads: Differential Analysis and Inference-Time Masking of Bias Heads in Large Language Models
Training-Free In-Context Forensic Chain for Image Manipulation Detection and Localization
Uncovering Singularities in Feynman Integrals via Machine Learning
Beyond Fertility: Analyzing STRR as a Metric for Multilingual Tokenization Evaluation
Token Is All You Price
LLM4Cell: A Survey of Large Language and Agentic Models for Single-Cell Biology
IKNet: Interpretable Stock Price Prediction via Keyword-Guided Integration of News and Technical Indicators
Smartphone-based iris recognition through high-quality visible-spectrum iris image capture.V2
Tiny but Mighty: A Software-Hardware Co-Design Approach for Efficient Multimodal Inference on Battery-Powered Small Devices
Feasibility-Aware Decision-Focused Learning for Predicting Parameters in the Constraints
Unmasking Puppeteers: Leveraging Biometric Leakage to Disarm Impersonation in AI-based Videoconferencing
SpineBench: A Clinically Salient, Level-Aware Benchmark Powered by the SpineMed-450k Corpus
Holistic Order Prediction in Natural Scenes
Editable Noise Map Inversion: Encoding Target-image into Noise For High-Fidelity Image Manipulation
LUQ: Layerwise Ultra-Low Bit Quantization for Multimodal Large Language Models
Aligning LLMs for Multilingual Consistency in Enterprise Applications
Open-Vocabulary Spatio-Temporal Scene Graph for Robot Perception and Teleoperation Planning
Automatic Discovery of One Parameter Subgroups of $SO(n)$
Can Less Precise Be More Reliable? A Systematic Evaluation of Quantization's Impact on CLIP Beyond Accuracy
WolBanking77: Wolof Banking Speech Intent Classification Dataset
UniPixel: Unified Object Referring and Segmentation for Pixel-Level Visual Reasoning
Unveiling m-Sharpness Through the Structure of Stochastic Gradient Noise
EvoBrain: Dynamic Multi-Channel EEG Graph Modeling for Time-Evolving Brain Networks
BTL-UI: Blink-Think-Link Reasoning Model for GUI Agent
TreeIRL: Safe Urban Driving with Tree Search and Inverse Reinforcement Learning
Your Compiler is Backdooring Your Model: Understanding and Exploiting Compilation Inconsistency Vulnerabilities in Deep Learning Compilers
Membership Inference Attacks on Recommender System: A Survey
Reconstruction Alignment Improves Unified Multimodal Models
Deriving Transformer Architectures as Implicit Multinomial Regression
The Complexity Trap: Simple Observation Masking Is as Efficient as LLM Summarization for Agent Context Management
ClaimGen-CN: A Large-scale Chinese Dataset for Legal Claim Generation
The Role of AI in Facilitating Interdisciplinary Collaboration: Evidence from AlphaFold
Score-informed Neural Operator for Enhancing Ordering-based Causal Discovery
TaoSR1: The Thinking Model for E-commerce Relevance Search
A Data-driven ML Approach for Maximizing Performance in LLM-Adapter Serving
Beyond Ten Turns: Unlocking Long-Horizon Agentic Search with Large-Scale Asynchronous RL
CannyEdit: Selective Canny Control and Dual-Prompt Guidance for Training-Free Image Editing
The ISLab Solution to the Algonauts Challenge 2025: A Multimodal Deep Learning Approach to Brain Response Prediction
EmoSteer-TTS: Fine-Grained and Training-Free Emotion-Controllable Text-to-Speech via Activation Steering
PESTO: Real-Time Pitch Estimation with Self-supervised Transposition-equivariant Objective
BikeBench: A Bicycle Design Benchmark for Generative Models with Objectives and Constraints
Trusted Knowledge Extraction for Operations and Maintenance Intelligence
CapRecover: A Cross-Modality Feature Inversion Attack Framework on Vision Language Models
ReXGroundingCT: A 3D Chest CT Dataset for Segmentation of Findings from Free-Text Reports
DmC: Nearest Neighbor Guidance Diffusion Model for Offline Cross-domain Reinforcement Learning
Detect Any Sound: Open-Vocabulary Sound Event Detection with Multi-Modal Queries
PhysGym: Benchmarking LLMs in Interactive Physics Discovery with Controlled Priors
A Lightweight Gradient-based Causal Discovery Framework with Applications to Complex Industrial Processes
Ground-Compose-Reinforce: Grounding Language in Agentic Behaviours using Limited Data
Through the River: Understanding the Benefit of Schedule-Free Methods for Language Model Training
Context-Aware Regularization with Markovian Integration for Attention-Based Nucleotide Analysis
Vision Foundation Models as Effective Visual Tokenizers for Autoregressive Image Generation
The Cross-Lingual Cost: Retrieval Biases in RAG over Arabic-English Corpora
Unifying Re-Identification, Attribute Inference, and Data Reconstruction Risks in Differential Privacy
Agent KB: Leveraging Cross-Domain Experience for Agentic Problem Solving
OpenS2S: Advancing Fully Open-Source End-to-End Empathetic Large Speech Language Model
Rethinking and Exploring String-Based Malware Family Classification in the Era of LLMs and RAG
Deep Learning Atmospheric Models Reliably Simulate Out-of-Sample Land Heat and Cold Wave Frequencies
ARF-RLHF: Adaptive Reward-Following for RLHF through Emotion-Driven Self-Supervision and Trace-Biased Dynamic Optimization
Echo State Transformer: Attention Over Finite Memories
Reasoning as an Adaptive Defense for Safety
Curious Causality-Seeking Agents Learn Meta Causal World
DeepOmni: Towards Seamless and Smart Speech Interaction with Adaptive Modality-Specific MoE
FlightKooba: A Fast Interpretable FTP Model
Thought Anchors: Which LLM Reasoning Steps Matter?
MEXA: Towards General Multimodal Reasoning with Dynamic Multi-Expert Aggregation
Identifiability of Deep Polynomial Neural Networks
Cohort Discovery: A Survey on LLM-Assisted Clinical Trial Recruitment
Distributional Training Data Attribution: What do Influence Functions Sample?
KungfuBot: Physics-Based Humanoid Whole-Body Control for Learning Highly-Dynamic Skills
Unsupervised Document and Template Clustering using Multimodal Embeddings
BikeBench: A Bicycle Design Benchmark for Generative Models with Objectives and Constraints
Created by
Haebom
Authors
Lyle Regenwetter, Yazan Abu Obaideh, Fabien Chiotti, Ioanna Lykourentzou, Faez Ahmed
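Overview
BikeBench is an engineering design benchmark for evaluating generative models on problems with multiple real-world objectives and constraints. It assesses how well generative AI understands physical laws, human guidelines, and hard constraints. The benchmark evaluates AI models' ability to generate bicycle designs, measuring not only whether the designs resemble the dataset but also whether they meet specific performance objectives and constraints. BikeBench quantifies a range of human-centered and multiphysics performance characteristics, including aerodynamics, ergonomics, structural mechanics, human-rated usability, and similarity to subjective text or image prompts. To support the benchmark, it provides datasets of simulation results, a dataset of 10,000 human-rated bicycle assessments, and a synthetically generated dataset of 1.6 million designs, each with parametric, CAD/XML, SVG, and PNG representations. BikeBench is set up to evaluate tabular generative models, large language models (LLMs), design optimization, and hybrid algorithms side by side. In the reported experiments, LLMs and tabular generative models fall short of hybrid GenAI+optimization algorithms in design quality, constraint satisfaction, and similarity scores, suggesting significant room for improvement.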
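As a rough illustration of what scoring generated designs against both constraints and objectives can look like, here is a minimal Python sketch. It is not BikeBench's code or API: the `Design` representation, the `evaluate` function, and the parameter names `wheel_diameter_mm` and `frame_mass_kg` are all hypothetical, chosen only to show the idea of reporting a constraint-satisfaction rate alongside objective values.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical design representation: a flat mapping of named parameters.
Design = Dict[str, float]


@dataclass
class Evaluation:
    feasible_rate: float               # fraction of designs meeting every constraint
    mean_objectives: Dict[str, float]  # mean of each objective over feasible designs


def evaluate(
    designs: List[Design],
    constraints: List[Callable[[Design], bool]],
    objectives: Dict[str, Callable[[Design], float]],
) -> Evaluation:
    """Score a batch of designs on constraint satisfaction and objectives."""
    # A design is feasible only if it passes every hard constraint.
    feasible = [d for d in designs if all(check(d) for check in constraints)]
    rate = len(feasible) / len(designs) if designs else 0.0
    means = {
        name: (sum(fn(d) for d in feasible) / len(feasible)) if feasible else float("nan")
        for name, fn in objectives.items()
    }
    return Evaluation(feasible_rate=rate, mean_objectives=means)


# Toy data: parameter names and thresholds are made up for illustration.
designs = [
    {"wheel_diameter_mm": 700.0, "frame_mass_kg": 2.0},
    {"wheel_diameter_mm": 100.0, "frame_mass_kg": 1.0},  # violates the constraint
    {"wheel_diameter_mm": 650.0, "frame_mass_kg": 4.0},
]
constraints = [lambda d: d["wheel_diameter_mm"] >= 400.0]
objectives = {"frame_mass_kg": lambda d: d["frame_mass_kg"]}

result = evaluate(designs, constraints, objectives)
print(result)
```

Reporting feasibility separately from objective values keeps a model from looking good simply by producing low-mass designs that break a hard constraint.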
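Takeaways, Limitations
• Takeaways:
◦ It is the first benchmark that can drive progress in generative AI for constrained, multi-objective engineering design problems.
◦ It shows that LLMs and tabular generative models trail hybrid GenAI+optimization algorithms in design quality, constraint satisfaction, and similarity scores.
◦ It provides a standardized platform for evaluating and comparing a variety of AI models.
• Limitations:
◦ The paper does not explicitly state its limitations.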