Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content is summarized by Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

CRABS: A syntactic-semantic pincer strategy for bounding LLM interpretation of Python notebooks

Created by
  • Haebom

Author

Meng Li, Timothy M. McPhillips, Dingmin Wang, Shin-Rong Tsai, and Bertram Ludäscher

Outline

This paper argues that understanding the information flows and computations that make up data science and machine learning Python notebooks is essential for evaluating, reusing, and adapting them to new tasks. Re-executing and examining a notebook directly is often impractical because its data and software dependencies are hard to reproduce. Large language models (LLMs) pretrained on large codebases have been shown to understand code without executing it, but the authors observe that realistic notebooks can still defeat them through hallucinations and long input contexts.

To address these issues, the paper proposes a notebook understanding task that produces a notebook's information flow graph and the corresponding cell execution dependency graph, and demonstrates a "pincer" strategy in which limited syntactic analysis assists the LLM toward complete notebook comprehension. The Capture and Resolve Assisted Bounding Strategy (CRABS) uses shallow parsing and abstract syntax tree (AST) analysis to pin the correct interpretation of a notebook between lower-bound and upper-bound estimates of the inter-cell I/O sets (the variables that flow into and out of each cell). It then resolves the remaining ambiguities with cell-by-cell zero-shot LLM queries that identify the actual data inputs and outputs of each cell.

The approach is evaluated on an annotated dataset of 50 representative, highly up-voted Kaggle notebooks covering 3,454 actual cell inputs and outputs. Syntactic analysis alone settles most of these I/Os; the LLM correctly resolves 1,397 (98%) of the 1,425 remaining ambiguities. Across the 50 notebooks, CRABS achieves an average F1 score of 98% for identifying inter-cell information flows and 99% for identifying cell execution dependencies.
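To make the syntactic "capture" step concrete, below is a minimal sketch (not the authors' implementation) of per-cell AST analysis in Python. The helper names cell_bounds and flow_edges are invented for illustration; the gap between the lower and upper bound is the kind of ambiguity CRABS hands off to the LLM.

import ast
import builtins

def cell_bounds(source: str):
    """Per-cell syntactic I/O estimate: returns (lower, upper, outputs).
    `lower` holds names a cell definitely reads from earlier cells, `upper`
    all names it may read; the gap (e.g. from `x = x + 1`) is ambiguous."""
    loads, stores = set(), set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Load):
                loads.add(node.id)
            else:                          # Store or Del context
                stores.add(node.id)
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef,
                               ast.ClassDef)):
            stores.add(node.name)          # function/class defs bind names too
        elif isinstance(node, ast.alias):
            stores.add((node.asname or node.name).split(".")[0])  # imports
    loads -= set(dir(builtins))            # ignore built-ins such as print, len
    return loads - stores, loads, stores

def flow_edges(cells):
    """Candidate inter-cell flow edges (writer_cell, reader_cell, variable):
    cell j reads a name most recently written by an earlier cell i."""
    last_writer, edges = {}, set()
    for j, src in enumerate(cells):
        _, upper, stores = cell_bounds(src)
        for name in upper:
            if name in last_writer:
                edges.add((last_writer[name], j, name))
        for name in stores:
            last_writer[name] = j
    return edges

cells = ["import pandas as pd\ndf = pd.DataFrame({'a': [1, 2]})",
         "df = df.dropna()",
         "print(df)"]
print(sorted(flow_edges(cells)))           # [(0, 1, 'df'), (1, 2, 'df')]

Cases this purely syntactic pass cannot settle, such as a name that is both read and written in one cell, or objects possibly mutated through function calls, are exactly what CRABS's per-cell zero-shot LLM queries resolve.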

Takeaways, Limitations

Takeaways:
  • The CRABS strategy, combining limited syntactic analysis with an LLM, can effectively recover the information flows and execution dependencies of Python notebooks.
  • Notebook understanding can be performed without execution at high accuracy (98-99% F1; see the short F1 illustration after this list).
  • This opens new possibilities for evaluating, reusing, and adapting data science and machine learning notebooks.
Limitations:
  • Evaluation is limited to a single annotated dataset of 50 Kaggle notebooks, so generalizability requires further study.
  • Additional evaluation on more varied Python notebooks and more complex code is needed.
  • CRABS may not fully eliminate LLM hallucinations, so a more robust safeguard may still be needed.
  • A more detailed analysis of the computational cost and efficiency of CRABS is needed.
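The reported 98-99% figures are ordinary F1 scores computed over the edges of the recovered graphs against the human annotations. A minimal illustration, with edge sets invented here for the example:

def f1(predicted: set, gold: set) -> float:
    tp = len(predicted & gold)             # correctly recovered edges
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)

gold = {(0, 1, "df"), (1, 2, "df"), (0, 2, "model")}
pred = {(0, 1, "df"), (1, 2, "df")}        # one true edge missed
print(round(f1(pred, gold), 2))            # 0.8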