Daily Arxiv

This page organizes papers on artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright in each paper belongs to its authors and their institutions; please cite the source when sharing.

Small or Large? Zero-Shot or Finetuned? Guiding Language Model Choice for Specialized Applications in Healthcare

Created by
  • Haebom

Authors

Lovedeep Gondara, Jonathan Simkin, Graham Sayle, Shebnum Devji, Gregory Arbour, Raymond Ng

Outline

This study investigates four questions that guide language model selection for specialized applications: whether fine-tuning is necessary or zero-shot use of pretrained models suffices, whether domain-specific pretraining offers benefits over general pretraining, whether additional domain-specific pretraining adds value, and whether small language models (SLMs) remain relevant relative to large language models (LLMs) for specific tasks. Using electronic pathology reports from the British Columbia Cancer Registry (BCCR), three classification scenarios of varying difficulty and data size were evaluated, with several SLMs and one LLM as candidate models. The SLMs were evaluated both zero-shot and after fine-tuning, while the LLM was evaluated zero-shot only.

Fine-tuning significantly improved SLM performance over zero-shot results in all scenarios. The zero-shot LLM outperformed the zero-shot SLMs but consistently lagged behind the fine-tuned SLMs. Domain-specific SLMs outperformed general SLMs after fine-tuning, particularly on the harder tasks. Additional domain-specific pretraining provided only marginal benefit on the easy task but significant improvements on the complex, data-poor tasks.

In conclusion, fine-tuning SLMs on domain-specific data is crucial and can outperform zero-shot LLMs on targeted classification tasks. Pretraining on domain-relevant or domain-specific data provides additional benefits, especially for complex problems or when fine-tuning data is limited. While LLMs offer powerful zero-shot capabilities, they did not match the performance of properly fine-tuned SLMs on the tasks studied here. Even in the LLM era, SLMs remain relevant and efficient, and can offer a better performance-to-resource balance than LLMs.
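As a rough illustration of the fine-tuned-SLM setup described above (this is a minimal sketch, not the authors' pipeline: the checkpoint name, label set, and toy data are assumptions for demonstration, and the BCCR reports are not publicly available), fine-tuning a small pretrained encoder for report classification can look like this with Hugging Face Transformers:

```python
# Minimal sketch of fine-tuning a small language model (SLM) for
# pathology-report classification. All data and names below are illustrative.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# A general-purpose checkpoint; a domain-relevant or domain-specific
# checkpoint would be substituted in the corresponding conditions.
model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Toy stand-in for labeled report text; real registry data would go here.
train_ds = Dataset.from_dict({
    "text": ["specimen shows invasive carcinoma of the breast",
             "benign fibrous tissue, no evidence of malignancy"],
    "label": [1, 0],
})

def tokenize(batch):
    # Convert report text to model inputs (input_ids, attention_mask).
    return tokenizer(batch["text"], truncation=True, max_length=256,
                     padding="max_length")

train_ds = train_ds.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="slm-finetuned",
                           num_train_epochs=3,
                           per_device_train_batch_size=8),
    train_dataset=train_ds,
)
trainer.train()
```

For the zero-shot comparisons, no parameters are updated: the same report text would instead be placed in a classification prompt and sent to the pretrained SLM or LLM directly.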

Takeaways, Limitations

Takeaways:
Fine-tuned SLMs can outperform zero-shot LLMs on domain-specific classification tasks.
Domain-relevant or domain-specific pre-training contributes to improved performance, especially on difficult tasks or when fine-tuning data is scarce.
SLMs remain useful even in the LLM era and can offer a better performance-to-resource balance than LLMs.
Limitations:
The dataset used in the study was limited to electronic pathology reports from the British Columbia Cancer Registry (BCCR), which may limit generalizability.
Only one LLM was evaluated, so comparisons with other LLMs are lacking.
Further research is needed to draw generalized conclusions across different types of tasks.