haebom
Daily Arxiv
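This page collects artificial intelligence papers published around the world.
Summaries are generated with Google Gemini, and the page is run on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions; when sharing, simply credit the source.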
Enhancing Password Security Through a High-Accuracy Scoring Framework Using Random Forests
Potent but Stealthy: Rethink Profile Pollution against Sequential Recommendation via Bi-level Constrained Reinforcement Paradigm
Leveraging Large Language Models for Use Case Model Generation from Software Requirements
Enhancing PIBT via Multi-Action Operations
PressTrack-HMR: Pressure-Based Top-Down Multi-Person Global Human Mesh Recovery
Thinking Forward and Backward: Multi-Objective Reinforcement Learning for Retrieval-Augmented Reasoning
PAN: A World Model for General, Interactable, and Long-Horizon World Simulation
Learning the Basis: A Kolmogorov-Arnold Network Approach Embedding Green's Function Priors
Automatic Grid Updates for Kolmogorov-Arnold Networks using Layer Histograms
Remodeling Semantic Relationships in Vision-Language Fine-Tuning
Biologically-Informed Hybrid Membership Inference Attacks on Generative Genomic Models
MVU-Eval: Towards Multi-Video Understanding Evaluation for Multimodal LLMs
Explainable Cross-Disease Reasoning for Cardiovascular Risk Assessment from LDCT
Retrieval-Augmented Generation in Medicine: A Scoping Review of Technical Implementations, Clinical Applications, and Ethical Considerations
Personalized Chain-of-Thought Summarization of Financial News for Investor Decision Support
Search Is Not Retrieval: Decoupling Semantic Matching from Contextual Assembly in RAG
Does AI-Assisted Coding Deliver? A Difference-in-Differences Study of Cursor's Impact on Software Projects
CORE - A Cell-Level Coarse-to-Fine Image Registration Engine for Multi-stain Image Alignment
Text to Robotic Assembly of Multi Component Objects using 3D Generative AI and Vision Language Models
WOD-E2E: Waymo Open Dataset for End-to-End Driving in Challenging Long-tail Scenarios
Xiaoice: Training-Free Video Understanding via Self-Supervised Spatio-Temporal Clustering of Semantic Features
A Critical Review of the Need for Knowledge-Centric Evaluation of Quranic Recitation
The Markovian Thinker: Architecture-Agnostic Linear Scaling of Reasoning
Artificial-Intelligence Grading Assistance for Handwritten Components of a Calculus Exam
Computing Wasserstein Barycenters through Gradient Flows
Enhancing the development of Cherenkov Telescope Array control software with Large Language Models
GPT and Prejudice: A Sparse Approach to Understanding Learned Representations in Large Language Models
One-Shot Multi-Label Causal Discovery in High-Dimensional Event Sequences
FHIR-AgentBench: Benchmarking LLM Agents for Realistic Interoperable EHR Question Answering
Towards Practical Multi-label Causal Discovery in High-Dimensional Event Sequences via One-Shot Graph Aggregation
Inference Offloading for Cost-Sensitive Binary Classification at the Edge
Collapse of Irrelevant Representations (CIR) Ensures Robust and Non-Disruptive LLM Unlearning
Zero-Shot Referring Expression Comprehension via Vison-Language True/False Verification
Retrieval-Augmented Generation for Reliable Interpretation of Radio Regulations
Dual-Mode Deep Anomaly Detection for Medical Manufacturing: Structural Similarity and Feature Distance
ManipDreamer3D : Synthesizing Plausible Robotic Manipulation Video with Occupancy-aware 3D Trajectory
ChronoGraph: A Real-World Graph-Based Multivariate Time Series Dataset
Bine Trees: Enhancing Collective Operations by Optimizing Communication Locality
EcomMMMU: Strategic Utilization of Visuals for Robust Multimodal E-commerce Models
Improving Pre-Trained Vision-Language-Action Policies with Model-Based Search
Beyond Frequency: Seeing Subtle Cues Through the Lens of Spatial Decomposition for Fine-Grained Visual Classification
Test-Time Reinforcement Learning for GUI Grounding via Region Consistency
Towards Embodied Agentic AI: Review and Classification of LLM- and VLM-Driven Robot Autonomy and Interaction
Application-Specific Component-Aware Structured Pruning of Deep Neural Networks in Control via Soft Coefficient Optimization
Cameras as Relative Positional Encoding
The Prompt War: How AI Decides on a Military Intervention
xLSTMAD: A Powerful xLSTM-based Method for Anomaly Detection
Why Do Open-Source LLMs Struggle with Data Analysis? A Systematic Empirical Study
Interpretable and Granular Video-Based Quantification of Motor Characteristics from the Finger Tapping Test in Parkinson Disease
Feedback-MPPI: Fast Sampling-Based MPC via Rollout Differentiation -- Adios low-level controllers
VFEFL: Privacy-Preserving Federated Learning against Malicious Clients via Verifiable Functional Encryption
Understanding Human-AI Trust in Education
Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning
Scaling Textual Gradients via Sampling-Based Momentum
Caption This, Reason That: VLMs Caught in the Middle
BroadGen: A Framework for Generating Effective and Efficient Advertiser Broad Match Keyphrase Recommendations
Revisiting Residual Connections: Orthogonal Updates for Stable and Efficient Deep Networks
Scalable Quantum State Preparation via Large-Language-Model-Driven Discovery
OODTE: A Differential Testing Engine for the ONNX Optimizer
Constructing an Optimal Behavior Basis for the Option Keyboard
Bridging LMS and generative AI: dynamic course content integration (DCCI) for enhancing student satisfaction and engagement via the ask ME assistant
ForAug: Recombining Foregrounds and Backgrounds to Improve Vision Transformer Training with Bias Mitigation
Feature-EndoGaussian: Feature Distilled Gaussian Splatting in Surgical Deformable Scene Reconstruction
Abn-BLIP: Abnormality-aligned Bootstrapping Language-Image Pre-training for Pulmonary Embolism Diagnosis and Report Generation from CTPA
FactReasoner: A Probabilistic Approach to Long-Form Factuality Assessment for Large Language Models
MMTEB: Massive Multilingual Text Embedding Benchmark
Enhanced Structured Lasso Pruning with Class-wise Information
CoAT: Chain-of-Associated-Thoughts Framework for Enhancing Large Language Models Reasoning
Graph of Attacks with Pruning: Optimizing Stealthy Jailbreak Prompt Generation for Enhanced LLM Content Moderation
Siren: A Learning-Based Multi-Turn Attack Framework for Simulating Real-World Human Jailbreak Behaviors
Enhanced Suicidal Ideation Detection from Social Media Using a CNN-BiLSTM Hybrid Model
Interpretable Neural ODEs for Gene Regulatory Network Discovery under Perturbations
Reducing the Scope of Language Models
Matryoshka Pilot: Learning to Drive Black-Box LLMs with LLMs
Captions Speak Louder than Images: Generalizing Foundation Models for E-commerce from High-quality Multimodal Instruction Data
Multi-Turn Interactions for Text-to-SQL with Large Language Models
Differentiating between human-written and AI-generated texts using linguistic features automatically extracted from an online computational tool
Spikingformer: A Key Foundation Model for Spiking Neural Networks
SOM Directions are Better than One: Multi-Directional Refusal Suppression in Language Models
An Efficient Training Pipeline for Reasoning Graphical User Interface Agents
National Institute on Aging PREPARE Challenge: Early Detection of Cognitive Impairment Using Speech -- The SpeechCARE Solution
Information Capacity: Evaluating the Efficiency of Large Language Models via Text Compression
Enhancing Logical Expressiveness in Graph Neural Networks via Path-Neighbor Aggregation
WaterMod: Modular Token-Rank Partitioning for Probability-Balanced LLM Watermarking
Green AI: A systematic review and meta-analysis of its definitions, lifecycle models, hardware and measurement attempts
GHOST: Solving the Traveling Salesman Problem on Graphs of Convex Sets
A Brain Cell Type Resource Created by Large Language Models and a Multi-Agent AI System for Collaborative Community Annotation
LiveResearchBench: A Live Benchmark for User-Centric Deep Research in the Wild
Ax-Prover: A Deep Reasoning Agentic Framework for Theorem Proving in Mathematics and Quantum Physics
PsychCounsel-Bench: Evaluating the Psychology Intelligence of Large Language Models
DOoM: Difficult Olympiads of Math
From Capabilities to Performance: Evaluating Key Functional Properties of LLM Architectures in Penetration Testing
Planning Agents on an Ego-Trip: Leveraging Hybrid Ego-Graph Ensembles for Improved Tool Retrieval in Enterprise Task Planning
PITA: Preference-Guided Inference-Time Alignment for LLM Post-Training
Small Models Struggle to Learn from Strong Reasoners
Unlocking Efficient Vehicle Dynamics Modeling via Analytic World Models
Enhancing Conflict Resolution in Language Models via Abstract Argumentation
Discussion Graph Semantics of First-Order Logic with Equality for Reasoning about Discussion and Argumentation
A Comprehensive Survey on Multi-modal Conversational Emotion Recognition with Deep Learning
Black-Box On-Policy Distillation of Large Language Models
On Zero-Shot Reinforcement Learning
Created by
Haebom
Author
Scott Jeen
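Summary of the paper on zero-shot reinforcement learning
Overview
This paper addresses zero-shot reinforcement learning (zero-shot RL) methods that can be applied to real-world problems. Zero-shot RL aims to generalize to new tasks or domains without any training on them. The paper presents methods for handling the constraints of real-world data (data quality, observability, and data availability), points out the limitations of existing methods, and proposes new techniques to improve on them.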
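Implications and Limitations
• Implications:
◦ Emphasizes the importance of developing zero-shot RL methods that account for real-world constraints.
◦ Points out the limitations of existing methods and presents new techniques to address them.
◦ Shows that RL methods can be developed to contribute to solving real-world problems.
• Limitations:
◦ For detailed explanations of the specific methods and techniques, refer to the paper itself.
◦ Further research is needed on the performance and generalization ability of the proposed methods.
◦ Additional experiments and validation are needed before the methods can be applied to real-world problems.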