Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

AURA: Agent for Understanding, Reasoning, and Automated Tool Use in Voice-Driven Tasks

Created by
  • Haebom

Authors

Leander Melroy Maben, Gayathri Ganesh Lakshmy, Srijith Radhakrishnan, Siddhant Arora, Shinji Watanabe

Outline

AURA is an open-source, voice-centric AI assistant that performs complex, goal-oriented tasks using tools such as calendar scheduling, contact lookup, web search, and email. It is built as a cascaded pipeline that combines automatic speech recognition (ASR), a large language model (LLM), and text-to-speech (TTS), and its modular architecture allows new tools to be integrated easily using natural language prompts and action classes. On VoiceBench, it scored 92.75% on OpenBookQA, outperforming existing open-source systems and approaching GPT-4, and 4.39 on AlpacaEval, which is competitive with other open-source systems. In human evaluation, it achieved a 90% task success rate on complex multi-turn speech tasks.
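To make the "action classes plus natural language prompts" idea concrete, here is a minimal sketch of how a tool registry could sit inside an ASR, LLM, and TTS cascade. It is not AURA's actual code: the names `Action`, `ToolRegistry`, `fake_asr`, `fake_llm`, `fake_tts`, and `run_turn` are all hypothetical, and the three model stages are replaced by trivial stand-ins so the example runs without any speech or language model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class Action:
    """Hypothetical action class: a tool name, a natural-language
    description for the LLM prompt, and a handler function."""
    name: str
    description: str
    run: Callable[[str], str]


class ToolRegistry:
    """Hypothetical registry; adding a tool is a single register() call."""

    def __init__(self) -> None:
        self._actions: Dict[str, Action] = {}

    def register(self, action: Action) -> None:
        self._actions[action.name] = action

    def prompt(self) -> str:
        # Tool list that would be placed in the LLM prompt.
        return "\n".join(
            f"- {a.name}: {a.description}" for a in self._actions.values()
        )

    def dispatch(self, name: str, argument: str) -> str:
        if name not in self._actions:
            return f"Unknown tool: {name}"
        return self._actions[name].run(argument)


def fake_asr(audio: bytes) -> str:
    # Stand-in for speech recognition: treats the bytes as UTF-8 text.
    return audio.decode("utf-8")


def fake_llm(transcript: str, tool_prompt: str) -> Tuple[str, str]:
    # Stand-in for the LLM: chooses a tool with a keyword rule.
    # A real system would send transcript and tool_prompt to a model.
    words = transcript.lower().split()
    if "contact" in words:
        return "contact_lookup", words[-1]
    return "web_search", transcript


def fake_tts(text: str) -> bytes:
    # Stand-in for speech synthesis: returns the text as bytes.
    return text.encode("utf-8")


def run_turn(audio: bytes, registry: ToolRegistry) -> bytes:
    """One conversational turn through the cascade: ASR, LLM, tool, TTS."""
    transcript = fake_asr(audio)
    tool_name, argument = fake_llm(transcript, registry.prompt())
    result = registry.dispatch(tool_name, argument)
    return fake_tts(result)


def build_demo_registry() -> ToolRegistry:
    # Illustrative data only; not taken from the paper.
    contacts = {"alice": "alice@example.com"}
    registry = ToolRegistry()
    registry.register(Action(
        "contact_lookup",
        "Look up a contact's email address by name.",
        lambda name: contacts.get(name, "no contact found"),
    ))
    registry.register(Action(
        "web_search",
        "Search the web for a query.",
        lambda query: f"searched for: {query}",
    ))
    return registry
```

In this sketch, supporting a new tool means writing one handler and one description string; the pipeline function itself does not change, which is the property the modular design is meant to provide.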

Takeaways, Limitations

Takeaways:
It is the first system to integrate complex multi-turn voice conversations and tool use in an open-source environment.
Its modular design allows easy integration of new tools.
It outperforms existing open-source systems on VoiceBench.
It achieves a high task success rate (90%) in human evaluation.
Limitations:
The paper does not explicitly discuss its limitations, and it gives few details on the systems used for performance comparison.
Although the system is open source, there is little information on the resources (such as computing power) required to deploy and use it.
Stability and potential problems during long-term use have not been verified.