Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

GeoGPT-RAG Technical Report

Created by
  • Haebom

Authors

Fei Huang, Fan Wu, Zeqing Zhang, Qihao Wang, Long Zhang, Grant Michael Boquet, Hongyang Chen

Overview

GeoGPT is an open large language model system built to advance geoscience research. To strengthen its domain-specific capabilities, it incorporates Retrieval Augmented Generation (RAG), which supplements the model's output with relevant information retrieved from external knowledge sources. GeoGPT uses RAG to draw on the GeoGPT library, a specialized corpus curated for geoscience content, to generate accurate, context-specific answers. Users can also upload their own publication lists to create a personalized knowledge base, which GeoGPT then searches when responding. To further improve retrieval quality and domain alignment, the authors fine-tuned both the embedding model and the ranking model that scores retrieved passages by their relevance to the query. According to the report, these improvements tailor RAG to geoscience applications and significantly improve the system's ability to deliver accurate and reliable output. GeoGPT reflects a strong commitment to open science by emphasizing collaboration, transparency, and community-driven development. As part of this commitment, the authors have open-sourced two core RAG components, GeoEmbedding and GeoReranker, to provide powerful and accessible AI tools for geoscientists, researchers, and professionals worldwide.
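The retrieve-then-rerank flow described above can be illustrated with a minimal Python sketch. It is not GeoGPT's implementation: the bag-of-words `embed` function and the term-overlap `rerank_score` are simple stand-ins for a learned embedding model and a learned reranker such as GeoEmbedding and GeoReranker, and every function name, the toy corpus, and the prompt wording are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text):
    """Stand-in embedding (assumption): a sparse bag-of-words count vector."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, corpus, k=3):
    """Stage 1: return the k passages most similar to the query."""
    q = embed(query)
    return sorted(corpus, key=lambda p: cosine(q, embed(p)), reverse=True)[:k]


def rerank_score(query, passage):
    """Stand-in reranker (assumption): share of distinct query terms found in the passage."""
    q_terms = set(tokenize(query))
    p_terms = set(tokenize(passage))
    return len(q_terms & p_terms) / len(q_terms) if q_terms else 0.0


def rerank(query, passages, top_n=2):
    """Stage 2: reorder the candidates by reranker score and keep the best top_n."""
    return sorted(passages, key=lambda p: rerank_score(query, p), reverse=True)[:top_n]


def build_prompt(query, passages):
    """Stage 3: place the selected passages in front of the question for the language model."""
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


def rag_prompt(query, corpus, k=3, top_n=2):
    """Run retrieval, reranking, and prompt construction in sequence."""
    candidates = retrieve(query, corpus, k)
    return build_prompt(query, rerank(query, candidates, top_n))


# Toy corpus (hypothetical); a personalized knowledge base would simply add
# the user's own passages to this list.
corpus = [
    "Basalt is a fine-grained volcanic rock formed from rapidly cooled lava.",
    "Subduction zones occur where one tectonic plate sinks beneath another.",
    "Granite is a coarse-grained intrusive rock rich in quartz and feldspar.",
    "Sedimentary basins accumulate layers of sediment over geologic time.",
]
query = "How is basalt rock formed?"

if __name__ == "__main__":
    print(rag_prompt(query, corpus))
```

Splitting the work into a cheap first-stage retriever and a more careful second-stage scorer is the usual reason for having both an embedding model and a reranker: the first narrows a large corpus to a few candidates, and the second orders those candidates more precisely before they reach the language model.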

Takeaways, Limitations

Takeaways:
Provides an open large language model system specialized for geoscience.
Delivers accurate, reliable answers grounded in retrieved sources through RAG.
Lets users build a personalized knowledge base from their own uploaded publications.
Contributes to geoscience research by open-sourcing the core RAG components GeoEmbedding and GeoReranker.
Improves access to information by drawing on external knowledge sources.
Limitations:
The scope and quality of the GeoGPT library's content are not stated explicitly.
The fine-tuning process and performance evaluation are not described in detail.
No information is given on ongoing maintenance or update plans for the open-source components.
The model's biases and limitations are not discussed.