Semantic search
yeji Kim
ENHANCING KNOWLEDGE RETRIEVAL WITH IN-CONTEXT LEARNING AND SEMANTIC SEARCH THROUGH GENERATIVE AI
Method 1: Generative text retrieval (GTR)
Embed each chunk (e.g., with word2vec) → build a vector database.
Compute similarity against the query embedding → take the closest chunk (minimal sketch below).
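To make this step concrete, here is a minimal sketch of embedding-based chunk retrieval, assuming the sentence-transformers package in place of word2vec and a plain in-memory array instead of a real vector database; the model name, chunks, and retrieve function are all illustrative, not from the paper.

```python
# Minimal GTR-style retrieval sketch (illustrative, not the paper's code).
# The notes mention word2vec; any model that maps text to a fixed-size vector
# works the same way, so a sentence-transformers model is used here instead.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model choice

chunks = [
    "The vector database stores one embedding per document chunk.",
    "Cosine similarity ranks chunks against the query embedding.",
    "The top-ranked chunk is passed to the LLM as context.",
]

# 1) Embed each chunk and keep the vectors as an in-memory "vector database".
chunk_vecs = model.encode(chunks, normalize_embeddings=True)

def retrieve(query: str, k: int = 1) -> list[str]:
    """Embed the query and return the k most similar chunks."""
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = chunk_vecs @ q           # cosine similarity (vectors are normalized)
    top = np.argsort(-scores)[:k]
    return [chunks[i] for i in top]

print(retrieve("How are chunks ranked?"))
```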
Generative tabular text retrieval (GTR-T)
First, fetch the database tables and their metadata and save them as .csv.
Embed the query to find the relevant table(s).
Hand those tables to the LLM so it generates an appropriate SQL query (sketch below).
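A rough sketch of the GTR-T flow as I read it: pick a table, then ask the LLM for SQL over it. The openai client usage is real, but the model name, prompt wording, table schemas, and the simplified pick_table stand-in (which should really be the embedding-similarity step) are my own assumptions.

```python
# GTR-T sketch: choose a relevant table, then ask an LLM for a SQL query over it.
# Prompt text, model name, and table contents are assumptions for illustration.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Table schemas previously exported to .csv; shown inline here for brevity.
tables = {
    "sales.csv": "columns: order_id, customer_id, amount, order_date",
    "customers.csv": "columns: customer_id, name, country, signup_date",
}

def pick_table(query: str) -> str:
    """Stand-in for the embedding-similarity step that finds the relevant table."""
    # In the real pipeline this would compare the query embedding against
    # embeddings of each table's schema/metadata.
    return "sales.csv" if "order" in query.lower() or "amount" in query.lower() else "customers.csv"

def query_to_sql(query: str) -> str:
    table = pick_table(query)
    prompt = (
        f"Table {table} has {tables[table]}.\n"
        f"Write a SQL query that answers: {query}\n"
        "Return only the SQL."
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

print(query_to_sql("Total order amount per customer in 2023"))
```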
Olio: A Semantic Search Interface for Data Repositories
Intro
Q&A, exploratory search, design search.
Visualization using Tableau → provides thumbnails?
Related works
Semantic web search system
keyword-based (structured query languages) or NL-based
keyword based
QUERIX - Stanford CoreNLP parser + WordNet
Olio classifies intent into trends, location, groupings, aggregations, filters, etc. (toy sketch after this note).
It seems to split intent into specific groups. Not sure whether it fits what I'm trying to do, so I stopped reading here for now.
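Toy sketch, just to make the intent-bucketing idea concrete for myself; the keyword lists are invented and have nothing to do with Olio's actual implementation.

```python
# Toy intent bucketing in the spirit of Olio's categories (trends, location,
# groupings, aggregations, filters). Keyword lists are made up for illustration.
INTENT_KEYWORDS = {
    "trends": ["over time", "trend", "per year", "growth"],
    "location": ["near", "country", "city", "region"],
    "groupings": ["by", "per", "grouped"],
    "aggregations": ["total", "average", "sum", "count", "max"],
    "filters": ["only", "where", "excluding", "greater than"],
}

def classify_intents(query: str) -> list[str]:
    """Return every intent group whose keywords appear in the query."""
    q = query.lower()
    return [intent for intent, kws in INTENT_KEYWORDS.items()
            if any(kw in q for kw in kws)]

print(classify_intents("average sales by region over time"))
# -> ['trends', 'location', 'groupings', 'aggregations'] (rough, keyword-based)
```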
Know Where to Go: Make LLM a Relevant, Responsible, and Trustworthy Searcher
Intro
Goals
relevant - text that is related to the query,
trustworthy - pulled from reliable sources,
responsible - extracting trustworthy evidence from them.
3 integral modules
Intent-aware generator - builds a direct link between the query and online sources...?
Evidence-sensitive validator - analyzes source credibility with web data → extracts evidence
Multi-strategy supported optimizer - boosts reliability using the LLM's self-critique ability and its web analysis capability (rough skeleton after this list)
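Rough skeleton of how I picture the three modules fitting together; only the module roles come from the paper, every function body here is a placeholder.

```python
# Skeleton of the generator -> validator -> optimizer flow as I understand it.
# All bodies are placeholders; only the module roles come from the paper.

def generate_sources(query: str) -> list[str]:
    """Intent-aware generator: map the query to candidate online sources (URLs)."""
    return ["https://example.org/doc1", "https://example.org/doc2"]  # placeholder

def validate(query: str, sources: list[str]) -> list[tuple[str, float]]:
    """Evidence-sensitive validator: score each source's reliability, keep evidence."""
    return [(src, 0.5) for src in sources]  # placeholder scores

def optimize(query: str, evidence: list[tuple[str, float]]) -> list[tuple[str, float]]:
    """Multi-strategy optimizer: refine via self-critique and web analysis."""
    return sorted(evidence, key=lambda e: e[1], reverse=True)  # placeholder

def answer(query: str) -> list[tuple[str, float]]:
    sources = generate_sources(query)
    evidence = validate(query, sources)
    return optimize(query, evidence)
```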
contributions
Proposes an architecture made up of a generator, validator, and optimizer
multi-strategy fusion
comprehensive evaluation framework
Related works
LLMs
Domain-specific models
Alpaca, LLaMA-Adapter, Vicuna, Baize, Toolformer, Gorilla
Might be worth looking at how these were trained.
Retrieval-augmented models
Query2Doc - fabricates pseudo documents → through training, raises the probability of generating relevant text.
LLM-URL - has the LLM directly emit relevant URLs.
PRP - evaluates query–document pairs.
ALCE - evaluates the output
FLARE - proactive prediction with multiple search-engine queries
Generative information retrieval systems
WebGPT, WebGLM
Human feedback and AI feedback
RLHF, InstructGPT
Fine-grained HF -
PRM -
AlpacaFarm - imitates human feedback
RL-CAI, PD-SA - minimal supervised signals
Methodology
Overview
Retriever, generator, scorer
retrieval/generation
direct association between queries and online sources
Generator
Guides the LLM to produce trustworthy sources
2 sub-modules
intent-based query expansion
Look into Appendix B!!
multi-level topic generation strategy
10 broad thematic categories → 100 sub themes.
formulate intent recognition and query expansion instruction
constrained online source generation
Gradual leveraging is the key (slowly narrowing the scope?).
First ask for URLs related to the query (online source generation),
then keep only those URLs' domains and ask again restricted to those domains (generation constraint) - sketch below.
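A hedged sketch of that two-step prompting flow; the prompts, model name, and URL parsing are my guesses at the idea, not the paper's actual instructions.

```python
# Two-step "constrained online source generation" sketch.
# Step 1: ask for URLs related to the query (online source generation).
# Step 2: keep only their domains and ask again restricted to those domains
#         (generation constraint). Prompts and model name are assumptions.
from urllib.parse import urlparse
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative choice
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def constrained_sources(query: str) -> str:
    # Step 1: unconstrained URL generation.
    first = ask(f"List URLs of web pages likely to answer: {query}")
    domains = {urlparse(u).netloc for u in first.split() if u.startswith("http")}
    # Step 2: regenerate, constrained to the domains found in step 1.
    return ask(
        f"Only using pages on these domains: {', '.join(sorted(domains))}\n"
        f"list URLs likely to answer: {query}"
    )

print(constrained_sources("What is retrieval-augmented generation?"))
```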
Self-verification
Validator
2 evidence retrieval strategies; a score-only strategy.
Experiment
Baselines - New Bing, Perplexity.ai, WebGPT, WebGLM