GPT fine-tuning
yeji Kim
Fine-tuning GPT-4o
1.
Prepare the training data in JSONL format.
Legacy prompt/completion format (only for older completion models such as babbage-002 and davinci-002):
{"prompt": "<prompt text>", "completion": "<ideal generated text>"}
Chat format (required for chat models such as gpt-3.5-turbo and gpt-4o). Each line of the file is one complete training example, with no blank lines in between:
{"messages": [{"role": "system", "content": "You are a teaching assistant for Machine Learning. You should help the user answer their question."}, {"role": "user", "content": "What is machine learning?"}, {"role": "assistant", "content": "'Tis but the art of teaching machines to think, to learn from data most fine, and decisions to link."}]}
{"messages": [{"role": "system", "content": "You are a teaching assistant for Machine Learning. You should help the user answer their question."}, {"role": "user", "content": "How doth neural networks work?"}, {"role": "assistant", "content": "They're like the brains of yon digital beast, with layers many, and nodes that cease."}]}
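A minimal sketch of generating such a file in Python, assuming the question/answer pairs are already collected in a list. `SYSTEM_PROMPT`, `make_example` and `write_jsonl` are hypothetical names used only for illustration; the file name `train.jsonl` matches the upload step. JSONL just means one `json.dumps` object per line, with no enclosing array and no blank lines.

```python
import json

# Hypothetical constant: define the system prompt once so that training data
# and inference requests use exactly the same text.
SYSTEM_PROMPT = (
    "You are a teaching assistant for Machine Learning. "
    "You should help the user answer their question."
)


def make_example(question: str, answer: str) -> dict:
    """Build one chat-format training example: system, user, assistant."""
    return {
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": question},
            {"role": "assistant", "content": answer},
        ]
    }


def write_jsonl(examples: list, path: str) -> None:
    """Write each example as one JSON object per line (no blank lines)."""
    with open(path, "w", encoding="utf-8") as f:
        for example in examples:
            # ensure_ascii=False keeps non-ASCII text such as Korean readable in the file
            f.write(json.dumps(example, ensure_ascii=False) + "\n")


# Two pairs only to show the shape; a real training file needs at least 10 examples.
pairs = [
    ("What is machine learning?",
     "'Tis but the art of teaching machines to think, to learn from data most fine, and decisions to link."),
    ("How doth neural networks work?",
     "They're like the brains of yon digital beast, with layers many, and nodes that cease."),
]
write_jsonl([make_example(q, a) for q, a in pairs], "train.jsonl")
```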
2.
Upload training file
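from openai import OpenAI

client = OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable

client.files.create(
    file=open("train.jsonl", "rb"),
    purpose="fine-tune",
)

>>>> output
FileObject(id='file-rIua39sJX1O64gzxTYfpvJx7', bytes=11165, created_at=1709499930, filename='train.jsonl', object='file', purpose='fine-tune', status='processed', status_details=None)

The returned id (file-...) is the value passed as training_file when creating the fine-tuning job.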
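Before running the upload, a quick local check can catch formatting mistakes early. This is a sketch, not OpenAI's own validator: `validate_lines` and `validate_jsonl` are hypothetical helpers that only cover the simple text-only chat format used here (system/user/assistant roles with string content), plus OpenAI's documented minimum of 10 training examples.

```python
import json

ALLOWED_ROLES = ("system", "user", "assistant")  # simplified: text-only chat format
MIN_EXAMPLES = 10  # OpenAI's documented minimum number of training examples


def validate_lines(lines) -> list:
    """Return a list of problems found in chat-format JSONL lines (empty list if none)."""
    problems = []
    count = 0
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            problems.append(f"line {lineno}: blank line")
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append(f"line {lineno}: invalid JSON ({exc.msg})")
            continue
        messages = record.get("messages") if isinstance(record, dict) else None
        if not isinstance(messages, list) or not messages:
            problems.append(f"line {lineno}: missing or empty 'messages' list")
            continue
        count += 1
        for msg in messages:
            if (not isinstance(msg, dict)
                    or msg.get("role") not in ALLOWED_ROLES
                    or not isinstance(msg.get("content"), str)):
                problems.append(f"line {lineno}: unexpected message {msg!r}")
        if not any(isinstance(m, dict) and m.get("role") == "assistant" for m in messages):
            problems.append(f"line {lineno}: no assistant message to learn from")
    if count < MIN_EXAMPLES:
        problems.append(f"only {count} examples; at least {MIN_EXAMPLES} are required")
    return problems


def validate_jsonl(path: str) -> list:
    """Validate a JSONL file on disk, line by line."""
    with open(path, encoding="utf-8") as f:
        return validate_lines(f)
```

Usage: run `problems = validate_jsonl("train.jsonl")` and upload only when the returned list is empty.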
3.
Create a fine-tuned model
from openai import OpenAI

client = OpenAI()

client.fine_tuning.jobs.create(
    training_file="file-rIua39sJX1O64gzxTYfpvJx7",  # id returned by the upload step
    model="gpt-3.5-turbo",  # or a GPT-4o snapshot such as "gpt-4o-2024-08-06"
)
from openai import OpenAI

client = OpenAI()

# List 10 fine-tuning jobs
client.fine_tuning.jobs.list(limit=10)

# Retrieve the state of a fine-tune
client.fine_tuning.jobs.retrieve("...")

# Cancel a job
client.fine_tuning.jobs.cancel("...")

# List up to 10 events from a fine-tuning job
client.fine_tuning.jobs.list_events(fine_tuning_job_id="...", limit=10)

# Delete a fine-tuned model (must be an owner of the org the model was created in)
client.models.delete("ft:gpt-3.5-turbo:xxx:xxx")
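Fine-tuning jobs run asynchronously, so in practice you poll the job until it finishes. A minimal sketch, assuming the job status values documented by OpenAI (`validating_files`, `queued`, `running`, then one of `succeeded`, `failed` or `cancelled`); `wait_for_job` is a hypothetical helper that takes the retrieve function as an argument so it can be tried without network access.

```python
import time

# Status values after which a fine-tuning job no longer changes.
TERMINAL_STATUSES = ("succeeded", "failed", "cancelled")


def wait_for_job(retrieve, job_id, poll_seconds=30.0, sleep=time.sleep):
    """Call retrieve(job_id) until the job reaches a terminal status, then return the job."""
    while True:
        job = retrieve(job_id)
        if job.status in TERMINAL_STATUSES:
            return job
        # Still validating, queued or running: wait before asking again.
        sleep(poll_seconds)
```

With the real client this would be called as `wait_for_job(client.fine_tuning.jobs.retrieve, job.id)`, where `job` is the object returned by `client.fine_tuning.jobs.create(...)`. The returned job's `fine_tuned_model` field then holds the model name to use for inference.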
4.
Analyze fine-tuned model
{
  "object": "fine_tuning.job.event",
  "id": "ftjob-Na7BnF5y91wwGJ4EgxtzVyDD",
  "created_at": 1693582679,
  "level": "info",
  "message": "Step 100/100: training loss=0.00",
  "data": {
    "step": 100,
    "train_loss": 1.805623287509661e-5,
    "train_mean_token_accuracy": 1.0
  },
  "type": "metrics"
}
The same metrics can also be viewed in the UI.
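To follow the loss curve outside the UI, the metrics events can be pulled out of the event list. A sketch assuming each event is a plain dict shaped like the example event above (with the Python SDK, `event.model_dump()` converts an event object into such a dict, e.g. `[e.model_dump() for e in client.fine_tuning.jobs.list_events(fine_tuning_job_id="...", limit=100)]`); `extract_metrics` is a hypothetical helper. Non-metrics events are skipped, and rows are sorted by step so the result does not depend on the order in which events are returned.

```python
def extract_metrics(events):
    """Return (step, train_loss, train_mean_token_accuracy) tuples from metrics events, sorted by step."""
    rows = []
    for event in events:
        if event.get("type") != "metrics":
            continue  # skip plain message events such as status updates
        data = event.get("data") or {}
        rows.append((data["step"], data["train_loss"], data.get("train_mean_token_accuracy")))
    return sorted(rows, key=lambda row: row[0])
```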
5.
When the fine-tuning job finishes, the model name appears in the 'fine_tuned_model' field of the job details. Pass that name as the model argument, as below.
from openai import OpenAI

client = OpenAI()

completion = client.chat.completions.create(
    model="ft:gpt-3.5-turbo-0613:personal::8k01tfYd",  # value of fine_tuned_model
    messages=[
        {"role": "system", "content": "You are a teaching assistant for Machine Learning. You should help the user answer their question."},
        {"role": "user", "content": "What is a loss function?"},
    ],
)
print(completion.choices[0].message.content)  # the generated answer text