Using Hugging Face Language Models

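• Hugging Face is a hub that hosts a wide range of models, datasets, and related resources.

• New models are developed and uploaded there almost every day, and we can download and try them out.

• We can also fine-tune these models or publish our own. This is covered in the final unit.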

Using Hugging Face models with the Transformers library

Some models require you to request access before you can use them. Click Acknowledge license on the model page to obtain access, then use the model as usual.

Getting an access token

a. To issue a token, first open Settings.

b. Go to Access Tokens and issue a new token with Create new token.

c. Save the token in Colab so it can be used later, the same way the OpenAI token was saved. We will name it HF_ACCESS_TOKEN.

Logging in to Hugging Face from Colab

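import huggingface_hub
from google.colab import userdata

# Read the token saved in Colab's secrets under the name HF_ACCESS_TOKEN.
HF_ACCESS_TOKEN = userdata.get('HF_ACCESS_TOKEN')
# Authenticate this session with the Hugging Face Hub.
huggingface_hub.login(HF_ACCESS_TOKEN)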
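The login snippet only works inside Colab, because `google.colab` cannot be imported elsewhere. As a minimal sketch, assuming you also want to run the notebook locally with the token exported as an environment variable of the same name, a helper like the hypothetical `resolve_hf_token` below (not part of any library) could try Colab first and fall back to the environment:

```python
import os


def resolve_hf_token(name="HF_ACCESS_TOKEN"):
    """Return the token from Colab secrets if possible, else from the environment.

    `resolve_hf_token` is an illustrative helper, not a huggingface_hub API.
    """
    try:
        # Only importable inside a Colab runtime.
        from google.colab import userdata
    except ImportError:
        userdata = None

    if userdata is not None:
        try:
            token = userdata.get(name)
            if token:
                return token
        except Exception:
            # The secret is missing or notebook access was not granted;
            # fall through to the environment variable.
            pass

    token = os.environ.get(name)
    if not token:
        raise RuntimeError(f"No token found: set the {name} secret or environment variable.")
    return token
```

The returned value can then be passed to `huggingface_hub.login(...)` exactly as in the snippet above.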

Gemma 2 can be used through Hugging Face as follows.

import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="google/gemma-2-2b-it",
    model_kwargs={"torch_dtype": torch.bfloat16},
    device="cuda",  # replace with "mps" to run on a Mac device
)

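# Gemma 2's chat template rejects a "system" role, so the instruction
# is placed at the start of the first user message instead.
messages = [
    {"role": "user", "content": "You always put ^^ end of text\n\nHello?"},
    {"role": "assistant", "content": "Hi^^"},
    {"role": "user", "content": "What is your name?"},
]

# For chat input, generated_text holds the whole conversation;
# the last entry is the newly generated assistant message.
outputs = pipe(messages, max_new_tokens=256)
assistant_response = outputs[0]["generated_text"][-1]["content"].strip()
print(assistant_response)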
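Some chat templates, Gemma 2's among them, do not accept a `system` role and raise an error when one is present. A minimal sketch of one workaround is to merge a leading system message into the first user turn before calling the pipeline. The helper name `fold_system_prompt` is hypothetical and not part of Transformers, and joining the two texts with a blank line is just one reasonable choice:

```python
def fold_system_prompt(messages):
    """Return a copy of `messages` with a leading system message merged
    into the first user message. The input list is left unchanged."""
    copied = [dict(m) for m in messages]
    if not copied or copied[0]["role"] != "system":
        return copied

    system_text = copied[0]["content"]
    rest = copied[1:]
    if rest and rest[0]["role"] == "user":
        # Prepend the instruction to the first user turn.
        rest[0]["content"] = f"{system_text}\n\n{rest[0]['content']}"
    else:
        # No user turn to merge into, so send the instruction as its own user turn.
        rest.insert(0, {"role": "user", "content": system_text})
    return rest


chat = [
    {"role": "system", "content": "You always put ^^ end of text"},
    {"role": "user", "content": "Hello?"},
    {"role": "assistant", "content": "Hi^^"},
    {"role": "user", "content": "What is your name?"},
]
folded = fold_system_prompt(chat)
print([m["role"] for m in folded])  # ['user', 'assistant', 'user']
```

The folded list can be passed to `pipe(...)` in place of the original messages, which keeps the same prompt reusable across models with and without system-role support.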
