Instruct Fine-Tuning Falcon 7B Using LoRA



 

Introduction

Natural Language Processing (NLP) has seen tremendous advancements in recent years, thanks to powerful large language models like Falcon 7B.

Falcon 7B is a state-of-the-art LLM based on the Transformer architecture (https://huggingface.co/blog/falcon). While Falcon 7B offers impressive out-of-the-box performance, instruction fine-tuning allows you to build your own LLM with context and knowledge about your data.

In this article, we will explore how to fine-tune Falcon 7B on custom Frequently Asked Questions (FAQ) data, allowing you to create a powerful and accurate FAQ chatbot tailored to your specific needs.

Understanding Fine-Tuning

Fine-tuning is a transfer learning technique that involves taking a pre-trained model, such as Falcon 7B, and adapting it to perform a specific task. The pre-trained model is already equipped with knowledge about language, grammar, and context from its broad training on a large corpus of text.

Fine-tuning allows us to leverage this existing knowledge and fine-tune the model for more specific tasks, such as sentiment analysis, named entity recognition, or text classification.

Instruction fine-tuning uses a set of labeled examples in the form of {prompt, response} pairs to further train the pre-trained model to predict the response given the prompt.
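For example, a single training pair for an FAQ assistant could look like the sketch below (the question and answer are illustrative placeholders, not from any real dataset):

# One illustrative {prompt, response} pair (placeholder content).
example = {
    "prompt": "How do I reset my account password?",
    "response": "Open the login page, click 'Forgot password', and follow the emailed link.",
}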

Fine-tuning Falcon 7B typically involves two main steps:

  1. Preparing Data: You’ll need a labeled dataset that corresponds to your specific NLP task. The dataset should be preprocessed and organized into input sequences compatible with the model.
  2. Fine-Tuning: During this step, you’ll use the labeled data to update the parameters of Falcon 7B while retaining its general language understanding. Fine-tuning can be carried out using Python libraries such as Hugging Face’s transformers together with PyTorch or TensorFlow.

Preparing the Data

To start fine-tuning Falcon 7B, you’ll first need to prepare your labeled dataset. The dataset should be formatted in a way that the model can understand. For instruction fine-tuning, the data can be organized in a CSV or JSON format, where each row contains a question and its response.

Make sure the data is clean and preprocessed appropriately. Data preprocessing may include tokenization, lowercasing, removing special characters, and handling missing values. It’s crucial to preserve the meaning and context of the text while preparing the data.
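As a minimal sketch, assuming your FAQ data lives in a CSV file called faq.csv with question and answer columns (the file name and column names are placeholders), the cleanup could look like this:

import pandas as pd

# Hypothetical file and column names; adapt to your own dataset.
df_faq = pd.read_csv("faq.csv")

# Basic cleanup: drop incomplete rows and strip stray whitespace.
df_faq = df_faq.dropna(subset=["question", "answer"])
df_faq["question"] = df_faq["question"].str.strip()
df_faq["answer"] = df_faq["answer"].str.strip()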


import transformers
from datasets import load_dataset, Dataset
import pandas as pd


def gen_prompt(text_input):
    # Format one FAQ entry into the <human>/<assistant> prompt template.
    return f"""
<human>: {text_input["question"]}
<assistant>: {text_input["answer"]}
""".strip()


def gen_and_tok_prompt(text_input):
    # Build the prompt and tokenize it (the tokenizer is loaded in Step 2 below).
    full_input = gen_prompt(text_input)
    tok_full_prompt = tokenizer(full_input, padding=True, truncation=True)
    return tok_full_prompt


# df_faq: pandas DataFrame with 'question' and 'answer' columns.
data = Dataset.from_pandas(df_faq[['question', 'answer']])
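To sanity-check the template before tokenizing, you can render the prompt for the first FAQ entry:

# Inspect the formatted prompt for the first FAQ entry.
print(gen_prompt(data[0]))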

Fine-Tuning Process

Here’s a step-by-step guide to fine-tune Falcon 7B using Python and the transformers library:

Step 1: Install Libraries

Before we start, ensure you have the necessary libraries installed:

# If you are using the PyTorch backend
pip install torch==2.0.1
pip install "transformers @ git+https://github.com/huggingface/transformers@de9255de27abfcae4a1f816b904915f0b1e23cd9"
pip install tokenizers==0.13.3
pip install "peft @ git+https://github.com/huggingface/peft.git@9f7492577ff91c51077308f98dade45bf32c268a"
pip install "jsonargparse[signatures]"
pip install bitsandbytes==0.39.1  # 8-bit quantization
pip install "accelerate @ git+https://github.com/huggingface/accelerate@e0f5e030098aada5e112708eee3537475dea3a83"
pip install datasets==2.13.1
pip install zstandard==0.19.0
pip install scipy
pip install loralib==0.1.1
pip install einops==0.6.1

Step 2: Load Pre-trained Falcon 7B Model

Use the AutoModelForCausalLM class from transformers to load the pre-trained Falcon 7B model (here the instruct variant, tiiuae/falcon-7b-instruct):

from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "tiiuae/falcon-7b-instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    # load_in_8bit=True,  # uncomment to load the 8-bit model
    # device_map='auto',
    trust_remote_code=True,
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
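Step 4 below prepares the model for int8 (k-bit) training, so in practice you will usually want the commented-out 8-bit flags enabled. A minimal sketch of the same call with 8-bit loading turned on (requires bitsandbytes and a CUDA GPU):

# 8-bit loading for memory-efficient LoRA training.
model = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-7b-instruct",
    load_in_8bit=True,
    device_map="auto",
    trust_remote_code=True,
)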

Step 3: Tokenize the Data

Use the loaded tokenizer to tokenize and encode your text data:

tokenizer.pad_token = tokenizer.eos_token
data = data.map(gen_and_tok_prompt)

Make sure to tokenize both your training and evaluation data.
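A minimal sketch of holding out an evaluation set with the datasets library’s built-in split (the 10% size and seed are arbitrary choices):

# Hold out 10% of the FAQ pairs for evaluation (split size is illustrative).
split = data.train_test_split(test_size=0.1, seed=42)
train_data = split["train"]
eval_data = split["test"]

If you create such a split, pass train_data (and optionally eval_data) to the Trainer in Step 5 instead of data.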

Step 4: Prepare Model for Fine-Tuning

Some pre-processing needs to be done before training such an int8 model with peft, so let’s import the utility function prepare_model_for_kbit_training, which will:

  1. Cast all the non-int8 modules to full precision (fp32) for stability
  2. Add a forward hook to the input embedding layer to enable gradient computation of the input hidden states
  3. Enable gradient checkpointing for more memory-efficient training

from peft import prepare_model_for_kbit_training

model.gradient_checkpointing_enable()
model = prepare_model_for_kbit_training(model)


def print_trainable_parameters(model):
    """
    Prints the number of trainable parameters in the model.
    """
    trainable_params = 0
    all_param = 0
    for _, param in model.named_parameters():
        all_param += param.numel()
        if param.requires_grad:
            trainable_params += param.numel()
    print(
        f"trainable params: {trainable_params} || all params: {all_param} || trainable%: {100 * trainable_params / all_param}"
    )


from peft import LoraConfig, get_peft_model

config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["query_key_value"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, config)
print_trainable_parameters(model)

The above code prepares the model and prints the trainable parameters. Because we are using LoRA, the number of trainable parameters is tiny compared with the total number of model parameters.

trainable params: 4718592 || all params: 6926439296 || trainable%: 0.06812435363037071
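That figure lines up with Falcon 7B’s shapes: each of the 32 decoder layers gets a rank-16 LoRA adapter on its fused query_key_value projection, which maps 4544 hidden features to 4672 outputs (71 query heads plus one shared key/value head, each of dimension 64). Each adapter therefore adds 16 × (4544 + 4672) = 147,456 parameters, and 32 × 147,456 = 4,718,592 in total, roughly 0.07% of the ~6.9B frozen base parameters.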

Step 5: Fine-Tune the Model

Fine-tune Falcon 7B using the prepared dataset:

training_args = transformers.TrainingArguments(
    gradient_accumulation_steps=4,
    num_train_epochs=3,
    learning_rate=2e-4,
    fp16=True,
    save_total_limit=4,
    logging_steps=25,
    output_dir="output_dir",  # the location where you want to store checkpoints
    save_strategy="epoch",
    optim="paged_adamw_8bit",
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
)

trainer = transformers.Trainer(
    model=model,
    train_dataset=data,
    args=training_args,
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
model.config.use_cache = False  # silence the warnings. Please re-enable for inference!
trainer.train()

Step 6: Save the Fine-Tuned Model

Save the fine-tuned model for future use:

model.save_pretrained("location where you want the model to be stored")
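It can also be convenient to save the tokenizer alongside the adapter and, if you want a standalone checkpoint, to merge the LoRA weights back into the base model. A sketch of both (paths are placeholders; merging is typically done with the base model loaded in fp16/fp32 rather than 8-bit):

# Save the tokenizer next to the LoRA adapter (path is a placeholder).
tokenizer.save_pretrained("location where you want the model to be stored")

# Optionally merge the LoRA weights into the base model for a standalone checkpoint.
merged_model = model.merge_and_unload()
merged_model.save_pretrained("location for the merged model")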

Step 7: Inference

After fine-tuning, let’s run inference with the saved model:

import torch
from peft import PeftConfig, PeftModel
from transformers import AutoTokenizer, AutoModelForCausalLM

config = PeftConfig.from_pretrained("location where new model is stored")
model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    # load_in_8bit=True,
    # device_map='auto',
    trust_remote_code=True,
)

tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)

# load the LoRA adapter on top of the base model
model_inf = PeftModel.from_pretrained(model, "location where new model is stored")

# create your own prompt
prompt = """
<human>: How can i use BDB Data Science LAB?
<assistant>:
""".strip()

# encode the prompt
encoding = tokenizer(prompt, return_tensors="pt").to(model_inf.device)

# set the generation configuration params
gen_config = model_inf.generation_config
gen_config.max_new_tokens = 200
gen_config.temperature = 0.2
gen_config.top_p = 0.7
gen_config.num_return_sequences = 1
gen_config.pad_token_id = tokenizer.eos_token_id
gen_config.eos_token_id = tokenizer.eos_token_id

# do the inference with the LoRA-adapted model
with torch.inference_mode():
    outputs = model_inf.generate(
        input_ids=encoding.input_ids,
        attention_mask=encoding.attention_mask,
        generation_config=gen_config,
    )
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
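Since the generation echoes the full prompt back, you may want to keep only the assistant’s part of the output. A minimal sketch, assuming the <human>/<assistant> template used above:

# Keep only the text generated after the "<assistant>:" marker (template-specific).
generated = tokenizer.decode(outputs[0], skip_special_tokens=True)
answer = generated.split("<assistant>:")[-1].strip()
print(answer)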

Conclusion

By instruction fine-tuning Falcon 7B, you can harness the power of advanced language models, tailor them to the unique requirements of your organisation, and build your own custom LLM. Happy fine-tuning!
