If you’re exploring fine-tuning a Large Language Model to better align with your specific data, you might be wondering about the costs this involves.
This post outlines the approximate cost of fine-tuning a model like Llama 3.1 with Low-Rank Adaptation (LoRA), giving you a clear idea of what to expect and helping you plan your budget for adapting an LLM to your data.
In summary, fine-tuning requires two steps: creating a synthetic dataset and training the LoRA adapter. Today, both can be done with a version of Llama 3.1 for as little as 30 USD!
1. Cost of creating a synthetic dataset
Fine-tuning a large language model often demands specific, high-quality training data, which can be hard to come by. Real-world data may be limited, not perfectly suited to your needs, or challenging to collect.
A synthetic dataset can be a practical solution. By starting with a few initial samples and using a language model like GPT-3.5 to generate variations, you can efficiently create a dataset tailored to your needs. This approach helps ensure you obtain the right data without incurring excessive costs.
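As a rough sketch of how this generation step could look, the snippet below asks GPT-3.5 for variations of a single seed record through the OpenAI Python SDK. The prompt wording, record fields, and number of variations are assumptions you would adapt to your own use case.

```python
# Hypothetical sketch: generate dataset variations from one seed record.
# Prompt, model name, and parsing are assumptions to adapt to your use case.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

seed = {
    "instruction": "Answer the following question based on...",
    "input": "In Paris, it is common that tourists...",
    "output": "Tourists should buy the travel pass...",
}

prompt = (
    "Create 5 new records with the same JSON structure as this example, "
    "varying the topic and wording. Reply with a JSON array only.\n"
    f"Example: {json.dumps(seed)}"
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo-0125",
    messages=[{"role": "user", "content": prompt}],
    temperature=0.9,
)

# In practice you would validate the output; the model may not always return clean JSON.
records = json.loads(response.choices[0].message.content)
print(f"Generated {len(records)} records")
```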
To estimate the cost of your synthetic dataset, start by defining the typical size of your instructions and answers.
For example, a dataset record may look like this:
```json
{
  "instruction": "Answer the following question based on...",
  "input": "In Paris, it is common that tourists...",
  "output": "Tourists should buy the travel pass..."
}
```
Estimate the average word count for each part: input (including both the instruction and input fields) and output. Suppose each averages 350 words. Since one LLM token is about 0.75 words, 350 words equate to roughly 500 tokens for both input and output. Therefore, each data record will average about 1,000 tokens.
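If you want to sanity-check the 0.75-words-per-token rule of thumb on your own samples, you can count tokens directly. The sketch below uses OpenAI's tiktoken library, with placeholder strings standing in for a real record.

```python
# Count tokens for one record with tiktoken (pip install tiktoken).
import tiktoken

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

instruction = "Answer the following question based on..."
input_text = "In Paris, it is common that tourists..."
output_text = "Tourists should buy the travel pass..."

input_tokens = len(enc.encode(instruction + "\n" + input_text))
output_tokens = len(enc.encode(output_text))
print(f"Input: {input_tokens} tokens, output: {output_tokens} tokens")
```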
For a dataset of 10,000 records, you will need around 5 million input tokens and 5 million output tokens.
With this information, we can craft the following comparative table:
| Model | Cost per 1K input tokens* | Cost per 1K output tokens* | Cost per record |
|---|---|---|---|
| GPT-3.5-turbo-0125 | $0.0005 | $0.0015 | $0.001 |
| Claude 3.5 Sonnet | $0.003 | $0.015 | $0.009 |
| Claude 2/2.1 | $0.008 | $0.024 | $0.016 |
| GPT-4 | $0.03 | $0.06 | $0.045 |
Based on this table, generating a 10,000-record training dataset could cost as little as 10 USD if GPT-3.5 meets your needs. If you require a higher-quality model such as GPT-4, the cost can rise to around 450 USD.
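These totals are simple arithmetic: tokens per record times the price per token, times the number of records. The small sketch below reproduces them using the prices from the table (fetched in August 2024):

```python
# Dataset generation cost = tokens per record x price per 1K tokens x number of records.
RECORDS = 10_000
INPUT_TOKENS = 500   # per record
OUTPUT_TOKENS = 500  # per record

# (price per 1K input tokens, price per 1K output tokens), from the table above
prices = {
    "GPT-3.5-turbo-0125": (0.0005, 0.0015),
    "Claude 3.5 Sonnet": (0.003, 0.015),
    "Claude 2/2.1": (0.008, 0.024),
    "GPT-4": (0.03, 0.06),
}

for model, (p_in, p_out) in prices.items():
    per_record = INPUT_TOKENS / 1000 * p_in + OUTPUT_TOKENS / 1000 * p_out
    print(f"{model}: ${per_record:.4f}/record -> ${per_record * RECORDS:,.0f} total")
```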
2. Cloud GPU Cost for Fine-Tuning
Once you have the dataset, it's time to train the model. To do this, you first need to estimate the required resources.
Estimating the cost of renting a cloud GPU for fine-tuning can be challenging, as it depends on memory requirements, which in turn vary with the LoRA configuration, the precision used, and the training parameters. To simplify the analysis, we'll use the figures Hugging Face publishes for Llama 3.1, which state that training a Llama 3.1 model in FP16 precision requires, at a minimum:
| Model version | Memory (FP16) |
|---|---|
| Llama 3.1 8B | 16 GB |
| Llama 3.1 13B | 26 GB |
| Llama 3.1 70B | 160 GB |
These numbers don't include the extra memory needed for PyTorch's reserved space or the KV cache, so let's add a 4 GB cushion, resulting in 20 GB, 30 GB, and 164 GB respectively.
This gives us what we need to filter the GPU options we could use. For this, we can use this useful reference. Taking GCP, one of the most widely used cloud providers, as an example, we get the following table of GPUs on offer:
| GPU | Memory | Cost per hour** |
|---|---|---|
| L4 | 24 GB | $0.71/hr |
| A100 | 40 GB | $3.67/hr |
| A100 | 80 GB | $7.35/hr |
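Putting the memory estimates and the GPU prices together, a small helper can pick the cheapest single GPU that fits each model. The required figures below are the FP16 minimums plus the 4 GB cushion, and the prices come from the table above; single-GPU training is assumed.

```python
# Pick the cheapest single GPU (from the table above) with enough memory.
REQUIRED_GB = {"Llama 3.1 8B": 20, "Llama 3.1 13B": 30, "Llama 3.1 70B": 164}

# (name, memory in GB, on-demand $/hr on GCP), from the table above
GPUS = [("L4", 24, 0.71), ("A100 40GB", 40, 3.67), ("A100 80GB", 80, 7.35)]

for model, needed in REQUIRED_GB.items():
    candidates = [g for g in GPUS if g[1] >= needed]
    if candidates:
        name, _, price = min(candidates, key=lambda g: g[2])
        print(f"{model}: ~{needed} GB -> {name} at ${price}/hr")
    else:
        print(f"{model}: ~{needed} GB -> no single GPU in the table fits (multi-GPU needed)")
```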
How long does the GPU need to run? This depends on the training duration, which varies with GPU capabilities, batch size, training parameters, and model version. Based on these factors, we can outline the following:
| | Llama 3.1 8B | Llama 3.1 13B |
|---|---|---|
| Precision | FP16 | FP16 |
| Batch size | 1 | 1 |
| GPU | L4 | A100 (40 GB) |
| Training time per epoch | 30-60 min | 20-40 min |
| Training time for 4 epochs | 2-4 hours | 1.5-2.5 hours |
| Cost per training job** | $3 | $10 |
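The last row is just the hourly GCP price multiplied by the estimated training time for 4 epochs, rounded up. A quick check:

```python
# Training job cost = GPU hourly rate x estimated hours for 4 epochs (table values).
jobs = {
    "Llama 3.1 8B on L4": (0.71, (2, 4)),            # ($/hr, (min hours, max hours))
    "Llama 3.1 13B on A100 40GB": (3.67, (1.5, 2.5)),
}

for job, (rate, (low, high)) in jobs.items():
    print(f"{job}: ${rate * low:.2f} - ${rate * high:.2f} per training job")
```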
Neither the table nor the quick check above accounts for setup time (such as package installation, dataset download, or issues encountered while running the training script), but they still give a rough idea of the minimum cost of a single training job.
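For context, here is a minimal, hypothetical sketch of what such a LoRA fine-tuning job could look like with Hugging Face's transformers, peft, and datasets libraries. The hyperparameters, target modules, file name, and prompt format are all illustrative assumptions, and the sketch uses bf16 rather than FP16 to sidestep gradient-scaling issues (the memory footprint is comparable).

```python
# Hypothetical LoRA fine-tuning sketch (pip install torch transformers peft datasets accelerate).
# Hyperparameters, target modules, and data handling are illustrative assumptions.
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base_model = "meta-llama/Meta-Llama-3.1-8B"  # gated repo; requires access approval
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token  # Llama has no pad token by default

model = AutoModelForCausalLM.from_pretrained(
    base_model, torch_dtype=torch.bfloat16, device_map="auto"
)

# Low-rank adapters on the attention projections; only these small matrices are trained.
lora_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                         target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# Assumes the synthetic dataset was saved as JSON lines with instruction/input/output fields.
dataset = load_dataset("json", data_files="synthetic_dataset.jsonl", split="train")

def tokenize(example):
    text = f"{example['instruction']}\n{example['input']}\n{example['output']}"
    return tokenizer(text, truncation=True, max_length=1024)

tokenized = dataset.map(tokenize, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="llama31-lora", num_train_epochs=4,
                           per_device_train_batch_size=1, bf16=True,
                           logging_steps=50, save_strategy="epoch"),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

Only the low-rank adapter matrices are updated during training, which is what keeps the memory and time requirements within the figures discussed above.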
3. Total cost of fine-tuning Llama 3.1
In conclusion, the cost of fine-tuning a model like Llama 3.1 with LoRA can vary widely depending on your choices for the synthetic dataset and on the cloud GPU resources used for training and how long they run. As we saw, the total can range from roughly $13 to $460 for a 10,000-record dataset, depending on your settings.
To give a concrete example, let's assume you use GPT-3.5 to generate a synthetic dataset of 10,000 records, which would cost around $10. You then fine-tune a Llama 3.1 8B model with LoRA on a single L4 GPU, so one training job costs approximately $3. In total, generating the dataset and training the model comes to around $13, plus a $15 cushion to account for setup time or additional training jobs. Therefore, you should budget around 30 USD.
References
https://www.mlexpert.io/blog/alpaca-fine-tuning
https://huggingface.co/blog/llama31
https://llama.meta.com/docs/how-to-guides/fine-tuning/
https://rocm.blogs.amd.com/artificial-intelligence/llama2-lora/README.html
https://getdeploying.com/reference/cloud-gpu
https://platform.openai.com/tokenizer
https://openai.com/api/pricing/
https://aws.amazon.com/bedrock/pricing/
*Costs fetched on August 23, 2024
**Costs using GCP as the provider