If you’re exploring fine-tuning a Large Language Model to better align with your specific data, you might be wondering about the costs this involves.
This post outlines the approximate cost of fine-tuning a model like Llama 3.1 with Low-Rank Adaptation (LoRA), giving you a clear idea of what to expect and helping you plan your budget for adapting an LLM to your data.
In summary, fine-tuning requires two steps: creating a synthetic dataset and training the LoRA adapter. Today, both can be done with a version of Llama 3.1 for as little as 30 USD!
1. Cost of creating a synthetic dataset
Fine-tuning a large language model often demands specific, high-quality training data, which can be hard to come by. Real-world data may be limited, not perfectly suited to your needs, or challenging to collect.
A synthetic dataset can be a practical solution. By starting with a few initial samples and using a language model like GPT-3.5 to generate variations, you can efficiently create a dataset tailored to your needs. This approach helps ensure you obtain the right data without incurring excessive costs.
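As a rough sketch of how this generation step could look, the snippet below asks GPT-3.5 for variations of a single seed record through the OpenAI Python SDK. The prompt wording, record fields, and number of variations are assumptions you would adapt to your own use case.

```python
# Hypothetical sketch: generate dataset variations from one seed record.
# Prompt, model name, and parsing are assumptions to adapt to your use case.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

seed = {
    "instruction": "Answer the following question based on...",
    "input": "In Paris, it is common that tourists...",
    "output": "Tourists should buy the travel pass...",
}

prompt = (
    "Create 5 new records with the same JSON structure as this example, "
    "varying the topic and wording. Reply with a JSON array only.\n"
    f"Example: {json.dumps(seed)}"
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo-0125",
    messages=[{"role": "user", "content": prompt}],
    temperature=0.9,
)

# In practice you would validate the output; the model may not always return clean JSON.
records = json.loads(response.choices[0].message.content)
print(f"Generated {len(records)} records")
```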
To estimate the cost of your synthetic dataset, start by defining the typical size of your instructions and answers.
For example, a dataset record may look like this:
```json
{
  "instruction": "Answer the following question based on...",
  "input": "In Paris, it is common that tourists...",
  "output": "Tourists should buy the travel pass..."
}
```
Estimate the average word count for each part: input (including both the instruction and input fields) and output. Suppose each averages 350 words. Since one LLM token is about 0.75 words, 350 words equate to roughly 500 tokens for both input and output. Therefore, each data record will average about 1,000 tokens.
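If you want to sanity-check the 0.75-words-per-token rule of thumb on your own samples, you can count tokens directly. The sketch below uses OpenAI's tiktoken library, with placeholder strings standing in for a real record.

```python
# Count tokens for one record with tiktoken (pip install tiktoken).
import tiktoken

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

instruction = "Answer the following question based on..."
input_text = "In Paris, it is common that tourists..."
output_text = "Tourists should buy the travel pass..."

input_tokens = len(enc.encode(instruction + "\n" + input_text))
output_tokens = len(enc.encode(output_text))
print(f"Input: {input_tokens} tokens, output: {output_tokens} tokens")
```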
For a dataset of 10,000 records, you will need around 5 million input tokens and 5 million output tokens.
With this information, we can craft the following comparative table:
| Model | Cost per 1K input tokens* | Cost per 1K output tokens* | Cost per record |
|---|---|---|---|
| GPT-3.5-turbo-0125 | $0.0005 | $0.0015 | $0.001 |
| Claude 3.5 Sonnet | $0.003 | $0.015 | $0.009 |
| Claude 2/2.1 | $0.008 | $0.024 | $0.016 |
| GPT-4 | $0.03 | $0.06 | $0.045 |
Based on this table, generating a 10,000-record training dataset could cost as little as 10 USD if GPT-3.5 meets your needs. If you require a higher-quality model such as GPT-4, the cost can rise to around 450 USD.
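These totals are simple arithmetic: tokens per record times the price per token, times the number of records. The small sketch below reproduces them using the prices from the table (fetched in August 2024):

```python
# Dataset generation cost = tokens per record x price per 1K tokens x number of records.
RECORDS = 10_000
INPUT_TOKENS = 500   # per record
OUTPUT_TOKENS = 500  # per record

# (price per 1K input tokens, price per 1K output tokens), from the table above
prices = {
    "GPT-3.5-turbo-0125": (0.0005, 0.0015),
    "Claude 3.5 Sonnet": (0.003, 0.015),
    "Claude 2/2.1": (0.008, 0.024),
    "GPT-4": (0.03, 0.06),
}

for model, (p_in, p_out) in prices.items():
    per_record = INPUT_TOKENS / 1000 * p_in + OUTPUT_TOKENS / 1000 * p_out
    print(f"{model}: ${per_record:.4f}/record -> ${per_record * RECORDS:,.0f} total")
```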
2. Cloud GPU Cost for Fine-Tuning
Once you have the dataset, it's time to train the model. To do this, you first need to estimate the required resources.
Estimating the cost of renting a cloud GPU for fine-tuning can be challenging, as it depends on memory requirements, which in turn vary with the LoRA configuration, the precision used, and the training parameters. To simplify the analysis, we'll use the figures Hugging Face publishes for Llama 3.1, which state that training a Llama 3.1 model in FP16 precision requires, at a minimum:
| Model version | Memory (FP16) |
|---|---|
| Llama 3.1 8B | 16 GB |
| Llama 3.1 13B | 26 GB |
| Llama 3.1 70B | 160 GB |
These numbers don't include the extra memory needed for PyTorch's reserved space or the KV cache, so let's add a 4 GB cushion, resulting in 20 GB, 30 GB, and 164 GB respectively.
This gives us what we need to filter the GPU options we could use. For this, we can use this useful reference. Taking GCP, one of the most widely used cloud providers, as an example, we get the following table of GPUs on offer:
| GPU | Memory | Cost per hour** |
|---|---|---|
| L4 | 24 GB | $0.71/hr |
| A100 | 40 GB | $3.67/hr |
| A100 | 80 GB | $7.35/hr |
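Putting the memory estimates and the GPU prices together, a small helper can pick the cheapest single GPU that fits each model. The required figures below are the FP16 minimums plus the 4 GB cushion, and the prices come from the table above; single-GPU training is assumed.

```python
# Pick the cheapest single GPU (from the table above) with enough memory.
REQUIRED_GB = {"Llama 3.1 8B": 20, "Llama 3.1 13B": 30, "Llama 3.1 70B": 164}

# (name, memory in GB, on-demand $/hr on GCP), from the table above
GPUS = [("L4", 24, 0.71), ("A100 40GB", 40, 3.67), ("A100 80GB", 80, 7.35)]

for model, needed in REQUIRED_GB.items():
    candidates = [g for g in GPUS if g[1] >= needed]
    if candidates:
        name, _, price = min(candidates, key=lambda g: g[2])
        print(f"{model}: ~{needed} GB -> {name} at ${price}/hr")
    else:
        print(f"{model}: ~{needed} GB -> no single GPU in the table fits (multi-GPU needed)")
```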
How long does the GPU need to run? This depends on the training duration, which varies with GPU capabilities, batch size, training parameters, and model version. Based on these factors, we can outline the following:
| | Llama 3.1 8B | Llama 3.1 13B |
|---|---|---|
| Precision | FP16 | FP16 |
| Batch size | 1 | 1 |
| GPU | L4 | A100 (40 GB) |
| Training time per epoch | 30-60 min | 20-40 min |
| Training time for 4 epochs | 2-4 hours | 1.5-2.5 hours |
| Cost per training job** | $3 | $10 |
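The last row is just the hourly GCP price multiplied by the estimated training time for 4 epochs, rounded up. A quick check:

```python
# Training job cost = GPU hourly rate x estimated hours for 4 epochs (table values).
jobs = {
    "Llama 3.1 8B on L4": (0.71, (2, 4)),            # ($/hr, (min hours, max hours))
    "Llama 3.1 13B on A100 40GB": (3.67, (1.5, 2.5)),
}

for job, (rate, (low, high)) in jobs.items():
    print(f"{job}: ${rate * low:.2f} - ${rate * high:.2f} per training job")
```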
Neither the table nor the quick check above accounts for setup time (such as package installation, dataset download, or issues encountered while running the training script), but they still give a rough idea of the minimum cost of a single training job.
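For context, here is a minimal, hypothetical sketch of what such a LoRA fine-tuning job could look like with Hugging Face's transformers, peft, and datasets libraries. The hyperparameters, target modules, file name, and prompt format are all illustrative assumptions, and the sketch uses bf16 rather than FP16 to sidestep gradient-scaling issues (the memory footprint is comparable).

```python
# Hypothetical LoRA fine-tuning sketch (pip install torch transformers peft datasets accelerate).
# Hyperparameters, target modules, and data handling are illustrative assumptions.
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base_model = "meta-llama/Meta-Llama-3.1-8B"  # gated repo; requires access approval
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token  # Llama has no pad token by default

model = AutoModelForCausalLM.from_pretrained(
    base_model, torch_dtype=torch.bfloat16, device_map="auto"
)

# Low-rank adapters on the attention projections; only these small matrices are trained.
lora_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                         target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# Assumes the synthetic dataset was saved as JSON lines with instruction/input/output fields.
dataset = load_dataset("json", data_files="synthetic_dataset.jsonl", split="train")

def tokenize(example):
    text = f"{example['instruction']}\n{example['input']}\n{example['output']}"
    return tokenizer(text, truncation=True, max_length=1024)

tokenized = dataset.map(tokenize, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="llama31-lora", num_train_epochs=4,
                           per_device_train_batch_size=1, bf16=True,
                           logging_steps=50, save_strategy="epoch"),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

Only the low-rank adapter matrices are updated during training, which is what keeps the memory and time requirements within the figures discussed above.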
3. Total cost of fine-tuning Llama 3.1
In conclusion, the cost of fine-tuning a model like Llama 3.1 with LoRA can vary widely depending on your choices for the synthetic dataset and on the cloud GPU resources used for training and how long they run. As we saw, the total can range from roughly $13 to $460 for a 10,000-record dataset, depending on your settings.
To give a concrete example, let's assume you use GPT-3.5 to generate a synthetic dataset of 10,000 records, which would cost around $10. You then fine-tune a Llama 3.1 8B model with LoRA on a single L4 GPU, so one training job costs approximately $3. In total, generating the dataset and training the model comes to around $13, plus a $15 cushion to account for setup time or additional training jobs. Therefore, you should budget around 30 USD.
References
https://www.mlexpert.io/blog/alpaca-fine-tuning
https://huggingface.co/blog/llama31
https://llama.meta.com/docs/how-to-guides/fine-tuning/
https://rocm.blogs.amd.com/artificial-intelligence/llama2-lora/README.html
https://getdeploying.com/reference/cloud-gpu
https://platform.openai.com/tokenizer
https://openai.com/api/pricing/
https://aws.amazon.com/bedrock/pricing/
*Costs fetched on August 23, 2024
**Costs using GCP as the provider