
Fine-Tuning: Training The Actor

Fine-tuning explained with a fun, interactive demo of changing a small model's behaviour to write like an unbearable person on LinkedIn.

Labs · Article 1 of 11 · ~14 min read


I fine-tuned a small open model to write like the most insufferable person on LinkedIn - not because LinkedIn needs more humblebrags or unsolicited life lessons, but to make fine-tuning understandable.

Fine-tuning is often described abstractly - talk of changing weights, adapting parameters, training adapters, shifting model behaviour. That is accurate, but hard to envision. Fine-tuning is complex, so I've broken it down into a few fundamental ideas and created an intentionally silly demo to show its impact. We can give the same input to a base model and a fine-tuned model, then compare how their tone, structure, clichés, and pacing diverge.

This is Part 1 of a three-part series. Here I’ll explain what fine-tuning is, what it is not, and where it fits alongside prompting, RAG, MCP, and Skills. Part 2 will cover how I built the fine-tune, and Part 3 will look at how I tested it.

Note

In this article, I use "parameters" broadly to mean the learned numerical values inside the model, including weights. Strictly speaking, not every learned value plays the same role, but this wording aligns with my other pieces on this site. Within the diagrams, where we talk specifically about matrices, I use the word "weights", which is technically correct for the type of parameters we are fine-tuning.

What Fine-Tuning Is (and Isn't)

LLMs are useful by themselves. There are multiple strategies to significantly increase their utility, and I've summarised the differences below. In previous articles we've discussed MCP and RAG. 'Skills' and tool usage offer another avenue, but all of these rely on adding capabilities onto the base model. Fine-tuning works differently: rather than adding external capability at runtime, we adjust the model’s learned behaviour patterns.

Note

This article focusses on what fine-tuning is, and when to use it. You can skip to a demo of my fine-tuned model here.

Fine-tuning is a different (and far more technical) approach to modifying how a model behaves - ingraining this behaviour in the model, rather than just making changes at runtime. It changes the model's learned response patterns. Fine-tuning is not:

  • A replacement for 'knowledge' or 'tooling' capabilities
  • A replacement for good prompting: Fine-tuning solves a slightly different problem, and well-crafted prompts can take you a long way towards getting a model to behave and answer in the way you want.
  • A way of completely 'reprogramming' a model - with fine-tuning you are nudging the outputs and behaviour in a certain direction.

Remember

If a model is an actor, fine-tuning is the rigorous training and coaching plan that prepares them to consistently play a specific role. They are not changing who they are - they don't re-learn (or forget) how to act or command the stage; rather, they are shaped by everything they've been shown about how that character thinks, moves, and speaks.

How Fine-Tuning Fits In

The important question is not just what fine-tuning does, but when it is the right tool. A solid strategy often includes multiple techniques, targeting different aspects of a model's overall capability. The comparison below covers the main techniques.

Fine-Tuning
  • Impacts model: Behaviour, universally.
  • Best used for: Changing repeated model behaviour: tone, structure, classification patterns, extraction style, domain phrasing, or response format. Not usually the best way to add changing factual knowledge.
  • Complexity to design: High. Dataset quality, examples, evaluation, and regression testing matter more than raw volume.
  • Complexity to implement: Medium-High. Hosted fine-tuning can be operationally simple, but evaluation and governance are hard. Self-hosted fine-tuning adds major infrastructure complexity.

RAG
  • Impacts model: Knowledge grounding.
  • Best used for: Grounding answers in a corpus of documents. Example: an airline assistant answering rebooking questions using policy and operations documents.
  • Complexity to design: Medium-High. Requires source material, chunking strategy, metadata design, embedding/retrieval choices, and evaluation criteria.
  • Complexity to implement: Medium-High. Basic RAG is easy to demo; production RAG needs retrieval testing, ranking, safety controls, citations, freshness, and observability.

MCP
  • Impacts model: Tool & system access.
  • Best used for: Giving a model controlled access to tools, APIs, databases, files, or application actions. Example: retrieving booking details, loyalty points, flight schedules, or triggering a workflow.
  • Complexity to design: Medium-High. The protocol is standard, but tool design, permissions, failure modes, and safe action boundaries are hard.
  • Complexity to implement: Medium for simple read-only tools; High for production tools that change state, require auth, or affect customers.

Skills
  • Impacts model: Task procedure & execution.
  • Best used for: Giving the model a reusable procedure for a repeatable task: instructions, examples, constraints, templates, or supporting resources.
  • Complexity to design: Medium. Writing reliable Skills is iterative and depends on clear task boundaries.
  • Complexity to implement: Low-Medium. Simple Skills are easy to add; complex Skills need testing, versioning, and maintenance.

Prompt Engineering
  • Impacts model: Instruction shaping and interpretation.
  • Best used for: Shaping model behaviour through instructions, constraints, examples, formatting rules, and context.
  • Complexity to design: Low-Medium for simple prompts; High when handling variable inputs, edge cases, safety requirements, tool use, or strict output formats.
  • Complexity to implement: Low. Usually trivial to change, but changes can have wide behavioural side effects.

Remember

These aren't mutually exclusive. The demo at the end of this article combines fine-tuning with prompt engineering to dial up or down how LinkedIn-brained the output gets.

An Enterprise Example of Fine-Tuning

Consider an airline customer-service assistant that handles flight disruptions. In this setup, fine-tuning would not teach the model the airline's policies; RAG and tooling would provide current policy and booking data. Fine-tuning would teach the model the service pattern - how to acknowledge disruption, structure options, avoid prohibited promises, and use the airline-approved tone (a sketch of one training record follows this list):

  1. Acknowledge the disruption without admitting legal liability.
  2. Identify the passenger’s itinerary and disruption type.
  3. Check eligibility from retrieved policy.
  4. Offer options in priority order:
    • rebook on next available company-operated flight;
    • offer partner-airline alternatives if allowed;
    • offer refund if required;
    • escalate if special assistance, minors, medical needs, or tight connection.
  5. Never invent compensation.
  6. Never promise hotel/meals unless retrieved policy confirms eligibility.
  7. Use a calm, concise airline-approved tone.
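To make that concrete, here is what one supervised fine-tuning record for this pattern might look like, written as a Python dict in a chat-style format. Every detail (routes, times, policy text) is an illustrative assumption; the bracketed policy text is what RAG would supply at runtime, so the fine-tune never needs to memorise it.

```python
# A hypothetical training record for the airline service pattern.
# All routes, policies, and wording are illustrative assumptions.
example = {
    "messages": [
        {"role": "system",
         "content": "You are the airline's disruption assistant. Follow the "
                    "approved service pattern and tone."},
        {"role": "user",
         "content": "My 18:40 flight to Edinburgh was cancelled. "
                    "[Retrieved policy: rebooking on next company-operated "
                    "flight permitted; partner airlines allowed; no hotel "
                    "eligibility for this fare class.]"},
        {"role": "assistant",
         "content": "I'm sorry your 18:40 flight to Edinburgh was cancelled. "
                    "Here are your options, in order: 1) rebook on our next "
                    "available flight, or 2) travel with a partner airline. "
                    "Based on your fare, hotel accommodation isn't covered. "
                    "If you need special assistance or are travelling with "
                    "children, I can connect you to an agent now."},
    ]
}
```

Hundreds of records in this shape teach the service pattern; the current policy and booking data stay in RAG and tooling, where they can change without retraining.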

Types of Fine-Tuning

For this article, there are two useful types of fine-tuning to understand:

  1. Full fine-tuning: computationally heavy and more invasive. We update all or most of the model's parameters. It is a more sweeping adaptation of the model's behaviour, and can yield superior results to LoRA / adapter fine-tuning. One risk with full fine-tuning is a phenomenon called catastrophic forgetting, where the model starts losing its base abilities - like language comprehension or reasoning - because the fine-tuning has pushed the weights too far and started to tune these abilities away.

  2. LoRA / adapter fine-tuning: We freeze (don't modify) the model's main parameters. Instead we train small matrices that modify specific parts of the model. LoRA stands for Low-Rank Adaptation; at a conceptual level, LoRA learns a low-rank update to selected weight matrices. At inference time, the effective weight is the original frozen weight plus the learned adapter update (a small numeric sketch follows this list). The next section describes the maths and mechanics in a little more detail.

    What makes LoRA useful in practice (and commercially important) is that adapters can be swapped in and out: you can have several LoRA adapters for the same base model, apply them per use case, or use an evaluation harness to compare adapters on the same task.
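To make the "frozen weight plus learned update" idea concrete, here is a minimal NumPy sketch. The sizes (a 4096x4096 weight matrix, rank 16, alpha 32) are illustrative assumptions, not values from any specific model.

```python
import numpy as np

d, r, alpha = 4096, 16, 32  # matrix size, LoRA rank, scaling (illustrative)

W = np.random.randn(d, d).astype(np.float32)         # frozen base weight
A = 0.01 * np.random.randn(r, d).astype(np.float32)  # trainable, r x d
B = np.zeros((d, r), dtype=np.float32)               # trainable, d x r (starts at zero)

# Effective weight at inference: frozen base plus the scaled low-rank update.
W_eff = W + (alpha / r) * (B @ A)

print(f"Full matrix parameters:  {W.size:,}")           # 16,777,216
print(f"LoRA adapter parameters: {A.size + B.size:,}")  # 131,072 (~0.8%)
```

Only A and B are ever trained - roughly 0.8% of the parameters of the matrix they modify - which is where the cost and memory differences in the table below come from.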

Dimension | Full fine-tune | LoRA / adapter fine-tuning
--- | --- | ---
Base model weights | Updated | Frozen
Trainable parameters | Many / all | Small fraction
GPU VRAM needed | High | Much lower
Training cost | High | Lower
Risk of damaging base ability | Higher | Lower
Easy to create many variants | No | Yes
Easy to swap behaviours | No | Yes
Maximum adaptation potential | Higher | Lower / moderate
Best for | Deep adaptation | Efficient task/style adaptation
Deployment artifact | Whole tuned model | Base model + adapter
Common in practice | Yes, but expensive | Very common for efficient tuning

Mechanics of Fine-Tuning

The actor analogy above describes how the fine-tuned model behaves. This section goes a level deeper and describes what actually happens when we fine-tune.

Let's imagine we have a model - say 14B, around 14 billion parameters. It will contain matrices throughout, performing different tasks - embedding, attention, feed-forward layers, and output - likely hundreds of matrices in total.

Full Fine-Tuning

In full fine-tuning, the original model weights are trainable

Imagine a single matrix in the model above consists of 4096x4096 squares (about 16.8M squares), and each one is coloured, representing a numerical value. Full fine-tuning would be analogous to re-painting each of the squares on many of the matrices with a different colour. This can be a drastic change, or a tiny shift.

A full fine-tune requires a large amount of VRAM because the parameters need to be held in memory at a trainable precision - usually FP16, BF16, or FP32. This would require somewhere between ~28GB and ~56GB of VRAM just to hold the parameters - more than many home labs will have. Then you need space for:

  • Gradients
  • Optimiser states
  • Activations
  • Temporary buffers

Then you need to consider what you are training the model on. If you are fine-tuning the model on short contexts like LinkedIn posts or support tickets, the context can be quite small. If you are fine-tuning on 50-page legal documents or code bases, you need significant additional memory for that context. A 14B parameter model (the back-of-envelope sketch after these bullets shows roughly where the numbers come from):

  • Might be usable on a 12GB GPU for inference at 4-bit quantisation.
  • Might require around 80-100GB of GPU VRAM during fine-tuning.
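The arithmetic below is a rough sketch of where those figures come from. The bytes-per-parameter values are assumptions (BF16 weights and gradients, an 8-bit optimiser); real usage varies with the optimiser, batch size, sequence length, and activation checkpointing.

```python
# Rough VRAM estimates for a 14B-parameter model.
# Bytes-per-parameter values are illustrative assumptions.
N_PARAMS = 14e9
GB = 1e9

# Inference: weights only, at different precisions.
print(f"FP32 weights:  {N_PARAMS * 4 / GB:.0f} GB")    # ~56 GB
print(f"BF16 weights:  {N_PARAMS * 2 / GB:.0f} GB")    # ~28 GB
print(f"4-bit weights: {N_PARAMS * 0.5 / GB:.0f} GB")  # ~7 GB - fits a 12GB GPU

# Full fine-tuning: weights + gradients + optimiser state.
# Activations and temporary buffers come on top of this floor.
weight_b, grad_b, opt_b = 2, 2, 2  # BF16, BF16, 8-bit optimiser state
print(f"Training floor: {N_PARAMS * (weight_b + grad_b + opt_b) / GB:.0f} GB")  # ~84 GB
```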

LoRA / Adapter Fine-Tuning

Again, imagine a single matrix in the model above consists of 4096x4096 squares (about 16.8M squares), and each one is coloured, representing a numerical value. This time, using LoRA, we freeze the model’s original weights and train small additional matrices attached to selected parts of the model.

LoRA training works like this (a minimal code sketch follows these points):

  • It repeatedly shows the frozen model examples with known target outputs, and the model predicts an answer.

  • The training process calculates how far the model's answer is from the target, and only the LoRA adapter parameters (the small additional matrices) are adjusted to reduce that error.

  • Over many iterations, the adapter learns a compact set of changes that nudge the frozen model toward the desired behaviour.

  • Using a matrix transformation (multiplying two smaller matrices), LoRA represents a full-size adjustment matrix for the target matrix using a fraction of the memory and computation power required for a full retrain.
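In practice a library handles most of this. Below is a minimal sketch using Hugging Face's peft; the base model name, target modules, and hyperparameters are illustrative assumptions, not the exact setup used for this article's demo.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Base model name is assumed for illustration.
base = AutoModelForCausalLM.from_pretrained("google/gemma-2-2b")

config = LoraConfig(
    r=16,                                 # rank of the low-rank update
    lora_alpha=32,                        # scaling applied to the update
    target_modules=["q_proj", "v_proj"],  # which weight matrices get adapters
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, config)  # freezes base weights, attaches A/B matrices
model.print_trainable_parameters()    # typically well under 1% of the total
```

Training then proceeds as a normal supervised run; the loss is computed as usual, but gradients only update the adapter matrices.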

In LoRA fine-tuning, the original model weights stay frozen, and only the adapter weights are trained

The result is like laying a translucent adjustment sheet over the original coloured matrix. The original matrix remains unchanged, but the overlay shifts the effective values the model uses. The important part is that LoRA does not learn the overlay square by square (i.e. all ~16.8 million). It learns a compressed representation of the overlay using two smaller matrices. This also means that if we want to apply a different LoRA, we can just 'swap the sheet' and immediately observe a different shift in the model's effective values without having to perform another retrain.

At inference time, the model uses the frozen base weights plus the learned adapter update
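Because the base stays frozen, swapping behaviours means loading a different adapter rather than retraining. Here is a minimal sketch, again with peft - the base model name, adapter paths, and adapter names are hypothetical:

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Base model name, adapter paths, and adapter names are hypothetical.
base = AutoModelForCausalLM.from_pretrained("google/gemma-2-2b")

model = PeftModel.from_pretrained(base, "adapters/linkedin-lunatic",
                                  adapter_name="linkedin")
model.load_adapter("adapters/formal-support", adapter_name="support")

model.set_adapter("linkedin")  # effective weights = base + LinkedIn overlay
# ... generate ...
model.set_adapter("support")   # same base, different overlay - no retraining
```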

Note

There are adjacent techniques such as quantisation (QLoRA), pruning, distillation, and adapter merging, but they solve different problems. I am not using them in this demo, so I will keep the focus on LoRA.

Demonstrating Fine-Tuning

I developed a slightly off-the-wall demo for this one - something relatable to everyone. The tool below takes a mundane work-related event (you can type anything in here) and turns it into a cringeworthy LinkedIn post. It outputs both a version written by a fine-tuned model (Gemma4-E2B) and one from a regular version of the same model.

The fine-tuned model was trained on about 300 LinkedIn posts of varying length, all exhibiting a range of LinkedIn tropes and styles, such as:

  • Humblebrags
  • Phrases like 'It's not X, it's Y'
  • Effusive thanks
  • Openers like 'Here is the uncomfortable truth', or 'No one is talking about...'
  • Closing questions or prompts for engagement
  • Emoji & hashtag use

Most of these posts were actually generated by Claude Sonnet based on a small sample of real posts from sources like LinkedIn Lunatics, mainly because gathering and anonymising hundreds of real posts of sufficient quality would have been a huge effort.
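To make the dataset concrete, here is an illustrative training pair in that style - a hypothetical record written for this article, not one taken from the actual dataset:

```python
# An illustrative training pair - hypothetical, not from the real dataset.
pair = {
    "prompt": "Work event: I refilled the office stapler.",
    "completion": (
        "No one is talking about this. 👇\n\n"
        "Today I refilled the stapler. It's not about staples - "
        "it's about showing up for the small things.\n\n"
        "Huge thanks to Karen in Facilities for believing in me. 🙏\n\n"
        "What 'small thing' shaped YOUR leadership journey?\n\n"
        "#Growth #Leadership"
    ),
}
```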

To recap the setup:

  • One model is fine-tuned, the other is baseline Gemma4-E2B.
  • You can configure the prompt text in two ways:
    • What was the work event that happened? (in your own words)
    • Which LinkedIn tropes and patterns should be applied to the output
  • Both models receive exactly the same prompt.
  • Both output simultaneously.

The Demo

Now the fun bit! Type in your prompt (something mundane that is work or life related), and select any tropes or patterns you would like the model to follow. Both models will generate output so you can see the impact of our LoRA fine-tune.

Observations

This is a small model and, probably due to the specific prompting, some of the outputs can be a little nonsensical. This is an experiment rather than a production-ready fine-tune, and I will keep working to improve it. However, if we pay attention to the tone and intent of the posts rather than how they read as a whole, there are some interesting patterns.

The Cringe Factor has the largest overall impact on the model outputs - you can see this changes the prompt quite significantly as you move the slider towards 10.

At a Cringe Factor of 1-2, the two posts sound different, but honestly either could read as a down-to-earth(ish) post you might see on LinkedIn. The key parts of the prompt are:

Transform mundane professional updates into LinkedIn posts. ... Refer to people by first name only, never as @Name. Write in a professional but enthusiastic tone.

By the time we get to 4, the models are diverging - and the base Gemma4-E2B is sounding increasingly unhinged. Here are the parts of the prompt that changed:

Transform mundane professional updates into LinkedIn posts. ... Write in an over-the-top enthusiastic tone. Heavy buzzwords. Multiple emojis. Treat this as a significant milestone and extract an unsolicited life lesson.

As you turn the Cringe Factor higher, the Base Gemma4-E2B model becomes almost 'cartoon-villain-like' in its responses - very over-the-top, often repeating phrases and using a very un-LinkedIn writing style. The fine-tuned model is pretty much sticking to the script and produces some very amusing outputs - a parody of a LinkedIn post. Here is the same part of the prompt at a 10:

Transform mundane professional updates into LinkedIn posts. ... Treat this mundane event as a profound turning point that changed everything. Unsolicited wisdom. Humble brags that aren't humble. Include a dramatic pause. End with a rhetorical question or a call to action.

So you can see, the fine-tuning is taking effect, and increasingly so as we dial up the absurdity we want from our post.

Another interesting observation: the fine-tuned Gemma4-E2B will also apply the tropes ('No one is talking about', 'Here's the uncomfortable truth', etc.) in a more coherent, 'LinkedIn' way, suggesting it has learned the style and how to use those openers from its fine-tuning. The baseline Gemma4-E2B model does use them, but it feels far more clunky in how it applies them.

Conclusion

We now understand what fine-tuning is (and is not), and have seen a demo of what it looks like when it's applied to a model. The example above used a LoRA/Adapter fine-tune. If you have a baseline understanding of how ML models operate, the core concepts of fine-tuning are not difficult to understand - actually performing the fine-tune and testing the outputs is somewhat more complex.

In Part 2, I'll walk through the dataset, the training run, and the decisions that shaped the fine-tune. Part 3 covers evaluation: how do you actually know it worked?
