Experiment Tracking with MLFlow in Canonical’s Data Science Stack
Stefano Fioravanzo
on 3 March 2025
Tags: Data science , machine learning , Ubuntu Desktop
Welcome back, data scientists! In my previous post, we explored how easy it is to set up a machine learning environment with Canonical’s Data Science Stack (DSS) and run your first model using Hugging Face’s Smol Course. Today, let’s take it a step further with experiment tracking. Experimentation is at the heart of data science, and having the right tool to support it can make all the difference. That’s why we bundle MLFlow in DSS – to help you track, compare, and reproduce your experiments effortlessly.
Why Experiment Tracking Matters
When you’re exploring new ideas and fine-tuning models, it can be challenging to keep track of all your experiments manually. Imagine having to remember which hyperparameters led to which results or trying to reproduce an experiment you did weeks ago. MLFlow solves this problem by automatically logging your experiment details – from parameters and metrics to model artifacts – so you can always pick up where you left off.
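As a taste of what that looks like in code, here is a minimal sketch of the core logging calls (the experiment name, values, and file path are placeholders, not part of the fine-tuning example below):
import mlflow

mlflow.set_experiment("quickstart")              # group related runs together
with mlflow.start_run():
    mlflow.log_param("learning_rate", 5e-5)      # a hyperparameter
    mlflow.log_metric("train_loss", 0.42)        # a metric value
    mlflow.log_artifact("notes.txt")             # any local file, logged as an artifact (placeholder path)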
To access MLFlow from your DSS environment, type the following in your terminal:
dss status
Copy and paste the MLFlow URL into your browser, and you’ll be directed to the MLFlow UI. At this point it will probably be empty – we haven’t logged anything yet.
Let’s work on top of our previous fine-tuning example, and see how we can start tracking our training runs with just a few lines of code.
MLFlow in Action
Note: before proceeding, make sure to install the mlflow dependencies (we are restricting the packaging library to avoid conflicts with the Hugging Face Smol Course dependencies):
pip install mlflow
pip install "packaging<23.1"
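Before logging anything, it’s worth confirming that your notebook can reach the DSS MLFlow server. DSS notebooks are typically preconfigured with the tracking URI; the snippet below is a small sketch for checking (and, if needed, setting) it – the URL is a placeholder for the one reported by dss status:
import mlflow

# Print the tracking server the MLFlow client will talk to
print(mlflow.get_tracking_uri())

# If it does not point at the DSS MLFlow server, set it explicitly
# (replace the placeholder with the URL shown by `dss status`)
# mlflow.set_tracking_uri("http://<mlflow-url-from-dss-status>")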
This snippet builds on the code from our previous article, adding experiment tracking capabilities that log key parameters, capture metrics, and store the model artifact.
import mlflow

mlflow.set_experiment("FineTuning")

with mlflow.start_run():
    # Log key configuration parameters
    mlflow.log_param("max_steps", 3)
    mlflow.log_param("batch_size", 4)
    mlflow.log_param("learning_rate", 5e-5)
    mlflow.log_param("logging_steps", 10)
    mlflow.log_param("save_steps", 3)
    mlflow.log_param("eval_steps", 2)
    mlflow.log_param("use_mps_device", device == "mps")
    mlflow.log_param("hub_model_id", finetune_name)

    # Configure the SFTTrainer
    sft_config = SFTConfig(
        output_dir="./sft_output",
        max_steps=3,  # Adjust based on dataset size and desired training duration
        per_device_train_batch_size=4,  # Set according to your GPU memory capacity
        learning_rate=5e-5,  # Common starting point for fine-tuning
        logging_steps=10,  # Frequency of logging training metrics
        save_steps=3,  # Frequency of saving model checkpoints
        evaluation_strategy="steps",  # Evaluate the model at regular intervals
        eval_steps=2,  # Frequency of evaluation
        use_mps_device=(device == "mps"),  # Use the MPS device on Apple Silicon
        hub_model_id=finetune_name,  # Set a unique name for your model
    )

    # Initialize the SFTTrainer
    trainer = SFTTrainer(
        model=model,
        args=sft_config,
        train_dataset=ds["train"],
        tokenizer=tokenizer,
        eval_dataset=ds["test"],
    )

    # Train the model
    trainer.train()

    # Save the model
    trainer.save_model(f"./{finetune_name}")

    # Log the saved model as an MLFlow artifact
    mlflow.pytorch.log_model(model, "fine_tuned_model")
What this enhanced code does
- Parameter logging: It logs key configuration parameters (such as the learning rate, batch size, and number of training steps) so you can track how different settings affect your results.
- Artifact storage: By saving the fine-tuned model as an artifact, you ensure that you always have a record of your work to revisit or share with your team.
- Integrated experiment management: All these details are automatically available in the MLFlow dashboard within DSS, making it a breeze to compare experiments, reproduce results, and refine your approach.
If you run this snippet, you will log your experiment and your model, alongside its parameters, to the MLFlow dashboard.
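The snippet above doesn’t log any training metrics yet. If you want those in the dashboard too, one option is to log the values returned by the trainer – a small sketch, to be placed inside the same with mlflow.start_run(): block, replacing the plain trainer.train() call:
# Train and log the final training loss
train_result = trainer.train()
mlflow.log_metric("train_loss", train_result.training_loss)

# Evaluation metrics can be logged the same way
eval_metrics = trainer.evaluate()
mlflow.log_metrics({k: v for k, v in eval_metrics.items() if isinstance(v, (int, float))})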
If you want to explore several parameter values, or even iterate systematically over specific hyperparameters, you could do the following:
# Define a list of learning rates to experiment with
learning_rates = [5e-5, 3e-5, 1e-5]

for lr in learning_rates:
    with mlflow.start_run():
        # Log the current learning rate
        mlflow.log_param("learning_rate", lr)
        # Add below the same code as above, but parameterize the learning rate with this new variable
In this snippet, we iterate over a list of learning rates to explore how each setting impacts the model. For each learning rate, we start a new MLFlow run to log the experiment parameters, train the model, and save the fine-tuned model. This enables you to later compare the results across different runs.
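Spelled out, the full loop could look like the sketch below, assuming the model, tokenizer, dataset (ds) and finetune_name variables from the earlier snippet are still in scope:
learning_rates = [5e-5, 3e-5, 1e-5]

for lr in learning_rates:
    with mlflow.start_run():
        # Log the current learning rate
        mlflow.log_param("learning_rate", lr)

        # Same configuration as before, with the learning rate parameterized
        sft_config = SFTConfig(
            output_dir=f"./sft_output_lr_{lr}",
            max_steps=3,
            per_device_train_batch_size=4,
            learning_rate=lr,
            logging_steps=10,
            save_steps=3,
            evaluation_strategy="steps",
            eval_steps=2,
            use_mps_device=(device == "mps"),
            hub_model_id=finetune_name,
        )
        trainer = SFTTrainer(
            model=model,
            args=sft_config,
            train_dataset=ds["train"],
            tokenizer=tokenizer,
            eval_dataset=ds["test"],
        )

        # Train and log the resulting model for this run
        trainer.train()
        mlflow.pytorch.log_model(model, "fine_tuned_model")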
After a few training runs with custom parameters, you’ll see something like this in the MLFlow dashboard:

Click on one of the runs and you’ll see that MLFlow saved all the parameters, along with a lot of internal details about the model.
In order to evaluate the trained model, head over to the MLFlow UI, click on the run you want to evaluate, and copy the run ID from the top left (it will look something like be1193d43a1a40c1bc84866b9462dddf). Go back to your notebook and change the Smol Course evaluation code to use MLFlow to retrieve and load the model:
# Load the model logged to MLFlow for the run you copied
run_id = "<YOUR MODEL RUN ID HERE>"
model_uri = f"runs:/{run_id}/fine_tuned_model"
loaded_model = mlflow.pytorch.load_model(model_uri)

# Generate a response with the loaded model
inputs = tokenizer(formatted_prompt, return_tensors="pt").to(device)
outputs = loaded_model.generate(**inputs, max_new_tokens=300)
print("After training:")
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Once loaded, the model generates a response for your prompt, allowing you to directly see the improvements from your fine-tuning. This process not only confirms that your experiments are correctly logged but also makes it easy to compare different runs and choose the best-performing model – all without manually searching through local files.
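If you prefer to pick the best run programmatically rather than from the UI, you can query the tracking server with mlflow.search_runs – a sketch, assuming your runs logged an eval_loss metric (the exact metric names depend on your Trainer setup):
import mlflow

# Find the run with the lowest evaluation loss in the "FineTuning" experiment
runs = mlflow.search_runs(
    experiment_names=["FineTuning"],
    order_by=["metrics.eval_loss ASC"],
    max_results=1,
)
best_run_id = runs.loc[0, "run_id"]

# Load the model that run logged under "fine_tuned_model"
best_model = mlflow.pytorch.load_model(f"runs:/{best_run_id}/fine_tuned_model")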
Conclusion
Integrating MLFlow with Canonical’s Data Science Stack takes your experimentation to the next level. You no longer need to worry about manually keeping track of each run: the entire process is streamlined and automated. This means you can focus more on the creative aspects of model building and less on managing experimental details.
MLFlow is capable of much more than simply tracking your metrics and logging models. Some of the major capabilities MLFlow offers include:
- Advanced Visualization: Get a comprehensive view of your experiments with interactive dashboards.
- Model Registry: Manage different versions of your models for smoother deployment workflows (see the sketch after this list).
- Deployment Pipelines: Seamlessly transition from experimentation to production with built-in deployment support.
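For instance, promoting a fine-tuned model from a run into the Model Registry takes a single call, provided your MLFlow server has a registry backend configured – a sketch, where "smol-finetuned" is a hypothetical registry name and the run ID is the one you copied earlier:
import mlflow

# Register the model logged under "fine_tuned_model" for the given run;
# "smol-finetuned" is a hypothetical name for the registered model
result = mlflow.register_model(
    model_uri="runs:/<YOUR MODEL RUN ID HERE>/fine_tuned_model",
    name="smol-finetuned",
)
print(result.name, result.version)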
Ready to elevate your data science game? Give MLFlow in DSS a try and discover how effortless and powerful experiment tracking can be. Happy experimenting!
Learn more about Canonical’s Data Science Stack.
Watch our on demand webinar to explore how to get your ML environment in 3 commands on Ubuntu.