Xiaobin Zhang
Achieving 1st Place in the VN1 Forecasting Competition with Fine-Tuned Moirai Model

Introduction

In this post, we present a reproducible experiment in which Salesforce’s Moirai-Base model, fine-tuned with minimal adjustments, achieved 1st place in the VN1 Forecasting — Accuracy Challenge. The competition required predicting product sales for the next 13 weeks from 183 weeks of historical data. Remarkably, our solution outperformed all competitors, demonstrating the power of leveraging pre-trained foundation models for time series tasks. Our code and configurations are publicly available in the vn1-competition repository. Below, we detail our approach, implementation, and areas for further improvement.

Competition Results

The table below displays the official competition results, where Moirai-base outperformed all competitors to claim the top position.

| Model | Score |
| --- | --- |
| Moirai-base | 0.4629 |
| 1st | 0.4637 |
| 2nd | 0.4657 |
| 3rd | 0.4758 |
| 4th | 0.4774 |
| 5th | 0.4808 |

Tutorial Overview

In this tutorial, we show how to fine-tune Moirai-Base on the VN1 competition’s weekly product sales dataset and then use it to generate predictions for the evaluation dataset.

How to Reproduce

Here is the infrastructure:

  • Hardware: 4× NVIDIA A800 80GB GPUs
  • Model: Moirai 1.1 R Base
  • OS: Ubuntu 22.04
  • Software: CUDA 12.2, PyTorch Lightning, HuggingFace Datasets

Step 1: Follow the instructions from the uni2ts library to create a virtual environment and install dependencies

Uni2TS is a PyTorch-based library for research and applications related to time series forecasting. It provides a unified framework for large-scale pre-training, fine-tuning, inference, and evaluation of universal time series transformers. The library also includes the implementation of Moirai, a pre-trained time series transformer model that achieves state-of-the-art performance across various forecasting tasks and domains.

Step 2: Download and preprocess the data

The Makefile provides a target for downloading the raw dataset:

```shell
make download_data
```

After downloading the data, update the directory path in prepare_data.py and run it to obtain the preprocessed dataset.
Then, add the directory path of the processed dataset to the .env file:

```shell
echo "CUSTOM_DATA_PATH=PATH_TO_SAVE" >> .env
```

Step 3: Fine-tune the model and evaluate

Replace the variable “pretrained_model_name_or_path” in the configuration file with your own path, then run the following command to fine-tune the Moirai-base model:

```shell
python -m cli.train -cp ../project/vn1-competition/fine-tune run_name=run1
```

Then, replace the weight file path in the main.py file under the src directory and run main.py to evaluate the prediction results.

Deep Dive

Data Processing:

Before the transformations below are applied, low-variance sequences are filtered out so that only data suitable for training is retained. The core data processing steps are implemented in the train_transform_map() function within finetune.py.

```python
PackFields(
    output_field="target",
    fields=("target",),
)
+ PackFields(
    output_field="past_feat_dynamic_real",
    fields=tuple(),
    optional_fields=("past_feat_dynamic_real",),
)
+ AddObservedMask(
    fields=("target",),
    optional_fields=("past_feat_dynamic_real",),
    observed_mask_field="observed_mask",
    collection_type=dict,
)
+ ImputeTimeSeries(
    fields=("target",),
    optional_fields=("past_feat_dynamic_real",),
    imputation_method=DummyValueImputation(value=0.0),
)
+ Patchify(
    max_patch_size=max(self.module.patch_sizes),
    fields=("target", "observed_mask"),
    optional_fields=("past_feat_dynamic_real",),
)
+ AddVariateIndex(
    fields=("target",),
    optional_fields=("past_feat_dynamic_real",),
    variate_id_field="variate_id",
    expected_ndim=3,
    max_dim=self.hparams.max_dim,
    randomize=True,
    collection_type=dict,
)
+ AddTimeIndex(
    fields=("target",),
    optional_fields=("past_feat_dynamic_real",),
    time_id_field="time_id",
    expected_ndim=3,
    collection_type=dict,
)
```

In the data processing code above, the source data will undergo a series of transformations to produce data suitable for model input:

  • Generate the observation mask to mark which data points are valid.
  • Impute missing values in the time series using 0.
  • Split the time series into patches, pad all patches to the maximum patch size, and reshape the dimensions to (var, time, max_patch_size).
  • Encode the variable dimensions and time dimensions in the time series.
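
To make the patching step concrete, here is a minimal NumPy sketch (an illustration only, not the uni2ts implementation) that splits a univariate series into patches and pads them up to the maximum patch size:

```python
import numpy as np

def patchify(series: np.ndarray, patch_size: int, max_patch_size: int) -> np.ndarray:
    """Split a 1-D series into patches of `patch_size`, then pad each patch
    up to `max_patch_size` so all patch sizes share one tensor shape."""
    n_patches = -(-len(series) // patch_size)  # ceiling division
    padded = np.zeros(n_patches * patch_size)
    padded[: len(series)] = series
    patches = padded.reshape(n_patches, patch_size)
    # Pad the patch dimension to max_patch_size.
    out = np.zeros((n_patches, max_patch_size))
    out[:, :patch_size] = patches
    # Add a leading variate axis -> (var, time, max_patch_size).
    return out[None, :, :]

x = np.arange(10, dtype=float)
print(patchify(x, patch_size=4, max_patch_size=8).shape)  # → (1, 3, 8)
```

Padding every patch to the maximum patch size lets series patched at different sizes share one tensor shape in a batch.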

Training Task:

Since the forecasting target is a time series slice of length 13, appropriately selecting the patch size when splitting the time series into patches is critical. The current approach still employs a random selection of patch sizes.
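
For intuition, a 13-step horizon is covered by only one or two forecast patches at typical patch sizes; the candidate sizes below are assumptions meant to mirror Moirai's multi-patch-size setup, so check the model config for the actual set:

```python
import math

horizon = 13  # competition forecast length in weeks
# Assumed candidate patch sizes (illustrative, not the actual config).
for patch_size in (8, 16, 32, 64, 128):
    n_patches = math.ceil(horizon / patch_size)
    print(f"patch_size={patch_size}: {n_patches} forecast patch(es)")
```

Larger patch sizes fold the whole horizon into a single patch, which changes how much of the loss signal falls on the forecast segment.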

The training task does not define fixed context and prediction lengths. Experimental results demonstrate that varying context and prediction lengths can enhance the reliability and accuracy of forecasts. For a given time series, instead of sampling fixed context and prediction lengths, the method involves:

  • Cropping a sampled window whose total length is uniformly sampled from a predefined range.
  • Splitting the window into lookback and forecast segments, where the prediction length is uniformly sampled from a proportional range of [0.1, 0.4], which is better aligned with the forecasting objective than the pretraining task.
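
The two steps above can be sketched as follows (a minimal illustration; the window-length bounds here are assumptions, and the real values live in the fine-tuning config):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_window(series: np.ndarray, min_len: int = 32, max_len: int = 128):
    """Crop a window of uniformly sampled total length, then split it into
    lookback and forecast segments, drawing the prediction length from a
    proportional range of [0.1, 0.4] of the window length."""
    total = int(rng.integers(min_len, min(max_len, len(series)) + 1))
    start = int(rng.integers(0, len(series) - total + 1))
    window = series[start:start + total]
    pred_len = max(1, int(total * rng.uniform(0.1, 0.4)))
    return window[:-pred_len], window[-pred_len:]

context, target = sample_window(np.arange(183, dtype=float))
```

Each training example thus sees a different context/prediction split, which exposes the model to many horizon-to-context ratios rather than one fixed configuration.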

Evaluation:

The competition uses a specialized evaluation metric: the final score sums the total absolute error and the absolute value of the total signed error, then normalizes by the sum of the actual values.

```python
import numpy as np

# `objective` holds the ground-truth values and is defined elsewhere in main.py.
def vn1_competition_evaluation(df):
    abs_err = np.nansum(np.abs(df - objective))  # total absolute error
    err = np.nansum(df - objective)              # total signed error (bias)
    score = (abs_err + abs(err)) / objective.sum().sum()
    return score
```
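
To see why the bias term matters, here is a small worked example with toy numbers (hypothetical values, not competition data): two forecasts with the same total absolute error score differently when one is systematically biased.

```python
import numpy as np

actual = np.array([10.0, 20.0, 30.0])
biased = np.array([12.0, 22.0, 32.0])    # errors +2, +2, +2 all point one way
unbiased = np.array([12.0, 18.0, 32.0])  # errors +2, -2, +2 partially cancel

def score(pred, actual):
    abs_err = np.nansum(np.abs(pred - actual))
    err = np.nansum(pred - actual)
    return (abs_err + abs(err)) / actual.sum()

print(score(biased, actual))    # → (6 + 6) / 60 = 0.2
print(score(unbiased, actual))  # → (6 + 2) / 60 ≈ 0.1333
```

Both forecasts have a total absolute error of 6, but the biased one is penalized twice as hard, so the metric rewards forecasts whose errors cancel out.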

Future Improvements

  • Loss Function Redesign: Replace probabilistic outputs with point predictions tailored to the competition metric.
  • Patch Size Optimization: Align patch sizes with the 13-week prediction horizon.
  • Multiscale Modeling: Process input data at multiple resolutions for richer feature extraction.
  • Sequential Window Sampling: Replace random cropping with rolling windows for full sequence coverage.
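
As a sketch of the last point (an illustration, not the uni2ts API), a rolling-window sampler covers the full sequence deterministically, unlike random cropping:

```python
def rolling_windows(series, window: int, stride: int):
    """Yield consecutive fixed-size windows; unlike random cropping,
    every part of the sequence is guaranteed to be visited."""
    for start in range(0, len(series) - window + 1, stride):
        yield series[start:start + window]

windows = list(rolling_windows(list(range(10)), window=4, stride=2))
# Windows start at 0, 2, 4, 6 -> 4 windows in total
```

A stride smaller than the window length gives overlapping windows, trading more training examples per series for more correlation between them.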

Conclusion

By fine-tuning Moirai-Base with minimal code changes, we achieved state-of-the-art results in the VN1 challenge. This success underscores the potential of foundation models in time series forecasting, even with limited computational resources. Our code and configurations are publicly available, enabling others to build on this work.
