Export Your ML Model in ONNX Format

Source: MachineLearningMastery.com

In this article, you will learn how to export models from PyTorch, scikit-learn, and TensorFlow/Keras to ONNX and compare PyTorch vs. ONNX Runtime inference on CPU for accuracy and speed.

Topics we will cover include:

  • Fine-tuning a ResNet-18 on CIFAR-10 and exporting it to ONNX.
  • Verifying numerical parity and benchmarking CPU latency between PyTorch and ONNX Runtime.
  • Converting scikit-learn and TensorFlow/Keras models to ONNX for portable deployment.

Without further delay, let’s begin.

Introduction

When building machine learning models, training is only half the journey. Deploying those models reliably across different environments is where many projects slow down. This is where ONNX (Open Neural Network Exchange) becomes important. ONNX provides a common, framework-agnostic format that allows models trained in PyTorch, TensorFlow, or scikit-learn to be exported once and run wherever an ONNX-compatible runtime is available.

In this tutorial, we will go step by step through the complete ONNX workflow. We will start by fine-tuning a model and saving the fine-tuned version in native PyTorch format as well as in ONNX format.

Once both versions are ready, we will compare their inference performance on CPU, focusing on two key aspects: accuracy and inference speed. This comparison will help you understand the practical tradeoffs between framework-native models and ONNX-based deployment.

Finally, we will also cover how to convert models trained with scikit-learn and TensorFlow into ONNX format, so you can apply the same deployment approach across different machine learning frameworks.

Exporting Fine-Tuned PyTorch Model To ONNX

In this section, we will fine-tune a ResNet-18 model on the CIFAR-10 dataset for image classification. After training, we will save the fine-tuned model in the normal PyTorch format and also export it into ONNX format. Then we will run both versions on CPU and compare their inference results using accuracy and macro F1 score, along with inference speed.
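To make the two quality metrics concrete, here is a minimal pure-Python sketch of accuracy and macro F1 on a toy label set. The helper names `accuracy` and `macro_f1` are illustrative assumptions; the tutorial itself uses `accuracy_score` and `f1_score(average="macro")` from scikit-learn.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy, imbalanced example: class 1 is rare and half of it is missed
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1]

print(f"accuracy: {accuracy(y_true, y_pred):.4f}")
print(f"macro F1: {macro_f1(y_true, y_pred):.4f}")
```

Because macro averaging weights every class equally, a weak minority class pulls macro F1 below accuracy, which is why it is a useful companion metric.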

Setting Up

First, we install the libraries we need for training, exporting, and benchmarking. We use PyTorch and TorchVision to fine-tune the model, ONNX to store the exported model, and ONNX Runtime to run ONNX inference on CPU.

We also install scikit-learn because it provides simple evaluation metrics like accuracy and F1 score.

!pip install -q torch torchvision onnx onnxruntime scikit-learn
!pip install -q skl2onnx tensorflow tf2onnx protobuf

Finally, we import all the required modules so we can train the model, export it, and measure performance.

import time

import numpy as np
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms, models

import onnx
import onnxruntime as ort
from sklearn.metrics import accuracy_score, f1_score

Loading CIFAR-10 And Building ResNet-18

Now we prepare the dataset and model.

The get_cifar10_loaders function loads CIFAR-10 and returns two DataLoaders: one for training and one for testing. We resize CIFAR-10 images from 32×32 to 224×224 because ResNet-18 is designed for ImageNet-sized inputs.

We also apply ImageNet normalization values so the pretrained ResNet weights work correctly. The training loader includes random horizontal flipping to add basic data augmentation.

def get_cifar10_loaders(batch_size: int = 64):
    """
    Returns train and test DataLoaders for CIFAR-10.
    We resize to 224×224 and use ImageNet normalization so ResNet18 works nicely.
    """
    imagenet_mean = [0.485, 0.456, 0.406]
    imagenet_std = [0.229, 0.224, 0.225]

    # Training pipeline: resize, light augmentation, tensor conversion, normalization
    train_transform = transforms.Compose(
        [
            transforms.Resize((224, 224)),
            transforms.RandomHorizontalFlip(),
            transforms.ToTensor(),
            transforms.Normalize(mean=imagenet_mean, std=imagenet_std),
        ]
    )

    # Test pipeline: same as training but without augmentation
    test_transform = transforms.Compose(
        [
            transforms.Resize((224, 224)),
            transforms.ToTensor(),
            transforms.Normalize(mean=imagenet_mean, std=imagenet_std),
        ]
    )

    train_dataset = datasets.CIFAR10(
        root="./data", train=True, download=True, transform=train_transform
    )
    test_dataset = datasets.CIFAR10(
        root="./data", train=False, download=True, transform=test_transform
    )

    train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True, num_workers=2)
    test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False, num_workers=2)

    return train_loader, test_loader
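As a small illustration of what the normalization step does numerically, the sketch below applies the same per-channel formula, (value - mean) / std, to a single RGB pixel. It assumes the pixel has already been scaled to the [0, 1] range, as `ToTensor()` does; `normalize_pixel` is a hypothetical helper, not part of TorchVision.

```python
IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]


def normalize_pixel(rgb):
    """Apply (value - mean) / std to each channel of one RGB pixel in [0, 1]."""
    return [(v - m) / s for v, m, s in zip(rgb, IMAGENET_MEAN, IMAGENET_STD)]


black = normalize_pixel([0.0, 0.0, 0.0])
white = normalize_pixel([1.0, 1.0, 1.0])

print("black ->", [round(v, 3) for v in black])
print("white ->", [round(v, 3) for v in white])
```

After normalization the inputs span roughly -2.1 to 2.6 instead of 0 to 1, which matches the input distribution the pretrained weights were trained on.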

The build_resnet18_cifar10 function loads a ResNet-18 model with ImageNet pretrained weights and replaces the final fully connected layer. ImageNet has 1000 classes, but CIFAR-10 has 10 classes, so we update the last layer to output 10 logits.

def build_resnet18_cifar10(num_classes: int = 10) -> nn.Module:
    """
    ResNet18 backbone with ImageNet weights, but final layer adapted to CIFAR-10.
    """
    weights = models.ResNet18_Weights.IMAGENET1K_V1
    model = models.resnet18(weights=weights)

    # Replace the 1000-class ImageNet head with a num_classes-way linear layer
    in_features = model.fc.in_features
    model.fc = nn.Linear(in_features, num_classes)
    return model
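To see how small the replaced head is, the following sketch counts the parameters of a fully connected layer (weights plus biases). It relies on the fact that ResNet-18's final layer takes a 512-dimensional feature vector; `linear_param_count` is a hypothetical helper introduced only for this illustration.

```python
def linear_param_count(in_features, out_features, bias=True):
    """Parameters in a fully connected layer: weight matrix plus optional bias vector."""
    return in_features * out_features + (out_features if bias else 0)


RESNET18_FC_IN = 512  # width of ResNet-18's pooled feature vector

imagenet_head = linear_param_count(RESNET18_FC_IN, 1000)
cifar10_head = linear_param_count(RESNET18_FC_IN, 10)

print("ImageNet head parameters:", imagenet_head)
print("CIFAR-10 head parameters:", cifar10_head)
```

Only the new head starts from random weights; the rest of the network keeps its pretrained values, which is why a short fine-tuning run can already reach useful accuracy.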

Quick Fine-Tuning

In this step, we do a small fine-tuning run to make the model adapt to CIFAR-10. This is not meant to be a full training pipeline. It is a fast demo training loop so we can later compare PyTorch inference vs. ONNX inference.

The quick_finetune_cifar10 function trains the model for a limited number of batches. It uses cross-entropy loss because CIFAR-10 is a multi-class classification task. It uses the Adam optimizer for quick learning. The loop runs through batches, performs a forward pass, calculates the loss, runs backpropagation, and updates model weights. At the end, it prints an average training loss so we can see that training happened.

def quick_finetune_cifar10(
    model: nn.Module,
    train_loader: DataLoader,
    device: torch.device,
    max_batches: int = 200,
):
    """
    Very light fine-tuning on CIFAR-10 to make metrics non-trivial.
    Trains for max_batches only (1 pass over subset of train data).
    """
    model.to(device)
    model.train()

    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    running_loss = 0.0
    n_batches = 0  # batches actually processed (the loader may be shorter than max_batches)
    for batch_idx, (images, labels) in enumerate(train_loader):
        if batch_idx >= max_batches:
            break

        images = images.to(device)
        labels = labels.to(device)

        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()
        n_batches += 1

    avg_loss = running_loss / max(n_batches, 1)
    print(f"[Train] Average loss over {n_batches} batches: {avg_loss:.4f}")


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Using device for training:", device)

train_loader, test_loader = get_cifar10_loaders(batch_size=64)
model = build_resnet18_cifar10(num_classes=10)

print("Starting quick fine-tuning on CIFAR-10 (demo)...")
quick_finetune_cifar10(model, train_loader, device, max_batches=200)

# Save weights for reuse (PyTorch + ONNX export)
torch.save(model.state_dict(), "resnet18_cifar10.pth")
print("✅ Saved fine-tuned weights to resnet18_cifar10.pth")

After training, we save the model weights using torch.save(). This creates a .pth file, the conventional extension for a saved PyTorch state_dict, which holds the model's parameters and buffers but not its architecture.

Using device for training: cuda
Starting quick fine-tuning on CIFAR-10 (demo)...
[Train] Average loss over 200 batches: 0.7803
✅ Saved fine-tuned weights to resnet18_cifar10.pth
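One detail worth noting in a capped training loop is the denominator of the reported average. The sketch below, which uses plain numbers in place of real batch losses, averages over the batches actually processed rather than over the cap. `average_loss` is a hypothetical helper for illustration only.

```python
def average_loss(batch_losses, max_batches):
    """Average the first max_batches losses, dividing by the batches actually seen."""
    total = 0.0
    seen = 0
    for batch_idx, loss in enumerate(batch_losses):
        if batch_idx >= max_batches:
            break
        total += loss
        seen += 1
    return total / seen if seen else float("nan")


losses = [1.5, 1.0, 0.5]  # stand-ins for per-batch loss values

capped = average_loss(losses, max_batches=2)      # only the first two batches
uncapped = average_loss(losses, max_batches=200)  # loader shorter than the cap
naive = sum(losses) / 200                         # dividing by the cap understates the loss

print(capped, uncapped, naive)
```

With CIFAR-10 and a batch size of 64 the training loader has 782 batches, so a cap of 200 is always reached and both denominators agree; the distinction only matters for smaller datasets.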

Exporting To ONNX

Now we export the fine-tuned PyTorch model into ONNX format so it can be deployed and executed using ONNX Runtime.

The export_resnet18_cifar10_to_onnx function loads the model architecture again, loads the fine-tuned weights, and switches the model into evaluation mode using model.eval() so inference behaves consistently.

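We also create a dummy input tensor with shape (1, 3, 224, 224). ONNX export needs this dummy input to trace the model graph and understand the input and output shapes. Because the dummy input has a batch size of 1, we pass dynamic_axes to mark the first dimension of the input and output as a variable batch_size, so the exported model accepts batches of any size.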

def export_resnet18_cifar10_to_onnx(
    weights_path: str = "resnet18_cifar10.pth",
    onnx_path: str = "resnet18_cifar10.onnx",
):
    device = torch.device("cpu")  # export on CPU

    model = build_resnet18_cifar10(num_classes=10).to(device)
    model.load_state_dict(torch.load(weights_path, map_location=device))
    model.eval()

    # Dummy input (batch_size=1)
    dummy_input = torch.randn(1, 3, 224, 224, device=device)

    input_names = ["input"]
    output_names = ["logits"]

    # Mark dimension 0 of the input and output as a variable batch size
    dynamic_axes = {
        "input": {0: "batch_size"},
        "logits": {0: "batch_size"},
    }

    torch.onnx.export(
        model,
        dummy_input,
        onnx_path,
        export_params=True,
        opset_version=17,
        do_constant_folding=True,
        input_names=input_names,
        output_names=output_names,
        dynamic_axes=dynamic_axes,
    )
    print(f"✅ Exported ResNet18 (CIFAR-10) to ONNX: {onnx_path}")


export_resnet18_cifar10_to_onnx()

Finally, torch.onnx.export() generates the .onnx file.

✅ Exported ResNet18 (CIFAR-10) to ONNX: resnet18_cifar10.onnx
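The effect of a dynamic batch axis can be pictured with a purely conceptual sketch. This is not ONNX or ONNX Runtime code: `shape_matches` is a hypothetical helper in which an integer stands for a fixed dimension and a string stands for a symbolic one such as "batch_size".

```python
def shape_matches(declared, actual):
    """Check a concrete shape against a declared one.

    In `declared`, an int is a fixed dimension and a str is a symbolic
    (dynamic) dimension that accepts any size.
    """
    if len(declared) != len(actual):
        return False
    return all(isinstance(d, str) or d == a for d, a in zip(declared, actual))


dynamic_input = ("batch_size", 3, 224, 224)  # exported with a dynamic batch axis
fixed_input = (1, 3, 224, 224)               # what a fixed-shape export would declare

print(shape_matches(dynamic_input, (1, 3, 224, 224)))   # single image
print(shape_matches(dynamic_input, (64, 3, 224, 224)))  # full batch
print(shape_matches(dynamic_input, (64, 3, 32, 32)))    # wrong spatial size
print(shape_matches(fixed_input, (64, 3, 224, 224)))    # fixed batch of 1 rejects 64
```

This is why the benchmark later in the tutorial can feed batches of 64 images to a model that was exported from a single dummy image.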

Benchmarking Torch CPU Vs. ONNX Runtime

In this final part, we evaluate both formats side by side. We keep everything on CPU so the comparison is fair.

The following function performs four major tasks:

  1. Load the PyTorch model on CPU.
  2. Load and validate the ONNX model.
  3. Check output similarity on one batch.
  4. Warm up and benchmark inference speed.

Then we run timed inference for a fixed number of batches:

  • We measure the time taken by PyTorch inference on CPU.
  • We measure the time taken by ONNX Runtime inference on CPU.
  • We collect predictions from both and compute accuracy and macro F1 score.

Finally, we print average latency per batch and show an estimated speedup ratio.
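The warm-up-then-time pattern can be sketched in isolation with the standard library. Here `benchmark` and `fake_model` are hypothetical stand-ins for the real inference calls, and unlike the tutorial's function this sketch simply excludes the warm-up batches from timing rather than iterating the loader a second time.

```python
import time


def benchmark(fn, inputs, warmup=2):
    """Call fn on every input; time all calls except the first `warmup` ones.

    Returns (average seconds per timed call, number of timed calls).
    """
    times = []
    for i, x in enumerate(inputs):
        if i < warmup:
            fn(x)  # warm-up call: caches, thread pools, lazy initialization
            continue
        start = time.perf_counter()
        fn(x)
        times.append(time.perf_counter() - start)
    avg = sum(times) / len(times) if times else float("nan")
    return avg, len(times)


def fake_model(batch):
    """Stand-in for a model forward pass."""
    return [v * 2 for v in batch]


batches = [list(range(1000)) for _ in range(12)]
avg_seconds, n_timed = benchmark(fake_model, batches, warmup=2)
print(f"avg latency over {n_timed} batches: {avg_seconds * 1000:.3f} ms / batch")
```

`time.perf_counter()` is used because it is a monotonic, high-resolution clock intended for measuring short durations.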

def verify_and_benchmark(
    weights_path: str = "resnet18_cifar10.pth",
    onnx_path: str = "resnet18_cifar10.onnx",
    batch_size: int = 64,
    warmup_batches: int = 2,
    max_batches: int = 30,
):
    device = torch.device("cpu")  # fair CPU vs CPU comparison
    print("Using device for evaluation:", device)

    # 1) Load PyTorch model
    torch_model = build_resnet18_cifar10(num_classes=10).to(device)
    torch_model.load_state_dict(torch.load(weights_path, map_location=device))
    torch_model.eval()

    # 2) Load ONNX model and create session
    onnx_model = onnx.load(onnx_path)
    onnx.checker.check_model(onnx_model)
    print("✅ ONNX model is well-formed.")

    ort_session = ort.InferenceSession(onnx_path, providers=["CPUExecutionProvider"])
    print("ONNXRuntime providers:", ort_session.get_providers())

    # 3) Data loader (test set)
    _, test_loader = get_cifar10_loaders(batch_size=batch_size)

    # -------------------------
    # A) Numeric closeness check on a single batch
    # -------------------------
    images, labels = next(iter(test_loader))
    images = images.to(device)
    labels = labels.to(device)

    with torch.no_grad():
        torch_logits = torch_model(images).cpu().numpy()

    ort_inputs = {"input": images.cpu().numpy().astype(np.float32)}
    ort_logits = ort_session.run(["logits"], ort_inputs)[0]

    abs_diff = np.abs(torch_logits - ort_logits)
    max_abs = abs_diff.max()
    mean_abs = abs_diff.mean()
    print(f"Max abs diff:  {max_abs:.6e}")
    print(f"Mean abs diff: {mean_abs:.6e}")

    # Relaxed tolerance to account for small numerical noise
    np.testing.assert_allclose(torch_logits, ort_logits, rtol=1e-02, atol=1e-04)
    print("✅ Outputs match closely between PyTorch and ONNXRuntime within relaxed tolerance.")

    # -------------------------
    # B) Warmup runs (on a couple of batches, not recorded)
    # -------------------------
    print(f"\nWarming up on {warmup_batches} batches (not timed)...")
    warmup_iter = iter(test_loader)
    for _ in range(warmup_batches):
        try:
            imgs_w, _ = next(warmup_iter)
        except StopIteration:
            break
        imgs_w = imgs_w.to(device)
        with torch.no_grad():
            _ = torch_model(imgs_w)
        _ = ort_session.run(["logits"], {"input": imgs_w.cpu().numpy().astype(np.float32)})

    # -------------------------
    # C) Timed runs + metric collection
    # -------------------------
    print(f"\nRunning timed evaluation on up to {max_batches} batches...")

    all_labels = []
    torch_all_preds = []
    onnx_all_preds = []
    torch_times = []
    onnx_times = []
    n_batches = 0

    for batch_idx, (images, labels) in enumerate(test_loader):
        if batch_idx >= max_batches:
            break
        n_batches += 1

        images = images.to(device)
        labels = labels.to(device)

        # Time PyTorch
        start = time.perf_counter()
        with torch.no_grad():
            torch_out = torch_model(images)
        end = time.perf_counter()
        torch_times.append(end - start)

        # Time ONNX (input conversion happens outside the timed region)
        ort_inp = {"input": images.cpu().numpy().astype(np.float32)}
        start = time.perf_counter()
        ort_out = ort_session.run(["logits"], ort_inp)[0]
        end = time.perf_counter()
        onnx_times.append(end - start)

        # Predictions
        torch_pred_batch = torch_out.argmax(dim=1).cpu().numpy()
        onnx_pred_batch = ort_out.argmax(axis=1)
        labels_np = labels.cpu().numpy()

        all_labels.append(labels_np)
        torch_all_preds.append(torch_pred_batch)
        onnx_all_preds.append(onnx_pred_batch)

    if n_batches == 0:
        print("No batches processed for evaluation. Check max_batches / dataloader.")
        return

    # Concatenate across batches
    all_labels = np.concatenate(all_labels, axis=0)
    torch_all_preds = np.concatenate(torch_all_preds, axis=0)
    onnx_all_preds = np.concatenate(onnx_all_preds, axis=0)

    # -------------------------
    # D) Metrics: accuracy & F1 (macro)
    # -------------------------
    torch_acc = accuracy_score(all_labels, torch_all_preds) * 100.0
    onnx_acc = accuracy_score(all_labels, onnx_all_preds) * 100.0
    torch_f1 = f1_score(all_labels, torch_all_preds, average="macro") * 100.0
    onnx_f1 = f1_score(all_labels, onnx_all_preds, average="macro") * 100.0

    print("\n📊 Evaluation metrics on timed subset")
    print(f"PyTorch  - accuracy: {torch_acc:.2f}%   F1 (macro): {torch_f1:.2f}%")
    print(f"ONNX     - accuracy: {onnx_acc:.2f}%   F1 (macro): {onnx_f1:.2f}%")

    # -------------------------
    # E) Latency summary
    # -------------------------
    avg_torch = sum(torch_times) / len(torch_times)
    avg_onnx = sum(onnx_times) / len(onnx_times)

    print(f"\n⏱  Latency over {len(torch_times)} batches (batch size = {batch_size})")
    print(f"PyTorch avg:      {avg_torch * 1000:.2f} ms / batch")
    print(f"ONNXRuntime avg:  {avg_onnx * 1000:.2f} ms / batch")

    if avg_onnx > 0:
        print(f"Estimated speedup (Torch / ORT): {avg_torch / avg_onnx:.2f}x")
    else:
        print("Estimated speedup: N/A (onnx time is 0?)")


verify_and_benchmark(
    weights_path="resnet18_cifar10.pth",
    onnx_path="resnet18_cifar10.onnx",
    batch_size=64,
    warmup_batches=2,
    max_batches=30,
)

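Running the function prints a detailed report. In this run, accuracy and macro F1 are identical for both formats, while ONNX Runtime averages about 1.32 s per batch against about 2.19 s for PyTorch, a speedup of roughly 1.66x on CPU. Exact timings will vary with hardware and library versions.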

Using device for evaluation: cpu
✅ ONNX model is well-formed.
ONNXRuntime providers: ['CPUExecutionProvider']
Max abs diff:  3.814697e-06
Mean abs diff: 4.552072e-07
✅ Outputs match closely between PyTorch and ONNXRuntime within relaxed tolerance.

Warming up on 2 batches (not timed)...

Running timed evaluation on up to 30 batches...

📊 Evaluation metrics on timed subset
PyTorch  - accuracy: 78.18%   F1 (macro): 77.81%
ONNX     - accuracy: 78.18%   F1 (macro): 77.81%

⏱  Latency over 30 batches (batch size = 64)
PyTorch avg:      2192.50 ms / batch
ONNXRuntime avg:  1317.09 ms / batch
Estimated speedup (Torch / ORT): 1.66x
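The figures in this report can be re-derived by hand. The sketch below applies the elementwise criterion that np.testing.assert_allclose uses, |actual - desired| <= atol + rtol * |desired|, to the reported maximum difference, and recomputes the speedup and per-image latency from the reported averages. The helper `within_tolerance` and the variable names are assumptions for illustration; the numbers are copied from the output above.

```python
def within_tolerance(actual, desired, rtol, atol):
    """Elementwise closeness criterion used by np.testing.assert_allclose."""
    return abs(actual - desired) <= atol + rtol * abs(desired)


# Parity: values reported by the run
max_abs_diff = 3.814697e-06
rtol, atol = 1e-02, 1e-04

# Worst case is a logit of exactly 0, where only atol applies
worst_case_ok = within_tolerance(max_abs_diff, 0.0, rtol, atol)
print("largest difference passes even at a zero logit:", worst_case_ok)

# Latency: values reported by the run
torch_ms, ort_ms, batch_size, n_batches = 2192.50, 1317.09, 64, 30

speedup = torch_ms / ort_ms
print(f"speedup: {speedup:.2f}x")
print(f"PyTorch:      {torch_ms / batch_size:.2f} ms / image")
print(f"ONNX Runtime: {ort_ms / batch_size:.2f} ms / image")
print("images evaluated:", batch_size * n_batches)
```

The observed difference is more than an order of magnitude below the absolute tolerance, so the check would still pass with a considerably stricter setting.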

Exporting Scikit-Learn And Keras Models To ONNX

In this section, we show how ONNX can also be used beyond deep learning frameworks like PyTorch. We will export a traditional scikit-learn model and a TensorFlow/Keras neural network into ONNX format. This demonstrates how ONNX acts as a common deployment layer across classical machine learning and deep learning models.

Exporting A Scikit-Learn Model To ONNX

We will now train a simple Random Forest classifier on the Iris dataset using scikit-learn and then export it to ONNX format for deployment.

Before conversion, we explicitly define the ONNX input type, including the input name, floating-point data type, dynamic batch size, and the correct number of input features. The converter uses this declaration to define the typed input of the ONNX graph.

We then convert the trained model, save the resulting .onnx file, and finally validate it to ensure the exported model is well-formed and ready for inference with ONNX Runtime.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
import onnx

# 1) Train a small sklearn model
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

rf = RandomForestClassifier(n_estimators=50, random_state=42)
rf.fit(X_train, y_train)
print("✅ Trained RandomForestClassifier on Iris")

# 2) Define input type for ONNX (batch_size x n_features); None = dynamic batch size
n_features = X_train.shape[1]
initial_type = [("input", FloatTensorType([None, n_features]))]

# 3) Convert to ONNX
rf_onnx = convert_sklearn(rf, initial_types=initial_type, target_opset=17)

onnx_path_sklearn = "random_forest_iris.onnx"
with open(onnx_path_sklearn, "wb") as f:
    f.write(rf_onnx.SerializeToString())

# 4) Quick sanity check
onnx.checker.check_model(onnx.load(onnx_path_sklearn))
print(f"✅ Exported sklearn model to {onnx_path_sklearn}")

Our model is now trained, converted, saved, and validated.

✅ Trained RandomForestClassifier on Iris
✅ Exported sklearn model to random_forest_iris.onnx
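A practical consequence of declaring the input as FloatTensorType is that the ONNX model takes 32-bit floats, whereas NumPy arrays such as iris.data default to 64-bit, so inputs typically need a cast (for example with .astype(np.float32)) before they are passed to ONNX Runtime. The standard-library sketch below shows how little a value changes in that cast; `to_float32` is a hypothetical helper and the sample is the first Iris measurement row.

```python
import struct


def to_float32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float value."""
    return struct.unpack("f", struct.pack("f", x))[0]


sample = [5.1, 3.5, 1.4, 0.2]  # sepal length/width, petal length/width in cm
sample32 = [to_float32(v) for v in sample]

for v, v32 in zip(sample, sample32):
    print(f"{v!r} -> {v32!r}  (abs diff {abs(v - v32):.2e})")
```

The differences are on the order of 1e-7 or smaller, which is negligible for most inputs but worth remembering when comparing scikit-learn and ONNX Runtime predictions value by value.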

Exporting A TensorFlow/Keras Model To ONNX

We will now export a TensorFlow neural network to ONNX format to demonstrate how deep learning models trained with TensorFlow can be prepared for portable deployment.

The environment is configured to run on CPU with minimal logging to keep the process clean and reproducible. A simple fully connected Keras model is built using the Functional API, with a fixed input size and a small number of layers to keep the conversion straightforward.

An input signature is then defined so ONNX knows the expected input shape, data type, and tensor name at inference time. Using this information, the Keras model is converted into ONNX format and saved as a .onnx file.

Finally, the exported model is validated to ensure it is well-formed and ready to be executed using ONNX Runtime or any other ONNX-compatible inference engine.

import os

# 1) Reduce TensorFlow logging and hide GPUs (must be set before importing tensorflow)
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

import tensorflow as tf
import tf2onnx
import onnx

# 2) Build a simple Keras model
inputs = tf.keras.Input(shape=(32,), name="input")
x = tf.keras.layers.Dense(64, activation="relu")(inputs)
x = tf.keras.layers.Dense(32, activation="relu")(x)
outputs = tf.keras.layers.Dense(10, activation="softmax", name="output")(x)
keras_model = tf.keras.Model(inputs=inputs, outputs=outputs)
keras_model.summary()

# 3) Convert to ONNX, describing the input's shape, dtype, and name
spec = (
    tf.TensorSpec(
        keras_model.inputs[0].shape,
        keras_model.inputs[0].dtype,
        name="input",
    ),
)

onnx_model_keras, _ = tf2onnx.convert.from_keras(
    keras_model,
    input_signature=spec,
    opset=17,
)

# 4) Save and validate
onnx_path_keras = "keras_mlp.onnx"
with open(onnx_path_keras, "wb") as f:
    f.write(onnx_model_keras.SerializeToString())

onnx.checker.check_model(onnx.load(onnx_path_keras))
print(f"✅ Exported Keras/TensorFlow model to {onnx_path_keras}")

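Our Keras model is now built, converted, saved, and validated. Note that this example skips training, so the exported weights are the random initial values; in practice you would call fit() before converting.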

Model: "functional_4"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                    ┃ Output Shape           ┃       Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ input (InputLayer)              │ (None, 32)             │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_8 (Dense)                 │ (None, 64)             │         2,112 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_9 (Dense)                 │ (None, 32)             │         2,080 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ output (Dense)                  │ (None, 10)             │           330 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
 Total params: 4,522 (17.66 KB)
 Trainable params: 4,522 (17.66 KB)
 Non-trainable params: 0 (0.00 B)
✅ Exported Keras/TensorFlow model to keras_mlp.onnx
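The parameter counts in this summary can be checked by hand: a Dense layer with n_in inputs and n_out units has n_in * n_out weights plus n_out biases. The short sketch below reproduces the table's numbers, assuming 4 bytes per parameter (float32); `dense_params` is a hypothetical helper.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected (Dense) layer."""
    return n_in * n_out + n_out


layer_sizes = [32, 64, 32, 10]  # input width followed by each Dense layer's units

per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)
size_kb = total * 4 / 1024  # float32 parameters, 4 bytes each

print(per_layer)            # [2112, 2080, 330]
print(total)                # 4522
print(f"{size_kb:.2f} KB")  # 17.66 KB
```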

Final Thoughts

ONNX provides a practical bridge between model training and real-world deployment by making machine learning models portable, framework-independent, and easier to optimize for inference.

By fine-tuning a PyTorch model, exporting it to ONNX, and comparing accuracy and CPU inference speed, we saw that ONNX can deliver the same predictive quality with improved performance.

It simplifies the path from experimentation to production and reduces friction when deploying models across different environments.

With this level of portability, performance, and consistency, it is worth asking: is there any reason left not to use ONNX in your machine learning projects?
