# Comparing Models

{% embed url="https://youtu.be/QNoo4b1YdWY" %}

Often you will have inferences from multiple models on the same dataset - most commonly when you have trained a new model and want to compare it against an old model on a validation dataset.

Aquarium offers functionality to compare inferences from multiple models on the same dataset to see how your high-level metrics changed. You can also drill down into the exact circumstances where the inferences differ - where one model performs better or worse compared to the other.

## Uploading Multiple Inferences

To compare models, you will need to upload multiple sets of **inferences** against the same project and base **dataset** using the Python client API.

For example, uploading a labeled base dataset might look like the following:

```bash
python3 upload_to_aquarium.py \
    --aquarium-project example_project \
    --aquarium-dataset example_dataset
```

Now we may want to upload a set of inferences from model A on the base dataset:

```bash
python3 upload_to_aquarium.py \
    --aquarium-project example_project \
    --aquarium-dataset example_dataset \
    --inferences-id model_a
```

And then uploading another set of inferences from model B on the same base dataset:

```bash
python3 upload_to_aquarium.py \
    --aquarium-project example_project \
    --aquarium-dataset example_dataset \
    --inferences-id model_b
```

## Searching Across Multiple Models

Once you have multiple sets of inferences for the same dataset, you can add a second set by clicking the "+" icon in the top query bar and choosing another inference set from the dropdown. In this document, we'll refer to these as the "base inferences" and "other inferences" respectively.

![](https://391596125-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MI6IGz_3V5m6p1UIhXm%2F-MIuk6CG1HT-MpObUB5Q%2F-MIun5ZB9VENa0rB7zQg%2Fimage.png?alt=media\&token=640400b0-c646-4730-8ebb-ac36957a0104)

![](https://391596125-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MI6IGz_3V5m6p1UIhXm%2F-MIuk6CG1HT-MpObUB5Q%2F-MIunD6gKcKj3cRTnb8L%2Fimage.png?alt=media\&token=997f570c-fb31-466e-b23a-dd57fb9ccf36)

With a second set of inferences selected, you can now write queries that reference the labels in the labeled dataset, the base inference set, and the other inference set:

![](https://391596125-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MI6IGz_3V5m6p1UIhXm%2F-MIuk6CG1HT-MpObUB5Q%2F-MIunj7Q74NVjM-nL74A%2Fimage.png?alt=media\&token=5d35a598-3906-4d20-9c02-60d52eddb82a)

## Multiple Inference Rendering

### Object Detection

When comparing multiple sets of inferences, the label viewer will draw inferences with an additional diagonal hatching pattern: bottom-left to top-right for the base inference set, and top-left to bottom-right for the other inference set.

![](https://391596125-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MI6IGz_3V5m6p1UIhXm%2F-MIuk6CG1HT-MpObUB5Q%2F-MIupuqWgQroGAelQu55%2Fimage.png?alt=media\&token=2fbc562c-7328-48ab-ba8d-6dd676292173)

### Semantic Segmentation

When viewing multiple inferences for a semantic segmentation task, we present a separate view that captures pixel-wise changes.

![](https://391596125-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MI6IGz_3V5m6p1UIhXm%2F-MIuk6CG1HT-MpObUB5Q%2F-MIuqjynEp1ly5ql8jiJ%2Fimage.png?alt=media\&token=aad1c151-970b-4ca5-b81c-7dd6d77794b9)

On the left, we have the three masks rendered on top of the image. On the right, we have a diff overlay that shows the overall correctness of the two inference sets, as well as where they improved or worsened. The divider between the sections can be dragged to make individual images larger or smaller:

![](https://391596125-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MI6IGz_3V5m6p1UIhXm%2F-MIuk6CG1HT-MpObUB5Q%2F-MIurgP9UTI6TpziE2s8%2Fimage.png?alt=media\&token=68c6b1da-2ba7-4878-aba5-a14240bbd908)

The diff map is colored according to the following scheme:

* Blue: Both inference sets agree with the labels.
* Yellow: Both inference sets disagree with the labels.
* Green: The base inference set disagrees with the labels, and the other inference set agrees.
* Red: The base inference set agrees with the labels, and the other inference set disagrees.
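The per-pixel coloring above can be sketched in a few lines. This is an illustrative reimplementation of the rule, not Aquarium's actual code; the class names and pixel values are made up:

```python
# Classify each pixel by comparing the label against both inference sets,
# following the blue/yellow/green/red scheme described above.
def diff_color(label, base_pred, other_pred):
    base_ok = base_pred == label
    other_ok = other_pred == label
    if base_ok and other_ok:
        return "blue"    # both inference sets agree with the label
    if not base_ok and not other_ok:
        return "yellow"  # both inference sets disagree with the label
    if other_ok:
        return "green"   # the other inference set fixed this pixel
    return "red"         # the other inference set regressed this pixel

# Hypothetical per-pixel classes for a 4-pixel strip.
labels      = ["road", "car", "car", "sky"]
base_preds  = ["road", "sky", "car", "sky"]
other_preds = ["road", "car", "sky", "sky"]

diff = [diff_color(l, b, o) for l, b, o in zip(labels, base_preds, other_preds)]
print(diff)  # ['blue', 'green', 'red', 'blue']
```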

Each color in the legend can be toggled to show only a subset of the diffs:

![](https://391596125-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MI6IGz_3V5m6p1UIhXm%2F-MIuk6CG1HT-MpObUB5Q%2F-MIuucxg4n-asE5xvW_0%2Fimage.png?alt=media\&token=26a091d7-1143-454c-bfea-d54ccbbd4e3b)

Below the images, we also have a classification report and confusion matrix for the image, which captures the relative performance change from the base inference set to the other inference set. Cells that improved will be colored a shade of green, while those that worsened will be colored a shade of red.

![](https://391596125-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MI6IGz_3V5m6p1UIhXm%2F-MIuk6CG1HT-MpObUB5Q%2F-MIuv1I3udPJxcxkFkET%2Fimage.png?alt=media\&token=67bb9394-9325-462b-a576-117114cfa76b)
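The relative comparison boils down to a cell-by-cell subtraction of the two confusion matrices. A minimal sketch, with made-up counts (the sign of each delta determines the green/red shading):

```python
# Confusion matrices as {(label_class, predicted_class): count} dicts.
# These counts are hypothetical.
base_cm  = {("cyclist", "cyclist"): 80, ("cyclist", "pedestrian"): 20}
other_cm = {("cyclist", "cyclist"): 95, ("cyclist", "pedestrian"): 5}

# Relative change from the base inference set to the other inference set.
delta = {cell: other_cm.get(cell, 0) - base_cm.get(cell, 0)
         for cell in set(base_cm) | set(other_cm)}

# Off-diagonal cells that decreased (and diagonal cells that increased)
# would be shaded green; the opposite directions would be shaded red.
print(delta[("cyclist", "pedestrian")])  # -15: fewer cyclists confused for pedestrians
```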

Clicking on a cell of the confusion matrix will further filter the diff view to just the pixels with that kind of confusion.

## Metrics Comparison

### Object Detection

When comparing multiple inferences, the metrics view switches to rendering the relative difference in metrics from the base inference set to the other inference set:

![](https://391596125-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MI6IGz_3V5m6p1UIhXm%2F-MIuk6CG1HT-MpObUB5Q%2F-MIv0b578pVdMcDv7Z3q%2Fimage.png?alt=media\&token=51c39b49-cafb-4d39-9e07-e0e6c9f5aa1d)

Clicking on a cell of the confusion matrix will allow you to view samples where there was a change in that confusion between the two inference sets. For example, clicking on the cyclist -> pedestrian confusion cell will show us examples where the base inference set confused a cyclist for a pedestrian, and the other inference set fixed that confusion:

![](https://391596125-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MI6IGz_3V5m6p1UIhXm%2F-MIuk6CG1HT-MpObUB5Q%2F-MIv1OZist7cIgKyYIbg%2Fimage.png?alt=media\&token=3244d9cd-843b-499e-8245-07ff3e787c11)
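Conceptually, the "fixed confusion" view filters to samples where the base prediction was the confused class and the other prediction matches the label. A sketch under that assumption, using hypothetical frame IDs in place of Aquarium's internal label-to-inference matching:

```python
# Each tuple is (frame_id, label_class, base_prediction, other_prediction).
matches = [
    ("frame_1", "cyclist", "pedestrian", "cyclist"),     # fixed by the new model
    ("frame_2", "cyclist", "pedestrian", "pedestrian"),  # still confused
    ("frame_3", "cyclist", "cyclist", "cyclist"),        # never confused
]

# Samples where the base set confused a cyclist for a pedestrian
# and the other set got it right.
fixed = [fid for fid, label, base, other in matches
         if label == "cyclist" and base == "pedestrian" and other == "cyclist"]
print(fixed)  # ['frame_1']
```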

Similarly, we can use the "New Confusion" tab to find samples where a confusion of that type was newly introduced by the other inference set.

### Semantic Segmentation & User Provided Metrics

For metrics that apply to the entire image, such as semantic segmentation confusion counts or user-provided metrics, clicking on a cell of the confusion matrix will compute the per-image change for that confusion type. You can then sort the returned images by the change in the per-image confusion count.

![](https://391596125-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MI6IGz_3V5m6p1UIhXm%2F-MIuk6CG1HT-MpObUB5Q%2F-MIv3NYaZNTSo_VH7NbR%2Fimage.png?alt=media\&token=c0cd2105-60ce-4d61-9caf-f826214ddeca)
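The per-image sort described above amounts to ranking images by the signed change in the selected confusion count. A sketch with made-up counts:

```python
# Per-image confusion counts for one confusion-matrix cell, for each
# inference set. Image IDs and counts are hypothetical.
base_counts  = {"img_a": 120, "img_b": 40, "img_c": 300}
other_counts = {"img_a": 60,  "img b": 45, "img_c": 310}.copy()
other_counts = {"img_a": 60,  "img_b": 45, "img_c": 310}

# Change from the base set to the other set for each image.
change = {img: other_counts[img] - base_counts[img] for img in base_counts}

# Most-improved first: largest decrease in the confusion count.
ranked = sorted(change, key=change.get)
print(ranked)  # ['img_a', 'img_b', 'img_c']
```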
