Viewing Your Dataset

Overview of the different views within Aquarium

Selecting Data to Review

Once you've uploaded a dataset through the Aquarium API, you can begin to explore and understand what's in your dataset.

To begin, select your project from the Projects page. Then click the Explore button to select a dataset and/or inference pairing to start reviewing.

Aquarium pairs labeled data with its corresponding inference data, so each labeled dataset you upload has its own Explore button.

Grid View

The first and default view for dataset exploration is the Grid View. This view lets you quickly look through your dataset and understand what it "looks like" at a glance. Labels, if available, are overlaid on the underlying data.

You can use the "Display Settings" button to toggle settings like label transparency, number of datapoints displayed per row, etc.

Frame View

Click an individual datapoint to open the Frame View, which shows more detailed information such as the datapoint's metadata (timestamp, device ID, etc.) and its labels.

The Frame View also has a "Similar Data Samples" tab that you can use to find examples similar to the datapoint you are currently looking at. This makes it easy to analyze patterns you find interesting while reviewing your data.

Analysis View

The second view for dataset understanding is the Analysis View, which shows histograms of your dataset's metadata.

You often want to view the distribution of your metadata across your dataset. This is particularly useful for understanding if you have an even spread of classes, times of day, etc. in your dataset. Simply click the dropdown, select a metadata field, and you can see a histogram of the distribution of that value across the dataset.
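As a rough illustration of what such a histogram summarizes, the sketch below counts how often each value of a metadata field appears. It is not Aquarium code: the `frames` list, the `time_of_day` field, and the `metadata_histogram` helper are hypothetical names chosen for this example.

```python
from collections import Counter


def metadata_histogram(frames, field):
    """Count how many frames take each value of a metadata field.

    Frames that lack the field are grouped under "<missing>".
    """
    return Counter(frame.get(field, "<missing>") for frame in frames)


# Hypothetical frames; the field names are illustrative only.
frames = [
    {"frame_id": "a", "time_of_day": "day"},
    {"frame_id": "b", "time_of_day": "night"},
    {"frame_id": "c", "time_of_day": "day"},
    {"frame_id": "d"},
]

histogram = metadata_histogram(frames, "time_of_day")

# Print a simple text histogram, most common value first.
for value, count in histogram.most_common():
    print(f"{value:>10} {'#' * count} ({count})")
```

A heavily skewed count, such as far more "day" frames than "night" frames, is the kind of imbalance this view is meant to surface.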

Embedding View

The third view for dataset understanding is the Embedding View.

The previous methods of data exploration rely heavily on metadata to find interesting parts of your dataset to look at. However, metadata does not always capture the important types of variation in your dataset. We can use neural network embeddings to index the raw data in your dataset to better understand its distribution.

The embedding view plots variation in the raw underlying data. Each point in the chart represents a single datapoint. In the Image view, each point is a whole image or "row" in your dataset. In the Crop view, each point represents an individual label or inference object that is in a part of the image.

The closer points are to each other, the more similar they are. The farther apart they are, the more different. Using the embedding view, you can understand the types of differences in your raw data, find clusters of similar datapoints, and examine outlier datapoints.
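To make "closer means more similar" concrete, here is a minimal sketch that ranks datapoints by distance to a query embedding. It assumes Euclidean distance and tiny two-dimensional vectors purely for readability; the distance metric and embedding sizes Aquarium actually uses are not stated here, and `embeddings`, `euclidean`, and `nearest_neighbors` are hypothetical names.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbors(query_id, embeddings, k=2):
    """Return the ids of the k datapoints closest to the query datapoint."""
    query = embeddings[query_id]
    distances = [
        (euclidean(query, vector), other_id)
        for other_id, vector in embeddings.items()
        if other_id != query_id
    ]
    return [other_id for _, other_id in sorted(distances)[:k]]


# Hypothetical embeddings: three nearby points and one outlier.
embeddings = {
    "img_1": [0.0, 0.0],
    "img_2": [0.1, 0.0],
    "img_3": [0.0, 0.2],
    "img_4": [5.0, 5.0],
}

neighbors = nearest_neighbors("img_1", embeddings, k=2)
print(neighbors)
```

In this toy data, `img_4` sits far from every other point, which is how an outlier would appear in the plot.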

You can also color the embedding points with metadata to understand the distribution of metadata relative to the distribution of the raw data.

To select a group of points for visualization, hold Shift, then click and drag to draw a lasso around them. You can then scroll through individual examples in the detail panel with the arrows. You can also adjust the size of the detail panel by dragging its corner.

It's also possible to change what part of the image you're looking at in the preview pane. You can zoom in and out of the image preview by scrolling with your mouse's scroll wheel or with a two-finger scroll. You can also click and drag the image to pan the view around the image.

Updating Label Colors

If you want to change the color associated with a specific label, you can do so by clicking on the color square next to the label in the "Display Settings" menu.

Setting Max Visible Confidence

In the "Display Settings" menu, you can adjust the max visible confidence so that only lower-confidence inferences appear.
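Conceptually, this setting behaves like a display filter over inferences. The sketch below is an assumption about that behavior rather than Aquarium's implementation: the `confidence` key, the `visible_inferences` helper, and the choice to treat the cutoff as inclusive are all hypothetical.

```python
def visible_inferences(inferences, max_visible_confidence):
    """Keep only inferences at or below the confidence cutoff.

    Treating the cutoff as inclusive is an assumption made for this sketch.
    """
    return [
        inference
        for inference in inferences
        if inference["confidence"] <= max_visible_confidence
    ]


# Hypothetical inferences with model confidence scores.
inferences = [
    {"id": "inf_1", "confidence": 0.95},
    {"id": "inf_2", "confidence": 0.40},
    {"id": "inf_3", "confidence": 0.15},
]

shown = visible_inferences(inferences, max_visible_confidence=0.5)
print([inference["id"] for inference in shown])
```

Lowering the cutoff hides confident predictions so that the uncertain ones, which are often the most useful to review, stand out.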

Note that the Min Confidence Threshold setting is slightly different from the Max Visible Confidence setting. In combination with the Min IOU Threshold, the Min Confidence Threshold determines how ground truth labels are matched to inference labels for metrics calculations (see Metrics Methodology for more details).
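For intuition about how a confidence threshold and an IOU threshold can interact, here is a generic greedy matching sketch for axis-aligned bounding boxes. It is one common approach and is not taken from Aquarium's Metrics Methodology, which remains the authoritative description; the box format `(x_min, y_min, x_max, y_max)` and the names `iou` and `match_labels` are assumptions made for this example.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def match_labels(labels, inferences, min_confidence, min_iou):
    """Greedily pair each confident inference with its best unmatched label."""
    # Inferences below the confidence threshold never take part in matching.
    candidates = sorted(
        (inf for inf in inferences if inf["confidence"] >= min_confidence),
        key=lambda inf: inf["confidence"],
        reverse=True,
    )
    unmatched = dict(enumerate(labels))
    matches = []
    for inference in candidates:
        best_index, best_score = None, min_iou
        for index, label in unmatched.items():
            score = iou(label["box"], inference["box"])
            if score >= best_score:
                best_index, best_score = index, score
        if best_index is not None:
            matches.append((unmatched.pop(best_index)["id"], inference["id"]))
    return matches


# Hypothetical ground truth labels and inferences.
labels = [
    {"id": "gt_1", "box": (0, 0, 10, 10)},
    {"id": "gt_2", "box": (20, 20, 30, 30)},
]
inferences = [
    {"id": "inf_1", "box": (1, 1, 10, 10), "confidence": 0.9},
    {"id": "inf_2", "box": (20, 20, 30, 30), "confidence": 0.2},
    {"id": "inf_3", "box": (50, 50, 60, 60), "confidence": 0.8},
]

matches = match_labels(labels, inferences, min_confidence=0.5, min_iou=0.5)
print(matches)
```

In this toy data, `inf_2` overlaps `gt_2` perfectly but falls below the confidence threshold, so `gt_2` is left unmatched, while `inf_3` is confident but overlaps no label. Under this scheme, those two cases would typically be counted as a missed label and a spurious detection respectively.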
