2D Classification
Learn how to create an example 2D classification project using the oxford-iiit-pets dataset
Overview
This quickstart will walk you through uploading data into Aquarium, starting with a standard open source dataset for a 2D classification task.
Before you get started, you may also want to read the pages covering background information and key concepts in Aquarium.
The main steps we will cover are:
Create a project within Aquarium
Upload labeled data
Upload inference data
By the end of this guide, you should have a good idea of how to upload your data into Aquarium and explore a dataset full of pets!

Prerequisites
To follow along with this quickstart guide, you'll need to:
Download the quickstart dataset
The dataset contains the raw images, data, and an end-to-end example upload script
Ensure you have installed the latest Aquarium client
pip install aquariumlearning
Have a development environment running Python 3.6 or later
Pet Dataset Ingestion
To highlight the core concepts and interactions, we're going to work with an open source dataset and computer vision task: classifying pet breeds from a photo. We'll be working with the oxford-iiit-pets dataset, which contains 6000 labeled images.
The source dataset and accompanying documentation can be found here: https://www.robots.ox.ac.uk/~vgg/data/pets/
You can download Aquarium's quickstart dataset (231MB) at this link.
Uploading the Data
Python Client Library
Reminder: the aquariumlearning package requires Python >= 3.6.
Aquarium provides a Python client library to simplify integration into your existing ML data workflows. In addition to wrapping API requests, it handles common needs such as efficiently encoding uploaded data and using disk space to work with datasets larger than available system memory.
You can install and use the library using the following code block:
To get your API key, you can follow these instructions.
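Once you have a key, one common way to keep it out of your script is to read it from an environment variable. The sketch below assumes a hypothetical variable name, AQUARIUM_API_KEY, which is a convention chosen for this example and not something the client defines. Handing the key to the client is not shown here.

```python
import os


def load_api_key(env_var="AQUARIUM_API_KEY", environ=None):
    """Return the API key stored in an environment variable.

    The variable name is a hypothetical convention for this sketch,
    not something the aquariumlearning client requires.
    """
    environ = os.environ if environ is None else environ
    key = environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key
```

Failing early with a clear message is usually easier to debug than an authentication error partway through an upload.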
Projects
Projects are the highest level grouping in Aquarium and they allow us to:
Define a specific core task - in this case, pet breed classification (2D Classification)
Define a specific ontology
Hold multiple datasets for a given task/ontology
You can click here for more information on defining projects and best practices!
To create a project, we specify a name, the full set of valid classifications, and the primary task being performed.
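The full set of valid classifications can usually be derived from the labels themselves. As a minimal sketch, assuming each label record is a dict with a hypothetical `class_name` key (the quickstart dataset's actual file layout may differ), the class list could be built like this:

```python
def collect_classnames(label_entries, key="class_name"):
    """Return the sorted, de-duplicated class names found in the labels."""
    return sorted({entry[key] for entry in label_entries})


# Hypothetical records standing in for the dataset's annotation file.
example_entries = [
    {"file_name": "beagle_1.jpg", "class_name": "beagle"},
    {"file_name": "persian_7.jpg", "class_name": "persian"},
    {"file_name": "beagle_2.jpg", "class_name": "beagle"},
]
classnames = collect_classnames(example_entries)
```

Sorting keeps the class order stable between runs, which helps when the same ontology is reused across several datasets in a project.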
Labeled Datasets
Often just called "datasets" for short, these are versioned snapshots of input data and ground truth labels. They belong to a Project, and consist of multiple LabeledFrames.
In most cases a Frame is a logical grouping of an image and its structured metadata. In more advanced cases, a frame may include more than one image (context imagery, fused sensors, etc.) and additional metadata. Now let's create our LabeledDataset object and add the LabeledFrames to it:
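To make the shape of a frame concrete, the sketch below builds plain dictionaries holding the information a frame carries for this task: a frame ID, an image location, one classification label, and some metadata. It deliberately does not call the Aquarium client. The record keys, the `split_name` metadata field, and the base URL are assumptions for illustration, as is the choice to treat frame IDs as unique within a dataset.

```python
import os


def build_frame_records(label_entries, image_base_url):
    """Turn annotation records into one frame-shaped dict per image."""
    frames = []
    seen = set()
    for entry in label_entries:
        # Use the file name without its extension as the frame ID.
        frame_id = os.path.splitext(entry["file_name"])[0]
        if frame_id in seen:
            raise ValueError(f"Duplicate frame_id: {frame_id}")
        seen.add(frame_id)
        frames.append({
            "frame_id": frame_id,
            "image_url": f"{image_base_url.rstrip('/')}/{entry['file_name']}",
            "label": entry["class_name"],
            "metadata": {"split_name": entry.get("split_name", "unknown")},
        })
    return frames
```

In a real upload script, each of these records would be translated into a LabeledFrame and added to the LabeledDataset.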
Inferences
Now that we have created a Project and a LabeledDataset, let's also upload our model inferences. Inferences, like labels, must be matched to a frame within the dataset. For each LabeledFrame in your dataset, we will create an InferencesFrame and then assign the appropriate inferences to that InferencesFrame.
Creating InferencesFrame objects and adding inferences will look very similar to creating LabeledFrames and adding labels to them.
We then add the InferencesFrame to an Inferences object and then upload the Inferences object into Aquarium!
Important Things To Note:
Each InferencesFrame must exactly match a LabeledFrame in the dataset. This is accomplished by ensuring the frame_id property is the same between corresponding LabeledFrames and InferencesFrames.
It is possible to assign inferences to only a subset of frames within the overall dataset (e.g. just the test set).
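The matching rule above can be checked before uploading. This is a sketch that works on plain lists of frame IDs, not on Aquarium objects: every inference frame ID must exist among the labeled frame IDs, while labeled frames without inferences are allowed.

```python
def check_inference_frame_ids(labeled_ids, inference_ids):
    """Validate inference frame IDs against labeled frame IDs.

    Returns the labeled IDs that have no inference (allowed, e.g. when
    only the test split was run through the model). Raises ValueError if
    any inference ID has no matching labeled frame.
    """
    labeled = set(labeled_ids)
    inferred = set(inference_ids)
    unmatched = sorted(inferred - labeled)
    if unmatched:
        raise ValueError(f"Inference frames with no labeled frame: {unmatched}")
    return sorted(labeled - inferred)
```

Running a check like this locally surfaces ID mismatches, such as a stray file extension in one set of IDs, before any data is sent.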
At this point we have created a project in Aquarium and built our labeled dataset and inferences. The data has been properly formatted, so as our final step, let's use the client to actually upload the data to our project!
Submit the Datasets!
Now that we have the datasets, using the client we can upload the data:
With the code snippet above your data will start uploading! Now we can monitor the status within the UI!
Monitoring Your Upload
When you start an upload, Aquarium performs some crucial tasks, such as indexing metadata and generating embeddings for the dataset, so it may take a little time before you can fully view your dataset. You can monitor the status of your upload in the application as well as in your console after running your upload script.
To view your upload status, log into Aquarium and click on your newly created Project. Then navigate to the "Streaming Uploads" tab, where you can view the status of your dataset uploads.

Once your upload is complete, you'll see a view like this under the "Datasets" tab:

And congrats!! You've uploaded a dataset into Aquarium! You're now ready to start exploring your data in the application!
Completed Example Upload Script
Putting it all together, here is the entire script you can use to replicate this project. Remember to download the data here to have access to the script and all the needed data.