2D Classification

Learn how to create an example 2D classification project using the oxford-iiit-pets dataset

Last updated 2 years ago

Overview

This quickstart will walk you through uploading data into Aquarium, starting with a standard open source dataset for a 2D classification task.

Before you get started, there are also pages with some background and information around key concepts in Aquarium.

The main steps we will cover are:

  • Create a project within Aquarium

  • Upload labeled data

  • Upload inference data

By the end of this guide, you should have a good idea of how to upload your data into Aquarium and explore a dataset full of pets!

Prerequisites

To follow along with this quickstart guide, here are some things you'll need:

  • Download Aquarium's quickstart dataset (231MB)

    • The dataset contains the raw images, data, and an end-to-end example upload script

    • The source dataset and accompanying documentation can be found at https://www.robots.ox.ac.uk/~vgg/data/pets/

  • Ensure you have installed the latest Aquarium client

    • pip install aquariumlearning

  • A development environment running a version of Python 3.6+

Pet Dataset Ingestion

To highlight the core concepts and interactions, we're going to work with an open source dataset and computer vision task: classifying pet breeds from a photo. We'll be working with the oxford-iiit-pets dataset, which contains 6000 labeled images.

# Overall data structure
├── imgs
│   ├── Abyssinian_100.jpg
│   ├── american_bulldog_125.jpg
│   └── american_pit_bull_terrier_191.jpg
├── inferences.json
├── labels.json
└── classnames.json

# All images are mirrored online at
# https://storage.googleapis.com/aquarium-public/quickstart/pets/imgs/<filename>.jpg

# Format of labels.json
[
  {
    "file_name": "Sphynx_158.jpg",
    "class_id": 33,
    "species_id": 0,
    "class_name": "sphynx",
    "species_name": "cat",
    "split_name": "train"
  },
  ...
]

# Format of inferences.json
[
  {
    "confidence": 1,
    "frame_id": "Sphynx_158",
    "class_id": 33,
    "class_name": "sphynx"
  },
  ...
]
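
A detail worth noticing in these formats: labels.json keys images by file_name, while inferences.json uses frame_id, which is the filename with its extension stripped. A minimal sketch of how the two line up, using inline sample entries rather than the real files:

```python
# Inline sample entries shaped like labels.json and inferences.json above
label_entry = {
    "file_name": "Sphynx_158.jpg",
    "class_id": 33,
    "class_name": "sphynx",
    "split_name": "train",
}
inference_entry = {
    "confidence": 1,
    "frame_id": "Sphynx_158",
    "class_id": 33,
    "class_name": "sphynx",
}

# The frame id used throughout this guide is the image filename with
# its ".jpg" extension stripped, matching frame_id in inferences.json
frame_id = label_entry["file_name"].split(".jpg")[0]
print(frame_id)  # Sphynx_158
assert frame_id == inference_entry["frame_id"]
```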

Uploading the Data

Python Client Library

Reminder: the aquariumlearning package requires Python >= 3.6.

Aquarium provides a python client library to simplify integration into your existing ML data workflows. In addition to wrapping API requests, it also handles common needs such as efficiently encoding uploaded data or using disk space to work with datasets larger than available system memory.

You can install and use the library with the following code block (to get your API key, follow the instructions on the Account Setup and Team Onboarding page):

!pip install aquariumlearning

import aquariumlearning as al
al_client = al.Client()
al_client.set_credentials(api_key=YOUR_API_KEY)

Projects

Projects are the highest level grouping in Aquarium and they allow us to:

  • Define a specific core task - in this case, pet breed detection (2D Classification)

  • Define a specific ontology

  • Hold multiple datasets for a given task/ontology

To create a project, we specify a name, the full set of valid classifications, and the primary task being performed.

import string
import random
import json

# Project names have to be globally unique in Aquarium
PROJECT_NAME = 'Pets_Quickstart'

# read in classnames file
with open('./classnames.json') as f:
    classnames = json.load(f)

# create the project in Aquarium
# define ML task using the primary_task field
# the from_classnames() call uses the classnames we loaded in above
al_client.create_project(
    PROJECT_NAME, 
    al.LabelClassMap.from_classnames(classnames), 
    primary_task="CLASSIFICATION"
)

Labeled Datasets

Often just called "datasets" for short, these are versioned snapshots of input data and ground truth labels. They belong to a Project and consist of multiple LabeledFrames. In most cases a Frame is a logical grouping of an image and its structured metadata. In more advanced cases, a frame may include more than one image (context imagery, fused sensors, etc.) and additional metadata. Now let's create our LabeledDataset object and add the LabeledFrames to it:

# read in labeled data
with open('./labels.json') as f:
    label_entries = json.load(f)

# defines our dataset we will be uploading
labeled_dataset = al.LabeledDataset()

# loop through each labeled/ground truth object
for entry in label_entries:
    # Create a frame object, using the filename as an id
    # Frames must have a unique id
    frame_id = entry['file_name'].split('.jpg')[0]
    frame = al.LabeledFrame(frame_id=frame_id)

    # Add arbitrary metadata, such as the train vs test split
    frame.add_user_metadata('split_name', entry['split_name'])
    
    # Add an image to the frame
    image_url = "https://storage.googleapis.com/aquarium-public/quickstart/pets/imgs/" + entry['file_name']
    frame.add_image(image_url=image_url)

    # Add the ground truth classification label to the frame
    label_id = frame_id + '_gt'
    frame.add_label_2d_classification(
        label_id=label_id, 
        classification=entry['class_name']
    )

    # Add the frame to the dataset collection
    labeled_dataset.add_frame(frame)

Inferences

Now that we have created a Project and a LabeledDataset, let's also upload those model inferences. Inferences, like labels, must be matched to a frame within the dataset. For each LabeledFrame in your dataset, we will create an InferencesFrame and then assign the appropriate inferences to that InferencesFrame.

Creating InferencesFrame objects and adding inferences will look very similar to creating LabeledFrames and adding labels to them.

We then add the InferencesFrame to an Inferences object and then upload the Inferences object into Aquarium!

Important Things To Note:

  • Each InferencesFrame must exactly match to a LabeledFrame in the dataset. This is accomplished by ensuring the frame_id property is the same between corresponding LabeledFrames and InferencesFrames.

  • It is possible to assign inferences to only a subset of frames within the overall dataset (e.g. just the test set).
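
Because every InferencesFrame must match a LabeledFrame by frame_id, it can be worth verifying that the IDs line up before uploading. A small standalone sketch of such a check; the helper name and inline sample entries are illustrative, not part of the Aquarium API:

```python
def find_unmatched_inferences(label_entries, inference_entries):
    """Return inference frame_ids with no matching labeled frame."""
    # Labeled frame ids are filenames with the extension stripped,
    # as in the quickstart's labels.json
    label_ids = {e["file_name"].split(".jpg")[0] for e in label_entries}
    inference_ids = {e["frame_id"] for e in inference_entries}
    # Inferences may cover only a subset of frames (e.g. just the
    # test set), but never frames outside the labeled dataset
    return sorted(inference_ids - label_ids)

# Inline sample entries shaped like labels.json / inferences.json
labels = [{"file_name": "Sphynx_158.jpg"}, {"file_name": "Abyssinian_100.jpg"}]
infs = [{"frame_id": "Sphynx_158"}, {"frame_id": "beagle_7"}]
print(find_unmatched_inferences(labels, infs))  # ['beagle_7']
```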

# load in inference file
with open('./inferences.json') as f:
    inference_entries = json.load(f)

# create our inference dataset we will be uploading
inference_dataset = al.Inferences()

# loop through each inference object
for entry in inference_entries:
    # Create a frame object, using the same frame id
    # as the corresponding labeled dataset image
    frame_id = entry['frame_id']
    inf_frame = al.InferencesFrame(frame_id=frame_id)

    # Add the inferred classification label to the frame
    inf_label_id = frame_id + '_inf'
    inf_frame.add_inference_2d_classification(
        label_id=inf_label_id, 
        classification=entry['class_name'],
        confidence=entry['confidence']
    )

    # Add the frame to the inferences collection
    inference_dataset.add_frame(inf_frame)

At this point we have created a project in Aquarium, and uploaded our labels and inferences. The data has been properly formatted, but now as our final step, let's use the client to actually upload the data to our project!

Submit the Datasets!

Now that we have the datasets, using the client we can upload the data:

# the name of our dataset that will show up in Aquarium
LABELED_DATASET_NAME = 'pet_labels'

# call to upload our labeled dataset to our project
al_client.create_dataset(
    PROJECT_NAME, 
    LABELED_DATASET_NAME, 
    dataset=labeled_dataset, 
    # Poll for completion of the processing job
    wait_until_finish=True
)

# the name of our dataset that will show up in Aquarium
INFERENCE_DATASET_NAME = 'pet_inferences'

# call to upload our inference dataset to the project
al_client.create_inferences(
    PROJECT_NAME, 
    LABELED_DATASET_NAME, 
    inferences=inference_dataset, 
    inferences_id=INFERENCE_DATASET_NAME
)

With the code snippet above your data will start uploading! Now we can monitor the status within the UI!

Monitoring Your Upload

When you start an upload, Aquarium performs some crucial tasks like indexing metadata and generating embeddings for your dataset, so it may take a little bit of time before you can fully view your dataset. You can monitor the status of your upload in the application, as well as in your console after running your upload script. To view your upload status, log into Aquarium and click on your newly created project, then navigate to the "Streaming Uploads" tab, where you can view the status of your dataset uploads.

Once your upload is completed, under the "Datasets" tab you'll see a view like this, with the example data from the oxford-iiit-pets dataset loaded into Aquarium:

And congrats!! You've uploaded a dataset into Aquarium! You're now ready to start exploring your data in the application!

Completed Example Upload Script

Putting it all together, here is the entire script you can use to replicate this project. Remember to download the quickstart dataset to have access to the script and all the needed data.

#!pip install aquariumlearning

import aquariumlearning as al
al_client = al.Client()
al_client.set_credentials(api_key=YOUR_API_KEY)

import string
import random
import json

# defining names for project and the datasets as they'll show up in Aquarium
# project names must be unique, and within a project, datasets must also have unique names
PROJECT_NAME = 'Pets_Quickstart'
LABELED_DATASET_NAME = 'pets_labels'
INFERENCE_DATASET_NAME = 'pets_inferences'

# define the filepaths we'll be working with
classnames_filepath = 'pets_2D_classification_quickstart/classnames.json'
labels_filepath = 'pets_2D_classification_quickstart/labels.json'
inferences_filepath = 'pets_2D_classification_quickstart/inferences.json'

# load in your classname file
with open(classnames_filepath) as f:
    classnames = json.load(f)

# load in your label data
with open(labels_filepath) as f:
    label_entries = json.load(f)

# load in your inference data
with open(inferences_filepath) as f:
    inference_entries = json.load(f)

# create the project in Aquarium
# define ML task using the primary_task field
# the from_classnames() call uses the classnames we loaded in above
al_client.create_project(
    PROJECT_NAME, 
    al.LabelClassMap.from_classnames(classnames), 
    primary_task="CLASSIFICATION"
)

# defines our dataset we will be uploading
labeled_dataset = al.LabeledDataset()

# loop through each labeled/ground truth object
for entry in label_entries:
    # Create a frame object, using the filename as an id
    # Frames must have a unique id
    frame_id = entry['file_name'].split('.jpg')[0]
    frame = al.LabeledFrame(frame_id=frame_id)

    # Add arbitrary metadata, such as the train vs test split
    frame.add_user_metadata('split_name', entry['split_name'])
    
    # Add an image to the frame
    image_url = "https://storage.googleapis.com/aquarium-public/quickstart/pets/imgs/" + entry['file_name']
    frame.add_image(image_url=image_url)

    # Add the ground truth classification label to the frame
    label_id = frame_id + '_gt'
    frame.add_label_2d_classification(
        label_id=label_id, 
        classification=entry['class_name']
    )

    # Add the frame to the dataset collection
    labeled_dataset.add_frame(frame)

# call to upload our dataset to our project
al_client.create_dataset(
    PROJECT_NAME, 
    LABELED_DATASET_NAME, 
    dataset=labeled_dataset, 
    # Poll for completion of the processing job
    wait_until_finish=True, 
    # Preview the first frame before submission to catch mistakes
    preview_first_frame=True
)

# create our inference dataset we will be uploading
inference_dataset = al.Inferences()

# loop through each inference object
for entry in inference_entries:
    # Create a frame object, using the same frame id
    # as the corresponding labeled dataset image
    frame_id = entry['frame_id']
    inf_frame = al.InferencesFrame(frame_id=frame_id)

    # Add the inferred classification label to the frame
    inf_label_id = frame_id + '_inf'
    inf_frame.add_inference_2d_classification(
        label_id=inf_label_id, 
        classification=entry['class_name'],
        confidence=entry['confidence']
    )

    # Add the frame to the inferences collection
    inference_dataset.add_frame(inf_frame)

# call to upload our inference dataset to the project
al_client.create_inferences(
    PROJECT_NAME, 
    LABELED_DATASET_NAME, 
    inferences=inference_dataset, 
    inferences_id=INFERENCE_DATASET_NAME
)
