Labeled Datasets

Labeled datasets include all of your ground truth labels for your dataset

Overview

Labeled datasets contain your ground truth labels. A labeled dataset belongs to a Project and consists of multiple labeled frames.

A labeled frame is one logical "frame" of data, such as an image from a camera stream. It can contain one or more media/sensor inputs, zero or more ground truth labels, and arbitrary user-provided metadata.

For example, in a 2D classification use case, a frame would contain the image, its labels, and all associated metadata. In a 3D object detection use case, a frame can contain images, point clouds, labels, and metadata.

For real examples of uploading labeled data, please look at our quickstart guides!

Prerequisites to Uploading Labeled Data

To ensure the following steps work smoothly, this guide assumes that you:

  • Have installed the Aquarium SDK

  • Have URLs for your raw data (images, point clouds, etc.)

    • See our data sharing docs for more details on URL requirements.

  • Have access to the labels for your raw data

To view your labeled data once uploaded, you will have to make sure that you have selected and set up the appropriate data sharing method for your team.

Creating and Formatting Your Labeled Data

To ingest a labeled dataset, there are two main objects you'll work with:

  • LabeledFrame

    • An object containing all relevant information for a single frame/image: data URLs, labels, metadata, etc.

  • LabeledDataset

    • A collection of LabeledFrames.

For each datapoint, we create a LabeledFrame and add it to the LabeledDataset in order to create the dataset that we upload into Aquarium.

This usually means looping through your data and creating LabeledFrames to add to the LabeledDataset object.

If you have generated your own embeddings and want to use them during your labeled data uploads, please also see this section for additional guidance!

Defining these objects looks like this:

import aquariumlearning as al

labeled_dataset = al.LabeledDataset()

for frame_id, frame_data in my_list_of_data:
    # Frames must have a unique frame_id
    frame = al.LabeledFrame(frame_id=frame_id)
    ...
    labeled_dataset.add_frame(frame)
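Because every frame needs a unique frame_id, it can help to check your source records for duplicates before building any frames. The helper below is a minimal sketch in plain Python. find_duplicate_frame_ids is a hypothetical name and not part of the Aquarium SDK, and the sketch assumes your data is a list of (frame_id, frame_data) pairs like my_list_of_data above.

```python
from collections import Counter


def find_duplicate_frame_ids(records):
    """Return the frame IDs that appear more than once, in first-seen order.

    `records` is assumed to be an iterable of (frame_id, frame_data) pairs.
    """
    counts = Counter(frame_id for frame_id, _ in records)
    return [frame_id for frame_id, count in counts.items() if count > 1]


# Illustrative data only; the frame IDs and file names are made up
my_list_of_data = [
    ("frame_001", {"file_name": "a.png"}),
    ("frame_002", {"file_name": "b.png"}),
    ("frame_001", {"file_name": "c.png"}),
]

duplicates = find_duplicate_frame_ids(my_list_of_data)
if duplicates:
    print(f"Duplicate frame IDs found: {duplicates}")
```

Running a check like this before the loop lets you fix ID collisions in your own data rather than discovering them during an upload.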

Once you've defined your frame, you need to associate some data with it! The next sections show how to add your main form of input data to the frame (images, point clouds, etc.) and then associate ground truth labels with that frame.

Adding Data to Your Labeled Frame

Each LabeledFrame in your dataset can contain one or more input pieces of data. In many computer vision tasks, this may be a single image. In a robotics or self-driving task, this may be a full suite of camera images, lidar point clouds, and radar scans.

Here are some common data types, their expected formats, and how to work with them in Aquarium:

Your ML task utilizes images and you would like to add an image to your labeled data

Python API Definition Link

Example Usage

frame.add_image(
    image_url='https://storage.googleapis.com/aquarium-public/datasets/rareplanes/train/PS-RGB_tiled/96_10400100096C2500_tile_936.png',
    preview_url='',
    date_captured='2020-07-10 15:00:00.000',
    width=1280,
    height=720
)

Relevant Function Parameter Descriptions

Parameter Name             Description
image_url                  string - URL to load the image by
preview_url (Optional)     string - URL to a compressed form of the image for faster loading in browsers, must be same pixel dimensions as original image
date_captured (Optional)   ISO formatted date-time string
width (Optional)           int - will be inferred otherwise
height (Optional)          int - will be inferred otherwise
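Since most add_image parameters are optional, one way to keep upload scripts tidy is to collect only the arguments you actually have. The sketch below is an illustration under assumptions: build_image_kwargs is a hypothetical helper, the URL is a placeholder, and the date format simply mirrors the date_captured string in the example above.

```python
from datetime import datetime


def build_image_kwargs(image_url, preview_url=None, captured_at=None,
                       width=None, height=None):
    """Collect add_image keyword arguments, leaving out unset optional ones."""
    kwargs = {"image_url": image_url}
    if preview_url:
        kwargs["preview_url"] = preview_url
    if captured_at is not None:
        # Same layout as the example above: 'YYYY-MM-DD HH:MM:SS.mmm'
        kwargs["date_captured"] = captured_at.isoformat(
            sep=" ", timespec="milliseconds"
        )
    if width is not None:
        kwargs["width"] = width
    if height is not None:
        kwargs["height"] = height
    return kwargs


image_kwargs = build_image_kwargs(
    "https://example.com/images/0001.png",  # placeholder URL
    captured_at=datetime(2020, 7, 10, 15, 0, 0),
    width=1280,
    height=720,
)
print(image_kwargs["date_captured"])
```

The resulting dictionary could then be passed along as frame.add_image(**image_kwargs).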

Adding Labels to Your Labeled Frame

Each labeled frame in your dataset can contain zero or more ground truth labels.

Here are some common label types, their expected formats, and how to work with them in Aquarium:

You are working with 2D or 3D data and want to add a classification label

Python API Definition Link - 2D

Example Usage

# Standard 2D case
labeled_frame.add_label_2d_classification(
    label_id='unique_id_for_this_label',
    classification='dog'
)

Relevant Function Parameter Descriptions

Parameter Name          Description
label_id                string - a unique id across all other labels in this dataset
classification          string - what the label is classified as
user_attrs (Optional)   dict - Any additional label-level metadata fields. Defaults to None.
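Because label_id must be unique across the whole dataset, a simple approach is to derive it from the frame's unique frame_id. The sketch below assumes one classification label per frame and uses a "_gt" suffix. classification_label_kwargs is a hypothetical helper, and the "source" metadata field is made up for illustration.

```python
def classification_label_kwargs(frame_id, classification, user_attrs=None):
    """Build keyword arguments for one classification label on a frame.

    Assumes a single classification label per frame, so `<frame_id>_gt`
    is unique whenever the frame IDs are unique.
    """
    kwargs = {
        "label_id": f"{frame_id}_gt",
        "classification": classification,
    }
    if user_attrs:
        # Copy so later changes to the caller's dict do not leak in
        kwargs["user_attrs"] = dict(user_attrs)
    return kwargs


label_kwargs = classification_label_kwargs(
    "frame_001", "dog", user_attrs={"source": "vendor_a"}
)
print(label_kwargs)
```

These arguments could then be passed as labeled_frame.add_label_2d_classification(**label_kwargs). If a frame carries several labels, you would need an additional distinguishing component, such as an index, in each ID.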

Making Metadata Fields Queryable

When you add metadata fields to your labels, one extra step is needed so that you can query those fields and search for them in the Analysis view. Add the code below to your script, right after you call create_dataset():

# next section goes into detail on how to call this
# create dataset before updating metadata schema
al_client.create_dataset(
    PROJECT_NAME, 
    DATASET_NAME, 
    dataset=labeled_dataset
)

# this method takes a list of dict objects, each providing the
# name of the field and the type of the field
al_client.update_dataset_object_metadata_schema(PROJECT_NAME, DATASET_NAME,
    [
        {"name": 'METADATA_FIELD_NAME_1', "type": "STRING"},
        {"name": 'METADATA_FIELD_NAME_2', "type": "STRING"},
        {"name": 'METADATA_FIELD_NAME_3', "type": "STRING"}
    ]
)

You can also run update_dataset_object_metadata_schema() on its own after an upload to make the metadata fields queryable:

import aquariumlearning as al

al_client = al.Client()
al_client.set_credentials(api_key='YOUR_API_KEY')

AL_PROJECT = 'YOUR PROJECT NAME'
AL_DATASET = 'YOUR DATASET NAME'

al_client.update_dataset_object_metadata_schema(AL_PROJECT, AL_DATASET,
    [
        {"name": 'METADATA_FIELD_NAME_1', "type": "STRING"},
        {"name": 'METADATA_FIELD_NAME_2', "type": "STRING"},
        {"name": 'METADATA_FIELD_NAME_3', "type": "STRING"}
    ]
)
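If you have many metadata fields, the schema list can be generated rather than typed by hand. This is a minimal sketch: build_metadata_schema is a hypothetical helper, the field names are made up, and "STRING" is used because it is the only type shown on this page.

```python
def build_metadata_schema(field_names, field_type="STRING"):
    """Return one {"name", "type"} entry per distinct field name, in order."""
    schema = []
    seen = set()
    for name in field_names:
        if name in seen:
            continue  # skip repeated names so each field is declared once
        seen.add(name)
        schema.append({"name": name, "type": field_type})
    return schema


# Illustrative field names; "weather" is repeated on purpose
schema = build_metadata_schema(["weather", "camera_model", "weather"])
print(schema)
```

The result could then be passed as the third argument, for example al_client.update_dataset_object_metadata_schema(AL_PROJECT, AL_DATASET, schema).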

Putting It All Together

In the API docs you can see the other operations associated with a LabeledFrame.

Now that we've discussed the general steps for adding labeled data, here is what they look like for a 2D classification example:

# Add an image to the frame
image_url = "https://storage.googleapis.com/aquarium-public/quickstart/pets/imgs/" + entry['file_name']
labeled_frame.add_image(image_url=image_url)

# Add the ground truth classification label to the frame
label_id = frame_id + '_gt'
labeled_frame.add_label_2d_classification(
    label_id=label_id, 
    classification=entry['class_name']
)

# once you have created the frame, add it to the dataset you created
labeled_dataset.add_frame(labeled_frame)
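The snippet above shows the body of a loop. As a rough sketch of the surrounding code, the function below wraps the same calls in a loop over your records. It takes the aquariumlearning module as its al argument, and it assumes each entry is a dict with 'frame_id', 'file_name', and 'class_name' keys; the 'frame_id' key and the function name build_labeled_dataset are assumptions for illustration.

```python
def build_labeled_dataset(al, entries, image_base_url):
    """Build a LabeledDataset with one image and one label per entry.

    `al` is the imported aquariumlearning module. Each entry is assumed
    to be a dict with 'frame_id', 'file_name', and 'class_name' keys.
    """
    labeled_dataset = al.LabeledDataset()
    for entry in entries:
        frame_id = entry["frame_id"]
        labeled_frame = al.LabeledFrame(frame_id=frame_id)

        # Add an image to the frame
        labeled_frame.add_image(image_url=image_base_url + entry["file_name"])

        # Add the ground truth classification label to the frame
        labeled_frame.add_label_2d_classification(
            label_id=frame_id + "_gt",
            classification=entry["class_name"],
        )

        # Once the frame is complete, add it to the dataset
        labeled_dataset.add_frame(labeled_frame)
    return labeled_dataset
```

With the SDK installed and imported as al, this could be called as build_labeled_dataset(al, entries, image_base_url), where image_base_url is the common URL prefix of your images.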

Uploading Your Labeled Dataset

Now that we have everything all set up, let's submit your new labeled dataset to Aquarium!

Aquarium does some processing of your data after it is submitted, like indexing metadata and possibly calculating embeddings, so you may see a delay before it shows up in the UI. You can view some examples of what to expect, as well as tips for troubleshooting your upload, here!

Submitting Your Dataset

You can submit your LabeledDataset to be uploaded into Aquarium by calling .create_dataset().

To spot check your data immediately, you can set the preview_first_frame flag to True. A link to a preview frame will then appear in the console, letting you make sure the data and labels look right.

This is an example of what the create_dataset() call will look like:

DATASET_NAME = 'labels_v1'

# In order to create a dataset in Aquarium you must provide
# name you would like for your project
# name you would like for your labeled dataset
# the LabeledDataset object you have created and added frames to
al_client.create_dataset(
    PROJECT_NAME, 
    DATASET_NAME, 
    dataset=labeled_dataset
)
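One way to make the preview step easy to toggle is a thin wrapper around create_dataset(). This is only a sketch: submit_labeled_dataset is a hypothetical helper, and it assumes preview_first_frame is accepted as a keyword argument of create_dataset(), as the flag description above suggests.

```python
def submit_labeled_dataset(al_client, project_name, dataset_name,
                           labeled_dataset, preview=False):
    """Call create_dataset, optionally asking for a first-frame preview."""
    kwargs = {"dataset": labeled_dataset}
    if preview:
        # Flag described above; a preview link is shown in the console
        kwargs["preview_first_frame"] = True
    al_client.create_dataset(project_name, dataset_name, **kwargs)
```

For example, you might call submit_labeled_dataset(al_client, PROJECT_NAME, DATASET_NAME, labeled_dataset, preview=True) first to check the preview frame, and then call it again without preview once everything looks right.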

After you kick off the upload, processing can take anywhere from minutes to multiple hours depending on your dataset size.

You can monitor your uploads under the "Streaming Uploads" tab in the project view. Here is a guide on how to find that page.

Once the upload has completed, the Project page in Aquarium shows your project with an updated count of how many labeled datasets have been added to it (the count also includes the number of unlabeled datasets).

Additional Features

Multiple Sensor IDs

Sensor IDs are used to reference data points that exist on a frame. They are usually omitted in frames and labels, but become necessary when a single frame contains more than one data point. A good example of this is a frame with multiple camera viewpoints.

labeled_frame.add_image(sensor_id='camera_front', image_url='')
labeled_frame.add_image(sensor_id='camera_right', image_url='')
labeled_frame.add_image(sensor_id='camera_left', image_url='')

# 2D BBOX label on the `camera_front` image
labeled_frame.add_label_2d_bbox(
    sensor_id='camera_front',
    label_id='unique_id_for_this_label',
    classification='dog',
    top=200,
    left=300,
    width=250,
    height=150
)

# 2D BBOX label on the `camera_left` image
labeled_frame.add_label_2d_bbox(
    sensor_id='camera_left',
    label_id='another_unique_id_for_this_label',
    classification='cat',
    top=200,
    left=300,
    width=250,
    height=150
)

# Inferences MUST match the same sensor id as the
# corresponding base frame sensor id.
inference_frame.add_inference_2d_bbox(
    sensor_id='camera_front',
    label_id='abcd_inference',
    classification='cat',
    top=200,
    left=300,
    width=250,
    height=150,
    confidence=0.85
)
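When frames carry several sensors, a quick consistency check can catch labels that point at a sensor_id with no matching image. The helper below is a plain-Python sketch; find_unknown_sensor_ids is a hypothetical name, and 'camera_rear' is a made-up ID used to show a mismatch.

```python
def find_unknown_sensor_ids(image_sensor_ids, label_sensor_ids):
    """Return label sensor IDs that have no matching image on the frame."""
    known = set(image_sensor_ids)
    return sorted(set(label_sensor_ids) - known)


# Sensor IDs of the images added to the frame
image_sensor_ids = ["camera_front", "camera_right", "camera_left"]
# Sensor IDs referenced by the labels on the same frame
label_sensor_ids = ["camera_front", "camera_left", "camera_rear"]

print(find_unknown_sensor_ids(image_sensor_ids, label_sensor_ids))
```

An empty result means every label refers to a sensor that was added to the frame.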

Quickstart Examples

For examples of how to upload labeled datasets, check out our quickstart examples.

Quickstart Guides
