Labeled Datasets

Labeled datasets contain all of the ground truth labels for your data.

Overview

Labeled datasets contain your ground truth labels. A labeled dataset belongs to a Project and consists of multiple labeled frames.

A labeled frame is one logical "frame" of data, such as an image from a camera stream. It can contain one or more media/sensor inputs, zero or more ground truth labels, and arbitrary user-provided metadata.

For example, in a 2D classification use case, a frame would contain the image, its labels, and all associated metadata. In a 3D object detection use case, a frame can contain images, point clouds, labels, and metadata.

For real examples of uploading labeled data, please look at our quickstart guides!

Prerequisites to Uploading Labeled Data

In order to ensure the following steps will work smoothly, this guide assumes you have already:

  • Installed the Aquarium SDK

  • Obtained URLs for your raw data (images, point clouds, etc.)

    • See our data sharing docs for more details on URL requirements.

  • Gained access to the labels for your raw data

To view your labeled data once uploaded, you will have to make sure that you have selected and set up the appropriate data sharing method for your team.

Creating and Formatting Your Labeled Data

To ingest a labeled dataset, there are two main objects you'll work with:

  • LabeledFrame

    • An object containing all relevant information for a single frame/image: data URLs, labels, metadata, etc.

  • LabeledDataset

    • A collection of LabeledFrames.

For each datapoint, we create a LabeledFrame and add it to the LabeledDataset in order to create the dataset that we upload into Aquarium.

This usually means looping through your data and creating LabeledFrames to add to the LabeledDataset object.

If you have generated your own embeddings and want to use them during your labeled data uploads, please also see this section for additional guidance!

Defining these objects looks like this:
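As a minimal sketch of the loop described above, the following uses plain-Python stand-ins (FrameSketch and DatasetSketch, both hypothetical names) rather than the SDK's own LabeledFrame and LabeledDataset classes. It only illustrates the shape of the data, not the real API. The record fields, URLs, and the "_gt" label id suffix are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class FrameSketch:
    """Hypothetical stand-in for one labeled frame (not the SDK class)."""
    frame_id: str
    data_urls: List[str] = field(default_factory=list)
    labels: List[Dict[str, Any]] = field(default_factory=list)
    metadata: Dict[str, Any] = field(default_factory=dict)


@dataclass
class DatasetSketch:
    """Hypothetical stand-in for a collection of labeled frames."""
    frames: List[FrameSketch] = field(default_factory=list)

    def add_frame(self, frame: FrameSketch) -> None:
        # Assumption: frame ids should be unique within one dataset.
        if any(f.frame_id == frame.frame_id for f in self.frames):
            raise ValueError(f"duplicate frame_id: {frame.frame_id}")
        self.frames.append(frame)


# Made-up records standing in for your own data source.
records = [
    {"id": "frame_0", "url": "https://example.com/img/0.jpg", "class": "cat"},
    {"id": "frame_1", "url": "https://example.com/img/1.jpg", "class": "dog"},
]

dataset = DatasetSketch()
for record in records:
    # One frame per datapoint: attach its data URL and its label, then
    # add the finished frame to the dataset.
    frame = FrameSketch(frame_id=record["id"])
    frame.data_urls.append(record["url"])
    frame.labels.append(
        {"label_id": record["id"] + "_gt", "classification": record["class"]}
    )
    dataset.add_frame(frame)
```

With the real SDK, the same loop would construct LabeledFrame objects and add them to a LabeledDataset; see the API docs for the actual constructors and methods.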

Once you've defined your frame, we need to associate some data with it! In the next sections, we show you how to add your main form of input data to the frame (images, point clouds, etc.), and then associate the ground truth labels with that frame.

Adding Data to Your Labeled Frame

Each LabeledFrame in your dataset can contain one or more pieces of input data. In many computer vision tasks, this may be a single image. In a robotics or self-driving task, this may be a full suite of camera images, lidar point clouds, and radar scans.

Here are some common data types, their expected formats, and how to work with them in Aquarium:

Your ML task utilizes images and you would like to add an image to your labeled data

Python API Definition Link

Example Usage

Relevant Function Parameter Descriptions

  • image_url

    • string - URL to load the image from

  • preview_url (Optional)

    • string - URL to a compressed version of the image for faster loading in browsers; must have the same pixel dimensions as the original image

  • date_captured (Optional)

    • string - ISO-formatted date-time

  • width (Optional)

    • int - will be inferred if omitted

  • height (Optional)

    • int - will be inferred if omitted
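To make the parameter descriptions above concrete, here is a small sketch of a helper that assembles these values as keyword arguments before they are handed to the SDK. The helper name build_image_kwargs is hypothetical, and the checks it performs (a URL must include a scheme, dimensions must be positive) are assumptions for illustration; the data sharing docs remain the authority on URL requirements.

```python
from datetime import datetime
from typing import Any, Dict, Optional
from urllib.parse import urlparse


def build_image_kwargs(
    image_url: str,
    preview_url: Optional[str] = None,
    date_captured: Optional[datetime] = None,
    width: Optional[int] = None,
    height: Optional[int] = None,
) -> Dict[str, Any]:
    """Collect image parameters, leaving out any optional ones not given."""
    for name, url in (("image_url", image_url), ("preview_url", preview_url)):
        # Assumption: a usable URL has a scheme such as https://.
        if url is not None and not urlparse(url).scheme:
            raise ValueError(f"{name} must be a URL with a scheme: {url!r}")
    for name, value in (("width", width), ("height", height)):
        if value is not None and value <= 0:
            raise ValueError(f"{name} must be a positive integer")

    kwargs: Dict[str, Any] = {"image_url": image_url}
    if preview_url is not None:
        kwargs["preview_url"] = preview_url
    if date_captured is not None:
        # date_captured is documented as an ISO formatted date-time string.
        kwargs["date_captured"] = date_captured.isoformat()
    if width is not None:
        kwargs["width"] = width
    if height is not None:
        kwargs["height"] = height
    return kwargs


image_kwargs = build_image_kwargs(
    "https://example.com/img/0.jpg",
    date_captured=datetime(2023, 1, 2, 3, 4, 5),
    width=640,
)
```

Omitting width and height from the dictionary, rather than passing None, keeps the documented behavior where they are inferred when not provided.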

Adding Labels to Your Labeled Frame

Each labeled frame in your dataset can contain zero or more ground truth labels.

Here are some common label types, their expected formats, and how to work with them in Aquarium:

You are working with 2D or 3D data and want to add a classification label

Python API Definition Link - 2D

Example Usage

Relevant Function Parameter Descriptions

  • label_id

    • string - an id that is unique across all other labels in this dataset

  • classification

    • string - what the label is classified as

  • user_attrs (Optional)

    • dict - any additional label-level metadata fields. Defaults to None.
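Because label_id must be unique across the whole dataset, it can help to derive it from the frame id and check for collisions while building labels. The sketch below does this with plain dictionaries; the helper name build_classification_label and the "_gt" suffix are hypothetical, and it assumes one classification label per frame.

```python
from typing import Any, Dict, Optional, Set


def build_classification_label(
    frame_id: str,
    classification: str,
    seen_label_ids: Set[str],
    user_attrs: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build the parameters for one classification label."""
    # Assumption: one classification label per frame, so the frame id
    # plus a fixed suffix is enough to make the label id unique.
    label_id = f"{frame_id}_gt"
    if label_id in seen_label_ids:
        raise ValueError(f"label_id already used in this dataset: {label_id}")
    seen_label_ids.add(label_id)

    label: Dict[str, Any] = {
        "label_id": label_id,
        "classification": classification,
    }
    if user_attrs is not None:
        # Copy so later changes to the caller's dict do not leak in.
        label["user_attrs"] = dict(user_attrs)
    return label


seen: Set[str] = set()
label = build_classification_label(
    "frame_0", "cat", seen, user_attrs={"annotator": "team_a"}
)
```

Sharing one seen set across the whole upload script is what enforces uniqueness at the dataset level rather than per frame.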

Making Metadata Fields Queryable

When you add metadata fields to your Label, we need to take one extra step so that you can query those fields and search for them in the Analysis view! Add the code below to your script, right after you call create_dataset():

You can also run update_dataset_object_metadata_schema() on its own after an upload to make the metadata fields queryable.

Putting It All Together

In the API docs you can see the other operations associated with a LabeledFrame.

Now that we've discussed the general steps for adding labeled data, here is what this looks like for a 2D classification example:

Uploading Your Labeled Dataset

Now that we have everything all set up, let's submit your new labeled dataset to Aquarium!

Aquarium does some processing of your data, like indexing metadata and possibly calculating embeddings, so after your datasets are submitted you may see a delay before they show up in the UI. You can view some examples of what to expect, as well as tips for troubleshooting your upload, here!

Submitting Your Dataset

You can submit your LabeledDataset to be uploaded into Aquarium by calling .create_dataset().

To spot check our data immediately, we can set the preview_first_frame flag to True. A link to a preview frame will then appear in the console, which lets you make sure the data and labels look right.

This is an example of what the create_dataset() call will look like:
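As a hedged sketch of such a call, the wrapper below passes a project name, a dataset name, the dataset object, and the preview_first_frame flag to create_dataset(). Only the method name and the preview_first_frame flag come from this guide; the argument layout, the idea that the method lives on a client object, and the names submit_dataset and RecordingClient are assumptions for illustration. A recording stand-in replaces the real client so that nothing is uploaded.

```python
from typing import Any, Dict, List, Tuple


def submit_dataset(
    client: Any, project_id: str, dataset_id: str, dataset: Any
) -> None:
    """Submit a dataset and ask for a preview link for the first frame."""
    # Assumption: project and dataset names are positional and the dataset
    # is passed by keyword. Check the API docs for the real signature.
    client.create_dataset(
        project_id,
        dataset_id,
        dataset=dataset,
        preview_first_frame=True,
    )


class RecordingClient:
    """Hypothetical stand-in that records the call instead of uploading."""

    def __init__(self) -> None:
        self.calls: List[Tuple[Tuple[Any, ...], Dict[str, Any]]] = []

    def create_dataset(self, *args: Any, **kwargs: Any) -> None:
        self.calls.append((args, kwargs))


client = RecordingClient()
placeholder_dataset = object()  # stands in for a populated LabeledDataset
submit_dataset(client, "my_project", "my_dataset_v1", placeholder_dataset)
```

Swapping RecordingClient for the real SDK client is the only change needed once the argument layout has been confirmed against the API docs.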

After you kick off your dataset upload, processing can take anywhere from minutes to multiple hours, depending on your dataset size.

You can monitor your uploads under the "Streaming Uploads" tab in the project view. Here is a guide on how to find that page.

Once the upload has completed, the Project page in Aquarium will show your project with an updated count of how many labeled datasets have been added to it (the count also includes the number of unlabeled datasets).

You can see your new labeled dataset fully uploaded and reflected in the count in the bottom left corner of the project card.

Additional Features

Multiple Sensor IDs

Sensor IDs are used to reference data points that exist on a frame. They are usually omitted in frames and labels, but become necessary if more than one type of data point exists on a single frame. A good example of this is a frame with multiple camera viewpoints.
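To illustrate the idea, the sketch below models a frame with two camera viewpoints as a plain dictionary and checks that every label points at a sensor that actually exists on the frame. The key name sensor_id, the dictionary layout, and the helper unknown_sensor_labels are assumptions for illustration, not the SDK's data model.

```python
from typing import Any, Dict, List

# Made-up frame with two cameras; each data point and each label carries
# the id of the sensor it belongs to.
frame: Dict[str, Any] = {
    "frame_id": "frame_0",
    "images": [
        {"sensor_id": "camera_front", "image_url": "https://example.com/front/0.jpg"},
        {"sensor_id": "camera_rear", "image_url": "https://example.com/rear/0.jpg"},
    ],
    "labels": [
        {"sensor_id": "camera_front", "label_id": "frame_0_front_gt", "classification": "car"},
        {"sensor_id": "camera_rear", "label_id": "frame_0_rear_gt", "classification": "truck"},
    ],
}


def unknown_sensor_labels(frame: Dict[str, Any]) -> List[str]:
    """Return the label_ids whose sensor_id matches no data point on the frame."""
    known = {image["sensor_id"] for image in frame["images"]}
    return [
        label["label_id"]
        for label in frame["labels"]
        if label["sensor_id"] not in known
    ]


missing = unknown_sensor_labels(frame)
```

Running a check like this before upload catches labels that would otherwise be attached to a viewpoint the frame does not contain.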

Quickstart Examples

For examples of how to upload labeled datasets, check out our quickstart examples.

