Evaluate Model Performance
Please read about getting started with Segments first.
Model Performance Segments
Model Performance Segments allow you to organize your dataset in Aquarium and understand model performance across important subsets of your dataset.
Use Model Performance Segments to:
Group subsets of your dataset for performance reporting based on metadata, embedding space, model performance, domain, etc.
Define target performance thresholds for Precision, Recall and F1 against the grouped set of frames.
Compare multiple inference sets across all Model Performance Segments and identify areas of under/over performance.
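As a rough illustration of what a performance target on a segment checks, the sketch below computes Precision, Recall and F1 from aggregated match counts and compares each to a target. It is plain Python, not the Aquarium client; `SegmentCounts`, the count values and the `targets` dictionary are hypothetical names and numbers chosen for the example, and treating a metric that equals its target as passing is an assumption.

```python
from dataclasses import dataclass


@dataclass
class SegmentCounts:
    """Hypothetical match counts aggregated over all frames in one segment."""
    true_positives: int
    false_positives: int
    false_negatives: int


def precision(c: SegmentCounts) -> float:
    # Fraction of predictions that matched a label.
    denom = c.true_positives + c.false_positives
    return c.true_positives / denom if denom else 0.0


def recall(c: SegmentCounts) -> float:
    # Fraction of labels that were matched by a prediction.
    denom = c.true_positives + c.false_negatives
    return c.true_positives / denom if denom else 0.0


def f1(c: SegmentCounts) -> float:
    # Harmonic mean of precision and recall.
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Illustrative numbers only.
counts = SegmentCounts(true_positives=80, false_positives=20, false_negatives=20)
metrics = {"precision": precision(counts), "recall": recall(counts), "f1": f1(counts)}

# Hypothetical target thresholds for this segment.
targets = {"precision": 0.75, "recall": 0.85, "f1": 0.75}

# Assumption: a metric passes when it is at or above its target.
meets_target = {name: metrics[name] >= targets[name] for name in targets}
```

With these counts, precision and recall are both 80 / 100, so the recall target of 0.85 is the only one missed.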
Model Performance Segment Types
Split - Use split segments to track and report model performance on the Test, Training and Validation sets within your dataset.
Regression Test - Use regression test segments to define subsets of the dataset that mimic your domain's hardest problems, set target model performance thresholds and then easily assess pass-fail for every model experiment or release candidate.
Scenario - Use scenarios to evaluate model performance against specific subsets of your data.
Creating Model Performance Segments
Anywhere you can select a set of frames within a dataset, you can define a Model Performance Segment.
Select a set of frames (e.g. based on a query, embedding space, model performance, etc.)
Click the Add to Segment button
Ensure the Frame / Crop toggle is set to Frame
Choose the Model Performance Segment type
Fill in relevant metadata, including the optional performance target threshold
Click Save
Once saved, your Model Performance Segments appear in both the Metrics View and the Model Performance subsection of the Segments page.
Learn more about analyzing your inference sets in the context of Model Performance Segments.
Managing Model Performance Segments
Access the details for each Model Performance Segment from the Segments page under the Model Performance subsection.
From the elements tab of the segment details page, use Similarity Search to identify other frames from the dataset that may belong in your Regression Test or Scenario Model Performance Segments.
From the metrics tab of the segment details page, set or update performance targets and view relative model performance for all submitted inference sets.
Common Use Cases
Measuring Model Performance Against the Test and Validation Sets
For every dataset you upload to Aquarium, provide split as a metadata field.
For each split:
Use the query bar to filter the dataset to frames in that split (typically with a query such as user__split: training)
Select all frames and click the Add to Segment button
Choose the Split segment type and fill in the relevant metadata.
Once defined, every uploaded inference set will automatically have performance calculated for each split. Compare and contrast performance across splits or inference sets using the Model Metrics View.
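To make the per-split calculation concrete, here is a minimal sketch of the idea: frames carry a split metadata value, counts are summed per split, and metrics are computed on each group. The record layout, the field names (`split`, `tp`, `fp`, `fn`) and the numbers are hypothetical and do not reflect Aquarium's internal format or client API.

```python
from collections import defaultdict

# Hypothetical per-frame records: a split metadata value plus match counts
# for one inference set.
frames = [
    {"frame_id": "frame_001", "split": "training", "tp": 9, "fp": 1, "fn": 0},
    {"frame_id": "frame_002", "split": "training", "tp": 8, "fp": 0, "fn": 2},
    {"frame_id": "frame_003", "split": "validation", "tp": 6, "fp": 2, "fn": 2},
    {"frame_id": "frame_004", "split": "test", "tp": 5, "fp": 5, "fn": 0},
]


def counts_by_split(records):
    """Sum true positives, false positives and false negatives per split."""
    totals = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})
    for record in records:
        for key in ("tp", "fp", "fn"):
            totals[record["split"]][key] += record[key]
    return dict(totals)


def metrics_by_split(records):
    """Compute precision and recall for each split from the summed counts."""
    result = {}
    for split, c in counts_by_split(records).items():
        predicted = c["tp"] + c["fp"]
        labeled = c["tp"] + c["fn"]
        result[split] = {
            "precision": c["tp"] / predicted if predicted else 0.0,
            "recall": c["tp"] / labeled if labeled else 0.0,
        }
    return result


split_totals = counts_by_split(frames)
per_split = metrics_by_split(frames)
```

Summing counts before dividing weights every object equally; averaging per-frame metrics instead would weight every frame equally, which can give different numbers on the same data.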
Evaluate Release Candidates or Experiments Against Business Critical Outcomes using Regression Tests
Based on model performance or domain context, define sets of frames that represent critical scenarios the model must perform well in.
Set metric thresholds based on the current deployed model's baseline or other outcome oriented targets.
Use the multi-model comparison view on the Scenarios tab to quickly learn whether a release candidate model:
improved or regressed relative to the baseline
surpassed the target thresholds for the individual Regression Test
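The two checks above can be sketched as a small comparison routine. This is an illustration of the logic only, not how Aquarium implements it: `evaluate_candidate`, the metric dictionaries and all numbers are hypothetical, and counting a metric that exactly equals its target as passing is an assumption.

```python
def evaluate_candidate(baseline, candidate, targets):
    """Compare a candidate's metrics to a baseline and to target thresholds.

    All three arguments are dicts keyed by metric name (hypothetical layout).
    """
    report = {}
    for metric, target in targets.items():
        delta = candidate[metric] - baseline[metric]
        report[metric] = {
            "delta": delta,
            "improved": delta > 0,
            "regressed": delta < 0,
            # Assumption: equal to the target counts as meeting it.
            "meets_target": candidate[metric] >= target,
        }
    return report


# Illustrative metrics for one Regression Test segment.
baseline = {"precision": 0.82, "recall": 0.75, "f1": 0.78}
candidate = {"precision": 0.85, "recall": 0.72, "f1": 0.78}
targets = {"precision": 0.80, "recall": 0.75, "f1": 0.75}

report = evaluate_candidate(baseline, candidate, targets)

# The regression test passes only if every metric meets its target.
passed = all(entry["meets_target"] for entry in report.values())
```

In this example the candidate improves precision but regresses on recall and falls below the recall target, so the Regression Test as a whole does not pass even though two of the three metrics meet their targets.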