# Finding Similar Elements Within a Dataset

Oftentimes, when you find element(s) of interest within your dataset or inference set, you may also want to identify other similar elements (for instance, to export for relabeling).

You can look for nearby points using the embedding view alone, but that view has been reduced to two dimensions for visualization, so the distances it shows only approximate distances in the original embedding space.
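To illustrate why a 2D view can mislead, here is a minimal sketch, not Aquarium's actual projection. It assumes made-up 3D embeddings and stands in for the dimensionality reduction by simply dropping the third coordinate. The names `nearest`, `candidates`, and `query` are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(query, candidates, dims=None):
    """Return the name of the candidate closest to `query`.

    If `dims` is given, only the first `dims` coordinates are compared,
    which mimics looking at a reduced view of the embeddings.
    """
    def view(vec):
        return vec if dims is None else vec[:dims]

    return min(
        candidates,
        key=lambda name: euclidean(view(query), view(candidates[name])),
    )


# Hypothetical 3D embeddings (illustrative values only).
query = (0.0, 0.0, 0.0)
candidates = {
    "a": (0.1, 0.0, 5.0),  # close in the first two dimensions, far in the third
    "b": (1.0, 0.0, 0.0),  # farther in 2D, but closer in the full space
}

nearest_2d = nearest(query, candidates, dims=2)  # compares the reduced view
nearest_full = nearest(query, candidates)        # compares all dimensions
print(nearest_2d, nearest_full)
```

In this toy data the reduced view picks `a` while the full space picks `b`, which is the kind of mismatch that searching in the actual embedding space avoids.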

Aquarium provides two ways of finding nearby elements in actual embedding space:

1. **Related Images:** For a single, specific element.
2. **Segment-based Similarity Search:** For a set of "seed" elements. This is a similar approach to [*Collection Campaigns*](/aquarium/common-workflows/collect-relevant-data.md), but for elements within the same dataset (rather than new unlabeled elements).

### Related Images

If you select a specific element from your dataset or inference set, the *Related Images* feature lets you find similar, **existing** elements from that same set.

{% embed url="<https://drive.google.com/file/d/13FrqV4z6zwjLshFS8EYAZuB0wMWF_obP/view>" %}
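Conceptually, a single-element lookup like this amounts to a nearest-neighbor search over embeddings. The sketch below is an assumption-laden illustration, not Aquarium's implementation: it uses cosine similarity (the actual metric is not stated here), toy 2D vectors, and the hypothetical names `related_elements` and `embeddings`.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def related_elements(query_id, embeddings, k=2):
    """Return the ids of the k elements most similar to `query_id`.

    The query element itself is excluded, so only other existing
    elements from the same set are returned.
    """
    query = embeddings[query_id]
    scored = [
        (cosine_similarity(query, vec), other_id)
        for other_id, vec in embeddings.items()
        if other_id != query_id
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [other_id for _, other_id in scored[:k]]


# Hypothetical embeddings keyed by element id (illustrative values only).
embeddings = {
    "img_0": (1.0, 0.0),
    "img_1": (0.9, 0.1),
    "img_2": (0.0, 1.0),
    "img_3": (0.7, 0.7),
}

result = related_elements("img_0", embeddings, k=2)
print(result)
```

For realistic dataset sizes, a brute-force scan like this would typically be replaced by an approximate nearest-neighbor index.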

### Segment-based Similarity Search

{% hint style="warning" %}
**Note:** This feature is not supported in the following situations (the web app will display a warning in these cases):

* Older datasets (since they didn't undergo the necessary post-processing)
* Segments with elements from multiple datasets
{% endhint %}

If you've identified a few problematic elements in a *Segment*, you may want an easy way to "grow" the issue by finding other similar elements.

Once an issue is created, you can generate similar elements by going to the *Similar Dataset Elements* tab and clicking the *Calculate Similar Dataset Elements* button as follows:

{% embed url="<https://drive.google.com/file/d/1S1qKuHOzsYop-dyQqPyF-ttEHFJgKpca/view?usp=sharing>" %}

You can then select and add the elements you want to your original issue.

Every time you add elements to or remove elements from your issue, you have the option of recalculating its similar elements.

If you've already created an issue (e.g. identified a few [*problematic labels*](/aquarium/working-in-aquarium/analyzing-model-inferences.md#surfacing-labeling-errors) or [*model failures*](/aquarium/working-in-aquarium/analyzing-model-inferences.md#finding-model-failure-patterns) from the **Explore view** of the app), you can iterate through the following loop to curate and grow it:

1. Calculate similar dataset elements.
2. Add some desired subset of the suggested "similar elements" to the original issue.
3. Rinse and repeat!
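The loop above can be sketched in code. This is only an illustration under stated assumptions, not how Aquarium computes suggestions: similarity is taken as Euclidean distance to the closest seed, the `accept` callback stands in for your manual review in the web app, and every name (`similar_to_seeds`, `grow_issue`, `embeddings`) is hypothetical.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def similar_to_seeds(seed_ids, embeddings, k=1):
    """Rank non-seed elements by distance to their closest seed; keep the top k."""
    candidates = [eid for eid in embeddings if eid not in seed_ids]
    candidates.sort(
        key=lambda eid: min(
            distance(embeddings[eid], embeddings[seed]) for seed in seed_ids
        )
    )
    return candidates[:k]


def grow_issue(seed_ids, embeddings, accept, rounds=5, k=1):
    """Repeat: calculate similar elements, add the accepted ones, recalculate."""
    issue = set(seed_ids)
    for _ in range(rounds):
        suggestions = similar_to_seeds(issue, embeddings, k)
        accepted = [eid for eid in suggestions if accept(eid)]
        if not accepted:
            break  # nothing useful was suggested, so stop iterating
        issue.update(accepted)
    return issue


# Hypothetical embeddings: e0..e3 form a cluster, e4 is an unrelated outlier.
embeddings = {
    "e0": (0.0, 0.0),
    "e1": (1.0, 0.0),
    "e2": (2.0, 0.0),
    "e3": (3.0, 0.0),
    "e4": (10.0, 10.0),
}

# Stand-in for a human reviewer who rejects the outlier.
grown = grow_issue({"e0"}, embeddings, accept=lambda eid: eid != "e4")
print(sorted(grown))
```

Each round re-ranks candidates against the enlarged seed set, which is why recalculating after adding elements can surface items that were not near the original seeds.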

This flow can be used to maximize the effectiveness of your [*Collection Campaigns*](/aquarium/common-workflows/collect-relevant-data.md). For collection campaigns, a well-curated set of issue elements will help you achieve better sampling results. Curating that set by hand can be lengthy and tedious, and similarity search helps speed it up.

