2020-11-06
App Improvements
Projects Summary Page
The main "Projects" page now presents a summary view of your organization's projects, including issue counts and member datasets / inference sets.
It also adds UI support for archiving datasets and projects, which was previously only possible via the Python client.
Issue Element Quick-View
Previously, looking at the full context for an issue element would require opening a new tab, which could be cumbersome when attempting to look through many samples. Now, clicking on an issue element will quickly open a full detail view in the same window, like in the main dataset search interface.
Issues From Multiple Datasets
Previously, issues supported adding elements from multiple datasets, but didn't track which dataset / inference set each element came from -- only the element IDs and the project ID. Issues now fully support adding elements from multiple datasets within the project.
Previous/Next Frame Buttons
When looking through search results, the left and right arrow keys (or the corresponding buttons in the UI) now move to the previous or next frame without having to close the full-screen view.
No More Empty Issue Names
If you're like me, you've accidentally created an issue with no name. That's no longer possible.
Python Client / Data Upload Improvements
Reduced Memory Load While Uploading (Round 1)
When uploading larger datasets (especially those with large embedding vectors), the Python client's memory usage could grow prohibitively high. Newer Python client versions use a cleaner upload process that reduces temporary duplication of dataset information in memory.
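The core idea -- serialize and ship data one chunk at a time instead of materializing the whole payload in memory -- can be sketched as follows. This is an illustrative pattern, not the Aquarium client's actual implementation; the function and parameter names are invented for the example.

```python
import json

def iter_chunks(frames, chunk_size=1000):
    """Yield JSON-encoded chunks of frames.

    Only one chunk's worth of frames is serialized and held in
    memory at a time, instead of the entire dataset at once.
    """
    batch = []
    for frame in frames:
        batch.append(frame)
        if len(batch) == chunk_size:
            yield json.dumps(batch)
            batch = []
    if batch:  # flush the final, possibly partial, chunk
        yield json.dumps(batch)

# Each chunk could then be uploaded and discarded before the next
# one is built, keeping peak memory roughly constant.
for chunk in iter_chunks({"frame_id": i} for i in range(2500)):
    pass  # e.g. POST the chunk, then let it be garbage-collected
```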
Explicit Type Schemas for Custom Metadata
By default, Aquarium attempts to infer datatypes from your custom dataset metadata fields. This works well in simple cases, but can lead to non-obvious schema-related failures, especially when the metadata includes nullable fields.
The Python client now lets you explicitly specify the schema for custom metadata fields, and warns when you provide values that are likely to produce errors later.
Data Ingestion Job Status
Until now, the data ingestion process has been pretty opaque. You upload data, then wait, and hope it resolves as successful some time later.
The Python client now supports reporting data ingestion job status. You can see when the job is accepted, when resources are fully allocated to it, and whether it succeeds or fails. This should be a great starting point for seeing what's happening with your datasets, and we're planning a lot more work on this in the coming weeks.
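A status-reporting API like this is typically consumed with a simple polling loop. The sketch below is a generic pattern under assumed status names ("ACCEPTED", "RUNNING", "DONE", "FAILED"); it is not the actual Aquarium client API.

```python
import time

# Hypothetical terminal states for an ingestion job.
TERMINAL_STATES = {"DONE", "FAILED"}

def wait_for_job(get_status, poll_seconds=10):
    """Poll get_status() until a terminal state, printing transitions.

    get_status is any zero-argument callable returning the job's
    current status string (e.g. a client method that queries the API).
    """
    last = None
    while True:
        status = get_status()
        if status != last:
            print(f"job status: {status}")
            last = status
        if status in TERMINAL_STATES:
            return status
        time.sleep(poll_seconds)
```

Because the status source is passed in as a callable, the loop is easy to test with a canned sequence of statuses before wiring it to a real client.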
Embedding Visualization Speedups
For a variety of reasons, the embedding visualizer requires the most attention to scale with larger datasets. We've made a lot of behind-the-scenes changes over the last few weeks to better support datasets with millions of labels.
This should manifest as reduced network bandwidth (thanks to better local asset caching), faster load times, and a more responsive embedding UI.