DEV Community

Johnny Santamaria


Enhancing Asset Visualization in Dagster: A UI Filtering Implementation

"The ability to simplify means to eliminate the unnecessary so that the necessary may speak." — Hans Hofmann


Introduction

Over four weeks in February 2025, I contributed to Dagster with two other CTI members. Dagster is a powerful data orchestration tool that organizes data from multiple sources. During the first week of this micro-internship, we were introduced to the project and learned how collaboration works in open-source development. The tech stack for our assigned issue was Python and Docker. Since I had no experience with Docker, I consulted the official documentation to learn what it is and how Dagster uses it.

CTI (Computing Talent Initiative) is a program designed to support and accelerate the careers of students from underrepresented backgrounds in computing.

Dagster is an important application for modern data orchestration because it provides a robust and scalable way to develop, test, and deploy data pipelines. Unlike traditional workflow schedulers, Dagster offers a declarative approach to defining data dependencies, ensuring reliability and maintainability. It integrates seamlessly with various data tools, supports modular pipeline development, and enhances observability through built-in monitoring and logging features. By enabling teams to track data lineage, enforce data quality checks, and automate workflows, Dagster helps organizations improve efficiency, reduce errors, and ensure data integrity in complex data engineering processes.

Dagster is typically used by data engineers, data scientists, and DevOps teams that need to build, manage, and monitor reliable data pipelines. These teams use it to automate workflows, ensure data quality, and track dependencies across complex data systems. For example, a financial services company could use Dagster to orchestrate daily data ingestion from multiple stock market APIs, validate the data for accuracy, and generate real-time reports for analysts, ensuring seamless and error-free data processing.

Here is how to use Dagster:

  1. Once Dagster is installed, run it on your machine, then click Assets > Global asset lineage.

  2. Click Materialize to run the pipeline that builds the assets.

  3. Click View to open the run details:

    1. This is where you can view asset logs and metadata. Metadata provides essential information for tracking, managing, and maintaining assets efficiently, such as images or snippets of code.

    2. While the run is in progress, you can change how the assets are displayed by toggling the view buttons in the top-left corner.

Around 3.4k users currently rely on Dagster to manage their data pipelines!

The issue (#26669)

The issue discusses the need for customizable asset ordering in Dagster's UI, particularly in graphs. Currently, the order of asset kinds (e.g., Python, BigQuery, GCS) is unclear and changes unpredictably when asset properties (such as location) are modified. The user wants to define a consistent order and have more flexibility in customization, including the ability to use custom icons or emojis beyond the existing compute_kind field.

Link to the issue: https://github.com/dagster-io/dagster/issues/26669

This issue is important to the Dagster open-source project because it enhances usability and clarity in visualizing data assets. Allowing users to customize the order and representation of asset kinds improves consistency, making data pipelines easier to interpret and manage. This flexibility is especially valuable for teams working with multiple asset types across different regions or services. Addressing this issue would enhance the user experience, making Dagster more adaptable to diverse workflows and increasing its adoption among data engineers.

I tackled the issue with two teammates.

Files relevant to the issue

The following folders are central to the filtering functionality:

  • js_modules/dagster-ui/packages/ui-core/src/ui/Filters: This folder contains the necessary code for implementing different filter types within the UI.

  • js_modules/dagster-ui/packages/ui-core/src/ui/BaseFilters: Here, the core base filter logic resides, acting as a foundation for all other filtering operations.

  • js_modules/dagster-ui/packages/ui-core/src: This broader directory encompasses the various filter components, including those related to emojis and icons, which are key for visualizing the data filtering process.

Files for Filtering and Asset Management

  • useKindFilter.tsx

This file manages a filter labeled kind, allowing the user to filter based on specific asset types. It is a vital component that ensures users can easily categorize assets based on predefined kinds, streamlining the process of narrowing down the dataset.

  • useStaticSetFilter.tsx

We found this file by following the imports in useKindFilter.tsx. It is responsible for how filters are applied across the UI, and it manages the logic that lets users filter assets against a static set of values, keeping the filtering system flexible and efficient.

  • useStaticSetFilterSorte.oss.tsx

Although this file is called from useStaticSetFilter.tsx, it mainly returns null, which suggests it either serves as a placeholder for further functionality or is used only under certain conditions.

  • AssetGraphFilterBar.oss.tsx

The AssetGraphFilterBar plays a critical role in the rendering of assets. It integrates the filters and determines how assets are displayed on the user interface, contributing to the visual aspect of the filtering system.

Dagster's Codebase

When you peek under the hood of Dagster, you'll find a wide ecosystem of technologies, from core dependencies to integrations and developer tooling, working together to deliver robust data orchestration. Let's explore the key components.

The Foundation: Core Infrastructure

At the infrastructure level, Docker containers ensure consistent deployment across environments, while Vercel handles frontend and serverless deployments.

Data Processing Powerhouse

The data processing integrations are particularly impressive. Spark and Dask handle distributed computing and scale data science workflows, DuckDB provides fast in-process analytical queries, and Kafka serves as the backbone for real-time streaming data.

Modern Frontend Architecture

The user interface is built on React, enhanced with Next.js for server-side rendering. Style management is handled through a combination of Sass and Styled Components, providing a flexible and maintainable approach to CSS. For bundling and optimization, Webpack and Rollup ensure efficient delivery of frontend assets.

AI and Machine Learning Integration

Dagster also integrates with popular AI and machine learning tools:

  • OpenAI and Anthropic for natural language processing

  • PyTorch for deep learning capabilities

  • scikit-learn for traditional machine learning tasks

  • Hugging Face Hub for access to a vast repository of models and datasets

Monitoring and Analytics

Reliability is crucial for data orchestration, and Dagster takes this seriously with:

  • Prometheus for metrics collection and analysis

  • Sentry for error tracking and performance monitoring

  • PostHog and Mixpanel for user behavior analytics

  • Apache Airflow integration for workflow management

Developer Experience

The development experience hasn't been overlooked. Tools like IPython provide an enhanced interactive shell, while pytest and Jest help ensure code quality through testing. For documentation, Docusaurus and Sphinx generate clear, maintainable technical documentation.

Data Visualization and Reporting

Data visualization is powered by a robust stack including:

  • Matplotlib for core plotting capabilities

  • Seaborn for statistical visualizations

  • Graphviz for structured data representation

  • Rive for interactive motion graphics

API and Communication

The API layer leverages GraphQL for flexible data fetching, while FastAPI and Flask provide high-performance web frameworks. For handling concurrent requests, aiohttp enables asynchronous communication.

Scientific Computing Foundation

At its core, Dagster benefits from Python's scientific computing ecosystem:

  • NumPy for numerical operations

  • SciPy for scientific computations

  • Arrow for precise handling of dates and times

This comprehensive technology stack allows Dagster to handle complex data orchestration tasks while remaining flexible and maintainable. Whether you're dealing with big data processing, machine learning workflows, or real-time analytics, Dagster's thoughtfully chosen technology stack provides the tools needed for success.

The beauty of this architecture lies not just in the individual components, but in how they work together to create a seamless experience for data engineers and scientists. As data orchestration needs continue to evolve, Dagster's modern technology stack positions it well for future growth and adaptation.

Here is a system diagram of the parts of the codebase involved in the issue.

How the codebase handles this workflow, layer by layer:

  1. User Interface Layer

    • The user starts by interacting with the Asset Graph UI (A)

    • They access the filtering functionality through the Asset Graph Filter Bar (B)

    • The Kind Filter Component (C) provides the interface for selecting asset types

  2. Filter Logic Layer

    • When a user selects filter criteria, the useKindFilter hook (D) is triggered

    • This hook coordinates with useStaticSetFilter (E) to manage filter state

    • Filter Configuration (F) defines how different asset kinds should be displayed and sorted

  3. Data Processing Layer

    • Asset Kind Sorting (G) handles the prioritization of asset types (like python, bigquery, gcs)

    • Asset Metadata Processing (H) processes the raw asset data

    • Filter State Management (I) maintains the current state of selected filters

  4. Storage Layer

    • Asset Definitions (J) store the core information about each asset

    • Asset Metadata Store (K) contains additional information like kinds and compute resources

Here's a specific use-case workflow:

  1. A user wants to filter assets to show only Python and BigQuery assets:

    • They start at the Asset Graph UI (A)

    • They click the filter icon in the Asset Graph Filter Bar (B)

    • The Kind Filter Component (C) displays the available asset types

  2. When the user selects "python" and "bigquery":

    • useKindFilter (D) receives these selections

    • The hook applies the prioritized sorting defined in the code

    • useStaticSetFilter (E) updates the filter state

  3. The Data Processing Layer then:

    • Sorts the assets according to the prioritized order (G)

    • Processes the metadata to match the filter criteria (H)

    • Updates the filter state (I)

  4. Finally:

    • The UI updates to show only the selected asset types

    • Custom icons are applied (python shows the code_block icon, bigquery shows the graph icon)

    • The filtered view maintains the prioritized ordering

This workflow implements the original issue's custom ordering and visualization requirements, ensuring consistent asset ordering and proper representation of different asset kinds in the UI.
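To make the filtering step concrete, here is a small framework-free TypeScript sketch of what "show only the selected kinds" can mean. The `AssetLike` type, the `filterAssetsByKinds` helper, and its `mode` parameter are hypothetical names for illustration; they are not Dagster APIs. The `'all-of'` mode mirrors the `matchType: 'all-of'` value set in `BaseConfig` in useKindFilter.tsx, assuming that setting requires an asset to carry every selected kind. The `'any-of'` mode is included for comparison, since it matches the "show Python and BigQuery assets" use case more literally.

```typescript
// Hypothetical, simplified model of an asset, for illustration only
interface AssetLike {
  key: string;
  kinds: string[];
}

type MatchMode = 'all-of' | 'any-of';

// Returns the assets whose kinds satisfy the selection.
// An empty selection means "no filter", so every asset is returned.
function filterAssetsByKinds(
  assets: AssetLike[],
  selectedKinds: Set<string>,
  mode: MatchMode,
): AssetLike[] {
  if (selectedKinds.size === 0) {
    return assets;
  }
  const selected = Array.from(selectedKinds);
  return assets.filter((asset) =>
    mode === 'all-of'
      ? selected.every((kind) => asset.kinds.includes(kind)) // asset must have every selected kind
      : selected.some((kind) => asset.kinds.includes(kind)), // asset needs at least one selected kind
  );
}

// Example: the user selects "python" and "bigquery" in the Kind filter
const assets: AssetLike[] = [
  {key: 'raw_trades', kinds: ['python']},
  {key: 'trades_table', kinds: ['python', 'bigquery']},
  {key: 'archive', kinds: ['gcs']},
];
const selection = new Set<string>(['python', 'bigquery']);

// 'all-of' keeps only trades_table; 'any-of' keeps raw_trades and trades_table
console.log(filterAssetsByKinds(assets, selection, 'all-of').map((a) => a.key));
console.log(filterAssetsByKinds(assets, selection, 'any-of').map((a) => a.key));
```

The difference matters for users: with `'all-of'`, selecting more kinds narrows the view, while with `'any-of'` it widens it.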

Challenges Overcome

When I first started working with Dagster, I ran into an interesting challenge during installation. Even though I carefully followed the documentation on my M2 Mac, including the M2-specific compatibility commands, I hit a puzzling error involving the assets.py file.

After pushing my changes to a separate branch and returning to the project a few days later, I was greeted with a DagsterUserCodeLoadError. The error message looked cryptic at first glance:

```text
dagster._core.errors.DagsterUserCodeLoadError: Error occurred during the loading of Dagster definitions in
executable_path=/workspaces/dagster/venv/bin/python, python_file=quickstart/assets.py, working_directory=/workspaces/dagster
  File "/workspaces/dagster/venv/lib/python3.12/site-packages/dagster/_grpc/server.py", line 417, in __init__
    self._loaded_repositories: Optional[LoadedRepositories] = LoadedRepositories(
                                                              ^^^^^^^^^^^^^^^^^^^
  File "/workspaces/dagster/venv/lib/python3.12/site-packages/dagster/_grpc/server.py", line 239, in __init__
    with user_code_error_boundary(
  File "/usr/local/python/3.12.1/lib/python3.12/contextlib.py", line 158, in __exit__
    self.gen.throw(value)
  File "/workspaces/dagster/venv/lib/python3.12/site-packages/dagster/_core/errors.py", line 299, in user_code_error_boundary
    raise new_error from e
The above exception was caused by the following exception:
FileNotFoundError: [Errno 2] No such file or directory: 'quickstart/assets.py'
  File "/workspaces/dagster/venv/lib/python3.12/site-packages/dagster/_core/errors.py", line 289, in user_code_error_boundary
    yield
  File "/workspaces/dagster/venv/lib/python3.12/site-packages/dagster/_grpc/server.py", line 250, in __init__
    loadable_targets = get_loadable_targets(
                      ^^^^^^^^^^^^^^^^^^^^^
  File "/workspaces/dagster/venv/lib/python3.12/site-packages/dagster/_grpc/utils.py", line 41, in get_loadable_targets
    else loadable_targets_from_python_file(python_file, working_directory)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspaces/dagster/venv/lib/python3.12/site-packages/dagster/_core/workspace/autodiscovery.py", line 24, in loadable_targets_from_python_file
    loaded_module = load_python_file(python_file, working_directory)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspaces/dagster/venv/lib/python3.12/site-packages/dagster/_core/code_pointer.py", line 73, in load_python_file
    os.stat(python_file)
```

What I did to solve this challenge

1. Checking the Version

My first instinct was to check whether this was a version mismatch. I systematically:

  • Uninstalled the current Dagster version

  • Installed version 1.8.4 specifically

  • Updated the requirements.txt file

  • Verified the installed version by importing Dagster in Python

2. File Location Detective Work

When the version fix didn't solve the issue, I realized it might be a path problem. The error message suggested that Dagster couldn't find assets.py, so I:

  • Used the file finder to check my current directory

  • Discovered I was in the wrong folder

  • Attempted to run the command with the correct path: dagster dev -f quickstart/assets.py

3. Documentation Deep Dive

Finally, I traced the issue back to an unclear point in the documentation. The docs said to "navigate to your project's root directory," but this needed to be more specific. I:

  • Reviewed the environment setup steps

  • Followed the "getting started" section

  • Realized I needed to be in the "dagster-quickstart" directory specifically

I consulted the Dagster documentation throughout our work on this issue, including while tackling this challenge.

Solution

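The breakthrough came when I used the Unix find command to locate the exact path of the assets.py file, starting from the repository root:

```shell
# Search the current directory tree for any file named assets.py
find . -name "assets.py"
```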

Lessons learned

This experience taught me several valuable lessons about working with Dagster:

  1. Always verify file paths when working with Dagster's file-based configuration

  2. Use command-line tools such as find to locate files in complex project structures

  3. Pay close attention to the working directory when running Dagster commands

  4. Keep track of Dagster versions to ensure compatibility

For anyone else setting up Dagster, I recommend double-checking your working directory and using the find command if you encounter similar path-related errors. Sometimes the simplest solutions are the most effective!

Solution to Issue #26669: A Breakdown

The solution has two main components:

  1. Prioritized Asset Ordering

    • Implemented a priority system for asset kinds (python, bigquery, gcs)

    • Created a consistent sorting mechanism that maintains order even when asset properties change

    • Used React's useMemo hook for performance optimization

  2. Custom Icon Mapping

    • Added a configurable icon mapping system for different asset types

    • Implemented a fallback to a default icon for undefined asset kinds

    • Enhanced visual distinction between different asset types
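The ordering rule can be expressed as a pure function, which makes it easy to reason about and to test outside React. This is a sketch, not the code from the PR: the name `sortAssetKinds` is hypothetical, and a plain `Intl.Collator` stands in for Dagster's `COMMON_COLLATOR` (we assume it behaves like one; the exact options are an assumption). Because the output depends only on which kinds are present, not on the order in which assets report them, the list stays stable when asset properties change.

```typescript
// Stand-in for Dagster's COMMON_COLLATOR; these options are an assumption
const collator = new Intl.Collator('en', {sensitivity: 'base', numeric: true});

// Kinds that should always be listed first, in this order
const PRIORITIZED_KINDS = ['python', 'bigquery', 'gcs'];

// Hypothetical helper: prioritized kinds first (only those actually present),
// followed by the remaining kinds in alphabetical order.
function sortAssetKinds(allAssetKinds: string[], prioritized: string[] = PRIORITIZED_KINDS): string[] {
  const unique = Array.from(new Set(allAssetKinds)); // defensive de-duplication
  const first = prioritized.filter((kind) => unique.includes(kind));
  const rest = unique
    .filter((kind) => !prioritized.includes(kind))
    .sort((a, b) => collator.compare(a, b));
  return [...first, ...rest];
}

// The same set of kinds in a different input order yields the same list;
// both calls return ['python', 'gcs', 'dbt', 'snowflake']
console.log(sortAssetKinds(['snowflake', 'gcs', 'python', 'dbt']));
console.log(sortAssetKinds(['dbt', 'python', 'snowflake', 'gcs']));
```

Note that `bigquery` is left out because no asset in the example has that kind. Listing prioritized kinds unconditionally would offer filter options that match nothing.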

Technical Implementation

The solution leverages several key technologies from our stack:

  1. React Components and Hooks

    • Used React's useMemo hook for efficient asset kind sorting

    • Implemented custom hooks for state management

    • Integrated with Dagster's existing UI component library

  2. Styled Components

    • Utilized Styled Components for consistent visual styling

    • Implemented flexible layout using Box component

    • Maintained visual hierarchy through consistent spacing

The useKindFilter.tsx file is central to the issue because it handles two critical aspects of the Dagster UI enhancement:

  1. Asset Type Filtering

    • It's a React custom hook that manages how different types of assets (Python, BigQuery, GCS, etc.) are filtered in the UI

    • It controls what users see when they want to view specific types of assets in their data pipelines

  2. Asset Ordering

    • This file contains the logic for how assets are sorted and displayed

    • It implements the prioritization system that determines which asset types appear first

    • It solves the original issue, where asset ordering was inconsistent and unpredictable

The main issue was that users had no control over how their assets were ordered in the UI, and the ordering would change unexpectedly. By modifying useKindFilter.tsx, we could add:

  • Consistent ordering rules

  • Custom icons for different asset types

  • A more predictable filtering experience

Think of it as the "traffic controller" for how assets appear and are organized in Dagster's interface - it's where we define the rules for what shows up where and how it looks.

Pathway to a Pull Request for Issue #26669: Asset Filtering and Metadata System

The main parts of the tech stack we needed:

  • Python, which runs the Dagster webserver that hosts the asset UI

  • React, which renders the assets in the UI

Our team tackled the challenge of enhancing asset filtering and visualization in Dagster. The solution involved several key components:

  1. Customization of Icons: We expanded the icon options in the useKindFilter.tsx file, allowing users to associate custom icons with different asset types. This improvement leverages the @dagster-io/ui-components library, a core part of Dagster's UI stack for building intuitive user interfaces.

  2. Asset Metadata Enhancement: My teammate Brent made significant contributions by introducing a new location field to the asset metadata. This update required modifications to several core Dagster files:

    • python_modules/dagster/dagster/_core/assets.py

    • python_modules/dagster/dagster/_core/definitions/asset_out.py

    • python_modules/dagster/dagster/_core/definitions/decorators/asset_decorator.py

    These changes allow users to specify where an asset is hosted, improving Dagster's ability to track and manage assets across different locations.

  3. Filter Optimization: We implemented a prioritized sorting system for asset kinds in the useKindFilter hook. This ensures that commonly used asset types like 'python', 'bigquery', and 'gcs' appear at the top of the filter list, improving user experience and efficiency.

  4. Integration with Existing Components: Our solution integrates with the existing Asset Catalog component. By enhancing the filtering and metadata capabilities, we've improved how users interact with and visualize assets within the catalog.

To verify the functionality of our solution, we ran a series of tests:

  1. We created a test pipeline to ensure the new icon customizations rendered correctly in the UI.

  2. We manually tested the asset kind filter to confirm that prioritized kinds appeared at the top of the list.

  3. Brent's changes to the asset metadata have not been fully tested yet; we plan to write comprehensive unit tests to verify that the new location field is handled correctly.

  4. We ran a local development build of the app by following the section of the Dagster documentation titled “Developing the Dagster webserver/UI”.

Here's a code snippet showcasing the icon customization in useKindFilter.tsx:

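```tsx
// Excerpt from BaseConfig.renderLabel in useKindFilter.tsx, where `value.value` is the kind.
// IconName is itself a string type, so a `typeof icon === 'string'` check can never
// tell icons and emojis apart; instead, each entry is tagged explicitly.
type KindIcon = {type: 'icon'; name: IconName} | {type: 'emoji'; text: string};

const customIcons = new Map<string, KindIcon>([
  ['python', {type: 'icon', name: 'code_block'}],
  ['bigquery', {type: 'icon', name: 'graph'}],
  ['gcs', {type: 'icon', name: 'cloud'}],
  ['bug', {type: 'icon', name: 'bug'}],
  ['calendar', {type: 'icon', name: 'calendar'}],
  // More icons or emojis can be added here
]);

// Use the custom icon or emoji if available, otherwise default to 'compute_kind'
const kindIcon: KindIcon = customIcons.get(value.value) ?? {type: 'icon', name: 'compute_kind'};

return (
  <Box flex={{direction: 'row', gap: 4, alignItems: 'center'}}>
    {kindIcon.type === 'emoji' ? <span>{kindIcon.text}</span> : <Icon name={kindIcon.name} />}
    <TruncatedTextWithFullTextOnHover tooltipText={value.value} text={value.value} />
  </Box>
);
```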

This code allows for easy expansion of custom icons for different asset types, improving the visual representation of assets in our system.
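The lookup-with-fallback logic can also be pulled out of the React component and tested on its own. In this sketch, `KindIcon`, `resolveKindIcon`, `DEFAULT_KIND_ICON`, and the `snowflake` emoji entry are hypothetical, and `IconName` is replaced by a small local union of the icon names used in this post so the code runs without `@dagster-io/ui-components`. Tagging each entry as `'icon'` or `'emoji'` avoids relying on a string check, which cannot distinguish the two because icon names are strings too.

```typescript
// Local stand-in for IconName from @dagster-io/ui-components (assumption: a union of string literals)
type IconName = 'code_block' | 'graph' | 'cloud' | 'bug' | 'calendar' | 'compute_kind';

// Each kind maps either to a built-in icon or to an emoji, tagged explicitly
type KindIcon = {type: 'icon'; name: IconName} | {type: 'emoji'; text: string};

const CUSTOM_KIND_ICONS = new Map<string, KindIcon>([
  ['python', {type: 'icon', name: 'code_block'}],
  ['bigquery', {type: 'icon', name: 'graph'}],
  ['gcs', {type: 'icon', name: 'cloud'}],
  ['snowflake', {type: 'emoji', text: '❄️'}], // hypothetical emoji entry
]);

const DEFAULT_KIND_ICON: KindIcon = {type: 'icon', name: 'compute_kind'};

// Falls back to the generic 'compute_kind' icon for kinds without a custom entry
function resolveKindIcon(kind: string): KindIcon {
  return CUSTOM_KIND_ICONS.get(kind) ?? DEFAULT_KIND_ICON;
}

// In the React label, the tag decides what to render, for example:
//   icon.type === 'emoji' ? <span>{icon.text}</span> : <Icon name={icon.name} />
for (const kind of ['python', 'snowflake', 'duckdb']) {
  console.log(kind, resolveKindIcon(kind));
}
```

Using a `Map` rather than a plain object also avoids accidental matches on inherited object keys such as `constructor`.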

At this stage, we had not yet submitted a pull request; we planned to do so once we had thoroughly tested all components, especially the new asset metadata features introduced by Brent.

---

To-do list after meeting with our assigned CTI CodeDay mentor, Dhiraj Patil:

  • Add icon customization, so users can choose which icons from icon.tsx to use

  • Add more icons to useKindFilter.tsx

  • Run a pipeline to see how the issue behaves in practice

  • Create a pull request with the work so far, including tests

File I changed:

  • useKindFilter.tsx

Here is the full file with my changes:

```tsx
import {Box, Icon, IconName} from '@dagster-io/ui-components';
import {useMemo} from 'react';

import {COMMON_COLLATOR} from '../../app/Util';
import {TruncatedTextWithFullTextOnHover} from '../../nav/getLeftNavItemsForOption';
import {StaticBaseConfig, useStaticSetFilter} from '../BaseFilters/useStaticSetFilter';

const emptyArray: any[] = [];

// Kinds that are always listed first, in this order, when at least one asset has them
const PRIORITIZED_KINDS = ['python', 'bigquery', 'gcs'];

// IconName is a string type, so each custom entry is tagged as a built-in icon or an emoji
type KindIcon = {type: 'icon'; name: IconName} | {type: 'emoji'; text: string};

export const useKindFilter = ({
  allAssetKinds,
  kinds,
  setKinds,
}: {
  allAssetKinds: string[];
  kinds?: null | string[];
  setKinds?: null | ((s: string[]) => void);
}) => {
  // Sort asset kinds: prioritized kinds first, then the remaining kinds alphabetically
  const sortedAssetKinds = useMemo(() => {
    // Skip prioritized kinds that no asset has, so the menu never offers empty options
    const presentPrioritizedKinds = PRIORITIZED_KINDS.filter((kind) => allAssetKinds.includes(kind));
    const otherKinds = allAssetKinds
      .filter((kind) => !PRIORITIZED_KINDS.includes(kind))
      .sort((a, b) => COMMON_COLLATOR.compare(a, b));
    return [...presentPrioritizedKinds, ...otherKinds];
  }, [allAssetKinds]);

  return useStaticSetFilter<string>({
    ...BaseConfig,
    allValues: useMemo(
      () =>
        sortedAssetKinds.map((value) => ({
          value,
          match: [value],
        })),
      [sortedAssetKinds],
    ),
    menuWidth: '300px',
    state: kinds ?? emptyArray,
    onStateChanged: (values) => {
      setKinds?.(Array.from(values));
    },
    canSelectAll: true,
  });
};

export const getStringValue = (value: string) => value;

export const BaseConfig: StaticBaseConfig<string> = {
  name: 'Kind',
  icon: 'compute_kind',
  renderLabel: (value) => {
    // Custom icons (names pulled from icon.tsx) or emojis for specific kinds
    const customIcons = new Map<string, KindIcon>([
      ['python', {type: 'icon', name: 'code_block'}],
      ['bigquery', {type: 'icon', name: 'graph'}],
      ['gcs', {type: 'icon', name: 'cloud'}],
      ['bug', {type: 'icon', name: 'bug'}],
      ['calendar', {type: 'icon', name: 'calendar'}],
      // Add more custom icons or emojis here
    ]);

    // Use the custom icon or emoji if available, otherwise default to 'compute_kind'
    const kindIcon: KindIcon = customIcons.get(value.value) ?? {type: 'icon', name: 'compute_kind'};

    return (
      <Box flex={{direction: 'row', gap: 4, alignItems: 'center'}}>
        {kindIcon.type === 'emoji' ? <span>{kindIcon.text}</span> : <Icon name={kindIcon.name} />}
        <TruncatedTextWithFullTextOnHover tooltipText={value.value} text={value.value} />
      </Box>
    );
  },
  getStringValue,
  getKey: getStringValue,
  matchType: 'all-of',
};

export function useAssetKindsForAssets(
  assets: {definition?: {kinds?: string[] | null} | null}[],
): string[] {
  return useMemo(
    () =>
      Array.from(new Set(assets.map((a) => a?.definition?.kinds || []).flat())).sort((a, b) =>
        COMMON_COLLATOR.compare(a, b),
      ),
    [assets],
  );
}
```

One of my teammates, Brent, described the changes he made toward the solution:

“I introduced a new location field to the asset metadata to specify where an asset is hosted, then updated the relevant classes and decorators to support it. I modified three files:

python_modules/dagster/dagster/_core/assets.py

python_modules/dagster/dagster/_core/definitions/asset_out.py

- I added a location parameter to the AssetOut class

- Modified the constructor to handle the location field and ensure it is correctly passed to the asset metadata

python_modules/dagster/dagster/_core/definitions/decorators/asset_decorator.py

- Modified the asset decorator to handle the location parameter

- Made sure that the location is correctly propagated in the asset definition

Unfortunately, I haven't been able to test these changes yet. I might do so in the following days, and I will submit a new pull request if I succeed.”

Conclusion

Throughout this four-week micro-internship, my team and I made substantial progress on our assigned Dagster issue, even though we could not fully finish it. We learned and applied Python and Docker in our tasks; although I had no Docker experience at first, I quickly picked it up by working through the official documentation. Collaborating with an open-source community gave us invaluable hands-on experience in integrating code into a large shared codebase, debugging, and problem-solving. Our mentor, Dhiraj Patil, acknowledged our progress, reaffirming the value of our work. This process strengthened my data orchestration knowledge and prepared me, both technically and as a teammate, for future open-source projects.

Here is the link to our pull request:

https://github.com/dagster-io/dagster/pull/28007

