Investigate Dev/Debug Mode visualisation on the flowchart #1464

amandakys · 2023-07-27T10:36:56Z

Description

Following on from the work on displaying dataset statistics in the metadata panel (#662), displaying dataset statistics or other dataset-specific information on the flowchart was raised as a possible extension of that work.

Context

Viewing dataset-specific information on the flowchart was first proposed by a user to improve their debugging workflow. The user described using dataset size to diagnose where the pipeline had failed.

Possible Implementation

Where to access the feature

Currently we have two places where actions affect the display of the flowchart.

Filters
Flowchart menu

In the flowchart menu, we change how content in the flowchart is displayed, e.g. Show/Hide Labels and Show/Hide Layers.
In the filters menu, we change what content is displayed in the flowchart.
There is some overlap: when parameters are selected for display in the filters menu, they appear in the flowchart with a yellow outline.

As such, a feature that displayed dataset specific information on the flowchart could fit in both menus.

In the filters menu, dataset statistics could be defined as an Element Type; when enabled, they would be displayed in the flowchart.
In the flowchart menu, dataset statistics would be more like an overlay, on top of the current flowchart view. This would either be displayed for all datasets simultaneously (potentially a rendering/visualisation challenge) or only visible on hover.

IMO the flowchart menu is a better place for it: dataset statistics are inherently tied to datasets, so they don't add extra elements to the flowchart. But I notice that Tags is also an element type, so maybe that's similar.

The feature

What information should be displayed once enabled?

For now, dataset statistics focus on rows, columns and file size.
There is scope to allow users to define their own values, and this should be taken into account when we decide how to display this information in the flowchart.
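To make the point about user-defined values concrete, here is a minimal Python sketch of how per-dataset statistics could be modelled so that the built-in fields (rows, columns, file size) and custom values share one display path. The `DatasetStats` class, its field names and the `human_size` helper are hypothetical illustrations, not Kedro-Viz's actual data model.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Union

# Assumption: custom statistics are numbers or short strings.
StatValue = Union[int, float, str]


def human_size(num_bytes: int) -> str:
    """Format a byte count as B/KB/MB/GB using 1024-based units."""
    size = float(num_bytes)
    for unit in ("B", "KB", "MB"):
        if size < 1024:
            return f"{size:.0f} {unit}" if unit == "B" else f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} GB"


@dataclass
class DatasetStats:
    """Hypothetical per-dataset statistics shown on a flowchart node."""
    rows: Optional[int] = None
    columns: Optional[int] = None
    file_size: Optional[int] = None  # in bytes
    custom: Dict[str, StatValue] = field(default_factory=dict)  # user-defined values

    def summary(self) -> str:
        """Build a compact one-line label, skipping statistics that are missing."""
        parts = []
        if self.rows is not None:
            parts.append(f"{self.rows:,} rows")
        if self.columns is not None:
            parts.append(f"{self.columns} cols")
        if self.file_size is not None:
            parts.append(human_size(self.file_size))
        parts.extend(f"{name}: {value}" for name, value in self.custom.items())
        return " | ".join(parts)


stats = DatasetStats(rows=1200, columns=8, file_size=2048, custom={"null_rows": 3})
print(stats.summary())  # 1,200 rows | 8 cols | 2.0 KB | null_rows: 3
```

Keeping custom values in one dictionary means the rendering code would not need to change when users add their own statistics; only the label grows, so whichever display option is chosen (tooltip, expanded node, etc.) would need to cope with labels of varying length.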

How should dataset information be displayed once enabled?

There are a wide variety of options:

Tooltips
Coloured Outline
Expanding the nodes to show more content
Displaying information along the lines

To choose between these options, we need more information on what users would get out of this feature: what they want to see, and how and for what they plan to use it. If we better understand why people would use this feature, we can pick the most appropriate way to visualise it.

Next Steps

Find out more about the use cases for this feature
Evaluate design options based on these findings

tynandebold · 2023-08-07T18:03:08Z

Thank you for writing this up 💥

A couple points of clarification from me:

There are actually three places where actions affect the display of the flowchart, the third being from the sidebar. From there, you're able to hide/show nodes and focus/expand modular pipelines.
Tags aren't element types. When you view the demo website, that fact isn't too clear. But if you take a look at the demo with different data you'll see that Tags are their own paradigm and organizational grouping, much like Element types group nodes, datasets, and parameters.

tynandebold · 2023-08-14T14:50:07Z

Additional considerations and actions:

Write up and post a Slack poll asking what other things users may want to see overlaid on the flowchart or, more broadly, asking users about their debugging workflows and how they debug.
A key consideration is how easy it would be to see the stats on the flowchart, especially when the flowchart is big.
For huge pipelines, could we narrow debug mode down to a per-layer basis? E.g. giving users a debug settings panel to enable a layer-wide debug mode.
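A layer-wide debug mode could reduce to a simple filter: only nodes whose layer is switched on in a (hypothetical) debug settings panel get the statistics overlay. The sketch below assumes each flowchart node carries an optional layer name; `FlowchartNode`, `nodes_with_debug_overlay` and the example dataset names are illustrative, not part of Kedro-Viz.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass
class FlowchartNode:
    name: str
    layer: Optional[str] = None  # None when the dataset has no layer assigned


def nodes_with_debug_overlay(
    nodes: Iterable[FlowchartNode], enabled_layers: Set[str]
) -> List[str]:
    """Return the names of nodes whose layer has debug mode switched on."""
    # Nodes without a layer (None) never match a set of layer names.
    return [node.name for node in nodes if node.layer in enabled_layers]


nodes = [
    FlowchartNode("raw_orders", "raw"),
    FlowchartNode("clean_orders", "intermediate"),
    FlowchartNode("report"),
]
print(nodes_with_debug_overlay(nodes, {"intermediate"}))  # ['clean_orders']
```

Filtering by layer would limit how many statistics are drawn at once, which addresses the readability concern for big flowcharts without requiring a per-node setting.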

NeroOkwa · 2023-08-17T13:25:08Z

Here is some additional user feedback regarding this pain point and the debugging use case:

“And in case something fails, it will be amazing if you can say I know that in this, at this point, something fails, so this box, it's red [indicates node on Viz]”.
“I'm always debugging because I'm always running the whole pipeline. When you see the error in Kedro, it's not always clear where something is failing. So if you can validate that quickly in Kedro-Viz, it'll be super useful".

“I think there's something around debugging… something like this [code preview] is probably more helpful than if we're just gonna go line by line through the code so that I can understand what part of the code are we running into issues and then help".

Using Kedro-Viz to debug a Kedro project - “I think an unintended use, for example, when my team tell me the project is ready, and the first thing I do is to plot the repository, install the requirement, open Kedro-Viz. If there's an error and I can run it, I know that there's a problem in the catalog in the pipe somewhere, but that's one of things I use Kedro-Viz for".

“So there were times when Kedro-Viz, obviously it needs all of the dependencies to be working correctly in order to generate the, the visualisation. At times, it felt like it would be valuable if those dependencies weren't accurate. If those pieces could be orphaned as almost like troubleshooting where the pipeline might be going wrong. It's more nice to have, obviously the pipeline should just be right, but it felt like it might be a cool opportunity".
“This would help to track down upstream and downstream the missing dependencies or where something might be broken in a pipeline to make troubleshooting easier using DAGs instead of the IDEs".

Copying the initial user comment and feature request on the slack channel here:
"I want to log the number of rows for the datasets at each step of my pipeline. It's for debugging. The goal is to notice big drop of rows during one data transformation step. For example, after one node, I may see that my number of lines drops by 30% when it’s supposed to stay the same."
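The check described in this request can be sketched in a few lines of Python: compare the row count of each node's input with that of its output, and flag drops above a threshold (30% here, matching the example). This is a minimal sketch that assumes a simplified pipeline in which each node has exactly one input and one output dataset, and that row counts have already been collected; `find_row_drops` and the `(node, input, output)` edge format are hypothetical.

```python
from typing import Dict, List, Tuple

# (node name, input dataset, output dataset): a simplified, hypothetical view of a
# pipeline in which every node reads one dataset and writes one dataset.
Edge = Tuple[str, str, str]


def find_row_drops(
    edges: List[Edge], row_counts: Dict[str, int], threshold: float = 0.3
) -> List[Tuple[str, float]]:
    """Return (node, fractional drop) for nodes that lose at least `threshold` of their rows."""
    flagged = []
    for node, source, target in edges:
        before = row_counts.get(source)
        after = row_counts.get(target)
        # Skip nodes without recorded counts, or with an empty input (no ratio to compute).
        if not before or after is None:
            continue
        drop = (before - after) / before
        if drop >= threshold:
            flagged.append((node, drop))
    return flagged


edges = [("clean", "raw", "cleaned"), ("join", "cleaned", "joined")]
counts = {"raw": 1000, "cleaned": 980, "joined": 650}
print(find_row_drops(edges, counts))
```

With these counts only the `join` node is flagged (a drop of about 34%); a flagged node would be a natural candidate for the red highlight users describe above.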

CC @amandakys

yetudada · 2023-09-01T10:10:35Z

We're going to close this issue and recreate a larger research task about debugging.

amandakys added the Issue: Feature Request label Jul 27, 2023

amandakys mentioned this issue Jul 27, 2023

Visualize size of processed datasets #662 (Closed)

tynandebold added this to Kedro-Viz Jul 27, 2023

tynandebold moved this to Inbox in Kedro-Viz Jul 27, 2023

tynandebold added the Design: Research and Design: Visual Design labels Jul 27, 2023

tynandebold moved this from Inbox to Backlog in Kedro-Viz Aug 14, 2023

tynandebold moved this from Backlog to Todo in Kedro-Viz Aug 21, 2023

NeroOkwa moved this from Todo to In Progress in Kedro-Viz Aug 25, 2023

yetudada closed this as completed Sep 1, 2023

github-project-automation bot moved this from In Progress to Done in Kedro-Viz Sep 1, 2023

yetudada mentioned this issue Sep 1, 2023

Understand debugging use cases #1519 (Open)

stephkaiser mentioned this issue Nov 10, 2023

[Debugging] Visualise dataset statistics in the Flowchart #1635 (Open, 1 task)
