Investigate Dev/Debug Mode visualisation on the flowchart #1464

Closed
2 tasks
amandakys opened this issue Jul 27, 2023 · 4 comments

amandakys commented Jul 27, 2023

Description

Following on from the work on displaying dataset statistics in the metadata panel (#662), displaying dataset statistics or other dataset-specific information on the flowchart was raised as a possible extension of that work.

Context

Viewing dataset-specific information on the flowchart was first proposed by a user to improve the debugging workflow. The user described using dataset size to diagnose where the pipeline had failed.

Possible Implementation

Where to access the feature

Currently we have two places where actions affect the display of the flowchart.

  1. Filters
    image

  2. Flow chart menu
    image

The flowchart menu changes how content in the flowchart is displayed (Show/Hide labels and Show/Hide Layers).
The filters menu changes what content is displayed in the flowchart.
There is some overlap: when parameters are selected for display in the filters menu, they appear in the flowchart with a yellow outline.

As such, a feature that displays dataset-specific information on the flowchart could fit in either menu.

  • In the filter menu, dataset statistics could be defined as an Element Type; when enabled, they would be displayed in the flowchart.
  • In the flowchart menu, dataset statistics would be more like an overlay, on top of the current flowchart view. This would either be displayed for all datasets simultaneously (potentially a rendering/visualisation challenge) or only visible on hover.

IMO the flowchart menu is a better place for it, as dataset statistics are inherently tied to datasets, so they don't add extra elements to the flowchart. But I notice that Tags is also an element type, so maybe that's similar.

The feature

What information should be displayed once enabled?

  • for now, dataset statistics focus on rows, columns, and file size
  • there is scope to allow users to define their own values, and this should be taken into account when we decide how to display this information in the flowchart.
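
To make the point about user-defined values concrete, here is a minimal sketch of a stats record that carries the three built-in values alongside arbitrary custom ones, so a display layer can render whatever is present. The `DatasetStats` class, its field names, and the example values are hypothetical illustrations, not the Kedro-Viz data model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Union

StatValue = Union[int, float, str]


@dataclass
class DatasetStats:
    """Hypothetical per-dataset record; not the actual Kedro-Viz schema."""

    rows: Optional[int] = None
    columns: Optional[int] = None
    file_size: Optional[int] = None  # size in bytes
    custom: Dict[str, StatValue] = field(default_factory=dict)

    def display_items(self) -> List[str]:
        """Return one label per available value, built-ins first."""
        built_in = {
            "rows": self.rows,
            "columns": self.columns,
            "file_size": self.file_size,
        }
        # Skip built-in values that were never recorded for this dataset.
        items = [f"{name}: {value}" for name, value in built_in.items() if value is not None]
        # User-defined values are appended in insertion order.
        items += [f"{name}: {value}" for name, value in self.custom.items()]
        return items


# Made-up example values for illustration only.
stats = DatasetStats(rows=29768, columns=12, custom={"null_ratio": 0.02})
labels = stats.display_items()
print(labels)
```

Because the display only iterates over what `display_items` returns, a tooltip, an expanded node, or an edge label could all consume the same list without knowing which statistics exist.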

How should datasets be displayed once enabled?

There are a wide variety of options:

  • Tooltips
  • Coloured Outline
  • Expanding the nodes to show more content
  • Displaying information along the lines

To choose between these options, we need to know more about what users want from this feature: what they want to see and how they plan to use it. Once we understand the motivation for accessing this feature, we can pick the most appropriate way to visualise it.

Next Steps

  • find out more about the use case for this feature
  • evaluate design options based on these findings

@tynandebold (Member)

Thank you for writing this up 💥

A couple points of clarification from me:

  • There are actually three places where actions affect the display of the flowchart, the third being from the sidebar. From there, you're able to hide/show nodes and focus/expand modular pipelines.
  • Tags aren't element types. When you view the demo website, that fact isn't too clear. But if you take a look at the demo with different data you'll see that Tags are their own paradigm and organizational grouping, much like Element types group nodes, datasets, and parameters.

tynandebold (Member) commented Aug 14, 2023

Additional considerations and actions:

  1. Write up and post a Slack poll asking what other things users may want to see overlaid on the flowchart, or more broadly, ask about users' debugging workflows.
  2. The only thing that needs to be considered is how easy it would be to see the stats on the flowchart, especially when the flowchart is big.
  3. For huge pipelines it may be good to narrow down debug mode on a per-layer basis, e.g. by giving users a debug settings panel to enable a layer-wide debug mode.

tynandebold moved this from Inbox to Backlog in Kedro-Viz Aug 14, 2023

@NeroOkwa (Contributor)

Here is some additional user feedback regarding this pain point and debugging use case:

“And in case something fails, it will be amazing if you can say I know that in this, at this point, something fails, so this box, it's red [indicates node on Viz]”.
“I'm always debugging because I'm always running the whole pipeline. When you see the error in Kedro, it's not always clear where something is failing. So if you can validate that quickly in Kedro-Viz, it'll be super useful”.

“I think there's something around debugging… something like this [code preview] is probably more helpful than if we're just gonna go line by line through the code so that I can understand what part of the code are we running into issues and then help”.

Using Kedro-Viz to debug a Kedro project - “I think an unintended use, for example, when my team tell me the project is ready, and the first thing I do is to plot the repository, install the requirement, open Kedro-Viz. If there's an error and I can run it, I know that there's a problem in the catalog in the pipe somewhere, but that's one of things I use Kedro-Viz for”.

“So there were times when Kedro-Viz, obviously it needs all of the dependencies to be working correctly in order to generate the, the visualisation. At times, it felt like it would be valuable if those dependencies weren't accurate. If those pieces could be orphaned as almost like troubleshooting where the pipeline might be going wrong. It's more nice to have, obviously the pipeline should just be right, but it felt like it might be a cool opportunity”.
“This would help to track down upstream and downstream the missing dependencies or where something might be broken in a pipeline to make troubleshooting easier using DAGs instead of the IDEs”.

Copying the initial user comment and feature request from the Slack channel here:
"I want to log the number of rows for the datasets at each step of my pipeline. It's for debugging. The goal is to notice big drop of rows during one data transformation step. For example, after one node, I may see that my number of lines drops by 30% when it’s supposed to stay the same."
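
The check this user describes can be sketched in a few lines of plain Python, assuming the row counts have already been recorded per step. The function name `find_row_drops`, the 30% default threshold, and the example dataset names and counts are all hypothetical; nothing here is an existing Kedro or Kedro-Viz API.

```python
from typing import Dict, List, Tuple


def find_row_drops(
    row_counts: List[Tuple[str, int]], threshold: float = 0.3
) -> List[Dict[str, object]]:
    """Flag consecutive steps whose row count fell by at least `threshold`."""
    drops = []
    # Pair each step with the one that follows it.
    for (prev_name, prev_rows), (name, rows) in zip(row_counts, row_counts[1:]):
        if prev_rows <= 0:
            continue  # no relative drop can be computed from an empty dataset
        relative_drop = (prev_rows - rows) / prev_rows
        if relative_drop >= threshold:
            drops.append({"from": prev_name, "to": name, "drop": round(relative_drop, 3)})
    return drops


# Made-up row counts recorded after each step of a pipeline run.
counts = [
    ("raw_orders", 1000),
    ("cleaned_orders", 990),
    ("joined_orders", 650),
    ("features", 650),
]
flagged = find_row_drops(counts)
print(flagged)
```

With these counts, only the step from `cleaned_orders` to `joined_orders` is flagged, which is the kind of signal a flowchart overlay could surface visually instead of leaving it in logs.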

CC @amandakys

tynandebold moved this from Backlog to Todo in Kedro-Viz Aug 21, 2023
NeroOkwa moved this from Todo to In Progress in Kedro-Viz Aug 25, 2023

yetudada (Contributor) commented Sep 1, 2023

We're going to close this issue and recreate a larger research task about debugging.
