Evaluating the Kedro-Viz experience for large pipelines #1726

Closed
NeroOkwa opened this issue Jan 31, 2024 · 4 comments

@NeroOkwa
Contributor

NeroOkwa commented Jan 31, 2024

Description

This research study is aimed at investigating the experience of technical users when rendering their pipelines in Kedro-Viz. These technical users include: Data Scientists, Data Engineers, and Machine Learning Engineers.

Research Objective

The primary objective is to assess the overall experience of Kedro-Viz for technical users with large pipelines.
It seeks to identify their workflows, pain points, and unmet needs.

A large project is defined as one with 1,000 nodes or more (the point at which Kedro-Viz currently shows its size warning).
As shown in the graph below, such projects make up only 5% of Kedro projects. They still matter because this (internal) user group consists of active Kedro-Viz users.

The hypotheses are:

  1. The existing experience of Kedro-Viz meets the needs of these users.
  2. Users experience a decrease in performance when rendering large graphs.
  3. Users are aware of the ‘collapsible modular pipeline’ feature in the side panel.
  4. Users are aware of the ‘size warning’ toggle in the settings panel, which controls the warning shown before rendering very large graphs.
  5. Users are aware of the ‘expand all modular pipelines’ toggle in the settings panel which expands all modular pipelines on the first load.
  6. The existing experience of Kedro-Viz does not meet the needs of these users, because they require other features.

Supporting Data

Quantitative

[Chart: proportion of users by project node count (link to chart)]

The graph above shows the proportion of users, for each node count, as an indication of their project size (Jan 2023 - Jan 2024).
A significant proportion of users have 25 - 35 nodes in their project, but very few (5%) have 1,000 nodes or more.
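The size definition and the 5% figure above can be reproduced from a list of per-project node counts. The snippet below is only a minimal sketch of that arithmetic: the function names and the sample node counts are hypothetical, and it does not use any Kedro or Kedro-Viz API or the telemetry behind the chart.

```python
from typing import Iterable

# Assumption: 1000 comes from the "large project" definition in this issue.
LARGE_PROJECT_NODE_THRESHOLD = 1000


def is_large_project(node_count: int, threshold: int = LARGE_PROJECT_NODE_THRESHOLD) -> bool:
    """Return True when a project has `threshold` nodes or more."""
    return node_count >= threshold


def share_of_large_projects(
    node_counts: Iterable[int], threshold: int = LARGE_PROJECT_NODE_THRESHOLD
) -> float:
    """Return the fraction (0.0 to 1.0) of projects that count as large."""
    counts = list(node_counts)
    if not counts:
        # No projects observed, so report a share of zero rather than divide by zero.
        return 0.0
    large = sum(1 for n in counts if is_large_project(n, threshold))
    return large / len(counts)


# Made-up node counts, purely illustrative; not the data behind the chart.
example_counts = [25, 30, 35, 28, 120, 1500]
print(f"{share_of_large_projects(example_counts):.0%} of the example projects are large")
```

Keeping the threshold as a parameter makes it easy to re-run the same calculation if the definition of "large" changes during the study.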

Qualitative

From Slack:

  • Internal QB user - “I know of at least 2 consultant in the Madrid office who were struggling to use Viz because the pipelines were just too large and tangled”
  • An external user whose project had 1500 nodes - “It’s just a performance issue - the layout itself is fine, but the responsiveness is low”
  • Internal QB user - “A helpful feature is to be able to fix the state of the landing page - e.g. we have huge pipelines, we've collapsed some things and hid some outputs - can this state be "frozen", so that every time you open kedro viz, you land at this "prepared" view?” (see "Frozen" state of the landing page for kedro viz (for easier navigation) #1673)

This was also the premise behind creating a massive test pipeline for the kedro team here.

Method

We aim to speak with 5-8 internal and external technical users who use Kedro-Viz, over Zoom.

Participants

The technical users include: Data Scientists, Data Engineers, and Machine Learning Engineers.

Who will we be speaking to?

  • David Pérez
  • Ridlo Rahman
  • Ricardo Picon
  • Javier Goez Sanz
  • Esther Leong

Interview Guide (45 mins)

User details (10 mins)

  • What is your current role at work?
  • How familiar are you with Kedro and Kedro-Viz? How often do you use Kedro and Kedro-Viz?
  • Do you use any other tools with Kedro and Kedro-Viz?

User motivation and workflow (10 mins)

  • Why do you use Kedro and Kedro-Viz? What are your overall main goals? What do you use it for?
  • How do you currently use Kedro-Viz? What are your tasks? What are you trying to accomplish?
  • How big is your Kedro project (number of nodes, pipelines, and datasets)?

User pain points and workflow (20 mins)

  • What pain points are you currently experiencing when using Kedro-Viz?
  • What pain points are you experiencing as your project gets bigger?
  • Did you see any warning before your large graph was rendered?
  • Were you able to expand all your modular pipelines?
  • Were you able to collapse all your modular pipelines?
  • Where in your overall Kedro and Kedro-Viz experience can we improve?

User recommendations & Wrap up (5 mins)

  • How would you rate your overall satisfaction with Kedro-Viz? 1-10 (10 meaning fully satisfied, no suggested improvements)
  • What haven’t we asked you today that you think would be valuable for us to know?
  • Do you have any other suggestions or feedback for us?
  • May I contact you if we have any other questions or for possible further research in the future?

What decisions will this research enable?

  • This research will help us to understand and solve any pain points experienced by users with large pipelines.
  • It will also enable us to understand any pain points or blockers to adopting shareable Viz, so that we can improve the experience.
  • These insights will help the team prioritise a roadmap that solves these issues and enables us to achieve our adoption goals.

Research Outcomes

  • Synthesised insights with potential opportunities and recommendations to further improve the large pipeline experience on Kedro-Viz.
  • Synthesised insights with potential opportunities and recommendations to further improve shareable Viz.
@yetudada
Contributor

I think the only details I'm missing from this are two things:

  • What is classified as a large project i.e. how many nodes, datasets, pipelines does a large project need to have to experience performance issues?
  • And therefore when you have defined that size, what % of Kedro projects are like that?

I see in the linked data that you talk about projects with 25 - 35 nodes, but is that a large pipeline? I ask because your quotes mention pipelines with 1,500 nodes experiencing the slowdown.

@NeroOkwa
Contributor Author

Updated the research objective section with this:

  • A large project is one with 1000 nodes and above.
  • This makes up only 5% of Kedro projects.

@rashidakanchwala moved this to Inbox in Kedro-Viz on Mar 11, 2024
@rashidakanchwala changed the title from "Evaluating the Kedro-Viz experience for large pipelines" to "[wip] Evaluating the Kedro-Viz experience for large pipelines" on Mar 11, 2024
@NeroOkwa changed the title from "[wip] Evaluating the Kedro-Viz experience for large pipelines" to "Evaluating the Kedro-Viz experience for large pipelines" on Mar 12, 2024
@NeroOkwa moved this from Inbox to Backlog in Kedro-Viz on Mar 12, 2024
@astrojuanlu
Member

Related: kedro-org/kedro#3790

@rashidakanchwala
Contributor

This is not a priority for now. There are existing tickets on creating complex or larger pipelines to demo Kedro and Kedro-Viz against. We have solved a major issue related to this, #1673, using stateful URLs. We can reopen if necessary.

@github-project-automation (bot) moved this from Inbox to Done in Kedro-Viz on Sep 9, 2024
@astrojuanlu closed this as not planned on Sep 9, 2024