Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature] Label History Model #46

Open
2 of 4 tasks
jon-openspace opened this issue Mar 30, 2023 · 6 comments
Open
2 of 4 tasks

[Feature] Label History Model #46

jon-openspace opened this issue Mar 30, 2023 · 6 comments

Comments

@jon-openspace
Copy link

Is there an existing feature request for this?

  • I have searched the existing issues

Describe the Feature

We use GitHub Labels to mark important milestones, and folks want to be able to quickly look up:

  1. When labels were applied to PR's
  2. How long PR's spent with a Label before the next was applied

Describe alternatives you've considered

I've built out this type of model for Jira Issue Time in Status before, so my current plan is to just re-do that with PR/Issue Labels.

Are you interested in contributing this feature?

  • Yes.
  • Yes, but I will need assistance and will schedule time during your office hours for guidance.
  • No.

Anything else?

No response

@elanfivetran
Copy link

elanfivetran commented Mar 31, 2023

Hey @jon-openspace thanks for reaching out about this feature request!

I definitely see the value in having historical records of when labels were applied. The Github connector does not support history mode, however I'm assuming you plan to ISSUE_LABEL table to understand the times in which these labels were added.

It seems that you're open to contributing to this feature which is great news! Would you be able to schedule some time in our office hours so we can discuss specifics of how that will be represented in our package?

@jon-openspace
Copy link
Author

Morning! No need for history mode - the Github Connector provides an ISSUE_LABEL_HISTORY table.
image
I built a dbt model that gives me what I need yesterday, and I'll schedule some office hours time so you can decide if you want it in this package.

@jon-openspace
Copy link
Author

jon-openspace commented Apr 4, 2023

Here's how I do the staging model (I believe this would go into the dbt_github_source package)

with source as (

    select * from {{ source('github', 'issue_label_history') }}

),

renamed as (

    select
        issue_id,
        label_id,
        updated_at,
        labeled,
        actor_id,
        _fivetran_synced

    from source

)

select * from renamed


@jon-openspace
Copy link
Author

Here's how I do the "Time in Label" Model. This gets joined to your existing github__pull_requests model. I believe this would only work with Snowflake, so you would need to make it Warehouse Agnostic.


with

    label_history as (
        select * from {{ ref("stg_github_internal__issue_label_history") }}
    ),

    labels as (select * from {{ ref("stg_github__label") }}),

    named_label_history as (
        select
            label_history.issue_id,
            label_history.updated_at                                                        as label_changed_ts,
            case when labeled = 'True' then 'Added ' else 'Removed ' end
            || labels.label                                                                 as this_label_change,
            dense_rank() over (
                partition by label_history.issue_id
                order by label_history.updated_at asc
            )                                                                               as label_change_number
        from label_history
        join labels on label_history.label_id = labels.label_id
    ),

    transitions as (
        select
            base.*,
            next.this_label_change                                                          as next_label_change,
            datediff(second, base.label_changed_ts, next.label_changed_ts)
            / 60.0                                                                          as time_to_next_label_minutes
        from named_label_history base
        left join
            named_label_history next
            on (
                base.issue_id = next.issue_id
                and base.label_change_number = next.label_change_number - 1
            )
    )

select *
from transitions

@jon-openspace
Copy link
Author

Finally, I have an Intermediate Model to Pivot the above model.


with
    github_label_history as (select * from {{ ref("github_label_history") }}),

    pivoted_values as (
        select
            issue_id,
            {{
                dbt_utils.pivot(
                    "this_label_change",
                    dbt_utils.get_column_values(
                        ref("github_label_history"), "this_label_change"
                    ),
                    agg="MIN",
                    then_value="label_changed_ts",
                    prefix="First ",
                    suffix=" Label",
                    else_value="NULL",
                )
            }}
        from {{ ref("github_label_history") }}
        group by 1
    )

select *
from pivoted_values

@fivetran-reneeli
Copy link
Contributor

Thanks @jon-openspace for following up with this after our call!

We'll be taking a look and hopefully bring it in in an upcoming sprint!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants