Deploy superset #3703

Closed
15 of 22 tasks
Tracked by #3732
bendnorman opened this issue Jul 1, 2024 · 7 comments · Fixed by #3715

@bendnorman
Member

bendnorman commented Jul 1, 2024

Required

  1. superset
  2. 4 of 9
    duckdb superset
    zaneselvans

Nice to haves

  1. superset
@bendnorman bendnorman self-assigned this Jul 1, 2024
@bendnorman bendnorman converted this from a draft issue Jul 1, 2024
@bendnorman bendnorman moved this from Backlog to In progress in Catalyst Megaproject Jul 2, 2024
@bendnorman
Member Author

Also, in the com dev meeting we decided that doing superset demos during the required interviews doesn’t make a ton of sense, so our July 17th deadline is moot. Do we still want to aim to have a public beta by the end of the month?

@bendnorman
Member Author

Creating superset user roles

Superset provides a few predefined roles: Admin, Alpha, Gamma, Public and sql_lab. We want people who register to assume the Gamma role because it has the fewest permissions. By default, Gamma users don't have access to databases or SQL Lab, so we need to create a new role. To do this I exported the existing roles as JSON and created a new role called GammaSQLLab that combines the Gamma and sql_lab permissions. I also needed to add the all_database_access permission so the role could access the database. This role can query the duckdb database and create charts and dashboards, but can't edit or add new data sources.

You can export and import roles using these commands in the superset container:

superset fab export-roles --path {path to write role to}.json
superset fab import-roles --path {path to edited roles}.json
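
The edit between the export and the import could be scripted rather than done by hand. The following is a minimal sketch, not the script used here. It assumes the exported file is a JSON list of role objects, each with a "name" and a "permissions" list, and it treats each permission entry as opaque JSON. The function name build_combined_role and the exact shape of ALL_DATABASE_ACCESS are assumptions for illustration; check them against an actual export before relying on them.

```python
import json

# Assumed shape of one permission entry in the exported roles file.
# Verify against a real export; this is illustrative, not confirmed.
ALL_DATABASE_ACCESS = {
    "permission": {"name": "all_database_access"},
    "view_menu": {"name": "all_database_access"},
}


def build_combined_role(roles, new_name, source_names, extra_permissions=()):
    """Return a new role dict holding the union of the source roles' permissions.

    roles: list of role dicts loaded from the exported JSON (assumed format).
    source_names: names of existing roles to merge, e.g. ["Gamma", "sql_lab"].
    extra_permissions: additional permission entries to append.
    """
    by_name = {role["name"]: role for role in roles}
    seen = set()
    combined = []
    candidates = []
    for name in source_names:
        candidates.extend(by_name[name]["permissions"])
    candidates.extend(extra_permissions)
    for perm in candidates:
        # Serialize with sorted keys so identical entries compare equal.
        key = json.dumps(perm, sort_keys=True)
        if key not in seen:
            seen.add(key)
            combined.append(perm)
    return {"name": new_name, "permissions": combined}


# Example usage (paths are placeholders):
# with open("roles.json") as f:
#     roles = json.load(f)
# roles.append(build_combined_role(
#     roles, "GammaSQLLab", ["Gamma", "sql_lab"], [ALL_DATABASE_ACCESS]))
# with open("roles_edited.json", "w") as f:
#     json.dump(roles, f, indent=2)
```

Deduplicating on the serialized entry keeps the merged role from listing a permission twice when Gamma and sql_lab overlap.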

Once the GammaSQLLab role is created, we can set it as the default registration role in config_superset.py:

AUTH_USER_REGISTRATION_ROLE = "GammaSQLLab"

Mapping auth0 accounts to roles

This is our desired registration and permissions workflow:

  • A new user registers through auth0 and assumes a role with enough permissions to query the database
  • A person with a @catalyst.coop email registers and assumes admin permissions

The correct way to do this is to assign users to oauth groups and map these groups to superset roles. I can't figure out how to create and manage groups using auth0. I tried using this Authorization extension but didn't get very far. I'm sure there is a way to make this work but it's above my pay grade.
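
For reference, the mapping side of that workflow can be sketched without auth0 groups by deriving a role key from the email domain. This is an untested assumption, not something tried in this issue. Flask-AppBuilder has an AUTH_ROLES_MAPPING setting that maps keys to lists of role names, and the user info returned by a custom security manager's oauth_user_info method can carry a "role_keys" list. The key names "catalyst_staff" and "registered", and the helper functions below, are hypothetical.

```python
ADMIN_DOMAIN = "catalyst.coop"

# Same shape as Flask-AppBuilder's AUTH_ROLES_MAPPING: key -> list of role names.
# The keys are made up for this sketch.
AUTH_ROLES_MAPPING = {
    "catalyst_staff": ["Admin"],
    "registered": ["GammaSQLLab"],
}


def role_keys_for_email(email, email_verified=False):
    """Pick role keys from an email address.

    Admin is only granted when the identity provider reports the email as
    verified; otherwise anyone could register with a @catalyst.coop address.
    """
    local, sep, domain = email.strip().lower().rpartition("@")
    if sep and local and domain == ADMIN_DOMAIN and email_verified:
        return ["catalyst_staff"]
    return ["registered"]


def roles_for_email(email, email_verified=False):
    """Resolve role keys to superset role names using the mapping above."""
    names = []
    for key in role_keys_for_email(email, email_verified):
        names.extend(AUTH_ROLES_MAPPING[key])
    return names
```

In a real deployment the list returned by role_keys_for_email would presumably be placed under "role_keys" in the dict returned by oauth_user_info, with AUTH_ROLES_SYNC_AT_LOGIN controlling whether roles are recomputed at each login. That wiring is not shown or verified here.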

I figured out a workaround for now. We can create a [email protected] user in the auth0 User Management tab. Then, we can create a new superset admin user inside the container with this command:

superset fab create-admin \
              --username 'auth0_auth0|{user_id generated by auth0}' \
              --firstname {Superset} \
              --lastname {Admin} \
              --email {email} \
              --password {password created in auth0}

Now we can log into superset with this admin account and give catalyst accounts admin or alpha permissions.

@bendnorman
Member Author

bendnorman commented Jul 12, 2024

Now that the auth0 and permissions stuff is mostly working, I'm going to move on to the hosting infrastructure.

  1. Push the Docker image to Artifact Registry and update the Cloud Run instance. Can use the default sqlite superset database for now.
  2. Create a Cloud SQL instance and hook it up to superset running on cloud run
  3. Copy the duckdb and superset config file to GCS bucket
  4. Run superset setup commands

@zschira
Member

zschira commented Jul 15, 2024

@bendnorman it sounds reasonable to me to extend the timeline. Seems like a good thing to discuss during inframundo sprint planning today.

@bendnorman bendnorman changed the title from "Deploy superset for user interviews" to "Deploy superset" Jul 25, 2024
@bendnorman bendnorman linked a pull request Jul 25, 2024 that will close this issue
@zaneselvans
Member

  • I feel like we might want a more specific domain for Superset. I always regretted picking http://data.catalyst.coop for Datasette because it was too generic -- like anything could be at that destination. PUDL is all data. https://superset.catalyst.coop would be more obviously specific to this project.
  • I know I complained before about the 100,000 row limit, but it seems like it's probably plenty. I think the main purpose of the CSV export is getting the data into a form an Excel user can work with, and more than 100,000 rows is going to be extremely challenging in Excel. If someone actually wants to download bulk data for programmatic use... we've got them covered with the Parquet and DuckDB options.

@bendnorman
Member Author

Some notes on usage data we have access to:

The most valuable information tracked in the superset database is the queries and registered users. The query table in the superset database contains all user queries and the user id. We can use this to see what types of queries people are doing and which tables people are accessing. Here is a query to access this information:

SELECT 
    users.first_name,
    users.last_name,
    queries.* 
FROM 
    "public"."query" AS queries
LEFT JOIN 
    "public"."ab_user" AS users 
ON 
    users.id = queries.user_id 
ORDER BY start_time DESC
LIMIT 1000;
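
To get from those rows to "which tables people are accessing", the SQL text of each query could be scanned for table names. The sketch below is a rough heuristic under stated assumptions: it assumes the SQL text of each query is available as a string (for example from a sql column in the results), it only looks at unquoted identifiers that follow FROM or JOIN, and it will miss quoted identifiers and miscount CTE names. The function name and the table names in the usage example are made up.

```python
import re
from collections import Counter

# Unquoted identifier (optionally schema-qualified) directly after FROM or JOIN.
TABLE_REF = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def count_table_references(sql_texts):
    """Count how many queries reference each table name.

    Each table is counted at most once per query, so the result is
    "number of queries touching this table", not total mentions.
    """
    counts = Counter()
    for sql in sql_texts:
        tables = {name.lower() for name in TABLE_REF.findall(sql)}
        counts.update(tables)
    return counts


# Example usage with hypothetical table names:
# count_table_references([
#     "SELECT * FROM generators LIMIT 10",
#     "select p.id from plants p join generators g on p.id = g.plant_id",
# ]).most_common()
```

A proper SQL parser would be more reliable than a regular expression; this is only meant as a quick first look at usage.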

I'm not sure if it's possible to track how long people spend on the site. Any thoughts @jdangerx?

We should also probably save all the Cloud Run logs to BigQuery or Cloud Storage using a log sink. By default, logs are only saved for 30 days. I think these logs will be helpful for debugging Cloud Run failures and for tracking Cloud Run resource use and total downloads.

@bendnorman
Member Author

Also @jdangerx, the auth0 screen has a warning about using development OAuth keys in production. I wonder if this is related to the HTTP redirect issue we're having.
