Need to allow permission to pandas-gbq multiple times #33

Open
HelenCEBM opened this issue Feb 24, 2020 · 12 comments
@HelenCEBM
Contributor

It seems that every time you make changes to an environment, the first cell pulling data from BigQuery fails to run, and the permission process (pasting in a link and allowing pandas-gbq access via your Google account) has to be repeated.

@sebbacon
Contributor

Could you give an example of the sort of environment change you mean? Thanks

@HelenCEBM
Contributor Author

Like installing a new package

@sebbacon
Contributor

Could you link to a repo that has this problem? So I can reproduce exactly.

@CarolineMorton
Contributor

I am not sure this is easily resolvable. I think it is because every time you are installing a package / adding to the requirements, you are technically having to do a new build and therefore the security permissions need to be done again. There may be a way around this but it might make BQ less secure, which I suspect we will want to avoid.

We could think about passing the security details as an argument to Docker when it builds. Is that possible? Could we pass in the details as part of the Config file?

@HelenCEBM
Contributor Author

The example I have to hand is the LTCs one but it's a large and very slow query on the more detailed prescribing data, so I'd suggest writing a simple query on a smaller table for testing!

@sebbacon
Contributor

With this code:

from ebmdatalab import bq

df = bq.cached_read(
    "SELECT * FROM ebmdatalab.hscic.normalised_prescribing_standard LIMIT 5",
    use_cache=False,
    csv_path="../data/foo.csv",
)

I only have to authenticate with Google once even if I:

  • pip install bs4 (or whatever); or
  • restart the kernel

So I'm unable to reproduce per the report.

I can reproduce if I shut down the notebook (and docker container) completely, which perhaps is what you're reporting?

@CarolineMorton
Contributor

Is it possible that you are not updating packages from within Docker, i.e. that you are using the host command line rather than the bash console within Docker?

See https://github.com/ebmdatalab/custom-docker/blob/clearer-doc/DEVELOPERS.md#installing-new-packages

@alexwalkerepi

It happens with every docker repo I've used too. When I make any change to requirements.txt or change the docker base image, then restart the docker container, I have to re-authenticate.

@sebbacon
Contributor

OK. So if we're talking about restarting the docker container, then yes, this is by design in the underlying libraries, as each container is like a separate computer, and you wouldn't want your credentials being stored on other people's computers.

I could work around this by writing some custom credentials code, which wraps this method with a custom credentials_cache which is located in the host computer. It's a bit fiddly but happy to do so: however, I just want to check we think it's worth the extra complexity / work? I was thinking that once you've got a query right (which you can do in-browser or in the notebook but often within one session) you're normally going to be using the cached version, so perhaps it's not much of an issue in practice?
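As a rough illustration of the workaround described above, here is a minimal sketch, not the code that was eventually written for this project. It assumes a host directory is bind-mounted into the container and that its in-container path is given by a hypothetical environment variable, `BQ_CREDENTIALS_DIR`; the function names and the cache filename are also made up for the example. It relies on `pydata_google_auth`, the library pandas-gbq uses for user authentication, and its `ReadWriteCredentialsCache` class.

```python
import os

# Hypothetical environment variable: the in-container path of a directory
# that is bind-mounted from the host (for example via `docker run -v`).
CACHE_DIR_ENV = "BQ_CREDENTIALS_DIR"
SCOPES = ["https://www.googleapis.com/auth/bigquery"]


def credentials_cache_location(environ=None):
    """Return (dirname, filename) for the on-disk credentials cache."""
    environ = os.environ if environ is None else environ
    # Fallback is inside the container's home directory, so it would NOT
    # survive a container rebuild; only the mounted directory does.
    dirname = environ.get(CACHE_DIR_ENV) or os.path.join(
        os.path.expanduser("~"), ".config", "bq_credentials"
    )
    return dirname, "bigquery_credentials.json"


def get_cached_credentials(environ=None):
    """Get user credentials, reusing a token stored in the cache directory."""
    # Imported lazily so the path helper above works without the library.
    import pydata_google_auth
    from pydata_google_auth import cache

    dirname, filename = credentials_cache_location(environ)
    credentials_cache = cache.ReadWriteCredentialsCache(
        dirname=dirname, filename=filename
    )
    # Prompts for authentication only if no usable cached token is found.
    return pydata_google_auth.get_user_credentials(
        SCOPES, credentials_cache=credentials_cache
    )
```

The returned object could then be passed on, e.g. `pandas_gbq.read_gbq(sql, project_id=..., credentials=get_cached_credentials())`. Note the trade-off raised earlier in this thread: the token then lives on the host machine, so this only makes sense where the host is the user's own computer.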

@alexwalkerepi

For example, in writing the code for this PR:
ebmdatalab/factors-associated-with-changing-notebook#1
I'd guess I ended up re-authenticating about 5-10 times over the course of a week or two. It's relatively complex, with multiple queries in different parts of the code, and I changed the folder structure a bit over time. Simpler notebooks would likely need fewer than that.

Not the end of the world, but I guess depends how much work it would be to work around.

@HelenCEBM
Contributor Author

HelenCEBM commented Feb 24, 2020

Probably not worth the effort of fixing if fiddly! However, I'm sure I restarted containers a couple of times and didn't have to re-authenticate... Ah no, the data would have been cached, so it wasn't being re-extracted from BQ.
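To show why a restart with cached data never reaches the authentication step, here is a minimal sketch of a cache-or-query helper. It is not the actual implementation of `bq.cached_read`; the name `cached_rows` and the `fetch_rows` callable (standing in for the BigQuery call) are assumptions made for illustration.

```python
import csv
import os


def cached_rows(csv_path, fetch_rows, use_cache=True):
    """Return rows from csv_path if present, else fetch them and cache them."""
    if use_cache and os.path.exists(csv_path):
        # Cache hit: no query is sent, so no authentication is needed.
        with open(csv_path, newline="") as f:
            return list(csv.DictReader(f))

    # Cache miss (or cache disabled): this is the only place where a real
    # query, and therefore an authentication prompt, would occur.
    rows = fetch_rows()
    if rows:
        os.makedirs(os.path.dirname(csv_path) or ".", exist_ok=True)
        with open(csv_path, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
            writer.writeheader()
            writer.writerows(rows)
    return rows
```

With `use_cache=False`, as in the reproduction code earlier in this thread, the query runs on every call, which is why that snippet is a reliable way to trigger the prompt.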

@brianmackenna

brianmackenna commented Mar 25, 2020

I've been doing multiple workbooks in a short space of time this morning and I've had to reauthenticate about 5 times in an hour. Even with this I don't think it is worth @sebbacon's time to fix this particular issue. This is not true - I must've been closing the notebooks down before starting the next one. Regardless, I still don't think it is worth the time!
