Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use random number generators, not global seeds #94

Merged
merged 8 commits into from
Oct 29, 2024
Merged

Use random number generators, not global seeds #94

merged 8 commits into from
Oct 29, 2024

Conversation

caleb-johnson
Copy link
Collaborator

Fixes #52

This PR removes the use of global seeds to control numpy randomness throughout the source

@caleb-johnson
Copy link
Collaborator Author

Actually the tutorial experiment no longer converges after these changes. Need to look further into it

@caleb-johnson
Copy link
Collaborator Author

Actually the tutorial experiment no longer converges after these changes. Need to look further into it

False alarm, the new generator was just giving a particularly bad sampling with that seed so config recovery couldn't bootstrap. Changing the seed brought things back to normal

Copy link
Collaborator

@kevinsung kevinsung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The notebooks need to be updated too, to use a single random generator rather than repeatedly passing the same integer seed.

qiskit_addon_sqd/subsampling.py Outdated Show resolved Hide resolved
@caleb-johnson
Copy link
Collaborator Author

caleb-johnson commented Oct 29, 2024

The notebooks need to be updated too, to use a single random generator rather than repeatedly passing the same integer seed.

So this means all of the functions need to be updated to accept a Generator? We can deprecate the usage of int for rand_seed and move to this. I just want to make sure I understand best practices

@kevinsung
Copy link
Collaborator

kevinsung commented Oct 29, 2024

The notebooks need to be updated too, to use a single random generator rather than repeatedly passing the same integer seed.

So this means all of the functions need to be updated to accept a Generator? We can deprecate the usage of int for rand_seed and move to this. I just want to make sure I understand best practices

Any function that accepts a seed should immediately pass the seed to np.random.default_rng. So the type of seed is the type of a valid input of np.random.default_rng. I never quite figured out the correct type annotation for that, but I think int | np.random.Generator | None will do.

int is a valid type for the seed so there's no need to deprecate its usage.

@caleb-johnson
Copy link
Collaborator Author

int is a valid type for the seed so there's no need to deprecate its usage.

@kevinsung , thanks, I think I have it right now

"from qiskit_addon_sqd.counts import generate_counts_uniform\n",
"\n",
"# Create a seed to control randomness throughout this workflow\n",
"rand_seed = 42\n",
"rand_seed = np.random.default_rng(2**24)\n",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an oddly specific seed, might as well keep using 42 or any other number. Also, the variable should be called rng for consistency.

Copy link
Collaborator Author

@caleb-johnson caleb-johnson Oct 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docs say to use a very large number. I figure this is better than 42 or writing out a bunch of digits

Copy link
Collaborator Author

@caleb-johnson caleb-johnson Oct 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"
Seeds should be large positive integers. default_rng can take positive integers of any size. We recommend using very large, unique numbers to ensure that your seed is different from anyone else’s.
"

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the varnames in notebooks

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The motivation for using a large number is to ensure that you don't accidentally choose the same seed as somebody else in the world. But this is not a concern here, where we only need our results to be statistically valid by themselves. Usually for these situations (testing, or just to make a notebook deterministic), you can use any integer, even 0.

But if you insist on using a very large number, then certainly 2**24 is not big enough. The doc you link suggests this:

secrets.randbits(128)  

But again, this is really unnecessary here.

Copy link
Collaborator Author

@caleb-johnson caleb-johnson Oct 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It truly doesn't matter at all to me. I would think 42 (a number used by everyone in every context for everything) would be worse. I'll change it

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. In this situation, 42 is fine. If we were going to publish scientific results, then we should indeed use a large number, such as 266529787016279213832423171828752297713.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK I updated them all to 24 🤝

qiskit_addon_sqd/configuration_recovery.py Outdated Show resolved Hide resolved
qiskit_addon_sqd/counts.py Outdated Show resolved Hide resolved
@caleb-johnson caleb-johnson merged commit 6f5660c into main Oct 29, 2024
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use numpy random Generator for random numbers
2 participants