Adding Random probability distribution function #1862
Comments
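Do you mean something like this?
import { faker } from '@faker-js/faker';

// Scales the upper bound while a weighted coin flip succeeds, then draws a
// uniform integer below that bound, so small values are far more likely.
function exponentialDistributionNumber(start = 1, stepScale = 2, stepProbability = 0.5, limit = Number.MAX_SAFE_INTEGER) {
  let max = start;
  // Each successful flip (probability `stepProbability`) multiplies the bound by `stepScale`.
  while (faker.datatype.boolean(stepProbability) && max < limit) {
    max *= stepScale;
  }
  return faker.number.int({ min: 0, max: Math.min(max, limit) });
}
Result occurrences for 1 million runs of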
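A tally of that kind can be reproduced without faker. The sketch below is only an approximation of the snippet above under stated assumptions: `steppedInt` and `tally` are hypothetical names, and an injectable `rng` (defaulting to `Math.random`) stands in for faker's seeded generator.

```typescript
// Hypothetical stand-in for the snippet above; `rng` must return floats in [0, 1).
function steppedInt(
  rng: () => number = Math.random,
  start = 1,
  stepScale = 2,
  stepProbability = 0.5,
  limit = Number.MAX_SAFE_INTEGER
): number {
  let max = start;
  // `rng() < stepProbability` plays the role of faker.datatype.boolean(stepProbability).
  while (rng() < stepProbability && max < limit) {
    max *= stepScale;
  }
  // Uniform integer in [0, min(max, limit)], both ends inclusive.
  return Math.floor(rng() * (Math.min(max, limit) + 1));
}

// Counts how often each value is produced over `runs` samples.
function tally(sample: () => number, runs: number): Map<number, number> {
  const counts = new Map<number, number>();
  for (let i = 0; i < runs; i++) {
    const value = sample();
    counts.set(value, (counts.get(value) ?? 0) + 1);
  }
  return counts;
}

const occurrences = tally(() => steppedInt(Math.random, 1, 2, 0.5, 1024), 100000);
const smallest = Array.from(occurrences.entries())
  .sort(([a], [b]) => a - b)
  .slice(0, 5);
for (const [value, count] of smallest) {
  console.log(`${value}: ${count}`);
}
```

Ignoring the limit, the bound is scaled exactly k times with probability stepProbability^k * (1 - stepProbability), which is why the counts fall off quickly for larger values.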
Interesting! Yes, something like this would suffice. It would be nice if it were implemented as an API function.
Something similar could also be achieved with a variant of faker.helpers.arrayElement in which each element of the array has a fixed, independent probability of being included in the returned values.
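As a rough illustration of that idea, here is a minimal sketch. `independentSubset` is a hypothetical helper, not part of the faker API, and `rng` (defaulting to `Math.random`) is an assumption standing in for faker's seeded generator.

```typescript
// Hypothetical helper: keeps each element with the same independent probability.
// `rng` must return floats in [0, 1).
function independentSubset<T>(
  items: ReadonlyArray<T>,
  probability: number,
  rng: () => number = Math.random
): T[] {
  if (!(probability >= 0 && probability <= 1)) {
    throw new RangeError('probability must be between 0 and 1');
  }
  // One independent draw per element, so the order of `items` is preserved.
  return items.filter(() => rng() < probability);
}

const tags = independentSubset(['red', 'green', 'blue', 'yellow'], 0.25);
console.log(tags.length, tags);
```

Note that the resulting length follows a binomial distribution (mean: item count times probability), so it clusters around that mean rather than decaying exponentially from zero.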
Like helpers.weightedArrayElement? Well, not really, but it comes close when used for the length.
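To show what "used for the length" could look like, here is a self-contained sketch. `weightedPick` is a hypothetical stand-in that accepts the same `{ weight, value }` entries as faker.helpers.weightedArrayElement; the weights and the `rng` parameter are illustrative assumptions.

```typescript
interface WeightedEntry<T> {
  weight: number;
  value: T;
}

// Hypothetical stand-in for a weighted pick; `rng` must return floats in [0, 1).
function weightedPick<T>(
  entries: ReadonlyArray<WeightedEntry<T>>,
  rng: () => number = Math.random
): T {
  let total = 0;
  for (const { weight } of entries) {
    if (!(weight >= 0) || !Number.isFinite(weight)) {
      throw new RangeError('weights must be finite and non-negative');
    }
    total += weight;
  }
  if (total <= 0) {
    throw new RangeError('at least one weight must be positive');
  }
  // Walk the entries until the random threshold is used up.
  let threshold = rng() * total;
  let fallback: WeightedEntry<T> | undefined;
  for (const entry of entries) {
    if (entry.weight > 0) {
      fallback = entry;
    }
    threshold -= entry.weight;
    if (threshold < 0) {
      return entry.value;
    }
  }
  // Only reachable through floating-point rounding; use the last positive-weight entry.
  if (fallback === undefined) {
    throw new RangeError('at least one weight must be positive');
  }
  return fallback.value;
}

// Illustrative weights: length 1 is common, length 100 is rare.
const lengthWeights: WeightedEntry<number>[] = [
  { weight: 20, value: 0 },
  { weight: 60, value: 1 },
  { weight: 15, value: 3 },
  { weight: 5, value: 100 },
];
const itemCount = weightedPick(lengthWeights);
const items = Array.from({ length: itemCount }, (_, i) => `item-${i}`);
console.log(items.length);
```

This gives full control over a handful of lengths, but every possible length has to be listed explicitly, which is where a continuous distribution function would be more convenient.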
Team decision
There is an existing workaround for this problem. If you want or need this feature, please upvote this issue.
Thank you for your feature proposal. We marked it as "waiting for user interest" for now to gather some feedback from our community:
We would also like to hear about other community members' use cases for the feature to give us a better understanding of their potential implicit or explicit requirements. We will start the implementation based on:
We do this because:
Here is an improved version of the function:
import { faker, FakerError } from '@faker-js/faker';

/**
 * Generates a random number between min and max using an exponential distribution.
 * The lower bound is inclusive, but the upper bound is exclusive.
 *
 * @param options The options for generating the number.
 * @param options.min The minimum value to generate. Defaults to `0`.
 * @param options.max The maximum value to generate. Defaults to `1`.
 * @param options.bias The bias of the distribution. Must be greater than 0. Defaults to 1.
 * The lower the bias, the more likely the number will be closer to the min ([email protected] -> avg: ~0.025).
 * A bias of 1 will generate the default exponential distribution (0-1@1 -> avg: ~0.202).
 * The higher the bias, the more likely the number will be closer to the max (0-1@10 -> avg: ~0.691).
 *
 * @throws If bias is less than or equal to 0.
 * @throws If max is less than min.
 */
function exponentialDistributionNumber(
  options:
    | number
    | {
        /**
         * The minimum value to generate.
         *
         * @default 0
         */
        min?: number;
        /**
         * The maximum value to generate.
         *
         * @default 1
         */
        max?: number;
        /**
         * The bias of the distribution. Must be greater than 0.
         *
         * The lower the bias, the more likely the number will be closer to the min ([email protected] -> avg ~0.025).
         * A bias of 1 will generate the default exponential distribution (0-1@1 -> avg ~0.202).
         * The higher the bias, the more likely the number will be closer to the max (0-1@10 -> avg ~0.691).
         *
         * @default 1
         */
        bias?: number;
      }
) {
  // A bare number is shorthand for `{ max: number }`.
  if (typeof options === 'number') {
    options = { max: options };
  }
  const { min = 0, max = 1, bias = 1 } = options;
  if (bias <= 0) {
    throw new FakerError('Bias must be greater than 0');
  }
  if (max === min) {
    return min;
  }
  if (max < min) {
    throw new FakerError(`Max ${max} should be greater than min ${min}.`);
  }
  const random = faker.number.float(); // [0,1)
  const exponent = random ** (1 / bias); // [0,1)
  const range = max - min + 1; // +1 to account for x ** 0 = 1
  return min + range ** exponent - 1; // -1 to account for x ** 0 = 1
}
Generating 100kk values between 0-100:
Here is probably the final stage of improvements to the method from my side:
import { faker, FakerError } from '@faker-js/faker';

/**
 * Generates a random number between min and max using an exponential distribution.
 * The lower bound is inclusive, but the upper bound is exclusive.
 *
 * @param options The options for generating the number.
 * @param options.min The minimum value to generate (inclusive). Defaults to `0`.
 * @param options.max The maximum value to generate (exclusive). Defaults to `1`.
 * @param options.base The base of the exponential distribution. Must be greater than 0. Defaults to `2`.
 * The higher/more above 1 the base, the more likely the number will be closer to the minimum value.
 * The lower/closer to zero the base, the more likely the number will be closer to the maximum value.
 * Values of 1 will generate a uniform distribution.
 *
 * The following table shows the rough distribution of values generated using `Math.floor(exponentialDistributionNumber({ min: 0, max: 10, base: x }))`:
 *
 * | Value | Base 0.1 | Base 0.5 | Base 1 | Base 2 | Base 10 |
 * | :---: | -------: | -------: | -----: | -----: | ------: |
 * |   0   |     4.1% |     7.4% |  10.0% |  13.8% |   27.8% |
 * |   1   |     4.5% |     7.8% |  10.0% |  12.5% |   16.9% |
 * |   2   |     5.0% |     8.2% |  10.0% |  11.5% |   12.1% |
 * |   3   |     5.7% |     8.7% |  10.0% |  10.7% |    9.4% |
 * |   4   |     6.6% |     9.3% |  10.0% |  10.0% |    7.8% |
 * |   5   |     7.8% |     9.9% |  10.0% |   9.3% |    6.6% |
 * |   6   |     9.4% |    10.7% |  10.0% |   8.8% |    5.7% |
 * |   7   |    12.1% |    11.5% |  10.0% |   8.2% |    5.0% |
 * |   8   |    16.9% |    12.6% |  10.0% |   7.8% |    4.5% |
 * |   9   |    27.9% |    13.8% |  10.0% |   7.5% |    4.1% |
 *
 * Can alternatively be configured using the `bias` option. `base` takes precedence over `bias`.
 *
 * Defaults to `2`.
 *
 * @param options.bias An alternative way to specify the `base`. Also accepts values below zero.
 * The higher/more positive the bias, the more likely the number will be closer to the maximum value.
 * The lower/more negative the bias, the more likely the number will be closer to the minimum value.
 * Values of 0 will generate a uniform distribution.
 *
 * The following table shows the rough distribution of values generated using `Math.floor(exponentialDistributionNumber({ min: 0, max: 10, bias: x }))`:
 *
 * | Value | Bias -9 | Bias -1 | Bias 0 | Bias 1 | Bias 9 |
 * | :---: | ------: | ------: | -----: | -----: | -----: |
 * |   0   |   27.9% |   13.7% |  10.0% |   7.4% |   4.1% |
 * |   1   |   16.9% |   12.5% |  10.0% |   7.8% |   4.5% |
 * |   2   |   12.1% |   11.6% |  10.0% |   8.3% |   5.1% |
 * |   3   |    9.5% |   10.7% |  10.0% |   8.8% |   5.7% |
 * |   4   |    7.8% |   10.0% |  10.0% |   9.3% |   6.6% |
 * |   5   |    6.6% |    9.3% |  10.0% |   9.9% |   7.7% |
 * |   6   |    5.7% |    8.8% |  10.0% |  10.7% |   9.5% |
 * |   7   |    5.0% |    8.2% |  10.0% |  11.5% |  12.1% |
 * |   8   |    4.5% |    7.8% |  10.0% |  12.6% |  16.8% |
 * |   9   |    4.1% |    7.4% |  10.0% |  13.7% |  27.9% |
 *
 * This option is ignored if `base` is specified.
 *
 * Defaults to `-1`.
 *
 * @throws If base is less than or equal to `0`.
 * @throws If max is less than min.
 */
function exponentialDistributionNumber(
  options:
    | number
    | {
        /**
         * The minimum value to generate (inclusive).
         *
         * @default 0
         */
        min?: number;
        /**
         * The maximum value to generate (exclusive).
         *
         * @default 1
         */
        max?: number;
        /**
         * The base of the exponential distribution. Must be greater than 0. Defaults to `2`.
         * The higher/more above 1 the base, the more likely the number will be closer to the minimum value.
         * The lower/closer to zero the base, the more likely the number will be closer to the maximum value.
         * Values of 1 will generate a uniform distribution.
         * Can alternatively be configured using the `bias` option. `base` takes precedence over `bias`.
         *
         * @default 2
         */
        base?: number;
        /**
         * An alternative way to specify the `base`. Also accepts values below zero.
         * The higher/more positive the bias, the more likely the number will be closer to the maximum value.
         * The lower/more negative the bias, the more likely the number will be closer to the minimum value.
         * Values of 0 will generate a uniform distribution.
         * This option is ignored if `base` is specified.
         *
         * @default -1
         */
        bias?: number;
      }
) {
  // A bare number is shorthand for `{ min: 0, max: number }`.
  if (typeof options === 'number') {
    options = { min: 0, max: options };
  }
  const {
    min = 0,
    max = 1,
    bias = -1,
    // Maps bias to base: bias <= 0 gives base >= 1, bias > 0 gives base in (0, 1).
    base = bias <= 0 ? -bias + 1 : 1 / (bias + 1),
  } = options;
  if (base === 1) {
    // Base 1 would divide by zero below; it corresponds to the uniform case.
    return faker.number.float({ min, max });
  } else if (base <= 0) {
    throw new FakerError('Base must be greater than 0');
  }
  if (max === min) {
    return min;
  } else if (max < min) {
    throw new FakerError(`Max ${max} should be greater than min ${min}.`);
  }
  const exponent = faker.number.float();
  // Rescales base ** exponent from [1, base) to [0, 1).
  const factor = (base ** exponent - 1) / (base - 1);
  return min + (max - min) * factor;
}
The following table shows the rough distribution of values generated when using
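The percentages in the doc-comment tables can be cross-checked analytically. Inverting the `factor` formula from the code above gives the cumulative share of results below a fraction t of the range as ln(1 + (base - 1) * t) / ln(base), or simply t when the base is 1. The sketch below evaluates that expression per bucket; `expectedBucketShare` is a hypothetical helper name introduced only for this illustration.

```typescript
// Hypothetical helper: expected share of results that land in bucket `bucket`
// when the [min, max) range is split into `bucketCount` equal buckets.
function expectedBucketShare(base: number, bucket: number, bucketCount: number): number {
  if (!(base > 0)) {
    throw new RangeError('base must be greater than 0');
  }
  if (!Number.isInteger(bucket) || bucket < 0 || bucket >= bucketCount) {
    throw new RangeError('bucket must be an integer in [0, bucketCount)');
  }
  // Cumulative distribution of `factor`, obtained by inverting
  // factor = (base ** u - 1) / (base - 1) for a uniform u in [0, 1).
  const cdf = (t: number): number =>
    base === 1 ? t : Math.log(1 + (base - 1) * t) / Math.log(base);
  return cdf((bucket + 1) / bucketCount) - cdf(bucket / bucketCount);
}

for (const base of [0.1, 0.5, 1, 2, 10]) {
  const shares = Array.from({ length: 10 }, (_, k) =>
    (100 * expectedBucketShare(base, k, 10)).toFixed(1)
  );
  console.log(`base ${base}: ${shares.join('% ')}%`);
}
```

Small deviations between these values and sampled percentages are expected, since the tables are described as rough, simulated distributions.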
I created a PR that adds a function for exponentially distributed data: #3375. Do we also need a function for normally distributed numbers (bell-shaped)?
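For reference, a bell-shaped generator can be sketched with the Box-Muller transform. This is not taken from the PR or from faker: `normalNumber` and its parameters are hypothetical, and `rng` (defaulting to `Math.random`) is an assumption standing in for faker's seeded generator.

```typescript
// Hypothetical sketch of a normally distributed number via the Box-Muller transform.
// `rng` must return floats in [0, 1).
function normalNumber(
  mean = 0,
  standardDeviation = 1,
  rng: () => number = Math.random
): number {
  if (!(standardDeviation >= 0)) {
    throw new RangeError('standardDeviation must not be negative');
  }
  // 1 - rng() maps [0, 1) to (0, 1], which avoids Math.log(0).
  const u1 = 1 - rng();
  const u2 = rng();
  const standardNormal = Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
  return mean + standardDeviation * standardNormal;
}

// Example: heights in cm clustered around 170 with a spread of 10.
const heights = Array.from({ length: 5 }, () => normalNumber(170, 10));
console.log(heights.map((height) => height.toFixed(1)));
```

Unlike the exponential variant, the result is unbounded, so a real API would probably need a rule for min and max (clamping or re-drawing), which is left open here.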
Clear and concise description of the problem
I'm seeding a database with faker. I have a field that holds an array of some type. I want to generate many such arrays with different sizes: some empty, some with one element, and some with multiple elements.
Most cases will have one element in the array, but I also want to test edge cases, so a way to generate randomly distributed data would be nice.
Say I'm faking an array of values and I want some lengths to be more common than others. An array of length 1 to 3 is common, but an array of length 100 is very rare. I would like a random probability distribution function for this.
Suggested solution
In my case, I'm looking for a random exponential distribution.
The function would accept an argument like this:
It would then generate a number using the requested distribution.
I would expect to call faker.random.exponentialDistribution({min: 0, max: 100, curveSettings: {...}}) and get numbers that are more likely to be close to 0 than close to 100. Out of 1000 generated values, only a few would be close to 100.
I wouldn't limit the feature to the exponential distribution; I would also add Gaussian, Rayleigh, and gamma distributions, among others.
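To make the array-length use case concrete, here is a minimal sketch. It condenses the base-based formula from the last version posted in this thread into a hypothetical `skewedInt` helper; the name, the base of 1000, and the injectable `rng` (standing in for faker.number.float()) are all assumptions made for illustration.

```typescript
// Hypothetical helper: integer in [min, max) skewed towards `min` when base > 1.
// `rng` must return floats in [0, 1).
function skewedInt(
  min: number,
  max: number,
  base = 2,
  rng: () => number = Math.random
): number {
  if (!(base > 0)) {
    throw new RangeError('base must be greater than 0');
  }
  if (max < min) {
    throw new RangeError(`Max ${max} should be greater than min ${min}.`);
  }
  const u = rng();
  // Base 1 is the uniform case; otherwise rescale base ** u from [1, base) to [0, 1).
  const factor = base === 1 ? u : (base ** u - 1) / (base - 1);
  return Math.floor(min + (max - min) * factor);
}

// Array lengths from 0 to 100: short arrays are common, long ones are rare.
function fakeTags(rng: () => number = Math.random): string[] {
  const tagCount = skewedInt(0, 101, 1000, rng);
  return Array.from({ length: tagCount }, (_, i) => `tag-${i}`);
}

const rows = Array.from({ length: 1000 }, () => ({ tags: fakeTags() }));
const longest = rows.reduce((best, row) => Math.max(best, row.tags.length), 0);
console.log(`longest array: ${longest}`);
```

Raising the base makes long arrays rarer; lowering it towards 1 flattens the lengths towards a uniform spread.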
Alternative
No response
Additional context
I'm not sure whether what I'm asking is out of scope for faker, but faker already generates data from random values. Why couldn't faker generate a number based on the probability of that number occurring?
By the way, I'm no mathematician, so my explanation might be incorrect, but I still think faker could add some random probability distribution functions.