Clear and concise description of the problem
I want faker.helpers.arrayElements to always return each element with the same probability.
However, in certain situations, some indexes are picked far more often than others.
For example, when sampling 10 elements from an array with 1000 elements, indexes ending in 9 are picked ~20 times more often than indexes ending in 1. And when sampling 9 or fewer elements, indexes ending in 1 seem never to be picked at all.
Suggested solution
The root cause of the problem is that arrayElements picks array indices using faker.number.float() with the default precision of 0.01.
This works fine with an array of 100 elements, but with that precision only 100 distinct random values can be drawn, so anomalies begin to appear as the length grows.
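To make the effect concrete, here is a minimal sketch of the index arithmetic. It assumes the index is computed as Math.floor(length * r), where r is a multiple of 0.01 between 0 and 0.99; that formula is my reading of the behaviour, not a quote from the library, and reachableIndexes is a hypothetical helper name.

```javascript
// Hypothetical model: r = k / 100 for k in 0..99 (precision 0.01, max 0.99),
// and the index is Math.floor(length * r).
// Integer arithmetic (length * k) / 100 is used to avoid floating-point noise.
function reachableIndexes(length) {
  const reachable = new Set()
  for (let k = 0; k < 100; k++) {
    reachable.add(Math.floor((length * k) / 100))
  }
  return reachable
}

// Report how many distinct indexes can ever be produced for each length.
for (const length of [100, 999, 1000]) {
  const reachable = reachableIndexes(length)
  console.log(length, 'elements:', reachable.size, 'reachable indexes')
}
```

Under this model, a length of 1000 can only produce multiples of 10, and a length of 999 can only produce 0 and indexes ending in 9, which would be consistent with the skew described above.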
The simplest solution would be to replace this.faker.number.float({ max: 0.99 }) with Math.random(). This would, however, break some deterministic test cases.
Alternative
We could also use something like this.faker.number.float({ max: 0.999999999, precision: 0.000000001 }) but I'm not sure what the best number of digits is.
The precision could conceivably also be derived from the length of the given array.
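As a rough sketch of what deriving the precision from the length might look like: the helper name precisionForLength and the margin of two extra decimal digits are my own assumptions, not anything the library provides, and the right margin is exactly the open question above.

```javascript
// Hypothetical helper: choose a precision with `margin` more decimal digits
// than the array length has digits. The default margin of 2 is an arbitrary
// assumption; a larger margin reduces the remaining bias further.
function precisionForLength(length, margin = 2) {
  const digits = String(length).length + margin
  // Parse the exponent form directly instead of computing a power of ten.
  return Number(`1e-${digits}`)
}

console.log(precisionForLength(100))
console.log(precisionForLength(1000))
```

The result could then presumably be passed along the lines of this.faker.number.float({ max: 1 - precision, precision }), following the option shape quoted above; I have not verified that this is the right upper bound.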
If I get some suggestions from maintainers, I may be able to submit a PR.
Additional context
Code to reproduce issue:
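const { faker } = require('@faker-js/faker/locale/en')

// Build an array whose elements equal their own indexes: [0, 1, ..., 999]
const arrayLength = 1000
const ids = Array(arrayLength)
for (let i = 0; i < arrayLength; i++) {
  ids[i] = i
}

// Sample 10 elements 1000 times and count picks by last digit of the index
const countByMod = new Map()
for (let i = 0; i < 1000; i++) {
  for (const id of faker.helpers.arrayElements(ids, 10)) {
    const mod = id % 10
    const count = countByMod.get(mod) ?? 0
    countByMod.set(mod, count + 1)
  }
}

// With a uniform sampler, each last digit should be picked about 1000 times
console.log('nines:', countByMod.get(9), 'ones:', countByMod.get(1))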
Some outputs:
nines: 2793 ones: 116
nines: 2760 ones: 126
nines: 2755 ones: 107
Both should be much closer to 1000, which is the case when the same code is run on a fixed version of arrayElements:
nines: 990 ones: 1098
nines: 1008 ones: 1013
nines: 1023 ones: 943