In other words, consider n being 8-bits. I want a function that for every input number from [0..255] produces a random output number between [0..255], with each output number produced only once. In other words, it's a one-to-one mapping of 8 input bits to 8 output bits.
The traditional linear congruent generator PRNG used for rand() seems like an excellent choice. For n=32, I could use a function like:
unsigned translate32(uint32_t x)
return (x * 214013 + 2531011) & 0xFFFFFFFF;
These constants are chosen to produce all 4 billion possible outputs. meaning there is a one-to-one mapping between inputs and outputs.
However, I can't create an 8 bit version by doing:
#define translate8(x) translate32(x & 0xFF)& 0xFF
That's because while the constants work for 32 bits, they don't work for 8. Some outputs will be produced twice, and some outputs won't be produced at all. To make a translate8() function, I'll have to choose different constants.
How do I go about choosing the right constants? Do I simply write a program that tests a bunch of combinations until I get some that work the way I want? The thing is, I want this to work all the way out to 48-bit inputs. I doubt I could discover the right constants for 48 bits, it's not computationally feasible, unless there's some trick to it. (I really should study my Knuth better).
Or, maybe I can do this for small n and simply combine them. For a 48-bit number, simply combine a 32-bit version with a 16-bit version.
Why do I want to do this? The answer is port scanning. I want to scan a subnet, but instead of linearly scanning the addresses [192.168.1.1, 192.168.1.2, 192.168.1.3, ...] I want to scan in a random fashion of [192.168.1.148, 192.168.1.15, 192.168.1.231, ...]. I want to do the same for port numbers. Let's say that I'm scanning all ports on that subnet. This means the output is a 24-bit number: 8-bits for the subnet and 16-bits for the port number.
By the fact that I said "48-bits" should hint that what I really want to do is use this for scanning the entire Internet. I want anybody seeing incoming IP/port combinations to see them in a random order and random timing between them.
A pseudo-randomly congruent function means I can take linear scanning code that thinks it's trying addresses/ports in a sequential fashion and produce a random sequence out of it.
BAH! As many people point out "lrn2readWikipedia". The solution is at https://en.wikipedia.org/wiki/Linear_congruential_generator#Period_length. Somehow I'd read that, but totally missed it. My only excuse is that I'm a programming nerd and not a theory/math nerd, and thus miss obvious things like that.
The equation is:
unsigned translate(unsigned x, unsigned a, unsigned c, unsigned n)
unsigned m = (1 << (n-1));
return (x * a + c) % m;
Where 'x' is the original value, and the return result is my new pseudo-random translation, a and c and m are the constants in the linear-congruent-generator formula, but m is just derived from my n, the number of bits in the IPv4 CIDR suffix.
Translating this into English, these three rules for choosing the constant:
This means that my original statement is false about the translate8() function above: it actually does work, at least in terms of producing a 1-to-1 correspondence. It'll do that for all values of n between 1 and 32 bits. I find that counterintuitive, so but it's true. I've written code to test this assertion, and it works.
But, the numbers aren't very random. Testing it out, I find that the first few numbers going from 0 to 10 translate to the sequence:
0 -> 195
1 -> 192
2 -> 189
3 -> 186
4 -> 183
5 -> 180
6 -> 177
7 -> 174
8 -> 171
9 -> 168
Therefore, for good randomness, I need to choose better constants. Choosing a=993319 and c=193939 results in:
0 -> 113
1 -> 152
2 -> 191
3 -> 230
4 -> 13
5 -> 52
6 -> 91
7 -> 130
8 -> 169
9 -> 208
Looking on the longer list, it oscillates from 3 digit numbers for a bit, then 2 digit numbers for a bit, and back to 3 digit numbers, so it's not completely random as I can see a pattern there. This is random enough for my purposes, but ideally, I'd like to choose numbers with a bit better randomness.
So, what's are some good tests of randomness I can apply? Ideally, I'd like to choose a single set of 64-bit sized constants that are "sufficiently random" for any sized n from 1 to 48 bits. In particular, when scanning the entire Internet (all 4-billion targets), I'd like every Class C subnet to get its share of the traffic (1/16-millionth), rather than having some Class C subnets hit very hard with traffic for a while while others are inactive for a bit.