Reservoir sampling is a family of randomized algorithms for randomly choosing k samples from a list of n items, where n is either a very large number. Typically n is large enough that the list doesn’t fit into main memory. For example, a list of search queries in Google and Facebook.
given a big array (or stream) of numbers (to simplify), and we need to write an efficient function to randomly select k numbers where 1 <= k <= n. Let the input array be stream.
Create an array sample[0..k-1] and copy first k items of stream to it.
Now one by one consider all items from (k+1)th item to nth item.
Generate a random number ‘j’ from 0 to i where i is index of current item in stream.
If j is in range 0 to k-1, replace sample[j] with stream[i]