## Sunday, 27 April 2014

One of the most common misconception about randomness is that it is often confused with uniformity.
Could you tell which is the most random distribution of dots between the two below?

 Which is random? (click to enlarge)

The most common answer would be the one on the right, which is wrong. In fact, the pattern on the right is generated by applying a small wobble (or uncertainty) on an uniform distribution. The pattern on the left, instead, has been generated using a pseudorandom [1] number generator in BASH (details in this blog post). Randomness allows for "clumps" to form and it's because of those clumps that the unpredictable behaviour of randomness comes from (unless, of course, you know the "shape" of the random distribution a priori). This appears more clearly whenever we look at the distributions shown above in 1D rather than 2D:

 A random and an uniform distribution (click to enlarge)

It is clear that in the case of a uniform distribution, we have a greater level of predictability. Imagine you have a series of events which follows the uniform distribution. You are filling the histograms with events one by one by reaching 5000 events which recreate the distribution above. Whenever you approach higher numbers of events filled if there is a lack of events in one region of the histogram, there will be a higher chance for the next even to fall in that region.

You can picture in practice a 2D uniform distribution by imagining pouring a layer of marbles in a box, one by one. At the beginning their position will look random, because they can move around and take any possible position, but the fuller it gets the less available spaces there are and soon it will be easy to predict which are the only available spots for the next marble to go in.

The key thing is that whenever a position is occupied, it cannot be occupied again. This allows for a level of predictability, and this is not true of random distributions. Randomness is not predictable, its profile is unknown by definition, and it allows for the same position to be occupied again. This caused by the assumption that each random event is independent from the previous (Poisson process). Maybe this is also the reason why we believe uniform distributions to be "random". We are more familiar with those in practice and random events are either mostly abstract or harder to visualize.

This explains tendency of people to estimate randomness wrongly. This happens every time during lotteries. If a number has not come up for a long time, we feel that it is due soon. This is because we imagine random events as uniform. In fact, the probability of picking any number in a lottery is the same for every extraction, making all number equally probable to be picked, and not anyone more probable, because every extraction is independent from the previous.

An equivalent example is found in coin flipping. We feel that repeated head or tail events in a sequence of coin flips is a rare event. In fact, the chance of obtaining heads or tails is still 50% even after any number of heads have come up already. For example, events of the kind: THHTHTTHTTH and TTTTTHHHHH are equally as likely.

I created a computer-generated set of tosses to work out the frequencies of occurrences of repeated heads or tails (details on how this is done are in here). These are the results for 100000 occurrences of 15 tosses each:

3 repetitions:  93894/100000
4 repetitions:  64634/100000
5 repetitions:  34667/100000
6 repetitions:  16723/100000
7 repetitions:    7789/100000
8 repetitions:    3487/100000
9 repetitions:    1584/100000
10 repetitions:     719/100000
11 repetitions:     332/100000
12 repetitions:     125/100000
13 repetitions:       53/100000
14 repetitions:       18/100000
15 repetitions:         7/100000

15 repetitions would mean having a full set of heads or tails, which seems impossible, but it happens 0.006% of the times. In fact, over 100000 occurrences, it happened 7 times. Close enough.

The data shows that three repetitions is an event which happens almost all the times, with four repetitions more than half of the times. Up to 6 repetitions, in fact, is not that much of an uncommon event. Almost half of the sequence! I am sure it would be hard to define "random" - in the common sense - anything which has more than 4 repetitions, although this simple test shows that it's almost the norm.

Another great example of how bad we are at understanding randomness is given by this small application at Nick Berry's DataGenetics. It will distinguish a randomly generated sequences of tosses (from a real coin) from the ones you made up yourself. It is not infallible, as it is based on the Pearson Chi-squared test (so it cannot always predict which is which) and after some trials it is easy to trick it to make it believe your sequence is random. Yet, I am sure it will give you a better understanding of randomness!

[1] I have used the word pseudorandom because every computer-generated programs simulates randomness. True randomness can only be found in some not completely understood natural phenomena. Being a simulation, there are different "qualities" of generated random numbers. I am not sure about the one used in BASH, but it is usually good to stick with a higher quality random number generator for more serious business (e.g. GSL)

## Sunday, 24 November 2013

I went to a restaurant with friends yesterday night and since service was not included, so we decided to tip 10% (I live in the UK!).

Restaurants account for tips separately when you pay by card (some card reader have a tip option). We all paid by card separately and shared the tip equally between us. Since it would be slow to tip every one of us, the tip was added at the end and I happened to pay last, so the tip for the whole dinner was paid by me.

This makes sense as we all paid the same amount, even if it looks like I paid a smaller amount with a huge tip, leading to funny receipts as this one:

 "If you really liked the service, you don't just tip 10%, but 110%"

What is even funnier, though, is the peculiarity that the gratuity was almost exactly 10% of the sale plus the original price. 10% was also our original tip on the whole dinner, so I was wondering if this happened by pure chance or if there was a connection between these two numbers.

This can be easily solved by some math:

By setting these variables:

x - total price of the dinner
n - number of people with which the dinner is shared
t - tip (in percentage)

You can describe mathematically the different way in which tip can be payed:

$\sum_{j=1}^{n} {\frac{x(1+t)}{n}} = x (1+t)$

$(\sum_{j=1}^{n-1} {\frac{x(1+t)}{n}}) + (\frac{x(1+t)}{n}-xt)+xt = x (1+t)$

which is at each card payment, or at the end, respectively. This holds making the assumption that the tip $xt$ is smaller than the price paid for each person $x/n$:

$t < \frac{1}{n}$

This is a simplification to make sure only the last person pays the tip. It can get more complicated, in cases where the last person pays part of the tip and second last pays a fraction of the remaining tip. But let's stick to the easy case!

For the last person, the total to pay each (Total in the receipt), minus the tip (Gratuity in the receipt) gives the Sale, as in the receipt. Expressing this mathematically:

$\frac{x(1+t)}{n}-xt=\frac{xt}{1+t_{1}}$

Here, I have expressed the Sale, which is on the right hand side, as the untipped amount before applying a tip $t_{1}$, which gets us the Gratuity $xt$.

So, that's where we get our answer. We have observed the peculiar fact that $t=t_{1}$. Does this happen for any amount and number of people? Which is: does this happen for any $x$ and $n$?

You can immediately notice that x (being non-zero and positive) can be removed from any part of the equation. So this result does not depend on the amount spent.

Rearranging the equation above in $t$, we get a simple quadratic equation:

$(2-n)t^2+(1-2n)t+1=0$

This DOES depend on $n$, which means that the result depends on the number of people. As we require $t>0$ (negative tips?!) we can also check our solution:

 Close enough!

You can see that the solution is for $t$ very close to 10%, the original amount of tip. You can notice that the result depends on $n$, which is the number of people.

In fact, if you try to change the number of people in the equation, you get different results for $t$ (smaller for bigger n, smaller for bigger n).

In conclusion, this shows that it was indeed a peculiar result of chance, and it depended on how many people shared their bill.

BONUS: from the numbers in the receipt you can find out how many people I was with yesterday and how much did we spend.

## Friday, 5 April 2013

### Connection between derivative and integral

I remember that most of my struggle in understanding geometrical properties of calculus in first year undergrad classes was spent in working out why the integral is the inverse operation of derivative.

There are many mathematical proofs of this, which make perfect sense, but their connection is counter-intuitive as they perform seemingly different operations on curves. The derivative finds the slope at any point and the integral sums infinitesimal slabs of area under the curve. All derivatives can be solved analytically, while that is not true for integrals. Integration depends to an additive constant, while derivatives do not. Dissimilarities surely are numerous.

I have recently found this paper I wrote during my undergraduates in the quest to understand the geometrical origin of the connection between. Hope it is of any help to you, if you are having trouble in this like I had. I remember it made much more sense when I - very unrigorously - put it in this simple way:

P.S. for "Derivate" I mean "Derivative"