We now study the distribution of measurement probabilities (not to be confused with the distribution of measurement outcomes) by examining the matrix elements of the random unitary $U$.
For $U$ to be unitary, we need $U^\dagger U = U U^\dagger = I$. The second condition normalizes each row:
$$U U^\dagger = I \implies \sum_{j=1}^N U_{kj} U^\dagger_{jk} = 1 \implies \sum_{j=1}^N |U_{kj}|^2 = 1 \quad \forall k.$$
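As a quick numerical check of this normalization, we can sample a Haar-random unitary and verify that every row has unit norm. The sketch below assumes NumPy; the `haar_unitary` helper (QR decomposition of a complex Gaussian matrix, with a phase fix from the diagonal of $R$) is one standard way to sample from the Haar measure.

```python
import numpy as np

def haar_unitary(n, rng):
    """Sample an n x n Haar-random unitary via QR of a complex Gaussian matrix."""
    z = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    # Multiply each column by the phase of R's diagonal so the result is exactly Haar.
    phases = np.diag(r) / np.abs(np.diag(r))
    return q * phases

rng = np.random.default_rng(0)
U = haar_unitary(8, rng)
row_norms = np.sum(np.abs(U) ** 2, axis=1)
print(np.allclose(row_norms, 1.0))  # every row is normalized
```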
The joint probability of the elements in a single row of $U$ is constrained by this normalization:
$$P(U_{k1}, U_{k2}, \ldots, U_{kN}) \propto \delta\left(1-\sum_{j=1}^N |U_{kj}|^2\right).$$
We can integrate out, for example, the last element $U_{kN}$ to obtain a marginal over the remaining matrix elements:
$$P(U_{k1}, U_{k2}, \ldots, U_{k(N-1)}) = \int d^2x\, P(U_{k1}, U_{k2}, \ldots, U_{k(N-1)}, x),$$
where the integral over the complex element $x$ runs over the complex plane.
Since the integral of a delta function is a Heaviside theta function, we get
$$P(U_{k1}, U_{k2}, \ldots, U_{k(N-1)}) \propto \theta\left(1-\sum_{j=1}^{N-1} |U_{kj}|^2\right).$$
Calculating further marginals is a bit tricky.
We can verify the following property of the Heaviside theta function:
$$ \int_{0}^{\infty} \theta(1-x^2-a^2)\, dx = \sqrt{1-a^2}\,\theta(1-a^2).$$
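This identity is easy to confirm numerically. The sketch below (assuming NumPy) evaluates the left-hand side on a fine grid for a few values of $a$; the helper `theta_integral` is a hypothetical name for illustration.

```python
import numpy as np

def theta_integral(a, n=200_001):
    """Grid approximation of \\int_0^infty theta(1 - x^2 - a^2) dx.

    The integrand vanishes for x > 1, so integrating up to 2 suffices.
    """
    x = np.linspace(0.0, 2.0, n)
    integrand = (1.0 - x**2 - a**2 > 0).astype(float)
    return integrand.sum() * (x[1] - x[0])

for a in (0.0, 0.3, 0.8):
    print(a, theta_integral(a), np.sqrt(1.0 - a**2))
```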
Applying this property to the real and imaginary parts of $U_{k(N-1)}$ in turn, we can integrate over one more matrix element and get something of the form
$$P(U_{k1}, U_{k2}, \ldots, U_{k(N-2)}) \propto \left(1-\sum_{j=1}^{N-2} |U_{kj}|^2\right)\theta\left(1-\sum_{j=1}^{N-2} |U_{kj}|^2\right).$$
We can iteratively integrate out $N-1$ matrix elements to calculate the marginal of a single matrix element $U_{k1}$.
$$P(U_{k1}) \propto \left(1-|U_{k1}|^2\right)^{N-2}$$
There is a linear correspondence between the probability of observing the matrix element $U_{k1}$ and the probability of observing the norm $|U_{k1}|^2$. We relabel $x=|U_{k1}|^2$ to get
$$P(x) \propto \left(1-x\right)^{N-2}$$
The distribution $(1-x)^{N-2}$ penalizes matrix elements with large norms.
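We can sanity-check this marginal by sampling. The sketch below assumes NumPy and SciPy (whose `scipy.stats.unitary_group` samples Haar-random unitaries) and compares the empirical mean and tail of $x = |U_{k1}|^2$ against the predictions $E[x] = 1/N$ and $P(x > t) = (1-t)^{N-1}$ that follow from $P(x) \propto (1-x)^{N-2}$.

```python
import numpy as np
from scipy.stats import unitary_group

# Sample |U_{11}|^2 across many independent Haar-random unitaries.
N, samples = 16, 2000
rng = np.random.default_rng(1)
xs = np.array([np.abs(unitary_group.rvs(N, random_state=rng)[0, 0]) ** 2
               for _ in range(samples)])

mean_pred = 1 / N                 # mean of P(x) = (N-1)(1-x)^(N-2)
tail_pred = (1 - 0.1) ** (N - 1)  # P(x > 0.1) = (1 - 0.1)^(N-1)
print(xs.mean(), mean_pred)
print((xs > 0.1).mean(), tail_pred)
```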
In the limit $N \gg 1$ and for small $x$, this approximates the exponential distribution
$$P(x) \propto \left(1-\frac{Nx}{N}\right)^{N-2} \approx e^{-Nx}$$
To fix the proportionality constant $C$, we require the probability to integrate to 1:
$$\int_{0}^1 P(x)\, dx = 1 \implies C\int_{0}^{1} e^{-Nx}\, dx = \frac{C}{N}\left(1-e^{-N}\right) = 1 \implies C \approx N.$$
Giving the final distribution
$$ P(x) \approx Ne^{-Nx}$$
This is the famous Porter-Thomas distribution.
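As a final check, the Porter-Thomas form is visible directly in a single large Haar-random unitary: rescaling every squared matrix element as $y = N|U_{ij}|^2$, we expect $E[y] = 1$ and a survival probability $P(y > 1) \approx e^{-1}$. A sketch assuming NumPy and SciPy (matrix elements within one $U$ are weakly correlated, but each is marginally Porter-Thomas distributed):

```python
import numpy as np
from scipy.stats import unitary_group

N = 256
rng = np.random.default_rng(2)
U = unitary_group.rvs(N, random_state=rng)

# Rescaled measurement probabilities y = N * |U_ij|^2, pooled over all entries.
y = (N * np.abs(U) ** 2).ravel()
print(y.mean())                      # Porter-Thomas predicts E[y] = 1
print((y > 1.0).mean(), np.exp(-1))  # survival P(y > 1) = e^{-1} ~ 0.368
```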