Counting semiprimes

The prime-counting function $\pi(n)$ returns the number of prime numbers less than or equal to $n$. The prime number theorem gives the asymptotic behaviour $\pi(n)\sim n/\log n$.

We shall call a number semiprime if it is the product of two primes. Let $\omega(n)$ denote the semiprime-counting function. What is the asymptotic behaviour of $\omega(n)$? I claim that

$\omega(n) \sim \frac{n}{\log n} \log\left( \frac{\log n}{\log 2} \right)$

Edit (17 September 2016) Based on experiments in Mathematica, the above seems to be a lower bound, while an upper bound is given by

$\omega(n) \sim \frac{n \log \log n}{\log n}$

I obtain these by the following heuristic, probabilistic argument. The probability that a random integer chosen between 1 and $n$ is $\pi(n)/n \sim 1/\log n$; loosely, this means that the probability of a randomly chosen large number $n$ is approximately $1/\log n$. Similarly, $\omega(n) / n$ is the probability that a randomly-chosen $m < n$ is semiprime.

An exact formula for $\omega(n)$ is

$\omega(n) = \sum_{x=2}^{n/2} \sum_{y=2}^{n/x} 1(xy \text{ is prime}) = \frac{1}{2} \sum_{x=2}^{n/2} \sum_{y=2}^{n/x} 1(x \text{ is prime}) 1(y \text{ is prime}).$

The sum over $y$ is trivially $\pi(n/x) \sim (n/x)/(\log n - \log x)$. Hence

$\omega(n) \sim \sum_{x=2}^{n/2} 1(x \text{ is prime}) \frac{n}{x (\log n - \log x)}.$

We split the sum at $x = \sqrt n$ into two parts, and assume that the first part is dominated by the second. In the second part, $x$ is large and we invoke $1(x \text{ is prime}) \sim 1/\log x$ and approximate the sum as an integral, to get

$\omega(n) \sim \int_{\sqrt n}^{n/2} \frac{n}{x \log x (\log n - \log x)} dx.$

Evaluating the integral and keeping only the highest-order terms, we get the desired formula.

The semiprime-counting function can be implemented in Mathematica by

SemiprimeQ[n_] := SemiprimeQ[n] = Module[{f},
f = FactorInteger[n];
(
((Length[f] == 2) && (Total[Last /@ f] == 2))
|| ((Length[f] == 1) && f[[1]][[2]] == 2) )
];

w[n_] := w[n] = Length[Select[Range[n], SemiprimeQ]]


(where we memoise the results to speed things up). The following code produces a graph showing the validity of these approximation:

W1[n_] := n/Log[n] Log[Log[n]];
W2[n_] := n/Log[n] Log[Log[n]/Log[2]];
nn = 150000; nj = 500;
Show[
ListPlot[Table[{n, w[n]/W1[n]}, {n, 1, nn, nj}],
PlotStyle -> Red],
ListPlot[Table[{n, w[n]/W2[n]}, {n, 1, nn, nj}],
PlotStyle -> Green]
]


Mathematica modules

In Mathematica, a module is an extended function that may perform several steps. An advantage of a module is that it may declare and define local variables which can be used inside the module but not in the global scope. It is generally good programming practice to keep variables inside as local a scope as possible, because variables in global scope can conflict with each other if you are not careful to keep track of their names.

A module for solving the KdV equation

Below is an example usage of a module. The module solat computes the solution to the KdV equation

$u_t + 6uu_x + u_{xxx} = 0$

at a desired time tm, given initial data data0. It uses a split-step pseudospectral integrator, whose details, as well as an equivalent code in Matlab, can be found at Wikiwaves. (Note that their Matlab code makes reference to $k$, the angular wavenumber, whereas this code uses $\nu$, which is equal to $k / (2\pi)$.

nx = 128;
xmax = 1;
dx = 1/nx;
xs = Range[0, xmax - dx, dx];
numax = 1/(2*dx);
dnu = 1/xmax;
nus = Join[Range[0, numax - dnu, dnu], Range[-numax, -dnu, dnu]];

u0[x_] := Sin[6 Pi x];
data = u0 /@ xs;

solat[tmax_, data_] :=
Module[{t = 0, dft = Fourier[data], dt = 0.4/nx^2, dft1 = 0, sol = 0},
While[t < tmax,
dt = Min[dt, tmax - t];
dft1 = dft*Exp[I (2 Pi nus)^3 dt];
dft = dft1 - 3 I 2 Pi nus dt Fourier[InverseFourier[dft1]^2];
t += dt;
];
sol = Re[InverseFourier[dft]];
sol
]


We declare the variables t, dft, dt, dft1 and sol as local to the module, and initialise them to sensible values. (Note that dft1 and sol has been initialised to 0; this is completely arbitrary, because this value will be overwritten a few lines later.)

The last line inside the module, sol, simply causes sol to be returned from the module.

A module for solving the sine-Gordon equation

We now consider the sine-Gordon equation

$\phi_{tt} - \phi_{xx} + \sin\phi = 0.$

Like KdV, the sine-Gordon equation is quasilinear, in the sense that no term contains any nonlinear function of a derivative. Moreover, in both equations there is only one nonlinear term and no inhomogeneities; the rest are linear and may be treated easily. From a computational perspective, the most important difference between the two is that the sine-Gordon equation is second-order in time. We first write it as a pair of coupled first-order equations:

$\phi_t = \psi \text{ and } \psi_t = \phi_{xx} - \sin\phi.$

This system can now be solved using a split-step pseudospectral method, much like the above, except that we must keep track of both dynamical variables $\phi$ and $\latex \psi$. The derivation of this is a bit messy, but follows the same principle of evolving the linear parts of the equations separately from the nonlinear part. Here is the Mathematica code:

nx = 128;
xmax = 1;
dx = 1/nx;
xs = Range[0, xmax - dx, dx];
numax = 1/(2*dx);
dnu = 1/xmax;
nus = Join[Range[0, numax - dnu, dnu], Range[-numax, -dnu, dnu]];

u0[x_] := Sin[6 Pi x];
data = u0 /@ xs;
sinegordonsol[tmax_, phi0_, psi0_] := Module[
{t = 0, phift = Fourier[phi0], psift = Fourier[psi0],
dt = 0.4/nx^2,
phift1 = 0, psift1 = 0, sol = 0},
While[t < tmax,
dt = Min[dt, tmax - t];
phift1 = phift Cos[dt] + psift Sin[dt];
psift1 = psift Cos[dt] - phift Sin[dt];
phift = phift1;
psift = psift1 - Fourier[Sin[InverseFourier[phift]]] dt;
t += dt;
];
sol = Re[InverseFourier[phift]];
sol
]


To use it, do something like

sgs = sinegordonsol[1, data, 0*data];
ListLinePlot[Transpose[{xs, sgs}]]


The discrete Fourier transform in Mathematica

The Fourier transform is a very useful and widely used tool in many branches of maths. The continuous Fourier transform, available in Mathematica as FourierTransform, tries to perform the Fourier transform of a symbolic expression. This is useful for solving many problems if the Fourier transform can also be expressed in a nice symbolic form, but such a form is not necessarily available (and indeed the Fourier transform might not exist at all).

In practice, the discrete Fourier transform, which works with a finite set of data points and approximates the continuous Fourier transform, is more important. The DFT is available in Mathematica as Fourier, and the inverse DFT as InverseFourier. Note that Mathematica uses the $1/\sqrt{n}$ normalisation by default for both the DFT and the IDFT.

Example

Here is an example usage of the DFT in Mathematica:

xmax = 1;
nx = 128;
dx = xmax / nx;
xs = Range[0, xmax - dx, dx];
f[x_] := Exp[Sin[6 Pi x]] + Sin[40 Pi x];
data = Map[f, xs];
ListLinePlot[Transpose[{xs, data}]]

dft = Fourier[data];
numax = 1/(2*dx);
dnu = 1/xmax;
nus = Join[Range[0, numax - dnu, dnu], Range[-numax, -dnu, dnu]];
ListPlot[
Transpose[{nus, Abs[dft]}],
PlotRange -> All]


The first block generates a list data which consists of sampled values of the function f at the values xs, which are 128 values between 0 and 1 (not including the endpoint at 1). These sampled points are plotted.

The second plot calculates the discrete Fourier transform of data (according to the default normalisation). Then the list nus is constructed: these are the wavenumbers that correspond to each element of dft. (With the default normalisation, this is not the angular wavenumber: the wavenumber of $\sin(40\pi x)$ is said to be 20, while the angular wavenumber is $40\pi$. The symbol $k$ is often used for the angular wavenumber; I am using $\nu$ for the wavenumber.) We then plot the absolute values of dft against the wavenumbers nus.

Aliasing

Note how the values of $\nu_{\max}$ and $\Delta\nu$ are related to the values of $\Delta x$ and $x_{\max}$:

$\nu_{\max} = \frac{1}{2\Delta x} \text{ and } \Delta\nu = \frac{1}{x_{\max}}.$

What this means is the following:

• If you sample more densely (so $\Delta x$ is smaller), then you can pick up higher frequencies.
• If you sample for a longer domain (so $x_{\max}$ is larger), then you can pick up lower frequencies.

The highest frequency that you can pick up, $1/2\Delta x$, is called the Nyquist frequency and is equal to half the sampling rate of the data. If your function in fact oscillates at a higher frequency than this, then your sampled data will not pick up that component properly, and many of your results will be inaccurate. This effect is known as aliasing.

Applications

The Fourier transform can be used to solve many different problems, and is particularly useful for solving linear partial differential equations. Numerical PDE solvers that make use of the DFT are part of a class of numerical schemes known as spectral methods. For linear problems where the solution is oscillatory or where a ‘conservation law’ (usually analogous to conservation of mass or momentum) applies, spectral methods tend to be better than finite differences. The DFT can also be useful for quasilinear PDEs, using a family of methods known as pseudospectral methods.

An introduction to an introduction to Mathematica

This is the first post in a series on Mathematica. The series is meant to complement and supplement a five-day course in basic Mathematica that I shall be giving at Cambridge in June 2016. I will post after each lesson if I feel that there is something that needs to be clarified, or if some example code will be useful, but these might not be complete course notes.