Counting semiprimes

The prime-counting function \pi(n) returns the number of prime numbers less than or equal to n. The prime number theorem gives the asymptotic behaviour \pi(n)\sim n/\log n.

We shall call a number semiprime if it is the product of two primes, not necessarily distinct. Let \omega(n) denote the semiprime-counting function, the number of semiprimes less than or equal to n. What is the asymptotic behaviour of \omega(n)? I claim that

\omega(n) \sim \frac{n}{\log n} \log\left( \frac{\log n}{\log 2} \right)

Edit (17 September 2016) Based on experiments in Mathematica, the above seems to be a lower bound, while an upper bound is given by

\omega(n) \sim \frac{n \log \log n}{\log n}

I obtain these by the following heuristic, probabilistic argument. The probability that a random integer chosen uniformly between 1 and n is prime is \pi(n)/n \sim 1/\log n; loosely, this means that a randomly chosen large number m \le n is prime with probability approximately 1/\log n. Similarly, \omega(n)/n is the probability that a randomly chosen m \le n is semiprime.

An (almost) exact formula for \omega(n) is

\omega(n) \approx \frac{1}{2} \sum_{x=2}^{n/2} \sum_{y=2}^{n/x} 1(xy \text{ is semiprime}) = \frac{1}{2} \sum_{x=2}^{n/2} \sum_{y=2}^{n/x} 1(x \text{ is prime}) \, 1(y \text{ is prime}).

For x, y \ge 2 the product xy is semiprime exactly when both x and y are prime, and the double sum counts each semiprime pq with p \ne q twice, as (x, y) = (p, q) and as (q, p); the factor 1/2 corrects for this. (Prime squares p^2 are counted only once, so the right-hand side undercounts them by half, but there are only O(\sqrt n) of them, which does not affect the asymptotics.)

The inner sum over y is \pi(n/x) \sim (n/x)/(\log n - \log x) by the prime number theorem. Hence

\omega(n) \sim \frac{1}{2} \sum_{x=2}^{n/2} 1(x \text{ is prime}) \frac{n}{x (\log n - \log x)}.

We split the sum at x = \sqrt n into two parts. After the density approximation below, the summand is symmetric under \log x \mapsto \log n - \log x, so the two parts contribute equally to leading order; this cancels the factor 1/2, and it suffices to treat the part with x > \sqrt n. There x is large, so we invoke 1(x \text{ is prime}) \sim 1/\log x and approximate the sum by an integral, to get

\omega(n) \sim \int_{\sqrt n}^{n/2} \frac{n}{x \log x (\log n - \log x)} dx.

Evaluating the integral and keeping only the highest-order terms, we get the desired formula.
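
In detail, substitute u = \log x and write L = \log n; by partial fractions \frac{1}{u(L-u)} = \frac{1}{L} \left( \frac{1}{u} + \frac{1}{L-u} \right), so

\int_{\sqrt n}^{n/2} \frac{n}{x \log x (\log n - \log x)} dx = n \int_{L/2}^{L - \log 2} \frac{du}{u(L-u)} = \frac{n}{L} \left[ \log \frac{u}{L-u} \right]_{L/2}^{L - \log 2} = \frac{n}{L} \log \left( \frac{L - \log 2}{\log 2} \right) \sim \frac{n}{\log n} \log \left( \frac{\log n}{\log 2} \right),

since the bracket vanishes at u = L/2 and L - \log 2 \sim L.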

The semiprime-counting function can be implemented in Mathematica by

SemiprimeQ[n_] := SemiprimeQ[n] = Module[{f = FactorInteger[n]},
    (* two distinct primes, each to the first power, *)
    (Length[f] == 2 && Total[Last /@ f] == 2) ||
      (* or a single prime squared *)
      (Length[f] == 1 && f[[1, 2]] == 2)
    ];

w[n_] := w[n] = Length[Select[Range[n], SemiprimeQ]]

(where we memoise the results to speed things up). The following code produces a graph showing the quality of these approximations:

W1[n_] := n/Log[n] Log[Log[n]];
W2[n_] := n/Log[n] Log[Log[n]/Log[2]];
nn = 150000; nj = 500;
Show[
 ListPlot[Table[{n, w[n]/W1[n]}, {n, 1, nn, nj}],
   PlotStyle -> Red],
 ListPlot[Table[{n, w[n]/W2[n]}, {n, 1, nn, nj}],
   PlotStyle -> Green]
 ]

Task scheduling in MPI

I wrote this simple parallel task-scheduling system in C++/MPI, following discussions with Juha Jäyykä. It seems to work most of the time, but fails if jobs finish so quickly, or so close to each other, that a node does not have the chance to update nexttask_ind.

I can’t work out the exact failure criterion, and would welcome any advice on the code.

/* A simple task scheduler in C++ and MPI. 
 * Usage: 
 *   $ mpic++ taskscheduler.cpp -o taskscheduler 
 *   $ mpirun -np 4 ./taskscheduler Ntasks
 */
// see http://stackoverflow.com/questions/11180624/mpi-task-scheduling
// and
// http://stackoverflow.com/questions/12810391/mpi-asynchronous-broadcast-gather#12810617
#include<iostream>
#include<mpi.h>
#include<unistd.h>
#include<assert.h>
#include<time.h>
#include<stdlib.h>

int main (int argc, char** argv) {
    MPI_Init (&argc, &argv);      /* starts MPI */
    MPI_Request mpireq;
    MPI_Status mpistatus_store;
    MPI_Status* mpistatus = &mpistatus_store; /* MPI_Iprobe needs real storage */

    assert(argc > 1);
    int Ntasks = atoi(argv[1]);
    
    /* get number of nodes */
    int nnodes; 
    MPI_Comm_size (MPI_COMM_WORLD, &nnodes);        
    /* get my node rank */
    int rank; 
    MPI_Comm_rank (MPI_COMM_WORLD, &rank); 
    /* get my node hostname */
    char hostname[MPI_MAX_PROCESSOR_NAME];
    int name_len;
    MPI_Get_processor_name(hostname, &name_len);

    /* This is the task that we are currently working on. */
    int mytask_ind;
    /* This buffer stores the next task that needs to be done. */
    int nexttask_ind = nnodes;

    /* We start off by giving task i to node i. */
    mytask_ind = rank;
    
    srandom(rank);
    
    printf("Node %d signing on.\n", rank);
    while (true)
    {
        /* Stop if there is no more work to do. */
        if (mytask_ind >= Ntasks) 
            break;

        /* Do our task. */

        printf("Node rank %d on %s is now doing task %d.\n", 
                rank, hostname, mytask_ind);
        /* Whatever we're doing takes a while... */
        int usec = 0;
        while(true)
        {
            if (random() % 100000 == 0) 
                break;

            usec++;
            usleep(1);

            /* Every now and then, check for updates to nexttask_ind from other
             * nodes. */
            if (usec % 100 == 0)
                for (int node = 0; node < nnodes; node++)
                    if (node != rank)
                    {
                        int flag = 0;
                        MPI_Iprobe(node, MPI_ANY_TAG, MPI_COMM_WORLD, &flag, mpistatus);
                        if (flag)
                        {
                            /* The probed message has already arrived, so a
                             * blocking receive returns immediately.  MPI_Irecv
                             * here would leak a request that is never completed
                             * and could still be writing nexttask_ind later. */
                            MPI_Recv( &nexttask_ind, 1, MPI_INT, node, MPI_ANY_TAG, MPI_COMM_WORLD, mpistatus );
                        }
                    }
        }


        /* Claim the next task, and advertise the updated nexttask_ind to the
         * other nodes. */
        mytask_ind = nexttask_ind;
        nexttask_ind++;
        for (int node = 0; node < nnodes; node++)
            if (node != rank)
            {
                MPI_Isend( &nexttask_ind, 1, MPI_INT, node, 0, MPI_COMM_WORLD, &mpireq);
                /* Fire and forget: free the request instead of leaking it.
                 * (Strictly, nexttask_ind must not be modified until the send
                 * completes, which this simple scheme does not guarantee.) */
                MPI_Request_free(&mpireq);
            }

    }

    MPI_Finalize();
    return 0;
}