Setting up a simple — and cheap — home backup system

TL;DR: Hardware: USB hard drive attached to a Raspberry Pi. Software: Unison (with some caveats). Internet connectivity: No-IP (for now).

I decided to set up a home backup system. The system should:

  • be accessible across the Internet, not just from home;
  • be suitable for multiple users (so that, for example, my housemate can use it as well);
  • be suitable for syncing between multiple computers, flagging up conflicts;
  • be cross-platform, especially on Linux machines but also Windows and OS X;
  • be ‘self-sufficient’, relying as little as possible on third parties, and being as open and standards-compliant as possible.

There are a lot of hard drives marketed as ‘home cloud’ solutions, which offer the first three points but not the last two. All the ones that I could find make you use their proprietary software, and given my tribulations with Time Machine, my distrust in black-box systems is once again where it should be. This software usually does not support Linux, and sometimes doesn’t even support OS X.

At the other extreme, plenty of people on forums such as Stack Overflow suggested simply using rsync, which is a bit too bare: in particular, it doesn’t detect conflicts, and simply overwrites files at the destination.

I eventually decided to construct my own system, using a 4 TB USB hard drive and a Raspberry Pi. The total cost, including peripherals for the Pi, was around £160, although I plan to install further drives, including an SSD. The power consumption is under 10 W, which works out as about £1/month, depending on electricity prices.
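As a rough check of the £1/month figure: a constant 10 W over the 730 hours of an average month is 7.3 kWh, which at an assumed tariff of 14p per kWh (substitute your own) comes to about £1.02. Here is the same arithmetic as a small C++ function. The name monthly_cost_pounds is hypothetical, and treating 10 W as a constant draw is pessimistic, since that figure is an upper bound.

```cpp
#include <cassert>

// Rough monthly running cost of an always-on device.
// Assumptions: constant power draw, and an average month of
// 365 * 24 / 12 = 730 hours.
double monthly_cost_pounds(double watts, double pounds_per_kwh) {
    assert(watts >= 0.0 && pounds_per_kwh >= 0.0);
    const double hours_per_month = 365.0 * 24.0 / 12.0;   // 730 h
    const double kwh_per_month = watts / 1000.0 * hours_per_month;
    return kwh_per_month * pounds_per_kwh;
}
```

For example, monthly_cost_pounds(10.0, 0.14) gives about 1.02.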

Here’s how I set it up.

Preparing the system

The Raspberry Pi mostly worked out of the box (although I had some difficulty following the instructions, and couldn’t work out how to turn it on!). I attached the peripherals and plugged it into a TV using an HDMI cable. I changed the default account’s password, changed the hostname (calling it resilience), set up an account for myself and one for my housemate, and turned on SSH. The Wi-Fi adaptor that came with it doesn’t seem to work (and I can’t figure out why), so I connected it to the router via Ethernet. This required moving the Pi away from the TV, which is why I set up SSH first.

I prepared (a partition on) the drive by formatting it as ext4, and connected it to the Pi. I edited /etc/fstab so that the drive would be automatically mounted on startup. This particular drive is called asclepius and is mounted on /media/asclepius. I created a subdirectory for each user, making sure to set ownership and group-ownership as necessary.
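Creating the per-user directories amounts to a mkdir and a chown for each user. Below is a minimal C++17 sketch of that step, using std::filesystem and the POSIX calls getpwnam and chown. make_user_dir is a hypothetical helper, and restricting each directory to mode 0700 is an assumption rather than part of the setup described above. It needs to run as root, for example as make_user_dir("/media/asclepius", "alice") for a hypothetical user alice.

```cpp
#include <cassert>
#include <filesystem>
#include <string>
#include <system_error>
#include <pwd.h>
#include <sys/types.h>
#include <unistd.h>

namespace fs = std::filesystem;

// Create <root>/<user> and give it to that user and their primary group,
// like `mkdir` followed by `chown user:group`. Changing ownership to another
// user requires root. Returns false if the user is unknown or a step fails.
bool make_user_dir(const fs::path& root, const std::string& user) {
    const passwd* pw = getpwnam(user.c_str());
    if (pw == nullptr)
        return false;
    const fs::path dir = root / user;
    std::error_code ec;
    fs::create_directories(dir, ec);   // no error if it already exists
    if (ec)
        return false;
    if (chown(dir.c_str(), pw->pw_uid, pw->pw_gid) != 0)
        return false;
    // Assumption: each user's backups should be private (mode 0700).
    fs::permissions(dir, fs::perms::owner_all, fs::perm_options::replace, ec);
    return !ec;
}
```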

The drive needs to have its own power supply, either built-in or by using a USB hub. The Pi by itself is unable to supply the required power through a USB connection.

Synchronisation software

Unison seems to provide many of the features that I need, a reasonably friendly interface (although I haven’t tried the GUI, or used it on Windows) and good documentation.

When installing Unison, one has to note that different versions are not cross-compatible. The Pi’s Raspbian repositories, as well as the machines at DAMTP, currently offer version 2.40.102. For my laptop, the Ubuntu repositories offer a later one, so I had to build version 2.40.102 myself. This wasn’t too difficult.

Addendum (7 November 2017): It turns out that two copies of the same version of Unison may be incompatible with each other if they were compiled using different versions of OCaml, due to a change in OCaml’s serialisation format between versions 4.01 and 4.02 (and there’s no guarantee that further changes won’t happen). I therefore decided to also build the OCaml compiler (ocamlopt) from source instead of relying on the repositories, going with version 4.02.

I’m not sure what got updated in the Ubuntu repositories to break my setup, but there seem to have been similar problems with Homebrew users in the past. Annoyingly, because the serialisation formats differ, my reinstall of Unison has wiped the synchroniser state, so the lengthy process of state detection must be repeated.

Therefore, let this be a lesson: when creating a new protocol, define your own file format properly, rather than relying on a third party’s serialisation format.

Connecting to the outside world

In order to make resilience accessible from the outside world, I had to configure our router to forward SSH connections on Port 22 to resilience.

Like most residential connections, ours has a dynamically assigned IP address, which changes from time to time. To get around this, I am temporarily using No-IP’s free dynamic DNS service, although we should contact our ISP to request a static IP address.

The connection to the outside world is quite slow, since it goes through a residential line. It is also sometimes unreliable and may drop once every few hours, but Unison apparently handles interruptions well.

Further notes

Unison has a number of features for customisation, which I haven’t fully explored yet.

It might be useful, especially if I have more than two users on the system, to set up quotas on the disc. Alternatively, each person could supply their own disc.

Recovering a Time Machine backup onto Linux

When my laptop was stolen at the weekend, I was fortunate that most of my stuff had been backed up using Apple’s Time Machine, onto an external hard drive. However, Apple doesn’t make it easy to recover these files onto a non-OS X system.

I’ve tried a number of solutions, such as fuse-time-machine and tmfuse. While I’ve managed to copy across my text files (and fortunately most of my writing is in LaTeX), a lot of my binary files (PNGs and PDFs) have been corrupted in the process.

So, I’d welcome advice from anyone who’s successfully recovered data from a Time Machine backup.

Cambridge’s spam filter

I’ve been fascinated to learn how SpamAssassin, the spam filter used by Cambridge University’s email system, works. It assigns a score to each incoming message, based on the reputation of the sender (for example, whether they are blacklisted or from trusted domains) and the contents of the message; other technical flags are noted as well. If the score is sufficiently high, your email client will put that message into your ‘junk’ folder.

Here’s an example that I received a while ago. Interestingly, the flag LOTS_OF_MONEY doesn’t attract any score.

Received: from ppsw-42.csi.cam.ac.uk (ppsw-42-intramail.csi.cam.ac.uk [192.168.128.142])
	 by cyrus-1a.csi.private.cam.ac.uk (Cyrus v2.4.17) with LMTPA;
	 Wed, 05 Apr 2017 17:54:16 +0100
X-Sieve: CMU Sieve 2.4
X-Cam-SpamScore: ssss
X-Cam-SpamDetails: score 4.3 from SpamAssassin-3.4.1-1786853 
 * -0.0 RCVD_IN_DNSWL_NONE RBL: Sender listed at http://www.dnswl.org/, no
 *      trust
 *      [209.85.220.193 listed in list.dnswl.dnsbl.ja.net]
 *  0.5 RCVD_IN_SORBS_SPAM RBL: SORBS: sender is a spam source
 *      [209.85.220.193 listed in dnsbl.sorbs.net]
 * -0.0 RCVD_IN_MSPIKE_H3 RBL: Good reputation (+3)
 *      [209.85.220.193 listed in wl.mailspike.net]
 *  1.5 SUBJ_ALL_CAPS Subject is all capitals
 *  0.0 FREEMAIL_FROM Sender email is commonly abused enduser mail provider
 *       (faridsagbohan[at]gmail.com)
 * -0.0 BAYES_20 BODY: Bayes spam probability is 5 to 20%
 *      [score: 0.0738]
 * -0.1 DKIM_VALID_AU Message has a valid DKIM or DK signature from
 *      author's domain
 *  0.1 DKIM_SIGNED Message has a DKIM or DK signature, not necessarily
 *      valid
 *  1.4 PYZOR_CHECK Listed in Pyzor (http://pyzor.sf.net/)
 * -0.1 DKIM_VALID Message has at least one valid DKIM or DK signature
 *  0.0 LOTS_OF_MONEY Huge... sums of money
 * -0.0 RCVD_IN_MSPIKE_WL Mailspike good senders
 *  1.0 FREEMAIL_REPLY From and body contain different freemails
 *  0.0 T_MONEY_PERCENT X% of a lot of money for you
 *  0.0 MONEY_FRAUD_8 Lots of money and very many fraud phrases
X-Cam-ScannerInfo: http://help.uis.cam.ac.uk/email-scanner-virus
[...]
From: Rohany hosan 
Date: Wed, 5 Apr 2017 17:54:14 +0100
Message-ID: 
Subject: DEAREST FRIEND
To: undisclosed-recipients:;
Content-Type: text/plain; charset=UTF-8
Bcc: jmft2@cam.ac.uk
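The scoring described above amounts to adding up the per-rule scores listed in the X-Cam-SpamDetails header. Below is a minimal C++17 sketch. It assumes every rule line looks like ` *  1.5 RULE_NAME description` and that continuation lines carry no leading number. parse_spam_details and total_score are hypothetical helpers, not part of SpamAssassin. Because the displayed scores are rounded to one decimal place, their sum need not match the header’s total exactly in general, although for the message above the displayed values do add up to 4.3.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// One rule hit from an X-Cam-SpamDetails header.
struct RuleHit {
    double score;
    std::string name;
};

// Extract "score NAME" pairs from lines such as
//   " *  1.5 SUBJ_ALL_CAPS Subject is all capitals".
// Continuation lines (" *      trust", " *      [209.85...]") have no number
// after the '*', so the score extraction fails and they are skipped.
std::vector<RuleHit> parse_spam_details(const std::string& header) {
    std::vector<RuleHit> hits;
    std::istringstream in(header);
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string star;
        RuleHit hit;
        if (fields >> star && star == "*" && fields >> hit.score >> hit.name)
            hits.push_back(hit);
    }
    return hits;
}

// Sum the scores of all rule hits.
double total_score(const std::vector<RuleHit>& hits) {
    double total = 0.0;
    for (const RuleHit& h : hits)
        total += h.score;
    return total;
}
```

A rule such as LOTS_OF_MONEY is still reported, but with a score of 0.0 it contributes nothing to the total.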

Some privacy practices when using Facebook

Tor

I have recently started using the Tor Browser when browsing Facebook and Twitter (although I need to get into the habit of doing so consistently). Amongst other things, Tor can help to protect against IP address tracking (although it is not bulletproof!). I find it unnerving that Facebook and Twitter are able to discern who my housemates and officemates are, even if I have had almost no interaction with them. (I don’t mind being associated with my particular housemates and officemates at the moment, but some people may mind that.)

Mobile website and JavaScript

I also use the mobile Facebook website on my laptop, and avoid Facebook completely on my phone. The mobile website can run without JavaScript, which you can disable using NoScript, a Firefox extension that ships with the Tor Browser. Disabling JavaScript protects against some of Facebook’s data collection habits, such as cursor tracking and reading from unsubmitted forms.

Facebook is not the only website that practises such habits with JavaScript: another example is described here. The Independent’s website is particularly bad in this respect: many articles on its site do not display properly unless you allow a load of JavaScript from third parties, including many advertisers.

Clickjacking

When you click on outgoing weblinks on Facebook, they do not take you directly to the desired website. Instead, they first take you via a tracking page, as the URL shown at the bottom of the browser window reveals when you hover over such a link.

This intermediate tracking page allows Facebook to know what links you have clicked on, at what time and in what context. (A browser doesn’t send this information to the webserver of the page containing the link.) To partially overcome this, I give the full URL when posting a link. A reader can then copy this URL and go to it directly, skipping the intermediary.
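The destination of such a link can usually be read off the tracking URL itself, which lets you skip the intermediary even when the poster didn’t give the full URL. Below is a minimal C++17 sketch. It assumes the tracking link carries the destination, percent-encoded, in a query parameter named u, as in links of the form https://l.facebook.com/l.php?u=... . query_param and percent_decode are hypothetical helpers, and Facebook may change the format at any time.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>

// Decode %XX escapes and '+' (space) in a query-string value.
std::string percent_decode(const std::string& s) {
    std::string out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (s[i] == '%' && i + 2 < s.size() &&
            std::isxdigit(static_cast<unsigned char>(s[i + 1])) &&
            std::isxdigit(static_cast<unsigned char>(s[i + 2]))) {
            out += static_cast<char>(std::stoi(s.substr(i + 1, 2), nullptr, 16));
            i += 2;
        } else if (s[i] == '+') {
            out += ' ';
        } else {
            out += s[i];
        }
    }
    return out;
}

// Return the decoded value of query parameter `key` in `url`, if present.
std::optional<std::string> query_param(const std::string& url, const std::string& key) {
    const std::size_t q = url.find('?');
    if (q == std::string::npos)
        return std::nullopt;
    const std::size_t hash = url.find('#', q);
    const std::string query = url.substr(
        q + 1, hash == std::string::npos ? std::string::npos : hash - q - 1);
    std::size_t pos = 0;
    while (pos <= query.size()) {
        const std::size_t amp = query.find('&', pos);
        const std::string pair = query.substr(
            pos, amp == std::string::npos ? std::string::npos : amp - pos);
        const std::size_t eq = pair.find('=');
        if (pair.substr(0, eq) == key)
            return percent_decode(eq == std::string::npos ? "" : pair.substr(eq + 1));
        if (amp == std::string::npos)
            break;
        pos = amp + 1;
    }
    return std::nullopt;
}
```

For a link such as https://l.facebook.com/l.php?u=https%3A%2F%2Fexample.org%2F&h=..., query_param(link, "u") yields https://example.org/.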

Since many users are not aware of Facebook’s outlink tracking, it should be considered a form of clickjacking.

Conclusion

These practices are a start, in lieu of managing to persuade one’s friends to migrate to a more private (and oftentimes more open, somewhat ironically) platform. Of course, Facebook may still harvest information about me, including anything I post, the things that I click on (or don’t click on), and anything said about me by other people.

By the way, you should care about privacy even if you are not a criminal and have ‘nothing to hide’. That claim is patently false: we all have secrets which we would like to keep, even if they are not illegal. Society has stigmas, and an unscrupulous government or employer could use its illicitly-obtained knowledge about your relationship, which you might like to keep quiet about, to blackmail you. And while it’s unlikely that any Facebook employee would personally be interested in your life, the psychological profiling that they may obtain from tracking your cursor could be very lucrative for an insurance company. And, of course, we often talk about other people behind their backs, in unflattering ways.

What techies are missing in the debate over surveillance

I recently started volunteering for Julian Huppert’s campaign to become the Liberal Democrat MP for Cambridge. (For more on that, go to their website.) Some of the other volunteers worked in the tech sector; as such, they used a lot of encryption in their work and had a lucid understanding of how it works. Of course, we are all deeply worried by the attitudes towards Internet surveillance and encryption that Theresa May and the Conservative Party seem to hold. These include last year’s Snoopers’ Charter, which gives the option of requiring ISPs to hand over users’ browsing history to the state (and not just to police and security agencies, but also other, unrelated, branches of government). More recently, the section on digital issues in the Conservative Party manifesto* contains rather troublesome proposals, including

  • Verify, a single digital ID system to be used for both government services and private services such as banking, and
  • the words ‘we do not believe that there should be a safe space for terrorists to be able to communicate online and will work to prevent them from having this capability’.

Unsurprisingly, the Manchester bombing last week will be used to justify activating the Snoopers’ Charter (but only after the election, of course!).

* I actually rather like some other parts of that section in the manifesto, especially ‘central and local government will be required to release information regularly and in an open format’; such a process would be long and costly, but would be very useful for future policymakers.

Like me, Julian and many others, they were quick to point out how heavily encryption is used in day-to-day, perfectly innocuous transactions over the Internet. (See also this piece by the web company Mythic Beasts.) We also knew how surveillance or web censorship could be defeated, using freely available tools such as Tor. Despite all these things, the Tory attitude towards Internet surveillance remains popular; Labour and the SNP abstained in the vote over the Snoopers’ Charter.

Why are we doing so poorly in this argument? One reason is that

The widespread public understanding of encryption is not accurate.

Or, more facetiously:

The debate over encryption is not a debate over encryption.

Okay, my use of the phrase ‘widespread public understanding of encryption’ may be a little hyperbolic, since I can’t speak for the country as a whole. But I think it’s clear that plenty of people don’t understand that normal people use encryption, not just criminals, perverts and terrorists. In some ways, this is laudable: it illustrates how computer and software manufacturers have been able to preconfigure their systems so that people can use them safely without having to think about all the processes (like encryption) that go on under the bonnet. The fact that computing is so accessible is a good thing. One should not need an understanding of mechanical engineering and combustion chemistry in order to drive a car.

However, the same sort of accessibility means that there is a large disjunction between how most people use their computers, and how techies use them. (I know ‘techies’ is a very loose term.) It’s true that policies such as censorship, surveillance and ‘bans’ on encryption can be defeated easily by those with the technical know-how. This doesn’t mean that the policy is moot, because

The effectiveness or otherwise of any policy depends on social factors as well as its technical merits.

Many people will go along with these authoritarian digital policies, reasoning along the lines of ‘I have nothing to hide, so I have nothing to fear’, or ‘we should do anything to keep our children and our country safe’. How else could the Great Firewall of China keep a billion people in check, despite its many weaknesses?

The upcoming election may be a fait accompli as far as this issue is concerned. Labour is not devoted to protecting digital liberties, while the Conservatives are keen to abolish them. (Perhaps a third party, either in a coalition or in opposition, may be strong enough to moderate the government on this issue, but neither the LibDems nor the Greens are likely to be strong enough to do that effectively.) As we continue campaigning on this issue until and after the election, we must not focus too much on the technical weaknesses. In doing so, we’d risk blinding people with endless facts about Tor, VPNs, RSA and other obscure three-letter words and acronyms. Instead, we must focus on the social harms of a surveillance state and the benefits of personal privacy (including as a matter of LGBT+ rights).

Petition to the UK government: ‘Recognise the importance of citizens’ access to encryption’

I’ve just submitted a petition (indeed, my first) to the UK government. The petition is still in the sponsorship stage, but you can click this link to sign it. Once it becomes live I shall put the updated link here. (Update: The petition became live on 7 April, and can be found here.) The text is below:

The government must recognise the personal and economic benefits to encryption, and that any backdoor into WhatsApp cannot remain exclusive to GCHQ, but would soon become known to foreign intelligence services or criminal groups.

Home Secretary and Europol are demanding companies such as WhatsApp to install backdoors so that security services may read suspected terrorists’ messages. (Times, 27.03.17) The UK government may have ‘noble’ aims, but any backdoor would soon be found by the Russian or Chinese intelligence services. This would make the UK vulnerable to economic espionage, and have a chilling impact on dissidents in those countries. It could also be exploited by groups such as Anonymous, which may use intercepted messages to harass vulnerable groups such as LGBT+ people. T

Unfortunately, the petition had a character limit, so here are a few more words about the issue.

The petition is in response to the demand by the Home Secretary, Amber Rudd, that messaging services such as WhatsApp, Telegram and Apple iMessage, which offer end-to-end encryption to their users, open up backdoors for the UK security services (and to her plans to force them to do so). This is ostensibly a response to reports that the Westminster attacker, Khalid Masood, used WhatsApp to communicate, possibly in order to plan the attack (although this is not known). The government argues that this is just the modern equivalent of the traditional practice of steaming open the envelopes carrying the letters of suspected criminals, but the analogy is a poor one. The police never had the power to systematically steam open all envelopes without supervision. They were also subject to limited jurisdiction: the American or Russian police had no right to enter a British post office and open the letters there.

The adage that ‘if you have nothing to hide then you have nothing to fear’ would be a valid argument iff (a) the British security services were the only people with the means to read your communications, and (b) their only motives were to prevent crime and terrorism, for some suitable definition of ‘crime’ and ‘terrorism’. The first assumption is a terrible one. There have been countless examples of individuals or small groups finding weaknesses in widely-used software, such as the Heartbleed bug and Shellshock. What is there to stop a third party from finding and opening a backdoor intended only for GCHQ? It is a longstanding principle of cryptography that ‘security by obscurity’ offers very little security. Once the weakness becomes available to others, the second assumption also goes out of the window. Unfortunately, the Russian and Chinese police and intelligence agencies have rather different ideas about what counts as ‘terrorism’. By forcing messaging companies to open up loopholes in their encryption, the UK government would be indirectly supporting the surveillance mechanisms of those states.

In fact, even the UK’s police and intelligence services should not be idolised (although it was tempting to do this after a police officer died in the Westminster attack). A day before the attack, it was reported that the Met Police spied on Greenpeace activists, in coordination with Indian police and mercenary crackers. Greenpeace may have more destructive elements, but these activists were mostly peaceful protestors and the surveillance could not be justified as being in order to pre-empt a criminal act.

Moreover, groups such as Anonymous have habitually practised the ‘doxing’ of individuals, as in the Gamergate controversy, releasing personally sensitive information about other people. For example, some gay and transgender people have been threatened with being outed, as a means of blackmailing or otherwise harassing them. Being gay or transgender isn’t illegal in most of the West, but it can still have a social stigma that is strong enough to make this an effective tactic. This sort of abuse would only become much more common if its practitioners were able to intercept the messages of vulnerable people. Hence, privacy should be regarded as an LGBT+ issue as well.

A purely military solution cannot win a war. This truth has been expounded by military thinkers such as Sun Tzu and Clausewitz, and we continue to learn it the hard way. In the warfare of the computer era, a purely technical solution can be no better. A backdoor may help the police find the motives and co-conspirators of Khalid Masood in this instance, but it cannot be seen as a panacea for terrorism. People will still become terrorists or dissidents if they are drawn by political or social causes, and it is at these that we must strike.

A monopoly on communication

I came across this article by Salim Virani which describes some of the transgressions that Facebook makes. This goes well with Richard Stallman’s Reasons not to be used by Facebook.

It is more convincing, however, to read the list, prepared by Facebook itself, of a subset of the data that it collects (and saves permanently). Every search, every message, every defriending, every poke. They let you download and view a subset of this subset. For me, this download comes to around 80MB (of which around 31MB is ‘private’ messages).

xkcd: Infrastructures

A few months ago, Facebook disabled Messages on its mobile website in an attempt to get people to download the Messenger app. I refuse to do that, and as a result I now mostly use SMS messages and emails to contact most of my friends or contacts. (There is a workaround: the ‘basic mobile’ website offers limited (but sufficient) functionality.) Completely leaving Facebook has proved difficult because (a) there isn’t an adequate substitute for group chat, and (b) there are some people whom I have no other means of contacting. When we meet new people, we no longer share contact details such as our email (or physical) addresses or phone numbers: the default is to add newly-met people on Facebook and to conduct all communication there (and asking for other contact details is seen, ironically, as too personal).

A first step towards society moving away from Facebook should be that we start sharing our contact details properly, as we used to. Mine are available on the home page of this website, and I invite you to tell me yours as well.

Addenda

  • For group chat, WhatsApp and Skype are problematic for the same reasons. Regretfully, IRC and XMPP are not so widely used, even though they have the merit of being open and decentralised protocols. Part of the reason is that Windows and OS X do not ship with an IRC client.
  • This is not primarily about privacy or security. Like any other form of communication, emails, SMS messages and IRC may be intercepted. The advantage of these systems is that they are decentralised, in the sense that your communications are not controlled by a single company. You can switch between email and mobile providers quite easily.
    • Unfortunately, many people’s personal emails are from Google Mail or Yahoo! Mail, which means their email address (that is, part of their identity) is tied to Google or Yahoo!.
    • Although WhatsApp and Skype promise end-to-end encryption, they are closed-source, centralised systems, so you have no way of verifying it.
    • Encrypting emails is relatively straightforward using tools such as Enigmail.
  • While they are useful as soapboxes, Facebook, Twitter, Reddit and such are vulnerable to censorship (as discussed in Virani’s post) and therefore should not be exclusively relied upon.
    • An article such as this one (written to, stored on and displayed from my personal website) is at the mercy of only my ISP (currently the SRCF at the University of Cambridge), which would have no motive to take down this website, and even then I could switch to a different ISP. (Unless, that is, my ISP receives a court order for a takedown, for example if I write hate speech.)
  • On either privacy or censorship, there is little that you can do against a sufficiently determined eavesdropper or adversary. Courts can order takedowns and intelligence services and police have the technical capability to tap lines and crack passwords. Whether they should use these privileges is a subject for policymakers, but there is no reason to give Facebook these privileges as well.

Task scheduling in MPI

I wrote this simple parallel task-scheduling system in C++/MPI, following discussions with Juha Jäyykä. It seems to work most of the time, but fails if jobs finish so quickly or so close to each other that a node does not have the chance to update nexttask_ind.

I can’t work out the exact failure criterion, and would welcome any other advice with the code.

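/* A simple task scheduler in C++ and MPI.
 * Usage:
 *   $ mpic++ taskscheduler.cpp -o taskscheduler
 *   $ mpirun -np 4 ./taskscheduler Ntasks
 */
// see http://stackoverflow.com/questions/11180624/mpi-task-scheduling
// and
// http://stackoverflow.com/questions/12810391/mpi-asynchronous-broadcast-gather#12810617
#include <algorithm>
#include <cstdio>
#include <cstdlib>
#include <deque>
#include <vector>
#include <mpi.h>
#include <unistd.h>

/* Receive every pending update to nexttask_ind from the other nodes.
 * Updates from different nodes may arrive in any order, so keep the
 * largest value seen rather than simply the most recent one. */
static void drain_updates(int rank, int nnodes, int& nexttask_ind,
                          std::vector<int>& nreceived)
{
    for (int node = 0; node < nnodes; node++) {
        if (node == rank)
            continue;
        int flag = 1;
        while (flag) {
            MPI_Iprobe(node, 0, MPI_COMM_WORLD, &flag, MPI_STATUS_IGNORE);
            if (flag) {
                /* A matching message is waiting, so this returns at once. */
                int received;
                MPI_Recv(&received, 1, MPI_INT, node, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                nexttask_ind = std::max(nexttask_ind, received);
                nreceived[node]++;
            }
        }
    }
}

int main (int argc, char** argv) {
    MPI_Init (&argc, &argv);      /* starts MPI */

    /* get number of nodes */
    int nnodes;
    MPI_Comm_size (MPI_COMM_WORLD, &nnodes);
    /* get my node rank */
    int rank;
    MPI_Comm_rank (MPI_COMM_WORLD, &rank);
    /* get my node hostname */
    char hostname[MPI_MAX_PROCESSOR_NAME];
    int name_len;
    MPI_Get_processor_name(hostname, &name_len);

    if (argc < 2) {
        if (rank == 0)
            fprintf(stderr, "Usage: %s Ntasks\n", argv[0]);
        MPI_Finalize();
        return 1;
    }
    int Ntasks = atoi(argv[1]);

    /* This is the task that we are currently working on.
     * We start off by giving task i to node i. */
    int mytask_ind = rank;
    /* This stores the next task that needs to be done. */
    int nexttask_ind = nnodes;

    /* Every update we send needs its own buffer, left untouched until the
     * send completes; std::deque keeps existing elements in place when
     * new ones are pushed to the back. */
    std::deque<int> sendbufs;
    std::vector<MPI_Request> sendreqs;
    /* How many updates we have sent, and how many we received from each node. */
    int nsent_updates = 0;
    std::vector<int> nreceived(nnodes, 0);

    srandom(rank);

    printf("Node %d signing on.\n", rank);
    /* Keep going while there is more work to do. */
    while (mytask_ind < Ntasks)
    {
        /* Do our task. */
        printf("Node rank %d on %s is now doing task %d.\n",
                rank, hostname, mytask_ind);
        /* Whatever we're doing takes a while... */
        long usec = 0;
        while (random() % 100000 != 0)
        {
            usec++;
            usleep(1);

            /* Every now and then, check for updates to nexttask_ind from
             * other nodes. */
            if (usec % 100 == 0)
                drain_updates(rank, nnodes, nexttask_ind, nreceived);
        }

        /* Catch up on any last updates, then claim the next task and
         * advertise the fact by sending the new nexttask_ind to every other
         * node. This is still not atomic: two nodes finishing at nearly the
         * same moment can claim the same task. */
        drain_updates(rank, nnodes, nexttask_ind, nreceived);
        mytask_ind = nexttask_ind;
        nexttask_ind++;
        nsent_updates++;
        sendbufs.push_back(nexttask_ind);
        for (int node = 0; node < nnodes; node++)
            if (node != rank) {
                MPI_Request req;
                MPI_Isend(&sendbufs.back(), 1, MPI_INT, node, 0,
                          MPI_COMM_WORLD, &req);
                sendreqs.push_back(req);
            }
    }

    /* Shut down cleanly: learn how many updates each node sent, receive any
     * that are still outstanding, then wait for our own sends to complete. */
    std::vector<int> nsent(nnodes);
    MPI_Allgather(&nsent_updates, 1, MPI_INT, nsent.data(), 1, MPI_INT,
                  MPI_COMM_WORLD);
    for (int node = 0; node < nnodes; node++) {
        if (node == rank)
            continue;
        for (; nreceived[node] < nsent[node]; nreceived[node]++) {
            int discarded;
            MPI_Recv(&discarded, 1, MPI_INT, node, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        }
    }
    MPI_Waitall((int) sendreqs.size(), sendreqs.data(), MPI_STATUSES_IGNORE);

    MPI_Finalize();
    return 0;
}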
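One likely cause of the failures is that claiming a task is not atomic. A node reads nexttask_ind, increments it, and only then broadcasts it, so two nodes that finish within that window can read the same value and do the same task. The usual remedy is a single counter with an atomic fetch-and-increment. In MPI-3 this is available through one-sided communication: one rank exposes the counter in a window, and every rank claims a task with MPI_Fetch_and_op using MPI_SUM. A master/worker design achieves the same with plain messages. The sketch below illustrates the idea with threads and std::atomic rather than MPI processes, so that it runs without an MPI installation. run_tasks is a hypothetical name.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Shared-memory analogue of a central task counter.
// Each worker claims work with one atomic fetch-and-increment: fetch_add
// returns the counter's old value and bumps it in a single indivisible step,
// so no two workers can ever obtain the same task index.
// Returns how many times each task was executed (every entry should be 1).
std::vector<int> run_tasks(int nworkers, int ntasks) {
    std::atomic<int> next_task{0};
    // Each element is written only by the worker that claimed that index,
    // so there is no data race on this vector.
    std::vector<int> times_done(ntasks, 0);
    std::vector<std::thread> workers;
    for (int w = 0; w < nworkers; ++w) {
        workers.emplace_back([&next_task, &times_done, ntasks]() {
            for (int task = next_task.fetch_add(1); task < ntasks;
                 task = next_task.fetch_add(1)) {
                ++times_done[task];  // stand-in for the real work
            }
        });
    }
    for (std::thread& t : workers)
        t.join();
    return times_done;
}
```

Calling run_tasks(4, 1000) should return a vector of 1000 ones, however the threads happen to be scheduled.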

libpnghelpers: A simple C library for PNG files

I have made a small and simple library for creating PNG files in C. The source code is available here.

The library (especially the workhorse inside it) is largely taken from Ben Bullock’s tutorial. I added some extra functions: a constructor and a destructor are provided, some error-handling is added, and there is a routine that creates an image out of a two-dimensional array of doubles.
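Turning a two-dimensional array of doubles into an image presumably means mapping arbitrary real values onto pixel intensities. The following is a hypothetical, standalone sketch of one common choice, linear scaling onto 8-bit greyscale. It is not libpnghelpers’ actual API, and its output would still need to be written out row by row with libpng.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of libpnghelpers): scale an image stored as a
// row-major array of doubles linearly onto 8-bit grey levels, so that the
// smallest value maps to 0 (black) and the largest to 255 (white).
// Assumes the data contains no NaNs.
std::vector<std::uint8_t> to_grey(const std::vector<double>& data) {
    std::vector<std::uint8_t> grey(data.size(), 0);
    if (data.empty())
        return grey;
    const auto [lo_it, hi_it] = std::minmax_element(data.begin(), data.end());
    const double lo = *lo_it, hi = *hi_it;
    if (hi == lo)
        return grey;  // constant image: arbitrarily make it all black
    for (std::size_t i = 0; i < data.size(); ++i)
        grey[i] = static_cast<std::uint8_t>(
            std::lround(255.0 * (data[i] - lo) / (hi - lo)));
    return grey;
}
```

For example, to_grey({0.0, 0.5, 1.0}) gives the grey levels 0, 128 and 255.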

To compile and install: change the INSTALL_PREFIX in Makefile to an appropriate location (possibly something like /home/username/local or /usr/local), and then run

$ make
$ make install

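To use, compile your code and link using the flags -lpnghelpers -lpng (in that order, since libpnghelpers depends on libpng), adding -I and -L flags pointing to INSTALL_PREFIX if it is not on the default search paths.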

Some example codes are provided, although usage should be self-explanatory.

About this website

This is the first post on this website. At the moment, it consists of just a few pages describing me and my work. Occasionally, I will post things about maths, science, computing, Chinese history, or anything else that interests me.

I am starting this blog as part of a migration away from Facebook. There are several reasons for this, many of which are privacy-related. Richard Stallman and the Free Software Foundation have a more detailed list of Facebook’s transgressions. Below I explain some of the main points.

Privacy

When people talk about ‘privacy’ on a social network such as Facebook, they often think about controls that keep co-workers, bosses or students from seeing posts that they make in their personal lives. This is an important aspect of privacy, and while Facebook does not offer complete protection, it has made improvements, and a savvy user can achieve these controls quite easily.

But the true danger to privacy that Facebook presents is that Facebook themselves may read posts or things said in supposedly private conversations between users. The danger is not that Mark Zuckerberg will personally read your conversations and use them to blackmail or shame you. Rather, it is your usage patterns, writing style or unconscious behaviour which give away the most interesting information about you. Facebook is also capable of tracking your browsing habits on other sites. Logging out doesn’t protect you from this tracking.

The upshot: Even if you never write a message or post a status explicitly stating anything, and even if you give a false name, age or gender, it is easy to build a detailed profile of you, by linking together all of the information that is collected.

Targeted advertising is not a huge worry for me; I never pay attention to adverts anyway. I am most concerned by the prospect of medical information being collected or deduced: an insurance company could use this to set my premiums, or a prospective employer could discriminate against me based on my medical conditions. (The latter may be illegal, but that wouldn’t necessarily stop them.) This is not an unfounded concern: one of my friends noted that she was getting adverts targeted towards one of her conditions.

Ownership, openness and censorship

Centralised, proprietary systems such as Facebook, but also other networks such as Tumblr or WordPress.com, are not a sensible medium for storing or publishing media such as articles or photos. The danger comes from (a) the possibility that the service could be terminated with little or no warning, causing your media to be lost, and (b) the possibility of the host censoring your media.

I don’t know anything about copyright law or fair use, but the prospect of Facebook using my photos as their own (perhaps selling them off as stock photos, for example) is actually a fairly minor concern for me.

Facebook can censor posts arbitrarily. In 2014, it removed a photo of breastfeeding. In practice, its censorship seems to be motivated not by its own morality, but by its desire to keep itself unblocked in countries such as Russia and Turkey. It does this by censoring pages of dissent, essentially to appease the Russian and Turkish governments.

Although there is no evidence of WordPress.com doing the same, one has no guarantee against it.

This website is hosted on an independent server in Cambridge (independent, too, of the University Computing Service), and is far less vulnerable to this sort of censorship. If I posted something illegal, libellous or extremely controversial, then the service provider might shut down this site or the government might order my arrest, but these powers are subject to public oversight, and are harder to abuse.

(Note that WordPress.com refers to the blog hosting service; this website is powered by the software WordPress but is hosted independently.)

Facebook as a walled garden

While Facebook can be useful for sharing things amongst immediate friends, the audience of such posts is in most cases ultimately limited to other users of Facebook. Hence Facebook is not really such a public platform. (Contrast that with this post, for example, which can be read by anybody on the Internet.)

Student unions often use Facebook to make announcements, rather than university email. This means that announcements, including important announcements such as upcoming committee elections, never reach students who are not on Facebook or not connected to the rest of the student body. This is undemocratic, and particularly affects mainland Chinese students.

Epilogue

Writing this has taken much longer than I had expected, and I need to go and do some work now, but hopefully it will be useful for persuading some other people to leave, perhaps reverting to email (or even face-to-face contact!) for communication.