Type inference for lazy LaTeXing

I am doing some work with asymptotic expansions of the form

 h = h^{(0)} + \epsilon h^{(1)} + O(\epsilon^2)

and I don’t care about second-order terms. The parentheses are there to indicate that these are term labels, not powers. But actually, there’s no need to have them: if I ever need to raise something to the zeroth power, I can just write 1; and if I need to raise something to the first power, I don’t need to write the power at all. So there’s no confusion in writing h^0 instead of h^{(0)}! If I need to square it, I can write h^{02}. If I need to square h^{(1)}, I can write h^{12}; it’s unlikely I’ll need to take anything to the 12th power.
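To see how the shorthand reads in practice, here is the square of the expansion written both ways. This is only an illustrative example; the cross term follows from multiplying out (h^{(0)} + \epsilon h^{(1)})^2 and dropping the \epsilon^2 term.

```latex
% Conventional notation: parenthesised labels, explicit squares
h^2 = \bigl(h^{(0)}\bigr)^2 + 2 \epsilon h^{(0)} h^{(1)} + O(\epsilon^2)

% Lazy notation: h^{02} is the square of h^{(0)}, not h with label 02
h^2 = h^{02} + 2 \epsilon h^0 h^1 + O(\epsilon^2)
```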

It’s an awful idea and a sane reviewer would reject it, but it does save time when LaTeXing…

Using Matrix instant messaging

Following my recent rant on decentralising our communications, I’ve started trying out the Matrix communication protocol on the suggestion of a friend. It’s a wonderful idea, and it’s great that various different clients can connect to the network. And it seems to be very easy to add people to the network: you just need to give their email address(*) to invite them. The Riot.im client is quite easy to use for basic tasks, although there are some nuances that I haven’t got used to yet.

One thing that’s not immediately obvious is how you refer to things. On Twitter, you can refer to people as @jftsang and to groups as #example. On IRC networks, channels are usually called #channel or ##unofficialchannel.

Well, on Matrix, user IDs take the form @username:server, such as @jftsang:matrix.org. The latter part tells you about the homeserver of the user, which is needed because Matrix is a distributed network and different users might be accessing it through different servers. Rooms take the form #room:server, and communities take the form +community:server. I’m not yet sure what the relationship between rooms and communities is.
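The naming scheme is regular enough to pull apart mechanically. Here is a minimal sketch in Python, assuming only the three sigils mentioned above; parse_matrix_id is a hypothetical helper of my own, not part of any Matrix library, and it does not attempt to validate the identifier against the full specification.

```python
# Sigils as described above: "@" for users, "#" for rooms, "+" for communities.
SIGILS = {"@": "user", "#": "room", "+": "community"}


def parse_matrix_id(identifier):
    """Split a Matrix identifier into (kind, localpart, server).

    Hypothetical helper for illustration only.
    """
    if not identifier or identifier[0] not in SIGILS:
        raise ValueError(f"unknown sigil in {identifier!r}")
    kind = SIGILS[identifier[0]]
    # Split on the first colon only, since the server part may carry a port.
    localpart, sep, server = identifier[1:].partition(":")
    if not sep or not localpart or not server:
        raise ValueError(f"expected sigil, name, ':' and server in {identifier!r}")
    return kind, localpart, server


print(parse_matrix_id("@jftsang:matrix.org"))  # ('user', 'jftsang', 'matrix.org')
```

Splitting on the first colon only means that a server given with a port number stays in one piece.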

(*) Ten years of relying almost exclusively on Facebook means that we tend not to have many of our friends’ email addresses. The situation was particularly bad when Facebook tried pushing their @facebook.com email addresses, which fortunately didn’t catch on.

I would recommend that anyone interested in a free-as-in-speech-and-as-in-beer IM service try this out; send me a message at @jftsang:matrix.org on Matrix, or give me your email address so that I may invite you.

The LaTeX psalm chant

LaTeX’s output, showing its hyphenation algorithms at work, makes me want to set my bibliography to plainchant:


[19] [20] [21] (./blasius.bbl
Underfull \hbox (badness 1210) in paragraph at lines 13--15
[]\OT1/cmr/m/sc/9 Andreotti, Bruno, Forterre, Yo[]el & Pouliquen, Oliver \OT1/c
mr/m/n/9 2013 \OT1/cmr/m/it/9 Gran-u-lar Me-dia\OT1/cmr/m/n/9 .
[22]
Underfull \hbox (badness 6396) in paragraph at lines 156--158
[]\OT1/cmr/m/sc/9 Peregrine, D. H. \OT1/cmr/m/n/9 1967 Long waves on a beach. \
OT1/cmr/m/it/9 Jour-nal of Fluid Me-chan-ics

Underfull \hbox (badness 5954) in paragraph at lines 180--184
[]\OT1/cmr/m/sc/9 Rajchenbach, Jean \OT1/cmr/m/n/9 2005 Rhe-ol-ogy of dense gra
n-u-lar ma-te-ri-als: steady, uni-form

Underfull \hbox (badness 10000) in paragraph at lines 180--184
\OT1/cmr/m/n/9 flow and the avalanche regime. \OT1/cmr/m/it/9 Jour-nal of Physi
cs: Con-densed Mat-ter
[23]) [24] (./blasius.aux)

Free speech and free beer

Facebook is in the news in both the UK and the USA for its dealings with Cambridge Analytica, in which Facebook users’ personal information was handed over to the political consultancy firm. There are a couple of problems with the way this is being reported in the media.

  • We, as mere users of Facebook, are not Facebook’s customers. We are its products. We refer to Facebook as a ‘service’ because we tend to think of it in much the same way as we think of the fire service, the armed forces or the NHS. But of these four things, only one is privately owned and largely unregulated.
  • This was not a data breach; it was a data leak, if not a data transfer.
  • I haven’t read Facebook’s terms of use in any detail recently, but it’s almost certain that they permit this sort of data transfer to third parties (or at least do not specifically forbid it). It’s not a secret.
  • Even if it were a secret, it should be unsurprising: the selling of personal information is Facebook’s business model. What do you think is paying for the upkeep costs of a website serving billions of people each day, doing so at zero price?
  • Although the #deletefacebook and #boycottfacebook campaigns are currently enjoying a surge in popularity and have the backing of prominent figures, they won’t result in immediate or significant change: I have said before that Facebook holds a monopoly on communication. This is exemplified by the fact that these campaigns are taking place, to a large extent, on Facebook.
  • Moreover, if we all migrate to a new social network (just as we have all left MySpace and Bebo), then the new company would be just as abusive towards its users. In any case, there is currently no centralised, zero-priced website that offers all of the things that Facebook offers its users: a soapbox, an address book, a calendar and an instant messaging service. Nor is there likely to be such a website in the future. This is a matter of economics; the ‘market’ for social networking websites admits natural monopolies.
  • As much as Brexit and Trump are regrettable, the fact that Cambridge Analytica was behind the two campaigns, helping to spread fake news and influencing the two votes to a strong degree, is a distraction. The real danger is our willingness to sign up to such a website in the first place.

For users of social networks, the only long-lasting solution is to break away from the centralised model and start using decentralised models, such as diaspora*. For instant messaging, using something like IRC would be a good first step towards decentralisation. As for having a soapbox, I run my own website (which you are reading now), because I want to have ownership and control over what I write. I have the freedom to amend or retract my content without the information being held in perpetuity by another party. I can make content accessible to a select group of friends without it also being accessible to any third parties (provided that I trust my friends, which by definition I do). Setting up one’s own website is not technically challenging, if one is happy to pay a monthly fee in exchange.

In summary,

‘Free as in beer’ implies ‘not free as in speech’.

That means you can’t expect to have a social network (or any other ‘service’) that does not charge you and yet entirely respects your digital rights.

Setting up a simple — and cheap — home backup system

TL;DR: Hardware: USB hard drive attached to a Raspberry Pi. Software: Unison (with some caveats). Internet connectivity: No-IP (for now).

I decided to set up a home backup system. The system should:

  • be accessible across the Internet, not just from home;
  • be suitable for multiple users (so that, for example, my housemate can use it as well);
  • be suitable for syncing between multiple computers, flagging up conflicts;
  • be cross-platform, especially on Linux machines but also Windows and OS X;
  • be ‘self-sufficient’, relying as little as possible on third parties, and being as open and standards-compliant as possible.

There are a lot of hard drives marketed as ‘home cloud’ solutions, which offer the first three points but not the last two. All the ones that I could find make you use their proprietary software, and given my tribulations with Time Machine, my distrust in black-box systems is once again where it should be. This software usually does not support Linux, and sometimes doesn’t even support OS X.

On the other extreme, plenty of people on forums such as Stack Overflow suggested simply using rsync, which is a bit too bare: in particular, it doesn’t do conflict resolution, and just overwrites stuff.

I eventually decided to construct my own system, using a 4 TB USB hard drive and a Raspberry Pi. The total cost, including peripherals for the Pi, was around £160, although I plan to install further drives, including an SSD. The power consumption is under 10 W, which works out as about £1/month, depending on electricity prices.
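The running cost is simple arithmetic. The sketch below takes the 10 W upper bound quoted above and an assumed tariff of 14p per kWh; the tariff is my assumption for illustration, so substitute your own.

```python
def monthly_cost_gbp(power_watts, price_per_kwh_gbp, days=30):
    """Cost in pounds of running a device continuously for `days` days."""
    energy_kwh = power_watts / 1000 * 24 * days
    return energy_kwh * price_per_kwh_gbp


# 10 W is the upper bound quoted above; 0.14 GBP per kWh is an assumed tariff.
cost = monthly_cost_gbp(10, 0.14)
print(f"{cost:.2f} GBP per month")  # 1.01 GBP per month
```

At 10 W the system uses 7.2 kWh over a 30-day month, so the cost scales directly with whatever the tariff happens to be.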

Here’s how I set it up.

Preparing the system

Once turned on, the Raspberry Pi mostly worked out of the box (although I had some difficulty following the instructions, and at first couldn’t work out how to turn it on!). I attached the peripherals and plugged it into a TV using an HDMI cable. I changed the default account’s password, changed the hostname (calling it resilience), set up an account for myself and one for my housemate, and turned on SSH. The Wi-Fi adaptor that came with it doesn’t seem to work (and I can’t figure out why), so I connected the Pi to the router via Ethernet. This required moving the Pi away from the TV, which is why I set up SSH first.

I prepared (a partition on) the drive by formatting it as ext4, and connected it to the Pi. I edited /etc/fstab so that the drive would be automatically mounted on startup. This particular drive is called asclepius and is mounted on /media/asclepius. I created a subdirectory for each user, making sure to set ownership and group-ownership as necessary.

The drive needs to have its own power supply, either built-in or by using a USB hub. The Pi by itself is unable to supply the required power through a USB connection.

Synchronisation software

Unison seems to provide many of the features that I need, and a reasonably friendly interface (although I haven’t tried the GUI, or the Windows version) as well as good documentation.

When installing Unison, one has to note that different versions are not cross-compatible. The Pi’s Raspbian repositories, as well as the machines at DAMTP, currently offer version 2.40.102. For my laptop, the Ubuntu repositories offer a later one, so I had to build version 2.40.102 myself. This wasn’t too difficult.

Addendum (7 November 2017): It turns out that even two builds of the same version of Unison may be incompatible with each other if they were compiled using different versions of OCaml, due to a change in OCaml’s serialisation format between versions 4.01 and 4.02 (and there’s no guarantee that further changes won’t happen). I therefore decided to also build ocamlopt from source instead of relying on the repositories, going with version 4.02.

I’m not sure what got updated in the Ubuntu repositories to break my setup, but there seem to have been similar problems with Homebrew users in the past. Annoyingly, because the serialisation formats differ, my reinstall of Unison has wiped the synchroniser state, so that the lengthy process of state detection must be repeated.

Therefore, let this be a lesson: When creating a new protocol, define your own file format properly, rather than relying on third-party algorithms.

Connecting to the outside world

In order to make resilience accessible from the outside world, I had to configure our router to forward SSH connections on port 22 to resilience.

Like most residential connections, ours has a dynamically assigned IP address, which changes from time to time. To get around this, I am temporarily using the free No-IP service, although we should contact our ISP to request a static IP address.

The connection to the outside world is quite slow, since it is going through a residential connection. The connection is sometimes unreliable and might break once every few hours, but Unison apparently handles interruptions well.

Further notes

Unison has a number of features for customisation, which I haven’t fully explored yet.

It might be useful, especially if I have more than two users on the system, to set up quotas on the disc. Alternatively, each person could supply their own disc.

Recovering a Time Machine backup onto Linux

When my laptop was stolen at the weekend, I was fortunate that most of my stuff had been backed up using Apple’s Time Machine, onto an external hard drive. However, Apple doesn’t make it easy to recover these files onto a non-OS X system.

I’ve tried a number of solutions, such as fuse-time-machine and tmfuse. While I’ve managed to copy across my text files (and fortunately most of my writing is in LaTeX), a lot of my binary files (PNGs and PDFs) have been corrupted in the process.

So, I’d welcome advice from anyone who’s successfully recovered data from a Time Machine backup.

Cambridge’s spam filter

I’ve been fascinated to learn about how SpamAssassin, the system used by Cambridge University’s email system, works. It assigns a score to each incoming message, based on the reputation of the sender (whether they are blacklisted or from trusted domains) and the contents of the message. Other technical flags are noted as well. If the score is sufficiently high then your email client will put that message into your ‘junk’ folder.

Here’s an example that I received a while ago. Interestingly, the flag LOTS_OF_MONEY doesn’t attract any score.

Received: from ppsw-42.csi.cam.ac.uk (ppsw-42-intramail.csi.cam.ac.uk [192.168.128.142])
	 by cyrus-1a.csi.private.cam.ac.uk (Cyrus v2.4.17) with LMTPA;
	 Wed, 05 Apr 2017 17:54:16 +0100
X-Sieve: CMU Sieve 2.4
X-Cam-SpamScore: ssss
X-Cam-SpamDetails: score 4.3 from SpamAssassin-3.4.1-1786853 
 * -0.0 RCVD_IN_DNSWL_NONE RBL: Sender listed at http://www.dnswl.org/, no
 *      trust
 *      [209.85.220.193 listed in list.dnswl.dnsbl.ja.net]
 *  0.5 RCVD_IN_SORBS_SPAM RBL: SORBS: sender is a spam source
 *      [209.85.220.193 listed in dnsbl.sorbs.net]
 * -0.0 RCVD_IN_MSPIKE_H3 RBL: Good reputation (+3)
 *      [209.85.220.193 listed in wl.mailspike.net]
 *  1.5 SUBJ_ALL_CAPS Subject is all capitals
 *  0.0 FREEMAIL_FROM Sender email is commonly abused enduser mail provider
 *       (faridsagbohan[at]gmail.com)
 * -0.0 BAYES_20 BODY: Bayes spam probability is 5 to 20%
 *      [score: 0.0738]
 * -0.1 DKIM_VALID_AU Message has a valid DKIM or DK signature from
 *      author's domain
 *  0.1 DKIM_SIGNED Message has a DKIM or DK signature, not necessarily
 *      valid
 *  1.4 PYZOR_CHECK Listed in Pyzor (http://pyzor.sf.net/)
 * -0.1 DKIM_VALID Message has at least one valid DKIM or DK signature
 *  0.0 LOTS_OF_MONEY Huge... sums of money
 * -0.0 RCVD_IN_MSPIKE_WL Mailspike good senders
 *  1.0 FREEMAIL_REPLY From and body contain different freemails
 *  0.0 T_MONEY_PERCENT X% of a lot of money for you
 *  0.0 MONEY_FRAUD_8 Lots of money and very many fraud phrases
X-Cam-ScannerInfo: http://help.uis.cam.ac.uk/email-scanner-virus
[...]
From: Rohany hosan 
Date: Wed, 5 Apr 2017 17:54:14 +0100
Message-ID: 
Subject: DEAREST FRIEND
To: undisclosed-recipients:;
Content-Type: text/plain; charset=UTF-8
Bcc: jmft2@cam.ac.uk
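As a check on how the total is built up, the per-rule scores in the header can be added together. This is a minimal sketch using a regular expression of my own devising (not SpamAssassin’s own parser), with the rule lines copied from above and most of the continuation lines omitted for brevity. The displayed scores are rounded to one decimal place, so a rule shown as 0.0 may carry a tiny non-zero weight.

```python
import re

# Rule lines copied from the X-Cam-SpamDetails header above.
DETAILS = """\
 * -0.0 RCVD_IN_DNSWL_NONE RBL: Sender listed at http://www.dnswl.org/, no
 *      trust
 *  0.5 RCVD_IN_SORBS_SPAM RBL: SORBS: sender is a spam source
 *      [209.85.220.193 listed in dnsbl.sorbs.net]
 * -0.0 RCVD_IN_MSPIKE_H3 RBL: Good reputation (+3)
 *  1.5 SUBJ_ALL_CAPS Subject is all capitals
 *  0.0 FREEMAIL_FROM Sender email is commonly abused enduser mail provider
 * -0.0 BAYES_20 BODY: Bayes spam probability is 5 to 20%
 *      [score: 0.0738]
 * -0.1 DKIM_VALID_AU Message has a valid DKIM or DK signature from
 *  0.1 DKIM_SIGNED Message has a DKIM or DK signature, not necessarily
 *  1.4 PYZOR_CHECK Listed in Pyzor (http://pyzor.sf.net/)
 * -0.1 DKIM_VALID Message has at least one valid DKIM or DK signature
 *  0.0 LOTS_OF_MONEY Huge... sums of money
 * -0.0 RCVD_IN_MSPIKE_WL Mailspike good senders
 *  1.0 FREEMAIL_REPLY From and body contain different freemails
 *  0.0 T_MONEY_PERCENT X% of a lot of money for you
 *  0.0 MONEY_FRAUD_8 Lots of money and very many fraud phrases
"""

# A rule line is "*", a signed decimal score, then an upper-case rule name.
# Continuation lines have no score after the "*", so they do not match.
RULE = re.compile(r"^[ \t]*\*[ \t]+(-?\d+\.\d+)[ \t]+([A-Z0-9_]+)", re.MULTILINE)


def parse_rules(details):
    """Return a list of (rule_name, score) pairs found in the header text."""
    return [(name, float(score)) for score, name in RULE.findall(details)]


rules = parse_rules(DETAILS)
total = round(sum(score for _, score in rules), 1)
print(len(rules), total)  # 15 4.3
```

The sum agrees with the ‘score 4.3’ reported at the top of the header, and most of it comes from just four rules: SUBJ_ALL_CAPS, PYZOR_CHECK, FREEMAIL_REPLY and RCVD_IN_SORBS_SPAM.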

Some privacy practices when using Facebook

Tor

I have recently started using the Tor Browser when browsing Facebook and Twitter (although I need to get into the habit of doing so consistently). Amongst other things, Tor can help to protect against IP address tracking (although it is not bulletproof!). I find it unnerving that Facebook and Twitter are able to discern who my housemates and officemates are, even if I have had almost no interaction with them. (I don’t mind being associated with my particular housemates and officemates at the moment, but some people may mind that.)

Mobile website and JavaScript

I also use the mobile Facebook website on my laptop, and avoid it completely on my phone. The mobile Facebook website can run without JavaScript, which you can disable using NoScript, a Firefox extension which is automatically shipped with the Tor Browser. Disabling JavaScript protects against some of their data collection habits, such as cursor tracking and reading from unsubmitted forms.

Facebook is not the only website that practises such habits with JavaScript: another example is described here. The Independent’s website is particularly bad as far as JavaScript is concerned: many articles on their site do not display properly unless you allow a load of JavaScript from third parties, including many advertisers.

Clickjacking

When you click on outgoing weblinks on Facebook, they do not directly take you to the desired website. Instead, they take you first via a tracking page, whose URL you can see at the bottom of the browser window when you hover over the link.

This intermediate tracking page allows Facebook to know what links you have clicked on, at what time and in what context. (A browser doesn’t send this information to the webserver of the page containing the link.) To partially overcome this, I give the full URL when posting a link. A reader can then copy this URL and go to it directly, skipping the intermediary.
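A reader who is handed a tracked link can also recover the destination without visiting the intermediary. The sketch below assumes that the tracking link carries the destination, percent-encoded, in a query parameter named u, as in links of the form l.facebook.com/l.php?u=...; the example link and its h value are made up, and unwrap_redirect is a hypothetical helper of my own.

```python
from urllib.parse import parse_qs, urlsplit


def unwrap_redirect(url, param="u"):
    """Return the destination carried in a redirect link's query string.

    If the parameter is absent, the URL is returned unchanged.
    """
    query = parse_qs(urlsplit(url).query)
    targets = query.get(param)
    return targets[0] if targets else url


# Made-up example of a tracked link; the parameter names are an assumption.
tracked = "https://l.facebook.com/l.php?u=https%3A%2F%2Fexample.org%2Fpost&h=abc123"
print(unwrap_redirect(tracked))  # https://example.org/post
```

Nothing is fetched over the network here; the destination is simply decoded from the link text itself.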

Since many users are not aware of Facebook’s outlink tracking, it should be considered a form of clickjacking.

Conclusion

These practices are a start, in lieu of managing to persuade one’s friends to migrate to a more private (and oftentimes more open, somewhat ironically) platform. Of course, Facebook may still harvest information about me, including anything I post, the things that I click on (or do not click on), and anything said about me by other people.

By the way, you should care about privacy even if you are not a criminal and have ‘nothing to hide’. That claim is patently false: we all have secrets which we would like to keep, even if they are not illegal. Society has stigmas, and an unscrupulous government or employer could use its illicitly-obtained knowledge about your relationship, which you might like to keep quiet about, to blackmail you. And while it’s unlikely that any Facebook employee would personally be interested in your life, the psychological profiling that they may obtain from tracking your cursor could be very lucrative for an insurance company. And, of course, we often talk about other people behind their backs, in unflattering ways.

What techies are missing in the debate over surveillance

I recently started volunteering for Julian Huppert’s campaign to become the Liberal Democrat MP for Cambridge. (For more on that, go to their website.) Some of the other volunteers were members of the tech sector; as such, they used a lot of encryption in their work, and had a lucid understanding of how encryption works. Of course, we are all very worried by the attitudes towards Internet surveillance and encryption that Theresa May and the Conservative Party seem to hold. These include last year’s Snoopers’ Charter, which gives the option of requiring ISPs to hand over users’ browsing history to the state (and not just to police and security agencies, but also to other, unrelated, branches of government). More recently, the section on digital issues in the Conservative Party manifesto* contains rather troublesome proposals, including

  • Verify, a single digital ID system to be used for both government services and private services such as banking, and
  • the words ‘we do not believe that there should be a safe space for terrorists to be able to communicate online and will work to prevent them from having this capability’.

Unsurprisingly, the Manchester bombing last week will be used to justify activating the Snoopers’ Charter (but only after the election, of course!).

* I actually rather like some other parts of that section in the manifesto, especially ‘central and local government will be required to release information regularly and in an open format’; such a process would be long and costly, but would be very useful for future policymakers.

Like me, Julian and many others, they were quick to point out how heavily encryption is used in day-to-day, perfectly innocuous transactions over the Internet. (See also this piece by the web company Mythic Beasts.) We also knew how surveillance or web censorship could be defeated, using freely available tools such as Tor. Despite all these things, the Tory attitude towards Internet surveillance remains popular; Labour and the SNP abstained in the vote over the Snoopers’ Charter.

Why are we doing so poorly in this argument? One reason is that

The widespread public understanding of encryption is not accurate.

Or, more facetiously:

The debate over encryption is not a debate over encryption.

Okay, my use of the phrase ‘widespread public understanding of encryption’ may be a little hyperbolic, since I can’t speak for the country as a whole. But I think it’s clear that plenty of people don’t understand that normal people use encryption, not just criminals, perverts and terrorists. In some ways, this is laudable: it illustrates how computer and software manufacturers have been able to preconfigure their systems so that people can use them safely without having to think about all the processes (like encryption) that go on under the bonnet. The fact that computing is so accessible is a good thing. One should not need an understanding of mechanical engineering and combustion chemistry in order to drive a car.

However, the same sort of accessibility means that there is a large disjunction between how most people use their computers, and how techies use them. (I know ‘techies’ is a very loose term.) It’s true that policies such as censorship, surveillance and ‘bans’ on encryption can be defeated easily by those with the technical know-how. This doesn’t mean that the policy is moot, because

The effectiveness or otherwise of any policy depends on social factors as well as its technical merits.

Many people will go along with these authoritarian digital policies, reasoning along the lines of ‘I have nothing to hide, so I have nothing to fear’, or ‘we should do anything to keep our children and our country safe’. How else is it that the Great Firewall of China manages to keep a billion people in check, despite its many weaknesses?

The upcoming election may be a fait accompli as far as this issue is concerned. Labour is not devoted to protecting digital liberties, while the Conservatives are keen to abolish them. (Perhaps a third party, either in a coalition or in opposition, may be strong enough to moderate the government on this issue, but neither the LibDems nor the Greens are likely to be strong enough to do that effectively.) As we continue campaigning on this issue until and after the election, we must not focus too much on the technical weaknesses. In doing so, we’d risk blinding people with endless facts about Tor, VPNs, RSA and other obscure three-letter words and acronyms. Instead, we must focus on the social harms of a surveillance state and the benefits of personal privacy (including as a matter of LGBT+ rights).

Petition to the UK government: ‘Recognise the importance of citizens’ access to encryption’

I’ve just submitted a petition (indeed, my first) to the UK government. The petition is still in the sponsorship stage, but you can click this link to sign it. Once it becomes live I shall put the updated link here. Update: the petition became live on 7 April, and can be found here. The text is below:

The government must recognise the personal and economic benefits to encryption, and that any backdoor into WhatsApp cannot remain exclusive to GCHQ, but would soon become known to foreign intelligence services or criminal groups.

Home Secretary and Europol are demanding companies such as WhatsApp to install backdoors so that security services may read suspected terrorists’ messages. (Times, 27.03.17) The UK government may have ‘noble’ aims, but any backdoor would soon be found by the Russian or Chinese intelligence services. This would make the UK vulnerable to economic espionage, and have a chilling impact on dissidents in those countries. It could also be exploited by groups such as Anonymous, which may use intercepted messages to harass vulnerable groups such as LGBT+ people.

Unfortunately, the petition had a character limit, so here are a few more words about the issue.

The petition is in response to the demand by the Home Secretary, Amber Rudd, that messaging services such as WhatsApp, Telegram and Apple iMessage, which offer end-to-end encryption for their users, open up backdoors for the UK security services (and to her plans to force them to do so). The demand is ostensibly a response to the reports that the Westminster attacker Khalid Masood used WhatsApp to communicate, possibly in order to plan the attack (although this is not known). The government argues that this is just the modern equivalent of the traditional practice of steaming open the envelopes carrying letters of suspected criminals, but the analogy is a poor one. Never did the police have the power to systematically steam open all envelopes, without supervision. They were also subject to limited jurisdiction; the American or Russian police had no right to enter a British post office and open the letters there.

The adage that ‘if you have nothing to hide then you have nothing to fear’ would be a valid argument iff (a) the British security services were the only people with the means to read your communications, and (b) their only motives were to prevent crime and terrorism, for some suitable definition of ‘crime’ and ‘terrorism’. The first assumption is a terrible one. There have been countless examples of individuals or small groups finding weaknesses in widely-used software, such as the Heartbleed bug and Shellshock. What is there to stop a third party from finding and opening a backdoor intended only for GCHQ? It is a longstanding principle of cryptography that ‘security by obscurity’ offers very little security. Once the weakness becomes available to others, the second assumption also goes out of the window. Unfortunately, the Russian and Chinese police and intelligence agencies have rather different ideas about what counts as ‘terrorism’. By forcing messaging companies to open up loopholes in their encryption, the UK government would be indirectly supporting the surveillance mechanisms of those states.

In fact, even the UK’s police and intelligence services should not be idolised (although it was tempting to do this after a police officer died in the Westminster attack). A day before the attack, it was reported that the Met Police spied on Greenpeace activists, in coordination with Indian police and mercenary crackers. Greenpeace may have more destructive elements, but these activists were mostly peaceful protestors and the surveillance could not be justified as being in order to pre-empt a criminal act.

Moreover, groups such as Anonymous have habitually practised the ‘doxing’ of individuals, as in the Gamergate controversy, releasing personally sensitive information about other people. For example, some gay and transgender people have been threatened with being outed, as a means of blackmailing or otherwise harassing them. Being gay or transgender isn’t illegal in most of the West, but it can still have a social stigma that is strong enough to make this an effective tactic. This sort of abuse would only become much more common if its practitioners were able to intercept the messages of vulnerable people. Hence, privacy should be regarded as an LGBT+ issue as well.

A purely military solution cannot win a war. This truth has been expounded by military thinkers such as Sun Tzu and Clausewitz, and we continue to learn it the hard way. In the warfare of the computer era, a purely technical solution can be no better. A backdoor may help the police find the motives and co-conspirators of Khalid Masood in this instance, but it cannot be seen as a panacea for terrorism. People will still become terrorists or dissidents if they are drawn by political or social causes, and it is at these that we must strike.