I am sure this has been discussed many times before: what is the best cipher for SCP to use so you get maximum throughput? I decided to do a quick survey of the current state of the art and share it.
Some versions of SSH, including the one distributed in Debian Jessie, have a cool new option which lists all the supported ciphers.
ssh -Q cipher
In case yours does not, you can take the list from 'man ssh_config', as described further down.
So I put together a little bash script to run on my Debian server (nothing special, a single-CPU i5 with an SSD) to copy the same 500 MB file (a Debian ISO) to a target computer using every supported cipher. First I tried this against a Raspberry Pi, and then against my MacBook Pro connected to the same gigabit switch. Here is the script for your amusement and pleasure.
#!/bin/bash
for i in `ssh -Q cipher`
do
  # Show the command, then time the copy with this cipher.
  echo "scp -o Ciphers=$i ./debian-live-7.7.0-amd64-standard.iso root@pi1:/mnt/sda1"
  time scp -o Ciphers=$i ./debian-live-7.7.0-amd64-standard.iso root@pi1:/mnt/sda1 2>&1
  echo ""
  echo ""
done
If your particular system does not support the -Q option, you can copy and paste the list of ciphers from 'man ssh_config' and modify the second line of the script to read something like
for i in aes128-ctr aes192-ctr aes256-ctr arcfour256 arcfour128 aes128-gcm@openssh.com aes256-gcm@openssh.com aes128-cbc 3des-cbc blowfish-cbc cast128-cbc aes192-cbc aes256-cbc arcfour
First observation: not all ciphers that the ssh client supports are enabled by default on all ssh servers, so you will get a bunch of errors. I am not going to enable the additional ciphers on the servers, since this is more of a practical guide.
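If the errors get noisy, one option is to test only the ciphers both ends accept. Here is a minimal sketch, assuming you have somehow obtained the server's cipher list as a comma-separated string (for example from the "ciphers" line that `sshd -T` prints on the server). The function name `common_ciphers` and the sample lists are made up for illustration.

```shell
#!/bin/bash
# Print the ciphers that appear in both lists, keeping the client's order.
# $1: client ciphers, whitespace-separated (e.g. the output of `ssh -Q cipher`)
# $2: server ciphers, comma-separated (assumed to be collected by hand)
common_ciphers() {
  local client="$1" server=",$2," c
  for c in $client; do
    # Wrapping both sides in commas avoids partial-name matches.
    case "$server" in
      *",$c,"*) echo "$c" ;;
    esac
  done
}

# Example with made-up lists: only the ciphers present on both sides are printed.
common_ciphers "aes128-ctr arcfour aes256-cbc" "aes128-ctr,aes256-ctr,aes256-cbc"
```

The output of this function could then replace the list in the `for i in ...` line of the script.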
Raspberry SCP results
Here are the throughput results copying the 500 MB file from the PC to Pi and vice versa.
|SCP Throughput with various algorithms using the SD card as storage on the Pi|
Using the USB attached 7200RPM HDD on the Pi as a storage device
|SCP Throughput with various algorithms using USB 7200RPM HDD as storage|
|CPU System usage SD vs HDD|
So what about User CPU per algorithm?
I did get some really interesting results here, not from the Pi but from the PC. The Pi looked pretty much as expected, with less CPU being used by the algorithms that achieved higher throughput.
|CPU Used by Cipher on the Raspberry Pi|
On the PC, the data makes almost no sense at first. The worst performing algorithms in terms of throughput seem to use the least CPU. So what gives? How is it possible for chacha to be using the most CPU yet have the highest throughput?
Acceleration, my dear Watson, acceleration. The Intel CPU on my PC supports the AES-NI instruction set, which speeds up AES-based ciphers, especially the GCM family. According to Wikipedia, the acceleration provides an 'increase in throughput from approximately 28.0 cycles per byte to 3.5 cycles per byte'.
This acceleration is not available on the Raspberry Pi CPU, which is actually the bottleneck in this transfer. Since the Intel is not hitting anywhere near a hundred percent, the fact that chacha is using 7.77% vs aes128-gcm at 1.55% makes no difference to our session performance.
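To put the Wikipedia figures in perspective, here is a back-of-the-envelope sketch that converts cycles per byte into a theoretical single-core throughput. The 3.0 GHz clock is a hypothetical value, not the clock of my PC, and `cpb_to_mbs` is just a helper name I made up.

```shell
#!/bin/bash
# Theoretical single-core throughput in MB/s.
# $1: clock speed in GHz (hypothetical, pick your own)
# $2: cost of the cipher in cycles per byte
cpb_to_mbs() {
  # GHz * 1e9 cycles/s / (cycles/byte) / 1e6 bytes/MB = GHz * 1000 / cpb
  awk -v ghz="$1" -v cpb="$2" 'BEGIN { printf "%.1f\n", ghz * 1000 / cpb }'
}

cpb_to_mbs 3.0 28.0   # software AES: prints 107.1
cpb_to_mbs 3.0 3.5    # with AES-NI:  prints 857.1
```

Whatever clock you plug in, the ratio between the two results stays at 28.0 / 3.5 = 8.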
The clear winner here, when considering single session performance between a Pi and a lightly loaded modern x86, is chacha20-poly1305@openssh.com. If, however, the PC is doing a lot of other work, supporting many sessions, and generally has its CPU peaking frequently, something like aes128-ctr may be a better choice, since it gives us a good balance between throughput and CPU usage.
PC and SCP - results (beta)
So now that we have looked at the strengths and limitations of the Raspberry Pi, I figured we should take a step further and check out what modern PCs can do. The setup in this case is
MacBook Pro <-- 5 GHz WiFi --> Router <-- Gig Switch --> Linux PC
You may question the wisdom of using WiFi. I would.
Well, it turns out that I only have a 100 Mb/s USB2 adapter to use as a wired connection on the Mac. And as you will shortly see, the WiFi actually gives us pretty high throughput. Not quite a gig, but halfway there. Good enough for significant results. I will repeat the test when I go back to work after the holidays and publish a follow-up. Maybe even try to push a 10 Gig pipe.
So I did a quick wireless survey, found a clean 5 GHz channel, and put my MacBook about a foot away from the router. Here is what we saw.
PC -> Mac
The results are very interesting, but let's look a little deeper to find out more.
Let's examine the CPU utilization by cipher on the client machine. This will tell us how well the algorithm scales to multiple connections. This chart shows the relative CPU percentage utilization during the copy, scaled by its duration. The empty space between the top of the column and 100% can roughly be interpreted as idle time.
Note: this should be interpreted as the utilization of a single core
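For anyone who wants to reproduce the chart, here is a small sketch of how such a percentage can be derived from the three numbers `time` reports. I am assuming that "scaled by the duration" means (user + sys) / real; the helper name `cpu_pct` and the sample values are made up.

```shell
#!/bin/bash
# CPU utilization of one core, in percent, for a single transfer.
# $1: real (wall clock) seconds, $2: user seconds, $3: system seconds
cpu_pct() {
  awk -v real="$1" -v user="$2" -v sys="$3" \
    'BEGIN { printf "%.1f\n", (user + sys) / real * 100 }'
}

# Made-up example: a 10 s copy that burned 1.5 s user + 0.5 s system time.
cpu_pct 10.0 1.5 0.5   # prints 20.0, i.e. the core was roughly 80% idle
```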
I tried to think of better ways to analyze and present the data, but in the end I felt I can't tell the story without a raw representation. It tells the story better than any massaging could. This shows the amount of clock time (real), user time (cipher) and system time each algorithm used.
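If you want to turn the raw numbers into throughput yourself, a rough sketch follows. It assumes the default bash `time` format (values such as 0m20.000s) and a file size given in MB. Both helper names and the sample values are hypothetical, not taken from my runs.

```shell
#!/bin/bash
# Convert a bash `time` value such as "1m2.500s" into seconds.
to_seconds() {
  echo "$1" | awk -F'm' '{ sub(/s$/, "", $2); printf "%.3f\n", $1 * 60 + $2 }'
}

# Throughput in MB/s. $1: file size in MB, $2: elapsed seconds
throughput_mbs() {
  awk -v mb="$1" -v s="$2" 'BEGIN { printf "%.1f\n", mb / s }'
}

to_seconds 1m2.500s                              # prints 62.500
throughput_mbs 500 "$(to_seconds 0m20.000s)"     # prints 25.0
```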
Mac -> PC
For completeness' sake, I also ran the test from Mac -> PC. Only 3 ciphers were supported, essentially the AES-xxx-CBC family. The results followed exactly the same pattern, with slightly better performance. Yawn.
Here are the graphs
|Time consumed by each function in seconds|
When dealing with fully featured modern machines, I would probably choose AES-256-CBC. It seems to have the best throughput/cpu utilization ratio and provides pretty decent security.