/Teaching/System Level Programming/Assignments/A6


Pull from upstream before solving this task.


Task 6.1: Inline Assembly

Assembly Language

Programs written in high-level programming languages must first be translated into CPU instructions by a compiler before they can be executed. Language constructs and statements of high-level programming languages are independent of the CPU architecture and have no fixed relationship to individual CPU instructions. The compiler is responsible for selecting and emitting the instructions required for the task at hand.

Assembly languages are low-level programming languages. Unlike their high-level counterparts, they are not architecture-independent but target a specific instruction set. While a language construct in a high-level language may be compiled into any number of instructions, each statement in an assembly language is translated into one specific CPU instruction. An assembly language can therefore be thought of as a mapping between human-readable mnemonics and the binary opcodes understood by the CPU (e.g. jmp = 0xE9). A compiler for an assembly language is called an assembler.

While it is usually far more efficient and practical to write code in high-level languages, some things still require assembly instructions because they cannot be expressed in those languages. This is especially true for CPU hardware features that can only be accessed through special instructions, for instance the rdtsc instruction, which you may use in the Operating Systems course for precise timing, or the cpuid instruction discussed below. With the GCC C compiler, normal C code can be combined with assembly code in the same source file or even the same function by using GCC inline assembly. The first task of this exercise lets you gather experience with inline assembly.
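As a small illustration of mixing C and assembly, the following sketch wraps the rdtsc instruction in a C function using GCC inline assembly. The function name read_tsc is made up for this example; the code assumes GCC (or a compatible compiler) on x86-64.

```c
#include <stdint.h>

/* Hypothetical helper: read the CPU's time-stamp counter.
 * rdtsc returns the low 32 bits in eax and the high 32 bits in edx,
 * so both registers are declared as outputs ("=a" and "=d"). */
static inline uint64_t read_tsc(void)
{
    uint32_t lo, hi;
    __asm__ volatile("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
}
```

Because both registers written by rdtsc appear as output operands, no additional clobber entries are needed here. The volatile qualifier keeps the compiler from merging or removing repeated reads.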

The cpuid instruction

On the x86 architecture, the cpuid instruction can be used to retrieve information about the CPU that a program is running on. This includes information about the processor type and manufacturer, as well as information about features and instruction set extensions that are supported by the CPU. The specific category of information to be retrieved can be selected by setting the eax register (and in some cases also the ecx register) to various ID values before executing the cpuid instruction. The CPU then provides the requested information in the eax, ebx, ecx and edx registers.
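The register protocol described above can be sketched as follows. This example only asks for the highest supported basic ID value (returned in eax for ID 0), which is not one of the values required by the task. The function name cpuid_max_basic_leaf is hypothetical, and the code assumes GCC on x86-64.

```c
#include <stdint.h>

/* Hypothetical helper: execute cpuid with eax = 0 and ecx = 0 and return
 * the value the CPU leaves in eax (the highest supported basic ID).
 * eax and ecx are both read and written ("+"), ebx and edx are only
 * written ("="), so all four registers are known to the compiler. */
static uint32_t cpuid_max_basic_leaf(void)
{
    uint32_t eax = 0, ebx, ecx = 0, edx;
    __asm__ volatile("cpuid"
                     : "+a"(eax), "=b"(ebx), "+c"(ecx), "=d"(edx));
    (void)ebx; /* unused in this sketch */
    (void)edx; /* unused in this sketch */
    return eax;
}
```

Every register that cpuid overwrites is listed as an operand, so the compiler does not keep live values in them across the instruction.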

Your Task

Use the cpuid instruction to read information about the CPU (5 Points)

Write a program that uses the cpuid instruction to read the following pieces of information about the CPU:

  • Manufacturer ID string (e.g. ‘AuthenticAMD’)
  • Processor brand string (e.g. ‘AMD Ryzen 9 3900X 12-Core Processor’)

You should also determine which of the following features are supported by reading the corresponding feature flags (for more information about these flags, see the resources). It is sufficient to stick to Intel's layout for the feature flags.

  • DE
  • PAE
  • HTT
  • TM
  • EST
  • AVX2
  • SMEP
  • RDSEED

Print the retrieved information to stdout via the printf calls provided as comments in the code (see file cpuid/cpuid.c). Do not use any compiler intrinsics or library functions to call the cpuid instruction, only inline assembly. Make sure to properly handle all inputs, outputs, and clobbered registers in your inline assembly blocks. Verify your results by comparing them with the output of cat /proc/cpuinfo.
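Each feature flag is a single bit in one of the registers returned by cpuid. A minimal sketch of testing such a bit, using a hypothetical helper flag_is_set and two flags that are not part of the task (in the feature information returned for eax = 1, FPU is edx bit 0 and SSE2 is edx bit 26):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: return true if bit number `bit` (0..31) is set
 * in the 32-bit register value `reg`. */
static bool flag_is_set(uint32_t reg, unsigned bit)
{
    return (reg >> bit) & 1u;
}
```

For a made-up edx value of 0x04000001, flag_is_set(edx, 0) and flag_is_set(edx, 26) are true, while flag_is_set(edx, 6) is false. The bit positions of the flags required by the task have to be looked up in the resources.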

Resources

Task 6.2: ABI – Calling Conventions

Calling Conventions

Practically all programs are split into functions that can be called from anywhere in the code. For this to work, the caller and the called function (the callee) need a common set of rules that define, for example, where the parameters are stored (in registers or on the stack …), where the return value is stored, and which registers the callee may use without saving their previous values. These rules are called Calling Conventions and are part of the Application Binary Interface (ABI). The ABI is similar to an API (Application Programming Interface), but at the instruction level, and it depends heavily on the architecture and compiler in use. On x86 64-bit Linux, the standard calling convention is the System V AMD64 ABI. (32-bit Linux and Windows use different calling conventions.) The aim of this part of the exercise is to familiarize yourself with the System V AMD64 ABI. You will need this understanding later, for example to start new threads in the Operating Systems course.
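To make the idea concrete, here is a minimal sketch of a callee written in assembly and called from C. Under the System V AMD64 ABI, the first two integer arguments arrive in rdi and rsi and the return value is passed back in rax. The function name add_two is hypothetical; the assembly is embedded as a top-level asm statement only to keep the example in a single C file, and it assumes GCC on x86-64 Linux.

```c
/* Declaration seen by the C compiler; the body is provided in assembly. */
long add_two(long a, long b);

__asm__(
    ".pushsection .text\n"
    ".globl add_two\n"
    ".type add_two, @function\n"
    "add_two:\n"
    "    leaq (%rdi,%rsi), %rax\n"   /* rax = a (rdi) + b (rsi) */
    "    ret\n"                      /* return value is in rax */
    ".size add_two, .-add_two\n"
    ".popsection\n"
);
```

A C caller can simply write add_two(2, 3); the compiler places the arguments in rdi and rsi and reads the result from rax, because both sides follow the same convention.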

Your Tasks

Task A: Call a function in inline assembly (5 Points)

Familiarize yourself with the System V AMD64 64-bit calling convention and implement a function call using only assembly instructions via GCC inline assembly. Take care not to interfere unintentionally with code outside of your inline assembly block: use the clobber list to notify the compiler about potentially modified registers. (If you call a function in an inline assembly block, all side effects of that function call also need to be considered. Hint: see the calling convention for potential effects you need to take into account.)
Use the provided framework in a_caller/caller.c and see the comments for details on what you need to implement.
Do not use C code for this task, only inline assembly!
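The role of the clobber list can be illustrated with a single instruction instead of a function call. The sketch below uses mulq, which multiplies rax by its operand and writes the 128-bit result to rdx:rax. Since rdx is overwritten but is not an operand, it has to be named as clobbered. The function name mul_low64 is hypothetical; the code assumes GCC on x86-64.

```c
#include <stdint.h>

/* Hypothetical helper: return the low 64 bits of a * b using mulq.
 * "+a": rax is both input (a) and output (low half of the product).
 * "rdx": receives the high half of the product, so it is clobbered.
 * "cc":  mulq modifies the flags register. */
static uint64_t mul_low64(uint64_t a, uint64_t b)
{
    uint64_t result = a;
    __asm__("mulq %[factor]"
            : "+a"(result)
            : [factor] "r"(b)
            : "rdx", "cc");
    return result;
}
```

Without the "rdx" entry the compiler would be free to keep a live value in rdx across the instruction. A function call can modify considerably more state than this one instruction, which is what the hint above refers to.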

Task B: Implement a function executing a syscall in assembly (5 Points)

Syscalls

System calls (syscalls for short) are well-defined interfaces for communicating with the operating system. Some functionality, such as printing text to the console, requires syscalls because the operating system restricts direct access to the console from userspace. On x86 64-bit, syscalls are triggered by the syscall instruction, which reads the value of the rax register to determine which syscall should be executed. After the system call completes, the operating system passes control back to the userspace program. System calls are a central part of operating system design, and you will get in touch with this concept during this task. The syscall numbers for Linux systems can be found here. The calling convention for syscalls (where the arguments are passed, where the return value goes, etc.) can be found in the man pages: man 2 syscall. In addition, the system call clobbers the registers rax, rcx and r11, so make sure to preserve their contents if you need them.
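As an illustration of the mechanism (using a different system call than the one required in this task), the following sketch issues the write system call directly. On x86 64-bit Linux, write has the number 1, the arguments are passed in rdi, rsi and rdx, and the result is returned in rax, with errors reported as negative errno values. The function name raw_write is hypothetical; the code assumes GCC on x86-64 Linux.

```c
/* Hypothetical helper: invoke the Linux write syscall without libc.
 * rax = syscall number (1 = write), rdi = fd, rsi = buffer, rdx = length.
 * rcx and r11 are overwritten by the syscall instruction, and "memory"
 * tells the compiler that the kernel reads the buffer. */
static long raw_write(int fd, const void *buf, unsigned long len)
{
    long ret;
    __asm__ volatile("syscall"
                     : "=a"(ret)
                     : "a"(1L), "D"((long)fd), "S"(buf), "d"(len)
                     : "rcx", "r11", "memory");
    return ret;
}
```

For example, raw_write(1, "hi\n", 3) writes three bytes to stdout, and a call with an invalid file descriptor such as -1 returns -9 (-EBADF).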

Implementation

In this part, your task is to implement the function fsize in assembly in order to get to know the receiving end of a function call. The function issues the fstat system call and reports the size of the file.
Use x86 64-bit assembly to implement the function in b_callee/sysv_abi.S. Follow the System V AMD64 64-bit calling convention as in the previous task and take care to, for example, save and restore registers as required. You are not allowed to call functions in the assembly file, only syscalls.
Information about the syscall fstat can be found on the man page: man 2 fstat.
Information about how to store and access data from memory (in our case, global memory) can be found here. You will need to define your memory in the .bss section, where you can reserve memory using .zero size. Putting it together:

.bss
your_memory: .zero 8

 

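/* C equivalent of the function to implement in assembly. */
#include <sys/stat.h>
#include <sys/types.h>

struct stat fstat_memory;

ssize_t fsize(int fd, ssize_t *size) {
    int err = fstat(fd, &fstat_memory);
    if (err) {
        return 1;                       /* fstat failed */
    } else {
        *size = fstat_memory.st_size;   /* report the file size to the caller */
        return 0;
    }
}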
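Reserving memory in .bss and addressing it from assembly can be sketched as follows. The label scratch and the function scratch_slot are hypothetical names for this example. The assembly is embedded as a top-level asm statement only to keep the sketch in one C file (in the assignment the code lives in a .S file), and it assumes GCC on x86-64 Linux.

```c
/* Returns the address of an 8-byte, zero-initialized slot in .bss. */
long *scratch_slot(void);

__asm__(
    ".pushsection .bss\n"
    ".balign 8\n"
    "scratch: .zero 8\n"                /* reserve 8 zeroed bytes */
    ".popsection\n"
    ".pushsection .text\n"
    ".globl scratch_slot\n"
    ".type scratch_slot, @function\n"
    "scratch_slot:\n"
    "    leaq scratch(%rip), %rax\n"    /* rip-relative address of the slot */
    "    ret\n"
    ".size scratch_slot, .-scratch_slot\n"
    ".popsection\n"
);
```

The rip-relative form scratch(%rip) works in position-independent executables as well. The same addressing mode can be used with mov to load from or store to the reserved memory.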

Resources for GCC inline assembly

Inline Assembly Overview
GCC Inline assembly documentation
Output / Input operands
Clobber list documentation
Input/output constraints for GCC inline assembly

Resources for the System V ABI

System V AMD64 ABI overview

System V ABI on OSDev

Specification of the System V AMD64 ABI (Please note that this is the full specification document and thus quite comprehensive. It is better to stick to the previous resources, as they summarize the most important aspects, and to come back to the specification only if you are looking for details that go beyond what this assignment requires.)

Building

For compiling your programs follow these steps:

  • Open the A6 folder in a terminal.
  • Create a new directory called build in A6 using mkdir build and enter it.
  • In build call cmake .., this will set up the build environment.
  • Execute make. This compiles your code. The executables are put into the corresponding subdirectories.
  • Do not push the build directory or its content to your git repository.

Debugging

In order to debug your submission, you can use gdb with the peda plugin. This provides a very convenient overview of register contents, which may be helpful while solving the tasks.

Submission

Develop your solution in the A6 folder in your git repository and use the provided files. Changes to your Makefiles won’t be included in the test system. Do not change the given source files except for the part marked with TODO.
Tag your submission with A6 and push it to the server. Your submission will be tested automatically.

Assignment Tutor

If you have any questions regarding this assignment, feel free to ask on Discord (or, as a fallback, by email to slp@iaik.tugraz.at). If you have a more direct question regarding your specific solution, you can also ask the assignment tutor:
Benedikt Kantz, benedikt.kantz@student.tugraz.at