Operating Systems
Petr Ročkai

Organisation
• lectures only, no seminar
• written exam at the end
  — multiple choice
  — free-form questions
• 1 online test mid-term, 1 before exam
  — mainly training for the exam proper

Operating Systems 2/746 Preliminaries

Mid-Term and End-Term Tests
• 24 hours to complete, 2 attempts possible
• 10 questions, picked from review questions
  — mid-term first 24, end-term second 24
• you need to pass either mid-term or end-term
• 7 out of 10 required for mid-term, 8 of 10 for end-term
• preliminary mid-term date: 18th of April, 4pm

Study Materials
• this course is undergoing a major update
• lecture slides will be in the IS
  — they will be added as we go
• you can also use slides from previous years
  — they are already in study materials
  — but: not everything is covered in those

Books
• there are a few good OS books
• you are encouraged to get and read them
• A. Tanenbaum: Modern Operating Systems
• A. Silberschatz et al.: Operating System Concepts
• L. Skočovský: Principy a problémy OS UNIX
• W. Stallings: Operating Systems, Internals and Design
• many others, feel free to explore

Topics
1. Anatomy of an OS
2. System Libraries and APIs
3. The Kernel
4. File Systems
5. Basic Resources and Multiplexing
6. Concurrency and Locking

Topics (cont'd)
7. Device Drivers
8. Network Stack
9. Command Interpreters & User Interfaces
10. Users and Permissions
11. Virtualisation & Containers
12. Special-Purpose Operating Systems

Related Courses
• PB150/PB151 Computer Systems
• PB153 Operating Systems and their Interfaces
• PA150 Advanced OS Concepts
• PV062 File Structures
• PB071 Principles of Low-level programming
• PB173 Domain-specific Development in C/C++

Organisation of the Semester
• generally, one lecture = one topic
• 30th of March is a state holiday
• a 50-minute review in the last lecture
• online mid-term first week in April

Part 1: Semester Overview

2. System Libraries and APIs
• POSIX: Portable Operating System Interface
• UNIX: (almost) everything is a file
• the least common denominator of programs: C
• user view: objects, archives, shared libraries
• compiler, linker

3. The Kernel
• privileged CPU mode
• the boot process
• boundary enforcement
• kernel designs: micro, mono, exo, ...
• system calls

4. File Systems
• why and how
• abstraction over shared block storage
• directory hierarchy
• everything is a file revisited
• i-nodes, directories, hard & soft links

5. Basic Resources and Multiplexing
• virtual memory, processes
• sharing CPUs & scheduling
• processes vs threads
• interrupts, clocks

6. Concurrency and Locking
• inter-process communication
• accessing shared resources
• mutual exclusion
• deadlocks and deadlock prevention

7. Device Drivers
• user vs kernel drivers
• interrupts &c.
• GPU
• PCI &c.
• block storage
• network devices, wifi
• USB
• bluetooth

8. Network Stack
• TCP/IP
• name resolution
• socket APIs
• firewalls and packet filters
• network file systems

9. Command Interpreters & User Interfaces
• interactive systems
• history: consoles and terminals
• text-based terminals, RS-232
• bash and other Bourne-style shells, POSIX
• graphical: X11, Wayland, OS X, Windows, Android, iOS

10. Users and Permissions
• multi-user systems
• isolation, ownership
• file system permissions
• capabilities

11. Virtualisation & Containers
• resource multiplexing redux
• isolation redux
• multiple kernels on a single system
• type 1 and type 2 hypervisors
• virtio

12. Special-Purpose Operating Systems
• general-purpose vs special-purpose
• embedded systems
• real-time systems
• high-assurance systems (seL4)

Part 2: Anatomy of an OS

What is an OS?
• the software that makes the hardware tick
• and makes other software easier to write
Also
• catch-all phrase for low-level software
• an abstraction layer over the machine
• but the boundaries are not always clear

What is not (part of) an OS?
• firmware: (very) low level software
  — much more hardware-specific than an OS
  — often executes on auxiliary processors
• application software
  — runs on top of an operating system
  — this is what you got the computer for
  — e.g. games, spreadsheets, photo editing, ...

Operating Systems: Examples
• Microsoft Windows
• Apple macOS & iOS
• Google Android
• Linux
• FreeBSD, OpenBSD
• MINIX
• many many others

What does an OS do?
• interacts with the user
• manages and multiplexes hardware
• manages other software
• organises and manages data
• provides services for other programs
• enforces security

What is an OS made of?
• the kernel
• system libraries
• system daemons / services
• user interface
• system utilities
Basically every OS has those.

The Kernel
• lowest level of an operating system
• executes in privileged mode
• manages all the other software
  — including other OS components
• enforces isolation and security
• provides low-level services to programs

System Libraries
• form a layer above the OS kernel
• provide higher-level services
  — use kernel services behind the scenes
  — easier to use than the kernel interface
• typical example: libc
  — provides C functions like printf
  — also known as msvcrt on Windows

System Daemons
• programs that run in the background
• they either directly provide services
  — but daemons are different from libraries
  — we will learn more in later lectures
• or perform maintenance or periodic tasks
• or perform tasks requested by the kernel

User Interface
• mediates user-computer interaction
• the main shell is typically part of the OS
  — command line on UNIX or DOS
  — graphical interfaces with a desktop and windows
  — but also buttons on your microwave oven
• also building blocks for application UI
  — buttons, tabs, text rendering, OpenGL...
  — provided by system libraries and/or daemons

System Utilities
• small programs required for OS-related tasks
• e.g. system configuration
  — things like the registry editor on Windows
  — or simple text editors
• filesystem maintenance, daemon management, ...
  — programs like ls/dir or newfs or fdisk
• also bigger programs, like file managers

Optional Components
• bundled application software
  — web browser, media player, ...
• (3rd-party) software management
• a programming environment
  — e.g. a C compiler & linker
  — C header files &c.
• source code

Programming Interface
• kernel provides system calls
  — ABI: Application Binary Interface
  — defined in terms of machine instructions
• system libraries provide APIs
  — Application Programming Interface
  — symbolic / high-level interfaces
  — typically defined in terms of C functions
  — system calls also available as an API

Message Passing
• APIs do not always come as C functions
• message-passing interfaces are possible
  — based on inter-process communication
  — possible even across networks
• form of API often provided by system daemons
  — may be also wrapped by C APIs

General-Purpose Operating Systems
• suitable for use in most situations
• flexible but complex and big
• run on both servers and clients
• cut-down versions run on smartphones
• support a variety of hardware

Special-Purpose Operating Systems
• embedded devices
  — limited budget
  — small, slow, power-constrained
  — hard or impossible to update
• real-time systems
  — must react to real-world events
  — often safety-critical
  — robots, autonomous cars, space probes, ...

Size and Complexity
• operating systems are usually large and complex
• typically 100K and more lines of code
• 10+ million is quite possible
• many thousand man-years of work
• special-purpose systems are much smaller

Portability
• some OS tasks require close HW cooperation
  — virtual memory and CPU setup
  — platform-specific device drivers
• but many do not
  — scheduling algorithms
  — memory allocation
  — all sorts of management
• porting: changing a program to run in a new environment
  — for an OS, typically new hardware

Hardware Platform
• CPU instruction set (ISA)
• buses, IO controllers
  — PCI, USB, Ethernet, ...
• firmware, power management
Examples
• x86 (ISA) - PC (platform)
• ARM - Snapdragon, i.MX 6, ...
• m68k - Amiga, Atari, ...

Platform & Architecture Portability
• an OS typically supports many platforms
  — Android on many different ARM SoCs
• quite often also different CPU ISAs
  — long tradition in UNIX-style systems
  — NetBSD runs on 15 different ISAs
  — many of them comprise 6+ different platforms
• special-purpose systems are usually less portable

Code Re-Use
• it makes a lot of sense to re-use code
• majority of OS code is HW-independent
• this was not always the case
  — pioneered by UNIX, which was written in C
  — typical OS of the time was in machine language
  — porting was basically "writing again"

Application Portability
• applications care more about the OS than about HW
  — apps are written in high-level languages
  — and use system libraries extensively
• it is enough to port the OS to new/different HW
  — most applications can be simply recompiled
• still a major hurdle (cf. Itanium)

Application Portability (2)
• same application can often run on many OSes
• especially within the POSIX family
• but same app can run on Windows, macOS, UNIX, ...
  — Java, Qt (C++)
  — web applications (HTML, JavaScript)
• many systems provide the same set of services
  — differences are mostly in programming interfaces
  — high-level libraries and languages can hide those

Abstraction
• instruction sets abstract over CPU details
• compilers abstract over instruction sets
• operating systems abstract over hardware
• portable runtimes abstract over operating systems
• applications sit on top of the abstractions

Abstraction Costs
• more complexity
• less efficiency
• leaky abstractions
Abstraction Benefits
• easier to write and port software
• fewer constraints on HW evolution

Abstraction Trade-Offs
• powerful hardware allows more abstraction
• embedded or real-time systems not so much
  — the OS is smaller & less portable
  — same for applications
  — more efficient use of resources

Kernel Revisited
• bugs in the kernel are very bad
  — system crashes, data loss
  — critical security problems
• bigger kernel means more bugs
• third-party drivers inside the kernel?

Monolithic Kernels
• a lot of code in the kernel
• less abstraction, less isolation
• faster and more efficient
Microkernels
• move as much as possible out of the kernel
• more abstraction, more isolation
• slower and less efficient

Paradox?
• real-time & embedded systems often use microkernels
• isolation is good for reliability
• efficiency also depends on the workload
  — throughput vs latency
• real-time does not necessarily mean fast

Review Questions
1. What are the roles of an operating system?
2. What are the basic components of an OS?
3. What is an operating system kernel?
4. What is an Application Programming Interface?
Part 3: System Libraries and APIs

Programming Interfaces
• kernel system call interface
• system libraries / APIs
• inter-process protocols
• command-line utilities (scripting)

Lecture Overview
1. The C Programming Language
2. System Libraries
  — what is a library?
  — header files & libraries
3. Compiler & Linker
  — object files, executables
4. File-based APIs

Sidenote: UNIX and POSIX
• we will mostly use those terms interchangeably
• it is a family of operating systems
  — started in late 60s / early 70s
• POSIX is a specification
  — a document describing what the OS should provide
  — including programming interfaces
We will assume POSIX unless noted otherwise

Part 3.1: The C Programming Language

Programming Languages
• there are many different languages
  — C, C++, Java, C#, ...
  — Python, Perl, Ruby, ...
  — ML, Haskell, Agda, ...
• but C has a special place in most OSes

C: The Least Common Denominator
• except for assembly, C is the "bare minimum"
• you can almost think of C as portable assembly
• it is very easy to call C functions
• and to use C data structures
You can use C libraries in almost every language

The Language of Operating Systems
• many (most) kernels are written in C
• this usually extends to system libraries
• and sometimes to almost the entire OS
• non-C operating systems provide C APIs

Part 3.2: System Libraries

(System) Libraries
• mainly C functions and data types
• interfaces defined in header files
• definitions provided in libraries
  — static libraries (archives): libc.a
  — shared (dynamic) libraries: libc.so
• on Windows: msvcrt.lib and msvcrt.dll
• there are (many) more besides libc / msvcrt

Declaration: what but not how
    int sum( int a, int b );
Definition: how is the operation done?
    int sum( int a, int b ) { return a + b; }

Library Files
• /usr/lib on most Unices
  — may be mixed with application libraries
  — especially on Linux-derived systems
  — also /usr/local/lib for user/app libraries
• on Windows: C:\Windows\System32
  — user libraries often bundled with programs

Static Libraries
• stored in libfile.a, or file.lib (Windows)
• only needed for compiling (linking) programs
• the code is copied into the executable
• the resulting executable is also called static
  — and is easier to work with for the OS
  — but also more wasteful

Shared (Dynamic) Libraries
• required for running programs
• linking is done at execution time
• less code duplication
• can be upgraded separately
• but: dependency problems

Header Files
• on UNIX: /usr/include
• contains prototypes of C functions
• and definitions of C data structures
• required to compile C and C++ programs

Header Example 1 (from unistd.h)
    int execv(char *, char **);
    pid_t fork(void);
    int pipe(int *);
    ssize_t read(int, void *, size_t);
(and many more prototypes)

Header Example 2 (from sys/time.h)
    struct timeval { time_t tv_sec; long tv_usec; };
    /* ... */
    int gettimeofday(struct timeval *, struct timezone *);
    int settimeofday(struct timeval *, struct timezone *);

The POSIX C Library
• libc - the C runtime library
• contains ISO C functions
  — printf, fopen, fread
• and a number of POSIX functions
  — open, read, gethostbyname, ...
  — C wrappers for system calls

System Calls: Numbers
• system calls are performed at machine level
• which syscall to perform is decided by a number
  — e.g. SYS_write is 4 on OpenBSD
  — numbers defined by sys/syscall.h
  — different for each OS

System Calls: the syscall function
• there is a C function called syscall
  — prototype: int syscall( int number, ... )
• this implements the low-level syscall sequence
• it takes a syscall number and syscall parameters
  — this is a bit like printf
  — first parameter decides what other parameters there are
• (more about how syscall() works next week)

System Calls: Wrappers
• using syscall() directly is inconvenient
• libc has a function for each system call
  — SYS_write → int write( int, char *, size_t )
  — SYS_open → int open( char *, int )
  — and so on and so forth
• those wrappers use syscall() internally

Portability
• libraries provide an abstraction layer over OS internals
• they are responsible for application portability
  — along with standardised filesystem locations
  — and user-space utilities to some degree
• higher-level languages rely on system libraries

NeXT and Objective C
• the NeXT OS was built around Objective C
• system libraries had ObjC APIs
• in API terms, ObjC is very different from C
  — also very different from C++
  — traditional OOP features (like Smalltalk)
• this has been partly inherited into macOS
  — evolving into Swift

System Libraries: UNIX
• the math library libm
  — implements math functions like sin and exp
• thread library libpthread
• terminal access: libcurses
• cryptography: libcrypto (OpenSSL)
• the C++ standard library libstdc++ or libc++

System Libraries: Windows
• msvcrt.dll - the ISO C functions
• kernel32.dll - basic OS APIs
• gdi32.dll - Graphics Device Interface
• user32.dll - standard GUI elements

Documentation
• manual pages on UNIX
  — try e.g. man 2 write on aisa.fi.muni.cz
  — section 2: system calls
  — section 3: library functions (man 3 printf)
• MSDN for Windows
  — https://msdn.microsoft.com
• you can learn a lot from those sources

Part 3.3: Compiler & Linker

C Compiler
• many POSIX systems ship with a C compiler
• the compiler takes a C source file as input
  — a text file with a .c suffix
• and produces an object file as its output
  — binary file with machine code in it
  — but cannot be directly executed

Object Files
• contain native machine (executable) code
• along with static data
  — e.g. string literals used in the program
• possibly split into a number of sections
  — .text, .rodata, .data and so on
• and metadata
  — list of symbols (function names) and their addresses

Object File Formats
• a.out - earliest UNIX object format
• COFF - Common Object File Format
  — adds support for sections over a.out
• PE - Portable Executable (MS Windows)
• Mach-O - Mach Object file format (macOS)
• ELF - Executable and Linkable Format (all modern Unices)

Archives (Static Libraries)
• static libraries on UNIX are called archives
• this is why they get the .a suffix
• they are like a zip file full of object files
• plus a table of symbols (function names)

Linker
• object files are incomplete
• they can refer to symbols that they do not define
  — the definitions can be in libraries
  — or in other object files
• a linker puts multiple object files together
  — to produce a single executable
  — or maybe a shared library

Symbols vs Addresses
• we use symbolic names to call functions &c.
• but the call machine instruction needs an address
• the executable will eventually live in memory
• data and instructions need to be given addresses
• what a linker does is assign those addresses

Resolving Symbols
• the linker processes one object file at a time
• it maintains a symbol table
  — mapping symbols (names) to addresses
  — dynamically updated as more objects are processed
• objects can only use symbols already in the table
• resolving symbols = finding their addresses

Executable
• finished image of a program to be executed
• usually in the same format as object files
• but already complete, with symbols resolved
  — but: may use shared libraries
  — in that case, some symbols remain unresolved

Shared Libraries
• each shared library only needs to be in memory once
• shared libraries use symbolic names (like object files)
• there is a "mini linker" in the OS to resolve those names
  — usually known as a runtime linker
  — resolving = finding the addresses
• shared libraries can use other shared libraries
  — they can form a DAG (Directed Acyclic Graph)

Addresses Revisited
• when you run a program, it is loaded into memory
• parts of the program refer to other parts of the program
  — this means they need to know where it will be loaded
  — this is a responsibility of the linker
• shared libraries use position-independent code
  — works regardless of the base address it is loaded at
  — we won't go into detail on how this is achieved

Compiler, Linker &c.
• the C compiler is usually called cc
• the linker is known as ld
• the archive (static library) manager is ar
• the runtime linker is often known as ld.so

Part 3.4: File-Based APIs

Everything is a File
• part of the UNIX design philosophy
• directories are files
• devices are files
• pipes are files
• network connections are (almost) files

Why is Everything a File
• re-use the comprehensive file system API
• re-use existing file-based command-line tools
• bugs are bad → simplicity is good
• want to print? cat file.txt > /dev/ulpt0
  — (reality is a little more complex)

What is a Filesystem?
• a set of files and directories
• usually lives on a single block device
  — but may also be virtual
• directories and files form a tree
  — directories are internal nodes
  — files are leaf nodes

File Paths
• filesystems use paths to point at files
• a string with / as a directory delimiter
  — the delimiter is \ on Windows
• a leading / indicates the filesystem root
• e.g. /usr/include

The File Hierarchy
[figure: a directory tree with the directories home, var, usr, xrockai and include, and the files stdio.h, unistd.h, libc.a and libm.a]

The Role of Files and Filesystems
• very central in Plan9
• central in most UNIX systems
  — cf. Linux pseudo-filesystems
  — /proc provides info about all processes
  — /sys gives info about the kernel and devices
• somewhat reduced in Windows
• quite suppressed in Android (and more on iOS)

The Filesystem API
• you open a file (using the open() syscall)
• you can read() and write() data
• you close() the file when you are done
• you can rename() and unlink() files
• you can use mkdir() to create directories

File Descriptors
• the kernel keeps a table of open files
• the file descriptor is an index into this table
• you do everything using file descriptors
• non-Unix systems have similar concepts
  — descriptors are called handles on Windows

Regular files
• these contain sequential data (bytes)
• may have inner structure but the OS does not care
• there is metadata attached to files
  — like when were they last modified
  — who can and who cannot access the file
• you read() and write() files

Directories
• a list of files and other directories
  — internal nodes of the filesystem tree
  — directories give names to files
• can be opened just like files
  — but read() and write() are not allowed
  — files are created with open() or creat()
  — directories with mkdir()
  — directory listing with opendir() and readdir()

Mounts
• UNIX joins all file systems into a single hierarchy
• the root of one filesystem becomes a directory in another
  — this is called a mount point
• Windows uses drive letters instead (C:, D: &c.)
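The calls named on the Filesystem API, File Descriptors and Directories slides fit together roughly as follows. This is only a minimal sketch, assuming a POSIX system; the helper names write_then_read and count_entries are made up for illustration and do not come from any library.

```c
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create the file at `path`, write `text` into it,
   then reopen it and read the content back into `buf`.
   Returns the number of bytes read back, or -1 on error. */
ssize_t write_then_read(const char *path, const char *text,
                        char *buf, size_t bufsize)
{
    assert(path != NULL && text != NULL && buf != NULL);

    /* open() hands back a file descriptor: a small integer that
       indexes the kernel's table of open files for this process */
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        perror("open for writing");
        return -1;
    }

    size_t len = strlen(text);
    if (write(fd, text, len) != (ssize_t) len) {
        perror("write");
        close(fd);
        return -1;
    }
    close(fd); /* done with this descriptor */

    fd = open(path, O_RDONLY);
    if (fd < 0) {
        perror("open for reading");
        return -1;
    }
    ssize_t n = read(fd, buf, bufsize);
    close(fd);
    return n;
}

/* Hypothetical helper: count the entries of a directory, skipping
   "." and "..". Directories are listed with opendir() and readdir(),
   not with read(). Returns -1 if the directory cannot be opened. */
int count_entries(const char *dirpath)
{
    DIR *dir = opendir(dirpath);
    if (dir == NULL)
        return -1;

    int count = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL)
        if (strcmp(entry->d_name, ".") != 0 &&
            strcmp(entry->d_name, "..") != 0)
            ++count;

    closedir(dir);
    return count;
}
```

A caller would typically mkdir() a scratch directory, pass a path inside it to write_then_read, and clean up with unlink() and rmdir() afterwards. Every operation on the open file goes through the descriptor returned by open(), never through the path.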
Devices
• block and character devices are (special) files
• block devices are accessed one block at a time
  — a typical block device would be a disk
  — includes USB mass storage, flash storage, etc
  — you can create a file system on a block device
• character devices are more like normal files
  — terminals, tapes, serial ports, audio devices

Pipes
• pipes are a simple communication device
• one program can write() data to the pipe
• another program can read() that same data
• each end of the pipe gets a file descriptor
• a pipe can live in the filesystem (named pipe)

Sockets
• the socket API comes from early BSD Unix
• a socket represents a (possible) network connection
• sockets are more complicated than normal files
  — establishing connections is hard
  — messages get lost much more often than file data
• you get a file descriptor for an open socket
• you can read() and write() to sockets

Socket Types
• sockets can be internet or unix domain
  — internet sockets connect to other computers
  — Unix sockets live in the filesystem
• sockets can be stream or datagram
  — stream sockets are like files
  — you can write a continuous stream of data
  — datagram sockets can send individual messages

Review Questions
5. What is a shared (dynamic) library?
6. What does a linker do?
7. What is a symbol in an object file?
8. What is a file descriptor?

Part 4: The Kernel

Lecture Overview
1. privileged mode
2. booting
3. kernel architecture
4. system calls
5. kernel-provided services

Reminder: Software Layering
• the kernel
• system libraries
• system services / daemons
• utilities
• application software

Part 4.1: Privileged Mode

CPU Modes
• CPUs provide a privileged (supervisor) and a user mode
• this is the case with all modern general-purpose CPUs
  — not necessarily with micro-controllers
• x86 provides 4 distinct privilege levels
  — most systems only use ring 0 and ring 3
  — Xen paravirtualisation uses ring 1 for hosted kernels

Privileged Mode
• many operations are restricted in user mode
  — this is how user programs are executed
  — also most of the operating system
• software running in privileged mode can do almost anything
  — most importantly it can program the MMU
  — the kernel runs in this mode

Memory Management Unit
• is a subsystem of the processor
• takes care of address translation
  — user software uses virtual addresses
  — the MMU translates them to physical addresses
• the mappings can be managed by the OS kernel

Paging
• physical memory is split into frames
• virtual memory is split into pages
• pages and frames have the same size (usually 4KiB)
• frames are places, pages are the content
• page tables map between pages and frames

Swapping Pages
• RAM used to be a scarce resource
• paging allows the OS to move pages out of RAM
  — a page (content) can be written to disk
  — and the frame can be used for another page
• not as important with contemporary hardware
• useful for memory mapping files (cf. next lecture)

Look Ahead: Processes
• a process is primarily defined by its address space
  — address space meaning the valid virtual addresses
• this is implemented via the MMU
• when changing processes, a different page table is loaded
  — this is called a context switch
• the page table defines what the process can see

Memory Maps
• different view of the same principles
• the OS maps physical memory into the process
• multiple processes can have the same RAM area mapped
  — this is called shared memory
• often, a piece of RAM is only mapped in a single process

Page Tables
• the MMU is programmed using translation tables
  — those tables are stored in RAM
  — they are usually called page tables
• and they are fully in the management of the kernel
• the kernel can ask the MMU to replace the page table
  — this is how processes are isolated from each other

Kernel Protection
• kernel memory is usually mapped into all processes
  — this substantially improves performance on many CPUs
  — well, until Meltdown hit us, anyway
• kernel pages have a special 'supervisor' flag set
  — code executing in user mode cannot touch them
  — else, user code could tamper with kernel memory

Part 4.2: Booting

Starting the OS
• upon power on the system is in a default state
  — mainly because RAM is volatile
• the entire platform needs to be initialised
  — this is first and foremost the CPU
  — and the console hardware (keyboard, monitor, ...)
— then the rest of the devices Operating Systems 121/746 The Kerne Boot Process • the process starts with a built-in hardware init • when ready, the hardware hands off to the firmware — this was BIOS on 16 and 32 bit systems — replaced with EFI on current amd64 platforms • the firmware then loads a bootloader • the bootloader loads the kernel Operating Systems 122/746 The Kerne Boot Process (cont'd) • the kernel then initialises device drivers • and the root filesystem • then it hands off to the init process • at this point, the user space takes over Operating Systems 123/746 The Kerne User-mode Initialisation • in it mounts the remaining file systems • the init process starts up user-mode system services • then it starts application services • and finally the login process Operating Systems 124/746 The Kerne After Log-In • the login process initiates the user session • loads desktop modules and application software • drops the user in a (text or graphical) shell • now you can start using the computer Operating Systems 125/746 The Kerne CPU Init • this depends on both architecture and platform • on x86, the CPU starts in 16-bit mode • on legacy systems, BIOS & bootloader stay in this mode • the kernel then switches to protected mode during its boot Operating Systems 126/746 The Kerne Bootloader • historically limited to tens of kilobytes of code • the bootloader locates the kernel on disk — it often allows the operator to choose different kernels — limited understanding of file systems • then it loads the kernel image into RAM • and hands off control to the kernel Operating Systems 127/746 The Kerne Modern Booting on x86 • on modern system, the bootloader runs in protected mode — or even the long mode on 64-bit CPUs • the firmware understands the FAT filesystem — it can load files from there into memory — this vastly simplifies the boot process Operating Systems 128/746 The Kerne Booting ARM • on ARM boards, there is no unified firmware interface • U-boot is as 
close as one gets to unification
• the bootloader needs low-level hardware knowledge
• this makes writing bootloaders for ARM quite tedious
• current U-Boot can use the EFI protocol from PCs

Part 4.3: Kernel Architecture

Architecture Types
• monolithic kernels (Linux, *BSD)
• microkernels (Mach, L4, QNX, NT, ...)
• hybrid kernels (macOS)
• type 1 hypervisors (Xen)
• exokernels, rump kernels

Microkernel
• handles memory protection
• (hardware) interrupts
• task / process scheduling
• message passing
• everything else is separate

Monolithic Kernels
• all that a microkernel does
• plus device drivers
• file systems, volume management
• a network stack
• data encryption, ...

Microkernel Redux
• we need a lot more than a microkernel provides
• in a "true" microkernel OS, there are many modules
• each device driver runs in a separate process
• the same for file systems and networking
• those modules / processes are called servers

Hybrid Kernels
• based around a microkernel
• and a gutted monolithic kernel
• the monolithic kernel is a big server
  — takes care of stuff not handled by the microkernel
  — easier to implement than a true microkernel OS
  — strikes a middle ground on performance

Micro vs Mono
• microkernels are more robust
• monolithic kernels are more efficient
  — less context switching
• what is easier to implement is debatable
  — in the short view, monolithic wins
• hybrid kernels are a compromise

Exokernels
• smaller than a microkernel
• far fewer abstractions
  — applications only get block storage
  — networking is much reduced
• only research systems exist

Type 1 Hypervisors
• also known as bare metal or native hypervisors
• they resemble microkernel operating systems
  — or exokernels,
depending on the viewpoint
• the "applications" for a hypervisor are operating systems
  — hypervisor can use coarser abstractions than an OS
  — entire storage devices instead of a filesystem

Unikernels
• kernels for running a single application
  — makes little sense on real hardware
  — but can be very useful on a hypervisor
• bundle applications as virtual machines
  — without the overhead of a general-purpose OS

Exo vs Uni
• an exokernel runs multiple applications
  — includes process-based isolation
  — but abstractions are very bare-bones
• unikernel only runs a single application
  — provides more-or-less standard services
  — e.g. standard hierarchical file system
  — socket-based network stack / API

Part 4.4: System Calls

Reminder: Kernel Protection
• kernel executes in privileged mode of the CPU
• kernel memory is protected from user code

But: Kernel Services
• user code needs to ask kernel for services
• how do we switch the CPU into privileged mode?
• cannot be done arbitrarily (security)

System Calls
• hand off execution to a kernel routine
• pass arguments into the kernel
• obtain return value from the kernel
• all of this must be done safely

Trapping into the Kernel
• there are a few possible mechanisms
• details are very architecture-specific
• in general, the kernel sets a fixed entry address
  — an instruction can switch the CPU into privileged mode
  — while at the same time jumping to this address

Trap Example: x86
• there is an int instruction on those CPUs
• this is called a software interrupt
  — interrupts are normally a hardware thing
  — interrupt handlers run in privileged mode
• it is also synchronous
• the handler is set in the IDT (interrupt descriptor table)

Software Interrupts
• those are available on a range of CPUs
• generally not very efficient for system calls
• extra level of indirection
  — the handler address is retrieved from memory
  — a lot of CPU state needs to be saved

Aside: SW Interrupts on PCs
• those are used even in real mode
  — legacy 16-bit mode of 80x86 CPUs
  — BIOS (firmware) routines via int 0x10 & 0x13
  — MS-DOS API via int 0x21
• and on older CPUs in 32-bit protected mode
  — Windows NT uses int 0x2e
  — Linux uses int 0x80

Trap Example: amd64 / x86_64
• sysenter and syscall instructions
  — and corresponding sysexit / sysret
• the entry point is stored in a model-specific register (MSR)
• there is only one entry point
  — unlike with software interrupts
• quite a bit faster than interrupts

Which System Call?
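A minimal sketch of how user code names the service it wants, assuming Linux with glibc: the generic syscall() wrapper takes the system call number (SYS_write, SYS_getpid) as its first argument and traps into the kernel. The helper names raw_write and raw_getpid are hypothetical, chosen only for this illustration.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE        /* assumption: glibc, makes syscall() visible */
#endif
#include <string.h>        /* strlen() */
#include <sys/syscall.h>   /* SYS_write, SYS_getpid */
#include <unistd.h>        /* syscall(), getpid() */

/* Write a string to a file descriptor by passing the system call
   number explicitly, instead of calling the write() wrapper.
   Returns the number of bytes written, or -1 on error. */
long raw_write( int fd, const char *msg )
{
    return syscall( SYS_write, fd, msg, strlen( msg ) );
}

/* Ask the kernel for the current process ID, again by number. */
long raw_getpid( void )
{
    return syscall( SYS_getpid );
}
```

Calling raw_write( 1, "hello\n" ) should behave like write( 1, "hello\n", 6 ): the number only tells the kernel which routine to run, the remaining arguments are passed through unchanged.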
• often there are many system calls
  — there are more than 300 on 64-bit Linux
  — about 400 on 32-bit Windows NT
• but there is only a handful of interrupts
  — and only one sysenter address

Reminder: System Call Numbers
• each system call is assigned a number
• available as SYS_write &c. on POSIX systems
• for the "universal" int syscall( int sys, ... )
• this number is passed in a CPU register

System Call Sequence
• first, libc prepares the system call arguments
• and puts the system call number in the correct register
• then the CPU is switched into privileged mode
• this also transfers control to the syscall handler

System Call Handler
• the handler first picks up the system call number
• and decides where to continue
• you can imagine this as a giant switch statement

    switch ( sysnum )
    {
        case SYS_write: return syscall_write();
        case SYS_read: return syscall_read();
        /* many more */
    }

System Call Arguments
• each system call has different arguments
• how they are passed to the kernel is CPU-dependent
• on 32-bit x86, most of them are passed in memory
• on amd64 Linux, all arguments go into registers
  — 6 registers available for arguments

Part 4.5: Kernel Services

What does a Kernel Do?
• memory & process management
• task (thread) scheduling
• device drivers
  — SSDs, GPUs, USB, bluetooth, HID, audio, ...
• file systems
• networking

Additional Services
• inter-process communication
• timers and time keeping
• process tracing, profiling
• security, sandboxing
• cryptography

Reminder: Microkernel Systems
• the kernel proper is very small
• it is accompanied by servers
• in "true" microkernel systems, there are many servers
  — each device, filesystem, etc.
is separate
• in hybrid systems, there is one, or a few
  — a "superserver" that resembles a monolithic kernel

Kernel Services
• we usually don't care which server provides what
  — each system is different
  — for services, we take a monolithic view
• the services are used through system libraries
  — they abstract away many of the details
  — e.g. whether a service is a system call or an IPC call

User-Space Drivers in Monolithic Systems
• not all device drivers are part of the kernel
• case in point: printer drivers
• also some USB devices (not the USB bus though)
• part of the GPU/graphics stack
  — memory and output management in kernel
  — most of OpenGL in user space

Review Questions
9. What CPU modes are there and how are they used?
10. What is the memory management unit?
11. What is a microkernel?
12. What is a system call?

Part 5: File Systems

Lecture Overview
1. Filesystem Basics
2. The Block Layer
3. Virtual Filesystem Switch
4. The UNIX Filesystem
5. Advanced Features

Part 5.1: Filesystem Basics

What is a File System?
• a collection of files and directories
• (mostly) hierarchical
• usually exposed to the user
• usually persistent (across reboots)
• file managers, command line, etc.

What is a (Regular) File?
• a sequence of bytes
• and some basic metadata
• owner, group, timestamp
• the OS does not care about the content
• text, images, video, source code are all the same
• executables are somewhat special

What is a Directory?
• a list of name → file mappings
• an associative container if you will
  — semantically the value types are not homogeneous
  — syntactically, they are just i-nodes
• one directory = one component of a path
  — /usr/local/bin

What is an i-node?
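A sketch of how a name resolves to an i-node, assuming a POSIX system: stat() and lstat() fill in a struct stat, whose st_ino field is the i-node number and whose st_mode field encodes the type. The helpers inode_of and inode_kind are hypothetical names for this example.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* assumption: needed for lstat() under strict ISO C */
#endif
#include <sys/stat.h>   /* stat(), lstat(), struct stat, S_ISDIR &c. */

/* Resolve a path and report the i-node number it refers to.
   Returns 0 if the path cannot be resolved. */
unsigned long long inode_of( const char *path )
{
    struct stat st;
    if ( stat( path, &st ) != 0 )
        return 0;
    return (unsigned long long) st.st_ino;
}

/* Describe what kind of object the i-node behind a name is. */
const char *inode_kind( const char *path )
{
    struct stat st;
    if ( lstat( path, &st ) != 0 )   /* lstat does not follow symlinks */
        return "missing";
    if ( S_ISREG( st.st_mode ) ) return "regular file";
    if ( S_ISDIR( st.st_mode ) ) return "directory";
    if ( S_ISLNK( st.st_mode ) ) return "symlink";
    return "special file";
}
```

Two names that are hard links to the same file would report the same inode_of value, since both directory entries point at one i-node; "/" and "/." are a case that exists on every system.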
• an anonymous, file-like object
• could be a regular file
  — or a directory
  — or a special file
  — or a symlink

Files are Anonymous
• this is the case with UNIX
  — not all file systems work like this
• there are pros and cons to this approach
  — e.g. open files can be unlinked
• names are assigned via directory entries

What Else is a Byte Sequence?
• characters coming from a keyboard
• bytes stored on a magnetic tape
• audio data coming from a microphone
• pixels coming from a webcam
• data coming on a TCP connection

Writing Byte Sequences
• sending data to a printer
• playing back audio
• writing text to a terminal (emulator)
• sending data over a TCP stream

Special Files
• many things look somewhat like files
• let's exploit that and unify them with files
• recall part 2 on APIs: "everything is a file"
  — the API is the same for special and regular files
  — not the implementation though

File System Types
• fat16, fat32, vfat, exfat (DOS, flash media)
• ISO 9660 (CD-ROMs)
• UDF (DVD-ROM)
• NTFS (Windows NT)
• HFS+ (macOS)
• ext2, ext3, ext4 (Linux)
• ufs, ffs (BSD)

Multi-User Systems
• file ownership
• file permissions
• disk quotas

Ownership & Permissions
• we assume a discretionary model
• whoever creates a file is its owner
• ownership can be transferred
• the owner decides about permissions
  — basically read, write, execute

Disk Quotas
• disks are big but not infinite
• bad things happen when the file system fills up
  — denial of service
  — programs may fail and even corrupt data
• quotas limit the amount of space per user

Part 5.2: The Block Layer

Disk-Like Devices
• disk drives provide block-level access
• read and write
data in 512-byte chunks
  — or also 4K on big modern drives
• a big numbered array of blocks

Aside: Disk Addressing Schemes
• CHS: Cylinder, Head, Sector
  — structured addressing used in (very) old drives
  — exposes information about relative seek times
  — useless with variable-length cylinders
  — 10:4:6 CHS (bits) = 1024 cylinders, 16 heads, 63 sectors
• LBA: Logical Block Addressing
  — linear, unstructured address space
  — started as 22, later 28, ... now 48 bit

Block-Level Access
• disk drivers only expose linear addressing
• one block (sector) is the minimum read/write size
• many sectors can be written "at once"
  — sequential access is faster than random
  — maximum throughput vs IOPS

Aside: Access Times
• block devices are slow (compared to RAM)
  — RAM is slow (compared to CPU)
• we cannot treat drives as an extension of RAM
  — not even fastest modern flash storage
  — latency: HDD 3-12 ms, SSD 0.1 ms, RAM 70 ns

Block Access Cache
• caching is used to hide latency
  — same principle between CPU and RAM
• files recently accessed are kept in RAM
  — many cache management policies exist
• implemented entirely in the OS
  — many devices implement their own caching
  — but the amount of fast memory is usually limited

Write Buffers
• the write equivalent of the block cache
• data is kept in RAM until it can be processed
• must synchronise with caching
  — other users may be reading the file

I/O Scheduler (Elevator)
• reads and writes are requested by users
• access ordering is crucial on a mechanical drive
  — not as important on an SSD
  — but sequential access is still much preferred
• requests are queued (recall, disks are slow)
  — but they are not processed in FIFO order

RAID
• hard drives are also unreliable
  — backups help, but
take a long time to restore
• RAID = Redundant Array of Inexpensive Disks
  — live-replicate same data across multiple drives
  — many different configurations
• the system stays online despite disk failures

RAID Performance
• RAID affects the performance of the block layer
• often improved reading throughput
  — data is recombined from multiple channels
• write performance is more mixed
  — may require a fair amount of computation
  — more data needs to be written for redundancy

Block-Level Encryption
• symmetric & length-preserving
• encryption key is derived from a passphrase
• also known as "full disk encryption"
• incurs a small performance penalty
• very important for security / privacy

Storing Data in Blocks
• splitting data into fixed-size chunks is unnatural
• there is no permission system for individual blocks
  — this is unlike virtual (paged) memory
  — it'd be really inconvenient for users
• processes are not persistent, but block storage is

Filesystem as Resource Sharing
• usually only 1 or few disks per computer
• many programs want to store persistent data
• file system allocates space for the data
  — which blocks belong to which file
• different programs can write to different files
  — no risk of trying to use the same block

Filesystem as Abstraction
• allows the data to be organised into files
• enables the user to manage and review data
• files have arbitrary & dynamic size
  — blocks are transparently allocated & recycled
• structured data instead of a flat block array

Part 5.3: Virtual Filesystem Switch

Virtual File System Layer
• many different filesystems
• the OS wants to treat them all alike
• VFS provides an internal, in-kernel API
• filesystem syscalls are hooked up to VFS

VFS in OOP
terms
• VFS provides an abstract class, filesystem
• each filesystem implementation derives from filesystem
  — e.g. class iso9660 : public filesystem
• each actual file system gets an instance
  — /home, /usr, /mnt/usbflash: one each
  — the kernel uses the abstract interface to talk to them

The filesystem Class

    struct handle { /* ... */ };
    struct filesystem
    {
        virtual int open( const char *path ) = 0;
        virtual int read( handle file, ... ) = 0;
        /* ... */
    };

Filesystem-Specific Operations
• open: look up the file for access
• read, write: self-explanatory
• seek: move the read/write pointer
• sync: flush data to disk
• mmap: memory-mapped IO
• select: IO readiness notification

Standard IO
• the usual way to use files
• open the file
  — operations to read and write bytes
• data has to be buffered in user space
  — and then copied to/from kernel space
• not very efficient

Memory-mapped IO
• uses virtual memory (cf. last lecture)
• treat a file as if it was swap space
• the file is mapped into process memory
  — page faults indicate that data needs to be read
  — dirty pages cause writes
• available as the mmap system call

Sync-ing Data
• recall that the disk is very slow
• waiting for each write to hit disk is inefficient
• but if data is held in RAM, what if power is cut?
  — the sync operation ensures the data has hit disk
  — often used in database implementations

Filesystem-Agnostic Operations
• handling executables
• fcntl handling
• special files
• management of file descriptors
• file locks

Executables
• memory mapped (like mmap)
• may be paged in lazily
• executables must be immutable while running
• but can still be unlinked from the directory

The fcntl Syscall
• mostly operations relating to file descriptors
  — synchronous vs asynchronous access
  — blocking vs non-blocking
  — close on exec: more on this in a later lecture
• also one of the several locking APIs

Special Files
• device nodes, pipes, sockets, ...
• only metadata for special files lives on disk
  — this includes permissions & ownership
  — type and properties of the special file
• they are just a different kind of i-node
• open, read, write, etc.
bypass the filesystem

File Locking
• multiple programs writing the same file is bad
  — operations will come in randomly
  — the resulting file will be a mess
• file locks fix this problem
  — multiple APIs: fcntl vs flock
  — differences on networked filesystems

Mount Points
• recall that there is only a single directory tree
• but there are multiple disks and filesystems
• file systems can be joined at directories
• root of one becomes a subdirectory of another

Part 5.4: The UNIX Filesystem

Superblock
• holds top-level information about the filesystem
• locations of i-node tables
• locations of i-node and free space bitmaps
• block size, filesystem size

I-Nodes
• recall that an i-node is an anonymous file
  — or a directory or a special file
• i-nodes only have numbers
• directories tie names to i-nodes

I-Node Allocation
• often a fixed number of i-nodes
• i-nodes are either used or free
• free i-nodes may be stored in a bitmap
• alternatives: B-trees

I-Node Content
• exact content of an i-node depends on its type
• regular file i-nodes contain a list of data blocks
  — both direct and indirect (via a data block)
• symbolic links contain the target path
• special devices describe what device they represent

Attaching Data to I-Nodes
• a few direct block addresses in the i-node
  — e.g. 10 refs, 4K blocks, max. 40 kilobytes
• indirect data blocks
  — a block full of addresses of other blocks
  — one indirect block approx.
2 MiB of data
• extents: a contiguous range of blocks

Fragmentation
• internal: not all blocks are fully used
  — files are of variable size, blocks are fixed
  — a 4100-byte file needs two 4 KiB blocks
• external: free space is non-contiguous
  — happens when many files try to grow at once
  — this means new files are also fragmented

Fragmentation Problems
• performance: can't use fast sequential IO
  — programs often read files sequentially
  — fragmentation → random IO on the device
• metadata size: can't use long extents
• internal: waste of disk space

Directories
• uses data blocks (like regular files)
• but the blocks hold name → i-node maps
• modern file systems use hashes or trees
• the format of directory data is filesystem-specific

File Name Lookup
• we often need to find a file based on a path
• each component means a directory search
• directories can have many thousands of entries

Old-Style Directories
• unsorted sequential list of entries
• new entries are simply appended at the end
• unlinking can create holes
• lookup in large directories is very inefficient

Hash-Based Directories
• only need one block read on average
• often the most efficient option
• extendible hashing
  — directories can grow over time
  — gradually allocates more blocks

Tree-Based Directories
• self-balancing search trees
• optimised for block-level access
• B trees, B+ trees, B* trees
• logarithmic number of reads
  — this is worst case, unlike hashing

Hard Links
• multiple names can refer to the same i-node
  — names are given by directory entries
  — we call such multiple-named files hard links
  — it's usually forbidden to hard-link directories
• hard links cannot cross device boundaries
  — i-node
numbers are only unique within a filesystem

Soft Links (Symlinks)
• they exist to lift the one-device limitation
• soft links to directories are OK
  — this can cause loops in the filesystem
• the soft link i-node contains a path
  — the meaning can change when paths change
• dangling link: points to a non-existent path

Free Space
• similar problem to i-node allocation
  — but regards data blocks
• goal: quickly locate data blocks to use
  — also: keep data of a single file close together
  — also: minimise external fragmentation
• usually bitmaps or B-trees

File System Consistency
• what happens if power is cut?
• data buffered in RAM is lost
• the IO scheduler can re-order disk writes
• the file system can become corrupt

Journalling
• also known as an intent log
• synchronously write down what is about to happen
• fix the actual metadata based on the journal
• has a performance penalty at run-time
  — reduces downtime by making consistency checks fast
  — may also prevent data loss

Part 5.5: Advanced Features

What Else Can Filesystems Do?
• transparent file compression
• file encryption
• block de-duplication
• snapshots
• checksums
• redundant storage

File Compression
• use one of the standard compression algorithms
  — must be fairly general-purpose (i.e. not JPEG)
  — and of course lossless
  — e.g. LZ77, LZW, Huffman Coding, ...
• quite challenging to implement
  — the length of the file changes (unpredictably)
  — efficient random access inside the file

File Encryption
• use symmetric encryption for individual files
  — must be transparent to upper layers (applications)
  — symmetric crypto is length-preserving
  — encrypted directories, inheritance, &c.
• a new set of challenges
  — key and passphrase management

Block De-duplication
• sometimes the same data block appears many times
  — virtual machine images are a common example
  — also containers and so on
• some filesystems will identify those cases
  — internally point many files to the same block
  — copy on write to preserve illusion of separate files

Snapshots
• it is convenient to be able to copy entire filesystems
  — but this is also expensive
  — snapshots provide an efficient means for this
• snapshot is a frozen image of the filesystem
  — cheap, because snapshots share storage
  — easier than de-duplication
  — again implemented as copy-on-write

Checksums
• hardware is unreliable
  — individual bytes or sectors may get corrupted
  — this may happen without the hardware noticing
• the filesystem may store checksums along with metadata
  — and possibly also file content
  — this protects the integrity of the filesystem
• beware: not cryptographically secure

Redundant Storage
• like filesystem-level RAID
• data and metadata blocks are replicated
  — may be between multiple local block devices
  — but also across a cluster / many computers
• drastically improves fault tolerance

Review Questions
13. What is a block device?
14. What is an IO scheduler?
15. What does memory-mapped IO mean?
16. What is an i-node?

Part 6: Basic Resources & Multiplexing

Lecture Overview
1. processes and virtual memory
2. thread scheduling
3. interrupts and clocks

Part 6.1: Processes and Virtual Memory

Prehistory: Batch Systems
• first computers ran one program at a time
• programs were scheduled ahead of time
• we are talking punch cards &c.
• and computers that took an entire room

History: Time Sharing
• "mini" computers could run programs interactively
• teletype terminals, screens, keyboards
• multiple users at the same time
• hence, multiple programs at the same time

Processes: Early View
• process is an executing program
• there can be multiple processes
• various resources belong to a process
• each process belongs to a particular user

Process Resources
• memory (address space)
• processor time
• open files (descriptors)
  — also working directory
  — also network connections

Process Memory Segments
• program text: contains instructions
• data: static and dynamic data
  — with a separate read-only section
• stack memory: execution stack
  — return addresses
  — automatic variables

Process Memory
• each process has its own address space
• this means processes are isolated from each other
• requires that the CPU has an MMU
• implemented via paging (page tables)

Process Switching
• switching processes means switching page tables
• physical addresses do not change
• but the mapping of virtual addresses does
• large part of physical memory is not mapped
  — could be completely unallocated (unused)
  — or belong to other processes

Paging and TLB
• address translation is slow
• recently-used pages are stored in a TLB
  — short for Translation Look-aside Buffer
  — very fast hardware cache
• the TLB needs to be flushed on process switch
  — this is fairly expensive (microseconds)

Processor Time Sharing
• CPU time is sliced into time shares
• time shares (slices) are like
memory frames
• process computation is like memory pages
• processes are allocated into time shares

Multiple CPUs
• execution of a program is sequential
• instructions depend on results of previous instructions
• one CPU = one instruction sequence
• physical limits on CPU speed → multiple cores

Threads
• how to use multiple cores in one process?
• threads: a new unit of CPU scheduling
• each thread runs sequentially
• one process can have multiple threads

What is a Thread?
• thread is a sequence of instructions
• different threads run different instructions
  — as opposed to SIMD or many-core units (GPUs)
• each thread has its own stack
• multiple threads can share an address space

Modern View of a Process
• in a modern view, process is an address space
• threads are the right scheduling abstraction
• process is a unit of memory management
• thread is a unit of computation
• old view: one process = one thread

Memory Segment Redux
• one (shared) text segment
• a shared read-write data segment
• a read-only data segment
• one stack for each thread

Fork
• how do we create new processes?
• by fork-ing existing processes
• fork creates an identical copy of a process
• execution continues in both processes
  — each of them gets a different return value

Lazy Fork
• paging can make fork quite efficient
• we start by copying the page tables
• initially, all pages are marked read-only
• the processes start out sharing memory

Lazy Fork: Faults
• the shared memory becomes copy on write
• fault when either process tries to write
  — remember the memory is marked as read-only
• the OS checks if the memory is supposed to be writable
  — if yes, it makes a copy and allows the write

Init
• on UNIX, fork is the only way to make a process
• but fork splits existing processes into 2
• the first process is special
• it is directly spawned by the kernel on boot

Process Identifier
• processes are assigned numeric identifiers
• also known as PID (Process ID)
• those are used in process management
• used in calls like kill or setpriority

Process vs Executable
• process is a dynamic entity
• executable is a static file
• an executable contains an initial memory image
  — this sets up memory layout
  — and content of the text and data segments

Exec
• on UNIX, processes are created via fork
• how do we run programs though?
• exec: load a new executable into a process
  — this completely overwrites process memory
  — execution starts from the entry point
• running programs: fork + exec

Part 6.2: Thread Scheduling

What is a Scheduler?
• scheduler has two related tasks
  — plan when to run which thread
  — actually switch threads and processes
• usually part of the kernel
  — even in micro-kernel operating systems

Switching Threads
• threads of the same process share an address space
  — a partial context switch is needed
  — only register state has to be saved and restored
• no TLB flushing, hence lower overhead

Fixed vs Dynamic Schedule
• fixed schedule = all processes known in advance
  — only useful in special / embedded systems
  — can conserve resources
  — planning is not part of the OS
• most systems use dynamic scheduling
  — what to run next is decided periodically

Preemptive Scheduling
• tasks (threads) just run as if they owned the CPU
• the OS forcibly takes the CPU away from them
  — this is called preemption
• pro: a faulty program cannot block the system
• somewhat less efficient than cooperative

Cooperative Scheduling
• threads (tasks) cooperate to share the CPU
• each thread has to explicitly yield
• this can be very efficient if designed well
• but a bad program can easily block the system

Scheduling in Practice
• cooperative on Windows 3.x for everything
• cooperative for threads on classic Mac OS
  — but preemptive for processes
• preemptive on pretty much every modern OS
  — including real-time and embedded systems

Waiting and Yielding
• threads often need to wait for resources or events
  — they could also use software timers
• a waiting thread should not consume CPU time
• such a thread will yield the CPU
• it is put on a list and later woken up by the kernel

Run Queues
• runnable (not waiting) threads are
queued
• could be priority, round-robin or other queue types
• scheduler picks threads from the run queue
• preempted threads are put back

Priorities
• what share of the CPU should a thread get?
• priorities are static and dynamic
• dynamic priority is adjusted as the thread runs
  - this is done by the system / scheduler
• a static priority is assigned by the user

Fairness
• equal (or priority-based) share per thread
• what if one process has many more threads?
• what if one user has many more processes?
• what if one user group has many more active users?

Fair Share Scheduling
• we can use a multi-level scheduling scheme
• CPU is sliced fairly first among user groups
• then among users
• then among processes
• and finally among threads

Scheduling Strategies
• first in, first served (batch systems)
• earliest deadline first (realtime)
• round robin
• fixed priority preemptive
• fair share scheduling (multi-user)

Interactivity
• throughput vs latency
• latency is more important for interactive workloads
  - think phone or desktop systems
  - but also web servers
• throughput is more important for batch systems
  - think render farms, compute grids, simulation

Reducing Latency
• shorter time slices
• more willingness to switch tasks (more preemption)
• dynamic priorities
• priority boost for foreground processes

Maximising Throughput
• longer time slices
• reduce context switches to minimum
• cooperative multitasking

Multi-Core Schedulers
• traditionally one CPU, many threads
• nowadays: many threads, many CPUs (cores)
• more
complicated algorithms
• more complicated & concurrent-safe data structures

Scheduling and Caches
• threads can move between CPU cores
  - important when a different core is idle
  - and a runnable thread is waiting for CPU
• but there is a price to pay
  - thread / process data is extensively cached
  - caches are typically not shared by all cores

Core Affinity
• modern schedulers try to avoid moving threads
• threads are said to have an affinity to a core
• an extreme case is pinning
  - this altogether prevents the thread from being migrated
• in practice, this improves throughput
  - even if nominal core utilisation may be lower

NUMA Systems
• non-uniform memory architecture
  - different memory is attached to different CPUs
  - each symmetric block within a NUMA system is called a node
• migrating a process to a different node is expensive
  - thread vs node ping-pong can kill performance
  - threads of one process should live on one node

Part 6.3: Interrupts and Clocks

Interrupt
• a way for hardware to request attention
• CPU mechanism to divert execution
• partial (CPU state only) context switch
• switch to privileged (kernel) CPU mode

Hardware Interrupts
• asynchronous, unlike software interrupts
• triggered via bus signals to the CPU
• IRQ = interrupt request
  - just a different name for hardware interrupts
• PIC = programmable interrupt controller

Interrupt Controllers
• PIC: simple circuit, typically with 8 input lines
  - peripherals connect to the PIC with wires
  - PIC delivers prioritised signals to the CPU
• APIC: advanced programmable interrupt controller
  - split into a shared IO APIC and per-core local APIC
  - typically 24 incoming IRQ
lines
• OpenPIC, MPIC: similar to APIC, used by e.g. Freescale

Timekeeping
• PIT: programmable interval timer
  - crystal oscillator + divider
  - IRQ line to the CPU
• local APIC timer: built-in, per-core clock
• HPET: high-precision event timer
• RTC: real-time clock

Timer Interrupt
• generated by the PIT or the local APIC
• the OS can set the frequency
• a hardware interrupt happens on each tick
• this creates an opportunity for bookkeeping
• and for preemptive scheduling

Timer Interrupt and Scheduling
• measure how much time the current thread took
• if it ran out of its slice, preempt it
  - pick a new thread to execute
  - perform a context switch
• those checks are done on each tick
  - rescheduling is usually less frequent

Timer Interrupt Frequency
• typical is 100 Hz
• this means a 10 ms scheduling slice (quantum)
• 1 kHz is also possible
  - harms throughput but improves latency

Tickless Kernels
• the timer interrupt wakes up the CPU
• this can be inefficient if the system is idle
• alternative: use one-off timers
  - allows the CPU to sleep longer
  - this improves power efficiency on light loads

Tickless Scheduling
• slice length (quantum) becomes part of the planning
• if a core is idle, wake up on next software timer
  - synchronisation of software timers
• other interrupts are delivered as normal
  - network or disk activity
  - keyboard, mice, ...
Other Interrupts
• serial port
  - data is available on the port
• network hardware
  - data is available in a packet queue
• keyboards, mice
  - user pressed a key, moved the mouse
• USB devices in general

Interrupt Routing
• not all CPU cores need to see all interrupts
• APIC can be told how to deliver IRQs
  - the OS can route IRQs to CPU cores
• multi-core systems: IRQ load balancing
  - useful to spread out IRQ overhead
  - especially useful with high-speed networks

Review Questions
17. What is a thread and a process?
18. What is a (thread, process) scheduler?
19. What do fork and exec do?
20. What is an interrupt?

Part 7: Concurrency and Locking

Lecture Overview
1. Inter-Process Communication
2. Synchronisation
3. Deadlocks

What is Concurrency?
• events that can happen at the same time
• it is not important if they do, only that they can
• events can be given a happens-before partial order
• they are concurrent if unordered by happens-before

Why Concurrency?
• problem decomposition
  - different tasks can be largely independent
• reflecting external concurrency
  - serving multiple clients at once
• performance and hardware limitations
  - higher throughput on multicore computers

Parallel Hardware
• hardware is inherently parallel
• software is inherently sequential
• something has to give
  - hint: it's not going to be hardware

Part 7.1: Inter-Process Communication

Reminder: What is a Thread
• thread is a sequence of instructions
• each instruction happens-before the next
  - or: happens-before is a total order on the thread
• basic unit of scheduling

Reminder: What is a Process
• the basic unit of resource ownership
  - primarily memory, but also open files &c.
• may contain one or more threads
• processes are isolated from each other
  - IPC creates gaps in that isolation

I/O vs Communication
• take standard input and output
  - imagine process A writes a file
  - later, process B reads that file
• communication happens in real time
  - between two running threads / processes
  - automatic: without user intervention

Direction
• bidirectional communication is typical
  - this is analogous to a conversation
• but unidirectional communication also makes sense
  - e.g. sending commands to a child process
  - do acknowledgments count as communication?
Communication Example
• network services are a typical example
• take a web server and a web browser
• the browser sends a request for a web page
• the server responds by sending data

Files
• it is possible to communicate through files
• multiple processes can open the same file
• one can write data and another can process it
  - the original program picks up the results
  - typical when using programs as modules

A File-Based IPC Example
• files are used e.g. when you run cc file.c
  - it first runs a preprocessor: cpp -o file.i file.c
  - then the compiler proper: cc1 -o file.o file.i
  - and finally a linker: ld file.o crt.o -lc
• the intermediate files may be hidden in /tmp
  - and deleted when the task is completed

Directories
• communication by placing files or links
• typical use: a spool directory
  - clients drop files into the directory for processing
  - a server periodically picks up files in there
• used for e.g. printing and email

Pipes
• a device for moving bytes in a stream
  - note the difference from messages
• one process writes, the other reads
• the reader blocks if the pipe is empty
• the writer blocks if the pipe buffer is full

UNIX and Pipes
• pipes are used extensively in UNIX
• pipelines built via the shell's | operator
• e.g.
ls | grep hello.c
• most useful for processing data in stages

Sockets
• similar to, but more capable than pipes
• allows one server to talk to many clients
• each connection acts like a bidirectional pipe
• could be local but also connected via a network

Shared Memory
• memory is shared when multiple threads can access it
  - happens naturally for threads of a single process
  - the primary means of inter-thread communication
• many processes can map the same piece of physical memory
  - this is the more traditional setting
  - hence also allows inter-process communication

Message Passing
• communication using discrete messages
• we may or may not care about delivery order
• we can decide to tolerate message loss
• often used across a network

Part 7.2: Synchronisation

Shared Variables
• structured view of shared memory
• typical in multi-threaded programs
• e.g. any global variable in a program
• but may also live in memory from malloc

Shared Heap Variable

    #include <pthread.h>
    #include <stdlib.h>
    void *thread( void *arg ) /* pthread entry points take void * */
    {
        int *x = arg;
        *x = 7;
        return NULL;
    }
    int main()
    {
        pthread_t id;
        int *x = malloc( sizeof( int ) );
        pthread_create( &id, NULL, thread, x );
        pthread_join( id, NULL ); /* wait for the write before exiting */
    }

Race Condition: Example
• consider a shared counter, i
• and the following two threads

    int i = 0;
    void thread1() { i = i + 1; }
    void thread2() { i = i - 1; }

What is the value of i after both finish?
Race on a Variable
• memory access is not atomic
• take x = x + 1 in two threads

    a0 ← load x         b0 ← load x
    a1 ← a0 + 1         b1 ← b0 + 1
    store a1 x          store b1 x

Critical Section
• any section of code that must not be interrupted
• the statement x = x + 1 could be a critical section
• what is a critical section is domain-dependent
  - another example could be a bank transaction
  - or an insertion of an element into a linked list

Race Condition: Definition
• (anomalous) behaviour that depends on timing
• typically among multiple threads or processes
• an unexpected sequence of events happens
• recall that ordering is not guaranteed

Races in a Filesystem
• the file system is also a shared resource
• and as such, prone to race conditions
• e.g. two threads both try to create the same file
  - what happens if they both succeed?
  - if both write data, the result will be garbled

Mutual Exclusion
• only one thread can access a resource at once
• ensured by a mutual exclusion device (a.k.a. mutex)
• a mutex has 2 operations: lock and unlock
• lock may need to wait until another thread unlocks

Semaphore
• somewhat more general than a mutex
• allows multiple interchangeable instances of a resource
  - that many threads can enter the critical section
• basically an atomic counter

Monitors
• a programming language device (not OS-provided)
• internally uses standard mutual exclusion
• data of the monitor is only accessible to its methods
• only one thread can enter the monitor at any given time

Condition Variables
• what if the monitor needs to wait for something?
• imagine a bounded queue implemented as a monitor
  - what happens if it becomes full?
  - the writer must be suspended
• condition variables have wait and signal operations

Spinlocks
• a spinlock is the simplest form of a mutex
• the lock method repeatedly tries to acquire the lock
  - this means it is taking up processor time
  - also known as busy waiting
• spinlocks between threads on the same CPU are very bad
  - but can be very efficient between CPUs

Suspending Mutexes
• these need cooperation from the OS scheduler
• when lock acquisition fails, the thread sleeps
  - it is put on a waiting queue in the scheduler
• unlocking the mutex will wake up the waiting thread
• needs a system call, hence slow compared to a spinlock

Condition Variables Revisited
• same principle as a suspending mutex
• the waiting thread goes into a wait queue
• the signal method moves the thread back to a run queue
• the busy-wait version is known as polling

Barrier
• sometimes, parallel computation proceeds in phases
  - all threads must finish phase 1
  - before any can start phase 2
• this is achieved with a barrier
  - blocks all threads until the last one arrives
  - waiting threads are usually suspended

Read-Copy-Update
• the fastest lock is no lock
• RCU allows readers to work while updates are done
  - make a copy and update the copy
  - point new readers to the updated copy
• when is it safe to reclaim memory?

Part 7.3: Deadlocks

Shared Resources
• hardware comes in a limited number of instances
• many devices can only do one thing at a time
• think printers, DVD writers, tape drives, ...
• we want to use the devices efficiently, hence sharing

Network-based Sharing
• sharing is not limited to processes on one computer
• printers and scanners can be network-attached
• all computers on the network may need to coordinate access
  - this could lead to multi-computer deadlocks

Locks as Resources
• we explored locks in the previous section
• locks (mutexes) are also a form of resource
  - a mutex can be acquired (locked) and released
  - a locked mutex belongs to a particular thread
• locks are proxy (stand-in) resources

Preemptable Resources
• sometimes, held resources can be taken away
• this is the case with e.g. physical memory
  - a process can be swapped to disk if need be
• preemptability may also depend on context
  - maybe paging is not available

Non-preemptable Resources
• those resources cannot be (easily) taken away
• think photo printer in the middle of a page
• or a DVD burner in the middle of writing
• non-preemptable resources can cause deadlocks

Resource Acquisition
• a process needs to request access to a resource
• this is called an acquisition
• when the request is granted, it can use the device
• after it is done, it must release the device
  - this makes it available for other processes

Waiting
• what to do if we wish to acquire a busy resource?
• unless we don't really need it, we have to wait
• this is the same as waiting for a mutex
• the thread is moved to a wait queue

Resource Deadlock
• two resources, A and B
• two processes, P and Q
• P acquires A, Q acquires B
• P tries to acquire B but has to wait for Q
• Q tries to acquire A but has to wait for P

Deadlock Conditions
1. mutual exclusion
2. hold and wait condition
3. non-preemptability
4. circular wait
Deadlock is only possible if all 4 are present.

Non-Resource Deadlocks
• not all deadlocks are due to resource contention
• imagine a message-passing system
• process A is waiting for a message
• process B sends a message to A and waits for reply
• the message is lost in transit

Example: Pipe Deadlock
• recall that both the reader and writer can block
• what if we create a pipe in each direction?
• process A writes data and tries to read a reply
  - it blocks because the opposite pipe is empty
• process B reads the data but waits for more, hence deadlock

Deadlocks: Do We Care?
• deadlocks can be very hard to debug
• they can also be exceedingly rare
• we may find the risk of a deadlock acceptable
• just reboot everything if we hit a deadlock
  - also known as the ostrich algorithm

Deadlock Detection
• we can at least try to detect deadlocks
• usually by checking the circular wait condition
• keep a graph of who owns what and who waits for what
• if there is a loop in the graph, there is a deadlock

Deadlock Recovery
• if a preemptable resource is involved, reassign it
• otherwise, it may be possible to do a rollback
  - this needs elaborate checkpointing mechanisms
• all else failing, kill some of the processes
  - the devices may need to be re-initialised

Deadlock Avoidance
• we can possibly deny acquisitions to avoid deadlocks
• we need to know the maximum resources for each process
• avoidance relies on safe states
  - worst case: all processes ask for maximum resources
  - safe means we can avoid a deadlock in the worst case

Deadlock Prevention
• deadlock avoidance is typically impractical
• there are 4 conditions for deadlocks to exist
• we can try attacking those conditions
• if we can remove one of them, deadlocks are prevented

Prevention via Spooling
• this attacks the mutual exclusion property
• multiple programs could write to a printer
• the data is collected by a spooling daemon
• which then sends the jobs to the printer in sequence

Prevention via Reservation
• we can also try removing hold-and-wait
• for instance, we can only allow batch acquisition
  - the process must request everything at once
  - this is usually impractical
• alternative: release and re-acquire

Prevention via Ordering
• this approach
eliminates circular waits
• we impose a global order on resources
• a process can only acquire resources in this order
  - must release + re-acquire if the order is wrong
• it is impossible to form a cycle this way

Livelock
• in a deadlock, no progress can be made
• but it's not much better if processes go back and forth
  - for instance releasing and re-acquiring resources
  - they make no useful progress
  - they additionally consume resources
• this is known as livelock and is just as bad as a deadlock

Starvation
• starvation happens when a process can't make progress
• generalisation of both deadlock and livelock
• for instance, unfair scheduling on a busy system
• also recall the readers and writers problem

Review Questions
21. What is a mutex?
22. What is a deadlock?
23. What are the conditions for a deadlock to form?
24. What is a race condition?

Part 8: Device Drivers

Lecture Overview
1. Drivers, IO and Interrupts
2. System and Expansion Busses
3. Graphics
4. Persistent Storage
5. Networking and Wireless

Part 8.1: Drivers, IO and Interrupts

Input and Output
• we will mostly think in terms of IO
• peripherals produce and consume data
• input: reading data produced by a device
• output: sending data to a device

What is a Driver?
• piece of software that talks to a device
• usually quite specific / unportable
  - tied to the particular device
  - and also to the operating system
• often part of the kernel

Kernel-mode Drivers
• they are part of the kernel
• running with full kernel privileges
  - including unrestricted hardware access
• no or minimal context switching overhead
  - fast but dangerous

Microkernels
• drivers are excluded from microkernels
• but the driver still needs hardware access
  - this could be a special memory region
  - it may need to react to interrupts
• in principle, everything can be done indirectly
  - but this may be quite expensive, too

User-mode Drivers
• many drivers can run completely in user space
• this improves robustness and security
  - driver bugs can't bring the entire system down
  - nor can they compromise system security
• possibly at some cost to performance

Drivers in Processes
• user-mode drivers typically run in their own process
• this means context switches
  - every time the device demands attention (interrupt)
  - every time another process wants to use the device
• the driver needs system calls to talk to the device
  - this incurs even more overhead

In-Process Drivers
• what if (a large portion of) a driver could be a library?
• best of both worlds
  - no context switch overhead for requests
  - bugs and security problems remain isolated
• often used for GPU-accelerated 3D graphics

Port-Mapped IO
• early CPUs had very limited address space
  - 16-bit addresses mean 64KB of memory
• peripherals got a separate address space
• special instructions for using those addresses
  - e.g.
in and out on x86 processors

Memory-mapped IO
• devices share address space with memory
• more common in contemporary systems
• IO uses the same instructions as memory access
  - load and store on RISC, mov on x86
• allows selective user-level access (via the MMU)

Programmed IO
• input or output is driven by the CPU
• the CPU must wait until the device is ready
• would usually run at bus speed
  - 8 MHz for ISA (and hence ATA-1)
• PIO would talk to a buffer on the device

Interrupt-driven IO
• peripherals are much slower than the CPU
  - polling the device is expensive
• the peripheral can signal data availability
  - and also readiness to accept more data
• this frees up CPU to do other work in the meantime

Interrupt Handlers
• also known as first-level interrupt handler
• they must run in privileged mode
  - they are part of the kernel by definition
• the low-level interrupt handler must finish quickly
  - it will mask its own interrupt to avoid re-entering
  - and schedule any long-running jobs for later (SLIH)

Second-level Handler
• does any expensive interrupt-related processing
• can be executed by a kernel thread
  - but also by a user-mode driver
• usually not time critical (unlike first-level handler)
  - can use standard locking mechanisms

Direct Memory Access
• allows the device to directly read/write memory
• this is a huge improvement over programmed IO
• interrupts only indicate buffer full/empty
• the device can read and write arbitrary physical memory
  - opens up security / reliability problems

IO-MMU
• like the MMU, but for DMA transfers
• allows the OS to limit memory access per device
• very useful in virtualisation
• only recently found its way into consumer computers
Part 8.2: System and Expansion Busses

History: ISA (Industry Standard Architecture)
• 16-bit system expansion bus on IBM PC/AT
• programmed IO and interrupts (but no DMA)
• a fixed number of hardware-configured interrupt lines
  - likewise for I/O port ranges
  - the HW settings then need to be typed back for SW
• parallel data and address transmission

MCA, EISA
• MCA: Micro Channel Architecture
  - proprietary to IBM, patent-encumbered
  - 32-bit, software-driven device configuration
  - expensive and ultimately a market failure
• EISA: Enhanced ISA
  - a 32-bit extension of ISA
  - mostly created to avoid MCA licensing costs
  - short-lived and replaced by PCI

VESA Local Bus
• memory mapped IO & DMA on otherwise ISA systems
• tied to the 80486 line of Intel CPUs (and AMD clones)
• primarily for graphics cards
  - but also used with hard drives
• quickly fell out of use with the arrival of PCI

PCI: Peripheral Component Interconnect
• a 32-bit successor to ISA
  - 33 MHz (compared to 8 MHz for ISA)
  - later revisions at 66 MHz, PCI-X at 133 MHz
  - added support for bus-mastering and DMA
• still a shared, parallel bus
  - all devices share the same set of wires

Bus Mastering
• normally, the CPU is the bus master
  - which means it initiates communication
• it's possible to have multiple masters
  - they need to agree on a conflict resolution protocol
• usually used for accessing the memory

DMA (Direct Memory Access)
• the most common form of bus mastering
• the CPU tells the device what and where to write
• the device then sends data directly to RAM
  - the CPU can work on other things in the meantime
  - completion is signaled via an interrupt

Plug and Play
• the ISA system for IRQ configuration was messy
• MCA pioneered
software-configured devices
• PCI further improved on MCA with "Plug and Play"
  - each PCI device has an ID it can tell the system
  - allows for enumeration and automatic configuration

PCI IDs and Drivers
• PCI allows for device enumeration
• device identifiers can be paired to device drivers
• this allows the OS to load and configure its drivers
  - or even download / install drivers from a vendor

AGP: Accelerated Graphics Port
• PCI eventually became too slow for GPUs
  - AGP is based on PCI and only improves performance
  - enumeration and configuration stays the same
• adds a dedicated point-to-point connection
• multiple transfers per clock (up to 8, for 2 GB/s)

PCI Express
• the current high-speed peripheral bus for PC
• builds on / extends conventional PCI
• point-to-point, serial data interconnect
• much improved throughput (up to ~30GB/s)

USB: Universal Serial Bus
• primarily for external peripherals
  - keyboards, mice, printers, ...
  - replaced a host of legacy ports
• later revisions allow high-speed transfers
  - suitable for storage devices, cameras &c.
• device enumeration, capability negotiation

USB Classes
• a set of vendor-neutral protocols
• HID = human-interface device
• mass storage = disk-like devices
• audio equipment
• printing

Other USB Uses
• ethernet adapters
• usb-serial adapters
• wifi adapters (dongles)
  - there isn't a universal protocol
  - each USB WiFi adapter needs its own driver
• bluetooth

ARM Busses
• ARM is typically used in System-on-a-Chip designs
• those use a proprietary bus to connect peripherals
• there is less need for enumeration
  - the entire system is baked into a single chip
• the peripherals can be pre-configured

USB and PCIe on ARM
• neither USB nor PCIe is exclusive to the PC platform
• most ARM SoCs support USB devices
  - for slow and medium-speed off-SoC devices
  - e.g. used for ethernet on RPi 1
• some ARM SoCs support PCI Express
  - this allows for high-speed off-SoC peripherals

PCMCIA & PC Card
• People Can't Memorize Computer Industry Acronyms
  - PC = Personal Computer, MC = Memory Card
• hotplug-capable notebook expansion bus
• used for memory cards, network adapters, modems
• comes with its own set of drivers (cardbus)

ExpressCard
• an expansion card standard like PCMCIA / PC Card
• based on PCIe and USB
  - can mostly re-use drivers for those standards
• not in wide use anymore
  - last update was in 2009, introducing USB 3 support
  - the industry association disbanded the same year

miniPCIe, mSATA, M.2
• those are physical interfaces, not special busses
• they provide some mix of PCIe, SATA and USB
  - also other protocols like I2C, SMBus, ...
• used mainly for compact SSDs and wireless
  - also GPS, NFC, bluetooth, ...
Part 8.3: Graphics and GPUs

Graphics Cards
• initially just a device to drive displays
• reads pixels from memory and provides display signal
  - basically a DAC with a clock
  - the memory can be part of the graphics card
• evolved acceleration capabilities

Graphics Accelerator
• allows common operations to be done in hardware
• like drawing lines or filled polygons
• the pixels are computed directly in video RAM
• this can save considerable CPU time

3D Graphics
• rendering 3D scenes is computationally intensive
• CPU-based, software-only rendering is possible
  - texture-less in early flight simulators
  - bitmap textures since '95 / '96 (Descent, Quake)
• CAD workstations had 3D accelerators (OpenGL '92)

GPU (Graphics Processing Unit)
• a term coined by nVidia near the end of the '90s
• originally a purpose-built hardware renderer
  - based on polygonal meshes and Z buffering
• increasingly more flexible and programmable
• on-board RAM, high-speed connection to system RAM

GPU Drivers
• split into a number of components
• graphics output / frame buffer access
• memory management is often done in kernel
• geometry, textures &c. are prepared in-process
• front end API: OpenGL, Direct3D, Vulkan, ...

Shaders
• current GPUs are computation devices
• the GPU has its own machine code for shaders
• the GPU driver contains a shader compiler
  - either all the way from a high level language (HLSL)
  - or starting with an intermediate code (SPIR)

Mode Setting
• this part deals with screen configuration and resolution
• including support for e.g.
multiple displays
• usually also supports primitive (SW-only) framebuffer
• often done in the kernel with minimal user-level support

Graphics Servers
• multiple apps cannot all drive the graphics card
  — the graphics hardware needs to be shared
  — one option is a graphics server
• provides an IPC-based drawing and/or windowing API
• performs painting on behalf of the applications

Compositors
• a more direct way to share graphics cards
• each application gets its own buffer to paint into
• painting is mostly done by a [context-switched] GPU
• the individual buffers are then composed onto screen
  — composition is also hardware-accelerated

GP-GPU
• general-purpose GPU (CUDA, OpenCL, ...)
• used for computation instead of just graphics
• basically a return of vector processors
• close to CPUs but not part of normal OS scheduling

Part 8.4: Persistent Storage

Drivers
• split into adapter, bus and device drivers
• often a single driver per device type
  — at least for disk drives and CD-ROMs
• bus enumeration and configuration
• data addressing and data transfers

IDE / ATA
• Integrated Drive Electronics
  — disk controller becomes part of the disk
  — standardised as ATA-1 (AT Attachment...)
• based on the ISA bus, but with cables
• later adapted for non-disk use via ATAPI

ATA Enumeration
• each ATA interface can attach only 2 drives
  — the drives are HW-configured as master/slave
  — this makes enumeration quite simple
• multiple ATA interfaces were standard
• no need for specific HDD drivers

PIO vs DMA
• original IDE could only use programmed IO
• this eventually became a serious bottleneck
• later ATA revisions include DMA modes
  — up to 160MB/s with highest DMA modes
  — compare 1900MB/s for SATA 3.2

SATA
• serial, point-to-point replacement for ATA
• hardware-level incompatible with (parallel) ATA
  — but SATA inherited the ATA command set
  — legacy mode allows PATA drivers to talk to SATA drives
• hot-swap capable - replace drives in a running system

AHCI (Advanced Host Controller Interface)
• vendor-neutral interface to SATA controllers
  — in theory only a single 'AHCI' driver is needed
• an alternative to 'legacy mode'
• NCQ = Native Command Queuing
  — allows the drive to re-order requests
  — another layer of IO scheduling

ATA and SATA Drivers
• the host controller (adapter) is mostly vendor-neutral
• the bus driver will expose the ATA command set
  — including support for command queuing
• device driver uses the bus driver to talk to devices
• partially re-uses SCSI drivers for ATAPI &c.

SCSI (Small Computer System Interface)
• originated with minicomputers in the 80's
• more complicated and capable than ATA
  — ATAPI basically encapsulates SCSI over ATA
• device enumeration, including aggregates
  — e.g. entire enclosures with many drives
• also allows CD-ROM, tapes, scanners (!)
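
Native Command Queuing from the AHCI slide lets the drive pick the order in which queued requests are served. The sketch below only illustrates why that can help on rotating disks: it compares arrival order with a single elevator-style sweep. The block numbers, the head position and all function names are made up for this example, and real drive firmware is free to use a different policy.

```python
def fifo_order(requests):
    """Serve requests in arrival order (no command queuing)."""
    return list(requests)


def ncq_order(requests, head):
    """One possible reordering: a single elevator-style sweep.

    Blocks at or beyond the current head position are served first in
    ascending order, the remaining ones afterwards in descending order.
    This policy is an assumption made for illustration only.
    """
    ahead = sorted(r for r in requests if r >= head)
    behind = sorted((r for r in requests if r < head), reverse=True)
    return ahead + behind


def total_travel(order, head):
    """Sum of the distances the head moves while serving `order`."""
    distance = 0
    for block in order:
        distance += abs(block - head)
        head = block
    return distance


queue = [900, 100, 850, 150, 800]   # hypothetical block addresses
start = 500                         # hypothetical head position

print(total_travel(fifo_order(queue), start))        # arrival order
print(total_travel(ncq_order(queue, start), start))  # reordered
```

With these made-up numbers the head travels 3300 blocks in arrival order and 1200 blocks after reordering.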
SCSI Drivers
• split into a host bus adapter [HBA] driver
• a generic SCSI bus and command component
  — often re-used in both ATAPI and USB storage
• and per-device or per-class drivers
  — optical drives, tapes, CD/DVD-ROM
  — standard disk and SSD drives

iSCSI
• basically SCSI over TCP/IP
• entirely software-based
• allows standard computers to serve as block storage
• takes advantage of fast cheap ethernet
• re-uses most of the SCSI driver stack

NVMe: Non-Volatile Memory Express
• a fairly simple protocol for PCIe-attached storage
• optimised for SSD-based devices
  — much bigger and more command queues than AHCI
  — better / faster interrupt handling
• stresses concurrency in the kernel block layer

USB Mass Storage
• a USB device class (vendor-neutral protocol)
  — one driver for the entire class
• typically USB flash drives, but also external disks
• USB 2 is not suitable for high-speed storage - USB 3 introduced UAS = USB-Attached SCSI

Tape Drives
• unlike disk drives, only allow sequential access
• needs support for media ejection, rewinding
• can be attached with SCSI, SATA, USB
• parts of the driver will be bus-neutral
• mainly for data backup, capacities 6-15TB

Optical Drives
• mainly used as a read-only distribution medium
• laser-facilitated reading of a rotating disc
• can be again attached to SCSI, SATA or USB
• conceived for audio playback, so seeks are very slow

Optical Disk Writers (Burners)
• behaves more like a printer for optical disks
• drivers are often done in user space
• attached by one of the standard disk busses
• special programs required to burn disks
  — alternative: packet-writing drivers

Part 8.5: Networking and Wireless

Networking
• networks allow multiple computers to exchange data
  — this could be files, streams or messages
• there are wired and wireless networks
• we will only deal with the lowest layers for now
• NIC = Network Interface Card

Ethernet
• specifies the physical media
• on-wire format and collision resolution
• in modern setups, mostly point-to-point links
  — using active packet switching devices
• transmits data in frames (low-level packets)

Addressing
• at this level, only local addressing
  — at most a single LAN segment
• uses baked-in MAC addresses
  — MAC = Media Access Control
• addresses belong to interfaces, not computers

Transmit Queue
• packets are picked up from memory
• the OS prepares packets into the transmit queue
• the device picks them up asynchronously
• similar to how SATA queues commands and data

Receive Queue
• data is also queued in the other direction
• the NIC copies packets into a receive queue
• it raises an interrupt to tell the OS about new items
  — the NIC may batch multiple packets per interrupt
• if the queue is not cleared quickly, packets are lost

Multi-Queue Adapters
• fast adapters can saturate a CPU
  — e.g.
10GbE cards, or multi-port GbE
• these NICs can manage multiple RX and TX queues
  — each queue gets its own interrupt
  — different queues can be handled by different CPU cores

Checksum and TCP Offloading
• more advanced adapters can offload certain features
• commonly computation of mandatory packet checksums
• but also TCP-related features
• this needs both driver support and TCP/IP stack support

WiFi
• wireless network interface - "wireless ethernet"
• shared medium - electromagnetic waves in air
• (almost) mandatory encryption
  — otherwise easy to eavesdrop or even actively attack
• a very complex protocol (relative to hardware standards)
  — assisted by firmware running on the adapter

Bluetooth
• a wireless alternative to USB
• allows short-distance radio links with peripherals
  — input (keyboard, mice, game controllers)
  — audio (headsets, speakers)
  — data transmission (e.g. smartphone sync)
  — gadgets (watches, heartrate monitoring, GPS, ...)

Review Questions
25. What is memory-mapped IO and DMA?
26. What is a system bus?
27. What is a graphics accelerator?
28. What is a NIC receive queue?

Part 9: Network Stack

Lecture Overview
1. Networking Intro
2. The TCP/IP Stack
3. Using Networks
4. Network File Systems

Part 9.1: Networking Intro

Host and Domain Names
• hostname = human readable computer name
• hierarchical system, little-endian: www.fi.muni.
cz
• FQDN = fully-qualified domain name
• the local suffix may be omitted (ping aisa)

Network Addresses
• address = machine-friendly and numeric
• IPv4 address: 4 octets (bytes): 192.168.1.1
• IPv6 address: 16 octets
• Ethernet (MAC): 6 octets, c8:5b:76:bd:6e:0b

Network Types
• LAN = Local Area Network
  — Ethernet: wired, up to 10Gb/s
  — WiFi (802.11): wireless, up to 1Gb/s
• WAN = Wide Area Network (the Internet)
  — PSTN, xDSL, PPPoE
  — GSM, 2G (GPRS, EDGE), 3G (UMTS), 4G (LTE)
  — also LAN technologies - Ethernet, WiFi

Networking Layers
2. Link (Ethernet, WiFi)
3. Network (IP)
4. Transport (TCP, UDP, ...)
7. Application (HTTP, SMTP, ...)

Networking and Operating Systems
• a network stack is a standard part of an OS
• large part of the stack lives in the kernel
  — although this only applies to monolithic kernels
  — microkernels use user-space networking
• another chunk is in system libraries & utilities

Kernel-Side Networking
• device drivers for networking hardware
• network and transport protocol layers
• routing and packet filtering (firewalls)
• networking-related system calls (sockets)
• network file systems (SMB, NFS)

System Libraries
• the socket and related APIs
• host name resolution (a DNS client)
• encryption and data authentication (SSL, TLS)
• certificate handling and validation

System Utilities
• network configuration (ifconfig)
• diagnostics (ping, traceroute)
• packet logging and inspection (tcpdump)
• route management (route, bgpd)

Networking Aspects
• packet format
  — what are the units of communication
• addressing
  — how are the sender and recipient named
• packet delivery
  — how a message is delivered

Protocol Nesting
• protocols run on top of each other
• this is why it is called a network stack
• higher levels make use of the lower levels
  — HTTP uses abstractions provided by TCP
  — TCP uses abstractions provided by IP

Packet Nesting
• higher-level packets are just data to the lower level
• an Ethernet frame can carry an IP packet in it
• the IP packet can carry a TCP packet
• the TCP packet can carry an HTTP request

Stacked Delivery
• delivery is, in the abstract, point-to-point
  — routing is mostly hidden from upper layers
  — the upper layer requests delivery to an address
• lower-layer protocols are usually packet-oriented
  — packet size mismatches can cause fragmentation
• a packet can pass through different low-level domains

Layers vs Addressing
• not as straightforward as packet nesting
  — address relationships are tricky
• special protocols exist to translate addresses
  — DNS for hostname vs IP address mapping
  — ARP for IP vs MAC address mapping

ARP (Address Resolution Protocol)
• finds the MAC that corresponds to an IP
• required to allow packet delivery
  — IP uses the link layer to deliver its packets
  — the link layer must be given a MAC address
• the OS builds a map of IP-to-MAC translations

Ethernet
• link-level communication protocol
• largely implemented in hardware
• the OS uses a well-defined interface
  — packet receive and submit
  — using MAC addresses (ARP is part of the OS)

Packet Switching
• shared media are inefficient due to collisions
• ethernet is typically packet switched
  — a switch is usually a hardware device
  — but also in software (usually for virtualisation)
  — physical connections form a star topology

Bridging
• bridges operate at the link layer (layer 2)
• a
bridge is a two-port device
  — each port is connected to a different LAN
  — the bridge joins the LANs by forwarding frames
• can be done in hardware or software
  — brctl on Linux, ifconfig on OpenBSD

Tunneling
• tunnels are virtual layer 2 or 3 devices
• they encapsulate traffic using a higher-level protocol
• tunneling is used to implement Virtual Private Networks
  — a software bridge can operate over a UDP tunnel
  — the tunnel is usually encrypted

PPP (Point-to-Point Protocol)
• a link-layer protocol for 2-node networks
• available over many physical connections
  — phone lines, cellular connections, DSL, Ethernet
  — often used to connect endpoints to the ISP
• supported by most operating systems
  — split between the kernel and system utilities

Wireless
• WiFi is mostly like (slow, unreliable) Ethernet
• needs encryption since anyone can listen
• also authentication to prevent rogue connections - PSK (pre-shared key), EAP / 802.1X
• encryption needs key management

Part 9.2: The TCP/IP Stack

IP (Internet Protocol)
• uses 4 byte (v4) or 16 byte (v6) addresses
  — split into network and host parts
• it is a packet-based protocol
• is a best-effort protocol
  — packets may get lost, reordered or corrupted

IP Networks
• IP networks roughly correspond to LANs
  — hosts on the same network are located with ARP
  — remote networks are reached via routers
• a netmask splits the address into network/host parts
• IP typically runs on top of Ethernet or PPP

Routing
• routers forward packets between networks
• somewhat like bridges but layer 3
• routers act as normal LAN endpoints
  — but represent entire remote IP networks
  — or even the entire Internet

Services and TCP/UDP Port Numbers
• networks are generally
used to provide services
  — each computer can host multiple
• different services can run on different ports
• port is a 16-bit number and some are given names
  — port 25 is SMTP, port 80 is HTTP, ...

ICMP: Internet Control Message Protocol
• control messages (packets)
  — destination host/network unreachable
  — time to live exceeded
  — fragmentation required
• diagnostic packets, e.g. the ping command
  — echo request and echo reply
  — combine with TTL for traceroute

TCP: Transmission Control Protocol
• a stream-oriented protocol on top of IP
• works like a pipe (transfers a byte sequence)
  — must respect delivery order
  — and also re-transmit lost packets
• must establish connections

TCP Connections
• the endpoints must establish a connection first
• each connection serves as a separate data stream
• a connection is bidirectional
• TCP uses a 3-way handshake: SYN, SYN/ACK, ACK

Sequence Numbers
• TCP packets carry sequence numbers
• these numbers are used to re-assemble the stream
  — IP packets can arrive out of order
• they are also used to acknowledge reception
  — and subsequently to manage re-transmission

Packet Loss and Re-transmission
• packets can get lost for a variety of reasons
  — a link goes down for an extended period of time
  — buffer overruns on routing equipment
• TCP sends acknowledgments for received packets
  — the ACKs use sequence numbers to identify packets

UDP: User (Unreliable) Datagram Protocol
• TCP comes with non-trivial overhead
  — and its guarantees are not always required
• UDP is a much simpler protocol
  — a very thin wrapper around IP
  — with minimal overhead on top of IP

Name Resolution
• users do not want to remember numeric addresses
  — phone numbers are bad enough
• host names
are used instead
• can be stored in a file, e.g. /etc/hosts
  — not very practical for more than 3 computers
  — but there are millions of computers on the Internet

DNS: Domain Name System
• hierarchical protocol for name resolution
  — runs on top of TCP or UDP
• domain names are split into parts using dots
  — each domain knows whom to ask for the next bit
  — the name database is effectively distributed

DNS Recursion
• take www.fi.muni.cz. as an example domain
• resolution starts from the right at root servers
  — the root servers refer us to the cz. servers
  — the cz. servers refer us to muni.cz.
  — finally muni.cz. tells us about fi.muni.cz.

DNS Recursion Example
$ dig www.fi.muni.cz. A +trace
.               IN NS j.root-servers.net.
cz.             IN NS b.ns.nic.cz.
muni.cz.        IN NS ns.muni.cz.
fi.muni.cz.     IN NS aisa.fi.muni.cz.
www.fi.muni.cz. IN A  147.251.48.1

DNS Record Types
• A is for (IP) Address
• AAAA is for an IPv6 Address
• CNAME is for an alias
• MX is for mail servers
• and many more

Firewalls
• the name comes from building construction
  — a fire-proof barrier between parts of a building
• the idea is to separate networks from each other
  — making attacks harder from the outside
  — limiting damage in case of compromise

Packet Filtering
• packet filtering is how firewalls are usually implemented
• can be done on a router or at an endpoint
• dedicated routers + packet filters are more secure
  — a single such firewall protects the entire network
  — less opportunity for mis-configuration

Packet Filter Operation
• packet filters operate on a set of rules
  — the rules are generally operator-provided
• each incoming packet is classified using the rules
• and then dispatched accordingly
  — may be forwarded, dropped, rejected or edited
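
The rule-based classification described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how any real packet filter is implemented: the Packet and Rule classes, the RULES list and the first-match policy are all hypothetical, and real filters differ in rule syntax and in which matching rule wins.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass
class Packet:
    src: str                      # source IP address
    dst: str                      # destination IP address
    proto: str                    # "tcp", "udp" or "icmp"
    dport: Optional[int] = None   # destination port, if any


@dataclass
class Rule:
    action: str                   # "forward", "drop" or "reject"
    src_net: str = "0.0.0.0/0"    # default: any IPv4 source
    proto: Optional[str] = None   # None matches any protocol
    dport: Optional[int] = None   # None matches any port

    def matches(self, pkt):
        """True if every criterion set on this rule agrees with the packet."""
        if ipaddress.ip_address(pkt.src) not in ipaddress.ip_network(self.src_net):
            return False
        if self.proto is not None and self.proto != pkt.proto:
            return False
        if self.dport is not None and self.dport != pkt.dport:
            return False
        return True


def classify(rules, pkt, default="drop"):
    """Return the action of the first matching rule (assumed policy)."""
    for rule in rules:
        if rule.matches(pkt):
            return rule.action
    return default


# Hypothetical operator-provided rule set.
RULES = [
    Rule("forward", proto="tcp", dport=80),                            # web for all
    Rule("forward", src_net="192.168.1.0/24", proto="tcp", dport=22),  # ssh from LAN
    Rule("reject", proto="tcp", dport=22),                             # ssh from elsewhere
]

print(classify(RULES, Packet("203.0.113.7", "192.168.1.10", "tcp", 22)))
```

Packets that match no rule fall through to the default action, which is set to drop here; that default is also a choice made for this sketch.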
Packet Filter Examples
• packet filters are often part of the kernel
• the rule parser is a system utility
  — it loads rules from a configuration file
  — and sets up the kernel-side filter
• there are multiple implementations
  — iptables, nftables in Linux
  — pf in OpenBSD, ipfw in FreeBSD

Part 9.3: Using Networks

Sockets Reminder
• the socket API comes from early BSD Unix
• socket represents a (possible) network connection
• you get a file descriptor for an open socket
• you can read() and write() to sockets
  — but also sendmsg() and recvmsg()

Socket Types
• sockets can be internet or unix domain
  — internet sockets work across networks
• stream sockets are like files
  — you can write a continuous stream of data
  — usually implemented using TCP
• datagram sockets send individual messages
  — usually implemented using UDP

Creating Sockets
• a socket is created using the socket() function
• it can be turned into a server using listen()
  — individual connections are established with accept()
• or into a client using connect()

Resolver API
• libc contains a resolver
  — available as gethostbyname (and gethostbyname2)
  — also gethostbyaddr for reverse lookups
• can look in many different places
  — most systems support at least /etc/hosts
  — and DNS-based lookups

Network Services
• servers listen on a socket for incoming connections
  — a client actively establishes a connection to a server
• the network simply transfers data between them
• interpretation of the data is a layer 7 issue
  — could be commands, file transfers, ...
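
A minimal sketch of the calls from the Creating Sockets slide, using Python's socket module, which wraps the same BSD API. The helper names start_echo_server and echo_client are invented for this example; it runs over the loopback interface and handles a single connection, which a real server would not do.

```python
import socket
import threading


def start_echo_server(host="127.0.0.1"):
    """Create a listening stream socket and echo one connection in a thread.

    Returns the port number chosen by the OS (binding to port 0 asks for
    any free port).
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # socket()
    server.bind((host, 0))
    server.listen(1)                                            # listen()
    port = server.getsockname()[1]

    def serve_once():
        conn, _peer = server.accept()                           # accept()
        with conn:
            while True:
                data = conn.recv(4096)
                if not data:        # empty read: the client closed its side
                    break
                conn.sendall(data)  # send the same bytes back
        server.close()

    threading.Thread(target=serve_once, daemon=True).start()
    return port


def echo_client(port, payload, host="127.0.0.1"):
    """Connect to the server, send payload, return everything sent back."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
        client.connect((host, port))                            # connect()
        client.sendall(payload)
        client.shutdown(socket.SHUT_WR)  # signal the end of our byte stream
        chunks = []
        while True:
            data = client.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


if __name__ == "__main__":
    port = start_echo_server()
    print(echo_client(port, b"hello over a stream socket"))
```

The server only ever sees a byte stream; deciding that the bytes should be echoed back is the layer 7 part, and it lives entirely in serve_once.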
Network Service Examples
• (secure) remote shell - sshd
• the internet email suite
  — MTA = Mail Transfer Agent, speaks SMTP
  — SMTP = Simple Mail Transfer Protocol
• the world wide web
  — web servers provide content (files)
  — clients and servers speak HTTP and HTTPS

Client Software
• the ssh command uses the SSH protocol
  — a very useful system utility on virtually all UNIXes
• web browser is the client for world wide web
  — browsers are complex application programs
  — some of them bigger than even operating systems
• email client is also known as a MUA (Mail User Agent)

Part 9.4: Network File Systems

Why Network Filesystems?
• copying files back and forth is impractical
  — and also error-prone (which is the latest version?)
• how about storing data in a central location
• and sharing it with all the computers on the LAN

NAS (Network-Attached Storage)
• a (small) computer dedicated to storing files
• usually running a cut down operating system
  — often based on Linux or FreeBSD
• provides file access to the network
• sometimes additional app-level services
  — e.g. photo management, media streaming, ...

NFS (Network File System)
• the traditional UNIX networked filesystem
• hooked quite deep into the kernel
  — assumes generally reliable network (LAN)
• filesystems are exported for use over NFS
• the client side mounts the NFS-exported volume

NFS History
• originated in Sun Microsystems in the 80s
• v2 implemented in System V, DOS, ...
• v3 appeared in '95 and is still in use
• v4 arrived in 2000, improving security

VFS Reminder
• implementation mechanism for multiple FS types
• an object-oriented approach
  — open: look up the file for access
  — read, write: self-explanatory
  — rename: rename a file or directory

RPC (Remote Procedure Call)
• any protocol for calling functions on remote hosts
  — ONC-RPC = Open Network Computing RPC
  — NFS is based on ONC-RPC (also known as Sun RPC)
• NFS basically runs VFS operations using RPC
  — this makes it easy to implement on UNIX-like systems

Port Mapper
• ONC-RPC is executed over TCP or UDP
  — but it is more dynamic wrt. available services
• TCP/UDP port numbers are assigned on demand
• portmap translates from RPC services to port numbers
  — the port mapper itself listens on port 111

The NFS Daemon
• also known as nfsd
• provides NFS access to a local file system
• can run as a system service
• or it can be part of the kernel
  — this is more typical for performance reasons

SMB (Server Message Block)
• a network file system from Microsoft
• available in Windows since version 3.1 (1992)
  — originally ran on top of NetBIOS
  — later versions used TCP/IP
• SMB1 accumulated a lot of cruft and complexity

SMB 2.0
• simpler than SMB1 due to fewer retrofits and compat
• better performance and security
• support for symbolic links
• available since Windows Vista (2006)

Review Questions
29. What is ARP (Address Resolution Protocol)?
30. What is IP (Internet Protocol)?
31. What is TCP (Transmission Control Protocol)?
32. What is DNS (Domain Name System)?

Part 10: Shells & User Interfaces

Lecture Overview
1. Command Interpreters
2. The Command Line
3.
Graphical Interfaces

Part 10.1: Command Interpreters

Shell
• programming language centered on OS interaction
• rudimentary control flow
• untyped, text-centered variables
• dubious error handling

Interactive Shells
• almost all shells have an interactive mode
• the user inputs a single statement on keyboard
• when confirmed, it is immediately executed
• this forms the basis of command-line interfaces

Shell Scripts
• a shell script is an (executable) file
• in simplest form, it is a sequence of commands
  — each command goes on a separate line
  — executing a script is about the same as typing it
• but can use structured programming constructs

Shell Upsides
• very easy to write simple scripts
• first choice for simple automation
• often useful to save repetitive typing
• definitely not good for big programs

Bourne Shell
• a specific language in the "shell" family
• the first shell with consistent programming support
  — available since 1976
• still widely used today
  — best known implementation is bash
  — /bin/sh is mandated by POSIX

C Shell
• also known as csh, first released in 1978
• more C-like syntax than sh (Bourne Shell)
  — but not really very C-like at all
• improved interactive mode (over sh from '76)
• also still used today (tcsh)

Korn Shell
• also known as ksh, released in 1983
• middle ground between sh and csh
• basis of the POSIX.2 requirements
• a number of implementations exists

Commands
• typically a name of an executable
  — may also be control flow or a built-in
• the executable is looked up in the filesystem
• the shell does a fork + exec
  — this means
new process for each command
  — process creation is fairly expensive

Built-in Commands
• cd - change the working directory
• export - for setting up environment
• echo - print a message
• exec - replace the shell process (no fork)

Variables
• variable names are made of letters and digits
• using variables is indicated with $
• setting variables does not use the $
• all variables are global (except subshells)
VARIABLE="some text"
echo $VARIABLE

Variable Substitution
• variables are substituted as text
• $foo is simply replaced with the content of foo
• arithmetic is not well supported in most shells
  — or any expression syntax, e.g. relational operators
  — consider z=$(($x + $y)) for addition in bash

Command Substitution
• basically like variable substitution
• written as `command` or $(command)
  — first executes the command
  — and captures its standard output
  — then replaces $(command) with the output

Quoting
• whitespace is an argument separator in shell
• multi-word arguments must be quoted
• quotes can be double quotes "" or single ''
  — double quotes allow variable substitution

Quoting and Substitution
• whitespace from variable substitution must be quoted
  — foo="hello world"
  — ls $foo is different than ls "$foo"
• bad quoting is a very common source of bugs
• consider also filenames with spaces in them

Special Variables
• $?
is the exit status of the last command
• $$ is the PID of the current shell
• $1 through $9 are positional parameters
  — $# is the number of parameters
• $0 is the name of the shell (argv[0])

Environment
• is like shell variables but not the same
• the environment is passed to all executed programs
  — but a child cannot modify environment of its parent
• variables are moved into the environment by export
• environment variables often act as settings

Important Environment Variables
• $PATH tells the system where to find programs
• $HOME is the home directory of the current user
• $EDITOR and $VISUAL set which text editor to use
• $EMAIL is the email address of the current user
• $PWD is the current working directory

Globbing
• patterns for quickly listing multiple files
• e.g. ls *.c shows all files ending in .c
• * matches any number of characters
• ? matches one arbitrary character
• works on entire paths (ls src/*/*.
c)

Conditionals
• allows conditional execution of commands
• if cond; then cmd1; else cmd2; fi
• also elif cond2; then cmd3; fi
• cond is also a command (the exit code is used)

test (evaluating boolean expressions)
• originally an external program, also known as [
  — nowadays built-in in most shells
  — works around lack of expressions in shell
• evaluates its arguments and returns true or false
  — can be used with if and while constructs

test Examples
• test file1 -nt file2 ('nt' = newer than)
• test 32 -gt 14 ('gt' = greater than)
• test foo = bar (string equality)
• combines with variable substitution (test $y = x)

Loops
• while cond; do cmd; done
  — cond is a command, like in if
• for i in 1 2 3 4; do cmd; done
  — allows globs: for f in *.c; do cmd; done
  — also command substitution
  — for f in `seq 1 10`; do cmd; done

Case Analysis
• selects a command based on pattern matching
• case $x in *.c) cc $x;; *) ls $x;; esac
  — yes, case really uses unbalanced parens
  — the ;; indicates end of a case

Command Chaining
• ; (semicolon): run two commands in sequence
• &&: run the second command if the first succeeded
• ||: run the second command if the first failed
• e.g.
compile and run: cc file.c && ./a.out

Pipes
• shells can run pipelines of commands
• cmd1 | cmd2 | cmd3
  — all commands are run in parallel
  — output of cmd1 becomes input of cmd2
  — output of cmd2 is processed by cmd3
echo hello world | sed -e s,hello,goodbye,

Functions
• you can also define functions in shell
• mostly a light-weight alternative to scripts
  — no need to export variables
  — but cannot be invoked by non-shell programs
• functions can also set variables

Part 10.2: The Command Line

Interactive Shell
• the shell displays a prompt and waits
• the user types in a command and hits enter
• the command is executed immediately
• output is printed to the terminal

Command Completion
• most shells let you use TAB to auto-complete
  — works at least for command names and file names
  — but "smart completion" is common
• interactive history: hit "up" to recall a command
  — also interactive history search, e.g. C-r in bash

Prompt
• the string printed when shell expects a command
• controlled by the PS1 environment variable
• usually shows at least your username and the hostname
• also: working directory, battery status, time, weather, ...
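
The pipeline from the Pipes slide can also be assembled by hand, which shows what the shell arranges behind the scenes. The sketch assumes a POSIX system with echo and sed on the PATH; run_pipeline is a hypothetical helper name, and subprocess.Popen stands in for the shell's own process creation.

```python
import subprocess


def run_pipeline(text):
    """Run the equivalent of: echo TEXT | sed -e s,hello,goodbye,"""
    # First stage: its standard output is connected to a pipe.
    echo = subprocess.Popen(["echo", text], stdout=subprocess.PIPE)
    # Second stage: reads from that pipe. Both processes exist at the
    # same time, which is what "run in parallel" means on the slide.
    sed = subprocess.Popen(
        ["sed", "-e", "s,hello,goodbye,"],
        stdin=echo.stdout,
        stdout=subprocess.PIPE,
    )
    # The parent drops its copy of the read end so that only sed holds it.
    echo.stdout.close()
    output, _ = sed.communicate()   # collect the output of the last stage
    echo.wait()                     # reap the first stage as well
    return output.decode()


if __name__ == "__main__":
    print(run_pipeline("hello world"), end="")
```

Each stage is a separate process, so the cost of process creation mentioned on the Commands slide is paid once per command in the pipeline.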
Job Control
• only one program can run in the foreground (terminal)
• but a running program can be suspended (C-z)
• and resumed in background (bg) or in foreground (fg)
• use & to run a command in background: ./spambot &

Terminal
• can print text and read text from a keyboard
• normally everything is printed on the last line
• the text could contain escape (control) sequences
  — for printing colourful text or clearing the screen
  — also for printing text at a specific coordinate

Full-Screen Terminal Apps
• applications can use the entire terminal screen
• a library abstracts away the low-level control sequences
  — the library is called ncurses for new curses
  — different terminals use different control sequences
• special characters exist to draw frames and separators

UNIX Text Editors
• sed - stream editor, non-interactive
• ed - line oriented, interactive
• vi - visual, screen oriented
• ex - line-oriented mode of vi

TUI: Text User Interface
• the program draws a 2D interface on a terminal
• these types of interfaces can be quite comfortable
• they are often easier to program than GUIs
• very low bandwidth requirements for remote use

Part 10.3: Graphical Interfaces

Windowing Systems
• each application runs in its own window
  — or possibly multiple windows
• multiple applications can be shown on screen
• windows can be moved around, resized &c.
  — facilitated by frames around window content
  — generally known as window management

Window-less Systems
• especially popular on smaller screens
• applications take the entire screen
  — give or take status or control widgets
• task switching via a dedicated screen

A GUI Stack
• graphics card driver, mode setting
• drawing/painting (usually hardware-accelerated)
• multiplexing (e.g. using windows)
• widgets: buttons, labels, lists, ...
• layout: what goes where on the screen

Well-known GUI Stacks
• Windows
• macOS, iOS
• X11
• Wayland
• Android

Portability
• GUI "toolkits" make portability easy
  — Qt, GTK, Swing, HTML5+CSS, ...
  — many of them run on all major platforms
• code portability is not the only issue
  — GUIs come with look and feel guidelines
  — portable applications may fail to fit

Text Rendering
• a surprisingly complex task
• unlike terminals, GUIs use variable pitch fonts
  — brings up issues like kerning
  — hard to predict pixel width of a line
• bad interaction with printing (cf.
WYSIWYG)

Bitmap Fonts
• characters are represented as pixel arrays
  — usually just black and white
• traditionally pixel-drawn by hand
  — very time consuming (many letters, sizes, variants)
• the result is sharp but jagged (not smooth)

Outline Fonts
• Type1, TrueType - based on splines
• they can be scaled to arbitrary pixel sizes
• same font can be used for screen and for print
• rasterisation is usually done in software

Hinting, Anti-Aliasing
• screens are low resolution devices
  — typical HD displays have DPI around 100
  — laser printers have DPI of 300 or more
• hinting: deform outlines to better fit a pixel grid
• anti-aliasing: smooth outlines using grayscale

X11 (X Window System)
• a traditional UNIX windowing system
• provides a C API (xlib)
• built-in network transparency (socket-based)
• core protocol version 11 from 1987

X11 Architecture
• X server provides graphics and input
• X client is an application that uses X
• a window manager is a (special) client
• a compositor is another special client

Remote Displays
• application is running on computer A
• the display is not the console of A
  — could be a dedicated graphical terminal
  — could be another computer on a LAN
  — or even across the internet

Remote Display Protocols
• one approach is pushing pixels
  - VNC (Virtual Network Computing)
• X11 uses a custom drawing protocol
• others use high-level abstractions
  - NeWS (PostScript-based)
  - HTML5 + JavaScript

VNC (Virtual Network Computing)
• sends compressed pixel data over the wire
  — can leverage regularities in pixel data
  — can send incremental updates
• and
input events in the other direction
• no support for peripherals or file sync

RDP (Remote Desktop Protocol)
• more sophisticated than VNC (but proprietary)
• can also send drawing commands over the wire
  — like X11, but using DirectX drawing
  — also allows remote OpenGL
• support for audio, remote USB &c.

SPICE (Simple Protocol for Indep. Computing Env.)
• open protocol somewhere between VNC and RDP
• can send OpenGL (but only over a local socket)
• two-way audio, USB, clipboard integration
• still mainly based on pushing (compressed) pixels

Remote Desktop Security
• the user needs to be authenticated over network
  — passwords are easy, biometric data less so
• the data stream should be encrypted
  — not part of the X11 or NeWS protocols
  — or even HTTP by default (used for HTML5/JS)

Review Questions
33. What is a shell?
34. What does variable substitution mean?
35. What is an environment variable?
36. What belongs in the GUI stack?

Part 11: Access Control

Lecture Overview
1. Multi-User Systems
2. File Systems
3.
Sub-user Granularity

Part 11.1: Multi-User Systems

Users
• originally a proxy for people
• currently a more general abstraction
• user is the unit of ownership
• many permissions are user-centered

Computer Sharing
• a computer is an (often costly) resource
• efficiency of use is a concern
  — a single user rarely exploits a computer fully
• data sharing makes access control a necessity

Ownership
• various objects in an OS can be owned
  — primarily files and processes
• the owner is typically whoever created the object
  — ownership can be transferred
  — usually at the impetus of the original owner

Process Ownership
• each process belongs to some user
• the process acts on behalf of the user
  — the process gets the same privilege as its owner
  — this both constrains and empowers the process
• processes are active participants

File Ownership
• each file also belongs to some user
• this gives rights to the user (or rather their processes)
  — they can read and write the file
  — they can change permissions or ownership
• files are passive participants

Access Control Models
• owners usually decide who can access their objects
  — this is known as discretionary access control
• in high-security environments, this is not allowed
  — known as mandatory access control
  — a central authority decides the policy

(Virtual) System Users
• users are a useful ownership abstraction
• various system services get their own "fake" users
• this allows them to own files and processes
• and also limit their access to the rest of the OS

Principle of Least Privilege
• entities should have minimum privilege required
  — applies to software components
  — but also to human users of the
system
• this limits the scope of mistakes
  — and also of security compromises

Privilege Separation
• different parts of a system need different privilege
• least privilege dictates splitting the system
  — components are isolated from each other
  — they are given only the rights they need
• components communicate using the simplest feasible IPC

Process Separation
• recall that each process runs in its own address space
  — but shared memory can be requested
• each user has a view of the filesystem
  — a lot more is shared by default in the filesystem
  — especially the namespace (directory hierarchy)

Access Control Policy
• there are 3 pieces of information
  — the subject (user)
  — the verb (what is to be done)
  — the object (the file or other resource)
• there are many ways to encode this information

Access Rights Subjects
• in a typical OS those are (possibly virtual) users
  — sub-user units are possible (e.g. programs)
  — roles and groups could also be subjects
• the subject must be named (names, identifiers)
  — easy on a single system, hard in a network

Access Rights Verbs
• the available "verbs" (actions) depend on object type
• a typical object would be a file
  — files can be read, written, executed
  — directories can be searched or listed or changed
• network connections can be established &c.

Access Rights Objects
• anything that can be manipulated by programs
  — although not everything is subject to access control
• could be files, directories, sockets, shared memory, ...
• object names depend on their type
  — file paths, i-node numbers, IP addresses, ...
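
The Access Control Policy slide above describes a policy as subject, verb and object, and notes that there are many ways to encode it. A minimal sketch of two such encodings follows. All names in it (Rule, POLICY, is_allowed, by_object, the users and the path) are made up for illustration and do not correspond to any real system's API.

```python
from typing import NamedTuple

class Rule(NamedTuple):
    subject: str  # who: a user, group or role
    verb: str     # what is to be done
    obj: str      # the file or other resource

# Encoding 1: a flat set of permitted (subject, verb, object) triples.
POLICY = {
    Rule("alice", "read", "/srv/report.txt"),
    Rule("alice", "write", "/srv/report.txt"),
    Rule("bob", "read", "/srv/report.txt"),
}

def is_allowed(subject, verb, obj):
    """Default deny: anything not listed in the policy is refused."""
    return Rule(subject, verb, obj) in POLICY

# Encoding 2: object-centric, i.e. the same rules grouped by object,
# which is roughly how a file system attaches permissions to files.
def by_object(policy):
    acl = {}
    for rule in policy:
        acl.setdefault(rule.obj, set()).add((rule.subject, rule.verb))
    return acl

print(is_allowed("bob", "write", "/srv/report.txt"))
print(by_object(POLICY))
```

The object-centric form makes "who can access this file?" a single lookup, while the flat form treats all three components symmetrically.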

Subjects in POSIX
• there are 2 types of subjects: users and groups
• each user can belong to multiple groups
• users are split into normal users and root
  — root is also known as the super-user

User Management
• the system needs a database of users
• in a network, user identities often need to be shared
• could be as simple as a text file
  — /etc/passwd and /etc/group on UNIX systems
• or as complex as a distributed database

User and Group Identifiers
• users and groups are represented as numbers
  — this improves efficiency of many operations
  — the numbers are called uid and gid
• those numbers are valid on a single computer
  — or at most, a local network

Changing Identities
• each process belongs to a particular user
• ownership is inherited across fork()
• super-user processes can use setuid()
• exec() can sometimes change a process owner

Login
• a super-user process manages user logins
• the user types their name and provides credentials
  — upon successful authentication, login calls fork()
  — the child calls setuid() to the user
  — and uses exec() to start a shell for the user

User Authentication
• the user needs to authenticate themselves
• passwords are the most commonly used method
  — the system needs to know the right password
  — user should be able to change their password
• biometric methods are also quite popular

Remote Login
• authentication over network is more complicated
• passwords are easiest, but not easy
  — encryption is needed to safely transmit passwords
  — along with computer authentication
• 2-factor authentication is a popular improvement

Computer Authentication
• how to ensure we send the password to the right party?
  — an attacker could impersonate our remote computer
• usually via asymmetric cryptography
  — a private key can be used to sign messages
  — the server will sign a message establishing its identity

2-factor Authentication
• 2 different types of authentication
  — harder to spoof both at the same time
• there are a few factors to pick from
  — something the user knows (password)
  — something the user has (keys)
  — what the user is (biometric)

Enforcement: Hardware
• all enforcement begins with the hardware
  — the CPU provides a privileged mode for the kernel
  — DMA memory and IO instructions are protected
• the MMU allows the kernel to isolate processes
  — and protect its own integrity

Enforcement: Kernel
• kernel uses hardware facilities to implement security
  — it stands between resources and processes
  — access is mediated through system calls
• file systems are part of the kernel
• user and group abstractions are part of the kernel

Enforcement: System Calls
• the kernel acts as an arbitrator
• a process is trapped in its own address space
• processes use system calls to access resources
  — kernel can decide what to allow
  — based on its access control model and policy

Enforcement: Service APIs
• userland processes can enforce access control
  — usually system services which provide IPC API
• e.g.
via the getpeereid() system call
  — tells the caller which user is connected to a socket
  — user-level access control is rooted in kernel facilities

Part 11.2: File Systems

File Access Rights
• file systems are a case study in access control
• all modern file systems maintain permissions
  — the only extant exception is FAT (USB sticks)
• different systems adopt different representation

Representation
• file systems are usually object-centric
  — permissions are attached to individual objects
  — easily answers "who can access this file"?
• there is a fixed set of verbs
  — those may be different for files and directories
  — different systems allow different verbs

The UNIX Model
• each file and directory has a single owner
• plus a single owning group
  — not limited to those the owner belongs to
• ownership and permissions are attached to i-nodes

Access vs Ownership
• POSIX ties ownership and access rights
• only 3 subjects can be named on a file
  — the owner (user)
  — the owning group
  — anyone else

Access Verbs in POSIX File Systems
• read: read a file, list a directory
• write: write a file, link/unlink i-nodes to a directory
• execute: exec a program, enter the directory
• execute as owner (group): setuid/setgid

Permission Bits
• basic UNIX permissions can be encoded in 9 bits
• 3 bits per 3 subject designations
  — first comes the owner, then group, then others
  — written as e.g.
rwxr-x--- or 0750
• plus two numbers for the owner/group identifiers

Changing File Ownership
• the owner and root can change file owners
• chown and chgrp system utilities
• or via the C API
  — chown(), fchown(), fchownat(), lchown()
  — same set for chgrp

Changing File Permissions
• again available to the owner and to root
• chmod is the user space utility
  — either numeric argument: chmod 644 file.txt
  — or symbolic: chmod +x script.sh
• and the corresponding system call (numeric-only)

setuid and setgid
• special permissions on executable files
• they allow exec to also change the process owner
• often used for granting extra privileges
  — e.g. the mount command runs as the super-user

Sticky Directories
• file creation and deletion is a directory permission
  — this is problematic for shared directories
  — in particular the system /tmp directory
• in a sticky directory, different rules apply
  — new files can be created as usual
  — only the owner can unlink a file from the directory

Access Control Lists
• ACL is a list of ACEs (access control elements)
  — each ACE is a subject + verb pair
  — it can name an arbitrary user
• ACL is attached to an object (file, directory)
• more flexible than the traditional UNIX system

ACLs and POSIX
• part of POSIX.1e (security extensions)
• most POSIX systems implement ACLs
  — this does not supersede UNIX permission bits
  — instead, they are interpreted as part of the ACL
• file system support is not universal (but widespread)

Device Files
• UNIX represents devices as special i-nodes
  — this makes them subject to normal access control
• the particular device is described in the i-node
  — only a super-user can create device nodes
  — users could otherwise gain access
to any device

Sockets and Pipes
• named sockets and pipes are just i-nodes
  — also subject to standard file permissions
• especially useful with sockets
  — a service sets up a named socket in the file system
  — file permissions decide who can talk to the service

Special Attributes
• flags that allow additional restrictions on file use
  — e.g. immutable files (cannot be changed by anyone)
  — append-only files (for logfile integrity protection)
  — compression, copy-on-write controls
• non-standard (Linux chattr, BSD chflags)

Network File System
• NFS 3.0 simply transmits numeric uid and gid
  — the numbering needs to be synchronised
  — can be done via a central user database
• NFS 4.0 uses per-user authentication
  — the user authenticates to the server directly
  — filesystem uid and gid values are mapped

File System Quotas
• storage space is limited, shared by users
  — files take up storage space
  — file ownership is also a liability
• quotas set up limits on space use by users
  — exhausted quota can lead to denial of access

Removable Media
• access control at file system level makes no sense
  — other computers may choose to ignore permissions
  — user names or id's would not make sense anyway
• option 1: encryption (for denying reads)
• option 2: hardware-level controls
  — usually read-only vs read-write on the entire medium

The chroot System Call
• each process in UNIX has its own root directory
  — for most, this coincides with the system root
• the root directory can be changed using chroot()
• can be useful to limit file system access
  — e.g.
in privilege separation scenarios

Uses of chroot
• chroot alone is not a security mechanism
  — a super-user process can get out easily
  — but not easy for a normal user process
• also useful for diagnostic purposes
• and as lightweight alternative to virtualisation

Part 11.3: Sub-User Granularity

Users are Not Enough
• users are not always the right abstraction
  — creating users is relatively expensive
  — only a super-user can create new users
• you may want to include programs as subjects
  — or rather, the combination user + program

Naming Programs
• users have user names, but how about programs?
• option 1: cryptographic signatures
  — portable across computers but complex
  — establishes identity based on the program itself
• option 2: i-node of the executable
  — simple, local, identity based on location

Program as a Subject
• program: passive (file) vs active (processes)
  — only a process can be a subject
  — but program identity is attached to the file
• rights of a process depend on its program
  — exec() will change privileges

Mandatory Access Control
• delegates permission control to a central authority
• often coupled with security labels
  — classifies subjects (users, processes)
  — and also objects (files, sockets, programs)
• the owner cannot change object permissions

Capabilities
• not all verbs (actions) need to take objects
• e.g.
shutting down the computer (there is only one)
• mounting file systems (they can't always be named)
• listening on ports with number less than 1024

Dismantling the root User
• the traditional root user is all-powerful
  — "all or nothing" is often unsatisfactory
  — violates the principle of least privilege
• many special properties of root are capabilities
  — root then becomes the user with all capabilities
  — other users can get selective privileges

Security and Execution
• security hinges on what is allowed to execute
• arbitrary code execution exploits are the worst
  — this allows unauthorized execution of code
  — same effect as impersonating the user
  — almost as bad as stolen credentials

Untrusted Input
• programs often process data from dubious sources
  — think image viewers, audio & video players
  — archive extraction, font rendering, ...
• bugs in programs can be exploited
  — the program can be tricked into executing data

Process as a Subject
• some privileges can be tied to a particular process
  — those only apply during the lifetime of the process
  — often restrictions rather than privileges
  — this is how privilege dropping is done
• processes are identified using their numeric pid
  — restrictions are inherited across fork()

Sandboxing
• tries to limit damage from code execution exploits
• the program drops all privileges it can
  — this is done before it touches any of the input
  — the attacker is stuck with the reduced privileges
  — this can often prevent a successful attack

Untrusted Code
• traditionally, you would only execute trusted code
  — usually based on reputation or other external factors
  — this does not scale to a large number of vendors
• it is common to execute untrusted, even dubious code
  — this can be okay with
sufficient sandboxing

API-Level Access Control
• capability system for user-level resources
  — things like contact lists, calendars, bookmarks
  — objects not provided directly by the kernel
• enforcement e.g. via a virtual machine
  — not applicable to execution of native code
  — alternative: an IPC-based API

Android/iOS Permissions
• applications from a store are semi-trusted
• typically single-user computers/devices
• permissions are attached to apps instead of users
• partially virtual users, partially API-level

Review Questions
37. What is a user?
38. What is the principle of least privilege?
39. What is an access control object?
40. What is a sandbox?

Part 12: Virtualisation & Containers

Lecture Overview
1. Hypervisors
2. Containers
3. Management

Part 12.1: Hypervisors

What is a Hypervisor
• also known as a Virtual Machine Monitor
• allows execution of multiple operating systems
• like a kernel that runs kernels
• improves hardware utilisation

Motivation
• OS-level sharing is tricky
  — user isolation is often insufficient
  — only root can install software
• the hypervisor/OS interface is simple
  — compared to OS-application interfaces

Virtualisation in General
• many resources are "virtualised"
  — physical memory by the MMU
  — peripherals by the OS
• makes resource management easier
• enables isolation of components

Hypervisor Types
• type 1: bare metal
  — standalone, microkernel-like
• type 2: hosted
  — runs on top of normal OS
  — usually need kernel support

Type 1 (Bare Metal)
• IBM z/VM
• (Citrix) Xen
• Microsoft Hyper-V
• VMWare ESX

Type 2 (Hosted)
• VMWare (Workstation, Player)
• Oracle VirtualBox
• Linux KVM
• FreeBSD bhyve
• OpenBSD vmm

History
• started with mainframe computers
• IBM CP/CMS: 1968
• IBM VM/370: 1972
• IBM z/VM: 2000

Desktop Virtualisation
• x86 hardware lacks virtual supervisor mode
• software-only solutions viable since late 90s
  - Bochs: 1994
  - VMWare Workstation: 1999
  - QEMU: 2003

Paravirtualisation
• introduced as VMI in 2005 by VMWare
• alternative approach in Xen in 2006
• relies on modification of the guest OS
• near-native speed without HW support

The Virtual x86 Revolution
• 2005: virtualisation extensions on x86
• 2008: MMU virtualisation
• unmodified guest at near-native speed
• most software-only solutions became obsolete

Paravirtual Devices
• special drivers for virtualised devices
  — block storage, network, console
  — random number generator
• faster than software emulation
  — orthogonal to CPU/MMU virtualisation

Virtual Computers
• usually known as Virtual Machines
• everything in the computer is virtual
  — either via hardware (VT-x, EPT)
  — or software (QEMU, virtio, ...)
• much easier to manage than actual hardware

Essential Resources
• the CPU and RAM
• persistent (block) storage
• network connection
• a console device

CPU Sharing
• same principle as normal processes
• there is a scheduler in the hypervisor
  — simpler, with different trade-offs
• privileged instructions are trapped

RAM Sharing
• very similar to standard paging
• software (shadow paging)
• or hardware (second-level translation)
• fixed amount of RAM for each VM

Shadow Page Tables
• the guest system cannot access the MMU
• set up shadow table, invisible to the guest
• guest page tables are sync'd to the sPT by VMM
• the gPT can be made read-only to cause traps

Second-Level Translation
• hardware-assisted MMU virtualisation
• adds guest-physical to host-physical layer
• greatly simplifies the VMM
• also much faster than shadow page tables

Network Sharing
• usually a paravirtualised NIC
  — transports frames between guest and host
  — usually connected to a SW bridge in the host
  — alternatives: routing, NAT
• a single physical NIC is used by everyone

Virtual Block Devices
• usually also paravirtualised
• often backed by normal files
  — maybe in a special format
  — e.g. based on copy-on-write
• but can be a real block device

Special Resources
• mainly useful in desktop systems
• GPU / graphics hardware
• audio equipment
• printers, scanners, ...
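
The Shadow Page Tables and Second-Level Translation slides above describe two ways to get from a guest-virtual address to a host-physical one. A toy model of both follows. It assumes single-level page tables stored as Python dictionaries and 4 KiB pages; the table contents and function names are invented for illustration, and real MMUs walk multi-level tables in hardware.

```python
PAGE_SIZE = 4096

def translate(table, address):
    """Translate an address through one page table (page number -> frame number)."""
    page, offset = divmod(address, PAGE_SIZE)
    if page not in table:
        raise LookupError(f"page fault at {address:#x}")
    return table[page] * PAGE_SIZE + offset

# Maintained by the guest OS: guest-virtual page -> guest-physical page.
guest_page_table = {0: 5, 1: 7}
# Maintained by the hypervisor: guest-physical page -> host-physical page.
host_page_table = {5: 42, 7: 13}

def guest_to_host(address):
    """Second-level translation: the two layers are applied one after the other."""
    guest_physical = translate(guest_page_table, address)
    return translate(host_page_table, guest_physical)

# Shadow paging: the hypervisor precomputes the composition of both tables
# in software and must redo this whenever the guest edits its page table.
shadow_page_table = {
    vpage: host_page_table[gpage] for vpage, gpage in guest_page_table.items()
}

print(hex(guest_to_host(0x1010)))
print(hex(translate(shadow_page_table, 0x1010)))
```

Both paths give the same host address. The difference is who does the work: with second-level translation the hardware composes the tables on each lookup, whereas shadow paging needs the hypervisor to trap guest page-table writes and keep the combined table in sync.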

PCI Passthrough
• an anti-virtualisation technology
• based on an IO-MMU (VT-d, AMD-Vi)
• a virtual OS can touch real hardware
  — only one OS at a time, of course

GPUs and Virtualisation
• can be assigned (via VT-d) to a single OS
• or time-shared using native drivers (GVT-g)
• paravirtualised
• shared by other means (X11, SPICE, RDP)

Peripherals
• useful either via passthrough
  — audio, webcams, ...
• or standard sharing technology
  — network printers & scanners
  — networked audio servers

Peripheral Passthrough
• virtual PCI, USB or SATA bus
• forwarding to a real device
  — e.g. a single USB stick
  — or a single SATA drive

Suspend & Resume
• the VM can be quite easily stopped
• the RAM of a stopped VM can be copied
  — e.g. to a file in the host filesystem
  — along with registers and other state
• and also later loaded and resumed

Migration Basics
• the stored state can be sent over network
• and resumed on a different host
• as long as the virtual environment is same
• this is known as paused migration

Live Migration
• uses asynchronous memory snapshots
• host copies pages and marks them read-only
• the snapshot is sent as it is constructed
• changed pages are sent at the end

Live Migration Handoff
• the VM is then paused
• registers and last few pages are sent
• the VM is resumed at the remote end
• usually within a few milliseconds

Memory Ballooning
• how to deallocate "physical" memory?
  — i.e.
return it to the hypervisor
• this is often desirable in virtualisation
• needs a special host/guest interface

Part 12.2: Containers

What are Containers?
• OS-level virtualisation
  — e.g. virtualised network stack
  — or restricted file system access
• not a complete virtual computer
• turbocharged processes

Why Containers
• virtual machines take a while to boot
• each VM needs its own kernel
  — this adds up if you need many VMs
• easier to share memory efficiently
• easier to cut down the OS image

Kernel Sharing
• multiple containers share a single kernel
• but not user tables, process tables, ...
• the kernel must explicitly support this
• another level of isolation (process, user, container)

Boot Time
• a light virtual machine takes a second or two
• a container can take under 50ms
• but VMs can be suspended and resumed
• but dormant VMs take up a lot more space

chroot
• the mother of all container systems
• not very sophisticated or secure
• but allows multiple OS images under 1 kernel
• everything else is shared

chroot-based Containers
• process tables, network, etc.
are shared
• the superuser must also be shared
• containers have their own view of the filesystem
  — including system libraries and utilities

BSD Jails
• an evolution of the chroot container
• adds user and process table separation
• and a virtualised network stack
  — each jail can get its own IP address
• root in the jail has limited power

Linux VServer
• like BSD jails but on Linux
  - FreeBSD jail 2000, VServer 2001
• not part of the mainline kernel
• jailed root user is partially isolated

Namespaces
• visibility compartments in the Linux kernel
• virtualizes common resources
  — the filesystem hierarchy (including mounts)
  — process tables
  — networking (IP address)

cgroups
• controls resource allocation in Linux
• a CPU group is a fair scheduling unit
• a memory group sets limits on memory use
• mostly orthogonal to namespaces

LXC
• mainline Linux way to do containers
• based on namespaces and cgroups
• relative newcomer (2008, 7 years after vserver)
• feature set similar to VServer, OpenVZ &c.
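
The Namespaces slide above calls namespaces visibility compartments. On Linux, the compartments a process belongs to are exposed as symbolic links under /proc/self/ns, and two processes share a compartment exactly when the link targets match. A small sketch that lists them, assuming a Linux host with /proc mounted (elsewhere it returns an empty result); the function name current_namespaces is made up for this example.

```python
import os

NS_DIR = "/proc/self/ns"  # Linux-specific; absent on other systems

def current_namespaces():
    """Map namespace kind (e.g. 'pid', 'net', 'mnt') to its identifier string."""
    if not os.path.isdir(NS_DIR):
        return {}
    result = {}
    for kind in sorted(os.listdir(NS_DIR)):
        try:
            # Each entry is a symlink whose target names the namespace instance.
            result[kind] = os.readlink(os.path.join(NS_DIR, kind))
        except OSError:
            continue  # entry not readable; skip it
    return result

namespaces = current_namespaces()
for kind, ident in namespaces.items():
    print(kind, ident)
```

Running this inside and outside a container on the same host would typically show different identifiers for the kinds the container runtime has separated, and identical ones for anything still shared.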

User-Mode Linux
• halfway between a container and a virtual machine
• an early fully paravirtualised system
• a Linux kernel runs as a process on another Linux
• integrated in Linux 2.6 in 2003

DragonFlyBSD Virtual Kernels
• very similar to User-Mode Linux
• part of DFlyBSD since 2007
• uses standard libc, unlike UML
• paravirtual ethernet, storage and console

User Mode Kernels
• easier to retrofit securely
  — uses existing security mechanisms
  — for the host, mostly a standard process
• the kernel needs to be ported though
  — analogous to a new hardware platform

Migration
• not widely supported, unlike in hypervisors
• process state is much harder to serialise
  — file descriptors, network connections &c.
• somewhat mitigated by fast shutdown/boot time

Part 12.3: Management

Disk Images
• disk image is the embodiment of the VM
• the virtual OS needs to be installed
• the image can be a simple file
• or a dedicated block device on the host

Snapshots
• making a copy of the image = snapshot
• can be done more efficiently: copy on write
• alternative to OS installation
  — make copies of the freshly installed image
  — and run updates after cloning the image

Duplication
• each image will have a copy of the system
• copy-on-write snapshots can help
  — most of the base system will not change
  — regression as images are updated separately
• block-level de-duplication is expensive

File Systems
• disk images contain entire file systems
• the virtual disk is of (apparently) fixed size
• sparse images: unwritten area is not stored
• initially only filesystem
metadata is allocated

Overcommit
• the host can allocate more resources than it has
• this works as long as not many VMs reach limits
• enabled by sparse images and CoW snapshots
• also applies to available RAM

Thin Provisioning
• the act of obtaining resources on demand
• the host system can be extended as needed
  — to keep pace with growing guest demands
• alternatively, VMs can be migrated out
• improves resource utilisation

Configuration
• each OS has its own configuration files
• same methods apply as for physical networks
  — software configuration management
• bundled services are deployed to VMs

Bundling vs Sharing
• bundling makes deployment easier
• the bundled components have known behaviour
• but updates are much trickier
• this also prevents resource sharing

Security
• hypervisors have a decent track record
  — security here means protection of host from guest
  — breaking out is still possible sometimes
• containers are more of a mixed bag
  — many hooks are needed into the kernel

Updates
• each system needs to be updated separately
  — this also applies to containers
• blocks coming from a common ancestor are shared
  — but updating images means loss of sharing

Container vs VM Updates
• de-duplication may be easier in containers
  — shared file system, e.g.
link farming
• kernel updates: containers and type 2 hypervisors
  — can be mitigated by live migration
• type 1 hypervisors need less downtime

Docker
• automated container image management
• mainly a service deployment tool
• containers share a single Linux kernel
  — the kernel itself can run in a VM
• rides on a wave of bundling resurgence

The Cloud
• public virtualisation infrastructure
• "someone else's computer"
• the guests are not secure against the host
  — entire memory is exposed, including secret keys
  — host compromise is fatal
• the host is mostly secure from the guests

Review Questions
41. What is a hypervisor?
42. What is paravirtualisation?
43. How are VMs suspended and migrated?
44. What is a container?

Part 13: Review

What is an OS made of?
• the kernel
• system libraries
• system daemons / services
• user interface
• system utilities
Basically every OS has those.
The Kernel
• lowest level of an operating system
• executes in privileged mode
• manages all the other software
  — including other OS components
• enforces isolation and security
• provides low-level services to programs

System Libraries
• form a layer above the OS kernel
• provide higher-level services
  — use kernel services behind the scenes
  — easier to use than the kernel interface
• typical example: libc
  — provides C functions like printf
  — also known as msvcrt on Windows

Programming Interfaces
• kernel system call interface
• system libraries / APIs
• inter-process protocols
• command-line utilities (scripting)

(System) Libraries
• mainly C functions and data types
• interfaces defined in header files
• definitions provided in libraries
  — static libraries (archives): libc.a
  — shared (dynamic) libraries: libc.so
• on Windows: msvcrt.lib and msvcrt.dll
• there are (many) more besides libc / msvcrt

Shared (Dynamic) Libraries
• required for running programs
• linking is done at execution time
• less code duplication
• can be upgraded separately
• but: dependency problems

Why is Everything a File
• re-use the comprehensive file system API
• re-use existing file-based command-line tools
• bugs are bad, so simplicity is good
• want to print? cat file.txt > /dev/ulpt0
  — (reality is a little more complex)

What is a Filesystem?
• a set of files and directories
• usually lives on a single block device
  — but may also be virtual
• directories and files form a tree
  — directories are internal nodes
  — files are leaf nodes

File Descriptors
• the kernel keeps a table of open files
• the file descriptor is an index into this table
• you do everything using file descriptors
• non-Unix systems have similar concepts

Regular files
• these contain sequential data (bytes)
• may have inner structure but the OS does not care
• there is metadata attached to files
  — like when they were last modified
  — who can and who cannot access the file
• you read() and write() files

Privileged CPU Mode
• many operations are restricted in user mode
  — this is how user programs are executed
  — also most of the operating system
• software running in privileged mode can do almost anything
  — most importantly it can program the MMU
  — the kernel runs in this mode

Memory Management Unit
• is a subsystem of the processor
• takes care of address translation
  — user software uses virtual addresses
  — the MMU translates them to physical addresses
• the mappings can be managed by the OS kernel

What does a Kernel Do?
• memory & process management
• task (thread) scheduling
• device drivers
  — SSDs, GPUs, USB, bluetooth, HID, audio, ...
• file systems
• networking

Kernel Architecture Types
• monolithic kernels (Linux, *BSD)
• microkernels (Mach, L4, QNX, NT, ...)
• hybrid kernels (macOS)
• type 1 hypervisors (Xen)
• exokernels, rump kernels

System Call Sequence
• first, libc prepares the system call arguments
• and puts the system call number in the correct register
• then the CPU is switched into privileged mode
• this also transfers control to the syscall handler

What is an i-node?
• an anonymous, file-like object
• could be a regular file
  — or a directory
  — or a special file
  — or a symlink

Disk-Like Devices
• disk drives provide block-level access
• read and write data in 512-byte chunks
  — or also 4K on big modern drives
• a big numbered array of blocks

I/O Scheduler (Elevator)
• reads and writes are requested by users
• access ordering is crucial on a mechanical drive
  — not as important on an SSD
  — but sequential access is still much preferred
• requests are queued (recall, disks are slow)
  — but they are not processed in FIFO order

Filesystem as Resource Sharing
• usually only 1 or a few disks per computer
• many programs want to store persistent data
• file system allocates space for the data
  — which blocks belong to which file
• different programs can write to different files
  — no risk of trying to use the same block

Filesystem as Abstraction
• allows the data to be organised into files
• enables the user to manage and review data
• files have arbitrary & dynamic size
  — blocks are transparently allocated & recycled
• structured data instead of a flat block array

Memory-mapped IO
• uses virtual memory (cf.
last lecture)
• treat a file as if it was swap space
• the file is mapped into process memory
  — page faults indicate that data needs to be read
  — dirty pages cause writes
• available as the mmap system call

Fragmentation
• internal: not all blocks are fully used
  — files are of variable size, blocks are fixed
  — a 4100-byte file needs two 4 KiB blocks
• external: free space is non-contiguous
  — happens when many files try to grow at once
  — this means new files are also fragmented

Hard Links
• multiple names can refer to the same i-node
  — names are given by directory entries
  — we call such multiple-named files hard links
  — it's usually forbidden to hard-link directories
• hard links cannot cross device boundaries
  — i-node numbers are only unique within a filesystem

Process Resources
• memory (address space)
• processor time
• open files (descriptors)
  — also working directory
  — also network connections

Process Memory
• each process has its own address space
• this means processes are isolated from each other
• requires that the CPU has an MMU
• implemented via paging (page tables)

Process Switching
• switching processes means switching page tables
• physical addresses do not change
• but the mapping of virtual addresses does
• large part of physical memory is not mapped
  — could be completely unallocated (unused)
  — or belong to other processes

What is a Thread?
• thread is a sequence of instructions
• different threads run different instructions
  — as opposed to SIMD or many-core units (GPUs)
• each thread has its own stack
• multiple threads can share an address space

Fork
• how do we create new processes?
• by fork-ing existing processes
• fork creates an identical copy of a process
• execution continues in both processes
  — each of them gets a different return value

Process vs Executable
• process is a dynamic entity
• executable is a static file
• an executable contains an initial memory image
  — this sets up memory layout
  — and content of the text and data segments

Exec
• on UNIX, processes are created via fork
• how do we run programs though?
• exec: load a new executable into a process
  — this completely overwrites process memory
  — execution starts from the entry point
• running programs: fork + exec

What is a Scheduler?
• scheduler has two related tasks
  — plan when to run which thread
  — actually switch threads and processes
• usually part of the kernel
  — even in micro-kernel operating systems

Interrupt
• a way for hardware to request attention
• CPU mechanism to divert execution
• partial (CPU state only) context switch
• switch to privileged (kernel) CPU mode

Timer Interrupt
• generated by the PIT or the local APIC
• the OS can set the frequency
• a hardware interrupt happens on each tick
• this creates an opportunity for bookkeeping
• and for preemptive scheduling

What is Concurrency?
• events that can happen at the same time
• it is not important whether they do, only that they can
• events can be given a happens-before partial order
• they are concurrent if unordered by happens-before

Why Concurrency?
• problem decomposition
  — different tasks can be largely independent
• reflecting external concurrency
  — serving multiple clients at once
• performance and hardware limitations
  — higher throughput on multicore computers

Critical Section
• any section of code that must not be interrupted
• the statement x = x + 1 could be a critical section
• what is a critical section is domain-dependent
  — another example could be a bank transaction
  — or an insertion of an element into a linked list

Race Condition: Definition
• (anomalous) behaviour that depends on timing
• typically among multiple threads or processes
• an unexpected sequence of events happens
• recall that ordering is not guaranteed

Mutual Exclusion
• only one thread can access a resource at once
• ensured by a mutual exclusion device (a.k.a. mutex)
• a mutex has 2 operations: lock and unlock
• lock may need to wait until another thread unlocks

Deadlock Conditions
1. mutual exclusion
2. hold and wait condition
3. non-preemptability
4. circular wait
Deadlock is only possible if all 4 are present.

Starvation
• starvation happens when a process can't make progress
• generalisation of both deadlock and livelock
• for instance, unfair scheduling on a busy system
• also recall the readers and writers problem

What is a Driver?
• piece of software that talks to a device
• usually quite specific / unportable
  — tied to the particular device
  — and also to the operating system
• often part of the kernel

Drivers and Microkernels
• drivers are excluded from microkernels
• but the driver still needs hardware access
  — this could be a special memory region
  — it may need to react to interrupts
• in principle, everything can be done indirectly
  — but this may be quite expensive, too

Interrupt-driven IO
• peripherals are much slower than the CPU
  — polling the device is expensive
• the peripheral can signal data availability
  — and also readiness to accept more data
• this frees up CPU to do other work in the meantime

Memory-mapped IO
• devices share address space with memory
• more common in contemporary systems
• IO uses the same instructions as memory access
  — load and store on RISC, mov on x86
• allows selective user-level access (via the MMU)

Direct Memory Access
• allows the device to directly read/write memory
• this is a huge improvement over programmed IO
• interrupts only indicate buffer full/empty
• the device can read and write arbitrary physical memory
  — opens up security / reliability problems

GPU Drivers
• split into a number of components
• graphics output / frame buffer access
• memory management is often done in kernel
• geometry, textures &c. are prepared in-process
• front end API: OpenGL, Direct3D, Vulkan, ...

Storage Drivers
• split into adapter, bus and device drivers
• often a single driver per device type
  — at least for disk drives and CD-ROMs
• bus enumeration and configuration
• data addressing and data transfers

Networking Layers
2. Link (Ethernet, WiFi)
3. Network (IP)
4. Transport (TCP, UDP, ...)
7. Application (HTTP, SMTP, ...)
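From a program's point of view, the layers listed above show up as parameters of the socket API. The following is a minimal sketch, assuming the loopback interface is available; the function names echo_upper_once and round_trip are made up for this illustration. The address family picks the network layer (IPv4), the socket type picks the transport layer (TCP), the bytes exchanged play the role of a toy application protocol, and the link layer is chosen by the kernel and stays invisible to the program.

```python
import socket
import threading


def echo_upper_once(server_sock):
    """Serve one client: read until end of stream, reply in upper case."""
    conn, _addr = server_sock.accept()
    with conn:
        chunks = []
        while True:
            data = conn.recv(1024)
            if not data:          # empty read: the client finished sending
                break
            chunks.append(data)
        conn.sendall(b"".join(chunks).upper())


def round_trip(message: bytes) -> bytes:
    """Send message to a local TCP server and return its reply."""
    # AF_INET: IPv4 (network layer); SOCK_STREAM: TCP (transport layer)
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))   # port 0: the kernel picks a free port
        server.listen(1)
        port = server.getsockname()[1]
        worker = threading.Thread(target=echo_upper_once, args=(server,))
        worker.start()
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
            client.connect(("127.0.0.1", port))
            client.sendall(message)
            client.shutdown(socket.SHUT_WR)   # signal end of our data
            reply = []
            while True:
                data = client.recv(1024)
                if not data:      # server closed the connection
                    break
                reply.append(data)
        worker.join()
    return b"".join(reply)


if __name__ == "__main__":
    print(round_trip(b"hello, layers"))
```

Swapping SOCK_STREAM for SOCK_DGRAM would select UDP instead, but the connection-oriented calls used here (listen, accept, shutdown) would then no longer apply.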
Networking and Operating Systems
• a network stack is a standard part of an OS
• large part of the stack lives in the kernel
  — although this only applies to monolithic kernels
  — microkernels use user-space networking
• another chunk is in system libraries & utilities

Kernel-Side Networking
• device drivers for networking hardware
• network and transport protocol layers
• routing and packet filtering (firewalls)
• networking-related system calls (sockets)
• network file systems (SMB, NFS)

IP (Internet Protocol)
• uses 4-byte (v4) or 16-byte (v6) addresses
  — split into network and host parts
• it is a packet-based protocol
• is a best-effort protocol
  — packets may get lost, reordered or corrupted

TCP: Transmission Control Protocol
• a stream-oriented protocol on top of IP
• works like a pipe (transfers a byte sequence)
  — must respect delivery order
  — and also re-transmit lost packets
• must establish connections

UDP: User (Unreliable) Datagram Protocol
• TCP comes with non-trivial overhead
  — and its guarantees are not always required
• UDP is a much simpler protocol
  — a very thin wrapper around IP
  — with minimal overhead on top of IP

DNS: Domain Name System
• hierarchical protocol for name resolution
  — runs on top of TCP or UDP
• domain names are split into parts using dots
  — each domain knows whom to ask for the next bit
  — the name database is effectively distributed

NFS (Network File System)
• the traditional UNIX networked filesystem
• hooked quite deep into the kernel
  — assumes generally reliable network (LAN)
• filesystems are exported for use over NFS
• the client side mounts the NFS-exported volume

Shell
• programming language centered on OS interaction
• rudimentary control flow
• untyped,
text-centered variables
• dubious error handling

Interactive Shells
• almost all shells have an interactive mode
• the user inputs a single statement on keyboard
• when confirmed, it is immediately executed
• this forms the basis of command-line interfaces

Shell Scripts
• a shell script is an (executable) file
• in simplest form, it is a sequence of commands
  — each command goes on a separate line
  — executing a script is about the same as typing it
• but can use structured programming constructs

Terminal
• can print text and read text from a keyboard
• normally everything is printed on the last line
• the text could contain escape (control) sequences
  — for printing colourful text or clearing the screen
  — also for printing text at a specific coordinate

A GUI Stack
• graphics card driver, mode setting
• drawing/painting (usually hardware-accelerated)
• multiplexing (e.g. using windows)
• widgets: buttons, labels, lists, ...
• layout: what goes where on the screen

X11 (X Window System)
• a traditional UNIX windowing system
• provides a C API (xlib)
• built-in network transparency (socket-based)
• core protocol version 11 from 1987

Users
• originally a proxy for people
• currently a more general abstraction
• user is the unit of ownership
• many permissions are user-centered

User Management
• the system needs a database of users
• in a network, user identities often need to be shared
• could be as simple as a text file
  — /etc/passwd and /etc/group on UNIX systems
• or as complex as a distributed database

User Authentication
• the user needs to authenticate themselves
• passwords are the most commonly used method
  — the system needs to know the right password
  — user should be able to change their password
• biometric methods are also quite popular

Ownership
• various objects in an OS can be owned
  — primarily files and processes
• the owner is typically whoever created the object
  — ownership can be transferred
  — usually at the impetus of the original owner

Access Control Policy
• there are 3 pieces of information
  — the subject (user)
  — the verb (what is to be done)
  — the object (the file or other resource)
• there are many ways to encode this information

Sandboxing
• tries to limit damage from code execution exploits
• the program drops all privileges it can
  — this is done before it touches any of the input
  — the attacker is stuck with the reduced privileges
  — this can often prevent a successful attack

What is a Hypervisor
• also known as a Virtual Machine Monitor
• allows execution of multiple operating systems
• like a kernel that runs kernels
• isolation and resource sharing

Hypervisor Types
•
type 1: bare metal
  — standalone, microkernel-like
• type 2: hosted
  — runs on top of normal OS
  — usually needs kernel support

Paravirtual Devices
• special drivers for virtualised devices
  — block storage, network, console
  — random number generator
• faster than software emulation
  — orthogonal to CPU/MMU virtualisation

VM Suspend & Resume
• the VM can be quite easily stopped
• the RAM of a stopped VM can be copied
  — e.g. to a file in the host filesystem
  — along with registers and other state
• and also later loaded and resumed

What are Containers?
• OS-level virtualisation
  — e.g. virtualised network stack
  — or restricted file system access
• not a complete virtual computer
• turbocharged processes

Bundling vs Sharing
• bundling makes deployment easier
• the bundled components have known behaviour
• but updates are much trickier
• this also prevents resource sharing

The End

Actually...
• a 2-part, written final exam
• test: 9/10 required
  — pool of 44 questions (in the slides)
• free-form text
  — one of the 11 lecture topics
  — 1 page A4: be concise but comprehensive