Real-Time Programming & RTOS
Concurrent and real-time programming tools

Concurrent Programming
Concurrency in real-time systems
Typical architecture of an embedded real-time system: several input units, computation, output units, data logging/storage.
Such a system must handle several concurrent activities, so concurrency occurs naturally in real-time systems.
Support for concurrency in programming languages (Java, Ada, ...)
  Advantages: readability, OS independence, checking of interactions by the compiler; an embedded computer may not have an OS at all.
Support by libraries and the operating system (C/C++ with POSIX)
  Advantages: multi-language composition; avoids the problem that a language's model of concurrency may be difficult to implement on top of an OS; OS API standards imply portability.

Processes and Threads
Process
  A running instance of a program; executes in its own virtual machine to avoid interference from other processes.
  Contains information about program resources and execution state, e.g.: environment, working directory, program instructions, registers, heap, stack, file descriptors, signal actions, inter-process communication tools (pipes, message boxes, etc.)
Thread
  Exists within a process and uses the process's resources.
  Can be scheduled by the OS and run as an independent entity.
  Keeps its own execution stack, local data, etc.
  Shares global data and resources with other threads of the same process.

[Figure: Processes and threads in UNIX]

[Figure: Process (thread) states]

Process (Thread) Initialization and Termination
Initialization
  explicit process (thread) declaration
  fork (and join)
  cobegin, coend
Termination
  completion of execution
  "suicide" by execution of a self-terminate statement
  abortion, through the explicit action of another process (thread)
  occurrence of an error condition
  never (the process is a non-terminating loop)

Concurrent Programming Is Complicated
Multi-threaded applications with shared data may have numerous flaws:
  Race condition: two or more threads access the same shared data, and the result depends on the exact order in which their instructions are executed.
  Deadlock: two or more threads wait on each other, forming a cycle that prevents all of them from making any forward progress.
  Starvation: an indefinite delay or permanent blocking of one or more runnable threads in a multithreaded application.
  Livelock: threads are scheduled but make no forward progress because they are continuously reacting to each other's state changes.
Such bugs are usually difficult to find, and correctness is difficult to verify.

Communication and Synchronization
Communication: passing of information from one process (thread) to another
  typical methods: shared variables, message passing
Synchronization: satisfaction of constraints on the interleaving of actions of processes, e.g. an action of one process has to occur after an action of another one
  typical methods: semaphores, monitors
Communication and synchronization are linked:
  communication requires synchronization
  synchronization corresponds to communication without content

Communication: Shared Variables
Consistency problems: unrestricted use of shared variables is unreliable.
Multiple update problem, example: shared variable X, assignment X := X + 1, executed as
  1. load the current value of X into a register
  2. increment the value of the register
  3. store the value of the register back to X
If two processes execute these instructions, certain interleavings produce inconsistent results (e.g. both load the same value of X, and one of the two increments is lost).
Solution: the parts of the processes that access shared variables must be executed indivisibly with respect to each other.
  These parts are called critical sections; the required protection is called mutual exclusion.
  One may use a special mutual exclusion protocol (e.g. Peterson's algorithm) or a synchronization mechanism: semaphores, monitors.

Synchronization: Semaphores
A semaphore contains an integer variable that, apart from initialization, is accessed only through two standard operations: wait() and signal().
  The semaphore is initialized to a non-negative value (typically 1).
  wait(): decrements the semaphore value if the value is positive; otherwise, if the value is zero, the caller becomes blocked.
  signal(): if one or more processes are blocked on the semaphore, one of them is unblocked (usually in FIFO order); otherwise the semaphore value is incremented.
  Both wait() and signal() are atomic.
Semaphores are an elegant low-level primitive, but they are error-prone and hard to debug (deadlock, missing signal, etc.).
Synchronization: Monitors
Monitors provide encapsulation and efficient condition synchronization.
  Critical regions are written as procedures, all encapsulated in a single object or module.
  Procedure calls into the module are guaranteed to be mutually exclusive.
  Shared resources are accessible only through these procedures.
In some cases processes may need to wait until some condition holds true; the condition may be made true by another process using the monitor.
Solution: condition variables. Only two operations can be invoked on a condition variable x:
  x.wait(): the calling process is suspended until another process invokes x.notify()
  x.notify(): resumes exactly one waiting process (if there is none, it has no effect)

[Figure: Synchronization: Monitors]

Communication: Message Passing
  asynchronous (no-wait): the send operation is not blocking; requires buffer space (mailbox)
  synchronous (rendezvous): the send operation is blocking; no buffer is required
  remote invocation (extended rendezvous): the sender is blocked until a reply is received

[Figure: Synchronous Message Passing]

[Figure: Asynchronous Message Passing]

[Figure: Asynchronous Message Passing with Bounded Buffer]

Real-Time Aspects
Time-aware systems make explicit references to the time frame of the enclosing environment
  e.g. a bank safe's door is to be locked from midnight to nine o'clock
  the "real time" of the environment must be available
Reactive systems are typically concerned with relative times
  e.g. an output has to be produced within 50 ms of an associated input
  they must be able to measure intervals
Systems usually must synchronize with the environment: input sampling and output signalling must be done very regularly, with controlled variability.

The Concept of Time
Real-time systems must have a concept of time, but what is time?
  Measure of a time interval
  Units? seconds, milliseconds, CPU cycles, system "ticks"
  Granularity, accuracy, stability of the clock source
  Is "one second" a well-defined measure?
    "A second is the duration of 9,192,631,770 periods of the radiation corresponding to the transition between the two hyperfine levels of the ground state of the caesium-133 atom."
    ... temperature dependencies and relativistic effects (the above definition refers to a caesium atom at rest, at mean sea level and at a temperature of 0 K)
  Skew and divergence among multiple clocks
    distributed systems and clock synchronization
Measuring time
  external source (GPS, NTP, etc.)
  internal: hardware clocks that count the number of oscillations that occur in a quartz crystal

Requirements for Interaction with "Time"
For RT programming, it is desirable to have:
  access to clocks and representation of time
  delays
  timeouts
  deadline specification and real-time scheduling

Access to Clock and Representation of Time
Requires a hardware clock that can be read like a regular external device.
Mostly offered as an OS service, if direct interfacing to the hardware is not allowed.
Example of time representation: the POSIX high-resolution clock, counting seconds and nanoseconds since 1970 with a known resolution.

Delays
In addition to having access to a clock, we need the ability to:
  delay execution until an arbitrary calendar time
    What about daylight saving time changes? Problems with leap seconds.
  delay execution for a relative period of time
    delay for t seconds
    delay for t seconds after event e begins

A Repeated Task (An Attempt)
The goal is to do work repeatedly every 100 time units:
  while(1) {
      delay(100);
      do_work();
  }
Does it work as intended? No, it accumulates drift ...
  Each turn in the loop takes at least 100 + x time units, where x is the time taken to perform do_work().
  The delay is just a lower bound: a delaying process is not guaranteed access to the processor when the delay expires (and the delay does not compensate for this).

Eliminating (Part of) the Drift: Timers
Set an alarm clock, do some work, and then wait for whatever time is left before the alarm rings.
  This is done with timers.
  Two types of timers: one-shot and periodic.
The thread is told to wait until the next ring, so accumulating drift is eliminated.
Even with timers, drift may still occur, but it does not accumulate (local drift).

Timeouts
Synchronous blocking operations can include timeouts:
  synchronization primitives: semaphores, condition variables, locks, etc.; a timeout usually generates an error/exception
  networking and other I/O calls, e.g. select() in POSIX
The system may also provide an asynchronous timeout signal, used to detect time overruns during execution of periodic and sporadic tasks.

Deadline Specification and Real-Time Scheduling
Clock-driven scheduling is trivial to implement via a cyclic executive.
Other scheduling algorithms need OS and/or language support:
  system calls to create, destroy, suspend and resume tasks
  tasks implemented as either threads or processes
    Threads are usually more beneficial than processes (which have a separate address space and memory protection):
      processes are not always supported by the hardware
      processes have a longer context switch time
      threads can communicate using shared data (fast and more predictable)
  scheduling support:
    a preemptive scheduler with multiple priority levels
    support for aperiodic tasks (at least background scheduling)
    support for sporadic tasks with acceptance tests, etc.

Jobs, Tasks and Threads
In theory, a system comprises a set of (abstract) tasks; each task is a series of jobs.
  Tasks are typed, have various parameters, react to events, etc.
  An acceptance test is performed before admitting new tasks.
A thread (or a process) is the basic unit of work handled by the scheduler.
  Threads are the instantiation of tasks that have been admitted to the system.
How to map tasks to threads?
Periodic Tasks
Real-time tasks defined to execute periodically: T = (φ, p, e, D) (phase φ, period p, execution time e, relative deadline D)
It is clearly inefficient if the thread is created and destroyed repeatedly every period.
  Some operating systems (funkOS) and programming languages (Real-Time Java & Ada) support periodic threads:
    the kernel (or VM) reinitializes such a thread and puts it to sleep when the thread completes
    the kernel releases the thread at the beginning of the next period
    this provides a clean abstraction but needs support from the OS
  Thread instantiated once, performs a job, sleeps until the next period, repeats:
    lower overhead, but relies on the programmer to handle timing
    hard to avoid timing drift due to sleep overruns (see the discussion of delays earlier in this lecture)
    the most common approach

Sporadic and Aperiodic Tasks
Events trigger sporadic and aperiodic tasks:
  they might be external (hardware) interrupts
  they might be signalled by another task
Usual implementation:
  the OS executes a periodic server thread (background server, deferrable server, etc.)
  the OS maintains a "server queue", a list of pointers giving the starting addresses of functions to be executed by the server
  upon the occurrence of an event that releases an aperiodic or sporadic job, the event handler (usually an interrupt routine) inserts a pointer to the corresponding function into the list

Real-Time Programming & RTOS
Real-Time Operating Systems

Operating Systems – What You Should Know ...
An operating system is a collection of software that manages computer hardware resources and provides common services for computer programs.
Basic components of a multi-purpose OS:
  program execution & process management: processes (threads), IPC, scheduling, ...
  memory management: segmentation, paging, protection, ...
  storage & other I/O management: file systems, device drivers, ...
  network management: network drivers, protocols, ...
  security: user IDs, privileges, ...
  user interface: shell, GUI, ...

[Figure: Operating Systems – What You Should Know ...]
Implementing Real-Time Systems
Key fact from scheduling theory: we need predictable behavior.
  Raw performance is less critical than consistent and predictable performance; hence the focus on scheduling algorithms and schedulability tests.
  We don't want to share resources fairly: be unfair to ensure deadlines are met.
Need to run on a wide range of (often custom) hardware.
Often resource constrained: limited memory, CPU, power consumption, size, weight, budget.
Closed set of applications (do we need wristwatches that play DVDs?)
Strong reliability requirements; may be safety critical.
  How to upgrade software in a car engine? A DVD player?

Implications on Operating Systems
General-purpose operating systems are not well suited for real-time:
  they assume plentiful resources, fairly shared amongst untrusted users
  they serve multiple purposes
  exactly the opposite of an RTOS!
Instead we want an operating system that is:
  small and light on resources
  predictable
  customisable, modular and extensible
  reliable
  ... and that can be demonstrated or proven to be so

Implications on Operating Systems
Real-time operating systems typically use either cyclic executive or microkernel designs, rather than a traditional monolithic kernel:
  limited and well-defined functionality
  easier to demonstrate correctness
  easier to customise
Provide rich support for concurrency & real-time control.
Expose low-level system details to the applications: control of scheduling, interaction with hardware devices, ...

Cyclic Executive without Interrupts
The simplest real-time systems use a "nanokernel" design:
  provides a minimal time service: a scheduled clock pulse with a fixed period
  no tasking, virtual memory/memory protection, etc.
  allows implementation of a static cyclic schedule, provided that:
    tasks can be scheduled in a frame-based manner
    all interactions with hardware are done on a polled basis
The operating system becomes a single-task cyclic executive.

Microkernel Architecture
Cyclic executives are widely used in low-end embedded devices:
  8-bit processors with kilobytes of memory
  often programmed in (something like) C via a cross-compiler, or in assembler
  simple hardware interactions
  fixed, simple, and static task set to execute
  clock-driven scheduler
But many real-time embedded systems are more complex and need a sophisticated operating system with priority scheduling.
Common approach: a microkernel with a priority scheduler.
  Configurable and robust, since it is architected around interactions between cooperating system servers, rather than a monolithic kernel with ad-hoc interactions.

Microkernel Architecture
A microkernel RTOS typically provides:
  timing services, interrupt handling, support for hardware interaction
  task management, scheduling
  messaging, signals
  synchronization and locking
  memory management (and sometimes also protection)

Latency
(Some) sources of hard-to-predict latency caused by the system:
  Interrupts: see next slide
  System calls: the RTOS should characterise their WCET; the kernel should be preemptable
  Memory management: avoid paging; either use segmentation with a fixed memory management scheme, or memory locking
  Caches: may introduce non-determinism; there are techniques for computing WCET with processor caches
  DMA: competes with the processor for the memory bus; it is hard to predict who wins

Interrupts
The amount of time required to handle an interrupt varies.
Thus in most OSs, interrupt handling is divided into two steps:
  Immediate interrupt service: very short; invokes a scheduled interrupt handling routine
  Scheduled interrupt service: preemptable, scheduled as an ordinary job at a suitable priority

Immediate Interrupt Service
Interrupt latency is the time between an interrupt request and the execution of the first instruction of the interrupt service routine.
The total delay caused by an interrupt is the sum of the following factors:
  the time the processor takes to complete the current instruction, do the necessary chores (flush the pipeline and read the interrupt vector), and jump to the trap handler and interrupt dispatcher
  the time the kernel takes to disable interrupts
  the time required to complete the immediate interrupt service routines of higher-priority interrupts (if any) that occurred simultaneously with this one
  the time the kernel takes to save the context of the interrupted thread, identify the interrupting device, and get the starting address of the interrupt service routine
  the time the kernel takes to start the interrupt service routine

[Figure: Event Latency]

Example RTOS: FreeRTOS
RTOS for embedded devices (currently ported to 34 microcontrollers)
Distributed under the GPL
Written in C; the kernel consists of 3+1 C source files (approx. 9000 lines of code including comments)
Largely configurable

Example RTOS: FreeRTOS
The OS is (more or less) a library of object modules; the application and OS modules are linked together into the resulting executable image.
Prioritized scheduling of tasks:
  tasks correspond to threads (they share the same address space and have their own execution stacks)
  the highest-priority task executes; same priority ⇒ round robin
  an implicit idle task executes when no other task executes ⇒ it may be assigned the functionality of a background server
Synchronization using semaphores
Communication using message queues
Memory management:
  no memory protection in the basic version (can be extended)
  various implementations of memory management: memory can/cannot be freed after allocation; best fit vs. combination of adjacent memory blocks into a single one
That's (almost) all ...

Example RTOS: FreeRTOS
Tiny memory requirements, e.g.:
  IAR STR71x ARM7 port, full optimisation, minimum configuration, four priorities ⇒
    size of the scheduler = 236 bytes
    each queue adds 76 bytes + storage area
    each task adds 64 bytes + the stack size

Real-Time Programming & RTOS
Real-Time Programming Languages: Brief Overview

C and POSIX
IEEE 1003 POSIX, "Portable Operating System Interface"
  Defines a subset of Unix functionality; various (optional) extensions were added to support real-time scheduling, signals, message queues, etc.
  Widely implemented:
    Unix variants and Linux
    dedicated real-time operating systems
    limited support in Windows
Several POSIX standards for real-time scheduling:
  POSIX 1003.1b ("real-time extensions")
  POSIX 1003.1c ("pthreads")
  POSIX 1003.1d ("additional real-time extensions")
  They support a subset of the scheduler features we have discussed.

[Figure: POSIX Scheduling API]

POSIX Scheduling API (Threads)
The thread scheduling API mirrors the process scheduling API: same scheduling policies, priorities, etc.

Threads: Example I
  #include <pthread.h>

  pthread_t id;

  void *fun(void *arg)
  {
      // Some code sequence
      return NULL;
  }

  int main(void)
  {
      pthread_create(&id, NULL, fun, NULL);
      // Some other code sequence
      pthread_join(id, NULL);  // wait for the thread before main returns
      return 0;
  }

Threads: Example II
  #include <pthread.h>
  #include <stdio.h>

  #define NUM_THREADS 5

  void *PrintHello(void *threadid)
  {
      printf("\n%ld: Hello World!\n", (long)threadid);
      pthread_exit(NULL);
  }

  int main (int argc, char *argv[])
  {
      pthread_t threads[NUM_THREADS];
      int rc, t;
      for(t=0; t