Scheduling is the action of assigning resources to perform tasks.
We will mainly focus on scheduling where our resource is a processor or multiple processors, and the task will be a thread or a process that needs to be executed.
The act of scheduling is carried out by a process called scheduler.
The scheduler goals are to
- Maximize throughput (amount of tasks done per time unit)
- Minimize wait time (amount of time passed since the process was ready until it started to execute)
- Minimize response time (amount of time passed since the process was ready until it finished executing)
- Maximize fairness (distributing resources fairly for each task)
In case you are not familiar with these metrics, I’d suggest looking at a few examples from another post I made about scheduling algorithms
Process Types in Linux
Linux has two types of processes
- Real-time Processes
- Conventional Processes
Real-time processes are required to ‘obey’ response time constraints without any regard to the system’s load.
In different words, real-time processes are urgent and cannot be delayed no matter the circumstances.
An example of a real-time process in Linux is the migration process which is responsible for distributing processes across CPU cores (a.k.a load balancing).
Conventional processes don’t have strict response time constraints and they can suffer from delays in case the system is ‘busy’.
An example of a conventional process can be the browser process you’re using to read this post.
Each process type has a different scheduling algorithm, and as long as there are ready-to-run real-time processes they will run and make the conventional processes wait.
There are two scheduling policies when it comes to real-time scheduling, SCHED_RR and SCHED_FIFO.
The policy affects how much runtime a process will get and how is the runqueue is operating.
Since I didn’t mention it explicitly before, let’s get something in order.
The ready-to-run processes I have mentioned are stored in a queue called runqueue. The scheduler is picking processes to run from this runqueue based on the policy.
As you might have guessed, in this policy the scheduler will choose a process based on the arrival time (FIFO = First In First Out).
A process with a scheduling policy of SCHED_FIFO can ‘give up’ the CPU under a few circumstances:
- Process is waiting, for example for an IO operation.
When the process is back to ‘ready’ state it will go back to the end of the runqueue.
- Process yielded the CPU, with the system call sched_yield.
The process will immediately go back to the end of the runqueue.
RR = Round Robin
In this scheduling policy, every process in the runqueue gets a time slice (quantum) and executes in his turn (based on priority) in a cyclic fashion.
In order for us to have a better intuition about round robin, let’s consider an example where we have 3 processes in our runqueue, A B C, all of them have the policy of SCHED_RR.
As shown in the drawing below, each process gets a time slice and executes in his turn. when all processes ran 1 time, they repeat the same execution order.
Real-Time Scheduling Summary
A real-time process can be scheduled in two different policies, SCHED_FIFO and SCHED_RR.
The policy affects how the runqueue is working and how much time each process is getting for execution.
CFS — Completely Fair Scheduler is the scheduling algorithm of conventional processes since version 2.6.23 of Linux.
Remmeber the metrics of schedulers we discussed in the top of this article? so CFS is focusing mainly on one metric — it wants to be fair as much as possible, meaning that he gives every process gets an even time slice of the CPU.
Note that, processes with higher priority might still get bigger time slices.
In order for us to understand how CFS works, we will have to get familiar with a new term — virtual runtime (vruntime).
Virtual runtime of a process is the amount of time spent by actually executing, not including any form of waiting.
As we mentioned, CFS tries to be as fair as possible.
To accomplish that, CFS will schedule the process with the minimum virtual time that is ready to run.
CFS maintains variables holding the maximum and minimum virtual runtime for reasons we will understand soon.
CFS — Completely Fair Scheduler
Before talking about how the algorithm works, let’s understand what data structure this algorithm is using.
CFS uses a red-black tree which is a balanced binary search tree — meaning that insertion, deletion and look-up are performed in O(logN) where N is the number of processes.
The key in this tree is the virtual runtime of a process.
New processes or process that got back to ready state from waiting are inserted into the tree with a key vruntime=min_vruntime.
This is extremely important in order to prevent starvation of older processes in the tree.
Moving on to the algorithm, at first, the algorithm sets himself a time limit — sched_latency.
In this time limit, it will try to execute all ready processes — N.
Which means that each process will get a time slice of the time limit divided by the number of processes — Qᵢ = sched_latency/N.
When a process finishes his time-slice (Qᵢ), the algorithm picks the process with the least virtual runtime in the tree to execute next.
Let’s address a situation that might be problematic with the way I described the algorithm so far.
Assuming that the algorithm picked a time limit of 48ms(milliseconds) and we have 6 processes — in this case every process gets 8ms to execute in his turn.
But what happens when the system is overloaded with processes?
Let’s say the time limit remains 48ms but now we have 32 processes, now each process has 1.5ms to execute — and this will cause a major slowdown in our system.
Why? What’s the difference?
Context switch is the process of storing the state of a process or thread, so that it can be restored and resume execution at a later point.
Every time that a process finishes his execution time and a new process is scheduled, a context switch occurs which also takes time.
Let’s say that a context switch costs us 1ms, in the first example where we have 6ms per process, we can allow that, we waste 1ms on the context switch and 5ms on actually executing the process. but on the second example we only have 0.5ms to execute the process — we waste most of our time slice for context switching and that’s why it simply cannot work.
In order to overcome this situation, we introduce a new variable that will determine how small a time slice is allowed to be — min_granularity.
Let’s say that min_granularity=6ms and get back to our example.
Our time limit is 48 and we have 32 processes.
By the calculation we made before, every process will get 1.5ms but now it is simply not allowed because the min_granularity specifies the minimum time slice each process should get.
In this case, where Qᵢ < min_granularity we take min_granularity as our Qᵢ and change the time limit according to it.
In our example, Qᵢ would be equal to 6ms since 1.5ms < 6ms and that would mean that the new time limit would be Qᵢ ⋅ N = 6ms ⋅ 32 = 192ms.
At this point it might not be clear what are the differences between CFS and RR since both of them define some time slices and make processes execute in some order.
In order to summarize and to better understand the differences between these algorithms, here’s a short table
Hopefully by now, you have a better understanding of how process scheduling is done in the Linux kernel.
In case you want to receive my new stories to your inbox, subscribe with your email