Process Scheduling on Linux and macOS
Background
Sometimes a task will be heavily disk, network or CPU dependent, and will starve other tasks of those resources. It’s important that such tasks won’t steal resources from other tasks.
Ideally, some resource intensive tasks would run only when the resources they depend on are idle. That, or resource intensive tasks would have their resource consumption throttled to favor other tasks.
In this article, we will review CPU and I/O scheduling at the process-level on both Linux and Darwin.
Process Scheduling
Process scheduling priority can be easily tuned with nice.
Linux processes given a lower niceness value on the interval [-20, 19] via the nice shell interface have priority over those with a larger niceness value. Internally, the kernel tracks priority on the interval [0, 139], of which only the last 40 levels (100 through 139) are available to users through niceness.
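The relationship between the two ranges can be sketched in a few lines of shell. This assumes the kernel’s usual mapping, where a niceness of 0 corresponds to an internal priority of 120; nice_to_prio is a made-up helper name, not an existing tool.

```shell
# Map a user-facing nice value in [-20, 19] to the kernel's internal
# priority in [100, 139]. Assumes the usual offset of 120.
nice_to_prio() {
    echo $(( 120 + $1 ))
}

nice_to_prio -20   # prints 100, the highest user priority
nice_to_prio 19    # prints 139, the lowest user priority
```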
Here’s how we’d run a process with a lower scheduling precedence:
nice -n 19 our_expensive_cmd
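To check that the new niceness took effect, one option is to have the child report its own value. This is a small sketch that assumes GNU coreutils’ nice, which prints the current niceness when run without arguments; other implementations (the BSD nice on macOS, for instance) may not behave this way. child_niceness is a hypothetical helper name.

```shell
# Start a child with the given niceness increment and have it print
# the niceness it ended up with (GNU coreutils behavior assumed).
child_niceness() {
    nice -n "$1" nice
}

child_niceness 19
```

The increment is added to the niceness of the calling shell and capped at 19, so the printed value depends on where the shell itself started.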
On macOS, priorities given to nice on the command-line, or via the C process management interface, exist on the interval [-20, 20].
Darwin, the open source basis for macOS, has additional process priority concepts.
For example, a process with priority PRIO_DARWIN_BG has these properties:
When a thread or process is in a background state the scheduling priority is set to the lowest value, disk IO is throttled (with behavior similar to using setiopolicy_np(3) to set a throttleable policy), and network IO is throttled for any sockets opened after going into background state. Any previously opened sockets are not affected.
We’ll use Darwin’s process priority concepts in the next section.
Processor Affinity
On Linux, we can set a task’s CPU affinity with taskset.
If you run a heterogeneous multiprocessing (HMP) setup, like those found in ARM big.LITTLE designs, this means you can pin background tasks to the little processor cores, or pin a CPU-intensive task you’d like to finish quickly to the big processor cores.
Pinning also helps if you want to take advantage of a processor’s cache or a multicore processor’s shared cache: it can reduce the overhead of migrating processes between processors, such as cache misses and cache-coherency traffic.
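As a rough illustration, here is how pinning might look with taskset from util-linux. The sketch picks the first CPU the current shell is allowed to run on rather than hard-coding one, since the available CPUs vary by machine. first_allowed_cpu and pin_to_first_cpu are hypothetical helper names; on a big.LITTLE system you would substitute the CPU numbers of the cluster you care about.

```shell
# Print the first CPU in the current shell's affinity list,
# e.g. "0" for a list like "0-7" or "0,2,4".
first_allowed_cpu() {
    taskset -cp $$ | sed 's/.*: *//; s/[,-].*//'
}

# Run a command pinned to that single CPU.
pin_to_first_cpu() {
    taskset -c "$(first_allowed_cpu)" "$@"
}

# Usage:
#   pin_to_first_cpu our_expensive_cmd
```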
XNU, the kernel on which Darwin is based, exposes a thread affinity API, but there isn’t a convenient taskset-like wrapper around it.
Network and Disk I/O Scheduling
The Linux kernel has a variety of input/output (I/O) queues. I/O is prioritized by each process’s given I/O scheduling class. The classes are idle, best effort and real time.
A process with the I/O scheduling class of idle will be given disk access when the resource is idle.
Best effort classes have priorities on the interval [0, 7], with lower numbers given a higher priority. Best effort scheduling is done via round-robin.
Real time classes are given disk priority, have the same number of priority levels as best effort and can only be used by root. Real time scheduled processes can starve other processes of resources. If you don’t know if you should use real time I/O scheduling, that most likely means that you don’t need it.
We can set I/O priority on Linux via ionice.
# run a command with `idle` disk priority
ionice -c idle our_expensive_cmd
# run a command with the lowest `best effort` disk priority
ionice -c best-effort -n 7 our_expensive_cmd
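To see which class a process actually ended up in, we can ask ionice to report it. This sketch assumes the util-linux ionice, which prints the I/O class of the current process when run without a PID or command; show_io_class is a hypothetical helper name.

```shell
# Print the I/O class (and priority, if any) that a child process
# ends up with when started through ionice with the given options.
show_io_class() {
    ionice "$@" ionice
}

# Usage:
#   show_io_class -c idle
#   show_io_class -c best-effort -n 7
```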
Linux does not have a convenient interface to set network I/O priority. There are many options one can implement to that end, perhaps using kernel network QoS policies or iptables. I’m sure something involving network namespaces could be hacked together and would be pretty cool. However, someone built a userspace solution with a nice command-line interface, so why not use that?
Since Linux is well-documented, I will not go into its different tiers of I/O queues, but I will expound on the macOS equivalent when we get to it, as I was unable to find much documentation in macOS-land.
Unlike Linux, XNU-derived systems have the concept of network I/O priority classes for processes.
XNU-derived macOS does not have the same scheduling structure as Linux, but does have the ability to prioritize disk I/O.
If we look at the setiopolicy_np() manpage, we’ll see that macOS has a handful of I/O priority classes:
IOPOL_IMPORTANT processes are given utmost priority.
IOPOL_STANDARD processes may be delayed in favor of IOPOL_IMPORTANT.
IOPOL_UTILITY processes are intended to be short-running and are throttled to favor the two priority classes above.
IOPOL_THROTTLE processes are intended to be long-running and are throttled to favor the above classes.
IOPOL_PASSIVE is intended for server-type processes for which I/O is the result of requests from client applications. The intent is that client application I/O will not be slowed down by server I/O initiated by the client. Other priority classes will ignore this class, giving the above classes priority over IOPOL_PASSIVE.
Darwin gives us a nice command-line interface around setiopolicy_np() and setpriority(), called taskpolicy.
You can browse its source here, given that it’s part of the Darwin open-source project. Here’s a copy of the relevant taskpolicy manpage, since Apple took their copy down.
Let’s take a look at how we’d run a process with the lowest CPU and I/O priority in Linux:
ionice -c idle nice -n 19 our_expensive_cmd
Here’s how we’d do something very similar to the above on macOS:
taskpolicy -b our_expensive_cmd
The -b flag runs our_expensive_cmd with the process priority PRIO_DARWIN_BG, which was touched upon in the previous section.
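If you hop between both platforms, the two invocations can be folded into one wrapper. This is only a sketch: run_low_priority is a hypothetical name, and the fallback to plain nice on other systems (or on Linux without ionice installed) is my own assumption about what a reasonable default looks like.

```shell
# Run a command at the lowest CPU and I/O priority the platform offers.
# Hypothetical helper: uses taskpolicy on Darwin, ionice + nice on Linux
# when ionice is installed, and plain nice everywhere else.
run_low_priority() {
    case "$(uname -s)" in
        Darwin)
            taskpolicy -b "$@"
            ;;
        Linux)
            if command -v ionice >/dev/null 2>&1; then
                ionice -c idle nice -n 19 "$@"
            else
                nice -n 19 "$@"
            fi
            ;;
        *)
            nice -n 19 "$@"
            ;;
    esac
}

# Usage:
#   run_low_priority our_expensive_cmd
```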
taskpolicy also allows us to set one of the five disk policies that were listed above. For example, if we wanted to run a process with the policy IOPOL_THROTTLE and scope IOPOL_SCOPE_PROCESS, we could run:
taskpolicy -d throttle our_expensive_cmd
If we ran taskpolicy with the -g flag, it would do the same as above, but with the scope IOPOL_SCOPE_DARWIN_BG.
Digging around the XNU kernel source, we will find that IOPOL_SCOPE_DARWIN_BG is given the TASK_POLICY_DARWIN_BG_IOPOL I/O policy flavor, which is used to decide the background process I/O tier.
We’ll also see that the default I/O policy for IOPOL_SCOPE_DARWIN_BG is IOPOL_UTILITY. IOPOL_UTILITY is given tier THROTTLE_LEVEL_TIER2, which is one tier above THROTTLE_LEVEL_TIER3.
Interestingly enough, mobile XNU-derived systems like iOS have a slightly different I/O scheduling scheme than non-mobile systems, based on what I can glean from preprocessor directives.
Here’s a breakdown of tiers:
THROTTLE_LEVEL_TIER0, the default scheduling tier for IOPOL_NORMAL, IOPOL_DEFAULT and IOPOL_PASSIVE.
THROTTLE_LEVEL_TIER1, default for IOPOL_STANDARD.
THROTTLE_LEVEL_TIER2, default for IOPOL_UTILITY.
THROTTLE_LEVEL_TIER3, default for IOPOL_THROTTLE.
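For quick reference, the breakdown above can be written as a small lookup. This only restates the mapping listed here, not anything read from a running kernel, and iopol_default_tier is a made-up name.

```shell
# Look up the default throttle tier for an I/O policy, following the
# breakdown above. Unknown names print nothing and return non-zero.
iopol_default_tier() {
    case "$1" in
        IOPOL_NORMAL|IOPOL_DEFAULT|IOPOL_PASSIVE) echo THROTTLE_LEVEL_TIER0 ;;
        IOPOL_STANDARD) echo THROTTLE_LEVEL_TIER1 ;;
        IOPOL_UTILITY)  echo THROTTLE_LEVEL_TIER2 ;;
        IOPOL_THROTTLE) echo THROTTLE_LEVEL_TIER3 ;;
        *) return 1 ;;
    esac
}

iopol_default_tier IOPOL_UTILITY   # prints THROTTLE_LEVEL_TIER2
```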
If we try to give a process the scope of IOPOL_SCOPE_DARWIN_BG, but give it any policy but IOPOL_UTILITY or IOPOL_THROTTLE, we’ll get an error:
alex@mbp:~$ taskpolicy -g passive bash
taskpolicy: setiopolicy_np(...IOPOL_SCOPE_DARWIN_BG...): Invalid argument
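If you script around taskpolicy, it may be friendlier to catch this before the kernel does. Here is a sketch of such a guard. taskpolicy_bg is a hypothetical wrapper, and it assumes that utility and throttle are the names taskpolicy expects for the two accepted policies, following the lowercase names used in the commands above.

```shell
# Hypothetical wrapper: reject policies that the DARWIN_BG scope refuses
# before handing off to `taskpolicy -g`.
taskpolicy_bg() {
    policy=$1
    shift
    case "$policy" in
        utility|throttle)
            taskpolicy -g "$policy" "$@"
            ;;
        *)
            echo "taskpolicy_bg: '$policy' is not valid for IOPOL_SCOPE_DARWIN_BG" >&2
            return 1
            ;;
    esac
}

# Usage:
#   taskpolicy_bg throttle our_expensive_cmd
```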
At some point I’ll do a write-up on XNU QoS latency and throughput tiers, but that’s for another day.
Conclusion
We took a look at the CPU and I/O prioritization options available on Linux, and contrasted them with those available on macOS.
While Linux has utilities like nice, ionice and taskset, it lacks a convenient process-level network throttling option. macOS, on the other hand, has network and disk throttling built-in and available behind the taskpolicy command-line utility.
Setting processor affinity is as easy as issuing a shell command on Linux, but on macOS such functionality must be implemented at the application level, per application.