X-Git-Url: https://git.kernelconcepts.de/?a=blobdiff_plain;f=Documentation%2Fcpusets.txt;h=d17b7d2dd771e6c0eeeda4b93c372840209f014d;hb=663b97f7efd001b0c56bd5fce059c5272725b86f;hp=2f8f24eaefd9ac746f8a1e98b6e2c99abcc23b58;hpb=af6f5e3247a68074e384ef93c0b4bce1b73c9d80;p=karo-tx-linux.git diff --git a/Documentation/cpusets.txt b/Documentation/cpusets.txt index 2f8f24eaefd9..d17b7d2dd771 100644 --- a/Documentation/cpusets.txt +++ b/Documentation/cpusets.txt @@ -51,6 +51,26 @@ mems_allowed vector. If a cpuset is cpu or mem exclusive, no other cpuset, other than a direct ancestor or descendent, may share any of the same CPUs or Memory Nodes. +A cpuset that is cpu exclusive has a sched domain associated with it. +The sched domain consists of all cpus in the current cpuset that are not +part of any exclusive child cpusets. +This ensures that the scheduler load balacing code only balances +against the cpus that are in the sched domain as defined above and not +all of the cpus in the system. This removes any overhead due to +load balancing code trying to pull tasks outside of the cpu exclusive +cpuset only to be prevented by the tasks' cpus_allowed mask. + +A cpuset that is mem_exclusive restricts kernel allocations for +page, buffer and other data commonly shared by the kernel across +multiple users. All cpusets, whether mem_exclusive or not, restrict +allocations of memory for user space. This enables configuring a +system so that several independent jobs can share common kernel +data, such as file system pages, while isolating each jobs user +allocation in its own cpuset. To do this, construct a large +mem_exclusive cpuset to hold all the jobs, and construct child, +non-mem_exclusive cpusets for each individual job. Only a small +amount of typical kernel memory, such as requests from interrupt +handlers, is allowed to be taken outside even a mem_exclusive cpuset. User level code may create and destroy cpusets by name in the cpuset virtual file system, manage the attributes and permissions of these @@ -84,6 +104,9 @@ This can be especially valuable on: and a database), or * NUMA systems running large HPC applications with demanding performance characteristics. + * Also cpu_exclusive cpusets are useful for servers running orthogonal + workloads such as RT applications requiring low latency and HPC + applications that are throughput sensitive These subsets, or "soft partitions" must be able to be dynamically adjusted, as the job mix changes, without impacting other concurrently @@ -125,6 +148,8 @@ Cpusets extends these two mechanisms as follows: - A cpuset may be marked exclusive, which ensures that no other cpuset (except direct ancestors and descendents) may contain any overlapping CPUs or Memory Nodes. + Also a cpu_exclusive cpuset would be associated with a sched + domain. - You can list all the tasks (by pid) attached to any cpuset. The implementation of cpusets requires a few, simple hooks @@ -136,6 +161,9 @@ into the rest of the kernel, none in performance critical paths: allowed in that tasks cpuset. - in sched.c migrate_all_tasks(), to keep migrating tasks within the CPUs allowed by their cpuset, if possible. + - in sched.c, a new API partition_sched_domains for handling + sched domain changes associated with cpu_exclusive cpusets + and related changes in both sched.c and arch/ia64/kernel/domain.c - in the mbind and set_mempolicy system calls, to mask the requested Memory Nodes by what's allowed in that tasks cpuset. - in page_alloc, to restrict memory to allowed nodes. @@ -249,7 +277,7 @@ rewritten to the 'tasks' file of its cpuset. This is done to avoid impacting the scheduler code in the kernel with a check for changes in a tasks processor placement. -There is an exception to the above. If hotplug funtionality is used +There is an exception to the above. If hotplug functionality is used to remove all the CPUs that are currently assigned to a cpuset, then the kernel will automatically update the cpus_allowed of all tasks attached to CPUs in that cpuset to allow all CPUs. When memory