Installing and Using Ftrace

Source: Internet | Editor: 程序博客网 | Date: 2024/05/03 21:35

ref:http://www.omappedia.org/wiki/Installing_and_Using_Ftrace
===================================================
Kernel configuration & Re-build

Kernel Hacking -> Tracers -> FUNCTION_TRACER
Kernel Hacking -> Tracers -> FUNCTION_GRAPH_TRACER (if supported by your architecture)
Kernel Hacking -> Tracers -> STACK_TRACER     // trace the maximum kernel stack usage
Kernel Hacking -> Tracers -> DYNAMIC_FTRACE   // enable/disable the function-tracing call sites at run time
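Before rebuilding, it can help to check which of these options an existing kernel configuration already has. The following is a minimal sketch: `check_ftrace_config` is a hypothetical helper, not part of ftrace, and it simply looks for `CONFIG_<NAME>=y` lines in a .config-style file.

```shell
#!/bin/sh
# Report which of the ftrace options listed above are not set to "y" in a
# kernel .config file. Prints one missing option per line and returns
# non-zero if anything is missing.
check_ftrace_config() {
    config_file=$1
    missing=0
    for opt in FUNCTION_TRACER FUNCTION_GRAPH_TRACER STACK_TRACER DYNAMIC_FTRACE; do
        # Match exactly "CONFIG_<opt>=y"; "=m" or "is not set" counts as missing.
        if ! grep -q "^CONFIG_${opt}=y\$" "$config_file"; then
            echo "CONFIG_${opt}"
            missing=1
        fi
    done
    return $missing
}
```

For example, `check_ftrace_config /boot/config-$(uname -r)`; that path is an assumption (many distributions install the running kernel's config there, but not all do).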

Using Ftrace

Ftrace has its control files in the debugfs file system, which is usually mounted at /sys/kernel/debug:
# mount -t debugfs nodev /sys/kernel/debug

debugfs can also be mounted elsewhere, for example:
# mkdir /mnt/debugfs
# mount -t debugfs nodev /mnt/debugfs
This creates a /mnt/debugfs/tracing subdirectory, which is used to control ftrace and to read the tool's output.
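Because the mount point differs between systems, scripts usually look it up in /proc/mounts. A minimal sketch, assuming only debugfs is of interest (as in this text); `find_tracing_dir` is a hypothetical helper name. Fields in /proc/mounts are: device, mount point, file system type, options, dump, pass.

```shell
#!/bin/sh
# Print "<debugfs mount point>/tracing" for the first debugfs mount in a
# mounts table (default /proc/mounts). Returns non-zero if none is found.
find_tracing_dir() {
    mounts_file=${1:-/proc/mounts}
    awk '$3 == "debugfs" { print $2 "/tracing"; found = 1; exit }
         END { exit !found }' "$mounts_file"
}
```

Usage: `cd "$(find_tracing_dir)"`.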

-----------------------------
# cd /sys/kernel/debug/tracing
# cat available_tracers
blk function sched_switch nop

# echo function > current_tracer
# cat current_tracer
function
Note: only one tracer can be active at a time.
-------------------Using a tracer
# echo 1 > tracing_on   // start recording
# echo 0 > tracing_on   // stop recording
The recorded trace can be read from the trace file.
Snail:/sys/kernel/debug/tracing# cat trace | head -10
# tracer: function
#
#           TASK-PID    CPU#    TIMESTAMP FUNCTION
#              | |       |          |         |
     firefox-bin-4072 [000]   946.722298: _raw_spin_unlock_irqrestore <-skb_dequeue
     firefox-bin-4072 [000]   946.722298: mutex_unlock <-unix_stream_recvmsg
     firefox-bin-4072 [000]   946.722298: scm_recv <-unix_stream_recvmsg
     firefox-bin-4072 [000]   946.722298: scm_destroy <-scm_recv
     firefox-bin-4072 [000]   946.722299: scm_destroy_cred <-scm_destroy
     firefox-bin-4072 [000]   946.722299: put_pid <-scm_destroy_cred
-------------------------Tracing a specific process
# echo $$ > set_ftrace_pid   // PID of the current shell

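To trace a single command, you can create a shell script wrapper. The script writes its own PID to set_ftrace_pid and then uses exec to replace itself with the command, so the command keeps that PID:
[tracing]# cat ~/bin/ftrace-me
#!/bin/sh
# Find the debugfs mount point (for example /sys/kernel/debug or /mnt/debugfs).
DEBUGFS=$(awk '$3 == "debugfs" { print $2; exit }' /proc/mounts)
# Restrict the function tracer to this script's PID, which exec keeps.
echo $$ > "$DEBUGFS/tracing/set_ftrace_pid"
echo function > "$DEBUGFS/tracing/current_tracer"
exec "$@"
[tracing]# ~/bin/ftrace-me ls -ltr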

Note: you must clear the set_ftrace_pid file (for example with echo > set_ftrace_pid) if you want to go back to tracing all tasks after performing the above.

====================== different tracers
- nop tracer
This is the default tracer; it does not trace anything.
- Function tracer
It requires the kernel variable ftrace_enabled to be turned on; otherwise this tracer behaves like nop.
    # sysctl kernel.ftrace_enabled=1
    # echo function > current_tracer
    # cat current_tracer
    function
The header explains the format of the output well. The first two items are the traced task name and PID.
The CPU that the trace was executed on is within the brackets. The timestamp is the time since boot in <secs>.<usecs> format,
followed by the function name and, after the "<-" symbol, its parent (the function that called it).
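The format described above is regular enough to post-process. A minimal sketch, assuming task names contain no spaces (they may contain '-', so the PID is taken from the digits after the last '-') and that func_stack_trace is off; `parse_function_trace` is a hypothetical helper.

```shell
#!/bin/sh
# Read function-tracer output on stdin and print one tab-separated record
# per entry: task, pid, cpu, timestamp, function, parent.
parse_function_trace() {
    awk '
        /^#/ { next }                   # skip the header comments
        NF < 5 { next }                 # skip blank or truncated lines
        $4 == "<stack" { next }         # skip "<stack trace>" markers
        {
            match($1, /-[0-9]+$/)       # locate the "-PID" suffix
            task = substr($1, 1, RSTART - 1)
            pid  = substr($1, RSTART + 1)
            cpu  = substr($2, 2, length($2) - 2)   # "[000]" -> "000"
            ts   = $3; sub(/:$/, "", ts)           # drop trailing ":"
            parent = $5; sub(/^<-/, "", parent)    # drop leading "<-"
            printf "%s\t%s\t%s\t%s\t%s\t%s\n", task, pid, cpu, ts, $4, parent
        }'
}
```

For example, `parse_function_trace < trace | cut -f5 | sort | uniq -c | sort -rn | head` lists the most frequently traced functions.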

- How does the function tracer work?
The function tracer (enabled by CONFIG_FUNCTION_TRACER) is a way to trace almost all functions in the kernel.
When function tracing is enabled, the kernel is compiled with the gcc option -pg. This profiling option makes every
function call a special function named mcount. Unfortunately, this can cause a very large overhead. To reduce it,
enable dynamic ftrace (CONFIG_DYNAMIC_FTRACE): the mcount calls are then converted to nops at run time while they are not in use.
This gives the function tracer essentially zero overhead when it is not in use.
+++ Finding Origins of Latencies Using Ftrace
- Sched_switch Tracer
This tracer, also activated by enabling CONFIG_FUNCTION_TRACER, traces the context switches and wakeups between tasks.
Wakeups are represented by a "+" and context switches by "==>". The format is:

    * Context switches:
      Previous task              Next Task
<pid>:<prio>:<state> ==> <pid>:<prio>:<state>

    * Wake ups:
      Current task               Task waking up
<pid>:<prio>:<state>    + <pid>:<prio>:<state>

The task states are:
    R - running : wants to run, may not actually be running
    S - sleep : process is waiting to be woken up (handles signals)
    D - disk sleep (uninterruptible sleep) : process must be woken up (ignores signals)
    T - stopped : process suspended
    t - traced : process is being traced (with something like gdb)
    Z - zombie : process waiting to be cleaned up
    X - unknown
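Each `<pid>:<prio>:<state>` triple can be decoded with the table above. A minimal sketch; `decode_task_triple` is a hypothetical helper. The prio field is the kernel-internal priority, so a normal task with nice 0 shows up as 120.

```shell
#!/bin/sh
# Decode one "<pid>:<prio>:<state>" triple from sched_switch output,
# e.g. "3256:120:S", using the task-state letters listed above.
decode_task_triple() {
    pid=${1%%:*}        # text before the first ':'
    rest=${1#*:}        # text after the first ':'
    prio=${rest%%:*}
    letter=${rest#*:}
    case $letter in
        R) state="running" ;;
        S) state="sleep" ;;
        D) state="disk sleep" ;;
        T) state="stopped" ;;
        t) state="traced" ;;
        Z) state="zombie" ;;
        *) state="unknown" ;;
    esac
    echo "pid=$pid prio=$prio state=$state"
}
```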

==== Sched_switch viewer
A graphical tool makes it possible to visualize the temporal relationship between Linux tasks.
It converts the trace data to VCD (value change dump) format, so that the context switches can be
viewed in a VCD viewer such as GTKWave (# aptitude install gtkwave).
The tool is available at http://www.omappedia.org/images/6/6d/Sched_switch-0.1.1.tar.gz
To use this program, first record a trace and save it to a text file:
# cat /sys/kernel/debug/tracing/trace > /tmp/trace.txt
Now convert the trace data and start the viewer:
# sched_switch /tmp/trace.txt /tmp/trace.vcd
# gtkwave /tmp/trace.vcd

====== Function Graph Tracer
This tracer depends on the CONFIG_FUNCTION_GRAPH_TRACER kernel option.
It is similar to the function tracer, except that it probes each function on entry and on exit, so it can also report the duration of each function call.
NOTE: make sure that CC_OPTIMIZE_FOR_SIZE (Optimize for size) is not set.
# echo function_graph > current_tracer
When the duration is greater than 10 microseconds, a "+" is shown; if it is greater than 100 microseconds, a "!" is displayed.
Arrows such as "==========>" mark an interrupt.
# tracer: function_graph
#
#     TIME        CPU DURATION                  FUNCTION CALLS
#      |          |     |   |                     |   |   |   |
0)               |      tty_poll() {
0)   0.459 us    |        tty_paranoia_check();
0)               |        tty_ldisc_ref_wait() {
0)               |          tty_ldisc_try() {
0)   0.460 us    |            _raw_spin_lock_irqsave();
0)   0.469 us    |            _raw_spin_unlock_irqrestore();
0)   2.401 us    |          }
0)   3.372 us    |        }
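To find slow calls without scanning for the "+" and "!" markers by eye, the duration column can be filtered numerically. A minimal sketch, assuming the layout shown above (the duration is the number just before the "us" unit, optionally preceded by a marker); `graph_slow_calls` is a hypothetical helper, and newer kernels may print additional markers that this sketch simply ignores.

```shell
#!/bin/sh
# Print the function_graph trace lines (read on stdin) whose duration is
# greater than LIMIT microseconds (first argument).
graph_slow_calls() {
    awk -v limit="$1" '
        /^#/ { next }                       # skip the header comments
        {
            for (i = 2; i <= NF; i++) {
                if ($i == "us") {           # first "us" is the duration unit
                    if ($(i - 1) + 0 > limit + 0) print
                    break
                }
            }
        }'
}
```

For example, `graph_slow_calls 100 < trace` shows the calls that would carry the "!" marker.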

==== Function Profiling
With the function profiler, it is possible to take a good look at the actual function execution and not just samples.
If CONFIG_FUNCTION_GRAPH_TRACER is configured in the kernel, the function profiler will use the function
graph infrastructure to record how long a function has been executing. If just CONFIG_FUNCTION_TRACER is configured,
the function profiler will just count the functions being called.
NOTE: CONFIG_FUNCTION_PROFILER (Kernel function profiler) must also be set.
# echo nop > current_tracer
# echo 1 > function_profile_enabled
# cat trace_stat/function0 |head
Function                               Hit    Time            Avg             s^2
--------                               ---    ----            ---             ---
schedule                             12513    185224676 us     14802.57 us     1398161247 us
schedule_hrtimeout_range              5350    106054923 us     19823.35 us     1927031390 us
schedule_hrtimeout_range_clock        5350    106051822 us     19822.77 us     1927030959 us
poll_schedule_timeout                 5349    104054602 us     19453.09 us     1566715221 us
sys_poll                              9025    90736606 us     10053.91 us     116051200 us
do_sys_poll                           9025    90725452 us     10052.68 us     116039896 us
schedule_timeout                        52    25796322 us     496083.1 us     2808748337 us
sys_futex                             5370    20737388 us     3861.710 us     1880027812 us
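The profiler keeps one trace_stat/functionN file per CPU (function0 above), so on SMP systems it can be useful to combine them. A minimal sketch that adds up the Hit column only (times are left out because their "us" suffix and per-CPU averages do not simply add); `sum_profile_hits` is a hypothetical helper.

```shell
#!/bin/sh
# Add up the "Hit" column of several trace_stat/functionN files and print
# "function hits" pairs, most-called first. The first two lines of each
# file are the column headers.
sum_profile_hits() {
    awk 'FNR > 2 && NF >= 2 { hits[$1] += $2 }
         END { for (f in hits) print f, hits[f] }' "$@" |
        sort -k2,2nr -k1,1
}
```

Usage: `sum_profile_hits trace_stat/function* | head`.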

The above also includes the times that a function has been preempted or schedule() was called and the task was swapped out.
This may seem useless, but it does give an idea of what functions get preempted often. Ftrace also includes options that allow you
to have the function graph tracer ignore the time the task was scheduled out.

# echo 0 > options/sleep-time
# echo 0 > function_profile_enabled
# echo 1 > function_profile_enabled
# cat trace_stat/function0 | head
Function                               Hit    Time            Avg             s^2
--------                               ---    ----            ---             ---
sys_poll                              6115    30315217 us     4957.517 us     2412802549 us
do_sys_poll                           6115    30305818 us     4955.980 us     2412789865 us
schedule                              8372    30107862 us     3596.256 us     1765330011 us
poll_schedule_timeout                 3627    30084182 us     8294.508 us     4036656248 us
schedule_hrtimeout_range              3625    30081717 us     8298.404 us     4038846061 us
schedule_hrtimeout_range_clock        3625    30079395 us     8297.764 us     4038845543 us
c1e_idle                              1950    4998400 us     2563.282 us     9348104 us
default_idle                          1950    4997270 us     2562.702 us     9348491 us
Another option that affects profiling is graph-time. It is enabled by default. When enabled, the time for a function includes the time of all the functions called within it; this is why several system calls are listed with the highest averages in the output of the examples above. When disabled, the times include only the execution of the function itself, not the time spent in the functions it calls:

# echo 0 > options/graph-time
# echo 0 > function_profile_enabled
# echo 1 > function_profile_enabled
# cat trace_stat/function0 | head
Function                               Hit    Time            Avg             s^2
--------                               ---    ----            ---             ---
default_idle                          2036    5350234 us     2627.816 us     9636489 us
do_sys_poll                           5435    45520.59 us     8.375 us        49.491 us
_raw_spin_lock_irqsave              145096    38007.94 us     0.261 us        0.143 us
sock_poll                            70609    37084.12 us     0.525 us        0.370 us
_raw_spin_unlock_irqrestore         150133    33984.39 us     0.226 us        0.001 us
schedule                              7766    33119.26 us     4.264 us        3.132 us
unix_poll                            70603    31386.57 us     0.444 us        0.521 us
fget_light                          108149    29644.08 us     0.274 us        0.091 us

===== Boot Tracer

This tracer is activated with the CONFIG_BOOT_TRACER kernel configuration option. (Note: I could not find this option in 2.6.38.6.)
This tracer helps developers to optimize boot time: it records the timings of initcalls and traces key events and the identity of tasks that can cause boot delays.
The easiest way is to pass initcall_debug and printk.time=1 on the kernel command line and boot the system. Then copy/paste the console output to a file (say boot.log), or run dmesg > boot.log.
Then, generate a boot graph by using a utility provided in the Linux kernel sources ($KERNEL_SRC_DIR being the top of the kernel source tree):
# perl $KERNEL_SRC_DIR/scripts/bootgraph.pl < boot.log > boot.svg
This graph can be viewed with Inkscape or Firefox. It gives a visual representation of the delays in initcalls.

Another way to get the same results is to pass in initcall_debug and ftrace=initcall to the kernel command line. You can then access timing information in the /sys/kernel/debug/tracing/trace file:
# cat current_tracer
initcall
# cat trace |head -20
============Latency Tracers
When the latency-format trace option is enabled, the trace file gives more information to help see why a latency happened.
For more precise results, you can disable function tracing, which can add a large overhead, by turning off the kernel variable ftrace_enabled:
# sysctl kernel.ftrace_enabled=0
or
# echo 0 > /proc/sys/kernel/ftrace_enabled
Then select the tracer. Before launching each latency tracer, don't forget to reset tracing_max_latency:
# echo irqsoff > current_tracer
# echo 0 > tracing_max_latency
# echo latency-format > trace_options
# echo 1 > tracing_on
# ls -ltr              // some workload
# echo 0 > tracing_on
# cat trace
# tracer: irqsoff
#
# irqsoff latency trace v1.1.5 on 2.6.38.6
# --------------------------------------------------------------------
# latency: 213 us, #4/4, CPU#0 | (M:desktop VP:0, KP:0, SP:0 HP:0 #P:1)
#    -----------------
#    | task: swapper-0 (uid:0 nice:0 policy:0 rt_prio:0)
#    -----------------
# => started at: nv_napi_poll
# => ended at:   nv_napi_poll
#
#
#                  _------=> CPU#
#                 / _-----=> irqs-off
#                | / _----=> need-resched
#                || / _---=> hardirq/softirq
#                ||| / _--=> preempt-depth
#                |||| /_--=> lock-depth
#                |||||/     delay
# cmd     pid   |||||| time |   caller
#     \   /      ||||||   \   |   /
<idle>-0       0dNs..    0us!: _raw_spin_lock_irqsave <-nv_napi_poll
<idle>-0       0dNs.. 213us : _raw_spin_unlock_irqrestore <-nv_napi_poll
<idle>-0       0dNs.. 214us : trace_hardirqs_on <-nv_napi_poll
<idle>-0       0dNs.. 214us : <stack trace>
=> _raw_spin_unlock_irqrestore
=> nv_napi_poll
=> net_rx_action
=> __do_softirq
This shows that the current tracer is "irqsoff", which traces the time for which interrupts were disabled.
It gives the trace version and the version of the kernel on which the trace was taken (2.6.38.6).
Then it displays the maximum latency in microseconds (213 us), followed by the number of trace entries displayed and the
total number recorded (both are four: #4/4). VP, KP, SP, and HP are always zero and are reserved for later use.
The task is the process that was running when the latency occurred (swapper, pid 0).
Next come the functions that delimit the latency:
        started at: nv_napi_poll, where the interrupts were disabled;
        ended at: nv_napi_poll, where they were enabled again.
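When a latency tracer is run repeatedly, the header line is the part worth collecting. A minimal sketch, assuming the "# latency:" line looks exactly like the one above; `latency_summary` is a hypothetical helper.

```shell
#!/bin/sh
# Extract the maximum latency (microseconds), the displayed/recorded entry
# counts and the CPU from the "# latency:" header of a trace on stdin.
latency_summary() {
    awk '/^# latency:/ {
        lat = $3                        # "213"
        split($5, n, "/")               # "#4/4," -> "#4" and "4,"
        shown = substr(n[1], 2)         # drop the leading "#"
        total = n[2]; sub(/,$/, "", total)
        cpu = $6; sub(/^CPU#/, "", cpu) # "CPU#0" -> "0"
        print "latency=" lat "us shown=" shown " total=" total " cpu=" cpu
        exit
    }'
}
```

Usage: `latency_summary < trace`.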
The next lines after the header are the trace itself. The header explains which is which.
    * cmd: The name of the process in the trace.
    * pid: The PID of that process.
    * CPU#: The CPU which the process was running on.
    * irqs-off: 'd' interrupts are disabled. '.' otherwise.
Note: If the architecture does not support a way to read the irq flags variable, an 'X' will always be printed here.
    * need-resched: 'N' task need_resched is set, '.' otherwise.
    * hardirq/softirq:
    'H' - hard irq occurred inside a softirq.
    'h' - hard irq is running
    's' - soft irq is running
    '.' - normal context.
    * preempt-depth: The preempt_disable() nesting level.
The above is mostly meaningful for kernel developers.
    * time: When the latency-format option is enabled, the trace file output includes a timestamp relative to the start of the trace. This differs from the output when latency-format is disabled, which includes an absolute timestamp.
    * delay: This is just there to catch your eye. The mark is determined by the difference between the time of this entry and that of the next one (this still needs to be fixed to be relative to the same CPU only).
    '!' - greater than preempt_mark_thresh (default 100 microseconds)
    '+' - greater than 1 microsecond
    ' ' - less than or equal to 1 microsecond.
The rest is the same as the 'trace' file.
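The flag column (for example "0dNs..") packs the fields listed above into single characters. A minimal sketch that spells them out, assuming the layout of this header (CPU number followed by exactly five flag characters: irqs-off, need-resched, hardirq/softirq, preempt-depth, lock-depth); `decode_latency_flags` is a hypothetical helper and it ignores the lock-depth character.

```shell
#!/bin/sh
# Decode the flag column of a latency-format trace line, e.g. "0dNs..".
decode_latency_flags() {
    printf '%s\n' "$1" | awk '{
        n     = length($0)
        cpu   = substr($0, 1, n - 5)    # CPU number: all but the last 5 chars
        irq   = substr($0, n - 4, 1)    # d, X or .
        nr    = substr($0, n - 3, 1)    # N or .
        ctx   = substr($0, n - 2, 1)    # H, h, s or .
        depth = substr($0, n - 1, 1)    # preempt depth, "." means 0
        if (irq == "d")      irqs = "yes"
        else if (irq == "X") irqs = "unknown"
        else                 irqs = "no"
        if (ctx == "H")      c = "hardirq-in-softirq"
        else if (ctx == "h") c = "hardirq"
        else if (ctx == "s") c = "softirq"
        else                 c = "normal"
        printf "cpu=%s irqs-off=%s need-resched=%s context=%s preempt-depth=%s\n",
               cpu, irqs, (nr == "N" ? "yes" : "no"), c, (depth == "." ? 0 : depth)
    }'
}
```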

==== Wakeup Tracer
This tracer is activated by enabling CONFIG_SCHED_TRACER.
This tracer tracks the latency of the highest priority task to be scheduled in, starting from the point it has woken up.
# cat trace
# tracer: wakeup
#
# wakeup latency trace v1.1.5 on 2.6.38.6
# --------------------------------------------------------------------
# latency: 32 us, #55/55, CPU#0 | (M:desktop VP:0, KP:0, SP:0 HP:0 #P:1)
#    -----------------
#    | task: sleep-3345 (uid:0 nice:0 policy:1 rt_prio:5)
#    -----------------
#
#                  _------=> CPU#
#                 / _-----=> irqs-off
#                | / _----=> need-resched
#                || / _---=> hardirq/softirq
#                ||| / _--=> preempt-depth
#                |||| /_--=> lock-depth
#                |||||/     delay
# cmd     pid   |||||| time |   caller
#     \   /      ||||||   \   |   /
<idle>-0       0d.h..    1us+:      0:120:R   + [000] 3345: 94:R sleep
<idle>-0       0d.h..    5us : wake_up_process <-hrtimer_wakeup
<idle>-0       0d.h..    6us : check_preempt_curr <-try_to_wake_up
<idle>-0       0d.h..    7us : resched_task <-check_preempt_curr
<idle>-0       0dNh..    8us : task_woken_rt <-try_to_wake_up
<idle>-0       0dNh..    8us : _raw_spin_unlock_irqrestore <-try_to_wake_up
<idle>-0       0dNh..    9us : _raw_spin_lock <-__run_hrtimer
<idle>-0       0dNh..    9us : tick_program_event <-hrtimer_interrupt
<idle>-0       0dNh..   10us : tick_dev_program_event <-tick_program_event
<idle>-0       0dNh..   10us : ktime_get <-tick_dev_program_event
<idle>-0       0dNh..   10us : clockevents_program_event <-tick_dev_program_event
<idle>-0       0dNh..   11us : lapic_next_event <-clockevents_program_event
<idle>-0       0dNh..   11us : native_apic_mem_write <-lapic_next_event
<idle>-0       0dNh..   12us : irq_exit <-smp_apic_timer_interrupt
<idle>-0       0dN...   12us : rcu_irq_exit <-irq_exit
<idle>-0       0dN...   13us : idle_cpu <-irq_exit
<idle>-0       0.N...   13us : __exit_idle <-cpu_idle
<idle>-0       0.N...   14us : tick_nohz_restart_sched_tick <-cpu_idle
<idle>-0       0dN...   14us : ktime_get <-tick_nohz_restart_sched_tick
<idle>-0       0dN...   15us : rcu_exit_nohz <-tick_nohz_restart_sched_tick
<idle>-0       0dN...   15us : select_nohz_load_balancer <-tick_nohz_restart_sched_tick
<idle>-0       0dN...   16us : tick_do_update_jiffies64 <-tick_nohz_restart_sched_tick
<idle>-0       0dN...   16us : account_idle_ticks <-tick_nohz_restart_sched_tick
<idle>-0       0dN...   17us : hrtimer_cancel <-tick_nohz_restart_sched_tick
<idle>-0       0dN...   17us : hrtimer_try_to_cancel <-hrtimer_cancel
<idle>-0       0dN...   18us : lock_hrtimer_base <-hrtimer_try_to_cancel
<idle>-0       0dN...   18us : _raw_spin_lock_irqsave <-lock_hrtimer_base
<idle>-0       0dN...   19us : __remove_hrtimer <-hrtimer_try_to_cancel
<idle>-0       0dN...   19us : hrtimer_force_reprogram <-__remove_hrtimer
<idle>-0       0dN...   20us : _raw_spin_unlock_irqrestore <-hrtimer_try_to_cancel
<idle>-0       0dN...   20us : hrtimer_forward <-tick_nohz_restart_sched_tick
<idle>-0       0dN...   21us : ktime_divns <-hrtimer_forward
<idle>-0       0dN...   21us : ktime_add_safe <-hrtimer_forward
<idle>-0       0dN...   22us : ktime_add_safe <-hrtimer_forward
<idle>-0       0dN...   22us : hrtimer_start_range_ns <-tick_nohz_restart_sched_tick
<idle>-0       0dN...   22us : __hrtimer_start_range_ns <-hrtimer_start_range_ns
<idle>-0       0dN...   23us : lock_hrtimer_base <-__hrtimer_start_range_ns
<idle>-0       0dN...   23us : _raw_spin_lock_irqsave <-lock_hrtimer_base
<idle>-0       0dN...   24us : ktime_add_safe <-__hrtimer_start_range_ns
<idle>-0       0dN...   24us : enqueue_hrtimer <-__hrtimer_start_range_ns
<idle>-0       0dN...   25us : tick_program_event <-__hrtimer_start_range_ns
<idle>-0       0dN...   25us : tick_dev_program_event <-tick_program_event
<idle>-0       0dN...   26us : ktime_get <-tick_dev_program_event
<idle>-0       0dN...   26us : clockevents_program_event <-tick_dev_program_event
<idle>-0       0dN...   26us : lapic_next_event <-clockevents_program_event
<idle>-0       0dN...   27us : native_apic_mem_write <-lapic_next_event
<idle>-0       0dN...   27us : _raw_spin_unlock_irqrestore <-__hrtimer_start_range_ns
<idle>-0       0.N...   28us : schedule <-cpu_idle
<idle>-0       0.N...   28us : rcu_note_context_switch <-schedule
<idle>-0       0.N...   29us : _raw_spin_lock_irq <-schedule
<idle>-0       0dN...   29us : put_prev_task_idle <-schedule
<idle>-0       0dN...   30us : pick_next_task_stop <-schedule
<idle>-0       0dN...   30us : pick_next_task_rt <-schedule
<idle>-0       0d....   31us : schedule <-cpu_idle
<idle>-0       0d....   32us :      0:120:R ==> [000] 3345: 94:R sleep

We can see that it took 32 microseconds from the wakeup of the task until it was switched in.

==========Selecting filter function

A list of functions that can be added to the filter files is shown in the available_filter_functions file. This file contains almost all functions in the kernel.

# cat available_filter_functions |head
do_one_initcall
run_init_process
init_post
name_to_dev_t
elf_check_arch
arm_elf_read_implies_exec
elf_set_personality
set_irq_flags
show_interrupts
machine_halt
....

To add a filter, just echo the function's name into set_ftrace_filter:

# echo do_one_initcall > set_ftrace_filter

Using > replaces the current list, while >> appends to it. Adding all the wanted functions one by one can be tedious; fortunately, these files also accept wildcards. The following glob expressions are valid:

    * value* - Select all functions that begin with value.
    * *value* - Select all functions that contain the text value.
    * *value - Select all functions that end with value.

To remove one or more functions from the filter, prefix them with "!" and append with >>:
# echo '!do_one_initcall' >> set_ftrace_filter

All of these commands also work with set_ftrace_notrace, which lists functions that must not be traced.
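Before writing a wildcard into set_ftrace_filter, it can be handy to preview what it would select. A minimal sketch: `preview_filter` is a hypothetical helper that matches names with the shell's own `case` globbing, which handles the three forms described above (value*, *value*, *value) but is more permissive than ftrace's matching for other patterns, so treat it only as an approximation.

```shell
#!/bin/sh
# Print the function names (one per line on stdin) matched by a filter
# pattern such as 'value*', '*value*' or '*value'.
preview_filter() {
    pattern=$1
    while IFS= read -r name; do
        name=${name%% *}            # drop a " [module]" suffix, if any
        case $name in
            $pattern) printf '%s\n' "$name" ;;
        esac
    done
}
```

Usage: `preview_filter '*spin_lock*' < available_filter_functions | wc -l`.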
===== Stop the trace on a specific function

Recording to the trace file can be enabled and disabled with the tracing_on file in debugfs. It can also be controlled precisely from within the kernel through the tracing_on() and tracing_off() functions, which unfortunately requires recompiling the kernel. In the special case where you only want to switch tracing on/off in a specific kernel module, you don't need to rebuild the whole kernel: just add tracing_on() and tracing_off() calls to your module and recompile it. A last method stops the trace at a particular function; it uses the set_ftrace_filter file.

The format of the command to have the function trace enable or disable the ring buffer is as follows:

function:command[:count]

This will execute the command at the start of the function. The command is either traceon or traceoff, and an optional count can be added to have the command only execute a given number of times. If the count is left off (including the leading colon) then the command will be executed every time the function is called.
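The command line format above is easy to get slightly wrong by hand. A minimal sketch of a builder that only accepts the two commands described here; `ftrace_trigger` is a hypothetical helper.

```shell
#!/bin/sh
# Build a "function:command[:count]" line for set_ftrace_filter.
# command must be traceon or traceoff; count is optional.
ftrace_trigger() {
    func=$1 cmd=$2 count=${3-}
    case $cmd in
        traceon|traceoff) ;;
        *) echo "unknown command: $cmd" >&2; return 1 ;;
    esac
    if [ -n "$count" ]; then
        printf '%s:%s:%s\n' "$func" "$cmd" "$count"
    else
        printf '%s:%s\n' "$func" "$cmd"      # no count: run on every call
    fi
}
```

For example, `ftrace_trigger __do_softirq traceoff > set_ftrace_filter` produces the same line as the example that follows.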

# echo '__do_softirq:traceoff' > set_ftrace_filter
# cat set_ftrace_filter
#### all functions enabled ####
__do_softirq:traceoff:unlimited

Notice that functions with commands do not affect the general filters. Even though a command has been added to __do_softirq, the filter still allowed all functions to be traced. Commands and filter functions are separate and do not affect each other.

=====What calls a specific function?
Sometimes it is useful to know what is calling a particular function. The immediate predecessor is helpful, but an entire backtrace is even better. The function tracer contains an option that will create a backtrace in the ring buffer for every function that is called by the tracer.

To use the function tracer backtrace feature, it is imperative that the functions being called are limited by the function filters. The option to enable the function backtracing is unique to the function tracer and activating it can only be done when the function tracer is enabled. This means you must first enable the function tracer before you have access to the option:
# echo sys_execve > set_ftrace_filter    // undo later with: echo '!sys_execve' >> set_ftrace_filter
# echo function > current_tracer
# echo 1 > options/func_stack_trace      // stop recording with: echo 0 > tracing_on
# cat trace

# tracer: function
#
#           TASK-PID    CPU#    TIMESTAMP FUNCTION
#              | |       |          |         |
             cat-3287 [000]   273.728245: sys_execve <-ptregs_execve
             cat-3287 [000]   273.728249: <stack trace>
=> ptregs_execve
======= trace a specific function   // similar to the above
If I am only interested in sys_nanosleep and hrtimer_interrupt:

# echo sys_nanosleep hrtimer_interrupt \
        > set_ftrace_filter
# echo function > current_tracer
# echo 1 > tracing_on
# usleep 1
# echo 0 > tracing_on
# cat trace
# tracer: ftrace
#
#           TASK-PID   CPU#    TIMESTAMP FUNCTION
#              | |      |          |         |
          usleep-4134 [00] 1317.070017: hrtimer_interrupt <-smp_apic_timer_interrupt
          usleep-4134 [00] 1317.070111: sys_nanosleep <-syscall_call
          <idle>-0     [00] 1317.070115: hrtimer_interrupt <-smp_apic_timer_interrupt

====== Trace Event
Using the plugin tracers described above is not always practical: some add a large overhead, and others do not give enough information to see what happened. Luckily, there is the event tracer. Event tracing is not a plugin: when events are enabled, they are recorded with any plugin tracer, including the special nop plugin.
---- Using Event Tracing
---- Via the 'set_event' interface
The events which are available for tracing can be found in the file "available_events"
# cat available_events |head
skb:kfree_skb
skb:consume_skb
skb:skb_copy_datagram_iovec
net:net_dev_xmit
net:net_dev_queue
net:netif_receive_skb
net:netif_rx
napi:napi_poll
syscalls:sys_enter_socket
syscalls:sys_exit_socket

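To enable a particular event, such as 'kfree_skb', simply echo it to set_event (use >> to add it to the events that are already enabled). For example, to record the sched_switch event while the sched_switch tracer is active: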
# echo sched_switch > current_tracer
# echo sched_switch >> /sys/kernel/debug/tracing/set_event
# head trace
# tracer: sched_switch
#
#           TASK-PID    CPU#    TIMESTAMP FUNCTION
#              | |       |          |         |
            bash-3256 [000] 3231.374613:   3256:120:S   + [000] 3256:120:S bash
            bash-3256 [000] 3231.374634:   3256:120:S ==> [000] 3294:120:R firefox-bin
            bash-3256 [000] 3231.374636: sched_switch: prev_comm=bash prev_pid=3256 prev_prio=120 prev_state=S ==> next_comm=firefox-bin next_pid=3294 next_prio=120
     firefox-bin-3294 [000] 3231.375123:   3294:120:R   + [000] 3526:120:R kworker/0:0
     firefox-bin-3294 [000] 3231.375132:   3294:120:R ==> [000] 3526:120:R kworker/0:0
     firefox-bin-3294 [000] 3231.375133: sched_switch: prev_comm=firefox-bin prev_pid=3294 prev_prio=120 prev_state=R ==> next_comm=kworker/0:0 next_pid=3526 next_prio=120
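Each entry of available_events has the form subsystem:event, and each event also has its own enable file under the events/ directory of the tracing directory. A minimal sketch that maps one to the other; `event_enable_path` is a hypothetical helper.

```shell
#!/bin/sh
# Map an available_events entry ("subsystem:event") to its enable file,
# e.g. skb:kfree_skb -> <tracing_dir>/events/skb/kfree_skb/enable
event_enable_path() {
    tracing_dir=$1 entry=$2
    subsystem=${entry%%:*}      # text before the first ':'
    event=${entry#*:}           # text after the first ':'
    printf '%s/events/%s/%s/enable\n' "$tracing_dir" "$subsystem" "$event"
}
```

For example, `echo 1 > "$(event_enable_path /sys/kernel/debug/tracing skb:kfree_skb)"` should have the same effect as `echo skb:kfree_skb >> set_event`.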