ftrace framework

来源：互联网发布：厦大网络课程中心编辑：程序博客网时间：2024/06/14 02:33

//--------------------------------------------------------------------------
// ftrace.txt

Introduction
------------

Ftrace is an internal tracer designed to help out developers and
designers of systems to find what is going on inside the kernel.
It can be used for debugging or analyzing latencies and
performance issues that take place outside of user-space.

Although ftrace is typically considered the function tracer, it
is really a frame work of several assorted tracing utilities.
There's latency tracing to examine what occurs between interrupts
disabled and enabled, as well as for preemption and from a time
a task is woken to the task is actually scheduled in.

One of the most common uses of ftrace is the event tracing.
Through out the kernel is hundreds of static event points that
can be enabled via the debugfs file system to see what is
going on in certain parts of the kernel.

Implementation Details
----------------------

See ftrace-design.txt for details for arch porters and such.

The File System
---------------

Ftrace uses the debugfs file system to hold the control files as
well as the files to display output.

The Tracers
-----------

Examples of using the tracer
----------------------------

//--------------------------------------------------------------------------

//ftrace-design.txt
在编译的时候,就注入了接口, 另外:HAVE_FUNCTION_TRACER和FUNCTION_TRACER是两个不同的宏

ifdef CONFIG_FUNCTION_TRACER
ORIG_CFLAGS := $(KBUILD_CFLAGS)
KBUILD_CFLAGS = $(subst -pg,,$(ORIG_CFLAGS))
endif

HAVE_FUNCTION_TRACER
--------------------
You will need to implement the mcount and the ftrace_stub functions.

The exact mcount symbol name will depend on your toolchain. Some call it
"mcount", "_mcount", or even "__mcount". You can probably figure it out by
running something like:
$ echo 'main(){}' | gcc -x c -S -o - - -pg | grep mcount
call mcount
We'll make the assumption below that the symbol is "mcount" just to keep things
nice and simple in the examples.

Keep in mind that the ABI that is in effect inside of the mcount function is
*highly* architecture/toolchain specific. We cannot help you in this regard,
sorry. Dig up some old documentation and/or find someone more familiar than
you to bang ideas off of. Typically, register usage (argument/scratch/etc...)
is a major issue at this point, especially in relation to the location of the
mcount call (before/after function prologue). You might also want to look at
how glibc has implemented the mcount function for your architecture. It might
be (semi-)relevant.

The mcount function should check the function pointer ftrace_trace_function
to see if it is set to ftrace_stub. If it is, there is nothing for you to do,
so return immediately. If it isn't, then call that function in the same way
the mcount function normally calls __mcount_internal -- the first argument is
the "frompc" while the second argument is the "selfpc" (adjusted to remove the
size of the mcount call that is embedded in the function).

For example, if the function foo() calls bar(), when the bar() function calls
mcount(), the arguments mcount() will pass to the tracer are:
"frompc" - the address bar() will use to return to foo()
"selfpc" - the address bar() (with mcount() size adjustment)

Also keep in mind that this mcount function will be called *a lot*, so
optimizing for the default case of no tracer will help the smooth running of
your system when tracing is disabled. So the start of the mcount function is
typically the bare minimum with checking things before returning. That also
means the code flow should usually be kept linear (i.e. no branching in the nop
case). This is of course an optimization and not a hard requirement.

Here is some pseudo code that should help (these functions should actually be
implemented in assembly):

void ftrace_stub(void)
{
return;
}

void mcount(void)
{
/* save any bare state needed in order to do initial checking */

extern void (*ftrace_trace_function)(unsigned long, unsigned long);
if (ftrace_trace_function != ftrace_stub)
goto do_trace;

/* restore any bare state */

return;

do_trace:

/* save all state needed by the ABI (see paragraph above) */

unsigned long frompc = ...;
unsigned long selfpc = <return address> - MCOUNT_INSN_SIZE;
ftrace_trace_function(frompc, selfpc);

/* restore all state needed by the ABI */
}

//--------------------------------------------------------------------------
tracepoints.txt

This document introduces Linux Kernel Tracepoints and their use. It
provides examples of how to insert tracepoints in the kernel and
connect probe functions to them and provides some examples of probe
functions.

* Purpose of tracepoints

A tracepoint placed in code provides a hook to call a function (probe)
that you can provide at runtime. A tracepoint can be "on" (a probe is
connected to it) or "off" (no probe is attached). When a tracepoint is
"off" it has no effect, except for adding a tiny time penalty
(checking a condition for a branch) and space penalty (adding a few
bytes for the function call at the end of the instrumented function
and adds a data structure in a separate section). When a tracepoint
is "on", the function you provide is called each time the tracepoint
is executed, in the execution context of the caller. When the function
provided ends its execution, it returns to the caller (continuing from
the tracepoint site).

You can put tracepoints at important locations in the code. They are
lightweight hooks that can pass an arbitrary number of parameters,
which prototypes are described in a tracepoint declaration placed in a
header file.

They can be used for tracing and performance accounting.

* Usage

Two elements are required for tracepoints :

- A tracepoint definition, placed in a header file.
- The tracepoint statement, in C code.

In order to use tracepoints, you should include linux/tracepoint.h.

In include/trace/events/subsys.h :

#undef TRACE_SYSTEM
#define TRACE_SYSTEM subsys

#if !defined(_TRACE_SUBSYS_H) || defined(TRACE_HEADER_MULTI_READ)
#define _TRACE_SUBSYS_H

#include <linux/tracepoint.h>

DECLARE_TRACE(subsys_eventname,
TP_PROTO(int firstarg, struct task_struct *p),
TP_ARGS(firstarg, p));

#endif /* _TRACE_SUBSYS_H */

/* This part must be outside protection */
#include <trace/define_trace.h>

In subsys/file.c (where the tracing statement must be added) :

#include <trace/events/subsys.h>

#define CREATE_TRACE_POINTS
DEFINE_TRACE(subsys_eventname);

void somefct(void)
{
...
trace_subsys_eventname(arg, task);
...
}

Where :
- subsys_eventname is an identifier unique to your event
- subsys is the name of your subsystem.
- eventname is the name of the event to trace.

- TP_PROTO(int firstarg, struct task_struct *p) is the prototype of the
function called by this tracepoint.

- TP_ARGS(firstarg, p) are the parameters names, same as found in the
prototype.

- if you use the header in multiple source files, #define CREATE_TRACE_POINTS
should appear only in one source file.

Connecting a function (probe) to a tracepoint is done by providing a
probe (function to call) for the specific tracepoint through
register_trace_subsys_eventname(). Removing a probe is done through
unregister_trace_subsys_eventname(); it will remove the probe.

tracepoint_synchronize_unregister() must be called before the end of
the module exit function to make sure there is no caller left using
the probe. This, and the fact that preemption is disabled around the
probe call, make sure that probe removal and module unload are safe.

The tracepoint mechanism supports inserting multiple instances of the
same tracepoint, but a single definition must be made of a given
tracepoint name over all the kernel to make sure no type conflict will
occur. Name mangling of the tracepoints is done using the prototypes
to make sure typing is correct. Verification of probe type correctness
is done at the registration site by the compiler. Tracepoints can be
put in inline functions, inlined static functions, and unrolled loops
as well as regular functions.

The naming scheme "subsys_event" is suggested here as a convention
intended to limit collisions. Tracepoint names are global to the
kernel: they are considered as being the same whether they are in the
core kernel image or in modules.

If the tracepoint has to be used in kernel modules, an
EXPORT_TRACEPOINT_SYMBOL_GPL() or EXPORT_TRACEPOINT_SYMBOL() can be
used to export the defined tracepoints.

If you need to do a bit of work for a tracepoint parameter, and
that work is only used for the tracepoint, that work can be encapsulated
within an if statement with the following:

if (trace_foo_bar_enabled()) {
int i;
int tot = 0;

for (i = 0; i < count; i++)
tot += calculate_nuggets();

trace_foo_bar(tot);
}

All trace_<tracepoint>() calls have a matching trace_<tracepoint>_enabled()
function defined that returns true if the tracepoint is enabled and
false otherwise. The trace_<tracepoint>() should always be within the
block of the if (trace_<tracepoint>_enabled()) to prevent races between
the tracepoint being enabled and the check being seen.

The advantage of using the trace_<tracepoint>_enabled() is that it uses
the static_key of the tracepoint to allow the if statement to be implemented
with jump labels and avoid conditional branches.

//------------------------------------------------------------------------
event:
1. Introduction
===============

Tracepoints (see Documentation/trace/tracepoints.txt) can be used
without creating custom kernel modules to register probe functions
using the event tracing infrastructure.

Not all tracepoints can be traced using the event tracing system;
the kernel developer must provide code snippets which define how the
tracing information is saved into the tracing buffer, and how the
tracing information should be printed.

0 0