Solaris Performance and Tools Notes (to be continued)

Source: Internet. Editor: 程序博客网. Time: 2024/06/06 00:04

Analysis Methods

1. Monitoring. Using a system to record statistics over time. This data may reveal long-term patterns that may be missed when using the regular stat tools. Monitoring may involve using SunMC, SNMP, or sar.

2. Identification. For narrowing the investigation to particular resources, and identifying possible bottlenecks. This may include kstat and procfs tools.

3. Analysis. For further examination of particular system areas. This may make use of truss, DTrace, and MDB.

Typically, a systemwide view is the first place to start (the "stat" commands), along with a full process view (prstat(1)).

Metrics

Utilization measures how busy a resource is and is usually represented as a percentage average over a time interval. Saturation is often a measure of work that has queued waiting for the resource and can be measured both as an average over time and at a particular point in time. For some resources that do not queue, saturation may be synthesized from error counts. Other terms that we use include throughput and hit ratio, depending on the resource type.

CPU

The time spent on the dispatcher queues, the length of these queues, and the utilization of the system processors are important metrics for quantifying CPU-related performance bottlenecks.

vmstat

Memory, run queue, and summarized processor utilization.

vmstat 5

 kthr      memory            page            disk          faults      cpu

 r b w   swap  free  re  mf pi po fr de sr dd f0 s1 --   in   sy  cs  us sy id

 0 0 0 1324808 319448 1   2  2  0  0  0  0  0  0  0  0  403   21  54   0  1 99

 2 0 0 1318528 302696 480 6 371 0  0  0  0 73  0  0  0  550 5971 190  84 16  0

 3 0 0 1318504 299824 597 0 371 0  0  0  0 48  0  0  0  498 8529 163  81 19  0

 2 0 0 1316624 297904 3   0 597 0  0  0  0 91  0  0  0  584 2009 242  84 16  0

 2 0 0 1311008 292288 2   0 485 0  0  0  0 83  0  0  0  569 2357 252  77 23  0

 2 0 0 1308240 289520 2   0 749 0  0  0  0 107 0  0  0  615 2246 290  82 18  0

 2 0 0 1307496 288768 5   0 201 0  0  0  0 58  0  0  0  518 2231 210  79 21  0

The first line is the summary since boot, followed by samples every five seconds.

CPU Saturation：

kthr:r (the r column under kthr)

Total number of runnable threads on the dispatcher queues; used as a measure of CPU saturation

Any sustained non-zero value is likely to degrade performance.

· When a Solaris system hits 100% CPU utilization, there is no sudden dip in performance; the performance degradation is gradual. Because of this, CPU saturation is often a better indicator of performance issues than is CPU utilization.

· When an application on a server with 10% CPU utilization wants the CPUs, they will almost always be available immediately. On a server with 100% CPU utilization, the same application will find that the CPUs are already busy and will need to preempt the currently running thread or wait to be scheduled.
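The saturation check described above can be sketched in a few lines. This is only an illustration: the helper names are made up for these notes, the sample rows are copied from the vmstat output above, and treating "sustained" as "every interval sample is non-zero" is an assumption, not a rule from the book.

```python
# Data rows copied from the vmstat output above (header rows omitted).
VMSTAT_SAMPLES = """\
 0 0 0 1324808 319448 1   2  2  0  0  0  0  0  0  0  0  403   21  54   0  1 99
 2 0 0 1318528 302696 480 6 371 0  0  0  0 73  0  0  0  550 5971 190  84 16  0
 3 0 0 1318504 299824 597 0 371 0  0  0  0 48  0  0  0  498 8529 163  81 19  0
"""


def run_queue_lengths(text):
    """Return the kthr 'r' column (the first field) of each vmstat data row."""
    return [int(line.split()[0]) for line in text.splitlines() if line.strip()]


def is_cpu_saturated(r_values, skip_first=True):
    """True if every interval sample shows a non-zero run queue.

    The first vmstat row is the since-boot summary, so it is skipped by default.
    "Sustained" is approximated here as "all samples non-zero" (an assumption).
    """
    samples = r_values[1:] if skip_first else r_values
    return bool(samples) and all(r > 0 for r in samples)


r = run_queue_lengths(VMSTAT_SAMPLES)
print(r, is_cpu_saturated(r))
```

In practice the rows would come from a live `vmstat 5` run rather than a pasted string, and a longer window of samples would give a more trustworthy answer.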

sar

The system activity reporter (sar) can provide live statistics or can be activated to record historical CPU statistics. You can identify long-term patterns.

sar 1 5

SunOS titan 5.11 snv_16 sun4u    02/27/2006

03:20:42    %usr    %sys    %wio   %idle

03:20:43      82      17       0       1

03:20:44      92       8       0       0

03:20:45      91       9       0       0

03:20:46      94       6       0       0

03:20:47      93       7       0       0

Average       91       9       0       0

An interval of 1 second and a count of 5 were specified.

· %usr, %sys (user, system). A commonly expected ratio is 70% usr and 30% sys, but this depends on the application. Applications that use I/O heavily, for example a busy Web server, can cause a much higher %sys due to a large number of system calls. Applications that spend time processing userland code, for example compression tools, can cause a higher %usr. Kernel-mode services, such as the NFS server, are %sys based.

· %wio (wait I/O). This statistic is deliberately set to zero as of Solaris 10.

· %idle (idle). There are different mentalities for percent idle. One is that percent idle equals wasted CPU cycles and should be put to use. Another is that some level of %idle is healthy (anywhere from 20% to 80%) because it leaves "headroom" for short increases in activity to be dispatched quickly.

mpstat

Reports statistics for each CPU.

mpstat 1

CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl

  0    0   0  279   267  112  106    7    7   85    0   219   85   2   0  13

  1    1   0  102    99    0  177    9   15  119    2   381   72   3   0  26

  2    2   0   75   130    0  238   19   11   98    5   226   70   3   0  28

  3    1   0   94    32    0   39    8    6   95    2   380   81   2   0  17

  4    1   0   70    75    0  128   11    9   96    1   303   66   2   0  32

  5    1   0   88    62    0   99    7   11   89    1   370   74   2   0  24

  6    4   0   78    47    0   85   24    6   67    8   260   86   2   0  12

  7    2   0   73    29    0   45   21    5   57    7   241   85   1   0  14

…

· syscl (system calls). System calls per second.

· csw (context switches).

· icsw (involuntary context switches).

· migr (migrations of threads between processors). If possible, the OS tries to keep each thread on the last processor on which it ran. If that processor is busy, the thread migrates.

prstat

To identify who is using the CPU.

Default output:

prstat

   PID USERNAME  SIZE   RSS STATE  PRI NICE       TIME  CPU PROCESS/NLWP

 25639 rmc      1613M   42M cpu22    0   10    0:33:10 3.1% filebench/2

 25655 rmc      1613M   42M cpu23    0   10    0:33:10 3.1% filebench/2

 25659 rmc      1613M   42M cpu30    0   10    0:33:11 3.1% filebench/2

…

 25638 rmc      1613M   42M cpu18    0   10    0:33:10 3.1% filebench/2

Total: 91 processes, 521 lwps, load averages: 29.06, 28.84, 26.68

· SIZE. The total virtual memory size of mappings within the process, including all mapped files and devices.

· RSS. Resident set size. Some or all of a process's virtual memory is backed by physical memory; we refer to that amount as a process's resident set size (RSS).

A load average that exceeds the number of CPUs in the system is a typical sign of an overloaded system.
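As a rough illustration of that rule of thumb, the load averages can be pulled out of the prstat "Total:" line and compared with the CPU count. The function names are hypothetical, and the CPU counts passed in below are assumptions: the notes do not say how many CPUs the sample machine had.

```python
import re


def load_averages(total_line):
    """Extract the three load averages from a prstat 'Total:' line."""
    match = re.search(
        r"load averages:\s*([\d.]+),\s*([\d.]+),\s*([\d.]+)", total_line
    )
    if match is None:
        raise ValueError("no load averages found in line")
    return tuple(float(group) for group in match.groups())


def looks_overloaded(total_line, ncpus):
    """True if the first (most recent) load average exceeds the CPU count."""
    return load_averages(total_line)[0] > ncpus


line = "Total: 91 processes, 521 lwps, load averages: 29.06, 28.84, 26.68"
print(load_averages(line))
print(looks_overloaded(line, ncpus=32))  # ncpus=32 is an assumed value
print(looks_overloaded(line, ncpus=8))   # ncpus=8 is an assumed value
```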

prstat -mL

   PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/LWPID

 25644 rmc       98 1.5 0.0 0.0 0.0 0.0 0.0 0.1   0  36 693   0 filebench/2

 25660 rmc       98 1.7 0.1 0.0 0.0 0.0 0.0 0.1   2  44 693   0 filebench/2

…

 25637 rmc       97 1.7 0.1 0.0 0.0 0.0 0.3 0.6   4  95 693   0 filebench/2

Total: 91 processes, 510 lwps, load averages: 28.94, 28.66, 24.39

-m (show microstates) and -L (show per-thread). The columns USR through LAT sum to 100% of the time spent for each thread during the prstat sample. The important microstates for CPU utilization are USR, SYS, and LAT. The USR and SYS columns are the user and system time that this thread spent running on the CPU. The LAT (latency) column is the amount of time spent waiting for CPU. A non-zero number means there was some queuing for CPU resources. (Going by the book's example, a few tenths of a percent is normal, while tens of percent is not.)
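To make the "USR through LAT sum to 100%" point concrete, here is a small sketch that parses one row of the prstat -mL output above. The helper name is made up for illustration, and the parsing assumes the exact column order shown in the sample header.

```python
MICROSTATES = ("USR", "SYS", "TRP", "TFL", "DFL", "LCK", "SLP", "LAT")


def parse_microstates(line):
    """Return the eight microstate percentages of one 'prstat -mL' row.

    Assumed field order: PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT ...
    """
    fields = line.split()
    values = [float(value) for value in fields[2:2 + len(MICROSTATES)]]
    return dict(zip(MICROSTATES, values))


row = " 25644 rmc       98 1.5 0.0 0.0 0.0 0.0 0.0 0.1   0  36 693   0 filebench/2"
states = parse_microstates(row)
total = sum(states.values())

# prstat rounds each column, so the sum is close to, not exactly, 100.
print(f"sum of microstates: {total:.1f}%")
print(f"time waiting for a CPU (LAT): {states['LAT']}%")
```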

prstat -Lm -p 25691

-p (process id)

prstat -s rss

Sorts by a key, here rss (resident set size).

prstat -t

A summary by user ID

Process

pmap

pmap -x 102908

 102908:   sh

 Address   Kbytes Resident   Anon  Locked Mode    Mapped File

 00010000      88      88       -       - r-x--   sh

 00036000       8       8       8       - rwx--   sh

 00038000      16      16      16       - rwx--     [ heap ]

 FF260000      16      16       -       - r-x--   en_.so.2

 FF272000      16      16       -       - rwx--   en_US.so.2

 FF280000     664     624       -       - r-x--   libc.so.1

 FF336000      32      32       8       - rwx--   libc.so.1

 FF360000      16      16       -       - r-x--   libc_psr.so.1

 FF380000      24      24       -       - r-x--   libgen.so.1

 FF396000       8       8       -       - rwx--   libgen.so.1

 FF3A0000       8       8       -       - r-x--   libdl.so.1

 FF3B0000       8       8       8       - rwx--     [ anon ]

 FF3C0000     152     152       -       - r-x--   ld.so.1

 FF3F6000       8       8       8       - rwx--   ld.so.1

 FFBFE000       8       8       8       - rw---     [ stack ]

 --------   -----   -----   -----   ------

 total Kb    1072    1032      56

-x (additional information  per  mapping)
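The "total Kb" line can be reproduced by summing the per-mapping columns. A minimal sketch, with the rows copied from the pmap -x output above and a made-up helper name; a "-" in a column is treated as zero.

```python
# Mapping rows copied from the pmap -x output above.
PMAP_ROWS = """\
00010000      88      88       -       - r-x--   sh
00036000       8       8       8       - rwx--   sh
00038000      16      16      16       - rwx--     [ heap ]
FF260000      16      16       -       - r-x--   en_.so.2
FF272000      16      16       -       - rwx--   en_US.so.2
FF280000     664     624       -       - r-x--   libc.so.1
FF336000      32      32       8       - rwx--   libc.so.1
FF360000      16      16       -       - r-x--   libc_psr.so.1
FF380000      24      24       -       - r-x--   libgen.so.1
FF396000       8       8       -       - rwx--   libgen.so.1
FF3A0000       8       8       -       - r-x--   libdl.so.1
FF3B0000       8       8       8       - rwx--     [ anon ]
FF3C0000     152     152       -       - r-x--   ld.so.1
FF3F6000       8       8       8       - rwx--   ld.so.1
FFBFE000       8       8       8       - rw---     [ stack ]
"""

COLUMN_NAMES = ("Kbytes", "Resident", "Anon")


def column_totals(text):
    """Sum the Kbytes, Resident and Anon columns; '-' counts as zero."""
    totals = dict.fromkeys(COLUMN_NAMES, 0)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        for name, raw in zip(COLUMN_NAMES, fields[1:4]):
            totals[name] += 0 if raw == "-" else int(raw)
    return totals


print(column_totals(PMAP_ROWS))
```

The result matches the totals line printed by pmap: 1072 Kbytes mapped, 1032 resident, 56 anonymous.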

plockstat

To observe hot lock behavior in user applications that use user-level locks.

To be investigated.

truss

Traces system calls or user functions.

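When truss is used while the application's internal trace is also enabled, even trussing only the single library fm causes timeouts, and phone calls can no longer be connected. truss performs far worse than DTrace: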

Unlike truss, DTrace does not stop and start the process for each traced function; instead, DTrace collects data in per-CPU buffers which the dtrace command asynchronously reads. The overhead when using DTrace on a process does depend on the frequency of traced events but is usually less than that of truss.

The sample program to be traced is as follows:

w2liu@caxbs>more 1.h

#pragma once

int foo(int);

void bar();

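w2liu@caxbs>cat 1.c

#include <unistd.h>

/* Empty helper, called only from inside the library. */
void bar()
{
}

/* Sleeps for one second, calls bar(), and returns its argument. */
int foo(int i)
{
        sleep(1);
        bar();
        return i;
}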

w2liu@caxbs>cat main.c

#include <unistd.h>
#include "1.h"

int main()
{
        int i = 0;

        /* Call into liba.so 1000 times so there is something to trace. */
        for (; i < 1000; ++i)
                foo(i);

        return 0;
}

Compile:

w2liu@caxbs>cc -G 1.c -o liba.so

w2liu@caxbs>cc liba.so main.c

Note: to run a.out, LD_LIBRARY_PATH must be changed so that liba.so can be found.

truss -u liba ./a.out

Traces user functions and system calls.

w2liu@caxbs>truss -u liba ./a.out

execve("a.out", 0xFFBFF654, 0xFFBFF65C)  argc = 1

resolvepath("/usr/lib/ld.so.1", "/lib/ld.so.1", 1023) = 12

getcwd("/home/w2liu/test/truss", 1017)          = 0

resolvepath("/home/w2liu/test/truss/a.out", "/home/w2liu/test/truss/a.out", 1023) = 28

stat("/home/w2liu/test/truss/a.out", 0xFFBFF430) = 0

open("/var/ld/ld.config", O_RDONLY)             Err#2 ENOENT

stat("/home/w2liu/lib/liba.so", 0xFFBFEEE8)     Err#2 ENOENT

stat("./liba.so", 0xFFBFEEE8)                   = 0

open("./liba.so", O_RDONLY)                     = 3

mmap(0x00010000, 32768, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_ALIGN, 3, 0) = 0xFF3A0000

mmap(0x00010000, 73728, PROT_NONE, MAP_PRIVATE|MAP_NORESERVE|MAP_ANON|MAP_ALIGN, -1, 0) = 0xFF380000

mmap(0xFF380000, 868, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_TEXT, 3, 0) = 0xFF380000

mmap(0xFF390000, 1012, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_INITDATA, 3, 0) = 0xFF390000

munmap(0xFF382000, 57344)                       = 0

memcntl(0xFF380000, 724, MC_ADVISE, MADV_WILLNEED, 0, 0) = 0

close(3)                                        = 0

getcwd("/home/w2liu/test/truss", 1013)          = 0

resolvepath("/home/w2liu/test/truss/liba.soso", "/home/w2liu/test/truss/liba.so", 1023) = 30

stat("/home/w2liu/lib/libc.so.1", 0xFFBFEEE8)   Err#2 ENOENT

stat("./libc.so.1", 0xFFBFEEE8)                 Err#2 ENOENT

stat("/cax5800/pltform/testlib/libc.so.1", 0xFFBFEEE8) Err#2 ENOENT

stat("/cax5800/lib/libc.so.1", 0xFFBFEEE8)      Err#2 ENOENT

stat("/surpass/cax5800/lib/libc.so.1", 0xFFBFEEE8) Err#2 ENOENT

stat("/opt/SMAW/SMAWrtp/lib/libc.so.1", 0xFFBFEEE8) Err#2 ENOENT

stat("/export/home/oracle/products/10.2.0/lib32/libc.so.1", 0xFFBFEEE8) Err#2 ENOENT

stat("/usr/lib/libc.so.1", 0xFFBFEEE8)          = 0

resolvepath("/usr/lib/libc.so.1", "/lib/libc.so.1", 1023) = 14

open("/usr/lib/libc.so.1", O_RDONLY)            = 3

mmap(0xFF3A0000, 32768, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 3, 0) = 0xFF3A0000

mmap(0x00010000, 1024000, PROT_NONE, MAP_PRIVATE|MAP_NORESERVE|MAP_ANON|MAP_ALIGN, -1, 0) = 0xFF280000

mmap(0xFF280000, 908061, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_TEXT, 3, 0) = 0xFF280000

mmap(0xFF36E000, 35537, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_INITDATA, 3, 909312) = 0xFF36E000

mmap(0xFF378000, 1312, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_ANON, -1, 0) = 0xFF378000

munmap(0xFF35E000, 65536)                       = 0

mmap(0x00000000, 8192, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANON, -1, 0) = 0xFF360000

memcntl(0xFF280000, 144252, MC_ADVISE, MADV_WILLNEED, 0, 0) = 0

close(3)                                        = 0

mprotect(0xFF380000, 868, PROT_READ|PROT_WRITE|PROT_EXEC) = 0

mprotect(0xFF380000, 868, PROT_READ|PROT_EXEC)  = 0

/1:     mmap(0x00010000, 24576, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANON|MAP_ALIGN, -1, 0) = 0xFF270000

/1:     munmap(0xFF3A0000, 32768)                       = 0

/1:     getcontext(0xFFBFF120)

/1:     getrlimit(RLIMIT_STACK, 0xFFBFF100)             = 0

/1:     getpid()                                        = 15896 [15895]

/1:     setustack(0xFF272A88)

/1@1:   -> liba:_init(0xff3f40fc, 0xff3f5a70, 0x2b3f4, 0x0)

/1@1:   <- liba:_init() = 0xff3f40fc

/1@1:   -> liba:foo(0x0, 0x1c00, 0xff373700, 0x4)

/1:     nanosleep(0xFFBFF518, 0xFFBFF510)               = 0

/1@1:   <- liba:foo() = 0

/1@1:   -> liba:foo(0x1, 0x1c00, 0xff373700, 0x4)

/1:     nanosleep(0xFFBFF518, 0xFFBFF510)               = 0

/1@1:   <- liba:foo() = 1

…

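-u [!]lib,...:[:][!]func,... (trace user functions)

· Format of lib: to trace liba.so, write liba, without the path or the suffix (extension).

· ! means exclude: do not trace that lib or function. In bash, an argument containing ! must be quoted, otherwise bash reports "event not found". For example: truss -u liba:'!foo' ./a.out

· lib and func support wildcards.

· A single : traces only calls made from outside the library into the traced library, and ignores calls made inside the traced library. A double :: traces all function calls regardless of where they come from. This rule applies only to shared libraries, not to the executable.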

w2liu@caxbs>truss -u liba:* -t '!all' ./a.out

/1@1:   -> liba:_init(0xff3f40fc, 0xff3f5a70, 0x2b3f4, 0x0)

/1@1:   <- liba:_init() = 0xff3f40fc

/1@1:   -> liba:foo(0x0, 0x1c00, 0xff373700, 0x4)

/1@1:   <- liba:foo() = 0

/1@1:   -> liba:foo(0x1, 0x1c00, 0xff373700, 0x4)

/1@1:   <- liba:foo() = 1

…

With a single :, only foo is traced; bar is not.

w2liu@caxbs>truss -u liba::* -t '!all' ./a.out

/1@1:   -> liba:_init(0xff3f40fc, 0xff3f5a70, 0x2b3f4, 0x0)

/1@1:   <- liba:_init() = 0xff3f40fc

/1@1:   -> liba:foo(0x0, 0x1c00, 0xff373700, 0x4)

/1@1:     -> liba:bar(0x0, 0x0, 0x0, 0x0)

/1@1:     <- liba:bar() = 0

/1@1:   <- liba:foo() = 0

/1@1:   -> liba:foo(0x1, 0x1c00, 0xff373700, 0x4)

/1@1:     -> liba:bar(0x0, 0x0, 0x0, 0x0)

/1@1:     <- liba:bar() = 0

/1@1:   <- liba:foo() = 1

…

With a double ::, both foo and bar are traced.

truss -u liba::* -t '!all' ./a.out

Traces user functions only.

w2liu@caxbs>truss -u liba::* -t '!all' ./a.out

/1@1:   -> liba:_init(0xff3f40fc, 0xff3f5a70, 0x2b3f4, 0x0)

/1@1:   <- liba:_init() = 0xff3f40fc

/1@1:   -> liba:foo(0x0, 0x1c00, 0xff373700, 0x4)

/1@1:     -> liba:bar(0x0, 0x0, 0x0, 0x0)

/1@1:     <- liba:bar() = 0

/1@1:   <- liba:foo() = 0

/1@1:   -> liba:foo(0x1, 0x1c00, 0xff373700, 0x4)

/1@1:     -> liba:bar(0x0, 0x0, 0x0, 0x0)

/1@1:     <- liba:bar() = 0

/1@1:   <- liba:foo() = 1

…

-t [!]syscall,... (trace system calls)

truss -u liba::* -t '!all' -d ./a.out

With timestamps.

w2liu@caxbs>truss -u liba::* -t '!all' -d ./a.out

Base time stamp:  1316155446.3763  [ Fri Sep 16 14:44:06 CST 2011 ]

/1@1:    0.0518 -> liba:_init(0xff3f40fc, 0xff3f5a70, 0x2b3f4, 0x0)

/1@1:    0.0536 <- liba:_init() = 0xff3f40fc

/1@1:    0.0543 -> liba:foo(0x0, 0x1c00, 0xff373700, 0x4)

/1@1:    1.0639   -> liba:bar(0x0, 0x0, 0x0, 0x0)

/1@1:    1.0650   <- liba:bar() = 0

/1@1:    1.0653 <- liba:foo() = 0

…
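The timestamps printed by -d make it possible to work out how long each traced call took. The following is a minimal sketch, not part of truss itself: the function name and regular expression are made up for these notes, the lines are copied from the output above, and a single call stack is assumed (one thread, properly nested calls).

```python
import re

# Lines copied from the 'truss ... -d' output above.
TRUSS_LINES = """\
/1@1:    0.0518 -> liba:_init(0xff3f40fc, 0xff3f5a70, 0x2b3f4, 0x0)
/1@1:    0.0536 <- liba:_init() = 0xff3f40fc
/1@1:    0.0543 -> liba:foo(0x0, 0x1c00, 0xff373700, 0x4)
/1@1:    1.0639   -> liba:bar(0x0, 0x0, 0x0, 0x0)
/1@1:    1.0650   <- liba:bar() = 0
/1@1:    1.0653 <- liba:foo() = 0
"""

# thread prefix, timestamp, direction arrow, lib:function name
EVENT = re.compile(r"^\S+\s+([\d.]+)\s+(->|<-)\s+(\w+:\w+)\(")


def call_durations(text):
    """Pair each '->' entry with its '<-' return and list (name, seconds)."""
    stack = []       # timestamps of calls that have not returned yet
    durations = []   # (function name, elapsed seconds), in return order
    for line in text.splitlines():
        match = EVENT.match(line)
        if match is None:
            continue
        timestamp = float(match.group(1))
        if match.group(2) == "->":
            stack.append(timestamp)
        else:
            entered = stack.pop()
            durations.append((match.group(3), round(timestamp - entered, 4)))
    return durations


for name, seconds in call_durations(TRUSS_LINES):
    print(f"{name}: {seconds:.4f} s")
```

For the sample above, foo takes a little over one second, which is the sleep(1) in 1.c plus the overhead of being traced.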

truss -u liba::* -t '!all' -p 26274

-p (process id)

apptrace

Surprisingly, I could not find this tool on the system, and could not find a download for it online either.

Disk

· Environment. Is the access random or sequential? Is it a single disk or an array?

· Utilization. The percent busy value from iostat -x serves as a utilization value for disk devices. The calculation behind it is based on the time a device spends active. It is a useful starting point for understanding disk usage.

· Saturation. The average wait queue length from iostat -x is a measure of disk saturation.

· Throughput. The kilobytes/sec values from iostat -x can also indicate disk activity, and for storage arrays they may be the only meaningful metric that Solaris provides.

· I/O rate. The number of disk transactions per second can be seen by means of iostat or DTrace. The number is interesting because each operation incurs a certain overhead. This term is also known as IOPS (I/O operations per second).

· I/O sizes. You can calculate the size of disk transactions from iostat -x by using the (kr/s + kw/s) / (r/s + w/s) ratio, which gives the average event size; or you can measure the size directly with DTrace. Throughput is usually improved when larger events are used.

· Service times. The average wait queue and active service times can be printed from iostat -x. Longer service times are likely to degrade performance.

· History. sar can be activated to archive historical disk activity statistics. Long-term patterns can be identified from this data, which also provides a reference for what statistics are "normal" for your disks.

· Seek sizes. DTrace can measure the size of each disk head seek and present this data in a meaningful report.

· I/O time. Measuring the time a disk spends servicing an I/O event is valuable because it takes into account various costs of performing an I/O operation: seek time, rotation time, and the time to transfer data. DTrace can fetch event time data.
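The I/O size ratio from the list above, (kr/s + kw/s) / (r/s + w/s), is simple enough to show as a worked example. The function name is made up for illustration; the input numbers are taken from the c0t0d0 rows of the iostat -xnz sample in the iostat section of these notes.

```python
def average_io_size_kb(r_s, w_s, kr_s, kw_s):
    """Average transaction size in KB: (kr/s + kw/s) / (r/s + w/s)."""
    operations = r_s + w_s
    if operations == 0:
        return 0.0  # an idle interval has no meaningful average size
    return (kr_s + kw_s) / operations


# 277.1 reads/s moving 2216.4 KB/s, no writes
print(round(average_io_size_kb(277.1, 0.0, 2216.4, 0.0), 1))

# 79.8 reads/s moving 910.0 KB/s, no writes
print(round(average_io_size_kb(79.8, 0.0, 910.0, 0.0), 1))
```

The first interval works out to about 8 KB per transaction and the second to about 11.4 KB.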

Random vs. Sequential I/O

Sequential access performs better than random access, because the cache can do its job.

Storage Arrays

Things to consider: the RAID level, the cache size, and whether the cache is in write-through mode.

To actually understand storage array utilization, you must fetch statistics from the storage array controller itself. Of interest are cache hit ratios and array controller CPU utilization.

Sector Zoning

The closer data sits to the outer sectors of the disk, the faster it can be accessed.

A common procedure that takes advantage of this behavior is to slice disks so that the most commonly accessed data is positioned near the outside edge.

Max I/O Size

By default this is 128 Kbytes on SPARC systems and 56 Kbytes on x86 systems.

Disk Utilization

It is based on the time a device spends active.

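Database applications tend to show very high disk utilization.

Disk utilization under random access and under sequential access cannot be meaningfully compared.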

Storage arrays may report 100% utilization when in fact they are able to accept more transactions. Solaris doesn't see what really happens on storage array disks.

The utilization value is useful as a starting point, but it's not absolute.

Disk Saturation

The average wait queue length.

A sustained level of disk saturation usually means a performance problem.

Disk Throughput

kr/s kw/s

Often with storage arrays, throughput is the only statistic available from iostat that is accurate.

iostat Utility

iostat -xnz 5

                    extended device statistics

    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b  device

    0.2    0.2    1.1    1.4  0.0  0.0    6.6    6.9   0   0  c0t0d0

    0.0    0.0    0.0    0.0  0.0  0.0    0.0    7.7   0   0  c0t2d0

    0.0    0.0    0.0    0.0  0.0  0.0    0.0    3.0   0   0  mars:vold(pid512)

                    extended device statistics

    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b  device

  277.1    0.0 2216.4    0.0  0.0  0.6    0.0    2.1   0  58  c0t0d0

                    extended device statistics

    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b  device

   79.8    0.0  910.0    0.0  0.4  1.9    5.1   23.6  41  98  c0t0d0

                    extended device statistics

    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b  device

   87.0    0.0  738.5    0.0  0.8  2.0    9.4   22.4  65  99  c0t0d0

                    extended device statistics

    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b  device

   92.2    0.6  780.4    2.2  2.1  1.9   22.8   21.0  87  98  c0t0d0

                    extended device statistics

    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b  device

  101.4    0.0  826.6    0.0  0.8  1.9    8.0   19.0  46  99  c0t0d0

-n (uses logical name for the device)

-x (extended device statistics, printing a line per device)

-z (lines with zero activity are not printed)

The first output is the summary since boot, followed by samples every five seconds.

The meaning of each column:

· r/s. Reads per second

· w/s. Writes per second

· kr/s. Kilobytes read per second

· kw/s. Kilobytes written per second

· wait. Average number of transactions queued and waiting

· actv. Average number of transactions actively being serviced

· wsvc_t. Average time a transaction spends on the wait queue

· asvc_t. Average time a transaction is active or running

· %w. Percent wait, based on the time that transactions were queued

· %b. Percent busy, based on the time that the device was active

· device. The device name
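Putting the columns together, one row of the output can be read as a utilization value (%b) plus a saturation value (wait). The sketch below is only an illustration: the function names are made up, the 90% "busy" threshold is an arbitrary assumption rather than a figure from the book, and the rows are copied from the sample output above.

```python
COLUMNS = ("r/s", "w/s", "kr/s", "kw/s", "wait", "actv",
           "wsvc_t", "asvc_t", "%w", "%b")


def parse_iostat_line(line):
    """Map one 'iostat -xn' data row to a dict keyed by column name."""
    fields = line.split()
    stats = {name: float(value) for name, value in zip(COLUMNS, fields)}
    stats["device"] = fields[len(COLUMNS)]
    return stats


def describe(stats, busy_threshold=90.0):
    """Return (device, utilization %, is_busy, is_saturated).

    busy_threshold is a hypothetical cut-off chosen for this example.
    Saturation is read from the average wait queue length.
    """
    is_busy = stats["%b"] >= busy_threshold
    is_saturated = stats["wait"] > 0
    return stats["device"], stats["%b"], is_busy, is_saturated


quiet = "  277.1    0.0 2216.4    0.0  0.0  0.6    0.0    2.1   0  58  c0t0d0"
loaded = "   92.2    0.6  780.4    2.2  2.1  1.9   22.8   21.0  87  98  c0t0d0"

print(describe(parse_iostat_line(quiet)))
print(describe(parse_iostat_line(loaded)))
```

As the notes on storage arrays point out, a high %b on an array does not necessarily mean the array cannot accept more work, so a check like this is only a starting point.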

iostat -p, -P

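-p (reports per-partition statistics in addition to the per-device summary)

-P (reports per-partition statistics only, without the per-device summary)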

iostat -e

-e (error statistics)