A Must-Read Document: Solaris Container - Workload Management Sample

Source: Internet | Published by: 网络家教 | Editor: 程序博客网 | Date: 2024/05/21 13:32

Managing Workloads: An Example

Introduction

To demonstrate the concepts explained in the previous chapter, this chapter uses the Solaris OS resource management facilities to manage workloads on an example system. The system is shared by several business units and is running two workloads: two database instances, one for a marketing application and one for a sales application.

A project is defined for each workload, enabling the Fair Share Scheduler to be used to manage CPU allocation between the workloads. A resource control is added to limit the amount of shared memory for each workload. To account for all activity of the oracle user that is not related to either of these workloads, a third project is created. This project is the default project for the oracle user.

Requirements

The following minimum requirements are needed to run this example:

Oracle 9i media (version 9.2.0.1.0)

6 GB disk space for the Oracle binaries and databases

Defining the Projects

To keep things simple, a local /etc/project database is used. The project entry in the /etc/nsswitch.conf file should be defined as follows:

# cat /etc/nsswitch.conf

...

project: files

...
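The setting can also be checked programmatically. The following is a minimal sketch, not part of the original procedure: project_sources is a hypothetical helper name, and the file path is passed as an argument so the function can be tried on a copy of the file.

```shell
#!/bin/sh
# project_sources FILE
# Print the lookup sources configured for the "project" database in an
# nsswitch.conf-style file. Hypothetical helper, not a Solaris command.
project_sources() {
    awk '
        /^[ \t]*#/ { next }          # skip comment lines
        $1 == "project:" {           # for example "project: files"
            $1 = ""                  # drop the database name
            sub(/^[ \t]+/, "")       # trim the leading blank left behind
            print
            exit
        }
    ' "$1"
}
```

On a system configured as shown above, project_sources /etc/nsswitch.conf would be expected to print files.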

By convention, Oracle instances are run as the user oracle in group dba. As a result, the group dba and user oracle are created:

# groupadd dba

# mkdir -p /export/home

# useradd -g dba -d /export/home/oracle -m -s /bin/bash oracle





A project named group.dba is created to serve as the default project for the user oracle. The system uses the rules described in the getprojent(3PROJECT) man page to determine the default project when a user logs in. Since the default group of user oracle is the dba group, the group.<groupname> rule matches and the group.dba project is set as the default project for user oracle. A comment describing the project is added using the -c option:

# projadd -c "Oracle default project" group.dba



The id(1M) command can be used to verify the default project for the oracle user:

# su - oracle

$ id -p

uid=100(oracle) gid=100(dba) projid=100(group.dba)

$ exit
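The lookup order behind this result can be sketched in a few lines of shell. This is a simplified illustration under stated assumptions: it checks only the user.<username>, group.<groupname>, and default entries of a project file, in that order, and ignores any project attribute in user_attr and any supplementary groups. The function names are hypothetical.

```shell
#!/bin/sh
# project_exists NAME FILE
# Succeed if a project called NAME has an entry in FILE.
project_exists() {
    awk -F: -v name="$1" '$1 == name { found = 1 } END { exit !found }' "$2"
}

# default_project USER GROUP FILE
# Simplified default-project lookup: the first existing entry among
# user.USER, group.GROUP and "default" wins.
default_project() {
    user=$1
    group=$2
    file=$3
    for candidate in "user.$user" "group.$group" default; do
        if project_exists "$candidate" "$file"; then
            echo "$candidate"
            return 0
        fi
    done
    return 1    # no default project could be determined
}
```

With the project file of this example, default_project oracle dba /etc/project would be expected to print group.dba, in line with the id -p output.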







To manage each Oracle instance as a separate workload, a project is created for each Oracle instance to run in: project ora_mkt for the marketing Oracle instance, and project ora_sales for the sales Oracle instance.

# projadd -c "Oracle Marketing" -U oracle ora_mkt

# projadd -c "Oracle Sales" -U oracle ora_sales

 



The -U oracle option specifies that the oracle user is allowed to run processes in these projects. Once these steps are complete, the /etc/project file contains the following information:

# cat /etc/project

system:0::::

user.root:1::::

noproject:2::::

default:3::::

group.staff:10::::

group.dba:100:Oracle default project:::

ora_mkt:101:Oracle Marketing:oracle::

ora_sales:102:Oracle Sales:oracle::












The first five projects are created during system installation. Note that the system assigned the project IDs of the last three projects because none were specified explicitly on the projadd command line.
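Each entry in the project file has six colon-separated fields: name, project ID, comment, user list, group list, and attributes. As a rough illustration, the sketch below prints the name, ID, and user list of every entry. The helper name list_projects is hypothetical, and the function only reads the file given as its argument.

```shell
#!/bin/sh
# list_projects FILE
# Summarize a project file: one line per entry with name, ID and user list.
list_projects() {
    awk -F: '
        /^#/ || NF < 6 { next }      # skip comments and malformed lines
        {
            users = ($4 == "" ? "-" : $4)
            printf "%s id=%s users=%s\n", $1, $2, users
        }
    ' "$1"
}
```

For the file shown above, the ora_mkt entry would appear as ora_mkt id=101 users=oracle.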

System V IPC Resource Controls

The System V IPC resource limits in the Solaris 10 OS, such as the maximum shared memory size, are no longer set in the /etc/system file, but instead are project resource controls. As a result, a system reboot is no longer required to put changes to these parameters into effect. This also allows system administrators to set different values for different projects. A number of System V IPC parameters are obsolete with the Solaris 10 OS, simply because they are no longer necessary. The remaining parameters have more reasonable defaults to enable more applications to work out-of-the-box, without requiring these parameters to be set. The following table identifies the values recommended by the Oracle Installation Guide and the corresponding Solaris OS resource controls.

Since the default values are higher than Oracle recommended values, the only resource control that must be set is project.max-shm-memory. To set the maximum shared memory size to 2 GB, add the project.max-shm-memory=(privileged,2147483648,deny) resource control to the last field of the project entries for the three Oracle projects.

# projmod -sK "project.max-shm-memory=(privileged,2G,deny)" group.dba

# projmod -sK "project.max-shm-memory=(privileged,2G,deny)" ora_mkt

# projmod -sK "project.max-shm-memory=(privileged,2G,deny)" ora_sales





Once these steps are complete, the /etc/project file should contain the following. The change is the resource control in the last field of the three Oracle project entries.

 

# cat /etc/project

system:0::::

user.root:1::::

noproject:2::::

default:3::::

group.staff:10::::

group.dba:100:Oracle default project:::project.max-shm-memory=(privileged,2147483648,deny)

ora_mkt:101:Oracle Marketing:oracle::project.max-shm-memory=(privileged,2147483648,deny)

ora_sales:102:Oracle Sales:oracle::project.max-shm-memory=(privileged,2147483648,deny)
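The projmod commands specify the limit as 2G, while the project file stores it as 2147483648 bytes. The arithmetic behind that conversion is sketched below, assuming binary multiples (1 G = 1024 * 1024 * 1024 bytes), which is what the values in this example imply. The helper name to_bytes is hypothetical and handles only the K, M, and G suffixes.

```shell
#!/bin/sh
# to_bytes VALUE
# Convert a value with an optional K, M or G suffix to bytes,
# using binary multiples. Values without a suffix are printed unchanged.
to_bytes() {
    num=${1%[KMG]}                   # strip a trailing scale letter, if any
    case $1 in
        *K) echo $((num * 1024)) ;;
        *M) echo $((num * 1024 * 1024)) ;;
        *G) echo $((num * 1024 * 1024 * 1024)) ;;
        *)  echo "$1" ;;
    esac
}
```

Running to_bytes 2G prints 2147483648, the value that appears in the project entries.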














To verify that the resource control is active, the id(1M) and prctl(1) commands can be used.

# su - oracle

$ id -p

uid=100(oracle) gid=100(dba) projid=100(group.dba)

$ prctl -n project.max-shm-memory -i process $$

process: 5754: -bash

NAME PRIVILEGE VALUE FLAG ACTION RECIPIENT

project.max-shm-memory

privileged 2.00GB - deny

Logging in as the oracle user creates a new task in the group.dba project, causing the entry in the project database to be read and the resource control to be set. As can be seen in the last line of output from the prctl command, a resource control limiting the maximum shared memory size for the project to 2 GB is present.

Installing Oracle and Creating the Databases

Oracle installation consists of a series of steps, including software installation and the creation of smf(5) services for the Oracle instances. The procedure for installing Oracle is described in Appendix B on page 91.

In this example, a directory /u01 with at least 6 GB of free space is required for the Oracle software and databases. A simple database is created for each workload. These databases are created using the procedure described in Appendix C on page 93. Use the database identifiers listed in the table below.

Database     Database Identifier (ORACLE_SID)
Marketing    MKT
Sales        SALES

Running Oracle Instances in Different Projects

The Oracle instances must run in separate projects in order to control them as separate entities using the Solaris Resource Manager. The processes of the marketing database instance should run in the ora_mkt project, and the processes of the sales database instance should run in the ora_sales project. Since the Oracle-provided start scripts are not project-aware, the processes of both instances run in the default project of the oracle user, group.dba. To run the instances in different projects, the Oracle start scripts must be made project-aware by issuing /usr/bin/newtask -p ora_sales as part of the startup of the sales database instance. This moves the current process and its children to the ora_sales project.

The Service Management Facility (SMF) in the Solaris 10 OS replaces the traditional way of managing application startup and shutdown through run control scripts. SMF uses a concept called services to accomplish this task. An SMF service consists of a set of methods and properties that describe service behavior. Examples of methods include the start and stop methods that smf(5) calls to start or stop the service. Properties are used to describe the service, such as dependencies on other required services, the user to run the service as, and the project in which to run the service. Through a set of smf(5) commands, services can be managed in a consistent manner. See the System Administration Guide: Basic Administration for more information on the Service Management Facility.

To run the example Oracle database instances in separate projects, two simple SMF services must be created: a salesdb service and a mktdb service. The service for the sales database is created by importing the manifest for the service into the SMF repository. By convention, manifests for site-specific services are placed in the directory /var/svc/manifest/site. A manifest is an XML file that defines service properties and methods. One of the properties of an SMF service is the user under which the service should run. In this example, the user is oracle. The project in which the service should run is also a service property. In this example, the project is ora_sales. The relevant part of the manifest is shown below. The full manifest for the sales database and marketing database services can be found in Appendix D on page 95.

# cd /var/svc/manifest/site

# cat salesdb.xml

[...]

<exec_method

type='method'

name='start'

exec='/u01/app/method/ora start SALES'

timeout_seconds='0'>

<method_context

project='ora_sales'>

<method_credential user='oracle' />

</method_context>

</exec_method>

[...]

The project attribute of the method_context element determines the project in which the service runs. The user attribute of the method_credential element determines the user under which the service runs. The manifest for the marketing database service is equivalent except that its project attribute is set to ora_mkt.

The start and stop methods for both services are implemented in a single shell script (/u01/app/method/ora). The start method calls the script with start as the first argument, while the stop method calls the script with stop as the first argument. The Oracle database identifier is passed as the second argument.

# cat /u01/app/method/ora

#!/bin/sh

#

# Usage: ora 'start' | 'stop' db_id

#

ORACLE_SID=$2

ORACLE_HOME=/u01/app/oracle/product/9.2.0.1.0

export ORACLE_SID ORACLE_HOME

case "$1" in

'start')

$ORACLE_HOME/bin/sqlplus "/ as sysdba" <<START_EOF

startup

START_EOF

;;

'stop')

$ORACLE_HOME/bin/sqlplus "/ as sysdba" <<STOP_EOF

shutdown immediate

STOP_EOF

;;

esac

exit 0
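As written, the method script exits with status 0 even when it is called with an unrecognized first argument or without a database identifier. One possible guard is sketched below. The function name ora_check_args and the return code 2 are illustrative choices, not part of the original script.

```shell
#!/bin/sh
# ora_check_args ACTION DB_ID
# Return 0 if the arguments look valid for the ora method script,
# otherwise print a message to stderr and return 2.
ora_check_args() {
    if [ $# -ne 2 ]; then
        echo "Usage: ora start|stop db_id" >&2
        return 2
    fi
    case "$1" in
        start|stop)
            return 0
            ;;
        *)
            echo "ora: unknown action '$1'" >&2
            return 2
            ;;
    esac
}
```

A script using this guard could call ora_check_args "$@" || exit $? before setting ORACLE_SID, so that a misconfigured method invocation is reported as a failure rather than a success.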

The services are created by importing the manifest and subsequently enabling the services. Note that enabling a service implies a start of the service. The ps(1) command can be used to verify the instances are running in different projects.

 

# svccfg import salesdb.xml

# svccfg import mktdb.xml

# svcadm enable salesdb

# svcadm enable mktdb

# ps -u oracle -o user,project,comm

USER PROJECT COMMAND

oracle ora_sales ora_lgwr_SALES

oracle ora_sales ora_smon_SALES

oracle ora_mkt ora_smon_MKT

oracle ora_sales ora_pmon_SALES

oracle ora_sales ora_dbw0_SALES

oracle ora_mkt ora_ckpt_MKT

oracle ora_sales ora_ckpt_SALES

oracle ora_mkt ora_lgwr_MKT

oracle ora_mkt ora_pmon_MKT

oracle ora_mkt ora_dbw0_MKT

oracle ora_sales ora_reco_SALES

oracle ora_mkt ora_reco_MKT

 

The processes for the marketing database instance run in the ora_mkt project, and the processes for the sales database instance run in the ora_sales project.
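With more than a handful of processes, checking the ps listing by eye becomes tedious. The sketch below reads output in the format of ps -u oracle -o user,project,comm from standard input and reports any process whose project does not match its instance. It assumes the naming used in this example, where the project is ora_ followed by the lowercase ORACLE_SID taken from the end of the process name. The helper name misplaced_processes is hypothetical.

```shell
#!/bin/sh
# misplaced_processes
# Read "USER PROJECT COMMAND" lines on stdin and print every process
# whose project is not ora_<lowercase SID>.
misplaced_processes() {
    awk '
        NR == 1 { next }                   # skip the header line
        {
            n = split($3, part, "_")       # ora_lgwr_SALES -> ora, lgwr, SALES
            sid = tolower(part[n])
            if ($2 != "ora_" sid)
                print $3 " runs in " $2
        }
    '
}
```

Typical use would be ps -u oracle -o user,project,comm | misplaced_processes, which prints nothing when every process runs in the expected project.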

Controlling CPU Consumption

Now that the Oracle instances are running in different projects, the Fair Share Scheduler can be used to control CPU consumption by the instances. Because the Fair Share Scheduler is not the default scheduler, it must be enabled using the dispadmin(1M) command:

# dispadmin -d FSS

The dispadmin command configures the Fair Share Scheduler (FSS) as the default scheduler, taking effect at the next reboot. It is possible to change to the Fair Share Scheduler without a reboot by moving all processes in the TS scheduler class and the init(1M) process to the FSS scheduler class using the priocntl(1) command. This change persists only until the next reboot, so the dispadmin -d FSS command is still required to make the change permanent.

# priocntl -s -c FSS -i class TS

# priocntl -s -c FSS -i pid 1

The change of the scheduler class can be verified using the ps(1) command with the -cafe options. In the output below, the fourth column (marked CLS) shows that the Fair Share Scheduler (FSS) is now the scheduler for the processes:

# ps -cafe

UID PID PPID CLS PRI STIME TTY TIME CMD

root 0 0 SYS 96 Dec 01 ? 0:01 sched

root 1 0 FSS 29 Dec 01 ? 0:00 /etc/init -

root 2 0 SYS 98 Dec 01 ? 0:00 pageout

root 3 0 SYS 60 Dec 01 ? 9:45 fsflush

root 556 1 FSS 29 Dec 01 ? 0:00 /usr/lib/saf/sac -t 300

...

oracle 1967 1 FSS 29 11:03:35 ? 0:00 ora_dbw0_MKT

oracle 1971 1 FSS 29 11:03:36 ? 0:00 ora_ckpt_MKT

oracle 2002 1 FSS 29 11:03:47 ? 0:01 ora_smon_SALES

oracle 1973 1 FSS 29 11:03:36 ? 0:01 ora_smon_MKT

oracle 1965 1 FSS 29 11:03:35 ? 0:00 ora_pmon_MKT

oracle 1996 1 FSS 29 11:03:46 ? 0:00 ora_dbw0_SALES

oracle 1975 1 FSS 29 11:03:36 ? 0:00 ora_reco_MKT

oracle 1998 1 FSS 29 11:03:47 ? 0:00 ora_lgwr_SALES

oracle 1969 1 FSS 29 11:03:36 ? 0:00 ora_lgwr_MKT

oracle 2000 1 FSS 29 11:03:47 ? 0:00 ora_ckpt_SALES

oracle 1994 1 FSS 29 11:03:46 ? 0:00 ora_pmon_SALES

oracle 2004 1 FSS 29 11:03:47 ? 0:00 ora_reco_SALES

....
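On a busy system the listing is long, so a per-class count can be easier to read. The following sketch tallies the CLS column of output in the ps -cafe format read from standard input. The helper name count_by_class is hypothetical, and the function assumes the class is always the fourth column, as in the listing above.

```shell
#!/bin/sh
# count_by_class
# Read ps -cafe style output on stdin and print the number of
# processes in each scheduling class, sorted by class name.
count_by_class() {
    awk '
        NR > 1 && NF >= 4 { n[$4]++ }      # skip the header and short lines
        END { for (c in n) print c, n[c] }
    ' | sort
}
```

Typical use would be ps -cafe | count_by_class. After the priocntl commands above, no processes would be expected to remain in the TS class.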

The final step involves assigning CPU shares to the projects to control CPU consumption. Assuming that the sales database is twice as important as the marketing database, and should therefore be entitled to twice the amount of CPU resources, the number of CPU shares for the ora_sales project is set to twice the number of shares for the ora_mkt project. The other projects are assumed to be less important, and their shares remain at system assigned default values. To give the ora_sales and ora_mkt projects a higher proportion of CPU resources with respect to these projects, the CPU shares are chosen to be much larger than those for the other projects. These values entitle the ora_sales project to twenty times more CPU resources than the group.dba project, and twice as many as the ora_mkt project.

Project        CPU Shares
ora_sales      20
ora_mkt        10
group.dba      1 (default)
system         Unlimited
user.root      1 (default)
default        1 (default)
group.staff    1 (default)

The CPU shares are set using the prctl(1) command:

# prctl -n project.cpu-shares -r -v 10 -i project ora_mkt

# prctl -n project.cpu-shares -r -v 20 -i project ora_sales

The current value of the project.cpu-shares resource control for a project can be checked as follows:

# prctl -n project.cpu-shares -i project ora_mkt

project: 101: ora_mkt

NAME PRIVILEGE VALUE FLAG ACTION RECIPIENT

project.cpu-shares

privileged 10 - none -

system 65.5K max none -

# prctl -n project.cpu-shares -i project ora_sales

project: 102: ora_sales

NAME PRIVILEGE VALUE FLAG ACTION RECIPIENT

project.cpu-shares

privileged 20 - none -

system 65.5K max none -
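For scripted checks, the privileged value can be pulled out of this output with a short filter. This is a minimal sketch that assumes the output layout shown above, with the value in the second column of the line beginning with privileged. The helper name privileged_value is hypothetical.

```shell
#!/bin/sh
# privileged_value
# Read prctl output on stdin and print the value of the first
# "privileged" resource control line.
privileged_value() {
    awk '$1 == "privileged" { print $2; exit }'
}
```

Typical use would be prctl -n project.cpu-shares -i project ora_sales | privileged_value, which for the output above would print 20.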

To make these values persistent, the project.cpu-shares resource controls must be added to the project database.

# projmod -sK "project.cpu-shares=(privileged,10,none)" ora_mkt

# projmod -sK "project.cpu-shares=(privileged,20,none)" ora_sales

# cat /etc/project

system:0::::

user.root:1::::

noproject:2::::

default:3::::

group.staff:10::::

group.dba:100:Oracle default project:::project.max-shm-memory=(privileged,2147483648,deny)

ora_mkt:101:Oracle Marketing:oracle::project.cpu-shares=(privileged,10,none);project.max-shm-memory=(privileged,2147483648,deny)

ora_sales:102:Oracle Sales:oracle::project.cpu-shares=(privileged,20,none);project.max-shm-memory=(privileged,2147483648,deny)

Note: Each project entry must be on a single line.
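Because multiple resource controls share the last field of an entry, separated by semicolons, long entries are hard to read. The sketch below prints the resource controls of one project, one per line. The helper name project_attrs is hypothetical, and the function only reads the file passed as its second argument.

```shell
#!/bin/sh
# project_attrs NAME FILE
# Print each attribute of project NAME in FILE on its own line.
project_attrs() {
    awk -F: -v name="$1" '
        $1 == name {
            n = split($6, attr, ";")       # attributes are separated by ";"
            for (i = 1; i <= n; i++)
                print attr[i]
        }
    ' "$2"
}
```

For the ora_sales entry in this example, project_attrs ora_sales /etc/project would be expected to list the project.cpu-shares control followed by the project.max-shm-memory control.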

For demonstration purposes, the nspin utility is used to create enough CPU demand to show the Fair Share Scheduler in action. The nspin utility is part of the Solaris Resource Manager 1.x software, and is available for download at http://www.sun.com/bigadmin/software/nspin/nspin.tar.gz. To create more demand for CPU resources than is available on the 4-CPU machine used here, four copies of nspin are run in each of the ora_mkt and ora_sales projects.

$ id -p

uid=100(oracle) gid=100(dba) projid=100(group.dba)

$ newtask -p ora_mkt

$ nspin -n 4 &

[1] 2059

$ newtask -p ora_sales

$ id -p

uid=100(oracle) gid=100(dba) projid=102(ora_sales)

$ nspin -n 4 &

[1] 2066

The newtask(1) command is used to switch from the default group.dba project to the ora_mkt and ora_sales projects to run nspin. The prstat(1M) command can be used to show CPU utilization per project and verify that the Fair Share Scheduler is distributing CPU resources to the projects according to their CPU shares.

$ prstat -J

PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/NLWP

2069 oracle 1064K 592K run 1 0 0:01:57 25% nspin/1

2066 oracle 1072K 664K run 18 0 0:01:31 17% nspin/1

2067 oracle 1072K 600K cpu1 30 0 0:01:05 12% nspin/1

2068 oracle 1072K 600K run 28 0 0:01:06 12% nspin/1

2061 oracle 1072K 600K run 17 0 0:01:31 8.7% nspin/1

2059 oracle 1072K 664K run 17 0 0:01:07 8.3% nspin/1

2060 oracle 1072K 600K cpu0 24 0 0:01:06 8.2% nspin/1

2062 oracle 1064K 592K cpu3 18 0 0:01:13 7.9% nspin/1

2058 root 6056K 5040K cpu2 59 0 0:00:00 0.0% prstat/1

PROJID NPROC SIZE RSS MEMORY TIME CPU PROJECT

102 11 1011M 712M 36% 0:05:40 66% ora_sales

101 11 1011M 703M 36% 0:04:58 33% ora_mkt

1 5 14M 9064K 0.4% 0:00:01 0.0% user.root

100 1 2760K 1952K 0.1% 0:00:00 0.0% group.dba

0 28 84M 23M 1.1% 0:00:23 0.0% system

Total: 56 processes, 196 lwps, load averages: 7.30, 3.09, 1.21

 

The top portion of the prstat display shows active processes sorted by CPU utilization. The bottom portion shows the statistics aggregated by project. The ora_sales project is receiving 66% of CPU resources, and the ora_mkt project is receiving 33%, even though both projects requested the same amount of CPU (four runnable nspin processes in each project). The Fair Share Scheduler allocates CPU resources according to the proportion of CPU shares held by the active projects, that is, the projects currently consuming CPU time. The only active projects at the time are ora_mkt and ora_sales. As a result, the CPU entitlement for the ora_sales project equals (20/(20 + 10)) * 100 = 67%, while ora_mkt is entitled to (10/(20 + 10)) * 100 = 33%. This closely matches the actual CPU usage observed using prstat(1M).
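The entitlement arithmetic generalizes to any set of active projects: a project's entitlement is its shares divided by the sum of the shares of all active projects. A small sketch of that calculation follows. The helper name fss_entitlement is hypothetical. The first argument is the project's own shares, and the remaining arguments are the shares of every active project, including the project itself.

```shell
#!/bin/sh
# fss_entitlement MY_SHARES ACTIVE_SHARES...
# Print the CPU entitlement, as a whole percentage, of a project holding
# MY_SHARES when the active projects hold ACTIVE_SHARES in total.
fss_entitlement() {
    mine=$1
    shift
    total=0
    for s in "$@"; do
        total=$((total + s))
    done
    [ "$total" -gt 0 ] || return 1         # no active shares, nothing to compute
    awk -v m="$mine" -v t="$total" 'BEGIN { printf "%.0f\n", m * 100 / t }'
}
```

For the two active projects in this example, fss_entitlement 20 20 10 prints 67 and fss_entitlement 10 20 10 prints 33.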

Using Extended Accounting

Resource usage per project can be obtained using the Extended Accounting facility of the Solaris OS. Accounting records can be written per process, per task or both. To obtain resource usage per project, task accounting is sufficient. Rather than summarizing all process termination records from the process accounting file, task accounting files can be used instead. This involves substantially fewer records since the task accounting files consolidate multiple process records into one task record. Because tasks usually have a long life span and task accounting records are only written at the end of a task, interval records can be used to obtain accurate daily accounting. An interval record writes the current task usage to the accounting file and resets the task usage to zero. The total task usage is the sum of all interval records plus the termination record. Examples of common long running tasks include HPC jobs and database processes.

Extended Accounting is turned off by default, and must be turned on using the acctadm(1M) command. In this example, the accounting files are named task<yymmdd>. A cron(1M) job is used to switch files every night at midnight. To start the extended accounting facility at system boot time, a link to /etc/init.d/acctadm must be created in /etc/rc2.d:

# acctadm -e extended task

# acctadm -f /var/adm/exacct/task`date '+%y%m%d'` task

# ln -s /etc/init.d/acctadm /etc/rc2.d/S01acctadm

The following script writes interval records for all tasks and then switches to a new accounting file:

# cat /opt/local/bin/switchexacct

#!/bin/sh

#

# Write interval record for all active tasks and switch accounting file

#

PATH=/usr/bin:/usr/sbin

wracct -i "`ps -efo taskid= | sort -u`" -t interval task

acctadm -f /var/adm/exacct/task`date '+%y%m%d'` task

Add the following line to the crontab of the root user to execute the switchexacct script at 00:00:

0 0 * * * /opt/local/bin/switchexacct > /dev/null 2>&1
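Because one file is written per day under the task<yymmdd> naming convention, all files for a given month share a common prefix. The sketch below builds the matching shell pattern from a year and a month. The helper name exacct_month_pattern is hypothetical, and the directory is the one used in this example.

```shell
#!/bin/sh
# exacct_month_pattern YEAR MONTH
# Print the glob pattern that matches all daily task accounting files
# for the given month, for example 2005 2 -> /var/adm/exacct/task0502*
exacct_month_pattern() {
    year=$1
    month=${2#0}                           # drop a leading zero ("02" -> "2")
    printf '/var/adm/exacct/task%02d%02d*\n' $((year % 100)) "$month"
}
```

Running exacct_month_pattern 2005 2 prints /var/adm/exacct/task0502*, which can be passed unquoted to a reporting command so the shell expands it to the daily files.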

The following script uses the Perl interface to libexacct to extract resource usage information from the extended accounting files. More information on the Perl interface to libexacct can be found in the Solaris 10 Resource Manager Developers Guide.

The script processes the file(s) given on the command line and summarizes the CPU usage per project by selecting all task and task interval records in the file(s). Assuming that the extended accounting files conform to the /var/adm/exacct/task<yymmdd> naming convention, a monthly report for February 2005 can be generated by running the following script.

# cpureport.pl /var/adm/exacct/task0502*

PROJECT USR+SYS

default 0

group.dba 0

ora_mkt 76945

ora_sales 116620

system 342

user.root 59

# cat cpureport.pl

#!/usr/perl5/5.6.1/bin/perl

# cpureport.pl - extract CPU usage per project from extended

# accounting files (CPU time in seconds)

use strict;

use warnings;

use Sun::Solaris::Exacct qw(:EXACCT_ALL);

use Sun::Solaris::Project qw(:ALL);

my %proj = ();

die("Usage: $0 file [file ...]\n") unless ($#ARGV >= 0);

# Process all files given on the commandline

foreach my $arg (0 .. $#ARGV) {

my $ef = ea_new_file($ARGV[$arg], &O_RDONLY) || die(ea_error_str());

while (my $obj = $ef->get()) {

if ( $obj->catalog()->id() == &EXD_GROUP_TASK ||

$obj->catalog()->id() == &EXD_GROUP_TASK_INTERVAL ) {

my $h = $obj->as_hash(); # returns all items in this group

my $projid = $h->{EXD_TASK_PROJID};

$proj{$projid}{CPU_SEC} += $h->{EXD_TASK_CPU_SYS_SEC};

$proj{$projid}{CPU_NSEC} += $h->{EXD_TASK_CPU_SYS_NSEC};

$proj{$projid}{CPU_SEC} += $h->{EXD_TASK_CPU_USER_SEC};

$proj{$projid}{CPU_NSEC} += $h->{EXD_TASK_CPU_USER_NSEC};

}

}

if (ea_error() != EXR_OK && ea_error() != EXR_EOF) {

printf("\nERROR: %s\n", ea_error_str());

exit(1);

}

}

# Calculate total CPU time (usr + sys) and round to whole seconds

# and lookup project names (invent name if lookup fails).

for my $key ( keys %proj ) {

my $one_second = 10 ** 9; # ns per second

my $nsec = $proj{$key}{CPU_NSEC} || 0;

# Add the whole seconds, then round the remainder to the nearest second

$proj{$key}{CPU_SEC} += int($nsec / $one_second);

if ( $nsec % $one_second >= ($one_second / 2) ) {

$proj{$key}{CPU_SEC}++;

}

my $name = getprojbyid($key);

if ( defined($name) ) {

$proj{$key}{PROJECT} = $name;

}

else {

$proj{$key}{PROJECT} = "<" . $key . ">";

}

}

# Print the CPU usage for the projects sorted by project name

printf("PROJECT USR+SYS\n");

for my $key ( sort { $proj{$a}{PROJECT} cmp $proj{$b}{PROJECT} } keys

%proj ) {

printf("%-16s %8d\n", $proj{$key}{PROJECT}, $proj{$key}{CPU_SEC});

}

exit(0);
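The report lists absolute CPU seconds. To compare projects, it can help to express each figure as a share of the total. The sketch below does this for output in the cpureport.pl format read from standard input. The helper name usage_percent is hypothetical.

```shell
#!/bin/sh
# usage_percent
# Read a "PROJECT USR+SYS" report on stdin and print each project's
# CPU seconds as a percentage of the total.
usage_percent() {
    awk '
        NR == 1 || NF < 2 { next }         # skip the header and blank lines
        { n++; name[n] = $1; sec[n] = $2; total += $2 }
        END {
            if (total == 0) exit           # nothing to report
            for (i = 1; i <= n; i++)
                printf "%-16s %5.1f%%\n", name[i], sec[i] * 100 / total
        }
    '
}
```

Typical use would be cpureport.pl /var/adm/exacct/task0502* | usage_percent. For the sample report shown earlier, ora_sales accounts for roughly 60% of the recorded CPU time and ora_mkt for roughly 40%.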

 

 