When you measure the system resources that your application consumes, pay particular attention to the following:

  • Disk I/O. Amount of read and write disk activity. I/O bottlenecks occur if read and write operations begin to queue.
  • Memory. Amount of available memory, virtual memory, and cache utilization.
  • Network. Percentage of the available bandwidth in use, and any network bottlenecks.
  • Processor. Processor utilization, context switches, interrupts and so on.

The next sections describe the performance counters that help you measure the preceding metrics.

System Overview (general operating system performance analysis; use this for a general analysis of the operating system performance counters)

Formatting:

  • Counter (Explanation)
    • Thresholds
  • Disk

    • \LogicalDisk(*)\Avg. Disk sec/Read (Avg. Disk sec/Read is the average time, in seconds, of a read of data from the disk. This analysis determines if any of the physical disks are responding slowly)
      • Average disk responsiveness is slow – more than 15ms
      • Average disk responsiveness is very slow – more than 25ms
      • Disk responsiveness is very slow (spike of more than 25ms)
    • \LogicalDisk(*)\Avg. Disk sec/Write
      (Avg. Disk sec/Write is the average time, in seconds, of a write of data to the disk. This analysis determines if any of the physical disks are responding slowly)
      • Average disk responsiveness is slow – more than 15ms
      • Average disk responsiveness is very slow – more than 25ms
      • Disk responsiveness is very slow (spike of more than 25ms)
    • \LogicalDisk(*)\Disk Transfers/sec (Disk Transfers/sec is the rate of read and write operations on the disk)
      • Fewer than 80 I/Os per second on average while disk latency is longer than 25ms. This may indicate too many virtual LUNs using the same physical disks on a SAN.
      • Fewer than 80 I/Os per second during a spike (not an average) while disk latency is longer than 25ms. This may indicate too many virtual LUNs using the same physical disks on a SAN.
    • \PhysicalDisk(*)\Avg. Disk sec/Read (Avg. Disk sec/Read is the average time, in seconds, of a read of data from the disk. This analysis determines if any of the physical disks are responding slowly)
      • Average disk responsiveness is slow – more than 15ms
      • Average disk responsiveness is very slow – more than 25ms
      • Disk responsiveness is very slow (spike of more than 25ms)
    • \PhysicalDisk(*)\Avg. Disk sec/Write (Avg. Disk sec/Write is the average time, in seconds, of a write of data to the disk. This analysis determines if any of the physical disks are responding slowly)
      • Average disk responsiveness is slow – more than 15ms
      • Average disk responsiveness is very slow – more than 25ms
      • Disk responsiveness is very slow (spike of more than 25ms)
    • \Process(*)\IO Data Operations/sec
      (The rate at which the process is issuing read and write I/O operations. This counter counts all I/O activity generated by the process to include file, network and device I/Os)
      • This process is using more than 1000 data I/Os per second
    • \Process(*)\IO Other Operations/sec
      (The rate at which the process is issuing I/O operations that are neither read nor write operations (for example, a control function). This counter counts all I/O activity generated by the process to include file, network and device I/Os)
      • This process is using more than 1000 other (non-read, non-write) I/Os per second
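The disk latency thresholds above can be expressed as a small check. The following is a minimal sketch in Python, assuming the counter samples have already been collected by some other means. The function names and the "ok" label are hypothetical and are not part of any Windows API.

```python
def classify_disk_latency(avg_disk_sec: float) -> str:
    """Classify one Avg. Disk sec/Read or Avg. Disk sec/Write sample.

    The counter reports seconds, while the thresholds are in milliseconds.
    """
    latency_ms = avg_disk_sec * 1000.0
    if latency_ms > 25:
        return "very slow"
    if latency_ms > 15:
        return "slow"
    return "ok"  # hypothetical label for "below both thresholds"


def low_throughput_at_high_latency(avg_disk_sec: float,
                                   transfers_per_sec: float) -> bool:
    """Flag fewer than 80 Disk Transfers/sec while latency is above 25 ms."""
    return avg_disk_sec * 1000.0 > 25 and transfers_per_sec < 80


if __name__ == "__main__":
    # Sample values are made up for illustration.
    print(classify_disk_latency(0.018))
    print(low_throughput_at_high_latency(0.030, 60))
```

Apply the same classification to both averages and individual spikes; only the input sample changes.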
  • Memory

Kernel Mode Memory

  • \Memory\Available MBytes (Available MBytes is the amount of physical memory available to processes running on the computer, in Megabytes, rather than bytes as reported in Memory\Available Bytes. The Virtual Memory Manager continually adjusts the space used in physical memory and on disk to maintain a minimum number of available bytes for the operating system and processes. When available bytes are plentiful, the Virtual Memory Manager lets the working sets of processes grow, or keeps them stable by removing an old page for each new page added. When available bytes are few, the Virtual Memory Manager must trim the working sets of processes to maintain the minimum required)
    • Low on available memory – less than 10% available
    • Very low on available memory – less than 5% available
    • Decreasing trend of 10MB per hour. This could indicate a memory leak.
  • \Memory\Free System Page Table Entries (Free System Page Table Entries is the number of page table entries not currently in use by the system. This analysis determines if the system is running out of free system page table entries (PTEs) by checking if there are fewer than 5,000 free PTEs, with a warning if there are fewer than 10,000 free PTEs. A lack of PTEs can result in system-wide hangs)
    • Running low on PTEs – less than 10,000 (If the free PTEs are under 10,000, the system is approaching a system-wide hang)
    • Critically low on PTEs – less than 5,000 (If the free PTEs are under 5,000, the system is close to a system-wide hang)
  • \Memory\Pages Input/sec (Pages Input/sec is the rate at which pages are read from disk to resolve hard page faults. Hard page faults occur when a process refers to a page in virtual memory that is not in its working set or elsewhere in physical memory, and must be retrieved from disk. When a page is faulted, the system tries to read multiple contiguous pages into memory to maximize the benefit of the read operation. Compare the value of Memory\Pages Input/sec to the value of Memory\Page Reads/sec to determine the average number of pages read into memory during each read operation)
    • More than 10 page file reads per second
  • \Memory\Pages/sec (Pages/sec is the rate at which pages are read from or written to disk to resolve hard page faults. If it is high, the system is likely running out of memory and paging memory to the disk. This counter is a primary indicator of the kinds of faults that cause system-wide delays. It is the sum of Memory\Pages Input/sec and Memory\Pages Output/sec. It is counted in numbers of pages, so it can be compared to other counts of pages, such as Memory\Page Faults/sec, without conversion. It includes pages retrieved to satisfy faults in the file system cache (usually requested by applications) and non-cached mapped memory files)
    • High pages/sec – greater than 1000 (If this is greater than 1000, the system could be beginning to run out of memory. Consider reviewing the processes to see which are taking up the most memory, or consider adding more memory)
    • Very high average pages/sec – greater than 2500 (If this is greater than 2500, the system could be experiencing system-wide delays due to insufficient memory. Consider reviewing the processes to see which are taking up the most memory, or consider adding more memory)
    • Critically high average pages/sec – greater than 5000 (If this is greater than 5000, the system is most likely experiencing delays due to insufficient memory. Consider reviewing the processes to see which are taking up the most memory, or consider adding more memory)
    • Spike in pages/sec – greater than 1000 (If a spike exceeds 1000, the system may be experiencing delays due to insufficient memory. Consider reviewing the processes to see which are taking up the most memory, or consider adding more memory)
  • \Memory\Pool Nonpaged Bytes
    • Low on Pool NonPaged memory – less than 40% available (If the system uses more than 60% of the nonpaged pool, consider removing the /3GB switch or migrating to a 64-bit system)
    • Critically low on Pool NonPaged memory – less than 20% available (If the system uses more than 80% of the nonpaged pool, consider removing the /3GB switch or migrating to a 64-bit system)
  • \Memory\Pool Paged Bytes (This analysis determines if the system is getting close to the maximum paged pool size. Pool Paged Bytes is the size, in bytes, of the paged pool, an area of system memory (physical memory used by the operating system) for objects that can be written to disk when they are not being used)
    • Low on Pool Paged memory – less than 40% available
    • Critically low on Pool Paged memory – less than 20% available
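Several of the kernel-mode memory checks above reduce to simple arithmetic. The sketch below, in Python, shows three of them under stated assumptions: the total physical memory is supplied by the caller (the Available MBytes counter does not report it), the trend is estimated with an ordinary least-squares slope (one possible way to measure a "decreasing trend"), and all function names are hypothetical.

```python
def available_memory_status(available_mbytes: float, total_mbytes: float) -> str:
    """Apply the 10% / 5% thresholds to Memory\\Available MBytes."""
    percent = available_mbytes / total_mbytes * 100
    if percent < 5:
        return "very low"
    if percent < 10:
        return "low"
    return "ok"


def trend_mb_per_hour(samples: list[tuple[float, float]]) -> float:
    """Least-squares slope of (hours_elapsed, available_mbytes) samples."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    num = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("samples must span more than one point in time")
    return num / den


def possible_memory_leak(samples: list[tuple[float, float]]) -> bool:
    """True if available memory falls by 10 MB per hour or more."""
    return trend_mb_per_hour(samples) <= -10


def pages_per_read(pages_input_per_sec: float, page_reads_per_sec: float) -> float:
    """Average pages read per operation: Pages Input/sec / Page Reads/sec."""
    if page_reads_per_sec == 0:
        return 0.0
    return pages_input_per_sec / page_reads_per_sec
```

A steady downward slope is only a hint of a leak; a workload that is legitimately ramping up produces the same pattern, so confirm against per-process counters.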

User Mode Memory

  • \Process(*)\Private Bytes (Private Bytes is the current size, in bytes, of memory that this process has allocated that cannot be shared with other processes)
    • For Windows 32 Bit: the delta between Minimum Size and Maximum Size should not be greater than 250MB
      (Maximum – Minimum ≤ 250MB)
    • For Windows 64 Bit: the delta between Minimum Size and Maximum Size should not be greater than 500MB
      (Maximum – Minimum ≤ 500MB)
  • \Process(*)\Working Set (Working Set is the current size, in bytes, of the Working Set of this process. The Working Set is the set of memory pages touched recently by the threads in the process. If free memory in the computer is above a threshold, pages are left in the Working Set of a process even if they are not in use. When free memory falls below a threshold, pages are trimmed from Working Sets. If they are needed they will then be soft-faulted back into the Working Set before leaving main memory)
    • For Windows 32 Bit: the delta between Minimum Size and Maximum Size should not be greater than 250MB
      (Maximum – Minimum ≤ 250MB)
    • For Windows 64 Bit: the delta between Minimum Size and Maximum Size should not be greater than 500MB
      (Maximum – Minimum ≤ 500MB)
        • \Process(*)\Thread Count (The number of threads currently active in this process. An instruction is the basic unit of execution in a processor, and a thread is the object that executes instructions. Every running process has at least one thread.)
          • For Windows 32 Bit: with 2GB of memory, a maximum of 2048 threads
          • For Windows 64 Bit: with 2GB of memory, a maximum of 6600 threads

          • \Process(*)\Handle Count (This analysis checks how many handles each process has open and determines if a handle leak is suspected. A process with a large number of handles and/or an aggressive upward trend could indicate a handle leak, which typically results in a memory leak. Handle Count is the total number of handles currently open by this process. This number is equal to the sum of the handles currently open by each thread in this process)

            • For Windows 32 Bit: For most processes, if higher than 2,500 handles open, investigate.
              Exceptions are:
              System 10,000
              lsass.exe 30,000
              store.exe 30,000
              sqlservr.exe 30,000
            • For Windows 64 Bit: For most processes, if higher than 3,000 handles open, investigate.
              Exceptions are:
              System 20,000
              lsass.exe 50,000
              store.exe 50,000
              sqlservr.exe 50,000
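The user-mode thresholds above amount to a lookup table plus a min/max comparison. The following Python sketch encodes them under a few assumptions: the helper names are hypothetical, process names are normalized by lowercasing and dropping a trailing ".exe", and the SQL Server entry uses the executable name sqlservr.

```python
# Handle-count limits from the list above, keyed by OS bitness.
HANDLE_THRESHOLDS = {
    32: {"default": 2500, "system": 10000, "lsass": 30000,
         "store": 30000, "sqlservr": 30000},
    64: {"default": 3000, "system": 20000, "lsass": 50000,
         "store": 50000, "sqlservr": 50000},
}

# Allowed growth (maximum - minimum) for Private Bytes and Working Set.
MAX_GROWTH_MB = {32: 250, 64: 500}


def handle_threshold(process_name: str, bitness: int) -> int:
    """Return the handle count above which a process deserves a look."""
    name = process_name.lower()
    if name.endswith(".exe"):
        name = name[:-4]
    table = HANDLE_THRESHOLDS[bitness]
    return table.get(name, table["default"])


def should_investigate_handles(process_name: str, handle_count: int,
                               bitness: int) -> bool:
    """True if the process has more handles open than its threshold."""
    return handle_count > handle_threshold(process_name, bitness)


def excessive_growth(samples_bytes: list[int], bitness: int) -> bool:
    """True if max - min of the byte samples exceeds the allowed delta."""
    delta_mb = (max(samples_bytes) - min(samples_bytes)) / (1024 * 1024)
    return delta_mb > MAX_GROWTH_MB[bitness]
```

For example, `should_investigate_handles("lsass.exe", 35000, 32)` flags the process, while the same count on a 64-bit system does not.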

          • Network

            • \Network Interface(*)\Output Queue Length
              • High network I/O – more than 1 thread waiting on network I/O (If the output queue length is greater than 1, this system’s network is nearing capacity. Consider analyzing network traffic to determine why network I/O is nearing capacity, such as *chatty* network services and/or large data transfers)
              • Very high network I/O – more than 2 threads waiting on network I/O (If the output queue length is greater than 2, this system’s network is over capacity. Consider analyzing network traffic to determine why network I/O is over capacity, such as *chatty* network services and/or large data transfers)
            • Network Utilization Analysis (Bytes Total/sec is the rate at which bytes are sent and received over each network adapter, including framing characters. Network Interface\Bytes Total/sec is the sum of Network Interface\Bytes Received/sec and Network Interface\Bytes Sent/sec. This counter helps you know whether the traffic at your network adapter is saturated and whether you need to add another network adapter. How quickly you can identify a problem depends on the type of network you have as well as whether you share bandwidth with other applications)
              • \Network Interface(*)\Bytes Total/sec
              • \Network Interface(*)\Current Bandwidth
                • Thresholds:
                  • High average network utilization – more than 50%
                  • Very high average network utilization – more than 80%
            • \Server\Bytes Total/sec (This counter indicates the number of bytes sent and received over the network. Higher values may indicate that network bandwidth is the bottleneck. If the sum of Bytes Total/sec for all servers is roughly equal to the maximum transfer rates of your network, you may need to segment the network)
              • Should not be more than 50 percent of network capacity.
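Network utilization is derived from the two Network Interface counters listed above. The sketch below shows the arithmetic in Python: Bytes Total/sec is in bytes while Current Bandwidth is reported in bits per second, so the byte rate is multiplied by 8. The function names and the "ok" label are hypothetical.

```python
def network_utilization_percent(bytes_total_per_sec: float,
                                current_bandwidth_bps: float) -> float:
    """Percent of the adapter's bandwidth in use.

    bytes_total_per_sec: Network Interface\\Bytes Total/sec (bytes)
    current_bandwidth_bps: Network Interface\\Current Bandwidth (bits/sec)
    """
    if current_bandwidth_bps <= 0:
        raise ValueError("current bandwidth must be positive")
    return bytes_total_per_sec * 8 * 100 / current_bandwidth_bps


def classify_network_utilization(percent: float) -> str:
    """Apply the 50% / 80% average utilization thresholds."""
    if percent > 80:
        return "very high"
    if percent > 50:
        return "high"
    return "ok"


if __name__ == "__main__":
    # Made-up sample: 75 MB/s on a 1 Gbit/s adapter.
    pct = network_utilization_percent(75_000_000, 1_000_000_000)
    print(pct, classify_network_utilization(pct))
```

The thresholds are meant for averages over the capture, so feed the function an averaged Bytes Total/sec rather than a single sample.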
          • Processor:

            • \Processor\% Processor Time (This counter is the primary indicator of processor activity. High values may not necessarily be bad. However, if other processor-related counters, such as % Privileged Time or Processor Queue Length, are increasing linearly, high CPU utilization may be worth investigating)
              • Less than 60% consumed = Healthy
              • 51% – 90% consumed = Monitor or Caution
              • 91% – 100% consumed = Critical or Out of Spec
            • \Processor\% Privileged Time
              • Consistently over 75 percent indicates a bottleneck.
            • \System\Context Switches/sec (Context switching happens when a higher priority thread preempts a lower priority thread that is currently running or when a high priority thread blocks. High levels of context switching can occur when many threads share the same priority level. This often indicates that there are too many threads competing for the processors on the system. If you do not see much processor utilization and you see very low levels of context switching, it could indicate that threads are blocked)
              • High context switches/sec – more than 5000 context switches per second
              • Very high context switches/sec – more than 15,000 context switches per second
            • \Processor(*)\% Interrupt Time
              (This counter indicates the percentage of time the processor spends receiving and servicing hardware interrupts. This value is an indirect indicator of the activity of devices that generate interrupts, such as network adapters. A dramatic increase in this counter indicates potential hardware problems)
              • High CPU Interrupt Time – more than 30% interrupt time (A high amount of % Interrupt Time in the processor could indicate a hardware or driver problem)
              • Very high CPU Interrupt Time – more than 50% interrupt time (A very high amount of % Interrupt Time in the processor could indicate a hardware or driver problem)
            • \System\Processor Queue Length (If there are more tasks ready to run than there are processors, threads queue up. The processor queue is the collection of threads that are ready but not able to be executed by the processor because another active thread is currently executing. A sustained or recurring queue of more than two threads is a clear indication of a processor bottleneck. You may get more throughput by reducing parallelism in those cases. You can use this counter in conjunction with the Processor\% Processor Time counter to determine if your application can benefit from more CPUs. There is a single queue for processor time, even on multiprocessor computers. Therefore, in a multiprocessor computer, divide the Processor Queue Length (PQL) value by the number of processors servicing the workload. If the CPU is very busy (90 percent and higher utilization) and the PQL average is consistently higher than 2 per processor, you may have a processor bottleneck that could benefit from additional CPUs. Alternatively, you could reduce the number of threads and queue more work at the application level. This causes less context switching, which reduces CPU load. The common reason for a PQL of 2 or higher with low CPU utilization is that requests for processor time arrive randomly and threads demand irregular amounts of time from the processor. This means that the processor is not a bottleneck and that it is your threading logic that needs to be improved)
              • Each processor has 10 or more threads waiting (Determines if the average processor queue length exceeds ten times the number of processors. If this threshold is broken, then the processor(s) may be at capacity)
              • Each processor has 20 or more threads waiting (Determines if the average processor queue length exceeds twenty times the number of processors. If this threshold is broken, then the processor(s) are beyond capacity)
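The Processor Queue Length guidance above depends on dividing the single system-wide queue by the number of processors. The following Python sketch applies that rule. It is an illustration under assumptions: the function name and result labels are hypothetical, and "low CPU utilization" is taken to mean anything under the 90 percent "very busy" mark, which the text does not define precisely.

```python
def classify_processor_queue(queue_length: float, processor_count: int,
                             cpu_percent: float) -> str:
    """Interpret System\\Processor Queue Length per processor.

    queue_length: average Processor Queue Length over the capture
    processor_count: processors servicing the workload
    cpu_percent: average % Processor Time over the same period
    """
    if processor_count < 1:
        raise ValueError("processor_count must be at least 1")
    per_processor = queue_length / processor_count
    if per_processor >= 20:
        return "beyond capacity"
    if per_processor >= 10:
        return "at capacity"
    if per_processor > 2:
        if cpu_percent >= 90:
            # Busy CPU plus a sustained queue: more CPUs may help.
            return "possible processor bottleneck"
        # Queue without a busy CPU: look at threading logic instead.
        return "review threading logic"
    return "ok"


if __name__ == "__main__":
    # Made-up sample: queue of 12 on a 4-processor machine at 95% CPU.
    print(classify_processor_queue(12, 4, 95))
```

Use averages over a sustained period as input; a single high sample of the queue length is not enough to conclude that the processors are a bottleneck.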