UE4 Performance and Profiling
来源:互联网 发布:软件英语流利说 编辑:程序博客网 时间:2024/06/05 18:30
Performance is an omnipresent topic in making real-time games. In order to create the illusion of moving images, we need a frame rate of at least 15 frames per second. Depending on the platform and game, 30, 60, or even more frames per second may be the target.
Unreal Engine provides many features and they have different performance characteristics. In order to optimize the content or code to achieve the required performance, you need to see where the performance is spent. For that, you can use the engine profiling tools. Every case is different and some knowledge about the internals of hardware and software is needed. Here we collected some details that should help you in that task.
This guide primarily covers rendering topics because this is the area is where most performance is needed. Adding more objects, increasing the resolution, adding lights, and improving materials all have a substantial impact on performance. Luckily in rendering, it is often easy to get back some performance. Many rendering features can be adjusted by console variables.
You can use the editor output log or the in-game console to:
Set console variables (cvarname value)
Get the current state (cvarname)
Get help for a variable cvarname ?.
If needed, you can put the settings in ConsoleVariables.ini
with this syntax: cvarname=value. To find the right console variable, you can useDumpConsoleVars or the auto completion system. Most rendering variables start with r..
For more about console variables and toggling other engine options to improve performance, see the Adjusting Engine Feature Levelsreference page.
General Tips
Ideally, you should profile as close to the target you care about as possible. For example, a good profiling case is testing your game in standalone form on the target hardware with built lightmaps.
For good profiling, it is best to setup a reproducible case isolated from things that can add noise to the results. Even the editor adds some noise (e.g. an open Content Browser can add rendering cost) so it is better to profile in a game directly. In some cases, it might be even useful to change code (e.g. to disable random number generators). The Pause command or Slomo 0.001 can be very useful to make things more stable.
Measure a few times to know how accurate your profiling is. Stat commands such as stat unit and stat fps give you some numbers to start with. Accurate profiling should be done in milliseconds (ms), not frames per second (fps). You easily can convert between the numbers but relative improvements have little meaning when measured in fps. When talking about individual features, we only speak about milliseconds as it is not measuring the frame.
If you see a limit on 30 fps (~33.3ms) or 60 fps (~16.6ms), you likely have VSync enabled. For more accurate timing, it is better if you profile without it.
Do not expect an extremely high frame rate with a very simple scene. Many design choices pay off for complex scenes (e.g. deferred rendering) but have a higher base cost. You also might bump into a frame rate limiter. If needed, such things can be adjusted (e.g. t.MaxFPS,r.VSync).
For tips and guidelines on setting up content and levels for performance, see the Performance Guidelines page.
For information on the stat commands, see the Stat Commands page.
Identify What You Are Bound By
Modern hardware has many units that run in parallel (e.g. GPU: memory, triangle/vertex/pixel processing, etc. and CPU: multiple CPUs, memory, etc.). Often those units need to wait on each other. You first need to identify what you are bound by. Optimizing that is likely to improve performance. Optimizing the wrong thing is a waste of time and is likely to introduce new bugs or even performance regressions in other cases. Once you improved that area, it is often good to profile again as this might reveal a new performance bottleneck that was formerly hidden.
First you should check if you frame rate is limited by CPU or the GPU cost. As always you either can vary the workload (e.g. change the resolution) and see what has effect. Here it is easier to look at the engine build in feature stat unit.
The actual frame time is limited by one of the three: Game (CPU game thread), Draw (CPU render thread) or GPU (GPU). Here we can see the GPU is the limiting factor (the largest of the three). In order to get a smaller Frame time, we have to optimize the GPU workload there.
CPU Profiling
GPU Profiling
Show Flags
The engine show flags can be used to toggle a lot of rendering features. The editor lists all show flags in a convenient 2D UI. You can click on the checkbox to toggle multiple checkboxes without closing the menu.
In-game, you can use the show command. Use show to get a list of all show flags and their state. Use show showflagname
to toggle a feature. Note that this only works in game viewports, so in editor viewports, the editor UI needs to be used. You can use the console variables (e.g. showflag.Bloom) to override show flag values in-game or in the editor, but that also disables the UI.
A good profiling starting point is with high-level features, such as show StaticMeshes or show tessellation.
All show flags are also exposed as console variables, e.g. console show Bloom, showflag.Bloom 0 or in configuration files: showflag.Bloom = 0 Console variables require more typing but they also override the editor UI showflags and they can be put into configuration files like other console variables.
The most useful ones for profiling:
View Modes
View modes are just a combination of show flags. The editor UI is separate from the show flags and you can use the ViewMode console command to switch between them. The most useful for performance are: Wireframe, LightComplexity, ShaderComplexity, and Lit.
How to Deal With a Wide Range of Hardware
Unreal Engine has scalability built into many graphics features. Different applications have different needs so customizing that system is advised.
You can use the tool by typing SynthBenchmark into the console.
FSynthBenchmark (V0.92):===============Main Processor: ... 0.025383 s/Run 'RayIntersect' ... 0.027685 s/Run 'Fractal'CompiledTarget_x_Bits: 64UE_BUILD_SHIPPING: 0UE_BUILD_TEST: 0UE_BUILD_DEBUG: 0TotalPhysicalGBRam: 32NumberOfCores (physical): 16NumberOfCores (logical): 32CPU Perf Index 0: 100.9CPU Perf Index 1: 103.3Graphics:Adapter Name: 'NVIDIA GeForce GTX 670'(On Optimus the name might be wrong, memory should be ok)Vendor Id: 0x10deGPU Memory: 1991/0/2049 MB ... 4.450 GigaPix/s, Confidence=100% 'ALUHeavyNoise' ... 7.549 GigaPix/s, Confidence=100% 'TexHeavy' ... 3.702 GigaPix/s, Confidence=100% 'DepTexHeavy' ... 23.595 GigaPix/s, Confidence=89% 'FillOnly' ... 1.070 GigaPix/s, Confidence=100% 'Bandwidth'GPU Perf Index 0: 96.7GPU Perf Index 1: 101.4GPU Perf Index 2: 96.2GPU Perf Index 3: 92.7GPU Perf Index 4: 99.8CPUIndex: 100.9GPUIndex: 96.7
Generate a Chart Over a Period of Time
It can be very useful to get the stat unit times over a longer period of time (e.g. in-game cutscene or a camera path set up to test many cases).
The following chart was generated on an Android device (Streaming) from a cutscene. Before and after the cutscene, the console commands StartFPSChart and StopFPSChart were entered. The resulting .csv file (in [ProjectFolder]\Saved\Cooked\Android_ES31\SubwayPatrol\Saved\Profiling\FPSChartStats) was opened in Microsoft Excel. For this example, we removed the first 4 header lines, selected all, and inserted a Scatter With Straight Lines plot.
More about Performance and Profiling
System Settings
Profiler
Scalability
VFX Optimization: Core Optimization Concepts
Performance is an omnipresent topic in making real-time games. In order to create the illusion of moving images, we need a frame rate of at least 15 frames per second. Depending on the platform and game, 30, 60, or even more frames per second may be the target.
Unreal Engine provides many features and they have different performance characteristics. In order to optimize the content or code to achieve the required performance, you need to see where the performance is spent. For that, you can use the engine profiling tools. Every case is different and some knowledge about the internals of hardware and software is needed. Here we collected some details that should help you in that task.
This guide primarily covers rendering topics because this is the area is where most performance is needed. Adding more objects, increasing the resolution, adding lights, and improving materials all have a substantial impact on performance. Luckily in rendering, it is often easy to get back some performance. Many rendering features can be adjusted by console variables.
You can use the editor output log or the in-game console to:
Set console variables (cvarname value)
Get the current state (cvarname)
Get help for a variable cvarname ?.
If needed, you can put the settings in ConsoleVariables.ini
with this syntax: cvarname=value. To find the right console variable, you can useDumpConsoleVars or the auto completion system. Most rendering variables start with r..
For more about console variables and toggling other engine options to improve performance, see the Adjusting Engine Feature Levelsreference page.
General Tips
Ideally, you should profile as close to the target you care about as possible. For example, a good profiling case is testing your game in standalone form on the target hardware with built lightmaps.
For good profiling, it is best to setup a reproducible case isolated from things that can add noise to the results. Even the editor adds some noise (e.g. an open Content Browser can add rendering cost) so it is better to profile in a game directly. In some cases, it might be even useful to change code (e.g. to disable random number generators). The Pause command or Slomo 0.001 can be very useful to make things more stable.
Measure a few times to know how accurate your profiling is. Stat commands such as stat unit and stat fps give you some numbers to start with. Accurate profiling should be done in milliseconds (ms), not frames per second (fps). You easily can convert between the numbers but relative improvements have little meaning when measured in fps. When talking about individual features, we only speak about milliseconds as it is not measuring the frame.
If you see a limit on 30 fps (~33.3ms) or 60 fps (~16.6ms), you likely have VSync enabled. For more accurate timing, it is better if you profile without it.
Do not expect an extremely high frame rate with a very simple scene. Many design choices pay off for complex scenes (e.g. deferred rendering) but have a higher base cost. You also might bump into a frame rate limiter. If needed, such things can be adjusted (e.g. t.MaxFPS,r.VSync).
For tips and guidelines on setting up content and levels for performance, see the Performance Guidelines page.
For information on the stat commands, see the Stat Commands page.
Identify What You Are Bound By
Modern hardware has many units that run in parallel (e.g. GPU: memory, triangle/vertex/pixel processing, etc. and CPU: multiple CPUs, memory, etc.). Often those units need to wait on each other. You first need to identify what you are bound by. Optimizing that is likely to improve performance. Optimizing the wrong thing is a waste of time and is likely to introduce new bugs or even performance regressions in other cases. Once you improved that area, it is often good to profile again as this might reveal a new performance bottleneck that was formerly hidden.
First you should check if you frame rate is limited by CPU or the GPU cost. As always you either can vary the workload (e.g. change the resolution) and see what has effect. Here it is easier to look at the engine build in feature stat unit.
The actual frame time is limited by one of the three: Game (CPU game thread), Draw (CPU render thread) or GPU (GPU). Here we can see the GPU is the limiting factor (the largest of the three). In order to get a smaller Frame time, we have to optimize the GPU workload there.
CPU Profiling
GPU Profiling
Show Flags
The engine show flags can be used to toggle a lot of rendering features. The editor lists all show flags in a convenient 2D UI. You can click on the checkbox to toggle multiple checkboxes without closing the menu.
In-game, you can use the show command. Use show to get a list of all show flags and their state. Use show showflagname
to toggle a feature. Note that this only works in game viewports, so in editor viewports, the editor UI needs to be used. You can use the console variables (e.g. showflag.Bloom) to override show flag values in-game or in the editor, but that also disables the UI.
A good profiling starting point is with high-level features, such as show StaticMeshes or show tessellation.
All show flags are also exposed as console variables, e.g. console show Bloom, showflag.Bloom 0 or in configuration files: showflag.Bloom = 0 Console variables require more typing but they also override the editor UI showflags and they can be put into configuration files like other console variables.
The most useful ones for profiling:
View Modes
View modes are just a combination of show flags. The editor UI is separate from the show flags and you can use the ViewMode console command to switch between them. The most useful for performance are: Wireframe, LightComplexity, ShaderComplexity, and Lit.
How to Deal With a Wide Range of Hardware
Unreal Engine has scalability built into many graphics features. Different applications have different needs so customizing that system is advised.
You can use the tool by typing SynthBenchmark into the console.
FSynthBenchmark (V0.92):===============Main Processor: ... 0.025383 s/Run 'RayIntersect' ... 0.027685 s/Run 'Fractal'CompiledTarget_x_Bits: 64UE_BUILD_SHIPPING: 0UE_BUILD_TEST: 0UE_BUILD_DEBUG: 0TotalPhysicalGBRam: 32NumberOfCores (physical): 16NumberOfCores (logical): 32CPU Perf Index 0: 100.9CPU Perf Index 1: 103.3Graphics:Adapter Name: 'NVIDIA GeForce GTX 670'(On Optimus the name might be wrong, memory should be ok)Vendor Id: 0x10deGPU Memory: 1991/0/2049 MB ... 4.450 GigaPix/s, Confidence=100% 'ALUHeavyNoise' ... 7.549 GigaPix/s, Confidence=100% 'TexHeavy' ... 3.702 GigaPix/s, Confidence=100% 'DepTexHeavy' ... 23.595 GigaPix/s, Confidence=89% 'FillOnly' ... 1.070 GigaPix/s, Confidence=100% 'Bandwidth'GPU Perf Index 0: 96.7GPU Perf Index 1: 101.4GPU Perf Index 2: 96.2GPU Perf Index 3: 92.7GPU Perf Index 4: 99.8CPUIndex: 100.9GPUIndex: 96.7
Generate a Chart Over a Period of Time
It can be very useful to get the stat unit times over a longer period of time (e.g. in-game cutscene or a camera path set up to test many cases).
The following chart was generated on an Android device (Streaming) from a cutscene. Before and after the cutscene, the console commands StartFPSChart and StopFPSChart were entered. The resulting .csv file (in [ProjectFolder]\Saved\Cooked\Android_ES31\SubwayPatrol\Saved\Profiling\FPSChartStats) was opened in Microsoft Excel. For this example, we removed the first 4 header lines, selected all, and inserted a Scatter With Straight Lines plot.
More about Performance and Profiling
System Settings
Profiler
Scalability
VFX Optimization: Core Optimization Concepts
- UE4 Performance and Profiling
- High Performance MySQL, Chapter2: benchmarking and profiling
- Performance & Profiling
- UE4 CPU Profiling
- UE4 GPU Profiling
- Erlang performance profiling
- Profiling JavaScript Performance
- JVM performance profiling (有待整理)
- Launch-Time Performance----Profiling Launch Performance
- UE4 Performance Guidelines
- Benchmarking and Profiling in MySQL
- Profiling with Traceview and dmtracedump
- Profiling with Traceview and dmtracedump
- Profiling with RStudio and profvis
- UE4 Performance Guidelines for Mobile Devices
- Java memory profiling with jmap and jhat
- Android Application Memory Profiling and Optimation
- [置顶] Debugging and Profiling PHP with Xdebug
- Linux Source命令及脚本的执行方式解析
- Spring MVC之@RequestMapping 详解
- 【拾遗】C++ STL容器begin(),end()
- 中国企业一扎堆 这个产业就完蛋!
- Activity实现透明效果
- UE4 Performance and Profiling
- Spring MVC之@RequestParam @RequestBody @RequestHeader 等详解
- size_t 类型
- 编程题一
- Spring MVC之@RequestBody, @ResponseBody 详解
- cocos2D-x 学习之路(一)
- Unity CJ 干货分享:全新的Unity移动游戏优化解决方案
- 用户登录加载照片
- 玩转Bootstrap(基础) (1.基础知识)