Unity Performance Optimization

Source: Internet. Editor: 程序博客网. Time: 2024/05/21 18:46

Optimizing Graphics Performance

Good performance is critical to the success of many games. Below are some simple guidelines for maximizing the speed of your game’s graphical rendering.

Where are the graphics costs?

The graphical parts of your game primarily put load on two systems of the computer: the GPU and the CPU. The first rule of any optimization is to find where the performance problem is, because strategies for optimizing for the GPU and for the CPU are quite different (and can even be opposite - it’s quite common to make the GPU do more work while optimizing for the CPU, and vice versa).

Typical bottlenecks and ways to check for them:

GPU is often limited by fillrate or memory bandwidth.

Does running the game at lower display resolution make it faster? If so, you’re most likely limited by fillrate on the GPU.

CPU is often limited by the number of things that need to be rendered, also known as “draw calls”.

Check “draw calls” in the Rendering Statistics window; if the count is more than several thousand (for PCs) or several hundred (for mobile), then you might want to optimize the object count.

Of course, these are only rules of thumb; the bottleneck could just as well be somewhere else. Less typical bottlenecks:

GPU has too many vertices to process. How many vertices are “ok” depends on the GPU and the complexity of vertex shaders. Typical figures are “not more than 100 thousand” on mobile, and “not more than several million” on PC.

CPU has too many vertices to process, for features that do vertex processing on the CPU. This could be skinned meshes, cloth simulation, particles, etc.
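
The resolution check described under the typical bottlenecks can be scripted. The following is only a minimal sketch: the class name `FillrateProbe`, the R key binding and the smoothing factor are illustrative assumptions, and it assumes the legacy Input Manager is active. It toggles between the starting resolution and half of it and logs a smoothed frame time so the two can be compared.

```csharp
using UnityEngine;

// Hypothetical helper (not part of Unity): attach to any object in a
// standalone build and press R to toggle half resolution.
public class FillrateProbe : MonoBehaviour
{
    int fullWidth, fullHeight;
    bool halved;
    float smoothedMs;

    void Start()
    {
        // Remember the resolution the game started with.
        fullWidth = Screen.width;
        fullHeight = Screen.height;
    }

    void Update()
    {
        // Exponential moving average of the frame time in milliseconds.
        float frameMs = Time.unscaledDeltaTime * 1000f;
        smoothedMs = Mathf.Lerp(smoothedMs, frameMs, 0.05f);

        if (Input.GetKeyDown(KeyCode.R))
        {
            // Log the frame time measured at the resolution we are leaving.
            Debug.Log("Frame time before switch: " + smoothedMs.ToString("F2") + " ms");

            halved = !halved;
            int w = halved ? fullWidth / 2 : fullWidth;
            int h = halved ? fullHeight / 2 : fullHeight;
            Screen.SetResolution(w, h, Screen.fullScreen);
        }
    }
}
```

If the frame time drops noticeably at half resolution, fillrate is the likely limit. With VSync enabled the frame time may be capped at the display refresh interval, so the comparison is more reliable with VSync off.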

CPU optimization - draw call count

In order to render any object on the screen, the CPU has some work to do - things like figuring out which lights affect that object, setting up the shader & shader parameters, sending drawing commands to the graphics driver, which then prepares the commands to be sent off to the graphics card. All this “per object” CPU cost is not very cheap, so if you have lots of visible objects, it can add up.

So for example, if you have a thousand triangles, it will be much, much cheaper if they are all in one mesh, instead of a thousand individual meshes of one triangle each. The cost of both scenarios on the GPU will be very similar, but the work done by the CPU to render a thousand objects (instead of one) will be significant.

In order to make the CPU do less work, it’s good to reduce the visible object count:

Combine close objects together, either manually or using Unity’s draw call batching.

Use fewer materials in your objects, by putting separate textures into a larger texture atlas and so on.

Use fewer things that cause objects to be rendered multiple times (reflections, shadows, per-pixel lights etc., see below).

Combine objects together so that each mesh has at least several hundred triangles and uses only one Material for the entire mesh. It is important to understand that combining two objects which don’t share a material does not give you any performance increase at all. The most common reason for having multiple materials is that two meshes don’t share the same textures, so to optimize CPU performance, you should ensure that any objects you combine share the same textures.

However, when using many pixel lights in the Forward rendering path, there are situations where combining objects may not make sense, as explained in the section on lights in forward rendering.
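
For objects that do share a material, manual combining might look roughly like the sketch below. It uses Unity's `Mesh.CombineMeshes`; the class name `CombineChildren` and the `sharedMaterial` field are illustrative assumptions. It also assumes the parent object has no MeshFilter of its own, each child mesh has a single sub-mesh, the meshes are readable, and the merged mesh stays within the default 16-bit index limit of 65,535 vertices.

```csharp
using System.Collections.Generic;
using UnityEngine;

// Hypothetical helper: merges child meshes that use one given material
// into a single mesh on this object.
public class CombineChildren : MonoBehaviour
{
    public Material sharedMaterial; // the one material all merged children use

    void Start()
    {
        var combine = new List<CombineInstance>();

        foreach (MeshFilter filter in GetComponentsInChildren<MeshFilter>())
        {
            if (filter.gameObject == gameObject || filter.sharedMesh == null) continue;

            var childRenderer = filter.GetComponent<MeshRenderer>();
            // Children with a different material are left alone:
            // merging them would not reduce the draw call count.
            if (childRenderer == null || childRenderer.sharedMaterial != sharedMaterial) continue;

            combine.Add(new CombineInstance
            {
                mesh = filter.sharedMesh,
                // Express the child's vertices in this object's local space.
                transform = transform.worldToLocalMatrix * filter.transform.localToWorldMatrix
            });
            childRenderer.enabled = false; // the merged mesh replaces it
        }

        if (combine.Count == 0) return;

        var merged = new Mesh();
        merged.CombineMeshes(combine.ToArray()); // merges everything into one sub-mesh
        gameObject.AddComponent<MeshFilter>().sharedMesh = merged;
        gameObject.AddComponent<MeshRenderer>().sharedMaterial = sharedMaterial;
    }
}
```

The material check mirrors the point made above: only objects that share a material are worth merging.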

GPU: Optimizing Model Geometry

When optimizing the geometry of a model, there are two basic rules:

Don’t use any more triangles than necessary

Try to keep the number of UV mapping seams and hard edges (doubled-up vertices) as low as possible

Note that the actual number of vertices that graphics hardware has to process is usually not the same as the number reported by a 3D application. Modeling applications usually display the geometric vertex count, i.e. the number of distinct corner points that make up a model. For a graphics card, however, some geometric vertices will need to be split into two or more logical vertices for rendering purposes. A vertex must be split if it has multiple normals, UV coordinates or vertex colors. Consequently, the vertex count in Unity is invariably higher than the count given by the 3D application.

While the amount of geometry in the models is mostly relevant for the GPU, some features in Unity also process models on the CPU, for example, mesh skinning.
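
To see the vertex count Unity actually works with, as opposed to the figure shown by the modeling application, a small script can log `Mesh.vertexCount`. This is a sketch only; the class name `VertexCountLogger` is an assumption. As a reference point, a cube with hard edges has 8 corner points but needs 24 vertices for rendering, because each corner carries three different normals.

```csharp
using UnityEngine;

// Hypothetical helper: logs the render-time vertex count of every mesh
// under this object, plus the total.
public class VertexCountLogger : MonoBehaviour
{
    void Start()
    {
        int total = 0;

        foreach (MeshFilter filter in GetComponentsInChildren<MeshFilter>())
        {
            Mesh mesh = filter.sharedMesh;
            if (mesh == null) continue;

            // vertexCount already includes vertices split at UV seams and hard edges.
            total += mesh.vertexCount;
            Debug.Log(filter.name + ": " + mesh.vertexCount + " vertices");
        }

        Debug.Log("Total vertices under " + name + ": " + total);
    }
}
```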

Lighting Performance

Lighting which is not computed at all is always the fastest! Use Lightmapping to “bake” static lighting just once, instead of computing it each frame. The process of generating a lightmapped environment takes only a little longer than just placing a light in the scene in Unity, but:

It is going to run a lot faster (2–3 times for 2 per-pixel lights)

And it will look a lot better since you can bake global illumination and the lightmapper can smooth the results

In a lot of cases, simple tricks in shaders and content can replace adding more lights all over the place. For example, instead of adding a light that shines straight into the camera to get a “rim lighting” effect, consider adding a dedicated “rim lighting” computation into your shaders directly.

Lights in forward rendering

Per-pixel dynamic lighting will add significant rendering overhead to every affected pixel and can lead to objects being rendered in multiple passes. On less powerful devices, like mobile or low-end PC GPUs, avoid having more than one Pixel Light illuminating any single object, and use lightmaps to light static objects instead of having their lighting calculated every frame. Per-vertex dynamic lighting can add significant cost to vertex transformations. Try to avoid situations where multiple lights illuminate any given object.

If you use pixel lighting then each mesh has to be rendered as many times as there are pixel lights illuminating it. If you combine two meshes that are very far apart, it will increase the effective size of the combined object. All pixel lights that illuminate any part of this combined object will be taken into account during rendering, so the number of rendering passes that need to be made could be increased. Generally, the number of passes that must be made to render the combined object is the sum of the number of passes for each of the separate objects, and so nothing is gained by combining. For this reason, you should not combine meshes that are far enough apart to be affected by different sets of pixel lights.

During rendering, Unity finds all lights surrounding a mesh and calculates which of those lights affect it most. The Quality Settings are used to modify how many of the lights end up as pixel lights and how many as vertex lights. Each light calculates its importance based on how far away it is from the mesh and how intense its illumination is. Furthermore, some lights are more important than others purely from the game context. For this reason, every light has a Render Mode setting which can be set to Important or Not Important; lights marked as Not Important will typically have a lower rendering overhead.

As an example, consider a driving game where the player’s car is driving in the dark with headlights switched on. The headlights are likely to be the most visually significant light sources in the game, so their Render Mode would probably be set to Important. On the other hand, there may be other lights in the game that are less important (other cars’ rear lights, say) and which don’t improve the visual effect much by being pixel lights. The Render Mode for such lights can safely be set to Not Important so as to avoid wasting rendering capacity in places where it will give little benefit.
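
The driving-game example could be expressed in a script along these lines. This is a sketch: the class name `CarLightSetup`, the two light arrays and the pixel light count of 2 are illustrative assumptions. In Unity's scripting API, `LightRenderMode.ForcePixel` corresponds to Important and `LightRenderMode.ForceVertex` to Not Important.

```csharp
using UnityEngine;

// Hypothetical setup script for the driving-game example.
public class CarLightSetup : MonoBehaviour
{
    public Light[] headlights; // assigned in the Inspector
    public Light[] rearLights; // assigned in the Inspector

    void Start()
    {
        // Headlights dominate the image, so keep them per-pixel ("Important").
        foreach (Light l in headlights)
            l.renderMode = LightRenderMode.ForcePixel;

        // Rear lights gain little from per-pixel quality ("Not Important").
        foreach (Light l in rearLights)
            l.renderMode = LightRenderMode.ForceVertex;

        // Assumed budget: the Quality Settings pixel light count limits how many
        // lights left on the automatic render mode are rendered per-pixel.
        QualitySettings.pixelLightCount = 2;
    }
}
```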

Optimizing per-pixel lighting saves work on both the CPU and the GPU: the CPU has fewer draw calls to issue, and the GPU has fewer vertices to process and fewer pixels to rasterize for all these additional object renders.

GPU: Texture Compression and Mipmaps

Using Compressed Textures will decrease the size of your textures (resulting in faster load times and a smaller memory footprint) and can also dramatically increase rendering performance. Compressed textures use only a fraction of the memory bandwidth needed for uncompressed 32-bit RGBA textures.

Use Texture Mip Maps

As a rule of thumb, always have Generate Mip Maps enabled for textures used in a 3D scene. In the same way that Texture Compression can help limit the amount of texture data transferred when the GPU is rendering, a mip mapped texture will enable the GPU to use a lower-resolution texture for smaller triangles.

The only exception to this rule is when a texel (texture pixel) is known to map 1:1 to the rendered screen pixel, as with UI elements or in a 2D game.
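
One rough way to find textures that ignore these two guidelines is to audit the textures loaded at runtime. The sketch below is an assumption-laden helper, not a Unity feature: the class name `TextureAudit` is made up, and it only flags the plain `RGBA32` and `ARGB32` formats as uncompressed 32-bit. UI and 2D textures that map 1:1 to screen pixels will show up as having no mip maps, which is expected.

```csharp
using UnityEngine;

// Hypothetical audit script: lists loaded textures that have no mip maps
// or that use an uncompressed 32-bit format.
public class TextureAudit : MonoBehaviour
{
    void Start()
    {
        // Returns every loaded Texture2D, including internal ones.
        foreach (Texture2D tex in Resources.FindObjectsOfTypeAll<Texture2D>())
        {
            bool noMips = tex.mipmapCount <= 1;
            bool raw32 = tex.format == TextureFormat.RGBA32
                      || tex.format == TextureFormat.ARGB32;

            if (!noMips && !raw32) continue;

            Debug.Log(tex.name + " (" + tex.width + "x" + tex.height + ", " + tex.format + "): "
                + (noMips ? "no mip maps; " : "")
                + (raw32 ? "uncompressed 32-bit" : ""));
        }
    }
}
```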

LOD and Per-Layer Cull Distances

In some games, it may be appropriate to cull small objects more aggressively than large ones, in order to reduce both the CPU and GPU load. For example, small rocks and debris could be made invisible at long distances while large buildings would still be visible.

This can be achieved either with the Level Of Detail system, or by setting manual per-layer culling distances on the camera. You could put small objects into a separate layer and set up per-layer cull distances using the Camera.layerCullDistances script function.
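
A per-layer cull distance setup might look like the following sketch. The layer name "SmallObjects", the 50-unit distance and the class name `SmallObjectCulling` are illustrative assumptions; `Camera.layerCullDistances` itself expects an array of 32 floats, one per layer, where 0 means "use the camera's far clip plane".

```csharp
using UnityEngine;

// Hypothetical script: attach to the camera to cull one layer earlier
// than the rest of the scene.
public class SmallObjectCulling : MonoBehaviour
{
    public string smallObjectLayer = "SmallObjects"; // assumed layer name
    public float smallObjectDistance = 50f;          // assumed cull distance

    void Start()
    {
        Camera cam = GetComponent<Camera>();
        int layer = LayerMask.NameToLayer(smallObjectLayer);

        // NameToLayer returns -1 when the layer does not exist.
        if (cam == null || layer < 0) return;

        // One entry per layer; entries left at 0 keep the far clip plane distance.
        float[] distances = new float[32];
        distances[layer] = smallObjectDistance;
        cam.layerCullDistances = distances;
    }
}
```

Rocks and debris placed on that layer then disappear beyond the chosen distance, while buildings on other layers remain visible up to the far clip plane.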

Realtime Shadows

Realtime shadows are nice, but they can cost quite a lot of performance, both in terms of extra draw calls for the CPU, and extra processing on the GPU. For further details, see the Shadows page.

GPU: Tips for writing high-performance shaders

A high-end PC GPU and a low-end mobile GPU can be literally hundreds of times apart in performance. The same is true even on a single platform: on a PC, a fast GPU is dozens of times faster than a slow integrated GPU, and on mobile platforms you can see just as large a difference between GPUs.

So keep in mind that GPU performance on mobile platforms and low-end PCs will be much lower than on your development machines. Typically, shaders will need to be hand optimized to reduce calculations and texture reads in order to get good performance. For example, some built-in Unity shaders have “mobile” equivalents that are much faster (but have some limitations or approximations - that’s what makes them faster).

Below are some guidelines that are most important for mobile and low-end PC graphics cards:

Complex mathematical operations

Transcendental mathematical functions (such as pow, exp, log, cos, sin, tan, etc.) are quite expensive, so a good rule of thumb is to have no more than one such operation per pixel. Consider using lookup textures as an alternative where applicable.

It is not advisable to attempt to write your own normalize, dot, inversesqrt operations, however. If you use the built-in ones then the driver will generate much better code for you.

Keep in mind that the alpha test (discard) operation will make fragment processing slower.

Floating point operations

You should always specify the precision of floating point variables when writing custom shaders. It is critical to pick the smallest possible floating point format in order to get the best performance. Precision of operations is completely ignored on many desktop GPUs, but is critical for performance on many mobile GPUs.

If the shader is written in Cg/HLSL then precision is specified as follows:

float - full 32-bit floating point format, suitable for vertex transformations, but has the slowest performance.
half - reduced 16-bit floating point format, suitable for texture UV coordinates, and roughly twice as fast as float.
fixed - 10-bit fixed point format, suitable for colors, lighting calculations and other high-performance operations, and roughly four times faster than float.

If the shader is written in GLSL ES then the floating point precision is specified as highp, mediump and lowp respectively.

For further details about shader performance, please read the Shader Performance page.

Simple Checklist to make Your Game Faster

Keep the vertex count below 200K to 3M per frame when targeting PCs, depending on the target GPU.
If you’re using built-in shaders, pick ones from the Mobile or Unlit category. They work on non-mobile platforms as well, but are simplified and approximated versions of the more complex shaders.
Keep the number of different materials per scene low - share as many materials between different objects as possible.
Set the Static property on non-moving objects to allow internal optimizations like static batching.
Do not use Pixel Lights when it is not necessary - choose to have only a single (preferably directional) pixel light affecting your geometry.
Do not use dynamic lights when it is not necessary - choose to bake lighting instead.
Use compressed texture formats when possible, otherwise prefer 16-bit textures over 32-bit.
Do not use fog when it is not necessary.
Learn the benefits of Occlusion Culling and use it to reduce the amount of visible geometry and draw calls in complex static scenes with lots of occlusion. Plan your levels to benefit from occlusion culling.
Use skyboxes to “fake” distant geometry.
Use pixel shaders or texture combiners to mix several textures instead of a multi-pass approach.
If writing custom shaders, always use the smallest possible floating point format:
fixed / lowp - for colors, lighting information and normals,
half / mediump - for texture UV coordinates,
float / highp - avoid in pixel shaders, fine to use in the vertex shader for position calculations.
Minimize use of complex mathematical operations such as pow, sin, cos etc. in pixel shaders.
Choose to use fewer textures per fragment.
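
The Static property is set in the editor, so it does not cover scenery that is instantiated at runtime. For that case, a sketch using Unity's `StaticBatchingUtility.Combine` is shown below. The class name `BatchSpawnedScenery` is an assumption, and the example assumes the children share materials and will not move individually after the call.

```csharp
using UnityEngine;

// Hypothetical script: attach to the parent of scenery objects that were
// spawned at runtime and will stay in place.
public class BatchSpawnedScenery : MonoBehaviour
{
    void Start()
    {
        // Prepares all children of this object for static batching.
        // The children must not be moved individually afterwards.
        StaticBatchingUtility.Combine(gameObject);
    }
}
```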

http://docs.unity3d.com/Manual/OptimizingGraphicsPerformance.html (original link)

0 0