
Monday, 21 March 2022

How to use the new Firestorm Performance Floater and Autotune feature

New Firestorm, new features 

The latest Firestorm release has a new feature (albeit one that I still consider experimental): the "performance floater". In earlier blog posts I explained why I created it, and most recently, in "Upgraders of the lost ARC", I explained a bit about what it does. This post is all about how to use the Performance Floater.

Bundled with the Performance Floater is the Autotune FPS feature. I'll also explain how this works, and how best to avoid getting yourself into a muddle with it.



For a more concise, and probably more readable, summary of these two features please refer to Inara Pey's excellent write-up. You can also click on the '?' icon at the top of the floater to go straight to the dedicated Firestorm wiki page, which will be maintained and updated as the feature develops.

What does Performance Floater do for me?

The Performance Floater shows you in real-time what parts of the scene are taking the most time, and slowing down your graphics. In particular, this first release focuses on avatars and allows you to see which avatars are truly lagging you; it also allows you to examine your own attachments to see how they perform.

Ok so what about Autotune?

Autotune is a first look at our attempt to let the viewer automatically manage some of your settings, to give you the performance level that you request and keep you there.

Performance Floater best practices.

Initially, we will ignore Autotune and look instead at manual tuning and how the performance floater can help us.

Given that the motivation behind this feature was to highlight the damage that segmented bodies have done to SL performance, it will come as no surprise that it is most useful when applied in crowd scenes. I wanted to allow a finer-grained tuning experience that would allow you to enjoy a crowded club, or similar scene without having to de-render everyone and completely ruin the atmosphere. So how do we go about this?

Step by step - A quick guide.

Step 1 - Open the performance floater.

The first thing we need is information; it is hard to know whether you've improved anything without measuring before and after. To open the performance floater, we have a few options:

Look for the "Improve graphics speed..." menu entry on either the World menu or on the Advanced->Performance tools menu.




It can also be added to your toolbars for quick access. Look for the gauge icon on the tool palette.

Step 2 - Review the summary stats

We start with the overview panel, which tells us what our current frame rate (Frames Per Second, aka FPS) is. You may also see a warning such as "User limited FPS" or "In Background"; these are a reminder that you are not getting the full potential, because you have either deliberately limited the FPS or are "tabbed out" in another screen/application such as Discord or Chrome.

Below this is the summary data and the first clue as to what is happening to our FPS.



A more complete explanation of these numbers can be found here. What we can find here, though, is the first hint about what we need to do.

Best practice - Start with the largest number as this is where the biggest gains can be made

If scenery is the largest number then we might start to think about whether our draw distance is too high etc. However, if we are sightseeing or taking photos, then we might want the scenery to be in full glorious detail. If we are shopping we want to see the goods on display, but don't really need to see the displays far away. Think about what it is you are aiming for. 

Most of the time you will find that either scenery, avatars or both are high, but occasionally the other numbers are worth a quick look. If the UI is more than a few percent, consider closing down unwanted chat windows, inventory, etc. If the HUDs cost is high, then you should remove as many HUDs as you can, or simply hide them all using the "show HUD attachments" option on the Avatar menu (Alt-Shift-H on Windows).

Best practice for general FPS health - Close unnecessary UI and remove (not minimise) HUDs 

Keep your UI windows closed and remove HUDs when not using them (the "favourite wearables" feature is amazingly useful for this.)

"The scenery is killing my FPS"

In this first release, the amount of fine-grained tuning for scenery is very limited (I was focused more on the Lagatar problem). However, we provide quick access to a couple of the controls that make the biggest difference. 

Clicking on the graphics settings panel in the floater will take you to a subset of the preferences found on the main preferences panel. Here we can make wholesale changes to the quality of our graphics using the "Quality vs speed" slider, or tweak individual parameters. The main features exposed here are Draw distance, shadows and water reflections. Draw distance and water can be changed dynamically, and you can watch the impact. Changing shadows is slightly more disruptive as the viewer has to change how things are drawn and this causes a "pause", especially on slower machines.

Best practices for scenes - dumb down the water, shrink the draw distance, remove the shadows. 

In general, we want to keep the visual quality as high as we can whilst still being able to move about. With this in mind, you need to pick and choose between the options. Since EEP, water reflections and refraction have been a terrible burden (a fix is coming from LL, but it is not here yet); you can still have decent-looking water without reflections. Of course, the most obvious (but frequently overlooked) option is draw distance. It's simple really: drawing more "stuff" takes longer, and reducing the draw distance shrinks the number of things that need to be considered by the viewer. In a club or shopping mall, shrink the DD to 64m and you'll be more nimble.

Water - If you are not near water, or do not care about water reflections, then you should almost certainly switch water reflections to "None; opaque"; this gives a big FPS boost whilst still leaving the water looking reasonably nice. For the biggest win, you can fully disable water on the Advanced menu, under rendering types. But don't forget to turn it back on.



Shadows - The most visually disruptive change. I love to have shadows in a scene, but shadows will typically more than halve your FPS, so if you really need that extra boost then foregoing shadows is a good choice. Use the shadows setting on the floater or in the preferences. This is very useful if you are at a shopping event or club, where the shadows are probably not that important.

Things not to do (probably) - Killing Advanced lighting - ALM. 



A lot of people automatically reach for the advanced lighting kill switch in preferences and proclaim the amazing boost they get. For many people, that boost is dominated by the fact that the shadows get disabled too; try turning off shadows alone first. Disabling ALM can have detrimental effects on some machines, as it prevents some GPU use and loads more onto the CPU. However, if you are on a poor network, then disabling ALM will reduce your bandwidth use, as materials will not be fetched.

OK, so that's the global scene dealt with.

"OMG the avatars are killing me"

When the statistics suggest that avatars are consuming a significant amount of the frame time, we can look at the avatars nearby and decide what to do. It does not take many segmented avatar abominations to totally destroy your performance. Modern BOM bodies without the "alpha cuts" or segments are far more performant. So how can we separate the good from the bad?



The bar graph on the left of this screen gives a quick visual indication of the costs. This was my favourite feature of the original Linden Lab design, though it was used to show ARC, which as you may have gathered from my other writing, is practically useless.

Along the top of this panel we have a slider; this controls the maximum time we will allow an avatar to take. On the screen above I have 23 avatars in the scene, and they are taking a total of 50 milliseconds. Without going into a maths lesson, this is a problem: 50ms spent on avatars alone caps us at 20 FPS before anything else is drawn. We can also see that the top 3 "offenders" (those avatars taking the longest) are a large chunk of that total.

If we slide the slider from the right to the left the limit will decrease. In this example, I can set it somewhere around 3800, and any avatars above the limit will be "optimised".

The optimisation works at a fine-grained level that was not possible in the past. The first thing that we do is remove the shadows of the laggiest avatars; this will halve their render time. When this happens, an 'S' will appear in the column between ARC and Name. The further you decrease the slider, the more avatars will be affected. When an avatar has had its shadows removed and is still laggier than the limit you have set, we take the decision to force it into an Imposter, and an 'I' will appear between the ARC and the name.
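As a rough sketch (my own illustration, not the actual Firestorm code), the rule applied to each avatar looks something like this:

```python
# 'render_time' is the avatar's measured cost and 'limit' is the "Maximum render time"
# slider value, both in the same units shown on the floater.
def classify_avatar(render_time, limit):
    """Return '' (left alone), 'S' (shadows removed) or 'I' (forced impostor)."""
    if render_time <= limit:
        return ''                  # already within budget
    if render_time / 2 <= limit:   # removing shadows roughly halves the cost
        return 'S'
    return 'I'                     # still over budget even without shadows

# With the slider at 3800: a 2500 avatar is untouched, a 5000 avatar loses its shadows,
# and a 9000 avatar becomes an impostor.
for cost in (2500, 5000, 9000):
    print(cost, classify_avatar(cost, 3800))
```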

Imposters are not everybody's cup of tea, but in a crowded club, a few imposters can lift your FPS whilst still allowing a decent visual experience. Try it out and decide for yourself. Derendering is another option, of course (not supported directly in this release, but accessible on the people floater as usual), and "Render friends only" is another choice, but keep in mind that you cannot easily bring them back. If you don't want to see anyone at all (yourself included), then the check box at the bottom of the panel allows you to disable all avatars.



Best practices - go little by little, and don't forget to reset later!

The most common complaint during early testing was from people finding that they were seeing everyone as imposters. The slow animating, flat, low-resolution cutouts are great at a distance but not so nice up close. If you find you are seeing them everywhere then you probably forgot to reset your "Maximum render time" slider.

Things to remember: 

  1. Both this and Autotune change your settings. Use graphics presets to save and restore sane settings just in case.
  2. What you see is the cost of drawing this scene on your machine. Something not in view will show as very low cost. When looking at your attachments, make sure that your avatar is in view and not partially hidden.

Is it me? How can I be sure?

Your own avatar will appear in the list highlighted in yellow. Due to how the viewer works, it costs a little more to draw your own avatar than it does to draw others, so even if you are identically dressed to another avatar you will show more expensive on your screen (they will see themselves as more expensive too). 

However, you can check the cost of your own attachments by looking at the "Your avatar complexity" panel. This panel lists all of your non-HUD attachments and their costs. You can now see which is the most costly item you are wearing and decide if you can do better. You can also use this to compare the impact of different items: try on different hairstyles and see which ones are laggier. In the market for a new body? Grab the demos, wear them all, and compare the performance as well as the looks before making your choice.

Can't Firestorm do all this for me? Auto tuning, the pros and cons.

Successful auto-tuning requires a little restraint and some managing of your own expectations.
There is nothing I can do to make your decade-old potato of a laptop run at 50 FPS in a sim packed with mesh avatars. Not happening. However, Autotune can and will try to do the best it can for you.

Basic Autotuning

When using Autotune we set a target FPS level and whether we want to adjust the avatars only or the avatars and the scenery. We can also decide if we want it to run continuously while we continue to enjoy ourselves or to run once and then stop.

Troubleshooting tip: why is my friend flat, pixelated and their animations slow?

If you unexpectedly see imposters everywhere then double-check that Autotune is not forcing your Render time limit too low. If so, turn off autotune and manually adjust the slider.




Autotune FPS - Best practice #1 start low. 

Consider this: you are at a club, surrounded by gyrating mesh bodies. Your FPS has dived to single digits (and not even high single digits); you just want to move around a bit, but it is like wading through molasses. You can't really turn off all the avatars, because then you'll barge them all out of the way and spend the next half an hour apologising.

Set the Autotune target to something higher, but not too high; try 12 FPS, maybe? Once you have selected the target FPS, you can hit start. The target will be shown at the top of the floater, starting as yellow or magenta, and hopefully turning green when we reach or exceed the target.

The Autotune will consider the factors and try to tune subtle things such as avatar shadows first, before resorting to other measures. You'll see the Max Render Time slider zipping to and fro. If you are too ambitious, then Autotune will try its hardest and perhaps overshoot, then undershoot, and you'll be bouncing back and forth without settling. Pick something comfortable and within reach and you'll find the experience more rewarding.
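Conceptually (and only as a sketch of the idea, not the real implementation), the loop behaves something like this:

```python
# A toy feedback loop: if we are below the target, tighten the per-avatar render budget so
# more avatars lose shadows or become impostors; if comfortably above, relax it again.
def autotune_step(current_fps, target_fps, max_render_time, step=500):
    if current_fps < target_fps:
        return max(step, max_render_time - step)   # degrade avatars further
    if current_fps > target_fps * 1.1:
        return max_render_time + step              # give some quality back
    return max_render_time                         # close enough, hold steady
```

If the target is far out of reach, the budget keeps falling until everyone is an impostor; a modest, achievable target lets the loop settle, which is exactly why starting low works better.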

Autotune FPS - Best practice #2 Avatars only or Avatars first?

By default, the tuning strategy will be set to "Avatars and Scene"; this allows the engine to consider avatars first, and if it cannot get enough of a boost from the avatar tweaks, it will resort to scene-wide changes. Which of these you want depends very much on where you are and what you are doing.

If you are wandering in a scenic region and there are a number of people around then you might select "avatars only" to ensure that you keep the scene at the settings you like but allow the engine to degrade the avatar quality of others as you walk around. 

Autotune FPS - Best practice #3 Experiment with autotune settings (but don't forget that you did)

The "gear" icon on the autotune panel takes you to advanced options. some of these are rather obscure and I won't explain them in detail here, but feel free to experiment. What is the worst that can happen? It will change your settings and everything will look weird. Use the Firestorm graphics preference save/load options to store a setting to return to should that happen.

Have fun, I hope that this feature helps.

Most of all, I hope that through this new way of presenting what determines performance, you will not only be better able to manage your experience in SL but will also learn more about the impact we all have on one another's Second Life.

I hope to extend this feature with future releases and integrate it with the Linden Viewer so that a similar feature is available to everyone no matter what viewer they are using. What is more, the next few months should see some dramatic changes in the rendering performance going live in Second Life as Linden Lab has been working very hard on performance tuning. I hope to be able to adapt these tools to the "new normal", to provide more options and add more "intelligence" to the tuning.




Monday, 13 September 2021

Find me some body to love...benchmarking your lagatar.

This is essentially part 2 of the "why we need to get rid of the segmented bodies" blog.

Hypothesis - Mesh segmentation leads to significant rendering performance issues.

Before we start, just a heads up, this part is the data dump. It's all about the process of gathering data. As such it is somewhat less accessible than the last one. 

Still here? Enjoy.

A few months ago, I decided to quantify the real cost of sliced up bodies. Initially, I did some simple side-by-side tests in-world.

The first attempts were compelling but unsatisfactory. Using an alt, I ran some initial tests in a region in New Babbage that happened to be empty. I de-rendered everything, then had Beq TP in wearing my SLink Redux body. I recorded for a few minutes, sent her away, let things return to normal, then had Liz TP in wearing her Maitreya body.

The results were quite stark. Running with just my Alt alone (the baseline), I saw 105 FPS (remember, this is basically an empty region). With SLink Redux, it dipped a little then recovered to 104FPS. With Maitreya, it dropped to 93FPS.

So this was a good start, but I wanted something a bit more robust and repeatable. Moreover, I wanted to test the principle. This is not about pointing out "X body is good, and Y body is bad"; it is about demonstrating why design choices affect things.

I needed to test things rigorously and in isolation. This meant using a closed OpenSim grid where I had full control of any external events. It also meant I needed to get test meshes that behaved the same way as proprietary bodies. 

Testing proprietary bodies against one another is problematic. 

  1. They won't rez (typically). You need to get lots of friends with specific setups.
  2. If they did rez, they are mostly too complex for SL Animesh constraints (100K tris)
  3. Bodies vary in construction (# meshes, # triangles, with and without feet, etc.), making it less clear what is driving the results.
  4. Being proprietary, you can't test outside of SL either, of course, which means you are then exposed to SL randomness (people coming and going - I don't have the luxury of my own region) 

So I asked my partner Liz (polysail) to make me a custom mesh that we could adapt to our needs, and thus SpongeBlobSquareBoobs was born.



"SpongeBlob" is a headless rigged mesh body that consists of 110,000 triangles. Why 110K? It is the upper end of what can be uploaded into SL/OpenSim, given the face/triangle/vertex limits. Body triangle counts are harder to average because some have feet/hands attached; others do not. Another reason why we wanted to have a completely independent model.

The coloured panels shown in this photo are vertex colours (i.e. not textures) randomly assigned to each submesh. This picture is most likely the 192 mesh x 8 face "worst case" test model. We used no textures so that the texture bind cost was not part of this test (that's a different experiment for another day, perhaps)

The single most important fact to keep in mind when you read through this data is:

    Every single SpongeBlob is the same 110K triangles. They vary only by how they are sliced.

Apparatus and setup

So if SpongeBlob gives us the "Body Under Test" (BUT), what are we testing with?

Data Recording

The data is recorded using Tracy, a profiling tool available to anyone who can self-compile a viewer. It works by recording how long certain code sections take (much like the "fast timers" you see in the normal viewer's developer menu). This data gets streamed to a "data capture program" that runs locally (same machine or same LAN). The capture program or another visualiser tool can then be used to explore the data. I recorded things like the DrawCall time, though once we understand how the pipeline works, all we really need is the FPS, as I'll explain later, so you could use any FPS tool if you want to try this yourself in a simpler form.

Environment and noise control

The accuracy of the tests relies on removing as much noise as we can. We all know that SL framerates are jittery, so we do our best to stabilise things by removing as much untested noise as possible. 

To this end, I used an OpenSim system (I used the DreamGrid windows setup as it is extremely quick and easy to set up). With my own OpenSim grid, running on an older PC, I created a 256x256 region with no neighbours. This means I have an SL-like region size, and I have removed any potential viewer overhead of managing connections to multiple regions.

The region was left empty, no static scenery was used, meaning that the region rendering overhead was constrained pretty much to just the land, sea and sky.

Settings

The plan was to record using several different machines of varying capabilities, so I made sure to keep the settings as similar as possible across those. 

We are interested in the rendering costs of different body "configurations", and these are only comparable in the same context (i.e. on the same hardware). Still, we'd like to look for trends, similarities, and differences across different hardware setups, so I tried to ensure that I used the same core settings. The key ones are as follows:-

FPS limiting off - clearly...

Shadows (sun/moon & local) - This deliberately increases the render load and helps lift the results above the measurement jitter.

Midday - Are there implications if the shadows are longer? Let's avoid that by fixing the sun location.

Max-Nonimposters - unlimited. This ensures we don't impostor any of the tests.

ALM on - we want materials to be accounted for even though we are not using them. It ought not affect our results, really.

Camera view - I needed to ensure that I was rendering the same scene. To achieve this, I used a simple rezzing system that Liz and I have tweaked over time. It uses a simple HUD attachment on the observer that controls the camera. A controller "cube" sends a command to the HUD telling it where to position the camera and what direction to point in. 

Test Setup

Each test involves rezzing a fixed set of BUTs (16) in a small grid. These cycle through random animations. The controller cube that is used to position the camera is also responsible for rezzing the BUTs. Every time the cube is long-clicked, it will delete the existing BUTs and rez the next set.

Each avatar model is an Animesh. This full test cannot be run in SL due to the Second Life limit of 100K triangles. Using Animesh removes any other potential implications to the rendering caused by being an actual avatar agent.

This is a typical view being recorded.


Consistency and repeatability

It was important to remove as many errors as possible, so scripting things like the rezzing and camera made a lot of sense. We also made sure that the viewer was restarted between each test of a given BUT.

Tests were run for at least 5 minutes, and I would exclude the first 2 minutes to ensure that all the body parts had been downloaded, rezzed and cached as appropriate. There are implications to the slicing of bodies that alter the initial load and rendering time (you see this with the floating clouds of triangles when you TP to a busy store/region), but this is not what we are testing.
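A minimal sketch of how such a capture could be summarised, following the protocol above (discard the warm-up, then average what remains); 'samples' is a hypothetical list of (timestamp in seconds, frame time in ms) pairs from whichever FPS tool you use:

```python
# Drop the first two minutes of warm-up, then report mean frame time and the equivalent FPS.
def summarise(samples, warmup_s=120):
    frame_times = [ft for t, ft in samples if t >= warmup_s]
    mean_ft = sum(frame_times) / len(frame_times)
    return mean_ft, 1000.0 / mean_ft
```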

Hardware

Running the tests on a single machine tells us that the findings apply to that machine, and within reason, we can extend the conclusion across all machines in the same or similar class. But, of course, in Second Life, we have a wide range of machines and environments. So it was important to us to get as much data as we could. 

We thus ran the tests across various machines that we have access to. 
As a developer, most of my machines, even older ones, tend to have been "high end" in their day. So we should note that potential bias when drawing conclusions.

Here is the list of hardware tested along with the "Code names."


Methodology and Test Runs

Using the above setup, we would run through a specific set of configurations. Those were as follows.


The baseline test is simply an empty scene. Thus we establish the cost of rendering the world and any extraneous things; this includes any cost to having the observing avatar present.

You can see that every mesh has the same number of triangles but is split into more and more objects. Once we reach 192 objects, we continue scaling using multiple texture faces (thus creating submeshes). 

I will include in an appendix a test that shows the broad equivalence of submeshes versus actual meshes. There is no appreciable benefit to one as opposed to the other in these tests (there may be other implications we do not investigate)

By changing the number of meshes and faces, we are scaling up the number of submeshes that the pipeline has to deal with and thus the number of drawcalls. If you remember the analogy I gave in the first part of this blog, you'll recall that the hypothesis is that the process of parcelling up all the contextual information for drawing a set of triangles far outweighs the time spent processing the triangles alone.

If this hypothesis is correct, we will see a decline in FPS as the number of submeshes increases. As we reduce the number of triangles in each call, we also demonstrate that the number of triangles is not dominant. 

Results

So what did we find?

The first graph I will share is the outright FPS plotted against the total submeshes in the scene.



This graph tells us a few things. 
1) The raw compute power of a high-end machine such as the "beast" is very quickly cut down to size by the drawcall overhead.
2) The desktop machines, with their dedicated GPUs, have a similar profile.
3) The laptops, lacking a discrete, dedicated GPU, sit so low on the chart that they are hard to make out.

If we normalise the data by making the FPS a percentage of the baseline FPS for that machine, we will rescale vertically and hopefully have a clearer view of the lower end data.
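The normalisation itself is trivial; a sketch (with hypothetical numbers) of what was done to each machine's results:

```python
# Rescale each configuration's FPS to a percentage of that machine's empty-scene baseline.
def normalise(fps_by_config, baseline_fps):
    return {config: 100.0 * fps / baseline_fps for config, fps in fps_by_config.items()}

# e.g. a machine with a 105 FPS baseline that drops to 42 FPS in one test scores 40%.
print(normalise({'worst case': 42.0}, 105.0))
```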


This is very interesting (well, to a data nerd like me). 
We can see that the profiles of all the machines tested are similar, suggesting that the impact is across the board.
We can also see that the laptops continue to be segregated from the desktops. The impact of the drawcalls, while pronounced and clearly disruptive, is not as extreme as it is for the dedicated GPUs. This would seem to support the hypothesis that machines with onboard graphics are additionally penalised by the triangles, giving their graphs that vertical offset from the rest. As we have not explicitly measured this, we cannot draw too much from it, but there is clearly pressure on those less powerful machines.

What may be surprising to some and is certainly interesting is that all the desktops are impacted similarly. The shiny new RTX3070TI suffers just as much as the rather ancient GTX670. What we get is more headroom on the modern card. 

The next graph is really another interpretation of the same FPS data. Now, though, we are looking at the frame time as opposed to frames per second. To illustrate this: to achieve 25 FPS, we have a time budget of 1/25th of a second per frame. We tend to measure that in milliseconds (where a millisecond is 1/1000th of a second); thus, 25 FPS requires us to render one entire frame every 40 milliseconds (ms).
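The conversion is just a reciprocal; the same arithmetic as a tiny sketch:

```python
# Frames per second and per-frame time budget are reciprocals of one another.
def frame_time_ms(fps):
    return 1000.0 / fps

print(frame_time_ms(25))   # 40.0 ms per frame, as in the example above
print(frame_time_ms(60))   # ~16.7 ms per frame
```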



Here we can see the anticipated trend quite clearly. 

What did we expect?

If the cost of a drawcall dwarfs the cost of triangles, then every extra drawcall will add a more or less fixed cost to the overall frame time. If the triangle count were to have a stronger influence, we'd see more of a curve to the graphs as the influence of the triangles per draw call decreases along with their number.
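That hypothesis amounts to a simple linear model; here is a sketch of it with purely illustrative constants (these are not measured values):

```python
# Frame time = fixed baseline + per-drawcall overhead + (much smaller) per-triangle cost.
def predicted_frame_time_ms(n_submeshes, total_triangles,
                            base_ms=5.0, per_drawcall_ms=0.05, per_triangle_ms=0.000001):
    return base_ms + n_submeshes * per_drawcall_ms + total_triangles * per_triangle_ms

# With a fixed 16 x 110,000 triangles in view, only the submesh count moves the result:
for submeshes in (16, 16 * 24, 16 * 192 * 8):
    print(submeshes, round(predicted_frame_time_ms(submeshes, 16 * 110_000), 2))
```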

The drawcall is the dominant factor, though interestingly we see some curvature on the laptop plot.

The curve we see in "Liz's laptop" is initially convex; is this what we expected? Probably so. If the total drawcall cost is the time spent packing the triangles (T) plus the time spent on the rest of the drawcall overhead (D), then initially T+D is steep, but as T decreases and D remains more or less static, we go back to the linear pattern. We can also see a slight kink, suggesting that we may have a sweet spot for this machine where the number of triangles and the drawcall work together optimally.

We see other slight kinks in other graphs. We need to be careful of over-analysing, given the limited sample points along the horizontal axis and those error bars that show quite a high degree of variance in the laptop frames.

Conclusions

Let's use our table from the last blog to examine the typical mesh count for current bodies in use.
Body               | Total faces | Average visible faces | # times slower than best in class (higher is worse)
Maitreya Lara      | 304         | 230                   | 12.78
Legacy             | 1471        | 340                   | 18.89
Belleza Freya      | 1116        | 190                   | 10.56
SLink HG redux     | 149         | 30                    | 1.67
Inthium Kupra      | 83          | 18                    | 1.00
Signature Geralt   | 903         | 370                   | 8.22
Signature Gianni   | 1159        | 431                   | 9.58
Legacy Male        | 1046        | 174                   | 3.87
Belleza Jake       | 907         | 401                   | 8.91
Aesthetic          | 205         | 205                   | 4.56
SLink Physique BOM | 97          | 45                    | 1.00


The implication is clear. A body that has ten times the number of submeshes will take more or less ten times as long to render. However, we do not walk around as headless naked bodies (well, most of us don't - never say never in SL), so we need to be far more aware of the number of submeshes in everything we wear. After your body, the next biggest offender is very likely to be your hair. There are many, often very well known, makes of hair that have every lock of hair as a separate mesh.

We need proper, trusted guidance and tools.

Ultimately, there are choices to be made, and the biggest problem here is not the content; it is the lack of good advice on making that content. Where is the wiki page that explains to creators that every submesh that they make adds overhead? 

This is ultimately not the creators' fault; it comes back to the platform's responsibility, inadequate guidance and enforcement, and incorrect metrics (yes, ARC, I'm looking at you!).

Definitions:

BUT: Body Under Test, the specific configuration of our model that is subject to this test.

FPS: Frames Per Second, how many times per second a screen image is rendered. Too slow, and things are laggy and jittery. People get very wrapped up in how many FPS they should/could get. In reality, it depends on what you are doing. However, you'd like to be able to move about with relative smoothness.

Jitter/noise: These are different terms for essentially the same thing, inaccuracies in the measurements that cannot be corrected. Noise and Jitter are variances introduced by things outside of the actual measurement. FPS is a noisy metric, it varies quite wildly from frame to frame, but when we average it over a few, it becomes more stable. 

Appendix A: Is invisible mesh hurting my FPS?

I mentioned in the last blog that the concerns over invisible mesh were largely over-hyped, in large part due to an optimisation introduced by TPVs courtesy of Niran.

To test this, I set half of the faces of a 192x8 body to be transparent and ran a benchmark. I then ran the same benchmark with a 192x4 body. In theory, they should be practically the same.


Results: 

No, as we had hypothesised, there is no perceivable difference at this level between the two. As noted in the earlier blog, we are just measuring the direct rendering impact. There are other indirect impacts, but for now, they are a lesser concern.

Appendix B: Which is better, Separate meshes or multiple faces?

To test whether there was any clear benefit between breaking a mesh up into multiple faces or multiple objects, I ran benchmarks against three models that equated to the same number of submeshes passing through the pipeline: 96x2, 48x4 and 24x8.



Results:

As can be seen, there is no clear benefit. The raw numbers would suggest that the 96x2 was slightly slower. That would be plausible as there is an expectation of an object having a higher overhead in terms of metadata and other references, but two factors weaken this. 
1) The error bars - the variance in the measurements places the numbers close enough for there to be reasonable doubt over any outright difference. 
2) The 24x8 is slower than the 48x4. Once again, well within the observed variance, but it casts further doubt on any argument that there is a significant measurable difference. 

This may be something that I look at again to see if there is a more conclusive way of conducting this experiment. For the purposes of this blog, which is for determining whether the construction choices affect the overall performance, it is quite clear that it is the number of submeshes and not their organisation that is the driver.

Saturday, 11 September 2021

Why crowds cause lag, why "you" are to blame and how "we" can help fix it.

Everything is slow, and we're to blame...



OK, buckle up; this one is going to be a long one.....

We all know the deal, you go to a shopping event, and you wade through a tar pit of lag until you can click the "render friends only" button and remove all the avatars. 

More Avatars = More Lag

Why do avatars cause so much load?

If you look through the blogs and forums, you'll find that there is much conjecture over the causes, and you can easily find an "expert" who'll explain to you the problem of onion skin meshes, triangle counts, poor LODs, and invisible duplicate meshes to name but the most common. 

As with many myths and pseudo-scientific speculation, there is an air of plausibility, and often enough, a grain of truth. However, while many of these may contribute to lag, the hard experimental evidence points to something so large that it eclipses them all.

We'll examine each of these "usual suspects" as an appendix at the end of this post. For now, let's cut to the chase.

The number one cause of lag is...Alpha cuts

Alpha cuts? You know that nice little convenience feature, the one that lets you hide parts of your body? The ones where over the years, people have nagged and pushed for more and more detail in the alpha slices. Every one of those little areas is a "submesh", and (as I'll explain) these are the number one cause of avatar-induced lag without any shadow of a doubt. Until BOM, of course, it was not a convenience feature; it was the only way to alpha a mesh body. A requirement because of a shortcoming that dates back to the first use of mesh bodies. But these days, we have BOM, and for the majority of uses, this same alpha effect can be accomplished with more precision and far more efficiently using an alpha layer.


Why are these specifically such a problem?

Every mesh object in SL is made up of one or more "submeshes". Each submesh represents a texture face on that model (allowing it to be coloured, textured, or made invisible independently from the rest.)

A mesh object can have at most 8 texture faces (submeshes); after that, if we need more independent faces, we have to add another object and start to add faces to that, repeating until we are done. 

In the viewer rendering pipeline, every submesh results in a separate package of data (known as a drawcall) being sent to the GPU (this parcel of goodies gets unwrapped by the GPU and used to draw part of the final image that will appear on your screen).
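As a rough sketch (my own illustration, not viewer code), the number of drawcalls an item generates is essentially its total count of visible texture faces across all of its mesh objects:

```python
# Each visible texture face (submesh) is one drawcall; each object holds at most 8 faces.
def drawcalls(visible_faces_per_object):
    return sum(visible_faces_per_object)

# A body sliced into 24 objects of 8 faces each, versus a single-object, 8-face BOM body.
print(drawcalls([8] * 24))  # 192 drawcalls
print(drawcalls([8]))       # 8 drawcalls
```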

This is important because "drawcalls" have a substantial overhead. You can picture it like this.

We have a production line in our living room, making little sets of triangles and sending them off to our client (the GPU). We cheerfully pack triangles into a box, along with all the necessary paint and decorations needed to make them look pretty, wrap them up securely, put a bow on top, walk to the post box, and pop the box in the mail to the GPU. 

How long does this take us?

If we break this process down, we find that packing the triangles themselves is remarkably quick; we can pack 10,000 triangles rapidly (for illustration, we'll say 5 seconds). But packing it into the box with all the other paraphernalia and walking to the post box with it takes an awful lot longer (let's say 5 minutes just for this illustration). In fact, it takes so much longer that it doesn't really matter how many triangles we are cramming into the boxes; the time spent dispatching them will dwarf it.

If we have a mesh body of 110,000 triangles to send to the GPU, placing it in one large box will take us:
11 x 5 seconds to pack the triangles into a box = 55 seconds (let's call it 1 minute)
1 x 5 minutes to send that box = 5 minutes
The total time to send our body is 6 minutes.

If instead, we chop up the body and send it out in 220 separate parts:
11 x 5 seconds to pack the triangles = 55 seconds (it is the exact same number of triangles)
220 x 5 minutes = 1100 minutes
Total time to send our body is now 1101 minutes, or 18 hours and 21 minutes.
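For anyone who wants to play with the numbers, here is the same back-of-envelope sum in code (the timings are the illustrative figures from the analogy, not real measurements):

```python
# 5 seconds to pack each 10,000 triangles; 5 minutes to dispatch each parcel (illustrative).
def total_minutes(triangles, parcels, pack_s_per_10k=5, send_min_per_parcel=5):
    packing = (triangles / 10_000) * pack_s_per_10k / 60   # packing time in minutes
    return packing + parcels * send_min_per_parcel

print(total_minutes(110_000, 1))    # ~5.9 minutes for one big box
print(total_minutes(110_000, 220))  # ~1100.9 minutes, roughly 18 hours 21 minutes
```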

That mesh body your avatar is wearing is very likely to be in the render-time equivalent of hours rather than minutes. We'll be looking at some "real" numbers next time.

To put this in context, here are some "typical" numbers collected in an in-world survey, jumping around places in SL. 

Body               | Total faces | Average visible faces | # times slower than best in class (higher is worse)
Maitreya Lara      | 304         | 230                   | 12.78
Legacy             | 1471        | 340                   | 18.89
Belleza Freya      | 1116        | 190                   | 10.56
SLink HG redux     | 149         | 30                    | 1.67
Inthium Kupra      | 83          | 18                    | 1.00
Signature Geralt   | 903         | 370                   | 8.22
Signature Gianni   | 1159        | 431                   | 9.58
Legacy Male        | 1046        | 174                   | 3.87
Belleza Jake       | 907         | 401                   | 8.91
Aesthetic          | 205         | 205                   | 4.56
SLink Ph. Male BOM | 97          | 45                    | 1.00

Notes:
I quote the average number of visible faces instead of the total faces because the drawcall cost of fully transparent "submeshes" is avoided in almost all viewers. This number varies depending on the outfits worn, so it is only fair to judge by the "typical" number visible during our sampling. 
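The "# times slower" column follows directly from those visible-face counts; a sketch of the arithmetic, assuming the best-in-class figures from the table (18 visible faces for the female bodies, 45 for the male ones):

```python
# Relative cost = average visible faces / best-in-class average visible faces.
def times_slower(avg_visible_faces, best_in_class):
    return round(avg_visible_faces / best_in_class, 2)

print(times_slower(230, 18))  # Maitreya Lara -> 12.78
print(times_slower(401, 45))  # Belleza Jake  -> 8.91
```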

As you can see, we are all paying a remarkably high price for the convenience of alpha slicing.

Bodies are, of course, the number one offender, closely followed by certain brands/styles of hair and heads.

Don't blame the body makers.

Let's be clear why we have ended up this way. It is the lack of a clear understanding of how bad this was. In this ignorance, many designers, body makers, and most importantly, you and I, the customers, have not only let this continue, but we have also encouraged it. Demanding more cuts, more flexibility. I've seen blog posts and group messages written out of complete ignorance, asserting that "[body creators] who have forced their users to adopt BOM and removed alpha cuts" had "got it wrong". 

No, they got it right; this was entirely what was hoped for; it's just that people were not ready for it, and arguably the tooling and support were not either.

So what now? How do I get a lower lag body? I like my wardrobe 

Write to the CSR of your favourite body and ask for a cut-free edition. 

Let's face facts. We all like our wardrobes, and we don't like to give that up, and we don't necessarily need to. The same body, the same weights, can work without the cuts. None of your clothing goes to waste, though you will need alpha masks to take the place of those HUD-based alphas and the "auto-alpha" scripts. 

If enough of us ask for an efficient version, then one of two things will happen:
EITHER:
The body makers you contact will provide uncut versions alongside the cut versions and share some of the knowledge as to why the uncut ones are better. 

I sincerely hope that they will; in the end, this ought to be less painful for them - they have to spend a lot of time slicing those models up and tweaking things.

OR:
New bodies will fill the gap for performance, and slowly people will move over. We already have two female cut-free bodies for those willing to move. 

There is, of course, a second part to this. Those lower-lag bodies need alpha layers, and clothing creators need to start offering them; with the recent success of the Kupra body this is finally happening anyway, and over time it will help make clothes more flexible, as they will no longer have to "cut the cloth" to where the alpha cuts are. For older clothes, it is worth noting that many outfits do not need an alpha at all; many more can work with standard off-the-shelf alphas. So all those old outfits are probably fine. What is more, if you place the alpha layers into the outfit folder with the clothing, then they will be automatically worn and replaced.

Finally, remember, this is not do-or-die. We can still choose to wear an older laggy body when we want a specific older outfit that we haven't managed to get an alpha for and for which standard alphas do not work.

This is all very well, Beq, but what about X?

There are undoubtedly a bunch of you going, yep, but my XYZ outfit needs blah and what about this corner case over here?

None of those are going away. You will still have the choices. If there is an apparent reason why you need to have a more complex body, then nobody is stopping you. All I am doing here is waving a red flag to highlight just how much damage this "convenience" function is doing. 

For example, time and again people raise the "oh, but I must have onions cos BOM has no materials" argument; the appendix talks about this. If you need onion skins, you can have them, but if you have them on a 12-mesh body, you'll now have 24 meshes; if you have them on a 240-mesh body, you'll now have 480... kill the cuts. After that, I'll still moan about onion skins, but it'll have far fewer teeth :-)

If we can move to a saner "normal" where bodies aren't made up of hundreds of tiny fragments, then we can all afford to have extra onion skins for our materials, etc., without breaking the bank.

Can't the viewer make this better?

Not entirely, no. In the course of investigating this, I have identified a few things that can be optimised. But, even if I make the rendering cost of tiny meshes more efficient, all we do is move the scale a little; things would still be 10 or more times slower, and more importantly, we are still wasting time doing a largely pointless activity.

People who know me will have heard me mutter, "The fastest way to do an unnecessary task is to avoid doing it at all", when looking at optimisation. In fact, an amused friend recently sent me a link to this exact engineering philosophy coming up in a SpaceX interview.

In the long term, the picture does change a bit. One day, we are promised that the viewer will migrate to a more modern pipeline; when we reach that point, the overhead of these so-called "draw calls" will be diminished, and the argument may flip back towards total triangles, etc. 

However, that is not yet. In fact, that is likely to be a good couple of years away, and keep in mind that some people do not even have a machine that can run a modern rendering pipeline. 

To put it bluntly, we can all act now, me, you and your friends, to fix this problem and move to a far saner world where we can go to large events without having to disable all our settings. Or we can wait and hope things get better naturally, before everyone abandons SL for being a laggy swamp and goes someplace else.

Finally, there are a few "nice to haves" that we could really do with a mix of viewer features and server-side features. 

1) An easier way to make alphas; this is something I have proposed to the lab in a Jira feature request but is not currently on the "book of work".

2) We also need a few UI/LSL tweaks to allow HUDs to wear/remove alphas without things like RLV. 

3) Perhaps this is really #1. We need solid, reliable tools that tell us, both as creators and most importantly as users, whether an item is efficient or not.

None of these is required for things to start moving forward, but at the same time, all of these "nice to haves" are entirely doable. Nothing here is out of reach.

Come on then, Beq, prove your point.

OK, so there's a lot of blame being assigned here, mostly to ourselves as users, so I'd better have a good argument to back this up... I think I do, so before you flame me in the comments, let me show you why. My next post will share some empirical data gathered through weeks of performance testing that supports my arguments; I'll show you the extensive testing done and explain how you might try to do something similar to see it for yourself.

UPDATE: Part 2 is now online - warning, statistics ahead: 9/10 of you will be bored, the other half will love it.

Appendix

But what about ...?

Here's a quick appraisal of the typical perspective on lag. The high-level summary is that most of these are valid observations and identify some form of inefficiency. Whether they contribute to FPS loss is another question, and until we get rid of the massive issue around draw calls, their effect is moot.

The poor LODs and poor LOD swapping:

There are several issues here that get conflated, and while each is a valid problem, they do not, on the whole, impact your rendering performance (hear me out on this).

For a long time now, I have been ranting about how poor LODs affect the quality of our lives in SL. This has not changed; moreover, the bugs that affect Rigged mesh rendering mean that they rarely get shown even when a creator has provided good LODs. 

So that's two issues, 
1) Creators do not provide proper LODs 
2) The LODs are not shown. 

Number 1 is moot if number 2 is not fixed, and number 2 is not being fixed by the lab because of number 1. That is to say, if we made rigged mesh behave as it "should" (thus fixing point 2), then your clothes would vanish very quickly because of point 1, which leads to grumpy people.

We are, as they say, between a rock and a hard place...

But does it affect the FPS? Yes, to some extent. If you have a high-detail LOD that is being drawn at a distance where most of the triangles resolve to a small number of pixels, then the GPU has to shade each vertex, and this can mean that a pixel in the final render is shaded more than once; this is known as "overdraw". It means that your GPU is doing a lot more work than it needs to. Simply put, if every pixel on your screen was shaded once and then had to be done again, then clearly it takes twice as long as shading just once would. This is a great example of a GPU bottleneck, which we rarely see in SL due to the draw call problems that dwarf everything else. These are real problems; they are just hidden from view right now.

So it's those feet, those invisible feet!!!?

Or increasingly "OMG those lag inducing multi-style hairs". 

You'll see this on many blogs that discuss complex bodies and the impact of rendering, and to be fair, this is entirely plausible, and I have believed it myself. However, tests prove otherwise. 

Due to limitations in the Bento body, we do not have usable toe bones nor the morph targets that would allow us to distort the foot meshes to fit our shoes. As a result, we have multi-pose feet on our bodies. 

Once upon a time, I used to wear SLink single-pose feet; I would wear the flat feet in a sandals outfit and the high feet in a high-heels outfit, etc. At some point, market pressure for convenience won out, and body makers started to package all the feet together. Now, when I am wearing my feet, the mesh will be many feet, all bundled up. If I have 6 poses for my feet, then 5 of these will be fully transparent, leaving just the one set visible.



We also see a similar trend in hair; rigged hairs with "style HUDs" that allow us to alter the appearance.

I am as guilty as anyone of thinking that this causes significant (wasted) effort and thus lag in the viewer. Undoubtedly, there is overhead; let's not ignore that all of this data has to be downloaded, unpacked, and held in RAM... This is all wasteful, BUT it is not a significant contributor to the rendering lag, which is what we are focused upon today. This is primarily because most, if not all, viewers now have an optimisation in place that prevents the rendering of fully transparent meshes (thanks, I believe, to NiranV Dean's nifty optimisation).

A problem for the future? Probably. A serious issue right now... No, there are far bigger fish to fry.

Is it Triangle counts?

OK, this is the high-poly elephant in the room, I guess. 
"OMG, XYZ body is 500,000 triangles. What a nightmare, no wonder it lags me out."

This, as it turns out, is highly subjective, and while triangles ultimately do impose a certain load, the triangle count does not, at the present time, matter anywhere near as much as it should.

There are so many uninformed, technically inept, or just plain lazy creators out there who throw absurdly dense meshes into SL* and as with the poor LODs above, they are undoubtedly a source of lag and of load, causing overdraw and unnecessary, or at least inappropriate, data storage and transfer. This is predominantly going to manifest as GPU load, with a side-helping of RAM pressure and cache misses.

Thus, the full answer is more subtle and depends a little on your computer. In general, if you have a dedicated graphics card (GPU), then the chances are it doesn't matter that much in terms of FPS right now. Remember our box packing antics? We need to pack an awful lot of triangles before it starts to approach the cost of dispatching the box. If you have a machine that relies on the onboard graphics, then the story is slightly different, but even then, the number of triangles (within reason) is not the number one cause of lag. 

I'll illustrate this more clearly when we get to the benchmarks and numbers (which may be tomorrow's post)


* In their defence, creators (clothing creators in particular) are under pressure to deliver new content at a high rate and for surprisingly low returns (for many). If you consider the amount of real-world time that has to go into producing an item and then consider the Second Life price and shelf life, you can start to appreciate that corners are cut to meet demands. So, once again, we are in part to blame; we, the consumers, do, after all, create the marketplace and the demand. Far too many of us accept inefficient content, and perhaps more importantly, we have only limited tools to identify well-made, efficient content.


Is it Onion skin meshes?

"So.. it's those damnable onion skins, I knew it." 
Well, yes and no. The onion skins are indeed one of the contributors, but they are to some extent guilty by association with the actual FPS murderer. Every onion skin is another set of draw calls: each onion layer is a complete copy of the skin, so it doubles the number of drawcalls. But doubling 6 meshes into 12 is nothing like as much of a problem as doubling 60 into 120, or 120 into 240. For every onion layer you add to a body made of N meshes, you add another N drawcalls.
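To make that scaling concrete, a tiny sketch of the arithmetic:

```python
# Each onion layer adds another full copy of the body's submeshes (drawcalls).
def drawcalls_with_layers(base_submeshes, onion_layers):
    return base_submeshes * (1 + onion_layers)

print(drawcalls_with_layers(6, 1))    # 12 - barely noticeable
print(drawcalls_with_layers(240, 1))  # 480 - a serious problem
```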

It is for this reason that people who moan at me, "Oh, but we must support materials, and we need layers", get an unconcerned shrug. If we lived in a world where all our bodies were 10 meshes, then if some people needed 20, that would be fine; you are not going to notice, especially when most people will have them empty. Moreover, you could have onion layers as an optional wearable, thus avoiding the cost for the majority of us and giving optionality to the others. Ironically, Maitreya detached the onion layers in just this manner and got nothing but grief for it!

End of Appendix.