Thursday 17 November 2022

Alpha blend issues? Get them sorted.



OK, so I'll accept that the title is a little clickbaity, but bear with me; what I am going to explain here will make managing your rigged mesh alpha a lot more predictable.

Making it easier to avoid alpha-clashes with Second Life and OpenSim outfits

No doubt we've all stumbled across the issue of alpha blend items clashing with one another. Most of us just accept the shortcoming with grumbles, but creators have tried with varying levels of success to navigate their way through this to minimise the alpha issues.

Recent updates have enabled us to use a more deliberate and robust way to achieve this. 

The aim is to move away from the left-hand "Before" image and achieve the right-hand "After" image more reliably.


What's the big deal, Beq?

In the past, there was an implied ordering that was somewhat inconsistent or, at the very least, undocumented and thus fragile, meaning it could well break in the future.

When Linden Lab was developing the performance improvements recently delivered in the viewer, there came a point where a choice had to be made in order to enable alpha-blend rigged meshes to be batched under more circumstances. This choice basically boiled down to ensuring that rigged mesh items were sorted, like everything else. The problem was that sorting them caused numerous things to break. Ultimately, what was deemed the "least damaging" sort order was settled upon. This sort order effectively set in stone the implicit ordering that we had before, but it does have some gotchas that we'll come to.

What does the ordering mean?

In the "before image" above and at the top of the blog, we can see that the alpha of the hair is not correctly interacting with the dresses underneath it. This is because the ordering is incorrect. when the hair is being drawn the dress has not yet been drawn, and as a result, only my bare skin shows as being "underneath". 

By ordering the attachments explicitly, we can ensure that the hair always appears to be on top and thus that any items worn closer to the skin will "render as if under" the hair.

OK, so how does this new thing work?

An excellent summary of the new sort order posted "anonymously" can be found on Tumblr. This summary was initially put together by a well-known hair creator who then discussed this with another creator and me; they have decided to remain anonymous as they don't want to become the focus of all the questions that will ultimately come of this and, of course, the anger if it should ever stop being true.

Before publicising the new rules too widely, I suggested that we wait until I could confirm that Linden Lab (specifically Runitai Linden, the rendering team lead) were happy that what we were publishing was indeed something they (and thus we, i.e. Firestorm) would look to maintain going forward. 

I specifically want us to be able to move on from creating items that rely on folklore towards robust rules that are guaranteed at least to the point where any future breaking changes would be obvious to LL and allow them to give us fair warning. 

Here is the link to the Tumblr post itself. It is a great concise read and includes a great example of fragile folklore that has been depended upon in the past.

You used to be able to fix this by using bump and specular maps to assign priorities

While the above quote may be true, it was not something that was deliberately defined by anyone and thus is not something anyone should be relying upon. When such things break, we often have to duck and dive through all kinds of weird paths in the code to help fix someone's favourite little hack. The result is bloated, slower code that is extremely prone to breakage (fragile) and may well impose severe limitations on future development. This is not good for anyone.

So let's have a quick look at the post and the list.

"There are two things that decide the priority: the attachment point, and the root prim transparency.

Let me explain. LL recently added priority to each attachment point, here is a breakdown of what priority each bone has. Please note that the lower the number, the higher the priority:"

This ordering is actually defined in the avatar_lad.xml file that is distributed with the viewer. Each attachment point entry has a numeric ID specified, and it is this that defines the sort order. 
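For illustration, an attachment point entry in avatar_lad.xml looks roughly like this (most attributes trimmed for brevity; open the copy in your own viewer install to see the full entries):

```xml
<!-- Illustrative, abbreviated entries in the style of avatar_lad.xml.   -->
<!-- The numeric "id" is what now drives the rigged alpha sort priority. -->
<attachment_point id="2"  name="Skull" joint="mHead" />
<attachment_point id="12" name="Chin"  joint="mHead" />
```

The Skull's lower id (2) is why an item worn there sorts at a higher priority than one worn on the Chin (12).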

The list above includes all the attachment points for completeness. Keep in mind though that the HUD points don't really count as nobody else will see the attachments on those.

If you would like this as a notecard, then I have placed it in a box in my "factory/shop".

SLURL to Alpha Rigged Attachment Order List

As the blog notes, this means that a rigged mesh hair attached to the skull will render as if it is on top of a mesh head that is attached to the chin attachment point. Creators can now use this ordering to give the most logical layering of alpha items, whilst end-users can tweak their outfits to avoid clashes by changing the attachment point. 

Caveats

There are a couple of issues that need to be highlighted here. 

1) Make sure your root prim is at least partially transparent.

This first point is important at the moment and is one of the primary sources of "content breaks" on the latest viewers. It alludes to the fact that the above ordering is enforced only for items that share the same root prim alpha-blend state. Rigged items with a semi- or fully-transparent root prim will prioritise against one another as above; meanwhile, items with a solid root prim will similarly sort amongst themselves. However, those with a solid root prim will always draw at a lower priority than (render as if behind/underneath) those items with transparency on the root prim.

The reason behind this anomaly is deep down in the technicalities of the rendering pipeline. While it would be preferable for all of us to have consistent behaviour that was not affected by this weird and arbitrary anomaly, changing it is not easily achieved without risking many other breakages; for the time being, creators are advised to use transparency on their root prim (which is frequently a plain block). Because the majority of items already seem to have transparency, and moreover there are cases where the item cannot work without it (because the root prim cannot be buried inside the body/head, for example), the strong recommendation is that creators converge on the standard of ensuring that the root prim is at least partially transparent.
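If you prefer to see the two rules written down as logic, here is a minimal sketch of the effective ordering. The types and field names are invented for the example, and this is emphatically not the actual viewer code; note also the caveat below that the root prim part of the rule is not guaranteed.

```cpp
// Illustrative only: invented types and field names, not the viewer's real code.
struct RiggedAlphaItem {
    bool rootPrimHasAlpha;  // is the root prim at least partially transparent?
    int  attachPointId;     // numeric id from avatar_lad.xml (lower = higher priority)
};

// Returns true if 'a' is drawn before 'b' (i.e. 'a' renders as if underneath 'b').
bool drawsBefore(const RiggedAlphaItem& a, const RiggedAlphaItem& b)
{
    // Rule 1: solid root prims always draw first (render behind transparent-root items).
    if (a.rootPrimHasAlpha != b.rootPrimHasAlpha)
        return !a.rootPrimHasAlpha;
    // Rule 2: within the same group, higher attachment ids draw first,
    // so the lower (higher priority) id ends up on top.
    return a.attachPointId > b.attachPointId;
}
```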

Important: While the attachment point priority changes have been confirmed as an intentional change and should stick around going forward, the root prim transparency part is not confirmed. This means that it might not remain true in the future. 

There is a hidden performance trick in here too.

Further to making your products behave nicely with others, I would advise anyone using a hidden/buried root prim to make that prim fully transparent; doing so will remove that prim from the rendering overhead and reduce the overall cost of rendering the attachment. Another performance win. It will also ensure that should a user want to attach to a non-standard attachment point, they won't have your logo box poking out of their face.

2) Make a copy if you need an item to be worn in different places with different outfits.

The final caveat is perhaps more subtle as it only applies if you have an item that is worn with many different outfits and might have a different priority within those. You may have noticed in the past that when you attach an item to a specific attachment point, it will remember that change, so every time you wear it, it will return to the same point. 

This means that if you have saved 5 outfits with "My little black dress" at priority 17, but for some reason need to change it to priority 20 for a new outfit, then the next time you wear one of the original outfits the dress will have moved to the most recent priority. The simplest way to avoid this is to make a copy and consider renaming it with the priority number.



 

Tuesday 20 September 2022

Announcing Local Mesh

Update: January 2023 - Local Mesh is now part of the Firestorm release.

I've been variously dreaming of and promising local mesh for a few years now; it has always been near the top of the TODO list without making it there. Now, finally, and due primarily to the efforts of Vaalith Jinn, who wrote the underlying mechanics for local mesh, we are ready to unveil local mesh to the creator community. If Vaalith's name has a familiar ring to it, that may be because Vaalith was the originator of the local textures feature we've all enjoyed for years.

Local Mesh - What on earth are you talking 'bout Beq?

Local Mesh is to mesh creators what local textures are to all you designers and decorators. Put simply, it allows you to preview a mesh from the Collada file, inworld, where you can texture it and even wear it before uploading it for other users to enjoy. 

The next version of Firestorm (coming very soon under our new experimental accelerated release model) will have a "local mesh" option on the build menu. I should note here that this is very much a "first pass": while it has been tested to be functionally stable, there are many rough edges in terms of usability that we hope to address in further work, but we were excited to get this into the hands of our creators sooner rather than later so that we can also incorporate feedback into that future direction.

Oh! I see, cool, how do I use it?

Selecting the "local mesh" option will pop up a "floater" through which all your local mesh interactions take place. Similarly to the local texture workflow, you create a list of "local" resources that the viewer will monitor. Use the [Add] button to import a DAE as you would normally; if you use properly named files the LODs will be loaded automatically too. 


Selecting a loaded mesh from the list of local meshes while simultaneously editing a full perm mesh "surrogate" will allow you to apply the parameters to the surrogate, making the mesh appear in your viewer. Onlookers will see only the original. You can also use the "rez selected" option to create a "surrogate" in-world and automatically "fill it" with your local mesh goodness.

What about rigged mesh? Bet that doesn't work...

Sorry, you just lost your beer money. Rigged mesh is fully supported, with the only limitation being the need to have your own surrogate mesh (it can be any full perm mesh created by you, in fact, I often use a simple mesh cube). Attach the object and make sure it is open for editing (click on it inworld or through "edit" on the worn tab in your inventory), then just apply your local meshes. 


And yes, it even works with Animesh.

How do I update my local meshes?

The real value of local mesh is that we can now edit these and update the DAEs and quickly see the results in-world. Unlike with local textures, we do not at present auto-reload meshes; this is because the complexity of loading a Collada file is far higher than that of a simple texture, and for this first release it is not something we have supported. Instead, we have a refresh button that will scan all of the locally loaded mesh assets and refresh any that have changed since the last refresh.
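Conceptually, the refresh only needs to compare file modification times against the time each mesh was last loaded; a simplified sketch of the idea (invented names, not the actual Firestorm code) might look like this:

```cpp
#include <filesystem>
#include <string>
#include <vector>

// Illustrative sketch only: invented names, not the actual Firestorm implementation.
struct LocalMeshEntry {
    std::string dae_path;
    std::filesystem::file_time_type last_loaded;
};

void refreshLocalMeshes(std::vector<LocalMeshEntry>& entries)
{
    for (auto& entry : entries) {
        auto mtime = std::filesystem::last_write_time(entry.dae_path);
        if (mtime > entry.last_loaded) {
            // reloadCollada() stands in for the (expensive) DAE re-parse and re-apply.
            // reloadCollada(entry);
            entry.last_loaded = mtime;
        }
    }
}
```

The expensive part is the Collada parse itself, which is why only files whose timestamps have moved on get re-loaded.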

So are these just the same as real uploads?

In most regards, yes. You can apply textures, either from inventory or using local textures; this includes normal maps and specular maps too.


You can resize them (except for rigged meshes) and move them.

But... keep in mind that what you are really scaling/texturing is the surrogate object; once all this is over, that object will lose the mesh override, but all the other changes will remain.

One thing you cannot do at present is link them.

OK, and this is just visible to me right?

Yes, only you will be able to see the meshes; other users will see only the normal appearance of the surrogate object (including texture changes etc). This highlights another of the rough edges of this first release. The current "Rez" tool creates a surrogate prim that will appear to other users (and yourself after a relog) as a small flat panel. In a future update, I plan to have a more concrete and visible placeholder object and ideally a way to rez directly as an attachment too.

Here is what Liz could see while I was taking these photos.

I want it now... 

KK, I hear you. It's coming real soon. The new, faster Firestorm release cycle should see this available to you in the next week or so.

As of January 2023, Local Mesh is available in the official Firestorm Release. If you are on a version prior to 6.6.8, then you will need to update. 

And a final note with some background

So Beq, if you wanted this so long, why didn't it happen before?

The primary reason for this is that it requires a lot of sustained focus to get from the initial proof of concept to something usable, and as a solitary volunteer developer working in my spare time, there are always distractions, commitments and other more urgent things to address.

The concept behind local mesh is simple enough once you know how mesh assets work in the viewer; you just have to create a placeholder "surrogate" and locally "implant" the mesh into that surrogate prim. However, as soon as you step beyond the bare basics, all the corner cases jump out and demand attention. How do we rez them? Where do we rez them? Where in the workflow should they appear? What happens if there are errors in the mesh? How do linksets work? The whole idea space gets crowded, and then a bug report comes in, or something needs updating on one of my other responsibilities, and the whole thing gets put to one side until "later".

What it needed was someone willing to do much of that initial graft, to not get distracted by the complexities of what might be, and to just focus on a minimum viable implementation (MVP, yeah, I know, MVI, but whatever) with no scope creep and, importantly, as few other distractions as possible. Early this year, that person appeared in the form of Vaalith Jinn. Vaalith (as noted above) is not entirely new to viewer development, having previously contributed the much-loved "local texture" feature. When Vaalith said that they were working on local mesh, I was excited; we discussed our shared vision, and where those visions differed we came to compromises and agreed on what the MVP should look like. With occasional back-and-forth discussions, the code grew to a point where, a month or so back, Vaalith presented the initial version for me to integrate. I have since tinkered with it a little to make the user interface a little more standard to FS and to incorporate a basic but functional "rez" function to place a local mesh in-world. We are now ready and excited to unleash this.


Thursday 21 April 2022

How to manage your FPS for events

 Fantasy Faire is back, so how can we optimise our experience?



Fantasy Faire is one of the most popular annual events on the grid. Featuring stunningly beautiful regions packed with retail delights, alongside art and entertainment events, it also draws massive crowds. This heady concoction does not bode well for frame rates. In this blog, I'll go over a few tricks that we can use to improve our performance and enjoyment.

The challenge posed by the Faire is manifold: we want the best of everything, we want to see the scenery at its best, and, unlike more mundane shopping events, you often want to see other people. "Watching the Faire folk" is a spectator sport frequently enjoyed by Fairegoers. Observing the visitors in their fun avatars and outfits is as much a part of the Faire as shopping and supporting the charity. So while, for a typical shopping event, I might suggest disabling avatars entirely or using the "Show Friends Only" function, for the Faire we want to keep as many people visible as possible.

At the same time, we have all these gorgeous worlds that have been built (I can say that without reservation this year as I have taken a year off ;-) ). This means that we want to keep as many features enabled as possible. A perfect storm for lag, and a conundrum for us.

Step 1: Getting yourself ready...

During this blog we'll come back to the new performance floater a number of times, so take a moment to locate it. You can find it on the "World" menu; it is also on the "Advanced" menu, as a sub-menu of "Performance tools". In both cases it is listed as "Improve Graphics Speed". You can also add it to any of your toolbars by opening the Toolbar window (right-click on any toolbar and select "Toolbar buttons..."), then picking the "Graphics Speed" button and dragging it to the toolbar you want it on.


While we are looking at the floater, take a moment to note the top few lines. The FPS needs no explanation really, but the summary line below that is going to help us. The frametime is how long each frame is taking, measured in milliseconds (a millisecond is 1/1000th of a second). It should be approximately 1000 divided by the FPS; for example, at 40 FPS each frame takes about 25ms.

Look at the UI number. Notice that if you have chat windows and inventory windows open, then you can increase your FPS by closing them; even minimising them will help.

HUDs, on the other hand, have a misleadingly low impact on the FPS. For the most part, they have a comparatively low direct, measurable impact, but if you have the gorgeous, texture-heavy HUDs that come with your head and your body, or any other glossy-looking HUDs, then remember that those textures are taking up space in your computer's memory and that of your graphics card. This means less space for everything else, and it increases the amount of "shuffling around" that has to happen.

As such, best practice is to remove all the HUDs that you can; this also has the benefit of reducing the script load on the regions, which are already heavily taxed managing countless vendors and other scripts.

Use Firestorm's "favourite wearables" capability to get quick and easy access to those HUDs you need on a more frequent basis, e.g. I keep my SLink HUDs, my Lelutka head HUDs, my AO and others all there for quick access.



You can also hide all HUDs using "Alt-shift-H", though keep in mind that this does not stop their scripts which will be adding to the server load.

Step 2: Set a good example and de-lag yourself.

Before we depart for the Faire, we should do as we would like others to do and reduce our impact on their Faire experience. This means reviewing our outfit(s).

Tip #1 - Try a new look and ditch that laggy body for your visit.

Leave your Maitreya/Legacy/eBody/Signature bodies off until you need them and try a fun fantasy look for a change. This will make you less of a burden on other shoppers in the crowds.

Alpha segmented bodies are a disaster for frame times. Pick a low impact body, and why not use the occasion to get into the theme? Many fantasy avatars have a very small performance impact by comparison.

By way of example, here is my normal avatar, which already uses a low lag SLink Redux body, switching to the fun paperfriends steampunk avatar I bought at the Faire last year.


https://gyazo.com/b1f55c19a6f279a285e4d7548ecdd99d

Even though my SLink body is about 1/8th the render time of a typical "popular" body, this fun foldable me takes just 15 microseconds to render, which means I could have 20+ of these avatars around me for less than the cost of a single eBody reborn. 

But, it's not just these extreme low lag avatars; here is a friend demonstrating that size does not matter. This enormous dragon avatar is a fraction of the cost of her regular body choice.


Tip #2 - Use the outfits function in the viewer to set up one or more appearances. 

"But I want to shop for my normal body" I hear you cry. 

If your typical avatar appearance includes one of the popular but very laggy mesh bodies, then consider creating an outfit for this, and one for the low lag fantasy avatars; this will allow you to switch whenever you are trying on your purchases.

Be realistic but be considerate too. Nobody is expecting everyone to just suddenly stop using these poor performing bodies overnight; we're all too attached (no pun intended) to our wardrobes and bodies and we all want to show off our new purchases, but take a moment to review your own render times and if you get the chance, switch to the low lag options and have some unencumbered fun.

Tip #3  - Ignore ARC - it is misleading, wrong and counter-productive.

When assembling your outfits for the event, ignore the ARC and push aside your preconceptions.

The dragon image above is a great example. The mesh body is 5 times slower to render than the entire dragon, yet the ARC would have you think otherwise. The ARC is just wrong; it is based upon outdated and incorrect assumptions.


So if ARC is bad then what do we use? 

With the latest Firestorm, we have provided a better tool to assess your true impact: the render time. We'll use this to determine which of our attachments are causing the most lag, and we can decide whether to keep them or swap them.

Focus your camera on yourself so that the viewer is showing your avatar, something similar to the photos of me above; then use the "attachments view" of the performance floater to see which of your attachments is the slowest to draw. Try variations, especially with hair, and see how things perform. You may be surprised. As a rule, the worst offenders are typically your body or your hair, or both.

Tip #4 - Make the best of a bad thing

If you are using a segmented mesh body like Maitreya/Legacy/eBody etc. use the HUD to turn off as much of the body as possible; anything that is covered over should be disabled, auto alpha often does a good job but see if you can get rid of more. Every segment that you can set to invisible is a boost in performance for everyone looking at you.

If you are tempted to wear particles, don't, just don't; remove them immediately. They are a considerable burden to rendering. They won't, however, show up in the stats in the current version of the tool, as I have not yet isolated worn particles from those in-world; so while you will not see the item render time change, you will see the FPS drop.

You might want to turn down the max particles setting in the graphics preferences too, but don't forget that particles are often used for effects in the regions.

OK! We've optimised ourselves. We are now ready to go. 

Step 3: Quality vs speed, managing the balancing act - Global settings and scenery

There are some decisions we need to make: we may want things to look their best, and we may want things to run their fastest. Typically we need to find a middle ground that we're happy with.

Your choices in the next few steps will be dictated by your hardware/network and your personal preferences.

For the best looking experience, you want to ensure ALM is on, Shadows are enabled and Water reflections are set to their highest quality. These come at a high cost though. The next few paragraphs will explain more.

ALM (Advanced Lighting Model) aka deferred rendering.

For many people, turning on ALM should have a minimal impact on their FPS; for some, ALM may even boost performance. This is because more processing can be moved away from the CPU and onto the graphics card. When testing this, ensure you are not mixing up ALM with shadows. Turn off shadows if they are on by setting them to "None" in the preferences/graphics panel. Let your FPS stabilise and make a note of it, then turn off Advanced Lighting, wait, and let things settle again.

If toggling ALM off and on is having minimal impact on your FPS then I would strongly advise you to leave it on. It is very hard to see the regions as the designers envisaged with ALM turned off.

People who may wish to keep ALM off are those with limited RAM (I would suggest anything less than 8GB is considered limited) and those with slow or metered internet connections.

Shadows

A lot of people think disabling ALM gives them a massive performance boost; often, though, this is simply because doing so also kills shadows. Shadows are a major source of render time; they are also, of course, a major aspect of good lighting and atmosphere. Rendering shadows requires that every object gets drawn from the perspective of the sun and other lights, and this overhead means that shadows at least double or even triple the amount of time it takes to draw a scene.

Whether to keep shadows on or off can depend on what your objective is. If you want nice photos, then shadows are imperative; if you just want to experience the region as the designer intended then, again, you need the shadows. However, some lighting doesn't cast distinctive shadows, and yet the effort to draw them is the same, so consider creating one graphics preset with shadows and one without so that you can quickly switch back and forth.

Water reflections

Water in SL looks lovely when all the full reflections and refractions are being drawn, but for technical reasons, even when you cannot see the water all those little ripples and highlights are still being drawn.

The ripples and reflections mean that every object that might possibly be reflected is redrawn; the more detailed and cluttered the region (meaning the more "stuff" there is), the more work is being done for the water, even if you cannot see the results of that work.

If you are in a region where the water is not visible, or if you do not care too much about the quality of the water, then set Water reflections to "None; opaque".

In general, it is worth taking a moment to use the performance floater and experiment with different settings to see the impact. In my experience, there is little to no performance difference between the top-level water reflections and the so-called "minimal" setting; however, it does depend on how much "stuff" is around. Slipping all the way down to "None; opaque" will grant you a significant FPS boost.

Create one preset with good reflections and one with none so you can switch easily.

Step 4: Quality and Quantity

If the above three options are all about the quality of rendering, then the other defining factor is the quantity: the number of objects that the viewer is having to draw.

Draw Distance

The next tool we have is one we are all probably aware of: the Draw Distance. Draw Distance limits how far ahead we can see. Inside a city scene we can often afford to shorten that distance, while in the countryside, or out on the open plains, we need to see further. By selecting a sensible DD we limit the amount of "stuff" that the viewer has to deal with; this can have a dramatic effect in a busy scene.

Max non-impostors.

Whilst this is technically an avatar setting, and we'll talk more about those shortly, it behaves similarly to Draw Distance. When we are in a crowd, we can limit the number of avatars that are being drawn at full detail using this setting. What people often do not realise is that the viewer draws the closest avatars first. Thus if we set the max non-impostors to 10, the viewer will draw the ten closest avatars in all their glory but use impostors (flat cutouts) for those further away. With impostors being carefully managed, you can often improve scene rendering times considerably when things get crowded.
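As a sketch of that behaviour (invented names, not the viewer's real code): sort by distance, draw the closest N in full, and impostor the rest.

```cpp
#include <algorithm>
#include <vector>

// Illustrative only: invented types and names, not the viewer's real code.
struct Avatar {
    float distanceToCamera;
    bool  drawAsImpostor;
};

void applyMaxNonImpostors(std::vector<Avatar>& avatars, size_t maxNonImpostors)
{
    // The closest avatars get full detail; everyone else becomes an impostor.
    std::sort(avatars.begin(), avatars.end(),
              [](const Avatar& a, const Avatar& b)
              { return a.distanceToCamera < b.distanceToCamera; });
    for (size_t i = 0; i < avatars.size(); ++i) {
        avatars[i].drawAsImpostor = (i >= maxNonImpostors);
    }
}
```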

We have now covered the major settings that have a direct impact on the quality/speed balance.

Play with these settings and create yourself a set of graphics presets that you can switch between as best suits the region you are in, and the activity you are enjoying.

Step 5: Friends, family, strangers and lag

At the top of this blog, we walked through choices that we might make to optimise our appearance. Some of you will have followed them; others will have resolutely refused to reconsider your choices :-)

Irrespective of your decisions, we will find many others that have chosen to stay as they were or *shockingly* have not even read this blog ... I know, it's hard to believe, right?

We've made the best choices we can about our scenery; what can we do to protect ourselves against laggy crowds?

We have been trained (erroneously) to use complexity to limit those we render. Complexity still has a role in blocking the extremely slow, but as we observed before, using ARC for any purpose raises far too many false alarms and encourages people to make the wrong decisions.

With the newest Firestorm, we now have render time. I've discussed the background to Render time in earlier blogs, but what we can do now is use render time to automatically force laggy avatars to behave better.

On the Avatars nearby tab, we see the Maximum render time slider. You can use this to manually set a cutoff. What this value should be will depend on your hardware. You can get an idea by going to a club, an event or a busy store and seeing how long typical avatars take to render. 

The render time limit allows the viewer to take some avatar-specific actions. If your machine is capable enough and you decided to render shadows in the earlier sections, you'll find that avatars have a massive impact on your FPS, because every avatar takes up to 3 times longer to draw than it would without a shadow. Using the render time limit, we can set the threshold so that those complex and slow avatars (yes, those with the segmented bodies, did I mention them yet?) have their shadows unilaterally disabled, allowing you to still render them fully, but without the shadows; a much better visual option than the horrible grey mess of a jelly doll.

If an avatar exceeds your threshold once its shadows have been disabled, the viewer will create an impostor image. Unlike the JellyDolls, this impostor is typically fully detailed and, from a distance, looks relatively normal. The most noticeable effect is that they will animate far slower, and if you get up close, they will look pixelated.

Use the limit carefully, and remember to set it back to high/unlimited when you want to, because in this initial release of the feature I have not included a simple reset button (it's on the TODO list).

While there is no magic wand that I can wave to make things run faster, and in the end, your machine can only do so much, following these tips will help you understand the impact things have and how you can limit and control them. 

Final words - Autotune and defaults.

In closing, I should mention the AutoTune feature. 

The new Autotune capability of Firestorm is my first attempt to bring some degree of automation to the settings discussed above. Based upon the scene and the target FPS that you ask it to achieve it will tune things such as the water reflections, the draw distance, and the avatar render time. It was specifically designed with crowded events in mind, and it will aim to keep the visual quality of the scene as high as possible using just the controls we have discussed here.
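Conceptually, it is a simple feedback loop: measure the FPS, compare it to the target, and nudge the settings accordingly. Here is a toy sketch of the idea, with invented names and thresholds; this is not the shipped code.

```cpp
// Conceptual sketch of an autotune step: invented names and thresholds,
// not the shipped Firestorm code.
struct TuningState {
    float avatarRenderLimitUs;  // max avatar render time (microseconds)
    float drawDistance;         // metres
};

void autotuneStep(float currentFPS, float targetFPS, bool tuneSceneToo, TuningState& s)
{
    if (currentFPS < targetFPS) {
        // Below target: squeeze the avatars first (shadows off, then impostors,
        // happen implicitly as the render time limit falls)...
        if (s.avatarRenderLimitUs > 500.0f) {
            s.avatarRenderLimitUs *= 0.8f;
        } else if (tuneSceneToo && s.drawDistance > 64.0f) {
            // ...then resort to scene-wide changes such as draw distance.
            s.drawDistance -= 16.0f;
        }
    } else if (currentFPS > targetFPS * 1.2f) {
        // Comfortably over target: cautiously give some quality back.
        s.avatarRenderLimitUs *= 1.1f;
    }
}
```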

It has been working very well for many people, but for a few, it can be frustrating and a little disconcerting to have their settings changing as they walk around. You'll need to make your own choices. 

If all else fails and you end up with graphics settings that seem confused or broken, simply ensure that autotune is disabled, and click the "reset to defaults" button on the performance floater. 

Have fun; I hope this helps you get a little more out of your machine and a less laggy Fairelands experience.


Monday 21 March 2022

How to use the new Firestorm Performance Floater and Autotune feature

New Firestorm, new features 

The latest Firestorm release has a new feature (albeit one that I still consider experimental): the "performance floater". In earlier blog posts I explained why I created this, and most recently, in "Upgraders of the lost ARC", I explained a bit about what it does. This post is all about how to use the Performance Floater.

Bundled with the Performance Floater is the Autotune FPS feature; I'll also explain how this works, and how best to avoid getting yourself into a muddle with it.



For a more concise, and probably more readable, summary of these two features, please refer to Inara Pey's excellent write-up. You can also click on the '?' icon at the top of the floater to go immediately to the Firestorm wiki page dedicated to this function, which will be maintained and updated as the feature develops.

What does Performance Floater do for me?

The Performance Floater shows you in real-time what parts of the scene are taking the most time, and slowing down your graphics. In particular, this first release focuses on avatars and allows you to see which avatars are truly lagging you; it also allows you to examine your own attachments to see how they perform.

Ok so what about Autotune?

Autotune is a first look at our attempt to allow the viewer to automatically manage some of your settings, giving you the performance level that you request and keeping you there. Keep in mind that this is still an experimental feature.

Performance Floater best practices.

Initially, we will ignore Autotune and look instead at manual tuning and how the performance floater can help us.

Given that the motivation behind this feature was to highlight the damage that segmented bodies have done to SL performance, it will come as no surprise that it is most useful when applied in crowd scenes. I wanted to allow a finer-grained tuning experience that would allow you to enjoy a crowded club, or similar scene without having to de-render everyone and completely ruin the atmosphere. So how do we go about this?

Step by step - A quick guide.

Step 1 - Open the performance floater.

The first thing we need is information; it is hard to know whether you've improved anything without measuring before and after. To open the performance floater, we have a few options:

Look for the "Improve graphics speed..." menu entry on either the World menu or on the Advanced->Performance tools menu.




It can also be added to your toolbars for quick access. Look for the gauge icon on the tool palette.

Step 2 - Review the summary stats

We start with the overview panel, which tells us what our current frame rate (Frames Per Second, aka FPS) is. You may also see a warning such as "User limited FPS" or "In Background"; these are intended as a reminder that you are not getting the full potential because you have either deliberately limited the FPS or are "tabbed out" on another screen/application such as Discord/Chrome.

Below this is the summary data and the first clue as to what is happening to our FPS.



A more complete explanation of these numbers can be found here. What we can find here though is the first hint about what we need to do. 

Best practice - Start with the largest number as this is where the biggest gains can be made

If scenery is the largest number then we might start to think about whether our draw distance is too high etc. However, if we are sightseeing or taking photos, then we might want the scenery to be in full glorious detail. If we are shopping we want to see the goods on display, but don't really need to see the displays far away. Think about what it is you are aiming for. 

Most of the time you will find that either scenery, avatars or both are high, but occasionally the other numbers are worth a quick look. If the UI is more than a few percent, then consider closing down unwanted chat windows, inventory, etc. If the "HUDs" cost is high, then you should remove as many as you can or simply hide them all using the "show HUD attachments" option on the Avatar menu (alt-shift-H on Windows).

Best practice for general FPS health - Close unnecessary UI and remove (not minimise) HUDs 

Keep your UI windows closed and remove HUDs when not using them (the "favourite wearables" feature is amazingly useful for this).

"The scenery is killing my FPS"

In this first release, the amount of fine-grained tuning for scenery is very limited (I was focused more on the Lagatar problem). However, we provide quick access to a couple of the controls that make the biggest difference. 

Clicking on the graphics settings panel in the floater will take you to a subset of the preferences found on the main preferences panel. Here we can make wholesale changes to the quality of our graphics using the "Quality vs speed" slider, or tweak individual parameters. The main features exposed here are Draw distance, shadows and water reflections. Draw distance and water can be changed dynamically, and you can watch the impact. Changing shadows is slightly more disruptive as the viewer has to change how things are drawn and this causes a "pause", especially on slower machines.

Best practices for scenes - dumb down the water, shrink the draw distance, remove the shadows. 

In general, we want to keep the visual quality as high as we can whilst still being able to move about. With this in mind, you need to pick and choose between the options. Since EEP, water reflections and refraction have been a terrible burden (a fix is coming from LL, but it is not here yet), and you can still have decent-looking water without reflections. Of course, the most obvious (but frequently overlooked) option is Draw Distance. It's simple really: drawing more "stuff" takes longer, and reducing the draw distance shrinks the number of things that need to be considered by the viewer. In a club or shopping mall, shrink the DD to 64m and you'll be more nimble.

Water - If you are not near water, or do not care about water reflections, then you should almost certainly switch water reflections to "None; opaque"; this gives a big FPS boost whilst still leaving the water looking reasonably nice. For the biggest win, you can always fully disable water on the advanced menu, under rendering types. But don't forget to turn it back on.



Shadows - The most visually disruptive change. I love to have shadows in a scene, but shadows will typically more than halve your FPS, so if you really need that extra boost then foregoing shadows is a good choice. Use the shadows setting on the floater or in the preferences. This is very useful if you are at a shopping event or club where the shadows are probably not that important.

Things not to do (probably) - Killing Advanced lighting - ALM. 



A lot of people automatically reach for the advanced lighting kill switch in preferences and proclaim the amazing boost they get. For many people, that boost is dominated by the fact that the shadows get disabled too; try turning off shadows only first. Disabling ALM can have detrimental effects on some machines, as it prevents some GPU use, loading more onto the CPU. However, if you are on a poor network, then disabling ALM will reduce your bandwidth use, as materials will not be fetched.

OK, so that's the global scene dealt with.

"OMG the avatars are killing me"

When the statistics suggest that avatars account for a significant amount of the frame time, we can look at the avatars nearby and decide what to do. It does not take many segmented avatar abominations to totally destroy your performance. Modern BOM bodies without the "alpha cuts" or segments are far more performant. So how can we separate the good from the bad?



The bar graph on the left of this screen gives a quick visual indication of the costs. This was my favourite feature of the original Linden Lab design, though it was used to show ARC, which as you may have gathered from my other writing, is practically useless.

Along the top of this panel we have a slider; this controls the maximum time we will allow an avatar to take. On the screen above I have 23 avatars in the scene, and they are taking a total of 50 milliseconds. Without going into a maths lesson, this is a problem: a very large amount of time is being consumed. We can also see that the top 3 "offenders" (those avatars taking the longest) are a large chunk of that total.

If we slide the slider from the right to the left, the limit will decrease. In this example, I can set it somewhere around 3800, and any avatars above the limit will be "optimised".

The optimisation works at a fine-grained level that was not possible in the past. The first thing that we do is remove the shadows of the laggiest avatars; this will roughly halve their render time. When this happens, an 'S' will appear in the column between ARC and Name. The further you decrease the slider, the more avatars will be affected. When an avatar has had their shadows removed and is still laggier than the limit you have set, then we take the decision to force it into an impostor. An 'I' will appear between the ARC and the name.
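In pseudocode, the per-avatar decision described above looks something like this (invented names; a sketch of the logic, not the actual implementation):

```cpp
// Sketch of the per-avatar optimisation described above; invented names.
enum class AvatarTreatment { Full, NoShadows, Impostor };

AvatarTreatment chooseTreatment(float renderTimeUs, float limitUs)
{
    if (renderTimeUs <= limitUs)
        return AvatarTreatment::Full;    // under the limit: leave alone
    // Over the limit: drop shadows first, roughly halving the cost (the 'S' flag).
    if (renderTimeUs * 0.5f <= limitUs)
        return AvatarTreatment::NoShadows;
    // Still over the limit even without shadows: force an impostor (the 'I' flag).
    return AvatarTreatment::Impostor;
}
```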

Impostors are not everybody's cup of tea, but in a crowded club a few impostors can lift your FPS whilst still allowing a decent visual experience. Try it out and decide for yourself. Derendering is another option, of course (not supported directly in this release, but accessible on the people floater as usual); "Render friends only" is another choice, but keep in mind that you cannot easily bring them back. If you don't want to see anyone at all (yourself included), then the check box at the bottom of the panel allows you to disable all avatars.



Best practices - go little by little, and don't forget to reset later!

The most common complaint during early testing was from people finding that they were seeing everyone as impostors. The slow-animating, flat, low-resolution cutouts are great at a distance but not so nice up close. If you find you are seeing them everywhere, then you probably forgot to reset your "Maximum render time" slider.

Things to remember: 

  1. Both this and Autotune change your settings. Use graphics presets to save and restore sane settings just in case.
  2. What you see is the cost of drawing this scene on your machine. Something not in view will show as very low cost. When looking at your attachments, make sure that your avatar is in view and not partially hidden.

Is it me? How can I be sure?

Your own avatar will appear in the list, highlighted in yellow. Due to how the viewer works, it costs a little more to draw your own avatar than it does to draw others, so even if you are dressed identically to another avatar, you will show as more expensive on your screen (they will see themselves as more expensive too).

However, you can check the cost of your own attachments by looking at the "Your avatar complexity" panel. This panel lists all of your non-HUD attachments and their costs. You can now see what is the most costly item you are wearing and decide if you can do better. You can also use this to compare the impact of different items, try on different hairstyles and see which ones are laggier. In the market for a new body? Grab the demos and wear them all, compare the performance as well as the looks before making your choice.

Can't Firestorm do all this for me? Auto tuning, the pros and cons.

Successful autotuning requires a little restraint and some managing of your own expectations. There is nothing I can do to make your decade-old potato of a laptop run at 50FPS in a sim packed with mesh avatars. Not happening. However, Autotune can and will try to do the best it can for you.

Basic Autotuning

When using Autotune we set a target FPS level and whether we want to adjust the avatars only or the avatars and the scenery. We can also decide if we want it to run continuously while we continue to enjoy ourselves or to run once and then stop.

Troubleshooting tip: why is my friend flat, pixelated and their animations slow?

If you unexpectedly see imposters everywhere then double-check that Autotune is not forcing your Render time limit too low. If so, turn off autotune and manually adjust the slider.




Autotune FPS - Best practice #1 start low. 

Consider this: you are at a club, surrounded by gyrating mesh bodies. Your FPS has dived to single digits, and not even high single digits; you just want to move around a bit, but it is like wading through molasses. You can't really turn off all the avatars, because then you'll barge them all out of the way and spend the next half an hour apologising.

Set the Autotune target to something higher, but not too high; try 12 FPS, maybe? Once you have selected the target FPS, you can hit start. The target will be shown at the top of the floater, starting as yellow or magenta, and hopefully turning green when we reach or exceed the target.

The Autotune will consider the factors and try to tune subtle things such as avatar shadows first, before resorting to other measures. You'll see the Max Render Time slider zipping to and fro. If you are too ambitious, then the Autotune will try its hardest and perhaps overshoot, then undershoot, and you'll be back and forth and never settling. Pick something comfortable and within reach and you'll find the experience more rewarding.

Autotune FPS - Best practice #2 Avatars only or Avatars first?

By default, the tuning strategy will be set to "Avatars and Scene"; this allows the engine to consider avatars first and then, if it cannot get enough boost from the avatar tweaks, to resort to scene-wide changes. Which of these you want is very dependent upon where you are and what you are doing.

If you are wandering in a scenic region and there are a number of people around then you might select "avatars only" to ensure that you keep the scene at the settings you like but allow the engine to degrade the avatar quality of others as you walk around. 

Autotune FPS - Best practice #3 Experiment with autotune settings (but don't forget that you did)

The "gear" icon on the autotune panel takes you to advanced options. some of these are rather obscure and I won't explain them in detail here, but feel free to experiment. What is the worst that can happen? It will change your settings and everything will look weird. Use the Firestorm graphics preference save/load options to store a setting to return to should that happen.

Have fun, I hope that this feature helps.

Most of all, I hope that through this new way of presenting the determinants of performance, you will not only be better able to manage your experience in SL but will also learn more about the impact we all have on one another's Second Life.

I hope to extend this feature with future releases and integrate it with the Linden Viewer so that a similar feature is available to everyone no matter what viewer they are using. What is more, the next few months should see some dramatic changes in the rendering performance going live in Second Life as Linden Lab has been working very hard on performance tuning. I hope to be able to adapt these tools to the "new normal", to provide more options and add more "intelligence" to the tuning.




Thursday 16 December 2021

Upgraders of the lost ARC

One of the things that my recent discussion of segmented, rigged mesh bodies and the issues they cause for viewers threw into stark perspective was that the current complexity metrics are not in the least bit helpful; in particular the mainstay of the current toolset, "ARC".

So what is ARC?

Avatar Rendering Complexity (ARC) is a calculated value based upon an algorithm determined some years ago by Linden Lab. It scores an item based on a preset "cost" per feature that, in theory, aligns with the overall rendering impact of that item. It has not been adjusted for many years, and even assuming it was ever broadly accurate, it has been outrun by changes in hardware and in content creation. The calculation assigns a base cost and/or multiplier based on the type of features used, such as legacy bump mapping, flexiprims and alpha.
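To give a flavour of the *shape* of the calculation, here is a toy version; the weights below are invented for illustration, and the real constants live in the viewer code.

```cpp
// Toy illustration of the *shape* of the ARC calculation.
// The weights are invented for this example; the real values live in the
// viewer code and have not been re-tuned in years, which is rather the point.
struct ItemFeatures {
    int  triangles;
    bool isFlexi;
    bool hasAlpha;
    bool hasLegacyBump;
};

long toyARC(const ItemFeatures& f)
{
    long cost = f.triangles / 10;         // a base cost from geometry
    if (f.isFlexi)       cost *= 5;       // flat multipliers per feature...
    if (f.hasAlpha)      cost *= 4;
    if (f.hasLegacyBump) cost += 500;     // ...or fixed additions
    return cost;
}
```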

Why is ARC wrong?

In part, it is down to bit-rot: it was seemingly based on tests in the past, but times and features have changed, and technology has moved on while the code has not kept pace. However, it is more than just being out of date; even if the algorithm was broadly right for one machine, how well would that transfer to another? We can all appreciate that the performance profile of a laptop with "onboard" graphics will differ massively from a desktop with a dedicated GPU. Every one of us has a different setup. There are too many things that depend not just on your machine, but on your settings and circumstances too.

Fine, ARC is not correct, but it's better than nothing, right?

This has been said time and again, but frankly, I am far from convinced. Back in the summer, the Lab released their first look at the "performance floater", which features a new presentation of ARC, ranking the "worst offenders" and allowing you to derender them. This is a great feature, in theory, except it is frequently pointing the finger at the wrong people. Because ARC is flawed, those at the top of the "complexity" list may not be the ones affecting your FPS at all; worse still, because the numbers are misleading, they can encourage entirely the wrong changes. I have seen people remove all their flexiprim hair and replace it with rigged mesh; their complexity plummets, but the reality is that they are more than likely taking far longer to draw now. A segmented mesh body will frequently have a lower ARC score than an unsegmented one, yet we have seen in my recent posts that the reality is quite different. The result of this misleading information is that people swap out perfectly good assets for dreadful ones, making the overall performance worse, all the while being lied to by the ARC.

I would argue, therefore, that false information is worse than no information. So what can we do?

Say hello to ART

ART is my cutesy name for Avatar Render Time. It is a personalised score that reflects the impact of an asset on you and your machine, right now, with these settings. It is not a like-for-like replacement for ARC, nor a tweaking of the algorithm; it is instead a consistent measure of the specific cost of rendering avatars on your machine, with your settings.

Here is an example of the new screen that will be available in the next Firestorm release, taken at a busy Music venue.




This display is derived from the initial work of the Lab as can be seen in their "performance floater" project viewer released in the summer. The presentation has been "more or less" retained, but importantly, I have moved away from ARC. There are a few aspects of this screen that are worthy of note.

The frame summary



Alongside the display of the overall FPS number, we have a breakdown of the time taken to render each frame. 

The frame time is expressed in milliseconds (ms); a millisecond is 1/1000th of a second, so in this case the entire scene was drawn in 70ms, or 70/1000ths of a second. 15 FPS is roughly 1000/70 (the FPS shown is an average over recent frames), and therefore if we want to increase the number of frames per second, we must reduce the time each frame takes.

Alongside the Frametime we have the proportion of time spent on various (quite broad) categories.  

UI - All the viewer user interface, the floaters and menus that you have open. It is worth looking at how much certain views cost if you are looking to maximise your FPS.

HUDs - The HUDs that you are wearing. I am a little wary of this number; HUDs can carry a very high texture overhead, and while this increases the HUD rendering cost, it also increases pressure on other rendering, as those textures displace ones in the scene from the graphics card. This interdependency is not something that we can accurately reflect at present.

Tasks - The time spent keeping your viewer connected to SecondLife, processing all the messages and so forth. This should be comparatively low and stable.

This leaves, Scenery, Avatars and the mysterious "Swap". 

Scenery - The scenery number is a gross over-simplification, it effectively means all the rendering that I could not assign directly to a specific avatar (or HUD or UI). It will thus mostly be the environment around you but it can include certain overheads that cannot be easily assigned elsewhere.

Avatars - No prizes for this one; the time spent rendering each avatar, and we'll examine that more closely in a bit. 

Swap - Swap is the most obscure stat here; it is uncompromisingly technical. I'll put a footnote at the end to explain it for those who care.

What we note is that in our scene, a relatively simple skybox venue, almost 20% (around 14ms) of our time was spent drawing the static mesh and prims that we see. Meanwhile, the avatars took a whopping 72% (50ms).

The nearby avatar list

Below this, we see the list of "nearby avatars". This list is built based on the draw distance and as such includes avatars that may be out of your direct line of sight. 



Here we can see that there were 23 avatars being "handled" by my viewer, and we shift into another "scary unit": the microsecond (µs). A microsecond is a millionth of a second, a tiny amount of time, used here to allow for the very wide range of rendering times different avatars can consume.

In this snapshot we see that our most complex avatar took more than 4,800 microseconds, which is almost 5 milliseconds. That single avatar (out of 23) accounted for almost 10% of the time taken by all the avatars combined.

Furthermore, we see the fallacy of ARC writ large (and this was not contrived, but pure chance): the avatar with the lowest ARC, the one we would traditionally have considered "the best behaved", is in fact the slowest to be drawn. Meanwhile, we have "Gavrie", who would be second from the top based purely on ARC, showing as one of the more efficient avatars.

We can now choose to manually derender the worst offenders, by right-clicking, or use the "Maximum render time" slider at the top to set a cap of our choosing.

So how is Avatar Render Time determined? 

To determine the time, we take measurements during the rendering process, combining these to find the total time dedicated to drawing each avatar. This number varies a little from frame to frame, and it will change when you look away from, or zoom in to, a given person, due to the way that the viewer reduces the textures and geometry being drawn. As such, we cannot say that "Gavrie" is always the better behaved, just that in recent frames their render time has been significantly lower. What we learn from this is that removing or reducing the cost of a given avatar will save us up to that amount of time next frame and in turn increase our frame rate.
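For the curious, the measurement idea is essentially a stopwatch around every chunk of render work that can be attributed to a specific avatar, accumulated over the frame. A simplified sketch (invented names; the real code hooks deep into the render pipeline):

```cpp
#include <chrono>
#include <unordered_map>

// Simplified sketch of the measurement idea; invented names, not the real pipeline code.
using Clock = std::chrono::steady_clock;
std::unordered_map<int, long long> avatarRenderTimeUs;  // avatar id -> accumulated µs this frame

// Called around each chunk of work the renderer can attribute to a specific avatar.
template <typename RenderFn>
void timeAvatarWork(int avatarId, RenderFn&& render)
{
    auto start = Clock::now();
    render();  // draw this avatar's geometry, attachments, shadow passes, etc.
    auto elapsed = std::chrono::duration_cast<std::chrono::microseconds>(
                       Clock::now() - start).count();
    avatarRenderTimeUs[avatarId] += elapsed;  // summed over the frame, shown in the floater
}
```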

About 17ms is consumed by the top 5 avatars. If we were to remove them completely, that would reduce our frame time from 70ms to 53ms, or roughly 19 FPS; an increase of around 25%.

What drives the avatar render time?

A large proportion of the time is attachments. Don't forget that, for the vast majority of us, our bodies, heads, hair, hands and feet are all attachments. In my previous blogs, I have illustrated the outright cost of these. There is also an underlying cost of just being an avatar; ironically, your invisible basic system body has a rendering cost that is higher than I had expected (even though it is not visible most of the time), but for now at least that is not something we can influence. Therefore, we need to look at our attachments and see what we can do to improve ourselves.

The performance floater has an "attachments" view.


Here we can assess whether we are doing the best we can to keep lag down. If I were showing nearer the top of the avatar list, I might want to review my attachments and see whether I could remove any of the higher cost ones.

It is worth noting that once again the familiar ARC number is misleading. Faced with the ARC alone I might change my hair and leave the harness straps on my legs.


In fact, those straps are slightly more expensive than the hair in spite of being 1/10th of the alleged complexity.

So does the render time tell us everything we need to know?

Not really, no. What the render time is telling us is how long this avatar or this attachment took to draw on our screen. It is a personalised measurement, and for FPS improvement, that is what we need to know.

Of course, if we are not actually looking at the avatar, or the avatar is far away, then the cost will drop. You can see this in the attachment view by simply camming away from yourself. 

Importantly, the render cost that I see will differ from the render cost you see. Even if we share camera viewpoint and graphics settings, the difference between my PC and your PC will dictate how similar (or not) our results are.

Render Time addresses:

  • What (or more typically who) is causing us to slow down. 
  • What items we are wearing that might slow us and others down.
  • Highlighting the effect of settings changes (shadows on/off, water reflection, draw distance)

It does not (directly) address:

  • The total lack of accountability in SL for what we wear.
    It is not usable by region owners to restrict entry and manage the lag on their estate.
  • The non-rendering impact of complex assets.
    There can be overheads in preparing complex assets: transmission time on the network, time to unpack and validate the asset, time to fetch things such as textures. These all happen over the course of many frames.
  • It does not (easily) help creators know "up-front" what the performance of their creation might be.
    Creators have become adept at "optimising" their creations to lower ARC. Sadly, that is frequently a red herring. ART, meanwhile, cannot tell them exactly how an item will render on their customers' machines. It can, however, more accurately guide their decisions.

In summary

This new feature is intended to guide you in optimising your attachments and understanding good avatar content. You will be able to make informed decisions about your outfits and the things that you are demoing. It will also allow you to identify those people who are causing your FPS to take a dive, allowing you to derender them.

The feature will launch alongside another experimental feature, "auto-tuning": a means of letting the viewer manage your settings in the hope that it can maintain a higher FPS for you. I'll discuss the auto-tuning feature in another post. At the moment the two are closely linked, but as they mature it might make sense to separate them.

Footnote - Swap

Swap is the time spent by the OpenGL drivers in what is known as "SwapBuffers". When rendering the scene, the viewer draws to an offscreen canvas and, once finished, "swaps" this with the current display canvas, allowing us to draw one while showing the other. The reality is more nuanced than this, with a lot of the draw commands occurring during the swap process, and a long swap time can indicate that you are GPU bound (also commonly known as fill bound), with the CPU having to wait for the previous frame to finish before it can proceed. At present this is not that common, but certain settings will exacerbate matters: heavy texture load, very high shadow quality and low GPU VRAM can all play a part.
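
For the technically curious, here is a minimal sketch of what measuring the swap looks like. I am using GLFW purely for illustration (the viewer has its own windowing layer), and the 5ms threshold is an arbitrary number I picked for the example:

    #include <GLFW/glfw3.h>
    #include <chrono>
    #include <cstdio>

    int main()
    {
        if (!glfwInit()) return 1;
        GLFWwindow* window = glfwCreateWindow(640, 480, "swap demo", nullptr, nullptr);
        if (!window) return 1;
        glfwMakeContextCurrent(window);
        glfwSwapInterval(0);                      // no vsync, as in the benchmarks

        using Clock = std::chrono::steady_clock;
        while (!glfwWindowShouldClose(window))
        {
            // ... draw calls would be issued to the back buffer here ...

            const auto t0 = Clock::now();
            glfwSwapBuffers(window);              // hand the finished canvas to the driver
            const auto t1 = Clock::now();

            const double swapMs =
                std::chrono::duration<double, std::milli>(t1 - t0).count();
            if (swapMs > 5.0)                     // arbitrary threshold, illustration only
                std::printf("long swap: %.2f ms (possibly GPU/fill bound)\n", swapMs);

            glfwPollEvents();
        }
        glfwTerminate();
    }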

Friday 19 November 2021

I don't wanna Mesh with nobody

OK, I had not planned on this blog, but a forum post raised that ugly spectre of LOD Factors and, in the light of a few things that have happened recently, I thought it would be worth a "quick" (yeah right, like Beq does quick) update post.

A few months ago, I posted a couple of blogs calling upon the mesh body makers to give us more options for low-render-overhead bodies, demonstrating (in extreme terms) the true cost of those individual "alpha-cut" segments on your body and of poorly optimised rigged hair. By way of illustration, here are a couple of screenshots.

There are a couple of things to note before you look

1) Yes, I am running on a stupidly high-spec machine. I am also standing in a deliberately uncluttered space. Ignore the FPS.

2) This is on a version of Firestorm that is at least one release away. It incorporates some of the impressive performance changes made by the Linden Lab team alongside some of our own tweaks. Right now, you can test the LL version by going to the project viewer download page on secondlife.com. It has bugs, many of them, but by heck, they are fast bugs and the more you can find now, the faster we get our hands on the goodies.

3) What you are seeing here is a new floater (also building on the initial work done at the lab) that will be part of the very next Firestorm release. A proper discussion of this is for another blog on another day, but what you see in the screenshot is the time in microseconds that each of my avatar's attachments is taking to render. On a less powerful machine, these numbers will be considerably higher, but the relative magnitudes will remain broadly the same.

The first image is my regular avatar. Beq is wearing a gorgeous Lelutka EVOX head, a zippy Slink Redux Hourglass body (with the petite augment), some arse kicker boots for good measure and a bunch of other bits and pieces. The Redux Slink body is notable for having a very limited number of "segments".


Meanwhile....

Beq slips on a demonstration copy of a well-known, very popular and very segmented body...


What this is showing is that the segmented bodies are significantly slower to draw than the unsegmented ones. Note too that while I happen to be demonstrating the Maitreya body here, it is certainly not the worst out there (far from it); it is, however, the most prevalent, and the one for which I hope beyond all other hope that the creator will provide a cut-free option. Many new bodies now require alpha layers, and as such the push-back from creators against supporting BOM properly, due to the "extra work" the alphas require, is no longer credible.

Soon (though in experimental form, as I want to get it into your hands to find out where the rough edges are) the feature shown here will be in Firestorm, and you'll be able to assess the rendering cost of your outfit and, by implication, the impact it has on others.

My original blog was really a plea to body makers to give us options: alongside their segmented models, to provide a weight-compatible uncut version, just so that we have the choice and don't have to give up a hard-earned wardrobe to be a good citizen. Siddean at Slink did exactly this. The Redux models are shipped with the original models, making the argument of "oh, but this tiny niche requirement means I must have alpha cuts" a moot point.

Since those blogs were written back in the summer, we've seen a few things happen. Inithium, the makers of the increasingly popular Kupra body range, all of which have no cuts and perform well, is launching a male body that is also without cuts (uncut meaning something quite different in the male space ;-) ). Siddean Munroe, creator of the Slink bodies, has launched an entirely new product range, Cinnamon and Chai, two bodies both of which are minimally cut and in my tests perform at least as well as the Redux.

It can cost you nothing too

In my first write-up, I inadvertently overlooked the entirely, 100% free option too: the open-source Ruth2 mesh body, which has an uncut BOM-only version. You can read all about Ruth2 and her male counterpart Roth2 on Austin Tate's blog, and the bodies are available on the marketplace.

In summary

Your choices for higher-performing bodies are increasing. Your CPU and, more importantly, your friends' CPUs will thank you for it. And watch out: not all new bodies (or reborn old bodies) are improving matters, with some creators rolling out new versions of problem bodies.



A point of clarification is needed too. While the cost of rendering the body is high, other attachments all add up, and the basic cost of just being an avatar is also a factor. So, in the real world, you can't have 5 or 6 Beqs for every sliced body in your nightclub, but you can ultimately have more Beqs, or just better FPS.

Looking ahead

Perhaps more significant for performance (though this is very tentative and remains to be seen in practice) is news from Runitai Linden, at yesterday's Content Creator User Group meeting, that he is hoping to have some changes that directly address a large part of the problem that heavily sliced mesh bodies cause. My other blogs explain the concept of "batching" and draw calls; Runitai's changes will (hopefully) bring improved batching to rigged mesh, cutting down the number of draw calls required. These are very early days, and Runitai cautioned that the changes make many shortcuts and assumptions that may not survive contact with the real world.

I have everything from fingers to toes crossed in the hope this can happen. I would still urge you all to consider a low-slice body next time you have a choice: request one from your favourite body maker or, at the very least, defer that purchase if the creator is not offering an option and take another look at the other offerings. While Runitai's rendering gymnastics may pull us out of the fire, we should never have been in there in the first place; even if these changes help a lot and make the difference less extreme, it will likely remain the case that a segmented body is just harder to draw and an unwanted burden.


Love 

Beq

x

Monday 13 September 2021

Find me some body to love...benchmarking your lagatar.

This is essentially part 2 of the "why we need to get rid of the segmented bodies" blog.

Hypothesis - Mesh segmentation leads to significant rendering performance issues.

Before we start, just a heads up, this part is the data dump. It's all about the process of gathering data. As such it is somewhat less accessible than the last one. 

Still here? Enjoy.

A few months ago, I decided to quantify the real cost of sliced up bodies. Initially, I did some simple side-by-side tests in-world.

The first attempts were compelling but unsatisfactory. Using an alt, I ran some initial tests in a region in New Babbage that happened to be empty. I de-rendered everything, then had Beq TP in wearing my SLink Redux body. I recorded for a few minutes, sent her away, let things return to normal, then had Liz TP in wearing her Maitreya body.

The results were quite stark. Running with just my Alt alone (the baseline), I saw 105 FPS (remember, this is basically an empty region). With SLink Redux, it dipped a little then recovered to 104FPS. With Maitreya, it dropped to 93FPS.

So this was a good start, but I wanted something a bit more robust and repeatable. Moreover, I wanted to test the principle. This is not about pointing out "X body is good, and Y body is bad"; it is about demonstrating why design choices affect things.

I needed to test things rigorously and in isolation. This meant using a closed OpenSim grid where I had full control of any external events. It also meant I needed to get test meshes that behaved the same way as proprietary bodies. 

Testing proprietary bodies against one another is problematic. 

  1. They won't rez (typically). You need to get lots of friends with specific setups.
  2. If they did rez, they are mostly too complex for SL Animesh constraints (100K tris)
  3. Bodies vary in construction (# meshes, # triangles, with and without feet, etc.), making it less clear what is driving the results.
  4. Being proprietary, you can't test outside of SL either, of course, which means you are then exposed to SL randomness (people coming and going - I don't have the luxury of my own region).

So I asked my partner Liz (polysail) to make me a custom mesh that we could adapt to our needs, and thus SpongeBlobSquareBoobs was born.



"SpongeBlob" is a headless rigged mesh body that consists of 110,000 triangles. Why 110K? It is the upper end of what can be uploaded into SL/OpenSim, given the face/triangle/vertex limits. Body triangle counts are harder to average because some have feet/hands attached; others do not. Another reason why we wanted to have a completely independent model.

The coloured panels shown in this photo are vertex colours (i.e. not textures) randomly assigned to each submesh. This picture is most likely the 192 mesh x 8 face "worst case" test model. We used no textures so that the texture bind cost was not part of this test (that's a different experiment for another day, perhaps).

The single most important fact to keep in mind when you read through this data is:

    Every single SpongeBlob is the same 110K triangles. They vary only by how they are sliced.

Apparatus and setup

So if SpongeBlob gives us the "Body Under Test" (BUT), what are we testing with?

Data Recording

The data is recorded using Tracy, a profiling tool available to anyone who can self-compile a viewer. It works by recording how long certain code sections take (much like the "fast timers" you see in the normal viewer's developer menu). This data is streamed to a capture program that runs locally (on the same machine or the same LAN); the capture program, or another visualiser tool, can then be used to explore the data. I recorded things like the DrawCall time, though once we understand how the pipeline works, all we really need is the FPS, as I'll explain later, so you could use any FPS tool if you want to try this yourself in a simpler form.
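
If you are curious what that instrumentation looks like, here is a minimal Tracy sketch (it assumes a build with the TRACY_ENABLE flag defined, and the header path varies by Tracy version; the function and zone names are mine, not the viewer's):

    #include <tracy/Tracy.hpp>

    void drawScene()
    {
        ZoneScopedN("drawScene");  // this scope is timed and streamed to the capture tool
        // ... draw calls here ...
    }

    int main()
    {
        for (int frame = 0; frame < 1000; ++frame)
        {
            drawScene();
            FrameMark;             // marks the frame boundary so Tracy can compute FPS
        }
    }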

Environment and noise control

The accuracy of the tests relies on removing as much noise as we can. We all know that SL frame rates are jittery, so we do our best to stabilise things by eliminating every source of variation that is not under test.

To this end, I used an OpenSim system (the DreamGrid Windows setup, as it is extremely quick and easy to install). With my own OpenSim grid, running on an older PC, I created a 256x256 region with no neighbours. This means I have an SL-like region size, and I have removed any potential viewer overhead of managing connections to multiple regions.

The region was left empty and no static scenery was used, meaning that the region rendering overhead was constrained pretty much to just the land, sea and sky.

Settings

The plan was to record using several different machines of varying capabilities, so I made sure to keep the settings as similar as possible across those. 

We are interested in the rendering costs of different body "configurations", and these are only comparable in the same context (i.e. on the same hardware). Still, we'd like to look for trends, similarities, and differences across different hardware setups, so I tried to ensure that I used the same core settings. The key ones are as follows:-

FPS limiting off - clearly...

Shadows (sun/moon & local) - This deliberately increases the render load and helps lift the results above the measurement jitter.

Midday - Are there implications if the shadows are longer? Let's avoid that by fixing the sun location.

Max Non-Impostors - unlimited. This ensures we don't impostor any of the test avatars.

ALM on - we want materials to be accounted for even though we are not using them. It ought not to affect our results, really.

Camera view - I needed to ensure that I was rendering the same scene. To achieve this, I used a simple rezzing system that Liz and I have tweaked over time. It uses a simple HUD attachment on the observer that controls the camera. A controller "cube" sends a command to the HUD telling it where to position the camera and what direction to point in. 

Test Setup

Each test involves rezzing a fixed set of BUTs (16) in a small grid. These cycle through random animations. The controller cube that is used to position the camera is also responsible for rezzing the BUTs. Every time the cube is long-clicked, it will delete the existing BUTs and rez the next set.

Each avatar model is an Animesh. This full test cannot be run in SL due to the Second Life limit of 100K triangles. Using Animesh removes any other potential implications to the rendering caused by being an actual avatar agent.

This is a typical view being recorded.


Consistency and repeatability

It was important to remove as many errors as possible, so scripting things like the rezzing and camera made a lot of sense. We also made sure that the viewer was restarted between each test of a given BUT.

Tests were run for at least 5 minutes, and I would exclude the first 2 minutes to ensure that all the body parts had been downloaded, rezzed and cached as appropriate. There are implications to the slicing of bodies that alter the initial load and rendering time (you see this with the floating clouds of triangles when you TP to a busy store/region), but this is not what we are testing.
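
The trimming step is as simple as it sounds; here is a sketch of how captured samples can be reduced (the sample values are hypothetical, and the two-minute cut-off is the one described above):

    #include <cstdio>
    #include <utility>
    #include <vector>

    int main()
    {
        // (seconds from start of run, fps) samples from a hypothetical 5-minute run
        std::vector<std::pair<double, double>> samples = {
            {30, 61}, {90, 74}, {150, 80}, {210, 81}, {270, 79}
        };

        double sum = 0; int n = 0;
        for (auto& [t, fps] : samples)
            if (t >= 120.0) { sum += fps; ++n; }   // exclude the first 2 minutes

        if (n) std::printf("mean FPS after warm-up: %.1f\n", sum / n);
    }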

Hardware

Running the tests on a single machine tells us that the findings apply to that machine, and within reason, we can extend the conclusion across all machines in the same or similar class. But, of course, in Second Life, we have a wide range of machines and environments. So it was important to us to get as much data as we could. 

We thus ran the tests across various machines that we have access to. 
As a developer, most of my machines, even older ones, tend to have been "high end" in their day. So we should note that potential bias when drawing conclusions.

Here is the list of hardware tested along with the "Code names."


Methodology and Test Runs

Using the above setup, we would run through a specific set of configurations. Those were as follows.


The baseline test is simply an empty scene. Thus we establish the cost of rendering the world and any extraneous things; this includes any cost to having the observing avatar present.

You can see that every mesh has the same number of triangles but is split into more and more objects. Once we reach 192 objects, we continue scaling using multiple texture faces (thus creating submeshes). 

I will include in an appendix a test that shows the broad equivalence of submeshes versus actual meshes. There is no appreciable benefit to one as opposed to the other in these tests (there may be other implications we do not investigate).

By changing the number of meshes and faces, we are scaling up the number of submeshes that the pipeline has to deal with and thus the number of drawcalls. If you remember the analogy I gave in the first part of this blog, you'll recall that the hypothesis is that the process of parcelling up all the contextual information for drawing a set of triangles far outweighs the time spent processing the triangles alone.

If this hypothesis is correct, we will see a decline in FPS as the number of submeshes increases. As we reduce the number of triangles in each call, we also demonstrate that the number of triangles is not dominant. 

Results

So what did we find?

The first graph I will share is the outright FPS plotted against the total submeshes in the scene.



This graph tells us a few things. 
1) The raw compute power of a high-end machine such as the "beast" is very quickly cut down to size by the drawcall overhead.
2) The desktop machines, with their dedicated GPUs, have a similar profile to one another.
3) The laptops, lacking a discrete, dedicated GPU, sit so low on the chart that they are hard to make out.

If we normalise the data by making the FPS a percentage of the baseline FPS for that machine, we rescale vertically and hopefully get a clearer view of the lower-end data.
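
The normalisation itself is trivial. As a sketch (reusing the FPS numbers from the earlier New Babbage test purely as example inputs):

    #include <cstdio>

    int main()
    {
        const double baselineFps = 105.0;          // empty scene on this machine
        const double testFps[]   = {104.0, 93.0};  // the two earlier in-world tests

        for (double fps : testFps)
            std::printf("%.1f%% of baseline\n", 100.0 * fps / baselineFps);
    }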


This is very interesting (well, to a data nerd like me). 
We can see that the profiles of all the machines tested are similar, suggesting that the impact is across the board.
We can also see that the laptops continue to be segregated from the desktops. The impact of the drawcalls, while pronounced and clearly disruptive, is not as extreme as it is for the dedicated GPUs. This would seem to support the hypothesis that machines with onboard graphics are additionally penalised by the triangles themselves, giving the graph that vertical offset from the rest. As we have not explicitly measured this, we cannot draw too much from it, but there is clearly pressure on those less powerful machines.

What may be surprising to some and is certainly interesting is that all the desktops are impacted similarly. The shiny new RTX3070TI suffers just as much as the rather ancient GTX670. What we get is more headroom on the modern card. 

The next graph is really another interpretation of the same FPS data. Now, though, we are looking at the frame time as opposed to frames per second. To illustrate: to achieve 25FPS, we have a time budget of 1/25th of a second per frame. We tend to measure that in milliseconds (where a millisecond is 1/1000th of a second); thus, 25FPS requires us to render one entire frame every 40 milliseconds (ms).



Here we can see the anticipated trend quite clearly. 

What did we expect?

If the cost of a drawcall dwarfs the cost of triangles, then every extra drawcall will add a more or less fixed cost to the overall frame time. If the triangle count were to have a stronger influence, we'd see more of a curve to the graphs as the influence of the triangles per draw call decreases along with their number.
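
To make that concrete, here is a toy model of the expectation. Every cost constant here is invented purely for illustration; only the shape of the output matters. With the triangle total fixed, the triangle term is a constant offset and the drawcall term grows linearly, so frame time rises in a straight line:

    #include <cstdio>

    int main()
    {
        const double baselineMs = 5.0;          // assumed empty-scene cost
        const double perDrawMs  = 0.05;         // assumed fixed overhead per drawcall
        const double perTriMs   = 0.00001;      // assumed cost per triangle
        const int    triangles  = 110000 * 16;  // 16 BUTs of 110K triangles each

        // Total triangles never change; only the number of submeshes (and
        // hence drawcalls) does, so only the middle term grows.
        for (int submeshes : {16, 192, 768, 1536, 3072})
        {
            const double frameMs = baselineMs
                                 + submeshes * perDrawMs
                                 + triangles * perTriMs;
            std::printf("%5d submeshes: %6.1f ms (%5.1f FPS)\n",
                        submeshes, frameMs, 1000.0 / frameMs);
        }
    }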

The drawcall is the dominant factor, though interestingly we see some curvature in the laptop plot.

The curve we see in "Liz's laptop" is initially convex; is this what we expected? Probably so. If the total drawcall cost is the time spent packing the triangles (T) plus the time spent on the rest of the drawcall overhead (D), then initially T+D is steep, but as T decreases and D remains more or less static, we go back to the linear pattern. We can also see a slight kink, suggesting that we may have a sweet spot for this machine where the number of triangles and the drawcall work together optimally.

We see other slight kinks in other graphs. We need to be careful of over-analysing, given the limited sample points along the horizontal axis and those error bars that show quite a high degree of variance in the laptop frames.

Conclusions

Let's use our table from the last blog to examine the typical mesh count for current bodies in use.
Body                 Total faces   Average visible faces   # times slower than best in class (higher is worse)
Maitreya Lara        304           230                     12.78
Legacy               1471          340                     18.89
Belleza Freya        1116          190                     10.56
SLink HG redux       149           30                      1.67
Inithium Kupra       83            18                      1.00
Signature Geralt     903           370                     8.22
Signature Gianni     1159          431                     9.58
Legacy Male          1046          174                     3.87
Belleza Jake         907           401                     8.91
Aesthetic            205           205                     4.56
SLink Physique BOM   97            45                      1.00
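
A side note on reading that last column: the numbers are consistent with a straightforward ratio of average visible submeshes against the best in class (taking that to be Kupra's 18 for the female bodies and SLink Physique BOM's 45 for the male), which is exactly what a linear per-drawcall cost would predict. A quick check, with those figures assumed:

    #include <cstdio>

    int main()
    {
        // Figures taken from the table above; "best in class" assumed to be
        // Inithium Kupra (18 visible) and SLink Physique BOM (45 visible).
        const double maitreya = 230, kupra    = 18;  // female bodies
        const double gianni   = 431, physique = 45;  // male bodies

        std::printf("Maitreya Lara:    %.2f times slower\n", maitreya / kupra);   // 12.78
        std::printf("Signature Gianni: %.2f times slower\n", gianni / physique);  // 9.58
    }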


The implication is clear: a body that has ten times the number of submeshes will take more or less ten times as long to render. However, we do not walk around as headless naked bodies (well, most of us don't - never say never in SL), so we need to be far more aware of the number of submeshes in everything we wear. After your body, the next biggest offender is very likely to be your hair. There are many, often very well-known, makes of hair that have every lock of hair as a separate mesh.

We need proper, trusted guidance and tools.

Ultimately, there are choices to be made, and the biggest problem here is not the content; it is the lack of good advice on making that content. Where is the wiki page that explains to creators that every submesh that they make adds overhead? 

This is ultimately not the creators' fault; it comes back to the platform's responsibility, inadequate guidance and enforcement, and incorrect metrics (yes, ARC, I'm looking at you!).

Definitions:

BUT: Body Under Test, the specific configuration of our model that is subject to this test.

FPS: Frames Per Second, how many times per second the screen image is rendered. Too slow, and things are laggy and jittery. People get very wrapped up in how many FPS they should/could get. In reality, it depends on what you are doing; however, you'd like to be able to move about with relative smoothness.

Jitter/noise: These are different terms for essentially the same thing: inaccuracies in the measurements that cannot be corrected. Noise and jitter are variances introduced by things outside of the actual measurement. FPS is a noisy metric; it varies quite wildly from frame to frame, but when we average it over a few frames, it becomes more stable.

Appendix A: Is invisible mesh hurting my FPS?

I mentioned in the last blog that the concerns over invisible mesh were largely over-hyped, mostly due to an optimisation introduced by TPVs courtesy of Niran.

To test this, I set half of the faces of a 192x8 body to be transparent and ran a benchmark. I then ran the same benchmark with a 192x4 body. In theory, they should be practically the same.


Results: 

No: as we had hypothesised, there is no perceivable difference at this level between the two. As noted in the earlier blog, we are just measuring the direct rendering impact. There are other indirect impacts, but for now, they are a lesser concern.

Appendix B: Which is better, Separate meshes or multiple faces?

To test whether there was any clear benefit to breaking a mesh up into multiple faces rather than multiple objects, I ran benchmarks against three models that equated to the same number of submeshes passing through the pipeline: 96x2, 48x4 and 24x8.



Results:

As can be seen, there is no clear benefit. The raw numbers would suggest that the 96x2 was slightly slower. That would be plausible, as there is an expectation of an object having a higher overhead in terms of metadata and other references, but two factors weaken this.
1) The error bars - the variance in the measurements places the numbers close enough for there to be reasonable doubt over any outright difference. 
2) The 24x8 is slower than the 48x4. Once again, well within the observed variance, but it casts further doubt on any argument that there is a significant measurable difference. 

This may be something that I look at again to see if there is a more conclusive way of conducting the experiment. For the purposes of this blog, which is about determining whether construction choices affect overall performance, it is quite clear that it is the number of submeshes, and not their organisation, that is the driver.