Feal87's Blog

Just another programming weblog.

Archive for June, 2009

Profiling your engine

Posted by feal87 on June 20, 2009

A little preface: these days I’ve been busy with exams (I have another one on June 24), so I haven’t had enough time to keep working on the game engine and update this blog with new material at the same time.
This post covers one of the areas I’ve worked on during this little “break”: profiling your game engine. (I’ve also been rebuilding the Visual Novel Reader sub-engine, but that’s a story for another time.)

Before starting, let’s define what profiling is. Profiling is the process of extracting information about how something behaves. In our case, we measure the execution speed of our routines. There are several ways to profile an application:

1) Use a profiler application like YourKit .NET Profiler.
2) Instrument your own code with a hand-written profiler.
3) Use a custom-built performance test framework.

The first way, and the most common one, is to use a profiler application that records the time and number of invocations of every method in your application. This is the first and most important way to profile the CPU-side performance of your application. If you are using C#, remember to profile with the executable launched from outside the development environment; otherwise you won’t get some optimizations from the JIT compiler (like property inlining) and you’ll get the wrong idea about what’s slowing down your application.

The second way (one I personally discourage) is to instrument your own code with checks that record times and invocation counts. It is not flexible and makes your code quite complex.
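To make the second approach concrete, here is a minimal sketch of what such hand-written instrumentation might look like. ManualProfiler and its members are hypothetical names invented for this illustration; they are not part of the engine or of any library.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical helper (not from the engine): accumulates elapsed time and
// invocation counts per label.
public static class ManualProfiler
{
    private class Entry { public long Ticks; public int Calls; }

    private static readonly Dictionary<string, Entry> entries = new Dictionary<string, Entry>();

    // Runs the action and records its elapsed time and one invocation under the label.
    public static void Measure(string label, Action action)
    {
        Stopwatch sw = Stopwatch.StartNew();
        try
        {
            action();
        }
        finally
        {
            sw.Stop();
            Entry e;
            if (!entries.TryGetValue(label, out e))
            {
                e = new Entry();
                entries[label] = e;
            }
            e.Ticks += sw.ElapsedTicks;
            e.Calls++;
        }
    }

    // Number of recorded invocations for the label (0 if never measured).
    public static int CallCount(string label)
    {
        Entry e;
        return entries.TryGetValue(label, out e) ? e.Calls : 0;
    }

    // Total recorded time for the label, converted from Stopwatch ticks to milliseconds.
    public static double TotalMilliseconds(string label)
    {
        Entry e;
        return entries.TryGetValue(label, out e) ? e.Ticks * 1000.0 / Stopwatch.Frequency : 0.0;
    }
}
```

Every routine you want to measure has to be wrapped by hand, for example ManualProfiler.Measure("Update", () => scene.Update()) with scene standing in for your own object, and that is exactly the clutter that makes this approach hard to maintain.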

The third way is to use a custom test framework to test small self-contained pieces of code and compare their speed. This is what I’ve put together over the last few days.
I built this little framework because I had to test several pieces of code and tell which one was faster, and a profiler application was not a very adequate solution for that (these routines are VERY fast and rarely called).

The idea was to have a series of classes, each one focused on testing one piece of functionality. These classes would then be handed to a batch management class that handles the batch execution.

After some hours of work, here’s the result:

Test Framework

The framework is divided into two major areas, the ITest/Test area and the TestBatcher area.
The first area defines the base classes all tests must derive from.
The Test class is composed of:

1 ) Filename – The file in which to save the results.
2 ) NumeroEsecuzioni – The number of executions of each test code inside the Test class.
3 ) Results – Contains all the results obtained from an execution of the tests, in milliseconds.
4 ) TestCodes – A list of delegates containing the pieces of code to be tested against each other.
5 ) Destroy() – Destroy every resource initialized and used by the test.
6 ) ExecuteTests() – Execute all the tests NumeroEsecuzioni times and save the best result obtained for each of them.
7 ) Initialize() – Initialize all the resources needed by the test at runtime.
8 ) WriteResultToFile() – Save the results to a text file.
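The member list above could be sketched roughly as follows. This is only an illustration under assumptions: the member names come from the list, but the bodies (timing with Stopwatch, keeping the minimum, the output format) are my guesses, and the class is named TestSketch to make clear it is not the original Test class.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Globalization;
using System.IO;

// Illustrative stand-in for the Test base class; bodies are assumptions.
public class TestSketch
{
    public string Filename;
    public int NumeroEsecuzioni = 1;
    public List<double> Results = new List<double>();
    public List<Action> TestCodes = new List<Action>();

    // Derived tests override these to create and release their resources.
    public virtual void Initialize() { }
    public virtual void Destroy() { }

    // Runs every test code NumeroEsecuzioni times and keeps the best (lowest)
    // time in milliseconds for each one.
    public void ExecuteTests()
    {
        Results.Clear();
        foreach (Action code in TestCodes)
        {
            double best = double.MaxValue;
            for (int i = 0; i < NumeroEsecuzioni; i++)
            {
                Stopwatch sw = Stopwatch.StartNew();
                code();
                sw.Stop();
                double ms = sw.ElapsedTicks * 1000.0 / Stopwatch.Frequency;
                if (ms < best) best = ms;
            }
            Results.Add(best);
        }
    }

    // Writes one line per test code; the line format is an assumption.
    public void WriteResultToFile()
    {
        using (StreamWriter writer = new StreamWriter(Filename))
        {
            for (int i = 0; i < Results.Count; i++)
            {
                writer.WriteLine("Test {0}: {1} ms", i,
                                 Results[i].ToString(CultureInfo.InvariantCulture));
            }
        }
    }
}
```

Keeping the best time rather than the average is a way to filter out runs disturbed by the scheduler or the garbage collector, which matters when the measured routines take only a few microseconds.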

The second area defines the TestBatcher class, which allows batching the execution of multiple Test classes.
It is composed of:

1 ) AddTest(Test tst) – Add a new test to the batch system
2 ) ExecuteBatch() – Execute all the tests queued in the system
3 ) Initialize() – Initialize the batch system.
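A batcher with these three members could look something like the sketch below. The method names match the list; everything else is an assumption, including the ITestSketch interface (the real ITest members are not shown in this post), the order in which the test methods are called, and the RecordingTest class, which exists only to show that order.

```csharp
using System;
using System.Collections.Generic;

// Assumed minimal contract for a test; the real ITest may differ.
public interface ITestSketch
{
    void Initialize();
    void ExecuteTests();
    void WriteResultToFile();
    void Destroy();
}

// Illustrative stand-in for TestBatcher.
public class TestBatcherSketch
{
    private List<ITestSketch> tests;

    // Prepares an empty queue.
    public void Initialize()
    {
        tests = new List<ITestSketch>();
    }

    // Queues a test for the next batch run.
    public void AddTest(ITestSketch tst)
    {
        if (tests == null) throw new InvalidOperationException("Call Initialize() first.");
        tests.Add(tst);
    }

    // Runs every queued test: set up, measure, save results, then always clean up.
    public void ExecuteBatch()
    {
        if (tests == null) throw new InvalidOperationException("Call Initialize() first.");
        foreach (ITestSketch tst in tests)
        {
            tst.Initialize();
            try
            {
                tst.ExecuteTests();
                tst.WriteResultToFile();
            }
            finally
            {
                tst.Destroy();
            }
        }
        tests.Clear();
    }
}

// Tiny stand-in test that only records the order in which it is called.
public class RecordingTest : ITestSketch
{
    public List<string> Log = new List<string>();
    public void Initialize() { Log.Add("Initialize"); }
    public void ExecuteTests() { Log.Add("ExecuteTests"); }
    public void WriteResultToFile() { Log.Add("WriteResultToFile"); }
    public void Destroy() { Log.Add("Destroy"); }
}
```

Calling Destroy() in a finally block means a test that throws still releases its resources, which is worth having when the tests allocate GPU buffers.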

With this simple framework you can measure how long a given operation takes and pick the fastest method.
Here is an example test class I used to find out which is faster, Map() or UpdateSubresource(), when updating a resource buffer in DirectX 10 with SlimDX:


/*
* Copyright (c) 2009 Ferreri Alessio
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
* in the Software without restriction, including without limitation the rights
* to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the Software is
* furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included in
* all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
* AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
* THE SOFTWARE.
*/

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using SlimDX;
using System.Drawing;
#if DX9
using SlimDX.Direct3D9;
#else
using SlimDX.Direct3D10;
using GameLibrary;
#endif

namespace TestFramework
{
    public class TestUpdateSubresourceVsMap : Test
    {
#if DX10
        private SlimDX.Direct3D10.Buffer BufferToUpdateMap;
        private SlimDX.Direct3D10.Buffer BufferToUpdateUpdateSubresource;
        private Matrix WorldMatrix;
        private Single[] FinalMatrix;
        private byte[] FinalArray;

        private Device device;

        public override void Initialize()
        {
            base.Initialize();

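            device = Program.gc.Manager.Device;

            // Buffer is fully qualified because "using System;" also brings System.Buffer into scope.
            // Dynamic, CPU-writable buffer for the Map()/Unmap() path.
            BufferToUpdateMap = new SlimDX.Direct3D10.Buffer(device, 64,
                                           ResourceUsage.Dynamic, BindFlags.VertexBuffer,
                                           CpuAccessFlags.Write, ResourceOptionFlags.None);
            // Default-usage buffer for the UpdateSubresource() path.
            BufferToUpdateUpdateSubresource = new SlimDX.Direct3D10.Buffer(device, 64,
                                                         ResourceUsage.Default, BindFlags.VertexBuffer,
                                                         CpuAccessFlags.None, ResourceOptionFlags.None);

            WorldMatrix = Matrix.Identity;

            // Store the matrix transposed, as 16 floats (64 bytes).
            FinalMatrix = new Single[16];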
            FinalMatrix[0] = WorldMatrix.M11;
            FinalMatrix[1] = WorldMatrix.M21;
            FinalMatrix[2] = WorldMatrix.M31;
            FinalMatrix[3] = WorldMatrix.M41;
            FinalMatrix[4] = WorldMatrix.M12;
            FinalMatrix[5] = WorldMatrix.M22;
            FinalMatrix[6] = WorldMatrix.M32;
            FinalMatrix[7] = WorldMatrix.M42;
            FinalMatrix[8] = WorldMatrix.M13;
            FinalMatrix[9] = WorldMatrix.M23;
            FinalMatrix[10] = WorldMatrix.M33;
            FinalMatrix[11] = WorldMatrix.M43;
            FinalMatrix[12] = WorldMatrix.M14;
            FinalMatrix[13] = WorldMatrix.M24;
            FinalMatrix[14] = WorldMatrix.M34;
            FinalMatrix[15] = WorldMatrix.M44;

            FinalArray = new byte[64];

            // Variant 1: map the dynamic buffer, write the 16 floats, then unmap.
            TestCodes.Add(new Action(delegate
            {
                DataStream ds = BufferToUpdateMap.Map(MapMode.WriteDiscard, MapFlags.None);
                ds.WriteRange(FinalMatrix, 0, 16);
                BufferToUpdateMap.Unmap();
                ds.Dispose();
            }));

            // Variant 2: pack the floats into a byte array (ByteConverter is presumably
            // a GameLibrary helper) and upload it with UpdateSubresource().
            TestCodes.Add(new Action(delegate
            {
                unsafe
                {
                    ByteConverter.WriteSingleArrayToByte(ref FinalMatrix, ref FinalArray, 0);
                    fixed (byte* arr = FinalArray)
                    {
                        device.UpdateSubresource(arr, 64, 64, BufferToUpdateUpdateSubresource, 0);
                    }
                }
            }));

            NumeroEsecuzioni = 1000;
            Filename = AppDomain.CurrentDomain.BaseDirectory + "ResultsUpdateSubresourceMap.txt";
        }

        public override void Destroy()
        {
            base.Destroy();
            BufferToUpdateMap.Dispose();
            BufferToUpdateUpdateSubresource.Dispose();
        }
#endif
    }
}

By the way, it is faster to use the new UpdateSubresource I’ve added to SlimDX than to use Map/Unmap. Here are the results on my notebook:

0.004050794 milliseconds with Map()/Unmap()
0.003422223 milliseconds with UpdateSubresource()

I hope this post has been useful to you.
See ya 😉


Posted in General | 2 Comments

Remake of ID3DX10Font / ID3DXFont

Posted by feal87 on June 1, 2009

Continuing from where I left off in the previous post, I’ll talk about another necessary remake I’ve done for my game engine. As you probably know, besides the sprite drawing class, the D3DX library contains a text drawing class called ID3DX10Font / ID3DXFont. It is based on GDI and uses the Sprite class to draw the text.

Since I removed the sprite class from my engine references thanks to the SpriteAlternative class, I needed an alternative for the text drawing class too. Another reason for the remake was that, for some unknown reason, the SAME text drawn with the DirectX9 and DirectX10 classes came out quite different in style.
I had a few ways to implement the class:

1) Use GDI/GDI+ to create a texture with all the glyphs needed by the text I’m writing, and draw the characters one by one using different texture coordinates.
2) Use pre-created bitmap fonts.
3) Use GDI+ to create textures containing the whole text, with a cache system to avoid needlessly recreating the same texture.

The first method is the same one used by the basic class offered by D3DX. While it renders quickly, it uses a LOT of CPU power (at least on my Core 2 Duo 2.00 GHz) and it is quite intricate (lots of DirectX calls).
The second method is the one most used by games all over the world, but it is not very flexible. Having a different bitmap font for each size/style/font combination is fine if you are creating one particular game and stopping there, but for an engine it is better to offer flexibility, sometimes even at the cost of speed.
The third method, the one I chose, is VERY good for static text and still works well for changing text. On top of that, it has very low CPU usage (the graphics are generated once and from then on simply drawn like any other texture with the SpriteAlternative class).

After lots of brainstorming and work the FontAlternative class was born.

Font Alternative Class Diagram

I’ve created two classes: the first, CachedText, holds the details of a text cached in the system; the second, FontAlternative, is the actual font class.
Let’s analyze the members of the two classes:

CachedText :

1) Colore – Color of the text.
2) Dimensioni – Size of the rectangle I’m writing into.
3) Font – Font used for the drawing. (System.Drawing.Font)
4) Text – Text to draw.
5) Texture – Texture ready to be drawn.

FontAlternative :

1) CacheTesti – A dynamic cache containing the texts already rendered by the system and ready to be drawn again.
2) ClearCache – Clear the text cache.
3) MeasureString – Return the size of the rectangle needed to draw a string with a certain font.
4) UnloadContent – Clear the cache and dispose of the various resources used. (It is executed in the various situations where the device is lost, reset, etc.)
5) Draw – Search for a text inside the cache. If it is not found, create a new cache entry and then draw it. If it is found, just draw it.
6) DrawCached – Draw a given cached text directly.
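The Draw / DrawCached behaviour described above can be sketched as follows. This is an illustration only, under several assumptions: the class names carry a Sketch suffix because they are not the real classes, FakeTexture stands in for the Direct3D texture, the font is reduced to a name and a size to avoid a System.Drawing dependency, and the linear search over a list is a guess at how the cache is looked up.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a GPU texture.
public class FakeTexture : IDisposable
{
    public bool Disposed;
    public void Dispose() { Disposed = true; }
}

// Illustrative stand-in for CachedText.
public class CachedTextSketch
{
    public string Text;
    public string FontName;   // the real class stores a System.Drawing.Font
    public float FontSize;
    public int Colore;        // color packed as ARGB
    public int Width, Height; // Dimensioni
    public FakeTexture Texture;
}

// Illustrative stand-in for the caching part of FontAlternative.
public class FontCacheSketch
{
    private readonly List<CachedTextSketch> cacheTesti = new List<CachedTextSketch>();

    public int RenderCount; // how many textures were actually generated
    public int DrawCount;   // how many times a cached texture was drawn

    // Looks the text up in the cache, renders it only on a miss, draws it,
    // and returns the entry so the caller can keep it for DrawCached().
    public CachedTextSketch Draw(string text, string fontName, float fontSize,
                                 int colore, int width, int height)
    {
        CachedTextSketch entry = cacheTesti.Find(c =>
            c.Text == text && c.FontName == fontName && c.FontSize == fontSize &&
            c.Colore == colore && c.Width == width && c.Height == height);

        if (entry == null)
        {
            entry = new CachedTextSketch();
            entry.Text = text;
            entry.FontName = fontName;
            entry.FontSize = fontSize;
            entry.Colore = colore;
            entry.Width = width;
            entry.Height = height;
            entry.Texture = Render();
            cacheTesti.Add(entry);
        }

        DrawCached(entry);
        return entry;
    }

    // The real class would submit entry.Texture as a textured quad here.
    public void DrawCached(CachedTextSketch entry)
    {
        DrawCount++;
    }

    // Disposes every cached texture and empties the cache.
    public void ClearCache()
    {
        foreach (CachedTextSketch c in cacheTesti) c.Texture.Dispose();
        cacheTesti.Clear();
    }

    // Placeholder for the expensive GDI+ rendering step.
    private FakeTexture Render()
    {
        RenderCount++;
        return new FakeTexture();
    }
}
```

The point of returning the entry from Draw() is that a caller drawing the same text every frame can skip the lookup entirely and go straight to DrawCached().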

Font Alternative Usage

Using the new class is pretty simple:

1) First we start the SpriteAlternative engine, which this class is based on.
2) Check whether we already hold a reference to the cached text.
3a) If not, call FontAlternative.Draw(), which searches its cache for the text and, if it is not found, creates a new cache entry and returns it.
4a) Save the cache reference returned by FontAlternative.Draw().
3b) If so, draw with FontAlternative.DrawCached(), which draws the texture directly.
5) Close the SpriteAlternative engine.

The results of this change? Well, while the speed has not changed that much (a 0.070 ms gain on a 3.5 ms application), the CPU gain was massive (over 15% less CPU power used by the same app) and now I have full control over my application. (The only thing I still use from the D3DX default class library is the effects framework, and I have plans for that :D)
I suggest that anyone starting to create an engine have a shot at these remakes, because they will save you lots of headaches later on.

I still don’t know what we’ll talk about next time, but I hope you’ll look forward to it anyway.
See ya 😉

Posted in General | 12 Comments