Feal87's Blog

Just another programming weblog.

Archive for the ‘General’ Category

Data Synchronization between Tasks

Posted by feal87 on November 4, 2009

A little preface: the exam went very well (30/30), but some little “problems” over the last few months kept me from posting on this blog. 😛

Now let’s get to the topic at hand. One of the most important things I did while making my engine multithreaded was to define a way for multiple threads to synchronize their data efficiently and in a thread-safe manner. Be sure to read the previous post on the multithreaded engine before starting here.

Before discussing the possible ways to solve this problem, let’s ask ourselves:

“Why should tasks have data in common?”

Well, in the common scenario of a game we have an AI task, an update task and a physics task (excluding the obvious draw, audio, etc…). Each of these three tasks needs to pass its results to the others. In this particular case the AI task gives data to the physics task, which in turn gives data to the update task.

There are several ways to achieve this. The two most common are:

1) Using synchronization constructs to make sure that only one thread at a time accesses the resource being updated.
2) Synchronizing the data at the end of the frame on the engine thread.

We’ll skip the first way altogether, because the continuous synchronization makes it really inefficient and it has lots of other issues, and go straight to the second way. Here is an image explaining how it works:

Frame Synchronization

As you can see, the idea is to let the threads work in parallel and exchange their data only at the end of the frame. This way they do not need to wait for each other to finish and can use all the resources they want. With an accurate division into tasks you can maximize the use of resources without creating over-complex structures.

Please remember that if the chain of dependent tasks is very long, you can get a noticeable delay between action and reaction in the game. In this example we have a chain of 3 threads; assuming they are all synced at 60 FPS, it takes 3/60 of a second (50 milliseconds) before an action in the update causes a reaction in the draw.
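As a quick check of that arithmetic, here is a minimal C# sketch. The helper name PipelineLatencyMs is made up for this example and is not part of the engine; it simply assumes that every link in the chain costs one full frame.

```csharp
using System;

public static class LatencySketch
{
    // Worst-case delay, in milliseconds, for data to cross a chain of tasks
    // that exchange their results only once per frame (assumption: one frame per link).
    public static double PipelineLatencyMs(int chainLength, double framesPerSecond)
    {
        return chainLength * 1000.0 / framesPerSecond;
    }
}
```

With a chain of 3 tasks at 60 FPS, LatencySketch.PipelineLatencyMs(3, 60) evaluates to 50 milliseconds.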

How can we actually create this in an application?

Let’s look at a piece of my old implementation. (I’m currently working on making the implementation more extensible and faster, but the underlying concepts are the same, so it doesn’t matter for us :))

Synchronization Manager

Synchronization Types

My solution works by having a Synchronization Manager for each engine.

A Synchronization Manager is an entity that manages what we can call an “Event List”, a list of events that NEED to be synchronized (like the reaction of a ball to the wind in a physics system), and that actually performs the synchronization at the end of the frame in each engine.

If a task needs to share a resource, it creates an event and gets an ID that identifies that event. The task can then update the resource using the appropriate method of the Synchronization Manager.

The other tasks can subscribe to that event and receive, at the end of each frame, any modification made to that resource (if there was one).
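To make the create / update / subscribe cycle more concrete, here is a minimal sketch of such a manager. It is not my actual implementation: the class and method names (SynchronizationManagerSketch, CreateEvent, Subscribe, Update, Synchronize) are hypothetical, values are passed as plain objects, and the sketch assumes that Synchronize() is called on the engine thread once all the tasks of the frame have finished.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: none of these names come from the real engine.
public class SynchronizationManagerSketch
{
    private readonly object gate = new object();
    private int nextId;

    // Handlers registered for each event ID.
    private readonly Dictionary<int, List<Action<object>>> subscribers =
        new Dictionary<int, List<Action<object>>>();

    // Latest value posted for each event during the current frame.
    private readonly Dictionary<int, object> pending = new Dictionary<int, object>();

    // A task that wants to share a resource creates an event and keeps its ID.
    public int CreateEvent()
    {
        lock (gate)
        {
            int id = nextId++;
            subscribers[id] = new List<Action<object>>();
            return id;
        }
    }

    // Other tasks subscribe to be told when the resource changes.
    public void Subscribe(int eventId, Action<object> handler)
    {
        lock (gate) { subscribers[eventId].Add(handler); }
    }

    // Called from any task during the frame; it only records the change.
    public void Update(int eventId, object value)
    {
        lock (gate) { pending[eventId] = value; }
    }

    // Called once at the end of the frame: delivers every recorded change.
    public void Synchronize()
    {
        var deliveries = new List<KeyValuePair<Action<object>, object>>();
        lock (gate)
        {
            foreach (KeyValuePair<int, object> change in pending)
                foreach (Action<object> handler in subscribers[change.Key])
                    deliveries.Add(new KeyValuePair<Action<object>, object>(handler, change.Value));
            pending.Clear();
        }
        foreach (KeyValuePair<Action<object>, object> delivery in deliveries)
            delivery.Key(delivery.Value);
    }
}
```

The only lock taken during the frame protects the short bookkeeping in Update(); the subscribers themselves never touch the shared resource until the end-of-frame delivery.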

Simple, isn’t it? 😉

I hope this reading has been useful to you,
That’s all from me for today,
See you soon with another article. 😉 

P.S. I think I’ll write a detailed article on the big overhaul of the content management I did for my engine some weeks ago. I think it will be an interesting read for all of you.


Profiling your engine

Posted by feal87 on June 20, 2009

A little preface. These days I’ve been quite busy with exams (I have another exam on the 24th of June) and I haven’t had enough time to keep working on the game engine while also updating this blog with new material.
This post is about one of the areas I’ve worked on during this little “break”: profiling your game engine. (I’ve also worked on recreating the Visual Novel Reader sub-engine, but that’s a story for another time.)

Before starting, let’s define what profiling is. Profiling is the process of gathering information about how a program behaves while it runs. In our case we measure the execution speed of our routines. There are quite a few ways to profile an application:

1) Use a profiler application like YourKit .NET Profiler.
2) Instrument your own code with hand-written profiling checks.
3) Use a custom-made performance test framework.

The first way, and the most common one, is to use a profiler application that records the time spent in, and the number of invocations of, every method in your whole application. This is the first and most important way to profile the CPU-side performance of your application. If you are using C#, remember to profile with the executable launched from outside the development environment; otherwise you will miss some JIT compiler optimizations (like property inlining) and get the wrong idea about what is slowing down your application.

The second way (one I personally discourage) is to add your own checks to the code to record times and numbers of invocations. (It is not flexible and will make your code quite complex.)

The third way is to use a custom-made test framework to test small self-contained pieces of code and compare their speed. This is what I’ve prepared over the last few days.
I created this little framework because I had to test a series of code snippets and tell which one was faster, and a profiler application was not a very adequate tool for that (these routines are VERY fast and rarely called).

The idea was to have a series of classes, each one focused on testing one piece of functionality. The classes created this way would be handed to a batch management class that handles their execution.

After some hours of work, here’s the result :

Test Framework


The resulting framework is divided into two major areas, the ITest/Test area and the TestBatcher area.
The first area defines the base classes that all tests must derive from.
The Test class is composed of :

1 ) Filename – The name of the file where the results of the measurements are saved.
2 ) NumeroEsecuzioni – The number of executions of each test code inside the Test class.
3 ) Results – Contains all the results, in milliseconds, obtained from an execution of the tests.
4 ) TestCodes – A list of delegates containing the codes to be tested against each other.
5 ) Destroy() – Destroys every resource initialized and used by the test.
6 ) ExecuteTests() – Executes all the tests NumeroEsecuzioni times and saves the best result obtained for each one of them.
7 ) Initialize() – Initializes all the resources needed by the test at runtime.
8 ) WriteResultToFile() – Saves the results to a text file.

The second area is the area defining the TestBatcher class, a class that allows batching of multiple Test class execution.
It is composed of :

1 ) AddTest(Test tst) – Add a new test to the batch system
2 ) ExecuteBatch() – Execute all the tests queued in the system
3 ) Initialize() – Initialize the batch system.
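To give an idea of how the two areas fit together, here is a stripped-down sketch. The member names follow the lists above, but the bodies are only a guess at a reasonable implementation: the classes are renamed TestSketch and TestBatcherSketch to make that clear, timing is done with System.Diagnostics.Stopwatch, and Filename / WriteResultToFile() are left out.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Sketch only: member names follow the lists above, the bodies are assumptions.
public class TestSketch
{
    public int NumeroEsecuzioni = 1000;                  // runs of each test code
    public List<Action> TestCodes = new List<Action>();  // codes to compare
    public List<double> Results = new List<double>();    // best time of each code, in ms

    public virtual void Initialize() { }
    public virtual void Destroy() { }

    // Runs every test code NumeroEsecuzioni times and keeps the best time of each.
    public void ExecuteTests()
    {
        Results.Clear();
        foreach (Action code in TestCodes)
        {
            double best = double.MaxValue;
            for (int i = 0; i < NumeroEsecuzioni; i++)
            {
                Stopwatch watch = Stopwatch.StartNew();
                code();
                watch.Stop();
                best = Math.Min(best, watch.Elapsed.TotalMilliseconds);
            }
            Results.Add(best);
        }
    }
}

public class TestBatcherSketch
{
    private readonly List<TestSketch> tests = new List<TestSketch>();

    public void AddTest(TestSketch tst) { tests.Add(tst); }

    // Initializes, runs and cleans up each queued test in turn.
    public void ExecuteBatch()
    {
        foreach (TestSketch test in tests)
        {
            test.Initialize();
            try { test.ExecuteTests(); }
            finally { test.Destroy(); }
        }
    }
}
```

Keeping the best time rather than the average is one way to reduce the noise from other processes when the measured routines are very short.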

With this simple framework you can measure how long a given operation takes and pick the fastest method.
This is an example test class I’ve used to find out which is faster, Map() or UpdateSubresource(), for updating a resource buffer in DirectX10 with SlimDX :


/*
* Copyright (c) 2009 Ferreri Alessio
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
* in the Software without restriction, including without limitation the rights
* to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the Software is
* furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included in
* all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
* AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
* THE SOFTWARE.
*/

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using SlimDX;
using System.Drawing;
#if DX9
using SlimDX.Direct3D9;
#else
using SlimDX.Direct3D10;
using GameLibrary;
#endif

namespace TestFramework
{
    public class TestUpdateSubresourceVsMap : Test
    {
#if DX10
        private SlimDX.Direct3D10.Buffer BufferToUpdateMap;
        private SlimDX.Direct3D10.Buffer BufferToUpdateUpdateSubresource;
        private Matrix WorldMatrix;
        private Single[] FinalMatrix;
        private byte[] FinalArray;

        private Device device;

 

        public override void Initialize()
        {
            base.Initialize();

            device = Program.gc.Manager.Device;

            // Buffer is fully qualified because System.Buffer is also in scope.
            BufferToUpdateMap = new SlimDX.Direct3D10.Buffer(device, 64,
                                           ResourceUsage.Dynamic, BindFlags.VertexBuffer,
                                           CpuAccessFlags.Write, ResourceOptionFlags.None);
            BufferToUpdateUpdateSubresource = new SlimDX.Direct3D10.Buffer(device, 64,
                                                         ResourceUsage.Default, BindFlags.VertexBuffer,
                                                         CpuAccessFlags.None, ResourceOptionFlags.None);

            WorldMatrix = Matrix.Identity;

            FinalMatrix = new Single[16];
            FinalMatrix[0] = WorldMatrix.M11;
            FinalMatrix[1] = WorldMatrix.M21;
            FinalMatrix[2] = WorldMatrix.M31;
            FinalMatrix[3] = WorldMatrix.M41;
            FinalMatrix[4] = WorldMatrix.M12;
            FinalMatrix[5] = WorldMatrix.M22;
            FinalMatrix[6] = WorldMatrix.M32;
            FinalMatrix[7] = WorldMatrix.M42;
            FinalMatrix[8] = WorldMatrix.M13;
            FinalMatrix[9] = WorldMatrix.M23;
            FinalMatrix[10] = WorldMatrix.M33;
            FinalMatrix[11] = WorldMatrix.M43;
            FinalMatrix[12] = WorldMatrix.M14;
            FinalMatrix[13] = WorldMatrix.M24;
            FinalMatrix[14] = WorldMatrix.M34;
            FinalMatrix[15] = WorldMatrix.M44;

            FinalArray = new byte[64];

            // Test 1: update the dynamic buffer through Map()/Unmap().
            TestCodes.Add(new Action(delegate
            {
                DataStream ds = BufferToUpdateMap.Map(MapMode.WriteDiscard, MapFlags.None);
                ds.WriteRange(FinalMatrix, 0, 16);
                BufferToUpdateMap.Unmap();
                ds.Dispose();
            }));

            // Test 2: copy the matrix into a byte array and upload it with UpdateSubresource().
            TestCodes.Add(new Action(delegate
            {
                unsafe
                {
                    ByteConverter.WriteSingleArrayToByte(ref FinalMatrix, ref FinalArray, 0);
                    fixed (byte* arr = FinalArray)
                    {
                        device.UpdateSubresource(arr, 64, 64, BufferToUpdateUpdateSubresource, 0);
                    }
                }
            }));

 

            NumeroEsecuzioni = 1000;
            Filename = AppDomain.CurrentDomain.BaseDirectory + "ResultsUpdateSubresourceMap.txt";
        }

        public override void Destroy()
        {
            base.Destroy();
            BufferToUpdateMap.Dispose();
            BufferToUpdateUpdateSubresource.Dispose();
        }
#endif
    }
}

By the way, it is faster to use the new UpdateSubresource I’ve added to SlimDX than to use Map/Unmap. Here are the results on my notebook.

0.004050794 milliseconds with Map()/Unmap()
0.003422223 milliseconds with UpdateSubresource()

I hope this reading has been useful to you,
See ya 😉


Remake of ID3DX10Font / ID3DXFont

Posted by feal87 on June 1, 2009

Continuing from where I left off in the previous post, I’ll talk about another necessary remake I did for my game engine. As you probably know, besides the sprite drawing class, the D3DX library contains a text drawing class called ID3DX10Font / ID3DXFont. It is based on GDI and uses the Sprite class to draw the text.

As I had removed the sprite class from my engine references thanks to the SpriteAlternative class, I needed an alternative for the text drawing class too. Another reason for the remake was that, for some unknown reason, the very same text drawn with the DirectX9 and DirectX10 classes looked quite different in style.
I had a few ways to implement the class :

1) Using GDI/GDI+ to create a texture with all the glyphs needed by the text I’m writing, and drawing the characters one by one using different texture coordinates.
2) Using pre-created bitmap fonts.
3) Using GDI+ to create a texture with the whole text on it, plus a cache system to avoid needlessly recreating the same texture.

The first method is the same one used by the basic class offered by D3DX. While it is quite fast, it uses a LOT of CPU power (at least on my Core 2 Duo 2.00 GHz) and it is quite intricate (lots of DirectX calls).
The second method is the one most used by games all over the world, but it is not very flexible. Having a different bitmap font for each size/style/font is fine if you are creating one particular game and stopping there, but for an engine it is better to offer flexibility, sometimes even at the cost of speed.
The third method, the one I chose, is VERY good for static text and still works well for changing text. On top of that, this method has very low CPU usage. (The graphics are generated once and from then on simply drawn like any other texture with the SpriteAlternative class.)

After lots of brainstorming and work the FontAlternative class was born.

Font Alternative Class Diagram


I’ve created 2 classes. The first, CachedText, contains the details of a text cached in the system; the second, FontAlternative, is the actual font class.
Let’s analyze the members of the two classes :

CachedText :

1) Colore – Color of the text.
2) Dimensioni – Size of the rectangle I’m writing into.
3) Font – Font used for the drawing. (System.Drawing.Font)
4) Text – Text to draw.
5) Texture – Texture ready to be drawn.

FontAlternative :

1) CacheTesti – A dynamic cache containing the texts that have already been rendered by the system and are ready to be drawn again.
2) ClearCache – Clears the text cache.
3) MeasureString – Returns the size of the rectangle needed to draw a string with a certain font.
4) UnloadContent – Clears the cache and disposes of the various resources used. (It is executed in the various situations where the device is lost, reset, etc…)
5) Draw – Searches for a text inside the cache. If it is not found, it creates a new cache entry and then draws it. If it is found, it just draws it.
6) DrawCached – Draws a given cached text directly.
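The lookup-or-create behaviour of Draw can be sketched as follows. This is a simplified illustration, not the real class: the types are renamed CachedTextSketch and FontAlternativeSketch, the font is reduced to a name and a size instead of a System.Drawing.Font, Dimensioni is left out, the texture is a plain object, and the expensive GDI+ rendering step is replaced by a delegate (called rasterize here) supplied by the caller.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins: the real classes hold a System.Drawing.Font and a GPU texture.
public class CachedTextSketch
{
    public string Text;
    public string FontName;
    public float FontSize;
    public int Colore;      // packed color value
    public object Texture;  // placeholder for the rendered texture
}

public class FontAlternativeSketch
{
    private readonly Dictionary<string, CachedTextSketch> cacheTesti =
        new Dictionary<string, CachedTextSketch>();
    private readonly Func<CachedTextSketch, object> rasterize;

    // 'rasterize' stands in for the expensive step that renders the text to a texture.
    public FontAlternativeSketch(Func<CachedTextSketch, object> rasterize)
    {
        this.rasterize = rasterize;
    }

    // Looks the text up in the cache and creates the texture only on a miss.
    public CachedTextSketch Draw(string text, string fontName, float fontSize, int colore)
    {
        string key = fontName + "|" + fontSize + "|" + colore + "|" + text;
        CachedTextSketch entry;
        if (!cacheTesti.TryGetValue(key, out entry))
        {
            entry = new CachedTextSketch
            {
                Text = text, FontName = fontName, FontSize = fontSize, Colore = colore
            };
            entry.Texture = rasterize(entry);
            cacheTesti[key] = entry;
        }
        DrawCached(entry);
        return entry;
    }

    // Draws an already cached text; a real version would submit a textured quad here.
    public void DrawCached(CachedTextSketch entry) { }

    public void ClearCache() { cacheTesti.Clear(); }
}
```

A caller that keeps the returned entry can skip the lookup on later frames and call DrawCached directly.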

Font Alternative Usage


Using the new class is pretty simple :

1) First we start the SpriteAlternative engine, which this class is based on.
2) We check whether we already have a reference to the cached text.
3a) If not, we call FontAlternative.Draw(), which searches its cache for the text and, if it is not found, creates a new cache entry and returns it.
4a) We save the cache reference returned by FontAlternative.Draw().
3b) If we do, we draw with FontAlternative.DrawCached(), which draws the texture directly.
5) We close the SpriteAlternative engine.

The results of this change? Well, while the speed did not change that much (a 0.070 ms gain on a 3.5 ms application), the CPU gain was massive (over 15% less CPU used by the same app) and now I have full control over my application. (The only thing I still use from the default D3DX class library is the effects framework, and I have plans for that :D)
I suggest that anyone starting to create an engine have a go at these remakes, because they will save you lots of headaches later on.

I still don’t know what we’ll talk about next time, but I hope you’ll look forward to it anyway.
See ya 😉


Scrapping out the ID3DX10Sprite / ID3DXSprite interface

Posted by feal87 on May 31, 2009

One of the first things I thought would be useful for my engine, and ironically one of the last I actually implemented, is a remake of the default interfaces that the extra DirectX library D3DX provides for drawing sprites. (Basically 2D images on quads in an orthographic projection.)

My reasons for remaking these classes were improving speed and increasing flexibility. While the default classes have the basic functionality to draw 2D images, they lack support for custom shaders and other functionality needed in my domain. Another point that influenced my decision was that the two classes for DirectX9 and DirectX10 had different behaviours and contracts, and I wanted a unified behaviour and contract for my applications.

The first thing I did when I decided to start developing the classes was brainstorming. I suggest that anyone starting a big change in their own code think it through and write down exactly what they want and how they want to build it. I came up with several ideas about how to organize the new classes :

1) Using a dynamic vertex buffer and index buffer, adding new values to them each time a call is performed, then executing several draws with different vertex offsets. (Basically the same thing that the default class does.)
2) Using a dynamic vertex buffer and index buffer plus a special self-managing texture containing all the textures to draw. When the end function is called, it issues a single draw to draw everything.
3) Using a static vertex buffer with a single quad inside (a triangle strip to save memory) and drawing it multiple times with different world coordinates, once for each call made.

I personally thought the first choice was quite inefficient; updating the vertex/index buffers over and over was not my style, so I stopped considering this implementation.
The second choice was quite alluring, but managing a special texture automatically was quite a difficult task and I didn’t know whether there would be any gain in speed. (I think I’ll give it a shot anyway in the near future ;))
The third choice, the one I chose, was the simplest and the most “effective” in my opinion. I only needed to update the world matrix and the texture each time I drew a new quad.

Sprite Alternative Class


The contract offered by the class is quite straightforward. Let’s analyze it (I’ll skip Begin/Draw/End, which I’ll analyze afterwards) :

1) The Buffer property is the static vertex buffer containing the quad. It has the protected modifier to allow derived classes to modify it.
2) The ProjectionMatrix and ViewMatrix properties allow the user to choose which transforms to use for a particular rendering.
3) The Dispose method releases all the resources used by the class. (Basically the VertexBuffer and the Effect.)
4) The LoadContent and UnloadContent methods load and unload the resources used by the class in the various circumstances. (Lost devices, resets, etc…)
5) The constructor.

Sprite Alternative Process


Using the new class is quite straightforward.

1) The call to SpriteAlternative.Begin() initializes all the states/effects needed for the draws to occur.
2) The calls to SpriteAlternative.Draw() change the WorldMatrix and the Texture and actually make the draw call.
3) The call to SpriteAlternative.End() closes the effect. (This is only needed in DX9; in DX10 it does nothing.)
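Here is a rough sketch of that Begin/Draw/End flow with the GPU parts stubbed out. Everything in it is an assumption made for illustration: the class is called SpriteAlternativeSketch, the quad is assumed to be a unit quad spanning (0,0) to (1,1), the texture is just a name, draw calls are recorded in a list instead of being sent to the device, and System.Numerics.Matrix4x4 is used in place of the SlimDX matrix type.

```csharp
using System;
using System.Collections.Generic;
using System.Numerics;

// Sketch under assumptions: a unit quad lives in a static vertex buffer,
// and each Draw only changes the world matrix and the texture.
public class SpriteAlternativeSketch
{
    public struct DrawCall
    {
        public string Texture;
        public Matrix4x4 World;
    }

    private bool begun;
    public readonly List<DrawCall> Submitted = new List<DrawCall>();

    public void Begin()
    {
        // A real implementation would set the render states and start the effect here.
        begun = true;
        Submitted.Clear();
    }

    public void Draw(string texture, float x, float y, float width, float height)
    {
        if (!begun) throw new InvalidOperationException("Call Begin() first.");

        // Scale the unit quad to the sprite size, then move it into place.
        Matrix4x4 world = Matrix4x4.CreateScale(width, height, 1f)
                        * Matrix4x4.CreateTranslation(x, y, 0f);

        // Stand-in for: set the world matrix, bind the texture, issue the draw call.
        Submitted.Add(new DrawCall { Texture = texture, World = world });
    }

    public void End()
    {
        // A real DirectX9 implementation would close the effect here.
        begun = false;
    }
}
```

Because the vertex buffer never changes, the per-sprite cost is limited to building one matrix and switching one texture.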

While I did not gain a lot of speed in the process (almost 0.779 ms per frame on a 3.6 ms application, and 10% less CPU utilization on the same application), I did learn a lot from this experience, and now I have a solid base on which to build various customizations (deriving from the SpriteAlternative class and providing custom shaders and other features…).
I suggest that anyone reading this article have a go at recreating their own Sprite class; it is a nice experience.

I hope this has been a helpful read for you all.
The next topic I’ll talk about is the remake of the ID3DX10Font/ID3DXFont classes.

See you later. 😉


Game Loop – A little story

Posted by feal87 on May 29, 2009

When I look back at the almost three months spent on the game engine I’ve developed, I am reminded of the various problems and trials I had to solve at each stage. One of the most interesting things I think I’ve worked on was the design of the game loop.
The design was “quick and dirty” at first and slowly evolved into something more advanced. Let’s go back and see the steps that took the game engine to what it is now and, while we are at it, look at the various types of game loop I’ve used. (I will NOT delve into the depths of programming; this is meant as a simple theoretical read.)

When I first started tinkering with SlimDX, I used a slightly modified version of their Sample Framework. It is basically a simple framework that imitates the behaviour of XNA. Their game loop was a very simple single-threaded one.
What I didn’t like about it was that it relied on the Application_Idle event to trigger the drawing of the frame, which made the engine quite unreliable and generated useless extra work for the Game Clock. I quickly reworked it to look like a classical game loop from the old days…

Simple Game Loop

Simple Framework Gameloop

The game loop repeated until the game ended its execution, updating the game logic and drawing frames as fast as it could (if not blocked by VSync).
While this kind of game loop was OK for demos and simple apps, it was quite inefficient for games that needed correct timing and performance.
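In code, a loop of this kind boils down to something like the following sketch. The class name SimpleGameLoopSketch and its delegate properties are invented for the example, and the order of the steps is my reading of the description above rather than a copy of the original code.

```csharp
using System;
using System.Diagnostics;

// Hypothetical single-threaded loop: process messages, update, draw, repeat.
public class SimpleGameLoopSketch
{
    public Action ProcessWindowMessages = () => { };
    public Action<double> Update = elapsedSeconds => { };
    public Action Draw = () => { };
    public Func<bool> ShouldExit = () => true;

    // Runs until ShouldExit returns true and reports how many frames were produced.
    public int Run()
    {
        int frames = 0;
        Stopwatch clock = Stopwatch.StartNew();
        double last = 0.0;
        do
        {
            ProcessWindowMessages();

            double now = clock.Elapsed.TotalSeconds;
            Update(now - last);   // game logic, given the seconds elapsed since the last frame
            last = now;

            Draw();               // render as fast as possible (VSync may block here)
            frames++;
        } while (!ShouldExit());
        return frames;
    }
}
```

Since update and draw share one thread, a slow frame delays the game logic as well, which is exactly the timing problem mentioned above.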

However, I stuck with this kind of game loop for quite a while, until one day I received a mail from the Intel Visual Adrenaline newsletter containing a paper on multithreaded game engines.
I thought it would be nice to implement a multithreaded game engine, especially in these days of multicore CPUs. (Although I have not followed any of the guidelines of the paper :D)
I started working on the multithreaded game engine and after a day or two…

The new game loop contained several blocks. Let’s examine them all :

Multithreaded Game Loop


1 ) Just like the old game loop, the first thing it does is process all the window messages still in the queue.
2 ) Check if the engine is still active and, if not, just sleep for 20 ms. This part of the engine was necessary because I felt that using CPU cycles while the application was minimized or paused was quite pointless.
3 ) The third step is “Reset the devices if needed“. This is a needed feature in case the user tries to change resolution or the device is lost. (As we know, the Reset must occur in the same thread as the message pump and the creation of the device.)
4 ) Update the engine clock to have correct timing inside the game.
5 ) The “Process all internal messages” phase is a particular one. I thought some operations would need to be executed serially, so I made it possible to queue delegates in a private message pump, to be executed at the next frame.
6 ) Finally we are at the most important step of the game loop, “Schedule all the tasks available for the engine“.
Before examining this step, let’s see what a task is. A task in my engine is a class that carries out a particular job (be it drawing the frame, updating the game logic, calculating physics, etc…) in its own thread, at a certain rate expressed in ticks. For example, we could have a Draw task that runs as fast as it can and an Update task that runs at most 60 times per second. At this phase the engine contained 3 basic always-active tasks: the Draw Task, the Update Task and the Audio Task.
As you can now imagine, this step schedules, inside a dedicated Task manager class, all the default tasks and all the tasks defined for the specific application. (For example, a user can create a Physics task if their app needs one.)
7 ) In the seventh step we “wait for all the tasks to be completed“. (This is called lock-step mode in a game engine.)
8 ) Synchronize data between tasks. In this step the Synchronization manager sends notices of changed data to the various subscriber tasks.
9 ) Finally we get to the last step, where we check whether the game has ended or not.
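Steps 4 and 6 to 8 can be sketched like this. It is only an illustration of the lock-step idea: the names EngineTaskSketch and LockStepLoopSketch are hypothetical, the rate limit is expressed as a minimum interval instead of ticks, and the work is dispatched with Task.Run from System.Threading.Tasks rather than on dedicated per-task threads as in my engine.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;

// Hypothetical sketch of one lock-step frame; not the engine's real classes.
public class EngineTaskSketch
{
    public Action Work;           // what the task does (draw, update, physics...)
    public TimeSpan MinInterval;  // TimeSpan.Zero means "run every frame"
    public TimeSpan? LastRun;     // null until the task has run once
}

public class LockStepLoopSketch
{
    public readonly List<EngineTaskSketch> Tasks = new List<EngineTaskSketch>();
    public Action SynchronizeData = () => { };
    private readonly Stopwatch clock = Stopwatch.StartNew();

    public void RunFrame()
    {
        TimeSpan now = clock.Elapsed;              // step 4: update the engine clock

        var running = new List<Task>();
        foreach (EngineTaskSketch task in Tasks)   // step 6: schedule the tasks that are due
        {
            bool due = !task.LastRun.HasValue
                       || now - task.LastRun.Value >= task.MinInterval;
            if (!due) continue;
            task.LastRun = now;
            running.Add(Task.Run(task.Work));
        }

        Task.WaitAll(running.ToArray());           // step 7: wait for all of them (lock-step)
        SynchronizeData();                         // step 8: exchange data on the engine thread
    }
}
```

Because SynchronizeData runs only after Task.WaitAll returns, no task is touching its data while the changes are being handed over.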

As you can see, this kind of game loop was quite a step forward: it gave control over timing and performance and, as a side benefit, allowed the game to scale easily over multiple processors.

One problem remained in my mind… what if the application needs more than one device? (Like a level editor with a character editor inside it, in another window.)
There were two ways to resolve the issue :

1) Use multiple swapchains on the same device.
2) Reorganize the game loop to support multiple instances of the game engine.

I went for the latter and started reorganizing the game loop to handle this new situation, finally arriving at the current form of the game loop :

Final form of the Game Loop


What’s changed here? Well, we basically abstracted the game engine by creating an engines manager that allows multiple engine instances to run in lock-step mode. The rest of the game loop is basically the same.
Here’s an image of an example application running two engines at the same time with the final engine :
Multi Engine Example


I hope this has been a useful read. I still haven’t decided what the next post will be about, but it will hopefully be about coding practices, so look forward to it.

See you later. 😉


Walkway to a MultiAPI Engine (Automatic Build System)

Posted by feal87 on May 28, 2009

When I decided that the game engine I was about to create would support multiple APIs, I had a few ideas about how to structure the game engine and its build management, all with their pros and cons. This post analyzes the possible choices I had, without showing actual code, examining the ideas behind them and their reasoning.

While thinking it over, I identified three possible ways to keep the game engine and its build management simple :

1) Have a single class with one code path for each API.
2) Have multiple classes, one for each API. (Device9, Device10, DeviceGL, Texture9, Texture10, TextureGL, etc…)
3) Have a single class with preprocessor directives for conditional compilation. (#define DX9, #if DX9…#else…#endif, etc…)

The first one would allow using and switching between multiple APIs even from inside the game, but it adds useless checks to every operation just to know which API we are using. Imagine an if…then…else on each operation: it would add a very big and pointless overhead to the game engine.
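As an illustration of that first approach, a function of such an engine could look like the sketch below. The enum and class names are made up, and the strings returned stand in for the real calls to each API.

```csharp
using System;

public enum GraphicsApiSketch { DirectX9, DirectX10, OpenGL }

// Type 1: a single class with a runtime branch in every operation.
public class DeviceType1Sketch
{
    public GraphicsApiSketch Api;

    public string Clear()
    {
        // This check is repeated in every method, on every call.
        switch (Api)
        {
            case GraphicsApiSketch.DirectX9:
                return "clear via the DirectX9 device";
            case GraphicsApiSketch.DirectX10:
                return "clear via the DirectX10 device";
            default:
                return "clear via OpenGL";
        }
    }
}
```

The API can be switched while the program is running simply by changing the Api field, which is the one advantage of this layout.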

Example of a Type 1 Engine Function


The second one has no overhead from useless checks, but it loses the real meaning of a transparent MultiAPI engine, because I would have to write separate code for each API in the various games, while the objective is to write as little code as possible.

The third one, the one I chose, was perfect for a transparent game engine. With conditional compilation I would have a slim DLL for each of the APIs, and all these DLLs would have identical behaviour, making them interchangeable.
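The same kind of function written for the third approach could look like this sketch. The class name is invented and the strings again stand in for the real API calls; the DX9 symbol is assumed to be defined in the build configuration (for example with the compiler option -define:DX9), and when it is not defined the DirectX10 branch is compiled instead.

```csharp
using System;

// Type 3: the same class for every API, with the code path chosen at compile time.
public class DeviceType3Sketch
{
    public string Clear()
    {
#if DX9
        return "clear via the DirectX9 device";
#else
        return "clear via the DirectX10 device";
#endif
    }
}
```

Only one branch ends up in the compiled DLL, so there is no per-call check left at runtime.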
However, I ran into some problems while designing the solution.
The question was: how do I actually use different DLLs from inside C# in a simple and straightforward way? How do I distribute all these different DLLs and executables with an automatic system?
After a bit of tinkering I found the solution I’m still using today.

Engine example solution

Engine Type 3 Example solution

Basically, the Visual Studio solution is divided into multiple projects and each project has multiple build configurations, one for each API. Each of these configurations defines a preprocessor symbol to identify what needs to be compiled. When the game is built, Visual Studio runs the PostBuild event, which merges the compiled game engine library (depending on the configuration) with the game executable and copies/renames the generated executable into the bin directory.
This has been possible thanks to a very nice tool created by Microsoft Research called ILMerge. ILMerge is a utility that can be used to merge multiple .NET assemblies into a single assembly.

Here is an example PostBuild script that copies the SlimDX DLL and uses ILMerge to create a single assembly with everything in it. (Only for Release DX9, but adding the others is just a matter of copy&paste.)

mkdir .\Build\$(ProjectName)\
copy .\Library\SlimDX.dll .\Build\$(ProjectName)
IF "$(ConfigurationName)" == "Release DirectX9" (
   .\BuildTools\ILMerge.exe /out:.\Build\$(ProjectName)\$(ProjectName)DX9.exe "$(TargetDir)$(ProjectName).exe" "$(TargetDir)gamelibrary.dll"
   del .\Build\$(ProjectName)\$(ProjectName)DX9.pdb
)

This way, using the example solution I posted above, the compilation would produce two executables, PongDX9.exe and PongDX10.exe, each one containing everything it needs to run, game engine DLL included, inside the executable. (Except the game assets, obviously.)

I hope I have covered enough ground to give you a general idea of how a MultiAPI engine can be managed.
The next article will be more technical. We’ll talk about the structure of the game engine I’m developing, its game loop and lots more.

See you later. 😉


Interoperability between DirectX9 and DirectX10

Posted by feal87 on May 27, 2009

One of the first things I looked into when I was starting to write my game engine was what the current APIs had to offer and whether a MultiAPI engine was a reasonable or useful idea. (I’ll make a separate post to explain how I designed the engine.)

After some searching, I decided that for compatibility’s sake I had to support not only the latest bleeding-edge technology, DirectX10, but also DirectX9. This is because DirectX10, for several reasons, was not available on Windows XP systems.

Well, the die was cast and development began at a steady rate. After a week or two I stumbled on a particular situation that I want to share with everyone, hoping it may help someone who is going through the same problem.

At that time I was developing a test game, the classic 2D remake of Arkanoid that I think everyone does as a first test project. While the main engine was working, I had to choose what to draw on the screen and how. The classic Arkanoid is basically divided into 3 distinct elements: the player, the ball and the bricks.

Pong2D Screenshot

While the first two posed no problem and could be drawn on their own, the bricks were trickier.
DirectX is an API built to take advantage of large batches of objects drawn together with a single Draw call. Making 120 different draw calls for the bricks (the level area is 15 bricks wide and 8 bricks high) was overkill and would have hurt the performance of an otherwise very simple game. The idea I came up with was to have one big texture containing all the bricks and to update the texture each time the status of the bricks changed (obviously updating only the part that changed).

Well, the surprise (and the reason for this post) came after developing everything and starting the test phase.
While in DirectX9 mode the game worked fine, in DirectX10 mode something strange occurred. After the first update of the texture, the color of the bricks changed completely while their structure stayed the same… (The blue became red, along with some other changes.)

After some debugging I found out that DirectX10 did NOT support the raw texture format I was using in DirectX9. While DirectX9 used an ARGB format, DirectX10 (or rather, DXGI) only supported RGBA. Also, the System.Drawing.Bitmap class (based on GDI+) that I was using to update the texture was no longer any good, as it only supports ARGB too.

How could I get around this problem? There were three ways :

1) Convert (swizzle) the data on the CPU before launching the update of the texture, thereby keeping all the old code.
2) Convert (swizzle) the data on the GPU, via a pixel shader, before drawing.
3) Create a new class that imports and manages the textures differently for DirectX9 and DirectX10 while presenting a common face to the framework.

While the first idea was quick to develop, it was tremendously SLOW (updating a 720×400 image took 5 ms using unsafe pointers and over 12 ms with marshalling), so it was discarded right away.
The second idea was discarded right away too, because at that time I was still using the default Sprite interface, which does not allow custom shaders, instead of a textured-quad solution. (I’ll come back to this in another post.)
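For reference, the CPU conversion from the first idea amounts to swapping two bytes in every pixel. The sketch below is a plain managed version written for this explanation, with a made-up name (SwapRedAndBlue); it is neither the unsafe-pointer nor the marshalling code I timed. It assumes 4 bytes per pixel stored as blue, green, red, alpha, which is how GDI+ lays out its 32-bit ARGB format in memory.

```csharp
using System;

public static class ImageFunctionsSketch
{
    // Swaps the red and blue bytes of every 4-byte pixel in place.
    // Pixels stored as B,G,R,A come out as R,G,B,A (and vice versa).
    public static void SwapRedAndBlue(byte[] pixels)
    {
        if (pixels.Length % 4 != 0)
            throw new ArgumentException("Expected 4 bytes per pixel.", "pixels");

        for (int i = 0; i < pixels.Length; i += 4)
        {
            byte blue = pixels[i];
            pixels[i] = pixels[i + 2];
            pixels[i + 2] = blue;
        }
    }
}
```

This also explains the symptom described earlier: without the swap, the blue and red channels trade places while green and alpha stay where they are.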

Left with no other choice, I implemented the third option, and the BitmapAlternative class was born.

BitmapAlternative class diagram

The BitmapAlternative class is basically a class that allows drawing and managing images in either ARGB or RGBA format. The main point of making this class was to offer a single abstract way to get the texture data and update it safely, and I think it is the best solution for this kind of problem.

Here is a short peek at the actual code of the class (only the constructor and the members)

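using System;
using System.Drawing;

public class BitmapAlternative
{
    public Int32 Width { get; set; }
    public Int32 Height { get; set; }
    // Bytes per row: 4 bytes per pixel.
    public Int32 Stride { get { return Width * 4; } }
    // Raw pixel data, in the channel order expected by the target API.
    public byte[] Data;

    public BitmapAlternative(Bitmap source)
    {
        Width = source.Width;
        Height = source.Height;
#if DX10
        // DirectX10 build: reorder the channels when the bitmap is imported.
        Data = ImageFunctions.ARGB32ToABGR32(source);
#else
        // DirectX9 build: the bitmap bytes are used as they are.
        Data = ImageFunctions.GetBytes(source);
#endif
    }
}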

The moral of this story? Well, there is no moral. 😀
I just wanted to tell a story and hopefully save some people the time spent figuring out and trying various ways to make DirectX9 and DirectX10 interoperate when using textures.

Ferreri Alessio

P.S. I heard that the upcoming DXGI 1.1 will add support in DirectX10 for the plain old DirectX9 texture format. Maybe they realized their mistake? 😉
