Ints Vs. BitVector32: Speed Showdown

So I've been playing around the past few days with writing a simple 3D game. I can't tell you much about . . . sorry. I started off in Java using OpenGL. Things went fairly well considering 3D engines are not my strong suit and it's a lot of learning. (And yes, I looked at JME, but decided against it.) Anyway, I decided to try my hand at XNA, which meant porting what I had from Java to C#. (I have this delusion of releasing it as an indie game one day on the 360.) While porting I ran across some code where I used packed ints.

By this I mean storing five fields in a single 32-bit int:

int d = ints[i];
int b1 = (d >> 29) & 0x7; // 3 bits (mask needed: >> sign-extends when the top bit is set)
int b2 = (d >> 24) & 0x1F; // 5 bits
int b3 = (d >> 16) & 0xFF; // 8 bits
int b4 = (d >> 8) & 0xFF; // 8 bits
int b5 = d & 0xFF; // 8 bits
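The reverse direction is just shifts and ORs. A minimal round-trip sketch (`Packed.Pack` is a hypothetical helper name; the layout is the 3/5/8/8/8 split above):

```csharp
static class Packed
{
    // Pack five fields into one 32-bit int: 3 + 5 + 8 + 8 + 8 bits, high to low.
    public static int Pack(int b1, int b2, int b3, int b4, int b5)
    {
        return b1 << 29 | b2 << 24 | b3 << 16 | b4 << 8 | b5;
    }
}
```

Packing with plain `|` only works if each field already fits its width; mask the inputs first if they aren't trusted.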

Yes, new machines have tons of memory, but shrinking the memory footprint to a fifth of its size is probably worth it in the long run, especially since these fields resided in rather large 3-dimensional arrays. I know in C you can use bit-fields to pack structs really tight yet keep enough abstraction that the code is clear about which field you are accessing. However, C# doesn't seem to support these. (I suppose you could drop into unmanaged code, but that's just ugly.)

After a bit of googling I ran across the BitVector32 struct in C#. It seemed perfect! First off, you declare how the data is laid out in the bit vector:

BitVector32.Section bShape, bRotation, bFaces, bType, bTexture;
bShape = BitVector32.CreateSection(7);              // 3 bits (max value 7)
bRotation = BitVector32.CreateSection(31, bShape);  // 5 bits
bFaces = BitVector32.CreateSection(255, bRotation); // 8 bits
bType = BitVector32.CreateSection(255, bFaces);     // 8 bits
bTexture = BitVector32.CreateSection(255, bType);   // 8 bits
// Note: sections are allocated starting at the least-significant bit,
// so this layout is the reverse of the manual shifts above.

Then we could access it using clear names:

BitVector32 bv = new BitVector32(ints[i]);
int b1 = bv[bShape];
int b2 = bv[bRotation];
int b3 = bv[bFaces];
int b4 = bv[bType];
int b5 = bv[bTexture];
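Sections are writable through the same indexer, which is handy when building the packed value in the first place; since BitVector32 is a struct, you have to write the copy back (e.g. into the array) to persist the change. A small self-contained sketch with an illustrative two-field layout:

```csharp
using System.Collections.Specialized;

static class SectionWriteDemo
{
    public static int Build()
    {
        // Two illustrative sections: 3 bits, then 8 bits above them.
        BitVector32.Section bShape = BitVector32.CreateSection(7);
        BitVector32.Section bTexture = BitVector32.CreateSection(255, bShape);

        BitVector32 bv = new BitVector32(0);
        bv[bShape] = 5;       // the value must fit the section's width
        bv[bTexture] = 200;
        return bv.Data;       // the underlying packed int
    }
}
```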

However, seeing as this is a game and performance is very important, I wanted to run a quick test to check the speed of things. I used a simple Timing class I had which takes a delegate, runs it 1000 times first (to let the JIT warm up a bit; not sure if it's needed, but it can't hurt), then starts timing and runs through another batch of invocations.
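Timer.cs is only attached, not shown in the post; a minimal sketch of a timeit helper matching that description (a warm-up pass, then a timed run with Stopwatch) might look like this. The method and delegate names mirror the usage below, but the actual attachment may differ:

```csharp
using System;
using System.Diagnostics;

static class Timer
{
    public delegate void TestBody();

    // Hypothetical reconstruction: warm up 1000 times, then time numTests invocations.
    public static void timeit(string name, int numTests, TestBody body)
    {
        for (int i = 0; i < 1000; i++) body(); // give the JIT a chance to compile the body

        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < numTests; i++) body();
        sw.Stop();

        Console.WriteLine(name + ":" + sw.ElapsedMilliseconds);
    }
}
```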

const int NUM_TESTS = 10000;
const int NUM_ELEMENTS = 2048;
int[] ints = new int[NUM_ELEMENTS];
Random gen = new Random(112358);

// Fill the array with random data
for (int i = 0; i < NUM_ELEMENTS; i++)
{
	ints[i] = gen.Next(8) << 29 | gen.Next(16) << 24
			| gen.Next(256) << 16 | gen.Next(256) << 8 | gen.Next(256);
}

Timer.timeit("Read Ints", NUM_TESTS, delegate()
   {
	for (int i = 0; i < NUM_ELEMENTS; i++)
	{
	    // int code like above
	}
    });            

Timer.timeit("Read BitVector32", NUM_TESTS, delegate()
   {
	for (int i = 0; i < NUM_ELEMENTS; i++)
	{
	    // bit vector code like above
	}
    });

The results in milliseconds:

Read Ints:286
Read BitVector32:1353

Things don't look good for bit vectors, but wait: I'm doing an awful lot of instantiations. What if I build the bit vector array ahead of time and only time the access? I don't really need the int arrays if bit vectors will work. (I just now thought of this while writing this post . . . so I'll be back in a bit, lol.)

Back with more results. I also decided to add in using 5 bytes, since that reduces the waste to 1 byte, as opposed to the 16 bytes wasted by storing the five fields in 5 separate ints (20 bytes total). (I also fixed a bug where I wasn't adding in the fifth field pulled from the ints.) These changes made the setup look like this:

byte[,] bytes = new byte[NUM_ELEMENTS, 5];
int[] ints = new int[NUM_ELEMENTS];
BitVector32[] bvs = new BitVector32[NUM_ELEMENTS];

Random gen = new Random(112358);
for (int i = 0; i < NUM_ELEMENTS; i++)
{
	int b1 = gen.Next(8), b2 = gen.Next(16), b3 = gen.Next(256), b4 = gen.Next(256), b5 = gen.Next(256);
	int packed = b1 << 29 | b2 << 24 | b3 << 16 | b4 << 8 | b5;
	ints[i] = packed;
	bytes[i, 0] = (byte)b1;
	bytes[i, 1] = (byte)b2;
	bytes[i, 2] = (byte)b3;
	bytes[i, 3] = (byte)b4;
	bytes[i, 4] = (byte)b5;
	bvs[i] = new BitVector32(packed);
}

Then I replaced:

BitVector32 bv = new BitVector32(ints[i]);

With:

BitVector32 bv = bvs[i];

And added in a new body for bytes:

for (int i = 0; i < NUM_ELEMENTS; i++)
{
	int b1 = bytes[i, 0];
	int b2 = bytes[i, 1];
	int b3 = bytes[i, 2];
	int b4 = bytes[i, 3];
	int b5 = bytes[i, 4];
	int t = b1 + b2 + b3 + b4 + b5; // give the reads a use
}

And the results:

Read Ints:328
Read BitVector32:1266
Read Bytes:1011

*sigh* Clearly bit vectors just aren't going to cut it. I can live with bit bashing for a roughly 4x speed-up. Early optimization? Sure, maybe, but it's hard not to take it knowing it's this large. Attached you will find my code if you'd like to run things yourself.

Attachments:

  • Timer.cs (896 bytes)
  • IntVsBitVector32.cs (3.01 KB)

Comments

That really is terrible performance for BitVector32.  You might consider creating some static methods to encapsulate the bit twiddling.  That is:

static class BitTwiddlers
{
    static public int Shape(int d) { return (d >> 29) & 0x7; }
    static public int Rotation(int d) { return (d >> 24) & 0x1F; }
    static public int Faces(int d) { return (d >> 16) & 0xFF; }
    static public int Type(int d) { return (d >> 8) & 0xFF; }
    static public int Texture(int d) { return d & 0xFF; }
}

I ran your test code, replacing the inline bit twiddling with calls to those functions, and the time was exactly the same as for ints (105 ms in Release mode, running outside the debugger).  It looks like the JIT compiler is inlining those static methods.

It sure would be nice to use BitVector32, but the static methods give the same effect without costing performance.

Thanks for the suggestion. That is in fact what I ended up doing: I just wrote static functions, defined consts for the shifts, and hope the JIT will inline everything for me. It's one place where I would like pre-processor macros, but if the JIT inlines, then the performance is the same.
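The approach described above, with consts for the shifts and static accessors for the JIT to inline, might be sketched like this (the class and constant names are illustrative, not from the attached code):

```csharp
static class Block
{
    // Bit layout from the post: 3 + 5 + 8 + 8 + 8 = 32 bits, high to low.
    const int ShapeShift = 29;
    const int RotationShift = 24;
    const int FacesShift = 16;
    const int TypeShift = 8;

    public static int Shape(int d)    { return (d >> ShapeShift) & 0x7; }
    public static int Rotation(int d) { return (d >> RotationShift) & 0x1F; }
    public static int Faces(int d)    { return (d >> FacesShift) & 0xFF; }
    public static int Type(int d)     { return (d >> TypeShift) & 0xFF; }
    public static int Texture(int d)  { return d & 0xFF; }
}
```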
