A few days ago I was thinking about how floating point numbers are represented in memory, so I did some research to find out the answer.
I wanted to send a float in an AMF3 packet as an integer. To explain my problem, I first describe how integers are encoded for sending in AMF3 packets.

For unsigned integers the encoding is similar to UTF-8; it’s called variable-width encoding.
If the number is smaller than 0x80 (it fits in 7 bits), we send it in 1 byte.
If it’s smaller than 0x4000 (it fits in 14 bits), we send it in 2 bytes. The upper 7 bits are sent in the first byte with the MSB set to 1, indicating that another byte is coming, and the remaining 7 bits are in the next byte.
If the number is smaller than 0x200000 (it fits in 21 bits), we send it in 3 bytes, similar to the previous case. The upper 7 bits are sent in the first byte with the MSB set to 1, the next 7 bits in the 2nd byte, again with the MSB set to 1, and the remaining 7 bits in the last byte.
The max number we can send is 0x1fffffff (it fits in 29 bits). The first 3 bytes are sent as described above, and we can use all 8 bits of the last byte.
Example: the value 255 (0xff) is 11111111 in binary. It can be sent in 2 bytes: the first byte will be 10000001 and the second 01111111.
So I wanted to use this mechanism on float types.
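
The integer encoding described above can be sketched in a few lines. This is only an illustration, not code from an AMF3 library: the function name `EncodeU29` is made up here, and the byte order (most significant 7-bit group first) is the one I understand the AMF3 specification to use.

```
#include <cstdint>
#include <vector>

// Hypothetical helper: encodes a value of at most 0x1fffffff as a
// variable-width integer. Larger values are not handled here.
std::vector<uint8_t> EncodeU29( uint32_t value )
{
    std::vector<uint8_t> out;

    if ( value < 0x80 )
    {
        // 7 bits: a single byte with the MSB clear.
        out.push_back( static_cast<uint8_t>( value ) );
    }
    else if ( value < 0x4000 )
    {
        // 14 bits: upper 7 bits with the MSB set, then the lower 7 bits.
        out.push_back( static_cast<uint8_t>( ( ( value >> 7 ) & 0x7f ) | 0x80 ) );
        out.push_back( static_cast<uint8_t>( value & 0x7f ) );
    }
    else if ( value < 0x200000 )
    {
        // 21 bits: three groups of 7 bits, MSB set on the first two bytes.
        out.push_back( static_cast<uint8_t>( ( ( value >> 14 ) & 0x7f ) | 0x80 ) );
        out.push_back( static_cast<uint8_t>( ( ( value >> 7 ) & 0x7f ) | 0x80 ) );
        out.push_back( static_cast<uint8_t>( value & 0x7f ) );
    }
    else
    {
        // 29 bits: three groups of 7 bits with the MSB set,
        // then a last byte that carries a full 8 bits.
        out.push_back( static_cast<uint8_t>( ( ( value >> 22 ) & 0x7f ) | 0x80 ) );
        out.push_back( static_cast<uint8_t>( ( ( value >> 15 ) & 0x7f ) | 0x80 ) );
        out.push_back( static_cast<uint8_t>( ( ( value >> 8 ) & 0x7f ) | 0x80 ) );
        out.push_back( static_cast<uint8_t>( value & 0xff ) );
    }

    return out;
}
```

With this sketch, `EncodeU29( 255 )` produces the two bytes 0x81 and 0x7f, and `EncodeU29( 0x1fffffff )` produces four 0xff bytes.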

The floating point representation on 32 bits (float type):
Bit 31 – sign
Bits 30-23 – exponent
Bits 22-0 – fraction

Let’s see what it means.
Sign: 1 means a negative number, 0 means a positive number.
Exponent: The power of 2 that the value is multiplied by. It is stored with a bias of 127.
Fraction: The digits after the binary point. Together with an implicit leading 1 it represents a number between 1 and 2.
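
Put together, the three fields give the value of an ordinary (non-special) float. The symbols are my own shorthand: s is the sign bit, e is the stored 8-bit exponent with its bias of 127, and f is the fraction read as a 23-bit integer.

$$
x = (-1)^{s} \times \left( 1 + \frac{f}{2^{23}} \right) \times 2^{\,e - 127}
$$

As a worked example, -6.25 is stored as 0xC0C80000, which splits into s = 1, e = 129 and f = 0x480000 = 4718592:

$$
x = (-1)^{1} \times \left( 1 + \frac{4718592}{8388608} \right) \times 2^{\,129 - 127} = -1.5625 \times 4 = -6.25
$$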

Here is a C++ code sample showing exactly how to calculate the number once we know these things.

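```
#include <math.h>
#include <stdint.h>

float Float( float floatNumber )
{
    // Read the float's bits as an integer by casting its address.
    // (memcpy into a uint32_t is the strictly portable way to do this.)
    uint32_t intNumber = *(uint32_t*)&floatNumber;

    // Unsigned shifts, so the sign bit is not smeared over the result.
    uint32_t sign     =   intNumber >> 31;                             // bit 31
    uint32_t exponent = ( intNumber >> 23 ) & ( ( 1u <<  8 ) - 1 );    // bits 30-23
    uint32_t fraction =   intNumber         & ( ( 1u << 23 ) - 1 );    // bits 22-0

    bool  positive     = ( sign == 0 );
    int   realExponent = (int)exponent - 127;                          // remove the bias
    float realFraction = 1.0f + ( (float)fraction / ( 1 << 23 ) );     // in [1;2)

    float newNumber = ( positive ? 1.0f : -1.0f )
                    * realFraction * pow( 2.0f, realExponent );

    return newNumber;
}
```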

This code gives back our original number, but it lets us see the parts of the mechanism.
First we turn our float into an int. This is tricky because we can’t cast the number directly: we would get the truncated value of our original number. So we cast the address of our float to an integer pointer and read its value as if it were an integer. (Strictly speaking, C++ does not guarantee that this pointer cast works; copying the bytes with memcpy is the portable way to do the same thing.)
Now let’s get bit 31: just right shift the value by 31.
The exponent is a little trickier. Right shift the number by 23 (that’s the count of the fraction bits) and keep the lowest 8 bits to drop the sign bit.
The fraction is simply the value of the lowest 23 bits.
Now let’s see what we did there. The number is positive if the sign value is 0. The real exponent has a bias of 127, so we subtract 127 from the stored value. The real fraction is a number between 1 and 2, [1;2), so the stored value has to be divided by the 23rd power of 2 and then 1 is added.
To check that we did everything right, we calculate the number again. See, it’s simple.
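
To look at the three fields on their own, the extraction step can be pulled out into a small helper. This is a sketch, not part of the original sample: `FloatFields` and `Decompose` are names made up for illustration, and it uses `std::memcpy` rather than the pointer cast, which is my substitution.

```
#include <cstdint>
#include <cstring>

// Hypothetical helper type: the three raw fields of a 32-bit float.
struct FloatFields
{
    uint32_t sign;      // bit 31
    uint32_t exponent;  // bits 30-23, still biased by 127
    uint32_t fraction;  // bits 22-0
};

FloatFields Decompose( float value )
{
    static_assert( sizeof( float ) == sizeof( uint32_t ), "expects a 32-bit float" );

    // Copy the bytes of the float into an integer without converting the value.
    uint32_t bits;
    std::memcpy( &bits, &value, sizeof( bits ) );

    FloatFields fields;
    fields.sign     =   bits >> 31;
    fields.exponent = ( bits >> 23 ) & 0xffu;
    fields.fraction =   bits         & 0x7fffffu;
    return fields;
}
```

For -6.25f this gives sign 1, exponent 129 (a real exponent of 2) and fraction 0x480000 (0.5625 after dividing by the 23rd power of 2), and indeed 1.5625 times 4 is 6.25.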
There are some special values, like 0, which can not be obtained from the formula, because neither the fraction nor the power of 2 can be 0. So 0 has a dedicated bit pattern, the smallest one, whose integer value is 0: both the exponent and the fraction fields are at their smallest (the formula would read them as -127 and 1). The largest exponent field, 255, is reserved in the same way for infinity and NaN.
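
The special cases can be told apart just by looking at the exponent and fraction fields. The sketch below assumes the usual IEEE 754 single-precision rules (exponent field 0 for zero and denormals, 255 for infinity and NaN); `FloatKind`, `ClassifyBits` and `Classify` are names invented for this example.

```
#include <cstdint>
#include <cstring>

enum class FloatKind { Zero, Denormal, Normal, Infinity, NotANumber };

// Classifies a raw 32-bit pattern by its exponent and fraction fields.
FloatKind ClassifyBits( uint32_t bits )
{
    uint32_t exponent = ( bits >> 23 ) & 0xffu;
    uint32_t fraction =   bits         & 0x7fffffu;

    if ( exponent == 0 )
    {
        // Smallest exponent field: zero, or a denormal without the implicit 1.
        return ( fraction == 0 ) ? FloatKind::Zero : FloatKind::Denormal;
    }
    if ( exponent == 255 )
    {
        // Largest exponent field: infinity, or NaN if any fraction bit is set.
        return ( fraction == 0 ) ? FloatKind::Infinity : FloatKind::NotANumber;
    }
    return FloatKind::Normal;  // everything the formula covers
}

FloatKind Classify( float value )
{
    uint32_t bits;
    std::memcpy( &bits, &value, sizeof( bits ) );  // copy the bytes, no conversion
    return ClassifyBits( bits );
}
```

Only the `Normal` case follows the sign, fraction and exponent formula directly; the sign bit is ignored here, so -0 is also reported as `Zero`.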
There are some other conclusions that can now be calculated easily, such as the precision and the min/max values of the float type.
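
As a rough check of that claim, the limits can be rebuilt from the bit layout and compared with the constants in `<cfloat>`. The function names are made up for this sketch, and it assumes that exponent field 255 is reserved (so the largest usable field is 254) and that field 0 is reserved for zero and denormals (so the smallest normal field is 1).

```
#include <cfloat>
#include <cmath>

// Largest finite float: exponent field 254 (real exponent 127)
// with all 23 fraction bits set, so the fraction is 2 - 2^-23.
float LargestFloat()
{
    double fraction = 2.0 - std::ldexp( 1.0, -23 );
    return static_cast<float>( std::ldexp( fraction, 127 ) );
}

// Smallest normal float: exponent field 1 (real exponent -126), fraction bits 0.
float SmallestNormalFloat()
{
    return std::ldexp( 1.0f, -126 );
}

// Precision near 1.0: the weight of the last of the 23 fraction bits.
float StepAboveOne()
{
    return std::ldexp( 1.0f, -23 );
}
```

On a platform with IEEE 754 floats these should match `FLT_MAX`, `FLT_MIN` and `FLT_EPSILON` respectively.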
In the end my conclusion was that a float can not be sent as an integer in AMF3 packets, because all 32 bits are used to represent a float and it can not be reduced to work as I wanted. Sure, I could use my own fixed-point type, but that wasn’t my goal.
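
A quick way to see the problem is to test whether a float’s bit pattern fits into the 29 bits available. `FitsInU29` is a hypothetical helper written for this illustration only.

```
#include <cstdint>
#include <cstring>

// Returns true if the raw bits of the float fit into 29 bits,
// i.e. the pattern is at most 0x1fffffff.
bool FitsInU29( float value )
{
    uint32_t bits;
    std::memcpy( &bits, &value, sizeof( bits ) );  // copy the bytes, no conversion
    return bits <= 0x1fffffffu;
}
```

Even 1.0f fails this test, because its bit pattern is 0x3f800000. Only non-negative values with an exponent field of at most 63 pass, which means values below the -63rd power of 2 (roughly 1e-19).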
