FX::Maths::Vector< type, A > Class Template Reference

#include <FXMaths.h>

Inheritance diagram for FX::Maths::Vector< type, A >:

FX::Maths::Impl::EquivType< base, type, equivtype >

Detailed Description

template<typename type, unsigned int A>
class FX::Maths::Vector< type, A >

A SIMD-based N-dimensional vector.

This is a generic vector of A components, each of type type. Specialisations have been provided for SIMD quantities on platforms which support them, so for example on Intel SSE platforms four floats will map to a __m128 SSE register. Where the size is a power of two up to 1Kb, the compiler is asked to memory align the vector and an assertion ensures that instantiation fails without correct alignment. If doing your own memory allocation, make sure to align to sixteen bytes: FX::malloc and FX::calloc both take an alignment parameter, and you can use FX::aligned_allocator<T, 16> for STL containers.
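For example, a minimal sketch of keeping these vectors in an STL container with the required alignment (the element type and count here are purely illustrative):

  #include <vector>
  #include <FXMaths.h>

  typedef FX::Maths::Vector<float, 4> Vec4f;  // maps to a __m128 on SSE platforms

  int main()
  {
      // FX::aligned_allocator guarantees the sixteen byte alignment
      // that the SIMD specialisations assert on
      std::vector<Vec4f, FX::aligned_allocator<Vec4f, 16> > vecs(1024);
      return 0;
  }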

Warning:
GCC on x86 and x64 cannot currently align stack-allocated variables to any better than 16 bytes. Therefore the assertion checks are reduced to a check for 16 byte alignment on GCC.
Furthermore, processor SIMD is used for ALL power-of-two sized vectors up to 1Kb (256 floats or 128 doubles), so a vector of sixteen floats will be implemented as four sets of __m128 SSE operations on current processor technology. This means that you can write now for upcoming vector processors, e.g. Intel's Advanced Vector Extensions (AVX), which use a GPU-like parallel processing engine (Larrabee) to process blocks of 256-1024 bit vectors (8 to 32 floats) at once. For this same reason, the operations available for this class have been kept minimal - sin() is probably too big for a simple math processor.

Warning:
This use of combining multiple SIMD vectors to implement a bigger vector does not currently optimise well with most compilers. Both GCC v4.2 and MSVC9 fail to realise that more than one SIMD op can be performed in parallel and force everything through xmm0. Intel's C++ compiler does do the right thing, but forces a load & store via memory between ops which is entirely unnecessary, and I haven't found a way to prevent this (it seems to think memory may get clobbered). This SIMD problem is known to the GCC and MSVC authors, and upcoming versions are expected to fix this register allocation problem.
All of the standard arithmetic, logical and comparison operators are provided, but they are only defined according to what FX::Generic::TraitsBasic<type> says. If the type is not an arithmetic one, no operators are defined; if floating point, the standard arithmetic ones; if integer, the standard plus logical operators. When arithmetic, the following additional friend functions are provided: isZero(), min(), max(), sum(), dot(); for floating point only: sqrt(), rcp(), rsqrt(); for integer only: lshiftvec(), rshiftvec(). These can be invoked via Koenig lookup, so you can use them as though they were in the C library.
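As a rough illustration of the interface just described (treating the scalar constructor as filling every member, and the return types shown, as assumptions rather than definitive behaviour):

  #include <FXMaths.h>

  using FX::Maths::Vector;

  void example()
  {
      const float data[4] = { 1.0f, 2.0f, 3.0f, 4.0f };
      Vector<float, 4> a(data);        // construct from an array of four floats
      Vector<float, 4> b(2.0f);        // construct from a scalar (assumed to fill all members)

      Vector<float, 4> c = a * b + b;  // standard arithmetic operators
      Vector<float, 4> s = sqrt(c);    // floating point helper, found via Koenig lookup
      float total = sum(c);            // horizontal sum of all members (assumed to return type)
      bool zero = isZero(c - c);       // true when every member is zero
      (void)s; (void)total; (void)zero;
  }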

FOX provides hardwired versions of this class in the forms of FX::FXVec2f, FX::FXVec2d, FX::FXVec3f, FX::FXVec3d, FX::FXVec4f, FX::FXVec4d. These are nowhere near as fast, and they are designed in a highly SIMD-unfriendly way - FX::Maths::Vector was deliberately designed with an inconvenient API to force high performance programming.

Implementation:

The following combinations have been optimised:

For double precision on SSE2 only, rcp() and rsqrt() are no faster (and no slower) than doing it manually - only on SSE do they have special instructions.

For integers on SSE2 only, multiplication, division, modulus, min() and max() are emulated (slowly) as they don't have corresponding SSE instructions available. On SSE4 only, multiplication, min() and max() are SSE optimised.

Note that the SSE2 optimised bit shift ignores all but the lowest member - for future compatibility you should set all members of the shift quantity to be identical. lshiftvec() and rshiftvec() treat the entire vector as a single bit string, with higher indexed members holding the higher bits. On little endian machines, this means shifts within each member type go "the wrong way" before leaping to the next member. Usually, this is what you want. Only bit shifts which are multiples of eight are accelerated on SSE2.

See FX::Maths::Array and FX::Maths::Matrix for a static array letting you easily implement a matrix. See also the FXVECTOROFVECTORS macro for how to declare to the compiler that a vector should be implemented as a sequence of other vectors (this is how the SSE specialisations overload the specialisations for power-of-two increments) - if you want a non-power-of-two size, you'll need to declare the VectorOfVectors specialisation manually.

Definition at line 549 of file FXMaths.h.

Public Member Functions

 Vector ()
 Vector (const type *d)
 Vector (const type &d)
 operator equivtype & ()
 operator const equivtype & () const
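The conversion operators above expose the underlying equivtype. On an SSE platform the detailed description suggests that Vector<float, 4>'s equivtype is __m128, so a vector can be handed straight to compiler intrinsics; a small sketch under that assumption (the intrinsic chosen is illustrative only):

  #include <xmmintrin.h>
  #include <FXMaths.h>

  void addOne(FX::Maths::Vector<float, 4> &v)
  {
      __m128 &reg = v;                           // operator equivtype &()
      reg = _mm_add_ps(reg, _mm_set1_ps(1.0f));  // update the vector in place via SSE
  }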


The documentation for this class was generated from the following file:
FXMaths.h
(C) 2002-2009 Niall Douglas. Some parts (C) to assorted authors.
Generated on Fri Nov 20 18:38:02 2009 for TnFOX by doxygen v1.4.7