The instruction needs each 32-bit "lane" value chopped to 32 bits. The current lane implementation does not do the chopping, so the chop and the add have to be done explicitly. Valgrind bug 405362.
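
A minimal sketch of the difference, in plain C rather than Valgrind IR. The helper names add_lane_wide and chop_and_add are hypothetical and do not come from the Valgrind sources. It assumes the lane sum is held in a wider (64-bit) temporary, which is one way the missing chop can show up. It only illustrates why the result has to be truncated to 32 bits.

```c
#include <stdint.h>

/* Hypothetical helper: adds in a 64-bit temporary without chopping,
 * so a carry out of bit 31 survives in the upper half. */
static uint64_t add_lane_wide(uint64_t acc, uint64_t product) {
    return acc + product;
}

/* Hypothetical helper: chops both operands to 32 bits, then adds.
 * The result wraps modulo 2^32, as a 32-bit lane should. */
static uint32_t chop_and_add(uint64_t acc, uint64_t product) {
    uint32_t lo_acc  = (uint32_t)acc;      /* explicit chop to 32 bits */
    uint32_t lo_prod = (uint32_t)product;  /* explicit chop to 32 bits */
    return (uint32_t)(lo_acc + lo_prod);   /* 32-bit wrapping add */
}
```

With an accumulator of 0xFFFFFFFF and a product of 1, add_lane_wide yields 0x100000000 while chop_and_add yields 0, which is the value a 32-bit lane holds after overflow.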