Uploaded image for project: 'Hadoop Map/Reduce'
  1. Hadoop Map/Reduce
  2. MAPREDUCE-6390 Improve Solaris support in Hadoop Map/Reduce
  3. MAPREDUCE-6417

MapReduceClient's primitives.h is toxic and should be extirpated

    XMLWordPrintableJSON

Details

    • Sub-task
    • Status: Patch Available
    • Blocker
    • Resolution: Unresolved
    • 3.0.0-alpha1
    • None
    • client
    • None

    Description

      MapReduceClient's primitives.h attempts to provide optimised versions of standard library memory copy and comparison functions. It has been the subject of several portability-related bugs:

      • HADOOP-11505 hadoop-mapreduce-client-nativetask uses bswap where be32toh is needed, doesn't work on non-x86
      • HADOOP-11665 Provide and unify cross platform byteorder support in native code
      • MAPREDUCE-6397 MAPREDUCE makes many endian-dependent assumptions
      • HADOOP-11484 hadoop-mapreduce-client-nativetask fails to build on ARM AARCH64 due to x86 asm statements

      At present it only works on x86 and ARM64 as it lacks definitions for bswap and bswap64 for any platforms other than those.

      However it has even more serious problems on non-x86 architectures, for example on SPARC simple_memcpy simply doesn't work at all:

      $ cat bang.cc
      #include <string.h>
      #define SIMPLE_MEMCPY
      #include "primitives.h"
      int main(int argc, char **argv)
      {
          char b1[9];
          char b2[9];
          simple_memcpy(b2, b1, sizeof(b1));
      }
      $ gcc -o bang bang.cc && ./bang
      Bus Error (core dumped)
      

      That's because simple_memcpy does pointer fiddling that results in misaligned accesses, which are illegal on SPARC.

      fmemcmp is also broken. Even if a definition of bswap is provided, on big-endian architectures the result is simply wrong because of its unconditional use of bswap:

      $ cat thud.cc
      #include <stdio.h>
      #include <strings.h>
      #include "primitives.h"
      int main(int argc, char **argv)
      {
          char a[] = { 0,1,2,0 };
          char b[] = { 0,2,1,0 };
          printf("%lld %d\n", fmemcmp(a, b, sizeof(a), memcmp(a, b, sizeof(a))));
      }
      $ g++ -o thud thud.cc && ./thud
      65280 -1
      

      And in addition fmemcmp suffers from the same misalignment issues as simple_memcpy and coredumps on SPARC when asked to compare odd-sized buffers.

      primitives.h provides the following functions:

      • bswap - used in 12 files in MRC but as HADOOP-11505 points out, mostly incorrectly as it takes no account of platform endianness
      • bswap64 - used in 4 files in MRC, same comments as per bswap apply
      • simple_memcpy - used in 3 files in MRC, should be replaced with the standard memcpy
      • fmemcmp - used in 1 file, should be replaced with the standard memcmp
      • fmemeq - used in 1 file, should be replaced with the standard memcmp
      • frmemeq - not used at all, should just be removed

      Summary: primitives.h should simply be deleted and replaced with the standard memory copy & compare functions, or with thin wrappers around them where the APIs are different.

      Attachments

        1. MAPREDUCE-6417.001.patch
          19 kB
          Alan Burlison

        Issue Links

          Activity

            People

              alanburlison Alan Burlison
              alanburlison Alan Burlison
              Votes:
              0 Vote for this issue
              Watchers:
              9 Start watching this issue

              Dates

                Created:
                Updated: