SQOOP-2628: Import MySQL table --direct UTF-8 data corrupted

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.4.6
    • Fix Version/s: None
    • Component/s: sqoop2-jdbc-connector
    • Labels: None
    • Environment: sqoop 1.4.6, hadoop 2.6.0-amzn-1

    Description

      Sqoop does not preserve UTF-8 characters when importing a MySQL table with --direct.

      Here is the key comma-delimited output from the attached example script, first without and then with --direct:

      1,Τη γλώσσα,"/fox/\jumps
      1,���� ������������,"/fox/\jumps
      

      I looked over the sqoop --verbose output and the Hadoop logs but couldn't find anything suspicious.
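One thing the corrupted line itself suggests: each Greek letter in the sample occupies two bytes in UTF-8, and the --direct output appears to show one U+FFFD replacement character per byte rather than per letter. That pattern would be consistent with the bytes being decoded under a non-UTF-8 character set somewhere along the --direct path, though this report does not confirm where. The sketch below is one way to check that on an import output file. The helper names are hypothetical, and it assumes the files (and the script itself) are UTF-8 text.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for inspecting import output; not part of Sqoop.

# Count U+FFFD replacement characters (UTF-8 bytes EF BF BD) in a file.
count_replacement_chars() {
  # Build the 3-byte pattern with octal escapes so no special quoting is needed.
  pattern=$(printf '\357\277\275')
  # grep -o prints each match on its own line; wc -l then counts the matches.
  LC_ALL=C grep -o "$pattern" "$1" | wc -l | tr -d ' '
}

# Count how many bytes a string occupies (in the script's own encoding).
utf8_byte_count() {
  printf '%s' "$1" | LC_ALL=C wc -c | tr -d ' '
}

# Two Greek letters from the sample row, two bytes each in UTF-8.
utf8_byte_count 'Τη'   # prints 4
```

Running count_replacement_chars on a part file from each import (for example, one copied locally with hadoop fs -get) gives a quick signal of whether that run produced corrupted text.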

      As an aside, running the example script with --mysql-delimiters produces this puzzling comma-delimited output:

      1,Τη γλώσσα,"/fox/\\jumps
      1,'���� ������������','\"/fox/\\jumps'
      

      Note the difference between the text fields containing the word "fox". The two outputs should be identical, but they are quoted differently.

      Attached are the SQL script that creates the MySQL utest example table and the bash script I used to demonstrate the --direct problem.
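For anyone reproducing this, one variation that may be worth trying is to pass a character set through to mysqldump. In direct mode, Sqoop forwards any arguments after a bare -- to the underlying tool. The sketch below only prints such a command for review. The connection URL, user name, and target directory are placeholders and are not taken from the attached script, utest is assumed to be the table name, and whether this option resolves the corruption is not confirmed by this report.

```shell
#!/usr/bin/env bash
# Placeholder values (assumptions); replace with the real connection details.
CONNECT_URL="jdbc:mysql://localhost/test"
DB_USER="sqoop_user"
TARGET_DIR="/tmp/utest_direct"

# Print the import command instead of running it, so it can be reviewed first.
# Everything after the bare "--" is forwarded to mysqldump in direct mode.
print_direct_import() {
  echo sqoop import \
    --connect "$CONNECT_URL" \
    --username "$DB_USER" -P \
    --table utest \
    --target-dir "$TARGET_DIR" \
    --num-mappers 1 \
    --direct \
    -- --default-character-set=utf8
}

print_direct_import
```

Comparing the output of this run against the plain --direct run would show whether the character set handed to mysqldump is the relevant factor.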

      Environment

      $ sqoop version
      Warning: /home/hadoop/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/bin/../../hcatalog does not exist! HCatalog jobs will fail.
      Please set $HCAT_HOME to the root of your HCatalog installation.
      Warning: /home/hadoop/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/bin/../../accumulo does not exist! Accumulo imports will fail.
      Please set $ACCUMULO_HOME to the root of your Accumulo installation.
      15/10/20 17:28:21 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
      Sqoop 1.4.6
      git commit id c0c5a81723759fa575844a0a1eae8f510fa32c25
      Compiled by root on Mon Apr 27 14:38:36 CST 2015
      
      $ hadoop version
      Hadoop 2.6.0-amzn-1
      Subversion git@aws157git.com:/pkg/Aws157BigTop -r edd5a97db145470a8723dde24f38c83724e0959c
      Compiled by ec2-user on 2015-09-25T14:59Z
      Compiled with protoc 2.5.0
      From source with checksum 7beeae31f3c4554b23d92f1e63dc85
      This command was run using /usr/lib/hadoop/hadoop-common-2.6.0-amzn-1.jar
      

      Attachments

        1. sqoop_utest.log
          295 kB
          Joseph Crotty
        2. sqoop_import.sh
          0.6 kB
          Joseph Crotty
        3. create_utest_table.sql
          0.3 kB
          Joseph Crotty

          People

            Assignee: Unassigned
            Reporter: Joseph Crotty (holybit)