  1. Sqoop (Retired)
  2. SQOOP-3436

Sqoop import from Oracle Exadata produces duplicate rows


Details

    • Bug
    • Status: Open
    • Critical
    • Resolution: Unresolved
    • 1.4.7
    • None
    • sqoop1.4.7,hortonworks2.6.3

    • Important

    Description

      Hi, I have used Sqoop to import from Oracle Exadata, and the import produces complete duplicate rows. At present we remove them with a DISTINCT query and load the result into another target table. Please suggest how to resolve this.
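
The DISTINCT workaround described above might look roughly like the following sketch. The table names `staging_db.raw_import` and `target_db.deduped` are hypothetical (the report does not name the real tables), and the target table is assumed to exist already with the same columns as the source. The script only builds and prints the HiveQL statement; it does not run it.

```shell
#!/bin/sh
# Hypothetical table names; the report does not give the real ones.
SRC_TABLE="staging_db.raw_import"
DST_TABLE="target_db.deduped"

# Build a HiveQL statement that keeps one copy of each fully identical row.
# Assumes DST_TABLE already exists with the same columns as SRC_TABLE.
build_dedup_sql() {
  printf 'INSERT OVERWRITE TABLE %s SELECT DISTINCT * FROM %s;' "$2" "$1"
}

DEDUP_SQL=$(build_dedup_sql "$SRC_TABLE" "$DST_TABLE")
echo "$DEDUP_SQL"

# To execute it on a cluster, the statement could be passed to the Hive CLI:
#   hive -e "$DEDUP_SQL"
```

On a table of this size the DISTINCT pass rewrites the full data set, so it removes the symptom after the fact and does not address why the import produced duplicates.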

      Background on the Oracle tables:

      The Oracle tables used for the Sqoop import have no simple primary key: they are SCD Type 2 tables with complex keys as primary keys, which do not suit the --split-by option. The tables are also very large (100 GB).

      Command used for the Sqoop import from Oracle Exadata:

      sqoop import --connect %s@//%s:%s/%s --username %s -password %s --table %s.%s --fields-terminated-by '%s' --hive-drop-import-delims --hive-import --hive-overwrite --hive-table %s.%s --null-string '\\\N' --null-non-string '\\\N' --m %s --fetch-size=2500
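
One way to check whether the duplicates are related to parallel splitting is to try a sequential import with a single mapper on a smaller table, since Sqoop needs no split column in that case. The sketch below is a diagnostic aid under that assumption, not a confirmed fix. Every value in it (host, service name, user, table names) is a hypothetical placeholder, and the delimiter and null-string options from the reported command are left out for brevity. The script only prints the command so it can be reviewed before anything is run.

```shell
#!/bin/sh
# All values below are hypothetical placeholders, not taken from the report.
JDBC_URL="jdbc:oracle:thin:@//dbhost.example.com:1521/ORCL"
ORA_USER="etl_user"
ORA_TABLE="SALES.CUSTOMER_DIM"
HIVE_TABLE="staging_db.customer_dim"
NUM_MAPPERS=1   # one mapper: sequential import, no split column required

# Print the command instead of running it, so it can be reviewed first.
# -P makes Sqoop prompt for the password rather than taking it on the command line.
build_import_cmd() {
  printf 'sqoop import --connect %s --username %s -P --table %s' \
    "$JDBC_URL" "$ORA_USER" "$ORA_TABLE"
  printf ' --hive-import --hive-overwrite --hive-table %s' "$HIVE_TABLE"
  printf ' --hive-drop-import-delims --num-mappers %s --fetch-size 2500' "$NUM_MAPPERS"
}

IMPORT_CMD=$(build_import_cmd)
echo "$IMPORT_CMD"
```

If the single-mapper run yields no duplicates, that would point toward how the rows are divided among mappers; if duplicates persist, the cause lies elsewhere.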

          People

            Assignee: Unassigned
            Reporter: Naresh AR (sunnn12)