Metron (Retired) / METRON-2038

Enrichment Loader Fails When Run as MR Job


Details

    • Type: Bug
    • Status: Done
    • Priority: Major
    • Resolution: Done
    • None
    • 0.7.1
    • None

    Description

      The enrichment loader fails when run as a MapReduce (MR) job on YARN, but runs successfully in local mode.

      The following exception occurs inside the YARN container that hosts the MapReduce application master (MRAppMaster).

      2019-03-13 16:14:28,391 FATAL [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster
      java.lang.NoSuchMethodError: org.apache.hadoop.hbase.HBaseConfiguration.createClusterConf(Lorg/apache/hadoop/conf/Configuration;Ljava/lang/String;Ljava/lang/String;)Lorg/apache/hadoop/conf/Configuration;
       at org.apache.hadoop.hbase.mapreduce.TableOutputFormat.setConf(TableOutputFormat.java:204)
       at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:76)
       at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
       at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$2.call(MRAppMaster.java:517)
       at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$2.call(MRAppMaster.java:501)
       at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.callWithJobClassLoader(MRAppMaster.java:1640)
       at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.createOutputCommitter(MRAppMaster.java:501)
       at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceInit(MRAppMaster.java:287)
       at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
       at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$5.run(MRAppMaster.java:1598)
       at java.security.AccessController.doPrivileged(Native Method)
       at javax.security.auth.Subject.doAs(Subject.java:422)
       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
       at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1595)
       at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1526)
      2019-03-13 16:14:28,394 INFO [main] org.apache.hadoop.util.ExitUtil: Exiting with status 1
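
The method named in the NoSuchMethodError is written as a JVM method descriptor, which is hard to read at a glance. The sketch below decodes it into a Java-style signature. The helper names `parse_types` and `decode` are hypothetical and not part of Metron or HBase. The JVM raises this error when a class loaded at runtime lacks a method with exactly this signature, which often points to mismatched library versions on the classpath. The ticket itself does not state the cause.

```python
import re

# JVM primitive type codes used in field and method descriptors.
PRIMITIVES = {"B": "byte", "C": "char", "D": "double", "F": "float",
              "I": "int", "J": "long", "S": "short", "Z": "boolean",
              "V": "void"}


def parse_types(desc):
    """Decode a run of JVM type descriptors into Java type names."""
    types, i = [], 0
    while i < len(desc):
        dims = 0
        while desc[i] == "[":          # each '[' adds one array dimension
            dims += 1
            i += 1
        if desc[i] == "L":             # object type: L<binary/name>;
            end = desc.index(";", i)
            name = desc[i + 1:end].replace("/", ".")
            i = end + 1
        else:                          # single-letter primitive code
            name = PRIMITIVES[desc[i]]
            i += 1
        types.append(name + "[]" * dims)
    return types


def decode(symbol):
    """Turn 'pkg.Class.method(args)ret' into a readable signature."""
    match = re.fullmatch(r"(.+)\.([^.(]+)\((.*)\)(.+)", symbol)
    if match is None:
        raise ValueError("not a method symbol: " + symbol)
    owner, method, args, ret = match.groups()
    return "{} {}.{}({})".format(
        parse_types(ret)[0], owner, method, ", ".join(parse_types(args)))


if __name__ == "__main__":
    print(decode(
        "org.apache.hadoop.hbase.HBaseConfiguration.createClusterConf("
        "Lorg/apache/hadoop/conf/Configuration;Ljava/lang/String;"
        "Ljava/lang/String;)Lorg/apache/hadoop/conf/Configuration;"))
```

Read this way, the missing method is `createClusterConf(Configuration, String, String)` on `HBaseConfiguration`, returning a `Configuration`.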
      

      Steps to Replicate

      1. Create a data set of enrichments to load.

      [root@node1 0.7.1]# cat alexa.csv
      1,google.com
      2,youtube.com
      3,facebook.com
      4,baidu.com
      5,wikipedia.org
      6,yahoo.com
      7,google.co.in
      8,reddit.com
      9,qq.com
      10,amazon.com
      11,taobao.com
      12,google.co.jp
      13,twitter.com
      14,tmall.com
      15,vk.com
      16,live.com
      17,instagram.com
      18,sohu.com
      19,sina.com.cn
      20,weibo.com
      21,jd.com
      22,360.cn
      23,google.de
      24,google.co.uk
      25,google.ru
      26,google.fr
      27,google.com.br
      28,list.tmall.com
      29,linkedin.com
      30,google.com.hk
      31,netflix.com
      32,yandex.ru
      33,google.it
      34,yahoo.co.jp
      35,google.es
      36,t.co
      37,pornhub.com
      38,ebay.com
      39,imgur.com
      40,google.com.mx
      41,google.ca
      42,alipay.com
      43,twitch.tv
      44,xvideos.com
      45,bing.com
      46,youth.cn
      47,msn.com
      48,aliexpress.com
      49,tumblr.com
      50,ok.ru
      

      2. Push the data to HDFS.

      hdfs dfs -put alexa.csv /tmp
      

      3. Create the enrichment definition.

      [root@node1 0.7.1]# cat enrichment.json
      {
        "zkQuorum":"node1:2181",
        "sensorToFieldList":{
          "squid":{
            "type":"ENRICHMENT",
            "fieldToEnrichmentTypes":{
              "domain_without_subdomains":[
                "whois",
                "alexa"
              ]
            }
          }
        }
      }
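
As a reading aid, the sketch below walks the enrichment definition and lists which enrichment types apply to which field of which sensor. It only illustrates the structure of the JSON above. The function name `field_enrichments` is hypothetical, and this is not how Metron itself processes the file.

```python
import json

ENRICHMENT_JSON = """
{
  "zkQuorum": "node1:2181",
  "sensorToFieldList": {
    "squid": {
      "type": "ENRICHMENT",
      "fieldToEnrichmentTypes": {
        "domain_without_subdomains": ["whois", "alexa"]
      }
    }
  }
}
"""


def field_enrichments(definition):
    """Yield (sensor, field, enrichment type) triples from the definition."""
    for sensor, sensor_cfg in definition["sensorToFieldList"].items():
        mapping = sensor_cfg["fieldToEnrichmentTypes"]
        for field, enrichment_types in mapping.items():
            for enrichment_type in enrichment_types:
                yield sensor, field, enrichment_type


if __name__ == "__main__":
    for triple in field_enrichments(json.loads(ENRICHMENT_JSON)):
        print(triple)
```

The `alexa` entry is the enrichment type that the extractor definition in the next step refers to through its `type` setting.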
      

      4. Create the extractor definition.

      [root@node1 0.7.1]# cat extractor.json
      {
        "config" : {
          "columns" : {
            "domain" : 1,
            "rank" : 0
          }
          ,"indicator_column" : "domain"
          ,"type" : "alexa"
          ,"separator" : ","
        },
        "extractor" : "CSV"
      }
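
To make the extractor definition concrete, the sketch below applies the same settings to one line of alexa.csv: column 0 becomes `rank`, column 1 becomes `domain`, and `domain` serves as the indicator. This is a minimal illustration of what the configuration describes, not Metron's CSV extractor. The function name `extract` and the shape of the returned tuple are assumptions made for this example.

```python
import csv
import io
import json

EXTRACTOR_JSON = """
{
  "config": {
    "columns": {"domain": 1, "rank": 0},
    "indicator_column": "domain",
    "type": "alexa",
    "separator": ","
  },
  "extractor": "CSV"
}
"""


def extract(line, config):
    """Map one CSV line to (enrichment type, indicator, named values)."""
    reader = csv.reader(io.StringIO(line), delimiter=config["separator"])
    row = next(reader)
    # Name each configured column by looking up its zero-based index.
    values = {name: row[index] for name, index in config["columns"].items()}
    indicator = values[config["indicator_column"]]
    return config["type"], indicator, values


if __name__ == "__main__":
    config = json.loads(EXTRACTOR_JSON)["config"]
    print(extract("1,google.com", config))
```

For the first line of the data set, the indicator is `google.com` and the rank is `1`.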
      

      5. Execute the loader.

      /usr/metron/0.7.1/bin/flatfile_loader.sh -n ./enrichment.json -t enrichment -c t -e ./extractor.json -i /tmp/alexa.csv -m MR
      
      19/03/13 16:12:26 WARN extractor.TransformFilterExtractorDecorator: Unable to setup zookeeper client - zk_quorum url not provided. **This will limit some Stellar functionality**
      19/03/13 16:12:26 INFO importer.MapReduceImporter: Configuring MapReduceImporter: /tmp/alexa.csv => enrichment:t
      19/03/13 16:12:27 INFO client.RMProxy: Connecting to ResourceManager at node1/127.0.0.1:8050
      19/03/13 16:12:27 INFO client.AHSProxy: Connecting to Application History server at node1/127.0.0.1:10200
      
       
      
      19/03/13 16:14:09 INFO input.FileInputFormat: Total input paths to process : 1
      19/03/13 16:14:10 INFO mapreduce.JobSubmitter: number of splits:1
      19/03/13 16:14:11 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1552492524533_0003
      19/03/13 16:14:12 INFO impl.YarnClientImpl: Submitted application application_1552492524533_0003
      19/03/13 16:14:12 INFO mapreduce.Job: The url to track the job: http://node1:8088/proxy/application_1552492524533_0003/
      19/03/13 16:14:12 INFO mapreduce.Job: Running job: job_1552492524533_0003
      19/03/13 16:14:33 INFO mapreduce.Job: Job job_1552492524533_0003 running in uber mode : false
      19/03/13 16:14:33 INFO mapreduce.Job: map 0% reduce 0%
      19/03/13 16:14:33 INFO mapreduce.Job: Job job_1552492524533_0003 failed with state FAILED due to: Application application_1552492524533_0003 failed 2 times due to AM Container for appattempt_1552492524533_0003_000002 exited with exitCode: 1
      For more detailed output, check the application tracking page: http://node1:8088/cluster/app/application_1552492524533_0003 Then click on links to logs of each attempt.
      Diagnostics: Exception from container-launch.
      Container id: container_e01_1552492524533_0003_02_000001
      Exit code: 1
      

      6. The root-cause exception shown in the description is visible in the YARN logs or on the application tracking page.
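
One way to reach those logs from the command line is to take the application ID from the loader's output and pass it to `yarn logs -applicationId`. The sketch below does the extraction and builds the command as an argument list. The helper names are hypothetical, and it assumes log aggregation is enabled on the cluster so that the command can return the container logs.

```python
import re


def application_id(client_output):
    """Return the first YARN application ID found in the text, or None."""
    match = re.search(r"application_\d+_\d+", client_output)
    return match.group(0) if match else None


def logs_command(app_id):
    """Build the argument list for fetching an application's logs."""
    return ["yarn", "logs", "-applicationId", app_id]


if __name__ == "__main__":
    output = ("19/03/13 16:14:12 INFO impl.YarnClientImpl: "
              "Submitted application application_1552492524533_0003")
    app_id = application_id(output)
    if app_id is not None:
        # Pass this list to subprocess.run(...) on a host with the yarn CLI.
        print(" ".join(logs_command(app_id)))
```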

            People

              nickwallen Nick Allen

                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 1h 20m