Details
Type: Bug
Status: Done
Priority: Major
Resolution: Done
Description
The flat file enrichment loader fails when run as a MapReduce (MR) job on YARN, although it runs successfully in local mode.
The following exception is thrown inside the YARN container that hosts the MapReduce ApplicationMaster (MRAppMaster).
2019-03-13 16:14:28,391 FATAL [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster
java.lang.NoSuchMethodError: org.apache.hadoop.hbase.HBaseConfiguration.createClusterConf(Lorg/apache/hadoop/conf/Configuration;Ljava/lang/String;Ljava/lang/String;)Lorg/apache/hadoop/conf/Configuration;
    at org.apache.hadoop.hbase.mapreduce.TableOutputFormat.setConf(TableOutputFormat.java:204)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:76)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$2.call(MRAppMaster.java:517)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$2.call(MRAppMaster.java:501)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.callWithJobClassLoader(MRAppMaster.java:1640)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.createOutputCommitter(MRAppMaster.java:501)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceInit(MRAppMaster.java:287)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$5.run(MRAppMaster.java:1598)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1595)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1526)
2019-03-13 16:14:28,394 INFO [main] org.apache.hadoop.util.ExitUtil: Exiting with status 1
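The `NoSuchMethodError` above identifies the missing method by its raw JVM descriptor, not by a Java signature. The Python sketch below decodes such a descriptor into a readable signature. It follows the field-descriptor grammar from the JVM specification. The helper names `decode_method` and `_parse_type` are hypothetical and are not part of Hadoop, HBase or Metron.

```python
# Map JVM base-type codes to Java type names (JVM spec, field descriptors).
BASE_TYPES = {
    "B": "byte", "C": "char", "D": "double", "F": "float",
    "I": "int", "J": "long", "S": "short", "Z": "boolean", "V": "void",
}


def _parse_type(desc, i):
    """Parse one type descriptor starting at index i; return (java_name, next_index)."""
    dims = 0
    while desc[i] == "[":          # each '[' adds one array dimension
        dims += 1
        i += 1
    if desc[i] == "L":             # object type: L<binary/class/name>;
        end = desc.index(";", i)
        name = desc[i + 1:end].replace("/", ".")
        i = end + 1
    else:                          # primitive (or void) type code
        name = BASE_TYPES[desc[i]]
        i += 1
    return name + "[]" * dims, i


def decode_method(qualified):
    """Turn 'pkg.Class.method(params)ret' into 'ret pkg.Class.method(p1, p2, ...)'."""
    paren = qualified.index("(")
    owner_and_name = qualified[:paren]
    close = qualified.index(")", paren)
    params, i = [], paren + 1
    while i < close:               # consume parameter descriptors one by one
        name, i = _parse_type(qualified, i)
        params.append(name)
    ret, _ = _parse_type(qualified, close + 1)
    return f"{ret} {owner_and_name}({', '.join(params)})"


TRACE_METHOD = (
    "org.apache.hadoop.hbase.HBaseConfiguration.createClusterConf("
    "Lorg/apache/hadoop/conf/Configuration;Ljava/lang/String;Ljava/lang/String;)"
    "Lorg/apache/hadoop/conf/Configuration;"
)
print(decode_method(TRACE_METHOD))
# prints: org.apache.hadoop.conf.Configuration org.apache.hadoop.hbase.HBaseConfiguration.createClusterConf(org.apache.hadoop.conf.Configuration, java.lang.String, java.lang.String)
```

Decoded, the trace shows `TableOutputFormat.setConf` calling a three-argument `HBaseConfiguration.createClusterConf(Configuration, String, String)`. A `NoSuchMethodError` generally means the class actually loaded at runtime lacks a method that was present at compile time. A plausible explanation, not confirmed in this report, is that the MRAppMaster classpath picks up an `HBaseConfiguration` from a different HBase version than the `TableOutputFormat` that calls it.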
Steps to Replicate
1. Create a data set of enrichments to load.
[root@node1 0.7.1]# cat alexa.csv
1,google.com
2,youtube.com
3,facebook.com
4,baidu.com
5,wikipedia.org
6,yahoo.com
7,google.co.in
8,reddit.com
9,qq.com
10,amazon.com
11,taobao.com
12,google.co.jp
13,twitter.com
14,tmall.com
15,vk.com
16,live.com
17,instagram.com
18,sohu.com
19,sina.com.cn
20,weibo.com
21,jd.com
22,360.cn
23,google.de
24,google.co.uk
25,google.ru
26,google.fr
27,google.com.br
28,list.tmall.com
29,linkedin.com
30,google.com.hk
31,netflix.com
32,yandex.ru
33,google.it
34,yahoo.co.jp
35,google.es
36,t.co
37,pornhub.com
38,ebay.com
39,imgur.com
40,google.com.mx
41,google.ca
42,alipay.com
43,twitch.tv
44,xvideos.com
45,bing.com
46,youth.cn
47,msn.com
48,aliexpress.com
49,tumblr.com
50,ok.ru
2. Push the data to HDFS.
hdfs dfs -put alexa.csv /tmp
3. Create the enrichment definition.
[root@node1 0.7.1]# cat enrichment.json
{
  "zkQuorum": "node1:2181",
  "sensorToFieldList": {
    "squid": {
      "type": "ENRICHMENT",
      "fieldToEnrichmentTypes": {
        "domain_without_subdomains": [
          "whois",
          "alexa"
        ]
      }
    }
  }
}
4. Create the extractor definition.
[root@node1 0.7.1]# cat extractor.json
{
  "config": {
    "columns": {
      "domain": 1,
      "rank": 0
    },
    "indicator_column": "domain",
    "type": "alexa",
    "separator": ","
  },
  "extractor": "CSV"
}
5. Execute the loader.
/usr/metron/0.7.1/bin/flatfile_loader.sh -n ./enrichment.json -t enrichment -c t -e ./extractor.json -i /tmp/alexa.csv -m MR
19/03/13 16:12:26 WARN extractor.TransformFilterExtractorDecorator: Unable to setup zookeeper client - zk_quorum url not provided. **This will limit some Stellar functionality**
19/03/13 16:12:26 INFO importer.MapReduceImporter: Configuring MapReduceImporter: /tmp/alexa.csv => enrichment:t
19/03/13 16:12:27 INFO client.RMProxy: Connecting to ResourceManager at node1/127.0.0.1:8050
19/03/13 16:12:27 INFO client.AHSProxy: Connecting to Application History server at node1/127.0.0.1:10200
19/03/13 16:14:09 INFO input.FileInputFormat: Total input paths to process : 1
19/03/13 16:14:10 INFO mapreduce.JobSubmitter: number of splits:1
19/03/13 16:14:11 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1552492524533_0003
19/03/13 16:14:12 INFO impl.YarnClientImpl: Submitted application application_1552492524533_0003
19/03/13 16:14:12 INFO mapreduce.Job: The url to track the job: http://node1:8088/proxy/application_1552492524533_0003/
19/03/13 16:14:12 INFO mapreduce.Job: Running job: job_1552492524533_0003
19/03/13 16:14:33 INFO mapreduce.Job: Job job_1552492524533_0003 running in uber mode : false
19/03/13 16:14:33 INFO mapreduce.Job: map 0% reduce 0%
19/03/13 16:14:33 INFO mapreduce.Job: Job job_1552492524533_0003 failed with state FAILED due to: Application application_1552492524533_0003 failed 2 times due to AM Container for appattempt_1552492524533_0003_000002 exited with exitCode: 1
For more detailed output, check the application tracking page: http://node1:8088/cluster/app/application_1552492524533_0003 Then click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_e01_1552492524533_0003_02_000001
Exit code: 1
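6. The root cause exception shown in the Description appears only in the YARN container logs or the application tracking UI. The loader's console output reports only the container's exit code 1.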
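Retrieving the container logs from step 6 can be scripted. The Python sketch below pulls the YARN application id out of the loader's console output and builds the matching `yarn logs -applicationId` command. It also filters a container log down to its FATAL lines, which is where the MRAppMaster startup failure is reported. The sketch assumes the `yarn` CLI is available and YARN log aggregation is enabled on the cluster. The helper names are hypothetical.

```python
import re
import shlex

# YARN application ids look like application_<clusterTimestamp>_<sequence>,
# e.g. application_1552492524533_0003 in the loader output for step 5.
APP_ID_RE = re.compile(r"\bapplication_\d+_\d+\b")


def find_application_ids(console_output):
    """Return distinct YARN application ids, in order of first appearance."""
    ids = []
    for app_id in APP_ID_RE.findall(console_output):
        if app_id not in ids:
            ids.append(app_id)
    return ids


def yarn_logs_command(app_id):
    """Build the argv for `yarn logs`, which prints the aggregated container logs."""
    return ["yarn", "logs", "-applicationId", app_id]


def fatal_lines(container_log):
    """Return the FATAL-level lines of a container log."""
    return [line for line in container_log.splitlines() if " FATAL " in line]


sample = "19/03/13 16:14:12 INFO impl.YarnClientImpl: Submitted application application_1552492524533_0003"
print(shlex.join(yarn_logs_command(find_application_ids(sample)[0])))
# prints: yarn logs -applicationId application_1552492524533_0003
```

To fetch the logs, pass the command to `subprocess.run(..., capture_output=True, text=True)`. Then hand its stdout to `fatal_lines` to surface the `Error starting MRAppMaster` entry. The command is kept as an argv list rather than a single shell string so that no shell quoting is involved.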
Issue Links
- relates to: METRON-2043 Fix profiler-client dependencies (In Progress)