Description
The nfcapd files stored in HDFS are being converted into empty Parquet files. I checked the source nfcapd files and they are not empty.
This is the stdout from the screen session:
2017-02-24 22:25:10,732 - SPOT.INGEST.WATCHER - INFO - -------------------------------------- New File detected --------------------------------------
2017-02-24 22:25:10,732 - SPOT.INGEST.WATCHER - INFO - File: /home/spot/netflow-files/nfcapd.20170224222000
2017-02-24 22:25:10,732 - SPOT.INGEST.WATCHER - INFO - File /home/spot/netflow-files/nfcapd.20170224222000 added to the queue
2017-02-24 22:25:10,732 - SPOT.INGEST.WATCHER - INFO - ------------------------------------------------------------------------------------------------
2017-02-24 22:25:10,733 - SPOT.INGEST.WATCHER - INFO - -------------------------------------- New File detected --------------------------------------
2017-02-24 22:25:10,733 - SPOT.INGEST.WATCHER - INFO - File: /home/spot/netflow-files/nfcapd.current.6266
2017-02-24 22:25:10,733 - SPOT.INGEST.WATCHER - WARNING - File extension not supported: /home/spot/netflow-files/nfcapd.current.6266
2017-02-24 22:25:10,733 - SPOT.INGEST.WATCHER - WARNING - File won't be ingested
2017-02-24 22:25:10,733 - SPOT.INGEST.WATCHER - INFO - ------------------------------------------------------------------------------------------------
2017-02-24 22:25:11,272 - SPOT.INGEST.FLOW.31183 - INFO - SPOT.Utils: Creating hdfs folder: hadoop fs -mkdir -p /user/spot/pipelines/flow/binary/20170224/22
2017-02-24 22:25:13,715 - SPOT.INGEST.FLOW.31183 - INFO - SPOT.Utils: Loading file to hdfs: hadoop fs -moveFromLocal /home/spot/netflow-files/nfcapd.20170224222000 /user/spot/pipelines/flow/binary/20170224/22/nfcapd.20170224222000
2017-02-24 22:25:16,422 - SPOT.INGEST.FLOW.31183 - INFO - Sending file to worker number: 1
2017-02-24 22:25:16,552 - SPOT.INGEST.FLOW.31183 - INFO - File /home/spot/netflow-files/nfcapd.20170224222000 has been successfully sent to Kafka Topic to: SPOT-INGEST-flow_internals-17_28_51
2017-02-24 22:30:10,729 - SPOT.INGEST.WATCHER - INFO - -------------------------------------- New File detected --------------------------------------
2017-02-24 22:30:10,729 - SPOT.INGEST.WATCHER - INFO - File: /home/spot/netflow-files/nfcapd.20170224222500
2017-02-24 22:30:10,729 - SPOT.INGEST.WATCHER - INFO - File /home/spot/netflow-files/nfcapd.20170224222500 added to the queue
2017-02-24 22:30:10,729 - SPOT.INGEST.WATCHER - INFO - ------------------------------------------------------------------------------------------------
2017-02-24 22:30:10,729 - SPOT.INGEST.WATCHER - INFO - -------------------------------------- New File detected --------------------------------------
2017-02-24 22:30:10,730 - SPOT.INGEST.WATCHER - INFO - File: /home/spot/netflow-files/nfcapd.current.6266
2017-02-24 22:30:10,730 - SPOT.INGEST.WATCHER - WARNING - File extension not supported: /home/spot/netflow-files/nfcapd.current.6266
2017-02-24 22:30:10,730 - SPOT.INGEST.WATCHER - WARNING - File won't be ingested
2017-02-24 22:30:10,730 - SPOT.INGEST.WATCHER - INFO - ------------------------------------------------------------------------------------------------
2017-02-24 22:30:11,565 - SPOT.INGEST.FLOW.31176 - INFO - SPOT.Utils: Creating hdfs folder: hadoop fs -mkdir -p /user/spot/pipelines/flow/binary/20170224/22
2017-02-24 22:30:14,076 - SPOT.INGEST.FLOW.31176 - INFO - SPOT.Utils: Loading file to hdfs: hadoop fs -moveFromLocal /home/spot/netflow-files/nfcapd.20170224222500 /user/spot/pipelines/flow/binary/20170224/22/nfcapd.20170224222500
2017-02-24 22:30:16,825 - SPOT.INGEST.FLOW.31176 - INFO - Sending file to worker number: 0
2017-02-24 22:30:16,944 - SPOT.INGEST.FLOW.31176 - INFO - File /home/spot/netflow-files/nfcapd.20170224222500 has been successfully sent to Kafka Topic to: SPOT-INGEST-flow_internals-17_28_51
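For reference, the watcher behaviour visible in the log above can be sketched as follows: rotated nfcapd.<timestamp> files are queued, nfcapd.current.<pid> (the file nfcapd is still writing) is skipped, and each queued file is moved to a <YYYYMMDD>/<HH> folder under /user/spot/pipelines/flow/binary. This is a hypothetical reconstruction from the log output only, not Spot's actual watcher code. The function names `classify` and `hdfs_binary_dir`, and reading the 14-digit suffix as YYYYMMDDhhmmss, are assumptions.

```python
import re
from datetime import datetime

# Hypothetical filter: accept nfcapd.YYYYMMDDhhmm with optional ss (the files in
# this report carry 14 digits); anything else, e.g. nfcapd.current.6266, is skipped.
ROTATED_NFCAPD = re.compile(r"^nfcapd\.(\d{12})(\d{2})?$")


def classify(filename: str) -> str:
    """Return 'queue' for rotated capture files and 'skip' for everything else."""
    return "queue" if ROTATED_NFCAPD.match(filename) else "skip"


def hdfs_binary_dir(filename: str, base: str = "/user/spot/pipelines/flow/binary") -> str:
    """Derive the <base>/<YYYYMMDD>/<HH> folder from the timestamp in the file name."""
    match = ROTATED_NFCAPD.match(filename)
    if match is None:
        raise ValueError(f"not a rotated nfcapd file: {filename}")
    # Only the first 12 digits (date, hour, minute) are needed for the folder.
    ts = datetime.strptime(match.group(1), "%Y%m%d%H%M")
    return f"{base}/{ts:%Y%m%d}/{ts:%H}"


if __name__ == "__main__":
    for name in ("nfcapd.20170224222000", "nfcapd.current.6266"):
        print(name, classify(name))
    print(hdfs_binary_dir("nfcapd.20170224222500"))
```

Under these assumptions, both rotated files in the log map to /user/spot/pipelines/flow/binary/20170224/22, which matches the `hadoop fs -mkdir -p` lines, so the binaries appear to reach HDFS correctly and the empty output would originate later in the pipeline.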
And this is the log of the YARN job:
2017-02-24 19:35:47,812 INFO [Thread-68] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Copying hdfs://tanuki.akainix.local:8020/user/spot/.staging/job_1480610160914_0358/job_1480610160914_0358_1_conf.xml to hdfs://tanuki.akainix.local:8020/user/history/done_intermediate/spot/job_1480610160914_0358_conf.xml_tmp
2017-02-24 19:35:47,844 INFO [Thread-68] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Copied to done location: hdfs://tanuki.akainix.local:8020/user/history/done_intermediate/spot/job_1480610160914_0358_conf.xml_tmp
2017-02-24 19:35:47,853 INFO [Thread-68] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Moved tmp to done: hdfs://tanuki.akainix.local:8020/user/history/done_intermediate/spot/job_1480610160914_0358.summary_tmp to hdfs://tanuki.akainix.local:8020/user/history/done_intermediate/spot/job_1480610160914_0358.summary
2017-02-24 19:35:47,854 INFO [Thread-68] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Moved tmp to done: hdfs://tanuki.akainix.local:8020/user/history/done_intermediate/spot/job_1480610160914_0358_conf.xml_tmp to hdfs://tanuki.akainix.local:8020/user/history/done_intermediate/spot/job_1480610160914_0358_conf.xml
2017-02-24 19:35:47,857 INFO [Thread-68] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Moved tmp to done: hdfs://tanuki.akainix.local:8020/user/history/done_intermediate/spot/job_1480610160914_0358-1487975734022-spot-INSERT+INTO+TABLE+spotdb.f...spotdb.flow_tmp%28Stage-1487975747716-1-0-SUCCEEDED-root.users.spot-1487975739310.jhist_tmp to hdfs://tanuki.akainix.local:8020/user/history/done_intermediate/spot/job_1480610160914_0358-1487975734022-spot-INSERT+INTO+TABLE+spotdb.f...spotdb.flow_tmp%28Stage-1487975747716-1-0-SUCCEEDED-root.users.spot-1487975739310.jhist
2017-02-24 19:35:47,857 INFO [Thread-68] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Stopped JobHistoryEventHandler. super.stop()
2017-02-24 19:35:47,858 INFO [Thread-68] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: KILLING attempt_1480610160914_0358_m_000000_0
2017-02-24 19:35:47,858 INFO [Thread-68] org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy: Opening proxy : levante.akainix.local:8041
2017-02-24 19:35:47,877 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1480610160914_0358_m_000000_0 TaskAttempt Transitioned from SUCCESS_FINISHING_CONTAINER to SUCCEEDED
2017-02-24 19:35:47,878 INFO [Thread-68] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Setting job diagnostics to
2017-02-24 19:35:47,878 INFO [Thread-68] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: History url is http://tanuki.akainix.local:19888/jobhistory/job/job_1480610160914_0358
2017-02-24 19:35:47,886 INFO [Thread-68] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Waiting for application to be successfully unregistered.
2017-02-24 19:35:48,888 INFO [Thread-68] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Final Stats: PendingReds:0 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:1 AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:1 ContRel:0 HostLocal:0 RackLocal:0
2017-02-24 19:35:48,889 INFO [Thread-68] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Deleting staging directory hdfs://tanuki.akainix.local:8020 /user/spot/.staging/job_1480610160914_0358
2017-02-24 19:35:48,900 INFO [Thread-68] org.apache.hadoop.ipc.Server: Stopping server on 39525
2017-02-24 19:35:48,901 INFO [IPC Server listener on 39525] org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 39525
2017-02-24 19:35:48,901 INFO [IPC Server Responder] org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
2017-02-24 19:35:48,901 INFO [TaskHeartbeatHandler PingChecker] org.apache.hadoop.mapreduce.v2.app.TaskHeartbeatHandler: TaskHeartbeatHandler thread interrupted
2017-02-24 19:35:48,902 INFO [Ping Checker] org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: TaskAttemptFinishingMonitor thread interrupted
Regards,
Joaquín Silva