Description
Hi...
I'm trying to load an external TEXTFILE table into an internal ORC table using Hive. My process fails with the following error:
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /tmp/hive/blablabla.... could only be replicated to 0 nodes instead of minReplication (=1). There are 3 datanode(s) running and no node(s) are excluded in this operation.
After investigating, I saw that the amount of "non DFS used" space grows steadily until the job fails.
Just before the failure, "non DFS used" reaches 54 GB on each datanode, while I still have space remaining in DFS.
Here is the dfsadmin report taken just before the issue:
[hdfs@hadoop-01 data]$ hadoop dfsadmin -report
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
Configured Capacity: 475193597952 (442.56 GB)
Present Capacity: 290358095182 (270.42 GB)
DFS Remaining: 228619903369 (212.92 GB)
DFS Used: 61738191813 (57.50 GB)
DFS Used%: 21.26%
Under replicated blocks: 38
Blocks with corrupt replicas: 0
Missing blocks: 0
-------------------------------------------------
Live datanodes (3):
Name: 192.168.3.36:50010 (hadoop-04.XXXXX.local)
Hostname: hadoop-04.XXXXX.local
Decommission Status : Normal
Configured Capacity: 158397865984 (147.52 GB)
DFS Used: 20591481196 (19.18 GB)
Non DFS Used: 61522602976 (57.30 GB)
DFS Remaining: 76283781812 (71.04 GB)
DFS Used%: 13.00%
DFS Remaining%: 48.16%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 182
Last contact: Tue Mar 24 10:56:05 CET 2015
Name: 192.168.3.35:50010 (hadoop-03.XXXXX.local)
Hostname: hadoop-03.XXXXX.local
Decommission Status : Normal
Configured Capacity: 158397865984 (147.52 GB)
DFS Used: 20555853589 (19.14 GB)
Non DFS Used: 61790296136 (57.55 GB)
DFS Remaining: 76051716259 (70.83 GB)
DFS Used%: 12.98%
DFS Remaining%: 48.01%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 184
Last contact: Tue Mar 24 10:56:05 CET 2015
Name: 192.168.3.37:50010 (hadoop-05.XXXXX.local)
Hostname: hadoop-05.XXXXX.local
Decommission Status : Normal
Configured Capacity: 158397865984 (147.52 GB)
DFS Used: 20590857028 (19.18 GB)
Non DFS Used: 61522603658 (57.30 GB)
DFS Remaining: 76284405298 (71.05 GB)
DFS Used%: 13.00%
DFS Remaining%: 48.16%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 182
Last contact: Tue Mar 24 10:56:05 CET 2015
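As far as I understand, "Non DFS Used" is not measured directly; it is derived as configured capacity minus DFS used minus DFS remaining. As a sanity check, plugging in the hadoop-04 numbers from the report above reproduces the reported figure exactly:

```shell
# "Non DFS Used" as HDFS derives it: capacity - dfsUsed - remaining
# (values taken from the hadoop-04 entry in the dfsadmin report above)
capacity=158397865984
dfs_used=20591481196
dfs_remaining=76283781812
non_dfs=$((capacity - dfs_used - dfs_remaining))
echo "$non_dfs"    # prints 61522602976, i.e. the reported 57.30 GB
```

So the 57 GB figure is just whatever capacity HDFS cannot account for, not something I can necessarily find as real files on disk.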
I expected to find temporary space being used within my filesystem (i.e. /data).
I found the DFS usage under /data/hadoop/hdfs/data (19 GB), but no trace of the 57 GB of non-DFS usage:
[root@hadoop-05 hadoop]# df -h /data
Filesystem Size Used Avail Use% Mounted on
/dev/sdb1 148G 20G 121G 14% /data
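To make the mismatch concrete: per datanode, HDFS claims roughly 19 GB DFS plus 57 GB non-DFS in use, yet df only sees about 20 GB used on the mount. A quick back-of-the-envelope check, using the hadoop-05 entry from the report and the df output above:

```shell
# HDFS's view of hadoop-05 (bytes, from the dfsadmin report)
dfs_used=20590857028
non_dfs_used=61522603658
hdfs_total=$((dfs_used + non_dfs_used))
# What the OS actually reports as used on /data (20G from df -h, approximated in bytes)
os_used=$((20 * 1024 * 1024 * 1024))
echo "HDFS thinks $hdfs_total bytes are in use; the OS sees only about $os_used"
```

So HDFS believes about 82 GB of the 147 GB disk is consumed, while the operating system sees roughly a quarter of that actually on disk.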
I also checked dfs.datanode.du.reserved, which is set to zero:
[root@hadoop-05 hadoop]# hdfs getconf -confkey dfs.datanode.du.reserved
0
Did I miss something? Where does this non-DFS space live on Linux? And why did I get the message "could only be replicated to 0 nodes instead of minReplication (=1). There are 3 datanode(s) running and no node(s) are excluded in this operation." when all three datanodes were up and running with DFS space remaining?
This error is blocking us.