Description
We occasionally get compressed files larger than 160 MB in .deflate format. They are loaded into Hive through an external table, say T_A.
When we run a select count on T_A, we get more records, about 70% more, than when we check the same file with "hadoop fs -text /xxxxx | wc -l".
Any clue about this? How could it happen?
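For an independent cross-check of the line count outside of Hadoop and Hive, something like the sketch below could be run on a local copy of the file. It assumes the .deflate file is a single plain zlib stream, which is what Hadoop's DefaultCodec normally writes; the function name and the local file path are made up for illustration.

```python
import zlib


def count_lines_in_deflate(path, chunk_size=1 << 20):
    """Count newline characters in a zlib-compressed file (like `wc -l`).

    Assumption: the file holds one zlib stream, as written by Hadoop's
    DefaultCodec. The file is read in chunks so that large inputs do not
    have to fit in memory.
    """
    decomp = zlib.decompressobj()
    newlines = 0
    with open(path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                break
            newlines += decomp.decompress(chunk).count(b"\n")
    # Pick up any data still buffered inside the decompressor.
    newlines += decomp.flush().count(b"\n")
    return newlines


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python count_lines.py /tmp/local_copy.deflate
    print(count_lines_in_deflate(sys.argv[1]))
```

If this number matches the "wc -l" result but not the Hive count, the file itself is probably fine and the extra records come from how the table is read.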
The large .deflate files were caused by a flaw in our processing. After we fixed it and the files came out smaller than 64 MB, the problem no longer appeared. But since there is no guarantee that a larger file will never show up again, is there any way to avoid this problem?
cheers!
eye