Details
-
Bug
-
Status: Closed
-
Major
-
Resolution: Fixed
-
1.3.0, 2.0.0
-
None
Description
The eased out denominator has to detect duplicate row-stats from different attributes.
select account_id from customers c, customer_activation ca where c.customer_id = ca.customer_id and year(ca.dt) = year(c.dt) and month(ca.dt) = month(c.dt) and year(ca.dt) between year('2013-12-26') and year('2013-12-26')
private Long getEasedOutDenominator(List<Long> distinctVals) { // Exponential back-off for NDVs. // 1) Descending order sort of NDVs // 2) denominator = NDV1 * (NDV2 ^ (1/2)) * (NDV3 ^ (1/4))) * .... Collections.sort(distinctVals, Collections.reverseOrder()); long denom = distinctVals.get(0); for (int i = 1; i < distinctVals.size(); i++) { denom = (long) (denom * Math.pow(distinctVals.get(i), 1.0 / (1 << i))); } return denom; }
This gets [8007986, 821974390, 821974390], which is actually 3 columns 2 of which are derived from the same column.
Reduce Output Operator (RS_12) key expressions: _col0 (type: bigint), year(_col2) (type: int), month(_col2) (type: int) sort order: +++ Map-reduce partition columns: _col0 (type: bigint), year(_col2) (type: int), month(_col2) (type: int) value expressions: _col1 (type: bigint) Join Operator (JOIN_13) condition map: Inner Join 0 to 1 keys: 0 _col0 (type: bigint), year(_col1) (type: int), month(_col1) (type: int) 1 _col0 (type: bigint), year(_col2) (type: int), month(_col2) (type: int) outputColumnNames: _col3
So the eased out denominator is off by a factor of 30,000 or so, causing OOMs in map-joins.
Attachments
Attachments
Issue Links
- links to