Details
- Type: Sub-task
- Status: Open
- Priority: Major
- Resolution: Unresolved
Description
How does this handle tables that are bucketed and sorted?
insert into T values(1,2),(5,6); creates something like delta_2_2/bucket_1
insert into T values(3,4),(7,8); creates delta_3_3/bucket_1
Any reader would be expected to see some contiguous subset of (1,2),(3,4),(5,6),(7,8) in sort order,
but that would require a special reader that merges the two files, and I don't see one.
In particular, it's not clear how SMB join can work.
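To make the concern concrete, here is a minimal sketch (plain Python, not Hive code) of the difference between scanning the two delta files one after another and merging them. The list names mirror the paths above, and the helpers read_concatenated and read_merged are hypothetical, for illustration only.

```python
import heapq

# Rows as written by the two inserts; each file is sorted on its own.
delta_2_2_bucket_1 = [(1, 2), (5, 6)]
delta_3_3_bucket_1 = [(3, 4), (7, 8)]


def read_concatenated(files):
    """Scan the files one after another, as a reader with no merge step would."""
    return [row for f in files for row in f]


def read_merged(files):
    """K-way merge of individually sorted files into one sorted stream."""
    return list(heapq.merge(*files))


files = [delta_2_2_bucket_1, delta_3_3_bucket_1]
concatenated = read_concatenated(files)  # (5,6) comes before (3,4): not sorted
merged = read_merged(files)              # globally sorted across both files
```

An SMB join assumes each bucket arrives as a single sorted stream, so only the merged form would satisfy it.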
This looks like a general problem:
For a plain Hive table, if you do 2 inserts and the 1st one creates 00000_0, then the 2nd one will create 00000_0_copy_1.
There is nothing to merge these files at query time to produce a single sort order (as the Acid reader does for full acid tables).
It should at least throw in this case.
The current "CONCATENATE" doesn't support bucketed or sorted tables.
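As a rough illustration of the "at least throw" option, the sketch below rejects a sorted table's file listing when any bucket has more than one file. It is plain Python rather than Hive code; the function name check_single_file_per_bucket is hypothetical, and it assumes the bucket id is the part of the file name before the first underscore, as in 00000_0 and 00000_0_copy_1.

```python
from collections import defaultdict


def check_single_file_per_bucket(file_names):
    """Raise if any bucket is backed by more than one file.

    Assumption: the bucket id is the prefix before the first underscore,
    so 00000_0 and 00000_0_copy_1 both belong to bucket 00000.
    """
    by_bucket = defaultdict(list)
    for name in file_names:
        bucket_id = name.split("_", 1)[0]
        by_bucket[bucket_id].append(name)

    for bucket_id, names in by_bucket.items():
        if len(names) > 1:
            raise ValueError(
                f"bucket {bucket_id} has {len(names)} files {names}; "
                "a single sort order cannot be assumed"
            )


# One file per bucket passes silently.
check_single_file_per_bucket(["00000_0", "00001_0"])
```

Calling it with ["00000_0", "00000_0_copy_1"] raises ValueError, which is the state left behind by the second insert.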
Issue Links
- is blocked by HIVE-17675 verify SMB join with multiple inserts (Resolved)