[MAPREDUCE-6197] Cache MapOutputLocations in ShuffleHandler - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 2.6.0
Fix Version/s: 2.9.0, 3.0.0-alpha1
Component/s: None
Labels: None
Target Version/s: 2.9.0
Hadoop Flags: Reviewed

Description

ShuffleHandler currently seems to create a map of mapId -> mapInfo (file.out / index information) each time it receives a message.
It should cache map info across requests, so that a scan of all directories is not required for each reducer fetching from the same map.

Also, the scan for each map output / index file is performed twice per mapId within a request: in populateHeaders, once in the call to getMapOutputInfo, and then again directly in the method itself.

For an invocation where we do end up with more than 1000 (default) mapIds in a single call, and so don't cache all of them in the map, the path constructed for the uncached entries will be invalid. This is highly unlikely to be the case though, until there's proper caching.

MapOutputInfo info = mapOutputInfoMap.get(mapId);
if (info == null) {
  info = getMapOutputInfo(outputBasePathStr, mapId, reduceId, user);
}
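
To illustrate the kind of cross-request caching asked for above, here is a minimal sketch in plain Java. It is not the attached patch and does not use ShuffleHandler's real types: the class MapOutputLocationCache, the nested Location type, the resolver function and the key layout are all hypothetical names introduced for illustration. The idea is only that the expensive directory scan runs once per (user, job, mapId) and later reducers reuse the result, with a bounded LRU so the cache cannot grow without limit.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical LRU cache of map-output locations shared across shuffle requests. */
class MapOutputLocationCache {

    /** Hypothetical stand-in for the resolved file.out / file.out.index paths. */
    static final class Location {
        final String dataFile;
        final String indexFile;

        Location(String dataFile, String indexFile) {
            this.dataFile = dataFile;
            this.indexFile = indexFile;
        }
    }

    private final Map<String, Location> cache;
    // Assumed to perform the expensive scan of the local directories.
    private final Function<String, Location> resolver;
    private int resolverCalls = 0;

    MapOutputLocationCache(final int maxEntries, Function<String, Location> resolver) {
        this.resolver = resolver;
        // accessOrder = true makes iteration order least-recently-used first,
        // so removeEldestEntry evicts the entry that was touched longest ago.
        this.cache = new LinkedHashMap<String, Location>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Location> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /** Returns the cached location, resolving it only on a cache miss. */
    synchronized Location get(String user, String jobId, String mapId) {
        String key = user + "/" + jobId + "/" + mapId; // hypothetical key layout
        Location location = cache.get(key);
        if (location == null) {
            location = resolver.apply(key);
            resolverCalls++;
            cache.put(key, location);
        }
        return location;
    }

    /** Number of times the resolver (the simulated directory scan) has run. */
    synchronized int resolverCalls() {
        return resolverCalls;
    }
}
```

With a cache like this, a second reducer fetching the same map output would hit the cached entry instead of triggering another scan, and looking the entry up once per mapId within a request would also avoid the double scan described above. A single coarse lock is used here for simplicity; a production handler would likely want a concurrent cache with expiry.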

Attachments

MAPREDUCE-6197.patch
13/Jun/16 22:35
15 kB
Junping Du

Issue Links

relates to

MAPREDUCE-7237 Supports config the shuffle's path cache related parameters

Resolved

People

Assignee: Junping Du

Reporter: Siddharth Seth

Votes: 0

Watchers: 10

Dates

Created: 16/Dec/14 01:07

Updated: 29/Aug/19 09:25

Resolved: 21/Jun/16 21:26