Details
- Type: Bug
- Status: Open
- Priority: Minor
- Resolution: Unresolved
- Affects Version/s: 3.2.2
- Fix Version/s: None
- Component/s: None
Description
Starting a daemon with hdfs --daemon start ... (and likewise yarn --daemon start ...) may write an invalid PID to the PID file.
Scenario: run hdfs --daemon start namenode (or any other Hadoop daemon).
Expected result: the PID of the running namenode Java process is written to the PID file.
Actual result (non-deterministic): the PID of an exited bash process is written to the PID file.
The root cause is that both daemon-launching bash functions, hadoop_start_daemon and hadoop_start_daemon_wrapper, concurrently write different PIDs to the same file, and only the PID written by hadoop_start_daemon_wrapper is correct. The ordering of those writes is only weakly synchronised (via a hardcoded 5 s timeout). Under specific circumstances, such as heavy CPU load, that ordering is not preserved and the invalid PID ends up in the PID file.
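The race can be reproduced with a minimal sketch. The function names below mirror the two functions in hadoop-functions.sh, but the bodies are deliberately simplified illustrations (with an artificial delay standing in for CPU load), not the real implementation:

```shell
#!/usr/bin/env bash
# Minimal sketch of the race between the two PID-file writers.

pidfile=$(mktemp)
main_pid=$$

# Stand-in for hadoop_start_daemon: it runs in a background subshell,
# where "$$" still expands to the PARENT bash PID -- a process that is
# about to exit. Writing that PID is the invalid write.
start_daemon() {
  sleep 0.2                 # simulated scheduling delay (heavy CPU load)
  echo "$$" > "$pidfile"    # stale PID of the exiting bash process
}

# Stand-in for hadoop_start_daemon_wrapper: launches the daemon in the
# background and records the child PID -- the correct value.
start_daemon_wrapper() {
  start_daemon &
  daemon_pid=$!
  echo "$daemon_pid" > "$pidfile"   # correct write, but it lands FIRST
  wait "$daemon_pid"
}

start_daemon_wrapper
final_pid=$(cat "$pidfile")
echo "pidfile contains $final_pid (wrapper wrote $daemon_pid, stale writer wrote $main_pid)"
rm -f "$pidfile"
```

Because the stale write lands last here, the PID file ends up holding the PID of the long-gone bash process, which is exactly the failure mode observed.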
Possible solution: writing the PID file in hadoop_start_daemon appears unnecessary when it is called from hadoop_start_daemon_wrapper; it should skip that step in this scenario.
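One way to sketch that fix is to have the wrapper signal the inner function to skip its write, leaving a single writer. HADOOP_DAEMON_WRAPPED below is a hypothetical flag name chosen for illustration; an actual patch may signal this differently:

```shell
#!/usr/bin/env bash
# Sketch of the proposed fix: only the wrapper writes the PID file.
# HADOOP_DAEMON_WRAPPED is an illustrative, made-up variable name.

pidfile=$(mktemp)

start_daemon() {
  if [[ "${HADOOP_DAEMON_WRAPPED:-false}" != "true" ]]; then
    echo "$$" > "$pidfile"   # only written when run stand-alone
  fi
  sleep 0.2                  # stand-in for exec'ing the JVM
}

start_daemon_wrapper() {
  HADOOP_DAEMON_WRAPPED=true start_daemon &
  daemon_pid=$!
  echo "$daemon_pid" > "$pidfile"   # now the only writer: no race left
  wait "$daemon_pid"
}

start_daemon_wrapper
final_pid=$(cat "$pidfile")
echo "pidfile contains $final_pid (daemon PID $daemon_pid)"
rm -f "$pidfile"
```

With only one writer, the PID file deterministically holds the daemon's PID regardless of scheduling.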