Details
Type: Bug
Status: Open
Priority: Minor
Resolution: Unresolved
Environment
kafka: 3.5.1
Java: openjdk version "20.0.1" 2023-04-18
OS: Ubuntu 22.04.3 LTS on WSL2/Windows 11
Description
The kafka-server-stop script does not work when the COLUMNS environment variable is set on Ubuntu.
Steps to reproduce:
kafka/zookeeper.properties

    dataDir=/tmp/kafka-test-20230828-15217-1lop1tk/zookeeper
    clientPort=34461
    maxClientCnxns=0
    admin.enableServer=false
kafka/server.properties

    broker.id=0
    listeners=PLAINTEXT://:46161
    num.network.threads=3
    num.io.threads=8
    socket.send.buffer.bytes=102400
    socket.receive.buffer.bytes=102400
    socket.request.max.bytes=104857600
    log.dirs=/tmp/kafka-test-20230828-15217-1lop1tk/kafka-logs
    num.partitions=1
    num.recovery.threads.per.data.dir=1
    offsets.topic.replication.factor=1
    transaction.state.log.replication.factor=1
    transaction.state.log.min.isr=1
    log.retention.hours=168
    log.retention.check.interval.ms=300000
    zookeeper.connect=localhost:34461
    zookeeper.connection.timeout.ms=18000
    group.initial.rebalance.delay.ms=0
    $ zookeeper-server-start kafka/zookeeper.properties >/dev/null 2>&1 &
    [1] 18593
    $ kafka-server-start kafka/server.properties >/dev/null 2>&1 &
    [2] 18982
    $ COLUMNS=10 kafka-server-stop # This is unexpected
    No kafka server to stop
    $ kafka-server-stop
    $ zookeeper-server-stop
    [2]+ Exit 143 kafka-server-start kafka/server.properties
    $
    [1]+ Exit 143 zookeeper-server-start kafka/zookeeper.properties
In the third command, I set the COLUMNS environment variable. This caused the kafka-server-stop script to fail to find the Kafka process.
Cause
The kafka-server-stop script uses ps ax to find the Kafka process:
    OSNAME=$(uname -s)
    if [[ "$OSNAME" == "OS/390" ]]; then
        (snip)
    elif [[ "$OSNAME" == "OS400" ]]; then
        (snip)
    else
        PIDS=$(ps ax | grep ' kafka\.Kafka ' | grep java | grep -v grep | awk '{print $1}')
    fi
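The else-branch can only find a PID if the whole ' kafka.Kafka ' token survives on the process's ps line. A minimal sketch to check this without a running broker: the filter chain is copied into a hypothetical helper, extract_kafka_pids, and fed two sample lines. Both lines are illustrative: one is a shortened version of the full java line from the ps output in this report, the other is the truncated line printed with COLUMNS=10.

```shell
#!/usr/bin/env bash
# Hypothetical helper: the same filter chain as the else-branch of
# kafka-server-stop, but reading ps-style lines from stdin.
extract_kafka_pids() {
  grep ' kafka\.Kafka ' | grep java | grep -v grep | awk '{print $1}'
}

# Shortened, illustrative stand-ins for the two ps ax lines in this report.
full_line='19912 pts/0 Sl 0:03 /home/linuxbrew/.linuxbrew/opt/openjdk/libexec/bin/java -Xmx1G -Xms1G -server kafka.Kafka kafka/server.properties'
truncated_line='19912 pts/0 Sl 0:05 /home/linux'

full_pids=$(printf '%s\n' "$full_line" | extract_kafka_pids)
truncated_pids=$(printf '%s\n' "$truncated_line" | extract_kafka_pids)

echo "full:      '${full_pids}'"       # the PID 19912 is found
echo "truncated: '${truncated_pids}'"  # empty: ' kafka.Kafka ' was cut off
```

An empty PIDS is exactly the case in which the script reports that there is no Kafka server to stop.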
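On Ubuntu, ps ax truncates its output when the COLUMNS environment variable is set, even when the output is piped rather than written to a terminal. The Kafka process's java command line is then cut off before kafka.Kafka, so grep ' kafka\.Kafka ' matches nothing, PIDS is empty, and the script prints "No kafka server to stop".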
(The source code of the ps command shows that the COLUMNS environment variable takes precedence over the result of isatty.)
    $ ps ax | cat
    19912 pts/0 Sl 0:03 /home/linuxbrew/.linuxbrew/opt/openjdk/libexec/bin/java -Xmx1G -Xms1G -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+ExplicitGCInvokesConcurrent -XX:MaxInlineLevel=15 -Djava.awt.headless=true -Xlog:gc*:file=/home/linuxbrew/.linuxbrew/Cellar/kafka/3.5.1/libexec/bin/../logs/kafkaServer-gc.log:time,tags:filecount=10,filesize=100M -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dkafka.logs.dir=/home/linuxbrew/.linuxbrew/Cellar/kafka/3.5.1/libexec/bin/../logs -Dlog4j.configuration=file:/home/linuxbrew/.linuxbrew/Cellar/kafka/3.5.1/libexec/bin/../config/log4j.properties -cp /home/linuxbrew/.linuxbrew/Cellar/kafka/3.5.1/libexec/bin/../libs/activation-1.1.1.jar:(snip):/home/linuxbrew/.linuxbrew/Cellar/kafka/3.5.1/libexec/bin/../libs/zstd-jni-1.5.5-1.jar kafka.Kafka kafka/server.properties
    $ COLUMNS=10 ps ax | cat
    19912 pts/0 Sl 0:05 /home/linux
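The same effect can be reproduced without Kafka. This is a minimal sketch under stated assumptions, not the setup used above: it assumes bash and procps-ng's ps, and it starts sleep under a fake argv[0] that imitates a long java command line ending in kafka.Kafka, so the script's pipeline matches it. The last variant uses ps axww; in procps-ng, giving w twice requests unlimited output width, which as far as I can tell also overrides COLUMNS. That makes it one possible workaround, not the project's fix.

```shell
#!/usr/bin/env bash
unset COLUMNS                             # start from a clean environment

# Stand-in process: sleep whose argv[0] mimics a long java command line.
padding=$(printf 'x%.0s' {1..300})        # pushes kafka.Kafka far to the right
(exec -a "java -cp ${padding} kafka.Kafka" sleep 300) &
standin=$!
sleep 1                                   # give the subshell time to exec

# The pipeline from the else-branch of kafka-server-stop, run three ways.
pids_default=$(ps ax | grep ' kafka\.Kafka ' | grep java | grep -v grep | awk '{print $1}')
pids_columns=$(COLUMNS=10 ps ax | grep ' kafka\.Kafka ' | grep java | grep -v grep | awk '{print $1}')
pids_ww=$(COLUMNS=10 ps axww | grep ' kafka\.Kafka ' | grep java | grep -v grep | awk '{print $1}')

echo "stand-in PID:        $standin"
echo "ps ax:               '$pids_default'"
echo "COLUMNS=10 ps ax:    '$pids_columns'"
echo "COLUMNS=10 ps axww:  '$pids_ww'"

kill "$standin"                           # clean up the stand-in
wait "$standin" 2>/dev/null || true
```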
I tested this on WSL2 on Windows with OpenJDK installed via Homebrew, but it should occur in any environment that uses the ps command from procps-ng.
Problem
This caused a CI failure in the Homebrew project (GitHub/Homebrew/homebrew-core#133887).
Homebrew passing the COLUMNS environment variable through seems to be a bug in itself. However, a server-stop script should not be affected by such an environment variable, so this also looks like a bug to me.
Related issues
This problem, KAFKA-4931, and KAFKA-4110 could all be fixed by introducing a process ID (PID) file. However, the three problems have different causes and can be considered separately.
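The PID-file approach mentioned above could look roughly like this minimal sketch: the start side records $! in a file and the stop side signals that PID, so no ps output is parsed and COLUMNS cannot interfere. The helper names start_with_pidfile and stop_with_pidfile, the PID_FILE location, and the use of sleep as a stand-in server are all hypothetical; this is not how Kafka's scripts currently work.

```shell
#!/usr/bin/env bash
# Hypothetical PID-file helpers; names and file location are illustrative only.
PID_FILE="${TMPDIR:-/tmp}/kafka-pidfile-sketch-$$.pid"

start_with_pidfile() {
  "$@" &                    # launch the server command in the background
  echo "$!" > "$PID_FILE"   # remember its PID instead of searching ps later
}

stop_with_pidfile() {
  if [[ ! -s "$PID_FILE" ]]; then
    echo "No kafka server to stop"
    return 1
  fi
  local pid
  pid=$(<"$PID_FILE")
  kill -s TERM "$pid" && rm -f "$PID_FILE"
}

# Usage, with sleep standing in for kafka-server-start:
start_with_pidfile sleep 300
stop_with_pidfile           # no ps parsing, so COLUMNS has no effect
wait                        # reap the terminated background job
```

A real implementation would also need to handle stale PID files, for example when the server dies without removing the file and the PID is later reused by an unrelated process.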