Tip - A Command to Estimate Kafka Topic Disk Usage

A small tip

If you’re using Kafka to manage your data, it’s important to keep track of the disk usage of your topics to ensure that you have enough storage space. One way to estimate the disk usage of Kafka topics is by using the following command:

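du -scb /var/lib/kafka/data/* | sed -E 's|/var/lib/kafka/data/||; s/-[0-9]+$//' | awk '{arr[$2]+=$1} END {for (i in arr) printf("%s\t%.2f\n", i, arr[i]/1024/1024/1024)}' | sort -k 2 -g -r

This command chains several Unix utilities to add up the size of each topic's partition directories on one broker and sort the topics by total size in GiB, from highest to lowest. It assumes GNU du and a data directory of /var/lib/kafka/data; substitute the path from your broker's log.dirs setting. Here’s what each part of the command does:

- The du step prints the apparent size in bytes of every entry in the data directory (one per partition), followed by a grand total line.
- The sed step strips the directory prefix and the trailing partition number, so orders-0 and orders-1 both become orders. Only the final -<number> is removed, so topic names that contain hyphens stay intact.
- The awk step sums the bytes per topic and prints each topic name with its total converted to GiB.
- The sort step orders the lines by the size column, largest first.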
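
For repeated use, the same idea can be sketched as a pair of small shell functions with one commented step per stage. This is an illustrative sketch rather than a drop-in tool: the function names aggregate_by_topic and kafka_topic_usage are hypothetical, GNU du is assumed for the -b flag, and the default path should be replaced with your broker's log.dirs value. It removes only a trailing -<number> partition suffix, so hyphenated topic names are kept whole.

```shell
# Hypothetical helper: reads "bytes<TAB>path" lines on stdin and prints
# "topic<TAB>size_in_GiB", largest topic first.
aggregate_by_topic() {
  awk -F'\t' '
    {
      name = $2
      sub(/\/+$/, "", name)       # drop any trailing slash
      sub(/^.*\//, "", name)      # keep only the directory name
      sub(/-[0-9]+$/, "", name)   # strip the partition suffix, e.g. "-0"
      bytes[name] += $1           # accumulate bytes per topic
    }
    END {
      for (t in bytes)
        printf "%s\t%.2f\n", t, bytes[t] / 1024 / 1024 / 1024
    }' |
    # Sort on the second tab-separated column, numerically, descending.
    LC_ALL=C sort -t "$(printf '\t')" -k2,2nr
}

# Hypothetical wrapper: sizes every partition directory under the data
# directory (assumed default below) and aggregates the result per topic.
kafka_topic_usage() {
  data_dir="${1:-/var/lib/kafka/data}"
  # -s: one line per directory, -b: apparent size in bytes (GNU du).
  # The trailing slash in the glob limits the match to directories.
  du -sb "$data_dir"/*/ | aggregate_by_topic
}
```

Called as kafka_topic_usage /var/lib/kafka/data, it prints one tab-separated line per topic with its size in GiB. Keeping the aggregation in its own function means it can be checked against hand-written input without touching a real broker.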

By using this command, you can get a better understanding of how much disk space your Kafka topics are using and take steps to manage your data storage more effectively.