sdm710-common: rootdir: Switch to lz4 for zram

The 5.4 kernel defaults to lzo-rle for zram. lzo-rle adds run-length
encoding to lzo, which benefits zram because swapped-out pages often
contain long runs of zero bytes. However, in my testing lzo-rle still
has lower decompression throughput and higher access latency than
lz4. lzo-rle beats lz4 in compression ratio, but most Xiaomi SM8350
devices have either 8GB or 12GB of RAM, where memory pressure rarely
occurs. Therefore, lz4's higher throughput and lower latency are
preferable.

Now that I have added LZ4_FAST_DEC_LOOP to the kernel lz4 module from
the upstream lz4 1.9.x release, decompression speed has increased by
another 10%, which puts lz4 further ahead of lzo-rle.

Test setup:
- Disable swap on zram if it is enabled
  $ swapoff /dev/block/zram0
- If you want to change the comp_algorithm after zram has been
  initialized, reset zram first.
  $ echo 1 > /sys/block/zram0/reset
- Set the comp_algorithm. For example, for lz4
  $ echo lz4 > /sys/block/zram0/comp_algorithm
- Set up zram size
  $ echo 4294967296 > /sys/block/zram0/disksize
- Make the swap
  $ mkswap /dev/block/zram0
- Pre-fill zram with 1GB of real-world zram data dumped previously under
  memory pressure. For example,
  $ dd if=/sdcard/zram.test of=/dev/block/zram0 bs=1m count=1000

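The setup steps above can be wrapped in a small script so that each
algorithm is tested the same way. This is only a sketch: the function
names and the ZRAM_SYSFS/ZRAM_DEV variables are hypothetical, and the
paths assume the same Android layout used in the commands above.

```shell
#!/bin/sh
# Hypothetical helper names; ZRAM_SYSFS and ZRAM_DEV are assumed
# defaults taken from the paths used in the steps above.
ZRAM_SYSFS="${ZRAM_SYSFS:-/sys/block/zram0}"
ZRAM_DEV="${ZRAM_DEV:-/dev/block/zram0}"

# Reset zram, then select the algorithm and the disk size.
# The reset comes first because comp_algorithm cannot be changed
# once the device has been initialized.
zram_configure() {
    algo="$1"
    size="$2"
    echo 1 > "$ZRAM_SYSFS/reset" || return 1
    echo "$algo" > "$ZRAM_SYSFS/comp_algorithm" || return 1
    echo "$size" > "$ZRAM_SYSFS/disksize" || return 1
}

# Full sequence from the list above: algorithm, size, dump file.
# Needs root and a real zram device.
zram_prepare() {
    swapoff "$ZRAM_DEV" 2>/dev/null   # fails harmlessly if swap is off
    zram_configure "$1" "$2" || return 1
    mkswap "$ZRAM_DEV" || return 1
    dd if="$3" of="$ZRAM_DEV" bs=1m count=1000
}
```

For example, `zram_prepare lz4 4294967296 /sdcard/zram.test` would
reproduce the lz4 setup before running the fio jobs below.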
Random 4k read:
$ fio --readonly --name=randread --direct=1 --rw=randread --ioengine=psync \
  --randrepeat=0 --numjobs=4 --iodepth=1 --group_reporting=1 \
  --filename=/dev/block/zram0 --bs=4K --size=1000M

lzo-rle: read: IOPS=1547k, BW=6042MiB/s (6336MB/s)(4000MiB/662msec)
lz4: read: IOPS=1646k, BW=6431MiB/s (6743MB/s)(4000MiB/622msec)
lz4 fast dec: read: IOPS=1775k, BW=6932MiB/s (7269MB/s)(4000MiB/577msec)

Sequential 4k read:
$ fio --readonly --name=zram_seqread --direct=1 --rw=read --ioengine=psync \
  --numjobs=4 --iodepth=1 --group_reporting=1 --filename=/dev/block/zram0 \
  --bs=4K --size=1000M

lzo-rle: read: IOPS=1816k, BW=7092MiB/s (7437MB/s)(4000MiB/564msec)
lz4: read: IOPS=1907k, BW=7449MiB/s (7811MB/s)(4000MiB/537msec)
lz4 fast dec: read: IOPS=2004k, BW=7828MiB/s (8208MB/s)(4000MiB/511msec)

Compression ratio from /sys/block/zram0/mm_stat
(orig_data_size -> compr_data_size):
lzo-rle: 1048576000 -> 81969152
lz4: 1048576000 -> 86200320
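To make the byte counts easier to compare, they can be turned into
ratios. The sketch below uses two hypothetical helper functions; it
relies on the first two fields of mm_stat being the original and
compressed data sizes.

```shell
# compression_ratio ORIG COMPR: print ORIG/COMPR with two decimals.
compression_ratio() {
    awk -v o="$1" -v c="$2" 'BEGIN { printf "%.2f\n", o / c }'
}

# mm_stat_ratio [FILE]: same calculation, reading the first two fields
# of an mm_stat file (defaults to zram0).
mm_stat_ratio() {
    read -r orig compr _ < "${1:-/sys/block/zram0/mm_stat}"
    compression_ratio "$orig" "$compr"
}

compression_ratio 1048576000 81969152   # lzo-rle, prints 12.79
compression_ratio 1048576000 86200320   # lz4, prints 12.16
```

So lzo-rle compresses this data set to roughly 12.8:1 and lz4 to
roughly 12.2:1.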

zstd was also considered. It offers a very good compression ratio,
1048576000 -> 59899904, but its throughput and latency are much
worse than those of lz4 and lzo-rle. It is probably more suitable for a
traditional desktop/server swap setup with a less aggressive swappiness
value. It can also help alleviate high memory pressure by compressing
RAM to a smaller size.
Chenyang Zhong 2023-06-11 16:42:59 +03:00 committed by CakesTwix
parent 14225579e3
commit 16ac982064
Signed by: CakesTwix
GPG key ID: 7B11051D5CE19825

@@ -341,9 +341,7 @@ function configure_zram_parameters() {
# For >4GB Non-Go device, size = 4GB
# And enable lz4 zram compression for Go targets.
if [ "$low_ram" == "true" ]; then
echo lz4 > /sys/block/zram0/comp_algorithm
fi
if [ -f /sys/block/zram0/disksize ]; then
if [ -f /sys/block/zram0/use_dedup ]; then