From 7f90cb07afb67a8785e907a1db10421019a91661 Mon Sep 17 00:00:00 2001
From: Rossco99 <ross.dold@gmail.com>
Date: Wed, 2 Oct 2024 11:01:02 +0800
Subject: [PATCH] Creation of wax-optimise-disk-utilisation-zfs-dedup.md

---
 ...wax-optimise-disk-utilisation-zfs-dedup.md | 157 ++++++++++++++++++
 1 file changed, 157 insertions(+)
 create mode 100644 docs/operate/wax-infrastructure/wax-optimise-disk-utilisation-zfs-dedup.md

diff --git a/docs/operate/wax-infrastructure/wax-optimise-disk-utilisation-zfs-dedup.md b/docs/operate/wax-infrastructure/wax-optimise-disk-utilisation-zfs-dedup.md
new file mode 100644
index 0000000..650e27f
--- /dev/null
+++ b/docs/operate/wax-infrastructure/wax-optimise-disk-utilisation-zfs-dedup.md
@@ -0,0 +1,157 @@
+Running WAX Mainnet production nodes can be very resource intensive. Because many operators endeavour to provide full nodes with a complete history of blocks, storage requirements are a constant challenge as the chain keeps growing.
+
+This guide walks through how to optimise disk utilisation across multiple WAX nodes using ZFS deduplication.
+
+# Optimise Disk Utilisation with ZFS Deduplication
+
+For the most part WAX software mainly uses a single thread and won’t necessarily use all of the resources available. If your hardware has sufficient CPU, RAM and storage, it may be quite economical to run multiple WAX nodes on a single physical server. This can be accomplished using virtual machines, containers or pure bare metal with different service TCP ports for each node.
+
+With multiple nodes running on the same server, the first challenge is typically disk space, as each node needs its own copy of the data. As of October 2024 the blocks data is 4TB and the state-history data is 11TB, so doubling this on a single server would require a substantial amount of disk.
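+
+As a rough sketch of that arithmetic, the hypothetical `storage_estimate` helper below assumes every node holds an identical copy of the blocks and state-history data and that deduplication is ideal (a single physical copy however many nodes share it). It ignores compression, metadata and free-space headroom.
+
+```shell
+# storage_estimate BLOCKS_TB SHIP_TB NODES
+# Hypothetical helper: prints the disk needed for NODES identical copies of the
+# blocks and state-history data, without dedup and with ideal dedup.
+storage_estimate() {
+  awk -v b="$1" -v s="$2" -v n="$3" 'BEGIN {
+    per_node = b + s
+    printf "without dedup: %g TB\n", per_node * n
+    printf "with ideal dedup: %g TB\n", per_node
+  }'
+}
+```
+
+With the October 2024 figures, `storage_estimate 4 11 2` prints 30 TB without deduplication and 15 TB with it.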
+
+Thankfully this can be mitigated using ZFS Deduplication.
+
+## Zettabyte File System (ZFS)
+
+ZFS ([Zettabyte File System](https://docs.oracle.com/cd/E19253-01/819-5461/index.html)) is a highly scalable and advanced file system that was originally developed by Sun Microsystems (now owned by Oracle Corporation) for the Solaris operating system. It is designed to address the limitations of traditional file systems and provide features like data integrity, data protection, and high storage capacity.
+
+Due to its advanced features and robustness, ZFS has gained popularity in various environments, including data centres, file servers, and storage appliances. It has also been ported to other operating systems such as Linux and FreeBSD, providing a versatile and reliable storage solution.
+
+## ZFS Deduplication
+
+The deduplication feature in ZFS eliminates redundant data within ZFS pools and filesystems. In our case there is a considerable amount of duplicate data in the blocks and state-history folders: only a single copy of that data is retained, and the remaining instances function as references to it. This significantly conserves disk space within the ZFS pool on your physical server.
+
+From a technical perspective, when you copy, move or create new data in a deduplicated ZFS pool/filesystem, ZFS splits it into blocks, calculates a checksum for each block and compares it with the checksums of blocks already stored. A block that matches an existing one is recorded as a reference instead of being written again, so even files that only partly match can share space. Deduplication only applies to data written after it has been enabled.
+
+In our use case, the disk space consumed by the blocks and state-history folders will be halved when running two nodes on a single server.
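+
+To build intuition for this block matching, the illustrative `estimate_dedup_ratio` helper below (a hypothetical name, not a ZFS tool) imitates it on ordinary files: it cuts each file into fixed 128 KiB blocks, the default ZFS `recordsize`, hashes each block with SHA-256, the checksum `dedup=on` uses by default, and divides the total block count by the number of unique hashes. It assumes GNU coreutils (`split --filter`) and is a sketch of the idea, not how ZFS is implemented. Because it starts one `sha256sum` per block it is far too slow for multi-terabyte logs; on a real pool, `zdb -S <pool>` simulates deduplication against the actual data.
+
+```shell
+# estimate_dedup_ratio FILE...
+# Illustrative only: split every file into 128 KiB blocks, hash each block and
+# report total blocks / unique blocks, as a block-level dedup would see it.
+estimate_dedup_ratio() {
+  local bs=131072 hashes total unique f
+  hashes=$(mktemp) || return 1
+  for f in "$@"; do
+    # GNU split runs the filter once per block; sha256sum prints "HASH  -"
+    split -b "$bs" --filter='sha256sum' "$f" >> "$hashes" || { rm -f "$hashes"; return 1; }
+  done
+  total=$(wc -l < "$hashes")
+  unique=$(cut -d' ' -f1 "$hashes" | sort -u | wc -l)
+  rm -f "$hashes"
+  awk -v t="$total" -v u="$unique" 'BEGIN {
+    if (u == 0) { print "no data"; exit 1 }
+    printf "%.2fx\n", t / u
+  }'
+}
+```
+
+Running it on a file and an identical copy of that file should report 2.00x, since every block of the copy matches a block already seen.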
+
+## Configuration
+
+This example covers two WAX node instances on a single bare-metal server and, similar to the [Set Up a Solid WAX Mainnet Node Guide](https://developer.wax.io/operate/wax-infrastructure/wax-mainnet-node.html), uses **2 Discrete Disk Systems** on this server to balance disk IO.
+
+**Disk 1** is the high-speed enterprise-grade SSD or NVMe and will be the OS disk used for the WAX software, all config and the state files. Each node instance requires its own state files; it is typically easiest to place all files for each node instance in a separate directory and run each instance with different service TCP ports.
+
+**Disk 2** is a SAS disk array of 4 x 4TB drives (please adjust disk size to an appropriate capacity for current chain conditions) that will host a separate `/blocks` directory for each node instance.
+
+Disk 2 will run the **ZFS File System**, which gives us three main benefits: **LZ4 compression** for space savings, improved disk IO through the **Adaptive Replacement Cache** (ARC), and optimised disk utilisation through **Deduplication**.
+
+Implement ZFS on Disk 2 with the below configuration:
+
+```
+# Install ZFS
+$ sudo apt-get install zfsutils-linux
+
+# Locate the Disk 2 device names
+$ lsblk
+
+# Create a ZFS pool called "datavolume" on the located devices
+# (listing the disks this way stripes them, with no redundancy)
+$ sudo zpool create -f -o ashift=12 datavolume /dev/sde /dev/sdf /dev/sdg /dev/sdh
+
+# Enable LZ4 compression
+$ sudo zfs set compression=lz4 datavolume
+
+# Disable ZFS access time updates
+$ sudo zfs set atime=off datavolume
+
+# Store extended attributes as system attributes for performance
+$ sudo zfs set xattr=sa datavolume
+
+# Let the ARC cache both data and metadata (primarycache=metadata would cache metadata only)
+$ sudo zfs set primarycache=all datavolume
+
+# Enable ZFS deduplication
+$ sudo zfs set dedup=on datavolume
+
+# Set the mountpoint to your preferred location
+$ sudo zfs set mountpoint=/home/eosphere/datavolume datavolume
+```
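+
+It is worth confirming that these properties actually took effect. The hypothetical `check_zfs_props` helper below reads the tab-separated output of `zfs get -H -o property,value` and reports any property that differs from the values used above; the expected values are hard-coded as an assumption matching this guide.
+
+```shell
+# check_zfs_props: reads "property<TAB>value" lines on stdin and compares them
+# with the settings applied above. Returns non-zero on any mismatch.
+check_zfs_props() {
+  awk -F '\t' '
+    BEGIN {
+      want["compression"] = "lz4"; want["atime"] = "off"; want["xattr"] = "sa"
+      want["primarycache"] = "all"; want["dedup"] = "on"
+    }
+    ($1 in want) {
+      seen[$1] = 1
+      if ($2 != want[$1]) { printf "%s is %s, expected %s\n", $1, $2, want[$1]; bad = 1 }
+    }
+    END {
+      for (p in want) if (!(p in seen)) { printf "%s not reported\n", p; bad = 1 }
+      exit bad
+    }'
+}
+
+# Usage (on the server):
+# zfs get -H -o property,value compression,atime,xattr,primarycache,dedup datavolume | check_zfs_props
+```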
+
+## Verification
+
+Now that a 16TB pool has been created (shown as 14.5T by `zpool list`, which reports binary units), copy or sync your `/blocks` directories onto the `datavolume` mountpoint, using a separate folder for each node instance such as `datavolume/node1blocks` and `datavolume/node2blocks`, and make sure each node's `blocks-dir` in its nodeos `config.ini` points at its own folder. ZFS dedup will recognise the duplicated data across the two directories in the datavolume pool.
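+
+As a sketch of the per-node wiring, the hypothetical `set_blocks_dir` helper below sets the nodeos `blocks-dir` option in a node's `config.ini`, replacing an existing entry or appending one. The node directory paths in the usage comments (`/home/eosphere/node1`) are assumptions; stop each node before copying its blocks and changing the setting.
+
+```shell
+# set_blocks_dir CONFIG_INI DIR
+# Hypothetical helper: set (or add) the nodeos "blocks-dir" option in CONFIG_INI.
+set_blocks_dir() {
+  local cfg=$1 dir=$2
+  if grep -Eq '^[[:space:]]*blocks-dir[[:space:]]*=' "$cfg"; then
+    # "|" is the sed delimiter here, so DIR must not contain "|"
+    sed -i -E "s|^[[:space:]]*blocks-dir[[:space:]]*=.*|blocks-dir = $dir|" "$cfg"
+  else
+    printf 'blocks-dir = %s\n' "$dir" >> "$cfg"
+  fi
+}
+
+# Usage sketch with node1 stopped (paths are assumptions):
+# rsync -a /home/eosphere/node1/blocks/ /home/eosphere/datavolume/node1blocks/
+# set_blocks_dir /home/eosphere/node1/config.ini /home/eosphere/datavolume/node1blocks
+```
+
+Repeating the same two steps for node2 with `node2blocks` writes the second copy, which ZFS deduplicates against the first as it is written.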
+
+**Check LZ4 Compression:**
+
+```
+$ zfs get ratio  
+  
+NAME                             PROPERTY       VALUE  SOURCE  
+datavolume                       compressratio  1.25x  -
+```
+
+ZFS LZ4 compression works as expected with a healthy 1.25x on a nodeos  `blocks.log`.
+
+**Check Deduplication Performance:**
+
+```
+$ zpool list  
+  
+NAME        SIZE   ALLOC  FREE   CKPOINT   EXPANDSZ    FRAG    CAP  DEDUP    HEALTH  ALTROOT  
+datavolume  14.5T  2.57T  11.9T        -         -     5%      17%  2.00x    ONLINE  -
+```
+
+Deduplication works as advertised: the LZ4-compressed blocks data of both node instances occupies only 2.57TB, a single copy, giving **DEDUP 2.00x**.
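+
+For ongoing monitoring, the same figures can be read in scripted form with `zpool get -H -o value dedupratio datavolume` and `zfs get -H -o value compressratio datavolume`. The hypothetical `ratio_at_least` helper below strips the trailing `x` and compares a ratio with a minimum, which could flag the two block logs drifting apart; the 1.90 threshold in the usage comment is an arbitrary assumption.
+
+```shell
+# ratio_at_least RATIO MIN: succeed if a ZFS ratio such as "2.00x" is >= MIN.
+ratio_at_least() {
+  awk -v r="${1%x}" -v m="$2" 'BEGIN { exit !(r + 0 >= m + 0) }'
+}
+
+# Usage (on the server):
+# ratio_at_least "$(zpool get -H -o value dedupratio datavolume)" 1.90 \
+#   || echo "dedup ratio has dropped below 1.90x"
+```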
+
+**Check Deduplication Memory Utilisation:**
+
+```
+$ zpool status -D datavolume
+
+  pool: datavolume
+ state: ONLINE
+  scan: scrub repaired 0B in 00:02:40 with 0 errors on Sun May 14 00:26:43 2023
+config:
+
+        NAME        STATE     READ WRITE CKSUM
+        datavolume  ONLINE       0     0     0
+          sde       ONLINE       0     0     0
+          sdf       ONLINE       0     0     0
+          sdg       ONLINE       0     0     0
+          sdh       ONLINE       0     0     0
+
+errors: No known data errors
+
+ dedup: DDT entries 25353378, size 1.06K on disk, 350B in core
+  
+bucket              allocated                       referenced            
+______   ______________________________   ______________________________  
+refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE  
+------   ------   -----   -----   -----   ------   -----   -----   -----  
+     1    2.47K    317M   24.7M   24.7M    2.47K    317M   24.7M   24.7M  
+     2    24.2M   3.02T   2.45T   2.45T    48.3M   6.04T   4.91T   4.91T  
+     4    6.75K    864M    544M    544M    27.0K   3.38G   2.13G   2.13G  
+     8        1    128K      4K      4K       10   1.25M     40K     40K  
+    16        5    640K     20K     20K      136     17M    544K    544K  
+    32       24      3M    924K    924K    1.01K    129M   38.8M   38.8M  
+    2K        2    256K      8K      8K    5.82K    745M   23.3M   23.3M  
+ Total    24.2M   3.02T   2.45T   2.45T    48.4M   6.05T   4.91T   4.91T
+```
+
+Memory utilisation of the dedup table (DDT) can be estimated with the equation below:
+
+DDT entries x in-core size per entry (bytes) / 1024²
+
+25353378 x 350 / 1024² ≈ **8462MB (about 8.3GB) RAM used**
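+
+The same calculation can be scripted. The hypothetical `ddt_ram_mib` helper below reads `zpool status -D` output on stdin, picks the DDT entry count and per-entry in-core size from the `dedup:` line, and prints the estimate in MiB (the 1024² divisor). It assumes the in-core size is printed with a `B` suffix, as in the output above, or a `K` suffix for larger entries.
+
+```shell
+# ddt_ram_mib: estimate DDT memory (MiB) from "zpool status -D" output on stdin,
+# i.e. DDT entries x in-core bytes per entry / 1024^2 (truncated to whole MiB).
+ddt_ram_mib() {
+  awk '/DDT entries/ {
+    gsub(",", "")
+    for (i = 1; i < NF; i++) {
+      if ($i == "entries") entries = $(i + 1)
+      if ($(i + 1) == "in" && $(i + 2) == "core") {
+        core = $i + 0                  # "350B" -> 350
+        if ($i ~ /K$/) core *= 1024    # "1.2K" -> 1228.8 (assumed format)
+      }
+    }
+  }
+  END {
+    if (entries == "") exit 1
+    printf "%d\n", entries * core / (1024 * 1024)
+  }'
+}
+
+# Usage (on the server):
+# zpool status -D datavolume | ddt_ram_mib
+```
+
+Fed the `dedup:` line shown above, it prints 8462, matching the manual calculation.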
+
+**Check the disk IO of the ZFS pool:**
+
+```
+$ zpool iostat  
+              capacity     operations     bandwidth   
+pool        alloc   free   read  write   read  write  
+----------  -----  -----  -----  -----  -----  -----  
+datavolume  2.57T  11.9T      9     27  1.39M  2.39M  
+----------  -----  -----  -----  -----  -----  -----
+```
+
+The output above was taken with both nodes running and in sync with the network.
+
+While researching and testing for this guide we found quite a bit of misinformation regarding ZFS deduplication; dedup is often dismissed as being CPU, RAM and disk IO intensive.
+
+In our experience it works very well to eliminate unnecessary disk usage in the `/blocks` directory, especially in virtual machine environments on larger servers. The overhead appears to be quite manageable: the largest is RAM, at around 1GB per TB of disk, while CPU and disk IO were unaffected.
+
+It is also possible to use [ZFS cloning](https://docs.oracle.com/cd/E19253-01/819-5461/gbcxz/index.html) to essentially clone data without duplicating it on disk. However, ZFS cloning is a manual process unless scripted, and a re-clone must be run every so often to reclaim duplicated data as the clones diverge.
+
+---
+
+These **WAX Developer Technical Guides** are created using source material from the [EOSphere WAX Technical How To Series](https://medium.com/eosphere/wax-technical-how-to/home)
+
+Be sure to ask any questions in the  [EOSphere Telegram](https://t.me/eosphere_io)