diff --git a/docs/finn/developers.rst b/docs/finn/developers.rst
index 3b182b8db8..2a5e26959b 100644
--- a/docs/finn/developers.rst
+++ b/docs/finn/developers.rst
@@ -2,8 +2,6 @@
Developer documentation
***********************
-.. note:: **This page is under construction.**
-
This page is intended to serve as a starting point for new FINN developers.
Power users may also find this information useful.
diff --git a/docs/finn/getting_started.rst b/docs/finn/getting_started.rst
index eae61b1a55..217f982702 100644
--- a/docs/finn/getting_started.rst
+++ b/docs/finn/getting_started.rst
@@ -125,7 +125,7 @@ General FINN Docker tips
Supported FPGA Hardware
=======================
-**Vivado IPI support for any Xilinx FPGA:** FINN generates a Vivado IP Integrator (IPI) design from the neural network with AXI stream (FIFO) in-o>
+**Vivado IPI support for any Xilinx FPGA:** FINN generates a Vivado IP Integrator (IPI) design from the neural network with AXI stream (FIFO) in-out interfaces, which can be integrated onto any Xilinx-AMD FPGA as part of a larger system. It’s up to you to take the FINN-generated accelerator (what we call “stitched IP” in the tutorials), wire it up to your FPGA design and send/receive neural network data to/from the accelerator.
**Shell-integrated accelerator + driver:** For quick deployment, we target boards supported by `PYNQ `_ . For these platforms, we can build a full bitfile including DMAs to move data into and out of the FINN-generated accelerator, as well as a Python driver to launch the accelerator. We support the Pynq-Z1, Pynq-Z2, Kria SOM, Ultra96, ZCU102 and ZCU104 boards, as well as Alveo cards.
diff --git a/docs/finn/hw_build.rst b/docs/finn/hw_build.rst
index 9e34edc9d1..39c39eb7df 100644
--- a/docs/finn/hw_build.rst
+++ b/docs/finn/hw_build.rst
@@ -87,8 +87,4 @@ transformation for Zynq, and the `VitisLink` transformation for Alveo.
Deployment
==========
-
-Deployment
------------
-
The bitfile and the driver file(s) can be copied to the PYNQ board and be executed there. For more information see the description in the `end2end_example `_ Jupyter notebooks.
diff --git a/docs/finn/img/mem_mode.png b/docs/finn/img/mem_mode.png
index 27783c5f32..451561c54b 100755
Binary files a/docs/finn/img/mem_mode.png and b/docs/finn/img/mem_mode.png differ
diff --git a/docs/finn/internals.rst b/docs/finn/internals.rst
index 825fafb0b6..0fd6c42350 100644
--- a/docs/finn/internals.rst
+++ b/docs/finn/internals.rst
@@ -181,7 +181,7 @@ Disadvantages:
Internal_decoupled mode
------------------------
-In *internal_decoupled* mode a different variant of the MVAU with three ports is used. Besides the input and output streams, which are fed into the circuit via Verilog FIFOs, there is another input, which is used to stream the weights. For this the `streaming MVAU `_ from the finn-hls library is used. To make the streaming possible a Verilog weight streamer component accesses the weight memory and sends the values via another FIFO to the MVAU. This component can be found in the `finn-rtllib `_ under the name *memstream.v*. For the IP block generation this component, the IP block resulting from the synthesis of the HLS code of the streaming MVAU and a FIFO for the weight stream are combined in a verilog wrapper. The weight values are saved in .dat files and stored in the weight memory from which the weight streamer reads. The resulting verilog component, which is named after the name of the node and has the suffix "_memstream.v", exposes only two ports to the outside, the data input and output. It therefore behaves externally in the same way as the MVAU in *internal_embedded* mode.
+In *internal_decoupled* mode, a different variant of the MVAU with three ports is used. Besides the input and output streams, which are fed into the circuit via Verilog FIFOs, there is a third input, which is used to stream the weights. For this, the `streaming MVAU `_ from the finn-hls library is used. To make the streaming possible, a Verilog weight streamer component accesses the weight memory and sends the values via another FIFO to the MVAU. This component can be found in the `finn-rtllib `_ under the name *memstream.v*. For the IP block generation, this component, the IP block resulting from the synthesis of the HLS code of the streaming MVAU, and a FIFO for the weight stream are combined. The weight values are saved in .dat files and stored in the weight memory from which the weight streamer reads. The resulting Verilog component, which is named after the node and carries the suffix "_memstream.v", exposes only two ports to the outside: the data input and output. It therefore behaves externally in the same way as the MVAU in *internal_embedded* mode.
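As a toy illustration of the replay behavior described above (purely illustrative Python; the real component is *memstream.v* in finn-rtllib, not this sketch): the streamer reads the .dat-initialized memory in order and replays it once per input frame, so the MVAU receives a continuous weight stream while the wrapper exposes only data in/out ports externally.

```python
# Toy model of the internal_decoupled weight replay (illustrative only; the
# actual hardware component is memstream.v, driven by the .dat weight files).
def weight_streamer(weight_mem, n_frames):
    """Yield the whole weight memory once per input frame, in order."""
    for _ in range(n_frames):
        yield from weight_mem

mem = [0x11, 0x22, 0x33]  # stand-in for the .dat-initialized weight memory
print(list(weight_streamer(mem, n_frames=2)))
# [17, 34, 51, 17, 34, 51]
```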
Advantages:
diff --git a/docs/requirements.txt b/docs/requirements.txt
index 85bc1d0dcd..3a3730d2b9 100644
--- a/docs/requirements.txt
+++ b/docs/requirements.txt
@@ -1,6 +1,6 @@
brevitas@git+https://github.com/Xilinx/brevitas@master#egg=brevitas_examples
dataclasses-json==0.5.7
-docutils==0.17.1
+docutils==0.19
gspread==3.6.0
importlib_resources
IPython
@@ -9,7 +9,7 @@ netron
pytest
pyverilator@git+https://github.com/maltanar/pyverilator@master#egg=pyverilator
qonnx@git+https://github.com/fastmachinelearning/qonnx@main#egg=qonnx
-sphinx_rtd_theme==0.5.0
+sphinx_rtd_theme==2.0.0
torch
torchvision
tqdm
diff --git a/notebooks/advanced/4_advanced_builder_settings.ipynb b/notebooks/advanced/4_advanced_builder_settings.ipynb
index dccac6195d..5139377342 100644
--- a/notebooks/advanced/4_advanced_builder_settings.ipynb
+++ b/notebooks/advanced/4_advanced_builder_settings.ipynb
@@ -46,7 +46,7 @@
"id": "5dbed63f",
"metadata": {},
"source": [
- "## Introduction to the CNV-w2a2 network \n",
+ "## Introduction to the CNV-w2a2 network \n",
"\n",
"The particular quantized neural network (QNN) we will be targeting in this notebook is referred to as CNV-w2a2 and it classifies 32x32 RGB images into one of ten CIFAR-10 classes. All weights and activations in this network are quantized to two bit, with the exception of the input (which is RGB with 8 bits per channel) and the final output (which is 32-bit numbers). It is similar to the convolutional neural network used in the [cnv_end2end_example](../end2end_example/bnn-pynq/cnv_end2end_example.ipynb) Jupyter notebook.\n",
"\n",
@@ -116,7 +116,7 @@
"id": "c764ed76",
"metadata": {},
"source": [
- "## Quick recap, how to setup up default builder flow for resource estimations "
+ "## Quick recap, how to setup up default builder flow for resource estimations "
]
},
{
@@ -305,7 +305,7 @@
"id": "7e561a91",
"metadata": {},
"source": [
- "## Build steps "
+ "## Build steps "
]
},
{
@@ -369,7 +369,7 @@
"id": "e9c2c97f",
"metadata": {},
"source": [
- "### How to create a custom build step "
+ "### How to create a custom build step "
]
},
{
@@ -643,7 +643,7 @@
"id": "a6edf5c4-9213-45cd-834f-615c12685d9e",
"metadata": {},
"source": [
- "## Specialize layers configuration json "
+ "## Specialize layers configuration json "
]
},
{
@@ -675,7 +675,7 @@
"id": "bc90b589-7a92-4996-9704-02736ac4e60e",
"metadata": {},
"source": [
- "The builder flow step before `step_specialize_layers` generates a template json file to set the preferred implementation style per layer. We can copy it from one of the previous runs to this folder and manipulate it to pass it to a new build."
+ "The builder flow step before `step_create_dataflow_partition` generates a template json file to set the preferred implementation style per layer. We can copy it from one of the previous runs to this folder and manipulate it to pass it to a new build."
]
},
{
@@ -934,7 +934,7 @@
"id": "5ffbadd1",
"metadata": {},
"source": [
- "## Folding configuration json "
+ "## Folding configuration json "
]
},
{
@@ -1270,7 +1270,7 @@
"id": "4a675834",
"metadata": {},
"source": [
- "## Additional builder arguments "
+ "## Additional builder arguments "
]
},
{
@@ -1294,7 +1294,7 @@
"id": "e0c167f4",
"metadata": {},
"source": [
- "### Verification steps "
+ "### Verification steps "
]
},
{
@@ -1505,7 +1505,7 @@
"id": "4609f94d",
"metadata": {},
"source": [
- "### Other builder arguments "
+ "### Other builder arguments "
]
},
{
@@ -1610,7 +1610,7 @@
"id": "3b98eb65",
"metadata": {},
"source": [
- "### Example for additional builder arguments & bitfile generation "
+ "### Example for additional builder arguments & bitfile generation "
]
},
{
diff --git a/src/finn/qnn-data/build_dataflow/build.py b/src/finn/qnn-data/build_dataflow/build.py
index 13d58d2c91..58d566a6e6 100644
--- a/src/finn/qnn-data/build_dataflow/build.py
+++ b/src/finn/qnn-data/build_dataflow/build.py
@@ -1,4 +1,5 @@
-# Copyright (c) 2020 Xilinx, Inc.
+# Copyright (C) 2020-2022 Xilinx, Inc.
+# Copyright (C) 2022-2024, Advanced Micro Devices, Inc.
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
diff --git a/src/finn/qnn-data/build_dataflow/dataflow_build_config.json b/src/finn/qnn-data/build_dataflow/dataflow_build_config.json
index a053c1a22f..8165055fd5 100644
--- a/src/finn/qnn-data/build_dataflow/dataflow_build_config.json
+++ b/src/finn/qnn-data/build_dataflow/dataflow_build_config.json
@@ -4,7 +4,8 @@
"mvau_wwidth_max": 10000,
"synth_clk_period_ns": 10.0,
"board": "Pynq-Z1",
- "standalone_thresholds": true,
+ "standalone_thresholds": false,
+ "folding_config_file": "folding_config.json",
"shell_flow_type": "vivado_zynq",
"verify_save_rtlsim_waveforms": true,
"force_python_rtlsim": true,
diff --git a/src/finn/qnn-data/build_dataflow/folding_config.json b/src/finn/qnn-data/build_dataflow/folding_config.json
index 46f1d6236d..124876c3db 100644
--- a/src/finn/qnn-data/build_dataflow/folding_config.json
+++ b/src/finn/qnn-data/build_dataflow/folding_config.json
@@ -1,8 +1,7 @@
{
"Defaults": {},
- "Thresholding_hls_0": {
- "PE": 49,
- "ram_style": "distributed"
+ "Thresholding_rtl_0": {
+ "PE": 49
},
"MVAU_hls_0": {
"PE": 16,
diff --git a/src/finn/qnn-data/build_dataflow/specialize_layers_config.json b/src/finn/qnn-data/build_dataflow/specialize_layers_config.json
index c2a8bd4553..9224a72907 100644
--- a/src/finn/qnn-data/build_dataflow/specialize_layers_config.json
+++ b/src/finn/qnn-data/build_dataflow/specialize_layers_config.json
@@ -1,26 +1,17 @@
{
"Defaults": {},
"Thresholding_0": {
- "preferred_impl_style": "hls"
+ "preferred_impl_style": "rtl"
},
"MVAU_0": {
"preferred_impl_style": "hls"
},
- "Thresholding_1": {
- "preferred_impl_style": ""
- },
"MVAU_1": {
"preferred_impl_style": ""
},
- "Thresholding_2": {
- "preferred_impl_style": ""
- },
"MVAU_2": {
"preferred_impl_style": ""
},
- "Thresholding_3": {
- "preferred_impl_style": "rtl"
- },
"MVAU_3": {
"preferred_impl_style": ""
},
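The specialize-layers config above maps node names to a preferred implementation style, where an empty string means FINN chooses automatically. A minimal sketch of how such a file could be interpreted (the helper name and the simplified Defaults handling are illustrative assumptions, not FINN's actual implementation):

```python
import json

# Illustrative only: resolve a per-node implementation style from a
# specialize-layers config like the one in the diff above.
CONFIG = json.loads("""
{
  "Defaults": {},
  "Thresholding_0": {"preferred_impl_style": "rtl"},
  "MVAU_0": {"preferred_impl_style": "hls"},
  "MVAU_1": {"preferred_impl_style": ""}
}
""")

def impl_style_for(node_name, config):
    """Return 'hls' or 'rtl' for a node, or None when FINN should decide."""
    style = config.get(node_name, {}).get("preferred_impl_style", "")
    return style or None  # empty string / missing entry -> automatic choice

print(impl_style_for("Thresholding_0", CONFIG))  # rtl
print(impl_style_for("MVAU_1", CONFIG))          # None
```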
diff --git a/src/finn/qnn-data/test_ext_weights/tfc-w1a1-extw.json b/src/finn/qnn-data/test_ext_weights/tfc-w1a1-extw.json
index 498d329ba3..9fe22443dc 100644
--- a/src/finn/qnn-data/test_ext_weights/tfc-w1a1-extw.json
+++ b/src/finn/qnn-data/test_ext_weights/tfc-w1a1-extw.json
@@ -1,8 +1,7 @@
{
"Defaults": {},
- "Thresholding_hls_0": {
- "PE": 49,
- "ram_style": "distributed"
+ "Thresholding_rtl_0": {
+ "PE": 49
},
"MVAU_hls_0": {
"PE": 16,
diff --git a/tests/end2end/test_end2end_bnn_pynq.py b/tests/end2end/test_end2end_bnn_pynq.py
index 94134967fa..556ba1d187 100644
--- a/tests/end2end/test_end2end_bnn_pynq.py
+++ b/tests/end2end/test_end2end_bnn_pynq.py
@@ -135,7 +135,8 @@ def fold_tfc(model):
inp_qnt_node = model.get_nodes_by_op_type("Thresholding_rtl")[0]
inp_qnt = getCustomOp(inp_qnt_node)
inp_qnt.set_nodeattr("PE", 49)
- inp_qnt.set_nodeattr("runtime_writeable_weights", 1)
+ # TODO: update PYNQ driver to support runtime writeable weights for RTL Thresholding
+ # inp_qnt.set_nodeattr("runtime_writeable_weights", 1)
return model
diff --git a/tests/end2end/test_ext_weights.py b/tests/end2end/test_ext_weights.py
index 2f5f136d3a..bac343bedf 100644
--- a/tests/end2end/test_ext_weights.py
+++ b/tests/end2end/test_ext_weights.py
@@ -1,4 +1,5 @@
-# Copyright (c) 2021, Xilinx
+# Copyright (C) 2021-2022, Xilinx, Inc.
+# Copyright (C) 2022-2024, Advanced Micro Devices, Inc.
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without