From 3e2d21cd6126ce1915448b7f10136892121b2a05 Mon Sep 17 00:00:00 2001
From: Fardin <60337534+FardinHash@users.noreply.github.com>
Date: Thu, 29 Sep 2022 02:22:17 +0600
Subject: [PATCH] Update README.md

---
 README.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index da8f792a0..0b633fdee 100644
--- a/README.md
+++ b/README.md
@@ -286,14 +286,16 @@ We then train a large model (12-layer to 24-layer Transformer) on a large
 corpus (Wikipedia + [BookCorpus](http://yknzhu.wixsite.com/mbweb)) for a long
 time (1M update steps), and that's BERT.
 
-Using BERT has two stages: *Pre-training* and *fine-tuning*.
+Using BERT has two stages: **Pre-training** and **fine-tuning**.
 
+## Pre-training
 **Pre-training** is fairly expensive (four days on 4 to 16 Cloud TPUs), but is a
 one-time procedure for each language (current models are English-only, but
 multilingual models will be released in the near future). We are releasing a
 number of pre-trained models from the paper which were pre-trained at Google.
 Most NLP researchers will never need to pre-train their own model from scratch.
 
+## Fine-tuning
 **Fine-tuning** is inexpensive. All of the results in the paper can be
 replicated in at most 1 hour on a single Cloud TPU, or a few hours on a GPU,
 starting from the exact same pre-trained model. SQuAD, for example, can be