From dcde91850f4a471f0e67be160612e4121641889c Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Gy=C3=B6rgy=20M=C3=A1rk=20Kis?=
Date: Mon, 13 Dec 2021 11:40:10 +0100
Subject: [PATCH] Update README.md

---
 README.md | 46 +++++++++++++++++++++++-----------------------
 1 file changed, 23 insertions(+), 23 deletions(-)

diff --git a/README.md b/README.md
index 9bc1390..779f970 100644
--- a/README.md
+++ b/README.md
@@ -6,53 +6,53 @@ This code and approach was written and tested on a Hungarian media sentiment cor
 Instead of fine-tuning a BERT model, we extract contextual embeddings from the hidden layers and use those as classical inputs for ML approaches.
 
 ## Results
-The approach was benchmarked against embeddings from a non fine-tuned XLM-Roberta, Hilbert, fine-tuned XLM-Roberta and fine-tuned Hubert on the same corpus, and reached the following topline results (Roberta result in brackets): 8-way sentiment classification weighted F1: 0.65 [0.73], with a range of category-level F1s of 0.35-0.72 [0.51-0.79]; 3-way classification weighted F1: 0.77 [0.82], 0.58-0.82 [0.51-0.87]. The code was run in a Google Colab GPU-supported free notebook.
+The approach was benchmarked against embeddings from a non-fine-tuned XLM-Roberta, Hilbert, a fine-tuned XLM-Roberta, and a fine-tuned Hubert on the same corpus, and reached the following topline results (Roberta results in brackets): 8-way sentiment classification weighted F1 of 0.62 [0.73], with category-level F1s ranging from 0.25 to 0.71 [0.51-0.79]. The code was run in a free GPU-supported Google Colab notebook.
 
-![image](https://user-images.githubusercontent.com/23291101/145717215-ad3a83c1-6db1-44ff-aca5-aa6eaa3c9ffb.png)
+![image](https://user-images.githubusercontent.com/23291101/145797730-2cd0a4bf-f730-4000-9bb0-c4d053a9438b.png)
 
 ### Topline results
 |                  | Hubert        | Roberta Fine Tuned        | Hubert Fine Tuned          |
 |:----------------:|:-------------:|:-------------------------:|:--------------------------:|
-| Global F1 | 0.61 | 0.73 | 0.71 |
+| Global F1 | 0.62 | 0.73 | 0.71 |
 
 ### Weighted F1-scores
 |                   | Hubert        | Roberta Fine Tuned        | Hubert Fine Tuned          |
 |-------------------|:-------------:|:-------------------------:|:--------------------------:|
-| Anger | 0.58 | 0.74 | 0.69 |
-| Disgust | 0.60 | 0.75 | 0.72 |
+| Anger | 0.61 | 0.74 | 0.69 |
+| Disgust | 0.61 | 0.75 | 0.72 |
 | Fear | 0.25 | 0.71 | 0.50 |
-| Happiness | 0.36 | 0.67 | 0.45 |
-| Neutral | 0.50 | 0.51 | 0.59 |
+| Happiness | 0.32 | 0.67 | 0.45 |
+| Neutral | 0.51 | 0.51 | 0.59 |
 | Sad | 0.61 | 0.73 | 0.74 |
-| Successful | 0.69 | 0.79 | 0.77 |
+| Successful | 0.71 | 0.79 | 0.77 |
 | Trustful | 0.66 | 0.74 | 0.74 |
 
 ### Precision
 |                   | Hubert        | Roberta Fine Tuned        | Hubert Fine Tuned          |
 |-------------------|:-------------:|:-------------------------:|:--------------------------:|
-| Anger | 0.54 | 0.76 | 0.73 |
-| Disgust | 0.62 | 0.72 | 0.77 |
+| Anger | 0.56 | 0.76 | 0.73 |
+| Disgust | 0.64 | 0.72 | 0.77 |
 | Fear | 0.21 | 0.72 | 0.46 |
-| Happiness | 0.28 | 0.75 | 0.67 |
-| Neutral | 0.49 | 0.50 | 0.57 |
-| Sad | 0.63 | 0.73 | 0.71 |
-| Successful | 0.73 | 0.80 | 0.73 |
-| Trustful | 0.63 | 0.78 | 0.80 |
+| Happiness | 0.23 | 0.75 | 0.67 |
+| Neutral | 0.52 | 0.50 | 0.57 |
+| Sad | 0.62 | 0.73 | 0.71 |
+| Successful | 0.74 | 0.80 | 0.73 |
+| Trustful | 0.61 | 0.78 | 0.80 |
 
 ### Recall
 |                   | Hubert        | Roberta Fine Tuned        | Hubert Fine Tuned          |
 |-------------------|:-------------:|:-------------------------:|:--------------------------:|
-| Anger | 0.64 | 0.71 | 0.66 |
-| Disgust | 0.58 | 0.79 | 0.67 |
-| Fear | 0.33 | 0.70 | 0.54 |
-| Happiness | 0.51 | 0.60 | 0.34 |
-| Neutral | 0.52 | 0.53 | 0.60 |
+| Anger | 0.67 | 0.71 | 0.66 |
+| Disgust | 0.59 | 0.79 | 0.67 |
+| Fear | 0.31 | 0.70 | 0.54 |
+| Happiness | 0.52 | 0.60 | 0.34 |
+| Neutral | 0.50 | 0.53 | 0.60 |
 | Sad | 0.59 | 0.72 | 0.78 |
-| Successful | 0.66 | 0.77 | 0.81 |
-| Trustful | 0.69 | 0.71 | 0.68 |
+| Successful | 0.68 | 0.77 | 0.81 |
+| Trustful | 0.70 | 0.71 | 0.68 |
 
 ## Usage
 The input must be provided as a TSV file containing two required columns: "text" for the text itself and "topik" for the numeric category labels. The code outputs one JSON file with the results compiled in a dictionary and another with the optimized parameters.
@@ -65,7 +65,7 @@ pandas, torch, transformers, numpy, json, sklearn, google.drive (optional)
 
 ## Parameters best practice
 1. Mean pooling is thought to be the most effective option for extracting contextual embeddings from hidden layers, but this is not a definitive conclusion.
 2. Even though there are at least 768 input variables for the LR model, sklearn's default L2 regularization seems to take care of this properly. Several dimension-reduction techniques were previously applied and experimented with, but none of them helped the classification.
-3. When using grid search for the LR-model, so far a high number of iterations (such as 8000), liblinear solver with L2-regularization, and a relatively narrow band of possible tolerance and C-values (at most 10x change between lower and upper limits) were found to be the most effective.
+3. When using grid search for the LR model, a high number of iterations (such as 6,000-8,000), the liblinear solver with L2 regularization, and a relatively narrow band of possible tolerance and C values (at most a 10x change between the lower and upper limits) have so far been found to be the most effective.
 4. Even though k = 3 is the default for the cross-validation in the script, it can be increased to 5. Going further than that is likely to increase computing requirements tremendously while not providing notable improvements. The CV loop runs 3 times by default; this can be changed, but as the values do not seem to vary much, anything above 9 runs seems unnecessary.
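The Usage section in the patched README only describes the I/O contract in prose; a minimal sketch of what it implies follows. The file names and the shape of the two output dictionaries are illustrative assumptions, not taken from the repository.

```python
# Sketch of the I/O contract from the Usage section: a TSV with "text" and
# "topik" columns in, two JSON files out. File names and the contents of the
# dictionaries are assumptions for illustration only.
import json
import pandas as pd

df = pd.read_csv("corpus.tsv", sep="\t")   # must contain "text" and "topik"
texts = df["text"].tolist()
labels = df["topik"].to_numpy()            # numeric category labels

results = {"weighted_f1": None, "per_category_f1": {}}  # placeholder structure
best_params = {"C": None, "tol": None}                  # e.g. from a grid search

with open("results.json", "w") as f:
    json.dump(results, f)
with open("best_params.json", "w") as f:
    json.dump(best_params, f)
```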
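Best practice 1 (mean pooling over a hidden layer) can be made concrete with a short sketch. This is not the repository's code: the checkpoint name (a public huBERT model on the Hugging Face Hub) and the choice of hidden layer are assumptions.

```python
# A minimal sketch of mean-pooled contextual embedding extraction.
# MODEL_NAME and the layer index are illustrative assumptions.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "SZTAKI-HLT/hubert-base-cc"  # assumed huBERT checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME, output_hidden_states=True)
model.eval()

def embed(texts, layer=-2):
    """Return one mean-pooled 768-dim vector per text from a chosen hidden layer."""
    enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc)
    hidden = out.hidden_states[layer]           # (batch, seq_len, 768)
    mask = enc["attention_mask"].unsqueeze(-1)  # exclude padding from the mean
    summed = (hidden * mask).sum(dim=1)
    counts = mask.sum(dim=1).clamp(min=1)
    return (summed / counts).numpy()            # (batch, 768)
```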
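Best practices 2 and 3 translate roughly into the sklearn configuration below. The concrete C and tolerance grids are illustrative assumptions; the README only prescribes the liblinear solver with L2 regularization, a high iteration count, and a narrow band (at most a 10x spread) for both parameters.

```python
# A sketch of the grid-searched logistic regression over the 768-dim embeddings.
# Grid values are assumptions chosen to respect the "at most 10x spread" advice.
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

param_grid = {
    "C": [0.5, 1.0, 2.0, 5.0],   # 10x spread between lower and upper limits
    "tol": [1e-5, 5e-5, 1e-4],   # likewise a narrow band
}
base = LogisticRegression(
    solver="liblinear",  # recommended solver
    penalty="l2",        # sklearn's default regularization, per best practice 2
    max_iter=8000,       # high iteration count, per best practice 3
)
search = GridSearchCV(base, param_grid, scoring="f1_weighted", cv=3)  # k = 3 default
# X: (n_samples, 768) embedding matrix, y: numeric "topik" labels
# search.fit(X, y)
# print(search.best_params_, search.best_score_)
```

Note that liblinear handles the 8-way problem as one-vs-rest binary classifiers, which fits the per-category F1 reporting used in the tables above.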
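Best practice 4 suggests repeating the k-fold CV loop a few times (3 by default, at most 9). One way to read that, assuming reshuffled stratified folds and averaging of the best scores; the helper name is hypothetical:

```python
# A sketch of the repeated CV loop from best practice 4. The function name
# and the reshuffling scheme are assumptions for illustration only.
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold

def repeated_cv_score(base, param_grid, X, y, n_runs=3, k=3):
    """Average the best cross-validated score over n_runs reshuffled k-fold splits."""
    scores = []
    for seed in range(n_runs):  # values vary little, so >9 runs seems unnecessary
        cv = StratifiedKFold(n_splits=k, shuffle=True, random_state=seed)
        search = GridSearchCV(base, param_grid, scoring="f1_weighted", cv=cv)
        search.fit(X, y)
        scores.append(search.best_score_)
    return float(np.mean(scores))
```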