update readme, file names, removing TF code, moving tests
thomwolf committed Nov 3, 2018
1 parent 3c24e4b commit f827600
Showing 25 changed files with 385 additions and 4,594 deletions.
26 changes: 13 additions & 13 deletions Comparing TF and PT models.ipynb
@@ -6,15 +6,15 @@
"source": [
"# Comparing TensorFlow (original) and PyTorch models\n",
"\n",
"We use this small notebook to test the conversion of the model's weights and to make sure both the TensorFlow and PyTorch are coherent. In particular, we compare the weights of the last layer on a simple example (in `input.txt`).\n",
"You can use this small notebook to check the conversion of the model's weights from the TensorFlow model to the PyTorch model. In the following, we compare the weights of the last layer on a simple example (in `input.txt`) but both models returns all the hidden layers so you can check every stage of the model.\n",
"\n",
"To run this notebook, please make sure that your Python environment has both TensorFlow and PyTorch.\n",
"You should follow the instructions in the `README.md` and make sure that you have:\n",
"- the original TensorFlow implementation\n",
"- the `BERT-base, Uncased` model\n",
"- run the script `convert_tf_checkpoint_to_pytorch.py` to convert the weights to PyTorch\n",
"To run this notebook, follow these instructions:\n",
"- make sure that your Python environment has both TensorFlow and PyTorch installed,\n",
"- download the original TensorFlow implementation,\n",
"- download a pre-trained TensorFlow model as indicaded in the TensorFlow implementation readme,\n",
"- run the script `convert_tf_checkpoint_to_pytorch.py` as indicated in the `README` to convert the pre-trained TensorFlow model to PyTorch.\n",
"\n",
"Please modify the relative paths accordingly (at the beggining of Sections 1 and 2)."
"If needed change the relative paths indicated in this notebook (at the beggining of Sections 1 and 2) to point to the relevent models and code."
]
},
{
@@ -37,7 +37,7 @@
"bert_config_file = model_dir + \"bert_config.json\"\n",
"init_checkpoint = model_dir + \"bert_model.ckpt\"\n",
"\n",
"input_file = \"input.txt\"\n",
"input_file = \"./samples/input.txt\"\n",
"max_seq_length = 128"
]
},
@@ -296,8 +296,8 @@
},
"outputs": [],
"source": [
"import extract_features_pytorch\n",
"from extract_features_pytorch import *"
"import extract_features\n",
"from extract_features import *"
]
},
{
@@ -625,7 +625,7 @@
],
"source": [
"device = torch.device(\"cpu\")\n",
"model = extract_features_pytorch.BertModel(bert_config)\n",
"model = extract_features.BertModel(bert_config)\n",
"model.load_state_dict(torch.load(init_checkpoint_pt, map_location='cpu'))\n",
"model.to(device)"
]
@@ -1196,7 +1196,7 @@
"metadata": {
"hide_input": false,
"kernelspec": {
"display_name": "Python 3",
"display_name": "Python [default]",
"language": "python",
"name": "python3"
},
@@ -1210,7 +1210,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.5"
"version": "3.6.7"
},
"toc": {
"colors": {
40 changes: 24 additions & 16 deletions README.md
@@ -1,29 +1,33 @@
# PyTorch implementation of Google AI's BERT


## Introduction

This is a PyTorch implementation of the [TensorFlow code](https://github.com/google-research/bert) released by Google AI with the paper [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805).

It is an op-for-op reimplementation that can load any pre-trained TensorFlow checkpoint in a PyTorch model (see below).

There are a few differences with the TensorFlow model:

- the PyTorch model has multi-GPU and distributed training capabilities (see below),
- there is no TPU support in the current stable version of PyTorch (0.4.1), and as a consequence the pre-training scripts are not included in this repo. TPU support is expected in PyTorch v1.0, which will be released in the coming weeks. We will update the repository with TPU-adapted pre-training scripts when PyTorch has TPU support. In the meantime, you can use the TensorFlow version to train a model on a TPU and import the checkpoint using the conversion script described below.

## Converting a TensorFlow checkpoint (in particular Google's pre-trained models) to PyTorch

## Converting the TensorFlow pre-trained models to PyTorch
You can convert any TensorFlow checkpoint, and in particular the pre-trained weights released by GoogleAI, by using `convert_tf_checkpoint_to_pytorch.py`.

You can convert the pre-trained weights released by GoogleAI by calling the script `convert_tf_checkpoint_to_pytorch.py`.
It takes a TensorFlow checkpoint (`bert_model.ckpt`) containing the pre-trained weights and converts it to a `.bin` file readable by PyTorch.
This script takes as input a TensorFlow checkpoint (`bert_model.ckpt`) and converts it into a PyTorch dump as a `.bin` file that can be imported using the usual `torch.load()` command.

TensorFlow pre-trained models can be found in the [original TensorFlow code](https://github.com/google-research/bert). We give an example with the `BERT-Base Uncased` model:
TensorFlow pre-trained models can be found in the [original TensorFlow code](https://github.com/google-research/bert). Here we give an example with the `BERT-Base Uncased` model:

```shell
export BERT_BASE_DIR=/path/to/bert/uncased_L-12_H-768_A-12
export BERT_PYTORCH_DIR=/path/to/pytorch/bert/uncased_L-12_H-768_A-12

python convert_tf_checkpoint_to_pytorch.py \
--tf_checkpoint_path=$BERT_BASE_DIR/bert_model.ckpt \
--bert_config_file=$BERT_BASE_DIR/bert_config.json \
--pytorch_dump_path=$BERT_PYTORCH_DIR/pytorch_model.bin
--pytorch_dump_path=$BERT_BASE_DIR/pytorch_model.bin
```
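
Once converted, the `.bin` dump can be loaded back with the usual `torch.load()`. Below is a minimal sketch, assuming `modeling.py` from this repository is on your path and that `BertConfig` exposes a `from_json_file` constructor as in the original TensorFlow code (the paths reuse the directories exported above):

```python
import torch
from modeling import BertConfig, BertModel

# Rebuild the model architecture from the original TensorFlow config,
# then load the weights converted by convert_tf_checkpoint_to_pytorch.py.
config = BertConfig.from_json_file("uncased_L-12_H-768_A-12/bert_config.json")
model = BertModel(config)
model.load_state_dict(torch.load("uncased_L-12_H-768_A-12/pytorch_model.bin",
                                 map_location="cpu"))
model.eval()  # switch to inference mode
```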


## Fine-tuning with BERT: running the examples

We showcase the same examples as in the original implementation: fine-tuning on the MRPC classification corpus and the SQuAD question answering dataset.
@@ -40,7 +44,7 @@ Corpus (MRPC) corpus and runs in less than 10 minutes on a single K-80.
```shell
export GLUE_DIR=/path/to/glue

python run_classifier_pytorch.py \
python run_classifier.py \
--task_name MRPC \
--do_train \
--do_eval \
@@ -53,21 +57,21 @@ python run_classifier_pytorch.py \
--train_batch_size 32 \
--learning_rate 2e-5 \
--num_train_epochs 3.0 \
--output_dir /tmp/mrpc_output_pytorch/
--output_dir /tmp/mrpc_output/
```

The next example fine-tunes `BERT-Base` on the SQuAD question answering task.

The data for SQuAD can be downloaded with the following links and should be saved in a `$SQUAD_DIR` directory.

* [train-v1.1.json](https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v1.1.json)
* [dev-v1.1.json](https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v1.1.json)
* [evaluate-v1.1.py](https://github.com/allenai/bi-att-flow/blob/master/squad/evaluate-v1.1.py)


```shell
export SQUAD_DIR=/path/to/SQUAD

python run_squad_pytorch.py \
python run_squad.py \
--vocab_file=$BERT_BASE_DIR/vocab.txt \
--bert_config_file=$BERT_BASE_DIR/bert_config.json \
--init_checkpoint=$BERT_PYTORCH_DIR/pytorch_model.bin \
@@ -83,23 +87,27 @@ python run_squad_pytorch.py \
--output_dir=../debug_squad/
```
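
To score the resulting predictions against the dev set, you can run the official `evaluate-v1.1.py` script downloaded above; this assumes `run_squad.py` writes a `predictions.json` file in the output directory, as the original TensorFlow script does:

```shell
python evaluate-v1.1.py $SQUAD_DIR/dev-v1.1.json ../debug_squad/predictions.json
```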


## Comparing TensorFlow and PyTorch models

We also include [a small Notebook](https://github.com/huggingface/pytorch-pretrained-BERT/blob/master/Comparing%20TF%20and%20PT%20models.ipynb) that we used to verify that the weights converted to PyTorch are consistent with the original TensorFlow weights.
Please follow the instructions in the Notebook to run it.
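
If you want to reproduce the core of that check outside the Notebook, here is a minimal sketch, assuming you have exported the last-layer activations of both models to the hypothetical `.npy` files named below:

```python
import numpy as np

# Hypothetical dumps of the last-layer activations produced by the
# TensorFlow and the PyTorch models on the same input.txt sample.
tf_features = np.load("tf_last_layer.npy")
pt_features = np.load("pt_last_layer.npy")

# A small maximum absolute difference (around 1e-5 or below) indicates
# that the weight conversion is faithful.
print("max abs diff:", np.abs(tf_features - pt_features).max())
print("allclose:", np.allclose(tf_features, pt_features, atol=1e-5))
```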


## Note on pre-training

The original TensorFlow code also provides two scripts for pre-training BERT: [create_pretraining_data.py](https://github.com/google-research/bert/blob/master/create_pretraining_data.py) and [run_pretraining.py](https://github.com/google-research/bert/blob/master/run_pretraining.py).
As the authors note, pre-training BERT is particularly expensive and requires a TPU to run in a reasonable amount of time (see [here](https://github.com/google-research/bert#pre-training-with-bert)).

We have decided **not** to port these scripts for now and to wait for TPU support in PyTorch (see the recent [official announcement](https://cloud.google.com/blog/products/ai-machine-learning/introducing-pytorch-across-google-cloud)).


## Requirements

The main dependencies of this code are:

- PyTorch (>= 0.4.0)
- tqdm

To install the dependencies:

```bash
pip install -r ./requirements.txt
```
2 changes: 1 addition & 1 deletion convert_tf_checkpoint_to_pytorch.py
@@ -11,7 +11,7 @@
import torch
import numpy as np

from modeling_pytorch import BertConfig, BertModel
from modeling import BertConfig, BertModel

parser = argparse.ArgumentParser()

99 changes: 0 additions & 99 deletions convert_tf_checkpoint_to_pytorch_special_edition.py

This file was deleted.

2 changes: 1 addition & 1 deletion extract_features_pytorch.py → extract_features.py
@@ -31,7 +31,7 @@
from torch.utils.data import TensorDataset, DataLoader, RandomSampler, SequentialSampler
from torch.utils.data.distributed import DistributedSampler

from modeling_pytorch import BertConfig, BertModel
from modeling import BertConfig, BertModel

logging.basicConfig(format = '%(asctime)s - %(levelname)s - %(name)s - %(message)s',
datefmt = '%m/%d/%Y %H:%M:%S',
File renamed without changes.