v0.2.0
Summary
- Documentation live on keras.io.
- Added two tokenizers: `ByteTokenizer` and `UnicodeCharacterTokenizer`.
- Added a `Perplexity` metric.
- Added three layers: `TokenAndPositionEmbedding`, `MLMMaskGenerator` and `MLMHead`.
- Contributing guides and roadmap.
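For context on the new `Perplexity` metric: perplexity is the standard language-model measure, defined as the exponential of the mean negative log-likelihood of the ground-truth tokens. This is a minimal pure-Python sketch of that math, not the KerasNLP API (the released metric is a `keras.metrics.Metric` that operates on logits or probabilities):

```python
import math

def perplexity(token_probs):
    """Perplexity of a sequence, given the probability the model
    assigned to each ground-truth token: exp of the mean negative
    log-likelihood. Lower is better; 1.0 means a perfect model."""
    nll = [-math.log(p) for p in token_probs]
    return math.exp(sum(nll) / len(nll))

# A model that assigns probability 0.25 to every correct token is
# as "confused" as a uniform 4-way guess, so its perplexity is ~4.
print(perplexity([0.25, 0.25, 0.25, 0.25]))
```

The same definition underlies the metric regardless of framework; only the reduction over batches and masking of padding tokens differ.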
What's Changed
- Add Byte Tokenizer by @abheesht17 in #80
- Fixing rank 1 outputs for WordPieceTokenizer by @aflah02 in #92
- Add tokenizer accessors to the base class by @mattdangerw in #89
- Fix word piece attributes by @mattdangerw in #97
- Small fix: change assertEquals to assertEqual by @chenmoneygithub in #103
- Added a Learning Rate Schedule for the BERT Example by @Stealth-py in #96
- Add Perplexity Metric by @abheesht17 in #68
- Use the black profile for isort by @mattdangerw in #117
- Update README with release information by @mattdangerw in #118
- Add a class to generate LM masks by @chenmoneygithub in #61
- Add docstring testing by @mattdangerw in #116
- Fix broken docstring in MLMMaskGenerator by @chenmoneygithub in #121
- Adding a UnicodeCharacterTokenizer by @aflah02 in #100
- Added Class by @adhadse in #91
- Fix bert example so it is runnable by @mattdangerw in #123
- Fix the issue that MLMMaskGenerator does not work in graph mode by @chenmoneygithub in #131
- Actually use layer norm epsilon in encoder/decoder by @mattdangerw in #133
- Whitelisted formatting and lint check targets by @adhadse in #126
- Updated CONTRIBUTING.md for setup of venv and standard pip install by @adhadse in #127
- Fix mask propagation of transformer layers by @chenmoneygithub in #139
- Fix masking for TokenAndPositionEmbedding by @mattdangerw in #140
- Fixed no oov token error in vocab for WordPieceTokenizer by @adhadse in #136
- Add a MLMHead layer by @mattdangerw in #132
- Bump version for 0.2.0 dev release by @mattdangerw in #142
- Added WSL setup text to CONTRIBUTING.md by @adhadse in #144
- Add attribution for the BERT modeling code by @mattdangerw in #151
- Remove preprocessing subdir by @mattdangerw in #150
- Word piece arg change by @mattdangerw in #148
- Rename max_length to sequence_length by @mattdangerw in #149
- Don't accept a string dtype for unicode tokenizer by @mattdangerw in #147
- Adding Utility to Detokenize as list of Strings to Tokenizer Base Class by @aflah02 in #124
- Fixed Import Error by @aflah02 in #161
- Added KerasTuner Hyper-Parameter Search for the BERT fine-tuning script. by @Stealth-py in #143
- Docstring updates for upcoming doc publish by @mattdangerw in #146
- version bump for 0.2.0.dev2 pre-release by @mattdangerw in #165
- Added a vocabulary_size argument to UnicodeCharacterTokenizer by @aflah02 in #163
- Simplified utility to preview a tfrecord by @mattdangerw in #168
- Update BERT example's README with data downloading instructions by @chenmoneygithub in #169
- Add a call to repeat during pretraining by @mattdangerw in #172
- Add an integration test matching our quick start by @mattdangerw in #162
- Modify README of bert example by @chenmoneygithub in #174
- Fix the finetuning script's loss and metric config by @chenmoneygithub in #176
- Minor improvements to the position embedding docs by @mattdangerw in #180
- Update docs for upcoming 0.2.0 release by @mattdangerw in #158
- Restore accidentally deleted line from README by @mattdangerw in #185
- Bump version for 0.2.0 release by @mattdangerw in #186
- Pre release fix by @mattdangerw in #187
New Contributors
- @Stealth-py made their first contribution in #96
- @adhadse made their first contribution in #91
Full Changelog: v0.1.1...v0.2.0