This repository has been archived by the owner on Oct 13, 2022. It is now read-only.

WIP: huggingface tokenizer and Neural LM training pipeline. #139

Open · wants to merge 28 commits into base: master

Commits on Mar 25, 2021

  1. huggingface tokenizer and neural LM training pipeline.

    This commit mainly adds the huggingface tokenizer and
    a draft Transformer/RNN-based LM training pipeline.
    glynpu committed Mar 25, 2021
    f038e60

Commits on Mar 29, 2021

  1. draft of class LMDataset

    glynpu committed Mar 29, 2021
    e9482d2
  2. 135bfdb
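
The commit title above only names class LMDataset. As a minimal sketch of what a map-style LM dataset typically returns, assuming sentences are already tokenized into integer ids and wrapped in hypothetical bos_id/eos_id markers (none of these names come from the PR), each item can be an (inputs, targets) pair in which the targets are the inputs shifted left by one token:

```python
from typing import List, Tuple


class LMDataset:
    """Hypothetical sketch of a map-style LM dataset (not the PR's code).

    Follows the __len__/__getitem__ protocol used by
    torch.utils.data.Dataset, without importing PyTorch.
    """

    def __init__(self, sentences: List[List[int]], bos_id: int, eos_id: int):
        # Assumption: sentences were tokenized offline into integer ids.
        self.sentences = sentences
        self.bos_id = bos_id
        self.eos_id = eos_id

    def __len__(self) -> int:
        return len(self.sentences)

    def __getitem__(self, idx: int) -> Tuple[List[int], List[int]]:
        tokens = self.sentences[idx]
        # Input is <bos> + tokens; target is tokens + <eos>, so the
        # model at position t is trained to predict token t + 1.
        inputs = [self.bos_id] + tokens
        targets = tokens + [self.eos_id]
        return inputs, targets
```

For example, LMDataset([[5, 6, 7], [8]], bos_id=1, eos_id=2)[0] yields ([1, 5, 6, 7], [5, 6, 7, 2]).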

Commits on Mar 30, 2021

  1. collate function of NNLM

    glynpu committed Mar 30, 2021
    88e0d49
  2. 27b1863
  3. Merge pull request #2 from csukuangfj/fangjun-rnnlm

    add scripts to process word piece lexicons.
    glynpu authored Mar 30, 2021
    212b79b
  4. trainer

    glynpu committed Mar 30, 2021
    47bf358
  5. generate lexicon

    glynpu committed Mar 30, 2021
    d8aaabd
  6. c44f99d
  7. remove shuf/comm commands

    glynpu committed Mar 30, 2021
    b13954d
  8. 775d477
  9. Merge pull request #1 from glynpu/lyg_dev

    Lyg dev
    glynpu authored Mar 30, 2021
    3b83338
  10. remove unused file

    glynpu committed Mar 30, 2021
    d415ed0
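
The "collate function of NNLM" commit is listed only by title. A common approach, sketched below under the assumption that each example is an (inputs, targets) pair of equal-length token-id lists, is to pad every sequence in a batch to the longest one and to pad the targets with an ignore value so padded positions do not contribute to the loss. The names collate_fn, pad_id and ignore_id are hypothetical; -100 is chosen because it is the default ignore_index of torch.nn.CrossEntropyLoss.

```python
from typing import List, Sequence, Tuple


def collate_fn(
    batch: Sequence[Tuple[List[int], List[int]]],
    pad_id: int = 0,
    ignore_id: int = -100,
) -> Tuple[List[List[int]], List[List[int]], List[int]]:
    """Hypothetical padding collate function for an LM batch.

    Assumes inputs and targets in each pair have the same length.
    Returns padded inputs, padded targets, and the unpadded lengths.
    """
    lengths = [len(inputs) for inputs, _ in batch]
    max_len = max(lengths)
    padded_inputs: List[List[int]] = []
    padded_targets: List[List[int]] = []
    for inputs, targets in batch:
        # Inputs get a real padding token the model can embed.
        padded_inputs.append(list(inputs) + [pad_id] * (max_len - len(inputs)))
        # Targets get ignore_id so the loss skips padded positions.
        padded_targets.append(list(targets) + [ignore_id] * (max_len - len(targets)))
    return padded_inputs, padded_targets, lengths
```

Returning the true lengths alongside the padded lists lets an RNN or Transformer build a padding mask later, while the ignore value keeps the loss computation itself mask-free.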

Commits on Apr 1, 2021

  1. add dependency and fix known bugs

    scripts to install tokenizers
    fix training bugs
    port online tokenization to offline tokenization
    load/save checkpoint
    glynpu committed Apr 1, 2021
    4937232

Commits on Apr 2, 2021

  1. fix various bugs

    with vocab_size=2000, epochs=50
    token-level perplexity: train around 80, dev 119
    glynpu committed Apr 2, 2021
    61863db
  2. d4dccae
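
For reference when reading the perplexity figures above: token-level perplexity is the exponential of the average negative log-likelihood per token, ppl = exp(-(1/N) * sum_t log p(w_t | w_<t)). The sketch below assumes natural-log probabilities; the function name is illustrative, not taken from the PR.

```python
import math
from typing import Iterable


def perplexity(token_log_probs: Iterable[float]) -> float:
    """Token-level perplexity from natural-log token probabilities."""
    log_probs = list(token_log_probs)
    if not log_probs:
        raise ValueError("need at least one token")
    # Average negative log-likelihood in nats per token.
    avg_nll = -sum(log_probs) / len(log_probs)
    return math.exp(avg_nll)
```

As a sanity check, a uniform model over vocab_size=2000 has perplexity 2000, so a dev perplexity of 119 (about ln 119 ≈ 4.78 nats per token) is far below that baseline; the train figure of around 80 corresponds to about 4.38 nats per token.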

Commits on Apr 3, 2021

  1. add results.md

    glynpu committed Apr 3, 2021
    a4d5f1b
  2. 53e2d1e

Commits on Apr 9, 2021

  1. support yaml configuration

    glynpu committed Apr 9, 2021
    b226a3a
  2. 89ece61
  3. address review comments

    glynpu committed Apr 9, 2021
    c3f8811
  4. address review comments

    glynpu committed Apr 9, 2021
    d1b803b

Commits on Apr 10, 2021

  1. c45d31f

Commits on Apr 14, 2021

  1. 1d38c21

Commits on Apr 20, 2021

  1. f6914cd
  2. d847b28
  3. use Noam optimizer

    glynpu committed Apr 20, 2021
    52300df
  4. add rescore scripts

    glynpu committed Apr 20, 2021
    e61a9d1
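
The "use Noam optimizer" commit refers to the learning-rate schedule introduced with the Transformer ("Attention Is All You Need"): the rate grows linearly for warmup steps, then decays with the inverse square root of the step, lr = factor * d_model^-0.5 * min(step^-0.5, step * warmup^-1.5). The sketch below assumes that formulation; the PR's own optimizer class and hyperparameters are not shown here, so noam_lr and its arguments are illustrative.

```python
def noam_lr(step: int, d_model: int, warmup: int, factor: float = 1.0) -> float:
    """Noam learning rate at a 1-based training step (illustrative sketch)."""
    if step < 1:
        raise ValueError("step must be >= 1")
    # Linear warm-up term (step * warmup^-1.5) is smaller before warmup;
    # inverse-square-root decay term (step^-0.5) is smaller after it.
    return factor * d_model ** -0.5 * min(step ** -0.5, step * warmup ** -1.5)
```

The peak rate, reached at step == warmup, is factor * (d_model * warmup)^-0.5. In PyTorch one would typically write this value into each group["lr"] of optimizer.param_groups before calling optimizer.step().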