dell-research-harvard

Popular repositories

  1. linktransformer Public

    A convenient way to link, deduplicate, aggregate and cluster data(frames) in Python using deep learning

    Python 115 10

  2. AmericanStories Public

    The official GitHub repository for the American Stories dataset as in {link}

    Python 114 8

  3. effocr Public

    A model(ing framework) for sample efficient OCR

    Python 56 5

  4. HJDataset Public

    A Large Dataset of Historical Japanese Documents with Complex Layouts

    Jupyter Notebook 32 4

  5. NEWS-COPY Public

    Noise-robust de-duplication at scale

    Python 18 1

  6. newswire Public

    Python 8

Repositories

Showing 10 of 29 repositories
  • linktransformer Public

    A convenient way to link, deduplicate, aggregate and cluster data(frames) in Python using deep learning

    Python 115 GPL-3.0 10 2 1 Updated Feb 21, 2025
  • newswire Public
    Python 8 0 0 0 Updated Aug 15, 2024
  • efficient_ocr Public

    Efficient OCR for Building a Diverse Digital History

    Python 7 Apache-2.0 1 0 0 Updated Apr 12, 2024
  • newsdejavu Public

    Python package for News Deja Vu

    Python 4 MIT 0 0 0 Updated Apr 9, 2024
  • AmericanStories Public

    The official GitHub repository for the American Stories dataset as in {link}

    Python 114 8 7 0 Updated Mar 7, 2024
  • HomoglyphsCJKTraining Public

    Quantifying Character Similarity with Vision Transformers

    Python 6 0 0 0 Updated Oct 27, 2023
  • HomoglyphsCJK Public

    An efficient tool for fuzzy matching Japanese, Korean, Simplified Chinese, or Traditional Chinese words.

    Python 3 MIT 1 0 0 Updated Oct 13, 2023
  • Associating-Press Public

    Associating layout elements from newspapers into full articles

    1 0 0 0 Updated Sep 15, 2023
  • DPR Public Forked from facebookresearch/DPR

    Dense Passage Retriever (DPR) is a set of tools and models for open-domain Q&A tasks.

    Python 1 313 0 0 Updated Aug 15, 2023
  • linktransformer-readthedocs
    Python 0 0 0 0 Updated Aug 6, 2023
