Basic ELI5 Glossary of Terms, Vocabulary, and Tools for Machine Learning and LLM Technology
These terms apply to AI (artificial intelligence), ML (machine learning), and LLMs (large language models). We may also sprinkle in some general computing terms and notable people.
The README file may also list the top terms in roughly alphabetical order, if available.
Apache Spark - https://spark.apache.org/ - multi-language engine for executing data engineering, data science, and machine learning on single-node machines or clusters.
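BERT - https://en.wikipedia.org/wiki/BERT_(language_model) - Bidirectional Encoder Representations from Transformers, a language model by Google.
BPE - https://en.wikipedia.org/wiki/Byte_pair_encoding - Byte Pair Encoding. In LLMs, a tokenisation method that starts from individual characters or bytes and repeatedly merges the most frequent adjacent pair into a new token.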
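A minimal sketch of the BPE training loop, assuming a toy whitespace-split corpus and character-level starting symbols. Real tokenisers (for example the byte-level one behind tiktoken) start from raw bytes and handle word boundaries differently, and the function names here are hypothetical.

```python
from collections import Counter


def count_pairs(words):
    """Count adjacent symbol pairs, weighted by how often each word occurs."""
    pairs = Counter()
    for symbols, freq in words.items():
        for pair in zip(symbols, symbols[1:]):
            pairs[pair] += freq
    return pairs


def merge_pair(words, pair):
    """Replace every occurrence of `pair` in every word with one merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_bpe(corpus, num_merges):
    """Learn up to `num_merges` merge rules; each word starts as a tuple of characters."""
    words = Counter(tuple(word) for word in corpus.split())
    merges = []
    for _ in range(num_merges):
        pairs = count_pairs(words)
        if not pairs:
            break  # every word is already a single symbol
        # Most frequent pair wins; max() breaks ties by first occurrence.
        best = max(pairs, key=pairs.get)
        merges.append(best)
        words = merge_pair(words, best)
    return merges, words
```

For example, `learn_bpe("low low low lower lowest", 3)` learns the merges `('l', 'o')`, `('lo', 'w')` and `('low', 'e')` in that order, leaving "low" as a single token.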
Falcon, William - CEO of Lightning AI, creator of PyTorch Lightning - https://github.com/williamFalcon https://github.com/Lightning-AI/pytorch-lightning
Lightning AI - company behind PyTorch Lightning; its CEO is William Falcon (see Falcon, William) - https://github.com/williamFalcon https://github.com/Lightning-AI/pytorch-lightning
LangChain - a framework for building applications powered by large language models.
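Parquet - Apache Parquet, an open-source columnar data file format.
RAG - Retrieval-Augmented Generation, a technique that retrieves relevant documents and adds them to an LLM's prompt so the answer can draw on that information.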
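A toy sketch of the RAG pattern, assuming bag-of-words cosine similarity as a stand-in for the embedding model and vector store a real system would use. The documents and function names are hypothetical, and the final generation step (sending the prompt to an LLM) is left out.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, k=2):
    """Retrieval step: return the k documents most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(documents, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True)
    return ranked[:k]


def build_prompt(query, documents, k=2):
    """Augmentation step: put the retrieved text into the prompt for the LLM."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


# Hypothetical mini knowledge base.
DOCS = [
    "Apache Parquet is a columnar file format.",
    "BERT is a bidirectional encoder from Google.",
    "Tiktoken is a fast BPE tokeniser for OpenAI models.",
]

# The generation step would send this prompt to an LLM; here we just print it.
print(build_prompt("What file format is Parquet?", DOCS, k=1))
```

Swapping `cosine` for dense embeddings and `DOCS` for a vector database changes the quality of retrieval, not the shape of the pipeline: retrieve, then augment, then generate.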
Tiktoken - https://github.com/openai/tiktoken - a popular, fast BPE tokeniser library for use with OpenAI's models.
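To illustrate what a BPE tokeniser like tiktoken does at encode time, here is a simplified pure-Python sketch: it starts from UTF-8 bytes and repeatedly merges the adjacent pair with the lowest rank in a merge table. This is not tiktoken's API. The tiny `TOY_RANKS` table is made up for illustration; the real library loads ranks from pretrained encodings (via `tiktoken.get_encoding(...)`, then `.encode()` / `.decode()`), pre-splits text with a regex, and runs the merge loop in Rust.

```python
# Hypothetical toy merge table: every single byte has a rank equal to its value,
# plus two made-up merges. A lower rank means the merge was learned earlier.
TOY_RANKS = {bytes([i]): i for i in range(256)}
TOY_RANKS[b"lo"] = 256
TOY_RANKS[b"low"] = 257


def bpe_encode(text, ranks):
    """Encode text by repeatedly merging the adjacent pair with the lowest rank."""
    parts = [bytes([b]) for b in text.encode("utf-8")]
    while len(parts) > 1:
        best_i, best_rank = None, None
        for i in range(len(parts) - 1):
            rank = ranks.get(parts[i] + parts[i + 1])
            if rank is not None and (best_rank is None or rank < best_rank):
                best_i, best_rank = i, rank
        if best_i is None:
            break  # no adjacent pair is in the merge table
        parts[best_i:best_i + 2] = [parts[best_i] + parts[best_i + 1]]
    # Every remaining part is a known token; its rank doubles as its token ID.
    return [ranks[p] for p in parts]


def bpe_decode(ids, ranks):
    """Map token IDs back to bytes and decode them as UTF-8."""
    inverse = {rank: token for token, rank in ranks.items()}
    return b"".join(inverse[i] for i in ids).decode("utf-8")
```

With this table, `bpe_encode("lower", TOY_RANKS)` returns `[257, 101, 114]`: "lo" merges first, then "low", while "e" and "r" stay as single-byte tokens.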