Perplexity AI (perplexity.ai) is a chat tool that combines foundational language models, such as OpenAI's GPT-4, with current information from the internet. It provides not only answers but also references to the sources that contributed to them. This simple yet powerful approach addresses the limitation of potentially outdated training data. Because the sources behind an answer are returned, you can verify its accuracy, which helps combat the problem of language models generating incorrect answers.
Building such a tool may sound like a serious undertaking, but modern tools have made it surprisingly easy.
The workflow can be described as follows:
- The user poses a question.
- A Google search is performed using the question.
- The top-k search results, or the most relevant webpages, are downloaded.
- Raw HTML data is transformed into a usable format by LangChain.
- All documents are split into 1,000-character chunks.
- Embeddings are computed for each document chunk and stored in a vector store (chromadb).
- A prompt is built with LangChain from the user's original question and the scraped web data.
- An OpenAI model is queried to generate an answer.
- The documents that contributed to the answer are identified and returned as references.
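
To make the middle of this workflow concrete, here is a minimal, dependency-free sketch of the chunking, embedding, retrieval, and prompt-building steps. It is not the project's actual code: the function names are hypothetical, a bag-of-words vector stands in for a real embedding model, a plain Python list stands in for chromadb, and the Google search, page download, and OpenAI call are left out entirely. The sample pages and their example.org URLs are placeholders.

```python
import math
import re
from collections import Counter


def split_into_chunks(text, chunk_size=1000):
    """Split text into consecutive chunks of at most chunk_size characters."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def embed(text):
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_index(pages, chunk_size=1000):
    """Chunk every page and store (url, chunk, vector) entries.

    `pages` maps a source URL to the page's plain text. The returned list
    plays the role of the vector store in this sketch.
    """
    index = []
    for url, text in pages.items():
        for chunk in split_into_chunks(text, chunk_size):
            index.append((url, chunk, embed(chunk)))
    return index


def retrieve(index, question, k=2):
    """Return the k index entries most similar to the question."""
    query_vector = embed(question)
    ranked = sorted(
        index,
        key=lambda entry: cosine_similarity(query_vector, entry[2]),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, hits):
    """Combine the question and the retrieved chunks into one prompt string."""
    context = "\n\n".join(
        f"[{n}] {chunk}" for n, (_, chunk, _) in enumerate(hits, start=1)
    )
    return (
        "Answer the question using only the numbered sources below.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )


# Placeholder pages standing in for downloaded search results.
sample_pages = {
    "https://example.org/hgp": "The Human Genome Project was a publicly funded international consortium.",
    "https://example.org/celera": "Celera Genomics sequenced the human genome with whole-genome shotgun sequencing.",
    "https://example.org/bread": "Sourdough bread needs flour, water, and salt.",
}
sample_question = "Who sequenced the human genome?"
sample_hits = retrieve(build_index(sample_pages), sample_question, k=2)
print(build_prompt(sample_question, sample_hits))
print("References:", [url for url, _, _ in sample_hits])
```

Keeping the URL next to each chunk is what makes the final step cheap: whichever chunks end up in the prompt already carry the reference to return. In a real setup the prompt string would be sent to the language model, and the toy `embed` function would be replaced by a proper embedding model.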
If you have any questions, feel free to reach out to me on Twitter.
Prompt: Who were the main players in the race to complete the human genome? And what were their approaches?