
Assign new project by llama index #414

Open
koushikkrishna2702 opened this issue Jun 20, 2024 · 2 comments

Comments

@koushikkrishna2702

```python
# searchEngine
import string

import nltk
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# NLTK data needed for tokenization and stopword removal (first run only)
nltk.download("punkt", quiet=True)
nltk.download("stopwords", quiet=True)
from nltk.corpus import stopwords

STOPWORDS = set(stopwords.words("english"))

# Sample documents (replace with your actual document collection)
documents = [
    "Machine learning is the study of computer algorithms that improve automatically through experience.",
    "Natural language processing (NLP) is a field of AI concerned with the interaction between computers and humans.",
    "Deep learning is a subset of machine learning in which artificial neural networks mimic the human brain.",
    "Search engines use algorithms to retrieve documents in response to user queries.",
    "Artificial intelligence (AI) is the simulation of human intelligence by machines.",
]

# Preprocessing: lowercase, tokenize, drop punctuation and stopwords
def preprocess_text(text):
    tokens = nltk.word_tokenize(text.lower())
    tokens = [
        token for token in tokens
        if token not in string.punctuation and token not in STOPWORDS
    ]
    return " ".join(tokens)

# Preprocess each document
preprocessed_documents = [preprocess_text(doc) for doc in documents]

# TF-IDF vectorization
vectorizer = TfidfVectorizer()
tfidf_matrix = vectorizer.fit_transform(preprocessed_documents)

# Search: rank documents by cosine similarity to the query
def search(query, documents, tfidf_matrix, vectorizer, top_n=1):
    query = preprocess_text(query)
    query_vector = vectorizer.transform([query])
    cosine_similarities = cosine_similarity(query_vector, tfidf_matrix).flatten()
    # Indices of the top_n most similar documents, best first
    top_document_indices = cosine_similarities.argsort()[-top_n:][::-1]
    return [(cosine_similarities[i], documents[i]) for i in top_document_indices]

# Example usage
query = "machine learning algorithms"
top_results = search(query, documents, tfidf_matrix, vectorizer, top_n=2)

# Print results
print(f"Top results for query '{query}':")
for score, result in top_results:
    print(f"Score: {score:.2f}, Document: {result}")
```
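As a quick sanity check, the core TF-IDF plus cosine-similarity ranking can be exercised on a tiny corpus with scikit-learn alone, without the NLTK preprocessing step (TfidfVectorizer already lowercases and tokenizes). This is a minimal sketch; the `rank_documents` name and toy corpus are illustrative, not part of the snippet above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def rank_documents(query, docs, top_n=2):
    """Rank docs by cosine similarity of their TF-IDF vectors to the query."""
    vectorizer = TfidfVectorizer()
    doc_matrix = vectorizer.fit_transform(docs)
    query_vec = vectorizer.transform([query])
    scores = cosine_similarity(query_vec, doc_matrix).flatten()
    # argsort is ascending, so take the last top_n indices and reverse them
    order = scores.argsort()[-top_n:][::-1]
    return [(float(scores[i]), docs[i]) for i in order]

docs = [
    "machine learning improves with experience",
    "search engines retrieve documents for queries",
    "neural networks mimic the brain",
]
results = rank_documents("machine learning", docs, top_n=2)
# The first document shares both query terms, so it should rank first
```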

@charann29
Owner

This is a good project, and a very small one, using RAG and LlamaIndex. Complete it today and raise a PR.

Resource :
https://www.youtube.com/watch?v=beH56W7rcOQ

Code :
https://github.com/msuliot/simple_ai

@charann29
Owner

Working on this; reply here.
