Document Search – cleaned a large text corpus and converted the unstructured text into structured tokens using spaCy; implemented object-oriented hash tables, plus web pages built with Jinja templates to return search results to the user. [Git repo]
Neural Network for Recommender System – built and fine-tuned an embedding space to represent users and items; designed a neural network class in PyTorch with a customized forward pass that incorporates additional features into the training loop. Generated additional training data with negative sampling.
Feature Importance – implemented permutation importance and drop-column importance to assess feature impact across multiple machine learning models. [Project report]
Distributed Computing – implemented a recommender system using word embeddings and alternating least squares (ALS) matrix factorization. The 73-million-row dataset was stored in Amazon S3 and preprocessed in PySpark, and results were saved to a MongoDB Atlas instance. Feature engineering and model training were performed in Spark ML.
Experimentation and A/B Testing – applied a 2^k factorial design to optimize user engagement time by varying four UI factors. [Report available on request]
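The two techniques named in the Feature Importance entry differ in one key way: permutation importance shuffles one column and re-scores the already-trained model, while drop-column importance removes the column and retrains. The following is a minimal plain-Python sketch under stated assumptions. The toy model `fit_toy_linear`, the synthetic data, and the choice of mean squared error are hypothetical illustrations, not the project's actual models or data. For brevity it scores on the training set; in practice both measures are usually computed on a held-out validation set.

```python
import random
from itertools import product
from statistics import mean


def mse(y_true, y_pred):
    return mean((a - b) ** 2 for a, b in zip(y_true, y_pred))


def fit_toy_linear(X, y):
    """Hypothetical toy model: one simple-regression slope per feature.

    This equals least squares only when the feature columns are uncorrelated,
    as in the orthogonal demo data below. Returns a predict(X) function.
    """
    n_feat = len(X[0])
    x_means = [mean(row[j] for row in X) for j in range(n_feat)]
    y_bar = mean(y)
    coefs = []
    for j in range(n_feat):
        col = [row[j] - x_means[j] for row in X]
        var = sum(c * c for c in col)
        coefs.append(sum(c * (t - y_bar) for c, t in zip(col, y)) / var if var else 0.0)

    def predict(X_new):
        return [y_bar + sum(b * (v - m) for b, v, m in zip(coefs, row, x_means))
                for row in X_new]

    return predict


def permutation_importance(predict, X, y, metric=mse, n_repeats=5, seed=0):
    """Mean increase in error when one column is shuffled; the model is NOT retrained."""
    rng = random.Random(seed)
    baseline = metric(y, predict(X))
    scores = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            col = [row[j] for row in X]
            rng.shuffle(col)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
            increases.append(metric(y, predict(X_perm)) - baseline)
        scores.append(mean(increases))
    return scores


def drop_column_importance(fit, X, y, metric=mse):
    """Increase in error when one column is removed and the model is retrained."""
    baseline = metric(y, fit(X, y)(X))
    scores = []
    for j in range(len(X[0])):
        X_drop = [row[:j] + row[j + 1:] for row in X]
        scores.append(metric(y, fit(X_drop, y)(X_drop)) - baseline)
    return scores


# Synthetic data: y depends strongly on x0, weakly on x1, and not at all on x2.
X = [list(r) for r in product((-1, 1), repeat=3)]
y = [4 * x0 + x1 for x0, x1, _ in X]
print(drop_column_importance(fit_toy_linear, X, y))
print(permutation_importance(fit_toy_linear(X, y), X, y))
```

Drop-column importance costs one retrain per feature, whereas permutation importance only needs predictions, which is why the latter is usually preferred for expensive models. On this data the drop-column scores come out as `[16.0, 1.0, 0.0]`, and permuting the irrelevant column `x2` leaves the error unchanged.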
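In a 2^k factorial design like the one in the Experimentation and A/B Testing entry, each of k factors takes a low (-1) and a high (+1) level. With four factors a full design has 2^4 = 16 runs. Each main effect or interaction is then estimated as the difference between the mean response at its +1 and -1 contrast levels. Below is a minimal sketch under assumptions: the factor names A to D and the noise-free engagement response are hypothetical, invented only to show the arithmetic, and do not come from the actual experiment.

```python
from itertools import combinations, product
from math import prod


def full_factorial(k):
    """All 2**k runs of a two-level full factorial design, coded -1 (low) / +1 (high)."""
    return [list(run) for run in product((-1, 1), repeat=k)]


def estimate_effects(runs, y):
    """Estimate every main effect and interaction in a 2^k full factorial design.

    The effect of a term is mean(y | contrast = +1) - mean(y | contrast = -1).
    Because the design is balanced, this equals 2 * sum(contrast * y) / n.
    Terms are keyed by factor-index tuples, e.g. (0,) is factor A and (0, 2) is A x C.
    """
    n, k = len(runs), len(runs[0])
    effects = {}
    for order in range(1, k + 1):
        for term in combinations(range(k), order):
            # The contrast column of an interaction is the product of its factor columns.
            contrast = [prod(run[j] for j in term) for run in runs]
            effects[term] = 2 * sum(c * yi for c, yi in zip(contrast, y)) / n
    return effects


# Hypothetical example: four UI factors A-D and a synthetic, noise-free response.
runs = full_factorial(4)  # 16 runs
y = [10 + 3 * a - b + 0.5 * a * c for a, b, c, d in runs]
effects = estimate_effects(runs, y)
for term, value in sorted(effects.items(), key=lambda kv: -abs(kv[1]))[:3]:
    print("x".join("ABCD"[j] for j in term), value)
```

Here the largest estimated effects are A (6.0), B (-2.0) and the A x C interaction (1.0), i.e. twice each coefficient in the synthetic response. A real experiment would add noise and usually replicate runs, so effects would be judged against their standard errors rather than read off directly.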