student_profile_classifier beta

Project Aim

To enable prospective graduate students to evaluate their profiles.

Description

In every graduate admissions cycle, students are expected to submit a Statement of Purpose (SoP) as part of the application. This project aims to help students self-evaluate their SoPs for the particular program they are applying to.

Method

Data Collection: Samples of admitted and rejected SoPs are collected continuously and collated in the pos_samples.txt and neg_samples.txt files, respectively.
Data Clean-up:
- Natural Language Processing: Each profile from the above files is tokenized and part-of-speech tagged using NLTK, and only the important words (nouns and verbs) are kept.
  For example, student profile: "I have always enjoyed science. I studied computer science at XYZ University."
  Tokenized and tagged profile (nouns and verbs only): [('have', 'VBP'), ('enjoyed', 'VBN'), ('science', 'NN'), ('studied', 'VBD'), ('computer', 'NN'), ('science', 'NN'), ('XYZ', 'NNP'), ('University', 'NNP')]
- Lemmatize: The extracted words are lowercased and lemmatized (reduced to their base forms, e.g., 'enjoyed' to 'enjoy'), so that student profiles can be compared by the presence or absence of the same words.
  Same example: ['have', 'enjoy', 'science', 'study', 'computer', 'science', 'xyz', 'university']
- Uniquifying and sorting: The reduced profile is further simplified by removing duplicate occurrences of lemmatized words and sorting the remaining words alphabetically.
  Same example: ['computer', 'enjoy', 'have', 'science', 'study', 'university', 'xyz']
- Storing: Each sorted and simplified student profile is stored in clean_file.txt.
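
The filtering, lemmatizing, and uniquifying steps above can be sketched as one function. This is a minimal sketch, not the code in profile_classifier.py: it assumes "nouns and verbs" means Penn Treebank tags starting with NN or VB, and it takes the lemmatizer as a parameter. The function and helper names are hypothetical, the dictionary lemmatizer is a stand-in for WordNet that only knows the two inflected words of the example, and the extra PRP/RB/IN tokens are illustrative additions to show the filter at work.

```python
from typing import Callable, Iterable

# Penn Treebank tag prefixes that are kept, mapped to WordNet POS codes.
WORDNET_POS = {"NN": "n", "VB": "v"}


def simplify_profile(tagged: Iterable[tuple[str, str]],
                     lemmatize: Callable[[str, str], str]) -> list[str]:
    """Keep nouns and verbs, lowercase and lemmatize them, then deduplicate and sort."""
    lemmas = set()
    for word, tag in tagged:
        pos = WORDNET_POS.get(tag[:2])
        if pos is None:
            continue  # drop pronouns, adverbs, prepositions, punctuation, ...
        lemmas.add(lemmatize(word.lower(), pos))
    return sorted(lemmas)


# Hypothetical stand-in for WordNet: knows only the inflected forms in the example.
_EXAMPLE_LEMMAS = {"enjoyed": "enjoy", "studied": "study"}


def example_lemmatize(word: str, pos: str) -> str:
    return _EXAMPLE_LEMMAS.get(word, word)


# Illustrative tagged tokens: the example above plus a few non-noun/verb tokens.
tagged = [("I", "PRP"), ("have", "VBP"), ("always", "RB"), ("enjoyed", "VBN"),
          ("science", "NN"), ("studied", "VBD"), ("computer", "NN"),
          ("science", "NN"), ("at", "IN"), ("XYZ", "NNP"), ("University", "NNP")]

print(simplify_profile(tagged, example_lemmatize))
# ['computer', 'enjoy', 'have', 'science', 'study', 'university', 'xyz']
```

With NLTK installed, `tagged` could come from `nltk.pos_tag(nltk.word_tokenize(text))`, and `WordNetLemmatizer().lemmatize` can be passed directly as `lemmatize`, since its second argument is the WordNet POS (`'n'` or `'v'`). NLTK's tokenizer, tagger, and WordNet data packages must be downloaded first.
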
Clean Data to Binary Vectors:
- Vector Definition: The simplified profiles are merged into one sorted vocabulary without duplicates, and each profile is then encoded as a binary vector over that vocabulary.
  Simplified Profile 1: ['computer', 'enjoy', 'have', 'science', 'study', 'university', 'xyz']
  Simplified Profile 2: ['abc', 'aim', 'become', 'computer', 'learning', 'machine', 'scientist', 'study', 'university']
  Combined + uniquified: ['abc', 'aim', 'become', 'computer', 'enjoy', 'have', 'learning', 'machine', 'science', 'scientist', 'study', 'university', 'xyz'] <-- has length 13
  Each position of a profile vector corresponds to one vocabulary word, so every vector also has length 13; a position is 1 if the profile contains that word and 0 otherwise:
  Simplified Profile 1 vector: [0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1] <-- based on the words present in Simplified Profile 1
  Simplified Profile 2 vector: [1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0]
- Storing: The vectors are stored in vector_file.txt.
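
The vocabulary-and-encoding step above can be sketched in plain Python. The function names are hypothetical, and the sketch does not reproduce how profile_classifier.py writes vector_file.txt; it only reproduces the two example vectors.

```python
def build_vocabulary(profiles: list[list[str]]) -> list[str]:
    """Union of all simplified profiles, sorted alphabetically."""
    return sorted(set().union(*profiles))


def to_binary_vector(profile: list[str], vocabulary: list[str]) -> list[int]:
    """1 where the vocabulary word occurs in the profile, 0 otherwise."""
    words = set(profile)
    return [1 if word in words else 0 for word in vocabulary]


profile_1 = ['computer', 'enjoy', 'have', 'science', 'study', 'university', 'xyz']
profile_2 = ['abc', 'aim', 'become', 'computer', 'learning', 'machine',
             'scientist', 'study', 'university']

vocab = build_vocabulary([profile_1, profile_2])
print(len(vocab))                           # 13
print(to_binary_vector(profile_1, vocab))   # [0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1]
print(to_binary_vector(profile_2, vocab))   # [1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0]
```

Because the vocabulary is the union over all profiles, adding a new sample with unseen words lengthens every vector, so the vectors have to be regenerated together.
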
Binary Vectors to Labelled Data: Each line of vector_file.txt holds the binary vector of one profile, and these vectors are converted into a labelled data matrix for the neural network. The labelled data contains both the input features (the binary vectors) and the labels (admitted or rejected).
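
A minimal sketch of building the labelled matrix, under several assumptions: each line of vector_file.txt holds a Python-style list like the vectors shown above, admitted profiles get label 1 and rejected ones label 0, and the label is stored as the last column. The function names and these conventions are hypothetical, and the labels assigned in the example are arbitrary.

```python
import ast

import numpy as np


def parse_vector_line(line: str) -> list[int]:
    """Parse one line assumed to look like '[0, 1, 1, ...]'."""
    return [int(x) for x in ast.literal_eval(line.strip())]


def to_labelled_matrix(pos_vectors, neg_vectors) -> np.ndarray:
    """Stack admitted (label 1) and rejected (label 0) vectors; last column is the label."""
    rows = ([list(v) + [1] for v in pos_vectors] +
            [list(v) + [0] for v in neg_vectors])
    return np.array(rows, dtype=np.float32)


# Arbitrary labels, for illustration only.
pos = [parse_vector_line("[0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1]")]
neg = [parse_vector_line("[1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0]")]

data = to_labelled_matrix(pos, neg)
X, y = data[:, :-1], data[:, -1]   # input features, labels
print(data.shape)                  # (2, 14)
```
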
Neural Network:
- Random Selection: Training (90%), cross-validation (5%), and test (5%) samples are randomly selected.
- Neural Network Structure: The hidden layers and their parameters (weights and biases) are defined in TensorFlow.
- Cross-Validation: The TensorFlow model is trained on the training set, and its error on the cross-validation set is evaluated.
- Testing: The error of the trained model on the held-out test set is evaluated.
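
The split, training, and evaluation steps above can be sketched without TensorFlow so that the mechanics are visible. This sketch swaps in a plain NumPy network with one tanh hidden layer and a sigmoid output, trained by full-batch gradient descent on binary cross-entropy. The layer size, learning rate, epoch count, function names, and the toy data are all assumptions, not the project's actual TensorFlow model or data.

```python
import numpy as np


def split_90_5_5(X, y, seed=0):
    """Shuffle, then split into training (90%), cross-validation (5%), test (5%)."""
    idx = np.random.default_rng(seed).permutation(len(X))
    n_train, n_cv = len(X) * 9 // 10, len(X) // 20
    parts = np.split(idx, [n_train, n_train + n_cv])
    return [(X[p], y[p]) for p in parts]


def forward(params, X):
    """Hidden tanh layer, then sigmoid output = P(positive profile)."""
    W1, b1, W2, b2 = params
    h = np.tanh(X @ W1 + b1)
    return h, 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))


def train(X, y, hidden=8, lr=0.5, epochs=2000, seed=0):
    """Full-batch gradient descent on mean binary cross-entropy."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, hidden)
    b2 = 0.0
    for _ in range(epochs):
        h, p = forward((W1, b1, W2, b2), X)
        d_logit = (p - y) / len(X)                      # sigmoid + cross-entropy gradient
        d_h = np.outer(d_logit, W2) * (1.0 - h ** 2)    # back through tanh
        W2 -= lr * (h.T @ d_logit)
        b2 -= lr * d_logit.sum()
        W1 -= lr * (X.T @ d_h)
        b1 -= lr * d_h.sum(axis=0)
    return W1, b1, W2, b2


def error_rate(params, X, y):
    """Fraction misclassified at a 0.5 probability threshold."""
    return float(np.mean((forward(params, X)[1] >= 0.5) != (y == 1)))


# Toy data, not admissions data: label = presence of vocabulary word 3.
rng = np.random.default_rng(1)
X = rng.integers(0, 2, size=(200, 13)).astype(float)
y = X[:, 3].copy()

train_set, cv_set, test_set = split_90_5_5(X, y)
params = train(*train_set)
print(len(train_set[0]), len(cv_set[0]), len(test_set[0]))   # 180 10 10
print("cv error:", error_rate(params, *cv_set))
print("test error:", error_rate(params, *test_set))
```

In the README's workflow, the cross-validation error would guide choices such as the hidden-layer size, and the test error would be checked only once those choices are fixed.
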

User Profile

Once the neural network is satisfactorily trained, the user's profile is added to profile.txt. This file undergoes the steps described in Method above, and the probability that it is a positive (admitted) profile is returned.
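
One detail worth making explicit: the trained network expects vectors of the training vocabulary's length, so a sketch of scoring a user profile assumes it is encoded against that existing vocabulary rather than a rebuilt one. Words the training samples never contained then have no position and are dropped. The function name and the user profile (including the word 'python') are hypothetical.

```python
def encode_user_profile(simplified: list[str], vocabulary: list[str]):
    """Encode against the training vocabulary; also report words that get dropped."""
    words = set(simplified)
    vector = [1 if word in words else 0 for word in vocabulary]
    unseen = sorted(words.difference(vocabulary))  # no position for these words
    return vector, unseen


vocabulary = ['abc', 'aim', 'become', 'computer', 'enjoy', 'have', 'learning',
              'machine', 'science', 'scientist', 'study', 'university', 'xyz']
user = ['computer', 'enjoy', 'python', 'study']

vector, unseen = encode_user_profile(user, vocabulary)
print(vector)   # [0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0]
print(unseen)   # ['python']
```

The vector would then be fed to the trained model, whose sigmoid output is the returned probability.
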

Folders and files

- LICENSE
- README.md
- clean_file.txt
- neg_samples.txt
- pos_samples.txt
- profile.txt
- profile_classifier.py
- test.txt
- test_vectors.txt
- vector_file.txt

rkshthrmsh/student_profile_classifier
