classify_populations

PCA-based population inference, using PLINK for variant extraction and R for classification, with the 1000 Genomes Project as the reference.

Folder Contents

run_population_classifier.sh – Main shell script that orchestrates the entire pipeline
prepare_pcs.sh – Script to prepare principal components
classify.R – R script to classify populations based on PCA results and save plots
random_forest_model.RData – Pre-trained random forest model for population inference
KGP_0.3.prune.in – Reference variants file
KGP_pca.acount – Reference allele frequency file
KGP_pca.eigenvec.allele – Reference PCA eigenvectors
KGP_pca.eigenval – Reference PCA eigenvalues

How to Run

Step 1: Unzip the folder
Unzip the folder to any directory on your system. The pipeline will run in the unzipped directory, so no additional configuration is required.

unzip run_classifier.zip

cd run_classifier
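After unzipping, it may be worth confirming that nothing listed under Folder Contents is missing. The sketch below is not part of the pipeline: `check_folder_contents` is a hypothetical helper, and the file names are copied from the list above.

```shell
# Print the name of every expected file that is absent from a directory.
# Returns 0 when all files are present and 1 otherwise.
# check_folder_contents is a hypothetical helper, not shipped with the pipeline.
check_folder_contents() {
    cfc_dir="${1:-.}"
    cfc_status=0
    for cfc_file in run_population_classifier.sh prepare_pcs.sh classify.R \
                    random_forest_model.RData KGP_0.3.prune.in KGP_pca.acount \
                    KGP_pca.eigenvec.allele KGP_pca.eigenval; do
        if [ ! -e "$cfc_dir/$cfc_file" ]; then
            echo "$cfc_file"
            cfc_status=1
        fi
    done
    return "$cfc_status"
}
```

Calling `check_folder_contents .` from inside the unzipped directory prints nothing when the folder is complete.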

Step 2: Edit run_population_classifier.sh
First, edit the Slurm settings to match your cluster. Then provide the study name and the location of your input files by editing lines 21 and 22 of run_population_classifier.sh.

The input data must be in PLINK binary format (.bed, .bim, .fam).
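Before submitting, a quick check that all three files of the PLINK binary fileset exist can save a failed job. This is a minimal sketch: `check_plink_fileset` is a hypothetical helper, and it assumes the input is given as a path prefix without extension, which may not match how run_population_classifier.sh expects it.

```shell
# Verify that <prefix>.bed, <prefix>.bim and <prefix>.fam all exist and are non-empty.
# Assumption: the input is identified by a shared path prefix (e.g. /data/mystudy).
# check_plink_fileset is a hypothetical helper, not part of the pipeline.
check_plink_fileset() {
    cpf_prefix="$1"
    cpf_status=0
    for cpf_ext in bed bim fam; do
        if [ ! -s "$cpf_prefix.$cpf_ext" ]; then
            echo "missing or empty: $cpf_prefix.$cpf_ext" >&2
            cpf_status=1
        fi
    done
    return "$cpf_status"
}
```

For example, `check_plink_fileset /path/to/mystudy` (an illustrative path) returns non-zero and names the problem file if any of the three is absent.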

Step 3: Run the pipeline
Submit the pipeline using the run_population_classifier.sh script. Ensure you have the necessary permissions to run the script (chmod +x run_population_classifier.sh).

sbatch run_population_classifier.sh

If you are not submitting to Slurm, run the script directly with bash run_population_classifier.sh, or edit it to suit your system's job scheduler.

This will run the entire pipeline, starting with preparing PCs from your data and then classifying populations using the pre-trained model.
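Since the pipeline relies on PLINK and R, it can help to confirm that both are reachable from the shell that runs the job. The sketch below uses a hypothetical helper, `require_tools`. The executable names passed to it are assumptions: the PLINK binary may be called `plink` or `plink2` on your system, and on many clusters the tools only appear on PATH after loading a module.

```shell
# Report every named command that cannot be found on PATH.
# Returns 0 when all commands are available and 1 otherwise.
# require_tools is a hypothetical helper, not part of the pipeline.
require_tools() {
    rt_status=0
    for rt_tool in "$@"; do
        if ! command -v "$rt_tool" >/dev/null 2>&1; then
            echo "not found on PATH: $rt_tool" >&2
            rt_status=1
        fi
    done
    return "$rt_status"
}
```

A typical call would be `require_tools plink2 Rscript`, adjusted to whichever executable names your installation provides.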

Output

Population group classifications will be saved in .tsv files (e.g. EUR.tsv, AFR.tsv).
Population plots will be saved as .png images (e.g. prob0.5.png).
All output files will be saved in the same directory as the scripts.
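Once the run finishes, the classification files can be summarised at a glance. The following sketch lists each .tsv file with its raw line count. `summarise_tsv_outputs` is a hypothetical helper, and whether the files carry a header line is not stated here, so the counts may include one.

```shell
# Print "<file name><TAB><line count>" for every .tsv file in a directory.
# The count is the raw number of lines, including any header line.
# summarise_tsv_outputs is a hypothetical helper, not part of the pipeline.
summarise_tsv_outputs() {
    sto_dir="${1:-.}"
    for sto_file in "$sto_dir"/*.tsv; do
        # If no .tsv files exist the glob stays unexpanded, so skip it.
        [ -e "$sto_file" ] || continue
        sto_lines=$(wc -l < "$sto_file" | tr -d ' ')
        printf '%s\t%s\n' "$(basename "$sto_file")" "$sto_lines"
    done
}
```

Running `summarise_tsv_outputs .` in the script directory prints one row per population file.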

Contact

If you encounter any issues or have questions, feel free to contact Ritah via [email protected].

Best wishes!
