This is the repo about Exploratory Data Analysis for employee salary in a company. Furthermore, the salary data breakdown is specifically made for Income Tax report preparation. Some features has been modified to protect the confidentials.
Language: R (code is written in Rmd file)
Libraries:
- tidyverse
- dplyr
- readxl
- scales
- descr
The dataset columns is in Indonesian, and I'm not changing column names. So here's the rough translation:
Masa Perolehan Awal
= Employee join monthMasa Perolehan Akhir
= Employee exit monthNIP
= Employee identification numberJenis Kelamin
= GenderStatus PTKP
= Marriage status: K = Married, TK = Not MarriedJumlah Tanggungan
= Amount of dependenceGaji Pokok dan Tunjangan Tetap
= Net Salary and Fixed AllowanceTunjangan lain (Variabel)
= Other AllowancesJKK & JKM & BPJS Kesehatan
= Health InsuranceTHR dan Bonus
= Bonus / Holiday AllowanceTunjangan PPh
= Tax AllowanceJumlah Penghasilan Bruto
= Gross Income
- Check whether any duplicate data based on NIP.
- Check Turnover rate based on the amount of employee out before end of year.
- Create new column
PTKP
based onStatus PTKP
andJumlah Tanggungan
group - Create new column
PTKP_to_Bruto
, to make conditional output: whether the Gross Income is over or under PTKP amount. - Single, Married, or Divorced? Check the marital status based on
Status PTKP
andJumlah Tanggungan
condition. - The Barchelors!!! Find the number for those who is still single, and make income over IDR 100million a year.
- Check correlation between gender and salary, using Dot Plot visualization.
- Lastly, check correlation between Gender and Marriage Status, using CrossTable analysis.
PTKP
is the maximum amount of income that will not be deducted for tax.- In actual, calculation for Income Tax report is more complicated. Here we did some simplification for the practice sake.