This project involves analyzing transaction data to understand sales trends over several years. It calculates annual sales, revenue up to specific product releases, and growth metrics. The analysis is performed using R, with a focus on data manipulation and aggregation tasks.
The project relies on several R packages for data manipulation, file reading, and environment management:
- dplyr
- tidyverse
- wdman
- netstat
- xml2
- purrr
- readr
- usethis
- dotenv
- here
- readxl
- stringr
The script checks for these packages at startup, installs any that are missing, and then loads them all.
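The check-install-load step could look roughly like the sketch below. The helper names `find_missing` and `ensure_packages` are hypothetical, and the CRAN mirror URL is an assumption; the actual script may be organized differently.

```r
# Hypothetical helper: return the packages in `pkgs` that are not installed.
find_missing <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

# Hypothetical helper: install whatever is missing, then attach every package.
# The default `repos` value is an assumed CRAN mirror; change it as needed.
ensure_packages <- function(pkgs, repos = "https://cloud.r-project.org") {
  missing <- find_missing(pkgs)
  if (length(missing) > 0) {
    install.packages(missing, repos = repos)
  }
  invisible(lapply(pkgs, library, character.only = TRUE))
}

# Example call (left commented out because it may trigger installation):
# ensure_packages(c("dplyr", "readr", "stringr", "here"))
```

Passing `repos` explicitly avoids the mirror prompt that `install.packages()` can raise in non-interactive sessions.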
The data is initially stored in CSV files named according to the year of the transactions they contain. The script sets the working directory, reads the transaction data from these files, and combines them into a single dataset.
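A minimal sketch of the read-and-combine step follows. It assumes files named like `2021.csv` inside a single folder, which is a guess at the naming scheme, and it uses base R rather than `readr`/`dplyr` to stay dependency-free. The function name `read_transactions` is hypothetical, and it takes the folder as an argument instead of changing the working directory.

```r
# Hypothetical helper: read one CSV per year from `data_dir` and stack them.
# Assumes each file is named "<year>.csv" and that all files share the same columns.
read_transactions <- function(data_dir, years) {
  paths <- file.path(data_dir, paste0(years, ".csv"))

  # Fail early with a clear message if an expected file is absent.
  absent <- paths[!file.exists(paths)]
  if (length(absent) > 0) {
    stop("Missing input files: ", paste(absent, collapse = ", "))
  }

  yearly <- lapply(paths, read.csv, stringsAsFactors = FALSE)
  do.call(rbind, yearly)
}
```

With the tidyverse loaded, the same idea can be expressed by mapping `readr::read_csv()` over the paths and combining the results with `dplyr::bind_rows()`.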
Key steps in preparing the data include:
- Reading multiple CSV files for different years.
- Binding these files into a single dataset.
- Converting the `Date` column to the Date format.
- Extracting the year from the `Date` column.
- Cleaning and converting the `Gross.Sales` column from a character to a numeric format, ensuring all currency values are correctly formatted for analysis.
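The preparation steps above can be sketched in base R as follows. The function name `clean_transactions`, the new `Year` column name, and the default date format are assumptions; the real files may store dates differently, and values such as parenthesized negatives are not handled here.

```r
# Hypothetical helper: parse dates, derive the year, and make Gross.Sales numeric.
# `date_format` is an assumption about how dates are written in the CSV files.
clean_transactions <- function(tx, date_format = "%Y-%m-%d") {
  # Convert the Date column from text to the Date class.
  tx$Date <- as.Date(tx$Date, format = date_format)

  # Extract the calendar year as an integer.
  tx$Year <- as.integer(format(tx$Date, "%Y"))

  # Strip dollar signs and thousands separators, then convert to numeric.
  tx$Gross.Sales <- as.numeric(gsub("[$,]", "", tx$Gross.Sales))

  tx
}
```

Any value that still fails to parse becomes `NA` with a warning, which makes leftover formatting problems easy to spot.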
The analysis consists of several parts:
- Aggregate Gross Sales by Year: Calculating total sales for each year in the dataset.
- Revenue Calculation for Specific Periods:
  - Revenue up to the release of "Ramses in Wonderland" and "The Lost Tar Heel".
  - Room-over-Room growth calculations for these periods.
- Displaying Results: The results are formatted to display revenue and growth metrics clearly, with values rounded and presented in a user-friendly format.
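The analysis steps might be sketched as below, assuming a cleaned data frame with `Date`, `Year`, and numeric `Gross.Sales` columns. All function names are hypothetical. The release dates are not given here, so `cutoff` must be supplied by the caller. Treating "up to the release" as inclusive of the release day, and computing growth as the percent change between two revenue figures, are both assumptions about what the script does.

```r
# Total gross sales per year.
sales_by_year <- function(tx) {
  aggregate(Gross.Sales ~ Year, data = tx, FUN = sum)
}

# Revenue from the start of the data through `cutoff` (inclusive, by assumption).
revenue_until <- function(tx, cutoff) {
  sum(tx$Gross.Sales[tx$Date <= as.Date(cutoff)], na.rm = TRUE)
}

# Percent change from `previous` to `current` (assumed growth definition).
growth_pct <- function(previous, current) {
  (current - previous) / previous * 100
}

# Format a number as dollars with two decimals and thousands separators.
format_currency <- function(x) {
  paste0("$", formatC(x, format = "f", digits = 2, big.mark = ","))
}
```

For example, `growth_pct(revenue_until(tx, first_release), revenue_until(tx, second_release))` would give the growth between the two periods, where `first_release` and `second_release` are placeholders for the actual release dates.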