Profili_Consumo.Rhistory

1.	# Calculation for each measurement (t) the period (number of months) that separates it from the previous one (t-1, obviously only starting from the second measurement in temporal order). Calculation for each measurement the total consumption of the period that separates it from the previous one (also in this case starting from the second measurement in temporal order) is included in the code


# Assuming your data frame is called dati_ut
library(dplyr)
library(tidyr)
library(lubridate)

# Convert timestamp column to Date format
dati_ut$data_misura <- as.Date(dati_ut$data_misura, format = "%Y-%m-%d")

# 1. Calculate the time span (number of months) between consecutive measurements
dati_ut <- dati_ut %>%
  arrange(data_misura) %>%
  mutate(time_span_months = c(NA, as.numeric(diff(data_misura))),
         consumption_diff = c(NA, diff(valore_misura)))

consumption_diff is estimated as the difference in valore_misura between consecutive measurements to ensure we're calculating the consumption changes correctly.

2)	Calculation of average monthly consumption for all the months included in the period from t-1 to t;


# 3. Calculate the average monthly consumption for each period and each id_utente_ges
average_monthly_consumption <- dati_ut %>%
  group_by(id_utente_ges, data_misura) %>%
  summarize(average_monthly_consumption = mean(consumption_diff, na.rm = TRUE))

3) Creation of observation for each month included in the period from t-1 to t for which the following is indicated: the average consumption calculated in step 3; the month of consumption; the year of consumption; the handler code (id_gestore); the user identification code (id_utente_ges). When creating observations for each month, we join with average_monthly_consumption to get the correct average consumption for each month. 
# 3. Create observations for each month in the period from t-1 to t
observation_data <- dati_ut %>%
  complete(data_misura = seq(min(data_misura), max(data_misura), by = "month")) %>%
  left_join(average_monthly_consumption, by = c("data_misura", "id_utente_ges"))

Once the monthly consumption database has been created for each water meter reading, the following variables contained in the user database are added to each of these records (using the id_user_ges variable as the linkage):
tipo_uso ( end user typology, 1 domestic resident) x ("x" coordinates - point that originated the interruption) y ( "y" coordinates - point that originated the interruption) epsg (coordinate reference system) id_rete (unique code of the network to which the meter refers) note (Record annotations) pro_com (code of province given by the concentration of the province code and the three-digit municipality code) commune (name of municipality) 


# Merge with other relevant data (assuming vol_ut contains additional information)
merged_data <- observation_data %>%
  left_join(
    vol_ut %>%
      select(
        id_utente_ges,
        tipo_uso,
        x,
        y,
        epsg,
        id_rete,
        note,
        pro_com,
        comune
      ),
    by = "id_utente_ges"
  )

# View the resulting data frame
View(merged_data)

In this code:

•	Finally, the resulting data is merged with additional relevant data from vol_ut.

month and month_year columns can be created from the data_misura column using the mutate() function from the dplyr package. This code adds two new columns, month and month_year, to the merged_data table. The month column contains the month number extracted from the data_misura column, and the month_year column contains the year and month in the format YYYY-MM.
library(dplyr)
library(lubridate)

merged_data <- merged_data %>%
  mutate(
    month = month(data_misura),
    month_year = format(data_misura, "%Y-%m")
  )


NaN values in the average_monthly_consumption column of the merged_data table can be replaced with 0 using the replace_na() function from the tidyr package. Here's how you can do it:
library(tidyr)

merged_data <- merged_data %>%
  replace_na(list(average_monthly_consumption = 0))

month and month_year columns can be created from the data_misura column using the mutate() function from the dplyr package for adding them to the merged_data table. The month column contains the month number extracted from the data_misura column, and the month_year column contains the year and month in the format YYYY-MM


library(dplyr)
library(lubridate)

merged_data <- merged_data %>%
  mutate(
    month = month(data_misura),
    month_year = format(data_misura, "%Y-%m")
  )

This code filters out rows where the average_monthly_consumption column has a value less than 0, effectively removing rows with negative values in that column.

library(dplyr)

# Filter rows with non-negative average_monthly_consumption
summarized_merged_data <- summarized_merged_data %>%
  filter(average_monthly_consumption >= 0)


The code, then, runs to perform the calculation of water consumption profiles (total values per month/year) classified by the municipality, by the handler type, and by type of user. Finally, the code is used to plot histograms in R studio corresponding to the classifications of the water consumption profiles. The steps are following:

a) This code summarizes all columns in merged_data, handling numeric columns with the sum() function and non-numeric columns by taking the first value encountered in each group. Code sums all columns for each unique value in the id_utente_ges column, handling both numeric and non-numeric columns. To sum all columns associated with the same id_utente_ges name, both numeric and non-numeric, group_by() function is followed by summarize_all(). The na.rm = TRUE argument ensures that any NA values are ignored during the summation process.


library(dplyr)

# Define a function to handle non-numeric columns
summarize_non_numeric <- function(x) {
    if(is.numeric(x)) {
        return(sum(x, na.rm = TRUE))
    } else {
        return(first(x))
    }
}

# Group by id_utente_ges and summarize all columns
summarized_merged_data <- merged_data %>%
    group_by(id_utente_ges) %>%
    summarize(across(everything(), summarize_non_numeric))


b) The group_by() and summarize() functions from the dplyr package are used to sum rows with the same value in the month_year column and comune column in the summarized_merged_data table. This code groups the data by month_year and comune, and then summarize numeric columns by summing while retaining the first value for non-numeric columns.


library(dplyr)

# Define a function to handle non-numeric columns
summarize_non_numeric <- function(x) {
    if(is.numeric(x)) {
        return(sum(x, na.rm = TRUE))
    } else {
        return(first(x))
    }
}

# Group by month_year and comune, then summarize numeric columns by summing
summarized_data <- summarized_merged_data %>%
  group_by(month_year, comune) %>%
  summarize(across(where(is.numeric), sum, na.rm = TRUE),
            across(where(is.character), ~first(.)))

# View the summarized data
View(summarized_data)

c) This code extract the average_monthly_consumption associated with each month_year for each comune in the dataset.

# Subset the summarized_data dataframe
result_sum <- summarized_data[641:688, ]

d) This code plots the average_monthly_consumption by month_year for each comune in the dataset.

# Plot the result_sum  dataframe
plot_comune <- ggplot(result_sum, aes(x = month_year, y = average_monthly_consumption, fill = comune)) +
  geom_bar(stat = "identity") +
  labs(title = "Water Consumption per Month/Year (Classified by Comune)",
       x = "Month/Year",
       y = "Total Consumption") +
  theme_minimal() +
  facet_wrap(~ comune, scales = "free_y", ncol = 2) +  # Split by comune, adjust ncol as needed
  theme(axis.text.x = element_text(angle = 45, hjust = 1))  # Rotate x-axis labels by 45 degrees

print(plot_comune)