05-results.Rmd

```{r, echo=FALSE}
library(readxl)
library(tidyverse)
library(patchwork)
```

# Results

## Overview of Employment Statistic

Before we dive into the questions about the factors influencing the employment, it would be better that we have an overview of overall civilian labor force in order to understand the data following more clearly.

```{r, echo=FALSE, fig.width=10}
overall_workforce_change <- read_excel("clean_data/overall_workforce_change.xlsx")

g1 <- ggplot(overall_workforce_change, aes(x = Year)) +
  geom_line(aes(y = `Civilian noninstitutional population`, color="Civilian Noninstitutional Population")) +
  geom_line(aes(y = `Civilian labor force_Total`, color="Civilian Labor Force")) +
  geom_line(aes(y = `Civilian labor force_Employed_Total`, color="Civilian Labor Force Employed")) +
  geom_line(aes(y = `Civilian labor force_Unemployed_Number`, color="Civilian Labor Force Unemployed")) +
  ggtitle("Civilian Noninstitutional Population and Labor Force") +
  labs (x = "Year", y = "Thousands of Persons")
g1
```

Here, Civilian Noninstitutional Population is defined as persons 16 years of age and older residing in the 50 states and the District of Columbia, who are not inmates of institutions (e.g., penal and mental facilities, homes for the aged), and who are not on active duty in the Armed Forces. Civilian Labor Force is defined as all persons in the civilian noninstitutional population classified as either employed or unemployed. The detailed definition of person's employment state can be found [here](https://webapps.dol.gov/dolfaq/go-dol-faq.asp?faqid=111&topicid=6).

As we can see, Civilian Noninstitutional Population keeps increasing over time, as same as Civilian Labor Force and Civilian Labor Force Employed.

```{r, echo=FALSE, fig.width=10}
g2 <- ggplot(overall_workforce_change, aes(x = Year)) +
  geom_line(aes(y = `Civilian labor force_Percent of population`, color="Civilian Labor Force")) +
  geom_line(aes(y = `Civilian labor force_Employed_Percent of population`, color="Civilian Labor Force Employed")) +
  geom_smooth(aes(y = `Civilian labor force_Percent of population`)) +
  geom_smooth(aes(y = `Civilian labor force_Employed_Percent of population`)) +
  ggtitle("Civilian Labor Force Percentage") +
  labs (x = "Year", y = "Percent")
g2
```

The percentage here is calculated by dividing population of Civilian Labor Force and Civilian Labor Force Employed with Civilian Noninstitutional Population. The first statistic also can be referred as Labor Force Participation Rate.

Both lines of percentage show a similar pattern. Despite of the increasing trend in overall population, the percentage of Civilian Labor Force and Civilian Labor Force Employed reach peak at 2000 and then decreased.

A detailed analysis of Labor Force Participation Rate can be find [here](https://www.investopedia.com/terms/p/participationrate.asp), which is not the topic we tend to discuss here. It should be noticed that Labor Force Participation Rate is influenced by social, demographic, and economic trends. Therefore, it's complicated to explain the change of this statistic over time.

```{r, echo=FALSE, fig.width=10}
g3 <- ggplot(overall_workforce_change, aes(x = Year, y = `Civilian labor force_Unemployed_Percent of labor force`)) +
  geom_line() +
  geom_smooth() +
  ggtitle("Unemployed Rate") +
  labs (x = "Year", y = "Unemployed Rate")
g3
```

The Unemployment Rate is defined by the ratio of unemployed to the civilian labor force expressed as a percent [i.e., 100 times (unemployed/labor force)]. Therefore, the Unemployment Rate indicates what degree of people who try but can not find a job.

As we can see, the Unemployment Rate changes rapidly over time and in 2020 it reached another peak again. Despite the constant change over time, the Unemployment Rate hovers around 6% in general.

## Discrimination in Employment

Now we want to find out whether it exists discrimination in employment in different industries in recent period.

### Sex Discrimination

First we examine the sex discrimination. 

```{r, echo=FALSE, fig.width=10, fig.height = 12}
men_vs_women_over_20_years <- read_excel("clean_data/question_01/men_vs_women_over_20_years.xlsx")
X2020 <- read_excel("clean_data/question_01/occupation_sex_race/detailed/2020.xlsx")

g4 <- ggplot(men_vs_women_over_20_years, aes(x = Year, y = Civilian_labor_force_Employed_Percent_of_population, color = Group)) +
  geom_line() +
  ggtitle("Civilian Labor Force Employed Percent of Population") +
  labs (x = "Year", y = "Percent")
g4_1 <- ggplot(men_vs_women_over_20_years, aes(x = Year, y = Civilian_labor_force_Percent_of_population, color = Group)) +
  geom_line() +
  ggtitle("Civilian Labor Force Percent of Population") +
  labs (x = "Year", y = "Percent")
g4_2 <- ggplot(men_vs_women_over_20_years, aes(x = Year, y = Civilian_noninstitutional_population, color = Group)) +
  geom_line() +
  ggtitle("Civilian Noninstitutional Population") +
  labs (x = "Year", y = "Thousands of Persons")
g4_3 <- ggplot(men_vs_women_over_20_years, aes(x = Year, y = Civilian_labor_force_Unemployed_Percent_of_labor_force, color = Group)) +
  geom_line() +
  ggtitle("Unemployed Rate") +
  labs (x = "Year", y = "Unemployed Rate")
g4_2 + g4_1 + g4 + g4_3 + plot_layout(ncol = 1)
```

From a whole review, we can see that women significantly had less Civilian Labor Force Percent and Labor Force Employed Percent of population, despite the increasing of Civilian Noninstitutional Population over time. It indicates that women are likely not to have a job or try to pursue a job compared with men, which shows discrimination in the overall employment. The graph also shows that the gap in Civilian Labor Force Percent between men and women has been narrowed in recent years, indicating the situation may be changing to a good way.

However, as the last graph suggests, the Unemployment Rate for men and women shows a very similar pattern. Actually, the Unemployment Rate for women always shows a little bit less than men. This shows that it's not more difficult for women to find a job if they want to in the overall picture of employment.

Next let's try to examine the sex discrimination in different occupation.

```{r, echo=FALSE, fig.width=10, fig.height=10}
rX2020 <-  X2020 %>% 
  filter(`Total Employed` > 1000) %>%
  arrange(desc(`Percent of total employed_women`))
g5 <- ggplot(rX2020, aes(x=fct_reorder(Occupation, `Percent of total employed_women`))) +
  geom_bar(aes(y=`Total Employed`), stat="identity", size=.1, color="black", alpha=.4) + 
  geom_point(aes(y=`Percent of total employed_women`/0.003), stat="identity", color="blue") +
  scale_y_continuous(
    # Features of the first axis
    name = "Thousands of Persons",
    # Add a second axis and specify its features
    sec.axis = sec_axis(~.*0.003, name="Percent")
  ) + 
  ggtitle("Total Employed and Women Percent (Sorted by Percent)") +
  xlab("Occupation") +
  coord_flip()
g5
```

Here we only display the occupations which have total employed population over 1MM, due to the large number of occupations. Data was collected in 2020, which shows the recent state of total employee and percent of women employee.

From the graph we can find that there is a strong sex preference in a lot of occupations. For example, secretary and administrative assistant have a very strong tendency to women, which have over 90% of women employee. While in occupations related to construction, the percent of women employee is pretty low.

Some discrimination in the graph may be explained reasonably. For example, construction laborers has a strong preference to men because the work requires strong physical labor and apparently it fits to men physically. Occupation of elementary and middle school teachers has a strong tendency to women, which may be resulted from the social common sense that women are more patient, considerate and obviously better at looking after children. However, such explains also indicate that the discrimination towards sex has deeply rooted at the culture and social common sense, which tell people what men should do and women should do.

Besides, there are some occupations with discrimination towards sex, which can not be explained easily, such as software developers, computer and mathematical occupations, which have a tendency to men; bookkeeping, accounting and auditing clerks have a strong tendency to women.

### Race Discrimination

Now we focus on the discrimination towards races.

```{r, echo=FALSE, fig.width=10, fig.height=10}
r2X2020 <-  X2020 %>% 
  filter(`Total Employed` > 1000) %>%
  arrange(desc(`Percent of total employed_Asian`))
g6 <- ggplot(rX2020, aes(x = fct_reorder(Occupation, `Percent of total employed_Asian`))) +
  geom_bar(aes(y=`Total Employed`), stat="identity", size=.1, alpha=.3) + 
  geom_point(aes(y = `Percent of total employed_Asian`/0.003, color="Asian")) +
  geom_point(aes(y = `Percent of total employed_White`/0.003, color="White")) +
  geom_point(aes(y = `Percent of total employed_Black or African American`/0.003, color="Black")) +
  geom_point(aes(y = `Percent of total employed_Hispanic or Latino`/0.003, color="Hispanic or Latino")) +
  scale_y_continuous(
    # Features of the first axis
    name = "Thousands of Persons",
    # Add a second axis and specify its features
    sec.axis = sec_axis(~.*0.003, name="Percent")
  ) + 
  ggtitle("Total Employed and Races Percent (Sorted by Asian Percent)") +
  labs (x = "Occupation", y = "Percent of Total Employed") +
  coord_flip()
g6
```

Again we only display the occupations which have total employed population over 1MM, due to the large number of occupations. Data was collected in 2020, which shows the recent state of total employee and percent of employee of different races.

We can find out that there is discrimination in different occupations, and the degree of it varies according to the occupation. Occupations related to computer science have a strong tendency to Asian compared with other occupations. If we read the graph carefully, we can find that Asian tend to have occupations with higher requirements of professional knowledge, especially related to field of science, engineering and finance.

We can also observe that percent of Asian employee has a negative correlation with percent of White and Hispanic or Latino employee, but doesn't seem to have a correlation with percent of Black or African American employee.

This phenomenon can be explained by saying that in the recent decades many Asian come to the U.S. to pursue a Master or PHD degree, who tend to stay in the U.S. and continue their career. Still, the discrimination exists.

## Change of Employment Situation

Now we want to answer a question: How does the employment situation change in the recent years? Is there any improvement about employment in different industries?

We can answer the first question according to the graph at first part. Civilian Noninstitutional Population keeps increasing over time, while the Labor Force Participation Rate reached peak at 2000 and then kept decreasing until now. The Unemployment Rate changes rapidly over time and in 2020 it reached another peak again. Despite of the constant change over time, the Unemployment Rate hovers around 6% in general.

The conclusions above suggest that on the whole, the employment situation doesn't improve, and may change rapidly due to external factors, such as COVID-19 and Great Recession.

Now we want to examine the employment situation in different industries in recent years.

```{r, echo=FALSE, fig.width=15, fig.height=10}
Total_unemployed_number <- read_excel("clean_data/question_02/Total_unemployed_number.xlsx")
Unemployment_rate_sex_occupation_ <- read_excel("clean_data/question_02/Unemployment_rate(sex_occupation).xlsx")

df1 <- Total_unemployed_number %>%
  pivot_longer(-c(Occupation, Group), names_to="Year", values_to="Population")
df1$Occupation <- substr(df1$Occupation, 1, nchar(df1$Occupation)-12)
ggplot(df1, aes(x=Year, y=Population, group=1, color="Red")) +
  geom_line(size = 1) +
  geom_point(size = 2) +
  facet_wrap( ~ Occupation, scale = 'free', ncol=4) +
  theme_bw() +
  labs(title = "Total Unemployment Population in 2016-2020", y = "Thousands of Persons") +
  theme(legend.position = "none",
        plot.title = element_text(hjust = .5))

```

The figure above displays the total unemployed population in recent 5 years in different occupations. Although different occupations have different degree of population, the trend looks consistent.

We can see that almost all occupations share a same pattern: the unemployed population decrease slowly from 2016 to 2019, and then rapidly increase in 2020. Occupation about arming, fishing and forestry seems to be an exception, which increased a little in 2019 but decreased in 2020.

The reason behind rapid increasing in 2020 is well known to everyone: COVID-19. It's good to see that before 2020 the unemployment population have an overall deceasing trend in most occupations, which suggests a good trend in the employment situation. 

```{r, echo=FALSE, fig.width=15, fig.height=10}
df2 <- Unemployment_rate_sex_occupation_ %>%
  pivot_longer(-c(Occupation, Group))

df2$Year <- substr(df2$name, 1, 4)
df2$Sex <- substr(df2$name, 6, nchar(df2$name))
df2 <- select(df2, -name)

df2 <- df2 %>%
  pivot_wider(names_from=Sex, values_from=value)

ggplot(df2, aes(x=Year, group=1))+
  geom_line(aes(y=Total, color="Total"), size = 1) +
  geom_point(aes(y=Total, color="Total"), size = 2) +
  geom_line(aes(y=Men, color="Men"), size = 1) +
  geom_point(aes(y=Men, color="Men"), size = 2) +
  geom_line(aes(y=Women, color="Women"), size = 1) +
  geom_point(aes(y=Women, color="Women"), size = 2) +
  facet_wrap( ~ Occupation, scale = 'free', ncol=4) +
  theme_bw() +
  labs(title = "Unemployment Rate in 2016-2020", y = "Percent") +
  theme(legend.position = "bottom",
        plot.title = element_text(hjust = .5))

```

The graph above display the Unemployed Rate of total, men and women.

The trend here is similar with the previous graph. Since we use the Unemployment Rate, we can now compare the employment situation between different occupations. For example, we can find that food preparation and serving related occupation suffered most in 2020 due to COVID-19 and people tend to stay at home instead of eating outside at restaurant. 

Besides, we can not find an obvious relation between Unemployment Rate of men and women, except they share a similar pattern due to external factors. According to the graph, the difference varies between different occupations.

## Stirctness of requirement in the industries and workers' type

### The strcitness based on industries

Background Knowledge:
Certifications are issued by a non-governmental certification body and convey that an individual has the knowledge or skill to perform a specific job.

A license is awarded by a government agency and conveys a legal authority to work in an occupation.

Licenses are harder to gain compare to the certificates.

Thus, to some extent, licenses are more strict than certificates.

```{r, echo=FALSE, fig.width=12, fig.height=8}

library(tidyverse)
library(readxl)
library(ggplot2)

df1 <- read_excel("data/clean_data/question_03/industry_license_or_certificate.xlsx", sheet = 1)
df2 <- read_excel("data/clean_data/question_03/wokers_category_license_or_certificate.xlsx", sheet = 1)

df_com <- 
  df1 %>% 
  pivot_longer(-c(Industry, Group))

df_com$Year <- substr(df_com$name, 1, 4)
df_com$Cat  <- substr(df_com$name, 6, nchar(df_com$name))

df_com %>% 
  filter(Cat == "with_a_certificate/license_total_percent") %>% 
ggplot(aes(Year, value, group = Cat, color = Cat))+
  geom_line(size = 1)+
  geom_point(size = 2)+
  facet_wrap( ~ Industry, scale = "free")+
  theme_bw()+
  labs(title = "With a certificate/license total percent in 2016-2020", y = "Percent (%)")+
  theme(legend.position = "none",
        plot.title = element_text(hjust = .5))
```
Based on the graphs above shows with a certificate/license total percent calculates the percentage of workers who has a certificate or licenses in the overall employed people in the industry, and here is divided in to several specific industries:

Based on the information, we divide the levels of the percentage(the strict level) as follows:
low: below 20%
medium: 20%-30%
high: above 30%

For Agriculture and related industries: 
It suddenly drop from 2016 to 2017, from 13% to 11.9%, while rise form 2017 to 2018 and keep going down from 2018 to 2020, from 12.6% to 12.1%. But its overall percentage is low compare to the other industry, and in the recent years, its require for a license or certificate is decreasing

For Construction industry: 
It almost keep going done from 2016 to 2020, more specifically, from 2018 to 2020, it has almost no changes. It shows the requirement to enter this industry is becoming less strict, but its strict level is medium.

For the Education and health services:
It goes done from 2016 to 2017, and keep almost the same from 2017 to 2019 while in 2020, it rises, from 46% to 47.4%. This means in the recent years, this industry is becoming more strict, and at the same time, this industry has high level of strictness.

For the Financial activities:
It keeps going done from 2016 to 2019 and rises in 2020, which returns to the value in 2017. Thus, although having some fluctuation, this industry has the sign of keeping the similar strict level. The strict level of this industry is medium based on its values.

For the Information industry:
It keeps going done from 2016 to 2019, and rise from 2019 to 2020 but the value is lower than in 2016. It actually become less strict but has the sign of becoming more strict. Its level of strictness is low.

For the Leisure and hospitality:
It keeps going done form 2016 to 2020, thus it would be reasonably to predict it would keep going done in the future years. Its strict level is low.

For the Manufacturing industry:
It decrease from 2016 to 2017 but keep going up from 2017 to 2020 with little fluctuation, thus we predict it would keep rising in the future years. The strict level is low.

For the Mining , quarrying, oil and gas extraction industry:
It drops a lot from 2016 to 2017 but rises from 2017 to 2018 and keeps the same from 2018 to 2019 then drops from 2019 to 2020. Thus, it would be reasonable to assume it would goes done in the future. Its struct level is medium.

For the Other services:
It rises a bit from 2016 to 2017 and keep dropping fro 2017 to 2019 and rises from 2019 to 2020 in a relatively large amount. Thus we would predict it would become more strict in the near future. The level of strictness is medium.

For the Professional and business services:
It drops from 2016 to 2017 and keeps fluctuating from 2017 to 2020. In this sense, it will have the sign of fluctuating in the near future. The strict level is medium.

For the Public administration industry:
It rises a bit from 2016 to 2017 but drops more from 2017 to 2018 and has small changes from 2018 to 2020. From this pattern, it is reasonable to assume it would keep total percentage as around 30% in the near future which is a medium value.

For the Retail Trade:
It keeps dropping from 2016 to 2018 and rises from 2019 to 2020 with speed slowing done. Thus, It is reasonably predict it would rise a bit in the near future.

For the Transportation and utilities:
It goes done from 2016 to 2018, dropping with a large amount and rises a little from 2018 to 2019 and drops a little from 2019 to 2020. So it is more likely to keep in this data range in the near future, around 21% to 22%. It strict level is medium.

For Wholesale trade:
It keeps going done from 2016 to 2018 and rise a lot from 2018 to 2019 and then drops a lot from 2019 to 2020. Thus, it is reasonably to predict it would goes done in the near future. Its level of strictness is low.

Based on the analysis:
The LOW strict industries are: Agriculture and related industries; Information; Leisure and hospitality; Manufacturing; Retail trade; Wholesale trade

The MEDIUM strict industries are: Construction; Financial activities; Mining, quarrying, and oil and gas extraction; Other Services; Professional and business services; Transportation and utilities; 

The HIGH strict industries are: Education and health services; Public administration; 

```{r, echo=FALSE, fig.width=12, fig.height=8}
df_com %>% 
  filter(Cat == "with_a_certificate_no_license_percent") %>% 
  ggplot(aes(Year, value, group = Cat, color = Cat))+
  geom_line(size = 1)+
  geom_point(size = 2)+
  facet_wrap( ~ Industry, scale = "free")+
  theme_bw()+
  labs(title = "With a certificate no license percent in 2016-2020", y = "Percent (%)")+
  theme(legend.position = "none",
        plot.title = element_text(hjust = .5))
```
The above graphs is to show the percentage of having a certificate but no licenses:

Almost each industries keeps the same pattern as the overall patterns for license and certificate except Education and health services(only having certificate goes done from 2019 to 2020), Leisure and hospitality(only having certificate has rising from 2017 to 2018 instead of keeping dropping) and Retail trade(only having certificate drops from 2019 to 2020)

Besides, the percent of only having a certificate is low in all industries, all belows 5% which means more people would prefer actually to gain a license. 


```{r, echo=FALSE, fig.width=12, fig.height=8}
df_com %>% 
  filter(Cat == "with_a_license") %>% 
  ggplot(aes(Year, value, group = Cat, color = Cat))+
  geom_line(size = 1)+
  geom_point(size = 2)+
  facet_wrap( ~ Industry, scale = "free")+
  theme_bw()+
  labs(title = "With a license percent in 2016-2020", y = "Percent (%)")+
  theme(legend.position = "none",
        plot.title = element_text(hjust = .5))
```
The above graphs is to show the percentage of having a license(a people have a license may also have a certificate):

All of the industries keep the same pattern which is reasonable since having a license weight more in overall having a certificate or license


```{r, echo=FALSE, fig.width=12, fig.height=8}
df_com %>% 
  filter(Cat == "without_a_license_percent") %>% 
  ggplot(aes(Year, value, group = Cat, color = Cat))+
  geom_line(size = 1)+
  geom_point(size = 2)+
  facet_wrap( ~ Industry, scale = "free")+
  theme_bw()+
  labs(title = "Without a license percent percent in 2016-2020", y = "Percent (%)")+
  theme(legend.position = "none",
        plot.title = element_text(hjust = .5))
```
The above graphs shows the employed people who have not a licenses:

From the graphs and values, we could see that most people would not have license or certificate, the lowest industry, Education and health service, even above 50%. 

Considering the dividence of work in each industry, having a high value of license or certificate seems not reasonable but the patterns we evaluate above could still offer useful information for people to choose industries to work in.


```{r, echo=FALSE, fig.width=12, fig.height=8}
df_com %>% 
  filter(Cat != "Total_Employed") %>% 
ggplot(aes(Cat, value, fill = Cat))+
  geom_boxplot(width = .5)+
  geom_jitter(width = .1, alpha = .5)+
  coord_flip()+
  facet_wrap(~ Year)+
  labs(x = "", y = "Employed Percent by Industry Category")+
  theme_bw()+
  theme(legend.position = "none")
```
The above graphs shows the distribution in the industries:

After comparing the graphs in these 5 years, we could see that that difference is little. And we could conclude from here that, the minimum requirement of each industry would keep the similar in the near future, without big fluctuation. And when people consider to choose to work in some industry, considering the license or certificate percentage would be useful for planning the overall career goal. 

### The strictness based on wokers' type

```{r, echo=FALSE, fig.width=10, fig.height=6}
df_trans <- 
  df2 %>% 
  pivot_longer(-c(Workers_category, Group))

df_trans$Year <- substr(df_trans$name, 1, 4)
df_trans$Cat  <- substr(df_trans$name, 6, nchar(df_trans$name))

df_trans %>% 
  filter(Cat == "with_a_certificate/license_total_percent") %>% 
  ggplot(aes(Year, value, group = Cat, color = Cat))+
  geom_line(size = 1)+
  geom_point(size = 2)+
  facet_wrap( ~ Workers_category, scale = "free")+
  theme_bw()+
  labs(title = "With a certificate/license total percent in 2016-2020", y = "Percent (%)")+
  theme(legend.position = "none",
        plot.title = element_text(hjust = .5))
```
From this point, we look at the data in a different angle, by looking at the workers categories, whether previate industries, Federal, State, Local or Self_employment:

From the above graphs, keep using the levels for strictness defined above:
The LOW strict level: None
The MEDIUM strict level: Federal; Private Industries; Self-employed workers, unincorporated; 
The HIGH strict level: Local; State

Thus, the Local and state have high license or certificate requirement.

Detailed analysis of the above graphs:

For the Federal:
It is in the medium level. It keeps going done from 2016 to 2020 except from 2018 to 2019 rising a little. Thus, we could predict it would keeps going done in the near future.

For the Local: 
It is in the high level. It drops from 2016 to 2018 while rising from 2018 to 2020, back to the similar level. Therefore, it has rising trend in the near future.

For the State:
It is in the high level. It drops from 2016 to 2019 and rise relatively large from 2019 to 2020. Thus, it has the rising trend in the near future.

For the Private industries:
It is in the medium level. It drops from 2016 to 2018 and rises from 2018 to 2020, even exceed the values in 2016. Therefore, it is reasonable to predict it would rise in the near future.

For the Self_employed workers, unincorporated:
It is in the medium level. It drops a lot from 2016 to 2017 while rises a bit in from 2017 to 2018. However, it drops again from 2018 to 2019 but rises a lot from 2019 to 2020, back to the similar values in 2016. Thues, it is reasonable to predict it would rise in the near future.

Based on the analysis above, the government jobs, federal, state and local would be more strict. Besides, four out of five has the pattern of rising beginning either in 2018 or 2019. Thus, it would be reasonable to expect rising in local , private insutries, self-employed and state. 


```{r, echo=FALSE, fig.width=10, fig.height=6}
df_trans %>% 
  filter(Cat == "with_a_certificate_no_license_percent") %>% 
  ggplot(aes(Year, value, group = Cat, color = Cat))+
  geom_line(size = 1)+
  geom_point(size = 2)+
  facet_wrap( ~ Workers_category   , scale = "free")+
  theme_bw()+
  labs(title = "With a certificate no license percent in 2016-2020", y = "Percent (%)")+
  theme(legend.position = "none",
        plot.title = element_text(hjust = .5))
```
The above graphs shows the worker category with a certificate but no license:

Most of them have the similar pattern except the self-employment workers which rises at first then drop and then rise instead of drop, keep the same and rise.

And from these graphs, we could also see that the percentage of only have a certificate without a license is relatively low which could indicate that prople would prefer a license or a license would mainly useful but a certificate is not that useful


```{r, echo=FALSE, fig.width=10, fig.height=6}
df_trans %>% 
  filter(Cat == "with_a_license") %>% 
  ggplot(aes(Year, value, group = Cat, color = Cat))+
  geom_line(size = 1)+
  geom_point(size = 2)+
  facet_wrap( ~ Workers_category   , scale = "free")+
  theme_bw()+
  labs(title = "With a license percent in 2016-2020", y = "Percent (%)")+
  theme(legend.position = "none",
        plot.title = element_text(hjust = .5))
```
The above graphs shows the percentage of having a license, but these people may also have a certificate:

All of these five have the similar pattern as the overall percentage graphs, which also indicates that having a license is dominent in having a license or certificate.


```{r, echo=FALSE, fig.width=10, fig.height=6}
df_trans %>% 
  filter(Cat == "without_a_license_percent") %>% 
  ggplot(aes(Year, value, group = Cat, color = Cat))+
  geom_line(size = 1)+
  geom_point(size = 2)+
  facet_wrap( ~ Workers_category   , scale = "free")+
  theme_bw()+
  labs(title = "Without a license percent percent in 2016-2020", y = "Percent (%)")+
  theme(legend.position = "none",
        plot.title = element_text(hjust = .5))
```
The above graphs shoes the percentage of not having a license or certificate:

From these graphs we could also see that the percentage of not having a license or certificate is still large, which is consistent with the distribution of workers in each industry.

```{r, echo=FALSE, fig.width=12, fig.height=8}
df_trans %>% 
  filter(Cat != "Total_Employed") %>% 
  ggplot(aes(Cat, value, fill = Cat))+
  geom_boxplot(width = .5)+
  geom_jitter(width = .1, alpha = .5)+
  coord_flip()+
  facet_wrap(~ Year)+
  labs(x = "", y = "Employed Percent by Industry Category")+
  theme_bw()+
  theme(legend.position = "none")
```
The above graphs shows the distribution of the workers' categories:

From these graphs, from 2016 to 2020, all keep the similar pattern, which could tell us that the requirment of license or certificate is not an accident and would keep the similar in the near future


```{r, echo=FALSE, fig.width=12, fig.height=8}
library(inlmisc)
color <- GetColors(scheme = "sunset", 10)


df_com %>% 
  filter(Cat != "Total_Employed") %>% 
  mutate(short = substr(Industry, 1, 4)) %>% 
  ggplot(aes(short, Cat, fill = value)) +
  geom_tile()+
  scale_fill_gradientn(colors = color, name = "Percent")+
  coord_cartesian(expand = F)+
  facet_wrap(~ Year)+
  labs(x = "Industry", y = "")+
  theme(legend.position = c(.8, .2),
        legend.direction = "horizontal",
        axis.text.x  = element_text(angle = 45, hjust = 1, vjust = 1))
```
The above graphs give a more fancy way to show the data of percentage of license or certificate in each industry, the darker red it is means more percentage, could be over 75%, the darker blue it is means less percentage, could be less than 25%


```{r, echo=FALSE, fig.width=12, fig.height=8}
df_trans %>% 
  filter(Cat != "Total_Employed") %>% 
  mutate(short = substr(Workers_category, 1, 4)) %>% 
  ggplot(aes(short, Cat, fill = value)) +
  geom_tile()+
  scale_fill_gradientn(colors = color, name = "Percent")+
  coord_cartesian(expand = F)+
  facet_wrap(~ Year)+
  labs(x = "Workers category", y = "")+
  theme(legend.position = c(.8, .2),
        # legend.direction = "horizontal",
        axis.text.x  = element_text(angle = 45, hjust = 1, vjust = 1))
```
Also the above graphs give a more fancy way to show the percentage of license or certificate based on the workers' category, the darker the red is means more percentage, the darker the blue is means less percentage.


Conclusion:

Based on the analysis above, some industries would become more strict in the near future based on their current level of strictness: Education and health services; Manufacturing; Other services; Retail Trade; and the government related work categories would expect to become more strict.