In the output snippet below, from the Looking at Data lesson, the lesson text says that summary() truncated Active_Growth_Period by adding a catch-all category called 'Other', and makes other references (for example, per-level counts for Duration) that do not appear in the actual output. In this case every categorical column has been read in with class "character", so summary() reports only Length, Class, and Mode for those columns.
| You are doing so well!
|======================================================== | 64%
| After previewing the top and bottom of the data, you probably noticed lots of NAs, which are
| R's placeholders for missing values. Use summary(plants) to get a better feel for how each
| variable is distributed and how much of the dataset is missing.
> summary(plants)
Scientific_Name    Duration           Active_Growth_Period  Foliage_Color      pH_Min
Length:5166        Length:5166        Length:5166           Length:5166        Min.   :3.000
Class :character   Class :character   Class :character      Class :character   1st Qu.:4.500
Mode  :character   Mode  :character   Mode  :character      Mode  :character   Median :5.000
                                                                               Mean   :4.997
                                                                               3rd Qu.:5.500
                                                                               Max.   :7.000
                                                                               NA's   :4327
pH_Max           Precip_Min      Precip_Max       Shade_Tolerance    Temp_Min_F
Min.   : 5.100   Min.   : 4.00   Min.   : 16.00   Length:5166        Min.   :-79.00
1st Qu.: 7.000   1st Qu.:16.75   1st Qu.: 55.00   Class :character   1st Qu.:-38.00
Median : 7.300   Median :28.00   Median : 60.00   Mode  :character   Median :-33.00
Mean   : 7.344   Mean   :25.57   Mean   : 58.73                      Mean   :-22.53
3rd Qu.: 7.800   3rd Qu.:32.00   3rd Qu.: 60.00                      3rd Qu.:-18.00
Max.   :10.000   Max.   :60.00   Max.   :200.00                      Max.   : 52.00
NA's   :4327     NA's   :4338    NA's   :4338                        NA's   :4328
| Keep up the great work!
|============================================================ | 68%
| summary() provides different output for each variable, depending on its class. For numeric data
| such as Precip_Min, summary() displays the minimum, 1st quartile, median, mean, 3rd quartile,
| and maximum. These values help us understand how the data are distributed.
...
|=============================================================== | 72%
| For categorical variables (called 'factor' variables in R), summary() displays the number of
| times each value (or 'level') occurs in the data. For example, each value of Scientific_Name
| only appears once, since it is unique to a specific plant. In contrast, the summary for
| Duration (also a factor variable) tells us that our dataset contains 3031 Perennial plants, 682
| Annual plants, etc.
...
|=================================================================== | 76%
| You can see that R truncated the summary for Active_Growth_Period by including a catch-all
| category called 'Other'. Since it is a categorical/factor variable, we can see how many times
| each value actually occurs in the data with table(plants$Active_Growth_Period).
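The mismatch can be reproduced outside the lesson. The sketch below uses a small made-up data frame (`toy` and its values are hypothetical, not the course's plants data) and assumes the columns arrive as character vectors, which is what data.frame() and read.csv() produce by default since R 4.0.0. It shows summary() on a character column, then on the same column converted to a factor, which is the form the lesson text describes.

```r
# Hypothetical stand-in for the course's `plants` data frame.
toy <- data.frame(
  Duration = c("Perennial", "Annual", "Perennial", NA),
  pH_Min = c(5.0, NA, 4.5, 6.0),
  stringsAsFactors = FALSE  # the default since R 4.0.0
)

# Character column: summary() reports only Length, Class, and Mode.
print(summary(toy$Duration))

# Converted to a factor: summary() reports a count per level (plus NA's).
toy$Duration <- as.factor(toy$Duration)
print(summary(toy$Duration))

# table() counts each value for either class; NAs are dropped by default.
print(table(toy$Duration))
```

If this is the cause, either the lesson text would need to describe the character summary, or the lesson data would need to be loaded with its categorical columns as factors.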