Supervised and Unsupervised Learning Projects
The insurance company has noticed a decline in house insurance renewal applications. Data are available on important statistics that describe the areas where the house insurance applicants are recorded. The purpose of this R code is to fit a linear regression model to the data. Data: the data set has a sample size of 50 and seven predictors, including Flood, Minority Population, Fire Report (%), Crime Rate (%), House Age, and Income ($k). Declination (%) is the response variable. Flood is a discrete variable with three levels: 1 = unlikely to flood; 2 = occasionally floods; 3 = very likely to flood. The remaining six regressors are continuous variables.
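Because Flood is a three-level discrete variable, it should enter the regression as a categorical predictor rather than as the numbers 1, 2, 3. The following is a minimal Python sketch (not the project's R code) of treatment (dummy) coding with level 1 as the baseline, followed by an ordinary least-squares fit through the normal equations. The helper names (`dummy_code_flood`, `design_row`, `fit_ols`) and the toy numbers are hypothetical, not the project's data, and a single continuous predictor stands in for the six real ones.

```python
# Minimal sketch: OLS with the three-level Flood factor dummy-coded against
# level 1 ("unlikely to flood"). Pure Python; names and data are hypothetical.

def dummy_code_flood(level):
    """Map Flood level 1/2/3 to two indicator columns (level 1 is the baseline)."""
    if level not in (1, 2, 3):
        raise ValueError(f"unexpected Flood level: {level}")
    return [1.0 if level == 2 else 0.0, 1.0 if level == 3 else 0.0]


def design_row(flood, continuous):
    """Intercept, two Flood dummies, then the continuous predictors."""
    return [1.0] + dummy_code_flood(flood) + [float(v) for v in continuous]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(rows, y):
    """Least-squares coefficients from the normal equations X'X beta = X'y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


# Toy example (hypothetical numbers): Flood level plus one continuous predictor,
# with a noise-free response so the fit should recover the known coefficients.
flood = [1, 1, 2, 2, 3, 3, 1]
fire = [10.0, 20.0, 15.0, 5.0, 12.0, 30.0, 8.0]
true_beta = [2.0, 1.5, 4.0, 0.5]  # intercept, Flood=2, Flood=3, fire
rows = [design_row(f, [x]) for f, x in zip(flood, fire)]
y = [sum(b * v for b, v in zip(true_beta, r)) for r in rows]
coef = fit_ols(rows, y)
print([round(c, 6) for c in coef])
```

In R, writing `factor(Flood)` in an `lm()` formula applies the same baseline coding by default, so the Flood 2 and Flood 3 coefficients read as shifts in Declination (%) relative to areas unlikely to flood.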
A researcher at the Breast Cancer Research Center computed 30 features from the digitized image of a fine needle aspirate (FNA) of a breast mass. The features describe characteristics of the cell nuclei present in the image. Ten real-valued features are computed for each cell nucleus:
a) radius (mean of distances from the center to points on the perimeter)
b) texture (standard deviation of gray-scale values)
c) perimeter
d) area
e) smoothness (local variation in radius lengths)
f) compactness (perimeter^2 / area - 1.0)
g) concavity (severity of concave portions of the contour)
h) concave points (number of concave portions of the contour)
i) symmetry
j) fractal dimension ("coastline approximation" - 1)
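To make the geometric definitions concrete, here is a minimal Python sketch computing radius, perimeter, area and compactness for one nucleus outline. It assumes the contour is given as an ordered list of (x, y) boundary points (a hypothetical input format), takes the center as the mean of those points, and uses the shoelace formula for area; the original image-processing pipeline may differ.

```python
import math


def nucleus_shape_features(points):
    """Radius, perimeter, area and compactness of a closed contour.

    `points`: ordered (x, y) boundary points (hypothetical input format).
    Assumptions: center = mean of the boundary points; area via shoelace formula.
    """
    n = len(points)
    if n < 3:
        raise ValueError("need at least three boundary points")
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    # radius: mean distance from the center to the boundary points
    radius = sum(math.hypot(x - cx, y - cy) for x, y in points) / n
    perimeter = 0.0
    twice_area = 0.0
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]  # wrap around to close the contour
        perimeter += math.hypot(x2 - x1, y2 - y1)
        twice_area += x1 * y2 - x2 * y1
    area = abs(twice_area) / 2.0
    # compactness as defined above: perimeter^2 / area - 1.0
    compactness = perimeter ** 2 / area - 1.0
    return {"radius": radius, "perimeter": perimeter,
            "area": area, "compactness": compactness}


# A 2 x 2 square centered at the origin: perimeter 8, area 4,
# so compactness = 64 / 4 - 1 = 15.
square = [(-1, -1), (1, -1), (1, 1), (-1, 1)]
features = nucleus_shape_features(square)
print(features)
```

For a near-circular outline (a regular polygon with many vertices), perimeter^2 / area approaches 4π, so compactness approaches 4π - 1 ≈ 11.57; less regular shapes score higher.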
The mean, standard error, and "worst" or largest value (mean of the three largest values) of each of these features were computed for each image, resulting in 30 features. For instance, field 3 is Mean Radius (RediusM), field 13 is Radius SE (RediusSE), and field 23 is Worst Radius (RediusW). All feature values are recorded with four significant digits.
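The per-image summary can be sketched as follows for one feature, yielding the three values that correspond to fields such as RediusM, RediusSE and RediusW. This is a minimal Python sketch under stated assumptions: the standard error is taken as the sample standard deviation divided by sqrt(n), "worst" is the mean of the three largest values as described above, and `summarize_feature` and `four_sig_digits` are hypothetical helper names.

```python
import math
import statistics


def summarize_feature(values):
    """Mean, standard error and 'worst' value of one feature across an image's nuclei.

    Assumptions: SE = sample standard deviation / sqrt(n);
    'worst' = mean of the three largest values.
    """
    if len(values) < 3:
        raise ValueError("need at least three nuclei for the 'worst' statistic")
    mean = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    worst = statistics.mean(sorted(values, reverse=True)[:3])
    return mean, se, worst


def four_sig_digits(x):
    """Round to four significant digits, matching how the values are recorded."""
    return float(f"{x:.4g}")


radii = [1.0, 2.0, 3.0, 4.0, 5.0]  # toy per-nucleus radii, not real data
radius_m, radius_se, radius_w = summarize_feature(radii)
print(four_sig_digits(radius_m), four_sig_digits(radius_se), four_sig_digits(radius_w))
```

Applying the same summary to all ten nucleus features gives the 10 x 3 = 30 features per image.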
A prediction model was constructed for the diagnosis, coded B = benign and M = malignant.
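The text does not say which model was used for the diagnosis, so the following is only one plausible choice: a minimal pure-Python logistic regression fitted by batch gradient descent, with M coded as 1 and B as 0. The two toy features (imagined as a mean radius and a mean concavity) and their values are hypothetical, as are the function names and the learning-rate and epoch settings; a real model would be trained on the 30 features and evaluated on held-out cases.

```python
import math


def sigmoid(z):
    # numerically stable logistic function for large |z|
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, labels, lr=0.5, epochs=2000):
    """Batch gradient descent on the mean log-loss; labels are 'B' or 'M'."""
    y = [1.0 if lab == "M" else 0.0 for lab in labels]  # M = 1, B = 0
    n, p = len(X), len(X[0])
    # standardize each feature so a single learning rate suits all columns
    means = [sum(row[j] for row in X) / n for j in range(p)]
    sds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / n) or 1.0
           for j in range(p)]
    Z = [[(row[j] - means[j]) / sds[j] for j in range(p)] for row in X]
    w = [0.0] * p
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * p
        grad_b = 0.0
        for zi, yi in zip(Z, y):
            err = sigmoid(b + sum(wj * zj for wj, zj in zip(w, zi))) - yi
            grad_b += err
            for j in range(p):
                grad_w[j] += err * zi[j]
        b -= lr * grad_b / n
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]

    def predict(row):
        """Classify one case as 'M' if P(malignant) >= 0.5, else 'B'."""
        z = b + sum(wj * (row[j] - means[j]) / sds[j] for j, wj in enumerate(w))
        return "M" if sigmoid(z) >= 0.5 else "B"

    return predict


# Toy, hypothetical cases: [mean radius, mean concavity]
X = [[12.0, 0.05], [13.0, 0.04], [11.5, 0.06], [12.5, 0.03],
     [18.0, 0.20], [20.0, 0.15], [19.0, 0.25], [21.0, 0.18]]
labels = ["B"] * 4 + ["M"] * 4
predict = fit_logistic(X, labels)
print([predict(row) for row in X])
```

Standardizing inside the fit keeps features on very different scales (radius versus concavity) from dominating the gradient; the stored means and standard deviations are reused at prediction time.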