TigistW/ML_Assignments

ML_Assignments

AAU, AAiT, SiTE Machine Learning Assignments

Assignment 1 [7 pts]

1.1 Distinguish among the following learning methods. Highlight their critical differences and similarities, mention use cases for each, and identify example problems for each.

  • a. Supervised learning
  • b. Unsupervised learning
  • c. Semi-supervised learning
  • d. Reinforcement learning

1.2 Consider the following learning set containing 48 observations of an unknown event with features X and Y.

  • a. Using Python and relevant libraries, implement K-means clustering on these 48 observations with cluster numbers k=2, k=3, and k=4.
  • b. For each value of k, calculate and print the centroids, and draw a scatter plot of the clusters; the centroids should also be plotted, using a different color or shape.
  • c. Which value of k do you think is best? Give the reason for your answer.
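
A possible starting point for 1.2 is sketched below. It assumes the 48 observations have been typed into an array of shape (48, 2); the random points here are only a placeholder, not the assignment data, and scikit-learn's `KMeans` is one library choice among several.

```python
import numpy as np
from sklearn.cluster import KMeans

# Placeholder data (assumption): replace with the 48 (X, Y) observations
# from the assignment figure.
rng = np.random.default_rng(0)
data = rng.normal(size=(48, 2))

results = {}
for k in (2, 3, 4):
    # n_init=10 restarts K-means from 10 random initialisations and keeps the best run.
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(data)
    results[k] = model
    print(f"k={k} centroids:\n{model.cluster_centers_}")
    print(f"k={k} inertia (within-cluster sum of squares): {model.inertia_:.3f}")
```

For the plots, `matplotlib.pyplot.scatter(data[:, 0], data[:, 1], c=model.labels_)` colors the points by cluster, and a second `scatter` call on `model.cluster_centers_` with a different `marker` draws the centroids. Inertia generally shrinks as k grows, so looking for the k where it stops dropping sharply is one common way to argue for a best value.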

[Image: Capture1]

Assignment 2 [10 pts]

2.1 Write a report on the differences and similarities between linear regression and logistic regression. Using Python and appropriate libraries, implement logistic regression for any suitable classification problem. An appropriate public dataset can be obtained from data repositories like,
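
As a rough illustration of the implementation part of 2.1, the sketch below fits a logistic regression classifier with scikit-learn. The breast cancer dataset bundled with scikit-learn is an assumption made here for convenience; the assignment leaves the choice of classification problem open.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Example dataset (assumption): binary classification, 569 samples, 30 features.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Standardising the features first helps the solver converge.
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)

accuracy = clf.score(X_test, y_test)  # fraction of correct test predictions
print(f"test accuracy: {accuracy:.3f}")
```

Unlike linear regression, which predicts a continuous value directly, the fitted model here outputs class probabilities (available through `clf.predict_proba`) that are thresholded into class labels.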

2.2 Consider Figure 2.1, which illustrates a learning set that contains two classes. How many leaf nodes are required for a decision tree to separate the two classes correctly? Draw a decision tree that is capable of separating the two classes correctly. You can call the triangle class “T” and the square class “S” for convenience.

[Image: Capture2]

Assignment 3 [8 pts]

The Iris dataset is perhaps the best-known database in the pattern recognition literature and is still referenced today. The data set contains 3 classes of 50 instances each (150 observations in total), where each class refers to a type of iris plant. One class is linearly separable from the other two; the latter are not linearly separable from each other. The dataset can be obtained from https://archive.ics.uci.edu/ml/datasets/iris and has the description tabulated below. Consider this dataset for questions 3.2 and 3.3.

[Image: Capture3]

3.1 Explain the curse of dimensionality. Mention as many remedies as possible, each with only a brief explanation.
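
One symptom of the curse of dimensionality can be shown numerically: as the number of dimensions grows, the nearest and farthest neighbours of a point end up at almost the same distance. The sketch below is a small illustration under assumed settings (uniform random points, 500 samples, a hypothetical helper named `relative_contrast`), not part of the required answer.

```python
import numpy as np

rng = np.random.default_rng(0)

def relative_contrast(dim, n_points=500):
    """Return (max - min) / min of the distances from one query point
    to n_points uniform random points in the unit hypercube."""
    points = rng.random((n_points, dim))
    query = rng.random(dim)
    dists = np.linalg.norm(points - query, axis=1)
    return (dists.max() - dists.min()) / dists.min()

contrasts = {dim: relative_contrast(dim) for dim in (2, 10, 100, 1000)}
for dim, value in contrasts.items():
    print(f"dim={dim:5d}  relative contrast={value:.3f}")
```

A contrast near zero means distance-based methods such as K-means or nearest neighbours can barely tell close points from far ones, which is one motivation for the dimensionality reduction asked for in 3.2 and 3.3.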

3.2 Reduce the dimensionality of the Iris dataset to 2 dimensions using Principal Component Analysis (PCA). To do this, you need to implement the PCA technique using Python built-in libraries and functions.

  • a) Print the covariance matrix, the eigenvalues, and the eigenvectors.
  • b) Use a scatter plot to display the reduced data.
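
The steps in 3.2 could be sketched roughly as follows, using NumPy for the linear algebra. Loading the data through `sklearn.datasets.load_iris` is an assumption made for convenience; the file from the UCI URL would work equally well.

```python
import numpy as np
from sklearn.datasets import load_iris

X = load_iris().data                 # shape (150, 4)
X_centered = X - X.mean(axis=0)      # PCA works on mean-centred data

# 4x4 covariance matrix of the features (columns are variables).
cov = np.cov(X_centered, rowvar=False)

# eigh is meant for symmetric matrices and returns eigenvalues in ascending order.
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]    # re-sort: largest variance first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Project onto the two leading eigenvectors (principal components).
X_pca = X_centered @ eigvecs[:, :2]  # shape (150, 2)

print("covariance matrix:\n", cov)
print("eigenvalues:", eigvals)
print("eigenvectors (columns):\n", eigvecs)
```

A scatter plot of `X_pca[:, 0]` against `X_pca[:, 1]`, colored by `load_iris().target`, then covers part (b).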

3.3 Reduce the dimensionality of the Iris dataset to 2 dimensions using the SVD technique. To do this, you need to implement the SVD technique using Python built-in libraries and functions.

  • a) Print all component matrices U, Sigma, and V.
  • b) Use a scatter plot to display the reduced data.
  • c) Compare and contrast your plots from 3.2(b) and 3.3(b), and state your conclusions about the two methods.
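
A minimal sketch of 3.3 with `numpy.linalg.svd` is given below. Centering the data before the decomposition is an assumption made here so that the result is directly comparable with PCA; applying SVD to the raw data gives a different projection.

```python
import numpy as np
from sklearn.datasets import load_iris

X = load_iris().data                 # shape (150, 4)
X_centered = X - X.mean(axis=0)

# Thin SVD: X_centered = U @ diag(S) @ Vt, with U (150, 4), S (4,), Vt (4, 4).
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Keep the two largest singular values; same as X_centered @ Vt[:2].T.
X_svd = U[:, :2] * S[:2]             # shape (150, 2)

# On centred data, squared singular values rescaled by (n - 1)
# equal the eigenvalues of the covariance matrix used in PCA.
pca_eigvals = S**2 / (X.shape[0] - 1)

print("U:\n", U)
print("Sigma:", S)
print("V:\n", Vt.T)
```

Under this centering assumption the two scatter plots should match up to a possible sign flip of each axis, which is a useful observation for part (c).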

[Note: You can take screenshots for your report.]

Assignment 4 [5 pts]

This assignment is only for Cyber and Software Engineering Stream students and it is optional for Artificial Intelligence Stream students.

  1. Write a report about Big Data (Definition(s), Vs of big data, challenges, and remedies).
  2. Compare and contrast Hadoop and Spark.

Submission: Reports should be compiled and mailed to your section/stream representatives. Representatives will in turn zip them in a folder and mail them to me. Names and IDs of students should be stated on the cover page.

Deadline: February 3, 2023. Late submission is possible, with a penalty of 10% of the total value for each late day.
