forked from Jason2Brownlee/CleverAlgorithmsMachineLearning
Algorithm Research Process
Each algorithm description should be as complete and correct as possible so that the content is as useful as it can be. To this end, each algorithm is researched using a systematic process before its description is prepared. This guide documents that process as it is followed for every algorithm in the book.
Principles:
* Research takes place first and is synthesized into an algorithm description second.
* Research takes place in a separate and less constrained file from the formal algorithm description.
* Each resource is touched only once.
* Print this document and use it as a checklist.
* Complete algorithm research incrementally or in batch.
* Complete algorithm research in 5-8 hours.
01. Open a new text file and save it in the project with a filename that includes the algorithm name.
For example, if researching the "Linear Regression" algorithm, the file will be named "linear_regression.txt".
02. Under a heading "PRIOR KNOWLEDGE", list in point form all prior knowledge about the algorithm.
For example, list known problem types, parameters, parameter configurations and even specialty resources that come to mind. This prior knowledge may or may not be accurate; acknowledging it up front improves the objectivity of the research.
03. Under the heading "NAMES", list in point form the name of the algorithm and all known and possible aliases and acronyms. Keep this list up to date as the research unfolds.
04. Under the heading "TAXONOMY", list algorithms that are related to the algorithm under research, including extensions and parent methods. List the field of study to which this algorithm belongs and any specialty subfields of study that extend the method. Keep this list up to date as the research unfolds.
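Steps 01 to 04 can be sketched as a small helper. It derives the filename from the algorithm name and writes the three starting headings. This is a minimal sketch under two assumptions:

* The lowercase-with-underscores naming rule is generalized from the single "linear_regression.txt" example.
* The helper names `research_filename` and `create_research_file` are hypothetical.

```python
import re
import tempfile
from pathlib import Path

# Headings introduced in steps 02-04, in the order they appear in the file.
HEADINGS = ["PRIOR KNOWLEDGE", "NAMES", "TAXONOMY"]


def research_filename(algorithm_name: str) -> str:
    """Turn an algorithm name such as 'Linear Regression' into 'linear_regression.txt'.

    Assumption: lowercase alphanumeric words joined by underscores, generalized
    from the single example given in step 01.
    """
    words = re.findall(r"[a-z0-9]+", algorithm_name.lower())
    return "_".join(words) + ".txt"


def create_research_file(algorithm_name: str, directory: str = ".") -> Path:
    """Create the research file with the three starting headings; never overwrite."""
    path = Path(directory) / research_filename(algorithm_name)
    if path.exists():
        # Research already in progress: leave the existing notes untouched.
        return path
    lines = []
    for heading in HEADINGS:
        lines.append(heading)
        if heading == "NAMES":
            # Step 03: the algorithm's own name is the first entry in NAMES.
            lines.append(f"* {algorithm_name}")
        lines.append("")  # blank line separating sections
    path.write_text("\n".join(lines), encoding="utf-8")
    return path


if __name__ == "__main__":
    # Demonstrate in a throwaway directory so no project files are touched.
    with tempfile.TemporaryDirectory() as tmp:
        path = create_research_file("Linear Regression", tmp)
        print(path.name)  # linear_regression.txt
        print(path.read_text(encoding="utf-8"))
```

Refusing to overwrite an existing file keeps a rerun from wiping research that is already under way.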
05. For each accessible physical book on Machine Learning, Data Mining, Artificial Intelligence or related fields:
05a. Write the title of the book as a heading.
05b. Search the table of contents and index for the algorithm name.
05c. Note in point form all salient details of the method described in the text. If there is no content on the algorithm, list N/A under the book title.
06. Summarize each relevant book accessible on Amazon Book search:
06a. Visit http://www.amazon.com
06b. Change the search category to "Books" and search for the algorithm name (repeat for any or all aliases as needed)
06c. Review each book in the search results on at least the first two pages (24 results), skipping any books whose content has already been processed for this algorithm.
06c,1. Write the title of the book as a heading, amended with (AMAZON BOOK).
06c,2. Click on the book in the search results.
06c,3. Click on the cover to "look inside"; if this is not available, skip the book and mark it "cannot access content".
06c,4. Using the "Search within book" field on the left of the screen, search for the algorithm name in quotes and review all mentions. If this search fails, fall back to reviewing the "Table of Contents" and/or "Index".
06c,5. Note in point form all salient details of the method described in the text. If there is no content on the algorithm, list N/A under the book title.
07. Summarize each relevant book accessible on Google Book search:
07a. Visit http://books.google.com
07b. Search for the algorithm name (repeat for any or all aliases as needed)
07c. Review each book in the search results on at least the first two pages (40 results), skipping any books whose content has already been processed for this algorithm.
07c,1. Write the title of the book as a heading, amended with (GOOGLE BOOK).
07c,2. Click on the book in the search results.
07c,3. The book's contents should be shown; if not, skip the book and mark it "cannot access content".
07c,4. Using the "Search within book" field on the left of the screen, search for the algorithm name.
07c,5. In the yellow bar above the book, click "View All" and read all relevant mentions. If this search fails, fall back to reviewing the "Table of Contents" and/or "Index".
07c,6. Note in point form all salient details of the method described in the text. If there is no content on the algorithm, list N/A under the book title.
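Steps 05 to 07 share one note-taking convention: a title heading, an optional source tag, then salient points, "N/A", or "cannot access content". That convention can be sketched as a formatting helper. The assumptions are:

* The function `format_resource_notes` and its parameters are hypothetical.
* The layout is inferred from the wording of steps 06c,1 and 07c,1: the tag goes in parentheses after the title, and points are written as "*" bullets.

```python
from typing import Optional, Sequence

CANNOT_ACCESS = "cannot access content"


def format_resource_notes(
    title: str,
    points: Sequence[str] = (),
    source_tag: Optional[str] = None,
    accessible: bool = True,
) -> str:
    """Format one resource as a title heading followed by its notes.

    source_tag: e.g. "AMAZON BOOK" or "GOOGLE BOOK"; None for physical books (step 05).
    accessible: False when the content could not be viewed (steps 06c,3 and 07c,3).
    """
    heading = f"{title} ({source_tag})" if source_tag else title
    if not accessible:
        body = [CANNOT_ACCESS]
    elif not points:
        # The text was readable but has no content on the algorithm.
        body = ["N/A"]
    else:
        body = [f"* {point}" for point in points]
    return "\n".join([heading, *body]) + "\n"


if __name__ == "__main__":
    # Placeholder title and notes, for illustration only.
    print(format_resource_notes(
        "Example Book Title",
        ["first salient detail", "second salient detail"],
        source_tag="GOOGLE BOOK",
    ))
```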
08. Summarize each relevant resource accessible on Google Scholar search:
08a. Visit http://scholar.google.com
08b. Search for the algorithm name (repeat for any or all aliases as needed)
08c. Review each result in the search results on at least the first three pages (30 results), skipping any resources whose content has already been processed for this algorithm.
08c,1. Write the title of the resource as a heading.
08c,2. Click on the search entry, recording the URL and skipping any resource whose content has already been processed.
08c,3. If a Google Book result, follow the process in STEP 07c from 07c,1.
08c,4. If the resource is not accessible from the main link, try the links under "All versions" below the search entry; otherwise skip it and mark it "cannot access content".
08c,5. Note in point form all salient details of the method described in the text. If there is no content on the algorithm, list N/A.
08c,6. Note all relevant resources in the bibliography that have not already been processed, using the resource name and the context in which it is cited to judge relevance.
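Two rules need the same bookkeeping: "each resource is touched only once" and step 08c,6's instruction to note only unprocessed bibliography entries. Both can be sketched as a small queue with three behaviours:

* It normalizes titles.
* It refuses to re-queue anything already seen.
* It hands out resources one at a time.

This is an illustrative sketch. The class `ResearchQueue` and the normalization rule (case and punctuation ignored) are assumptions. Different editions or versions of the same work may still need a human judgment call.

```python
import re
from collections import deque
from typing import Deque, Optional, Set


def normalize_title(title: str) -> str:
    """Lowercase a title and drop punctuation so trivial differences still match."""
    return " ".join(re.findall(r"[a-z0-9]+", title.lower()))


class ResearchQueue:
    """Queue of resources to process; each resource is accepted at most once."""

    def __init__(self) -> None:
        self._pending: Deque[str] = deque()
        self._seen: Set[str] = set()  # normalized titles, pending or processed

    def add(self, title: str) -> bool:
        """Queue a resource (e.g. a bibliography entry). Return False if already seen."""
        key = normalize_title(title)
        if key in self._seen:
            return False
        self._seen.add(key)
        self._pending.append(title)
        return True

    def take_next(self) -> Optional[str]:
        """Remove and return the next resource to process, or None if none remain."""
        return self._pending.popleft() if self._pending else None

    def pending_count(self) -> int:
        return len(self._pending)


if __name__ == "__main__":
    queue = ResearchQueue()
    queue.add("An Example Paper Title")
    # The same work cited again with different case and punctuation is not re-queued.
    print(queue.add("An example paper title."))  # False
    print(queue.take_next())  # An Example Paper Title
    print(queue.take_next())  # None
```

Titles stay in the seen set after they are taken from the queue. A resource that has been processed therefore cannot come back through a later bibliography.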
09. Summarize the Wikipedia article for the algorithm.
09a. Visit http://en.wikipedia.org/wiki/Main_Page
09b. Using the article search at the top of the page, search for the algorithm name (repeat for any or all aliases as needed).
09c. Write the page title and URL as a heading; write the heading "WIKIPEDIA" if no such entry exists.
09d. Note in point form related algorithms and fields of study linked throughout the article and in the related pages at the bottom of the entry.
09e. Note in point form all salient details of the method described in the article. If there is no content on the algorithm, list N/A under the heading.
09f. Note any resources listed at the bottom of the article in the Notes, References, or Further Reading sections that have not already been processed.
10. Summarize each relevant resource accessible on Springer Link
10a. Visit http://link.springer.com
10b. Search for the algorithm name (repeat for any or all aliases as needed)
10c. Review (using STEP 08c from 08c,1) each result in the search results on at least the first page, skipping any resources whose content has already been processed for this algorithm.
11. Summarize each relevant resource accessible on Scirus
11a. Visit http://www.scirus.com/
11b. Search for the algorithm name (repeat for any or all aliases as needed)
11c. Review (using STEP 08c from 08c,1) each result in the search results on at least the first two pages (20 results), skipping any resources whose content has already been processed for this algorithm.
12. Summarize each relevant resource accessible on IEEE
12a. Visit http://ieeexplore.ieee.org/
12b. Search for the algorithm name (repeat for any or all aliases as needed)
12c. Review (using STEP 08c from 08c,1) each result in the search results on at least the first page (25 results), skipping any resources whose content has already been processed for this algorithm.
13. Summarize each relevant resource accessible on JSTOR
13a. Visit http://www.jstor.org/
13b. Search for the algorithm name (repeat for any or all aliases as needed)
13c. Review (using STEP 08c from 08c,1) each result in the search results on at least the first page (25 results), skipping any resources whose content has already been processed for this algorithm.
14. Summarize each relevant resource accessible on CiteSeerX
14a. Visit http://citeseerx.ist.psu.edu/
14b. Search for the algorithm name (repeat for any or all aliases as needed)
14c. Review (using STEP 08c from 08c,1) each result in the search results on at least the first two pages (20 results), skipping any resources whose content has already been processed for this algorithm.
15. Summarize each relevant resource accessible on ACM
15a. Visit http://dl.acm.org/
15b. Search for the algorithm name (repeat for any or all aliases as needed)
15c. Review (using STEP 08c from 08c,1) each result in the search results on at least the first page (20 results), skipping any resources whose content has already been processed for this algorithm.
16. Summarize each relevant resource accessible on Google Web Search
16a. Visit http://google.com
16b. Search for the algorithm name (repeat for any or all aliases as needed)
16c. Review (using STEP 08c from 08c,1) each result in the search results on at least the first page (20 results), skipping any resources whose content has already been processed for this algorithm.
17. Summarize each relevant resource on Q/A Sites.
They can be a good source of tips, tricks, usage heuristics and introductory resources.
17a. Visit each of the following websites (or use a Google search with "site:<URL>" followed by the algorithm name):
http://metaoptimize.com/qa/
http://www.quora.com/search
http://stackoverflow.com/search
http://stats.stackexchange.com/search
http://cstheory.stackexchange.com/search
http://programmers.stackexchange.com/search
http://www.reddit.com/search
https://www.hnsearch.com/search
https://www.kaggle.com/forums
http://www.netflixprize.com//community/search.php
https://list.scms.waikato.ac.nz/mailman/htdig/wekalist/
17b. Search for the algorithm name (repeat for any or all aliases as needed)
17c. Review (using STEP 08c from 08c,1) at least the first 20 search results, skipping any resources whose content has already been processed for this algorithm.
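Step 17a offers a "site:<URL>" shortcut, and it has to be repeated for every name collected under NAMES. That repetition can be sketched as a query generator whose output is pasted into a Google search; no search API is assumed. The assumptions are:

* Quoting the algorithm name is borrowed from step 06c,4.
* Only a subset of the listed sites is included.
* The function name `site_queries` is hypothetical.

```python
from itertools import product
from typing import Iterable, List
from urllib.parse import urlparse

# A subset of the Q/A sites listed in step 17a.
QA_SITES = [
    "http://stats.stackexchange.com/search",
    "http://stackoverflow.com/search",
    "http://www.quora.com/search",
    "https://www.kaggle.com/forums",
]


def site_queries(names: Iterable[str], site_urls: Iterable[str]) -> List[str]:
    """Build a 'site:<host> "<name>"' Google query for every (name, site) pair.

    Only the host part of each URL is kept, so the query covers the whole site
    rather than its search page.
    """
    hosts = [urlparse(url).netloc for url in site_urls]
    return [f'site:{host} "{name}"' for name, host in product(names, hosts)]


if __name__ == "__main__":
    # Example aliases as they might appear under NAMES.
    for query in site_queries(["linear regression", "least squares regression"], QA_SITES):
        print(query)
```

Generating every alias and site pair up front makes the "repeat for any or all aliases" instruction a checklist rather than something to remember.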
18. Summarize relevant resources on video sites.
They can be a good source of tips, intuitions and usage heuristics and tie together what you have been reading.
18a. Visit each of the following websites:
http://www.youtube.com/
http://vimeo.com/
18b. Search for the algorithm name (repeat for any or all aliases as needed)
18c. Review (using STEP 08c from 08c,1) at least the first 20 search results, skipping any resources whose content has already been processed for this algorithm.
19. Search for relevant implementations on CRAN
19a. Visit http://cran.r-project.org/search.html
19b. Search for the algorithm name (repeat for any or all aliases as needed)
19c. Click each entry on the first two pages of search results (20 results), considering only CRAN package web pages and skipping all others.
19d. Review each User Manual and Vignette on the package web page using STEP 08c from 08c,1.
20. Are there any resources that appear useful but have not been reviewed because they could not be accessed?
Some strategies to locate an accessible version of the resource include:
20a. Search for the full name of the resource on:
http://scholar.google.com
http://books.google.com
http://google.com
http://link.springer.com
http://www.scirus.com/
http://ieeexplore.ieee.org/
http://www.jstor.org/
http://citeseerx.ist.psu.edu/
http://dl.acm.org/
20b. Search for each author's full name on http://google.com and attempt to locate their home page and email address. Reach out to each author by email and request a copy of the paper.
20c. Contact friends at universities and request a copy of the paper, providing a direct link to the PDF on the publisher's website.
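Before working through steps 20a to 20c, the inaccessible resources have to be collected from the research file. A minimal sketch of that collection follows. It assumes:

* Each resource heading sits on its own line.
* The marker "cannot access content" sits on a line of its own below the heading.
* Recorded URLs and "*" notes are skipped.

The function name `inaccessible_resources` is hypothetical, and the file layout is itself an assumption about how notes are written.

```python
from typing import List, Optional

MARKER = "cannot access content"


def inaccessible_resources(research_text: str) -> List[str]:
    """Return the headings of resources marked 'cannot access content'.

    Assumed layout: a heading is any non-blank line that is not a '*' bullet,
    a URL, 'N/A', or the marker; the marker sits on its own line below it.
    """
    found: List[str] = []
    heading: Optional[str] = None
    for raw in research_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("*") or line.startswith(("http://", "https://")):
            continue  # blank lines, notes and recorded URLs are not headings
        if line.lower() == MARKER:
            if heading is not None and heading not in found:
                found.append(heading)
        elif line != "N/A":
            heading = line
    return found


if __name__ == "__main__":
    sample = """Some Book Title (AMAZON BOOK)
cannot access content

Another Book (GOOGLE BOOK)
* salient detail

A Paper Title
http://example.org/paper.pdf
cannot access content
"""
    for title in inaccessible_resources(sample):
        print(title)
```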
21. Are there any resources that have been listed as being potentially useful but have not yet been processed?
21a. Use the tactics in STEP 20 starting at 20a.
22. Do you have sufficient salient information about the algorithm (questions listed in Appendix A)?
22a. If not, search for each incomplete question using the sources listed in STEP 20a.
22b. Search for each incomplete question on the sites listed in STEP 17a.
22c. Post the question on the sites listed in STEP 17a.
23. Do you have sufficient information to craft a demonstration in R? If not:
23a. Visit http://google.com
23b. Search for the algorithm name appended with "R project" and/or "R example"
Appendix A: Salient Information
Salient Information is identified using the following motivating questions:
* Does the method take parameters? If so, what are they?
* What are common values for algorithm parameters?
* Does input require normalization or standardization prior to running the method?
* What is a description of the algorithm in words?
* What types of problems is the algorithm suited to?
* What are the primary sources?
* What are some good introductory resources?
* What are the recommended ways to operate the algorithm?
* What packages in R support the algorithm?
* Which package in R is canonical?
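The sufficiency check in step 22 can be tracked as a simple checklist against the questions above. This sketch assumes answers are kept in a dictionary keyed by short topic labels. The labels and the function `unanswered` are hypothetical shorthand, not part of the original process.

```python
from typing import Dict, List

# Hypothetical shorthand labels for the ten questions in Appendix A, in order.
SALIENT_TOPICS = [
    "parameters",
    "common parameter values",
    "input normalization or standardization",
    "description in words",
    "suitable problem types",
    "primary sources",
    "introductory resources",
    "recommended usage",
    "R packages",
    "canonical R package",
]


def unanswered(answers: Dict[str, str]) -> List[str]:
    """Return the topics with no recorded answer, in Appendix A order."""
    return [topic for topic in SALIENT_TOPICS if not answers.get(topic, "").strip()]


if __name__ == "__main__":
    # Placeholder answers for illustration only.
    answers = {"parameters": "(notes)", "primary sources": "(notes)"}
    missing = unanswered(answers)
    print(f"{len(SALIENT_TOPICS) - len(missing)}/{len(SALIENT_TOPICS)} answered")
    for topic in missing:
        print("still open:", topic)
```

The topics left open map directly onto the searches and postings in steps 22a to 22c.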