ingest_deploy
=============
Ansible, Packer, and Vagrant project for building and running the ingest
environment on AWS and locally.
Currently only the Ansible portion is working; a local Vagrant version still
needs to be set up.
### Dependencies
Tools:
- [VirtualBox](https://www.virtualbox.org/) (Version X.X)
- [Vagrant](https://www.vagrantup.com/) (Version X.X)
- [vagrant-vbguest](https://github.com/dotless-de/vagrant-vbguest/) (`vagrant plugin install vagrant-vbguest`; required if using VirtualBox)
- [Ansible](http://www.ansible.com/home) (Version X.X)
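
To sanity-check what you have installed (exact required versions are not pinned here, so this is only a quick check):

```bash
# Confirm the tools are on the PATH and note their versions
VBoxManage --version
vagrant --version
vagrant plugin list    # should include vagrant-vbguest
ansible --version
```
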
UCLDC Harvesting operations guide
=================================
Add a monitoring user
---------------------
Pull the ucldc/ingest_deploy project.
Get the Ansible vault password from Mark. It's easiest to store it in a file
(perhaps ~/.vault-password-file, with mode 600) and alias `ansible-playbook` to
`ansible-playbook --vault-password-file=~/.vault-password-file`.
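
For example (the password value below is a placeholder):

```bash
# Store the vault password (obtained from Mark) with restrictive permissions
echo 'the-vault-password' > ~/.vault-password-file
chmod 600 ~/.vault-password-file

# Alias so every ansible-playbook run picks up the vault password automatically
alias ansible-playbook='ansible-playbook --vault-password-file=~/.vault-password-file'
```
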
Create an htdigest entry by running:

    htdigest -c tmp.pswd ingest <username>

It will prompt for a password, which is easy to generate with pwgen. Copy the
resulting line from tmp.pswd.
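
For example, with an illustrative username:

```bash
# Generate a random password for the monitoring user (pwgen arguments are arbitrary)
pwgen 16 1

# Create the htdigest entry; you will be prompted for the password just generated
htdigest -c tmp.pswd ingest monitoruser

# Show the user:realm:hash line to copy into the vaulted users file
cat tmp.pswd
```
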
Then edit the vault-encrypted users file:

    ansible-vault edit --vault-password-file=~/.vault-password-file \
        ingest_deploy/ansible/roles/ingest_front/vars/basic_auth_users.yml

Entries in this file are htdigest lines, each preceded by a `-` to make a YAML
list, e.g.:

    ---
    basic_auth_users:
    - "u1:ingest:435srrr3db7b180366ce7e653493ca39"
    - "u1:ingest:rrrr756e5aacde0262130e79a888888c"
    - "u2:ingest:rrrr1cd0cd7rrr7a7839a5c1450bb8bc"

From a machine that can already access the ingest front machine with ssh, run:

    ansible-playbook -i hosts --vault-password-file=~/.vault_pass_ingest provision_front.yml

This will install the users.digest file to allow access for the monitoring user.
Adding an admin user
--------------------
Add your public ssh key to the keys file at
https://github.com/ucldc/appstrap/tree/master/cdl/ucldc-operator-keys.txt.
From a machine that can already access the ingest front machine with ssh, run:

    ansible-playbook -i hosts --vault-password-file=~/.vault_pass_ingest provision_front.yml

This will add your public key to ~/.ssh/authorized_keys for the ec2-user on
the ingest front machine.
Creating workers
----------------
You need to log onto the ingest front machine and then onto the majorTom
machine. The key to access instances in the private subnet has a password; ask
Mark for it.
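
The hop looks roughly like this (hostnames and key path are hypothetical; use the actual values for the environment):

```bash
# From your workstation to the ingest front machine (public subnet)
ssh ec2-user@<ingest-front-host>

# From the front machine on to majorTom in the private subnet;
# the key is password-protected, so you will be prompted for its passphrase
ssh -i ~/.ssh/<private-subnet-key> ec2-user@<majorTom-host>
```
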
Log in to the majorTom machine. From there you can run the ansible playbooks to
create and provision worker machines.
First, activate the virtualenv in ~/workers_local/:

    . ~/workers_local/bin/activate

Then, to create some worker machines, run:

    ansible-playbook --vault-password-file=~/.vault_pass_ingest \
        -i ~/code/ingest_deploy/ansible/hosts \
        ~/code/ingest_deploy/ansible/create_worker-stage.yml \
        --extra-vars="count=3"

The "count=##" extra variable sets the number of instances to create.
Once this runs and the instances are in the "running" state, set up the
machines and start a worker process on each by running:

    ansible-playbook --vault-password-file=~/.vault_pass_ingest \
        -i ~/code/ec2.py \
        ~/code/ingest_deploy/ansible/provision_worker-stage.yml \
        --extra-vars='rq_work_queues=["normal-stage","low-stage"]'

This will provision the workers by installing the required software and
configuration, then start Akara and the worker processes that listen on the
specified queues. You should see the worker processes appear in the RQ monitor
dashboard once this is done.
NOTE: if you already have provisioned worker machines running jobs, use
--limit=<ip range>, e.g. --limit=10.60.22.\*. Rerunning the provisioning will
put the currently running workers in a bad state, and you will then have to log
on to each worker and restart the worker process or terminate the machine.
AWS assigns unique subnets to the groups of workers you start, so in general,
different generations of machines can be distinguished by their class C
subnet. This makes the --limit parameter quite useful, as shown below.
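
For example, to provision only the newest generation of workers without touching earlier ones (the subnet value is illustrative):

```bash
# Provision only hosts in the 10.60.22.* subnet; workers in other subnets are left alone
ansible-playbook --vault-password-file=~/.vault_pass_ingest \
    -i ~/code/ec2.py \
    ~/code/ingest_deploy/ansible/provision_worker-stage.yml \
    --extra-vars='rq_work_queues=["normal-stage","low-stage"]' \
    --limit=10.60.22.\*
```
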
Creating New Solr Indexes
-------------------------
Currently, solr updates are run from the majorTom machine. The solr update
reads the couchdb changes endpoint, which has a record for each document that
has been created in the database, including documents that have since been
deleted.
Run:

    /usr/local/bin/solr-update.sh <--since=(int)>

By default this runs an incremental update from the last changes sequence
number, which is saved in S3 at solr.ucldc/couchdb_since/<DATA_BRANCH>. Use
--since to specify the starting sequence number yourself (the since parameter
to the couchdb changes endpoint); to reindex all docs, use --since=0.
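
For example (the non-zero sequence number is illustrative):

```bash
# Resume the update from an explicit changes sequence number
/usr/local/bin/solr-update.sh --since=1234567

# Full reindex of all documents
/usr/local/bin/solr-update.sh --since=0
```
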
Once the solr index is updated, run:

    /usr/local/bin/solr-index-to-s3.sh <DATA_BRANCH> (stage|production)

This will push the last built solr index to S3 at the location
solr.ucldc/indexes/<DATA_BRANCH>/YYYY/MM/solr-index.YYYY-MM-DD-HH_MM_SS.tar.bz2
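
To confirm the upload, you can list that bucket path (assuming the awscli is installed and configured with access to the solr.ucldc bucket):

```bash
# List the pushed indexes for a data branch; <DATA_BRANCH> is a placeholder, e.g. stage or production
aws s3 ls --recursive s3://solr.ucldc/indexes/<DATA_BRANCH>/
```
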
Updating the Beanstalk
----------------------
Go into the beanstalk control panel and select the ucldc-solr application.
![ucldc-solr app view](docs/images/screen_shot_ucldc_solr_app.png)
Select the existing environment you want to replace and clone the environment:
![ucldc-solr clone env](docs/images/screen_shot_clone_env.png)
If a new ami is available, choose it. Otherwise the defaults should be good. The
URL will be swapped later so the current one doesn't matter.
![ucldc-solr clone env config](docs/images/screen_shot_clone_env-config.png)
Wait for the cloning to finish; this can take a while, and 15 minutes is not
unusual. Eventually you will see this:
![ucldc-solr clone env ready](docs/images/screen_shot_clone_env-ready.png)
Choose the "configuration" screen & go to the "software configuration" screen:
![ucldc-solr clone config screen](docs/images/screen_shot_clone_env-software-config.png)
Change the INDEX_PATH Environment Property to point to the new index on S3, then
click apply and wait for the new environment to be ready again. (Health should
be "Green")
Then from the environment Dashboard, select the "Rebuild Environment" action:
![ucldc-solr clone rebuild](docs/images/screen_shot_clone_env-rebuild.png)
Again this can take a while. During the rebuild, the new solr index will be
pulled to the beanstalk machines.
Now QA against the newly cloned environment's URL. Once the QA looks OK, the
final step is to swap URLs with the existing index environment. From the new
environment, select the action "Swap Environment URLs". This takes you to the
following screen, where you select the currently active environment:
![ucldc-solr swap URLs](docs/images/screen_shot_clone_env-swap.png)
Click the swap button & in a few seconds the old URL will point to the new
environment.
Once everything checks out well with the new index, you can terminate the older
environment.
NOTE: need scripts to automate this.
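
A starting point for such a script, using the awscli (a sketch only: the environment names and index path are placeholders, and the initial clone is still done in the console):

```bash
# Point the cloned environment at the new index (INDEX_PATH is the environment property noted above)
aws elasticbeanstalk update-environment \
    --environment-name <cloned-env-name> \
    --option-settings Namespace=aws:elasticbeanstalk:application:environment,OptionName=INDEX_PATH,Value=<new-index-s3-path>

# Rebuild so the new index is pulled onto the beanstalk machines
aws elasticbeanstalk rebuild-environment --environment-name <cloned-env-name>

# Check health until it reports "Green"
aws elasticbeanstalk describe-environments --environment-names <cloned-env-name> \
    --query 'Environments[0].Health'

# After QA, swap URLs with the currently active environment
aws elasticbeanstalk swap-environment-cnames \
    --source-environment-name <current-env-name> \
    --destination-environment-name <cloned-env-name>
```
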
Other AWS Related Admin Tasks
-----------------------------
When new harvester or ingest code is pushed, you need to create a new generation
of worker machines to pick up the new code.
First, terminate the existing machines:

    ansible-playbook -i ~/code/ec2.py ~/code/ingest_deploy/ansible/terminate_workers.yml <--limit=10.60.?.?>

Then go through the worker create process again, creating and provisioning
machines as needed.
What to do when harvests fail
-----------------------------
First, take a look at the RQ Dashboard. A portion of the error message will be
shown there. Hopefully this identifies the error so you can fix whatever is
going wrong.
If you need more extensive access to logs, they are all stored on the AWS
CloudWatch platform. Go to the CloudWatch page in the AWS console and choose the
"Logs" page.
![ucldc-cw logs](docs/images/screen_shot_cloudwatch_logs_page.png)
The /var/local/rqworker and /var/local/akara log groups contain the logs from
the worker processes and the Akara server on a worker instance.
The log streams are named with the instance id and IP address, e.g. ingest-stage-i-127546c9-10.60.28.224
![ucldc-cw rqworker-log-page](docs/images/screen_shot_cloudwatch_rqworker-logs-page.png)
You will probably need to sort by "Last Event Time" to find the most recent
logs. Find the log stream of interest by IP or instance id and click through.
You will then see the logs for that worker instance:
![ucldc-cw rqworker-log](docs/images/screen_shot_cloudwatch_rqworker-log.png)
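
If you prefer the command line, the same logs can be pulled with the awscli (a sketch, assuming the CLI is configured with CloudWatch Logs read access; the stream name is the example from above):

```bash
# Find the most recently active rqworker log streams
aws logs describe-log-streams --log-group-name /var/local/rqworker \
    --order-by LastEventTime --descending --max-items 5

# Fetch recent events from a specific worker's stream
aws logs get-log-events --log-group-name /var/local/rqworker \
    --log-stream-name ingest-stage-i-127546c9-10.60.28.224 --limit 100
```
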
If this doesn't give you enough information, you can ssh to a worker instance
and watch the logs in real time with `tail -f /var/local/rqworker/worker.log`
or `tail -f /var/local/akara/logs/error.log`.
License
=======
Copyright © 2015, Regents of the University of California
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
- Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
- Neither the name of the University of California nor the names of its
contributors may be used to endorse or promote products derived from this
software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.