Setup Nightly Backup Job #1

Closed
jdimatteo opened this issue Aug 31, 2013 · 19 comments

@jdimatteo
Member

run backups at 3 AM

ask Charles:

  • what needs to be backed up
  • where it should be backed up to
  • how many days of backups to keep (e.g. a backup for last month, last week, and daily backups for last 5 days)
  • confirm 3 AM nightly

initially just set up an rsync job triggered by cron or Jenkins

document how the backups are done and how to do a restore. probably set up a wiki on this SystemAdmin repo

look into why we are backing up files, how important it is that the backups are available, and whether a simple rsync is really the best plan forward
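
A minimal sketch of what such a cron-triggered rsync job could look like, assuming dated snapshot directories and a retention of five daily copies. The function names, the script path in the crontab line, and the retention count are illustrative assumptions, not settings decided in this issue.

```shell
#!/bin/sh
# Sketch only: paths, retention count, and function names are assumptions.
# A crontab entry like this would run the job at 3 AM every night:
#   0 3 * * * /usr/local/bin/nightly-backup.sh

# Delete the oldest dated snapshot directories (YYYY-MM-DD) under $1,
# keeping the newest $2. Other entries under $1 are left alone.
prune_snapshots() {
    root=$1
    keep=$2
    pattern='^[0-9]{4}-[0-9]{2}-[0-9]{2}$'
    total=$(ls -1 "$root" | grep -c -E "$pattern" || true)
    excess=$((total - keep))
    [ "$excess" -gt 0 ] || return 0
    # ISO dates sort chronologically, so the first $excess names are the oldest.
    ls -1 "$root" | grep -E "$pattern" | sort | head -n "$excess" |
    while IFS= read -r name; do
        rm -rf "${root:?}/$name"
    done
}

# Copy $1 into a directory named after today's date under $2, then prune
# down to $3 snapshots (default 5).
nightly_backup() {
    src=$1
    dest_root=$2
    keep=${3:-5}
    today=$(date +%Y-%m-%d)
    mkdir -p "$dest_root/$today" || return 1
    # -a preserves permissions, times, and symlinks (and ownership when run as root).
    rsync -a "$src/" "$dest_root/$today/" || return 1
    prune_snapshots "$dest_root" "$keep"
}
```

The script would end with a call such as `nightly_backup /ark /crusader/backup/rsync 5` (hypothetical paths). Each snapshot here is a full copy, which is simple but uses a lot of space.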

@ghost ghost assigned jdimatteo Aug 31, 2013
@charlesylin
Member

John,

These issue tickets look like they will be super useful. Glad we could have lunch today! Thanks again for all your help.

-Charles


@jdimatteo
Member Author

I'm testing out an Amanda configuration, and I'll probably set it up on TOD on Tuesday.

$ sudo apt-get install amanda-server amanda-client

I'm probably going to set up a dedicated gmail account for tod sys admin, so that this account is used for email updates for backups (e.g. to notify if a backup didn't occur) and this could also be used by Jenkins (which is currently just using my personal email account).

@charlesylin
Member

Thanks John,

Am reading up on AMANDA now. Sounds interesting, but I'm worried how hard
it will be to support it over the long run. Should we be using something
like rdiff to keep it simple? What about crashplan?

http://rdiff-backup.nongnu.org/
https://www.crashplanpro.com/business/store.vtl

-Charles

Charles Y. Lin, Ph.D.
Dana-Farber Cancer Institute
Department of Medical Oncology
[email protected]:[email protected]
http://bradnerlab.com


@charlesylin
Member

So crashplan won't work well with network drives, but amazon glacier works.

My buddies are using rdiff-backup to make constant snapshots for storage on
local backup and then using duplicity as a way to make big tarballs that
can be pushed to glacier.

http://duplicity.nongnu.org/

Ideally, we should have weekly snapshots backed up to crusader and then
monthly snapshots sent out to glacier.

-Charles


@jdimatteo
Member Author

During the Amanda install, I was prompted for postfix configuration. However, I installed ssmtp afterwards, which automatically removed postfix.

I created a new gmail account [email protected], and followed the instructions at http://askubuntu.com/a/91857 :

jdimatteo@TOD-Test:~$ sudo apt-get install ssmtp
jdimatteo@TOD-Test:~$ sudo vim /etc/ssmtp/ssmtp.conf
jdimatteo@TOD-Test:~$ cat /etc/ssmtp/ssmtp.conf
#
# Config file for sSMTP sendmail
#
# The person who gets all mail for userids < 1000
# Make this empty to disable rewriting.
[email protected]

# The place where the mail goes. The actual machine name is required no 
# MX records are consulted. Commonly mailhosts are named mail.domain.com
mailhub=smtp.gmail.com:465

# Where will the mail seem to come from?
rewriteDomain=gmail.com

# The full hostname
hostname=TOD-Test

# Are users allowed to set their own From: address?
# YES - Allow the user to specify their own From: address
# NO - Use the system generated From: address
FromLineOverride=YES

[email protected]
# gmail password should go here: AuthPass=
UseTLS=Yes
jdimatteo@TOD-Test:~$

Now system emails are sent using the [email protected] account, e.g.

jdimatteo@TOD-Test:~$ echo hello | mail [email protected] -s "Test"
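
With mail delivery working, a backup wrapper could send a message only when a backup fails. This is a rough sketch under assumptions: `run_and_notify` and the `MAIL_CMD` override are hypothetical names, and the recipient address is a placeholder.

```shell
#!/bin/sh
# Sketch only: run_and_notify and MAIL_CMD are hypothetical names.
# MAIL_CMD defaults to the same "mail" command used in the test above.

# Run the command given after $1. If it fails, mail its output to $1.
# Always returns the command's own exit status.
run_and_notify() {
    recipient=$1
    shift
    status=0
    output=$("$@" 2>&1) || status=$?
    if [ "$status" -ne 0 ]; then
        printf '%s\n' "$output" |
            ${MAIL_CMD:-mail} -s "Backup failed (exit $status)" "$recipient"
    fi
    return "$status"
}
```

For example, `run_and_notify admin@example.com rdiff-backup /ark/ /crusader/backup/rdiff-backup/ark` would stay silent on success and mail the captured output on failure.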

@jdimatteo
Member Author

I haven't used rdiff-backup, but it looks a lot simpler than Amanda, so it should be easier to maintain in the long run (especially if others at Dana-Farber are using it as well).

I'm going to experiment with rdiff-backup and duplicity tomorrow.

@jdimatteo
Member Author

Charles / @bradnerComputation :

I'm going to go ahead and set up rdiff-backup on TOD.

You mentioned a mysql database on /ark -- does that exist already? If so, please let me know the database name(s) that should be backed up.

Please note that rdiff-backup doesn't support a full system backup, so I'm just going to back up the 4 directories you mentioned earlier (/ark, /raider, /mnt/d0-0/share/bradnerlab/, and /ifs/labs/bradner/).

I expect the initial backup to take a while to transfer over the network (possibly days, depending on the local network speed). The initial backup will use a lot of local network bandwidth -- please let me know if there is any concern over using too much bandwidth and if I should throttle the transfer (I suggest we just let the backup go un-throttled and only throttle it if someone complains). Also, please let me know if you plan to restart TOD before the backups complete (I'll let you know when it is complete). After the initial backup is complete, subsequent backups should be fast and use little bandwidth (since they will be incremental backups, only copying over the changes).

I'm going to create a Jenkins Backup job to run automatic weekly backups (probably on Saturdays at 3 AM). (You can also manually trigger a backup if you like by clicking the build button for the Backup job.) If any of these backups fail, an email will be sent to both you and me. I'm going to configure Jenkins security to make the backup jobs visible/runnable only by you and me, and configurable only by the bradneradmin user.
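
As a rough sketch, the shell build step of such a Jenkins job might look like the following. `backup_all` and the `RDIFF_BACKUP` override are hypothetical names, and the destination layout (one subdirectory per source directory) is an assumption.

```shell
#!/bin/sh
# Sketch of a Jenkins "Execute shell" build step. backup_all and RDIFF_BACKUP
# are hypothetical names. In Jenkins "Build periodically", the schedule
# "0 3 * * 6" means Saturdays at 3 AM.

# Back up every directory given after $1 into $1/<directory name>.
# Returns non-zero if any backup failed, so Jenkins marks the build as failed.
backup_all() {
    dest_root=$1
    shift
    failed=0
    for src in "$@"; do
        name=$(basename "$src")
        if ! ${RDIFF_BACKUP:-rdiff-backup} "$src" "$dest_root/$name"; then
            echo "backup of $src failed" >&2
            failed=1
        fi
    done
    return "$failed"
}
```

A call such as `backup_all /crusader/backup/rdiff-backup /ark /raider /mnt/d0-0/share/bradnerlab/ /ifs/labs/bradner/` keeps going after one directory fails, so a single bad mount does not block the other backups. The failure email relies on Jenkins email notification being configured for failed builds.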

On a github wiki page, I'm going to document the backup configuration and detailed instructions (with examples) of how to restore from backup.

After I get the rdiff-backups running, configured, and documented I'll start working on evaluating/planning duplicity backups to Amazon Glacier.

Please let me know if you'd like me to hold off on any of this, or if you have any concerns or questions.

@charlesylin
Member

John,

This sounds good.

  1. The mysql database is going in today. Will let you know what we decide to call it.
  2. Could you add the root / directory to the backup? I think all we have there is the OS and programs.
  3. Let's try an unthrottled backup starting Friday night and see what happens.

This looks great, thanks again for your help!

@jdimatteo
Member Author

Charles, I'm going to go ahead and start the backups tonight. Please let me know if you'd like me to hold off, whether you have set up the database yet, and whether any TOD/network/crusader downtime is planned.


@charlesylin
Member

Thanks John. Let's only back up /ark at first to make sure everything goes well.

We have just recently created a new MySQL database called seqDB. This should be stored on /ark.

-Charles
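
Copying a live MySQL data directory can capture files mid-write, so one common approach is to dump the database to a file on /ark just before the backup runs. The following is only a sketch: `dump_database` and the `MYSQLDUMP` override are hypothetical names, the output directory is an assumption, and credentials are assumed to come from the invoking user's MySQL option file.

```shell
#!/bin/sh
# Sketch only: dump_database and MYSQLDUMP are hypothetical names.

# Dump database $1 to $2/<name>.sql.
dump_database() {
    db=$1
    out_dir=$2
    out="$out_dir/$db.sql"
    mkdir -p "$out_dir" || return 1
    # Write to a temporary name first so a failed dump never replaces a good one.
    # --single-transaction gives a consistent snapshot for InnoDB tables.
    if ${MYSQLDUMP:-mysqldump} --single-transaction "$db" > "$out.tmp"; then
        mv "$out.tmp" "$out"
    else
        rm -f "$out.tmp"
        return 1
    fi
}
```

For example, `dump_database seqDB /ark/db-dumps` (the directory name is made up) would leave a plain SQL file that the regular /ark backup then picks up.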


@jdimatteo
Member Author

I installed rdiff-backup:

jdm@tod:~$ sudo apt-get install rdiff-backup

But the crusader mount doesn't have fully functioning file permissions, e.g. when I touch a file I get an error and it shows the wrong user owner:

jdm@tod:/crusader$ mkdir jd-tmp
jdm@tod:/crusader$ ls -l
total 0
drwxrwxrwx 2 1026 users 0 Sep 28 00:11 jd-tmp
drwxrwxrwx 2 1026 users 0 Sep 16 09:46 test
jdm@tod:/crusader$ touch jd-tmp/a
touch: cannot touch `jd-tmp/a': Permission denied
jdm@tod:/crusader$ ls -l jd-tmp
total 0
-rw-rw-r-- 1 1026 users 0 Sep 28 00:11 a
jdm@tod:/crusader$ mount | grep crusader
//crusader.dfci.harvard.edu/data on /crusader type cifs (rw)
jdm@tod:/crusader$ 

Looking through the bradneradmin .bash_history, it looks like crusader was mounted like this:

sudo mount.cifs //crusader.dfci.harvard.edu/data /crusader -o credentials=/home/bradneradmin/.crusader_credentials

It looks like maybe the permissions aren't (as much of) an issue for root:

root@tod:/crusader# mkdir jd-tmp2
root@tod:/crusader# ls -l
total 0
drwxrwxrwx 2 1026 users 0 Sep 28 00:35 jd-tmp2
drwxrwxrwx 2 1026 users 0 Sep 16 09:46 test
root@tod:/crusader# touch jd-tmp2/a
root@tod:/crusader# ls -l jd-tmp2/
total 0
-rw-r--r-- 1 1026 users 0 Sep 28 00:36 a
root@tod:/crusader#

In a screen session, the ark backup has started:

root@tod:~# mkdir -p /crusader/backup/rdiff-backup/
root@tod:~# rdiff-backup /ark/ /crusader/backup/rdiff-backup/ark

@jdimatteo
Member Author

The backup resulted in many errors such as the following, so I aborted it:

OSError while renaming /crusader/backup/rdiff-backup/ark/home/af661/ressrv19/projects/athero/rose/;069;067_;066;082;0684_;067;079;078_;082;079;083;069/mapped;071;070;070/rdiff-backup.tmp.1261 to /crusader/backup/rdiff-backup/ark/home/af661/ressrv19/projects/athero/rose/;069;067_;066;082;0684_;067;079;078_;082;079;083;069/mapped;071;070;070/;069;067_;066;082;0684_;067;079;078_peaks_12;075;066_;083;084;073;084;067;072;069;068_;084;083;083_;068;073;083;084;065;076_;0723;07527;065c_;071;083;077733691_;083;082;082227461-;071;083;077733691_;083;082;082227462-;071;083;077733691_;083;082;082227463.hg18.bwt.sorted.bam_;077;065;080;080;069;068.gff
UpdateError home/af661/ressrv19/projects/athero/rose/EC_BRD4_CON_ROSE/mappedGFF/rdiff-backup.tmp.1261 [Errno 2] No such file or directory

I think this error is related to the crusader CIFS deficiencies with rdiff-backup, e.g. as described here: http://rdiff-backup.nongnu.org/FAQ.html#cifs
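
To avoid repeating this, a backup script could check how the mirror directory is mounted before starting. This is a sketch under assumptions: `fstype_of_mount` and `check_mirror_fs` are hypothetical helper names, and the mount table is read in the `/proc/mounts` format (device, mount point, type, options).

```shell
#!/bin/sh
# Sketch only: fstype_of_mount and check_mirror_fs are hypothetical helpers.

# Print the filesystem type of mount point $1, reading the mount table $2
# (default /proc/mounts). Prints an empty line if $1 is not a mount point.
fstype_of_mount() {
    awk -v mp="$1" '$2 == mp { fstype = $3 } END { print fstype }' \
        "${2:-/proc/mounts}"
}

# Fail if the mirror mount point $1 is a CIFS/smbfs mount or is not mounted.
check_mirror_fs() {
    fstype=$(fstype_of_mount "$1" "${2:-/proc/mounts}")
    case $fstype in
        cifs|smbfs)
            echo "$1 is mounted as $fstype; not using it as a mirror" >&2
            return 1
            ;;
        '')
            echo "$1 is not a mount point" >&2
            return 1
            ;;
    esac
    return 0
}
```

Running `check_mirror_fs /crusader` before rdiff-backup would stop the job early for the CIFS mount shown above, and would also catch the case where the share is not mounted at all.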

@jdimatteo
Member Author

Charles / @bradnerComputation : is it possible to mount crusader using something besides CIFS/SAMBA? For example, is it possible to mount it as NFS? This would make both backups and restores simpler.

I'm going to hold off on resuming the backup until I hear back from you. I could probably get CIFS working, but it looks like it might be a headache according to the rdiff-backup documentation. According to http://rdiff-backup.nongnu.org/FAQ.html#cifs , "Using a CIFS or smbfs mount as the mirror directory has been troublesome for some users because of the wide variety of Samba configurations."

Please feel free to forward me the make/model of the storage array and I can help evaluate any options besides CIFS. Thanks

@charlesylin
Member

John,

NFS mounting of crusader is easy. I'll try to get that working monday. The
problem is that a lot of the files we need to back up are cifs or smb
mounted like home/af661/ressrv19/projects/athero/rose/

Do you think NFS mounting crusader will solve these problems?


@jdimatteo
Member Author

Charles, yes, I think NFS mounting will solve these problems.

The rdiff-backup documentation suggests that CIFS/Samba is only a problem on the backup mirror (i.e. /crusader), but the files to back up (i.e. /ark) are fine on CIFS/Samba.


@charlesylin
Member

John,

Through much brute strength and ignorance, I think I was able to nfs mount crusader. Have a look at the /etc/fstab to see if I did it correctly.

Also, the way I have the system set up now, different users are mounting the same drive through their smb accounts.

For instance /ark/home/af661/ressrv19/projects/ is the same place as /ark/home/cl512/ressrv19/projects/

Will this be problematic w/ rdiff and cause multiple copies? If so, we can target the backup more carefully.

-Charles

@jdimatteo
Member Author

Hi Charles,

The ark backup completed. It was only about 12 GB, so it just took a few minutes. I included the "--exclude-other-filesystems" option to rdiff-backup, so none of the ressrv19/projects/ mounts were included in the backup. Do any of the mounts under /ark/ (e.g. /ark/home/af661/ressrv19/projects/) need to be backed up to /crusader?

I included some notes below.

I'll set up a Jenkins job to run this backup every Friday, along with backing up the other directories we discussed. I'll document the backup/restore procedure once it is configured in Jenkins.

Regards,
John

Notes

root@tod:~# rdiff-backup --exclude-other-filesystems /ark/ /crusader/backup/rdiff-backup/ark
root@tod:~# echo $?
0
root@tod:~# 

Simulating losing a file:

jdm@tod:~$ mv extended-choice-parameter.hpi Gunk/

Restore the file:

root@tod:~# ls -l /crusader/backup/rdiff-backup/ark/home/jdm/extended-choice-parameter.hpi
-rw-r--r-- 1 nobody nogroup 48176 Sep 13 23:44 /crusader/backup/rdiff-backup/ark/home/jdm/extended-choice-parameter.hpi
root@tod:~# rdiff-backup -r now /crusader/backup/rdiff-backup/ark/home/jdm/extended-choice-parameter.hpi /ark/home/jdm/extended-choice-parameter.hpi

Verify restore occurred properly:

jdm@tod:~$ ls -l extended-choice-parameter.hpi 
-rw-r--r-- 1 jdm jdm 48176 Sep 13 23:44 extended-choice-parameter.hpi
jdm@tod:~$ ls -l Gunk/extended-choice-parameter.hpi 
-rw-r--r-- 1 jdm jdm 48176 Sep 30 22:11 Gunk/extended-choice-parameter.hpi
jdm@tod:~$ diff extended-choice-parameter.hpi Gunk/extended-choice-parameter.hpi 
jdm@tod:~$ echo $?
0
jdm@tod:~$ 

The /crusader NFS permissions still seem a little screwy (all users are mapped to user nobody and group nogroup). rdiff-backup handles this nicely though, and restored files have the correct users. So I guess the /crusader mount configuration is OK.
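
The restore and verification steps above could be wrapped into one helper for the restore documentation. This is only a sketch: `restore_latest` and the `RDIFF_BACKUP` override are hypothetical names, and it assumes the mirror keeps the newest version of each file as a plain file, as the `ls -l` output above suggests.

```shell
#!/bin/sh
# Sketch only: restore_latest and RDIFF_BACKUP are hypothetical names.

# Restore the newest backed-up version of mirror file $1 to $2, then confirm
# that the restored contents match the mirror copy.
restore_latest() {
    mirror_path=$1
    target_path=$2
    ${RDIFF_BACKUP:-rdiff-backup} -r now "$mirror_path" "$target_path" || return 1
    # cmp -s is silent and exits non-zero if the two files differ.
    cmp -s "$mirror_path" "$target_path"
}
```

For example: `restore_latest /crusader/backup/rdiff-backup/ark/home/jdm/extended-choice-parameter.hpi /ark/home/jdm/extended-choice-parameter.hpi`. To go back to an older version, rdiff-backup accepts a time such as `-r 3D` (three days ago) in place of `now`; in that case the comparison against the mirror copy would not apply.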

@jdimatteo
Member Author

@bradnerComputation / Charles:

The backup to /crusader is configured to run automatically every Saturday. I documented the configuration here (including a list of directories/databases being backed up and where they are being backed up to):

https://github.com/BradnerLab/SystemAdmin/wiki#backups

The initial backup is still running, and should complete later today.

A few questions/comments:

  1. Would you prefer a daily incremental backup? For example, I could configure the backups to occur daily at 3 AM (instead of weekly). The backups should be reasonably fast after the initial backup completes.
  2. The backups are currently restricted to a single file system for each directory. For example, for the /ark backup, files mounted from another file system (like /ark/home/af661/ressrv19/projects/) are not included in the backup. Is this OK?
  3. Please consider updating the NFS crusader configuration so that TOD file permissions are maintained. Currently all files are owned by "nobody" and viewable by anybody. This creates a loophole in file permissions: regardless of what the permissions are in your home directory, everyone on TOD can view any files by looking under /crusader. I think the following instructions indicate that the right option would be "No Mapping" for the "Root Squash" field on the "Edit NFS privileges" page for crusader (assuming crusader is Synology based, which it seemed to be from my poking around): http://www.synology.com/support/tutorials_show.php?q_id=566&lang=enu . This would also allow easier restores, since the root user could just copy the files normally (e.g. with "cp") instead of using "rdiff-backup -r now" (which is currently required to restore the files with the correct permissions). If we can't get the NFS configuration to maintain permissions properly, we could at least close the security loophole by mounting crusader in a subdirectory that users cannot view (e.g. /crusader-dir/crusader with only root able to read /crusader-dir/, so that /crusader-dir/crusader can continue to have the wacky nobody permissions).
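
The fallback in item 3 (hiding the mount under a root-only directory) could be sketched as follows. `make_private_parent` is a hypothetical helper name, and the `/crusader-dir` path is just the example used above.

```shell
#!/bin/sh
# Sketch only: make_private_parent is a hypothetical helper.

# Create directory $1 (if needed) and make it accessible to its owner only.
make_private_parent() {
    parent=$1
    mkdir -p "$parent" || return 1
    # Mode 700: only the owner (root, when run as root) can list or enter it,
    # so nothing mounted below it is reachable by other users.
    chmod 700 "$parent"
}
```

Run as root, `make_private_parent /crusader-dir && mkdir -p /crusader-dir/crusader` would prepare the new mount point. The crusader entry in /etc/fstab and the backup destination paths would then need to point at /crusader-dir/crusader.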

I plan to set up the offsite backups separately next week (or perhaps later) with #3 .

PS
FYI, I'm off from work today, which is why I'm setting this up mid-day instead of during my train commute.

@jdimatteo
Member Author

I discussed with Charles on the phone on Monday, and based on that I made the following changes:
