Running Big Data Software on Traditional HPC Clusters
Albert Chu
Updated June 1, 2015
What is Magpie?
---------------
Magpie contains a number of scripts for running Big Data software in
HPC environments. Thus far, Hadoop, Hbase, Pig, Spark, Storm,
Tachyon, and Zookeeper are supported. It currently supports running
over the schedulers of Slurm and Moab, and the resource managers of
Slurm and Torque. It supports running over the parallel file system
Lustre and over any generic network filesystem.
Some of the features presently supported:
- Run jobs interactively or via scripts.
- Run Mapreduce 1.0 or 2.0 jobs via Hadoop 1.0 or 2.0
- Run against a number of filesystem options, such as HDFS, HDFS over
Lustre, HDFS over a generic network filesystem, Lustre directly, or
a generic network filesystem.
- Take advantage of SSDs for local caching if available
- Run the UDA Infiniband optimization plugin for Hadoop.
- Make decent optimizations for your hardware
Some experimental features currently supported:
- Support MagpieNetworkFS - See README.magpienetworkfs
- Support intellustre - See README.intellustre
- Support HDFS federation - See README.hdfsfederation
Basic Idea
----------
The basic idea behind these scripts is to:
1) Allocate nodes on a cluster using your HPC scheduler/resource
manager. Slurm, Moab+Slurm, and Moab+Torque are currently
supported.
2) Scripts will create configuration files for all appropriate
projects (Hadoop, Hbase, etc.). The configuration files will be
set up so the rank 0 node is the "master". All compute nodes will
have configuration files created that point to the node designated
as the master server.
The configuration files will be populated with values for your
filesystem choice and the hardware that exists in your cluster.
Reasonable attempts are made to determine optimal values for your
system and hardware (they are almost certainly better than the
default values). A number of options exist to adjust these values
for individual jobs.
3) Launch daemons on all nodes. The rank 0 node will run master
daemons, such as the Hadoop Namenode or the Hbase Master. All
remaining nodes will run appropriate slave daemons, such as the
Hadoop Datanodes or Hbase RegionServers.
4) Now you have a mini big data cluster to do whatever you want. You
can log into the master node and interact with your mini big data
cluster however you want. Or you could have Magpie run a script to
execute your big data calculation instead.
5) When your job completes or your allocation time has run out, Magpie
will cleanup your job by tearing down daemons. When appropriate,
Magpie may also do some additional cleanup work to hopefully make
re-execution on later runs cleaner and faster (e.g. Hbase
compaction).
Instructions For Running Hadoop
-------------------------------
0) If necessary, download your favorite version of Hadoop off of
Apache and install it into a location where it's accessible on all
cluster nodes. Usually this is on an NFS home directory.
See below about patches that may be necessary for Hadoop depending
on your environment and Hadoop version.
See below about misc/magpie-apache-download-and-setup.sh, which may
make the downloading and patching easier.
1) Select an appropriate submission script for running your job. You
can find them in the directory submission-scripts/, with Slurm
Sbatch scripts in script-sbatch, Moab Msub+Slurm scripts in
script-msub-slurm, and Moab Msub+Torque scripts in
script-msub-torque.
You'll likely want to start with the base hadoop script
(e.g. magpie.sbatch-hadoop) for your scheduler/resource manager. If
you wish to configure more, you can choose to start with the base
script (e.g. magpie.sbatch) which contains all configuration
options.
2) Setup your job essentials at the top of the submission script. As
an example, the following are the essentials for running with Moab.
#MSUB -l nodes : Set how many nodes you want in your job
#MSUB -l walltime : Set the time for this job to run
#MSUB -l partition : Set the job partition
#MSUB -q <my batch queue> : Set to batch queue
MOAB_JOBNAME : Set your job name.
MAGPIE_SCRIPTS_HOME : Set where your scripts are
MAGPIE_LOCAL_DIR : For scratch space files
MAGPIE_JOB_TYPE : This should be set to 'hadoop'
JAVA_HOME : Location of your Java installation (Java is required).
3) Now setup the essentials for Hadoop.
HADOOP_SETUP : Set to yes
HADOOP_SETUP_TYPE : Are you running Mapreduce version 1 or 2. Or
if you are only configuring HDFS, HDFS 1 or 2.
HADOOP_VERSION : Make sure your build matches HADOOP_SETUP_TYPE
(i.e. don't say you want MapReduce 1 and point to Hadoop 2.0 build)
HADOOP_HOME : Where your hadoop code is. Typically in an NFS mount.
HADOOP_LOCAL_DIR : A small place for conf files and log files local
to each node. Typically /tmp directory.
HADOOP_FILESYSTEM_MODE : Most will likely want "hdfsoverlustre" or
"hdfsovernetworkfs". See below for details on HDFS over Lustre and
HDFS over NetworkFS.
HADOOP_HDFSOVERLUSTRE_PATH or equivalent: For HDFS over Lustre, you
need to set this. If not using HDFS over Lustre, set the
appropriate path for your filesystem mode choice.
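
As a rough sketch, the relevant portion of a Moab submission script
configured per steps 2 and 3 might look like the following. All
values (node counts, queue names, paths, versions) are illustrative
only, and the exact accepted values for settings such as
HADOOP_SETUP_TYPE should be taken from the comments in the base
script.

  #MSUB -l nodes=9
  #MSUB -l walltime=01:00:00
  #MSUB -l partition=mycluster
  #MSUB -q mybatchqueue

  # Job name; the exact mechanism for setting it may differ per base script.
  export MOAB_JOBNAME="terasort"
  export MAGPIE_SCRIPTS_HOME="/home/myusername/magpie"
  export MAGPIE_LOCAL_DIR="/tmp/${USER}/magpie"
  export MAGPIE_JOB_TYPE="hadoop"
  export JAVA_HOME="/usr/lib/jvm/jre-1.6.0-sun.x86_64/"

  export HADOOP_SETUP=yes
  # MapReduce 2 here; check the base script for the accepted values.
  export HADOOP_SETUP_TYPE="MR2"
  export HADOOP_VERSION="2.6.0"
  export HADOOP_HOME="/home/myusername/hadoop/hadoop-2.6.0"
  export HADOOP_LOCAL_DIR="/tmp/${USER}/hadoop"
  export HADOOP_FILESYSTEM_MODE="hdfsoverlustre"
  export HADOOP_HDFSOVERLUSTRE_PATH="/lustre/myusername/hdfsoverlustre/"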
4) Select how your job will run by setting HADOOP_MODE. The first
time you'll probably want to run w/ 'terasort' mode just to try
things out and make sure things are set up correctly.
After this, you may want to run with 'interactive' mode to play
around and figure things out. In the job output you will see
output similar to the following:
ssh node70
setenv HADOOP_CONF_DIR "/tmp/username/hadoop/ajobname/1081559/conf"
cd /home/username/hadoop-2.6.0
These instructions will inform you how to log in to the master node
of your allocation and how to initialize your session. Once in
your session, you can do as you please. For example, you can
interact with the Hadoop filesystem (bin/hadoop fs ...) or run a
job (bin/hadoop jar ...). There will also be instructions in your
job output on how to tear the session down cleanly if you wish to
end your job early.
Once you have figured out how you wish to run your job, you will
likely want to run with 'script' mode. Create a script that will
run your job/calculation automatically, set it in
HADOOP_SCRIPT_PATH, and then run your job. You can find an example
job script in examples/hadoop-example-job-script. See below on
"Exported Environment Variables", for information on exported
environment variables that may be useful in scripts.
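
For 'script' mode, a minimal job script is sketched below. The jar
path mirrors the Terasort example later in this document; the script
name and output path are hypothetical, and the shipped
examples/hadoop-example-job-script remains the authoritative
reference.

  #!/bin/bash
  # Hypothetical HADOOP_SCRIPT_PATH target.  Magpie runs this after the
  # daemons are up, presumably with the relevant Hadoop environment
  # variables (e.g. HADOOP_CONF_DIR) exported for the session; see
  # "Exported Environment Variables" below.
  cd ${HADOOP_HOME}
  bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar \
      teragen 50000000 my-teragen-output

In the submission script, the corresponding settings would then be
something like:

  export HADOOP_MODE="script"
  export HADOOP_SCRIPT_PATH="/home/myusername/my-hadoop-job-script.sh"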
5) Submit your job into the cluster by running "sbatch -k
./magpie.sbatchfile" for Slurm or "msub ./magpie.msubfile" for
Moab. Add any other options you see fit.
6) Look at your job output file to see your output. There will also
be some notes/instructions/tips in the output file for viewing the
status of your job in a web browser, environment variables you wish
to set if interacting with it, etc.
See below on "General Advanced Usage" for additional tips.
See below for "Hadoop Advanced Usage" for additional Hadoop tips.
Example Job Output for Hadoop running Terasort
----------------------------------------------
The following is an example job output of Magpie running Hadoop and
running a Terasort. This is run over HDFS over Lustre. Sections of
extraneous text have been left out.
While this output is specific to using Magpie with Hadoop, the output
when using Spark, Storm, Hbase, etc. is not all that different.
1) First we see that HDFS over Lustre is being set up by formatting the
HDFS Namenode.
*******************************************************
* Formatting HDFS Namenode
*******************************************************
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
15/01/28 07:17:36 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = apex70.llnl.gov/192.168.123.70
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 2.6.0
<snip>
<snip>
<snip>
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at apex70.llnl.gov/192.168.123.70
************************************************************/
2) Next we get some details of the job
*******************************************************
* Magpie General Job Info
*
* Job Nodelist: apex[70-78]
* Job Nodecount: 9
* Job Timelimit in Minutes: 60
* Job Name: terasort
* Job ID: 1081559
*
*******************************************************
3) Hadoop begins to launch and startup daemons on all cluster nodes.
Starting hadoop
15/01/28 07:18:04 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [apex70]
apex70: starting namenode, logging to /tmp/achu/hadoop/terasort/1081559/log/hadoop-achu-namenode-apex70.out
apex72: starting datanode, logging to /tmp/achu/hadoop/terasort/1081559/log/hadoop-achu-datanode-apex72.out
apex71: starting datanode, logging to /tmp/achu/hadoop/terasort/1081559/log/hadoop-achu-datanode-apex71.out
apex77: starting datanode, logging to /tmp/achu/hadoop/terasort/1081559/log/hadoop-achu-datanode-apex77.out
apex76: starting datanode, logging to /tmp/achu/hadoop/terasort/1081559/log/hadoop-achu-datanode-apex76.out
apex73: starting datanode, logging to /tmp/achu/hadoop/terasort/1081559/log/hadoop-achu-datanode-apex73.out
apex74: starting datanode, logging to /tmp/achu/hadoop/terasort/1081559/log/hadoop-achu-datanode-apex74.out
apex78: starting datanode, logging to /tmp/achu/hadoop/terasort/1081559/log/hadoop-achu-datanode-apex78.out
apex75: starting datanode, logging to /tmp/achu/hadoop/terasort/1081559/log/hadoop-achu-datanode-apex75.out
Starting secondary namenodes [apex70]
apex70: starting secondarynamenode, logging to /tmp/achu/hadoop/terasort/1081559/log/hadoop-achu-secondarynamenode-apex70.out
15/01/28 07:18:22 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
starting yarn daemons
starting resourcemanager, logging to /tmp/achu/hadoop/terasort/1081559/log/yarn-achu-resourcemanager-apex70.out
apex71: starting nodemanager, logging to /tmp/achu/hadoop/terasort/1081559/log/yarn-achu-nodemanager-apex71.out
apex72: starting nodemanager, logging to /tmp/achu/hadoop/terasort/1081559/log/yarn-achu-nodemanager-apex72.out
apex77: starting nodemanager, logging to /tmp/achu/hadoop/terasort/1081559/log/yarn-achu-nodemanager-apex77.out
apex78: starting nodemanager, logging to /tmp/achu/hadoop/terasort/1081559/log/yarn-achu-nodemanager-apex78.out
apex74: starting nodemanager, logging to /tmp/achu/hadoop/terasort/1081559/log/yarn-achu-nodemanager-apex74.out
apex75: starting nodemanager, logging to /tmp/achu/hadoop/terasort/1081559/log/yarn-achu-nodemanager-apex75.out
apex73: starting nodemanager, logging to /tmp/achu/hadoop/terasort/1081559/log/yarn-achu-nodemanager-apex73.out
apex76: starting nodemanager, logging to /tmp/achu/hadoop/terasort/1081559/log/yarn-achu-nodemanager-apex76.out
Waiting 30 seconds to allows Hadoop daemons to setup
4) Next, we see output with details of the Hadoop setup. You'll find
addresses indicating web services you can access to get detailed
job information. You'll also find information about how to login
to access Hadoop directly and how to shut down the job early if you
so desire.
*******************************************************
*
* Hadoop Information
*
* You can view your Hadoop status by launching a web browser and pointing to ...
*
* Yarn Resource Manager: http://apex70:8088
*
* Job History Server: http://apex70:19888
*
* HDFS Namenode: http://apex70:50070
* HDFS DataNode: http://<DATANODE>:50075
*
* HDFS can be accessed directly at:
*
* hdfs://apex70:54310
*
*
* To access Hadoop directly, you'll want to:
* ssh apex70
* setenv HADOOP_CONF_DIR "/tmp/achu/hadoop/terasort/1081559/conf"
* cd /home/achu/hadoop/hadoop-2.6.0
*
* Then you can do as you please. For example to interact with the Hadoop filesystem:
*
* bin/hadoop fs ...
*
* To launch jobs you'll want to:
*
* bin/hadoop jar ...
*
*
* To end/cleanup your session, kill the daemons via:
*
* ssh apex70
* setenv HADOOP_CONF_DIR "/tmp/achu/hadoop/terasort/1081559/conf"
* cd /home/achu/hadoop/hadoop-2.6.0
* sbin/stop-yarn.sh
* sbin/stop-dfs.sh
* sbin/mr-jobhistory-daemon.sh stop historyserver
*
* Some additional environment variables you may sometimes wish to set
*
* setenv JAVA_HOME "/usr/lib/jvm/jre-1.6.0-sun.x86_64/"
* setenv HADOOP_HOME "/home/achu/hadoop/hadoop-2.6.0"
*
*******************************************************
5) The job then runs Teragen
Running bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar teragen -Ddfs.datanode.drop.cache.behind.reads=true -Ddfs.datanode.drop.cache.behind.writes=true 50000000 terasort-teragen
15/01/28 07:19:01 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/01/28 07:19:01 INFO client.RMProxy: Connecting to ResourceManager at apex70/192.168.123.70:8032
15/01/28 07:19:05 INFO terasort.TeraSort: Generating 50000000 using 192
15/01/28 07:19:10 INFO mapreduce.JobSubmitter: number of splits:192
15/01/28 07:19:12 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1422458304036_0001
15/01/28 07:19:12 INFO impl.YarnClientImpl: Submitted application application_1422458304036_0001
15/01/28 07:19:12 INFO mapreduce.Job: The url to track the job: http://apex70:8088/proxy/application_1422458304036_0001/
15/01/28 07:19:12 INFO mapreduce.Job: Running job: job_1422458304036_0001
15/01/28 07:19:20 INFO mapreduce.Job: Job job_1422458304036_0001 running in uber mode : false
15/01/28 07:19:20 INFO mapreduce.Job: map 0% reduce 0%
15/01/28 07:19:31 INFO mapreduce.Job: map 1% reduce 0%
15/01/28 07:19:32 INFO mapreduce.Job: map 5% reduce 0%
<snip>
<snip>
<snip>
15/01/28 07:20:48 INFO mapreduce.Job: map 97% reduce 0%
15/01/28 07:20:49 INFO mapreduce.Job: map 98% reduce 0%
15/01/28 07:20:52 INFO mapreduce.Job: map 100% reduce 0%
15/01/28 07:22:24 INFO mapreduce.Job: Job job_1422458304036_0001 completed successfully
15/01/28 07:22:24 INFO mapreduce.Job: Counters: 31
<snip>
<snip>
<snip>
Map-Reduce Framework
Map input records=50000000
Map output records=50000000
Input split bytes=16444
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=3037
CPU time spent (ms)=313700
Physical memory (bytes) snapshot=66395398144
Virtual memory (bytes) snapshot=502137180160
Total committed heap usage (bytes)=192971538432
org.apache.hadoop.examples.terasort.TeraGen$Counters
CHECKSUM=107387891658806101
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=5000000000
6) The job then runs Terasort
Running bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar terasort -Dmapred.reduce.tasks=16 -Ddfs.replication=1 -Ddfs.datanode.drop.cache.behind.reads=true -Ddfs.datanode.drop.cache.behind.writes=true terasort-teragen terasort-sort
15/01/28 07:22:55 INFO terasort.TeraSort: starting
15/01/28 07:22:55 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/01/28 07:22:56 INFO input.FileInputFormat: Total input paths to process : 192
Spent 269ms computing base-splits.
Spent 4ms computing TeraScheduler splits.
Computing input splits took 274ms
Sampling 10 splits of 192
Making 16 from 100000 sampled records
Computing parititions took 1525ms
Spent 1801ms computing partitions.
15/01/28 07:22:57 INFO client.RMProxy: Connecting to ResourceManager at apex70/192.168.123.70:8032
15/01/28 07:23:04 INFO mapreduce.JobSubmitter: number of splits:192
15/01/28 07:23:05 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
15/01/28 07:23:06 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1422458304036_0002
15/01/28 07:23:06 INFO impl.YarnClientImpl: Submitted application application_1422458304036_0002
15/01/28 07:23:06 INFO mapreduce.Job: The url to track the job: http://apex70:8088/proxy/application_1422458304036_0002/
15/01/28 07:23:06 INFO mapreduce.Job: Running job: job_1422458304036_0002
15/01/28 07:23:11 INFO mapreduce.Job: Job job_1422458304036_0002 running in uber mode : false
15/01/28 07:23:11 INFO mapreduce.Job: map 0% reduce 0%
15/01/28 07:23:21 INFO mapreduce.Job: map 5% reduce 0%
15/01/28 07:23:22 INFO mapreduce.Job: map 65% reduce 0%
<snip>
<snip>
<snip>
15/01/28 07:23:44 INFO mapreduce.Job: map 100% reduce 97%
15/01/28 07:23:45 INFO mapreduce.Job: map 100% reduce 99%
15/01/28 07:23:46 INFO mapreduce.Job: map 100% reduce 100%
15/01/28 07:24:03 INFO mapreduce.Job: Job job_1422458304036_0002 completed successfully
15/01/28 07:24:03 INFO mapreduce.Job: Counters: 50
<snip>
<snip>
<snip>
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=5000000000
File Output Format Counters
Bytes Written=5000000000
15/01/28 07:24:03 INFO terasort.TeraSort: done
7) With the job complete, Magpie now tears down the session and cleans
up all daemons.
Stopping hadoop
stopping yarn daemons
stopping resourcemanager
apex76: stopping nodemanager
apex74: stopping nodemanager
apex77: stopping nodemanager
apex73: stopping nodemanager
apex75: stopping nodemanager
apex72: stopping nodemanager
apex71: stopping nodemanager
apex78: stopping nodemanager
no proxyserver to stop
stopping historyserver
Saving namespace before shutting down hdfs ...
Running bin/hdfs dfsadmin -safemode enter
15/01/28 07:25:58 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Safe mode is ON
Running bin/hdfs dfsadmin -saveNamespace
15/01/28 07:25:59 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Save namespace successful
Running bin/hdfs dfsadmin -safemode leave
15/01/28 07:26:03 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Safe mode is OFF
15/01/28 07:26:04 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Stopping namenodes on [apex70]
apex70: stopping namenode
apex76: stopping datanode
apex78: stopping datanode
apex74: stopping datanode
apex75: stopping datanode
apex77: stopping datanode
apex72: stopping datanode
apex73: stopping datanode
apex71: stopping datanode
Stopping secondary namenodes [apex70]
apex70: stopping secondarynamenode
Basic Instructions For Running Hadoop w/ UDA
--------------------------------------------
UDA is an Infiniband plugin to alter the shuffle/sort mechanism in
Hadoop to take advantage of RDMA. It can be found at:
https://code.google.com/p/uda-plugin/
It is currently supported in Hadoop 2.3.0 and newer. UDA can be used
with older versions of Hadoop but will require patches against those
builds. Magpie currently supports UDA in versions 2.2.0 and up.
Minor modifications to the scripts should allow support in older
versions.
To configure UDA to be used with Hadoop, you need to configure:
HADOOP_UDA_SETUP : Set to yes
HADOOP_UDA_JAR : Set to the path for the uda jar file, such as
uda-hadoop-2.x.jar for Hadoop 2.2.0.
HADOOP_UDA_LIBPATH : Set to location of libuda.so. Necessary if your
libuda.so isn't in a common location (e.g. /usr/lib).
You'll find starting point scripts such as
magpie.sbatch-hadoop-with-uda and magpie.msub-slurm-hadoop-with-uda to
begin configuration.
If you are building UDA yourself, be aware that the jar may have
difficulty finding the libuda.so file. Some appropriately placed
symlinks in your filesystem should solve that problem. Please refer
to error logs to see the paths you may be missing.
Remember that when running your job, you may also need to link the
UDA jar file to your job run, such as with "--libjars /mypath/uda-hadoop-2.x.jar".
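
Putting the pieces together, a hedged sketch of the UDA-related
settings and a job invocation might be (paths, jar location, and the
job jar/class names are illustrative only):

  export HADOOP_UDA_SETUP=yes
  export HADOOP_UDA_JAR="/home/myusername/uda/uda-hadoop-2.x.jar"
  # Only needed if libuda.so is not in a common location such as /usr/lib:
  export HADOOP_UDA_LIBPATH="/home/myusername/uda/lib"

  # When launching your own job, also pass the jar along, e.g.:
  #   bin/hadoop jar myjob.jar MyJobClass --libjars /mypath/uda-hadoop-2.x.jar ...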
Basics of HDFS over Lustre/NetworkFS
------------------------------------
Instead of using local disk, we designate a Lustre/network file system
directory to "emulate" local disk for each compute node. For example,
let's say you have 4 compute nodes. If we create the following paths
in Lustre,
/lustre/myusername/node-0
/lustre/myusername/node-1
/lustre/myusername/node-2
/lustre/myusername/node-3
We can give each of these paths to one of the compute nodes, which
they can treat like a local disk. HDFS operates on top of these
directories just as though there were a local disk on the server.
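
In a Magpie submission script you only configure the base path; a
sketch, assuming Magpie handles the per-node subdirectory assignment
underneath it (path illustrative):

  export HADOOP_FILESYSTEM_MODE="hdfsoverlustre"
  export HADOOP_HDFSOVERLUSTRE_PATH="/lustre/myusername/hdfsoverlustre/"
  # Each compute node is handed a rank-based directory under this path
  # (conceptually the node-0 ... node-3 paths above) and runs HDFS on
  # it as if it were a local disk.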
Q: Does that mean I have to constantly rebuild HDFS every time I start
a job?
A: No, using node ranks, "disk-paths" can be consistently assigned to
nodes so that all your HDFS files from a previous run can exist on
a later run. The next time you run your job, it doesn't matter
what server you're running on, b/c your scheduler/resource manager
will assign that node its appropriate rank. The node will
subsequently load HDFS from its appropriate directory.
Q: But that'll mean I have to consistently use the same number of
cluster nodes?
A: Generally speaking no, but you can hit issues if you don't. Just
imagine what issues HDFS would have if you were on a traditional
Hadoop cluster and added or removed nodes.
Generally speaking, increasing the number of nodes you use for a
job is fine. Data you currently have in HDFS is still there and
readable, but it is not viewed as "local" according to HDFS and
more network transfers will have to happen. You may wish to
rebalance the HDFS blocks though. The convenience script
hadoop-rebalance-hdfs-over-lustre-or-hdfs-over-networkfs-if-increasing-nodes-script.sh
can be used instead.
(Special Note: The start-balancer.sh that is normally used probably
will not work. All of the paths are in Lustre/NetworkFS, so the
"free space" on each "local" path is identical, messing up the
calculations for balancing, i.e. no "local disk" is more or less
utilized than another.)
Decreasing nodes is a bit more dangerous, as data can "disappear"
just like if it were on a traditional Hadoop cluster. If you try
to scale down the number of nodes, you should go through the
process of "decommissioning" nodes like on a real cluster, otherwise
you may lose data. You can decommission nodes through the
hadoop-hdfs-over-lustre-or-hdfs-over-networkfs-nodes-decomission-script.sh
convenience script.
Q: What should HDFS replication be?
A: The scripts in this package default to HDFS replication of 3 when
HDFS over Lustre is done. If HDFS replication is > 1, it can
improve performance of your job reading in HDFS input b/c there
will be fewer network transfers of data (i.e. Hadoop may need to
transfer "non-local" data to another node). In addition, if a
datanode were to die (i.e. a node crashes) Hadoop has the ability
to survive the crash just like in a traditional Hadoop cluster.
The trade-off is space and HDFS writes vs HDFS reads. With lower
HDFS replication (lowest is 1) you save space and decrease time for
writes. With increased HDFS replication, you perform better on
reads.
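
For example, the Terasort run shown earlier lowers replication for
the sort output directly on the command line; the same flag can be
passed to your own jobs:

  bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar \
      terasort -Ddfs.replication=1 terasort-teragen terasort-sort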
Q: What if I need to upgrade the HDFS version I'm using?
A: If you want to use a different Hadoop version than what you started
with, you will have to go through the normal upgrade or rollback
procedures for Hadoop.
With Hadoop versions 2.2.0 and newer, there is a seamless upgrade
path done by specifying "-upgrade" when running the "start-dfs.sh"
script. This is implemented in the "upgradehdfs" option for
HADOOP_MODE in the launch scripts.
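
In the submission script this amounts to a one-line change for a
single run (sketch; other settings stay as they were):

  export HADOOP_MODE="upgradehdfs"

After the upgrade run completes, HADOOP_MODE would presumably be
switched back to your usual mode (e.g. 'script' or 'interactive') for
subsequent jobs.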
Pro vs Con of HDFS over Lustre/NetworkFS vs. Posix FS (e.g. rawnetworkfs, etc.)
-------------------------------------------------------------------------------
Here are some pros vs. cons of using a network filesystem directly vs
HDFS over Lustre/NetworkFS.
HDFS over Lustre/NetworkFS:
Pro: Portability w/ code that runs against a "traditional" Hadoop
cluster. If it runs on a "traditional" Hadoop cluster w/ local disk,
it should run fine w/ HDFS over Lustre/NetworkFS.
Con: Must always run job w/ Hadoop & HDFS running as a job.
Con: Must "import" and "export" data from HDFS using job runs, cannot
read/write directly. On some clusters, this may involve a double copy
of data. e.g. first need to copy data into the cluster, then run job to
copy data into HDFS over Lustre/NetworkFS.
Con: Possible difficulty changing job size on clusters.
Con: If HDFS replication > 1, more space used up.
Posix FS directly:
Pro: You can read/write files to Lustre without Hadoop/HDFS running.
Pro: Less space used up.
Pro: Can adjust job size easily.
Con: Portability issues w/ code that usually runs on HDFS. As an
example, HDFS has no concept of a working directory while Posix
filesystems do. In addition, absolute paths will be different. Code
will have to be adjusted for this.
Con: User must "manage" and organize their files directly, not gaining
the block advantages of HDFS. If not handled well, this can lead to
performance issues. For example, a Hadoop job that creates a 1
terabyte file under HDFS is creating a file made up of smaller HDFS
blocks. The same job may create a single 1 terabyte file when writing
to the Posix FS directly. In the case of Lustre, striping of the file
must be handled by the user to ensure satisfactory performance.
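
As a hedged illustration of the striping point, using standard Lustre
tooling (the directory path is illustrative; stripe counts are
workload and filesystem dependent):

  # Stripe the output directory across all available OSTs before the
  # job writes its large file there, so one OST is not the bottleneck.
  lfs setstripe -c -1 /lustre/myusername/job-output
  # Verify the striping that new files in the directory will inherit.
  lfs getstripe /lustre/myusername/job-output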
Troubleshooting Hadoop
----------------------
1) When running Hadoop w/ HDFS over Lustre or a NetworkFS, if you see
errors like the following:
Waiting 30 more seconds for namenode to exit safe mode
Waiting 30 more seconds for namenode to exit safe mode
Namenode never exited safe mode, setup problem or maybe need to increase MAGPIE_STARTUP_TIME
There is one of several likely problems. Please look at the log
file for the namenode. You can find it in the master node in the
location indicated by HADOOP_LOG_DIR. You can also set
MAGPIE_POST_JOB_RUN to
post-job-run-scripts/magpie-gather-config-files-and-logs-script.sh
to gather the log file for you. See "General Advanced Usage" below
for more info.
A) If you see messages such as the following in the log
The reported blocks 169 needs additional 26 blocks to reach the threshold 0.9990 of total blocks 195.
The number of live datanodes 3 has reached the minimum number 0. Safe mode will be turned off automatically once the thresholds have been reached.
There are several possibilities.
- If the number of reported blocks continually increases and the
additional blocks decreases, HDFS's namenode may have simply not
had enough time to read all of the blocks and setup HDFS.
Increase MAGPIE_STARTUP_TIME.
- If there is no progress on the number of reported blocks or
additional blocks, HDFS's namenode cannot find the blocks it needs
to complete setup. This can be due to several reasons.
- You have decreased the number of nodes in your HDFS allocation.
Similar to if you removed nodes from a real HDFS cluster, data
has "disappeared". See above for information on using the
hadoop-hdfs-over-lustre-or-hdfs-over-networkfs-nodes-decomission-script.sh
script if you truly want to use fewer nodes.
- You have lost blocks of data due to corruption. This can be
due to a variety of reasons. See the script
hadoop-hdfs-fsck-cleanup-corrupted-blocks-script.sh to try and
fix the corruption.
B) If you see messages such as the following in the log
File system image contains an old layout version -56.
An upgrade to version -60 is required.
Please restart NameNode with the "-rollingUpgrade started" option if a rolling upgrade is already started; or restart NameNode with the "-upgrade" option to start a new upgrade.
You are likely using a newer version of Hadoop against an older
version of HDFS. If you are upgrading with Hadoop 2.2.0 or newer,
there is a seamless upgrade path. Run the "upgradehdfs" option by
setting it in HADOOP_MODE in your launch scripts.
Instructions For Using Pig
--------------------------
0) If necessary, download your favorite version of Pig off of Apache
and install it into a location where it's accessible on all cluster
nodes. Usually this is on an NFS home directory.
Make sure that the version of Pig you install is compatible with
the version of Hadoop you are using.
In some cases, a re-compile of Pig may be necessary. For example,
by default Pig 0.12.0 works against the 0.20.0 (i.e. Hadoop 1.0)
branch of Hadoop. You may need to modify the Pig build.xml to work
with the 0.23.0 branch (i.e. Hadoop 2.0).
See below about misc/magpie-apache-download-and-setup.sh, which may
make the downloading easier.
1) Select an appropriate submission script for running your job. You
can find them in the directory submission-scripts/, with Slurm
Sbatch scripts in script-sbatch, Moab Msub+Slurm scripts in
script-msub-slurm, and Moab Msub+Torque scripts in
script-msub-torque.
You'll likely want to start with the base hadoop+pig script
(e.g. magpie.sbatch-hadoop-and-pig) for your scheduler/resource
manager. If you wish to configure more, you can choose to start
with the base script (e.g. magpie.sbatch) which contains all
configuration options.
2) Setup your job essentials at the top of the submission script. As
an example, the following are the essentials for running with Moab.
#MSUB -l nodes : Set how many nodes you want in your job
#MSUB -l walltime : Set the time for this job to run
#MSUB -l partition : Set the job partition
#MSUB -q <my batch queue> : Set to batch queue
MOAB_JOBNAME : Set your job name.
MAGPIE_SCRIPTS_HOME : Set where your scripts are
MAGPIE_LOCAL_DIR : For scratch space files
MAGPIE_JOB_TYPE : This should be set to 'pig'
JAVA_HOME : Location of your Java installation (Java is required).
3) Setup the essentials for Pig.
PIG_SETUP : Set to yes
PIG_VERSION : Set appropriately.
PIG_HOME : Where your pig code is. Typically in an NFS mount.
PIG_LOCAL_DIR : A small place for conf files and log files local to
each node. Typically /tmp directory.
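
A sketch of these settings in a submission script (the version
matches the example mentioned above; paths are illustrative):

  export PIG_SETUP=yes
  export PIG_VERSION="0.12.0"
  export PIG_HOME="/home/myusername/pig-0.12.0"
  export PIG_LOCAL_DIR="/tmp/${USER}/pig"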
4) Select how your job will run by setting PIG_MODE. The first
time you'll probably want to run w/ 'testpig' mode just to try
things out and make sure things are set up correctly.
After this, you may want to run with 'interactive' mode to play
around and figure things out. In the job output you will see
output similar to the following:
ssh node70
setenv HADOOP_CONF_DIR "/tmp/achu/hadoop/ajobname/1081559/conf"
These instructions will inform you how to log in to the master node
of your allocation and how to initialize your session. Once in
your session, you can do as you please. For example, you can
launch a pig job (bin/pig ...). There will also be instructions in
your job output on how to tear the session down cleanly if you wish
to end your job early.
Once you have figured out how you wish to run your job, you will
likely want to run with 'script' mode. Create a Pig script and set
it in PIG_SCRIPT_PATH, and then run your job.
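
As a sketch, script mode then only needs the following (the script
path is hypothetical):

  export PIG_MODE="script"
  export PIG_SCRIPT_PATH="/home/myusername/my-pig-script.pig"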
5) Pig requires Hadoop, so ensure Hadoop is also configured in your
submission script. See above for Hadoop setup instructions.
6) Submit your job into the cluster by running "sbatch -k
./magpie.sbatchfile" for Slurm or "msub ./magpie.msubfile" for
Moab. Add any other options you see fit.
7) Look at your job output file to see your output. There will also
be some notes/instructions/tips in the output file for viewing the
status of your job in a web browser, environment variables you wish
to set if interacting with it, etc.
See below on "General Advanced Usage" for additional tips.
Instructions For Running Zookeeper
----------------------------------
0) If necessary, download your favorite version of Zookeeper off of
Apache and install it into a location where it's accessible on all
cluster nodes. Usually this is on an NFS home directory.
See below about misc/magpie-apache-download-and-setup.sh, which may
make the downloading easier.
1) Select an appropriate submission script for running your job. You
can find them in the directory submission-scripts/, with Slurm
Sbatch scripts in script-sbatch, Moab Msub+Slurm scripts in
script-msub-slurm, and Moab Msub+Torque scripts in
script-msub-torque.
As Zookeeper is predominantly a node coordination service used by
other services (e.g. Hbase, Storm), it's likely in the main
submission scripts for those big data projects. If the Zookeeper
section isn't in it, just copy the Zookeeper section from the base
script (e.g. magpie.sbatch) into it.
2) Setup your job essentials at the top of the submission script. See
other projects (e.g. Hbase, Storm) for details on this setup.
Be aware of how many nodes you allocate for your job when
running Zookeeper. Zookeeper normally runs on cluster nodes
separate from the rest (e.g. separate from nodes running HDFS or
Hbase Regionservers). So you may need to increase your node count.
For example, if you desire 3 Zookeeper servers and 8 compute nodes,
your total node count should be 12 (1 master, 8 compute, 3
Zookeeper).
3) Setup the essentials for Zookeeper.
ZOOKEEPER_SETUP : Set to yes
ZOOKEEPER_VERSION : Set appropriately.
ZOOKEEPER_HOME : Where your zookeeper code is. Typically in an NFS
mount.
ZOOKEEPER_REPLICATION_COUNT : Number of nodes in your Zookeeper
quorum.
ZOOKEEPER_MODE : This will almost certainly be "launch".
ZOOKEEPER_FILESYSTEM_MODE : Most will likely want "networkfs" so
data files can be stored to Lustre. If you have local disks such
as SSDs, you can use "local" instead, and set ZOOKEEPER_DATA_DIR to
the local SSD path.
ZOOKEEPER_DATA_DIR : The base path for where you will store
Zookeeper data. If a local SSD is available, it may be preferable
to set this to a local drive and set ZOOKEEPER_FILESYSTEM_MODE
above to "local".
ZOOKEEPER_LOCAL_DIR : A small place for conf files and log files
local to each node. Typically /tmp directory.
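
A sketch of these settings (version, counts, and paths are
illustrative only):

  export ZOOKEEPER_SETUP=yes
  export ZOOKEEPER_VERSION="3.4.6"
  export ZOOKEEPER_HOME="/home/myusername/zookeeper-3.4.6"
  export ZOOKEEPER_REPLICATION_COUNT=3
  export ZOOKEEPER_MODE="launch"
  export ZOOKEEPER_FILESYSTEM_MODE="networkfs"
  export ZOOKEEPER_DATA_DIR="/lustre/myusername/zookeeper"
  export ZOOKEEPER_LOCAL_DIR="/tmp/${USER}/zookeeper"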
4) Run your job as instructions dictate in other project sections
(e.g. Hbase, Storm).
Instructions For Running Hbase
------------------------------
0) If necessary, download your favorite version of Hbase off of Apache
and install it into a location where it's accessible on all cluster
nodes. Usually this is on an NFS home directory.
See below about patches that may be necessary for Hbase depending
on your environment and Hbase version.
See below about misc/magpie-apache-download-and-setup.sh, which may
make the downloading and patching easier.
1) Select an appropriate submission script for running your job. You
can find them in the directory submission-scripts/, with Slurm
Sbatch scripts in script-sbatch, Moab Msub+Slurm scripts in
script-msub-slurm, and Moab Msub+Torque scripts in
script-msub-torque.
You'll likely want to start with the base hbase w/ hdfs script
(e.g. magpie.sbatch-hbase-with-hdfs) for your scheduler/resource
manager. If you wish to configure more, you can choose to start
with the base script (e.g. magpie.sbatch) which contains all
configuration options.
2) Setup your job essentials at the top of the submission script. As
an example, the following are the essentials for running with Moab.
#MSUB -l nodes : Set how many nodes you want in your job
#MSUB -l walltime : Set the time for this job to run
#MSUB -l partition : Set the job partition
#MSUB -q <my batch queue> : Set to batch queue
MOAB_JOBNAME : Set your job name.
MAGPIE_SCRIPTS_HOME : Set where your scripts are
MAGPIE_LOCAL_DIR : For scratch space files
MAGPIE_JOB_TYPE : This should be set to 'hbase'
JAVA_HOME : Location of your Java installation (Java is required).
3) Setup the essentials for Hbase.
HBASE_SETUP : Set to yes
HBASE_VERSION : Set appropriately.
HBASE_HOME : Where your Hbase code is. Typically in an NFS
mount.
HBASE_LOCAL_DIR : A small place for conf files and log files local
to each node. Typically /tmp directory.
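
A sketch of these settings (the version matches the example session
shown below; paths are illustrative):

  export HBASE_SETUP=yes
  export HBASE_VERSION="0.98.9-hadoop2"
  export HBASE_HOME="/home/myusername/hbase-0.98.9-hadoop2"
  export HBASE_LOCAL_DIR="/tmp/${USER}/hbase"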
4) Select how your job will run by setting HBASE_MODE. The first time
you'll probably want to run w/ 'performanceeval' mode just to try
things out and make sure things are set up correctly.
After this, you may want to run with 'interactive' mode to play
around and figure things out. In the job output you will see
output similar to the following:
ssh node70
setenv HBASE_CONF_DIR "/tmp/username/hbase/ajobname/1081559/conf"
cd /home/username/hbase-0.98.9-hadoop2
These instructions will inform you how to log in to the master node
of your allocation and how to initialize your session. Once in
your session, you can do as you please. For example, you can
interact with the Hbase shell to start (bin/hbase shell). There
will also be instructions in your job output on how to tear the
session down cleanly if you wish to end your job early.
Once you have figured out how you wish to run your job, you will
likely want to run with 'script' mode. Create a script that will
run your job/calculation automatically, set it in
HBASE_SCRIPT_PATH, and then run your job. You can find an example
job script in examples/hbase-example-job-script. See below on
"Exported Environment Variables", for information on exported
environment variables that may be useful in scripts.
5) Hbase requires HDFS, so ensure Hadoop w/ HDFS is also configured
in your submission script. MapReduce is not needed with Hbase
but can be setup along with it. See above for Hadoop setup
instructions.
6) Hbase requires Zookeeper, so setup the essentials for Zookeeper.
See above for Zookeeper setup instructions.
7) Submit your job into the cluster by running "sbatch -k
./magpie.sbatchfile" for Slurm or "msub ./magpie.msubfile" for
Moab. Add any other options you see fit.
8) Look at your job output file to see your output. There will also
be some notes/instructions/tips in the output file for viewing the
status of your job in a web browser, environment variables you wish
to set if interacting with it, etc.
See below on "General Advanced Usage" for additional tips.
Hbase Notes
-----------
If you increase the size of your node allocation when running Hbase on
HDFS over Lustre or HDFS over NetworkFS, data/regions will not be
balanced over all of the new nodes. Think of this similarly to how
data would not be distributed evenly if you added new nodes into a
traditional Hbase/Hadoop cluster. Over time Hbase will rebalance data
over the new nodes.
Instructions For Spark
----------------------
0) If necessary, download your favorite version of Spark off of Apache
and install it into a location where it's accessible on all cluster
nodes. Usually this is on an NFS home directory. You may need to
set SPARK_HADOOP_VERSION and run 'sbt/sbt assembly' to prepare
Spark for execution. If you are not using the default Java
implementation installed in your system, you may need to edit
sbt/sbt to use the proper Java version you desire (this is the case
with 0.9.1, not the case in future versions).
See below about patches that may be necessary for Spark depending
on your environment and Spark version.
See below about misc/magpie-apache-download-and-setup.sh, which may
make the downloading and patching easier.
1) Select an appropriate submission script for running your job. You
can find them in the directory submission-scripts/, with Slurm
Sbatch scripts in script-sbatch, Moab Msub+Slurm scripts in
script-msub-slurm, and Moab Msub+Torque scripts in
script-msub-torque.
You'll likely want to start with the base spark script
(e.g. magpie.sbatch-spark) or spark w/ hdfs
(e.g. magpie.sbatch-spark-with-hdfs) for your scheduler/resource
manager. If you wish to configure more, you can choose to start
with the base script (e.g. magpie.sbatch) which contains all
configuration options.
It should be noted that you can run Spark without HDFS. You can
access files normally through "file://<path>".
2) Setup your job essentials at the top of the submission script. As
an example, the following are the essentials for running with Moab.
#MSUB -l nodes : Set how many nodes you want in your job
#MSUB -l walltime : Set the time for this job to run
#MSUB -l partition : Set the job partition
#MSUB -q <my batch queue> : Set to batch queue
MOAB_JOBNAME : Set your job name.
MAGPIE_SCRIPTS_HOME : Set where your scripts are
MAGPIE_LOCAL_DIR : For scratch space files
MAGPIE_JOB_TYPE : This should be set to 'spark'
JAVA_HOME : Location of your Java installation (Java is required).
3) Setup the essentials for Spark.
SPARK_SETUP : Set to yes
SPARK_VERSION : Set appropriately.
SPARK_HOME : Where your Spark code is. Typically in an NFS
mount.