macOS Silentarmy v5 #69

Open
wants to merge 12 commits into
base: master
34 changes: 32 additions & 2 deletions CHANGELOG.md
@@ -1,8 +1,38 @@
# Current tip

* Add nicehash compatibility (stratum servers fixing 17 bytes of the nonce)
* Add nerdralph's optimization (OPTIM_FOR_FGLRX)
* Implement mining.extranonce.subscribe (kenshirothefist)
* Optimization: +10% speedup, increase collision items tracked per thread
(nerdralph). 'make test' finds 196 sols again.

# Version 5 (11 Nov 2016)

* Optimization: major 2x speedup (eXtremal) by storing 8 atomic counters in
1 uint, and by reducing branch divergence when iterating over and XORing Xi's.
Note that as a result of these optimizations, sa-solver compiled with
NR_ROWS_LOG=20 now only finds 182 out of 196 existing solutions ("make test"
verification data was adjusted accordingly)
* Defaulting OPTIM_SIMPLIFY_ROUND to 1; GPU memory usage down to 0.8 GB per
instance
* Optimization: significantly reduce CPU usage and PCIe bandwidth (before:
~100 MB/s/GPU, after: 0.5 MB/s/GPU), accomplished by filtering invalid
solutions on-device
* Optimization: reduce size of collisions[] array; +7% speed increase measured
on RX 480 and R9 Nano using AMDGPU-PRO 16.40
* Implement stratum method client.reconnect
* Avoid segfault when encountering an out-of-range input
* For simplicity `-i <header>` now only accepts 140-byte headers
* Update README.md with Nvidia performance numbers
* Fix mining on Xeon Phi and CPUs (fix OpenCL warnings)
* Fix compilation warnings and 32-bit platforms
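
The "8 atomic counters in 1 uint" item above can be illustrated with a small host-side sketch. It assumes a 32-bit uint split into eight 4-bit fields; the changelog does not state the field width, and the names `increment` and `read` are hypothetical, not taken from the kernel source.

```python
BITS_PER_COUNTER = 4        # assumption: 32 bits / 8 counters
COUNTERS_PER_WORD = 8
COUNTER_MASK = (1 << BITS_PER_COUNTER) - 1


def increment(word, slot):
    """Return (new_word, old_count) after adding 1 to counter `slot`.

    Adding 1 << (slot * 4) to the whole word is what a single atomic add
    on the packed uint would do; the old field value is the slot index
    the caller would write to.
    """
    shift = slot * BITS_PER_COUNTER
    old_count = (word >> shift) & COUNTER_MASK
    new_word = (word + (1 << shift)) & 0xFFFFFFFF
    return new_word, old_count


def read(word, slot):
    """Extract the current value of counter `slot` from the packed word."""
    return (word >> (slot * BITS_PER_COUNTER)) & COUNTER_MASK
```

In this layout a counter that exceeds 15 would carry into its neighbour, so real code would have to cap or check the returned count.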

# Version 4 (08 Nov 2016)

* Add Nvidia GPU support (fix more unaligned memory accesses)
* Add nerdralph's optimization (OPTIM_SIMPLIFY_ROUND) for potential +30%
speedup, especially useful on Nvidia GPUs
* Drop the Python 3.5 dependency; now requires only Python 3.3 or above (lhl)
* Drop the libsodium dependency; instead use our own SHA256 implementation
* Add nicehash compatibility (stratum servers fixing 17 bytes of the nonce)
* Only apply set_target to *next* mining job
* Do not abandon previous mining jobs if clean_jobs is false
* Fix KeyError's when displaying stats
46 changes: 36 additions & 10 deletions Makefile
@@ -1,19 +1,41 @@
#Detect OS
UNAME := $(shell uname)
ifeq ($(UNAME), Darwin)
# Mac OS Frameworks
OPENCL_HEADERS = "/System/Library/Frameworks/OpenCL.framework/Headers/"
LIBOPENCL = "/System/Library/Frameworks/OpenCL.framework/Versions/Current/Libraries"
LDLIBS = -framework OpenCL
# Use gcc installed with Homebrew or MacPorts, because Xcode's gcc is only a clang wrapper
CC = gcc-6
else
# Change this path if the SDK was installed in a non-standard location
OPENCL_HEADERS = "/opt/AMDAPPSDK-3.0/include"
# By default libOpenCL.so is searched in default system locations, this path
# lets you add one more directory to the search path.
LIBOPENCL = "/opt/amdgpu-pro/lib/x86_64-linux-gnu"

LDLIBS = -lOpenCL
CC = gcc
CPPFLAGS = -std=gnu99 -pedantic -Wextra -Wall -ggdb \
endif
CPPFLAGS = -I${OPENCL_HEADERS}
CFLAGS = -O2 -std=gnu99 -pedantic -Wextra -Wall \
-Wno-deprecated-declarations \
-Wno-overlength-strings \
-I${OPENCL_HEADERS}
-Wno-overlength-strings
LDFLAGS = -rdynamic -L${LIBOPENCL}
LDLIBS = -lOpenCL

OBJ = main.o blake.o sha256.o
INCLUDES = blake.h param.h _kernel.h sha256.h


CPPFLAGS = -I${OPENCL_HEADERS}
CFLAGS = -O2 -std=gnu99 -pedantic -Wextra -Wall -ggdb \
-Wno-deprecated-declarations \
-Wno-overlength-strings
LDFLAGS = -rdynamic -L${LIBOPENCL}

OBJ = main.o blake.o sha256.o
INCLUDES = blake.h param.h _kernel.h sha256.h


all : sa-solver

sa-solver : ${OBJ}
@@ -27,13 +49,17 @@ _kernel.h : input.cl param.h
echo ')_mrb_";' >>$@

test : sa-solver
./sa-solver --nonces 100 -v -v 2>&1 | grep Soln: | \
diff -u testing/sols-100 - | cut -c 1-75
@echo Testing...
@if res=`./sa-solver --nonces 100 -v -v 2>&1 | grep Soln: | \
diff -u testing/sols-100 -`; then \
echo "Test: success"; \
else \
printf '%s\nTest: FAILED\n' "$$res" | cut -c 1-75 >&2; exit 1; \
fi
# When compiling with NR_ROWS_LOG != 20, the solutions it finds are
# different: testing/sols-100

clean :
rm -f sa-solver _kernel.h *.o _temp_*

re : clean all

.cpp.o :
${CC} ${CPPFLAGS} -o $@ -c $<
93 changes: 45 additions & 48 deletions README.md
@@ -2,13 +2,9 @@

Official site: https://github.com/mbevand/silentarmy

SILENTARMY is a [Zcash](https://z.cash) miner for Linux written in OpenCL with
multi-GPU support. The
[Stratum](https://github.com/str4d/zips/blob/77-zip-stratum/drafts/str4d-stratum/draft1.rst) protocol is implemented for connecting to mining pools. It runs
best on AMD GPUs but has also been reported to work on other OpenCL devices such
as Xeon Phi, Intel GPUs, and through OpenCL CPU drivers. (Nvidia GPUs are not
currently supported due to an
[issue](https://github.com/mbevand/silentarmy/issues/6).)
SILENTARMY is a free open source [Zcash](https://z.cash) miner for Linux
with multi-GPU and [Stratum](https://github.com/str4d/zips/blob/77-zip-stratum/drafts/str4d-stratum/draft1.rst) support. It is written in OpenCL and has been tested
on AMD/Nvidia/Intel GPUs, Xeon Phi, and more.

After compiling SILENTARMY, list the available OpenCL devices:

@@ -80,57 +76,40 @@ quick test/benchmark is simply:
`$ sa-solver --nonces 100`

Note: due to BLAKE2b optimizations in my implementation, if the header is
specified it must be 140 bytes and its last 12 bytes **must** be zero. For
convenience, `-i` can also specify a 108-byte nonceless header to which
`sa-solver` adds an implicit nonce of 32 zero bytes.
specified it must be 140 bytes and its last 12 bytes **must** be zero.
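
The two constraints in the note above (140 bytes, last 12 bytes zero) can be checked before a header is passed to `-i`. This is a minimal sketch; `check_header` is a hypothetical helper and it assumes the header is supplied as a hex string.

```python
HEADER_LEN = 140      # bytes, as required by `-i`
ZERO_TAIL_LEN = 12    # trailing bytes that must be zero


def check_header(header_hex):
    """Decode a hex header and enforce the two constraints from the note."""
    header = bytes.fromhex(header_hex)
    if len(header) != HEADER_LEN:
        raise ValueError("header must be %d bytes, got %d"
                         % (HEADER_LEN, len(header)))
    if any(header[-ZERO_TAIL_LEN:]):
        raise ValueError("last %d bytes of the header must be zero"
                         % ZERO_TAIL_LEN)
    return header
```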

Use the verbose (`-v`) and very verbose (`-v -v`) options to show the solutions
and statistics in progressively more and more details.

# Performance

* 47.5 Sol/s with one R9 Nano
* 45.0 Sol/s with one R9 290X
* 41.0 Sol/s with one RX 480 8GB
* 115.0 sol/s with one R9 Nano
* 75.0 sol/s with one RX 480 8GB
* (TODO: add Nvidia performance numbers)

Note: the `silentarmy` **miner** automatically achieves this performance level,
however the `sa-solver` **command-line solver** by design runs only 1 instance
of the Equihash proof-of-work algorithm causing it to underperform. One must
manually run 2 instances of `sa-solver` (eg. in 2 terminal consoles) to
achieve the same performance level as the `silentarmy` **miner**.

Troubleshooting performance issues:
* By default SILENTARMY mines with only one device/GPU; make sure to specify
all the GPUs in the `--use` option, for example `silentarmy --use 0,1,2`
if the host has three devices with IDs 0, 1, and 2.
* If some GPUs have less than ~2.4 GB of GPU memory, run
`silentarmy --instances 1` (2 instances use ~2.4 GB of GPU memory,
1 instance uses ~1.2 GB of GPU memory.)
* If you are using an AMD GPU with the **Radeon Software Crimson Edition**
driver, as opposed to the **AMDGPU-PRO** driver, then edit param.h and set
`OPTIM_FOR_FGLRX` to 1. This will improve performance by +5% and reduce
GPU memory usage from 1.2 GB per instance to 805 MB per instance. But do
**not** set it if you are using the AMDGPU-PRO driver or else it will
degrade performance by -15% or more.
* If 1 instance still requires too much memory, edit `param.h` and set
`NR_ROWS_LOG` to `19` (this reduces the per-instance memory usage to ~670 MB)
and run with `--instances 1`.
of the Equihash proof-of-work algorithm causing it to slightly underperform by
5-10%. One must manually run 2 instances of `sa-solver` (e.g. in 2 terminal
consoles) to achieve the same performance level as the `silentarmy` **miner**.

# Dependencies

SILENTARMY has primarily been tested with AMD GPUs on 64-bit Linux with
the **AMDGPU-PRO** driver (amdgpu.ko, for newer GPUs) and the **Radeon Software
Crimson Edition** driver (fglrx.ko, for older GPUs). Its only build
dependency is an OpenCL implementation.
SILENTARMY has only one build dependency: an OpenCL implementation. And it
has only one runtime dependency: Python 3.3 or later (needed to support the
use of the `yield from` syntax.)
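
For readers unfamiliar with `yield from`, the sketch below shows the generator delegation it provides. The function names are illustrative only and are not taken from the `silentarmy` source.

```python
def inner():
    """A sub-generator: yields two values, then returns a result."""
    yield 1
    yield 2
    return "done"


def outer():
    """Delegate to inner(); its return value becomes the expression value."""
    result = yield from inner()
    yield result
```

Iterating over `outer()` produces the values of `inner()` followed by its return value.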

Installation of the drivers and SDK can be error-prone, so below are
step-by-step instructions for the AMD OpenCL implementation (**AMD APP SDK**),
for Ubuntu 16.04 as well as Ubuntu 14.04 (beware: the `silentarmy` miner makes
use of Python's `ensure_future()` which requires Python 3.4.4, however Ubuntu
14.04 ships 3.4.3, therefore only the `sa-solver` tool is usable on Ubuntu
14.04.)
When running on AMD GPUs, install the **AMD APP SDK** (OpenCL implementation)
and either:
* the **AMDGPU-PRO** driver (amdgpu.ko, for newer GPUs), or
* the **Radeon Software Crimson Edition** driver (fglrx.ko, for older GPUs)

## Ubuntu 16.04
When running on Nvidia GPUs, install the Nvidia OpenCL development files,
and their binary driver.

Instructions are provided below for a few Linux versions.

## Ubuntu 16.04 / amdgpu

1. Download the [AMDGPU-PRO Driver](http://support.amd.com/en-us/kb-articles/Pages/AMDGPU-PRO-Install.aspx)
(as of 30 Oct 2016, the latest version is 16.40)
@@ -155,17 +134,28 @@ use of Python's `ensure_future()` which requires Python 3.4.4, however Ubuntu
8. Install system-wide by running as root (accept all the default options):
`$ sudo ./AMD-APP-SDK-v3.0.130.136-GA-linux64.sh`

9. Install compiler dependencies which you will need to compile SILENTARMY:
9. Install compiler dependencies in order to compile SILENTARMY:
`$ sudo apt-get install build-essential`

## Ubuntu 14.04
## Ubuntu 14.04 / fglrx

1. Install the official Ubuntu package:
`$ sudo apt-get install fglrx`
(as of 30 Oct 2016, the latest version is 2:15.201-0ubuntu0.14.04.1)

2. Follow steps 5-9 above.

## Ubuntu 16.04 / Nvidia

1. Install the OpenCL development files and the latest driver:
`$ sudo apt-get install nvidia-opencl-dev nvidia-361`

2. Either reboot, or load the kernel driver:
`$ sudo modprobe nvidia_361`

3. Install compiler dependencies in order to compile SILENTARMY:
`$ sudo apt-get install build-essential`

## Arch Linux

1. Install the [silentarmy AUR package](https://aur.archlinux.org/packages/silentarmy/).
@@ -177,9 +167,9 @@ Compiling SILENTARMY is easy:
`$ make`

You may need to specify the paths to the locations of your OpenCL C headers
and libOpenCL.so if the Makefile does not find them:
and libOpenCL.so if the compiler does not find them, eg.:

`$ make OPENCL_HEADERS=/path/here LIBOPENCL=/path/there`
`$ make OPENCL_HEADERS=/usr/local/cuda-8.0/targets/x86_64-linux/include LIBOPENCL=/usr/local/cuda-8.0/targets/x86_64-linux/lib`

Self-testing the command-line solver (solves 100 all-zero 140-byte blocks with
their nonces varying from 0 to 99):
@@ -244,6 +234,8 @@ almost certainly bits 180-199), this is also discarded as a likely invalid
solution because this is statistically guaranteed to be all inputs repeated
at least once. This check is implemented in `kernel_sols()` (see
`likely_invalids`.)
* When input references are expanded on-GPU by `expand_refs()`, the code
checks if the last (512th) input is repeated at least once.
* Finally when the GPU returns potential solutions, the CPU also checks for
invalid solutions with duplicate inputs. This check is implemented in
`verify_sol()`.
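
The duplicate-input checks listed above can be sketched on the host side as follows. This is not the code of `expand_refs()` or `verify_sol()`; the helper names are hypothetical, and a solution is assumed to be a list of 512 input indices.

```python
def last_input_repeated(indices):
    """Cheap check: does the last input also appear earlier in the list?"""
    return indices[-1] in indices[:-1]


def has_duplicate_inputs(indices):
    """Full check: is any input index used more than once?"""
    return len(set(indices)) != len(indices)
```

The first check is cheaper but only catches a subset of invalid solutions, which is why a complete check would still be needed afterwards.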
@@ -261,7 +253,12 @@ Donations welcome: t1cVviFvgJinQ4w3C2m2CfRxgP5DnHYaoFC

I would like to thank these persons for their contributions to SILENTARMY,
in alphabetical order:
* eXtremal
* kenshirothefist
* lhl
* nerdralph
* poiuty
* solardiz

# License

97 changes: 97 additions & 0 deletions TROUBLESHOOTING.md
@@ -0,0 +1,97 @@
# Troubleshooting

Follow this checklist to verify that your entire hardware and software
stack works (drivers, OpenCL, SILENTARMY).

## Driver / OpenCL installation

Run `clinfo` to list all the OpenCL devices. If it does not find all your
devices, something is wrong with your drivers and/or OpenCL stack. Uninstall
and reinstall your drivers. Here are good instructions:
https://hashcat.net/wiki/doku.php?id=frequently_asked_questions#i_may_have_the_wrong_driver_installed_what_should_i_do

## Check silentarmy

Does `./silentarmy --list` list your devices? If `clinfo` does, silentarmy
should list them as well.

## Basic operation

Run the Equihash solver `sa-solver` to solve the all-zero block. It should
report 2 solutions. Specify the device ID to test with `--use ID`:

```
$ ./sa-solver --use 0
Solving default all-zero 140-byte header
Building program
Hash tables will use 805.3 MB
Running...
Nonce 0000000000000000000000000000000000000000000000000000000000000000: 2 sols
Total 2 solutions in 205.3 ms (9.7 Sol/s)
```

Note that `sa-solver` only supports 1 device at a time. It will not recognize
a device list such as `--use 0,1,2`.
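
The Sol/s figure on the `Total` line is simply solutions divided by elapsed seconds. A minimal sketch that recomputes it from the line format shown in the sample output above (`sols_per_second` is a hypothetical helper, and the regular expression assumes that exact format):

```python
import re

TOTAL_RE = re.compile(r"Total (\d+) solutions in ([\d.]+) ms")


def sols_per_second(line):
    """Return solutions per second for a 'Total ...' line, or None."""
    match = TOTAL_RE.search(line)
    if match is None:
        return None
    solutions = int(match.group(1))
    milliseconds = float(match.group(2))
    if milliseconds == 0:
        return None
    return solutions / (milliseconds / 1000.0)
```

For the sample line, 2 solutions in 205.3 ms gives 2 / 0.2053, which rounds to 9.7 Sol/s.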

## Correct results

Verify that `make test` reports valid Equihash solutions for 100 different
blocks:

```
$ make test
Testing...
Test: success
```

It should output nothing else. If you instead see `Test: FAILED` along with a
bunch of lines with numbers, something is wrong with your hardware and/or drivers.

## Sustained operation on one device

Let the Equihash solver `sa-solver` run for multiple hours:

```
$ ./sa-solver --nonces 100000000
Solving default all-zero 140-byte header
Building program
Hash tables will use 1208.0 MB
Running...
Nonce 0000000000000000000000000000000000000000000000000000000000000000: 2 sols
Nonce 0100000000000000000000000000000000000000000000000000000000000000: 0 sols
...
```

It should not crash or hang.
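
The `Nonce` lines in the sample output suggest that the nonce counter is written little-endian into a 32-byte field (the second nonce prints as `01` followed by zeros). That is an inference from the output, not something confirmed here against the solver source. Under that assumption, the expected hex string for a given counter can be sketched as follows (`nonce_hex` is a hypothetical helper):

```python
import binascii

NONCE_LEN = 32  # bytes; the sample output prints 64 hex digits per nonce


def nonce_hex(counter):
    """Hex form of `counter` stored little-endian in a 32-byte nonce."""
    raw = counter.to_bytes(NONCE_LEN, "little")
    return binascii.hexlify(raw).decode("ascii")
```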

## Mining

Run the miner without options. By default it will use the first device,
and connect to flypool with my donation address. These known-good parameters
should let you know easily if your machine can mine properly:

```
$ ./silentarmy
Connecting to us1-zcash.flypool.org:3333
Stratum server sent us the first job
Mining on 1 device
Total 0.0 sol/s [dev0 0.0] 0 shares
Total 48.9 sol/s [dev0 48.9] 1 share
Total 44.9 sol/s [dev0 44.9] 1 share
...
```

Verify that the number of shares increases over time.
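
Checking that the share count grows can be automated by parsing the stats lines. The sketch below assumes the exact line format shown in the sample output above; `parse_stats` and `shares_increased` are hypothetical helpers, not part of `silentarmy`.

```python
import re

STATS_RE = re.compile(r"Total ([\d.]+) sol/s.*?(\d+) shares?\s*$")


def parse_stats(line):
    """Return (sol_per_s, shares) for a stats line, or None if no match."""
    match = STATS_RE.search(line)
    if match is None:
        return None
    return float(match.group(1)), int(match.group(2))


def shares_increased(lines):
    """True if the last reported share count exceeds the first one."""
    counts = [stats[1] for stats in map(parse_stats, lines) if stats]
    return len(counts) >= 2 and counts[-1] > counts[0]
```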

## Performance

Not achieving the performance you expected?

* By default SILENTARMY mines with only one device/GPU; make sure to specify
all the GPUs in the `--use` option, for example `silentarmy --use 0,1,2`
if the host has three devices with IDs 0, 1, and 2.
* If a GPU has less than 2 GB of GPU memory, run `silentarmy --instances 1`
(1 instance uses ~0.8 GB of memory, 2 instances use ~1.6 GB of memory.)
* If 1 instance still requires too much memory, edit `param.h` and set
`NR_ROWS_LOG` to `19` (this reduces the per-instance memory usage to ~670 MB)
and run with `--instances 1`.
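
The memory guidance above can be summarised as a small decision helper. This is only a sketch of the rules of thumb in this list: the 2048 MB, 800 MB and 670 MB thresholds are taken from the approximate figures quoted above, the default `NR_ROWS_LOG` of 20 is an assumption, and `suggest_settings` is a hypothetical name.

```python
def suggest_settings(gpu_mem_mb):
    """Map GPU memory (in MB) to the settings suggested in the list above."""
    if gpu_mem_mb >= 2048:
        # Enough room for 2 instances (~1.6 GB in total).
        return {"instances": 2, "nr_rows_log": 20}
    if gpu_mem_mb >= 800:
        # One instance needs ~0.8 GB.
        return {"instances": 1, "nr_rows_log": 20}
    if gpu_mem_mb >= 670:
        # NR_ROWS_LOG=19 lowers per-instance usage to ~670 MB.
        return {"instances": 1, "nr_rows_log": 19}
    return None  # below the smallest footprint mentioned above
```

Actual usable memory depends on the driver and on other processes using the GPU, so treat the result as a starting point.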