Merge branch 'release/2.4.0'
fedelemantuano committed Apr 5, 2018
2 parents aa6e755 + b2b47d5 commit ff26341
Showing 75 changed files with 1,415 additions and 404 deletions.
13 changes: 7 additions & 6 deletions .gitignore
@@ -1,14 +1,15 @@
*.pyc
_build
_resources
.*.swp
.DS_Store
.coverage
.DS_Store
.env
.idea/
.ropeproject
SpamScope.egg-info/
_build
_resources
.vscode/
*.pyc
build/
dist/
logs/
venv/
SpamScope.egg-info/
venv/
3 changes: 1 addition & 2 deletions .travis.yml
@@ -29,8 +29,7 @@ before_install:
- sudo apt-get -y -o Dpkg::Options::="--force-confnew" install docker-ce


# Build latest images spamscope-root, spamscope-elasticsearch
# make images
# Build spamscope-elasticsearch
- if [ "$TRAVIS_BRANCH" == "master" ]; then
git clone -b $TRAVIS_BRANCH --single-branch https://github.com/SpamScope/spamscope-dockerfile-elasticsearch.git $DOCKER_ELASTICSEARCH_PATH;
cd $DOCKER_ELASTICSEARCH_PATH && docker build --build-arg SPAMSCOPE_VER=master -t $DOCKER_USERNAME/spamscope-elasticsearch . && cd -;
201 changes: 83 additions & 118 deletions README.md
@@ -2,19 +2,21 @@
[![Build Status](https://travis-ci.org/SpamScope/spamscope.svg?branch=master)](https://travis-ci.org/SpamScope/spamscope)
[![Coverage Status](https://coveralls.io/repos/github/SpamScope/spamscope/badge.svg?branch=develop)](https://coveralls.io/github/SpamScope/spamscope?branch=develop)
[![BCH compliance](https://bettercodehub.com/edge/badge/SpamScope/spamscope?branch=develop)](https://bettercodehub.com/)
[![](https://images.microbadger.com/badges/image/fmantuano/spamscope-elasticsearch.svg)](https://microbadger.com/images/fmantuano/spamscope-elasticsearch "Get your own image badge on microbadger.com")

![SpamScope](https://raw.githubusercontent.com/SpamScope/spamscope/develop/docs/logo/spamscope.png)

# Overview
SpamScope is an advanced spam analysis tool that uses [Apache Storm](http://storm.apache.org/) with [streamparse](https://github.com/Parsely/streamparse) to process a stream of mails.
To understand how SpamScope works, I suggest reading these overviews:
- [Apache Storm Concepts](http://storm.apache.org/releases/1.2.1/Concepts.html)
- [Streamparse Quickstart](http://streamparse.readthedocs.io/en/stable/quickstart.html)

## Overview
SpamScope is an advanced spam analysis tool that uses [Apache Storm](http://storm.apache.org/) with [streamparse](https://github.com/Parsely/streamparse) to process a stream of mails.

It's possible to analyze more than 5 million mails per day with a 4-core server and 4 GB of RAM (without third party analysis).
In general the first step is to run Apache Storm, then you can run the topologies on it.
SpamScope has some topologies in the [topologies folder](./topologies/), but you can create other topologies.

![Schema topology](docs/images/schema_topology.png?raw=true "Schema topology")

### Why should I use SpamScope
# Why should I use SpamScope
- It's very fast: the job is split into functionalities that work in parallel.
- It's flexible: you can choose what SpamScope has to do.
- It's distributed: SpamScope uses Apache Storm, a free and open source distributed realtime computation system.
@@ -24,118 +26,92 @@ It's possible to analyze more than 5 milions of mails for day with a 4 cores ser
- It's free and open source (for special functions you can contact me).
- It can analyze Outlook msg.

### Distributed
## Distributed
SpamScope uses Apache Storm that allows you to start small and scale horizontally as you grow. Simply add more workers.

### Flexibility
You can choose your mails input sources (with **spouts**) and your functionalities (with **bolts**).
## Flexibility
You can choose your mails input sources (with **spouts**) and your functionalities (with **bolts**).

SpamScope comes with the following bolts:
- tokenizer splits mail into tokens such as headers, body and attachments, and it can filter emails, attachments and ip addresses already seen
- phishing looks for your keywords in email and connects email to targets (bank, your customers, etc.)
- raw_mail is for all third party tools that analyze raw mails like SpamAssassin
- attachments analyzes all mail attachments and uses third party tools like VirusTotal
- network analyzes all sender ip addresses with third party tools like Shodan
- urls extracts all urls in email and attachments
- json_maker and outputs make the json report and save it
SpamScope comes with the following bolts:
- **tokenizer** splits mail into tokens such as headers, body and attachments, and it can filter emails, attachments and ip addresses already seen
- **phishing** looks for your keywords in email and connects email to targets (bank, your customers, etc.)
- **raw_mail** is for all third party tools that analyze raw mails like SpamAssassin
- **attachments** analyzes all mail attachments and uses third party tools like VirusTotal
- **network** analyzes all sender ip addresses with third party tools like Shodan
- **urls** extracts all urls in email and attachments
- **json_maker** and **outputs** make the json report and save it
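
As a rough illustration of the "already seen" filtering that the tokenizer bolt is described as doing, the sketch below hashes each payload and skips repeats. The `SeenFilter` class and its method are hypothetical names introduced here for illustration; they are not part of SpamScope, and the real bolt may work differently.

```python
import hashlib


class SeenFilter:
    """Hypothetical helper: remembers SHA-256 digests of payloads already processed."""

    def __init__(self):
        self._seen = set()

    def is_new(self, payload: bytes) -> bool:
        """Return True the first time a payload is offered, False afterwards."""
        digest = hashlib.sha256(payload).hexdigest()
        if digest in self._seen:
            return False
        self._seen.add(digest)
        return True


# The third mail is a byte-for-byte repeat of the first one.
mails = [b"raw mail A", b"raw mail B", b"raw mail A"]
mail_filter = SeenFilter()
fresh = [m for m in mails if mail_filter.is_new(m)]
print(len(fresh))
```

The same idea would apply to attachments and ip addresses, each with its own filter instance.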

### Store where you want
## Store where you want
You can build your custom output bolts and store your data in Elasticsearch, MongoDB, filesystem, etc.

### Build your topology
## Build your topology
With streamparse technology you can build your topology in Python, add and/or remove spouts and bolts.
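
To make the spout and bolt idea concrete, here is a conceptual sketch that chains plain Python generators. It deliberately does not use the streamparse API; all function names (`mail_spout`, `tokenizer_bolt`, `json_bolt`, `build_topology`) are hypothetical and only mimic how mails flow from a source through interchangeable processing steps.

```python
import json


def mail_spout(raw_mails):
    """Stand-in for a spout: emits one raw mail at a time."""
    for raw in raw_mails:
        yield raw


def tokenizer_bolt(stream):
    """Stand-in for a bolt: splits each raw mail into headers and body."""
    for raw in stream:
        headers, _, body = raw.partition("\n\n")
        yield {"headers": headers, "body": body}


def json_bolt(stream):
    """Stand-in for a bolt: serializes each item as a JSON report."""
    for item in stream:
        yield json.dumps(item, sort_keys=True)


def build_topology(raw_mails, bolts):
    """Chain the spout with the chosen bolts, in order."""
    stream = mail_spout(raw_mails)
    for bolt in bolts:
        stream = bolt(stream)
    return stream


reports = list(build_topology(["Subject: hi\n\nhello"], [tokenizer_bolt, json_bolt]))
print(reports[0])
```

Adding or removing a step means changing only the list passed to `build_topology`, which is the kind of flexibility described above. In a real topology Apache Storm distributes the steps across workers instead of running them in one process.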

### API
## API
For now SpamScope doesn't have its own API, because it isn't tied to any technology.
If you use `Redis` as spout (input), you'll use Redis API to put mails in topology.
If you use `Elasticsearch` as output, you'll use Elasticsearch API to get results.

It's possible to develop a middleware API that talks with the input and the output and changes the configuration, but there isn't one yet.

### Apache 2 Open Source License
# Apache 2 Open Source License
SpamScope can be downloaded, used, and modified free of charge. It is available under the Apache 2 license.


[![Donate](https://www.paypal.com/en_US/i/btn/btn_donateCC_LG.gif "Donate")](https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick&hosted_button_id=VEPXYP745KJF2)



## SpamScope on Web
# SpamScope on Web
- [Shodan Applications & Integrations](https://developer.shodan.io/apps)
- [The Honeynet Project](http://honeynet.org/node/1329)
- [securityonline.info](http://securityonline.info/pcileech-direct-memory-access-dma-attack-software/)
- [jekil/awesome-hacking](https://github.com/jekil/awesome-hacking)

# Authors


## Output example
- [Raw example email](https://goo.gl/wMBfbF).
- [SpamScope output](https://goo.gl/MS7ugy).
- [SpamScope complete output](https://goo.gl/fr4i7C).



## Authors

### Main Author
## Main Author
Fedele Mantuano (**LinkedIn**: [Fedele Mantuano](https://www.linkedin.com/in/fmantuano/))

# Requirements
For operating system requirements you can read the [Ansible playbooks](./ansible), which go into detail.

For Python requirements you can read:
* [mandatory requirements](./requirements.txt)
* [optional requirements](./requirements_optional.txt)

## Installation
For more details please visit the [wiki page](https://github.com/SpamScope/spamscope/wiki/Installation).

Clone repository

```
$ git clone https://github.com/SpamScope/spamscope.git
```

then enter in SpamScope directory and install it:
_Thug_ is another optional requirement that is not in the requirements files. See [Thug section](#thug-optional) for more details.

```
$ python setup.py install
```
or
## Apache Storm
[Apache Storm](http://storm.apache.org/) is a free and open source distributed realtime computation system.

```
$ pip install SpamScope
```

If you want to install all optional packages:

```
$ git clone https://github.com/SpamScope/spamscope.git
$ pip install -r requirements_optional
```
## streamparse
[streamparse](https://github.com/Parsely/streamparse) lets you run Python code against real-time streams of data via Apache Storm.

Thug is not in requirements_optional. To install it go in Thug section.
## mail-parser
[mail-parser](https://github.com/SpamScope/mail-parser) is the raw email parser used by SpamScope.

### Faup
## Faup
[Faup](https://github.com/stricaud/faup) stands for Finally An Url Parser and is a library and command line tool to parse URLs and normalize fields.
To install it follow the [wiki](https://github.com/SpamScope/spamscope/wiki/Installation#faup).

### SpamAssassin (optional)
## rarlinux (optional)
[rarlinux](https://www.rarlab.com/) unarchives rar files.

## SpamAssassin (optional)
SpamScope can use [SpamAssassin](http://spamassassin.apache.org/), an open source anti-spam tool, to analyze every mail.

### Tika (optional)
SpamScope can use [Tika App](https://tika.apache.org/) to parse every attachment.
The **Apache Tika** toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF).
To install it follow the [wiki](https://github.com/SpamScope/spamscope/wiki/Installation#tika-app-optional).
To enable Apache Tika analysis, you should set it in `attachments` section.
## Apache Tika (optional)
SpamScope can use [Apache Tika](https://tika.apache.org/) to parse every attachment.
The Apache Tika toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF).
To use Apache Tika in SpamScope you must install [tika-app-python](https://github.com/fedelemantuano/tika-app-python) with `pip` and [Apache Tika](https://tika.apache.org/download.html).

### Thug (optional)
## Thug (optional)
From release v1.3 SpamScope can analyze Javascript and HTML attachments with [Thug](https://github.com/buffer/thug).
If you want to analyze the attachments with Thug, follow [these instructions](http://buffer.github.io/thug/doc/build.html) to install it and enable it in `attachments` section.
If you want to analyze the attachments with Thug, follow [these instructions](http://buffer.github.io/thug/doc/build.html) to install it. Enable it in `attachments` section of [main configuration file](./conf/spamscope.example.yml).

What is Thug? From README project:
```
Thug is a Python low-interaction honeyclient aimed at mimicing the behavior of a web browser in order to detect and emulate malicious contents.
```
> Thug is a Python low-interaction honeyclient aimed at mimicing the behavior of a web browser in order to detect and emulate malicious contents.
You can see a complete SpamScope report with Thug analysis [here](https://goo.gl/Y4kWCv).

Thug analysis can be very slow and you can have `heartbeat timeout` in Apache Storm.
Thug analysis can be very slow and you can have `heartbeat timeout` errors in Apache Storm.
To avoid any issue set `supervisor.worker.timeout.secs`:

```
@@ -144,68 +120,59 @@ nr. user agents * timeout_thug < supervisor.worker.timeout.secs

The best value for `threshold` is 1.

### VirusTotal (optional)
## VirusTotal (optional)
It's possible to add the VirusTotal report to the results (for mail attachments and sender ip address). You need a private API key.

### Shodan (optional)
## Shodan (optional)
It's possible to add the Shodan report for sender ip address to the results. You need a private API key.

### Elasticsearch (optional)
## Elasticsearch (optional)
It's possible to store the results in Elasticsearch. In this case you should install `elasticsearch` package.

### Redis (optional)
## Redis (optional)
It's possible to store the results in Redis. In this case you should install `redis` package.

# Configuration
Read the [example of main configuration file](./conf/spamscope.example.yml).
The default value where SpamScope will search the configuration file is `/etc/spamscope/spamscope.yml`, but it's possible to set the environment variable `SPAMSCOPE_CONF_FILE`:

```
$ export SPAMSCOPE_CONF_FILE=/etc/spamscope/spamscope.yml
```

## Configuration
For more details please visit the [wiki page](https://github.com/SpamScope/spamscope/wiki/Configuration) or read the comments in the files in `conf` folder.

You can decide to **filter emails, attachments and ip addresses** already analyzed. All filters are in `tokenizer` bolt section.

When you change the configuration file, SpamScope automatically reloads the new changes.

# Installation
You can use:
* [Docker images](./docker/README.md) to run SpamScope with docker engine
* [Ansible](./ansible/README.md): to install and run SpamScope on server

## Usage
# Topologies
SpamScope comes with three topologies:
- spamscope_debug (save json on file system)
- spamscope_elasticsearch
- spamscope_redis

and a general configuration file `spamscope.example.yml` in `conf/` folder.
- [spamscope_debug](./topologies/spamscope_debug.py): the output is saved as JSON files on file system.
- [spamscope_elasticsearch](./topologies/spamscope_elasticsearch.py): the output is stored in Elasticsearch indexes.
- [spamscope_redis](./topologies/spamscope_redis.py): the output is stored in Redis.


If you want to submit a SpamScope topology, use the `spamscope-topology submit` tool. For more details `spamscope-topology submit -h`:
If you want to submit a SpamScope topology, use the `spamscope-topology submit` tool. For more details [see SpamScope cli tools](src/cli/README.md):

```
$ spamscope-topology submit --topology {spamscope_debug,spamscope_elasticsearch,spamscope_redis}
```


### Important
It's very important to set the main configuration file. The default value is `/etc/spamscope/spamscope.yml`, but it's possible to set the environment variable `SPAMSCOPE_CONF_FILE`:

```
$ export SPAMSCOPE_CONF_FILE=/etc/spamscope/spamscope.yml
```

If you use Elasticsearch output, I suggest using the Elasticsearch template that comes with SpamScope.

### Apache Storm settings
It's possible to change the default settings for all Apache Storm options. I suggest these options for SpamScope:
It's possible to change the default settings for all Apache Storm options. I suggest changing these options:

- **topology.tick.tuple.freq.secs**: reload configuration of all bolts
- **topology.max.spout.pending**: Apache Storm framework will then throttle your spout as needed to meet the `topology.max.spout.pending` requirement
- **topology.sleep.spout.wait.strategy.time.ms**: max sleep for emit new tuple (mail)

For more details you can refer [here](http://streamparse.readthedocs.io/en/stable/quickstart.html).

To simplify this operation, SpamScope comes with a custom tool `spamscope-topology submit` where you can choose the values of all these parameters.

You can use `spamscope-topology submit` to do these changes.

# Important
If you are using Elasticsearch output, I suggest using the [Elasticsearch templates](./conf/templates) that come with SpamScope.

## Unittest
# Unittest
SpamScope comes with unittests for each module. In bolts and spouts there are no special features, all intelligence is in external modules.
All unittests are in `tests` folder.
All unittests are in [tests folder](tests/).

To have complete tests you should set the following environment variables:

@@ -216,21 +183,19 @@ $ export VIRUSTOTAL_APIKEY="your key"
$ export ZEMANA_ENABLED=True
$ export ZEMANA_APIKEY="your key"
$ export ZEMANA_PARTNERID="your partner id"
$ export ZEMANA_USERID="your userid"
$ export ZEMANA_USERID="your userid"
$ export SHODAN_ENABLED=True
$ export SHODAN_APIKEY="your key"
$ export SPAMASSASSIN_ENABLED=True
```
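
A test could read these variables roughly as in the sketch below. How the SpamScope test suite actually consumes them is not shown here, so the helper names `service_enabled` and `api_key`, and the case-insensitive comparison with `"true"`, are assumptions made for illustration only.

```python
import os


def service_enabled(name, environ=os.environ):
    """Return True when <NAME>_ENABLED is set to 'True' (assumed convention)."""
    return environ.get(f"{name}_ENABLED", "False").lower() == "true"


def api_key(name, environ=os.environ):
    """Return the value of <NAME>_APIKEY, or None when it is not set."""
    return environ.get(f"{name}_APIKEY")


# Example environment mirroring the exports above; real runs would use os.environ.
example_env = {"SHODAN_ENABLED": "True", "SHODAN_APIKEY": "your key"}
print(service_enabled("SHODAN", example_env))
print(service_enabled("ZEMANA", example_env))
```

A test that depends on a third party service could then be skipped whenever `service_enabled` returns False for it.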

# Output example
This is a [raw email](https://goo.gl/wMBfbF) that I analyzed with SpamScope:
- [SpamScope output](https://goo.gl/fr4i7C).

## Docker images
It's possible to use complete Docker images with Apache Storm and SpamScope. Take the following images:

- [Deps](https://hub.docker.com/r/fmantuano/spamscope-deps/): to use as base image
- [Elasticsearch](https://hub.docker.com/r/fmantuano/spamscope-elasticsearch/): integrated with Elasticsearch

This is another example with [Thug analysis](https://goo.gl/Y4kWCv).

## Screenshots
# Screenshots
![Apache Storm](docs/images/Docker00.png?raw=true "Apache Storm")

![SpamScope](docs/images/Docker01.png?raw=true "SpamScope")