Merge pull request #59 from ncsa/release/beta
Release/beta
longshuicy authored Sep 15, 2023
2 parents 2174765 + 4adbd54 commit 1a0f484
Showing 40 changed files with 1,042 additions and 200 deletions.
1 change: 0 additions & 1 deletion .gitignore
@@ -7,7 +7,6 @@ email_password.txt

# scripts
bae_docker.sh
docker-command-smile.sh
docker-command-bae.sh

# folders
7 changes: 5 additions & 2 deletions CHANGELOG.md
@@ -9,10 +9,13 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Docker build script for all components [23](https://github.com/ncsa/standalone-smm-analytics/issues/23)
- Docker compose launch script [45](https://github.com/ncsa/standalone-smm-analytics/issues/45)
- Docker compose file using traefik [46](https://github.com/ncsa/standalone-smm-analytics/issues/46)
-

### Changed
- Hard-coded RabbitMQ URL changed to env variable [18](https://github.com/ncsa/standalone-smm-analytics/issues/18)
- Modified S3 URL to env variable [21](https://github.com/ncsa/standalone-smm-analytics/issues/21)
- Renamed MinIO-related environment variables [31](https://github.com/ncsa/standalone-smm-analytics/issues/31)
- RabbitMQ handler's connection now uses dynamic credentials [41](https://github.com/ncsa/standalone-smm-analytics/issues/41)
- Docker compose file to work with new settings [42](https://github.com/ncsa/standalone-smm-analytics/issues/42)
- Docker compose file to work with new settings [42](https://github.com/ncsa/standalone-smm-analytics/issues/42)
- Updated README with docker compose information [50](https://github.com/ncsa/standalone-smm-analytics/issues/50)
- Created base image for sentiment analysis with model [55](https://github.com/ncsa/standalone-smm-analytics/issues/55)
- Created base image for named entity recognition with model [56](https://github.com/ncsa/standalone-smm-analytics/issues/56)
91 changes: 91 additions & 0 deletions README.md
@@ -0,0 +1,91 @@
[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)

# Social Media Macroscope Analytics
Welcome to the Social Media Macroscope Analytics repository!
Here, you'll find the essential analytics components that power various projects within the Social Media Macroscope
ecosystem, such as the SMILE tool, BAE tool, and more. Our analytics suite is designed to provide powerful insights
and data-driven solutions for social media research and beyond.

## Get Started
### Deploying SMILE
SMILE can be deployed using docker-compose.
This repository offers two ways to deploy it:
one uses Traefik, and the other is a conventional setup using Nginx.
[Traefik](https://traefik.io/traefik/) is a modern reverse proxy and
load balancer that makes deploying microservices easy and is designed to be
as simple as possible to operate.
It integrates with infrastructure components and configures itself automatically and dynamically.

### Using Docker Compose with Nginx (to be deprecated soon)
#### Set up environment variables
- Use the script [docker-command-smile.sh](./rabbitmq/docker-command-smile.sh)
- Alternatively, manually set the following environment variables, then start docker-compose with `docker-compose-smile.yml`
- Descriptions of the environment variables are in the script
- The settings for the local directory can be modified as needed
- Note that some of the configuration variables are optional

### Using Docker Compose with Traefik (Recommended)
- Use the script [docker-command-smile-traefik.sh](./rabbitmq/docker-command-smile-traefik.sh)
- Alternatively, manually set the following environment variables, then run docker-compose with
`docker-compose-smile-traefik.yml`
- Note that some of the configuration variables are optional

#### Optional environment variables
- System setting. Set to true to use standalone containerized SMILE.
  - DOCKERIZED=true
- If using algorithms deployed on AWS, you must use a static IP address.
  - LOCAL_ALGORITHM=true
- Single-user mode vs. multi-user mode.
  - SINGLE_USER=false
- Settings for CILogon (not required when running in single-user mode)
  - CILOGON_CLIENT_ID={{cilogon id}}
  - CILOGON_CLIENT_SECRET={{cilogon client secret}}
  - CILOGON_CALLBACK_URL={{ci logon callback url}}
- Email server configuration, which enables email notifications for long-running jobs.
  - EMAIL_HOST={{email host}}
  - EMAIL_PORT=465
  - EMAIL_FROM_ADDRESS={{email from address}}
  - EMAIL_PASSWORD={{email password}}
- MinIO access key and secret. These can be set to match the AWS S3 access key and secret.
  - AWS_ACCESSKEY={{aws_accesskey}}
  - AWS_ACCESSKEYSECRET={{aws_accesskeysecret}}
- Social media platform configurations.
  - REDDIT_CLIENT_ID={{reddit client id}}
  - REDDIT_CLIENT_SECRET={{reddit client secret}}
  - REDDIT_CALLBACK_URL={{reddit callback url}}
  - TWITTER_CONSUMER_KEY={{twitter consumer key}}
  - TWITTER_CONSUMER_SECRET={{twitter consumer secret}}
  - TWITTER_V2_CLIENT_ID={{twitter v2 client id}}
  - TWITTER_V2_CLIENT_SECRET={{twitter v2 client secret}}
  - TWITTER_V2_CALLBACK_URL={{twitter v2 callback url}}
- Cloud storage platform configurations (optional)
  - BOX_CLIENT_ID={{box client id}}
  - BOX_CLIENT_SECRET={{box client secret}}
  - DROPBOX_CLIENT_ID={{dropbox client id}}
  - DROPBOX_CLIENT_SECRET={{dropbox client secret}}
  - GOOGLE_CLIENT_ID={{google client id}}
  - GOOGLE_CLIENT_SECRET={{google client secret}}
- Clowder configurations (optional)
  - CLOWDER_BASE_URL={{clowder instance base url}}
  - CLOWDER_GLOBAL_KEY={{clowder global key}}
  - CLOWDER_ON=false (whether to enable the connection to Clowder)

## Past Version History of major SMM analytics components
- [AutoPhrase](./rabbitmq/autophrase/version.md)
- [Classification Split](./rabbitmq/classification_split/version.md)
- [Classification Train](./rabbitmq/classification_train/version.md)
- [Classification Predict](./rabbitmq/classification_predict/version.md)
- [Named Entity Recognition](./rabbitmq/name_entity_recognition/version.md)
- [Network Analysis](./rabbitmq/network_analysis/version.md)
- [Sentiment Analysis](./rabbitmq/sentiment_analysis/version.md)
- [Topic Modeling](./rabbitmq/topic_modeling/version.md)
- [Preprocessing](./rabbitmq/preprocessing/version.md)

## Contributions
We welcome contributions from the community to enhance and expand our analytics features. Whether you're an experienced
data scientist or just starting in the field, your insights and contributions can help drive innovation in
social media research.

## Contact Us
- For more information, visit [Social Media Macroscope](https://smm.ncsa.illinois.edu/).
- Contact us if you have any questions: <a href="mailto:[email protected]">[email protected]</a>
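The optional settings listed in the README include boolean-style flags (`DOCKERIZED`, `SINGLE_USER`, `CLOWDER_ON`), and the README notes that the CILogon settings are not required in single-user mode. The Python sketch below shows how a deployment script could check that rule before starting docker-compose. It is a hypothetical helper: `env_flag`, `missing_cilogon_vars`, the accepted truthy spellings, and the `SINGLE_USER` default of false are assumptions, since this commit does not show how SMILE itself parses these values.

```python
from typing import List, Mapping

# CILogon settings from the README; assumed to be needed only in multi-user mode.
CILOGON_VARS = ("CILOGON_CLIENT_ID", "CILOGON_CLIENT_SECRET", "CILOGON_CALLBACK_URL")


def env_flag(env: Mapping[str, str], name: str, default: bool = False) -> bool:
    # Hypothetical parsing rule: "true", "1", or "yes" (any case) means True;
    # a missing or empty variable falls back to `default`.
    value = env.get(name, "").strip().lower()
    if not value:
        return default
    return value in ("true", "1", "yes")


def missing_cilogon_vars(env: Mapping[str, str]) -> List[str]:
    # In single-user mode CILogon is not used, so nothing is reported missing.
    if env_flag(env, "SINGLE_USER"):
        return []
    # Otherwise report every CILogon variable that is unset or empty.
    return [name for name in CILOGON_VARS if not env.get(name)]
```

A launch script could call `missing_cilogon_vars(os.environ)` and refuse to start if the returned list is non-empty.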
10 changes: 10 additions & 0 deletions rabbitmq/autophrase/CHANGELOG.md
@@ -0,0 +1,10 @@
# Changelog
All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [0.1.4] - 2023-09-14

### Added
- Update email function to allow flexible SMTP authentication options [#49](https://github.com/ncsa/standalone-smm-analytics/issues/49)
61 changes: 42 additions & 19 deletions rabbitmq/autophrase/notification.py
@@ -4,19 +4,34 @@
import os


def notification(toaddr,case,filename,links,sessionURL):
def reformat_sessionURL(sessionURL):
# hubzero tool have sessionURL = https://{host}/session/{sessionID}/originalPath
# standalone smile have sessionURL = https://{host}/originalPath
# Split the URL by '/' and remove the last part
url_parts = sessionURL.split('/')
url_parts.pop()

# Append "/history" to the URL
url_parts.append("history")

# Reconstruct the URL
new_sessionURL = '/'.join(url_parts)

return new_sessionURL


def notification(toaddr, case, filename, links, sessionURL):
# toaddr -- email address to send to
# text content to send
# subject
host = os.environ.get('EMAIL_HOST')
port = os.environ.get('EMAIL_PORT')
port = os.environ.get('EMAIL_PORT', 25)
fromaddr = os.environ.get('EMAIL_FROM_ADDRESS')
password = os.environ.get('EMAIL_PASSWORD')

if host is not None and host != "" and \
port is not None and port != "" and\
fromaddr is not None and fromaddr != "" and \
password is not None and password != "":
port is not None and port != "" and \
fromaddr is not None and fromaddr != "":
# map the fpath component to History panel names
# local/NLP/sentiment/xxxxxxxxxxxxxxxxxxxxxxxx/ => [local,nlp,sentiment,xxxx,space]
# [local, GraphQL, reddit-post, aww, space]
@@ -76,11 +91,14 @@ def notification(toaddr,case,filename,links,sessionURL):
<p>Dear user (session ID: """ + fpath[0] + """),</p>
<p>Your Reddit Comment collection is exceeding 400 Megabyte, and is terminated due to lack of disk space.</p>
<ul>
<li>You have requested comments and replies for the Reddit Submission (Post):<b>""" + fpath[3] + """</b>. The partial comments we manage to collect and save will be compressed for you in an .zip file named <a href='""" + links + """'>""" + fpath[3] + """-comments.zip</a> (click)</li>
<li>In order to download this file, you need to first locate the original submission in the <b>HISTORY</b> page in SMILE.
<a href=""" + sessionURL + """>Go to your session...</a>
<li>You have requested comments and replies for the Reddit Submission (Post):<b>""" + \
fpath[
3] + """</b>. The partial comments we manage to collect and save will be compressed for you in an .zip file named <a href='""" + links + """'>""" + \
fpath[3] + """-comments.zip</a> (click)</li>
<li>In order to download this file, you need to first locate the original submission in the <b>Past Results</b> page in SMILE.
<a href=""" + reformat_sessionURL(sessionURL) + """>Go to your session.</a>
<ul>
<li>Go to <b>History</b></li>
<li>Go to <b>Past Results</b></li>
<li>--> under <b>""" + fpath[1] + """</b></li>
<li>--> click <b>""" + fpath[2] + """</b></li>
<li>--> then find <b>""" + fpath[3] + """</b></li>
@@ -103,12 +121,15 @@ def notification(toaddr,case,filename,links,sessionURL):
<p>Dear user (session ID: """ + fpath[0] + """),</p>
<p>Your Reddit Comment collection is ready for you!</p>
<ul>
<li>You have requested comments and replies for the Reddit Submission (Post):<b>""" + fpath[3] + """</b>. It will be compressed for you in an .zip file named <a href='""" + links + """'>"""+ fpath[3] +"""-comments.zip</a></li>
<li>In order to download this file, you need to first locate the original submission in the <b>HISTORY</b> page in SMILE.
<a href=""" + sessionURL + """>Go to your session...</a>
<li>You have requested comments and replies for the Reddit Submission (Post):<b>""" + \
fpath[
3] + """</b>. It will be compressed for you in an .zip file named <a href='""" + links + """'>""" + \
fpath[3] + """-comments.zip</a></li>
<li>In order to download this file, you need to first locate the original submission in the <b>Past Results</b> page in SMILE.
<a href=""" + reformat_sessionURL(sessionURL) + """>Go to your session.</a>
<ul>
<li>Go to <b>History</b></li>
<li>--> under <b>""" + fpath[1] +"""</b></li>
<li>Go to <b>Past Results</b></li>
<li>--> under <b>""" + fpath[1] + """</b></li>
<li>--> click <b>""" + fpath[2] + """</b></li>
<li>--> then find <b>""" + fpath[3] + """</b></li>
<li>--> click <b>VIEW</b></li>
@@ -134,10 +155,10 @@ def notification(toaddr,case,filename,links,sessionURL):
<p>Dear user (session ID: """ + fpath[0] + """),</p>
<p>Your """ + fpath[2] + """ results are ready for you! (job ID: """ + fpath[3] + """)</p>
<ul>
<li>You can view the visualization and download the results at <b>HISTORY</b> page in SMILE.
<a href=""" + sessionURL + """>Go to your session...</a>
<li>You can view the visualization and download the results at <b>Past Results</b> page in SMILE.
<a href=""" + reformat_sessionURL(sessionURL) + """>Go to your session.</a>
<ul>
<li>Go to <b>History</b></li>
<li>Go to <b>Past Results</b></li>
<li>--> under <b>""" + fpath[1] + """</b> tab</li>
<li>--> click <b>""" + fpath[2] + """</b></li>
<li>--> then find <b>""" + fpath[3] + """</b></li>
@@ -162,8 +183,10 @@ def notification(toaddr,case,filename,links,sessionURL):
msg['To'] = toaddr
msg.attach(MIMEText(html, 'html'))

server = smtplib.SMTP_SSL(host, port)
server.login(fromaddr, password)
server = smtplib.SMTP(host, int(port))
server.starttls()
if password is not None and password != "":
server.login(fromaddr, password)
server.sendmail(fromaddr, toaddr, msg.as_string())
server.quit()
else:
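The two behavioral changes in `notification.py` can be sketched in isolation: `reformat_sessionURL` replaces the last path segment of the session URL with `history`, and the sender now opens a plain `smtplib.SMTP` connection, upgrades it with `starttls()`, and logs in only when `EMAIL_PASSWORD` is set. The sketch below is a minimal, self-contained approximation, not the committed code; `send_html_email`, its `subject` parameter, and the `smtp_factory` hook are hypothetical names added so the flow can be exercised without a real mail server.

```python
import smtplib
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def reformat_sessionURL(sessionURL):
    # Same logic as the helper in this commit: drop the last path segment
    # and append "history", e.g. https://{host}/smile -> https://{host}/history
    url_parts = sessionURL.split('/')
    url_parts.pop()
    url_parts.append("history")
    return '/'.join(url_parts)


def send_html_email(host, port, fromaddr, toaddr, subject, html,
                    password=None, smtp_factory=smtplib.SMTP):
    # Hypothetical wrapper; smtp_factory lets tests substitute a fake server.
    msg = MIMEMultipart()
    msg['From'] = fromaddr
    msg['To'] = toaddr
    msg['Subject'] = subject
    msg.attach(MIMEText(html, 'html'))

    # Plain SMTP connection; the port may arrive as a string from the environment.
    server = smtp_factory(host, int(port))
    try:
        # Upgrade the connection to TLS before any credentials are sent.
        server.starttls()
        # Authenticate only when a password is configured.
        if password:
            server.login(fromaddr, password)
        server.sendmail(fromaddr, toaddr, msg.as_string())
    finally:
        server.quit()
```

Two consequences of the committed logic are worth noting. A trailing slash changes the result: `https://host/smile/` becomes `https://host/smile/history`, because the empty final segment is the one dropped. Also, `smtplib.SMTP_SSL` uses implicit TLS, conventionally on port 465, while STARTTLS is normally used on port 587 or 25; because the README example still sets `EMAIL_PORT=465`, deployments may need to adjust the port for the new code path. `starttls()` raises `smtplib.SMTPNotSupportedError` if the server does not advertise STARTTLS.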
14 changes: 10 additions & 4 deletions rabbitmq/autophrase/rabbitmq_handler.py
@@ -5,8 +5,8 @@
import postToAWSBatch

RABBITMQ_HOST = os.getenv('RABBITMQ_HOST', 'rabbitmq')
RABBITMQ_USER = os.getenv('RABBITMQ_HOST', 'guest')
RABBITMQ_PASSWORD = os.getenv('RABBITMQ_HOST', 'guest')
RABBITMQ_USER = os.getenv('RABBITMQ_USER', 'guest')
RABBITMQ_PASSWORD = os.getenv('RABBITMQ_PASSWORD', 'guest')

def rabbitmq_handler(ch, method, properties, body):
try:
@@ -52,8 +52,13 @@ def rabbitmq_handler(ch, method, properties, body):

if __name__ == '__main__':
credentials = pika.PlainCredentials(RABBITMQ_USER, RABBITMQ_PASSWORD)
parameters = pika.ConnectionParameters(RABBITMQ_HOST, 5672, '/', credentials)
connection = pika.BlockingConnection(pika.ConnectionParameters(parameters))
connection = pika.BlockingConnection(pika.ConnectionParameters(
port=5672,
host=RABBITMQ_HOST,
heartbeat=600,
blocked_connection_timeout=600,
credentials=credentials
))
channel = connection.channel()

# pass the queue name in environment variable
@@ -63,3 +68,4 @@ def rabbitmq_handler(ch, method, properties, body):
channel.basic_qos(prefetch_count=1)
channel.basic_consume(queue=queue, on_message_callback=rabbitmq_handler, auto_ack=True)
channel.start_consuming()
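The `rabbitmq_handler.py` fix is easiest to see as configuration loading. Before this commit, `RABBITMQ_USER` and `RABBITMQ_PASSWORD` were both read from the `RABBITMQ_HOST` variable, and the old connection call wrapped an existing `ConnectionParameters` object inside another `pika.ConnectionParameters(...)` call, where it was passed as the host. The sketch below isolates the settings logic so it runs without pika installed; `RabbitMQSettings` and `load_rabbitmq_settings` are hypothetical names, while the defaults (`rabbitmq`, `guest`, port 5672, and a heartbeat and blocked-connection timeout of 600 seconds) come from the diff.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class RabbitMQSettings:
    host: str
    user: str
    password: str
    # Values hard-coded in the new connection call in rabbitmq_handler.py.
    port: int = 5672
    heartbeat: int = 600
    blocked_connection_timeout: int = 600


def load_rabbitmq_settings(env: Optional[Mapping[str, str]] = None) -> RabbitMQSettings:
    # Read each setting from its own variable (the pre-fix code read
    # RABBITMQ_HOST for the user and password as well).
    env = os.environ if env is None else env
    return RabbitMQSettings(
        host=env.get('RABBITMQ_HOST', 'rabbitmq'),
        user=env.get('RABBITMQ_USER', 'guest'),
        password=env.get('RABBITMQ_PASSWORD', 'guest'),
    )
```

A settings object `s` could then be handed to pika as `pika.ConnectionParameters(host=s.host, port=s.port, credentials=pika.PlainCredentials(s.user, s.password), heartbeat=s.heartbeat, blocked_connection_timeout=s.blocked_connection_timeout)`, which uses the same keyword arguments as the new code.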

8 changes: 8 additions & 0 deletions rabbitmq/autophrase/version.md
@@ -0,0 +1,8 @@
# Past Version History


### Version 0.1.3 (2023-07-01)
- Renamed Minio related environment variables
### Version 0.1.2 (2023-02-16)
- Changed rabbitmq url to env variable
5 changes: 5 additions & 0 deletions rabbitmq/classification_predict/CHANGELOG.md
@@ -0,0 +1,5 @@
# Changelog
All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
8 changes: 8 additions & 0 deletions rabbitmq/classification_predict/version.md
@@ -0,0 +1,8 @@
# Past Version History


### Version 0.1.2 (2023-07-01)
- Renamed Minio related environment variables

### Version 0.1.1 (2023-02-16)
- Changed rabbitmq url to env variable
5 changes: 5 additions & 0 deletions rabbitmq/classification_split/CHANGELOG.md
@@ -0,0 +1,5 @@
# Changelog
All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
8 changes: 8 additions & 0 deletions rabbitmq/classification_split/version.md
@@ -0,0 +1,8 @@
# Past Version History


### Version 0.1.2 (2023-07-01)
- Renamed Minio related environment variables

### Version 0.1.1 (2023-02-16)
- Changed rabbitmq url to env variable
5 changes: 5 additions & 0 deletions rabbitmq/classification_train/CHANGELOG.md
@@ -0,0 +1,5 @@
# Changelog
All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
8 changes: 8 additions & 0 deletions rabbitmq/classification_train/version.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
# Past Version History


### Version 0.1.2 (2023-07-01)
- Renamed Minio related environment variables

### Version 0.1.1 (2023-02-16)
- Changed rabbitmq url to env variable