Commit

Merge pull request #84 from ncsa/release/beta
Release/beta
longshuicy authored Oct 27, 2023
2 parents 1a0f484 + 3084a0f commit 86f6fcd
Showing 51 changed files with 1,445 additions and 1,870 deletions.
28 changes: 16 additions & 12 deletions CHANGELOG.md
@@ -4,18 +4,22 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]
## [Beta] - 10-26-2023

### Added
- Docker building script for whole components [23](https://github.com/ncsa/standalone-smm-analytics/issues/23)
- Docker compose launch script [45](https://github.com/ncsa/standalone-smm-analytics/issues/45)
- Docker compose file using traefik [46](https://github.com/ncsa/standalone-smm-analytics/issues/46)
- Docker building script for whole components [#23](https://github.com/ncsa/standalone-smm-analytics/issues/23)
- Docker compose launch script [#45](https://github.com/ncsa/standalone-smm-analytics/issues/45)
- Docker compose file using traefik [#46](https://github.com/ncsa/standalone-smm-analytics/issues/46)
- Environment variables to turn Twitter and Reddit on and off [#73](https://github.com/ncsa/standalone-smm-analytics/issues/73)
- Environment variable for Google Analytics 4 [#81](https://github.com/ncsa/standalone-smm-analytics/issues/81)

### Changed
- Hard coded rabbimq url changed to env variable [18](https://github.com/ncsa/standalone-smm-analytics/issues/18)
- Modified S3 url to env variable [21](https://github.com/ncsa/standalone-smm-analytics/issues/21)
- Renamed Minio related environment variables [31](https://github.com/ncsa/standalone-smm-analytics/issues/31)
- Rabbitmq handler's connection with dynamic credentials [41](https://github.com/ncsa/standalone-smm-analytics/issues/41)
- Docker compose file to work with new settings [42](https://github.com/ncsa/standalone-smm-analytics/issues/42)
- Updated README with docker compose information [50](https://github.com/ncsa/standalone-smm-analytics/issues/50)
- Created base image for sentiment analysis with model [55](https://github.com/ncsa/standalone-smm-analytics/issues/55)
- Created base image for name entity recognition with model [56](https://github.com/ncsa/standalone-smm-analytics/issues/56)
- Hard coded rabbitmq url changed to env variable [#18](https://github.com/ncsa/standalone-smm-analytics/issues/18)
- Modified S3 url to env variable [#21](https://github.com/ncsa/standalone-smm-analytics/issues/21)
- Renamed Minio related environment variables [#31](https://github.com/ncsa/standalone-smm-analytics/issues/31)
- Rabbitmq handler's connection with dynamic credentials [#41](https://github.com/ncsa/standalone-smm-analytics/issues/41)
- Docker compose file to work with new settings [#42](https://github.com/ncsa/standalone-smm-analytics/issues/42)
- Updated README with docker compose information [#50](https://github.com/ncsa/standalone-smm-analytics/issues/50)
- Created base image for sentiment analysis with model [#55](https://github.com/ncsa/standalone-smm-analytics/issues/55)
- Created base image for name entity recognition with model [#56](https://github.com/ncsa/standalone-smm-analytics/issues/56)
- Docker compose file updated to fix minio default bucket making [#63](https://github.com/ncsa/standalone-smm-analytics/issues/63)
13 changes: 13 additions & 0 deletions rabbitmq/autophrase/CHANGELOG.md
@@ -4,6 +4,19 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [0.1.6] - 10-26-2023

### Fixed
- Visualization in the email link doesn't work [#61](https://github.com/ncsa/standalone-smm-analytics/issues/61)

## [0.1.5] - 09-21-2023

### Fixed
- Autophrase error due to the code updates [#65](https://github.com/ncsa/standalone-smm-analytics/issues/65)

### Changed
- Autophrase uses base docker image [#67](https://github.com/ncsa/standalone-smm-analytics/issues/67)

## [0.1.4] - 09-14-2023

### Added
13 changes: 6 additions & 7 deletions rabbitmq/autophrase/Dockerfile
@@ -1,16 +1,15 @@
FROM ubuntu:18.04

# git clone autophrase algorithm
RUN apt-get update \
&& apt-get -y install git && apt-get -y install cron \
&& cd / && git clone https://github.com/IllinoisSocialMediaMacroscope/SMILE-AutoPhrase.git AutoPhrase
FROM socialmediamacroscope/autophrase:base

# overwrite
WORKDIR /AutoPhrase
COPY . ./

ENV RABBITMQ_HOST="rabbitmq"

# set environment variables to prevent an interactive prompt during installation of openjdk
ENV DEBIAN_FRONTEND=noninteractive
ENV REGION=US
ENV TZ=America/Chicago

# install dependency libraries
RUN apt-get -y update \
&& apt-get -y install g++ openjdk-8-jdk curl python3-pip \
7 changes: 7 additions & 0 deletions rabbitmq/autophrase/Dockerfile.base
@@ -0,0 +1,7 @@
FROM ubuntu:20.04

# git clone autophrase algorithm
RUN apt-get update \
&& apt-get -y install git && apt-get -y install cron \
&& cd / && git clone https://github.com/IllinoisSocialMediaMacroscope/SMILE-AutoPhrase.git AutoPhrase

50 changes: 50 additions & 0 deletions rabbitmq/autophrase/generate_raw_train.py
@@ -0,0 +1,50 @@
import argparse
import csv

import pandas as pd

from writeToS3 import WriteToS3


def main(s3, remoteReadPath, column):
    filename = remoteReadPath.split('/')[-2] + '.csv'
    s3.downloadToDisk(filename=filename, localpath='data/', remotepath=remoteReadPath)

    Array = []
    try:
        with open('data/' + filename,'r',encoding="utf-8", errors="ignore") as f:
            reader = csv.reader(f)
            try:
                for row in reader:
                    Array.append(row)
            except Exception as e:
                pass
    except:
        with open('data/' + filename,'r',encoding="ISO-8859-1", errors="ignore") as f:
            reader = csv.reader(f)
            try:
                for row in reader:
                    Array.append(row)
            except Exception as e:
                pass

    df = pd.DataFrame(Array[1:],columns=Array[0])
    df[df[column]!=''][column].dropna().astype('str').to_csv('data/raw_train.txt', index=False)

    return None

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--remoteReadPath', required=True)
    parser.add_argument('--column', required=True)

    # user specified parameters
    parsed, unknown = parser.parse_known_args()
    for arg in unknown:
        if arg.startswith("--"):
            parser.add_argument(arg, required=False)

    params = vars(parser.parse_args())

    s3 = WriteToS3()
    main(s3, params['remoteReadPath'], params['column'])
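
A note on the encoding fallback in `generate_raw_train.py` above: because the first `open` call passes `errors="ignore"`, undecodable bytes are dropped silently instead of raising, so the ISO-8859-1 branch is unlikely to be reached by a decoding problem. The sketch below is one possible alternative, not the code in this commit. It decodes strictly and falls back only on `UnicodeDecodeError`. The helper name `read_csv_rows` and its `encodings` parameter are hypothetical and do not appear in the repository.

```python
import csv


def read_csv_rows(path, encodings=("utf-8", "ISO-8859-1")):
    """Return all rows of a CSV file, trying each encoding in order.

    Hypothetical helper for illustration; not part of the commit.
    """
    if not encodings:
        raise ValueError("at least one encoding is required")

    last_error = None
    for encoding in encodings:
        try:
            # Strict decoding: a bad byte raises instead of being dropped.
            # newline="" is the mode the csv module documents for readers.
            with open(path, "r", encoding=encoding, newline="") as f:
                return list(csv.reader(f))
        except UnicodeDecodeError as exc:
            # Remember the failure and try the next encoding.
            last_error = exc

    # Every candidate encoding failed to decode the file.
    raise last_error
```

Since ISO-8859-1 maps every byte to a character, keeping it last in the tuple means the loop always returns for a readable file, while a missing file still surfaces as `FileNotFoundError` rather than being swallowed.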