Before running the code, you need to install the BeautifulSoup library, which is used to scrape the articles:

pip install beautifulsoup4

To create SmartBook reports for the Ukraine-Russia crisis, you first need to scrape CNN's daily coverage articles. SmartBook clusters these articles to identify the major events.

Run the following command to scrape all of CNN's daily coverage within a given time period, with start_date and end_date in mm-dd-yyyy format:

python cnn_ukraine_crawler_json.py --start_date <start_date_here> --end_date <end_date_here> --output_dir <output_dir_here>
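
If you launch the crawler from another Python script, it can help to check the dates before starting a long scrape. The sketch below is an illustrative assumption, not part of the repository: the helper name build_crawler_command and the example directory cnn_articles are hypothetical. It only checks that both dates parse with the mm-dd-yyyy pattern and are in order, then builds the same command shown above as an argument list.

```python
import shlex
from datetime import datetime

# mm-dd-yyyy, as required for --start_date and --end_date
DATE_FORMAT = "%m-%d-%Y"


def build_crawler_command(start_date: str, end_date: str, output_dir: str) -> list[str]:
    """Hypothetical helper: check the dates and return the crawler command as argv."""
    # strptime raises ValueError if a date does not match mm-dd-yyyy
    start = datetime.strptime(start_date, DATE_FORMAT)
    end = datetime.strptime(end_date, DATE_FORMAT)
    if end < start:
        raise ValueError(f"end_date {end_date} is before start_date {start_date}")
    return [
        "python", "cnn_ukraine_crawler_json.py",
        "--start_date", start_date,
        "--end_date", end_date,
        "--output_dir", output_dir,
    ]


if __name__ == "__main__":
    cmd = build_crawler_command("03-01-2022", "03-14-2022", "cnn_articles")
    # Print the shell-equivalent command; pass cmd to subprocess.run(cmd, check=True) to run it
    print(shlex.join(cmd))
```

Returning an argument list rather than a single string avoids shell quoting problems if output_dir contains spaces.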

Note: For optimal clustering performance, we recommend that the start and end dates be no more than 2 weeks apart. For instance, to generate situation reports for a one-month period, run SmartBook separately on two 2-week periods.
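
A minimal sketch for splitting a longer period into windows that follow this recommendation, interpreting "no more than 2 weeks apart" as an end date at most 14 days after the start date. The function name split_period and the even-split strategy are illustrative assumptions, not part of the crawler or SmartBook.

```python
from datetime import datetime, timedelta

DATE_FORMAT = "%m-%d-%Y"  # mm-dd-yyyy


def split_period(start_date: str, end_date: str, max_gap_days: int = 14) -> list[tuple[str, str]]:
    """Hypothetical helper: split an inclusive date range into near-equal windows
    whose end date is at most max_gap_days after their start date."""
    start = datetime.strptime(start_date, DATE_FORMAT)
    end = datetime.strptime(end_date, DATE_FORMAT)
    if end < start:
        raise ValueError("end_date must not be before start_date")

    total_days = (end - start).days + 1      # inclusive day count
    days_per_window = max_gap_days + 1       # most days a single window may cover
    n_windows = -(-total_days // days_per_window)  # ceiling division

    # Spread the days evenly; the first `extra` windows get one extra day
    base, extra = divmod(total_days, n_windows)
    windows = []
    current = start
    for i in range(n_windows):
        size = base + (1 if i < extra else 0)
        window_end = current + timedelta(days=size - 1)
        windows.append((current.strftime(DATE_FORMAT), window_end.strftime(DATE_FORMAT)))
        current = window_end + timedelta(days=1)
    return windows


if __name__ == "__main__":
    for window_start, window_end in split_period("04-01-2022", "04-30-2022"):
        print(window_start, window_end)
```

Splitting evenly, rather than greedily taking 15-day chunks, avoids leaving a very short final window (for example, a single day at the end of a 31-day month). Each pair can be passed as --start_date and --end_date in a separate crawler run, ideally with its own output_dir.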

The command above creates raw text files in output_dir, one file per article, with each file named after its article ID. Use this output_dir as the input_dir when running SmartBook.
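
To sanity-check the crawler output before running SmartBook, you could load the files into a dictionary keyed by article ID. This is a minimal sketch assuming each file is UTF-8 text and that the article ID is the file name without its extension; the helper name load_articles is hypothetical and is not how SmartBook itself reads input_dir.

```python
from pathlib import Path


def load_articles(input_dir: str) -> dict[str, str]:
    """Hypothetical helper: map each article ID (file name without extension) to its raw text."""
    articles = {}
    for path in sorted(Path(input_dir).iterdir()):
        # Skip subdirectories and hidden files such as .DS_Store
        if path.is_file() and not path.name.startswith("."):
            articles[path.stem] = path.read_text(encoding="utf-8")
    return articles


if __name__ == "__main__":
    import sys

    # Usage: python load_articles.py <output_dir_here>
    if len(sys.argv) > 1:
        loaded = load_articles(sys.argv[1])
        print(f"Loaded {len(loaded)} articles")
```

An empty result or an unexpectedly small count usually means the date range or output_dir was wrong.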
