Influencer Self-Disclosure Practices on Instagram: A Multi-Country Longitudinal Study

Description

This repository contains scripts and utilities for the experiments in the paper "Influencer Self-Disclosure Practices on Instagram: A Multi-Country Longitudinal Study". The main script collects Instagram data through the CrowdTangle API and processes it.

Prerequisites

Python: The project requires Python 3.9 or higher.
CrowdTangle API Token: An API token for CrowdTangle is required to fetch data. This token should be set in a .env file in the root directory of the project, under the key API_TOKEN.

Example:
```
API_TOKEN="YOUR_API_TOKEN_HERE"
```
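As a quick pre-flight check, the token can be read and validated before running a long collection job. The sketch below is only an illustration and is not taken from the repository's code: it uses the standard library alone, the helper names `read_env_file` and `get_api_token` are hypothetical, and the project itself may load the `.env` file differently.

```python
import os
from pathlib import Path


def read_env_file(path=".env"):
    """Parse simple KEY=VALUE lines from a .env file into a dict (hypothetical helper)."""
    values = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("\"'")
    return values


def get_api_token(path=".env"):
    """Return API_TOKEN from the environment or the .env file, or raise."""
    token = os.environ.get("API_TOKEN")
    if not token and Path(path).is_file():
        token = read_env_file(path).get("API_TOKEN")
    # Assumption: the placeholder from the example above is never a valid token.
    if not token or token == "YOUR_API_TOKEN_HERE":
        raise RuntimeError(f"API_TOKEN is missing or still a placeholder in {path}")
    return token
```

Calling `get_api_token()` from the project root either returns the token or fails early with a clear message.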

CSV File Format: `dataset_accounts.csv`

This file serves as the metadata input for your Instagram data collection and processing. Each row represents an individual Instagram account's metadata. The CSV consists of the following columns:

username (required): The Instagram handle or username of the account.
country (optional): A country code that represents the primary audience or location of the account.
size (optional): The category of the account based on its follower count (e.g. micro or mega).
number_of_posts (optional): The total number of posts made by the account up to the last date of collection.
followers_collection_time (optional): The follower count of the account at data collection time.
first_post (required): The earliest date and time (in the format 'YYYY-MM-DD HH:MM:SS') from which posts should be collected for the respective account.
last_post (required): The latest date and time (in the format 'YYYY-MM-DD HH:MM:SS') until which posts should be collected for the respective account.

Example:

username,country,size,number_of_posts,followers_collection_time,first_post,last_post
ab_bowen,US,mega,3652,1626210,2013-06-24 14:01:12,2022-09-15 06:32:49
achrafhakimi,DE,mega,612,10162234,2014-01-19 19:49:42,2022-09-16 12:39:47
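Before starting a collection run, it can help to check the CSV against the column rules described above. The following is a minimal sketch, not part of the repository: `validate_accounts` is a hypothetical helper, and the check that `first_post` is not later than `last_post` is an assumption based on the column descriptions rather than a documented rule.

```python
import csv
from datetime import datetime

REQUIRED_COLUMNS = ("username", "first_post", "last_post")
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"  # matches 'YYYY-MM-DD HH:MM:SS'


def validate_accounts(lines):
    """Return a list of problems found in an iterable of CSV lines (header included)."""
    reader = csv.DictReader(lines)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        return [f"missing required column(s): {', '.join(missing)}"]

    problems = []
    # Data rows start at line 2 because line 1 is the header.
    for row_number, row in enumerate(reader, start=2):
        if not (row["username"] or "").strip():
            problems.append(f"row {row_number}: empty username")
        try:
            first = datetime.strptime(row["first_post"], DATE_FORMAT)
            last = datetime.strptime(row["last_post"], DATE_FORMAT)
        except (TypeError, ValueError):
            problems.append(f"row {row_number}: dates must use 'YYYY-MM-DD HH:MM:SS'")
            continue
        # Assumption: a collection window must not end before it starts.
        if first > last:
            problems.append(f"row {row_number}: first_post is later than last_post")
    return problems
```

To check a real file, open it with `open("data/dataset_accounts.csv", newline="")` and pass the file object to `validate_accounts`; an empty list means no problems were found.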

Setup & Installation

Clone the repository:

git clone https://github.com/thalesbertaglia/instagram-disclosure-trends
cd instagram-disclosure-trends

Install dependencies using Poetry:
```
poetry install
```
Activate the virtual environment:
```
poetry shell
```

Usage

To run the main script:

python scripts/collect_data.py [OPTIONS]

Options:

--csv_path: Path to the dataset_accounts.csv file. Default is data/dataset_accounts.csv.
--skip_collection: If passed, data collection will be skipped.
--skip_create_df: If passed, processing the raw CrowdTangle data into a DataFrame will be skipped.
--skip_augmentation: If passed, augmenting the DataFrame with additional columns will be skipped. Use this option for collecting data from new accounts not included in the original dataset.
--post_df_path: Path to the pickle file containing the processed posts DataFrame. Default is data/df_posts.pkl.
--profile_df_path: Path to the pickle file containing the processed profiles DataFrame. Default is data/df_profiles.pkl.
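To make the shape of these options concrete, here is a sketch of how such a command-line interface could be declared with `argparse`. It is an illustration built from the option list above, not the actual contents of `scripts/collect_data.py`, which may define its arguments differently; `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(
        description="Collect and process Instagram data via the CrowdTangle API."
    )
    parser.add_argument("--csv_path", default="data/dataset_accounts.csv",
                        help="Path to the dataset_accounts.csv file.")
    # Each skip flag is False unless it is passed on the command line.
    parser.add_argument("--skip_collection", action="store_true",
                        help="Skip data collection.")
    parser.add_argument("--skip_create_df", action="store_true",
                        help="Skip processing raw data into a DataFrame.")
    parser.add_argument("--skip_augmentation", action="store_true",
                        help="Skip augmenting the DataFrame with extra columns.")
    parser.add_argument("--post_df_path", default="data/df_posts.pkl",
                        help="Path to the processed posts DataFrame pickle.")
    parser.add_argument("--profile_df_path", default="data/df_profiles.pkl",
                        help="Path to the processed profiles DataFrame pickle.")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

Under this sketch, passing `--csv_path data/new_accounts.csv --skip_augmentation` would correspond to collecting data for new accounts that are not in the original dataset; the file name here is a made-up example.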

Troubleshooting

Ensure that the .env file exists in the root directory with the correct API_TOKEN.
Verify that the CSV file contains the required columns (username, first_post, last_post) and that the dates use the 'YYYY-MM-DD HH:MM:SS' format.

License

MIT

Contact

For any queries or issues, please contact Thales Bertaglia.
