
A collection of social media dataset samples, each containing over 1,000 records. These datasets are well suited to brand awareness, consumer sentiment analysis, and tracking social media presence.


luminati-io/Social-media-dataset-samples


Social-media-dataset-samples

A collection of sample social media datasets, each containing over 1,000 records.


Taken together, the samples contain thousands of records. All datasets were extracted using the Bright Data API.

Some of the data points include:

  • url: The URL of the page or post containing the comments
  • post_id: Unique identifier for each post
  • post_url: URL of the specific post
  • comment_id: Unique identifier for each comment
  • user_name: Name of the user who made the comment
  • user_id: Unique identifier for each user
  • user_url: URL of the user's profile
  • date_created: Date and time when the comment was created
  • comment_text: The actual text content of the comment
  • num_likes: Number of likes or reactions received by the comment
  • num_replies: Number of direct replies to the comment
  • attached_files: Any files or attachments associated with the comment
  • video_length: Length of video content, if applicable
  • source_type: Type of source (e.g., Facebook, external link)
  • subtype: Subcategory or specific type of the post or comment
  • type: General category of the post or comment
  • user_posted: Username of the post creator
  • description: Post text description
  • hashtags: Hashtags used in the post
  • num_comments: Number of comments
  • date_posted: Post publication date
  • likes: Number of likes
  • photos: URLs of attached photos
  • videos: URLs of attached videos
  • location: Geographical location
  • latest_comments: Recent comments
  • discovery_input: Discovery input values

And many more.
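As a rough illustration of how the comment-level fields above might be consumed, here is a minimal sketch that parses CSV rows into typed records. The `Comment` class, the `parse_comment` and `load_comments` helpers, and the inline sample rows are hypothetical and not taken from the repository. The sketch also assumes that `date_created` is an ISO 8601 timestamp and that missing counts arrive as empty strings, neither of which is confirmed by the samples.

```python
import csv
import io
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Comment:
    """Hypothetical typed view of a subset of the comment fields."""
    comment_id: str
    post_id: str
    user_name: str
    date_created: datetime
    comment_text: str
    num_likes: int
    num_replies: int


def parse_comment(row: dict) -> Comment:
    """Convert one CSV row (all values are strings) into a Comment."""
    return Comment(
        comment_id=row["comment_id"],
        post_id=row["post_id"],
        user_name=row["user_name"],
        # Assumption: timestamps are ISO 8601 without a trailing "Z".
        date_created=datetime.fromisoformat(row["date_created"]),
        comment_text=row["comment_text"],
        # Assumption: an empty cell means "no count recorded"; treat it as 0.
        num_likes=int(row["num_likes"] or 0),
        num_replies=int(row["num_replies"] or 0),
    )


def load_comments(text: str) -> list:
    """Parse CSV text with a header row into a list of Comment objects."""
    return [parse_comment(row) for row in csv.DictReader(io.StringIO(text))]


# Made-up rows for illustration only; these are not real dataset records.
SAMPLE_CSV = """comment_id,post_id,user_name,date_created,comment_text,num_likes,num_replies
c1,p1,alice,2024-01-05T10:30:00,Great post!,12,2
c2,p1,bob,2024-01-05T11:00:00,Thanks for sharing,,0
"""

comments = load_comments(SAMPLE_CSV)
```

To use this with an actual sample file, replace `SAMPLE_CSV` with the file's contents and adjust the field list and date parsing to match the columns that file really has.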

These samples are subsets of dozens of social media datasets (public data) containing millions of records.

Available Dataset File Formats:

  • JSON, NDJSON, JSON Lines, CSV, or Parquet
  • Optionally, files can be compressed to .gz
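For the gzip-compressed NDJSON / JSON Lines case, a minimal reading sketch using only the Python standard library could look like the following. The `read_ndjson_gz` helper and the two in-memory records are hypothetical; a real delivery would be opened from a file path rather than built in memory.

```python
import gzip
import io
import json


def read_ndjson_gz(source):
    """Yield one dict per non-empty line of a gzip-compressed NDJSON stream.

    `source` may be a file path or a binary file object.
    """
    with gzip.open(source, mode="rt", encoding="utf-8") as lines:
        for line in lines:
            line = line.strip()
            if line:
                yield json.loads(line)


# Build a tiny compressed stream in memory so the example is self-contained.
# These records are invented for illustration.
records = [{"post_id": "p1", "likes": 10}, {"post_id": "p2", "likes": 3}]
buffer = io.BytesIO()
with gzip.open(buffer, mode="wt", encoding="utf-8") as out:
    for record in records:
        out.write(json.dumps(record) + "\n")

buffer.seek(0)
loaded = list(read_ndjson_gz(buffer))
```

Reading line by line keeps memory use flat, which matters more for the full datasets than for the 1,000-record samples.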

Dataset Delivery Options:

  • Email, API download, Webhook, Amazon S3, Google Cloud Storage, Microsoft Azure, Snowflake, SFTP

Update Frequency:

  • Once, Daily, Weekly, Monthly, Quarterly, or Custom intervals

Data Enrichment:

  • Datasets can be enriched with additional data points based on specific requirements.

Get the full social media datasets

Get the full Facebook dataset

Get the full Instagram dataset

Get the full Twitter dataset

Get the full TikTok dataset

What are the social media dataset use cases?

1. Social Media Presence

Leverage our social media datasets to identify influencers with significant social impact by analyzing metrics like engagement, brand affiliations, and follower demographics. Partner with those best suited to effectively promote your brand.

2. Consumer Sentiment Monitoring

Gain insights into user sentiment by analyzing social media data. Monitor likes, shares, comments, hashtags, mentions, and other metrics to quickly identify shifts in popularity and brand perception.
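One simple way to start on this kind of monitoring is to tally engagement per hashtag. The sketch below is illustrative only: it does not score sentiment, the `engagement_by_hashtag` helper is hypothetical, and it assumes each post record carries `hashtags` as a list of strings and `likes` / `num_comments` as integers, which may not match the actual sample files.

```python
from collections import defaultdict


def engagement_by_hashtag(posts):
    """Sum post count, likes, and comment count for each hashtag.

    Hashtags are lowercased, and each post counts at most once per hashtag.
    """
    totals = defaultdict(lambda: {"posts": 0, "likes": 0, "num_comments": 0})
    for post in posts:
        # Assumption: a missing or null `hashtags` field means no hashtags.
        tags = {tag.lower() for tag in (post.get("hashtags") or [])}
        for tag in tags:
            entry = totals[tag]
            entry["posts"] += 1
            entry["likes"] += post.get("likes") or 0
            entry["num_comments"] += post.get("num_comments") or 0
    return dict(totals)


# Invented records for illustration only.
sample_posts = [
    {"post_id": "p1", "hashtags": ["#Brand", "#sale"], "likes": 100, "num_comments": 8},
    {"post_id": "p2", "hashtags": ["#brand"], "likes": 50, "num_comments": 4},
    {"post_id": "p3", "hashtags": None, "likes": 7, "num_comments": 0},
]

summary = engagement_by_hashtag(sample_posts)
```

Comparing these totals across successive deliveries (for example daily or weekly) gives a basic signal of shifts in popularity; actual sentiment analysis would require a separate text classification step on fields such as `comment_text`.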

3. Brand Monitoring and Awareness

Use social media data to track online conversations across social networks and capture both positive and negative mentions. Stay proactive by responding to customer feedback, addressing concerns, and maintaining your brand's reputation.

Free Access for Researchers and NGOs

The Bright Initiative provides free access to Web Scraper APIs and ready-to-use datasets for academic faculties, researchers, NGOs, and NPOs working on environmental or social causes. Submit your application here.
