Documentation and code examples for IronWebscraper (ironsoftware.com/csharp/webscraper)

iron-software/IronWebScraper.Examples

Iron WebScraper - The C# Library for Web Scraping

Get Started | Code Examples | Licensing | Free Trial

Iron WebScraper is a C# library for web scraping. It simulates human browsing patterns to extract content, files, and images from web applications and returns them as native .NET objects. The library handles multithreading politely and efficiently, which keeps your application code easier to understand and maintain.

It is ideal for content migration, building search indexes, and tracking changes in website structure or content.

Features of Iron WebScraper:

  • Uses the HTML DOM, JavaScript, XPath, and jQuery-style CSS selectors to extract structured content.
  • Employs fast multithreading for handling numerous simultaneous requests.
  • Elegantly manages demand on servers with IP/domain level throttling and support for robots.txt.
  • Handles multiple identities, DNS, proxies, user agents, custom headers, methods, cookies, and logins.
  • Converts data scraped from websites into manageable C# objects for immediate use or storage.
  • Handles exceptions outside the developer's code, with automatic retries on errors or CAPTCHAs.
  • Supports saving, pausing, resuming, and autosaving scraping tasks.
  • Includes a built-in web cache for replaying actions, crash recovery, and data query without network traffic.
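
As a rough illustration of the throttling and robots.txt features listed above, a scraper's Init method might set a few politeness options before queuing its first request. The property names below (MaxHttpConnectionLimit, OpenConnectionLimitPerHost, RateLimitPerHost, ObeyRobotsDotTxt, ThrottleMode) are recalled from the library's tutorials and are not stated in this README; the class name, URL, selector, and numeric values are placeholders. Treat this as a sketch to check against the API reference, not as the library's documented configuration.

```csharp
using System;
using IronWebScraper;

namespace YourApp
{
    // Hypothetical scraper used only to illustrate politeness settings.
    public class PoliteScraper : WebScraper
    {
        public override void Init()
        {
            // Assumed property names; the values are arbitrary examples.
            MaxHttpConnectionLimit = 20;                        // total open HTTP requests
            OpenConnectionLimitPerHost = 4;                     // concurrent requests per host
            RateLimitPerHost = TimeSpan.FromMilliseconds(500);  // minimum pause between requests to one host
            ObeyRobotsDotTxt = true;                            // respect robots.txt rules
            ThrottleMode = Throttle.ByDomainHostName;           // assumed enum value: throttle per host name

            // Placeholder URL; Request and Parse are used as in the README example.
            Request("https://example.com/", Parse);
        }

        public override void Parse(Response response)
        {
            // Record the text of each <title> element found on the page.
            foreach (HtmlNode node in response.Css("title"))
            {
                Scrape(new ScrapedData() { { "Title", node.TextContentClean } });
            }
        }
    }
}
```

Such a class would be started like any other scraper, for example with new PoliteScraper().Start(). Lower limits and longer delays put less load on the target site at the cost of a slower crawl.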

Supported Platforms for Iron WebScraper include:

  • .NET 6 and earlier versions, down to .NET Framework
  • Windows, macOS, and Linux, including Docker containers and cloud platforms such as Azure and AWS

For comprehensive API references and full licensing details, please visit our website.

Getting Started with Iron WebScraper

To integrate IronWebScraper into your project, simply install the package via NuGet:

PM> Install-Package IronWebScraper

Then import the IronWebScraper namespace and subclass WebScraper in your C# application, like so:

using IronWebScraper;

namespace YourApp
{
    public class Program
    {
        private static void Main(string[] args)
        {
            // Create the scraper and run it; Start() calls Init() to queue the first request.
            var scrapeJob = new BlogScraper();
            scrapeJob.Start();
        }
    }

    public class BlogScraper : WebScraper
    {
        // Configure the scraper and queue the first page.
        public override void Init()
        {
            LoggingLevel = LogLevel.All;
            Request("https://www.zyte.com/blog/", Parse);
        }

        // Called for each downloaded page.
        public override void Parse(Response response)
        {
            // Save the title of every post listed on the page.
            foreach (HtmlNode titleLink in response.Css(".oxy-post-title"))
            {
                string title = titleLink.TextContentClean;
                Scrape(new ScrapedData() { { "Title", title } });
            }

            // Follow the first pagination link, if one exists.
            if (response.CssExists("div.oxy-easy-posts-pages > a[href]"))
            {
                string nextPage = response.Css("div.oxy-easy-posts-pages > a[href]")[0].Attributes["href"];
                Request(nextPage, Parse);
            }
        }
    }
}

Support & Licensing

Explore code samples, tutorials, and detailed documentation at Iron Web Scraper Learning Resources.

For direct support, contact us at [email protected]. We provide extensive support and licensing options for commercial projects.
