IronWebScraper is a C# library for web scraping. It simulates human browsing patterns to extract content, files, and images from web applications and returns them as native .NET objects. The library handles multithreading politely and efficiently, and keeps your scraping code easy to understand and maintain.
It is ideal for content migration, building search indexes, and tracking changes in website structure or content.
- Uses the HTML DOM, JavaScript, XPath, and jQuery-style CSS selectors to extract structured content.
- Employs fast multithreading for handling numerous simultaneous requests.
- Limits the load placed on target servers with IP- and domain-level throttling and support for robots.txt.
- Handles multiple identities, DNS, proxies, user agents, custom headers, methods, cookies, and logins.
- Converts data scraped from websites into manageable C# objects for immediate use or storage.
- Handles exceptions outside your own code, with automatic retries on errors or captchas.
- Lets you save, pause, resume, and autosave scraping tasks.
- Includes a built-in web cache for replaying actions, recovering from crashes, and querying data without network traffic.
- Supports .NET 6 and earlier versions down to .NET Framework.
- Runs on Windows, macOS, and Linux, in Docker containers, and on cloud platforms such as Azure and AWS.
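The throttling, identity, and caching features listed above are typically configured inside Init(). The following is a minimal sketch rather than a definitive setup: the member names (MaxHttpConnectionLimit, OpenConnectionLimitPerHost, RateLimitPerHost, ThrottleMode, ObeyRobotsDotTxt, Identities, HttpIdentity, EnableWebCache) are assumed to match the IronWebScraper API in your installed version, so confirm them in the API reference. The class name, numeric limits, user-agent string, and URL are placeholders.

```csharp
using System;
using IronWebScraper;

// Hypothetical scraper illustrating polite-crawling settings.
public class PoliteScraper : WebScraper
{
    public override void Init()
    {
        LoggingLevel = LogLevel.All;

        // Assumed property names; verify against the API reference.
        MaxHttpConnectionLimit = 20;                        // total open HTTP requests
        OpenConnectionLimitPerHost = 4;                     // concurrent requests per host
        RateLimitPerHost = TimeSpan.FromMilliseconds(500);  // minimum delay between requests to one host
        ThrottleMode = Throttle.ByDomainHostName;           // throttle by domain host name
        ObeyRobotsDotTxt = true;                            // respect robots.txt rules

        // One identity with its own user agent and cookies (placeholder values).
        Identities.Add(new HttpIdentity
        {
            UserAgent = "ExampleBot/1.0 (+https://example.com/bot)",
            UseCookies = true
        });

        // Cache responses so parsing logic can be re-run without new network traffic.
        EnableWebCache();

        Request("https://example.com/", Parse);
    }

    public override void Parse(Response response)
    {
        // Parsing omitted here; the BlogScraper example in this README shows Css() and Scrape().
    }
}
```

Conservative limits like these trade crawl speed for a lighter footprint on the target site; raise them only when you know the host can take the load.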
For comprehensive API references and full licensing details, please visit our website.
To integrate IronWebScraper into your project, simply install the package via NuGet:
PM> Install-Package IronWebScraper
Begin by importing the IronWebScraper namespace into your C# application and subclassing WebScraper, like so:
using IronWebScraper;

namespace YourApp
{
    public class Program
    {
        private static void Main(string[] args)
        {
            // Create the scraper and start the crawl.
            var scrapeJob = new BlogScraper();
            scrapeJob.Start();
        }
    }

    public class BlogScraper : WebScraper
    {
        // Init configures the scraper and queues the first request.
        public override void Init()
        {
            LoggingLevel = LogLevel.All;
            Request("https://www.zyte.com/blog/", Parse);
        }

        // Parse receives the response for each requested page.
        public override void Parse(Response response)
        {
            // Record the title of every post listed on the page.
            foreach (HtmlNode titleLink in response.Css(".oxy-post-title"))
            {
                string title = titleLink.TextContentClean;
                Scrape(new ScrapedData() { { "Title", title } });
            }

            // Follow the first pagination link, if one exists.
            if (response.CssExists("div.oxy-easy-posts-pages > a[href]"))
            {
                string nextPage = response.Css("div.oxy-easy-posts-pages > a[href]")[0].Attributes["href"];
                Request(nextPage, Parse);
            }
        }
    }
}
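Scraped values do not have to go into a ScrapedData dictionary. As a hedged sketch of the typed-object workflow mentioned in the feature list, the variant below passes a plain C# class to Scrape() together with an output file name. BlogPost, TypedBlogScraper, and the file name are hypothetical, and the Scrape(object, string) overload is assumed to be available in your installed version.

```csharp
using System;
using IronWebScraper;

// Hypothetical model class; not part of the library.
public class BlogPost
{
    public string Title { get; set; }
    public DateTime RetrievedUtc { get; set; }
}

public class TypedBlogScraper : WebScraper
{
    public override void Init()
    {
        Request("https://www.zyte.com/blog/", Parse);
    }

    public override void Parse(Response response)
    {
        foreach (HtmlNode titleLink in response.Css(".oxy-post-title"))
        {
            var post = new BlogPost
            {
                Title = titleLink.TextContentClean,
                RetrievedUtc = DateTime.UtcNow
            };

            // Assumed overload: saves the object to the named output file.
            Scrape(post, "BlogPosts.Jsonl");
        }
    }
}
```

A typed class gives you compile-time checking on field names, which helps once scraped records feed into storage or further processing.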
Explore code samples, tutorials, and detailed documentation at Iron Web Scraper Learning Resources.
For direct support, contact us at [email protected]. We provide extensive support and licensing options for commercial projects.