Wrapper code for Apache HttpClient that provides common page-fetching functionality.
TODO - add more context here.
An example of creating a fetcher with five threads that will only accept content identified by the server as text/html:
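```java
import java.util.HashSet;
import java.util.Set;

// Create a fetcher that uses up to five threads and identifies itself with this user agent.
BaseFetcher fetcher = new SimpleHttpFetcher(5, new UserAgent("mycrawler", "[email protected]", "http://domain.com"));
// Accept only responses that the server labels as text/html.
Set<String> validMimeTypes = new HashSet<>();
validMimeTypes.add("text/html");
fetcher.setValidMimeTypes(validMimeTypes);
FetchedResult result = fetcher.get("http://localhost:8089/");
```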
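The MIME-type restriction above relies on the Content-Type header that the server reports. The following is a minimal, hypothetical sketch of that kind of check, not the fetcher's actual implementation. The class MimeTypeFilter, the method isAccepted, and the rule that an empty set accepts every response are assumptions made for illustration.

```java
import java.util.Locale;
import java.util.Set;

/**
 * Hypothetical helper (not part of the fetcher API): decides whether a response
 * should be accepted, based on its Content-Type header and a set of valid MIME types.
 */
public final class MimeTypeFilter {

    private MimeTypeFilter() {}

    /**
     * Returns true if the media type in contentType (ignoring parameters such as
     * "; charset=UTF-8") is in validMimeTypes. Assumptions for this sketch: an empty
     * set accepts everything, a missing header is rejected when a filter is set,
     * and entries in validMimeTypes are lowercase.
     */
    public static boolean isAccepted(String contentType, Set<String> validMimeTypes) {
        if (validMimeTypes.isEmpty()) {
            return true; // assumed: no filter configured means accept all
        }
        if (contentType == null) {
            return false;
        }
        // Drop any parameters after ';' and lowercase, since media types are case-insensitive.
        int semicolon = contentType.indexOf(';');
        String mimeType = (semicolon >= 0 ? contentType.substring(0, semicolon) : contentType)
                .trim()
                .toLowerCase(Locale.ROOT);
        return validMimeTypes.contains(mimeType);
    }

    public static void main(String[] args) {
        Set<String> valid = Set.of("text/html");
        System.out.println(isAccepted("text/html; charset=UTF-8", valid)); // true
        System.out.println(isAccepted("Text/HTML", valid));                // true
        System.out.println(isAccepted("application/pdf", valid));          // false
    }
}
```

Because only the media type is compared, a header such as "text/html; charset=UTF-8" still matches a configured entry of "text/html".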