forked from apache/nutch
Nutch Sitemap Crawler
cguzel/nutch-sitemapCrawler
Apache Nutch for Sitemap Crawler README

For the information about Sitemap Crawler for Nutch, please visit our wiki:

   https://wiki.apache.org/nutch/GoogleSummerOfCode/SitemapCrawler

For the latest information about Nutch, please visit our website at:

   http://nutch.apache.org

and our wiki, at:

   http://wiki.apache.org/nutch/

To get started using Nutch read Tutorial:

   http://wiki.apache.org/nutch/Nutch2Tutorial

Export Control

This distribution includes cryptographic software. The country in which you currently reside may have restrictions on the import, possession, use, and/or re-export to another country, of encryption software. BEFORE using any encryption software, please check your country's laws, regulations and policies concerning the import, possession, or use, and re-export of encryption software, to see if this is permitted. See <http://www.wassenaar.org/> for more information.

The U.S. Government Department of Commerce, Bureau of Industry and Security (BIS), has classified this software as Export Commodity Control Number (ECCN) 5D002.C.1, which includes information security software using or performing cryptographic functions with asymmetric algorithms. The form and manner of this Apache Software Foundation distribution makes it eligible for export under the License Exception ENC Technology Software Unrestricted (TSU) exception (see the BIS Export Administration Regulations, Section 740.13) for both object code and source code.

The following provides more details on the included cryptographic software:

Apache Nutch uses the PDFBox API in its parse-tika plugin for extracting textual content and metadata from encrypted PDF files. See http://pdfbox.apache.org for more details on PDFBox.
Languages
- Java 93.7%
- HTML 5.3%
- Other 1.0%