WPCrawler

A web crawler for a single WordPress site.

Open-source libraries used:

Apache HttpComponents 4.3

HTML Parser 2.0

MySQL Connector/J 5.1.27

Requires a local XAMPP environment. The program assumes that a database named "crawler" exists on the local MySQL server at XAMPP's default port (localhost:3306).

UTF-8 encoding is used so that Chinese tags are recorded correctly.

The next update will add a feature that counts the tags used in each article.

A detailed explanation of how the crawler works is available on my blog:

http://johnhany.net/2013/11/web-crawler-using-java-and-mysql/

(The blog was set up recently and may still be unstable. If you have trouble accessing it, please let me know and I will try to fix it.)
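The crawler expects a MySQL database named "crawler" on localhost:3306 and stores tags as UTF-8. A minimal sketch of how those settings could be turned into a JDBC connection follows, assuming MySQL Connector/J 5.1 is on the classpath. `useUnicode` and `characterEncoding` are real Connector/J connection properties. The class name `CrawlerDb`, the `root` user and the empty password (XAMPP's out-of-the-box setting) are assumptions for illustration. They are not taken from the WPCrawler source.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Hypothetical helper: not part of the WPCrawler source.
final class CrawlerDb {
    // Assumed defaults: XAMPP's MySQL on localhost:3306, database "crawler".
    static final String HOST = "localhost";
    static final int PORT = 3306;
    static final String DATABASE = "crawler";

    private CrawlerDb() {}

    // Builds a Connector/J URL that tells the driver to exchange text as UTF-8,
    // so Chinese tag names are not mangled on the way to and from MySQL.
    static String jdbcUrl(String host, int port, String database) {
        return "jdbc:mysql://" + host + ":" + port + "/" + database
                + "?useUnicode=true&characterEncoding=UTF-8";
    }

    // Opens a connection. This needs a running MySQL server and the Connector/J
    // jar on the classpath. User "root" with an empty password is an assumed
    // XAMPP default; change it to match your installation.
    static Connection open() throws SQLException {
        return DriverManager.getConnection(jdbcUrl(HOST, PORT, DATABASE), "root", "");
    }
}
```

The connection encoding covers only the client side. For Chinese tags to be stored intact, the "crawler" database and its tables also need a UTF-8 character set (for example, created with `DEFAULT CHARACTER SET utf8`).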

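Because the crawler targets a single WordPress site, each discovered link must be checked against the starting site before it is queued. The sketch below shows one way to do this with `java.net.URI`. It resolves relative links against the page they appear on, rejects non-HTTP schemes and other hosts, and drops the `#fragment` so an article is not queued twice. The class and method names are hypothetical, and the original program may filter URLs differently (for example, by a plain string prefix).

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper: not part of the WPCrawler source.
final class SameSiteFilter {
    private final URI root;

    // rootUrl is the WordPress site's home page, e.g. "http://johnhany.net/".
    SameSiteFilter(String rootUrl) {
        this.root = URI.create(rootUrl);
    }

    // Resolves href against the page it was found on. Returns the absolute URL
    // without its fragment if it stays on the root's host; otherwise null.
    String accept(String pageUrl, String href) {
        try {
            URI resolved = new URI(pageUrl).resolve(href.trim());
            String scheme = resolved.getScheme();
            if (scheme == null
                    || !(scheme.equalsIgnoreCase("http") || scheme.equalsIgnoreCase("https"))) {
                return null; // skip mailto:, javascript:, and similar links
            }
            String host = resolved.getHost();
            if (host == null || !host.equalsIgnoreCase(root.getHost())) {
                return null; // link leaves the WordPress site
            }
            // Strip "#comments"-style fragments so one article maps to one URL.
            String s = resolved.toString();
            int hash = s.indexOf('#');
            return hash >= 0 ? s.substring(0, hash) : s;
        } catch (URISyntaxException | IllegalArgumentException e) {
            return null; // malformed link: ignore it
        }
    }
}
```

Hosts are compared exactly (ignoring case), so `www.example.com` and `example.com` count as different sites. A real crawl may need to treat them as equivalent.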