Crawler

A crawler website that scrapes online learning resources.

Introduction

Production environment: http://crawler.aloo.cn. The site scrapes online learning resources and is implemented with Node and cron.

Web

This part displays the data: it shows the full set of records that has been read.

Requirement

2.3 crontab generate search index

16/9/8 Generated the search index.
16/9/8 Generated the JSON data collection from the source files.

2.2 server render template

16/9/8 Express serves static files.
16/9/8 Vue server-side rendering.
16/9/8 Dropped the EJS templates.
16/9/8 Generate the recently searched tags.
16/8/17 Updated the pop-up status notification box.

2.1 List hot-word search

~~16/9/8 Time format.~~
~~16/9/7 Case-insensitive search matching.~~
~~16/9/4 Generic tag search API.~~
~~16/9/2 Tag inverted index.~~
~~16/9/2 Tag search.~~
~~16/9/2 Highlight the matching tag in tag search results.~~
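
The tag inverted index and case-insensitive matching listed above could look roughly like the sketch below. This is not the project's actual code: the record shape (`id`, `title`, `tags`) and the function names `buildTagIndex` and `searchByTag` are assumptions made for illustration.

```javascript
// Minimal sketch of a tag inverted index with case-insensitive lookup.
// Assumed (hypothetical) record shape: { id, title, tags: string[] }.

// Map each lowercased tag to the set of record ids that carry it.
function buildTagIndex(records) {
  const index = new Map();
  for (const record of records) {
    for (const tag of record.tags || []) {
      const key = tag.trim().toLowerCase();
      if (!key) continue; // skip empty tags
      if (!index.has(key)) index.set(key, new Set());
      index.get(key).add(record.id);
    }
  }
  return index;
}

// Normalize the query the same way as the index keys, then look it up.
function searchByTag(index, query) {
  const ids = index.get(query.trim().toLowerCase());
  return ids ? [...ids] : [];
}

// Example data (made up for this sketch).
const records = [
  { id: 1, title: 'Intro to Vue', tags: ['Vue', 'JavaScript'] },
  { id: 2, title: 'Express basics', tags: ['Node', 'javascript'] },
];
const index = buildTagIndex(records);
console.log(searchByTag(index, 'JAVASCRIPT')); // [ 1, 2 ]
```

Normalizing tags once at index time keeps each lookup to a single `Map` access, so a search does not need to scan every record.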

2.0 Feature list

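~~16/8/17 Hot-word analysis.~~
~~16/8/17 Word Tag Cloud.~~
~~16/8/17 Scheduled deduplication.~~
~~16/8/17 Cross-page deduplication.~~
~~16/8/17 Get the total record count from the data fetched on the deduplication page.~~
~~16/8/17 Use of CQL queries.~~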
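
As an illustration of the cross-page deduplication and record count items above, here is a minimal sketch. It assumes each crawled page is an array of items and that an item's `url` field identifies it; both the data shape and the name `dedupeAcrossPages` are hypothetical and not taken from the project.

```javascript
// Minimal sketch: deduplicate items across several crawled pages.
// Assumption: an item is identified by its `url` (hypothetical field).
function dedupeAcrossPages(pages, keyOf = (item) => item.url) {
  const seen = new Set();
  const unique = [];
  for (const page of pages) {
    for (const item of page) {
      const key = keyOf(item);
      if (seen.has(key)) continue; // already collected from an earlier page
      seen.add(key);
      unique.push(item);
    }
  }
  // `total` is the record count after deduplication.
  return { unique, total: unique.length };
}

// Example data (made up for this sketch): page 2 repeats one item of page 1.
const pages = [
  [{ url: 'https://example.com/a' }, { url: 'https://example.com/b' }],
  [{ url: 'https://example.com/b' }, { url: 'https://example.com/c' }],
];
const result = dedupeAcrossPages(pages);
console.log(result.total); // 3
```

The first occurrence of each item wins, so the page order decides which copy is kept.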

1.0 Feature list

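16/8/11 Needs automatic deployment to the server.
~~16/8/11 Web data loads incompletely; pagination is needed.~~
~~16/8/15 Paginated crawling.~~
~~16/8/15 Gave up deduplicating in LeanCloud because of access-control (ACL) issues.~~
~~16/8/11 The duplicate-data problem is still unresolved.~~
~~16/8/11 Web data: polish the stylesheets.~~
~~16/8/9 Need to add a scheduled task.~~
~~16/8/9 Need a Web page to display the data.~~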
