Created by QiaoHongbo
Data mining for short textual data in social media.
Wikipedia dumps in XML format can be downloaded here. We use the latest-abstract dump, which contains only the title, abstract, and links of each Wikipedia page. Some preprocessing is required before dropping it into the database, so we provide a quick SAX parsing pass and some dirty-data cleaning, sketched below.
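As a rough illustration, here is a minimal streaming SAX pass over the abstract dump, assuming the usual enwiki-latest-abstract.xml layout (a stream of `<doc>` elements each holding `<title>`, `<url>`, and `<abstract>`); the file name and the `handle_doc` callback are hypothetical placeholders, not part of this repo.

```python
import xml.sax

class AbstractHandler(xml.sax.ContentHandler):
    def __init__(self, handle_doc):
        super().__init__()
        self.handle_doc = handle_doc  # called once per completed <doc>
        self.current = None           # fields of the <doc> being read
        self.tag = None               # tag whose text is being collected

    def startElement(self, name, attrs):
        if name == "doc":
            self.current = {"title": "", "url": "", "abstract": ""}
        elif self.current is not None and name in self.current:
            self.tag = name

    def characters(self, content):
        if self.tag is not None:
            self.current[self.tag] += content

    def endElement(self, name):
        if name == self.tag:
            self.tag = None
        elif name == "doc" and self.current is not None:
            self.handle_doc(self.current)  # e.g. clean it, insert into MySQL
            self.current = None

# Streams the dump, so the whole file never sits in memory at once.
xml.sax.parse("enwiki-latest-abstract.xml", AbstractHandler(print))
```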
We use textual data from different users of Sina Weibo. Since crawling Sina is quite hard (its anti-crawler measures keep getting stronger), we instead downloaded some data from CSDN. That data is old but sufficient for our purposes, though we won't use it this time.
Here's my environment:
Python 3.6
Anaconda 3
MySQL 5.7
The environment matters: some of the features I use are only supported by the latest versions.
Choose MyISAM as the MySQL engine to get support for a Chinese full-text index.
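For reference, here is a minimal sketch of the table setup, assuming pymysql is installed and a local MySQL 5.7 server is running; the database, table, and column names are placeholders. MySQL 5.7 ships the ngram full-text parser (usable with both MyISAM and InnoDB), which is what makes the index work for Chinese text.

```python
import pymysql

conn = pymysql.connect(host="localhost", user="root",
                       password="secret", database="weibo",
                       charset="utf8mb4")
with conn.cursor() as cur:
    # MyISAM table with an ngram-parsed full-text index on the text column.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS posts (
            id INT AUTO_INCREMENT PRIMARY KEY,
            content TEXT NOT NULL,
            FULLTEXT KEY ft_content (content) WITH PARSER ngram
        ) ENGINE=MyISAM DEFAULT CHARSET=utf8mb4
    """)
    # Natural-language full-text search over Chinese text:
    cur.execute("SELECT id, content FROM posts "
                "WHERE MATCH(content) AGAINST(%s)", ("数据挖掘",))
    print(cur.fetchall())
conn.close()
```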