Thamme Gowda edited this page Apr 6, 2016 · 7 revisions

Welcome to the Auto-Extractor wiki!

Here you will find information about Auto-Extractor.


Current status

  • Clustering of web pages based on their style and structure
  • Scalable: runs on Apache Spark
  • Work in progress: visualization of the clusters
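This page does not say how style and structure are compared. The following is a minimal, single-machine sketch that assumes two things. First, structure is approximated by the set of root-to-node tag paths in a page. Second, style is approximated by the set of CSS class names it uses. Each set is compared with Jaccard similarity, and the two scores are mixed with a hypothetical `weight`. The function names, the `weight` and `threshold` parameters, and the greedy grouping step are all illustrative assumptions, not Auto-Extractor's actual implementation.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are never pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class PageFeatures(HTMLParser):
    """Collects root-to-node tag paths (structure) and CSS class names (style)."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.paths = set()
        self.classes = set()

    def handle_starttag(self, tag, attrs):
        self.paths.add("/".join(self.stack + [tag]))
        for name, value in attrs:
            if name == "class" and value:
                self.classes.update(value.split())
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children.
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass


def features(html):
    parser = PageFeatures()
    parser.feed(html)
    parser.close()
    return parser.paths, parser.classes


def jaccard(a, b):
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def combined(fa, fb, weight=0.5):
    """Hypothetical mix: weight * structural + (1 - weight) * style similarity."""
    return weight * jaccard(fa[0], fb[0]) + (1 - weight) * jaccard(fa[1], fb[1])


def similarity(html_a, html_b, weight=0.5):
    return combined(features(html_a), features(html_b), weight)


def cluster(pages, threshold=0.6, weight=0.5):
    """Greedy grouping: a page joins the first cluster holding a similar page."""
    feats = [features(p) for p in pages]
    clusters = []
    for i, fi in enumerate(feats):
        for members in clusters:
            if any(combined(fi, feats[j], weight) >= threshold for j in members):
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


product_1 = ('<html><body><div class="nav"></div><div class="product">'
             '<h1 class="title">A</h1><span class="price">9</span></div></body></html>')
product_2 = ('<html><body><div class="nav"></div><div class="product">'
             '<h1 class="title">B</h1><span class="price">12</span></div></body></html>')
blog = ('<html><body><article class="post"><p class="lead">Hi</p>'
        '<p>Text</p></article></body></html>')

print(cluster([product_1, product_2, blog]))  # [[0, 1], [2]]
```

The two product pages share every tag path and class, so they score 1.0 and land in one cluster. The blog page shares only the `html` and `html/body` paths and no classes, so it scores 1/7 and forms its own cluster. A scalable version would distribute the pairwise comparisons, for example across Spark partitions, rather than looping on one machine.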

Roadmap

  • Automatic extraction of content
  • Integration with Apache Tika and Apache Nutch

Links
