CodeChef coding contest score scraping. Selenium with Java and Beautiful Soup with Python.

agent-storm/score-scrapper

score-scrapper

NOTE: The project is still in progress.

  The main goal of the project is to produce one or more files containing data about the CodeChef contest series known as STARTERS. Selenium opens the contest rankings page and selects my college as the filter. Beautiful Soup (Python) then parses the raw HTML and processes it further for my needs. Finally, xlsxwriter stores the data as XLSX sheets, one for each division of the contest.

Sample Output:

image

Key Points:


  • Scraping of all four divisions is possible. [GOTO: Main Method]
  • Each division can be executed on a separate thread.
  • Default filter options are:
    • By "Institution"
    • "CMR Institute of Technology"
  • Data collection is done in Java, while processing is done in Python.
  • Selenium with Java automates the steps needed to reach the desired webpage and apply the filter settings.
  • Beautiful Soup (Python) converts the raw HTML into useful data and writes it to an XLSX file.
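
The per-division threading mentioned in the key points could look roughly like the sketch below. It is not taken from the repository: `scrapeDivision` is a hypothetical placeholder for the Selenium steps, and the class and method names are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class DivisionThreadsSketch {
    // Hypothetical placeholder: the real project would drive Selenium here
    // and return (or save) the raw HTML of the division's rankings page.
    static String scrapeDivision(int division) {
        return "division-" + division + "-done";
    }

    // Submits one task per division to a pool with one thread per division,
    // then collects the results in division order.
    static List<String> scrapeAll(int divisions) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(divisions);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (int d = 1; d <= divisions; d++) {
                final int division = d;
                Callable<String> task = () -> scrapeDivision(division);
                futures.add(pool.submit(task));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                results.add(future.get()); // blocks until that division finishes
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        for (String result : scrapeAll(4)) {
            System.out.println(result);
        }
    }
}
```

Using an `ExecutorService` rather than raw `Thread` objects keeps result collection simple. Each Selenium session would still need its own driver instance, which this sketch does not model.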

A detailed step-by-step explanation of how the program works can be found in the code, in the form of comments.

TOOLS USED

  • Selenium (Java)
  • Beautiful Soup (Python)
  • Xlsxwriter
  • VS Code (IDE)

Dependency

  • Chrome Driver (Recommended) ->Download
  • Selenium Server v4.0 or above ->Download

Python Requirements

  • Beautiful Soup ->pip install beautifulsoup4
  • xlsxwriter ->pip install xlsxwriter
  • lxml parser ->pip install lxml

How to use:

  For now there is no interactive way to use the program, but in the future I will add a GUI to make it simple and easy to use. To use the program as is, go to the main method in the ScoreScrapper.java file and set the link variable to a link that points to a START contest, e.g. https://codechef.com/START99
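
Before editing the link variable, it may help to check that the link has the expected shape. The sketch below is only an illustration: `isStartersLink` is a hypothetical helper that does not exist in the repository, and the URL pattern is an assumption inferred solely from the example link above.

```java
import java.util.regex.Pattern;

class ContestLinkSketch {
    // Assumed URL shape, inferred only from the example https://codechef.com/START99.
    // The optional "www." prefix is an extra assumption.
    private static final Pattern STARTERS_LINK =
        Pattern.compile("^https://(www\\.)?codechef\\.com/START\\d+$");

    // Hypothetical helper: true when the link looks like a START contest link.
    static boolean isStartersLink(String link) {
        return link != null && STARTERS_LINK.matcher(link).matches();
    }

    public static void main(String[] args) {
        String link = "https://codechef.com/START99";
        System.out.println(isStartersLink(link));
    }
}
```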

NOTICE

  You must add the Selenium server JAR file to the "Referenced Libraries" folder of your Java project (VS Code), or add it to the "Java Build Path" in Eclipse.

Folder Structure

The workspace contains two folders by default, where:

  • src: the folder to maintain sources
  • lib: the folder to maintain dependencies

Meanwhile, the compiled output files will be generated in the bin folder by default.

If you want to customize the folder structure, open .vscode/settings.json and update the related settings there.

Dependency Management

The JAVA PROJECTS view allows you to manage your dependencies. More details can be found here.
