Scrape GE info #1

Open
nathanmsmith opened this issue Jul 24, 2022 · 0 comments
Labels: enhancement, rails, scraper

Hotseat currently has no information about what classes are GEs, what requirements they fulfill, etc.

GE requirements vary by school, but they're grouped into three foundational areas, each with two or three subareas:

  • Foundations of the Arts and Humanities
    • Literary and Cultural Analysis
    • Philosophical and Linguistic Analysis
    • Visual and Performance Arts Analysis and Practice
  • Foundations of Society and Culture
    • Historical Analysis
    • Social Analysis
  • Foundations of Scientific Inquiry
    • Life Sciences
    • Physical Sciences

There are three main parts of this issue:

  • How do we scrape the data from the Registrar?
  • How do we store the scraped data into our database and associate it with our classes?
  • How do we display GE information on our frontend?

Scraping

The Request

The information on which classes satisfy which requirements is available from the UCLA Registrar: https://sa.ucla.edu/ro/Public/SOC/Search/GECoursesMasterList

From looking at the webpage, the data appears to come from https://sa.ucla.edu/ro/Public/SOC/Search/SearchByFoundation. When I searched for all "Foundations of Arts and Humanities" GEs, my browser made a request to that URL with the following parameters:

input: {"FoundationCode":"AH","CategoryCode":"%","LabDemoFilter":false,"WritingTwoFilter":false,"MultiCategoryFilter":false,"DiversityFilter":false}
search_criteria: Foundations of Arts and Humanities

Both input and search_criteria seem to be required. We'll need to enumerate all the possible FoundationCode and search_criteria values in order to scrape every GE category.
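
As a rough sketch of that enumeration, the Ruby snippet below builds the two parameters for each foundation and prints the resulting URL. Only the "AH" code and its search_criteria string were observed in the browser; the "SC" and "SI" codes and their labels are guesses that need to be checked against the Registrar page. The helper names (ge_search_params, ge_search_uri) are made up for illustration, and sending the parameters as a GET query string is also an assumption, since the HTTP method isn't recorded here.

```ruby
require "json"
require "uri"

SEARCH_URL = "https://sa.ucla.edu/ro/Public/SOC/Search/SearchByFoundation"

# Only the "AH" pair was observed in the browser request.
# The other two entries are unverified placeholders.
FOUNDATIONS = {
  "AH" => "Foundations of Arts and Humanities",
  "SC" => "Foundations of Society and Culture", # assumed code and label
  "SI" => "Foundations of Scientific Inquiry",  # assumed code and label
}.freeze

# Builds the `input` and `search_criteria` parameters for one foundation.
# The filter values mirror the observed request.
def ge_search_params(foundation_code, search_criteria)
  input = {
    "FoundationCode" => foundation_code,
    "CategoryCode" => "%",
    "LabDemoFilter" => false,
    "WritingTwoFilter" => false,
    "MultiCategoryFilter" => false,
    "DiversityFilter" => false,
  }
  { "input" => JSON.generate(input), "search_criteria" => search_criteria }
end

# Encodes the parameters into a query string on the search URL.
def ge_search_uri(foundation_code, search_criteria)
  uri = URI(SEARCH_URL)
  uri.query = URI.encode_www_form(ge_search_params(foundation_code, search_criteria))
  uri
end

FOUNDATIONS.each do |code, label|
  puts ge_search_uri(code, label)
end
```

If the endpoint turns out to expect a POST, the same ge_search_params hash could be sent as form data instead.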

The Response

GE info is returned as HTML, with the courses grouped by subject area. There is one <table> per subject area. Standard HTML parsing should work well here.

[Image: screenshot taken 2022-07-24]

Storage

We store information about a course in the courses table. A course can fulfill multiple GE categories, and each GE category can be satisfied by multiple courses, so we're looking at a many-to-many relationship. We should create a new migration that:

  • Creates a new table: ge_categories is a good potential name. Each row in the table should be one GE subcategory requirement (e.g., "Literary and Cultural Analysis").
  • Sets up a many-to-many relationship between courses and ge_categories

We should then update the models to use has_and_belongs_to_many for this new relationship.
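
A minimal sketch of what that migration and the model changes might look like is below. It is meant to live inside the Rails app rather than run on its own. The Rails version in the migration superclass, the file paths, the GeCategory model name, and the foundation and name columns are all assumptions for illustration; create_join_table and has_and_belongs_to_many are standard Rails APIs.

```ruby
# db/migrate/<timestamp>_create_ge_categories.rb (hypothetical file name)
class CreateGeCategories < ActiveRecord::Migration[7.0] # Rails version is an assumption
  def change
    create_table :ge_categories do |t|
      t.string :foundation, null: false # e.g. "Foundations of Society and Culture" (assumed column)
      t.string :name, null: false       # e.g. "Historical Analysis" (assumed column)
      t.timestamps
    end
    add_index :ge_categories, [:foundation, :name], unique: true

    # Creates courses_ge_categories with course_id and ge_category_id columns.
    create_join_table :courses, :ge_categories do |t|
      t.index [:course_id, :ge_category_id], unique: true
    end
  end
end

# app/models/ge_category.rb (hypothetical new model)
class GeCategory < ApplicationRecord
  has_and_belongs_to_many :courses
end

# app/models/course.rb (existing model; only the new association is shown)
class Course < ApplicationRecord
  has_and_belongs_to_many :ge_categories
end
```

With the association in place, course.ge_categories would return the categories a course satisfies, and ge_category.courses would return the reverse.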

Frontend

We'll likely want to create a new method on the Course model that gives all the GE categories that a course satisfies. Then we can surface this via the controller/erb templates.

nathanmsmith added the enhancement, good first issue, scraper, and rails labels on Jul 24, 2022
nathanmsmith removed the good first issue label on Nov 13, 2022
ishangarg0 self-assigned this on Jan 27, 2023