
Problem statement: Scrape the job data from the Techolution Careers website, sort it by date of posting (oldest first), and store it as a DataFrame in a CSV file.

Website URL: https://techolution.app.param.ai/jobs/

To solve this problem, we followed these steps:

1. Opened the website URL.
2. While inspecting the website's network traffic, we found that it sent three requests; one of them queried jobs by job type, description, and location.
3. We copied that request's URL, https://techolution.app.param.ai/api/career/get_job/?query=&locations=&category=&job_types=, and noted that its content type was JSON.
4. To fetch the data, we used the `requests` Python package.
5. We parsed the response as JSON using `json.loads`.
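```python
import json

import pandas as pd
import requests

# API endpoint found while inspecting the site's requests (all filters empty)
URL = 'https://techolution.app.param.ai/api/career/get_job/?query=&locations=&category=&job_types='

r = requests.get(URL)
r.raise_for_status()  # stop early if the server returned an HTTP error

# Decode the raw bytes and parse the JSON body into Python objects
j = json.loads(r.content.decode('utf-8'))
```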
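The copied URL carries four query parameters (`query`, `locations`, `category`, `job_types`), all left empty, which appears to mean "no filter" since the call returns the full job list. A minimal sketch of building that URL from a dict with the standard library is below; the helper name `build_jobs_url` is hypothetical, and whether the API accepts non-empty values such as a city name is an assumption that was not tested against the site.

```python
from urllib.parse import urlencode

BASE_URL = 'https://techolution.app.param.ai/api/career/get_job/'

def build_jobs_url(query='', locations='', category='', job_types=''):
    """Hypothetical helper: build the job-search URL; empty strings mean no filter."""
    params = {
        'query': query,
        'locations': locations,
        'category': category,
        'job_types': job_types,
    }
    # urlencode keeps the dict order and percent-encodes values (spaces become '+')
    return BASE_URL + '?' + urlencode(params)

print(build_jobs_url())
# https://techolution.app.param.ai/api/career/get_job/?query=&locations=&category=&job_types=
print(build_jobs_url(locations='Hyderabad'))
# https://techolution.app.param.ai/api/career/get_job/?query=&locations=Hyderabad&category=&job_types=
```

With `requests`, passing the same dict as `requests.get(BASE_URL, params=params)` performs this encoding for you, which avoids hand-editing the query string.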

We saved the parsed JSON to data_file.json and printed its top-level keys:

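```python
# Keep a local copy of the raw API response
with open("data_file.json", "w", encoding="utf-8") as write_file:
    json.dump(j, write_file)

# Iterating over a dict yields its keys
for key in j:
    print(key)
```

Output:

```
fil_locations
data
fil_job_types
fil_category
query_str
total_jobs
```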
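Because the response is saved to data_file.json, later runs could reload it instead of calling the API again. A minimal sketch of that round trip follows; `save_json` and `load_json` are hypothetical helper names, and the payload is a placeholder in which only the six key names come from the real response (the values are made up).

```python
import json
import os
import tempfile

def save_json(payload, path):
    """Write the parsed API response to disk as UTF-8 JSON."""
    with open(path, 'w', encoding='utf-8') as f:
        json.dump(payload, f)

def load_json(path):
    """Read a previously saved response back into Python objects."""
    with open(path, encoding='utf-8') as f:
        return json.load(f)

# Placeholder payload: only the key names match the real response
sample = {'fil_locations': None, 'data': {}, 'fil_job_types': None,
          'fil_category': None, 'query_str': '', 'total_jobs': 0}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'data_file.json')
    save_json(sample, path)
    reloaded = load_json(path)

# json preserves key order, so the keys come back in the order they were written
print(list(reloaded))
```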

The JSON file has six top-level keys, listed above. After inspecting the data, we found that the `data` key alone holds all the job-related information. Its structure is:

```
data
    categories
        jobs
```

Each entry in `jobs` contains the information for one job posting. We collect these entries in a list, `arr`. Because `locations` is an array, we convert it to a string.

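```python
arr = []
for i in j['data']:                     # each key under 'data' is one category
    for obj in j['data'][i]['jobs']:    # each entry is one job posting
        # 'locations' is a list; join it into a single string for the CSV column
        obj['locations'] = ', '.join(str(loc) for loc in obj['locations'])
        arr.append(obj)
```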
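The loop above edits the parsed JSON in place. A variant that copies each record, so `j` stays unchanged and can be re-saved or re-processed, is sketched below. The function name `flatten_jobs` and the sample payload are hypothetical; the sample only mirrors the `data` → categories → `jobs` nesting described above, and it assumes each `locations` entry can be turned into text with `str()`.

```python
def flatten_jobs(payload):
    """Collect every job under payload['data'][<category>]['jobs'] into one flat list.

    Each record is shallow-copied so the input is not modified, and the
    'locations' list is joined into one comma-separated string.
    """
    rows = []
    for group in payload['data'].values():
        for job in group['jobs']:
            row = dict(job)  # shallow copy of the job record
            row['locations'] = ', '.join(str(loc) for loc in job.get('locations', []))
            rows.append(row)
    return rows

# Hypothetical payload with the same nesting as the real response
sample = {
    'data': {
        'Category 1': {'jobs': [
            {'title': 'Job A', 'locations': ['City X']},
            {'title': 'Job B', 'locations': ['City Y']},
        ]},
        'Category 2': {'jobs': [
            {'title': 'Job C', 'locations': ['City X', 'City Y']},
        ]},
    }
}

rows = flatten_jobs(sample)
print([r['locations'] for r in rows])  # ['City X', 'City Y', 'City X, City Y']
```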

We took the column names from the keys of the first job record and built a DataFrame from the `arr` list with those columns.

```python
# Column names come from the keys of the first job record
columns = list(arr[0].keys())
df = pd.DataFrame(arr, columns=columns)
```
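Taking the columns from a single job record works when every job has the same fields. If some records carried extra keys, passing `columns=` would silently drop them, whereas letting pandas infer the columns keeps the union of all keys and fills gaps with NaN. The records below are hypothetical and only illustrate that difference.

```python
import pandas as pd

# Hypothetical records: the second one has an extra key
records = [
    {'id': 'a', 'title': 'Job A'},
    {'id': 'b', 'title': 'Job B', 'min_exp': 36},
]

inferred = pd.DataFrame(records)                          # union of all keys
fixed = pd.DataFrame(records, columns=list(records[0]))   # only the first record's keys

print(list(fixed.columns))  # ['id', 'title']
print(sorted(inferred.columns))  # ['id', 'min_exp', 'title']
```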

                

We need to sort the rows by posting date, so we first convert the `created_at` column to datetimes with pandas' `to_datetime` and then sort the DataFrame on that column.

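```python
# Parse the ISO-8601 timestamps so the sort is chronological, not alphabetical
df['created_at'] = pd.to_datetime(df['created_at'])
# sort_values is ascending by default, so the oldest posting comes first
df = df.sort_values('created_at')
df.head()
```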
| | id | title | req_id | slug | created_at | locations | description | job_type | min_exp | max_exp | added_by | added_by_email | category | business_unit_name | organization_name |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 20 | a933f8e2-dd5d-4a82-ab10-1209634d31c7 | Engineering Lead | 1861 | engineering-lead | 2019-02-08T13:11:49.966886Z | mauritius | `<p><strong style="color: rgb(0, 0, 0); backgro...` | Full-time | 84 | 216 | Rekha Allam | [email protected] | Information Technology | Cloud Automation - Mauritius | Techolution Mauritius |
| 19 | 4e17f47b-7916-411a-b0cd-90d5eeb6346f | DevOps Architect | 1873 | devops-architect | 2019-02-11T12:00:25.061831Z | Hyderabad | `<p><span style="color: rgb(0, 0, 0); backgroun...` | Full-time | 60 | 180 | Nikhil Shekhar | [email protected] | Information Technology | Cloud Automation - India | Techolution Pvt Ltd |
| 26 | 4e641217-901a-4670-886e-dd2946bf5476 | Machine Learning Engineer | 1898 | machine-learning-engineer | 2019-02-14T16:13:38.000894Z | Hyderabad | `<p><strong style="color: rgb(51, 51, 51);">Tit...` | Full-time | 36 | 60 | Madhav Kommineni | [email protected] | Facial recognition | FaceOpen | Techolution LLC |
| 18 | d4847f54-7a0a-44dd-b3a6-b93c7fb3cb7d | Sr SDET | 1903 | sr-sdet | 2019-02-14T16:38:50.411436Z | New York | `<p>Techolution is a premier cloud, user interf...` | Full-time | 36 | 120 | Satish Kumar | [email protected] | Information Technology | UI/UX Modernization - US | Techolution LLC |
| 17 | c4daf0d7-f86f-4d86-b62e-ab9117ba2800 | OSS DevOps Engineer | 1905 | oss-devops-engineer | 2019-02-14T16:55:20.844881Z | Hyderabad | `<p><strong>Title&nbsp;: OSS DevOps Engineer</s...` | Full-time | 72 | 144 | Pavan Kumar | [email protected] | Information Technology | Cloud Automation - India | Techolution Pvt Ltd |
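The sort can be checked in isolation on the five `created_at` values from the preview table, fed in shuffled order. This sketch adds two choices that are not in the original code: `utc=True`, which makes the parsed timestamps explicitly UTC (the strings end in `Z`), and `kind='mergesort'`, a stable sort that keeps the original order of any postings sharing the same timestamp.

```python
import pandas as pd

# Titles and created_at values copied from the preview table, shuffled
df = pd.DataFrame({
    'title': ['Sr SDET', 'Engineering Lead', 'OSS DevOps Engineer',
              'DevOps Architect', 'Machine Learning Engineer'],
    'created_at': ['2019-02-14T16:38:50.411436Z', '2019-02-08T13:11:49.966886Z',
                   '2019-02-14T16:55:20.844881Z', '2019-02-11T12:00:25.061831Z',
                   '2019-02-14T16:13:38.000894Z'],
})

# Parse ISO-8601 strings into timezone-aware UTC timestamps
df['created_at'] = pd.to_datetime(df['created_at'], utc=True)

# Oldest first; mergesort is stable, so ties keep their input order
df = df.sort_values('created_at', ascending=True, kind='mergesort').reset_index(drop=True)

print(df['title'].tolist())
# ['Engineering Lead', 'DevOps Architect', 'Machine Learning Engineer', 'Sr SDET', 'OSS DevOps Engineer']
```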
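```python
# Write the sorted DataFrame to CSV, without the pandas index column
df.to_csv("jobfile.csv", encoding='utf-8', index=False)
```

This saves the sorted data as jobfile.csv, which is the required output file.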
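To confirm that the saved file really is ordered oldest first, one option is to read it back and check that `created_at` never decreases. The sketch below uses a hypothetical helper, `is_sorted_oldest_first`, and writes to an in-memory buffer so it runs without touching disk; for the real output you would pass `'jobfile.csv'` instead. The two rows are taken from the preview table.

```python
import io

import pandas as pd

def is_sorted_oldest_first(csv_source):
    """Hypothetical check: re-read a CSV and test that created_at never decreases."""
    check = pd.read_csv(csv_source, parse_dates=['created_at'])
    return bool(check['created_at'].is_monotonic_increasing)

# Two rows from the preview table, already in oldest-first order
df = pd.DataFrame({
    'title': ['Engineering Lead', 'DevOps Architect'],
    'created_at': pd.to_datetime(['2019-02-08T13:11:49.966886Z',
                                  '2019-02-11T12:00:25.061831Z'], utc=True),
})

buffer = io.StringIO()
df.to_csv(buffer, index=False)  # same call as above, but into memory
buffer.seek(0)                  # rewind so read_csv starts at the top
print(is_sorted_oldest_first(buffer))  # True
```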