I wrote this code for a project for the School of Management at the University of San Francisco. The purpose of the code was to build a dataset of job posts from Indeed that can be analyzed to understand job market demand in terms of job titles, skills, industries, and locations.
Importing the required packages.
Loading CSV files: the input files contain the job searches to run and the keywords to look for in each job post.
Phase 1 - Scraping job links: This code creates a dataset of all the job searches, with a link to each job post.
Phase 2 - Scraping the body of job posts: This code pulls the body text of each job in the created dataset.
Phase 3 - Text mining the body of job posts: This code looks for the desired keywords and the minimum years of experience required for each job.
Phase 4 - Classifying jobs: This code classifies the jobs into different categories: Job title, Business Function / Department, and city.
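The keyword and experience checks in Phase 3 can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the helper names `keyword_flags` and `min_years_experience` and the regular expression are hypothetical, and treating the smallest number that appears before "years ... experience" as the minimum requirement is an assumption.

```python
from __future__ import annotations

import re
from typing import Optional

# Hypothetical pattern: matches phrases such as "3-5 years of relevant experience",
# "5+ years experience", or "2 to 4 years of SQL experience" and captures the first number.
YEARS_RE = re.compile(
    r"(?<!\d)(\d{1,2})\s*\+?\s*(?:(?:-|to)\s*\d{1,2}\s*\+?\s*)?years?\b"
    r"(?:\W+\w+){0,4}?\W+experience",
    re.IGNORECASE,
)


def keyword_flags(body: str, keywords: list[str]) -> dict[str, int]:
    """Return 1 for each keyword found in the job post body and 0 otherwise."""
    text = body.lower()
    flags = {}
    for keyword in keywords:
        # Lookarounds instead of \b so keywords ending in symbols (e.g. "C++") still match,
        # while a short keyword such as "R" is not counted inside a word like "relevant".
        pattern = r"(?<!\w)" + re.escape(keyword.lower()) + r"(?!\w)"
        flags[keyword] = 1 if re.search(pattern, text) else 0
    return flags


def min_years_experience(body: str) -> Optional[int]:
    """Return the smallest years-of-experience figure mentioned, or None if none is found."""
    years = [int(match) for match in YEARS_RE.findall(body)]
    return min(years) if years else None


body = "We need 3-5 years of relevant experience with SQL and Python. C++ is a plus."
print(keyword_flags(body, ["SQL", "Python", "C++", "R", "Tableau"]))
print(min_years_experience(body))  # prints 3
```

Matching is case-insensitive and whole-word, which keeps short keywords from producing false positives; a real post may phrase experience requirements in ways this pattern does not cover (for example "yrs"), so the pattern would need tuning against actual data.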
The CSV files act as inputs to the code, which makes it easy to adapt the searches and keywords to your needs.
Edit your searches and desired keywords before running the Python code.
This file contains the searches to be scraped from Indeed. It has 4 columns:
Job Title: The job title you want to search for.
Job Location: The location you want to search for jobs in.
Search radius (in miles): The radius of your job search.
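As a sketch of how each row of this file could become an Indeed search URL for Phase 1, the snippet below assumes the column headers are exactly the names listed above and that Indeed's search page accepts the `q` (title), `l` (location), and `radius` query parameters. The base URL, the parameter names, and the `build_search_urls` helper are assumptions, not the repository's code, and Indeed may change its URL scheme at any time.

```python
import csv
import io
from urllib.parse import urlencode

BASE_URL = "https://www.indeed.com/jobs"  # assumption: Indeed's public search endpoint


def build_search_urls(csv_file) -> list:
    """Turn each row of a searches CSV into a search URL (hypothetical helper)."""
    urls = []
    # DictReader keys each row by the header names; extra columns are simply ignored.
    for row in csv.DictReader(csv_file):
        params = {
            "q": row["Job Title"],
            "l": row["Job Location"],
            "radius": row["search radius (in miles)"],
        }
        # urlencode escapes spaces as "+" and commas as "%2C".
        urls.append(BASE_URL + "?" + urlencode(params))
    return urls


sample = io.StringIO(
    "Job Title,Job Location,search radius (in miles)\n"
    'Data Analyst,"San Francisco, CA",25\n'
)
print(build_search_urls(sample))
# prints ['https://www.indeed.com/jobs?q=Data+Analyst&l=San+Francisco%2C+CA&radius=25']
```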
This file contains the keywords that you want to check for in the body text of each job post. The result for each keyword is shown in binary format: ‘1’ if the word appears in the job post, ‘0’ if it does NOT.
Each keyword in this file will become a binary variable in the final dataset.
Each column in the CSV file represents a category of keywords (e.g., Soft Skills, Analytics Skills, Programming Languages). The header row contains the name of each category, so users can easily see which keyword categories exist.
The second row contains an acronym for each keyword category. These acronyms become prefixes for the keyword variables in the final dataset, which keeps those variables clean and easy to find.
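Reading a file in this layout can be sketched as below. It assumes, beyond what is described above, that keywords start on the third row, that columns may have different lengths (empty cells are skipped), and that variable names are built as acronym, underscore, keyword with spaces replaced by underscores; the `load_keywords` helper is hypothetical.

```python
import csv
import io


def load_keywords(csv_file) -> dict:
    """Map prefixed variable names (e.g. 'SS_Teamwork') to the keyword to search for."""
    rows = list(csv.reader(csv_file))
    # Row 0 holds the human-readable category names; row 1 holds the acronyms.
    acronyms, keyword_rows = rows[1], rows[2:]
    variables = {}
    for col, acronym in enumerate(acronyms):
        for row in keyword_rows:
            # Columns can have different lengths, so skip short rows and empty cells.
            if col < len(row) and row[col].strip():
                keyword = row[col].strip()
                variables[f"{acronym}_{keyword.replace(' ', '_')}"] = keyword
    return variables


sample = io.StringIO(
    "Soft Skills,Analytics Skills,Programming Languages\n"
    "SS,AS,PL\n"
    "Teamwork,Forecasting,Python\n"
    "Communication,Machine Learning,SQL\n"
    "Problem Solving,,\n"
)
print(list(load_keywords(sample)))
```

Each key would then become one binary column in the final dataset, filled by a check such as the keyword match from Phase 3.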
You can add as many categories as you want; just make sure you follow the same format as the file shared in my repository and read the guidelines.
The code might take a long time to execute, depending on the number of job searches. However, it shows its progress while it runs, so you can start it, go to sleep, and check the results later.
The code tracks the time needed for each phase and gives you a summary at the end of Phase 4. Here is the summary I got after running the code for my 1240 job searches:
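Per-phase timing of this kind can be sketched with the standard library alone. The `run_phases` helper, the phase names, and the summary format below are hypothetical and only illustrate the idea; the repository's own summary may look different.

```python
import time


def run_phases(phases):
    """Run each (name, function) pair in order, print progress, and return durations."""
    durations = {}
    for number, (name, func) in enumerate(phases, start=1):
        print(f"Starting {name} ({number}/{len(phases)})...")
        start = time.perf_counter()
        func()
        durations[name] = time.perf_counter() - start
        print(f"Finished {name} in {durations[name]:.1f} s")
    # Summary printed once all phases are done.
    print("Summary:")
    for name, seconds in durations.items():
        print(f"  {name}: {seconds / 60:.2f} min")
    print(f"  Total: {sum(durations.values()) / 60:.2f} min")
    return durations


# Stand-in phases; real ones would scrape, mine, and classify.
durations = run_phases([
    ("Phase 1", lambda: time.sleep(0.01)),
    ("Phase 2", lambda: None),
])
```

`time.perf_counter` is used because it is a monotonic, high-resolution clock suited to measuring elapsed time.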
Many of my colleagues and I have used this code for our own job searches. We edited the searches.csv file based on what we were looking for and entered keywords for the skills we have, to find jobs that ask for those skills.
We then filtered the final dataset down to the jobs that match our skills/keywords, ending up with a list of jobs that closely match our skill sets. From there, we simply clicked the links of these jobs and applied.
Professor Mehrotra supervised this project. He shaped its scope and timeline and gave the team feedback and guidelines to reach the project's goal effectively and on time.
LinkedIn Profile: https://www.linkedin.com/in/vijay-mehrotra-ba9498
Zeus Habash is the author of this README. I wrote the code and documentation for this project in Python, made sure that the program meets the guidelines offered by Prof. Mehrotra, and collaborated with Joaquin Hernandez to make the program useful to as many stakeholders as possible. I also made sure that the code can be understood, used, and edited by any Pythonista.
LinkedIn Profile: https://www.linkedin.com/in/zeus-habash
Website: https://zeushabash.com
Joaquin Hernandez was the subject matter expert on this project. He provided the team with the job searches needed to acquire our desired dataset and identified the keywords and metrics needed to analyze it. Joaquin also worked on the logic for classifying the jobs into a smaller set of job title categories and into different business functions.
LinkedIn Profile: https://www.linkedin.com/in/joaquin-h-43b9b795
Website: https://iamjoaquinhernandez.com