The core objective of this project is to use AI techniques to identify similarities among job descriptions based on the nature of the project, the associated technologies, and the required soft skills. To do this, we collect job descriptions from a range of popular job websites, parse the unstructured free text they contain, and convert it into a structured format that is easy to analyze.
Several tools support this pipeline: one to chain the extraction steps and one to validate the extracted data. The results are stored as a graph, which makes later analysis easier.
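As a minimal sketch of what the structured form and a similarity measure could look like, the snippet below defines a hypothetical `JobRecord` and compares two records with the Jaccard index. The field names, the example data, and the equal weighting of technologies and soft skills are all assumptions made for illustration, not the project's actual schema or scoring method.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class JobRecord:
    """Hypothetical structured form of one parsed job description."""
    title: str
    project_nature: str
    technologies: frozenset = field(default_factory=frozenset)
    soft_skills: frozenset = field(default_factory=frozenset)


def jaccard(a, b):
    """Jaccard index of two sets: |a & b| / |a | b| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity(x, y):
    """Average the Jaccard overlap of technologies and soft skills.

    The equal weighting is an illustrative assumption.
    """
    tech = jaccard(x.technologies, y.technologies)
    soft = jaccard(x.soft_skills, y.soft_skills)
    return (tech + soft) / 2


# Made-up example records, not real scraped data.
backend = JobRecord(
    "Backend Engineer",
    "payments platform",
    frozenset({"python", "postgresql", "docker"}),
    frozenset({"communication", "ownership"}),
)
data = JobRecord(
    "Data Engineer",
    "analytics pipeline",
    frozenset({"python", "spark", "docker"}),
    frozenset({"communication", "curiosity"}),
)

score = similarity(backend, data)
print(f"similarity: {score:.3f}")
```

Set-based overlap is only one possible choice. Comparing the free-text `project_nature` field would need a different technique, such as text embeddings.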
The key technologies planned for use in this project include:
- LangChain: This framework will chain the extraction steps that turn unstructured job text into structured data. Find out more about LangChain here.
- Guardrails AI: This library validates that the model's output matches the expected format and content constraints. Learn more about Guardrails AI here.
- NetworkX: This Python library will be used to build a graph (nodes and edges) from the extracted information, which can then be analyzed and visualized. You can find more information about NetworkX here.
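The graph storage step might be sketched as follows with NetworkX. The `jobs` dictionary, the `kind` node attribute, and the `related_jobs` helper are hypothetical names introduced for illustration; the project's real graph layout may differ.

```python
import networkx as nx

# Hypothetical extraction results: job title -> attributes grouped by kind.
jobs = {
    "Backend Engineer": {
        "technology": ["python", "docker"],
        "soft_skill": ["communication"],
    },
    "Data Engineer": {
        "technology": ["python", "spark"],
        "soft_skill": ["communication"],
    },
    "UX Designer": {
        "technology": ["figma"],
        "soft_skill": ["empathy"],
    },
}

# Build an undirected graph linking each job to its attribute nodes.
graph = nx.Graph()
for job, attributes in jobs.items():
    graph.add_node(job, kind="job")
    for kind, values in attributes.items():
        for value in values:
            graph.add_node(value, kind=kind)
            graph.add_edge(job, value)


def related_jobs(g, job):
    """Map each other job to the number of attribute nodes it shares with `job`."""
    counts = {}
    for attribute in g.neighbors(job):
        for other in g.neighbors(attribute):
            if other != job and g.nodes[other]["kind"] == "job":
                counts[other] = counts.get(other, 0) + 1
    return counts


related = related_jobs(graph, "Backend Engineer")
print(related)
```

Modeling technologies and soft skills as their own nodes, rather than as attributes on job nodes, means that jobs sharing a skill are always two hops apart, so similarity queries become simple neighborhood lookups.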
Currently, the project uses ChatGPT, which costs approximately $0.004 per call. As a potential cost-saving measure, we might switch to a locally hosted language model in the future, depending on the project's requirements and budget.
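To put the per-call figure in context, a rough budget estimate can be computed as below. The number of job descriptions and the `calls_per_description` parameter are assumptions chosen for illustration; only the $0.004 figure comes from the text above.

```python
COST_PER_CALL_USD = 0.004  # approximate per-call cost quoted above


def estimated_cost(num_descriptions, calls_per_description=1):
    """Estimate total API cost in USD.

    `calls_per_description` is an assumed parameter: a chained extraction
    may need more than one model call per job description.
    """
    return num_descriptions * calls_per_description * COST_PER_CALL_USD


# Hypothetical workload: 10,000 job descriptions, one call each.
estimate = estimated_cost(10_000)
print(f"Estimated cost: ${estimate:.2f}")
```

At one call per description, 10,000 descriptions would cost about $40, and the total scales linearly with the number of calls in the chain.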
Note: the links above are placeholders and still need to be replaced with the correct URL for each library or tool.