I’ll show you here how you can start building an AI-based recruitment platform.


Have you ever wanted a sourcing platform where you could start a recruitment workflow with this text:

“Startup is looking for a founder engineer with experience in Ethereum and smart contracts. Experience in frontEnd development with Angular is also desired.”

And then that system returns you a list of candidates as:

CATEGORY: Blockchain
RESUME: Skills Strong CS fundamentals and problem solving Ethereum, Smart Contracts, Solidity skills Golang,…

— —

CATEGORY: Blockchain
RESUME: SKILLS Bitcoin, Ethereum Solidity Hyperledger, Beginner Go, Beginner R3 Corda, Beginner Tendermint, Nodejs, C Programming …

— —

And for the highest-ranked candidate, the system would automatically ask and provide the following answers:

1 — What are your main technical skills?

2 — What was your major at school?

3 — Please list all the SQL databases you have worked with.

4 — Can you List all your work experience?

Answers:

1 — My main technical skills are CS fundamentals and problem solving, Ethereum, Smart Contracts, Solidity skills, Golang, Node, Angular, React.

2 — I have M.Tech in Computer Engineering from Jaipur, Rajasthan Malaviya National Institute Of Technology Jaipur.

3 — I have worked with MongoDB, PostgreSQL, MySql.

4 — I have worked with Ethereum, Smart Contracts, Solidity, Golang, Node, Angular, React, CakePHP (PHP Framework), JQuery, MySql.

Looks interesting, right? In this post, I will show you a quick approach to building an AI-based recruitment platform that implements the features described above. However, I’m not going to dive deeply into the theory behind the tools and concepts used here; I’m sure you can connect the dots and explore the conceptual details yourself.

The following are the concepts, tools, and libraries that I used to build this prototype:

1 — Semantic search.

2 — Embeddings.

3 — Facebook Faiss.

4 — Hugging Face’s Transformers.

5 — Hugging Face’s Datasets.

6 — OpenAI’s GPT-3.

7 — A resume dataset hosted on Kaggle.

This article is the continuation of an initial post on Resume Analysis with GPT-3 that I published before.

The motivation behind an AI-based recruitment platform, at a glance

In a sourcing company, a recruitment workflow usually starts with a requirement from a customer to fill one or several positions at their company, which can arrive in a format like this:

“Startup is looking for a founder engineer with experience on Blockchain and smart contracts. Experience in frontEnd development with Angular is also desired.”

The problem comes when you hand over a job post like the one above to a junior recruiter, who probably lacks the technical background to infer that the client is looking for a senior developer with experience in Ethereum, Solidity, and front-end development. So, most of the time, the recruitment leader or a senior developer must preprocess the job post to provide the details (Ethereum, Solidity, etc.) that the junior recruiter uses to browse the company’s resume database and find suitable candidates.

So, based on the previous “issues”, I think we have enough motivation to prototype a platform that:

  • Allows you to get the most relevant (high-ranked) resumes from a “raw” Job post. That would significantly reduce operational time and costs.
  • Provides insightful information about the candidate and allows you to ask natural-language questions about the resume.

Having a clear motivation, let’s get straight to the technical solution.

Structuring the project

For this project, we will leverage semantic search; according to Wikipedia, “Semantic search denotes search with meaning, as distinguished from lexical search where the search engine looks for literal matches of the query words or variants of them, without understanding the overall meaning of the query. Semantic search seeks to improve search accuracy by understanding the searcher’s intent and the contextual meaning of terms as they appear in the searchable dataspace.”

In other words, we will use semantic search to capture the intent or meaning of a job post and then match that contextual meaning to relevant resumes from our searchable dataspace (the company’s resume database). So, before matching the job post to the resume database, we must capture the contextual meaning of all the resumes we have and save them in suitable storage that we can query in the future.

In NLP (Natural language processing) terms, the contextual meaning of a sentence, word, or paragraph is called an embedding vector, a multidimensional float vector ranging from hundreds to thousands of dimensions. We can use pre-trained NLP models to extract embedding vectors. In this case, we are using the fantastic library HuggingFace Transformers to extract the embedding vectors of all our resumes.

Assuming we have the embeddings of all our resumes, matching a job post to candidates consists of extracting the embedding of the job post and using it to find the most similar vectors in our resume repository, usually with a similarity measure like cosine similarity or the dot product.
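For intuition, here is a toy dot-product ranking in plain NumPy; the 3-dimensional vectors are made up for illustration (real embeddings have hundreds of dimensions):

import numpy as np

# rows are resume embeddings, q is the job-post embedding (toy values)
resume_vectors = np.array([[0.9, 0.1, 0.0],
                           [0.2, 0.8, 0.1],
                           [0.7, 0.3, 0.2]])
q = np.array([0.8, 0.2, 0.1])

scores = resume_vectors @ q            # dot product against every resume
ranking = np.argsort(scores)[::-1]     # indices of resumes, best match first
print(scores, ranking)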

Facebook Faiss, searchable storage for our resume embeddings

We will store our resume embeddings in Facebook Faiss, a scalable and ultra-fast similarity index that lets you store and search thousands or even billions of vectors.
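To make that concrete, here is a minimal, standalone Faiss sketch; the dimensionality, vector count, and random data are placeholders standing in for real resume embeddings:

import faiss
import numpy as np

d = 768                                              # embedding dimensionality
index = faiss.IndexFlatIP(d)                         # exact inner-product (dot-product) index
vectors = np.random.rand(1000, d).astype("float32")  # stand-ins for resume embeddings
index.add(vectors)

query = np.random.rand(1, d).astype("float32")       # stand-in for a job-post embedding
scores, ids = index.search(query, k=5)               # top-5 most similar vectors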

So far, we have structured the following workflow:

1 — Extract the embedding vectors of all our resumes using a pre-trained model from Hugging Face.

2 — Store the embedding vectors in Faiss.

3 — Extract the embedding vector of a job post.

4 — Query the Faiss index using the embedding vector of the job post.

We are using Python to build the solution. So, the first step is setting up the environment by installing the Huggingface Transformers, Faiss, and the other required libraries (you can find all the code in this notebook):

!pip install datasets evaluate transformers[sentencepiece]
!pip install faiss-gpu
!pip install -U sentence-transformers

from transformers import AutoTokenizer, AutoModel
from transformers import pipeline
from datasets import load_dataset, Dataset

The resume dataset

The bank of resumes we are using here comes from the resume dataset available on Kaggle. This dataset has two columns, [‘Category’, ‘Resume’], and 963 rows.
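If you want a quick peek at the raw CSV before loading it, a couple of pandas lines will do:

import pandas as pd

df = pd.read_csv("UpdatedResumeDataSet.csv")
print(df.columns.tolist())             # ['Category', 'Resume']
print(df["Category"].value_counts())   # number of resumes per category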

To start processing the resumes, we need to load the CSV data into a Hugging Face Dataset object. Hugging Face Datasets is excellent for handling large volumes of data and integrates seamlessly with Faiss.

resume_dataset = load_dataset("csv", data_files="UpdatedResumeDataSet.csv", split="train")
resume_dataset

As mentioned before, a key point of the solution is extracting the embedding vector of each resume. Hugging Face provides several pre-trained Transformers for that purpose; we’ll use the model sentence-transformers/multi-qa-mpnet-base-dot-v1, which, according to Hugging Face’s documentation, performs great for semantic search.

model_ckpt = "sentence-transformers/multi-qa-mpnet-base-dot-v1"tokenizer = AutoTokenizer.from_pretrained(model_ckpt)model = AutoModel.from_pretrained(model_ckpt)

So now, we can extract each resume’s embedding vector and add it to a new column called embeddings. You can see in the following code snippet how straightforward it is.

embeddings_dataset = resume_dataset.map(
    lambda x: {"embeddings": get_embeddings(x["Resume"]).detach().cpu().numpy()[0]}
)

Then, we can add a Faiss index to the new embeddings column.

embeddings_dataset.add_faiss_index(column="embeddings")
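If you plan to query the index across sessions, Datasets can also persist it to disk so you don’t have to re-embed everything; the file name here is just an example:

embeddings_dataset.save_faiss_index("embeddings", "resumes.faiss")
# ...and in a later session:
embeddings_dataset.load_faiss_index("embeddings", "resumes.faiss")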

We are ready to define the text (job description) that we will match to candidates using semantic search.

question = '''
Startup is looking for a founder engineer with experience
on Blockchain and smart contracts. Experience on frontEnd
development with Angular is also desired.
'''

And then extract the embeddings of that job post.

question_embedding = get_embeddings([question]).cpu().detach().numpy()
question_embedding.shape

Now we can search the index to find the five resumes most similar to the job post.

scores, samples = embeddings_dataset.get_nearest_examples(
    "embeddings", question_embedding, k=5
)

Then we use pandas to sort and display the query results.

import pandas as pd

samples_df = pd.DataFrame.from_dict(samples)
samples_df["scores"] = scores
samples_df.sort_values("scores", ascending=False, inplace=True)
for _, row in samples_df.iterrows():
    print(f"CATEGORY: {row.Category}")
    print(f"SCORE: {row.scores}")
    print(f"RESUME: {row.Resume}")
    print("=" * 50)
    print()

And this is the result:

CATEGORY: Blockchain 
SCORE: 29.915586471557617
RESUME: Skills Strong CS fundamentals and problem solving Ethereum, Smart Contracts, Solidity skills Golang, Node, Angular, React Culturally fit for startup environment MongoDB, PostGresql, MySql Enthusiastic to learn new technologies AWS, Docker, Microservices Blockchain, Protocol, ConsensusEducation Details January 2014 M.Tech Computer Engineering Jaipur, Rajasthan Malaviya National Institute Of Technology Jaipur January 2011 B.E. Computer Science And Engg Kolhapur, Maharashtra Shivaji University Blockchain Engineer Blockchain Engineer - XINFIN Orgnization Skill Details MONGODB- Exprience - 16 months CONTRACTS- Exprience - 12 months MYSQL- Exprience - 9 months AWS- Exprience - 6 months PROBLEM SOLVING-
Exprience - 6 monthsCompany Details company - XINFIN Orgnization description - Xinfin is a global open source Hybrid Blockchain protocol. Rolled out multiple blockchain based pilot projects on different use cases for various clients. Eg. Tradefinex (Supply chain Management), Land Registry (Govt of MH), inFactor (Invoice Factoring) Build a secure and scalable hosted wallet based on ERC 20 standards for XINFIN Network. Working on production level blockchain use cases. Technology: Ethereum Blockchain, Solidity, Smart Contracts, DAPPs, Nodejs company - ORO Wealth description - OroWealth is a zero commision online investment platform, currently focused on direct mutual funds Build various scalable web based products (B2B and B2C) based on MEAN stack technology and integrated with multiple finance applications/entities. eg. Integration KYC and MF Entities. Technology: Node.js, Angular.js, MongoDB, Express company - YallaSpree description - Hyderabad, Telangana Yallaspree is a largest digital shopping directory in U.A.E with over 22K stores. Own the responsibility to develop and maintain following modules: - Admin and Vendor interface - Database operations - Writing Webservices - Complete Notification system - Events and Offers Page Technology: CakePHP (PHP Framework), JQuery, MySql company - RailTiffin.com description - Mumbai, Maharashtra RailTiffin.com is an e-commerce platform to serve food to railway passengers. Worked on multiple roles like bug fixing, DB operations, Feature customisation and writing API endpoints. Technology: OpenCart (Ecommerce Framework), JQuery, MySql company - Accolite Software India Private Limited description - Bengaluru, KA Accolite is a global IT Services company headquartered in Dallas, USA with offices in India. Worked on Birst Analytics Tool to develop, deploy and maintain reports ==================================================
...

By default, I want our system to automatically ask some key questions about the top candidate’s resume returned by the query. Hugging Face provides a convenient question-answering pipeline we can use.

from transformers import pipeline

question_answerer = pipeline('question-answering')

Let’s see if this pipeline provides good answers to our questions:

answer = question_answerer(question='What are your main technical skills?', context=samples_df['Resume'][0])
print(answer)

{'score': 0.7134349346160889, 'start': 2238, 'end': 2312, 'answer': 'bug fixing, DB operations, Feature customisation and writing API endpoints'}

And also this one:

answer = question_answerer(question='What was your major at school?', context=samples_df['Resume'][0])
print(answer)

{'score': 0.00600675493478775, 'start': 2475, 'end': 2486, 'answer': 'IT Services'}

In my opinion, those answers are not good enough; I think we can aspire to better results.

So far, our solution does not rely on third-party APIs. However, since the results of the Q&A part of our solution could be better, we can use GPT-3 and see if it improves that specific functionality.

Be aware that GPT-3 involves costs, which can get very high depending on request frequency, model, and request size. Fortunately, the “heavy” parts of our process (extracting embeddings and searching the index) don’t depend on GPT-3.
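The calls below assume the OpenAI client is already configured; a minimal setup could look like this (the model behind COMPLETIONS_MODEL is my assumption, and the API key is read from the environment):

import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]  # assumes the key is set in your environment
COMPLETIONS_MODEL = "text-davinci-003"         # illustrative completions model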

# let's create a prompt based on the most relevant resume
openai_prompt = samples_df['Resume'][0]
openai_prompt += '''
Answer the following questions:
1 - What are your main technical skills?
2 - What was your major at school?
3 - What is your experience with databases?
4 - Can you list all your work experience?
Answers:
'''

Let’s invoke GPT-3:

openai.Completion.create(
    prompt=openai_prompt,
    temperature=0,
    max_tokens=100,
    top_p=1,
    frequency_penalty=0,
    presence_penalty=0,
    model=COMPLETIONS_MODEL,
)["choices"][0]["text"].strip(" \n")

And this is the answer:

1 - I have strong CS fundamentals and problem solving skills in Ethereum, Smart Contracts, Solidity.
2 - I am a Computer Engineering graduate from Jaipur, Rajasthan Malaviya National Institute Of Technology.
3 - I have experience with databases such as MongoDB, PostgreSQL, MySql.
4 - I have worked in various roles such as Blockchain Engineer, Ethereum Developer, Web Developer, and Database Administrator.

Much better results, I think!

We have reached the end of our adventure, creating the prototype of an AI-based recruiting platform. I think this article can help lay the foundations of an actual digital product. All the code shown in the article is in this notebook.

Next steps:

The current prototype can be powerful enough for small and medium-sized agencies to start optimizing their recruitment processes. However, to create a more robust product, the following elements can be improved:

1 — On some occasions, search results might include irrelevant elements. To deal with those cases, we must add a re-ranking phase that processes and filters the query results before returning them (see the sketch after this list).

2 — We can add a logistic regression model to validate if a job prompt or search request is valid or relevant.
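As a sketch of the re-ranking idea from item 1, one common approach is to score each (job post, resume) pair with a cross-encoder and keep only confident matches; the model name and threshold below are illustrative choices, not what the prototype uses:

from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
pairs = [(question, resume) for resume in samples_df["Resume"]]
samples_df["rerank_score"] = reranker.predict(pairs)  # relevance score per pair
# keep only pairs the cross-encoder considers relevant, best first
samples_df = samples_df[samples_df["rerank_score"] > 0].sort_values(
    "rerank_score", ascending=False
)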

I hope this article has been helpful. If you have any questions or need some help to turn this prototype into a production-level product, send me an email.

Thanks for reading!

Stay tuned for more content about GPT-3, NLP, System design, and AI in general. Follow Klever for Solutions on LinkedIn too.
