As Indeed continues to grow, we’re finding more ways to help people get jobs. We’re also offering more ways job seekers can see those jobs. Job seekers can search directly on Indeed.com, receive recommendations, view sponsored jobs or Indeed Targeted Ads, or receive invitations to apply — to name a few. While each option presents jobs in a slightly different way, our goal for each is the same: showing the right jobs to the right job seekers.
If we miss the mark with the jobs we present, job seekers may lose trust in our ability to connect them with their next opportunity. Our mission is to help people get jobs, not waste their time.
Some of the ways we’d consider a job to be wrong for a job seeker are if it:
- Pays less than their expected salary range
- Requires special licensure they do not have
- Is located outside their preferred geographic area
- Is in a related field but mismatched, such as nurses and doctors being offered the same jobs
To mitigate this issue, we built a jobs filter to remove jobs that are obviously mismatched to the job seeker. Our solution uses a combination of rules and machine learning technologies, and our analysis shows it to be very effective.
System architecture
The jobs filter product consists of the following components, as shown in the preceding diagram:
- Jobs Filter Service. A high throughput, low latency application service that evaluates potential match-ups of jobs to users, identified by ID. If the service determines that the job is appropriate for the user ID, it returns an ALLOW decision; otherwise it returns a VETO. This service is horizontally scalable so it can serve many real-time Indeed applications.
- Job Profile. A data storage service that provides high throughput, low latency performance. It retrieves job attributes such as estimated salary, job titles, and job locations at serving time. The job profile uses Indeed NLP libraries and machine learning technologies to extract or aggregate job attributes.
- User Profile. The job seeker counterpart to the job profile: a data storage service that provides high throughput, low latency performance. It retrieves job seeker attributes such as expected salary, current job title, and preferred job locations at serving time. It also uses Indeed NLP libraries and machine learning technologies to extract or aggregate user attributes.
- Offline Evaluation Platform. Consumes historic data to evaluate rule effectiveness without actually integrating with the upstream applications. It is also heavily used for fine-tuning existing rules, identifying new rules, and validating new models.
- Offline Model Training. Contains our offline training algorithms, which we use to train the models that the jobs filter rules evaluate at serving time.
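To make the request flow concrete, here is a minimal sketch of how a filter service might look up both profiles by ID and return an ALLOW or VETO decision. Every name in it (ProfileStore, JobAttributes, UserAttributes, outside_preferred_area) is hypothetical, the in-memory dictionaries stand in for the real profile services, and the choice to allow a job when a profile is missing is an assumption rather than documented behavior.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, FrozenSet, Generic, Optional, TypeVar

T = TypeVar("T")


class Decision(Enum):
    ALLOW = "ALLOW"
    VETO = "VETO"


@dataclass(frozen=True)
class JobAttributes:
    estimated_salary: float
    title: str
    location: str


@dataclass(frozen=True)
class UserAttributes:
    expected_salary: float
    current_title: str
    preferred_locations: FrozenSet[str]


class ProfileStore(Generic[T]):
    """In-memory stand-in for a profile service keyed by ID (hypothetical)."""

    def __init__(self, records: Dict[str, T]) -> None:
        self._records = dict(records)

    def get(self, key: str) -> Optional[T]:
        return self._records.get(key)


class JobsFilterService:
    """Looks up both profiles by ID and returns ALLOW or VETO."""

    def __init__(
        self,
        job_profile: ProfileStore[JobAttributes],
        user_profile: ProfileStore[UserAttributes],
        is_mismatch: Callable[[UserAttributes, JobAttributes], bool],
    ) -> None:
        self._job_profile = job_profile
        self._user_profile = user_profile
        self._is_mismatch = is_mismatch

    def evaluate(self, user_id: str, job_id: str) -> Decision:
        user = self._user_profile.get(user_id)
        job = self._job_profile.get(job_id)
        if user is None or job is None:
            # Assumption: with no profile data, do not block the job.
            return Decision.ALLOW
        return Decision.VETO if self._is_mismatch(user, job) else Decision.ALLOW


def outside_preferred_area(user: UserAttributes, job: JobAttributes) -> bool:
    """Illustrative mismatch check: job is not in a preferred location."""
    return job.location not in user.preferred_locations


jobs = ProfileStore({
    "job-1": JobAttributes(90000.0, "Registered Nurse", "Austin, TX"),
    "job-2": JobAttributes(95000.0, "Registered Nurse", "Seattle, WA"),
})
users = ProfileStore({
    "user-1": UserAttributes(85000.0, "Registered Nurse", frozenset({"Austin, TX"})),
})
service = JobsFilterService(jobs, users, outside_preferred_area)

print(service.evaluate("user-1", "job-1").value)
print(service.evaluate("user-1", "job-2").value)
```

Passing the mismatch check in as a function keeps the service itself free of rule logic, which matches the idea of a shared rule library used by both online and offline components.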
Filter rules to improve job matches
The jobs filter uses a set of rules to improve the quality of jobs displayed to any given job seeker. Rules can be simple: “Do not show a job requiring professional licenses to job seekers who don’t possess such licenses,” or “Do not show jobs to a job seeker if they come with a significant pay cut.” They can also be complex: “Do not show jobs to the job seeker if we are confident the job seeker will not be interested in the job titles,” or “Do not show jobs to the job seeker if our complex predictive models suggest the job seeker will not be interested in them.”
All rules are compiled into a decision engine library. We share this library in our online service and offline evaluation platform.
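As an illustration of the simple rules described above, the following sketch expresses two of them as plain functions and collects them in a small decision engine. The class and function names are hypothetical, and the 20% cutoff for a "significant" pay cut is an assumed value chosen only for the example, not a figure from our production rules.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class Job:
    estimated_salary: float
    required_licenses: FrozenSet[str] = frozenset()


@dataclass(frozen=True)
class JobSeeker:
    expected_salary: float
    licenses: FrozenSet[str] = frozenset()


# A rule returns True when the job should be vetoed for this job seeker.
Rule = Callable[[JobSeeker, Job], bool]


def missing_license(seeker: JobSeeker, job: Job) -> bool:
    """Veto when the job requires a license the job seeker does not hold."""
    return not job.required_licenses <= seeker.licenses


def significant_pay_cut(seeker: JobSeeker, job: Job, max_cut: float = 0.20) -> bool:
    """Veto when pay falls more than max_cut below expectations (assumed cutoff)."""
    return job.estimated_salary < seeker.expected_salary * (1.0 - max_cut)


class DecisionEngine:
    """Runs named rules in order and reports the first one that vetoes."""

    def __init__(self, rules: List[Tuple[str, Rule]]) -> None:
        self._rules = list(rules)

    def first_veto(self, seeker: JobSeeker, job: Job) -> Optional[str]:
        for name, rule in self._rules:
            if rule(seeker, job):
                return name
        return None  # No rule objected, so the job is allowed.


engine = DecisionEngine([
    ("missing_license", missing_license),
    ("significant_pay_cut", significant_pay_cut),
])

nurse = JobSeeker(expected_salary=80000.0, licenses=frozenset({"RN"}))
good_fit = Job(estimated_salary=78000.0, required_licenses=frozenset({"RN"}))
physician_role = Job(estimated_salary=120000.0, required_licenses=frozenset({"MD"}))
low_paying = Job(estimated_salary=50000.0, required_licenses=frozenset({"RN"}))

for job in (good_fit, physician_role, low_paying):
    print(engine.first_veto(nurse, job))
```

Because the engine is an ordinary library object, the same instance could in principle be called from a serving path or replayed over historic data offline, which is the property the shared library is meant to provide.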
Although the underlying data for building jobs filter rules might be complex to acquire, most of the heuristic rules themselves are straightforward to design and implement. For example, in one rule we use a user response prediction model to filter out jobs that the job seeker is less likely to be interested in. An Indeed proprietary metric helps us evaluate our performance by measuring the match quality of the job seeker and the given jobs.
Ads ranking and recommender systems commonly rely on user response prediction models, such as click prediction and conversion prediction, to generate a score. They then set a threshold to filter out everything with low scores. This filtering is possible because the models predict positive reactions from users, and low scores indicate poor match quality.
We adopted similar technologies in our jobs filter product, but we used negative matching models when designing our machine learning based rules. We build models to predict negative responses from users. We use TensorFlow to build the Wide and Deep model. This facilitates future experimentation with more complex models such as factorization machines or neural networks. The features we use cover major user attributes and job data.
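The difference from positive-response filtering is the direction of the threshold: a negative matching rule vetoes a job when the predicted probability of a negative response is high. The sketch below shows that logic only. It swaps in a tiny logistic model for the Wide and Deep model, and the feature names, weights, bias, and 0.9 threshold are all made-up values for illustration.

```python
import math
from typing import Dict


def predict_negative_probability(
    features: Dict[str, float], weights: Dict[str, float], bias: float
) -> float:
    """Logistic stand-in for a trained negative-response model."""
    z = bias + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def veto_by_negative_model(
    features: Dict[str, float],
    weights: Dict[str, float],
    bias: float,
    threshold: float = 0.9,
) -> bool:
    """Veto only when the model is confident the response will be negative."""
    return predict_negative_probability(features, weights, bias) >= threshold


# Hypothetical weights: a title mismatch and a distant location both
# push the predicted negative-response probability up.
weights = {"title_mismatch": 3.0, "far_from_preferred_area": 2.0}
bias = -2.0

clear_mismatch = {"title_mismatch": 1.0, "far_from_preferred_area": 1.0}
plausible_match = {"title_mismatch": 0.0, "far_from_preferred_area": 0.0}

print(veto_by_negative_model(clear_mismatch, weights, bias))
print(veto_by_negative_model(plausible_match, weights, bias))
```

A high threshold keeps the rule conservative: jobs are removed only when the model is confident, and uncertain cases are left for downstream ranking.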
After we train a model that performs well, we export it using the TensorFlow simple_save API (tf.saved_model.simple_save). We load the exported model into our online systems and serve requests using the TensorFlow Java API. Besides checking traditional classifier metrics such as AUC, precision, and recall, we also load our model into our offline evaluation platform to validate its performance.
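For readers less familiar with the classifier metrics mentioned above, the following sketch computes precision, recall, and AUC in plain Python. The labels and scores are made-up sample data, where a label of 1 means the job seeker responded negatively, and the 0.5 threshold is an arbitrary choice for the example. AUC is computed here as the fraction of (positive, negative) pairs that the model ranks correctly, with ties counted as half.

```python
from typing import List, Tuple


def precision_recall(
    labels: List[int], scores: List[float], threshold: float
) -> Tuple[float, float]:
    """Precision and recall when scores at or above threshold are flagged."""
    flagged = [score >= threshold for score in scores]
    tp = sum(1 for y, f in zip(labels, flagged) if y == 1 and f)
    fp = sum(1 for y, f in zip(labels, flagged) if y == 0 and f)
    fn = sum(1 for y, f in zip(labels, flagged) if y == 1 and not f)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def auc(labels: List[int], scores: List[float]) -> float:
    """Probability that a random positive outscores a random negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Made-up evaluation data: 1 = negative response observed, 0 = not observed.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.95, 0.80, 0.70, 0.40, 0.30, 0.10]

precision, recall = precision_recall(labels, scores, threshold=0.5)
area = auc(labels, scores)
print(f"precision={precision:.3f} recall={recall:.3f} auc={area:.3f}")
```

The pairwise AUC is quadratic in the number of examples, which is fine for a small illustration; a sort-based computation would be the usual choice at scale.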
Putting it all to work
We apply our jobs filter in several applications within Indeed. One application is Job2Job, which recommends similar jobs to the job seeker based on the jobs they have clicked or applied for. After applying the jobs filter to the Job2Job service, we saw a greater than 20% increase in job match quality. When we applied the filter to other applications, we observed similar, if not greater, improvements.
Rule-based engines work well in solving corner cases. However, the number of rules can easily spiral out of control. Our design's hierarchy of rules and machine learning technologies effectively solves this challenge and keeps our system working. In the future, we aim to add more features to the model so that it can become even more effective.