Modeling contextual uncertainty in crowdsourcing using Gaussian Processes

April 3, 2025

What is modeling contextual uncertainty in crowdsourcing?

Crowdsourcing is extensively researched and is applied in many domains. Applications include removing offensive content from social media websites, training machine learning models, and estimating the prevalence of a category in a specific population.

Human beings are noisy decision makers, which is why the labels we receive can be incorrect. A commonly accepted approach is to collect labels for the same item from multiple labelers and aggregate them, taking each labeler's level of accuracy into account.
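
To make the aggregation idea concrete, here is a minimal sketch of accuracy-weighted voting for a binary label. It assumes each labeler's accuracy is already known, is the same for both classes, and that labelers err independently. The function name `aggregate_labels` and the example numbers are ours, chosen only for illustration.

```python
import math

def aggregate_labels(votes, accuracies, prior=0.5):
    """Return the probability that the true label is 1.

    votes: 0/1 labels from different labelers for the same item.
    accuracies: assumed-known probability that each labeler is correct.
    prior: assumed prior probability that the true label is 1.
    """
    log_odds = math.log(prior / (1.0 - prior))
    for vote, accuracy in zip(votes, accuracies):
        # A more accurate labeler moves the log-odds further.
        weight = math.log(accuracy / (1.0 - accuracy))
        log_odds += weight if vote == 1 else -weight
    return 1.0 / (1.0 + math.exp(-log_odds))

# One strong labeler says "violating", two weaker labelers disagree.
posterior = aggregate_labels([1, 0, 0], [0.9, 0.6, 0.6])
print(round(posterior, 3))  # approximately 0.8
```

In this example the single accurate labeler outweighs the two less accurate ones, which a plain majority vote would not capture.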

As mentioned earlier, crowdsourcing can help remove content that is likely to violate the community standards of social media platforms. For this task we can use Theodon, a Bayesian non-parametric model that learns the prevalence of the label categories and the accuracy of the labelers in a specified context.

In this model, Gaussian Processes (GPs) are used to model the prevalence and the labelers' accuracy and sensitivity as functions of the context.
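
As a rough illustration of what "a GP over a function of the context" means, the sketch below draws one random prevalence curve over a classifier score between 0 and 1. Everything specific here is an assumption of ours rather than a detail of Theodon: the RBF kernel, its length scale, the sigmoid link that keeps values between 0 and 1, and the helper names. It only samples from a prior; it does not fit any data.

```python
import math
import random

def rbf_kernel(a, b, length_scale=0.2, variance=1.0):
    """Squared-exponential covariance between two context values (assumed kernel)."""
    return variance * math.exp(-0.5 * ((a - b) / length_scale) ** 2)

def cholesky(matrix):
    """Lower-triangular factor L with L * L^T equal to matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - partial)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower

def sample_prevalence_curve(scores, rng, jitter=1e-6):
    """Draw one GP prior sample over the scores and squash it into (0, 1)."""
    n = len(scores)
    # Small jitter on the diagonal keeps the factorization numerically stable.
    cov = [[rbf_kernel(scores[i], scores[j]) + (jitter if i == j else 0.0)
            for j in range(n)] for i in range(n)]
    lower = cholesky(cov)
    noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
    latent = [sum(lower[i][k] * noise[k] for k in range(i + 1)) for i in range(n)]
    return [1.0 / (1.0 + math.exp(-value)) for value in latent]

scores = [i / 10 for i in range(11)]
curve = sample_prevalence_curve(scores, random.Random(0))
for score, prevalence in zip(scores, curve):
    print(f"score={score:.1f} prevalence={prevalence:.3f}")
```

Nearby scores get similar values because the kernel makes them strongly correlated, which is the smoothness a GP brings to the problem.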

Application of modeling contextual uncertainty in crowdsourcing using Gaussian Processes

Crowdsourcing using Gaussian Processes can be applied on a social media platform to measure the prevalence of violations. That prevalence is usually very low, and labeling the entire population is next to impossible because of its sheer volume.

That is why it is better to up-sample the likely violations. In this case, a classifier is used to predict the probability of violation for every item in the population. Keep in mind that these up-sampling classifiers are very different from the enforcement classifiers that remove violating content with higher accuracy.
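
The sketch below shows why up-sampling helps and why it needs a correction afterwards. It is not Theodon's method: it uses plain inverse-probability weighting, a standard survey technique, on a made-up population in which every item has a classifier score and a true label. The population sizes, scores, and function names are all assumptions for illustration.

```python
import random

def upsample(scores, sample_size, rng):
    """Draw item indices with probability proportional to classifier score."""
    total = sum(scores)
    probabilities = [score / total for score in scores]
    indices = rng.choices(range(len(scores)), weights=scores, k=sample_size)
    return indices, probabilities

def weighted_prevalence(indices, probabilities, labels):
    """Inverse-probability-weighted estimate of prevalence in the full population."""
    population_size = len(probabilities)
    terms = [labels[i] / (population_size * probabilities[i]) for i in indices]
    return sum(terms) / len(indices)

# Hypothetical population: 100 high-score items (90 violating)
# and 9,900 low-score items (99 violating).
scores = [0.9] * 100 + [0.01] * 9900
labels = [1] * 90 + [0] * 10 + [1] * 99 + [0] * 9801
true_prevalence = sum(labels) / len(labels)

rng = random.Random(0)
indices, probabilities = upsample(scores, 5000, rng)
naive = sum(labels[i] for i in indices) / len(indices)
estimate = weighted_prevalence(indices, probabilities, labels)
print(true_prevalence, naive, estimate)
```

The naive rate in the up-sampled set is far higher than the true prevalence because the sample is deliberately skewed toward likely violations, while the weighted estimate corrects for that skew.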

The up-sampling classifiers are trained on content features, which is why their scores provide useful context for both labeler performance and prevalence. Theodon takes the classifier's score together with the labelers' labels and aggregates the labels.

In this scenario, Theodon achieves a 1 to 4 percent improvement in AUC-PR compared to state-of-the-art baselines on public datasets. It is also fairly effective as a calibration method, and it can provide detailed information on the performance of the labelers.

How the model works

We can take a Bayesian probabilistic approach to specifying the crowdsourcing model and evaluating the observed data. The model has three variables: prevalence, specificity, and sensitivity, much as in the Dawid and Skene model. In Theodon, a sample is drawn and sent for human review, and the results of the review are fed back into the Theodon model.
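
To show how the three variables combine, here is a small sketch in the spirit of the Dawid and Skene model for a binary label. It assumes the prevalence and each labeler's sensitivity and specificity are fixed, known numbers and that labelers are independent given the true label. A full Bayesian model such as Theodon would infer these quantities instead, so treat the function `posterior_violation` and its inputs as illustrative only.

```python
def posterior_violation(votes, prevalence, sensitivities, specificities):
    """Probability that an item is violating, given 0/1 votes from labelers.

    prevalence: assumed prior probability of a violation in this context.
    sensitivities: P(vote = 1 | item is violating), one value per labeler.
    specificities: P(vote = 0 | item is not violating), one value per labeler.
    """
    weight_violating = prevalence
    weight_benign = 1.0 - prevalence
    for vote, sensitivity, specificity in zip(votes, sensitivities, specificities):
        if vote == 1:
            weight_violating *= sensitivity
            weight_benign *= (1.0 - specificity)
        else:
            weight_violating *= (1.0 - sensitivity)
            weight_benign *= specificity
    return weight_violating / (weight_violating + weight_benign)

# Two labelers both flag the item; only the assumed prevalence differs.
high_context = posterior_violation([1, 1], 0.05, [0.9, 0.9], [0.95, 0.95])
low_context = posterior_violation([1, 1], 0.001, [0.9, 0.9], [0.95, 0.95])
print(round(high_context, 3), round(low_context, 3))  # approximately 0.945 and 0.245
```

The same two votes lead to very different conclusions depending on the prevalence in that context, which is why letting these quantities vary with the classifier score matters.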

By now you should understand how modeling contextual uncertainty in crowdsourcing using Gaussian Processes can be applied in machine learning and on social media platforms to remove unwanted content. It can also be used to evaluate and obtain useful information on changes in labeler performance.