Exploring pseudo-labeling for reject inference

Martins, Margarida

2024-01-25 · Master's dissertation

datacite.subject.fos	Social Sciences::Economics and Management	pt_PT
dc.contributor.advisor	Brandão, Susana
dc.contributor.author	Martins, Margarida
dc.date.accessioned	2024-04-30T15:47:49Z
dc.date.available	2024-04-30T15:47:49Z
dc.date.issued	2024-01-25
dc.date.submitted	2024-01
dc.description.abstract	Banks use algorithms to estimate the credit risk of loan applicants, and these models need to be retrained. When retraining, we only know the label, meaning whether the applicant defaulted or not, for applicants who were accepted for the loan. Retraining only on the accepted applicants produces biased models and losses for the bank because of selection bias. To counteract this issue, we can infer the labels of the rejected applicants, which is known as reject inference. In this thesis, we pursue pseudo-labeling for reject inference. It needs two models: the first creates the pseudo-labels for the rejected applicants and the second makes the final predictions. We create the pseudo-labels by training a lightGBM on the available data and then apply a logistic regression as the final model. We compare the results against a baseline that sets all rejected applicants to a single category (default or not default). In addition, we compare against a scenario where rejection results from random decision-making, experiment with five rejection rates, and examine the effect of setting the rejected to default versus not default. We found that using lightGBM to infer the labels yielded a lower F1 score, AUC, and profit for the bank than the baseline. As such, the bank should set all rejected applicants to a single category. Additionally, setting all rejected applicants to default yields a higher recall in the rejected population and higher profit. Moreover, a lower rejection rate increases profits.	pt_PT
dc.description.abstract	(Translated from the Portuguese abstract.) Banks use algorithms to estimate the credit risk of loan applicants. However, these algorithms need to be retrained, and retraining requires labeled historical data. In this case, a variable is needed that indicates whether the applicant repaid the loan in full. Under these circumstances, we only know the label of applicants who were approved for a loan. Retraining on these observations alone biases the model, resulting in monetary losses for the bank. To prevent such losses, we attempt to infer the labels of the rejected applicants. In this thesis, we use pseudo-labeling to infer this label. Pseudo-labeling works with two models. First, pseudo-labels are created by training a lightGBM model. Then, we apply logistic regression. Finally, these results are compared with the scenario of assigning the rejected applicants to one of two categories, analyzing both. In addition, we compare with the scenario in which the initial rejection decision results from chance, and experiment with five rejection rates on the logistic regression. Using lightGBM yielded a lower F1, AUC, and profit. As such, the bank should assign the rejected applicants to one of the categories. We also found that classifying the rejected applicants as defaulters has a higher recall in the rejected population and leads to higher profit, and that a lower rejection rate yields higher profit.	pt_PT
dc.identifier.tid	203590783	pt_PT
dc.identifier.uri	http://hdl.handle.net/10400.14/44863
dc.language.iso	eng	pt_PT
dc.subject	Machine learning	pt_PT
dc.subject	Pseudo-labeling	pt_PT
dc.subject	Reject inference	pt_PT
dc.subject	Selection bias	pt_PT
dc.title	Exploring pseudo-labeling for reject inference	pt_PT
dc.type	master thesis
dspace.entity.type	Publication
rcaap.rights	openAccess	pt_PT
rcaap.type	masterThesis	pt_PT
thesis.degree.name	Mestrado em Análise de Dados para Gestão	pt_PT
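
The two-stage procedure described in the abstract (a first model creates pseudo-labels for the rejected applicants, a second model is trained on accepted plus pseudo-labeled data, and a baseline sets all rejected applicants to one category) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis code: the synthetic applicants, the rejection rule, and every function name are hypothetical, and a small hand-written logistic regression stands in for both stages to keep the sketch dependency-free, whereas the thesis uses lightGBM for the first stage.

```python
import math
import random


def predict_proba(model, x):
    """Probability of default for one applicant under a (weights, bias) model."""
    w, b = model
    z = b + sum(wj * xj for wj, xj in zip(w, x))
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, epochs=200):
    """Fit a logistic regression by batch gradient descent (stand-in model)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba((w, b), xi) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def make_applicants(n, seed=0):
    """Hypothetical synthetic applicants: features [risk, income], label 1 = default."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        risk, income = rng.uniform(-1, 1), rng.uniform(-1, 1)
        p_default = 1.0 / (1.0 + math.exp(-(2.0 * risk - income)))
        X.append([risk, income])
        y.append(1 if rng.random() < p_default else 0)
    return X, y


def split_by_rejection(X, y, rejection_rate):
    """Assumed policy: reject the riskiest share; rejected labels stay unobserved."""
    order = sorted(range(len(X)), key=lambda i: X[i][0])
    n_accepted = len(X) - int(round(rejection_rate * len(X)))
    accepted, rejected = order[:n_accepted], order[n_accepted:]
    return ([X[i] for i in accepted], [y[i] for i in accepted],
            [X[i] for i in rejected])


def pseudo_label_pipeline(X_acc, y_acc, X_rej, threshold=0.5):
    """Stage 1 labels the rejected; stage 2 trains on accepted plus pseudo-labeled."""
    labeler = fit_logistic(X_acc, y_acc)  # thesis: lightGBM; here: stand-in
    pseudo = [1 if predict_proba(labeler, x) >= threshold else 0 for x in X_rej]
    final_model = fit_logistic(X_acc + X_rej, y_acc + pseudo)
    return final_model, pseudo


def baseline_pipeline(X_acc, y_acc, X_rej, fill_label):
    """Baseline: set every rejected applicant to one category (1 = default)."""
    constant = [fill_label] * len(X_rej)
    return fit_logistic(X_acc + X_rej, y_acc + constant), constant


X, y = make_applicants(200)
X_acc, y_acc, X_rej = split_by_rejection(X, y, rejection_rate=0.3)
final_model, pseudo = pseudo_label_pipeline(X_acc, y_acc, X_rej)
baseline_model, _ = baseline_pipeline(X_acc, y_acc, X_rej, fill_label=1)
print(sum(pseudo), "of", len(pseudo), "rejected applicants pseudo-labeled as default")
```

To move closer to the setup in the abstract, the first-stage stand-in could be swapped for a gradient-boosted classifier, and the two resulting final models compared on F1 score, AUC, and profit across several rejection rates. The sketch does not reproduce any of the thesis results.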

Files

Main

Name: 203590783.pdf
Size: 1.13 MB
Format: Adobe Portable Document Format

Collections

R - Dissertações de Mestrado / Master Dissertations
CLSBE - Dissertações de Mestrado / Master Dissertations