
Predicting hourly demand for shared bicycles with weather data and machine learning models


Abstract(s)

This thesis analyzes the bike-sharing system in Chicago and applies predictive models to forecast the hourly demand for shared bicycles from time-related and weather-related features. The dependent variable is Count, the total number of bicycles used per hour. The models applied to this regression problem are Linear Regression, Random Forest, Gradient Boosting, Light Gradient Boosting, Extreme Gradient Boosting, and Multi-Layer Perceptron. Their accuracy is measured by R2_score, Root Mean Square Error, and Mean Absolute Error. To improve predictions, different hyperparameter settings are tried for each model. Without hyperparameter tuning, Random Forest achieves the best accuracy measures. After tuning, however, Gradient Boosting produces the most accurate predictions: its accuracy improves with tuned hyperparameters, whereas Random Forest is almost unaffected by them for this regression problem. The second-best model after tuning is Extreme Gradient Boosting. The neural network model, Multi-Layer Perceptron, gives less accurate results than Random Forest and the boosting models for this type of problem. The features most important for accurate forecasts are Temperature, Hour, Weekend, Pressure, Uv_Index, and Day.
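The three accuracy measures named in the abstract can be illustrated with a minimal sketch in plain Python. It is not taken from the thesis: the function name `regression_metrics` and the sample hourly counts are hypothetical, and the formulas are the standard definitions of the coefficient of determination (R2), Root Mean Square Error, and Mean Absolute Error.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (r2, rmse, mae) for two equal-length sequences of numbers."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]

    # Mean Absolute Error: average size of the errors, in bicycles per hour.
    mae = sum(abs(r) for r in residuals) / n

    # Root Mean Square Error: penalizes large errors more heavily than MAE.
    ss_res = sum(r * r for r in residuals)
    rmse = math.sqrt(ss_res / n)

    # R2: share of the variance in the observed counts explained by the model.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    return r2, rmse, mae


# Hypothetical hourly Count values, for illustration only (not thesis data).
observed = [120, 80, 150, 200]
predicted = [110, 90, 160, 190]

r2, rmse, mae = regression_metrics(observed, predicted)
print(f"R2={r2:.3f}  RMSE={rmse:.1f}  MAE={mae:.1f}")
```

A higher R2 and lower RMSE and MAE indicate a better fit, which is the sense in which the abstract ranks the models.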

Keywords

Demand forecasting; Shared bikes; Weather data; Machine learning; Predictive models
