Workflow
Solving a classification problem involves preparing the data, training a model on a labeled
dataset, evaluating its performance, and deploying it to make predictions on new, unseen
instances. Our project follows five steps:
1. Data cleaning and pre-processing
2. Handling class imbalance
3. Building models
4. Tuning hyperparameters
5. Model evaluation
In more detail, the data is first collected and prepared so that it is in a suitable format.
The dataset is then divided into a training set and a test set. A suitable classification
algorithm is selected, and the model is trained on the training set, adjusting its parameters
to minimize prediction errors. The model's performance is evaluated on the test set using
metrics such as accuracy or F1 score. On imbalanced data, F1 is the more informative of the
two, because a model that always predicts the majority class can still reach high accuracy.
If the performance is satisfactory, the model is deployed to make predictions on new data.
Regular monitoring and re-evaluation are necessary to maintain the model's effectiveness.
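The split, train, and evaluate steps can be sketched as follows. This is a minimal
illustration using scikit-learn, not the project's actual code. The synthetic dataset from
make_classification, its roughly 10% positive rate, the 80/20 split, and the default model
settings are all assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the claims data (assumption): about 10% positives.
X, y = make_classification(
    n_samples=1000, n_features=10, weights=[0.9, 0.1], random_state=0
)

# Hold out a stratified test set so both splits keep the same class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Train on the training set only; default hyperparameters are an assumption.
model = GradientBoostingClassifier(random_state=0)
model.fit(X_train, y_train)

# Evaluate on the held-out test set with both metrics named in the text.
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
print(f"accuracy={accuracy:.3f}  F1={f1:.3f}")
```

Stratifying the split matters when one class is rare. Without it, the test set could end up
with too few positive examples for the F1 score to be meaningful.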
Grid Search CV
In our project, we use Grid Search Cross-Validation (CV) to fine-tune our machine learning
models. Grid Search CV systematically tries every combination of hyperparameter values from a
predefined grid, scores each combination with cross-validation on the training data, and keeps
the combination with the best average validation score.
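A minimal sketch of this search with scikit-learn's GridSearchCV is shown below. The
parameter grid, the 3-fold setting, the F1 scoring choice, and the synthetic data are
assumptions chosen to keep the example small. They are not the values used in the project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic, imbalanced stand-in for the training data (assumption).
X, y = make_classification(
    n_samples=300, n_features=8, weights=[0.85, 0.15], random_state=0
)

# Hypothetical grid: 2 x 2 x 2 = 8 hyperparameter combinations.
param_grid = {
    "n_estimators": [25, 50],
    "learning_rate": [0.05, 0.1],
    "max_depth": [2, 3],
}

# Each combination is scored by 3-fold cross-validation using F1.
search = GridSearchCV(
    estimator=GradientBoostingClassifier(random_state=0),
    param_grid=param_grid,
    scoring="f1",
    cv=3,
)
search.fit(X, y)

print("best parameters:", search.best_params_)
print("best mean CV F1:", round(search.best_score_, 3))
```

After fitting, search.best_estimator_ holds a model refit on all of the supplied data with the
winning combination. The cost grows with the product of the grid sizes, so the grid is best
kept small.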
Summary
After evaluating and comparing the candidate models, we selected Gradient Boosting as the best
choice for detecting fraudulent claims in this project. It gave the strongest performance in
our comparison and handled the imbalanced dataset robustly, which makes it a dependable basis
for fraud prevention in insurance claims and similar scenarios.