Top 10 Python Functions to Automate the Steps in Data Science

The dataset can be downloaded here: https://www.kaggle.com/vetrirah/customer

Steps for Applied Machine Learning (ML) for Hackathons :

  1. Understand the Problem Statement & Import Packages and Datasets.

  2. Perform EDA (Exploratory Data Analysis) - Understanding the Datasets :

    • Explore Train and Test Data and get to know what each Column / Feature denotes.
    • Check for Imbalance of Target Column in Datasets.
    • Visualize Count Plots & Unique Values to infer from Datasets.

  3. Remove Duplicate Rows from Train Data if present.

  4. Fill/Impute Missing Values: Continuous - Mean/Median/Any Specific Value; Categorical - Others/ForwardFill/BackFill.

  5. Feature Engineering

    • Feature Selection - Selection of Most Important Existing Features.
    • Feature Creation - Creation of New Features from the Existing Features.

  6. Split Train Data into Train and Validation Data with Predictors (Independent) & Target (Dependent).

  7. Data Encoding - Label Encoding, OneHot Encoding and Data Scaling - MinMaxScaler, StandardScaler, RobustScaler.

  8. Create Baseline ML Model for Multi-Class Classification Problem.

  9. Improve ML Model, Fine Tune with Model Evaluation Metric - "Accuracy" and Predict Target "Segmentation".

  10. Result Submission, Check Leaderboard & Improve "Accuracy" Score.

1. Understand the Problem Statement & Import Packages and Datasets :

[Image: Customer_Seg.jpg]
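
A minimal sketch of the import-and-load step, assuming pandas is used and the data arrives as CSV. The helper name `load_dataset` and the sample columns are illustrative assumptions, not taken from the original; with the real download, the path to the Kaggle CSV file would be passed in place of the in-memory buffer.

```python
import io

import pandas as pd


def load_dataset(source):
    """Read a CSV file (a path or a file-like object) into a DataFrame."""
    return pd.read_csv(source)


# Stand-in for a downloaded CSV file; the column names are assumptions.
sample_csv = io.StringIO(
    "ID,Age,Profession,Segmentation\n"
    "1,22,Healthcare,D\n"
    "2,38,Engineer,A\n"
    "3,67,Engineer,B\n"
)

train = load_dataset(sample_csv)
print(train.shape)
print(train.head())
```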

2. Perform EDA (Exploratory Data Analysis) - Understanding the Datasets :

2.1 Explore Train and Test Data and get to know what each Column / Feature denotes :

Python Function 1

Python Function 2

Python Function 3

Python Function 4
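
One way to get an overview of each column is a small summary table. The sketch below is not one of the original functions; `summarize_columns` and the toy column names are hypothetical and only illustrate the kind of check this step calls for.

```python
import pandas as pd


def summarize_columns(df):
    """Return one row per column: dtype, missing count, unique count."""
    return pd.DataFrame(
        {
            "dtype": df.dtypes.astype(str),
            "missing": df.isnull().sum(),
            "unique": df.nunique(),  # NaN is not counted as a value
        }
    )


# Toy data standing in for the train set (column names are assumptions).
train = pd.DataFrame(
    {
        "Age": [22, 38, 67, 38],
        "Profession": ["Healthcare", "Engineer", None, "Engineer"],
        "Segmentation": ["D", "A", "B", "A"],
    }
)

summary = summarize_columns(train)
print(summary)
```

Running the same helper on both train and test data makes it easy to spot columns whose missing counts or cardinality differ between the two.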

3. Remove Duplicate Rows from Train data if present :

Python Function 5
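
A hedged sketch of duplicate removal with pandas `drop_duplicates`. The helper name `remove_duplicates` and the toy columns are assumptions made for illustration; the original function may differ.

```python
import pandas as pd


def remove_duplicates(df):
    """Drop fully duplicated rows and report how many were removed."""
    before = len(df)
    deduped = df.drop_duplicates().reset_index(drop=True)
    print(f"Removed {before - len(deduped)} duplicate row(s)")
    return deduped


# Toy train data with one repeated row (column names are assumptions).
train = pd.DataFrame(
    {
        "Age": [22, 38, 38, 67],
        "Segmentation": ["D", "A", "A", "B"],
    }
)

train = remove_duplicates(train)
```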

4. Fill/Impute Missing Values: Continuous - Mean/Median/Any Specific Value; Categorical - Others/ForwardFill/BackFill :

Python Function 6

Python Function 7
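
The imputation choices listed in this step could be sketched as follows: numeric columns get the median (or mean) and categorical columns get a fixed "Others" label. The function name `fill_missing`, its parameters, and the toy columns are assumptions for illustration only.

```python
import pandas as pd


def fill_missing(df, numeric_strategy="median", categorical_fill="Others"):
    """Return a copy with numeric and categorical gaps filled."""
    filled = df.copy()
    for column in filled.columns:
        if pd.api.types.is_numeric_dtype(filled[column]):
            if numeric_strategy == "mean":
                value = filled[column].mean()
            else:
                value = filled[column].median()
        else:
            value = categorical_fill
        filled[column] = filled[column].fillna(value)
    return filled


# Toy data with one gap per column (column names are assumptions).
train = pd.DataFrame(
    {
        "Work_Experience": [1.0, None, 3.0, 8.0],
        "Profession": ["Artist", None, "Doctor", "Artist"],
    }
)

filled = fill_missing(train)
print(filled)
```

For forward or backward fill, `filled[column].ffill()` or `filled[column].bfill()` could replace the `fillna` call; which option works best depends on the column.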

Multi-Class Classification Problem - Target has more than 2 Categories :

The target column Segmentation takes 4 values, one per customer segment: ['D' 'A' 'B' 'C']
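
Since the target has several classes, it helps to check how balanced they are. A minimal sketch, assuming the target column is named `Segmentation` as stated above; the helper `target_distribution` and the toy values are hypothetical.

```python
import pandas as pd


def target_distribution(df, target):
    """Return the count and share of each class in the target column."""
    counts = df[target].value_counts()
    share = df[target].value_counts(normalize=True).round(3)
    return pd.DataFrame({"count": counts, "share": share})


# Toy target values; the real class proportions will differ.
train = pd.DataFrame({"Segmentation": ["D", "A", "B", "C", "D", "D", "A", "D"]})

distribution = target_distribution(train, "Segmentation")
print(distribution)
```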

5. Feature Engineering

5.1 Feature Selection - Selection of Most Important Existing Features

5.2 Feature Creation - Creation of New Features from the Existing Features / Predictors :
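
As an illustration of feature creation, a numeric column can be binned into a new categorical feature. This is only a sketch: the `Age` column, the bin edges, and the helper `add_age_group` are assumptions, not features taken from the original.

```python
import pandas as pd


def add_age_group(df, age_column="Age"):
    """Return a copy with a binned Age_Group feature added."""
    out = df.copy()
    out["Age_Group"] = pd.cut(
        out[age_column],
        bins=[0, 30, 50, 120],  # intervals are right-inclusive
        labels=["young", "middle", "senior"],
    )
    return out


train = pd.DataFrame({"Age": [22, 38, 67, 30]})
train = add_age_group(train)
print(train)
```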

6. Split Train Data into Train and Validation Data with Predictors(Independent) & Target(Dependent) :
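
A minimal sketch of the split using scikit-learn's `train_test_split`. Stratifying on the target is an assumption added here to keep class proportions similar in both parts; the helper name, the 25% validation size, and the toy data are also illustrative.

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def split_train_validation(df, target, test_size=0.25, random_state=42):
    """Split into predictors/target and then into train/validation parts."""
    X = df.drop(columns=[target])  # predictors (independent)
    y = df[target]                 # target (dependent)
    return train_test_split(
        X, y, test_size=test_size, random_state=random_state, stratify=y
    )


# Toy data: 16 rows, 4 per class (column names are assumptions).
train = pd.DataFrame(
    {
        "Age": list(range(20, 36)),
        "Segmentation": ["A", "B", "C", "D"] * 4,
    }
)

X_train, X_valid, y_train, y_valid = split_train_validation(train, "Segmentation")
print(X_train.shape, X_valid.shape)
```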

7. Data Encoding - Label Encoding :

Python Function 8
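
Label encoding can be sketched with scikit-learn's `LabelEncoder`. Fitting on the combined train and test values, so that categories seen only in test data still get a code, is a choice assumed here rather than something the original states; `label_encode` and the toy column are hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder


def label_encode(train, test, columns):
    """Encode the given categorical columns as integers in both frames."""
    train, test = train.copy(), test.copy()
    encoders = {}
    for column in columns:
        encoder = LabelEncoder()
        # Fit on all values so unseen test categories do not raise errors.
        encoder.fit(pd.concat([train[column], test[column]]).astype(str))
        train[column] = encoder.transform(train[column].astype(str))
        test[column] = encoder.transform(test[column].astype(str))
        encoders[column] = encoder
    return train, test, encoders


train = pd.DataFrame({"Profession": ["Doctor", "Artist", "Engineer"]})
test = pd.DataFrame({"Profession": ["Artist", "Lawyer"]})

train_enc, test_enc, encoders = label_encode(train, test, ["Profession"])
print(train_enc)
print(test_enc)
```

Keeping the fitted encoders around allows predictions or encoded columns to be mapped back to the original labels with `inverse_transform`.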

8. Create Baseline ML Model :

Python Function 9
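
A baseline can be as simple as comparing a trivial classifier with a small default model. The sketch below uses synthetic 4-class data from `make_classification` and a decision tree; both are stand-ins chosen for illustration, and the original does not say which model its baseline uses.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def baseline_accuracy(model, X_train, X_valid, y_train, y_valid):
    """Fit the model on train data and return validation accuracy."""
    model.fit(X_train, y_train)
    return accuracy_score(y_valid, model.predict(X_valid))


# Synthetic 4-class data standing in for the encoded customer features.
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=5, n_redundant=0,
    n_classes=4, n_clusters_per_class=1, random_state=0,
)
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

dummy_score = baseline_accuracy(
    DummyClassifier(strategy="most_frequent"), X_train, X_valid, y_train, y_valid
)
tree_score = baseline_accuracy(
    DecisionTreeClassifier(max_depth=5, random_state=0),
    X_train, X_valid, y_train, y_valid,
)
print(f"Most-frequent baseline accuracy: {dummy_score:.3f}")
print(f"Decision tree accuracy: {tree_score:.3f}")
```

The dummy score gives the floor that any real model should beat; later improvements are then measured against the tree's validation accuracy.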

9. Improve ML Model, Fine Tune with Model Evaluation Metric - "Accuracy" and Predict Target "Segmentation" :

Python Function 10
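
Fine-tuning against accuracy can be sketched with scikit-learn's `GridSearchCV` and `scoring="accuracy"`. The random forest, the small parameter grid, and the synthetic data are assumptions made for illustration; the original does not specify the model or search method.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV


def tune_model(X, y, param_grid, cv=3):
    """Grid-search a random forest, selecting by cross-validated accuracy."""
    search = GridSearchCV(
        RandomForestClassifier(random_state=0),
        param_grid=param_grid,
        scoring="accuracy",
        cv=cv,
    )
    search.fit(X, y)
    return search


# Synthetic 4-class data standing in for the prepared train set.
X, y = make_classification(
    n_samples=200, n_features=8, n_informative=5, n_redundant=0,
    n_classes=4, n_clusters_per_class=1, random_state=0,
)

search = tune_model(X, y, {"n_estimators": [20, 50], "max_depth": [3, None]})
predictions = search.predict(X)  # uses the refitted best estimator
print(search.best_params_)
print(f"Best cross-validated accuracy: {search.best_score_:.3f}")
```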

10. Result Submission, Check Leaderboard & Improve "Accuracy" Score :
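
A submission file is typically a two-column CSV of row identifiers and predicted labels. The sketch below assumes columns named `ID` and `Segmentation`; the actual format required by the hackathon is not given in the original, so the names, IDs, and the helper `build_submission` are hypothetical.

```python
import io

import pandas as pd


def build_submission(ids, predictions, id_column="ID", target_column="Segmentation"):
    """Pair each test-row identifier with its predicted label."""
    return pd.DataFrame({id_column: list(ids), target_column: list(predictions)})


# Made-up IDs and predictions, for illustration only.
submission = build_submission([101, 102, 103], ["B", "A", "C"])

# Written to an in-memory buffer here; a file path would be used in practice.
buffer = io.StringIO()
submission.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```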