Machine Learning Workflow

Think of machine learning as teaching a child to recognize fruits.
Just as humans learn step by step, machines learn through a sequence of steps. These steps form the Machine Learning Workflow, which the figure depicts.


Step 1: Data Collection (Collect Raw Data)

This is the first and most important step.

Every ML model starts with data.

Meaning:
Gather the information the machine will learn from.

Example:
If the goal is to identify fruits, the data could be many images of apples, bananas, and grapes.

Other examples of datasets:

  • Student marks data
  • Email spam dataset
  • Patient medical records
  • Product reviews


Data is like the study material for the machine. Without study material, learning cannot start.


Step 2: Data Preprocessing (Cleaning the Data)

Raw data is usually messy, incomplete, or inconsistent.

Meaning:
We must clean the data before giving it to the model: fix mistakes, remove noise, and make everything uniform.

For example,

🔹 1. Handling Missing Values

Explain like this:

“Sometimes, some information is not available in data.”

Example table:

Student   Marks
A         85
B         (blank)
C         78

Ask students:

“Can a model work with blank values?”

Answer: ❌ NO. Most models cannot work with blank values directly.

What we do:

  • Remove that row
  • OR fill missing value with:
    • Average marks
    • Most common value

Say:

“We either remove missing data
or fill it with a reasonable value.”
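
The two options above can be sketched in a few lines of plain Python. This is only an illustration using the marks table from this section: the names `records`, `dropped`, and `filled` are made up for the example, and `None` is assumed to stand for a blank cell. Data libraries such as pandas offer ready-made tools for the same job.

```python
from statistics import mean

# Marks table from the example above; None stands for the blank value.
records = [
    {"student": "A", "marks": 85},
    {"student": "B", "marks": None},
    {"student": "C", "marks": 78},
]

# Option 1: remove every row that has a missing value.
dropped = [row for row in records if row["marks"] is not None]

# Option 2: fill the missing value with the average of the known marks.
average = mean(row["marks"] for row in records if row["marks"] is not None)
filled = [
    {**row, "marks": average if row["marks"] is None else row["marks"]}
    for row in records
]

print(dropped)  # only students A and C remain
print(filled)   # student B now holds the average of 85 and 78, i.e. 81.5
```

Removing rows is simpler but throws data away. Filling keeps every student but introduces a value that was never actually observed.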


🔹 2. Removing Duplicate Records

“If the same student’s record appears twice,
the model thinks there are two students.”

Example:

Student   Marks
A         80
A         80

Say:

“This gives some data more importance than it deserves.”

Solution:

  • Keep only one copy
  • Delete duplicates
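
A minimal sketch of removing duplicates, assuming each record is stored as a `(student, marks)` tuple. The variable names are illustrative only.

```python
# The duplicated table from the example above.
records = [("A", 80), ("A", 80)]

# dict.fromkeys keeps the first occurrence of each record and preserves order.
unique_records = list(dict.fromkeys(records))

print(unique_records)  # [('A', 80)]
```

This only removes rows that are exactly identical. Near-duplicates, such as the same student entered with different spelling, need extra cleaning rules.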

Example:
Removing blurry images, correcting wrong labels, resizing pictures, removing duplicate entries.

Simple explanation:
Before the machine studies, we clean the notes so it doesn’t get confused.


Step 3: Choose a Model

Different tasks need different models.

Meaning:
Select the type of machine learning algorithm suitable for the problem.

Example:
For fruit classification, you might use a decision tree or a neural network.

Choosing a model is like choosing the right tool: scissors for paper, a knife for fruit.

Commonly used models include:

  • Linear Regression
  • Decision Tree
  • KNN
  • Naive Bayes
  • SVM

Step 4: Training and Validation

What it means:

Teaching the model using data and checking its learning.

What is Model Training?

Model Training is the process of teaching a machine learning model using past data so that it can learn patterns and make correct predictions in the future.

During training, the model is given input data and correct answers (labels). The model makes predictions, compares them with the actual answers, and improves itself by adjusting its internal values (called weights).

This step internally includes Train–Test Split.

What happens inside this step:

  • Dataset is divided into:
    • Training data (for learning)
    • Validation/Test data (for checking)
  • Model learns patterns from training data
  • Validation checks if learning is correct

Training data is like studying;
validation is like taking practice tests.
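
The split described above can be sketched as follows. This is a simplified illustration, not a fixed recipe: the function name `train_validation_split`, the 25% validation share, and the sample records are all assumptions made for the example.

```python
import random

def train_validation_split(rows, validation_fraction=0.25, seed=0):
    """Shuffle a copy of the rows, then split it into (training, validation)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed gives a repeatable split
    n_validation = max(1, round(len(shuffled) * validation_fraction))
    return shuffled[n_validation:], shuffled[:n_validation]

# Hypothetical (marks, result) records, used only to illustrate the split.
dataset = [(80, "Pass"), (35, "Fail"), (60, "Pass"), (40, "Fail"),
           (70, "Pass"), (38, "Fail"), (55, "Pass"), (30, "Fail")]

training, validation = train_validation_split(dataset)
print(len(training), len(validation))  # 6 2
```

Shuffling before splitting helps both parts contain a mix of examples. The key point is that no record appears in both parts.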

In very simple words:

Model training means making the machine learn from examples.

For example:

Student Pass / Fail Prediction

Step 1: Training Data (Past Records)

We give the model old student data:

Marks   Result
80      Pass
35      Fail
60      Pass
40      Fail

This is called training data.

The model looks at this data and starts learning a pattern:

Higher marks → Pass
Lower marks → Fail

This learning process is Model Training.
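
To make "learning a pattern" concrete, here is one very simple way a program could learn from the table above. It is an assumed toy method, not a standard algorithm from a library: it learns a single number, the pass mark, as the midpoint between the highest failing mark and the lowest passing mark. The function name `train_threshold` is made up for this sketch.

```python
def train_threshold(examples):
    """Learn a pass mark from (marks, result) pairs.

    Assumes every 'Fail' mark is lower than every 'Pass' mark.
    """
    highest_fail = max(marks for marks, result in examples if result == "Fail")
    lowest_pass = min(marks for marks, result in examples if result == "Pass")
    return (highest_fail + lowest_pass) / 2

# Training data from the table above.
training_data = [(80, "Pass"), (35, "Fail"), (60, "Pass"), (40, "Fail")]

threshold = train_threshold(training_data)
print(threshold)  # 50.0, the midpoint between 40 (Fail) and 60 (Pass)
```

Here the learned threshold plays the role of the model's internal value. Real models adjust many such values, but the idea is the same.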


Step 2: Validation (Checking the Learning)

Now we give different student data (not used in training):

Marks   Actual Result
70      Pass
38      Fail

This is validation data.

Here we check:

  • Did the model learn correctly?
  • Is it predicting properly?
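
The check can be sketched like this. The pass mark of 50 is an assumption for illustration (a value a simple model could plausibly learn from the training table), and `predict` is a hypothetical helper.

```python
def predict(marks, threshold):
    """Predict 'Pass' when marks reach the threshold, otherwise 'Fail'."""
    return "Pass" if marks >= threshold else "Fail"

threshold = 50  # assumed pass mark learned during training

# Validation data from the table above: (marks, actual result).
validation_data = [(70, "Pass"), (38, "Fail")]

correct = sum(predict(marks, threshold) == actual
              for marks, actual in validation_data)
accuracy = correct / len(validation_data)
print(accuracy)  # 1.0, both validation records are predicted correctly
```

If the accuracy here were low, we would go back and change the model or the data before moving on.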


Step 5: Evaluation

What it means:

Measuring how well the trained model performs.

Explain clearly:

“Now the model is tested on unseen test data.”

This is where performance metrics are calculated:

  • Accuracy
  • Precision
  • Recall
  • F1-score

All performance metrics are calculated on TEST DATA,
not on training data.
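
The four metrics can be computed from the actual and predicted labels as sketched below. The labels are hypothetical and "Pass" is treated as the positive class; both are assumptions for the example. Precision is the share of predicted passes that were correct, recall is the share of actual passes that were found, and F1 combines the two.

```python
def classification_metrics(actual, predicted, positive="Pass"):
    """Return accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)  # true positives
    fp = sum(a != positive and p == positive for a, p in pairs)  # false positives
    fn = sum(a == positive and p != positive for a, p in pairs)  # false negatives

    accuracy = sum(a == p for a, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical test-set labels and model predictions.
actual    = ["Pass", "Pass", "Fail", "Fail", "Pass"]
predicted = ["Pass", "Fail", "Fail", "Pass", "Pass"]

metrics = classification_metrics(actual, predicted)
print(metrics)  # accuracy is 0.6: three of the five predictions are correct
```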


Step 6: Deploy the Model

Once the model works well, use it in the real world.

Meaning:
Put the model into an application so others can use it.

Example:
A mobile app that identifies fruits in real time.

Simple explanation:
“This is when the machine starts using its knowledge to help people.”

Step 7: Model Inferencing

What is Model Inferencing?

Model Inferencing means using a trained and deployed model to make predictions on new, unseen data.

In simple words:

Inferencing = Asking the trained model to give answers.

After a model is trained and deployed, it is ready to work in the real world. Whenever new input is given, the model processes it and produces an output. This process is called inferencing.

Think of this sequence:

  • Training → Teaching the model
  • Deployment → Placing the model in an application
  • Inferencing → Using the model

So,

Deployment makes the model available.
Inferencing actually uses the model.

Easy Analogy

  • Training → Studying from textbooks
  • Deployment → Getting a job
  • Inferencing → Doing daily work

You don’t start working just by studying; you work after you get the job.

Example 1: Student Pass / Fail Prediction

Step 1: During Training

We give the model past student data:

Marks   Result
80      Pass
35      Fail
60      Pass
40      Fail

The model learns a pattern like:

“Higher marks → Pass, lower marks → Fail”


Step 2: Deployment

Now this trained model is added to a college software system.


Step 3: Inferencing (Real Use)

Now a new student’s marks are entered:

Marks = 72

The model applies what it learned and predicts:

Output: Pass

This prediction process is called model inferencing.
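
The difference between deployment and inferencing can be sketched in code. This is a toy illustration under stated assumptions: the "model" is just a pass mark of 50 (assumed to have been learned in training), "deployment" is saving it to a JSON file that an application can read, and the file name and the `infer` function are hypothetical.

```python
import json
import os
import tempfile

# Deployment: save the trained model where the application can load it.
model = {"threshold": 50.0}  # assumed result of training
model_path = os.path.join(tempfile.mkdtemp(), "pass_fail_model.json")
with open(model_path, "w") as f:
    json.dump(model, f)

# Inferencing: the application loads the model and predicts for new input.
def infer(marks, path):
    with open(path) as f:
        loaded = json.load(f)
    return "Pass" if marks >= loaded["threshold"] else "Fail"

print(infer(72, model_path))  # Pass
```

No learning happens during inferencing. The saved model is only read and applied to the new marks.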

Example 2: Email Spam Detection

  • Model is trained using old emails (spam / not spam)
  • Model is deployed in Gmail
  • A new email arrives
  • Model predicts: Spam or Not Spam

That prediction = Inferencing


Summary

  • Collect Data: Get the study material
  • Clean Data: Make the material neat
  • Choose Model: Pick the right learning method
  • Train Model: The machine learns
  • Evaluate Model: Check performance
  • Deploy Model: Use it in the real world
  • Model Inferencing: Predict on new data