ML Diaries: Day 2
All Things Data, Features, Evaluation and Problem Definition
— a daily log of my learning and projects built as I take up Machine Learning. Welcome to The Mind Palace by Dayo :)
Date: Aug 19, 2022
Day 2 of the ‘Complete Machine Learning & Data Science Bootcamp 2022’ involved understanding the necessary theoretical concepts pertaining to Machine Learning.
The Machine Learning Framework (ELI5)
The entire process of a machine learning project can be grouped into three stages: data collection, data modelling, and model deployment. Data collection involves getting data and making it model-ready (i.e. ready to use); data modelling involves developing and training the machine learning model; and model deployment involves applying the trained model to solve problems in practice.
There. (See a simple explanation of Machine Learning from Day 1 here.)
Now, data modelling is an iterative process, meaning it happens as a cycle of steps. These are:
- Problem definition
- Data
- Evaluation
- Features
- Modelling
- Experimentation
While these six steps make up the data modelling process, they are not linear. A machine learning engineer might work straight through, from defining the problem in machine learning terms to the last step, or might find that the model isn’t accurate enough and go back to the evaluation step, then to the data step, then to the features, and … you get the point.
Problem definition answers the question “what problem am I trying to solve?”, stated clearly and in machine learning terms. Is this a supervised or unsupervised learning problem, or something else? (Those are types of machine learning problems.) Knowing what type of problem you’re dealing with, as an ML engineer, tells you how to approach solving it.
Data answers the question “what kind of data am I working with?”. Is it structured or unstructured, and is it static or streaming? Correctly identifying the data type tells the machine learning engineer which tools to use so that the model can effectively extract patterns and make predictions.
Evaluation comes next, after working on the data. The evaluation step answers the question “what defines success for us?”. In other words, what am I, the machine learning engineer, supposed to aim for; what exactly is my goal? Basically, evaluation defines the indicators of a successful model (which differ based on the type of machine learning problem).
Features. To put it succinctly (and in my own words), features are the “properties” of a dataset. These are the characteristics the model looks at to arrive at a solution to a problem. Say we use the body weight and chest pain of patients to determine whether they have heart disease: body weight and chest pain are the features used to build the model. Features can be numerical (like body weight) or categorical (like the type of chest pain).
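To make the heart disease example a bit more concrete, here is a minimal Python sketch. The patient records, the column names and the two helper functions are all made up for illustration; they are not from the course. It simply sorts features into numerical and categorical, then shows one common way (one-hot encoding) to turn a categorical feature into numbers.

```python
# Hypothetical patient records. Every key except "heart_disease" is a feature.
patients = [
    {"body_weight_kg": 82.5, "chest_pain": "typical", "heart_disease": 1},
    {"body_weight_kg": 64.0, "chest_pain": "none", "heart_disease": 0},
    {"body_weight_kg": 91.2, "chest_pain": "atypical", "heart_disease": 1},
]

TARGET = "heart_disease"  # what we want to predict, so it is not a feature


def split_feature_types(records, target):
    """Group feature names into numerical and categorical by value type."""
    numerical, categorical = [], []
    for name, value in records[0].items():
        if name == target:
            continue
        if isinstance(value, (int, float)):
            numerical.append(name)
        else:
            categorical.append(name)
    return numerical, categorical


def one_hot(records, feature):
    """Turn one categorical feature into 0/1 columns, one per category."""
    categories = sorted({r[feature] for r in records})
    return [[1 if r[feature] == c else 0 for c in categories] for r in records]


numerical, categorical = split_feature_types(patients, TARGET)
print(numerical)    # ['body_weight_kg']
print(categorical)  # ['chest_pain']

# Columns are in alphabetical order: atypical, none, typical
print(one_hot(patients, "chest_pain"))  # [[0, 0, 1], [0, 1, 0], [1, 0, 0]]
```

In a real project a library would usually handle this, but the idea is the same: each feature is a column, and its type decides how it gets prepared for the model.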
Defining the problem tells you the type of machine learning problem, and the type of ML problem in turn determines which evaluation metric to use to measure the success of an ML model.
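As a rough illustration of how the problem type drives the metric, the sketch below uses two common choices: accuracy for a classification problem and mean absolute error for a regression problem. These are my own picks for the example and all the numbers are invented; the course covers evaluation metrics in more detail later.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels (classification)."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average size of the prediction error (regression)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Classification: does the patient have heart disease (1) or not (0)?
labels_true = [1, 0, 1, 1]
labels_pred = [1, 0, 0, 1]
print(accuracy(labels_true, labels_pred))  # 0.75

# Regression: predicting a number (hypothetical house prices, in thousands).
prices_true = [200, 150, 320]
prices_pred = [210, 140, 310]
print(mean_absolute_error(prices_true, prices_pred))  # 10.0
```

Accuracy would make little sense for the price predictions, and mean absolute error would make little sense for the yes/no labels, which is why the problem type has to be pinned down first.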
And this is where I stop for Day 2. (Day 3)
Of course, each step involves more than this, such as feature coverage, feature engineering, evaluation metrics, and more, but those are not for today’s write-up. You can check out this post by Daniel Bourke, who is one of the instructors, to go more in-depth.