The future of intelligence is being written by machines.
Hello there! 👋
Another day, another blog! Today, I'm diving into something that's been on my mind for a while: what exactly is Machine Learning? My first run-in with Machine Learning was during my final year as an undergraduate, when I failed miserably at picking a research topic. Every topic I pitched to my advisor was met with a 'meh, find better', until all the articles and whitepapers led me to Machine Learning!
Like the name suggests, Machine Learning is basically a computer learning on its own. Well, how does that happen? In a nutshell, Machine Learning is a branch of AI (Artificial Intelligence) where a computer is trained and tested on data, which then enables it to make decisions on its own without being explicitly programmed to do so. As cool as it sounds, the process behind developing a Machine Learning model is a long one.
You cannot expedite the process, but you can learn to have fun along the way! The steps I took for the development of my thesis were:
1. Finding the ideal dataset
Finding a dataset that ticks all your requirements is a substantial part of the work. Let's say you're building a model to predict whether someone is diabetic. You need data points like age, body mass and family history. If you stumble upon a dataset without age, a primary factor (since the older the person, the higher the risk), don't panic: you can work with it, but it's best to find a complete dataset. That's why, when push comes to shove, many engineers and data scientists just create their own datasets from scratch. A popular open source for datasets is Kaggle.
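To make "ticks all your requirements" concrete, here's a minimal sketch of a sanity check you might run on a freshly downloaded dataset. The tiny DataFrame and its column names (`age`, `body_mass`, `family_history`) are made up for illustration; in practice you'd load a real CSV, e.g. one from Kaggle.

```python
import pandas as pd

# A tiny, made-up sample standing in for a real diabetes dataset.
data = pd.DataFrame({
    "age": [45, 60, 30],
    "body_mass": [28.1, 33.4, 22.0],
    "family_history": [1, 1, 0],
    "diabetic": [0, 1, 0],
})

# The columns our hypothetical model needs; an empty `missing` set
# means the dataset ticks the boxes.
required = {"age", "body_mass", "family_history"}
missing = required - set(data.columns)
print("missing columns:", missing)
```

Running a quick check like this before committing to a dataset saves you from discovering a missing primary factor halfway through the project.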
2. Give the dataset a makeover
Before your model struts the runway, your data needs a glow-up! It's time we clean, polish and prep it like a pro! In this stage of the process, we preprocess the data. Missing values? Fill 'em. Outliers? Smooth 'em out. Duplicates? Show 'em the door. For your model to slay, the data cannot be a mess.
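The three clean-up moves above can be sketched in a few lines of pandas. The sample values here are invented; filling with the median and clipping implausible ages at 100 are just two common choices among many.

```python
import numpy as np
import pandas as pd

# Deliberately messy sample: a duplicate row, a missing age, an outlier.
raw = pd.DataFrame({
    "age": [45, 45, np.nan, 30, 200],
    "body_mass": [28.1, 28.1, 33.4, 22.0, 25.0],
})

clean = raw.drop_duplicates()  # duplicates? show 'em the door
# missing values? fill 'em (here, with the median age)
clean["age"] = clean["age"].fillna(clean["age"].median())
# outliers? smooth 'em out (here, by capping at a plausible maximum)
clean["age"] = clean["age"].clip(upper=100)
```

Each decision (fill vs. drop, clip vs. remove) depends on your data and your use case; the point is that every one of them happens before the model ever sees a row.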
3. Choose the right model — never force a square peg in a round hole!
Just like how you wouldn't wear a tuxedo to a beach, your model needs to match its scenario. Now that our data is clean, we need to find the right model to handle it. Are we looking to solve a classification problem? A regression problem? A clustering problem? Each category has its own toolbox of models, and the researcher has to experiment, tweak and find the one that fits just right!
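The "toolbox per problem type" idea can be sketched with scikit-learn. The three models chosen here are only illustrative defaults; each toolbox holds many more options worth experimenting with.

```python
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression, LogisticRegression

# A small, illustrative mapping from problem type to a starter model.
toolbox = {
    "classification": LogisticRegression(),  # yes/no: is the patient diabetic?
    "regression": LinearRegression(),        # predict a number, e.g. blood sugar
    "clustering": KMeans(n_clusters=3),      # group similar patients, no labels
}

# Predicting diabetic vs. not is a classification problem.
model = toolbox["classification"]
```

Picking the category first narrows the search enormously; tweaking within the category is where the experimentation happens.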
4. Time to whip your model into shape — train it, test it, and hope it doesn’t fail
Once you find the best model for your use case, it's training time! Feed the model data, let it learn, and then evaluate it by testing on brand-new, unseen data to see if it has actually learned something. Here we discover whether your model has been attentive or just memorizing like a student cramming the night before.
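Here's a minimal sketch of the train-then-test loop with scikit-learn, using the built-in iris dataset as a stand-in for real data:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)  # stand-in for your own dataset

# Hold out 20% as brand-new, unseen data so we catch memorization.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
accuracy = model.score(X_test, y_test)  # graded only on unseen data
```

If accuracy on the training set is high but `accuracy` on the test set is poor, your model was cramming, not learning.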
5. Get to know your model
Onto the final (and optional) stage! Based on the testing results, you figure out whether your model needs tweaking (parameter tuning or hyperparameter optimization), some improvements, or whether the results are good enough to make peace with the model and call it a day.
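Hyperparameter optimization is often done with a grid search; here's a minimal sketch using scikit-learn's `GridSearchCV` on a decision tree (the `max_depth` candidates are arbitrary illustrative values):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)  # stand-in for your own dataset

# Try a few candidate depths and keep the best via 5-fold cross-validation.
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": [2, 3, 5]},
    cv=5,
)
search.fit(X, y)
best = search.best_params_  # the depth that scored highest on average
```

If the tuned model still underwhelms, that's your cue to loop back to an earlier step; if it's good enough, make peace with it and call it a day.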
Going from obtaining the right dataset to getting your model to behave may not seem like much, but it's all about patience, a bit of trial and error and bucketloads of research! Remember, each project has its own demands and unique challenges; this article was just a glimpse into the procedure. Do keep in mind that every step, every tweak and every "Ha! Got it" moment brings you closer to mastering this tool!
Hope you enjoy it as much as I enjoyed writing it!✌️