Your Data Teacher
$14.99

Data pre-processing for machine learning in Python - eBook

In this book, the author shows how to use the Python programming language to perform pre-processing tasks in machine learning projects.

What is pre-processing?

Pre-processing is the set of transformations applied to a dataset before it can be used to train a machine learning model. It is a critical phase of a data science pipeline: poor pre-processing can severely degrade the model's performance, while good pre-processing allows the model to learn properly.

The pre-processing transformations shown in this book are:

  • Data cleaning
  • Encoding of the categorical variables (one-hot encoding and ordinal encoding)
  • Principal Component Analysis
  • Scaling (normalization, standardization, robust scaling)
  • Binarizing
  • Binning
  • Power transformations
  • Filter-based feature selection
  • Oversampling using SMOTE

All the transformations are described both in theory and in practice, using the Python programming language and its powerful scikit-learn library.
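
As a rough illustration (not taken from the book or its notebooks), the sketch below applies several of the listed transformations with scikit-learn to a small synthetic dataset. The column names, the dataset and all parameter values are hypothetical. Data cleaning and SMOTE are not shown; note that in the Python ecosystem SMOTE is usually provided by the separate imbalanced-learn package (`imblearn.over_sampling.SMOTE`) rather than by scikit-learn itself.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.preprocessing import (
    Binarizer,
    KBinsDiscretizer,
    MinMaxScaler,
    OneHotEncoder,
    OrdinalEncoder,
    PowerTransformer,
    RobustScaler,
    StandardScaler,
)

# Hypothetical synthetic dataset (NOT the dataset shipped with the book)
rng = np.random.default_rng(42)
n = 200
df = pd.DataFrame({
    "age": rng.normal(40, 10, n),                      # roughly symmetric numeric feature
    "income": rng.lognormal(10, 1, n),                 # right-skewed numeric feature
    "color": rng.choice(["red", "green", "blue"], n),  # nominal categories
    "size": rng.choice(["S", "M", "L"], n),            # ordered categories
})
# Hypothetical binary target, loosely related to "age"
y = (df["age"] + rng.normal(0, 5, n) > 40).astype(int)

# Encoding, standardization and power transformation, column by column
pre = ColumnTransformer(
    [
        ("onehot", OneHotEncoder(), ["color"]),                               # one-hot encoding
        ("ordinal", OrdinalEncoder(categories=[["S", "M", "L"]]), ["size"]),  # S, M, L -> 0, 1, 2
        ("standard", StandardScaler(), ["age"]),                              # standardization
        ("power", PowerTransformer(method="yeo-johnson"), ["income"]),        # reduces skewness
    ],
    sparse_threshold=0.0,  # always return a dense NumPy array
)
X = pre.fit_transform(df)  # 3 one-hot columns + ordinal + age + income = 6 columns

# Other scalers applied to a single column
age_minmax = MinMaxScaler().fit_transform(df[["age"]])  # normalization to [0, 1]
age_robust = RobustScaler().fit_transform(df[["age"]])  # (x - median) / IQR

# Binarizing: 1.0 if age > 40, else 0.0
age_bin = Binarizer(threshold=40).fit_transform(df[["age"]])

# Binning: 4 quantile-based bins, encoded as 0.0 .. 3.0
income_bins = KBinsDiscretizer(
    n_bins=4, encode="ordinal", strategy="quantile"
).fit_transform(df[["income"]])

# PCA: project the 6 pre-processed columns onto 2 principal components
X_pca = PCA(n_components=2).fit_transform(X)

# Filter-based feature selection: keep the 3 columns with the highest ANOVA F-score
X_sel = SelectKBest(score_func=f_classif, k=3).fit_transform(X, y)

print(X.shape, X_pca.shape, X_sel.shape)  # (200, 6) (200, 2) (200, 3)
```

Grouping the column-specific steps in a single ColumnTransformer means the same fitted object can later be applied to new data with `pre.transform(...)`; in a real project the transformers would typically be fitted on the training set only.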

What will I get?

By buying this book, you'll get it in ePub and PDF formats, plus the sample dataset used in the book and 18 Python notebooks.

Is there a paperback format available?

Sure! You can order it on Amazon.

About the author

Gianluca Malato, born in 1986, is an Italian data scientist, teacher and author. In 2010, he received his Master’s Degree cum laude in Theoretical Physics of disordered systems at the “La Sapienza” University of Rome (thesis advisors: Giorgio Parisi and Tommaso Rizzo). He has worked for years as a data architect, project manager, data analyst and data scientist for a large Italian company.

He is the founder of yourdatateacher.com, an online school where he teaches Data Science, Machine Learning, R, Python and SQL through online courses and individual training programs.

He has published several articles about Data Science on his blog yourdatateacher.com and in the Towards Data Science online publication (towardsdatascience.com). His articles earned him the “Top Writer” mention on Medium.com in the “Artificial Intelligence” category.

He has also written several fiction books in Italian, in the horror, thriller and fantasy genres.

E-mail: gianluca@yourdatateacher.com

Privacy

By buying this product you accept the Privacy Policy.

ePub and PDF versions of the book, with Python notebooks and the dataset

Pages: 84
Python notebooks: 18
Datasets: 1