Close

Getting Started Guide for Python, Data Science and Machine Learning for wanna be practitioners & hobbyists

A friend recently asked about a good book on data science, with python. I have been trying to go through the current landscape of books for a course I am teaching this summer, so here is my recommendation. There are several other good ones out there, but this one quite fits the newbie bill. So without further ado

 

Are you a Beginner who would like to learn python, in context with a specific area, and tired of using syntax focused books sans practical examples?

OR

Are you exploring data science landscape and want to see practical examples of how to actually use machine learning algorithms in data science context?

If you answer in the affirmative to either of the questions above, "Python for data science for dummies" is the perfect book for you. Luca Massaron is a practicing data scientist, and a prolific author of several books including Regression Analysis with Python , Machine Learning For Dummies, Python Data Science Essentials, Regression Analysis with Python, and Large Scale Machine Learning with Python. He is also a leading Kaggle enthusiast, and you can see his 'practitioner fingerprints' all over this book; especially in later chapters about data processing, ETL, cleanup, data sources, and challenges.

Python for Data Science For Dummies

 

This book starts with the fundamentals of Python data analysis programming, and explains the setup of Python development environment using anaconda with IPython (Jupyter notebooks). Authors start by considering the emergence of data science, outline the core competencies of a data scientist, and describe the Data Science Pipeline before taking a plunge into explaining Python’s Role in Data Science and introducing Python’s Capabilities and Wonders.

Once you get your bearings about the IDE setup, chapter 4 focuses on Basic Python before you get your Hands Dirty with Data. What I like about this manuscript is that the writing keeps it real. Instead of giving made up examples, authors talk about items like knowing when to use NumPy or pandas and real world scenarios like removing duplicates, creating a data map and data plan, dealing with Dates in Your Data, Dealing with Missing data, parsing etc; problems which practicing data scientists encounter on a daily basis.

Contemporary topics like Text mining are also addressed in the book with enough details of topics such as working with Raw Text, Stemming and removing stop words, Bag of Words Model and Beyond, Working with n‐grams, Implementing TF‐IDF transformations, and adjacency matrix handling. This is also where you start getting a basic understanding of how machine learning algorithms work in practice.
Practical aspects of evaluating a data science problem are addressed later, with techniques defined for researching solutions, formulating a hypothesis, data preperation, feature creation, binning and discretization, leading up to vectors and matrix manipulation, and visualization with MatPlotLib. Even though the book does not discuss theano, DL4J, Torch, Caffe or TensorFlow, it still provides an introduction to key python ML library Scikit‐learn. This 400 page book also covers key topics like SVD, PCA, NMF, Recommendation systems, Clustering, Detecting Outliers, logistic Regression, Naive Bayes, Fitting a Model, bias and variance, Support Vector Machines, and Random Forest classifiers to name a few. The resources provided in the end are definitely worth subscribing to for every self-respecting data science enthusiast.

I highly recommend this book for those beginners interested in data science and also want to learn and leverage Python skills for this rapidly emerging field.

Share