Research & Development

A Getting Started Guide to Python, Data Science, and Machine Learning for aspiring practitioners and hobbyists


A friend recently asked about a good book on data science with Python. I have been surveying the current landscape of books for a course I am teaching this summer, so here is my recommendation. There are several other good ones out there, but this one fits the newbie bill quite well. So, without further ado:


Are you a beginner who would like to learn Python in the context of a specific area, and are you tired of syntax-focused books sans practical examples?


Are you exploring the data science landscape and want to see practical examples of how to actually use machine learning algorithms in a data science context?

If you answered in the affirmative to either of the questions above, "Python for Data Science For Dummies" is the perfect book for you. Luca Massaron is a practicing data scientist and a prolific author whose books include Regression Analysis with Python, Machine Learning For Dummies, Python Data Science Essentials, and Large Scale Machine Learning with Python. He is also a leading Kaggle enthusiast, and you can see his 'practitioner fingerprints' all over this book, especially in the later chapters about data processing, ETL, cleanup, data sources, and challenges.

Python for Data Science For Dummies


This book starts with the fundamentals of Python data analysis programming and explains how to set up a Python development environment using Anaconda with IPython (Jupyter notebooks). The authors start by considering the emergence of data science, outline the core competencies of a data scientist, and describe the data science pipeline before taking a plunge into Python's role in data science and an introduction to Python's capabilities and wonders.

Once you get your bearings with the IDE setup, chapter 4 focuses on basic Python before you get your hands dirty with data. What I like about this manuscript is that the writing keeps it real. Instead of giving made-up examples, the authors talk about things like knowing when to use NumPy or pandas, and real-world scenarios like removing duplicates, creating a data map and data plan, dealing with dates in your data, dealing with missing data, parsing, and so on: problems which practicing data scientists encounter on a daily basis.
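To give a flavor of the kind of cleanup the book covers, here is a minimal pandas sketch; the dataset and column names are entirely made up for illustration, not taken from the book:

```python
import numpy as np
import pandas as pd

# A hypothetical messy dataset of the kind the cleanup chapters address
df = pd.DataFrame({
    "customer": ["alice", "bob", "bob", "carol"],
    "signup": ["2016-01-05", "2016-02-17", "2016-02-17", "not available"],
    "spend": [120.0, np.nan, np.nan, 80.0],
})

df = df.drop_duplicates()                                      # remove the duplicate "bob" row
df["signup"] = pd.to_datetime(df["signup"], errors="coerce")   # parse dates; bad values become NaT
df["spend"] = df["spend"].fillna(df["spend"].mean())           # impute missing spend with the mean
```

Each of these one-liners corresponds to a real-world chore the authors walk through in far more depth.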

Contemporary topics like text mining are also addressed in enough detail, covering working with raw text, stemming and stop-word removal, the bag-of-words model and beyond, working with n-grams, implementing TF-IDF transformations, and adjacency-matrix handling. This is also where you start getting a basic understanding of how machine learning algorithms work in practice.
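That whole text-mining pipeline, stop-word removal, bag-of-words with n-grams, and TF-IDF weighting, fits in a few lines of scikit-learn; this toy sketch (my own documents, not the book's) shows the shape of it:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "dogs and cats make good pets",
]

# Bag-of-words over unigrams and bigrams, English stop words removed,
# then TF-IDF weighting applied to the resulting counts
vec = TfidfVectorizer(stop_words="english", ngram_range=(1, 2))
X = vec.fit_transform(docs)  # sparse matrix: 3 documents by n surviving features
```

The learned vocabulary (`vec.vocabulary_`) is where you can inspect exactly which unigrams and bigrams survived the stop-word filter.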
Practical aspects of evaluating a data science problem are addressed later, with techniques defined for researching solutions, formulating a hypothesis, data preparation, feature creation, binning and discretization, leading up to vector and matrix manipulation and visualization with Matplotlib. Even though the book does not discuss Theano, DL4J, Torch, Caffe, or TensorFlow, it still provides an introduction to the key Python ML library scikit-learn. This 400-page book also covers key topics like SVD, PCA, NMF, recommendation systems, clustering, detecting outliers, logistic regression, Naive Bayes, fitting a model, bias and variance, support vector machines, and random forest classifiers, to name a few. The resources provided at the end are definitely worth subscribing to for every self-respecting data science enthusiast.
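As an illustration of how a couple of those pieces (PCA for dimensionality reduction, logistic regression for classification) fit together in scikit-learn, here is a small sketch of my own, not an example from the book:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Chain dimensionality reduction with a classifier in one estimator
model = Pipeline([
    ("pca", PCA(n_components=2)),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(X_train, y_train)
score = model.score(X_test, y_test)  # held-out accuracy
```

The `Pipeline` object keeps the fitting, transformation, and evaluation steps in one place, which is exactly the kind of practitioner habit the book encourages.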

I highly recommend this book for beginners who are interested in data science and also want to learn and leverage Python skills in this rapidly emerging field.


Notes from Data Science with Azure Talk @ TAP


I recently spoke at the Tampa Analytics Group, a Microsoft-recognized data science group run by Joe Blankenship, on the topic of Data Science with Azure. The talk focused on Azure offerings, with a demo of how to write a map-reduce job in Azure using C#. The slides follow.


TAP Standard Meeting

Monday, Apr 11, 2016, 5:30 PM

4301 W Boy Scout Blvd #590 Tampa, FL

15 Data Scientists Went

Tampa Analytics Professionals! The intent of this meeting is to bring together people in the Tampa Bay and surrounding areas involved or interested in data science or related professions. This week we have Dr. Adnan Masood talking to us about: Hadoop in the Azure Cloud - A Hands-On Guide. TEKSystems has graciously provided the space as well as food ...



BS2000 - List of Earned Doctorates & Dissertations

My undergrad class from the Dept. of Computer Science, UBIT consisted of an amazing group of people. We have kept lifelong friendships and connections across continents. One of my colleagues just successfully defended his dissertation, so I decided to compile a list of all the PhDs from my class. Here it is, the best class of DCS! 🙂


Mixed transfer function neural networks for generalization and knowledge extraction

M. Imad Khan - Deakin University, Australia.


A Framework for Provenance System in Open Distributed Environment

Syed Imran Jami - National University of Computer & Emerging Sciences, Pakistan.


Measuring Interestingness in Outliers with Explanation Facility using Belief Networks

Adnan Masood - Nova Southeastern University, USA.


Team Learning from Demonstration - A Framework to Build Collaboration in a Team of Agents via Imitation

Syeda Saleha Raza - Institute of Business Administration, Pakistan

Improved Data Mining in distributed environments

Muhammad Saeed - University of Karachi, Pakistan


Enhanced Method Call Tree for Comprehensive Detection of Symptoms of Cross Cutting Concerns

Saleem Mir - Nova Southeastern University, USA
(Dissertation - Not yet publicly available)

"The Five Tribes of Machine Learning (And What You Can Learn from Each)," Pedro Domingos


Introducing Pico-Services Architecture

Abstract: The term "Microservice Architecture" has sprung up over the last few years to describe a 'particular' way of designing software applications. Like every new industry fad, including but not limited to Service Oriented Architecture (SOA), BPEL, and WADL, Microservices architecture has no precise definition, and it follows the "it depends" school of ivory-tower software design. Following this prevalent and ubiquitous architectural style, we introduce a novel architectural design pattern called (pico) p-Services Architecture. Our architectural pattern addresses the prevalent characteristics around organizations such as maintaining and increasing technical debt, forming silos to decrease business capability, not leveraging automated deployment, ensuring a lack of intelligence in the endpoints, and centralized (Reviewer 2 thinks "bottleneck" sounds too pejorative) control of languages and data. Following are the tenets of the pico-services architecture, intended to help redirect focus away from minor problems in enterprise distributed computing such as compliance, security, scalability, and fragmentation.

Download: Pico Services Architecture


Machine Learning for Big Data and Text Processing - Short Programs Testimonials

Adnan Masood MIT


The Five Tribes of Machine Learning, and other algorithmic tales

Pedro Domingos' The Master Algorithm - How the Quest for the Ultimate Learning Machine Will Remake Our World is an interesting and thought-provoking book about the state of machine learning, data science, and artificial intelligence.



Categorizing, classifying, and clearly representing the ideas around any rapidly evolving field is a hard job. Machine learning, with its multi-faceted approaches and ubiquitous implementations, is an especially challenging topic. To write about it in a comprehensive yet easily understandable (aka non-jargon-ridden, non-hand-waving) way is definitely a great accomplishment.

One thing I really enjoyed about this book is how the ML taxonomy and classification work; even for people who have been in the industry for a while, it is hard to create such meaningful distinctions and clusters around ideas.

“Each of the five tribes of machine learning has its own master algorithm, a general-purpose learner that you can in principle use to discover knowledge from data in any domain. The symbolists’ master algorithm is inverse deduction, the connectionists’ is backpropagation, the evolutionaries’ is genetic programming, the Bayesians’ is Bayesian inference, and the analogizers’ is the support vector machine. In practice, however, each of these algorithms is good for some things but not others. What we really want is a single algorithm combining the key features of all of them: the ultimate master algorithm. For some this is an unattainable dream, but for many of us in machine learning, it’s what puts a twinkle in our eye and keeps us working late into the night.”


Starting with the question of whether you are a rationalist or an empiricist, and extending this analogy to the five tribes of machine learning, the author also challenges the notion of "intelligence" in a very direct manner. He states that the skeptical knowledge engineer's dogma that AI cannot "beat" humans is based on an 'archaic' Minsky/Chomsky school of thought, and that the variants of the "poverty of the stimulus" argument are irrelevant for all practical intents and purposes. The outstanding success of deep learning is proof to the contrary. The author answers most of the 'usual' argumentum ad logicam in chapter 2, in which he paraphrases that the proof is in the pudding. From autonomous vehicles to sentiment analysis, machine learning / statistical learners work, while hand-engineered expert systems built with human experts don’t scale:


...learning-based methods have swept the field, to the point where it’s hard to find a paper devoid of learning. Statistical parsers analyze language with accuracy close to that of humans, where hand-coded ones lagged far behind. Machine translation, spelling correction, part-of-speech tagging, word sense disambiguation, question answering, dialogue, summarization: the best systems in these areas all use learning. Watson, the Jeopardy! computer champion, would not have been possible without it.

The book further elaborates by stating what the author intuitively knows (pun intended) to be a frequently heard objection:

...“Data can’t replace human intuition.” In fact, it’s the other way around: human intuition can’t replace data. Intuition is what you use when you don’t know the facts, and since you often don’t, intuition is precious. But when the evidence is before you, why would you deny it? Statistical analysis beats talent scouts in baseball (as Michael Lewis memorably documented in Moneyball), it beats connoisseurs at tasting, and every day we see new examples of what it can do. Because of the influx of data, the boundary between evidence and intuition is shifting rapidly, and as with any revolution, entrenched ways have to be overcome. If I’m the expert on X at company Y, I don’t like to be overridden by some guy with data. There’s a saying in industry: “Listen to your customers, not to the HiPPO,” HiPPO being short for “highest paid person’s opinion.” If you want to be tomorrow’s authority, ride the data, don’t fight it.

and of course the eureka! argument doesn't escape his criticism either:

And some may say, machine learning can find statistical regularities in data, but it will never discover anything deep, like Newton’s laws. It arguably hasn’t yet, but I bet it will. Stories of falling apples notwithstanding, deep scientific truths are not low-hanging fruit. Science goes through three phases, which we can call the Brahe, Kepler, and Newton phases. In the Brahe phase, we gather lots of data, like Tycho Brahe patiently recording the positions of the planets night after night, year after year. In the Kepler phase, we fit empirical laws to the data, like Kepler did to the planets’ motions. In the Newton phase, we discover the deeper truths. Most science consists of Brahe- and Kepler-like work; Newton moments are rare. Today, big data does the work of billions of Brahes, and machine learning the work of millions of Keplers. If—let’s hope so—there are more Newton moments to be had, they are as likely to come from tomorrow’s learning algorithms as from tomorrow’s even more overwhelmed scientists, or at least from a combination of the two.

Whether you agree with the author's point of view or not, this is one of the best "big picture" readings on the state of machine learning and AI, and it will help you understand how things may (or may not) shape up in the next computing revolution.


On Explainability of Deep Neural Networks

During a discussion yesterday with software architect extraordinaire David Lazar about how everything old is new again, the topic of deep neural networks and their amazing success came up. Unless one has been living under a rock for the past five years, the advancements in artificial neural networks (ANNs) have been quite significant and noteworthy. Since the thaw of the AI winter, the once frowned-upon wave has come a long way to become a successful and relied-upon technique in multiple problem spaces. From an interesting apocryphal tale which sums up the state of ANNs back in the day to the current state of ConvNets, with Google Translate squeezing deep learning onto a phone, significant progress has been made. We have all seen the dreamy images of Inceptionism: Going Deeper into Neural Networks, with great results in image classification and speech recognition while fine-tuning network parameters. Beyond the classical feats of Reading Digits in Natural Images with Unsupervised Feature Learning, Deep Neural Networks (DNNs) have shown outstanding performance on image classification tasks. We now have excellent results on MNIST, ImageNet classification with deep convolutional neural networks, and effective use of Deep Neural Networks for Object Detection.

Otavio Good of Google puts it quite well,

Five years ago, if you gave a computer an image of a cat or a dog, it had trouble telling which was which. Thanks to convolutional neural networks, not only can computers tell the difference between cats and dogs, they can even recognize different breeds of dogs.

Geoffrey Hinton et al. noted that:

Best system in 2010 competition got 47% error for its first choice and 25% error for its top 5 choices. A very deep neural net (Krizhevsky et. al. 2012) gets less than 40% error for its first choice and less than 20% for its top 5 choices


Courtesy: XKCD


So with all this fanfare, what could possibly go wrong?

In deep learning systems, where both the classifiers and the features are learned automatically, neural networks possess a grey side: the explainability problem.

Explainability and determinism in ML systems are a larger discussion, but limiting the scope to neural nets: when you see the Unreasonable Effectiveness of Recurrent Neural Networks, it is important to pause and ponder why it works. Is it good enough that I can peek into this black box by extracting strategic heuristics from the network, or infer the concept of a cat from a trained neural network by Building High-level Features Using Large Scale Unsupervised Learning? Does it make it a 'grey box' if we can extract word embeddings from the network in a high-dimensional space, and thereby exploit similarities among languages for machine translation? The very non-deterministic nature of the approach is problematic, as in the context of how you choose the initial parameters, such as the starting point for gradient descent when training with back-propagation. How about retrainability? The imperviousness makes troubleshooting harder, to say the least.
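As a toy illustration of this kind of "peeking", one can recompute a trained network's hidden-layer representations directly from its learned weights. This scikit-learn sketch is my own construction, not tied to any of the papers linked above, and assumes the default ReLU activation:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neural_network import MLPClassifier

X, y = load_iris(return_X_y=True)
net = MLPClassifier(hidden_layer_sizes=(5,), max_iter=2000, random_state=1)
net.fit(X, y)

# "Peek" into the trained black box: reconstruct the hidden-layer
# activations from the learned weight matrix and bias vector.
# MLPClassifier's default activation is ReLU: max(0, x @ W + b).
hidden = np.maximum(0, X @ net.coefs_[0] + net.intercepts_[0])
```

Staring at `hidden` tells you what representation each sample received, but, as the post argues, it does little to tell you *why* those five numbers are the right ones.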

If you haven’t noticed, I am trying hard not to make this a pop-science alarmist post, but here is the leap I am going to take: the relative lack of explainability and transparency inherent in neural networks (and the research community’s relative complacency toward the approach ‘because it just works’), this idea of black-boxed intelligence, is probably what may lead to the larger issues identified by Gates, Hawking, and Musk. I would be the first to state that this argument might be a stretch, an over-generalization of the shortcomings of a specific technique into a doomsday scenario, and we might be able to ‘decrypt’ the sigmoid so that all these fears go away. However, my fundamental argument stays: if the technique isn’t quite explainable, then with the ML proliferation we have today, the unintended consequences might be too real to ignore.

As strong AI is assembled from ensembles of weak AI, the concern about explainability grows. There is no denying that it can be challenging to understand what a neural network is really doing under those layers of approximating functions. In the happy-path scenario, when a network is trained well, we have seen repeatedly that it does achieve high-quality results. However, it is still perplexing to comprehend the underpinnings of how it does so. Even more alarmingly, if the network fails, it is hard to understand what went wrong. Can we really shrug off the skeptics fearful about the dangers that seemingly sentient Artificial Intelligence (AI) poses? As Bill Gates said articulately (practically refuting Eric Horvitz's position):

I am in the camp that is concerned about super intelligence. First the machines will do a lot of jobs for us and not be super intelligent. That should be positive if we manage it well. A few decades after that though the intelligence is strong enough to be a concern. I agree with Elon Musk and some others on this and don’t understand why some people are not concerned.

The non-deterministic nature of a technique like neural networks poses a larger concern in terms of understanding the confidence of the classifier. The convergence of a neural network isn’t really clear, whereas for an SVM it is fairly trivial to validate. Depicting the approximation of an ‘undocumented’ function as a black box is most probably a fundamentally flawed idea in itself. If we equate this with the biological thought process, the signals and the corresponding trained behavior, then as observers we have an expected output based on the training set. However, in a non-identifiable model, the approximation provided by the neural network is fairly impenetrable for all intents and purposes.

I don’t think anyone with a deep understanding of AI and machine learning is really worried about Skynet at this point. As Andrew Ng said:

Fearing a rise of killer robots is like worrying about overpopulation on Mars.

The concern is more about adhering to the “but it works!”, aka if-it-fits-I-sits, approach (the mandatory cat meme goes here).

If it fits I sits


The sociological challenges associated with self-driving trucks, taxis, and deliveries and their effects on employment are real, but these are regulatory issues. The key issue lies at the heart of the technology and our understanding of its internals. Stanford's Katie Malone said it quite well in the Linear Digressions episode on neural nets.

Even though it sounds like common sense that we would like to have controls in place so that automation is not allowed to engage targets without human intervention, and luminaries like Hawking, Musk, and Wozniak have urged AI experts to ban autonomous weapons, our default reliance on black-box approaches may make this nothing more than wishful thinking. As Stephen Hawking said:

“The primitive forms of artificial intelligence we already have, have proved very useful. But I think the development of full artificial intelligence could spell the end of the human race. Once humans develop artificial intelligence it would take off on its own and redesign itself at an ever-increasing rate. Humans, who are limited by slow biological evolution, couldn’t compete and would be superseded.”

It might be fair to say that since we don’t completely understand a new technique, it makes us afraid (of change), and it will be adopted as the research moves forward. As great as the results are, non-black-box models, or interpretable models such as regression (closed-form approximation) and decision trees / belief nets (graphical representations of deterministic and probabilistic beliefs), offer the comfort of determinism and understanding. We know today that small changes to a neural network's input can lead to significant changes in its output, one of the “intriguing” properties of neural networks. In their paper, the authors demonstrated that small changes can cause larger issues:

We find that deep neural networks learn input-output mappings that are fairly discontinuous to a significant extent. We can cause the network to misclassify an image by applying a certain hardly perceptible perturbation, which is found by maximizing the network’s prediction error….

We demonstrated that deep neural networks have counter-intuitive properties both with respect to the semantic meaning of individual units and with respect to their discontinuities.



The existence of the adversarial negatives appears to be in contradiction with the network’s ability to achieve high generalization performance. Indeed, if the network can generalize well, how can it be confused by these adversarial negatives, which are indistinguishable from the regular examples? Possible explanation is that the set of adversarial negatives is of extremely low probability….. However, we don’t have a deep understanding of how often adversarial negatives appears…
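The flavor of such an adversarial perturbation can be reproduced even on a plain linear classifier. This sketch is my own construction, far cruder than the paper's optimization: it nudges a digit image along the weight direction of a wrong class until the classifier changes its mind:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression

X, y = load_digits(return_X_y=True)
clf = LogisticRegression(max_iter=5000).fit(X, y)

x = X[0].copy()                            # one correctly classified digit
orig = clf.predict(x.reshape(1, -1))[0]

# Direction that raises the logit of a wrong class relative to the
# current prediction; for a linear model this is just a weight difference.
target = (orig + 1) % 10
direction = clf.coef_[target] - clf.coef_[orig]
direction = direction / np.linalg.norm(direction)

# Grow the perturbation until the prediction flips
eps, adv = 0.0, x.copy()
while clf.predict(adv.reshape(1, -1))[0] == orig:
    eps += 0.1
    adv = x + eps * direction
```

For a deep network the offending perturbation can be far smaller and genuinely imperceptible, which is exactly what makes the quoted result so unsettling.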

Let’s be clear that when we discuss the black-box nature of ANNs, we are not talking about a single-unit perceptron only being capable of learning linearly separable patterns (Minsky et al., 1969). It is well established that the XOR function's unlearnability in single-layer networks does not extend to the multi-layer perceptron (MLP). Convolutional Neural Networks (CNNs) are therefore a working proof to the contrary: biologically inspired variants of MLPs with the explicit assumption that the input comprises images, so that certain properties can be embedded into the architecture. The point here is against the rapid adoption of a technique which is black-box in nature, with greater computational burden, inherent non-determinism, and proneness to over-fitting, over its “better” counterparts. To paraphrase Jitendra Malik, without being an NN skeptic, there is no reason that multi-layer random forests or SVMs cannot achieve the same results. During the AI winter we made ANNs a pariah; aren't we repeating the same mistake with other techniques now?
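To make the XOR point concrete, here is a hand-wired 2-2-1 network of my own construction that computes XOR with step activations, the function Minsky and Papert showed a single-layer perceptron cannot represent:

```python
import numpy as np

def step(z):
    # Heaviside step activation
    return (z > 0).astype(int)

# Hidden layer: an OR-like detector and an AND-like detector
W1 = np.array([[1.0, 1.0],
               [1.0, 1.0]])
b1 = np.array([-0.5, -1.5])

# Output layer: fires for OR but is vetoed by AND, which yields XOR
W2 = np.array([1.0, -2.0])
b2 = -0.5

def xor(x):
    h = step(x @ W1 + b1)
    return step(h @ W2 + b2)
```

One extra layer is all it takes; the black-box complaint is about depth and learned opacity, not about this classical representational gap.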

Recently Elon Musk tweeted:

Worth reading Superintelligence by Bostrom. We need to be super careful with AI. Potentially more dangerous than nukes.

And even though things might not be so bad right now, let’s conclude with the following quote from Michael Jordan in IEEE Spectrum.

Sometimes those go beyond where the achievements actually are. Specifically on the topic of deep learning, it’s largely a rebranding of neural networks, which go back to the 1980s. … In the current wave, the main success story is the convolutional neural network, but that idea was already present in the previous wave. And one of the problems … is that people continue to infer that something involving neuroscience is behind it, and that deep learning is taking advantage of an understanding of how the brain processes information, learns, makes decisions, or copes with large amounts of data. And that is just patently false.

Now this also leaves the other fundamental question: is the pseudo-mimicry of biological neural nets actually a good approach to emulating intelligence? Or maybe Noam Chomsky was right in Where Artificial Intelligence Went Wrong?

That we will talk about some other day.




Learning F# Functional Data Structures and Algorithms is Out!

الحمد للہ رب العالمین

Wondering what to do on 4th of July long weekend? Learn Functional Programming in F# with my book!

I am glad to announce that my book, Learning F# Functional Data Structures and Algorithms, has been published and is now available via Amazon and other retailers. F# is a multi-paradigm programming language that encompasses object-oriented, imperative, and functional programming properties. The F# functional programming language enables developers to write simple code to solve complex problems.

Learning F# Functional Data Structures and Algorithms - Adnan Masood PhD

Starting with the fundamental concepts of F# and functional programming, this book will walk you through basic problems, helping you to write functional and maintainable code. Using easy-to-understand examples, you will learn how to design data structures and algorithms in F# and apply these concepts in real-life projects. The book will cover built-in data structures and take you through enumerations and sequences. You will gain knowledge about stacks, graph-related algorithms, and implementations of binary trees. Next, you will understand the custom functional implementation of a queue, review sets and maps, and explore the implementation of a vector. Finally, you will find resources and references that will give you a comprehensive overview of the F# ecosystem, helping you to go beyond the fundamentals.

If you have just started your adventure with F#, then this book will help you take the right steps to become a successful F# coder. An intermediate knowledge of imperative programming concepts, and a basic understanding of the algorithms and data structures in .NET environments using the C# language and BCL (Base Class Library), would be helpful.

With detailed technical and editorial reviews, writing a technology book is a long process, but it is an equally rewarding and unique learning experience. I am thankful to my technical reviewers and the Packt editorial team for providing excellent support to make this a better book. Nothing is perfect and to err is human; if you find any issues in the code or text, please let me know.

Learning F# Functional Data Structures and Algorithms - Get it via Amazon

Learning F# Functional Data Structures and Algorithms - Get it via Google Books

Learning F# Functional Data Structures and Algorithms - Get it via Packt

Happy Functional Programming!

The source code for the book can be downloaded from here.



Visualizing Decision Boundaries for Deep Learning

A decision boundary is the region of a problem space in which the output label of a classifier is ambiguous. In this concise yet informative article, Dr. Takashi J. Ozaki outlines decision boundaries for deep learning and other machine learning classifiers and emphasizes parameter tuning for deep learning.

The source code for this article is on GitHub; he uses H2O, one of the leading deep learning frameworks, which is available in both Python and R.
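The core trick behind such plots is simply evaluating the classifier on a dense grid of points. Here is a small scikit-learn sketch of my own (using make_moons and an MLP rather than the article's H2O setup, as an illustrative assumption):

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.neural_network import MLPClassifier

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000,
                    random_state=0).fit(X, y)

# Evaluate the classifier over a dense grid covering the data; the
# decision boundary is wherever the predicted label changes.
xx, yy = np.meshgrid(np.linspace(-2, 3, 200), np.linspace(-1.5, 2, 200))
grid = np.c_[xx.ravel(), yy.ravel()]
Z = clf.predict(grid).reshape(xx.shape)
# Z can now be handed to matplotlib's contourf to shade the two regions
```

Comparing the shape of Z across classifiers (and across hyperparameter settings) is exactly the visual argument the article makes about tuning deep learners.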



Deep Learning – Getting Started - important resources for learning and understanding
