FIRST (For Inspiration and Recognition of Science and Technology) helps students build STEM skills, confidence, teamwork, and public speaking skills, and make connections with professionals. I am glad to be part of this effort as a volunteer coach and mentor for both FIRST LEGO League Jr. and the FIRST LEGO League Championship. I had to work quite hard to earn this one, but it was definitely worth it.
The mission of FIRST is to inspire young people to be science and technology leaders, by engaging them in exciting Mentor-based programs that build science, engineering, and technology skills, that inspire innovation, and that foster well-rounded life capabilities including self-confidence, communication, and leadership.
For young people in your community, sign up here.
Getting Started Guide for Python, Data Science and Machine Learning for wanna be practitioners & hobbyists
A friend recently asked about a good book on data science with Python. I have been going through the current landscape of books for a course I am teaching this summer, so here is my recommendation. There are several other good ones out there, but this one fits the newbie bill quite well. So, without further ado:
Are you a beginner who would like to learn Python in the context of a specific area, and tired of syntax-focused books sans practical examples?
Are you exploring the data science landscape and want to see practical examples of how to actually use machine learning algorithms in a data science context?
If you answer in the affirmative to either of the questions above, "Python for Data Science For Dummies" is the perfect book for you. Luca Massaron is a practicing data scientist and a prolific author of several books, including Regression Analysis with Python, Machine Learning For Dummies, Python Data Science Essentials, and Large Scale Machine Learning with Python. He is also a leading Kaggle enthusiast, and you can see his 'practitioner fingerprints' all over this book, especially in the later chapters about data processing, ETL, cleanup, data sources, and challenges.
This book starts with the fundamentals of Python data analysis programming and explains how to set up a Python development environment using Anaconda with IPython (Jupyter notebooks). The authors start by considering the emergence of data science, outline the core competencies of a data scientist, and describe the Data Science Pipeline before taking the plunge into explaining Python’s Role in Data Science and introducing Python’s Capabilities and Wonders.
Once you get your bearings with the IDE setup, chapter 4 focuses on Basic Python before you get your Hands Dirty with Data. What I like about this manuscript is that the writing keeps it real. Instead of giving made-up examples, the authors talk about items like knowing when to use NumPy or pandas and real-world scenarios like removing duplicates, creating a data map and data plan, dealing with Dates in Your Data, Dealing with Missing data, and parsing: problems which practicing data scientists encounter on a daily basis.
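To make those cleanup chores concrete, here is a minimal sketch of removing duplicates, filling missing values, and normalizing dates. It is not taken from the book; the records, field names, and date formats are all made up for illustration, and it uses plain Python rather than pandas so it runs anywhere. In pandas the same steps would typically go through `drop_duplicates`, `fillna`, and `pd.to_datetime`.

```python
from datetime import datetime

# Hypothetical raw records: a duplicate row, a missing value, mixed date formats.
raw_rows = [
    {"id": 1, "signup": "2016-04-11", "score": 7.0},
    {"id": 1, "signup": "2016-04-11", "score": 7.0},   # exact duplicate
    {"id": 2, "signup": "04/12/2016", "score": None},  # missing score, US-style date
    {"id": 3, "signup": "2016-04-13", "score": 9.0},
]

DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")  # assumed formats for this toy data


def parse_date(text):
    """Try each known format in turn; raise if none matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")


def clean(rows):
    # 1. Remove exact duplicates while preserving the original order.
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(dict(row))  # copy so the input is left untouched

    # 2. Fill missing scores with the mean of the observed scores.
    observed = [r["score"] for r in unique if r["score"] is not None]
    mean_score = sum(observed) / len(observed)

    for r in unique:
        if r["score"] is None:
            r["score"] = mean_score
        # 3. Normalize every date string to a datetime.date object.
        r["signup"] = parse_date(r["signup"])
    return unique


cleaned = clean(raw_rows)
for row in cleaned:
    print(row)
```

Filling with the mean is only one of several reasonable choices for missing data; dropping the row or using the median are equally valid depending on the data.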
Contemporary topics like Text mining are also addressed in the book with enough details of topics such as working with Raw Text, Stemming and removing stop words, Bag of Words Model and Beyond, Working with n‐grams, Implementing TF‐IDF transformations, and adjacency matrix handling. This is also where you start getting a basic understanding of how machine learning algorithms work in practice.
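As a rough illustration of a few of these ideas (stop-word removal, bag of words, n-grams, and TF-IDF), here is a small sketch in plain Python. It is not the book's code: the stop-word list and sample sentences are invented, stemming is left out, and the weighting uses the textbook form tf × log(N / df). Library implementations such as scikit-learn's use smoothed variants, so their numbers will differ.

```python
import math
from collections import Counter

STOP_WORDS = {"the", "a", "is", "of", "and"}  # tiny illustrative list, not a standard one


def tokenize(text):
    """Lowercase, split on whitespace, strip basic punctuation, drop stop words."""
    words = (w.strip(".,!?").lower() for w in text.split())
    return [w for w in words if w and w not in STOP_WORDS]


def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tf_idf(docs):
    """Per-document TF-IDF with tf = count / doc length and idf = log(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for toks in tokenized for term in set(toks))
    weights = []
    for toks in tokenized:
        counts = Counter(toks)  # the bag-of-words representation of this document
        weights.append({
            term: (count / len(toks)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


docs = [
    "The cat sat on the mat.",
    "The dog sat on the log.",
    "Cats and dogs are pets.",
]

bigrams = ngrams(tokenize(docs[0]), 2)
weights = tf_idf(docs)
print(bigrams)
print(weights[0])
```

Words that appear in only one document ("cat", "mat") end up with a higher weight than words shared across documents ("sat", "on"), which is the whole point of the IDF term.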
Practical aspects of evaluating a data science problem are addressed later, with techniques defined for researching solutions, formulating a hypothesis, data preparation, feature creation, binning and discretization, leading up to vector and matrix manipulation, and visualization with Matplotlib. Even though the book does not discuss Theano, DL4J, Torch, Caffe, or TensorFlow, it still provides an introduction to the key Python ML library, scikit-learn. This 400-page book also covers key topics like SVD, PCA, NMF, recommendation systems, clustering, detecting outliers, logistic regression, Naive Bayes, fitting a model, bias and variance, Support Vector Machines, and Random Forest classifiers, to name a few. The resources provided at the end are definitely worth subscribing to for every self-respecting data science enthusiast.
I highly recommend this book for beginners who are interested in data science and want to learn and leverage Python skills for this rapidly emerging field.
I recently spoke at the Tampa Analytics Group, a Microsoft-recognized data science group run by Joe Blankenship, on the topic of Data Science with Azure. The talk focused on Azure offerings, with a demo on how to write a MapReduce job in Azure using C#. Following are the slides.
TAP Standard Meeting
Monday, Apr 11, 2016, 5:30 PM
4301 W Boy Scout Blvd #590 Tampa, FL
Tampa Analytics Professionals! The intent of this meeting is to bring together people in the Tampa Bay and surrounding areas involved or interested in data science or related professions. This week we have Dr. Adnan Masood talking to us about: Hadoop in the Azure Cloud - A Hands On Guide. TEKSystems has graciously provided the space as well as food ...
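For readers who have not seen the MapReduce pattern before, here is a minimal single-process sketch of the classic word count. The demo in the talk was written in C# and ran on Azure; this is not that code, just the same map, shuffle, and reduce idea in Python, with hypothetical function names and toy input.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: collapse all values for one key into a single total."""
    return key, sum(values)


def word_count(lines):
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


lines = ["azure hadoop azure", "hadoop on azure"]
counts = word_count(lines)
print(counts)
```

On a real cluster the mapper and reducer run on different nodes and the shuffle happens over the network; only the two small functions are yours to write.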
Last Saturday I attended the Global Azure Bootcamp, which was held at the Microsoft Tampa office and hosted by Blain Barton of Microsoft. With presentations from Dan Patrick of Opsgility, Alex Melching, and Blain himself, the Azure bootcamp was a great primer/refresher on the Azure offerings. I especially enjoyed the post-Build-conference offerings, announcements, and demos pertaining to the new Azure-AWS parity. Great stuff!
Here are Some Random Notes/Links from the Talk!
- Azure Quickstart Templates (must see!)
- What is Azure Resource Manager
- Microsoft OMS - https://www.microsoft.com/en-us/server-cloud/operations-management-suite/overview.aspx
- Windows PowerShell Desired State Configuration Overview - https://msdn.microsoft.com/en-us/powershell/dsc/overview
- VorlonJS – A Journey to DevOps: Infrastructure as Code with Microsoft Azure and Resource Manager https://blogs.technet.microsoft.com/devops/2016/01/27/vorlonjs-a-journey-to-devops-infrastructure-as-code-with-microsoft-azure-and-resource-manager/
- Azure Readiness - https://github.com/Azure-Readiness/HOL-Intro-to-Azure
- AWS Direct Connect - https://aws.amazon.com/directconnect/
- Secure Cloud Interconnect - http://www.verizonenterprise.com/products/networking/secure-cloud-interconnect/
- Peak 10 http://www.peak10.com/
- Cloud Providers - http://callibt.com/cloud-providers/
- Creating and deploying Azure resource groups through Visual Studio - https://azure.microsoft.com/en-us/documentation/articles/vs-azure-tools-resource-groups-deployment-projects-create-deploy/
- Docker Swarm - https://github.com/Azure/azure-quickstart-templates/tree/master/docker-swarm-cluster
- ExpressRoute: Connecting Private and Public Clouds through - video.ch9.ms/sessions/teched/na/2014/DCIM-B423.pptx
My undergrad class from the Dept. of Computer Science, UBIT consisted of an amazing group of people. We have kept lifelong friendships and connections across continents. One of my colleagues just successfully defended his dissertation, so I decided to compile a list of all the PhDs from my class. Here it is, the best class of DCS! 🙂
Mixed transfer function neural networks for generalization and knowledge extraction
A Framework for Provenance System in Open Distributed Environment
Measuring Interestingness in Outliers with Explanation Facility using Belief Networks
Team Learning from Demonstration - A Framework to Build Collaboration in a Team of Agents via Imitation
Improved Data Mining in distributed environments
Enhanced Method Call Tree for Comprehensive Detection of Symptoms of Cross Cutting Concerns
Download: Pico Services Architecture
Links from the talk and other labs:
ETL in Hortonworks Sandbox on Azure.
Hortonworks Sandbox virtual machine from the Microsoft Azure Marketplace and an Azure SQL Database sample. Extract data from the Azure SQL Database into a Hive table and query the data from Hive.
Hortonworks Data Platform by Hortonworks
CHEF - Deployment of Chef on Microsoft Azure
Pedro Domingos' The Master Algorithm - How the Quest for the Ultimate Learning Machine Will Remake Our World is an interesting and thought-provoking book about the state of machine learning, data science, and artificial intelligence.
Categorizing, classifying, and clearly representing the ideas around any rapidly developing/evolving field is a hard job. Machine learning, with its multi-faceted approaches and ubiquitous implementation, is an especially challenging topic. To write about it in a comprehensive yet easily understandable (aka non-jargon-ridden-hand-waving) way is definitely a great accomplishment.
One thing I really enjoyed about this writing is how the ML taxonomy and classification works; even for the people who have been in industry for a while, it is hard to create such meaningful distinctions and clusters around ideas.
“Each of the five tribes of machine learning has its own master algorithm, a general-purpose learner that you can in principle use to discover knowledge from data in any domain. The symbolists’ master algorithm is inverse deduction, the connectionists’ is backpropagation, the evolutionaries’ is genetic programming, the Bayesians’ is Bayesian inference, and the analogizers’ is the support vector machine. In practice, however, each of these algorithms is good for some things but not others. What we really want is a single algorithm combining the key features of all of them: the ultimate master algorithm. For some this is an unattainable dream, but for many of us in machine learning, it’s what puts a twinkle in our eye and keeps us working late into the night.”
Starting with the question of whether you are a rationalist or an empiricist, and extending this analogy to the five tribes of machine learning, the author also challenges the notion of "intelligence" in a very direct manner. He states that the skeptical knowledge engineer's dogma that AI cannot "beat" humans is based on an 'archaic' Minsky/Chomsky school of thought, and that the variants of the "poverty of the stimulus" argument are irrelevant for all practical intents and purposes. The outstanding success of deep learning is proof to the contrary. The author answers most of the 'usual' argumentum ad logicam objections in chapter 2, in which he argues, in paraphrase, that the proof is in the pudding. From autonomous vehicles to sentiment analysis, machine learning and statistical learners work, while hand-engineered expert systems built with human experts don’t scale:
...learning-based methods have swept the field, to the point where it’s hard to find a paper devoid of learning. Statistical parsers analyze language with accuracy close to that of humans, where hand-coded ones lagged far behind. Machine translation, spelling correction, part-of-speech tagging, word sense disambiguation, question answering, dialogue, summarization: the best systems in these areas all use learning. Watson, the Jeopardy! computer champion, would not have been possible without it.
The book further elaborates by stating what the author intuitively knows (pun intended) to be a frequently heard objection:
...“Data can’t replace human intuition.” In fact, it’s the other way around: human intuition can’t replace data. Intuition is what you use when you don’t know the facts, and since you often don’t, intuition is precious. But when the evidence is before you, why would you deny it? Statistical analysis beats talent scouts in baseball (as Michael Lewis memorably documented in Moneyball), it beats connoisseurs at tasting, and every day we see new examples of what it can do. Because of the influx of data, the boundary between evidence and intuition is shifting rapidly, and as with any revolution, entrenched ways have to be overcome. If I’m the expert on X at company Y, I don’t like to be overridden by some guy with data. There’s a saying in industry: “Listen to your customers, not to the HiPPO,” HiPPO being short for “highest paid person’s opinion.” If you want to be tomorrow’s authority, ride the data, don’t fight it.
And of course the eureka! argument doesn't escape his criticism:
And some may say, machine learning can find statistical regularities in data, but it will never discover anything deep, like Newton’s laws. It arguably hasn’t yet, but I bet it will. Stories of falling apples notwithstanding, deep scientific truths are not low-hanging fruit. Science goes through three phases, which we can call the Brahe, Kepler, and Newton phases. In the Brahe phase, we gather lots of data, like Tycho Brahe patiently recording the positions of the planets night after night, year after year. In the Kepler phase, we fit empirical laws to the data, like Kepler did to the planets’ motions. In the Newton phase, we discover the deeper truths. Most science consists of Brahe- and Kepler-like work; Newton moments are rare. Today, big data does the work of billions of Brahes, and machine learning the work of millions of Keplers. If—let’s hope so—there are more Newton moments to be had, they are as likely to come from tomorrow’s learning algorithms as from tomorrow’s even more overwhelmed scientists, or at least from a combination of the two.
Whether you agree with the author's point of view or not, this is one of the best "big picture" readings on the state of machine learning and AI, and it will help you understand how things may shape up (or not) in the next computing revolution.