Programming

Getting Started Guide for Python, Data Science and Machine Learning for Wannabe Practitioners & Hobbyists


A friend recently asked me to recommend a good book on data science with Python. I have been surveying the current landscape of books for a course I am teaching this summer, so here is my recommendation. There are several other good ones out there, but this one fits the newbie bill well. So, without further ado:

 

Are you a beginner who would like to learn Python in the context of a specific area, and are you tired of syntax-focused books without practical examples?

OR

Are you exploring the data science landscape and want to see practical examples of how to actually use machine learning algorithms in a data science context?

If you answer in the affirmative to either of the questions above, "Python for Data Science For Dummies" is the perfect book for you. Luca Massaron is a practicing data scientist and a prolific author of several books, including Regression Analysis with Python, Machine Learning For Dummies, Python Data Science Essentials, and Large Scale Machine Learning with Python. He is also a leading Kaggle enthusiast, and you can see his 'practitioner fingerprints' all over this book, especially in the later chapters about data processing, ETL, cleanup, data sources, and challenges.

Python for Data Science For Dummies

 

This book starts with the fundamentals of data analysis programming in Python and explains how to set up a Python development environment using Anaconda with IPython (Jupyter notebooks). The authors start by considering the emergence of data science, outline the core competencies of a data scientist, and describe the data science pipeline before plunging into Python's role in data science and introducing Python's capabilities and wonders.

Once you get your bearings with the environment setup, chapter 4 covers basic Python before you get your hands dirty with data. What I like about this book is that the writing keeps it real. Instead of giving made-up examples, the authors talk about things like knowing when to use NumPy or pandas, and real-world scenarios such as removing duplicates, creating a data map and data plan, dealing with dates in your data, dealing with missing data, and parsing: problems that practicing data scientists encounter on a daily basis.

Contemporary topics like text mining are also addressed in enough detail, including working with raw text, stemming and removing stop words, the bag-of-words model and beyond, working with n-grams, implementing TF-IDF transformations, and adjacency matrix handling. This is also where you start getting a basic understanding of how machine learning algorithms work in practice.
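To make the bag-of-words, n-gram and TF-IDF ideas above a little more concrete, here is a minimal pure-Python sketch. It is my own illustration rather than code from the book: the tiny stop-word list, the sample sentences and the helper names (tokenize, ngrams, bag_of_words, tf_idf) are all assumptions, and it uses the plain tf × log(N/df) weighting rather than the smoothed variant that a library such as scikit-learn applies by default.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"the", "a", "is", "of", "and", "on"}


def tokenize(text):
    """Lowercase the text, keep alphabetic runs, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def ngrams(tokens, n):
    """Return all n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_words(text, max_n=2):
    """Count unigrams up to max_n-grams; word order beyond n is discarded."""
    tokens = tokenize(text)
    terms = []
    for n in range(1, max_n + 1):
        terms.extend(ngrams(tokens, n))
    return Counter(terms)


def tf_idf(docs, max_n=2):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    bags = [bag_of_words(d, max_n) for d in docs]
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for bag in bags for term in bag)
    n_docs = len(docs)
    weights = []
    for bag in bags:
        total = sum(bag.values())
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in bag.items()
        })
    return weights


docs = [
    "The cat sat on the mat",
    "The dog sat on the log",
    "The cat chased the dog",
]
weights = tf_idf(docs)

# Print the three highest-weighted terms of the first document.
for term, weight in sorted(weights[0].items(), key=lambda kv: -kv[1])[:3]:
    print(f"{term}: {weight:.3f}")
```

Note that the bigrams are built after stop-word removal, so "cat sat" is kept while "sat on" never appears; terms that occur in only one document (such as "mat") end up weighted higher than terms shared across documents (such as "cat").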
Practical aspects of evaluating a data science problem are addressed later, with techniques for researching solutions, formulating a hypothesis, data preparation, feature creation, binning and discretization, leading up to vector and matrix manipulation and visualization with Matplotlib. Even though the book does not discuss Theano, DL4J, Torch, Caffe or TensorFlow, it still provides an introduction to the key Python ML library scikit-learn. This 400-page book also covers key topics like SVD, PCA, NMF, recommendation systems, clustering, detecting outliers, logistic regression, Naive Bayes, fitting a model, bias and variance, Support Vector Machines, and Random Forest classifiers, to name a few. The resources provided at the end are definitely worth subscribing to for every self-respecting data science enthusiast.

I highly recommend this book for beginners who are interested in data science and also want to learn and leverage Python skills for this rapidly emerging field.


Notes from Azure Global Bootcamp


Last Saturday I attended the Global Azure Bootcamp, held at the Microsoft Tampa office and hosted by Blain Barton of Microsoft. With presentations from Dan Patrick of Opsgility, Alex Melching, and Blain himself, the bootcamp was a great primer/refresher on the Azure offerings. I especially enjoyed the post-Build-conference offerings, announcements, and demos pertaining to the new Azure-AWS parity. Great stuff!

Here are some random notes/links from the talk!

  • Azure Quickstart Templates (must see!)
    • https://azure.microsoft.com/en-us/documentation/templates/
    • https://github.com/Azure/azure-quickstart-templates
  • What is Azure Resource Manager
  • Microsoft OMS - https://www.microsoft.com/en-us/server-cloud/operations-management-suite/overview.aspx
  • Windows PowerShell Desired State Configuration Overview - https://msdn.microsoft.com/en-us/powershell/dsc/overview
  • VorlonJS – A Journey to DevOps: Infrastructure as Code with Microsoft Azure and Resource Manager https://blogs.technet.microsoft.com/devops/2016/01/27/vorlonjs-a-journey-to-devops-infrastructure-as-code-with-microsoft-azure-and-resource-manager/
  • Azure Readiness - https://github.com/Azure-Readiness/HOL-Intro-to-Azure
  • AWS Direct Connect - https://aws.amazon.com/directconnect/
  • Secure Cloud Interconnect - http://www.verizonenterprise.com/products/networking/secure-cloud-interconnect/
  • Peak 10 http://www.peak10.com/
  • Cloud Providers - http://callibt.com/cloud-providers/
  • Creating and deploying Azure resource groups through Visual Studio - https://azure.microsoft.com/en-us/documentation/articles/vs-azure-tools-resource-groups-deployment-projects-create-deploy/
  • Docker Swarm - https://github.com/Azure/azure-quickstart-templates/tree/master/docker-swarm-cluster
  • ExpressRoute: Connecting Private and Public Clouds through - video.ch9.ms/sessions/teched/na/2014/DCIM-B423.pptx

Machine Learning for Big Data and Text Processing - Short Programs Testimonials

Adnan Masood MIT


Troubleshooting Tip - Service cannot be started. System.TypeLoadException: Could not load type 'type name' from assembly

Recently I ran into this bizarre issue while developing a Windows service, and thought it would be worth sharing the remedy on the interwebs to save others some time and pain.

The problem usually starts when you have a Windows service project, possibly with other associated class library projects, as part of the solution. Everything worked fine before, your unit tests still run, and your console tester (recommended to have with a Windows service for stepping through and debugging) also still works. However, your service stops working. InstallUtil works OK; however, upon net start, the service starts and then immediately stops. The Event Log says the following.

 

Service cannot be started. System.TypeLoadException: Could not load type '' from assembly '', Version=1.0.0.0, Culture=neutral, PublicKeyToken=null'.
   at nova.edu.MyService.OnStart(String[] args)
   at System.ServiceProcess.ServiceBase.ServiceQueuedMainCallback(Object state)

You check for diffs, run it through tests, the console app, etc., and wonder what could possibly be wrong. It's because of namespaces.

Somewhere in your solution, you have a class, outside of your service project, which refers to the same namespace as the service namespace.

So how do you resolve this issue?

Step 1. Locate the assembly where there is a "duplicate" reference (for instance, your type library), find the matching namespace, and be careful NOT to do a blanket search-and-replace.

Service cannot be started. System.TypeLoadException Could not load type 'type name' from assembly -1

Step 2. Rename the namespace appropriately.

Service cannot be started. System.TypeLoadException Could not load type 'type name' from assembly -2

 

Tada! The "Service cannot be started. System.TypeLoadException: Could not load type '' from assembly" error should be gone.


Notes from Tampa //rebuild/ Event

Tampa //rebuild/ Event via Microsoft Mondays

Tuesday, Aug 18, 2015, 6:30 PM

Microsoft Office
5426 Bay Center Dr tampa, FL

29 App Developers Went

Tampa //rebuild/ Event via Microsoft Mondays: Catch up on some of what you missed at the Microsoft //build/ Conference! Join Randy Patterson with Catapult Systems, Donald Bickel with Mercury New Media and others as we take a deep dive into topics covered at the conference.

Agenda

  • Welcome and Introduction
  • Microsoft Edge has redefined itself! Lea...


The Tampa //rebuild/ event at the Microsoft office comprised three lightning talks on the Microsoft Edge browser, ASP.NET 5, and IoT with Raspberry Pi and Windows 10. Some pictures and links from the talks follow.

MS Edge / UI/UX Talk 

  • David Walsh's Blog
  • Code Pen - http://codepen.io/
  • Can I use - http://caniuse.com/
  • wufoo - http://www.wufoo.com/
  • flight arcade - http://flightarcade.com/missions/tin
  • HTML5 hub - http://html5hub.com/
  • JetStream Benchmark Suite - https://www.webkit.org/blog/3418/introducing-the-jetstream-benchmark-suite/
  • Octane benchmark suite - https://developers.google.com/octane/?hl=en
  • Octopus Deployment - https://octopusdeploy.com/
  • Modern IE - http://dev.modern.ie/
  • Windows IoT https://dev.windows.com/en-us/iot
  • Test Drive sites and demos dev.modern.ie/testdrive/
  • Download virtual machines dev.modern.ie/tools/vms/mac/

ASP.NET Talk

No service for type 'Microsoft.Framework.Runtime.ILibraryManager' has been registered

 

IoT Talk (great session by Randy Patterson)

  • Getting Started - http://ms-iot.github.io/content/en-US/GetStarted.htm
  • Microsoft is holding a contest! Join Windows 10 IoT Core - Home Automation Contest on Hackster.io
  • Hackster.io https://microsoft.hackster.io/en-US
  • Become a part of our early adopter community https://www.windowsondevices.com/signup.aspx
  • Randy Patterson Github repo https://github.com/RandyPatterson
  • https://github.com/randypatterson/motiondetector
  • and if you like to troll Randy Patterson 🙂 http://rrpiot.azurewebsites.net/%20rrpiot.azurewebsites.net/SensorData?what%27s%20up!
  • https://dev.windows.com/en-us/iot

And last but not least, an honorable mention to Team Duct Tape, who are fundraising for their upcoming robotics / tech challenge. All the best, guys & gals.

Happy Coding!



Learning F# Functional Data Structures and Algorithms is Out!

Alhamdulillah (all praise is due to God, Lord of the worlds)

Wondering what to do over the 4th of July long weekend? Learn functional programming in F# with my book!

I am glad to announce that my book, Learning F# Functional Data Structures and Algorithms, has been published and is now available via Amazon and other retailers. F# is a multi-paradigm programming language that encompasses object-oriented, imperative, and functional programming language properties. The F# functional programming language enables developers to write simple code to solve complex problems.

Learning F# Functional Data Structures and Algorithms - Adnan Masood PhD

Starting with the fundamental concepts of F# and functional programming, this book will walk you through basic problems, helping you to write functional and maintainable code. Using easy-to-understand examples, you will learn how to design data structures and algorithms in F# and apply these concepts in real-life projects. The book will cover built-in data structures and take you through enumerations and sequences. You will gain knowledge about stacks, graph-related algorithms, and implementations of binary trees. Next, you will understand the custom functional implementation of a queue, review sets and maps, and explore the implementation of a vector. Finally, you will find resources and references that will give you a comprehensive overview of the F# ecosystem, helping you to go beyond the fundamentals.

If you have just started your adventure with F#, then this book will help you take the right steps to become a successful F# coder. An intermediate knowledge of imperative programming concepts, and a basic understanding of the algorithms and data structures in .NET environments using the C# language and BCL (Base Class Library), would be helpful.

With detailed technical and editorial reviews, writing a technology book is a long process, but an equally rewarding and unique learning experience. I am thankful to my technical reviewer and the Packt editorial team for providing excellent support to make this a better book. Nothing is perfect and to err is human; if you find any issues in the code or text, please let me know.

Learning F# Functional Data Structures and Algorithms - Get it via Amazon

Learning F# Functional Data Structures and Algorithms - Get it via Google Books

Learning F# Functional Data Structures and Algorithms - Get it via Packt

Happy Functional Programming!

The source code for the book can be downloaded from here.

 


Rendezvous with MIT Bot @ NASA - Sample Return Robot Challenge

The fun thing about spending time at MIT is that you always run into interesting things. A couple of days ago, I encountered the MIT bot submission for the NASA Sample Return Robot Challenge.

robot

 

NASA and Worcester Polytechnic Institute (WPI) in Worcester teamed up to run the Sample Return Robot Challenge, in which teams demonstrate a robot that can locate and retrieve geologic samples from a wide and varied terrain without human control.

The Sample Return Robot Challenge is part of NASA's Centennial Challenges. It calls for a robot with the autonomous capability to locate and retrieve specific sample types from various locations over a wide and varied terrain, and to return those samples to a designated zone in a reasonable amount of time with limited mapping data.

The challenge description follows:

The Sample Return Robot Challenge is scheduled for June 14-17, 2012 in Worcester, MA. The Challenge requires demonstration of an autonomous robotic system to locate and collect a set of specific sample types from a large planetary analog area and then return the samples to the starting zone. The roving area will include open rolling terrain, granular medium, soft soils, and a variety of rocks, and immovable obstacles (trees, large rocks, water hazards, etc.) A pre-cached sample and several other samples will be located in smaller sampling zones within the larger roving area. Teams will be given aerial/geological/topographic maps with appropriate orbital resolution, including the location of the starting position and a pre-cached sample.

MIT Robotics Team 2015 Promo Video

The bot is powered by the following technologies:

ROS: The Robot Operating System (ROS) is a set of software libraries and tools that help you build robot applications. From drivers to state-of-the-art algorithms, and with powerful developer tools, ROS has what you need for your next robotics project. And it's all open source. www.ros.org

Arduino: Arduino is an open-source electronics platform based on easy-to-use hardware and software. It's intended for anyone making interactive projects.

RabbitMQ for Async messaging: RabbitMQ is a messaging broker - an intermediary for messaging. It gives your applications a common platform to send and receive messages, and your messages a safe place to live until received.

 

Update:

The MIT team couldn't make it to the challenge due to some technical issues. NASA has awarded $100,000 in prize money to the Mountaineers, a team from West Virginia University, Morgantown, for successfully completing Level 2 of the Sample Return Robot Challenge, part of the agency’s Centennial Challenges prize program.





MIT Machine Learning for Big Data and Text Processing Class Notes - Day 2

After an awesome Day 1 at MIT, I was in the CSAIL library and met Pedro Ortega, NIPS 2015 Program Manager (@adaptiveagents). Celebrity sighting!

Today, on Day 2, Dr. Jaakkola (Bio) (Personal Webpage), professor, Electrical Engineering and Computer Science / Computer Science and Artificial Intelligence Laboratory (CSAIL), went over the following:

  • Non-linear classification and regression, kernels
  • Passive aggressive algorithm
  • Overfitting, regularization, generalization
  • Content recommendation

Dr. Jaakkola's Socratic method of asking common-sense questions ingrains the core concepts in people's minds. The class started with a follow-up on the perceptron from yesterday and quickly turned into a session on when NOT to use the perceptron, such as in the case of problems that are not linearly separable. Today's lecture was derived from 6.867 Machine Learning Lecture 8. The discussion extended to the Support Vector Machine (and Statistical Learning Theory) Tutorial, a topic that is also well explained in An Idiot’s guide to Support vector machines (SVMs) by R. Berwick, Village Idiot.

Speaking of SVMs and dimensionality, Dr. Jaakkola posed the question of whether ranking can also be treated as a secondary classification problem. Learning to rank, or machine-learned ranking (MLR), is a fascinating topic where common intuitions about things like the number of items displayed, error functions between the user's preference and the display order, and sparseness fall flat. Microsoft Research has some excellent reference papers and tutorials on learning to rank, which are definitely worth poring over if you are interested in this topic. Label ranking by learning pairwise preferences is another topic discussed in detail during the class. Some reference papers follow:

Indeed, with SVMs, the natural progression led to the 'k' word: kernel functions. A brief introduction to kernel classifiers by Mark Johnson (Brown University) is a good starting point, and The difference of kernels in SVM? and How to select a kernel for SVM provide good background material for understanding the practical aspects of kernels. Also see Kernels and the Kernel Trick by Martin Hofmann (Reading Club "Support Vector Machines").
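As a small illustration of why kernels matter, here is a minimal sketch of a kernel perceptron with an RBF kernel, trained on XOR (the classic problem that is not linearly separable). This is my own toy example, not material from the class; the function names, the gamma value and the epoch limit are assumptions chosen for illustration.

```python
import math


def rbf_kernel(a, b, gamma=1.0):
    """Radial basis function kernel: exp(-gamma * ||a - b||^2)."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * sq_dist)


def decision_score(alpha, xs, ys, kernel, x):
    """Kernelized score: sum_j alpha_j * y_j * K(x_j, x)."""
    return sum(a * yj * kernel(xj, x) for a, xj, yj in zip(alpha, xs, ys))


def train_kernel_perceptron(xs, ys, kernel, epochs=50):
    """Dual-form perceptron: alpha[i] counts the mistakes made on sample i."""
    alpha = [0] * len(xs)
    for _ in range(epochs):
        mistakes = 0
        for i, (x, y) in enumerate(zip(xs, ys)):
            if y * decision_score(alpha, xs, ys, kernel, x) <= 0:
                alpha[i] += 1  # the dual equivalent of w += y * phi(x)
                mistakes += 1
        if mistakes == 0:  # a full clean pass means the data is separated
            break
    return alpha


def predict(alpha, xs, ys, kernel, x):
    return 1 if decision_score(alpha, xs, ys, kernel, x) > 0 else -1


# XOR: no straight line separates the +1 points from the -1 points.
xs = [(0, 0), (0, 1), (1, 0), (1, 1)]
ys = [-1, 1, 1, -1]

alpha = train_kernel_perceptron(xs, ys, rbf_kernel)
for x, y in zip(xs, ys):
    print(x, "label:", y, "predicted:", predict(alpha, xs, ys, rbf_kernel, x))
```

The weight vector never appears explicitly; the model is just the mistake counts plus the training points, which is the essence of the kernel trick. Swapping rbf_kernel for a plain dot product turns this back into the ordinary perceptron, which cannot fit XOR.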

 

 

The afternoon topic was anomaly detection; use cases included aberrant behavior in financial transactions, insurance fraud, bot detection, manufacturing quality control, etc. One of the most comprehensive presentations on the subject is Anomaly Detection Data Mining Techniques by Francesco Tamberi, which is great for background. Several problems worked on during the class were from 6.867 Machine Learning, which shows how carefully the instructors tailored the program for practitioners, with the right content from graduate-level courses as well as industry use cases. Other topics discussed included linear versus nonlinear classifiers, and we learned how the decision boundary is the region of a problem space in which the output label of a classifier is ambiguous. Class discussions and Q&A touched on a wide variety of subjects, including but not limited to How to increase accuracy of classifiers?, Recommendation Systems, and A Comparative Study of Collaborative Filtering Algorithms, which eventually led to Deep Learning Tutorial: From Perceptrons to Deep Networks, an approach that performed really well on the MNIST database of handwritten digits.

A discussion of linear vs. nonlinear classifiers followed, where Dr. Jaakkola spoke about why logistic regression is a linear classifier. For more, see Linear classifier, Kernel Methods for General Pattern Analysis, Kernel methods in Machine learning, How do we determine the linearity or nonlinearity of a classification problem?, and the review of Kernel Methods in Machine Learning.

 

Misc. discussions of Kernel Methods, So you think you have a power law, Radial basis function kernel, and Kernel Perceptron in Python surfaced, some of which are briefly reviewed in Machine Learning: Perceptrons - Kernel Perceptron Learning Part-3/4, Shape Fitting with Outliers, and the SIGIR 2003 Tutorial Support Vector and Kernel Methods, a tutorial with radial basis functions. Other topics included Kernel based Anomaly Detection with Multiple Kernel Anomaly Detection (MKAD) Algorithm, Support Vector Machines: Model Selection Using Cross-Validation and Grid-Search, LIBSVM -- A Library for Support Vector Machines, Practical Guide to Support Vector Classification, Outlier Detection with Kernel Density Functions, and Classification Framework for Anomaly Detection as relevant readings.

For a linear algebra refresher, Dr. Barzilay recommended Prof. Gilbert Strang's MIT OpenCourseWare course 18.06, or his video lectures on linear algebra.

Looking forward to Deep Learning and Boosting tomorrow! Dr. Barzilay said it's going to be pretty cool.

 

Misc:


MIT Machine Learning for Big Data and Text Processing Class Notes - Day 1

As a follow-up to MIT's Tackling the Challenges of Big Data, I am currently in Boston attending Machine Learning for Big Data and Text Processing Classification (and therefore blogging about it for posterity, based on public domain data and papers; nothing posted here is MIT proprietary information that would violate any T&C). MIT Professional Education courses are tailored towards professionals, and it is always a great opportunity to learn what other practitioners are up to, especially in a relatively new field like data science.

Today's lecture #1 was outlined as

  • machine learning primer
  • features, feature vectors, linear classifiers
  • On-line learning, the perceptron algorithm and
  • application to sentiment analysis

Instructors Tommi Jaakkola (Bio) (Personal Webpage) and Regina Barzilay (Bio) (Personal Webpage) started the discussion with a brief overview of the course. Dr. Barzilay is a great teacher who explains the concepts in amazing detail. As an early adopter and practitioner, she was one of the Technology Review Innovators Under 35.

The course notes are fairly comprehensive; following are the links to the publicly available material.

  • Youtube: http://www.youtube.com/MITProfessionalEd
  • FB: https://www.facebook.com/MITProfessionalEducation
  • twitter: https://twitter.com/MITProfessional
  • LinkedIn - https://www.linkedin.com/grp/home?gid=2352439

In collaboration with CSAIL (MIT Computer Science and AI Lab, www.csail.mit.edu), today's lecture was a firehose version of Ullman's large scale machine learning. Dr. Barzilay walked through the derivation of the Perceptron Algorithm, covering Perceptrons for Dummies and Single Layer Perceptron as Linear Classifier. For a practical implementation, Seth Juarez's NUML implementation of the perceptron is a good read. A few relevant publications can be found here.
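For readers who have not seen the perceptron before, here is a minimal sketch of the mistake-driven update rule on a toy sentiment problem. It is my own illustration, not code from the lecture or from NUML: the two features (counts of positive and negative words in a review), the sample data and the function names are all assumptions.

```python
def train_perceptron(samples, epochs=20):
    """Train a perceptron with a bias term.

    samples: list of (feature_vector, label) pairs, label in {-1, +1}.
    Returns the learned weights and bias.
    """
    dim = len(samples[0][0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in samples:
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * activation <= 0:  # misclassified, or exactly on the boundary
                # Nudge the hyperplane towards classifying x correctly.
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:  # converged: every sample is on the right side
            break
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on which side of the hyperplane x falls."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


# Hypothetical features: [positive word count, negative word count].
samples = [
    ([2, 0], 1),
    ([1, 0], 1),
    ([0, 1], -1),
    ([0, 2], -1),
    ([1, 2], -1),
    ([3, 1], 1),
]

w, b = train_perceptron(samples)
print("weights:", w, "bias:", b)
print("review with 4 positive words:", predict(w, b, [4, 0]))
print("review with 3 negative words:", predict(w, b, [0, 3]))
```

Because this toy data is linearly separable, the loop stops after a few passes; on data that is not linearly separable it would simply run until the epoch limit without settling.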

The discussion progressed into Opinion Mining and Sentiment Analysis with related techniques. Some of the pertinent data sets can be found here:

Dr. Barzilay briefly mentioned Online Passive-Aggressive Algorithms and details from Lillian Lee's AAAI 2008 invited talk (A “mitosis” encoding / min-cost cut) while talking about domain adaptation, which is quite an interesting topic in its own right. Domain Adaptation with Structural Correspondence Learning by John Blitzer, Introduction to Domain Adaptation (guest lecturer: Ming-Wei Chang, CS 546), and Word Segmentation of Informal Arabic with Domain Adaptation are fairly interesting readings. The lecture slides are heavily inspired by the Introduction to Domain Adaptation lecture.
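As an aside on the passive-aggressive family mentioned at the start of the paragraph above, here is a minimal sketch of the basic online update for binary classification. It is my own illustration, not code from the lecture: the function names and the toy data stream are assumptions, and the optional C argument caps the step size in the style of the PA-I variant.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def hinge_loss(w, x, y):
    """Hinge loss max(0, 1 - y * (w . x)) for a label y in {-1, +1}."""
    return max(0.0, 1.0 - y * dot(w, x))


def pa_update(w, x, y, C=None):
    """One passive-aggressive step.

    Passive: if the margin is already at least 1, w is returned unchanged.
    Aggressive: otherwise w moves just far enough to reach margin 1 on
    (x, y), using step size tau = loss / ||x||^2 (capped at C if given).
    """
    loss = hinge_loss(w, x, y)
    norm_sq = dot(x, x)
    if loss == 0.0 or norm_sq == 0.0:
        return list(w)
    tau = loss / norm_sq
    if C is not None:
        tau = min(C, tau)  # PA-I style cap to soften the effect of noisy labels
    return [wi + tau * y * xi for wi, xi in zip(w, x)]


# Hypothetical stream of (features, label) examples processed one at a time.
stream = [
    ([1.0, 2.0], 1),
    ([2.0, -1.0], -1),
    ([0.5, 1.5], 1),
]

w = [0.0, 0.0]
for x, y in stream:
    w = pa_update(w, x, y)
    print("after", (x, y), "w =", [round(wi, 3) for wi in w])
```

Unlike the perceptron, which updates only on outright mistakes and always by a fixed amount, this rule also updates on correct predictions with a small margin, and scales each step to the size of the violation.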

With sentiment analysis and opinion mining, we went over the seminal Latent Semantic Analysis - LSI, Clustering Algorithm Based on Singular Value Decomposition, Latent Semantic Indexing (LSI) (Deerwester et al. 1990), and Latent Dirichlet Allocation (LDA) (Blei et al. 2003). The class had an interesting discussion around The Hathaway Effect: How Anne Gives Warren Buffett a Rise, which comes with a potentially NSFW graphic. The lecture can be summed up in Comprehensive Review of Opinion Summarization by Kim, Hyun Duk; Ganesan, Kavita; Sondhi, Parikshit; Zhai, ChengXiang (PDF version).

A few other papers, research work, and demos discussed during the lecture included Get out the vote: Determining support or opposition from Congressional floor-debate transcripts, Multiple Aspect Ranking using the Good Grief Algorithm, Distributional Footprints of Deceptive Product Reviews, Recursive Neural Tensor Network - Deeply Moving: Deep Learning for Sentiment Analysis, Code for Deeply Moving: Deep Learning for Sentiment Analysis, Sentiment Analysis - The Stanford NLP Demo, and Stanford Sentiment Treebank.

Among the several class discussions and exercises/quizzes, The Distributional Footprints of Deceptive Product Reviews was of primary importance. Starting with Amazon Glitch Unmasks War Of Reviewers, darts were thrown around Opinion Spam Detection: Detecting Fake Reviews and Reviewers and Fake Review Detection: Classification and Analysis of Real and Pseudo Reviews.

With all this talk of sentiment analysis, I asked fellow attendee Mohammed Al-Hamdan (Data Analyst at Al-Elm Information Security Company) about publishing a paper by the end of this course on sentiment analysis of Arabic-language Twitter feeds for potential political dissent. It would be a cool project / publication.

Looking forward to the session tomorrow!

Bonus, here is Dr. Regina Barzilay — Information Extraction for Social Media video - publicly available on youtube.

 


MicroServices - Selected Links & Resources.

  • Microservices: An Unexpected Journey (Sam Newman)
  • Microservices @ Netflix: A Challenge of Scale
  • Principles of Microservices (ThoughtWorks)
  • Developing Enterprise Applications for the Cloud: From Monolith to Microservices (IBM)
  • Testing & Deploying Microservices (ThoughtWorks)
  • Growing a Microservices Landscape (With Smart Use Cases)
  • Think Small to Go Big - Introduction to Microservices (IBM)
  • Building Java (micro)services for the Cloud: The DHARMA Principles

Technology Projects

  • Spring Cloud
  • Spring Cloud Netflix
  • Netflix OSS
  • Akka
  • Akka.NET
  • DropWizard

Webinars

  • Building Distributed Systems with Netflix OSS and Spring Cloud (with Matt Stine)
    • 7-part series, each part is 15 minutes or less
    • Part 1 of 7
    • Part 2 of 7
    • Part 3 of 7
    • Part 4 of 7
    • Part 5 of 7
    • Part 6 of 7
    • Part 7 of 7
  • Spring Cloud, Spring Boot, and Netflix OSS (with Spencer Gibb)
    • 1 hour, 28 minutes

Books

  • Building Microservices: Designing Fine-Grained Systems (Sam Newman)
  • ReleaseIT!: Design and Deploy Production-Ready Software

Blogs and Articles

  • Microservice Design Patterns
  • An Architecture for Microservices using Spring on Cloud Foundry
  • Warehouse Computing and the Evolution of the Data Center: A Layman's Guide (Lenny Pruss)
  • Microservices, Monoliths, and NoOps
  • Microservice Maturity Model Proposal (Daniel Bryant)
  • Exploring Microservices in the Enterprise
  • Distributed Big Balls of Mud (Simon Brown)
  • Hexagonal Architecture: The Great Reconciler

Thanks to my colleague and friend David Lazar for preparing this comprehensive list and allowing me to share it.