5-Day Machine Learning Engineering Workshop
GPSE Consulting has developed a 5-day machine learning engineering workshop that covers the machine learning theory, algorithms and coding your internal development team needs to begin building machine learning systems.
By making the investment in your team, you’ll be able to begin deploying machine learning solutions within your organization. You’ll have the skills and expertise in-house to begin solving business challenges and recognize where third-party expertise may be needed, while reducing the need to use external programmers and engineers.
At a glance, GPSE Consulting’s Machine Learning Engineering Workshops include:
Day 1: Introduction to Machine Learning - Great solutions start with a great set of data; learn how to prepare and process data with Pandas.
Day 2: Classification, Training and Inference - Learn supervised learning and classification methods, along with scikit-learn for training and predicting.
Day 3: Regression Modeling and Hyperparameter Search - Learn supervised regression modeling, along with scikit-learn for pipelining and hyperparameter search.
Day 4: Deep Learning with TensorFlow - Learn how to make your smart machines smarter by training neural networks with TensorFlow.
Day 5: Scalable Machine Learning Services - Explore services and learn best practices for designing machine learning systems that scale to large amounts of data.
The course content is based on nearly a decade of industry experience building machine learning systems at some of the world’s leading data science organizations. This is a fast-paced, fun and informative five-day workshop that delivers the expertise your engineers need to successfully build a machine learning predictor in your environment.
Five days isn’t a lot of time to acquire these skills, so we stay laser-focused on libraries, coding and engineering of machine learning systems. We discuss theory as it’s needed to understand how to build solutions, without spending too much time in the weeds.
What to Expect: This is a hands-on workshop. Participants will complete code exercises and compete to find out: who can train the best predictor? Class sizes are limited to 10 participants per workshop to ensure each student receives ample guidance and instruction from our facilitator.
Who Should Attend: Because the workshop is hands-on, it is designed for coders and engineers with basic Python skills. Skills a little rusty? We’ll email a tutorial and refresher materials a few weeks before the workshop so you can prepare.
Learning Format: Algorithms and other course material will be provided via lectures, then coding will begin live in the classroom using relevant material. Participants will compete to see who can build the best predictor.
Technical Details: The learning environment is fully hosted, so only a laptop is required and no setup is needed. YubiKeys are provided for login convenience.
What You’ll Learn: This course has a strong focus on engineering and real-world applications: you’ll learn the hyperparameters that actually matter, what the libraries’ APIs look like and what questions to answer in a design document for a machine-learning system.
Participants will train machine-learning models in three different computing environments:
Scikit-learn on one large memory-intensive instance
TensorFlow on one large GPU
Spark/SparkML on a large cluster of memory-intensive instances, for distributed training
Venue: This workshop is conducted on your site, making it easier to train an entire engineering team at once. If necessary, alternative venues such as conference rooms can be arranged.
Get Ready! GPSE Consulting provides refresher materials on Python and other skills so participants can prepare to hit the ground running in this fast-paced workshop. We also provide ample reference materials so your engineers can continue developing their machine-learning engineering skills once the workshop is completed.
To make the workshop fun and challenging, we’ve added a competition element. As they learn, participants will compete to see who can train the best classification and regression models using a customer review data set.
Workshop participants will begin by working on a product classification task: given a new product review, can you predict to which category of products, such as books or shoes, the review belongs? Participants will also train a model to predict the rating a product will receive, based on a five-star scale, given the text.
As the workshop progresses, learners will have an opportunity to tweak features, try different algorithms and search for optimal hyperparameters in order to train a model that performs as well as or better than our own models.
At the end of the workshop, GPSE Consulting will award unique ERC-721 tokens on the Ethereum blockchain to all participants, so they can prove what they’ve learned. We like to think of them as digital “Certificates of Achievement.” We’ll also provide specialized certificates to contest award winners. These online certificates are digital bragging rights that don’t attract dust! See the certificates we've already awarded here.
This schedule is subject to change based on developments in data science, technology and client needs. It can also be customized to meet the needs of your organization. Let us know if you have a specialized requirement or concern.
Day 1: Introduction to Machine Learning
Extracting features for natural language processing tasks
Overview of useful libraries for NLP tasks: NLTK, spaCy, TextBlob.
Training data best practices: splitting, storing, versioning.
Competition: data processing and feature extraction
Welcome notes and overview of the workshop: expectations, schedule, competition.
Overview of the Jupyter platform, including magic commands and other tricks
Introduction to machine learning
Participants get their hands on the data for the competition
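As an illustration of the Day 1 data preparation and splitting topics, here is a minimal sketch using Pandas and scikit-learn. The toy reviews, the column names (`text`, `category`, `n_tokens`) and the split ratio are assumptions made up for this example; they are not the workshop's actual data set or exercises.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy reviews; the workshop's real data set is not shown here.
reviews = pd.DataFrame({
    "text": [
        "Great plot and characters ",
        "  Comfortable running shoes",
        None,
        "A page-turner from start to finish",
        "Soles wore out in a month",
        "Could not put this novel down",
        "Perfect fit, true to size",
        "The ending felt rushed",
    ],
    "category": ["books", "shoes", "books", "books",
                 "shoes", "books", "shoes", "books"],
})

# Drop rows with missing text, then normalize whitespace and case.
clean = reviews.dropna(subset=["text"]).copy()
clean["text"] = clean["text"].str.strip().str.lower()

# A simple hand-made feature: the number of whitespace-separated tokens.
clean["n_tokens"] = clean["text"].str.split().str.len()

# Hold out a test set; a fixed seed keeps the split reproducible.
train, test = train_test_split(clean, test_size=0.25, random_state=0)
print(len(train), "training rows,", len(test), "test rows")
```

Fixing `random_state` is one simple way to keep a split reproducible, which helps when storing and versioning training data.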
Day 2: Classification, Training and Prediction
Feature selection, dimensionality reduction
Models: decision trees, ensemble learning
Participants work on the classification task
sklearn.feature_selection, sklearn.tree, sklearn.ensemble
Supervised learning, classification tasks, metrics.
Models: KNN, Naive Bayes
Participants work on the classification task
sklearn.dummy, sklearn.neighbors, sklearn.naive_bayes
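To give a feel for the Day 2 modules, the sketch below compares a `sklearn.dummy` baseline with a Naive Bayes classifier on a product-category task. The six training reviews, the two test reviews and the bag-of-words features from `CountVectorizer` are assumptions chosen for illustration, not the competition data or the instructors' solution.

```python
from sklearn.dummy import DummyClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy reviews labeled with a product category.
train_texts = [
    "a gripping novel with a clever plot",
    "the author writes beautiful chapters",
    "could not stop reading this story",
    "these sneakers fit well and feel light",
    "the laces and soles are sturdy",
    "comfortable heels for walking all day",
]
train_labels = ["books", "books", "books", "shoes", "shoes", "shoes"]
test_texts = ["a clever story by a great author",
              "light sneakers with sturdy soles"]
test_labels = ["books", "shoes"]

# Bag-of-words features; the vocabulary is learned from training text only.
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_texts)
X_test = vectorizer.transform(test_texts)

# The baseline ignores the features and always predicts one class.
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, train_labels)
model = MultinomialNB().fit(X_train, train_labels)

baseline_acc = baseline.score(X_test, test_labels)
model_acc = model.score(X_test, test_labels)
print(f"baseline accuracy: {baseline_acc:.2f}, naive Bayes accuracy: {model_acc:.2f}")
```

Training a dummy baseline first gives a floor to beat: a model that cannot outperform it has not learned anything useful from the features.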
Day 3: Regression Modeling and Hyperparameter Search
Unsupervised learning: LDA and GMMs.
Plotting and making sense of the learning curve.
Hyperparameter optimization: grid vs. random.
Using Pipeline to optimize parameters.
Participants receive the test data set for the competition.
sklearn.model_selection, sklearn.pipeline, matplotlib
Regression tasks, metrics.
Linear models, overfitting and the L1/L2 penalties.
More models: KNN regressors, decision tree regressors, ensemble regressors.
Participants work on the regression task.
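The Day 3 topics of pipelines, the L2 penalty and grid versus random search can be sketched together as follows. The synthetic regression data, the step names (`scale`, `ridge`) and the candidate `alpha` values are assumptions for this example only.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data: a linear signal on 5 features plus a little noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))
y = X @ np.array([2.0, -1.0, 0.0, 0.0, 0.5]) + rng.normal(scale=0.1, size=60)

# Scaling and the L2-penalized linear model are tuned as one estimator.
pipe = Pipeline([("scale", StandardScaler()), ("ridge", Ridge())])

# Step parameters are addressed as "<step name>__<parameter>".
alphas = [0.01, 0.1, 1.0, 10.0, 100.0]

# Grid search evaluates every candidate value.
grid = GridSearchCV(pipe, {"ridge__alpha": alphas}, cv=3)
grid.fit(X, y)

# Random search evaluates only n_iter sampled candidates.
rand = RandomizedSearchCV(pipe, {"ridge__alpha": alphas},
                          n_iter=3, cv=3, random_state=0)
rand.fit(X, y)

print("grid best:", grid.best_params_, "random best:", rand.best_params_)
```

Random search tries fewer candidates, so it is cheaper but may miss the best value; with many hyperparameters that trade-off is often worthwhile.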
Day 4: Deep Learning with TensorFlow
Architecture overview: CNNs, LSTMs, GANs, word2vec, Siamese networks.
Reusing pre-trained word embeddings with the TensorFlow Hub.
Guided tour of TensorBoard
MNIST tutorial with the EMNIST dataset
Deep Neural Networks (DNN): inference and training with backpropagation.
Learning rate, different optimizers available.
Instructors live code a DNN for text classification with TensorFlow.
Participants train a DNN for the product classification task.
Day 5: Scalable Machine Learning Systems
Guided tour of scalable machine learning using AWS and Google.
Introduction to Spark/SparkML on EMR clusters.
Practice exercises with Spark and very large open data sets.
How to develop your machine-learning engineer skills from here.
Discussion of scalability and review of what we have covered so far.
How to create a design document for a machine learning system.
System design patterns for machine-learning.
Group exercise: Practice your machine learning problem understanding and intuition.