Notes from 23/9–30/9
It’s going to be a new month!
AI
Designing Your Neural Networks
Will serve as a good reference when discussing what sort of elements to include in your NN
We’ve looked at how to set up a basic neural network (including choosing the number of hidden layers, hidden neurons, batch sizes, etc.)
We’ve learned about the role momentum and learning rates play in influencing model performance.
And finally, we’ve explored the problem of vanishing gradients and how to tackle it using non-saturating activation functions, BatchNorm, better weight initialization techniques and early stopping.
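A minimal sketch of the early stopping idea, assuming a validation loss has already been recorded for each epoch. The function name, `patience`, `min_delta` and the loss values are illustrative assumptions, not taken from the article:

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return the index of the epoch at which training would stop.

    Training stops once the validation loss has failed to improve by more
    than `min_delta` for `patience` consecutive epochs. If that never
    happens, the index of the last epoch is returned.
    """
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            # New best validation loss: reset the counter.
            best_loss = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses) - 1


# Made-up validation losses: improvement stalls after epoch 3.
losses = [0.90, 0.70, 0.60, 0.55, 0.56, 0.57, 0.58, 0.50]
stop_at = early_stopping_epoch(losses, patience=3)
```

With these numbers training stops before the late improvement at epoch 7 is ever seen, which is the trade-off a small `patience` makes.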
A 6 Step Field Guide for Building Machine Learning Projects
1 Problem definition — What business problem are we trying to solve? How can it be phrased as a machine learning problem?
Learning Types: Supervised v Unsupervised v Transfer
Problem Types:
- Classification — Do you want to predict whether something is one thing or another? Such as whether a customer will churn or not churn? Or whether a patient has heart disease or not? Note, there can be more than two things. Two classes is called binary classification, more than two classes is called multi-class classification. Multi-label is when an item can belong to more than one class.
- Regression — Do you want to predict a specific number of something? Such as how much a house will sell for? Or how many customers will visit your site next month?
- Recommendation — Do you want to recommend something to someone? Such as products to buy based on their previous purchases? Or articles to read based on their reading history?
2 Data — If machine learning is getting insights out of data, what data do we have? How does it match the problem definition? Is our data structured or unstructured? Static or streaming?
Data Types:
- Structured data — Think a table of rows and columns, an Excel spreadsheet of customer transactions, a database of patient records. Columns can be numerical, such as average heart rate, categorical, such as sex, or ordinal, such as chest pain intensity.
- Unstructured data — Anything not immediately able to be put into row and column format, images, audio files, natural language text.
- Static data — Existing historical data which is unlikely to change. Your company’s customer purchase history is a good example.
- Streaming data — Data which is constantly updated, older records may be changed, newer records are constantly being added.
3 Evaluation — What defines success? Is a 95% accurate machine learning model good enough?
Measuring success:
- False negatives — Model predicts negative, actually positive. In some cases, like email spam prediction, false negatives aren’t too much to worry about. But if a self-driving car’s computer vision system predicts no pedestrian when there was one, this is not good.
- False positives — Model predicts positive, actually negative. Predicting someone has heart disease when they don’t, might seem okay. Better to be safe right? Not if it negatively affects the person’s lifestyle or sets them on a treatment plan they don’t need.
- True negatives — Model predicts negative, actually negative. This is good.
- True positives — Model predicts positive, actually positive. This is good.
- Precision — What proportion of positive predictions were actually correct? A model that produces no false positives has a precision of 1.0.
- Recall — What proportion of actual positives were predicted correctly? A model that produces no false negatives has a recall of 1.0.
- F1 score — A combination of precision and recall. The closer to 1.0, the better.
- Receiver operating characteristic (ROC) curve & Area under the curve (AUC) — The ROC curve is a plot comparing true positive and false positive rate. The AUC metric is the area under the ROC curve. A model whose predictions are 100% wrong has an AUC of 0.0, one whose predictions are 100% right has an AUC of 1.0.
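The metrics above can be sketched in plain Python. This assumes binary labels where 1 is the positive class; the function names and the toy data are made up for illustration, and `roc_auc` uses the pairwise-ranking view of AUC rather than integrating a plotted curve:

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def precision_recall_f1(tp, fp, fn):
    """Precision, recall and their harmonic mean (F1); 0.0 where undefined."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a win. Assumes both classes are present.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data: 3 actual positives, 5 actual negatives.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.05]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
auc = roc_auc(y_true, scores)
```

Here the model has one false positive and one false negative, so precision and recall both come out at 2/3 even though 6 of the 8 predictions are correct.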
4 Features — What parts of our data are we going to use for our model? How can what we already know influence this?
Types of features:
- Categorical features — One or the other(s). For example, in our heart disease problem, the sex of the patient. Or for an online store, whether or not someone has made a purchase.
- Continuous (or numerical) features — A numerical value such as average heart rate or the number of times logged in.
- Derived features — Features you create from the data. Often referred to as feature engineering. Feature engineering is how a subject matter expert takes their knowledge and encodes it into the data. You might combine the number of times logged in with timestamps to make a feature called time since last login. Or turn dates from numbers into “is a weekday (yes)” and “is a weekday (no)”.
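A small sketch of the two derived features mentioned above, assuming the raw data holds a last-login timestamp. The function and key names are hypothetical and the dates are made up:

```python
from datetime import datetime


def derive_features(last_login, now):
    """Build two derived features from a raw login timestamp."""
    return {
        # Whole days elapsed between the last login and `now`.
        "days_since_last_login": (now - last_login).days,
        # datetime.weekday() returns 0 (Monday) through 6 (Sunday).
        "is_weekday": last_login.weekday() < 5,
    }


features = derive_features(
    last_login=datetime(2019, 9, 21, 10, 30),  # a Saturday
    now=datetime(2019, 9, 30, 9, 0),
)
```

Neither value exists in the raw data; both come from knowing which questions (how recently, which kind of day) matter for the problem.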
Usage:
- Keep them the same during experimentation (training) and production (testing) — A machine learning model should be trained on features that represent, as closely as possible, what it will see in a real system.
- Work with subject matter experts — What do you already know about the problem, how can that influence what features you use? Let your machine learning engineers and data scientists know this.
- Are they worth it? — If only 10% of your samples have a feature, is it worth incorporating it in a model? Prefer features with the most coverage: the ones that lots of samples have data for.
- Perfect equals broken — If your model is achieving perfect performance, you’ve likely got feature leakage somewhere. Which means the data your model has trained on is being used to test it. No model is perfect.
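One cheap guard against the "trained on what you test on" kind of leakage is to split by sample ID and check that the two sets never overlap. This is a rough sketch under that assumption; the helper names, the 80/20 split and the seed are illustrative choices, and it does not catch subtler leakage such as a feature that encodes the target:

```python
import random


def train_test_split_ids(sample_ids, test_fraction=0.2, seed=42):
    """Shuffle unique sample IDs and split them into disjoint train/test lists."""
    ids = sorted(set(sample_ids))     # de-duplicate so no ID can land in both sets
    random.Random(seed).shuffle(ids)  # local RNG keeps the split reproducible
    n_test = max(1, int(len(ids) * test_fraction))
    return ids[n_test:], ids[:n_test]


def leaked_ids(train_ids, test_ids):
    """Return IDs that appear in both sets; anything here is leakage."""
    return set(train_ids) & set(test_ids)


train_ids, test_ids = train_test_split_ids(range(100))
overlap = leaked_ids(train_ids, test_ids)
```

An empty `overlap` is necessary but not sufficient: a suspiciously perfect score is still worth investigating.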
5 Modelling — Which model should you choose? How can you improve it? How do you compare it with other models?
Criteria:
- Interpretability and ease to debug — Why did a model make a decision it made? How can the errors be fixed?
- Amount of data — How much data do you have? Will this change?
- Training and prediction limitations — This ties in with the above, how much time and resources do you have for training and prediction?
Model Types: ensembles of decision trees v deep models v pre-trained
6 Experimentation — What else could we try? Does our deployed model do as we expected? How do the other steps change based on what we’ve found?
And then…
Deployment changes everything. A good model offline doesn’t always mean a good model online. This article has focused on data modelling. Once you deploy a model, there’s infrastructure management, data verification, model retraining, analysis and more. Any cloud provider has services for these but putting them together is still a bit of a dark art. Pay your data engineers well. If you’re a data engineer, share what you know.
Blockchain
Understanding the Internals of Crypto-Exchanges Using Machine Learning and Data Visualizations: Binance and Poloniex Explained
Really very interesting. Possibly lets you identify points of failure. But how did this person do it?
Introducing the DeFi Score — an open-source methodology to evaluate code and financial risk in DeFi lending
Visual example
Computer Science
Learn Data Structures from a Google Engineer — A Free 8-hour Course
Design
5 Design Approaches to start a New Creative Project
Forecasting
Design Thinking:
Systems Thinking:
Design-driven innovation:
Backcasting Approach
Speculative design:
Transition design:
Participatory design approach
These concepts are considered as an organization of the project rather than a design process/approach. So these are placed in the above diagram as the infrastructure for the design project.
SCENARIO BUILDING
Life Optimisation
Want To Be In The Top 1% Of IT Teams? Here’s What It Takes
SPEED
Delivery velocity.
We had to tackle:
- Internal resistance to DevOps and culture changes.
- External pressure created by our innovation goals versus the external stability needs.
- Calculated risks in tools, new process, people churn, and the team’s breakpoints.
How I Build Learning Projects — Part I
Choosing
- Prioritise foundational skills
- Explore adjacent skills
- Focus on transferable skills
Define a learning plan with milestones
The Thinking Ladder
10 Things Incredibly Likable People Never, Ever Do (and Why You Love Them for It)
- Don’t blame
- Don’t control
- Don’t try to impress
- Don’t cling
- Don’t interrupt
- Don’t whine
- Don’t criticize
- Don’t preach
- Don’t live in the past
- Don’t let fear hold them back
Seems like an all-round nice guy who respects others. Though I think criticism is good, it might not make you liked.
Hence it’s probably important to decide which things you want to be liked for and which you don’t.
BLUF: The Military Standard That Can Make Your Writing More Powerful
BLUF is a military communications acronym — it stands for “bottom line up front” — that’s designed to enforce speed and clarity in reports and emails.
In some ways, the trick to writing good content is assuming your own readers are just as busy.
Don’t
- Ask whether someone has time to chat or to answer a question. On its own, that forces a context switch with no immediate resolution.
- Offer slightly more context, but still not enough to make the request actionable for your recipient.
Example
Product
How to Assess Product Management Skills and Competencies?
How to be a GREAT Product Manager
In it, Ken Norton writes that hiring managers should do or look for the following:
- Hire all the smart people
- Strong technical background
- “Spidey-sense” product instincts and creativity
- Leadership that’s earned
- Ability to channel multiple points-of-view
- Give me someone who’s shipped something
3. Set up a process to get the hard data that is needed to support or justify decisions. Not only will others be pleasantly surprised, but they’ll also have a hard time refuting your recommendations.
4. Responsibility + Authority. This is achieved through credibility, commitment, communication, and courage.
5. You can have a significant positive impact on those teams if you understand what information people need, when they need it, and in what form they need it. If you can do that for them, they’ll be able to make educated decisions sooner, streamline their work and be more effective.
Focus on the key objectives you need to deliver and ensure those get done in a timely manner.
Product Roadmaps: Love, Hate (& Hate)
A roadmap is not a project plan. Far too often a roadmap gets confused for a project plan. A roadmap isn’t meant to be the detailed execution of specific projects. It’s not meant to commit to dates for releases or timelines for features. There is certainly a place for that level of specificity, but it isn’t the roadmap, especially as we look several quarters into the future.
- Focus on Outcomes
- Understand the Audience
- Keep it Clean
- Convey the Vision
- Keep it Updated
Ultimately a roadmap is about creating a shared understanding with our team, our users, and our stakeholders. It is not about creating a document in a specific format or with specific pieces.
Reimagining Experimentation Analysis at Netflix
The analysis reports tell us whether or not a new experience made statistically significant changes to relevant metrics, such as member behavior, or technical metrics that describe streaming video quality.
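To make "statistically significant" concrete, here is a generic two-proportion z-test using only the standard library. It is an assumption about what such a check could look like, not a description of Netflix's actual analysis, and the counts are invented:

```python
import math


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for a difference between two proportions.

    Returns (z, p_value) using the pooled standard error.
    """
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Standard normal CDF evaluated at |z|, via the error function.
    cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    p_value = 2 * (1 - cdf)
    return z, p_value


# Invented example: 10.0% of the control group vs 11.0% of the
# treatment group did the thing being measured.
z, p_value = two_proportion_z_test(1000, 10_000, 1100, 10_000)
significant = p_value < 0.05
```

The 0.05 threshold is a common convention, not something the article specifies.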