
20 docs tagged with "data-science"


Confident Learning

As you can see from the image above, confident learning estimates how likely each example is to be labeled correctly, based on the confidence of the model. Every class j has its own threshold confidence t_j (for example t_dog, t_fox, t_cow). If the model's predicted probability for a class is above that class's threshold but the example's given label is a different class, we predict that the label is wrong.
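Here is a minimal sketch of that thresholding rule in Python with NumPy. It makes a few assumptions. First, t_j is taken to be the model's average predicted probability for class j over the examples labeled j, which is the usual definition in confident learning. Second, when several classes clear their thresholds, the most probable one is treated as the likely true label. Third, every class has at least one labeled example. The function names and the toy data (classes 0 = dog, 1 = fox, 2 = cow) are hypothetical, and this is not the API of any library.

```python
import numpy as np


def per_class_thresholds(pred_probs: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """t_j = mean predicted probability of class j over examples whose given label is j.

    Assumes every class has at least one labeled example.
    """
    n_classes = pred_probs.shape[1]
    return np.array([pred_probs[labels == j, j].mean() for j in range(n_classes)])


def find_label_issues(pred_probs: np.ndarray, labels: np.ndarray):
    """Flag examples whose confidently predicted class differs from the given label."""
    thresholds = per_class_thresholds(pred_probs, labels)
    issues = np.zeros(len(labels), dtype=bool)
    for i, (probs, given) in enumerate(zip(pred_probs, labels)):
        # Classes whose predicted probability reaches that class's own threshold t_j.
        confident = np.flatnonzero(probs >= thresholds)
        if confident.size == 0:
            continue  # the model is not confident about any class: leave the label alone
        # Among the confident classes, take the most probable one as the likely true label.
        likely_true = confident[np.argmax(probs[confident])]
        issues[i] = likely_true != given
    return thresholds, issues


# Hypothetical toy data: rows are examples, columns are P(dog), P(fox), P(cow).
pred_probs = np.array([
    [0.90, 0.05, 0.05],  # labeled dog
    [0.80, 0.10, 0.10],  # labeled dog
    [0.10, 0.80, 0.10],  # labeled fox
    [0.20, 0.70, 0.10],  # labeled fox
    [0.10, 0.10, 0.80],  # labeled cow
    [0.88, 0.02, 0.10],  # labeled cow, but the model is confident it is a dog
])
labels = np.array([0, 0, 1, 1, 2, 2])

thresholds, issues = find_label_issues(pred_probs, labels)
print(thresholds)             # per-class thresholds t_dog, t_fox, t_cow
print(np.flatnonzero(issues))  # indices of likely mislabeled examples
```

With this data the thresholds come out at roughly 0.85, 0.75 and 0.45. Only the last example is flagged: its dog probability (0.88) clears t_dog, but it was labeled cow. Note that the mislabeled example also drags t_cow down, which is one reason per-class thresholds are used instead of a single global cutoff.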

Data Cleaning

Data Cleaning is the process of turning the data you have into data that is usable. It is, for lack of a better term, the fight against entropy in the data domain.

Data Drift

Data drift refers to the phenomenon where the statistical properties of a dataset used for machine learning or analysis change over time. This alteration can be due to various factors, such as shifts in data collection processes, changes in the underlying distribution of the data, or modifications in the environment from which the data originates. Detecting and addressing data drift is crucial to maintaining the performance and reliability of machine learning models and analytical systems.
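One common way to detect drift in a single numeric feature is to compare its distribution in a reference window (for example, the training data) with a recent window using the two-sample Kolmogorov–Smirnov (KS) statistic. The KS statistic is the largest gap between the two empirical CDFs. Below is a minimal NumPy sketch of that idea. The function names, the 0.05 significance level, and the simulated data are hypothetical choices, and the critical value uses the standard large-sample approximation c(α)·sqrt((n+m)/(n·m)). SciPy's `scipy.stats.ks_2samp` offers a library implementation of the same test.

```python
import numpy as np


def ks_statistic(reference: np.ndarray, current: np.ndarray) -> float:
    """Largest absolute difference between the two empirical CDFs."""
    reference = np.sort(reference)
    current = np.sort(current)
    # The ECDF difference only changes at observed values, so checking them all suffices.
    grid = np.concatenate([reference, current])
    cdf_ref = np.searchsorted(reference, grid, side="right") / reference.size
    cdf_cur = np.searchsorted(current, grid, side="right") / current.size
    return float(np.max(np.abs(cdf_ref - cdf_cur)))


def detect_drift(reference: np.ndarray, current: np.ndarray, alpha: float = 0.05):
    """Flag drift when the KS statistic exceeds the asymptotic critical value at level alpha."""
    n, m = reference.size, current.size
    c_alpha = np.sqrt(-np.log(alpha / 2) / 2)  # about 1.358 for alpha = 0.05
    critical = c_alpha * np.sqrt((n + m) / (n * m))
    stat = ks_statistic(reference, current)
    return stat, float(critical), stat > critical


# Simulated example: the "current" feature values have shifted by half a standard deviation.
rng = np.random.default_rng(42)
reference = rng.normal(loc=0.0, scale=1.0, size=2000)
current = rng.normal(loc=0.5, scale=1.0, size=2000)

stat, critical, drifted = detect_drift(reference, current)
print(f"KS={stat:.3f} critical={critical:.3f} drift={drifted}")
```

In practice a check like this would run per feature on a schedule, and a flagged feature would trigger an investigation or retraining rather than an automatic fix. With large samples even tiny, harmless shifts become statistically significant, so the threshold is a judgment call.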

Data Science

What is data science? It is a collection of different jobs bundled together and given the title of AI to make a company sound innovative.

Data Science Project Start-Up Phase

The start-up phase of a data science project is, for me, one of the most exciting parts of the project, but it is also one of the most unclear phases. To have a fighting chance of making it to production, there are several factors that are extremely important and need to be addressed.

Data Science the Hard Parts

This book dives into the difficult aspects of data science: the business value proposition, communication, and measuring impact. It discusses each of these topics and presents methods for handling them the right way.

Designing Machine Learning Systems

This book covers the fundamentals of designing machine learning systems. It goes through the entire lifecycle of a machine learning system and then discusses the ecosystem and the challenges and cases that need to be considered.


Evaluation is one of the most important aspects of machine learning development. It is the craft of understanding the model and how it works.

Feature Engineering for Machine Learning

This book is about how to create features for machine learning models and use them in those models. It covers natural language text, tabular data, and image data, and it discusses how to apply good engineering practices to feature engineering.

Fundamentals of Data Engineering

This book covers the fundamentals of data engineering and how to solve data engineering problems without going too much into the details of programming. It introduces concepts such as data warehouses, Kafka, data pipelines, and ETL.

Machine Learning Design Patterns

This book presents machine learning design patterns and discusses the machine learning concepts behind them. It is therefore a good reminder of the core concepts and tenets of machine learning.

ML Design Sprint

The ML Design Sprint is a modification of the design sprint workshop, where the goal is to give the project relevant context on the problem, the data, and the resources available. The goal of the ML Design Sprint is to decide on the goal of the model, the input features of the model, and how the model should be evaluated. The design sprint brings together Subject Matter Experts, Users, Product Owners, Data Scientists, and ML Engineers to quickly understand the problem, the potential solutions, and the risks. By bringing the analysts and experts together, the ML Design Sprint should shorten the scoping and exploratory analysis phases.

Practical MLOps

This is a no-nonsense book about practical MLOps and how you should approach it to solve business problems. The book takes an even more hardline approach to automation and focuses on the concept of Kaizen ML: continuous improvement, striving to make the feedback loop ever shorter and the process increasingly seamless.