Data Drift
Data drift refers to the phenomenon where the statistical properties of a dataset used for machine learning or analysis change over time. This alteration can be due to various factors, such as shifts in data collection processes, changes in the underlying distribution of the data, or modifications in the environment from which the data originates. Detecting and addressing data drift is crucial to maintaining the performance and reliability of machine learning models and analytical systems.
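One way to make "the statistical properties change over time" concrete is to compare a reference sample (for example, the training data) against a more recent sample of the same feature. The sketch below uses the Population Stability Index (PSI), which is only one of several possible drift measures and is not prescribed by the definition above. The function name, the equal-width binning, and the small floor for empty bins are all illustrative assumptions.

```python
import math
import random


def population_stability_index(reference, current, bins=10):
    """PSI between two numeric samples, using equal-width bins over the reference range."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the reference is constant

    def proportions(sample):
        counts = [0] * bins
        for x in sample:
            idx = int((x - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp out-of-range values into the edge bins
            counts[idx] += 1
        # Assumed floor of 1e-6 so empty bins do not cause log(0) or division by zero
        return [max(c / len(sample), 1e-6) for c in counts]

    ref_p = proportions(reference)
    cur_p = proportions(current)
    # Each term is non-negative, so PSI is 0 only when the binned distributions match
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))


# Synthetic data for illustration only: same distribution vs. a shifted mean
rng = random.Random(0)
reference = [rng.gauss(0.0, 1.0) for _ in range(5000)]
same = [rng.gauss(0.0, 1.0) for _ in range(5000)]
shifted = [rng.gauss(1.0, 1.0) for _ in range(5000)]

psi_same = population_stability_index(reference, same)
psi_shifted = population_stability_index(reference, shifted)
print(f"PSI, same distribution:    {psi_same:.4f}")
print(f"PSI, shifted distribution: {psi_shifted:.4f}")
```

A larger PSI means the recent sample looks less like the reference. What counts as "too large" depends on the feature and the model, so any alert threshold would need to be chosen for the system at hand.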
Data Engineering with dbt
This is a book about data engineering, with a sprinkle of dbt as well. It is not a book on dbt; it is most definitely a book on data engineering. It contains data engineering knowledge and ways of working.
Data Science
What is data science? It is a bunch of different jobs lumped together and given the label of AI to make a company sound innovative.
Data Science the Hard Parts
This book dives into the difficult aspects of data science: the business value proposition, communication, and measuring impact. It discusses each of these topics and presents methods for handling them the right way.
Data Strategy
This book is about strategy, and data is the context in which strategy is discussed. It discusses things like the McKinsey data maturity model, but the main gist is the strategy. ‘Change is inevitable. … Change is constant.’ This idea is an important aspect of the entire book.
Database
In the context of business, everything is a database. Databases are the bedrock of how we design things nowadays.
DBT
dbt is an open-source command-line tool that enables data transformation and modeling in a structured and efficient manner. It allows data engineers and analysts to define and manage the data transformation pipeline using SQL queries. With dbt, you can write modular and reusable SQL code, called "models," which define the transformations required to convert raw data into structured and analysis-ready data. These models can be organized, tested, and documented within the dbt framework. dbt leverages the power of SQL and provides a layer of abstraction on top of the data warehouse, making it easier to develop, test, and maintain complex data transformations. It promotes best practices such as version control, testing, and documentation, enabling collaborative and maintainable data modeling workflows. dbt integrates with various data warehouses and can be used in conjunction with other data tools and orchestration platforms to create a robust and reliable data pipeline.
GCP
I have mostly worked with GCP on my own. It is the little brother of the
Getting Started with Streamlit for Data Science
This book is a welcoming introduction to a Python module that has seen rapid growth. It offers a brief overview of the application's capabilities and shows how its user-friendly nature makes it an inclusive tool for both new and experienced data scientists.
Hands-On Unsupervised Learning Using Python
This book is an introduction to unsupervised machine learning techniques and practices. It introduces methods of unsupervised learning for clustering, correlations and time series analysis. It analyses models and provides guidance on how to use them.
Interpretable Machine Learning
This book is about methods for understanding AI and data models, and about how to use the different ways of interpreting machine learning models. It lays out the basics and then dives into the different types of models and methods you can use. It separates the methods into those specific to a model or model family and those that are model-agnostic.
Programming Internals
I made this article to filter out a lot of the more computer sciency things in programming.