Data Cleaning

Data cleaning is the process of turning the data you have into data that is usable. It is, for lack of a better term, the fight against entropy in the data domain.

Domain Knowledge

To be successful at data cleaning, understanding the subject area behind the data is paramount. This is a good chance to team up with domain experts and draw on their intricate understanding of the business cases.

Approach to Cleaning and Understanding Data

I like this approach, as it has worked well on some of my previous projects:



  • I always think back to the quote from Lester Freamon in the first season of The Wire, the "this is the job" quote.
  • Always write down things that you find interesting, weird, or out of place. These notes are a great starting point for later discussions with stakeholders.
  • Data versus