E-book: Principles of data wrangling
Through the last decades of the twentieth century and into the twenty-first, data was largely a medium for bottom-line accounting: making sure that the books were balanced, the rules were followed, and the right numbers could be rolled up for executive decision-making.
Attitudes toward data have changed radically in the past decade, as new people, processes, and technologies have come forward to define the hallmarks of a data-driven organisation.
Today’s data-driven organisations have analysts working broadly across departments to find creative ways to use data. It is an era in which new mantras like “extracting signal from the noise” capture a different attitude: one of agile experimentation and exploitation of large, diverse sources of data.
The phrase data wrangling, born in the modern context of agile analytics, is meant to describe the lion’s share of the time people spend working with data. It represents much of the analyst’s professional process: it captures activities like understanding what data is available; choosing what data to use and at what level of detail; understanding how to meaningfully combine multiple sources of data; and deciding how to distill the results to a size and shape that can drive downstream analysis.
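The steps above — combining multiple sources and distilling the result into a shape that drives downstream analysis — can be sketched in a few lines. This is a minimal illustration using hypothetical in-memory records; the datasets, field names, and aggregation are invented for the example and are not drawn from the e-book.

```python
# Two hypothetical raw sources an analyst might start from.
orders = [
    {"customer_id": 1, "amount": 120.0},
    {"customer_id": 2, "amount": 75.5},
    {"customer_id": 1, "amount": 30.0},
]
customers = [
    {"customer_id": 1, "region": "EMEA"},
    {"customer_id": 2, "region": "APAC"},
]

# Meaningfully combine the sources: join each order to its customer's region.
region_by_id = {c["customer_id"]: c["region"] for c in customers}
joined = [
    {"region": region_by_id[o["customer_id"]], "amount": o["amount"]}
    for o in orders
]

# Distill to a size and shape fit for downstream analysis:
# total order amount per region.
totals = {}
for row in joined:
    totals[row["region"]] = totals.get(row["region"], 0.0) + row["amount"]

print(totals)  # {'EMEA': 150.0, 'APAC': 75.5}
```

Real wrangling work replaces the literal dictionaries with files, databases, or APIs, but the shape of the task — choose, join, aggregate — stays the same.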
Trifacta has been exploring these issues with data-centric practitioners of various stripes. This e-book presents their effort to wrangle the lessons learned into a coherent overview, with a specific focus on the newer, quickly growing agile analytic processes in data-driven organisations.
You can download the e-book here.