Data Science

Introducing Python for data scientists - Pt2

by Leo

23 Mar 2018

3 minute read

This second part of the ‘Python for Data Scientists’ post covers the specifics of Python for data scientists. Part 1 covers Python more generally and can be found here.

Introducing Python for data scientists - Pt1

by Leo

15 Mar 2018

3 minute read

If you have decided you want to learn Python but you're not sure where to start, this post will point you in the right direction.

Understanding rolling calculations in R

by steph

7 Mar 2018

6 minute read

In R, we often need to get values or perform calculations using information that is not on the same row. We need either to retrieve specific values or to produce some sort of aggregation. This post explores some of the options and explains the weird (to me at least!) behaviours around rolling calculations and alignments.

Working with PDFs - scraping the PASS budget

by steph

29 Dec 2017

6 minute read

Using tabulizer we’re able to extract information from PDFs, so it comes in really handy when people publish data as a PDF! This post takes you through using tabulizer and tidyverse packages to scrape and clean up some budget data from PASS, an association for the Microsoft Data Platform community. The main goal is to show some of the tricks of the data wrangling trade that you may need when you scrape data from PDFs.