This second part of the ‘Python for Data Scientists’ post covers the specifics of Python for data science. Part 1 covers Python more generally and can be found here.
Data Science
If you have decided you want to learn Python but you're not sure where to start, then this post will point you in the right direction.
In R, we often need to get values from, or perform calculations on, information that is not on the same row. We either need to retrieve specific values or produce some sort of aggregation. This post explores some of the options and explains the weird (to me at least!) behaviours around rolling calculations and alignments.
Using tabulizer we’re able to extract information from PDFs, so it comes in really handy when people publish data as a PDF! This post takes you through using the tabulizer and tidyverse packages to scrape and clean up some budget data from PASS, an association for the Microsoft Data Platform community. The goal is mainly to show some of the tricks of the data wrangling trade that you may need to utilise when you scrape data from PDFs.