Last year I built a pretty sweet web service in R as part of the day job. However, not being well-versed in stuff like object-oriented programming, I did not do the best job of making the flow of my program particularly clear or robust. It wouldn’t take multiple inputs properly and I found it tough to test. In spare moments, I took to cogitating on how to improve things.
I tried simply refactoring some of the functions but found my structure too cumbersome to allow much change. I tried starting afresh with an S4 system but was soon in a death spiral of class proliferation, with no experience of how to stop it. After dabbling with different methods, I was getting pretty frustrated – I want my code to be better and more maintainable!
Now I’m looking at magrittr.
About magrittr
magrittr was designed to better facilitate functional programming based on piping inputs from one function to another. It’s the same paradigm as the PowerShell pipe operator |.
This means you can more succinctly pass an input through various transformation steps (in contrast to my initial method) with a lot less code. The ability to add conditional functions or even new functions on the fly (aka lambda functions) with a similarly low code burden gives the added benefit of helping with branching logic.
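As a rough illustration of the points above (not code from the original service), here is a minimal sketch of a nested call rewritten as a pipe, plus a lambda built on the fly that branches on its input. The name halve_if_large and the threshold of 10 are made up for the example.

```r
library(magrittr)

# Nested calls have to be read from the inside out...
nested <- round(sqrt(sum(c(1, 4, 9))), 1)

# ...whereas the piped version reads left to right, one step at a time.
piped <- c(1, 4, 9) %>% sum %>% sqrt %>% round(1)

# A lambda built on the fly: the dot stands for whatever is piped in.
# (hypothetical name and threshold, purely for illustration)
halve_if_large <- . %>% { if (sum(.) > 10) . / 2 else . }

small <- c(1, 2, 3) %>% halve_if_large  # sum is 6, so passed through unchanged
large <- c(10, 20) %>% halve_if_large   # sum is 30, so every value is halved
```

The conditional lives inside the braces, so the branching logic stays in one small step rather than spreading through the rest of the flow.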
Example
Obviously, this is a contrived example but please bear with me! I want to take a subset of the iris data and produce a multiplied version of the figures.
- The first method is my old way
- The skeleton flow is how I mocked it together to make sure the overall logic was in place
- The new method is how I fleshed out the component functions to deliver the results
- The final file shows how you can build a function with multiple arguments and pass one main argument through – this is key for making conditional functions
I think there’s a massive increase in readability, succinctness and, ultimately, flexibility in the component functions. In terms of flows that work well in magrittr, I can highly recommend Hadley’s presentation on the topic.
https://gist.github.com/stephlocke/1749e0f9055947037ec2
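The gist's contents are not reproduced here, so the following is only a sketch of what the old and new styles might look like for the iris task described above. The helper names keep_species and multiply_by, the choice of setosa, and the factor of 2 are all assumptions for illustration.

```r
library(magrittr)

# Old style: an intermediate variable at every step.
iris_subset <- iris[iris$Species == "setosa", 1:4]
old_result  <- iris_subset * 2

# Piped style: small named steps chained together.
# (keep_species and multiply_by are hypothetical helpers, not from the gist)
keep_species <- function(df, species) df[df$Species == species, 1:4]
multiply_by  <- function(df, factor) df * factor

new_result <- iris %>%
  keep_species("setosa") %>%
  multiply_by(2)
```

Both versions should give the same data frame; the piped one just names each step and drops the temporary variables.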
Quirks noticed
I struggled with my first hurdle – I wanted to write a skeleton high-level process and then add the functional flesh bit by bit. This meant I initially wanted lots of functions which didn’t do anything but simply passed the data through unaffected. Finding a syntax for so many chainable do-nothing functions was tough. In the end I settled on . %>% data.table, as . %>% . would lead to recursion issues.
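A minimal sketch of the skeleton-first idea, under a couple of assumptions: the step names are invented, and base R's identity() stands in for data.table so the example runs without extra packages.

```r
library(magrittr)

# Placeholder steps that hand the data on unchanged.
# (hypothetical step names; identity() swapped in for data.table)
get_data    <- . %>% identity
clean_data  <- . %>% identity
report_data <- . %>% identity

# The high-level process can be written and run before any step is fleshed out.
skeleton_result <- iris %>% get_data %>% clean_data %>% report_data

# Fleshing out one step later leaves the rest of the chain untouched.
clean_data <- . %>% head(10)
fleshed_result <- iris %>% get_data %>% clean_data %>% report_data
```

Each placeholder is a one-step functional sequence, so it can be swapped for a real implementation without changing the line that describes the overall process.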
Additionally, I started off trying to make a function in the traditional syntax, myfnc <- function(x, y) { x }, as I wanted to have arguments that could direct the flow inside lambda statements. Unfortunately, this didn’t work too well as the arguments did not seem to stay in scope throughout the steps. I think this is something I need to do more RTFM / experimentation on.
Courtesy of magrittr’s author Stefan Milton Bache, I’m able to scratch that last quirk out as it was, as anticipated, an understanding issue on my part. Damn but I love open source when this sort of assistance happens!
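The fix itself isn't spelled out above, so this is only a guess at the general pattern: a traditionally defined function whose extra argument steers a lambda inside it. The values "double" and "none" for y are invented for the example.

```r
library(magrittr)

# x is the data that gets piped in; y directs the flow inside the lambda.
# (the "double" / "none" switch is a hypothetical illustration)
myfnc <- function(x, y) {
  x %>% { if (y == "double") . * 2 else . }
}

doubled   <- 1:3 %>% myfnc(y = "double")
untouched <- 1:3 %>% myfnc(y = "none")
```

Here y is visible inside the braces because the lambda is evaluated from within the body of myfnc, so the main argument flows through the pipe while the extra argument decides what happens to it.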