Using purrr with APIs – revamping my code

I wrote a little while back about using Microsoft Cognitive Services APIs with R, first to detect the language of pieces of text and then to run sentiment analysis on them. I wasn’t too happy with some of the code, as it was very inelegant. I knew I could code better than I had, especially as I’ve been doing a lot more work with purrr recently. However, it had sat in drafts for a while. Then David Smith kindly posted about the process I used, which meant I had to get this nicer version of my code out ASAP!

Get the complete code in this gist.

Prerequisites

Setup

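library(httr)     # POST() and add_headers() for the API calls
library(jsonlite) # toJSON() to build the request bodies
library(dplyr)    # %>%, mutate(), select() and left_join()
library(purrr)    # flatten_df() to unpack the nested JSON responses

cogapikey<-"XXX"  # replace with your own Text Analytics API key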

text=c("is this english?"
       ,"tak er der mere kage"
       ,"merci beaucoup"
       ,"guten morgen"
       ,"bonjour"
       ,"merde"
       ,"That's terrible"
       ,"R is awesome")

# Put data in an object that converts to the expected schema for the API
tibble(text) %>% 
  mutate(id=row_number()) ->
  textdf

textdf %>% 
  list(documents=.) ->
  mydata
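To see the schema the API actually receives, it can help to serialise a cut-down version of mydata. This is a minimal sketch using two of the example texts; it only needs jsonlite and makes no API call.

```r
library(jsonlite)

# Two of the example texts with sequential ids, mirroring how textdf is built
docs <- data.frame(text = c("bonjour", "merde"), id = 1:2)

# toJSON() writes a data frame row by row, so wrapping it in a named list
# produces the {"documents": [...]} object the API expects
body <- toJSON(list(documents = docs))
print(body)
# {"documents":[{"text":"bonjour","id":1},{"text":"merde","id":2}]}
```

Each row becomes one JSON object, which is why the whole request can be assembled with ordinary data frame verbs before the final toJSON() call.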

Language detection

We need to identify the most likely language of each piece of text, because the sentiment analysis API needs that language code alongside the text to give decent results.

cogapi<-"https://westus.api.cognitive.microsoft.com/text/analytics/v2.0/languages?numberOfLanguagesToDetect=1"

cogapi %>% 
  POST(add_headers(`Ocp-Apim-Subscription-Key`=cogapikey),
       body=toJSON(mydata)) ->
  response

# Process response
response %>% 
  content() %>%
  flatten_df() %>% 
  select(detectedLanguages) %>% 
  flatten_df()->
  respframe

# Add the language code; this relies on the API returning documents in the order they were sent
textdf %>% 
  mutate(language= respframe$iso6391Name) ->
  textdf
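The positional assignment above assumes the API returns documents in the same order they were sent. A more defensive alternative is to match on id. The sketch below uses a hand-built stand-in for the parsed response: its nesting is assumed from the v2.0 languages response format, the confidence values are placeholders, and the documents are deliberately out of order.

```r
library(purrr)

# Hypothetical parsed languages response (assumed v2.0 shape), out of order
parsed <- list(
  documents = list(
    list(id = "2", detectedLanguages = list(list(name = "Danish", iso6391Name = "da", score = 1))),
    list(id = "1", detectedLanguages = list(list(name = "English", iso6391Name = "en", score = 1)))
  ),
  errors = list()
)

# A list path walks the hierarchy: detectedLanguages -> first entry -> iso6391Name
languages <- map_chr(parsed$documents, list("detectedLanguages", 1, "iso6391Name"))
ids <- map_chr(parsed$documents, "id")

# Name each code by its document id, then look codes up by id, not position
lang_by_id <- set_names(languages, ids)
print(lang_by_id[c("1", "2")])
```

Looking codes up with `lang_by_id[as.character(textdf$id)]` would then give the right language for each row even if the service reordered its output.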

Sentiment analysis

With an ID, the text, and a language code for each document, we can now request that the sentiment of our text be analysed.

# New info
mydata<-list(documents = textdf)

# New endpoint
cogapi<-"https://westus.api.cognitive.microsoft.com/text/analytics/v2.0/sentiment"

# Construct a request
cogapi %>% 
  POST(add_headers(`Ocp-Apim-Subscription-Key`=cogapikey),
       body=toJSON(mydata)) ->
  response

# Process response; ids come back as strings, so convert them to numbers for the join
response %>% 
  content() %>%
  flatten_df() %>% 
  mutate(id=as.numeric(id))-> 
  respframe

# Combine
textdf %>%
  left_join(respframe, by = "id") ->
  textdf
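In the Original results at the end of this post, the Danish and German texts have no score. Presumably the service listed those documents under the response's errors element rather than under documents, and left_join() then filled in NA. The sketch below shows both halves with purrr and dplyr on a made-up response; the shape is assumed from the v2.0 sentiment format, and the texts, scores and error message are placeholders.

```r
library(purrr)
library(dplyr)

# Hypothetical parsed sentiment response (assumed v2.0 shape):
# document 2 could not be scored, so it appears under errors instead
parsed <- list(
  documents = list(
    list(score = 0.9, id = "1"),
    list(score = 0.1, id = "3")
  ),
  errors = list(
    list(id = "2", message = "placeholder error message")
  )
)

# One row per scored document; ids arrive as strings, so convert them
scores <- data.frame(
  id = as.numeric(map_chr(parsed$documents, "id")),
  score = map_dbl(parsed$documents, "score")
)

# Ids the service reported as errors
failed_ids <- as.numeric(map_chr(parsed$errors, "id"))

# Placeholder texts standing in for textdf
texts <- data.frame(id = c(1, 2, 3), text = c("first text", "second text", "third text"))

# left_join() keeps every text; the unscored one gets NA
result <- left_join(texts, scores, by = "id")
print(result)
print(failed_ids)
```

Pulling failed_ids out of the errors branch makes it easy to report which documents were skipped, rather than only seeing NA in the joined table.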

And… et voilà! A multi-language dataset with the language identified and the sentiment scored, using purrr for easier-to-read code.

Using purrr with APIs makes code nicer and more elegant, because it makes navigating the nested hierarchies in JSON responses much easier. I feel much better about this code now!
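As a small illustration of that point, purrr's pluck() follows a path into nested lists and returns a default when any step along the path is missing, instead of failing. The two documents below are hypothetical; the second has no detectedLanguages branch at all.

```r
library(purrr)

# Hypothetical documents: the second lacks the detectedLanguages branch
docs <- list(
  list(id = "1", detectedLanguages = list(list(iso6391Name = "fr"))),
  list(id = "2")
)

# pluck() walks detectedLanguages -> first entry -> iso6391Name,
# returning NA when any step along that path is missing
codes <- map_chr(docs, ~ pluck(.x, "detectedLanguages", 1, "iso6391Name",
                               .default = NA_character_))
print(codes)
# [1] "fr" NA
```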

Original

id  language  text                  score
1   en        is this english?      0.2852910
2   da        tak er der mere kage  NA
3   fr        merci beaucoup        0.8121097
4   de        guten morgen          NA
5   fr        bonjour               0.8118965
6   fr        merde                 0.0515683
7   en        That’s terrible       0.1738841
8   en        R is awesome          0.9546152

Revised

text                  id  language  score
is this english?      1   en        0.2265771
tak er der mere kage  2   da        0.7455934
merci beaucoup        3   fr        0.8121097
guten morgen          4   de        0.8581840
bonjour               5   fr        0.8118965
merde                 6   fr        0.0515683
That’s terrible       7   en        0.0068665
R is awesome          8   en        0.9973871

Interestingly, the scores for English have not stayed the same, while the French ones are unchanged. For instance, Microsoft now sees “R is awesome” in a much more positive light. It’s also great to see that German and Danish are now supported!
