Markdown based web analytics? Rectangle your blog

Locke Data’s great blog is Markdown-based. What this means is that all blog posts exist as Markdown files: you can see all of them here. They then get rendered to html by some sort of magic cough blogdown cough we don’t need to fully understand here. For marketing efforts, I needed a census of existing blog posts along with some precious information. Here is how I got it, in other words here is how I rectangled the website GitHub repo and live version to serve our needs.

Note: This should be applicable to any Markdown-based blog!

What are you about, dear blog posts?

To find out what a blog post is about, I read its tags and categories, that live in the YAML header of each post, see for instance this one. Just a note, thank you, participants in this Stack Overflow thread.

Getting all blog posts names and path

I used the gh package to interact with GitHub V3 API.

# get link to all posts and their filename
posts <- gh::gh("/repos/:owner/:repo/contents/:path",
                owner = "lockedatapublished",
                repo = "blog",
                path = "content/posts")

gh_posts <- tibble::tibble(name = purrr::map_chr(posts, "name"),
                           path = purrr::map_chr(posts, "path"),
                           raw = purrr::map_chr(posts, "download_url"))

Here is the table I got:

library("magrittr")
gh_posts %>%
  head() %>%
  knitr::kable()
name path raw
2013-05-11-setting-up-wordpress-on-azure.md content/posts/2013-05-11-setting-up-wordpress-on-azure.md https://raw.githubusercontent.com/lockedatapublished/blog/master/content/posts/2013-05-11-setting-up-wordpress-on-azure.md
2013-05-12-objectless-chec-boxes-using-vba.md content/posts/2013-05-12-objectless-chec-boxes-using-vba.md https://raw.githubusercontent.com/lockedatapublished/blog/master/content/posts/2013-05-12-objectless-chec-boxes-using-vba.md
2013-05-19-time-to-go-home.md content/posts/2013-05-19-time-to-go-home.md https://raw.githubusercontent.com/lockedatapublished/blog/master/content/posts/2013-05-19-time-to-go-home.md
2013-05-24-center-across-selection.md content/posts/2013-05-24-center-across-selection.md https://raw.githubusercontent.com/lockedatapublished/blog/master/content/posts/2013-05-24-center-across-selection.md
2013-05-25-making-charts-with-conditionally-coloured-series.md content/posts/2013-05-25-making-charts-with-conditionally-coloured-series.md https://raw.githubusercontent.com/lockedatapublished/blog/master/content/posts/2013-05-25-making-charts-with-conditionally-coloured-series.md
2013-05-29-synchronising-schema-between-mssql-mysql-with-ssis.md content/posts/2013-05-29-synchronising-schema-between-mssql-mysql-with-ssis.md https://raw.githubusercontent.com/lockedatapublished/blog/master/content/posts/2013-05-29-synchronising-schema-between-mssql-mysql-with-ssis.md

There are 169 posts in this table.

Getting all blog post image links

In a blogdown blog, you do not need to be consistent with image naming, as long as you give the correct link inside your post. Images used on Steph’s blog live here and their names often reflect the blog post name, but not always. I thought it could be useful to have a table of all blog posts images. I wrote a function that downloads the content of each post and extract image links.

get_pics <- function(path){
    message(path)
    file <- gh::gh("/repos/:owner/:repo/contents/:path",
                   owner = "lockedatapublished",
                   repo = "blog",
                   path = path)
    content <- rawToChar(base64enc::base64decode(file$content))
    # get links to imgs
    img <- stringr::str_match(content, 'src=\\\".*?\\/img\\/(.*?)\"' )[,2]
    tibble::tibble(path = path,
                   img = img)
}

Let’s see what it does for one path.

get_pics("content/posts/2013-05-11-setting-up-wordpress-on-azure.md") %>%
  knitr::kable()
path img
content/posts/2013-05-11-setting-up-wordpress-on-azure.md azurescreenshot1_ujw0yl_finrxl.png

Ok then I simply needed to apply it to all posts.

pics <- purrr::map_df(gh_posts$path, get_pics)
gh_pics <- dplyr::left_join(gh_posts, pics, by = "path")
gh_pics <- dplyr::filter(gh_pics, !is.na(img))
readr::write_csv(gh_pics, path = "data/gh_imgs.csv")

Having this table, one could run some analysis of the number of images by post, extract pictures when promoting a post, and tidy a website. I think Steph’s filenames are good, but I could imagine renaming files based on the blog post they appear in if it had not been done previously (and changing the link inside posts obviously), but hey why clean if one can link the data anyway.

Getting all tags and categories

The code here is similar to the previous one but slightly more complex because I wrote the post content inside a temporary .yaml file in order to read it using rmarkdown::yaml_front_matter.

get_one_yaml <- function(path){
  print(path)
  file <- gh::gh("/repos/:owner/:repo/contents/:path",
                 owner = "lockedatapublished",
                 repo = "blog",
                 path = path)
  content <- rawToChar(base64enc::base64decode(file$content))
  
  # the yaml function didn't like this
  content <- stringr::str_replace_all(content, "“", "")
  content <- stringr::str_replace_all(content, "â€\u009d", "")
  write(content, "temporary.md")
  data <- rmarkdown::yaml_front_matter("temporary.md")
  file.remove("temporary.md")
  
  # is this an elegant solution? No :-)
  # but this way I'll get both categories and tags
  # and won't get issues if several categories/tags
  categories <- data$categories
  data$categories <- NULL
  data <- dplyr::as_data_frame(data)
  
  
  if("tags" %in% names(data)){

    data <- dplyr::mutate(data, value = TRUE)
    data <- tidyr::spread(data, tags, value, fill = FALSE)
    data <- dplyr::mutate(data, path = path)
  }
  
  
  if(!is.null(categories)){
    categories <- dplyr::tibble(categories = paste0("cat_", categories))
    categories <- dplyr::mutate(categories, value = TRUE)
    categories <- tidyr::spread(categories, categories, value, fill = FALSE)
  }
  
  data <- cbind(data, categories)
  data
}

I’ll illustrate this with one post:

get_one_yaml("content/posts/2013-05-11-setting-up-wordpress-on-azure.md") %>%
  knitr::kable()
## [1] "content/posts/2013-05-11-setting-up-wordpress-on-azure.md"
title author type date dsq_thread_id azure blogging ec2 wordpress path cat_Misc Technology
Setting up WordPress on Azure Steph post 2013-05-11T22:14:43+00:00 NA TRUE TRUE TRUE TRUE content/posts/2013-05-11-setting-up-wordpress-on-azure.md TRUE

I then used the function over all paths.

info <- purrr::map_df(gh_posts$path, get_one_yaml)
gh_posts <- dplyr::left_join(gh_posts, info, by = "path")

readr::write_csv(gh_posts, path = "data/gh_posts.csv")

Tags and categories can be useful to perform an action on blog posts depending on them (e.g., make a list of all posts related to X), and to analyse the use of tags and categories: what topics did Steph blog about over time? Coupled with traffic data, what topics are the most read?

So, I know a lot about blog posts now, but if I were to say read or webshoot them, where should I go?

Where do you live, dear blog posts?

Often, the URL of a blog post can be guessed based on its title, e.g. this one can be read here. But even if the transition from the Markdown file information to an URL is logical, it was best to get URLs from the in situ blog posts, and then join them to the blog post information collected previously, since some special characters got special treatment that I could not fully understand by looking at blogdown source code.

I first extracted all posts URLs from the website map.

library("magrittr")

# get links and tags
sitemap <- xml2::read_xml("https://itsalocke.com/blog/sitemap.xml") %>%
  xml2::as_list() %>%
  .$urlset

# probably re-inventing the wheel
get_one <- function(element, what){
  one <- unlist(element[[what]])
  if(is.null(one)){
    one <- ""
  }
  
  one
}

# tibble with everything
sitemap <- tibble::tibble(url = purrr::map_chr(sitemap, get_one, "loc"),
                       date = purrr::map_chr(sitemap, get_one, "lastmod"))

# only blog posts
blog <- dplyr::filter(sitemap, !stringr::str_detect(url, "tags\\/"))
blog <- dplyr::filter(blog, !stringr::str_detect(url, "categories\\/"))
blog <- dplyr::filter(blog, !stringr::str_detect(url, "statuses\\/"))
blog <- dplyr::filter(blog, url != "https://itsalocke.com/blog/stuff-i-read-this-week/")
blog <- dplyr::filter(blog, !stringr::str_detect(url, "https://itsalocke.com/blog/.*?\\/.*?\\/"))
blog <- dplyr::filter(blog, url != "https://itsalocke.com/blog/")
blog <- dplyr::filter(blog, url != "https://itsalocke.com/blog/posts/")

This is the resulting “sitemap”.

head(blog) %>%
  knitr::kable()
url date
https://itsalocke.com/blog/how-to-maraaverickfy-a-blog-post-without-even-reading-it/ 2018-02-12T11:49:40+00:00
https://itsalocke.com/blog/connecting-to-sql-server-on-shinyapps.io/ 2018-01-31T09:29:42+00:00
https://itsalocke.com/blog/year-2-of-locke-data/ 2018-01-29T00:00:00+00:00
https://itsalocke.com/blog/working-with-pdfs---scraping-the-pass-budget/ 2017-12-29T00:00:00+00:00
https://itsalocke.com/blog/using-blogdown-with-an-existing-hugo-site/ 2017-12-20T00:00:00+00:00
https://itsalocke.com/blog/data-manipulation-in-r/ 2017-12-18T21:29:42+00:00

Now how do I join it to the data previously collected? I first tried to reproduce what blogdown does to post titles. Note that in some cases Steph had to had a “slug” by hand when migrating her blog to blogdown, which is what I use when it’s available.

gh_info <- readr::read_csv("data/gh_posts.csv")
gh_info <- dplyr::filter(gh_info, !stringr::str_detect(name, "\\.Rmd"))

unique(gh_info$slug)
## [1] NA
## [2] "satrdays-voting-closes-may-31st"
## [3] "my-pass-summit2016-submissions-feedback"
## [4] "using-blogdown-with-an-existing-hugo-site"
## [5] "working-with-pdfs-scraping-the-pass-budget"
# https://github.com/rstudio/blogdown/blob/0c4c30dbfb3ae77b27594685902873d63c2894ad/R/utils.R#L277
dash_filename = function(string, pattern = '[^[:alnum:]^\\.]+') {
  tolower(string) %>%
    stringr::str_replace_all("â", "") %>%
    stringr::str_replace_all("DataOps.*? it.*?s a thing (honest)",
                             "dataops--its-a-thing-honest") %>%
    stringr::str_replace_all(pattern, '-') %>%
    stringr::str_replace_all('^-+|-+$', '')

}
gh_info <- dplyr::mutate(gh_info,
                         base = ifelse(!is.na(slug), slug, title),
                         base = dash_filename(base),
                         false_url = paste0("https://itsalocke.com/blog/",
                                      base, "/"))

Here are a few “false URLs” that I get. They’re often the right URLs, but not always!

tail(gh_info$false_url)
## [1] "https://itsalocke.com/blog/data-manipulation-in-r/"
## [2] "https://itsalocke.com/blog/using-blogdown-with-an-existing-hugo-site/"
## [3] "https://itsalocke.com/blog/working-with-pdfs-scraping-the-pass-budget/"
## [4] "https://itsalocke.com/blog/year-2-of-locke-data/"
## [5] "https://itsalocke.com/blog/connecting-to-sql-server-on-shinyapps.io/"
## [6] "https://itsalocke.com/blog/how-to-maraaverickfy-a-blog-post-without-even-reading-it/"

The cases in which they’re not the URL are often cases with double dashes for instance. In order to be quick, I decided to simply join them using string distance, because the false and right URLs will be quite similar anyway.

all_info <- fuzzyjoin::stringdist_left_join(blog, gh_info,
                                            by = c("url" = "false_url"),
                                            max_dist = 3)
all_info$url[(is.na(all_info$raw))]
## [1] "https://itsalocke.com/blog/being-an-organised-sponsor-sce-p3/"
all_info$title[duplicated(all_info$raw)]
## [1] "Shiny module design patterns: Pass module input to other modules"
## [2] "Shiny module design patterns: Pass module inputs to other modules"
## [3] "optiRum 0.37.1 now out"
## [4] "optiRum 0.37.3 now out"
readr::write_csv(all_info, path = "data/all_info_about_posts.csv")

So what are the posts that did not get mapped properly, in brief?

  • two very close announcements of a new version of optiRum. I can correct that by hand, but since URLs are needed to webshoot evergreen posts, I will probably ignore them.
  • two very close blog post titles about Shiny that I shall correct.

A taste of the usefulness of such data!

But hey here is what one gets from the website!

head(all_info) %>%
  knitr::kable()
url date.x name path raw title author type date.y dsq_thread_id azure blogging ec2 wordpress cat_Misc Technology check boxes Excel macros tick vba merge ribbon chart datawarehouse mssql mysql ssis cat_Microsoft Data Platform format presentation sql server statuspost user group tutorial cat_Community marketing sqlrelay windows analysis r cat_R presentations SSRS blog best practices continuous integration ssas unit testing quick tip sql fundamentals code hacks lookup VB speaking socialauthorbio_custom_checkbox_meta data analysis r basics web scraping cat_Data Science presenting photoshop productivity knitr rmarkdown conferences tip mentoring professional development magrittr software development in r stuff I read this week reports shiny docker security microsoft business data visualisation open source twitter git httr source control tfs tfsR visual studio online visual studio team services cat_DataOps machine learning api blob storage stream analytics editing managing a team software development data.table fonts visual studio zoomit gini coefficient logistic regression optiRum statistics latex test coverage travis-ci agile azure data factory auto deploying r documentation documentation github spacious_page_layout dataops dlm wdt wit mango Community satrday enclosure anchor model data modelling medrianchor sixth normal form r consortium dell icons resolution scaling text xps13 surviving business intelligence microsoft edge pdf linux from windows pageant plink putty ssh ssh tunnel powerbi sql tricks sqlcardiff haveibeenpwned hibpwned mockaroo shiny design patterns odbc slug azureml chocolatey censornet data breaches powershell feedback pass experts diversity elitism aws azure automation etl failures lightning talks novalite_template sponsor sponsoring community events sponsorship basics asteroids game gamemaker cat_Uncategorized sql relay elections ssl ssdt slack css hugo call for contributors gwdp opportunity bash linux scripts attendee mvp mvp summit azure functions data science data mining process time series python sentiment analysis cardiff logistic regressions video magick rocker rstudio training get started coveralls troubleshooting cran datasauRus linear regression bot services bots qnamaker skype microsoft r server temporal tables executive briefing microsoft cognitive services purrr image fundamentals data manipulation data wrangling tidyverse blogdown Locke-Data freetds shinyapps base false_url
https://itsalocke.com/blog/how-to-maraaverickfy-a-blog-post-without-even-reading-it/ 2018-02-12T11:49:40+00:00 2018-02-12-maraaverickfyer.md content/posts/2018-02-12-maraaverickfyer.md https://raw.githubusercontent.com/lockedatapublished/blog/master/content/posts/2018-02-12-maraaverickfyer.md How to maraaverickfy a blog post without even reading it maelle post 2018-02-12 11:49:40 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA TRUE TRUE NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA TRUE NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA default_layout NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA how-to-maraaverickfy-a-blog-post-without-even-reading-it https://itsalocke.com/blog/how-to-maraaverickfy-a-blog-post-without-even-reading-it/
https://itsalocke.com/blog/connecting-to-sql-server-on-shinyapps.io/ 2018-01-31T09:29:42+00:00 2018-01-31-freetds.md content/posts/2018-01-31-freetds.md https://raw.githubusercontent.com/lockedatapublished/blog/master/content/posts/2018-01-31-freetds.md Connecting to SQL Server on shinyapps.io steph post 2018-01-31 09:29:42 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA TRUE NA NA NA NA NA NA NA NA TRUE TRUE NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA default_layout NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA img/WorkingWithR.png NA NA NA NA NA NA TRUE TRUE connecting-to-sql-server-on-shinyapps.io https://itsalocke.com/blog/connecting-to-sql-server-on-shinyapps.io/
https://itsalocke.com/blog/year-2-of-locke-data/ 2018-01-29T00:00:00+00:00 2018-01-29-locke-data-update.md content/posts/2018-01-29-locke-data-update.md https://raw.githubusercontent.com/lockedatapublished/blog/master/content/posts/2018-01-29-locke-data-update.md Year 2 of Locke Data Steph NA 2018-01-29 00:00:00 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA TRUE NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA TRUE NA NA year-2-of-locke-data https://itsalocke.com/blog/year-2-of-locke-data/
https://itsalocke.com/blog/working-with-pdfs---scraping-the-pass-budget/ 2017-12-29T00:00:00+00:00 2017-12-29-working-with-pdfs-scraping-the-pass-budget.md content/posts/2017-12-29-working-with-pdfs-scraping-the-pass-budget.md https://raw.githubusercontent.com/lockedatapublished/blog/master/content/posts/2017-12-29-working-with-pdfs-scraping-the-pass-budget.md Working with PDFs - scraping the PASS budget Steph NA 2017-12-29 00:00:00 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA TRUE NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA TRUE NA NA NA NA NA TRUE NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA TRUE NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA working-with-pdfs-scraping-the-pass-budget NA NA NA NA NA NA TRUE NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA working-with-pdfs-scraping-the-pass-budget https://itsalocke.com/blog/working-with-pdfs-scraping-the-pass-budget/
https://itsalocke.com/blog/using-blogdown-with-an-existing-hugo-site/ 2017-12-20T00:00:00+00:00 2017-12-20-using-blogdown-with-an-existing-hugo-site.md content/posts/2017-12-20-using-blogdown-with-an-existing-hugo-site.md https://raw.githubusercontent.com/lockedatapublished/blog/master/content/posts/2017-12-20-using-blogdown-with-an-existing-hugo-site.md Using blogdown with an existing Hugo site steph NA 2017-12-20 00:00:00 NA NA TRUE NA NA TRUE NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA TRUE NA NA NA NA NA TRUE NA NA TRUE NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA using-blogdown-with-an-existing-hugo-site NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA TRUE NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA TRUE NA NA NA using-blogdown-with-an-existing-hugo-site https://itsalocke.com/blog/using-blogdown-with-an-existing-hugo-site/
https://itsalocke.com/blog/data-manipulation-in-r/ 2017-12-18T21:29:42+00:00 2017-12-18-datamanipulationinr.md content/posts/2017-12-18-datamanipulationinr.md https://raw.githubusercontent.com/lockedatapublished/blog/master/content/posts/2017-12-18-datamanipulationinr.md Data Manipulation in R steph post 2017-12-18 21:29:42 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA TRUE TRUE NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA default_layout NA NA NA NA NA TRUE NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA TRUE TRUE TRUE TRUE NA NA NA NA data-manipulation-in-r https://itsalocke.com/blog/data-manipulation-in-r/

When were blog posts published?

library("ggplot2")
all_info <- dplyr::mutate(all_info, date = anytime::anytime(date.x))
ggplot(all_info) +
  geom_point(aes(date), y = 0.5, col = "#2165B6", size = 0.9) +
  hrbrthemes::theme_ipsum(grid = "Y")

So posting, from this crude viz, look fairly regular with a few gaps.

One could also look at the categories.

all_info <- dplyr::select(all_info, - base, - false_url)
categories_info <- all_info %>%
  tidyr::gather("category", "value", 11:ncol(all_info)) %>%
  dplyr::filter(!is.na(value)) %>%
  dplyr::filter(stringr::str_detect(category, "cat\\_")) %>%
  dplyr::mutate(category = stringr::str_replace(category, "cat\\_", ""))
categories <- categories_info %>%
  dplyr::count(category, sort = TRUE)

knitr::kable(categories)
category n
R 89
Community 61
Microsoft Data Platform 42
Data Science 36
Misc Technology 29
DataOps 25
Uncategorized 3

And when where these categories used?

categories_info <- dplyr::mutate(categories_info, date = anytime::anytime(date.x))
ggplot(categories_info) +
  geom_point(aes(date, category), col = "#2165B6", size = 0.9) +
  hrbrthemes::theme_ipsum(grid = "Y")

In the most recent period, R and Data Science seem to be getting more love than the other categories.

Let’s see what other/more exciting things we can do with this data, to help make Locke Data blog even better and more read!