One weiRd tip Cut the time it takes to analyse data

R Has No Primitives

Some weeks ago Hadley tweeted this graphic about objects and names in R. Someone asked him to give a situation where this was important and he said:

I haven’t been able to figure that out. But you’ll make terrible predictions about performance unless you know


Penalised spline regression

Sometimes you don’t know the functional form of a regression relationship. In that case, penalised spline regression lets you model it without ending up with a ridiculously wiggly smooth function.
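To make the "wiggly" problem concrete, here is a minimal sketch on simulated data. It uses base R's `smooth.spline()` as a stand-in for a penalised spline fit; the full post may well use a different package, and the data, the `lambda` value and the object names here are all assumptions made for illustration.

```r
# Simulated data (illustrative only): a smooth relationship plus noise.
set.seed(1)
x <- seq(0, 10, length.out = 200)
y <- sin(x) + rnorm(length(x), sd = 0.3)

# Almost no penalty: the fitted curve chases the noise.
wiggly <- smooth.spline(x, y, lambda = 1e-9)

# Default behaviour: the penalty is chosen by generalised cross-validation.
penalised <- smooth.spline(x, y)

# Effective degrees of freedom: lower means a smoother fit.
c(wiggly = wiggly$df, penalised = penalised$df)

# Compare the two fits against the raw data.
plot(x, y, col = "grey60", pch = 16)
lines(predict(wiggly, x), col = "red")
lines(predict(penalised, x), col = "blue", lwd = 2)
```

The penalty trades fidelity to the data against roughness of the curve, so the penalised fit should use far fewer effective degrees of freedom than the nearly unpenalised one.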


Subsetting Dataframes by Column Name with Regular Expressions


Selecting columns of a dataframe with regular expressions.

Code Snippet/Console Buffer Yank

Let's make a test set of data. Column names that follow some sort of system will make this example easier to understand.

> CN.df <- expand.grid(LETTERS, month.abb)
> head(CN.df)
  Var1 Var2
1    A  Jan
2    B  Jan
3    C  Jan
4    D  Jan
5    E  Jan
6    F  Jan
> tail(CN.df)
    Var1 Var2
307    U  Dec
308    V  Dec
309    W  Dec
310    X  Dec
311    Y  Dec
312    Z  Dec
> CN.df$CN <- paste(CN.df$Var1, CN.df$Var2, sep = '_')
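The excerpt stops before the subsetting itself, so here is a rough sketch of where names of the form Letter_Month can lead. The `wide.df` data frame and the two patterns are hypothetical, and `grepl()` on `names()` is just one common way to do this, not necessarily the one the full post uses.

```r
# Hypothetical wide data frame whose column names follow the Letter_Month system.
col_names <- paste(rep(LETTERS[1:3], each = 3),
                   rep(month.abb[1:3], times = 3),
                   sep = "_")
wide.df <- as.data.frame(matrix(seq_len(2 * length(col_names)), nrow = 2,
                                dimnames = list(NULL, col_names)))

# Columns for letter "A": the name starts with "A_".
a_cols <- wide.df[, grepl("^A_", names(wide.df)), drop = FALSE]

# Columns for January or March: the name ends with "_Jan" or "_Mar".
jan_mar_cols <- wide.df[, grepl("_(Jan|Mar)$", names(wide.df)), drop = FALSE]

names(a_cols)
names(jan_mar_cols)
```

`drop = FALSE` keeps the result a data frame even when the pattern matches a single column.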

What the shist()?

More than cheap wordplay

I love hist(). It is both a go-to plot for data exploration and a really simple way to dazzle users of Microsoft Excel. base::hist() is fast, both to type and in execution, but its downfall is that you end up calling it many times in a row while you fumble for the right bin width. All that fumbling can kill the magic.

Enter shist() the shifting-histogram… or something… it sounded cool. shist() is a histogram I built from Hadley’s ggvis that lets you interactively select the bin width while it updates the frequencies in real time. This means you only need to plot at most twice: One for shape, two for pretty.
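For contrast, this is roughly what the fumbling looks like in base R: every candidate bin width means another call to `hist()`. The `hist_counts()` helper below is hypothetical and is not part of shist(); it only recomputes the frequencies for a given width, using the built-in `mtcars` data as an example.

```r
# Hypothetical helper (not part of shist()): bin counts for a given bin width.
hist_counts <- function(x, binwidth) {
  lo <- floor(min(x) / binwidth)
  hi <- max(ceiling(max(x) / binwidth), lo + 1)
  breaks <- (lo:hi) * binwidth  # breaks on multiples of binwidth, covering range(x)
  hist(x, breaks = breaks, plot = FALSE)$counts
}

x <- mtcars$wt

# Each new bin width is a fresh call; this is the loop a slider replaces.
coarse <- hist_counts(x, binwidth = 1)
fine <- hist_counts(x, binwidth = 0.25)

coarse
fine
```

An interactive bin-width control automates exactly this recomputation, which is why one exploratory plot can stand in for a whole series of hist() calls.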


Plotting reference maps from shapefiles using ggplot2

This is about plotting reference maps from shapefiles using ggplot2. But it’s not just about plotting reference maps per se; it’s about plotting the reference map over some sort of raster or other data layer, like you would in a GIS application.

I will show you the ggplot2 approach and how it avoids the problems inherent in other approaches.

You need these packages: rgdal, sp, ggplot2

library(rgdal)    # to read in the shapefile
library(sp)       # for Spatial* classes and coordinate projections
library(ggplot2)  # for visualising the data

To do what I have done with my data you will also need: gstat, dplyr

library(gstat)     # to support geostatistical stuff
library(dplyr)     # for aggregation of data