Google Trends with R

I spent some time figuring out how to access Google Trends from R. There appears to be a package for it, RGoogleTrends, as well as a blog post outlining how to do it. However, I could not quite get it to work: some parts of the RGoogleTrends package seemed to be missing or incomplete, or maybe Google changed the interface in the meantime. So I spent some time plugging together various bits and pieces from those two sources. The critical issue seems to be getting proper authentication with the Google service, as Google does not support (allow?) session-less calls.

The basic steps involved are:

  1. Log in to the Google authentication page using your credentials to create a valid session; this is done with a CurlHandle from the {RCurl} package.
  2. Once the login is established, we can call Google Trends with the desired query term q.
  3. We parse the output, which contains normalized search counts by week.
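
Step 1 depends on reading a cookie value (GALX) out of the raw response header returned by the login page. As a minimal offline sketch of that extraction, assuming a made-up header string (the cookie value is invented, and `extract_cookie` is a hypothetical helper, not part of RCurl), the idea can be tried without any network access:

```r
# Hypothetical helper: pull the value of a named cookie out of raw HTTP header text.
# Base R only; the header below is invented sample data, not real Google output.
extract_cookie <- function(header_text, name) {
  lines <- strsplit(header_text, "\r?\n")[[1]]
  prefix <- paste0("^Set-Cookie: ", name, "=")
  # Keep only the Set-Cookie line(s) for the requested cookie
  hits <- grep(prefix, lines, value = TRUE)
  if (length(hits) == 0) return(NA_character_)
  # Drop the "Set-Cookie: NAME=" prefix, then everything from the first ";" on
  value <- sub(prefix, "", hits[1])
  sub(";.*$", "", value)
}

sample_header <- paste(
  "HTTP/1.1 200 OK",
  "Content-Type: text/html; charset=UTF-8",
  "Set-Cookie: GALX=abc123XYZ;Path=/;Secure",
  "",
  sep = "\r\n"
)

galx <- extract_cookie(sample_header, "GALX")
print(galx)
```

Stripping the prefix and the attributes separately, rather than splitting on every `:`, `=` and `;`, keeps a value intact even if it happens to contain one of those characters.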

Here is the R code.

UPDATE 2013-11-04: After some feedback and some additional bug-fixing by Philippe Massicotte, I’ve updated the code. It should handle cookies a bit better now and work a bit more robustly. Thanks, Philippe.

############################################
##    Query GoogleTrends from R
##
## by Christoph Riedl, Northeastern University
## Additional help and bug-fixing re cookies by
## Philippe Massicotte Université du Québec à Trois-Rivières (UQTR)
############################################
 
 
# Load required libraries
library(RCurl)		# For getURL() and curl handle / cookie / google login
library(stringr)	# For str_trim() to trim whitespace from strings
 
# Google account settings
username <- "YOUR_NAME@gmail.com"
password <- "YOUR_PASSWORD"
 
# URLs
loginURL 		<- "https://accounts.google.com/accounts/ServiceLogin"
authenticateURL <- "https://accounts.google.com/accounts/ServiceLoginAuth"
trendsURL 		<- "http://www.google.com/trends/TrendsRepport?"
 
 
 
############################################
## This gets the GALX cookie which we need to pass back with the login form
############################################
getGALX <- function(curl) {
	# Collect the response (including headers) from the login page
	txt <- basicTextGatherer()
	curlPerform( url=loginURL, curl=curl, writefunction=txt$update, header=TRUE, ssl.verifypeer=FALSE )
 
	tmp <- txt$value()
 
	# Find the header line that sets the GALX cookie and extract its value
	val <- grep("Cookie: GALX", strsplit(tmp, "\n")[[1]], value = TRUE)
 
	return( strsplit( val, "[:=;]")[[1]][3]) 
}
 
 
############################################
## Function to perform Google login and get cookies ready
############################################
gLogin <- function(username, password) {
	ch <- getCurlHandle()
 
	ans <- (curlSetOpt(curl = ch,
                    ssl.verifypeer = FALSE,
                    useragent = getOption('HTTPUserAgent', "R"),
                    timeout = 60,         
                    followlocation = TRUE,
                    cookiejar = "./cookies",
                    cookiefile = ""))
 
	galx <- getGALX(ch)
	authenticatePage <- postForm(authenticateURL, .params=list(Email=username, Passwd=password, GALX=galx, PersistentCookie="yes", continue="http://www.google.com/trends"), curl=ch)
 
	authenticatePage2 <- getURL("http://www.google.com", curl=ch)
 
	# Note: this only confirms that the last request returned HTTP 200
	if(getCurlInfo(ch)$response.code == 200) {
		print("Google login successful!")
	} else {
		print("Google login failed!")
	}
	return(ch)
}
 
 
############################################
## Read data for a query
############################################
ch <- gLogin( username, password )
authenticatePage2 <- getURL("http://www.google.com", curl=ch)
res <- getForm(trendsURL, q="boston", content=1, export=1, graph="all_csv", curl=ch)
res
# Check if quota limit reached
if( grepl( "You have reached your quota limit", res ) ) {
	stop( "Quota limit reached; you should wait a while and try again later" )
}
 
# Parse response into a data frame
# We skip the first 32 rows, which contain the Google header; we then read 513 rows of weekly data up to the current date.
# These offsets depend on the layout of the export and may need adjusting.
x <- try( read.table(text=res, sep=",", col.names=c("Week", "TrendsCount"), skip=32, nrows=513) )
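
The fixed skip and nrows values above are tied to one particular export layout. As a hedged alternative sketch, the weekly block can be located by pattern instead of by position. The sample text below is invented to resemble an export with a header, weekly rows of the form `YYYY-MM-DD - YYYY-MM-DD,count`, and a trailing section; the real response layout is an assumption here and may differ, and `parse_weekly` is a hypothetical helper.

```r
# Hypothetical helper: keep only lines that look like "YYYY-MM-DD - YYYY-MM-DD,<count>"
# and parse them into a data frame. The line format is an assumption about the export.
parse_weekly <- function(res) {
  lines <- strsplit(res, "\r?\n")[[1]]
  pattern <- "^\\d{4}-\\d{2}-\\d{2} - \\d{4}-\\d{2}-\\d{2},\\d+$"
  weekly <- grep(pattern, lines, value = TRUE, perl = TRUE)
  if (length(weekly) == 0) stop("No weekly rows found in response")
  read.table(text = weekly, sep = ",", col.names = c("Week", "TrendsCount"),
             stringsAsFactors = FALSE)
}

# Invented sample data standing in for the text returned by the Trends query
sample_res <- paste(
  "Web Search interest: boston",
  "Worldwide; 2004 - present",
  "",
  "Interest over time",
  "Week,boston",
  "2004-01-04 - 2004-01-10,78",
  "2004-01-11 - 2004-01-17,80",
  "2004-01-18 - 2004-01-24,77",
  "",
  "Top regions for boston",
  "Region,boston",
  "United States,100",
  sep = "\n"
)

x <- parse_weekly(sample_res)
# Start date of each week as a Date, taken from the first 10 characters
x$WeekStart <- as.Date(substr(x$Week, 1, 10))
print(x)
```

Matching on the row format means the header length and the number of weeks no longer need to be known in advance, at the cost of silently ignoring any row that does not fit the assumed pattern.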
