This notebook is part of a GitHub repository: https://github.com/pessini/SFI-Grants-and-Awards
MIT Licensed
Author: Leandro Pessini

SFI - Awards Distribution Analysis

An overview of science funding distribution in Ireland

1- Introduction

This report provides an overview of awards granted by Science Foundation Ireland (SFI), the national foundation for investment in scientific and engineering research. The data covers the period from 2000 to 2019.

The Agreed Programme for Government, published in June 2002, provided for establishing SFI as a separate legal entity. In July 2003, SFI was established on a statutory basis under the Industrial Development (Science Foundation Ireland) Act, 2003.

SFI provides awards to support scientists and engineers working in the fields of science and engineering that underpin biotechnology, information and communications technology and sustainable energy and energy-efficient technologies.

The focus of this report is on research funding and geographical distribution of grant awardees.

Audience

A core principle of data analysis is understanding your audience before designing your visualization. It is important to match your visualization to your viewer’s information needs.

To provide context, this report is framed as a hypothetical presentation to the SFI Board. The Board's responsibilities include reviewing the Agency's strategies and major plans of action. One of its main functions is to set the Agency's direction and decide how resources are allocated.

The dataset used is Science Foundation Ireland Grant Commitments, which details all STEM (science, technology, engineering and maths) research projects funded by SFI since its foundation in 2000. For more information, see the Data Dictionary.

The dataset is provided by Ireland's Open Data Portal, which holds public data from Irish public sector bodies in areas such as agriculture, economy, housing and transportation.

Libraries

In [2]:
# Change the default plot size
options(repr.plot.width=15, repr.plot.height=10)
# Suppress warnings
options(warn=-1)
# Suppress summarise info
options(dplyr.summarise.inform = FALSE)
In [20]:
# Check if the packages that we need are installed
want = c("dplyr", "ggplot2", "ggthemes", "gghighlight", 
         "grid", "foreign", "scales", "ggpubr", "forcats", 
         "stringr", "lubridate")
have = want %in% rownames(installed.packages())
# Install any missing packages
if ( any(!have) ) { install.packages( want[!have] ) }
# Load the packages
junk <- lapply(want, library, character.only = TRUE)
# Remove the objects we created
rm(have, want, junk)
In [4]:
# Importing dataset
sfi.grants <- read.csv('../data/Open-Data-Final.csv')

# Checking dataset's structure
head(sfi.grants)
A data.frame: 6 × 13

| | Proposal.ID | Programme.Name | Sub.programme.Name | Supplement | Lead.Applicant | ORCID | Research.Body | Funder.Name | Crossref.Funder.Registry.ID | Proposal.Title | Start.Date | End.Date | Revised.Total.Commitment |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | <chr> | <chr> | <chr> | <chr> | <chr> | <chr> | <chr> | <chr> | <chr> | <chr> | <chr> | <chr> | <dbl> |
| 1 | 00/PI.1/B038 | SFI Principal Investigator Programme | | | Seamus Martin | https://orcid.org/0000-0002-8539-3143 | Trinity College Dublin (TCD) | Science Foundation Ireland | 10.13039/501100001602 | Establishing functional proteomic maps of proteins involved in apoptosis | 01/10/2001 | 04/03/2007 | 5471668 |
| 2 | 00/PI.1/B045 | SFI Principal Investigator Programme | | | Kingston Mills | https://orcid.org/0000-0003-3646-8222 | Trinity College Dublin (TCD) | Science Foundation Ireland | 10.13039/501100001602 | Pathogen-derived immunomodulatory molecules: future immunotherapeutics and vaccines | 01/10/2001 | 04/12/2006 | 8069352 |
| 3 | 00/PI.1/B052 | SFI Principal Investigator Programme | | | Kenneth Wolfe | https://orcid.org/0000-0003-4992-4979 | Trinity College Dublin (TCD) | Science Foundation Ireland | 10.13039/501100001602 | Functional Organisation of Eukaryotic Genomes | 01/10/2001 | 04/12/2006 | 5632098 |
| 4 | 00/PI.1/C017 | SFI Principal Investigator Programme | | | John Lewis | | Technological University Dublin (TU Dublin) | Science Foundation Ireland | 10.13039/501100001602 | Measurement-Based Resource Management in Communication Networks. | 01/12/2001 | 05/11/2006 | 2993355 |
| 5 | 00/PI.1/C028 | SFI Principal Investigator Programme | | | John Pethica | | Trinity College Dublin (TCD) | Science Foundation Ireland | 10.13039/501100001602 | Nanostructures and Molecule Mechanics. | 01/12/2001 | 04/05/2007 | 7567189 |
| 6 | 00/PI.1/C042 | SFI Principal Investigator Programme | | | Igor Shvets | https://orcid.org/0000-0001-7451-5435 | Trinity College Dublin (TCD) | Science Foundation Ireland | 10.13039/501100001602 | Studies of Surfaces and Interfaces of Magnetic Spinel Oxides. | 01/12/2001 | 05/11/2006 | 7785757 |

2- Data Cleaning & Wrangling

In [5]:
# Cleaning data
sfi.new.grants <- sfi.grants

# The analysis covers new grants from SFI only.
# A few rows are grants from funding partners, or awards that were split when
# they were transferred from one institution to another. Their Proposal.ID is
# flagged with (N), (X) or (T), so those rows are removed.
# Each match is computed on the data frame being filtered so that the row
# positions line up, and a flag with no match removes nothing.
for (flag in c("(N)", "(X)", "(T)")) {
    sfi.new.grants <- sfi.new.grants[!grepl(flag, sfi.new.grants$Proposal.ID, fixed = TRUE),]
}
In [6]:
# Number of rows removed (partner funds and split awards): 68
nrow(sfi.grants) - nrow(sfi.new.grants)
68
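
The removal step above depends on two details of base R: `fixed = TRUE` makes the parentheses part of the literal text, and negative indexing with an empty match drops every row. The sketch below illustrates both on a toy vector. Only the first ID comes from the dataset preview; the `HYPO/...` IDs are hypothetical and exist only for this illustration.

```r
# Toy proposal IDs: the first is from the dataset preview,
# the "HYPO/..." ones are hypothetical and used only for illustration.
ids <- c("00/PI.1/B038", "HYPO/001(N)", "HYPO/002(X)", "HYPO/003(T)", "HYPO/NXT")

# fixed = TRUE treats "(N)" as a literal string, parentheses included.
literal <- grepl("(N)", ids, fixed = TRUE)

# As a regular expression, "(N)" is a capture group that matches any "N".
as_regex <- grepl("(N)", ids)

ids[literal]    # only the ID that really ends in "(N)"
ids[as_regex]   # also picks up "HYPO/NXT"

# Pitfall with negative indexing: a pattern with no match gives integer(0),
# and x[-integer(0)] selects nothing instead of keeping everything.
no_match <- grep("(Z)", ids, fixed = TRUE)
dropped_all <- ids[-no_match]

# A logical filter keeps every element when nothing matches.
kept_all <- ids[!grepl("(Z)", ids, fixed = TRUE)]
```

A logical filter built with `grepl()` is therefore the safer choice when a flag might be absent from the data.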
In [7]:
# There are 11 grants with a negative or zero commitment, which would distort the analysis.
# They are removed below, except for one that is kept as its absolute value.
negatives <- sfi.new.grants %>% filter(Revised.Total.Commitment < 10)
In [8]:
# Convert commitments to absolute values, then drop grants of 10 or less
sfi.new.grants$Revised.Total.Commitment <- abs(sfi.new.grants$Revised.Total.Commitment)
sfi.new.grants <- sfi.new.grants[sfi.new.grants$Revised.Total.Commitment > 10,]
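
To make the effect of this cleaning step concrete, here is a minimal sketch on a toy vector. The values are hypothetical, apart from the first one which appears in the dataset preview, and were chosen only to exercise each case.

```r
# Hypothetical commitments: one regular grant, one large negative,
# a zero, and two near-zero values.
commitment <- c(5471668, -250000, 0, -1, 8)

# Step 1: take absolute values, so a large negative becomes a positive amount.
cleaned <- abs(commitment)

# Step 2: keep only amounts greater than 10.
cleaned <- cleaned[cleaned > 10]

cleaned
```

Any negative value whose magnitude exceeds 10 survives as a positive amount, while zero and near-zero values are dropped.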
In [9]:
sfi.new.grants <- sfi.new.grants %>% mutate(Programme.Name.Clean = str_replace(Programme.Name, "SFI ", ""),
                                           Programme.Name.Clean = str_replace(Programme.Name.Clean, " Programme", ""))
In [10]:
sfi.new.grants$Date <- as.Date(sfi.new.grants$Start.Date, format = "%d/%m/%Y")
sfi.new.grants$Date <- format(sfi.new.grants$Date, "%Y-%m-%d")
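
The `format` string matters here because the dataset stores dates day-first. The sketch below uses two `Start.Date` values from the dataset preview and base R only; it shows the conversion to ISO format and what happens if the same text is read month-first.

```r
# Two Start.Date values taken from the dataset preview (day/month/year).
start <- c("01/10/2001", "01/12/2001")

# Parse as day-first dates, then render them in ISO 8601 form.
parsed <- as.Date(start, format = "%d/%m/%Y")
iso <- format(parsed, "%Y-%m-%d")

# Extract the year as an integer without extra packages.
years <- as.integer(format(parsed, "%Y"))

# Reading the same text month-first still parses, but gives a different date.
wrong <- as.Date("01/10/2001", format = "%m/%d/%Y")

iso
years
wrong
```

Since a wrong format often parses without any error, it is worth spot-checking a few converted dates against the raw column.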
In [11]:
allMissing <- is.na(sfi.new.grants)
# Count the missing values in each column
counts <- colSums(allMissing)
counts
Proposal.ID                  0
Programme.Name               0
Sub.programme.Name           0
Supplement                   0
Lead.Applicant               0
ORCID                        0
Research.Body                0
Funder.Name                  0
Crossref.Funder.Registry.ID  0
Proposal.Title               0
Start.Date                   0
End.Date                     0
Revised.Total.Commitment     0
Programme.Name.Clean         0
Date                         0
In [12]:
paste0("Number of rows in the dataset: ", nrow(sfi.new.grants))
'Number of rows in the dataset: 5317'

3- Exploratory Data Analysis

In [13]:
# Total amount awarded per year, based on each grant's start date
sfi.new.grants %>% mutate(year = year(Date)) %>%
    group_by(year) %>% 
    summarise(total = sum(Revised.Total.Commitment),
        n= n(), average = sum(total)/sum(n)) %>%
    ggplot( aes(x = year, y= total) ) + 
        geom_line( color="#69b3a2", size = 1) +
        geom_point( color="#69b3a2",size=3) +
        labs(x = "Year", y = "",
             title = "Total amount awarded throughout the years",
            subtitle = "Amount awarded (€)") +
        scale_y_continuous(labels = scales::label_number_si(accuracy=0.1)) +
        scale_x_continuous(breaks = seq(from = 2000, to = 2019, by = 1)) +
        scale_fill_brewer(palette='Dark2') +
        theme_minimal() +
        theme(axis.text.x = element_text(face="bold", color="#636363", size=12),
              axis.text.y = element_text(face="bold", color="#636363", size=12),
              plot.title=element_text(vjust=1.5, family='', face='bold', colour='#636363', size=20),
              plot.subtitle=element_text(vjust=1.5, family='', face='bold', colour='#636363', size=15))

Most awarded Institutes by amount €

In [14]:
# Grouping by Institutes and creating the sum, mean and total number of grants
by_institute <- sfi.new.grants %>% 
  group_by(Research.Body) %>% 
  summarise(total = sum(Revised.Total.Commitment),
            mean = mean(Revised.Total.Commitment),
            n= n()) %>% ungroup()
In [15]:
# Plot the Top 10 Institutes which received the highest grants amount
top10.institutes.value <- by_institute %>% arrange(desc(total)) %>% head(10)

top10.institutes.value %>% 
ggplot( aes(x =reorder(as.factor(Research.Body), total), 
              y= total, fill="") ) + 
    geom_bar(inherit.aes = TRUE, lineend = 'round',
             stat = "identity", width = .5, alpha=.9) +
    scale_y_continuous(labels = scales::label_number_si(accuracy=0.1)) +
    annotate("segment", x = 8, xend = 8, y = 437562222, yend = 749722478,
               arrow = arrow(ends = "both", angle = 90, length = unit(.5,"cm"))) +
    annotate("curve", curvature = -.3, x = 9.7, xend = 8.1, y = 6e+08, yend = 6e+08,
               colour = "#636363", size = 2, arrow = arrow()) +
    annotate("text", x = 7, y = 6e+08, family = "", fontface = 3, size=6,
               label = "Huge gap between \n the most awarded institutions\n Dublin x Outside-Dublin") +
    scale_fill_brewer(palette='Dark2') +
    labs(x = "", y = "",
         title = "Top 10 Granted Institutes - Amount awarded (€)",
        subtitle = "The grants values are in Millions") +
    theme_minimal() + coord_flip() +
    theme(legend.position = "none",
          axis.text.x = element_text(face="bold", color="#636363", size=12),
          axis.text.y = element_text(face="bold", color="#636363", size=12),
          plot.title=element_text(vjust=1.5, family='', face='bold', colour='#636363', size=25),
          plot.subtitle=element_text(vjust=1.5, family='', face='bold', colour='#636363', size=15))

The chart shows that Trinity College Dublin was awarded close to twice the amount received by the top-funded institution outside Dublin, University College Cork.
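
As a rough check of this gap, the sketch below computes the ratio and the difference between the two totals. It assumes that the end points of the annotated segment in the chart code (437562222 and 749722478) are the totals for University College Cork and Trinity College Dublin; the variable names are illustrative and not part of the notebook.

```r
# Assumption: these are the y-values of the annotated segment in the chart
# code, taken here as the totals for UCC and TCD respectively.
ucc_total <- 437562222
tcd_total <- 749722478

# How many times larger the TCD total is, and the absolute difference in euro.
ratio <- tcd_total / ucc_total
gap <- tcd_total - ucc_total

round(ratio, 2)
format(gap, big.mark = ",")
```

Under that assumption the ratio comes out at roughly 1.7, so the gap is large but short of a full doubling.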

Most awarded Institutes by number of grants

In [16]:
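# Top 10 institutes by number of grants. top_n() without `wt` ranks by the
# last column of the data frame (n), which is why "Selecting by n" is printed.
top10.institutes.total <- by_institute %>% arrange(desc(n)) %>% top_n(10)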

top10.institutes.total %>% 
ggplot( aes(x =reorder(as.factor(Research.Body), n), 
              y= n) ) + 
    geom_bar(lineend = 'round',
             stat = "identity", width = .5, alpha=.9, fill="steelblue") +
    annotate("segment", x = 8, xend = 8, y = 601, yend = 1150,
               arrow = arrow(ends = "both", angle = 90, length = unit(.5,"cm"))) +
    annotate("curve", curvature = -.3, x = 9.7, xend = 8.1, y = 1100, yend = 1100,
               colour = "#636363", size = 2, arrow = arrow()) +
    annotate("text", x = 7, y = 900, family = "", fontface = 3, size=6,
               label = "Almost 2x gap between \n the most awarded institutions\n Dublin vs Outside-Dublin") +
    scale_fill_brewer(palette='Set2') +
    scale_y_continuous(name="Number of Grants") +
    labs(x = "", y = "",
         title = "Top 10 Granted Institutes - Number of Grants",
        subtitle = "") +
    theme_minimal() + coord_flip() +
    theme(legend.position = "none",
          axis.text.x = element_text(face="bold", color="#636363", size=12),
          axis.text.y = element_text(face="bold", color="#636363", size=12),
          plot.title=element_text(vjust=1.5, family='', face='bold', colour='#636363', size=25),
          plot.subtitle=element_text(vjust=1.5, family='', face='bold', colour='#636363', size=15))
Selecting by n

By number of grants awarded, Trinity College Dublin again leads. It has received almost twice as many grants as University College Cork, the top institution outside Dublin.

Grants distribution by Programmes

In [17]:
# Grouping by Programmes and creating the sum, mean and total number of grants

by_programme <- sfi.new.grants %>% 
  group_by(Programme.Name.Clean) %>% 
  summarise(total = sum(Revised.Total.Commitment),
            mean = mean(Revised.Total.Commitment),
            n= n()) %>% ungroup()
In [18]:
top10.programmes.value <- by_programme %>% arrange(desc(total)) %>% head(10)

top10.programmes.value %>% 
ggplot( aes(x =reorder(as.factor(Programme.Name.Clean), total), 
              y= total, fill="") ) + 
    geom_bar(lineend = 'round',
             stat = "identity", width = .5, 
             alpha= ifelse(top10.programmes.value$Programme.Name.Clean == "Principal Investigator" | top10.programmes.value$Programme.Name.Clean == "Research Centres", 
                           .9, .4)) +
    scale_fill_brewer(palette='Dark2') +
    scale_y_continuous(labels = scales::label_number_si(accuracy=0.1)) +
    annotate("text", x = 8, y = 4.7e+08, family = "", fontface = 3, size=6,
               label = "Average amount awarded \n Principal Investigator = 1.1M \n Research Centres = 7.5M") +
    labs(x = "", y = "",
         title = "Leading programmes are Principal Investigator and Research Centres",
        subtitle = "Total amount awarded (€)") +
    theme_minimal() + coord_flip() +
    theme(legend.position = "none",
          axis.text.x = element_text(face="bold", color="#636363", size=12),
          axis.text.y = element_text(face="bold", color="#636363", size=12),
          plot.title=element_text(vjust=1.5, family='', face='bold', colour='#636363', size=20),
          plot.subtitle=element_text(vjust=1.5, family='', face='bold', colour='#636363', size=15))