Principal Component Analysis

By Rounak Choudhary

Principal Component Analysis (PCA) is a statistical method that simplifies complex data by reducing the number of variables while preserving as much of the variation in the data as possible. It works by constructing new variables, called principal components, which are linear combinations of the original traits. The components are uncorrelated with each other and are ordered by the amount of variation they explain, so the first few usually capture most of the variation in the dataset. For example, if you are studying raptors and have data on several traits such as wing length, body weight, and beak size, PCA can often reduce these to two or three components that summarize most of the differences between species. This makes it easier to visualize patterns, detect clusters or groups, and understand relationships in the data while minimizing redundancy. PCA is especially useful for large, multidimensional ecological or biological datasets.
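To make the idea of "new variables ordered by variation" concrete, here is a minimal base R sketch of PCA done by hand. It is an illustration only: it uses the built-in USArrests dataset as a stand-in for a trait table, and the object names (X, Z, loadings, scores) are chosen for this example rather than taken from the workflow on this page.

```r
# Built-in dataset standing in for a trait table
# (rows = observations, columns = numeric traits)
X <- as.matrix(USArrests)

# Standardize each column to mean 0 and standard deviation 1
Z <- scale(X)

# Eigen-decomposition of the correlation matrix of the traits
decomp <- eigen(cor(X))

# Columns of `loadings` are the weights of each original trait in a component;
# `component_var` holds the variance of each component, in decreasing order
loadings <- decomp$vectors
component_var <- decomp$values

# Principal component scores: each observation projected onto the loadings
scores <- Z %*% loadings

# The components are uncorrelated: off-diagonal correlations are ~0
round(cor(scores), 10)
```

Each column of scores is one principal component, and component_var shows how much of the total variation each one carries.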

Prerequisites: PCA requires numeric data.

The example data are in CSV 1, a file named 'raptor1'.

If your dataset includes categorical variables (like habitat type, diet, or color), you have two options:

1. Exclude them from the PCA (use only numeric traits).
2. Convert them to numeric form, or switch to a method that accepts mixed data:
   - One-hot encoding (creates binary columns like "Forest = 1, Grassland = 0")
   - Gower distance combined with a PCA alternative such as PCoA (Principal Coordinates Analysis)
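The first option and one-hot encoding can be sketched in base R as follows. The trait table here is hypothetical (species names, column names, and values are invented for illustration), and model.matrix() is just one of several ways to build the binary columns.

```r
# Hypothetical trait table; names and values are made up for illustration
traits <- data.frame(
  wing_length = c(410, 520, 300, 615),
  body_mass   = c(780, 1500, 240, 3100),
  habitat     = c("Forest", "Grassland", "Forest", "Wetland"),
  row.names   = c("sp_a", "sp_b", "sp_c", "sp_d"),
  stringsAsFactors = FALSE
)

# Option 1: keep only the numeric columns
numeric_traits <- traits[, sapply(traits, is.numeric), drop = FALSE]

# Option 2: one-hot encode the categorical column
# "~ habitat - 1" drops the intercept so every habitat level gets its own 0/1 column
habitat_dummies <- model.matrix(~ habitat - 1, data = traits)

# Combine numeric traits and binary habitat columns into one numeric table
encoded_traits <- cbind(numeric_traits, habitat_dummies)
encoded_traits
```

Keep in mind that one-hot columns are treated like any other numeric variable once they enter the PCA, so a categorical trait with many levels can have a large influence on the result.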

# Load the CSV file containing functional traits of raptors.
# Replace the path below with the location of raptor1.csv on your own machine.
# stringsAsFactors = FALSE avoids automatic conversion of text columns to factors.
# row.names = 1 uses the first column (species names) as row names.
PCA <- read.csv("C:/Users/Rounak Choudhary/Desktop/Raptor Trial mFD/raptor1.csv",
                header = TRUE, sep = ",", stringsAsFactors = FALSE, row.names = 1)

# Display the first 6 rows of the data to verify the structure
head(PCA)

# Load required libraries (uncomment the install.packages() lines if a package is not installed)

# install.packages("ade4")        # PCA and other multivariate methods
library(ade4)
# install.packages("vegan")       # Tools for ecological analysis
library(vegan)
# install.packages("factoextra")  # Visualizing PCA and clustering
library(factoextra)
# install.packages("FD")          # Functional diversity metrics
library(FD)
# install.packages("tidyverse")   # Data manipulation and visualization
library(tidyverse)
# install.packages("NbClust")     # Determining the optimal number of clusters
library(NbClust)
# install.packages("gtools")      # General R programming utilities
library(gtools)

# Perform Principal Component Analysis (PCA) on the first 10 columns (traits) of the first 43 rows (species)
# dudi.pca() centers and scales the variables by default
# scannf = FALSE skips the interactive scree plot prompt
# nf = 2 retains the first two principal components
mutlispca <- dudi.pca(PCA[1:43, c(1:10)], scannf = FALSE, nf = 2)
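Before settling on two components, it helps to check how much variation each component explains. The sketch below is an assumption-laden illustration rather than part of the original workflow: it uses base R's prcomp() on the built-in USArrests dataset so that it runs without the raptor CSV. For a dudi.pca object, the eigenvalues are stored in the eig element, so the same proportions can be computed from mutlispca$eig.

```r
# Built-in dataset used as a stand-in for the raptor trait table
traits <- USArrests

# prcomp() from base R; center and scale so each trait has mean 0 and unit variance
pca_fit <- prcomp(traits, center = TRUE, scale. = TRUE)

# Eigenvalues are the squared standard deviations of the components
eigenvalues <- pca_fit$sdev^2

# Proportion and cumulative proportion of total variance per component
prop_var <- eigenvalues / sum(eigenvalues)
cum_var <- cumsum(prop_var)

round(data.frame(eigenvalue = eigenvalues,
                 proportion = prop_var,
                 cumulative = cum_var), 3)
```

If the cumulative proportion for the first two components is low, retaining more components (a larger nf) may be worth considering.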

# Extract the PCA scores for each species (individuals) from the 'li' component
pca_scores <- mutlispca$li

# Set a random seed to make k-means clustering results reproducible
set.seed(123)

# Perform k-means clustering on the PCA scores with 3 clusters
clusters <- kmeans(pca_scores, centers = 3)$cluster

# Convert the numeric cluster labels to a factor for plotting
clusters <- as.factor(clusters)
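The choice of 3 clusters is fixed in the code above. One common, informal way to sanity-check such a choice is an "elbow" table of the total within-cluster sum of squares for several values of k. The sketch below is an illustration under stated assumptions: it uses base R only, takes PCA scores from the built-in USArrests dataset instead of the raptor data, and sets nstart = 25 (multiple random starts), which is not part of the original script.

```r
# Reproducible random starts for k-means
set.seed(123)

# Scores on the first two principal components of a built-in dataset
# (stand-in for pca_scores from the raptor analysis)
scores <- prcomp(USArrests, scale. = TRUE)$x[, 1:2]

# Candidate numbers of clusters
k_values <- 1:6

# Total within-cluster sum of squares for each k;
# nstart = 25 runs k-means from 25 random starts and keeps the best solution
wss <- sapply(k_values, function(k) {
  kmeans(scores, centers = k, nstart = 25)$tot.withinss
})

data.frame(k = k_values, total_within_ss = round(wss, 2))
```

A value of k beyond which the sum of squares stops dropping sharply is a reasonable candidate. The NbClust package loaded earlier offers more formal indices for the same question.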

# Create a PCA biplot with individuals colored by their cluster
# repel = TRUE prevents overlapping text labels
# col.var = "black" colors variable arrows black
# col.ind = clusters colors individuals by their cluster
# palette = "Dark2" specifies the color palette
# labelsize = 3 sets the size of the text labels
# addEllipses = TRUE draws ellipses around each cluster
# theme() adjusts the font size of the entire plot
fviz_pca_biplot(mutlispca, repel = TRUE,
                col.var = "black",
                col.ind = clusters,
                palette = "Dark2",
                labelsize = 3,
                addEllipses = TRUE) +
  theme(text = element_text(size = 15))

# Note:
# If you encounter errors in dudi.pca, try reinstalling the 'ade4' and 'factoextra' packages
# To change the input size, modify:
# - 'c(1:10)' to select different columns (traits)
# - '1:43' to include a different number of species
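As an alternative to hard-coding row and column ranges, the input can be selected programmatically. The sketch below is a hypothetical illustration: trait_table and its columns are invented stand-ins for the table read from the CSV, and dropping incomplete rows is an assumption about how missing values should be handled, since PCA functions generally cannot work with NA entries.

```r
# Hypothetical stand-in for the trait table read from the CSV
trait_table <- data.frame(
  wing_length = c(410, 520, 300, 615, NA),
  body_mass   = c(780, 1500, 240, 3100, 950),
  diet        = c("birds", "mammals", "insects", "fish", "birds"),
  row.names   = paste0("sp_", 1:5)
)

# Keep every numeric column instead of hard-coding c(1:10)
numeric_cols <- vapply(trait_table, is.numeric, logical(1))
pca_input <- trait_table[, numeric_cols, drop = FALSE]

# Keep only species with complete data instead of hard-coding 1:43
pca_input <- pca_input[complete.cases(pca_input), , drop = FALSE]

# Rows (species) and columns (traits) that would go into the PCA
dim(pca_input)
```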
