Filtering Excel Data using R
By- Rounak Choudhary
By- Rounak Choudhary
Suppose you have two CSV files containing different data, one CSV file has a checklist of 200 birds with Species|Common Name|Abundance|, and another CSV file has Species| Functional trails| of 10,000 birds. Now, to get the functional traits filters out for your 200 birds, this R script will help you out.
This R script performs data filtering and comparison between two CSV files: species_list.csv, which contains a list of bird species of interest, and bird_traits.csv, which contains trait data for various bird species. First, both CSV files are read into R as data frames. The script then filters the bird_traits dataset to include only the species that are present in the species_list, storing the result in filtered_traits. To identify species that are in the species_list but missing from the bird_traits data, it creates a separate data frame called missing_species. The script prints out the number of matched and missing species, and finally, it saves the filtered trait data and the list of missing species as new CSV files. This workflow is useful for ensuring that all desired species have corresponding trait data, and for identifying any gaps that may need further attention or data collection.
Pre Requirements:- Two sperate CSV Files, Remember both must have same Heading for data to be filtered, here it will be 'Species'
# Read your CSVs (same as before)
species_list <- read.csv("C:/your_path/species_list.csv", stringsAsFactors = FALSE)
bird_traits <- read.csv("C:/your_path/bird_traits.csv", stringsAsFactors = FALSE)
# Filter trait data
filtered_traits <- bird_traits %>%
filter(Species %in% species_list$Species)
# Get missing species (in species list but not in trait data)
missing_species <- species_list %>%
filter(!Species %in% bird_traits$Species)
# Output counts
cat("Matched species:", nrow(filtered_traits), "\n")
cat("Missing species:", nrow(missing_species), "\n")
# Save missing species to CSV
write.csv(missing_species, "missing_species.csv", row.names = FALSE)
# Save filtered data too (optional)
write.csv(filtered_traits, "filtered_bird_traits.csv", row.names = FALSE)
After this you will have 2 new CSV Files, one have list of birds with traits and another have birds not found (Might be due to spelling error).