The "Samples" field in Maxent's interface takes the species occurrence data, typically a CSV file listing the species name and the longitude and latitude of each location where the species has been observed. The "Environmental layers" field takes the spatial layers, such as climate or habitat data in grid format (e.g., ASCII files), that describe environmental conditions across the study area. All layers must share the same extent and cell size. Maxent uses both inputs to model and predict habitat suitability.
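As a rough illustration of the samples file, the sketch below writes occurrence records as CSV with the columns species, longitude, latitude. The helper name `write_samples_csv` and the example records are hypothetical, invented for this sketch.

```python
import csv
import io

def write_samples_csv(records, handle):
    """Write (species, longitude, latitude) rows, in that column order."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["species", "longitude", "latitude"])
    for species, lon, lat in records:
        writer.writerow([species, f"{lon:.5f}", f"{lat:.5f}"])

# Hypothetical records, for illustration only.
records = [
    ("example_species", -65.38333, -10.38333),
    ("example_species", -65.13333, -16.80000),
]
buffer = io.StringIO()
write_samples_csv(records, buffer)
samples_text = buffer.getvalue()
```

Writing to a file instead only requires passing an open file handle in place of the `StringIO` buffer.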
The three checkboxes at the bottom-right of the Maxent interface serve the following purposes:
Create response curves: This option generates plots showing the relationship between environmental variables and predicted suitability.
Make pictures of predictions: Enables graphical outputs of habitat suitability predictions.
Do jackknife to measure variable importance: Performs jackknife analysis to assess the contribution of each environmental variable to the model.
Jackknife analysis is a statistical method used to assess the importance of variables in a model. In Maxent, it works by building one model with each environmental variable used in isolation and one model with each variable omitted in turn, then comparing both against the model built with all variables. The comparison is reported as training gain and, when test data are set aside, as test gain and AUC. A variable that scores well on its own carries useful information by itself; a variable whose omission lowers the score most carries information that the other variables do not provide.
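The jackknife procedure can be sketched in a few lines. This is a generic illustration, not Maxent's implementation: `fit_and_score` stands in for "train a model on these variables and return its gain", and the toy gains are made-up, additive numbers (real gains are not additive).

```python
def jackknife(variables, fit_and_score):
    """Score each variable alone and the model with that variable left out."""
    variables = list(variables)
    full = fit_and_score(variables)
    table = {}
    for var in variables:
        only = fit_and_score([var])
        without = fit_and_score([v for v in variables if v != var])
        table[var] = {"only": only, "without": without, "drop": full - without}
    return full, table

# Made-up gains for illustration; a real run would train a model per subset.
toy_gain = {"bio4": 0.25, "bio5": 0.5, "bio15": 1.0}

def toy_score(subset):
    return sum(toy_gain[v] for v in subset)

full_gain, jackknife_table = jackknife(toy_gain, toy_score)
best_alone = max(jackknife_table, key=lambda v: jackknife_table[v]["only"])
```

Here `best_alone` names the variable with the highest score in isolation, and the `drop` entries show how much is lost when each variable is omitted.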
The Output format drop-down in Maxent specifies how the model outputs suitability predictions:
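Cloglog: Outputs values between 0 and 1 that can be read as an estimate of the probability of presence, under assumptions about sampling effort. It is the default in recent Maxent versions.
Logistic: Also outputs values between 0 and 1, interpreted as suitability or probability of presence.
Cumulative: Gives each cell a value from 0 to 100, equal to the percentage of the Maxent distribution held by cells whose raw value is no higher than that cell's. Higher values mark more suitable areas.
Raw: Gives the unscaled output of the underlying exponential model. The values are relative and are typically very small, because they sum to 1 over the points used in training.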
For most use cases, Logistic or Cloglog is recommended for clarity and practicality.
Logistic and Cloglog are often preferred because both are scaled from 0 to 1, which makes suitability easier to read and to compare across models. The values can be interpreted as a probability of presence only under assumptions about sampling effort and species prevalence, so they are best treated as relative suitability when those assumptions are uncertain. This interpretability is why these formats are widely used in conservation, planning, and habitat assessments.
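The relationship between the four output formats can be sketched as transformations of the raw values. The formulas below follow the published descriptions of Maxent's logistic and cloglog outputs as I understand them, with the default prevalence of 0.5 assumed for logistic; the function names are hypothetical, and the entropy is computed over the listed cells only, so treat this as a sketch rather than Maxent's exact computation.

```python
import math

def entropy(raw):
    """Entropy H of a raw distribution whose values sum to 1."""
    return -sum(p * math.log(p) for p in raw if p > 0)

def to_logistic(raw):
    """logistic = c * raw / (1 + c * raw), with c = exp(H)."""
    c = math.exp(entropy(raw))
    return [c * p / (1 + c * p) for p in raw]

def to_cloglog(raw):
    """cloglog = 1 - exp(-c * raw), with c = exp(H)."""
    c = math.exp(entropy(raw))
    return [1 - math.exp(-c * p) for p in raw]

def to_cumulative(raw):
    """Percentage of the distribution in cells with raw value <= this cell's."""
    return [100 * sum(q for q in raw if q <= p) for p in raw]

raw = [0.1, 0.2, 0.3, 0.4]  # made-up raw values summing to 1
cumulative = to_cumulative(raw)
logistic = to_logistic(raw)
cloglog = to_cloglog(raw)
```

All three transformations are monotonic in the raw value, so they rank cells identically and differ only in scaling.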
The Output file type options in Maxent specify the format for saving prediction outputs:
ASC: ASCII grid, a plain-text format for spatial data, widely used in GIS software.
MXE: Maxent's proprietary format, useful for Maxent-specific analyses.
GRD: Grid format, supported by several GIS tools for spatial analysis.
BIL: Band interleaved by line, typically used for remote sensing or raster data.
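To make the ASC format concrete, the sketch below parses a tiny ASCII grid held in a string. It assumes the common six-line header (ncols, nrows, xllcorner, yllcorner, cellsize, NODATA_value); the function name and the grid values are hypothetical.

```python
def parse_ascii_grid(text):
    """Parse an ASCII grid with a six-line header into (header, rows)."""
    lines = text.strip().splitlines()
    header = {}
    for line in lines[:6]:
        key, value = line.split()
        header[key.lower()] = float(value)
    nodata = header["nodata_value"]
    rows = [
        [None if float(v) == nodata else float(v) for v in line.split()]
        for line in lines[6:]
    ]
    return header, rows

# Made-up 3 x 2 grid for illustration.
example_grid = """ncols 3
nrows 2
xllcorner -70.0
yllcorner -20.0
cellsize 0.5
NODATA_value -9999
1.0 2.0 -9999
4.0 5.0 6.0
"""
grid_header, grid_rows = parse_ascii_grid(example_grid)
```

Cells equal to the NODATA value come back as `None`, which makes missing data easy to spot before a layer is handed to Maxent.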
The Projection layers directory/file option points to layers that represent future or alternative environmental conditions (such as different climate scenarios) or a different geographic area. The model trained on current conditions is then applied to these layers to predict habitat suitability in that future or alternate context. The projection layers must have the same names as the layers used for training and be in a grid format that Maxent reads (such as ASCII), so that each variable can be matched to its counterpart.
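Since projection depends on matching each projection layer to a training layer by name, a quick pre-flight check can help. The sketch below compares file names without extensions; the helper names and paths are hypothetical.

```python
from pathlib import PurePath

def layer_names(paths):
    """Return the set of layer names (file names without extension)."""
    return {PurePath(p).stem for p in paths}

def compare_layers(training_paths, projection_paths):
    """Return (missing_from_projection, extra_in_projection), both sorted."""
    train = layer_names(training_paths)
    proj = layer_names(projection_paths)
    return sorted(train - proj), sorted(proj - train)

# Hypothetical file lists for illustration.
missing, extra = compare_layers(
    ["current/bio5.asc", "current/bio15.asc"],
    ["future/bio5.asc", "future/bio12.asc"],
)
```

An empty `missing` list means every training variable has a counterpart in the projection directory.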
The feature checkboxes in Maxent control how environmental variables are modeled in the relationship with species distribution. Here's a breakdown:
· Linear: Assumes a straight-line relationship.
· Quadratic: Accounts for curved relationships.
· Product: Models interactions between variables.
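· Threshold: Models step changes, where the response switches once a variable crosses a threshold value.
· Hinge: Models piecewise-linear relationships, where the response is flat up to a knot and linear beyond it.
· Auto: Lets Maxent choose which feature types to use based on the number of presence records.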
Each type impacts how variables interact and how complex the model becomes, offering more flexibility or simplicity.
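The feature types above can be read as simple transformations of the environmental variables. The sketch below gives one plausible form of each; the function names, the forward-hinge form, and the handling of values exactly at the knot are assumptions made for illustration.

```python
def linear(x):
    """Linear feature: the variable itself."""
    return x

def quadratic(x):
    """Quadratic feature: the square of the variable."""
    return x * x

def product(x, y):
    """Product feature: interaction between two variables."""
    return x * y

def threshold(x, knot):
    """Threshold feature: a step from 0 to 1 once x exceeds the knot."""
    return 1.0 if x > knot else 0.0

def hinge(x, knot, x_max):
    """Forward hinge: 0 up to the knot, then rising linearly to 1 at x_max."""
    if x <= knot:
        return 0.0
    return (x - knot) / (x_max - knot)
```

For example, `hinge(15.0, 10.0, 20.0)` is halfway along the rising segment. A fitted model is a weighted combination of such features, which is why enabling more feature types makes the response more flexible.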
For most analyses, Auto features is a reasonable starting point: Maxent chooses which feature types to allow (linear, quadratic, hinge, and so on) based on the number of presence records, permitting more complex features only when there are enough records to support them. This reduces the need for manual feature selection, but it does not guarantee the best model. If you want control over the model's complexity, for example to obtain smoother response curves, select the features manually.
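Explanation of the Basic settings in Maxent:
· Random seed: When selected, uses a different random seed for each run, so each run gets a different random test/train partition and a different random background sample. Leave it unselected to repeat the same partition from run to run.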
· Give visual warnings: Displays alerts about potential problems or unusual behavior in your model.
· Show tooltips: Offers brief explanations of options as you hover over them, helping you understand the interface.
· Ask before overwriting: Prompts a confirmation if you're trying to overwrite an existing file.
· Skip if output exists: Prevents overwriting by automatically skipping existing output files.
· Remove duplicate presence records: Filters out repeated records of species observations, ensuring only unique presence points are used.
· Write clamp grid when projecting: Saves a grid showing where clamping affected the projection; each cell holds the absolute difference between the predictions with and without clamping.
· Do MESS analysis when projecting: Runs a Multivariate Environmental Similarity Surface analysis, which maps how similar each projected location is to the training conditions and flags novel environments.
The "Random test percentage" setting determines the percentage of occurrence records that is set aside for testing rather than used to train the model. Values of 10–30% are typical, depending on the dataset size. A small test set leaves more records for training but gives a noisier estimate of performance, while a large test set gives a more stable estimate at the cost of fewer training records.
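A random test split of this kind can be sketched as follows. The function name, the rounding rule, and the fixed seed are assumptions for illustration; Maxent's own partitioning may differ in detail.

```python
import random

def split_records(records, test_percentage, seed=0):
    """Shuffle records and hold out test_percentage percent for testing."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_percentage / 100)
    return shuffled[n_test:], shuffled[:n_test]

# 40 made-up record IDs with 25% held out for testing.
train_records, test_records = split_records(range(40), 25)
```

With 40 records and 25%, ten records are held out and thirty remain for training.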
The Regularization multiplier controls how strongly Maxent penalizes model complexity. A value of 1 applies Maxent's default regularization unchanged; it does not mean that regularization is switched off. Values above 1 strengthen the penalty, giving a smoother, simpler model that is less prone to overfitting. Values below 1 weaken it, letting the model fit finer detail in the data at a higher risk of overfitting. The aim is a balance between simplicity and fit.
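One way to picture the multiplier is as a scale factor on a penalty that grows with the size of the feature weights. The sketch below uses an L1-style penalty with per-feature regularization values; the function name and the numbers are hypothetical, and the exact objective Maxent optimizes may differ in detail.

```python
def penalized_fit(log_likelihood, weights, betas, multiplier=1.0):
    """Fit score minus a weighted L1 penalty scaled by the multiplier."""
    penalty = multiplier * sum(b * abs(w) for w, b in zip(weights, betas))
    return log_likelihood - penalty

# Made-up feature weights and per-feature regularization values.
weights = [2.0, -1.0]
betas = [0.5, 0.25]
default_score = penalized_fit(-3.0, weights, betas, multiplier=1.0)
stricter_score = penalized_fit(-3.0, weights, betas, multiplier=2.0)
```

Doubling the multiplier doubles the penalty for the same weights, so the optimizer is pushed toward smaller weights and a simpler model.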
The Max number of background points setting defines how many points are sampled at random from the study area to characterize the available environment. Background points are not absence records: they can fall anywhere in the study area, including cells where the species is present. The model contrasts conditions at the presence records with conditions at these points. The prefilled value of 10,000 is the default; if the study area has fewer cells than this, all of them are used. More background points describe the environment more completely but increase computation time, so adjust the value to your study area and computational capacity.
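Capping the background sample can be sketched as below. The function name and seed are hypothetical; the point is only that the setting is an upper limit, so a small study area contributes all of its cells.

```python
import random

def sample_background(cells, max_points=10000, seed=0):
    """Return all cells if there are few, else a random sample of max_points."""
    cells = list(cells)
    if len(cells) <= max_points:
        return cells
    return random.Random(seed).sample(cells, max_points)

small_area = sample_background(range(5), max_points=10)
large_area = sample_background(range(50), max_points=10)
```

Sampling is without replacement, so no cell appears twice in the background set.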
The Replicates setting specifies how many times the model is run, each time with a different split or resample of the data, as determined by the Replicated run type. The default is 1, and 10 is a common choice. Multiple replicates show how much the results vary from run to run, and averaging over them keeps the conclusions from depending on a single random split.
The Replicated run type setting offers three options for how Maxent splits the data for model evaluation:
1. Bootstrap: Randomly samples data with replacement, so some data points may appear more than once.
2. Crossvalidate: Divides the data into equal-sized folds, one per replicate, and tests each model on the fold that was left out of its training.
3. Subsample: Repeatedly splits the data at random, without replacement, into training and test sets, using the Random test percentage to size the test set.
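The three run types differ only in how each replicate's training and test sets are drawn. The sketch below shows one simple version of each; the function names are hypothetical, and details such as fold assignment are assumptions.

```python
import random

def bootstrap(records, rng):
    """Draw a training set of the same size, sampling with replacement."""
    return [rng.choice(records) for _ in records]

def crossvalidate(records, k, rng):
    """Yield (train, test) pairs, one per fold; each record is tested once."""
    shuffled = list(records)
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        yield train, folds[i]

def subsample(records, test_fraction, rng):
    """Split once at random, without replacement, into (train, test)."""
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

rng = random.Random(1)
records = list(range(20))  # made-up record IDs
boot_sample = bootstrap(records, rng)
cv_splits = list(crossvalidate(records, 5, rng))
sub_train, sub_test = subsample(records, 0.25, rng)
```

Cross-validation uses every record for testing exactly once, while bootstrap and subsample rely on repeated random draws across replicates.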
Explanation of the Advanced settings in Maxent:
· Add samples to background: Includes observed species points as background for comparison.
· Write plot data: Saves data used for creating response curves.
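· Extrapolate: Allows predictions for environmental conditions outside the range seen in training.
· Do clamping: Treats variable values outside the training range as if they were at the nearest limit of that range, which keeps predictions from running away under novel conditions.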
· Write output grids: Saves the spatial suitability maps.
· Write plots: Saves the graphical plots.
· Append summary results: Adds new results to an existing CSV file.
· Cache ASCII files: Stores a compressed .mxe copy of each ASCII layer so that later runs load it faster.
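Of the settings above, clamping is the easiest to show in code. In this sketch, a value outside the range seen in training is reset to the nearest limit of that range before the model is applied; the function name and the numbers are hypothetical.

```python
def clamp(value, train_min, train_max):
    """Limit a variable to the range observed during training."""
    return max(train_min, min(value, train_max))

# Made-up training range of 5 to 30 for one variable.
too_high = clamp(35.0, 5.0, 30.0)
too_low = clamp(2.0, 5.0, 30.0)
in_range = clamp(12.0, 5.0, 30.0)
```

Values inside the training range pass through unchanged, so clamping only affects projections into novel conditions.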
The Maximum iteration setting in Maxent specifies the maximum number of iterations the model will run for optimization. The default value of 500 is a standard setting, which typically provides a balance between model accuracy and computation time. If the model reaches this number of iterations without converging, it will stop. You can adjust this value if the model isn't converging or if you're willing to allow more computation time for potentially better accuracy.
An iteration refers to a single pass or cycle in the optimization process. In Maxent, each iteration involves adjusting the model's parameters to improve the fit between the environmental variables and species occurrence data. The process repeats until the model converges (reaches stability) or the maximum number of iterations is reached. Increasing iterations might help refine the model further but can also increase computation time.
The Convergence threshold specifies when training stops before the maximum number of iterations is reached. Maxent tracks how much the model's fit improves from one iteration to the next; once the improvement per iteration falls below the threshold, the model has effectively "converged" and further iterations are unnecessary. A smaller threshold yields a more finely tuned model but takes longer to run, while a larger threshold stops sooner with a coarser fit.
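The interplay of maximum iterations and convergence threshold can be sketched with a generic optimization loop. This is not Maxent's optimizer: the toy problem is plain gradient descent on a quadratic, the function names are hypothetical, and the threshold value of 1e-5 is an assumption for the sketch.

```python
def train(loss, step, x0, max_iterations=500, convergence_threshold=1e-5):
    """Iterate until the per-iteration drop in loss is below the threshold."""
    x = x0
    previous = loss(x)
    for iteration in range(1, max_iterations + 1):
        x = step(x)
        current = loss(x)
        if previous - current < convergence_threshold:
            return x, iteration, True   # converged early
        previous = current
    return x, max_iterations, False     # stopped at the iteration cap

def toy_loss(x):
    return (x - 3.0) ** 2

def toy_step(x):
    # One gradient-descent step with a learning rate of 0.1.
    return x - 0.1 * 2 * (x - 3.0)

solution, iterations_used, converged = train(toy_loss, toy_step, x0=0.0)
capped = train(toy_loss, toy_step, x0=0.0, max_iterations=5)
```

With the default cap the loop stops early because the improvement becomes negligible; with a cap of 5 it stops at the cap without having converged.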
Adjust sample radius: Changes the size of the dots that mark occurrence records on the pictures of predictions; negative values shrink them. It affects only the display, not the model.
Log file: Names the file that records details of the modeling run, which is useful for troubleshooting.
Default prevalence: Sets the assumed probability of presence at typical presence locations, which is used to scale the logistic output. The default is 0.5; change it only if you have an independent estimate of prevalence.
OUTPUT
The Analysis of omission/commission plot shows the omission rate (the fraction of presence records that fall in areas predicted unsuitable, i.e., false negatives) and the fractional predicted area (the share of the study area predicted suitable) as functions of the cumulative threshold. A low omission rate means the model captures most known presences, but if it achieves this only by predicting a very large area as suitable, the prediction is too broad to be useful. The omission rate should also stay close to the predicted omission line. A good model keeps omission low while keeping the predicted area small.
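The two curves in this plot can be computed directly from predicted values. The sketch below assumes cumulative-format predictions (0 to 100) at presence records and at background cells; the function names and data are made up for illustration.

```python
def omission_rate(presence_values, threshold):
    """Fraction of presence records predicted below the threshold."""
    missed = sum(1 for v in presence_values if v < threshold)
    return missed / len(presence_values)

def fractional_predicted_area(background_values, threshold):
    """Fraction of the study area predicted at or above the threshold."""
    suitable = sum(1 for v in background_values if v >= threshold)
    return suitable / len(background_values)

# Made-up cumulative predictions.
presence_values = [5, 20, 40, 80]
background_values = list(range(0, 100, 10))
omission_at_25 = omission_rate(presence_values, 25)
area_at_25 = fractional_predicted_area(background_values, 25)
```

Raising the threshold shrinks the predicted area but increases omission, which is the trade-off the plot makes visible.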
The Receiver Operating Characteristic (ROC) curve is a graphical representation of model performance, plotting sensitivity (true positive rate) against 1 - specificity (false positive rate) across all thresholds. The AUC (Area Under the Curve) of 0.896 summarizes the model's overall discrimination ability: 0.5 corresponds to random prediction, and values closer to 1 indicate better performance. A standard deviation of 0.003 indicates minimal variation across replicate runs, suggesting stable model performance. Because Maxent has no true absence data, specificity here is defined using predicted area rather than true commission.
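AUC has a simple rank-based reading: it is the probability that a randomly chosen presence record scores higher than a randomly chosen background point, with ties counted as half. The sketch below computes it by direct pairwise comparison; the function name and scores are made up.

```python
def auc(presence_scores, background_scores):
    """Probability that a presence score exceeds a background score."""
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))

perfect = auc([0.9, 0.8], [0.1, 0.2])
mixed = auc([0.9, 0.3], [0.5, 0.1])
```

The pairwise loop is quadratic in the number of points, which is fine for illustration but slow for large background samples.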
Response curves display how each environmental variable influences the predicted suitability, varying one variable while holding all others at their average sample values. They show the mean response across replicate runs, with plus or minus one standard deviation shown as blue shading. Interpret the curves with care when variables are strongly correlated: the model may depend on the correlations in ways that a curve for a single variable does not show. Categorical variables are drawn differently, with two shades of blue for the variability.
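The construction of such a curve can be sketched as follows: one variable is swept over a set of values while every other variable stays at its mean. The toy model, the function name, and the numbers are hypothetical stand-ins for a fitted Maxent model.

```python
def marginal_response(model, means, variable, values):
    """Vary one variable, holding the others at their mean values."""
    curve = []
    for value in values:
        inputs = dict(means)      # copy so the means are not modified
        inputs[variable] = value
        curve.append((value, model(inputs)))
    return curve

def toy_model(env):
    # Made-up linear model, for illustration only.
    return 2.0 * env["bio5"] + env["bio15"]

means = {"bio5": 1.0, "bio15": 3.0}
bio5_curve = marginal_response(toy_model, means, "bio5", [0.0, 1.0, 2.0])
```

Each point on the curve is the model's prediction with only the swept variable changed, which is also why correlated variables make such curves hard to interpret.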
The variable contribution table shows two measures for each environmental variable:
Percent contribution: The share of the model's training gain attributed to each variable as the model was fitted (e.g., BIO 5 contributes 27.6%).
Permutation importance: The drop in training AUC when the values of a variable are randomly permuted among the training presence and background points, normalized to a percentage (e.g., BIO 15 has a high permutation importance of 26.8).
Variables like BIO 5, BIO 15, and BIO 4 have significant contributions. Variables such as BIO 2 and BIO 16 have minimal impacts.
Percent contribution is accumulated during training: each time the algorithm adjusts a feature, the resulting gain is credited to the variable that feature depends on. It therefore depends on the path the algorithm happened to take, and should be read with caution when variables are correlated. Permutation importance depends only on the final model: it measures how much the training AUC drops when a variable's values are randomly shuffled, and a bigger drop means the model relies more heavily on that variable. In short, percent contribution reflects how the model was built, while permutation importance reflects how much the finished model depends on each variable.
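Permutation importance can be sketched generically: shuffle one variable across the presence and background points, re-score the unchanged model, and record the drop in AUC. The toy model, the data, and the function names below are hypothetical, and the normalization to percentages is a simple assumption.

```python
import random

def auc(presence_scores, background_scores):
    """Probability that a presence score exceeds a background score."""
    wins = sum(1.0 if p > b else 0.5 if p == b else 0.0
               for p in presence_scores for b in background_scores)
    return wins / (len(presence_scores) * len(background_scores))

def permutation_importance(model, presence, background, variables, seed=0):
    """Percentage share of the AUC drop caused by shuffling each variable."""
    rng = random.Random(seed)

    def score(pres, back):
        return auc([model(r) for r in pres], [model(r) for r in back])

    baseline = score(presence, background)
    rows = presence + background
    drops = {}
    for var in variables:
        shuffled = [r[var] for r in rows]
        rng.shuffle(shuffled)
        permuted = [dict(r, **{var: s}) for r, s in zip(rows, shuffled)]
        new_score = score(permuted[:len(presence)], permuted[len(presence):])
        drops[var] = max(baseline - new_score, 0.0)
    total = sum(drops.values())
    return {v: (100 * d / total if total else 0.0) for v, d in drops.items()}

# Made-up data: the toy model uses bio15 only and ignores bio2.
presence = [{"bio15": 10.0 + i, "bio2": float(i)} for i in range(10)]
background = [{"bio15": float(i), "bio2": float(i)} for i in range(10)]
importance = permutation_importance(
    lambda row: row["bio15"], presence, background, ["bio15", "bio2"])
```

Because the toy model ignores bio2, shuffling it changes nothing and all of the importance falls on bio15.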
For research that aims to identify the most influential variables, Permutation importance is often the more relevant measure, because it depends on the final model rather than on the path taken during training. It is still worth checking it against the jackknife results, since both measures can mislead when variables are correlated.
The jackknife test of variable importance shows that BIO 15 is both the most useful environmental variable by itself (highest gain when used in isolation) and the most important when omitted (most significant drop in gain). This suggests that BIO 15 provides unique information not captured by the other variables, and its absence negatively impacts the model. These values represent the average of multiple runs, indicating consistency across replicate models.