Solar Farm Site Suitability Modeling Using Forest-based Classification

Introduction

Solar energy plays a crucial role in meeting energy requirements sustainably. Although solar energy is abundant, finding sites where it can be harnessed economically and efficiently is a major challenge. The difficulty stems from geographical setting, solar resource availability, and the infrastructure needed to build and operate solar installations.

Site planning typically relies on multi-criteria approaches to identify optimal locations, most commonly Multi-Criteria Decision Analysis (MCDA). These methods are empirical. With the advent of machine learning, MCDA-type problems with a binary outcome (suitable or unsuitable) can also be solved using forest-based classification algorithms.

Forest-based Classification

A random forest is a meta-estimator that trains a large number of decision tree classifiers on different sub-samples of the dataset. Every tree produces its own prediction, and the forest aggregates these predictions, by majority vote for classification or by averaging for regression, to improve generalization and prediction accuracy. The final prediction is therefore determined not by any single tree but by the combined knowledge of the entire forest.
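The voting idea can be illustrated with a minimal sketch in plain Python. It is not how ArcGIS Pro implements the algorithm: each "tree" here is a single-split stump, and the toy samples (slope in degrees, distance to a power line in km) are invented purely for illustration.

```python
import random
from collections import Counter

def train_stump(samples):
    """Fit a one-split 'tree': pick the feature/threshold with the fewest training errors."""
    best = None
    for f in range(len(samples[0][0])):
        for x, _ in samples:
            t = x[f]
            for left_label in (0, 1):
                # Predict left_label when the feature value is <= t, the other label otherwise.
                errors = sum(
                    (left_label if xi[f] <= t else 1 - left_label) != yi
                    for xi, yi in samples
                )
                if best is None or errors < best[0]:
                    best = (errors, f, t, left_label)
    _, f, t, left_label = best
    return lambda x: left_label if x[f] <= t else 1 - left_label

def train_forest(samples, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample (rows drawn with replacement)."""
    rng = random.Random(seed)
    return [train_stump([rng.choice(samples) for _ in samples]) for _ in range(n_trees)]

def predict(forest, x):
    """Majority vote across all trees in the forest."""
    votes = Counter(tree(x) for tree in forest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: (slope, distance to power line) -> 1 suitable, 0 unsuitable
data = [((2, 1.0), 1), ((3, 0.5), 1), ((4, 2.0), 1), ((1, 1.5), 1),
        ((20, 8.0), 0), ((25, 9.0), 0), ((18, 7.5), 0), ((30, 6.0), 0)]
forest = train_forest(data)
gentle_site = predict(forest, (1, 0.4))    # flat and close to the grid
steep_site = predict(forest, (30, 9.5))    # steep and far from the grid
```

An odd number of trees avoids tied votes in a two-class problem.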

Forest-based Classification Workflow in ArcGIS Pro

Forest-based Classification in ArcGIS Pro requires three major components:

1. Predictor Variables (Raster features used by the ML model)

2. Label Data (Positive and negative training samples, i.e., locations where solar farms currently exist and locations where they do not)

3. Mask Raster (Optional: areas to exclude)

Inputs to Forest-based Classification

Predictor Variables

These rasters represent environmental, infrastructural, and topographical variables that influence solar site selection. The Forest-based Classification model uses these layers as independent variables. The criteria include distance from roads, distance from power lines, distance from settlements, Digital Elevation Model (DEM), aspect, slope, Global Horizontal Irradiation (GHI), and triple‑crop areas. Exclusion areas such as water bodies, wetlands, and forests are treated as binary no‑go zones coded as 0 or 1. Details of the criteria and their descriptions are provided in Table 1.

Criteria	Input Type	Description
Distance from major roads (state, district, national)	Distance Raster	Indicates accessibility & transport feasibility
Distance from power lines	Distance Raster	Proximity to grid reduces interconnection cost
Distance from settlements	Distance Raster	Avoids conflicts with residential areas
DEM	Raster	Base elevation used for site engineering
Slope	Raster	Critical for panel installation and drainage
Aspect	Derived rasters: Northness (cos) and Eastness (sin)	Directional influence on solar irradiance
Global Horizontal Irradiation (GHI)	Continuous Raster	Key predictor of solar potential
Land Use / Triple Crop Area	Binary Raster (0/1)	Avoidance of high-value agricultural land
Water bodies, wetlands, forests	Binary Raster	Used in creating negative samples & masks

Table 1: Criteria and description
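Aspect is split into northness and eastness because it is a circular quantity: 359° and 1° point in almost the same direction but are far apart numerically. A minimal sketch of the transformation for a single cell value follows. The function name is hypothetical, and treating flat cells (which the ArcGIS Aspect tool flags as -1) as (0, 0) is an assumption made here, not something the workflow above prescribes.

```python
import math

def aspect_to_components(aspect_deg):
    """Return (northness, eastness) for an aspect in degrees clockwise from north.

    Assumption: flat cells carry a negative aspect (-1) and are mapped to
    (0.0, 0.0) so they contribute no directional signal.
    """
    if aspect_deg < 0:
        return 0.0, 0.0
    rad = math.radians(aspect_deg)
    return math.cos(rad), math.sin(rad)

north_facing = aspect_to_components(0)      # northness 1, eastness 0
east_facing = aspect_to_components(90)      # northness ~0, eastness 1
flat_cell = aspect_to_components(-1)        # no direction
# 359 and 1 degrees now have almost identical northness values.
gap = abs(aspect_to_components(359)[0] - aspect_to_components(1)[0])
```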

Figure 1 shows the raster inputs used as criteria for the model. For reliable results, all predictors should share the same projection, cell size, and alignment.
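What "same projection, cell size, and alignment" means can be sketched as a simple check on raster metadata. The dictionary keys, coordinate values, and the EPSG code below are illustrative assumptions. In ArcGIS Pro itself, this consistency is usually enforced through the Snap Raster and Cell Size environment settings rather than checked by hand.

```python
def is_aligned(ref, other, tol=1e-6):
    """True if two raster grids share CRS, cell size, and cell alignment."""
    if ref["crs"] != other["crs"]:
        return False
    if abs(ref["cell_size"] - other["cell_size"]) > tol:
        return False
    # Grid origins must differ by a whole number of cells on both axes.
    for axis in ("x_min", "y_min"):
        offset = (other[axis] - ref[axis]) / ref["cell_size"]
        if abs(offset - round(offset)) > tol:
            return False
    return True

# Hypothetical metadata for three predictor rasters
slope = {"crs": "EPSG:32643", "cell_size": 30.0, "x_min": 500000.0, "y_min": 1400000.0}
ghi = {"crs": "EPSG:32643", "cell_size": 30.0, "x_min": 500090.0, "y_min": 1400000.0}
roads = {"crs": "EPSG:32643", "cell_size": 30.0, "x_min": 500100.0, "y_min": 1400000.0}

ghi_ok = is_aligned(slope, ghi)      # shifted by exactly 3 cells
roads_ok = is_aligned(slope, roads)  # shifted by 3.33 cells, so cells do not line up
```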

Figure 1: Variables for Site suitability

Label Data (Target Variable)

Forest-based Classification requires a classification label. Positive samples are point features derived from existing solar farm polygons or their centroids. Negative samples are generated from settlements, forest areas, water bodies, wetlands, triple-crop agricultural land, a 500-meter coastline buffer, and other non-permissible (no-go) zones. Table 2 lists each label with its meaning and description. Image A shows the locations of the solar farms, and Image B shows the corresponding positive and negative samples. These samples are then used as training data for the model.

Label	Meaning	Description
1	Suitable	Existing solar farms (positive samples)
0	Unsuitable	Random points generated from No-Go zones

Table 2: Label Data
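The labeling scheme in Table 2 can be sketched in plain Python: positives are centroids of solar farm polygons, and negatives are random points drawn inside no-go polygons. The function names and toy coordinates are hypothetical, and rejection sampling inside the bounding box is just one simple way to place random points; it is not necessarily how the samples in this workflow were generated.

```python
import random

def edges(polygon):
    """Pair each vertex with the next one, closing the ring."""
    return zip(polygon, polygon[1:] + polygon[:1])

def centroid(polygon):
    """Area-weighted centroid of a simple polygon given as [(x, y), ...]."""
    area2 = cx = cy = 0.0
    for (x0, y0), (x1, y1) in edges(polygon):
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    return cx / (3 * area2), cy / (3 * area2)

def contains(polygon, x, y):
    """Ray-casting point-in-polygon test."""
    inside = False
    for (x0, y0), (x1, y1) in edges(polygon):
        if (y0 > y) != (y1 > y) and x < x0 + (y - y0) * (x1 - x0) / (y1 - y0):
            inside = not inside
    return inside

def random_points_in(polygon, n, rng):
    """Rejection sampling: draw from the bounding box, keep points inside the polygon."""
    xs, ys = zip(*polygon)
    points = []
    while len(points) < n:
        x, y = rng.uniform(min(xs), max(xs)), rng.uniform(min(ys), max(ys))
        if contains(polygon, x, y):
            points.append((x, y))
    return points

def build_samples(farm_polygons, no_go_polygons, negatives_per_zone, seed=0):
    """Label farm centroids 1 and random no-go points 0."""
    rng = random.Random(seed)
    samples = [(centroid(p), 1) for p in farm_polygons]
    for zone in no_go_polygons:
        samples += [(pt, 0) for pt in random_points_in(zone, negatives_per_zone, rng)]
    return samples

# Hypothetical geometries in arbitrary map units
farm = [(0, 0), (2, 0), (2, 2), (0, 2)]
wetland = [(10, 10), (14, 10), (10, 14)]
samples = build_samples([farm], [wetland], negatives_per_zone=5)
```

Note that the centroid of a strongly concave polygon can fall outside the polygon, so a point guaranteed to lie inside may be a safer choice for irregular farm boundaries.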

Forest-based Classification in ArcGIS Pro

ArcGIS Pro provides easy-to-use tools for Forest-based Classification in both the Spatial Statistics toolbox and the GeoAI toolbox. This blog uses the Spatial Statistics toolbox.

Run the “Forest-based and Boosted Classification and Regression” tool, setting the model type to Forest-based and selecting the required predictor variables. Figure 3 shows the tool with its parameters, including training samples, predictor variables, and other inputs needed to run it.

The tool generates a set of outputs that can be used for further analysis. These include a trained spatial statistics model file (.ssm), which stores the final classifier so it can be applied to new areas or updated datasets without retraining.

Figure 3: Forest-based classification tool

The Solar Suitability Raster (Figure 4) provides a binary surface with values 0 (unsuitable) and 1 (suitable), helping users visualize where conditions are favorable. To explain the factors driving the predictions, the Feature Importance Report (Figure 6) ranks the variables that contribute most to the model, enabling transparent communication of assumptions and supporting regulatory reviews.

Figure 4: Output showing suitable (green) and unsuitable (reddish pink) areas

To validate model performance, the tool reports accuracy metrics, including a confusion matrix, precision and recall values, and cross-validation results. Together, these give a clear picture of the model's behavior and its overall accuracy.
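For readers less familiar with these metrics, the sketch below shows how a confusion matrix, precision, recall, and accuracy relate to one another for a binary label. The labels are invented toy values and are not results from the model described here.

```python
def confusion_matrix(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 = suitable."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    return tp, fp, fn, tn

def summarize(y_true, y_pred):
    tp, fp, fn, tn = confusion_matrix(y_true, y_pred)
    return {
        # Of the cells predicted suitable, how many really are?
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        # Of the truly suitable cells, how many did the model find?
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "accuracy": (tp + tn) / len(y_true),
    }

# Hypothetical validation labels
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
report = summarize(y_true, y_pred)
```

For site selection, low precision means field teams visit sites that turn out to be unsuitable, while low recall means viable sites are missed.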

We can take the raster output and apply policy-based filters to identify contiguous areas, typically 50 hectares or more, that meet the minimum size requirement for solar farm development. These suitable zones can then be overlaid with land ownership and relevant land-use layers to assess ground feasibility. The resulting shortlisted sites can subsequently be validated by field teams using standard field verification techniques.
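The minimum-size filter can be sketched as connected-component labeling on the binary output: group adjacent suitable cells into patches and keep only those whose area meets the threshold. The grid, the 25-hectare cell area (a 500 m cell), and the choice of 4-neighbor connectivity are assumptions for illustration. In ArcGIS Pro a comparable result could be obtained with raster tools such as Region Group.

```python
from collections import deque

def large_patches(grid, cell_area_ha, min_area_ha=50.0):
    """Return a same-shaped grid keeping only suitable patches >= min_area_ha."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first search over 4-connected suitable neighbors.
            patch, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                i, j = queue.popleft()
                patch.append((i, j))
                for ni, nj in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
                    if (0 <= ni < rows and 0 <= nj < cols
                            and grid[ni][nj] == 1 and not seen[ni][nj]):
                        seen[ni][nj] = True
                        queue.append((ni, nj))
            # Keep the patch only if its total area meets the policy threshold.
            if len(patch) * cell_area_ha >= min_area_ha:
                for i, j in patch:
                    out[i][j] = 1
    return out

# Hypothetical suitability raster: one 3-cell patch (75 ha) and one isolated cell (25 ha)
suitability = [
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
]
shortlisted = large_patches(suitability, cell_area_ha=25.0)
```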

Shivaprakash, manager on the Presales team, designs India-specific GIS solutions for Esri India customers.

Shivaprakash Yaragal Esri India
