TorchGeo of PyTorch

August 17, 2022

Geospatial Datasets

GeoDataset is designed for datasets that contain geospatial information, like latitude, longitude, coordinate system, and projection.
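Geospatial datasets are queried by spatial extent rather than by integer position: a sample is whatever data falls inside a requested region. The plain-Python sketch below shows that lookup idea. `Box` and `query` are hypothetical stand-ins (TorchGeo has its own `BoundingBox` type, which also carries a time range), and the tile footprints are made up.

```python
from typing import Dict, List, NamedTuple


class Box(NamedTuple):
    """Hypothetical spatial extent in CRS units (the time range is omitted)."""

    minx: float
    maxx: float
    miny: float
    maxy: float

    def intersects(self, other: "Box") -> bool:
        # Two boxes overlap unless one lies entirely to one side of the other.
        return (
            self.minx <= other.maxx
            and self.maxx >= other.minx
            and self.miny <= other.maxy
            and self.maxy >= other.miny
        )


def query(files: Dict[str, Box], roi: Box) -> List[str]:
    """Return the names of files whose footprint intersects the region of interest."""
    return sorted(name for name, extent in files.items() if extent.intersects(roi))


# Hypothetical footprints of three raster tiles, in meters.
tiles = {
    "tile_a.tif": Box(0, 1000, 0, 1000),
    "tile_b.tif": Box(1000, 2000, 0, 1000),
    "tile_c.tif": Box(5000, 6000, 5000, 6000),
}

print(query(tiles, Box(900, 1100, 100, 200)))  # ['tile_a.tif', 'tile_b.tif']
```

A region that straddles the border between two tiles returns both files, which is why a geospatial sampler can stitch data from several files into one sample.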

Canadian Building Footprints

CLASS

torchgeo.datasets.CanadianBuildingFootprints(root='data', crs=None, res=1e-05, transforms=None, download=False, checksum=False)

Bases: torchgeo.datasets.VectorDataset

Canadian Building Footprints dataset.

The Canadian Building Footprints dataset contains 11,842,186 computer-generated building footprints in all Canadian provinces and territories in GeoJSON format. This data is freely available for download and use.

__init__(root='data', crs=None, res=1e-05, transforms=None, download=False, checksum=False)

Parameters

root (str) – root directory where dataset can be found
crs (Optional[rasterio.crs.CRS]) – coordinate reference system (CRS) to warp to (defaults to the CRS of the first file found)
res (float) – resolution of the dataset in units of CRS
transforms (Optional[Callable[[Dict[str, Any]], Dict[str, Any]]]) – a function/transform that takes an input sample and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises

FileNotFoundError – if no files are found in root
RuntimeError – if download=False and data is not found, or checksum=True and checksums don’t match
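The checksum=True option trades time for safety. As a rough stdlib sketch of what an MD5 integrity check involves, and why it "may be slow" (every byte of the file has to be read and hashed): `verify_md5` is a hypothetical helper, not part of TorchGeo, which runs its own check internally.

```python
import hashlib
import os
import tempfile


def verify_md5(path: str, expected: str, chunk_size: int = 1 << 20) -> None:
    """Raise RuntimeError if the file's MD5 digest does not match `expected`.

    The file is streamed through the hash in chunks, so memory use stays flat,
    but the whole file must still be read from disk.
    """
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    if digest.hexdigest() != expected:
        raise RuntimeError(f"checksum mismatch for {path}")


# Usage: write a small file, then verify it against a known-good digest.
payload = b'{"type": "FeatureCollection", "features": []}'
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.geojson")
    with open(path, "wb") as f:
        f.write(payload)
    verify_md5(path, hashlib.md5(payload).hexdigest())  # passes silently
```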

Chesapeake Bay High-Resolution Land Cover Project

CLASS

torchgeo.datasets.Chesapeake(root='data', crs=None, res=None, transforms=None, cache=True, download=False, checksum=False)

Bases: torchgeo.datasets.RasterDataset, abc.ABC

Abstract base class for all Chesapeake datasets.

Chesapeake Bay High-Resolution Land Cover Project dataset.

This dataset was collected by the Chesapeake Conservancy’s Conservation Innovation Center (CIC) in partnership with the University of Vermont and WorldView Solutions, Inc. It consists of one-meter resolution land cover information for the Chesapeake Bay watershed (~100,000 square miles of land).
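To get a feel for the scale of one-meter data over the watershed, the quoted area can be converted into a pixel count. This is back-of-the-envelope arithmetic only; the one-byte-per-pixel figure assumes a single 8-bit label band with no compression.

```python
SQ_KM_PER_SQ_MILE = 2.589988110336  # exact, since 1 mile = 1.609344 km

area_sq_miles = 100_000  # approximate watershed area quoted above
area_km2 = area_sq_miles * SQ_KM_PER_SQ_MILE
pixels = area_km2 * 1_000_000  # one 1 m x 1 m pixel per square meter
gigabytes = pixels / 1e9  # assumption: 1 byte per pixel, uncompressed

print(f"{area_km2:,.0f} km^2")  # 258,999 km^2
print(f"{pixels:.2e} one-meter pixels")  # 2.59e+11 one-meter pixels
print(f"{gigabytes:,.0f} GB")  # 259 GB
```

Numbers of this size are why such datasets are sampled as small windows rather than loaded whole.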

__init__(root='data', crs=None, res=None, transforms=None, cache=True, download=False, checksum=False)

Parameters

root (str) – root directory where dataset can be found
crs (Optional[rasterio.crs.CRS]) – coordinate reference system (CRS) to warp to (defaults to the CRS of the first file found)
res (Optional[float]) – resolution of the dataset in units of CRS (defaults to the resolution of the first file found)
transforms (Optional[Callable[[Dict[str, Any]], Dict[str, Any]]]) – a function/transform that takes an input sample and returns a transformed version
cache (bool) – if True, cache file handle to speed up repeated sampling
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises

FileNotFoundError – if no files are found in root
RuntimeError – if download=False but dataset is missing or checksum fails

CVPR 2019 Chesapeake Land Cover dataset.

The CVPR 2019 Chesapeake Land Cover dataset contains two layers of NAIP aerial imagery, Landsat 8 leaf-on and leaf-off imagery, Chesapeake Bay land cover labels, NLCD land cover labels, and Microsoft building footprint labels.

This dataset was organized to accompany the 2019 CVPR paper, “Large Scale High-Resolution Land Cover Mapping with Multi-Resolution Data”.

The paper “Resolving label uncertainty with implicit generative models” added a further layer to this dataset: a prior over the Chesapeake Bay land cover classes, generated from the NLCD land cover labels.

Parameters

root (str) – root directory where dataset can be found
splits (Sequence[str]) – a list of strings in the format “{state}-{train,val,test}” indicating the subset of data to use, for example “ny-train”
layers (List[str]) – a list containing a subset of “naip-new”, “naip-old”, “lc”, “nlcd”, “landsat-leaf-on”, “landsat-leaf-off”, “buildings”, or “prior_from_cooccurrences_101_31_no_osm_no_buildings” indicating which layers to load
transforms (Optional[Callable[[Dict[str, Any]], Dict[str, Any]]]) – a function/transform that takes an input sample and returns a transformed version
cache (bool) – if True, cache file handle to speed up repeated sampling
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises

FileNotFoundError – if no files are found in root
RuntimeError – if download=False but dataset is missing or checksum fails
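Because splits and layers are plain strings, a typo only surfaces when the dataset is constructed. The hypothetical pre-flight check below validates arguments against the formats documented above; the helper names are illustrative, the set of valid state codes is not listed here so it is not checked, and the dataset class may perform its own validation.

```python
from typing import Sequence, Tuple

# Layer names as listed in the parameter description above.
VALID_LAYERS = {
    "naip-new",
    "naip-old",
    "lc",
    "nlcd",
    "landsat-leaf-on",
    "landsat-leaf-off",
    "buildings",
    "prior_from_cooccurrences_101_31_no_osm_no_buildings",
}
VALID_SUBSETS = {"train", "val", "test"}


def parse_split(split: str) -> Tuple[str, str]:
    """Split a "{state}-{train,val,test}" string into (state, subset)."""
    state, sep, subset = split.rpartition("-")
    if not sep or not state or subset not in VALID_SUBSETS:
        raise ValueError(f"malformed split: {split!r}")
    return state, subset


def check_layers(layers: Sequence[str]) -> None:
    """Reject any layer name that is not in the documented set."""
    unknown = set(layers) - VALID_LAYERS
    if unknown:
        raise ValueError(f"unknown layers: {sorted(unknown)}")


print(parse_split("ny-train"))  # ('ny', 'train')
check_layers(["naip-new", "lc"])  # valid, returns None
```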

Cropland Data Layer (CDL)

CLASS

torchgeo.datasets.CDL(root='data', crs=None, res=None, transforms=None, cache=True, download=False, checksum=False)

Bases: torchgeo.datasets.RasterDataset

Cropland Data Layer (CDL) dataset.

The Cropland Data Layer, hosted on CropScape, provides a raster, geo-referenced, crop-specific land cover map for the continental United States. The CDL also includes a crop mask layer and planting frequency layers, as well as boundary, water and road layers. The Boundary Layer options provided are County, Agricultural Statistics Districts (ASD), State, and Region. The data is created annually using moderate resolution satellite imagery and extensive agricultural ground truth.

Parameters

root (str) – root directory where dataset can be found
crs (Optional[rasterio.crs.CRS]) – coordinate reference system (CRS) to warp to (defaults to the CRS of the first file found)
res (Optional[float]) – resolution of the dataset in units of CRS (defaults to the resolution of the first file found)
transforms (Optional[Callable[[Dict[str, Any]], Dict[str, Any]]]) – a function/transform that takes an input sample and returns a transformed version
cache (bool) – if True, cache file handle to speed up repeated sampling
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 after downloading files (may be slow)

Raises

FileNotFoundError – if no files are found in root
RuntimeError – if download=False but dataset is missing or checksum fails
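Any callable that maps a sample dictionary to a new dictionary can be passed as transforms. A common need with a crop-specific map like the CDL is collapsing sparse class codes into contiguous training indices. The sketch assumes the labels arrive under a "mask" key and uses nested lists in place of tensors; `RemapClasses` and the codes passed to it are hypothetical.

```python
from typing import Any, Dict, List


class RemapClasses:
    """Hypothetical transform: map sparse class codes to indices 0..len(keep).

    Codes not listed in `keep` are sent to a background index of 0.
    """

    def __init__(self, keep: List[int]) -> None:
        # Index 0 is reserved for background; kept codes start at 1.
        self.lookup = {code: i + 1 for i, code in enumerate(keep)}

    def __call__(self, sample: Dict[str, Any]) -> Dict[str, Any]:
        out = dict(sample)  # shallow copy so the input sample is left untouched
        out["mask"] = [[self.lookup.get(v, 0) for v in row] for row in sample["mask"]]
        return out


transform = RemapClasses(keep=[1, 5, 24])  # illustrative codes only
sample = {"mask": [[1, 5, 0], [24, 111, 1]]}
print(transform(sample)["mask"])  # [[1, 2, 0], [3, 0, 1]]
```

An instance would then be passed through the documented argument, e.g. CDL(root='data', transforms=transform); with real tensors, a lookup table indexed by the mask is the usual vectorized equivalent of the dictionary lookup.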

Landsat

CLASS

torchgeo.datasets.Landsat(root='data', crs=None, res=None, bands=[], transforms=None, cache=True)

Bases: torchgeo.datasets.RasterDataset, abc.ABC

Abstract base class for all Landsat datasets.

Landsat is a joint NASA/USGS program, providing the longest continuous space-based record of Earth’s land in existence.

Parameters

root (str) – root directory where dataset can be found
crs (Optional[rasterio.crs.CRS]) – coordinate reference system (CRS) to warp to (defaults to the CRS of the first file found)
res (Optional[float]) – resolution of the dataset in units of CRS (defaults to the resolution of the first file found)
bands (Sequence[str]) – bands to return (defaults to all bands)
transforms (Optional[Callable[[Dict[str, Any]], Dict[str, Any]]]) – a function/transform that takes an input sample and returns a transformed version
cache (bool) – if True, cache file handle to speed up repeated sampling

Raises

FileNotFoundError – if no files are found in root
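The bands parameter controls which bands are returned and in what order, with an empty list (the bands=[] default) meaning every band. A sketch of that selection logic; `band_indices` and the generic band names are hypothetical, since real Landsat products name their bands differently per sensor and processing level.

```python
from typing import List, Sequence

# Hypothetical band names for a single Landsat product.
ALL_BANDS = ["B1", "B2", "B3", "B4", "B5", "B6", "B7"]


def band_indices(requested: Sequence[str], available: Sequence[str] = ALL_BANDS) -> List[int]:
    """Return the positions of the requested bands; an empty request means all bands."""
    if not requested:
        return list(range(len(available)))
    missing = [b for b in requested if b not in available]
    if missing:
        raise ValueError(f"unknown bands: {missing}")
    return [available.index(b) for b in requested]


print(band_indices([]))  # [0, 1, 2, 3, 4, 5, 6]
# On Landsat 8 OLI, bands 4, 3, 2 are red, green, blue: a true-color stack.
print(band_indices(["B4", "B3", "B2"]))  # [3, 2, 1]
```

Requesting only the bands a model needs also reduces how much data is read per sample.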

National Agriculture Imagery Program (NAIP)

CLASS

torchgeo.datasets.NAIP(root, crs=None, res=None, transforms=None, cache=True)

Bases: torchgeo.datasets.RasterDataset

National Agriculture Imagery Program (NAIP) dataset.

The National Agriculture Imagery Program (NAIP) acquires aerial imagery during the agricultural growing seasons in the continental U.S. A primary goal of the NAIP program is to make digital orthophotography available to governmental agencies and the public within a year of acquisition.

NAIP is administered by the USDA’s Farm Service Agency (FSA) through the Aerial Photography Field Office in Salt Lake City. This “leaf-on” imagery is used as a base layer for GIS programs in FSA’s County Service Centers, and is used to maintain the Common Land Unit (CLU) boundaries.

Sentinel

CLASS

torchgeo.datasets.Sentinel(root, crs=None, res=None, transforms=None, cache=True)

Bases: torchgeo.datasets.RasterDataset

Abstract base class for all Sentinel datasets.

Sentinel is a family of satellites launched by the European Space Agency (ESA) under the Copernicus Programme.

Sentinel-2 dataset.

The Copernicus Sentinel-2 mission comprises a constellation of two polar-orbiting satellites placed in the same sun-synchronous orbit, phased at 180° to each other. It aims to monitor variability in land surface conditions. Its wide swath (290 km) and short revisit time (10 days at the equator with one satellite and 5 days with two satellites, which under cloud-free conditions results in 2-3 days at mid-latitudes) support monitoring of changes to Earth’s surface.

Parameters

root (str) – root directory where dataset can be found
crs (Optional[rasterio.crs.CRS]) – coordinate reference system (CRS) to warp to (defaults to the CRS of the first file found)
res (Optional[float]) – resolution of the dataset in units of CRS (defaults to the resolution of the first file found)
bands (Sequence[str]) – bands to return (defaults to all bands)
transforms (Optional[Callable[[Dict[str, Any]], Dict[str, Any]]]) – a function/transform that takes an input sample and returns a transformed version
cache (bool) – if True, cache file handle to speed up repeated sampling

Raises

FileNotFoundError – if no files are found in root.
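The equatorial revisit figures quoted for Sentinel-2 follow from the 180° phasing: two satellites evenly spaced along the same orbit split the single-satellite interval in half. A back-of-the-envelope sketch (the function name is hypothetical); it deliberately ignores the overlap between neighboring swaths at higher latitudes, which is what brings mid-latitude revisits down to 2-3 days.

```python
def equatorial_revisit_days(single_satellite_days: float, n_satellites: int) -> float:
    """Revisit interval for n identical satellites evenly phased in one orbit.

    Evenly spacing the satellites (180 degrees apart for two) divides the
    single-satellite repeat cycle into n equal gaps.
    """
    if n_satellites < 1:
        raise ValueError("need at least one satellite")
    return single_satellite_days / n_satellites


print(equatorial_revisit_days(10, 1))  # 10.0
print(equatorial_revisit_days(10, 2))  # 5.0
```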

Non-geospatial Datasets

VisionDataset is designed for datasets that lack geospatial information. These datasets can still be combined using ConcatDataset.
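Datasets without geospatial information are indexed by integer position, so combining them only requires mapping a global index to a (dataset, local index) pair. PyTorch's `torch.utils.data.ConcatDataset` does this with cumulative lengths and a binary search; `MiniConcat` below is a hypothetical pure-Python re-creation of that idea, with plain lists standing in for datasets.

```python
import bisect
from itertools import accumulate
from typing import Any, Sequence


class MiniConcat:
    """Hypothetical minimal concatenation of index-based datasets."""

    def __init__(self, datasets: Sequence[Sequence[Any]]) -> None:
        self.datasets = list(datasets)
        # Running totals of lengths, e.g. [3, 2] -> [3, 5].
        self.cumulative = list(accumulate(len(d) for d in self.datasets))

    def __len__(self) -> int:
        return self.cumulative[-1] if self.cumulative else 0

    def __getitem__(self, idx: int) -> Any:
        if not 0 <= idx < len(self):
            raise IndexError(idx)
        # Binary search for the first dataset whose running total exceeds idx.
        which = bisect.bisect_right(self.cumulative, idx)
        offset = self.cumulative[which - 1] if which > 0 else 0
        return self.datasets[which][idx - offset]


combined = MiniConcat([["a0", "a1", "a2"], ["b0", "b1"]])
print(len(combined), combined[3])  # 5 b0
```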

ADVANCE (AuDio Visual Aerial sceNe reCognition datasEt)

CLASS

torchgeo.datasets.ADVANCE(root='data', transforms=None, download=False, checksum=False)

Bases: torchgeo.datasets.VisionDataset

ADVANCE dataset.

The ADVANCE dataset is a dataset for audio-visual scene recognition.

Dataset features:

5,075 pairs of geotagged audio recordings and images
three spectral bands: RGB (512x512 px)
10-second audio recordings

Dataset format:

images are three-channel jpgs
audio files are in wav format

Parameters

root (str) – root directory where dataset can be found
transforms (Optional[Callable[[Dict[str, torch.Tensor]], Dict[str, torch.Tensor]]]) – a function/transform that takes input sample and its target as entry and returns a transformed version
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises

RuntimeError – if download=False and data is not found, or checksums don’t match
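The ADVANCE signature takes no split argument, so train, validation, and test partitions are left to the user. A seeded shuffle of the 5,075 indices is one simple, reproducible option; `split_indices` and the 80/10/10 proportions are illustrative choices, not part of TorchGeo (`torch.utils.data.random_split` serves the same purpose for a dataset object).

```python
import random
from typing import List, Tuple


def split_indices(
    n: int, fractions: Tuple[float, float] = (0.8, 0.1), seed: int = 0
) -> Tuple[List[int], List[int], List[int]]:
    """Shuffle range(n) with a fixed seed and cut it into train/val/test parts.

    The test part receives whatever remains after train and val.
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_train = int(n * fractions[0])
    n_val = int(n * fractions[1])
    return (
        indices[:n_train],
        indices[n_train : n_train + n_val],
        indices[n_train + n_val :],
    )


train_idx, val_idx, test_idx = split_indices(5075)  # one index per image/audio pair
print(len(train_idx), len(val_idx), len(test_idx))  # 4060 507 508
```

Fixing the seed keeps the partition identical across runs, so reported metrics stay comparable.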

Smallholder Cashew Plantations in Benin

CLASS

torchgeo.datasets.BeninSmallHolderCashews(root='data', chip_size=256, stride=128, bands=('B01', 'B02', 'B03', 'B04', 'B05', 'B06', 'B07', 'B08', 'B8A', 'B09', 'B11', 'B12', 'CLD'), transforms=None, download=False, api_key=None, checksum=False, verbose=False)

Bases: torchgeo.datasets.VisionDataset

Smallholder Cashew Plantations in Benin dataset.

This dataset contains labels for cashew plantations in a 120 km² area in the center of Benin. Each pixel is classified as well-managed plantation, poorly-managed plantation, no plantation, or one of several other classes. The labels are generated using a combination of ground da
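The chip_size=256 and stride=128 defaults describe a sliding window: with the stride at half the chip size, neighboring chips overlap by 50%. The sketch below computes chip positions on a hypothetical tile. `chip_offsets` is illustrative, and snapping one final chip to the tile edge is a common convention assumed here, not a documented detail of this dataset.

```python
from typing import List


def chip_offsets(length: int, chip_size: int, stride: int) -> List[int]:
    """Top-left offsets of sliding-window chips along one axis.

    Windows step by `stride`; if the last regular window stops short of the
    edge, one extra window flush with the edge is added so no pixels are lost.
    """
    if chip_size > length:
        raise ValueError("chip larger than image")
    offsets = list(range(0, length - chip_size + 1, stride))
    if offsets[-1] != length - chip_size:
        offsets.append(length - chip_size)
    return offsets


# Hypothetical 1,000 x 800 pixel tile with the documented defaults.
rows = chip_offsets(800, chip_size=256, stride=128)
cols = chip_offsets(1000, chip_size=256, stride=128)
print(rows)  # [0, 128, 256, 384, 512, 544]
print(len(rows) * len(cols))  # 42
```

Halving the stride roughly quadruples the number of chips in 2-D, so stride is the main knob for trading dataset size against redundancy.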
