Demonstration of DroneWQ functions and processing code

Summary

This notebook demonstrates an example workflow of DroneWQ using multispectral images collected by the MicaSense RedEdge-MX sensor over western Lake Erie on August 17, 2022. The workflow converts raw imagery to remote sensing reflectance (\(\mathrm{R}_{\mathrm{rs}}\)) and applies bio-optical algorithms to estimate chlorophyll a and total suspended matter (TSM) concentrations. The images are georeferenced, producing a final chlorophyll a mosaic of the entire dataset.

The Lake Erie drone dataset is available on Zenodo and is 5.84 GB unzipped. Depending on your computer’s speed, you may want to subset the data before running the full workflow. This notebook is set up to run directly if the Lake_Erie dataset is placed in the examples directory.

Contents

  1. Setup

  2. View metadata

  3. Process raw imagery to remote sensing reflectance

  4. Convert to point samples

  5. Apply bio-optical algorithms

  6. Georeference and mosaic

1. Setup

Import all the libraries needed for this notebook.

[42]:
import os
import dronewq
import glob
import numpy as np
import pandas as pd
import contextily as cx
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
from cartopy.crs import Mercator
from rasterio.enums import Resampling

plt.rcParams['mathtext.default'] = 'regular'

Ensure that your imagery is manually organized using the following directory structure (you may name the main_dir directory anything you like, but the remaining folder names should match exactly as shown below):

<main_dir>/
    raw_water_imgs/
    align_img/
    raw_sky_imgs/
    panel/

Once your data are organized in this structure, simply specify the path to your main_dir below. In this example, the main_dir is named Lake_Erie.

This may need modification if your Lake_Erie directory is placed elsewhere.

The output_dir will contain all processed data products after total radiance (\(\mathrm{L}_{\mathrm{t}}\)). lt_imgs and lt_thumbnails will be saved in the Lake_Erie main directory, while all other downstream products will be saved in a user-specificed directory. In this example, it is named hedley_dls_rrs to reflect the processing options used in this demonstration, but you may name it whatever you prefer.

Note that the Lake Erie drone dataset does not include raw_sky_imgs.

[43]:
dronewq.configure(main_dir='../Lake_Erie')
dronewq.configure(output_dir="../Lake_Erie/hedley_dls_rrs")
settings = dronewq.settings

2. View metadata

MicaSense raw images contain metatdata including sensor information, GPS coordianates, and more. We can use the write_metadata_csv() function to extract image metadata and save it as .csv.

Let’s open the first five lines of the .csv to take a look.

[44]:
metadata_path = dronewq.write_metadata_csv(settings.raw_water_dir, settings.output_dir)

img_metadata = pd.read_csv(metadata_path)
img_metadata.head()
Loading ImageSet from: ../Lake_Erie/raw_water_imgs
[44]:
filename dirname DateStamp TimeStamp Latitude LatitudeRef Longitude LongitudeRef Altitude SensorX ... FocalLength Yaw Pitch Roll SolarElevation ImageWidth ImageHeight XResolution YResolution ResolutionUnits
0 capture_1.tif ../Lake_Erie/hedley_dls_rrs/capture_1.tif 2022-08-17 15:37:04 41.828528 N -83.398762 W 265.061 4.8 ... 5.43432 218.376096 25.899967 335.837017 0.895362 1280 960 266.666667 266.666667 mm
1 capture_2.tif ../Lake_Erie/hedley_dls_rrs/capture_2.tif 2022-08-17 15:37:08 41.828670 N -83.398764 W 265.554 4.8 ... 5.43432 221.427592 16.821391 341.184053 0.895530 1280 960 266.666667 266.666667 mm
2 capture_3.tif ../Lake_Erie/hedley_dls_rrs/capture_3.tif 2022-08-17 15:37:10 41.828809 N -83.398760 W 265.968 4.8 ... 5.43432 218.898605 16.695405 343.643092 0.895655 1280 960 266.666667 266.666667 mm
3 capture_4.tif ../Lake_Erie/hedley_dls_rrs/capture_4.tif 2022-08-17 15:37:13 41.828956 N -83.398761 W 266.420 4.8 ... 5.43432 216.716936 18.188710 345.889851 0.895738 1280 960 266.666667 266.666667 mm
4 capture_5.tif ../Lake_Erie/hedley_dls_rrs/capture_5.tif 2022-08-17 15:37:16 41.829081 N -83.398763 W 267.018 4.8 ... 5.43432 214.429136 18.385695 347.772515 0.895863 1280 960 266.666667 266.666667 mm

5 rows × 21 columns

We can plot the altitude, latitude/longitude, and yaw angle metadata of the image captures to get a sense of the flight plan.

[45]:
fig, ax = plt.subplots(1,3, figsize=(10,3), layout='tight')

ax[0].plot(list(range(len(img_metadata))),img_metadata['Altitude'])
ax[0].set_ylabel('Altitude (m)')

ax[1].scatter(img_metadata['Longitude'], img_metadata['Latitude'])
ax[1].set_ylabel('Latitude')
ax[1].set_xlabel('Longitude')

ax[2].plot(list(range(len(img_metadata))),img_metadata['Yaw'])
ax[2].set_ylabel('Yaw')
[45]:
Text(0, 0.5, 'Yaw')
_images/primary_demo_12_1.png

3. Process raw imagery to remote sensing reflectance

Now, let’s get to processing.

The RrsPipeline class is the main processing unit of DroneWQ. It allows you to define which methods are used at each stage of the processing pipeline, and the parameters for each selected method can be customized to suit your workflow.

[46]:
?dronewq.RrsPipeline
Init signature:
dronewq.RrsPipeline(
    lw_method: dronewq.utils.data_types.Base_Compute_Method,
    ed_method: dronewq.utils.data_types.Base_Compute_Method,
    masking_method: dronewq.utils.data_types.Base_Compute_Method | None = None,
    overwrite_lt: bool = False,
    generate_thumbnails: bool = True,
    workers: int = 1,
)
Docstring:
Main pipeline for dronewq.

Parameters
----------
output_folder : Path | str
    Path to the output folder.
lw_method : Base_Compute_Method
    Method used to calculate water leaving radiance.
    If uncertain, you can start with `Mobley_rho()`.
ed_method : Base_Compute_Method
    Method used to calculate downwelling irradiance (Ed).
    If uncertain, you can start with `Dls_ed()`.
masking_method : Base_Compute_Method | None, optional
    Method to mask pixels. Options are
    `ThresholdMasking`, `StdMasking`, or None. Default is None.
overwrite_lt : bool, optional
    Whether to overwrite existing Lt images.
    Should have this set to True if you are running the pipeline for the
    first time. Defaults to False, which saves time if you are rerunning.
generate_thumbnails : bool, optional
    Whether to generate thumbnails. Defaults to True.
workers : int, optional
    Number of parallel image processing instances. Defaults to 1.
Init docstring: Initialize the pipeline.
File:           ~/Developer/DroneWQ/src/dronewq/core/pipeline.py
Type:           type
Subclasses:

For this example, we’re calculating water leaving radiance (\(\mathrm{L}_{\mathrm{w}}\)) using the Hedley approach, calculating downwelling irradiance (\(\mathrm{E}_{\mathrm{d}}\)) using the downwelling light sensor (DLS), and are applying the default masking procedure. We’ll save processed images out to two directories called rrs_imgs and masked_rrs_imgs. Please see the Processing and theory section in the DroneWQ readthedocs for more information on the different processing methods used here.

In summary, this code will process: Raw imagery -> \(\mathrm{L}_{\mathrm{t}}\) -> \(\mathrm{L}_{\mathrm{w}}\) (Hedley method) -> \(\mathrm{R}_{\mathrm{rs}}\) (using \(\mathrm{E}_{\mathrm{d}}\) from DLS) with pixel masking.

Note that we are changing the default nir_threshold for pixel masking from 0.01 to 0.02 since 0.01 was not high enough to mask glint pixels in this dataset. It is important to consider these values since they may change according to your dataset.

If your hardware allows, you can adjust the workers parameter to potentially speed up processing. Keep in mind that the optimal value depends on your system’s hardware capabilities.

[47]:
from dronewq import Hedley, DlsEd, ThresholdMasking
pipeline = dronewq.RrsPipeline(
    lw_method=Hedley(save_images=True),
    ed_method=DlsEd(settings.output_dir),
    masking_method=ThresholdMasking(nir_threshold=0.02),
    workers=4,
)

pipeline.run()
Loading ImageSet from: ../Lake_Erie/raw_water_imgs
Loading ImageSet from: ../Lake_Erie/raw_water_imgs
Processing images: 100%|██████████| 470/470 [02:28<00:00,  3.17it/s]
[48]:
# Setting the paths for our outputs
lw_dir = settings.lw_dir
rrs_dir = settings.rrs_dir
masked_rrs_dir = settings.masked_rrs_dir

To grab these processed images and their metadata you can use the helper functions load_imgs() and load_metadata().

The images can easily be loaded in with load_imgs() function or even just with rasterio’s open() function. The higher level load_imgs() allows you to apply an altitude cutoff and limit the number of files being opened.

In the case of large processing jobs it will likely be more appropriate to loop through the files individually to save memory and not crash the python kernel.

Note that the load_imgs() function returns a generator (one image at a time) instead of a list. This means that you should iterate over the returned object. However, you can also get all of the images by wrapping the returned object with a list() like in the following code.

[49]:
?dronewq.load_imgs
Signature:
dronewq.load_imgs(
    img_dir: str | pathlib._local.Path,
    count=10000,
    start=0,
    altitude_cutoff=0,
    random=False,
)
Docstring:
Load images from a directory as numpy arrays with metadata filtering.

This function reads raster images from a specified directory and returns them
as an iterator of numpy arrays. Images can be filtered and selected based on
metadata criteria including count, starting position, altitude threshold, and
random sampling. The function leverages metadata stored in a CSV file to
efficiently select and load images.

Parameters
----------
img_dir : str | Path
    Directory path containing the images to be loaded. The directory or its
    parent must contain a 'metadata.csv' file with image information.
count : int, optional
    Maximum number of images to load. If count exceeds the number of available
    images after filtering, all available images are loaded. Default is 10000.
start : int, optional
    Index of the first image to load when loading sequentially (random=False).
    Zero-based indexing. Default is 0.
altitude_cutoff : float, optional
    Minimum altitude threshold in meters. Images captured below this altitude
    are excluded. Default is 0 (no altitude filtering).
random : bool, optional
    If True, randomly samples 'count' images from the available set. If False,
    loads images sequentially starting from 'start' index. Default is False.

Yields
------
numpy.ndarray
    3D numpy array for each image with shape (bands, height, width), where
    bands is the number of spectral bands in the raster image.

Notes
-----
The function expects a 'metadata.csv' file containing at least the following columns:
- 'filename': Name of the image file (used as index)
- 'Altitude': Altitude at which the image was captured

Images are loaded using rasterio, which supports various raster formats including
GeoTIFF. The function uses lazy loading via a generator pattern to minimize
memory usage when processing large datasets.

The metadata CSV location is determined by:
- If 'sky' is in img_dir: looks for metadata.csv in img_dir
- Otherwise: looks for metadata.csv in the parent directory of img_dir

Examples
--------
>>> # Load first 10 images from directory
>>> imgs = load_imgs('/path/to/images', count=10)
>>> for img in imgs:
...     print(img.shape)

>>> # Load first 10 images without iterating
>>> imgs = load_imgs('/path/to/images', count=10)
>>> imgs = list(imgs)
File:      ~/Developer/DroneWQ/src/dronewq/utils/images.py
Type:      function
[50]:
?dronewq.load_metadata
Signature:
dronewq.load_metadata(
    img_dir: str | pathlib._local.Path,
    count=10000,
    start=0,
    altitude_cutoff=0,
    random=False,
)
Docstring:
Load and filter image metadata from a CSV file.

This function reads image metadata from a CSV file and returns a filtered
pandas DataFrame based on specified criteria including image count, starting
position, altitude threshold, and random sampling. The metadata is used to
efficiently select images for loading without reading the actual image files.

Parameters
----------
img_dir : str | Path
    Directory path containing the images. The directory or its parent must
    contain a 'metadata.csv' file with image information.
count : int, optional
    Maximum number of images to include in the returned DataFrame. If count
    exceeds the number of available images, all available images are included.
    Default is 10000.
start : int, optional
    Index of the first image to include when selecting sequentially (random=False).
    Zero-based indexing. Default is 0.
altitude_cutoff : float, optional
    Minimum altitude threshold in meters. Images captured below this altitude
    are excluded from the DataFrame. Applied after count/start filtering.
    Default is 0 (no altitude filtering).
random : bool, optional
    If True, randomly samples 'count' images from the available metadata.
    If False, selects images sequentially starting from 'start' index.
    Default is False.

Returns
-------
pandas.DataFrame
    DataFrame containing metadata for selected images, with 'filename' as
    the index. Only includes images meeting the altitude_cutoff criterion.

Notes
-----
The function expects a 'metadata.csv' file with at least the following columns:
- 'filename': Name of the image file (becomes DataFrame index)
- 'Altitude': Altitude at which the image was captured (in meters)

Additional columns may include:
- 'UTC-Time': Timestamp of image capture
- 'Latitude', 'Longitude': Geographic coordinates
- Other sensor-specific metadata

The metadata CSV location is determined by:
- If 'sky' appears in img_dir path: looks for metadata.csv in img_dir
- Otherwise: looks for metadata.csv in the parent directory of img_dir

This logic accommodates directory structures where sky images may have
separate metadata from water/ground images.

The filtering order is:
1. Load full metadata CSV
2. Set filename as index
3. Apply count and start (or random sampling)
4. Apply altitude_cutoff filter

Examples
--------
>>> # Load metadata for first 50 images
>>> df = load_metadata('/path/to/images', count=50)
>>> print(df.columns)

>>> # Load metadata for random sample with altitude filter
>>> df = load_metadata('/path/to/images', count=100, altitude_cutoff=15, random=True)
>>> print(f"Selected {len(df)} images above 15m altitude")

>>> # Load metadata starting from image 20
>>> df = load_metadata('/path/to/images', count=30, start=20)
>>> filenames = df.index.tolist()

>>> # Get all available metadata with altitude filter
>>> df = load_metadata('/path/to/images', altitude_cutoff=10)
>>> print(f"Mean altitude: {df['Altitude'].mean():.2f}m")
File:      ~/Developer/DroneWQ/src/dronewq/utils/images.py
Type:      function

Let’s retrieve the \(\mathrm{R}_{\mathrm{rs}}\) data (5 bands) and visualize the first five images:

[51]:
rrs_imgs_gen = dronewq.load_imgs(img_dir=rrs_dir, start=0, count=5)

img_metadata = dronewq.load_metadata(img_dir=rrs_dir, count=5)

# you can convert the generator to a list or array if needed
rrs_imgs = np.array(list(rrs_imgs_gen))

print(img_metadata.index)

band_names = ['blue', 'green', 'red', 'red edge', 'nir']

for j in range(len(rrs_imgs[0:5])):
    fig, ax = plt.subplots(1,5, figsize=(10,3.5))
    for i in range(5):
        im = ax[i].imshow(rrs_imgs[j,i], cmap='viridis', vmin=0.001, vmax=0.02)
        ax[i].set_xticks([])
        ax[i].set_yticks([])
        ax[i].set_title(band_names[i])
    cbar = fig.colorbar(im, ax=ax[i], fraction=0.046, pad=0.04)
    cbar.set_label(r'$R_{rs}\ (sr^{-1}$)', rotation=270, labelpad=15)
    fig.tight_layout()
    plt.show()
Index(['capture_1.tif', 'capture_2.tif', 'capture_3.tif', 'capture_4.tif',
       'capture_5.tif'],
      dtype='str', name='filename')
_images/primary_demo_23_1.png
_images/primary_demo_23_2.png
_images/primary_demo_23_3.png
_images/primary_demo_23_4.png
_images/primary_demo_23_5.png

We can also visualize the masked \(\mathrm{R}_{\mathrm{rs}}\):

[52]:
masked_rrs_imgs_gen = dronewq.load_imgs(img_dir=masked_rrs_dir, start=0, count=5)

img_metadata = dronewq.load_metadata(img_dir=masked_rrs_dir, count=5)

masked_rrs_imgs = np.array(list(masked_rrs_imgs_gen))

print(img_metadata.index)

for j in range(len(masked_rrs_imgs[0:5])):
    fig, ax = plt.subplots(1,5, figsize=(10,3.5))
    for i in range(5):
        im = ax[i].imshow(masked_rrs_imgs[j,i],cmap='viridis', vmin=0.001, vmax=0.02)
        ax[i].set_xticks([])
        ax[i].set_yticks([])
        ax[i].set_title(band_names[i])
    cbar = fig.colorbar(im, ax=ax[i], fraction=0.046, pad=0.04)
    cbar.set_label(r'$R_{rs}\ (sr^{-1}$)', rotation=270, labelpad=15)
    fig.tight_layout()
    plt.show()
Index(['capture_1.tif', 'capture_2.tif', 'capture_3.tif', 'capture_4.tif',
       'capture_5.tif'],
      dtype='str', name='filename')
_images/primary_demo_25_1.png
_images/primary_demo_25_2.png
_images/primary_demo_25_3.png
_images/primary_demo_25_4.png
_images/primary_demo_25_5.png

Let’s plot the masked \(\mathrm{R}_{\mathrm{rs}}\) spectra of the first 25 images:

[53]:
masked_rrs_imgs_gen = dronewq.load_imgs(img_dir = masked_rrs_dir, count=25)

masked_rrs_imgs = np.array(list(masked_rrs_imgs_gen))

fig, ax = plt.subplots(1,1, figsize=(6,3))

wv = [475, 560, 668, 717, 842]

color_map = plt.get_cmap("viridis")
colors = [color_map(i) for i in np.linspace(0, 1, len(masked_rrs_imgs))]

for i in range(len(masked_rrs_imgs)):
    plt.plot(wv, np.nanmean(masked_rrs_imgs[i,0:5,:,:],axis=(1,2)), marker = 'o', color=colors[i], label="")
    plt.xlabel('Wavelength (nm)')
    plt.ylabel(r'$R_{rs}\ (sr^{-1}$)')
plt.plot(wv, np.nanmean(masked_rrs_imgs[:,0:5,:,:], axis=(0,2,3)),  marker = 'o', color='black', linewidth=5, label='Mean')

plt.legend(frameon=False)
[53]:
<matplotlib.legend.Legend at 0x7fad35002e90>
_images/primary_demo_27_1.png

Let’s compare the \(\mathrm{R}{\mathrm{rs}}\) spectra from our MicaSense data to an \(\mathrm{R}{\mathrm{rs}}\) spectrum retrieved by a satellite at the same location and time. Below is a plot of \(\mathrm{R}{\mathrm{rs}}\) from the Ocean and Land Color Instrument (OLCI) on board the European Sentinel-3A satellite. OLCI measures additional spectral bands, which is why there are more points on the plot. Despite this, the overall spectral shape closely matches the MicaSense-derived \(\mathrm{R}{\mathrm{rs}}\). The troughs at 475 nm and 668 nm correspond to chlorophyll absorption, while the peak at 717 nm is likely due to scattering by phytoplankton particles.

|56d53b86a77846ba9986642fe1d60424|

Let’s plot the \(\mathrm{L}{\mathrm{t}}\), \(\mathrm{L}{\mathrm{w}}\), \(\mathrm{E}{\mathrm{d}}\) spectra as well. First, we need to retrieve all the data.

[54]:
lt_imgs = dronewq.load_imgs(img_dir = settings.lt_dir, count=25)
lw_imgs = dronewq.load_imgs(img_dir = settings.lw_dir, count=25)
csv_path = os.path.join(settings.output_dir, 'dls_ed.csv')
dls_ed = pd.read_csv(csv_path)
rrs_imgs = dronewq.load_imgs(img_dir = rrs_dir, count=25)
masked_rrs_imgs = dronewq.load_imgs(img_dir = masked_rrs_dir, count=25)
[55]:
fig, axs = plt.subplots(2,2, figsize=(10,5))

axs = axs.ravel()

wv = [475, 560, 668, 717, 842]
colors = plt.get_cmap("viridis")(np.linspace(0,1,25))

#lt
total_mean = []
for i, img in enumerate(lt_imgs):
    img_mean = np.nanmean(img[0:5,:,:], axis=(1,2))
    total_mean.append(img_mean)
    axs[0].plot(wv, img_mean,  marker = 'o', color=colors[i], label="")
    axs[0].set_xlabel('Wavelength (nm)')
    axs[0].set_ylabel(r'$L_t\ (mW\ m^2\ sr^{-1}\ nm^{-1}$)')
axs[0].plot(wv, np.mean(total_mean, axis=0),  marker = 'o', color='black', linewidth=5, label='Mean')
axs[0].legend(frameon=False)

#dls ed
dls_colors = plt.get_cmap("viridis")(np.linspace(0,1,len(dls_ed)))

for i in range(len(dls_ed)):
    axs[1].plot(wv, dls_ed.iloc[i,1:6],  marker = 'o', color=dls_colors[i])
    axs[1].set_xlabel('Wavelength (nm)')
    axs[1].set_ylabel(r'$E_d\ (mW\ m^2\ nm^{-1}$)')
axs[1].plot(wv, dls_ed.iloc[:,1:6].mean(axis=0),  marker = 'o', color='black', linewidth=5, label='Mean')

#rrs_imgs
total_mean = []
for i, img in enumerate(rrs_imgs):
    img_mean = np.nanmean(img[0:5,:,:],axis=(1,2))
    total_mean.append(img_mean)
    axs[2].plot(wv, img_mean,  marker = 'o', color=colors[i], label="")
    axs[2].set_xlabel('Wavelength (nm)')
    axs[2].set_ylabel(r'$R_{rs}\ (sr^{-1}$)')
axs[2].plot(wv, np.mean(total_mean, axis=0),  marker = 'o', color='black', linewidth=5, label='Mean')

#rrs_imgs_masked
total_mean = []
for i, img in enumerate(masked_rrs_imgs):
    img_mean = np.nanmean(img[0:5,:,:],axis=(1,2))
    total_mean.append(img_mean)
    axs[3].plot(wv, img_mean,  marker = 'o', color=colors[i], label="")
    axs[3].set_xlabel('Wavelength (nm)')
    axs[3].set_ylabel(r'$R_{rs}\ (sr^{-1}$)')
axs[3].plot(wv, np.mean(total_mean, axis=0),  marker = 'o', color='black', linewidth=5, label='Mean')

fig.tight_layout()
_images/primary_demo_31_0.png

The RrsPipeline class also saves images processed to total radiance (\(\mathrm{L}{\mathrm{t}}\)) as stacked red-green-blue (RGB) .jpg files in a directory called lt_thumbnails. These thumbnails provide a quick way to visually inspect the imagery and see what the water surface looked like during flight. Let’s open a few to take a look:

[56]:
def capture_path_to_int(path : str) -> int:
    return int(os.path.basename(path).split('_')[-1].split('.')[0])

thumbnail_path = os.path.join(settings.main_dir, 'lt_thumbnails')
lt_thumbnails = sorted(glob.glob(thumbnail_path + '/*.jpg'), key = capture_path_to_int)[10:]

fig, axs = plt.subplots(2,3, figsize=(6,4), sharex=True, sharey=True, layout='tight')
axs = axs.ravel()

for i in range(6):
    image = mpimg.imread(lt_thumbnails[i])
    axs[i].imshow(image)
    axs[i].set_title(os.path.basename(lt_thumbnails[i]))

plt.show()
_images/primary_demo_33_0.png

4. Convert to point samples

You may want to average values from each image to work with point-based samples. The code below calculates the median \(\mathrm{R}_{\mathrm{rs}}\) across bands for each image and saves the results to a Pandas DataFrame along with the directory name, latitude, and longitude.

[57]:
masked_rrs_imgs = dronewq.load_imgs(img_dir = masked_rrs_dir)
img_metadata = dronewq.load_metadata(img_dir = masked_rrs_dir)

# Compute per-image median for first 5 bands (shape -> (n_images, 5))
# We take median across spatial dims (H, W)
medians = []
for img in masked_rrs_imgs:
    # Handle the all-NaN case here
    if np.isnan(img[:5, :, :]).all():
        med = np.full(5, np.nan)  # or any default value
    else:
        med = np.nanmedian(img[:5, :, :], axis=(1, 2))
    medians.append(med)

# Build dataframe safely and assign median band values
df = img_metadata[['dirname', 'Latitude', 'Longitude']].copy()
df[['rrs_blue', 'rrs_green', 'rrs_red', 'rrs_rededge', 'rrs_nir']] = medians

# Show result
df.head()
[57]:
dirname Latitude Longitude rrs_blue rrs_green rrs_red rrs_rededge rrs_nir
filename
capture_1.tif ../Lake_Erie/hedley_dls_rrs/capture_1.tif 41.828528 -83.398762 0.007012 0.017829 0.008522 0.012956 0.005471
capture_2.tif ../Lake_Erie/hedley_dls_rrs/capture_2.tif 41.828670 -83.398764 0.007162 0.018282 0.008805 0.013387 0.005055
capture_3.tif ../Lake_Erie/hedley_dls_rrs/capture_3.tif 41.828809 -83.398760 0.007174 0.018256 0.008776 0.013331 0.005070
capture_4.tif ../Lake_Erie/hedley_dls_rrs/capture_4.tif 41.828956 -83.398761 0.007158 0.018260 0.008775 0.013308 0.005082
capture_5.tif ../Lake_Erie/hedley_dls_rrs/capture_5.tif 41.829081 -83.398763 0.007195 0.018344 0.008787 0.013316 0.005089

These values can be easily saved to a csv file:

[58]:
median_rrs_path = settings.output_dir / 'median_rrs.csv'
df.to_csv(median_rrs_path)

5. Apply bio-optical algorithms

We can also apply bio-optical algorithms to estimate chlorophyll a and total suspended matter (TSM) concentrations. For more information on each algorithm, see the Processing and theory section in the DroneWQ readthedocs. Here, we apply the blended NASA chlorophyll a algorithm (Hu et al., 2019).

[59]:
masked_rrs_imgs = dronewq.load_imgs(img_dir = masked_rrs_dir)

chl_hu_ocx_values = []
for img in masked_rrs_imgs:
    result = dronewq.chl_hu_ocx(img)
    chl_hu_ocx_values.append(result)

chl_hu_ocx_values = np.array(chl_hu_ocx_values)
print(chl_hu_ocx_values.shape)
(470, 928, 1227)

Let’s take a quick look at a histogram of all chlorophyll a values:

[60]:
plt.hist(chl_hu_ocx_values.ravel(), range=(0, 30))
print(np.nanmedian(chl_hu_ocx_values))
11.653076
_images/primary_demo_43_1.png
[61]:
# Cleanup Memory
del chl_hu_ocx_values

The median value is approximately 10 mg \(m^{-3}\), which is reasonable for Lake Erie.

Next, we can save new .tif files generated with this algorithm.

[62]:
dronewq.save_wq_imgs(rrs_dir = masked_rrs_dir, wq_algs = ['chl_hu_ocx'])
Processing images: 100%|██████████| 470/470 [00:29<00:00, 15.97it/s]

Let’s visualize a few chlorophyll a images.

[63]:
chl_imgs = dronewq.load_imgs(img_dir = settings.chl_hu_ocx_dir, start=0, count=6)
img_metadata = dronewq.load_metadata(img_dir = settings.chl_hu_ocx_dir, count=6)
print(img_metadata.index)

fig, axs = plt.subplots(2,3, figsize=(6,3), sharex=True, sharey=True, constrained_layout=True)

axs = axs.ravel()

for i, img in enumerate(chl_imgs):
    im = axs[i].imshow(img[0,:,:], cmap='Greens', vmin=0, vmax=20)

cbar_ax = fig.add_axes((0.99, 0.1, 0.02, 0.8))
fig.colorbar(im, cax=cbar_ax, label='Chlorophyll a (mg $m^{-3}$)')

plt.show()
Index(['capture_1.tif', 'capture_2.tif', 'capture_3.tif', 'capture_4.tif',
       'capture_5.tif', 'capture_6.tif'],
      dtype='str', name='filename')
_images/primary_demo_48_1.png

We can also compute and save the average chlorophyll a and TSM concentrations for each image in DataFrame, allowing them to be used as point-based data:

[64]:
df = pd.read_csv(median_rrs_path)
masked_rrs_imgs_hedley = dronewq.load_imgs(img_dir = masked_rrs_dir)

chl_hu_ocx_imgs = []
tsm_nechad_imgs = []
for img in masked_rrs_imgs_hedley:
    # Check if image is completely NaN
    if np.isnan(img[:5, :, :]).all():
        med = np.nan
        chl_hu_ocx_imgs.append(med)
        tsm_nechad_imgs.append(med)
        continue
    chl_median = np.nanmedian(dronewq.chl_hu_ocx(img), axis=(0,1))
    tsm_median = np.nanmedian(dronewq.tsm_nechad(img), axis=(0,1))
    chl_hu_ocx_imgs.append(chl_median)
    tsm_nechad_imgs.append(tsm_median)

chl_hu_ocx_imgs = np.array(chl_hu_ocx_imgs)
tsm_nechad_imgs = np.array(tsm_nechad_imgs)

df['chl_hu_ocx'] = chl_hu_ocx_imgs
df['tsm_nechad'] = tsm_nechad_imgs

del chl_hu_ocx_imgs
del tsm_nechad_imgs

df.head()
[64]:
filename dirname Latitude Longitude rrs_blue rrs_green rrs_red rrs_rededge rrs_nir chl_hu_ocx tsm_nechad
0 capture_1.tif ../Lake_Erie/hedley_dls_rrs/capture_1.tif 41.828528 -83.398762 0.007012 0.017829 0.008522 0.012956 0.005471 11.630777 4.799673
1 capture_2.tif ../Lake_Erie/hedley_dls_rrs/capture_2.tif 41.828670 -83.398764 0.007162 0.018282 0.008805 0.013387 0.005055 11.683230 4.905617
2 capture_3.tif ../Lake_Erie/hedley_dls_rrs/capture_3.tif 41.828809 -83.398760 0.007174 0.018256 0.008776 0.013331 0.005070 11.609665 4.894979
3 capture_4.tif ../Lake_Erie/hedley_dls_rrs/capture_4.tif 41.828956 -83.398761 0.007158 0.018260 0.008775 0.013308 0.005082 11.653489 4.894382
4 capture_5.tif ../Lake_Erie/hedley_dls_rrs/capture_5.tif 41.829081 -83.398763 0.007195 0.018344 0.008787 0.013316 0.005089 11.658862 4.898822

These values can be easily saved to a csv file:

[65]:
# Save as csv
median_rrs_wq_path = settings.output_dir / 'median_rrs_and_wq.csv'
df.to_csv(median_rrs_wq_path)

We can use this DataFrame to create maps of point samples showing the mean chlorophyll a and TSM concentrations:

[82]:
df = pd.read_csv(median_rrs_wq_path)
fig, ax = plt.subplots(1,2, figsize=(8,3), layout='tight')
g1 = ax[0].scatter(df['Latitude'], df['Longitude'], c=df['chl_hu_ocx'], cmap='Greens', vmin=10, vmax=12)
g2 = ax[1].scatter(df['Latitude'], df['Longitude'], c=df['tsm_nechad'], cmap='YlOrRd', vmin=4, vmax=6)

cbar = fig.colorbar(g1, ax=ax[0])
cbar.set_label('Chlorophyll a (mg $m^{-3}$)', rotation=270, labelpad=12)
cbar = fig.colorbar(g2, ax=ax[1])
cbar.set_label('TSM (mg/L)', rotation=270, labelpad=12)

plt.show()
_images/primary_demo_54_0.png

6. Georeference and mosaic chlorophyll a images

We can georeference the images using the sensor’s yaw, pitch, roll, latitude, longitude, and altitude. Note that georeferencing may be inaccurate if the yaw, pitch, and roll data are imprecise. Let’s review the options available in the georeference() function.

[67]:
?dronewq.georeference
Signature:
dronewq.georeference(
    metadata,
    input_dir,
    output_dir,
    lines=None,
    altitude=None,
    yaw=None,
    pitch=0,
    roll=0,
    axis_to_flip=1,
) -> None
Docstring:
Georeference a set of image captures using their metadata and orientation
parameters, producing georeferenced GeoTIFF files.

For each capture, this function builds an affine transformation using:
- focal length,
- sensor size,
- image dimensions,
- GPS coordinates,
- yaw/pitch/roll (from metadata or provided overrides).

The function then applies the transform and saves the output as a GeoTIFF
in `output_dir`. Captures may be restricted to specific flight lines
(as produced by `compute_flight_lines()`).

Parameters
----------
metadata : pandas.DataFrame
    DataFrame containing per-image metadata. Must include:
    FocalLength, SensorX, SensorY, ImageWidth, ImageHeight,
    Latitude, Longitude, Altitude, Pitch, Roll, Yaw, filename.

input_dir : str
    Directory containing the source images.

output_dir : str
    Directory where georeferenced images are written.

lines : list[dict] or None, optional
    Flight line definitions such as those returned by
    `compute_flight_lines()`. If None, all captures are processed.

altitude : float or None, optional
    Override altitude for all captures. If None, metadata altitude is used.

yaw : float or None, optional
    Override yaw for all captures. If None, metadata yaw is used.

pitch : float, optional
    Override pitch angle for all captures. Default is 0.

roll : float, optional
    Override roll angle for all captures. Default is 0.

axis_to_flip : int or None, optional
    Axis along which to flip the image before writing. Default is 1.
    Set to None to disable flipping.

Returns
-------
None
    Writes georeferenced `.tif` files into `output_dir`.
File:      ~/Developer/DroneWQ/src/dronewq/core/georeference.py
Type:      function

There are two options for georeferencing: you can either set fixed parameters or allow the function to use values saved in the image metadata.

In this example, the image metadata lists the altitude of ~260-265 m, but the flight was planned for 87 m. We will therfore use a set altitude of 87 m.

For the sensor yaw angle: if it remains consistent throughout the flight, a single fixed value can be used. If the yaw changes between transects, as in this example, you should provide a separate yaw value for each transect. The georeference() function can accept a lines parameter, which can be created manually or generated automatically using the compute_flight_lines() function. This produces a list of capture indices per transect with the median yaw angle, ignoring images collected during turns. More details are available in the Processing and Theory section of the DroneWQ Read the Docs.

It is recommended to test the georeferencing on a few captures containing shoreline or land features. You can visualize these against a satellite basemap or import them into GIS software to assess accuracy. In this example, we created subsets in new folders (rrs_hedley_subset and lt_thumbnails_subset) with captures 433–450, which include the shoreline, for testing.

Option 1: Using georeference() with the automatic compute_flight_lines() function generates a flight_lines list, where the yaw angle varies from 207° to 25°.

[68]:
rrs_imgs_subset_path = settings.output_dir / 'rrs_imgs_subset'
img_metadata = dronewq.load_metadata(img_dir = rrs_imgs_subset_path, start=432, count=18)
img_metadata = img_metadata.reset_index()

flight_lines = dronewq.compute_flight_lines(img_metadata.Yaw, 87, 0, 0, threshold=100)
flight_lines
[68]:
[{'start': 0,
  'end': 9,
  'yaw': 207.89967631978038,
  'pitch': 0,
  'roll': 0,
  'alt': 87},
 {'start': 10,
  'end': 18,
  'yaw': 25.85588214075483,
  'pitch': 0,
  'roll': 0,
  'alt': 87}]

We can update the metadata to change file extensions from .tif to .jpg and then run georeference() on the lt_thumbnails_subset:

[69]:
img_metadata['filename'] = img_metadata['filename'].apply(lambda x : x.replace('.tif', '.jpg'))

input_dir = settings.main_dir / 'lt_thumbnails_subset'
output_dir = settings.main_dir / 'automatic_georeferenced_lt_thumbnails_subset'
lines = flight_lines

dronewq.georeference(img_metadata, input_dir, output_dir, lines)
100%|██████████| 17/17 [00:00<00:00, 89.78it/s]

When we plot these images in QGIS, it looks like: |3cc51a723b854be8952029b48dd1d912|

You can observe that using the metadata yaw angle for georeferencing does not always produce accurate results.

Option 2: Running georeference() with fixed yaw angles of 90° and 270°.

[70]:
flight_lines = dronewq.compute_flight_lines(img_metadata.Yaw, 87, 0, 0, threshold=100)

even_yaw = 90
odd_yaw = 270
threshold = np.median(img_metadata.Yaw)
for line in flight_lines:
    line.update(yaw = even_yaw if line['yaw'] < threshold else odd_yaw)

flight_lines
[70]:
[{'start': 0, 'end': 9, 'yaw': 270, 'pitch': 0, 'roll': 0, 'alt': 87},
 {'start': 10, 'end': 18, 'yaw': 90, 'pitch': 0, 'roll': 0, 'alt': 87}]
[71]:
input_dir = settings.main_dir / 'lt_thumbnails_subset'
output_dir = settings.main_dir / 'manual_georeferenced_lt_thumbnails_subset'
lines = flight_lines

dronewq.georeference(img_metadata, input_dir, output_dir, lines)
100%|██████████| 17/17 [00:00<00:00, 87.10it/s]

When we plot these images in QGIS: |e289855cc3ea4043b75325e595c6df80|

Using these fixed yaw angles results in improved georeferencing, with the shoreline aligning more closely with the satellite basemap.

Therefore, we will apply fixed yaw angles of 90° and 270° to georeference the masked_chl_hu_ocx_imgs:

[72]:
metadata = pd.read_csv(metadata_path)

flight_lines = dronewq.compute_flight_lines(metadata.Yaw, 87, 0, 0)
even_yaw = 90
odd_yaw = 270
threshold = np.median(metadata.Yaw)
for line in flight_lines:
    line.update(yaw = even_yaw if line['yaw'] < threshold else odd_yaw)
flight_lines[0:5]
[72]:
[{'start': 11, 'end': 17, 'yaw': 90, 'pitch': 0, 'roll': 0, 'alt': 87},
 {'start': 22, 'end': 32, 'yaw': 270, 'pitch': 0, 'roll': 0, 'alt': 87},
 {'start': 34, 'end': 42, 'yaw': 90, 'pitch': 0, 'roll': 0, 'alt': 87},
 {'start': 46, 'end': 56, 'yaw': 270, 'pitch': 0, 'roll': 0, 'alt': 87},
 {'start': 58, 'end': 66, 'yaw': 90, 'pitch': 0, 'roll': 0, 'alt': 87}]
[73]:
input_dir = settings.output_dir / 'masked_chl_hu_ocx_imgs'
output_dir = settings.output_dir / 'georeferenced_masked_chl_hu_ocx'
lines = flight_lines

dronewq.georeference(metadata, input_dir, output_dir, lines)
100%|██████████| 328/328 [00:07<00:00, 43.96it/s]

Let’s plot a few of these georeferenced images by the shoreline:

[74]:
fig, ax = plt.subplots(nrows = 1, ncols = 1, figsize=(12, 6), subplot_kw = dict(projection = Mercator()))
ax_0, mappable_0 = dronewq.plot_georeferenced_data(ax=ax, filename=f"{settings.output_dir}/georeferenced_masked_chl_hu_ocx/capture_444.tif",
                                           vmin=5, vmax=20, cmap='Greens', norm=None, zoom=19, basemap = cx.providers.OpenStreetMap.Mapnik)

ax_0, mappable_0 = dronewq.plot_georeferenced_data(ax=ax, filename=f"{settings.output_dir}/georeferenced_masked_chl_hu_ocx/capture_445.tif",
                                           vmin=5, vmax=20, cmap='Greens', norm=None, zoom=19, basemap = cx.providers.OpenStreetMap.Mapnik)

ax_0.set_title('')
plt.colorbar(mappable_0, label='Chlorophyll a (mg $m^{-3}$)')
[74]:
<matplotlib.colorbar.Colorbar at 0x7fad34572ad0>
_images/primary_demo_71_1.png

We can create a mosaic from all the individual georeferenced images. Let’s review the available options.

[75]:
?dronewq.mosaic
Signature:
dronewq.mosaic(
    input_dir,
    output_dir,
    output_name,
    method='mean',
    dtype=<class 'numpy.float32'>,
    band_names=None,
)
Docstring:
Mosaic multiple raster files into a single GeoTIFF.

This function reads all raster files in `input_dir` and merges them into
one mosaic using the specified merging method. If `band_names` is provided,
the mosaicked result is written as multiple single-band files—one file per
band—named using the provided band names. Otherwise, a single multi-band
raster is written.

Parameters
----------
input_dir : str
    Path to a directory containing the input rasters to mosaic.

output_dir : str
    Path to the directory where the output raster(s) will be saved.
    The directory is created if it does not exist.

output_name : str
    Base filename (without extension) for the mosaicked output.

method : {"mean", "first", "min", "max"}, optional
    Method used to combine overlapping pixels when multiple rasters
    cover the same area. Default is `"mean"`.

dtype : numpy.dtype, optional
    Data type of the output raster. Defaults to `np.float32`.

band_names : list[str] or None, optional
    If provided, one output file will be created per band, using the given
    band names (e.g., `["red", "green", "blue"]`).
    The number of entries must match the number of bands in the input
    rasters.

Returns
-------
str
    The filepath of the generated mosaic (or the first band file if
    `band_names` is provided).

Notes
-----
- This function assumes all rasters share the same coordinate reference
  system (CRS).
- When multiple input files are found, the output extent is computed to
  cover all rasters.
File:      ~/Developer/DroneWQ/src/dronewq/core/mosaic.py
Type:      function

We will make a mosaic from all the georeferenced chlorophyll a images, calculating the mean of overlapping pixels.

[76]:
input_dir = settings.output_dir / 'georeferenced_masked_chl_hu_ocx'
output_dir = settings.output_dir / 'mosaic_chl_hu_ocx'

dronewq.mosaic(input_dir, output_dir, output_name = 'mean_mosaic_chl_hu_ocx',
           method='mean', band_names = None)
100%|██████████| 328/328 [00:40<00:00,  8.01it/s]
[76]:
'../Lake_Erie/hedley_dls_rrs/mosaic_chl_hu_ocx/mean_mosaic_chl_hu_ocx.tif'

Now let’s plot the mosaicked image.

[77]:
fig, ax = plt.subplots(nrows = 1, ncols = 1, figsize=(12, 6), subplot_kw = dict(projection = Mercator()))
ax_0, mappable_0 = dronewq.plot_georeferenced_data(ax=ax, filename=f"{settings.output_dir}/mosaic_chl_hu_ocx/mean_mosaic_chl_hu_ocx.tif",
                                           vmin=0, vmax=20, cmap='Greens', norm=None, basemap = cx.providers.Esri.WorldImagery)
ax_0.set_title('')
plt.colorbar(mappable_0, label='Chlowrophyll a (mg $m^{-3}$)')
[77]:
<matplotlib.colorbar.Colorbar at 0x7fadbdd35310>
_images/primary_demo_77_1.png

We can also downsample the mosaic to reduce its spatial resolution. This is useful for decreasing the file size when working with large mosaics.

[78]:
?dronewq.downsample
Signature:
dronewq.downsample(
    input_tif,
    output_dir,
    scale_x,
    scale_y,
    method=<Resampling.average: 5>,
)
Docstring:
Downsample a raster by reducing its spatial resolution.

This function reads the input raster and resizes it by the specified
factors (`scale_x`, `scale_y`) using the given resampling method.
The resulting downsampled raster is saved to `output_dir`.

Parameters
----------
input_tif : str
    Path to the input GeoTIFF to downsample.

output_dir : str
    Directory where the downsampled raster will be saved.
    The directory is created if it does not exist.

scale_x : int
    Horizontal downsampling factor.
    For example, `scale_x=2` halves the raster width.

scale_y : int
    Vertical downsampling factor.
    For example, `scale_y=2` halves the raster height.

method : rasterio.enums.Resampling, optional
    The resampling algorithm to apply (e.g., `Resampling.average`,
    `Resampling.nearest`, `Resampling.bilinear`).
    Default is `Resampling.average`.

Returns
-------
str
    The filepath of the generated downsampled raster.

Notes
-----
- The affine transform is automatically scaled to match the new resolution.
- For a full list of resampling methods, see Rasterio's documentation.
File:      ~/Developer/DroneWQ/src/dronewq/core/mosaic.py
Type:      function

We will downsample the width and height of the mosaic by 15%, using a nearest neighbor resampling algorithm.

[79]:
input_dir = settings.output_dir / 'mosaic_chl_hu_ocx' / "mean_mosaic_chl_hu_ocx.tif"
output_dir = settings.output_dir / 'mosaic_chl_hu_ocx'

dronewq.downsample(input_dir, output_dir, scale_x = 15, scale_y = 15, method = Resampling.nearest)
[79]:
'../Lake_Erie/hedley_dls_rrs/mosaic_chl_hu_ocx/mean_mosaic_chl_hu_ocx_x_15_y_15_method_nearest.tif'

And let’s plot it:

[80]:
fig, ax = plt.subplots(nrows = 1, ncols = 1, figsize=(12, 6), subplot_kw = dict(projection = Mercator()))
ax_0, mappable_0 = dronewq.plot_georeferenced_data(ax=ax, filename=f"{settings.output_dir}/mosaic_chl_hu_ocx/mean_mosaic_chl_hu_ocx_x_15_y_15_method_nearest.tif",
                                           vmin=5, vmax=15, cmap='Greens', norm=None, basemap = cx.providers.Esri.WorldImagery)
ax_0.set_title('')
plt.colorbar(mappable_0, label='Chlorophyll a (mg $m^{-3}$)')
[80]:
<matplotlib.colorbar.Colorbar at 0x7fad35190690>
_images/primary_demo_83_1.png

Some striping in the mosaic is visible because the drone/sensor changes yaw angles between transects. Processing and mosaicking a larger dataset could reveal broader water quality patterns or the presence of algal blooms.