Demonstration of `DroneWQ` functions and processing code

Summary

This notebook provides an example workflow of DroneWQ using multispectral images collected by the MicaSense RedEdge-MX sensor over western Lake Erie on August 17, 2022. The workflow processes raw imagery to remote sensing reflectance and applies bio-optical algorithms to retrieve chlorophyll a concentrations. It georeferences the images producing a final chlorophyll a mosaic of the full dataset.

The Lake Erie drone dataset can be found in the Zenodo DOI, please note it is 5.84 GB unzipped. Depending on computer speed, you may need to subset the data before running the code workflow shown here.

1. Setup

Pull in all the libraries needed for this notebook.

[1]:

import os
import numpy as np
import matplotlib.pyplot as plt
from cartopy.crs import Mercator
import matplotlib.image as mpimg

plt.rcParams['mathtext.default'] = 'regular'

from utils import *

Make sure your imagery has been manually organized into this exact structure (name the main_dir whatever you want but keep the rest exactly as named here):

<main_dir>/
    raw_water_imgs/
    align_img/
    raw_sky_imgs/
    panel/

With that structure now just specify the path to your main_dir below. Here, we have the main_dir called ‘Lake_Erie’.

[2]:

project_path = os.path.join('Lake_Erie')
project_path

[2]:

'Lake_Erie'

2. View metadata

MicaSense raw images contain metatdata including imager information, GPS coordianates, and more. We can use the write_metadata_csv() function to extract image metadata and save it as .csv.

Let’s open the first five lines of the .csv to take a look.

[3]:

write_metadata_csv(img_set = imageset.ImageSet.from_directory(project_path+'/raw_water_imgs'),
                                                            csv_output_path=project_path)

img_metadata = pd.read_csv(os.path.join(project_path, 'metadata.csv'))
img_metadata.head()

Loading ImageSet from: Lake_Erie/raw_water_imgs

[3]:

	filename	dirname	DateStamp	TimeStamp	Latitude	LatitudeRef	Longitude	LongitudeRef	Altitude	SensorX	...	FocalLength	Yaw	Pitch	Roll	SolarElevation	ImageWidth	ImageHeight	XResolution	YResolution	ResolutionUnits
0	capture_1.tif	Lake_Erie/capture_1.tif	2022-08-17	15:37:04	41.828528	N	-83.398762	W	265.061	4.8	...	5.43432	218.376096	25.899967	335.837017	0.895362	1280	960	266.666667	266.666667	mm
1	capture_2.tif	Lake_Erie/capture_2.tif	2022-08-17	15:37:08	41.828670	N	-83.398764	W	265.554	4.8	...	5.43432	221.427592	16.821391	341.184053	0.895530	1280	960	266.666667	266.666667	mm
2	capture_3.tif	Lake_Erie/capture_3.tif	2022-08-17	15:37:10	41.828809	N	-83.398760	W	265.968	4.8	...	5.43432	218.898605	16.695405	343.643092	0.895655	1280	960	266.666667	266.666667	mm
3	capture_4.tif	Lake_Erie/capture_4.tif	2022-08-17	15:37:13	41.828956	N	-83.398761	W	266.420	4.8	...	5.43432	216.716936	18.188710	345.889851	0.895738	1280	960	266.666667	266.666667	mm
4	capture_5.tif	Lake_Erie/capture_5.tif	2022-08-17	15:37:16	41.829081	N	-83.398763	W	267.018	4.8	...	5.43432	214.429136	18.385695	347.772515	0.895863	1280	960	266.666667	266.666667	mm

5 rows × 21 columns

We can plot the altitude and lat,lon, and yaw angle metadata of the image captures to get a sense of the flight plan.

[4]:

fig, ax = plt.subplots(1,3, figsize=(10,3), layout='tight')

ax[0].plot(list(range(len(img_metadata))),img_metadata['Altitude'])
ax[0].set_ylabel('Altitude (m)')

ax[1].scatter(img_metadata['Longitude'], img_metadata['Latitude'])
ax[1].set_ylabel('Latitude')
ax[1].set_xlabel('Longitude')

ax[2].plot(list(range(len(img_metadata))),img_metadata['Yaw'])
ax[2].set_ylabel('Yaw')

[4]:

Text(0, 0.5, 'Yaw')

3. Process raw imagery to remote sensing reflectance (Rrs) and visualize data

Now let’s get to processing. The process_raw_to_rrs() function has a number of options that you can see here:

[5]:

?process_raw_to_rrs

Signature:
process_raw_to_rrs(
    main_dir,
    rrs_dir_name,
    output_csv_path,
    lw_method='mobley_rho_method',
    mask_pixels=False,
    random_n=10,
    pixel_masking_method='value_threshold',
    mask_std_factor=1,
    nir_threshold=0.01,
    green_threshold=0.005,
    ed_method='dls_ed',
    overwrite_lt_lw=False,
    clean_intermediates=True,
)
Docstring:
This functions is the main processing script that processs raw imagery to units of remote sensing reflectance (Rrs). Users can select which processing parameters to use to calculate Rrs.

Inputs:
main_dir: A string containing the main image directory
rrs_dir_name: A string containing the directory filepath of new rrs images
output_csv_path: A string containing the filepath to write the metadata.csv
lw_method: Method used to calculate water leaving radiance. Default is mobley_rho_method().
random_n: The amount of random images to calculate ambient NIR level. Default is 10. Only need if lw_method = 'hedley_method'
mask_pixels: Option to mask pixels containing specular sun glint, shadowing, adjacent vegetation, etc. Default is False.
pixel_masking_method: Method to mask pixels. Options are 'value_threshold' or 'std_threshold'. Default is value_threshold.
mask_std_factor: A factor to multiply to the standard deviation of NIR values. Default is 1. Only need if pixel_masking_method = 'std_threshold'
nir_threshold: An Rrs(NIR) value where pixels above this will be masked. Default is 0.01. These are usually pixels of specular sun glint or land features. Only need if pixel_masking_method = 'value_threshold'.
green_threshold: A Rrs(green) value where pixels below this will be masked. Default is 0.005. These are usually pixels of vegetation shadowing.  Only need if pixel_masking_method = 'value_threshold'.
ed_method: Method used to calculate downwelling irradiance (Ed). Default is dls_ed().
overwrite_lt_lw: Option to overwrite lt and lw files that have been written previously. Default is False but this is only applied to the Lt images.
clean_intermediates: Option to erase intermediates of processing (Lt, Lw, unmasked Rrs)

Output: New Rrs tifs (masked or unmasked) with units of sr^-1.
File:      /glusteruser/awindled/DroneWQ/utils.py
Type:      function

For this example, we’re calculating water leaving radiance (Lw) using the Hedley approach, calculating downwelling irradiance (Ed) using the downwelling light sensor (DLS). We are applying the default masking procedure with a NIR threshold of 0.02. We’ll save processed images out two directories called rrs_hedley and masked_rrs_hedley. Please see paper.md for more information on the different processing methods used here.

In summary, this code will process: Raw -> Lt -> Lw (Hedley method) -> Rrs (using Ed from DLS) with pixel masking.

Note that we are changing the default nir_threshold for pixel masking from 0.01 to 0.02 since 0.01 was not high enough to mask glint pixels in this dataset. It is important to consider these values since they may change according to your dataset.

[6]:

process_raw_to_rrs(main_dir=project_path, rrs_dir_name='rrs_hedley', output_csv_path=project_path, lw_method='hedley_method',
                   random_n=10, mask_pixels=True, pixel_masking_method='value_threshold', nir_threshold=0.02, ed_method='dls_ed',
                   overwrite_lt_lw=False, clean_intermediates=False)

Loading ImageSet from: Lake_Erie/align_img
Processing a total of 2351 images or 470 captures.
Converting raw images to radiance (raw -> Lt).
Loading ImageSet from: Lake_Erie/raw_water_imgs
Loading ImageSet from: Lake_Erie/align_img
used warp dir Lake_Erie/align_img
Aligning images. Depending on settings this can take from a few seconds to many minutes
Finished aligning band 0
Finished aligning band 1
Finished aligning band 2
Finished aligning band 3
Finished aligning band 4
Saving time: 0:00:00.525930
Alignment+Saving rate: 893.66 images per second
Finished saving images.
Finished saving image metadata.
Applying the Hochberg/Hedley (Lt -> Lw)
Normalizing by DLS irradiance (Lw/Ed -> Rrs).
Loading ImageSet from: Lake_Erie/raw_water_imgs
All data has been saved as Rrs using the hedley_method to calculate Lw and normalized by dls_ed irradiance.
Masking pixels using NIR and green Rrs thresholds

[6]:

True

To grab these processed images and their metadata you can use the helper function retrieve_imgs_and_metadata(). The images can also easily be loaded in with the lower level load_images() function or even just with rasterio’s open() function but the higher level retrieve_imgs_and_metadata() allows you to apply an altitude cutoff and limit the number of files being opened. In the case of large processing jobs it will likely be more appropriate to loop through the files individually or in batches which can be done by looping the start and count arguments in retrieve_imgs_and_metadata().

[7]:

?retrieve_imgs_and_metadata

Signature:
retrieve_imgs_and_metadata(
    img_dir,
    count=10000,
    start=0,
    altitude_cutoff=0,
    sky=False,
    random=False,
)
Docstring:
This function is the main interface we expect the user to use when grabbing a subset of imagery from any stage in processing. This returns the images as a numpy array and metadata as a pandas dataframe.

Inputs:
img_dir: A string containing the directory filepath of images to be retrieved
count: The amount of images you want to list. Default is 10000
start: The number of image to start on. Default is 0 (first image in img_dir).
random: A boolean to load random images. Default is False

Outputs: A multidimensional numpy array of all image captures in a directory and a Pandas dataframe of image metadata.
File:      /glusteruser/awindled/DroneWQ/utils.py
Type:      function

Let’s retrieve the Rrs data (5 bands) and visualize the first 5 images:

[8]:

rrs_imgs_hedley, img_metadata = retrieve_imgs_and_metadata(img_dir = os.path.join(project_path, 'rrs_hedley'), start=0, count=5)
print(img_metadata.index)

band_names = ['blue', 'green', 'red', 'red edge', 'nir']

for j in range(len(rrs_imgs_hedley[0:5])):
    fig, ax = plt.subplots(1,5, figsize=(10,3.5))
    for i in range(5):
        im = ax[i].imshow(rrs_imgs_hedley[j,i], cmap='viridis', vmin=0.001, vmax=0.02)
        ax[i].set_xticks([])
        ax[i].set_yticks([])
        ax[i].set_title(band_names[i])
    cbar = fig.colorbar(im, ax=ax[i], fraction=0.046, pad=0.04)
    cbar.set_label('$R_{rs}\ (sr^{-1}$)', rotation=270, labelpad=15)
    fig.tight_layout()
    plt.show()

Index(['capture_1.tif', 'capture_2.tif', 'capture_3.tif', 'capture_4.tif',
       'capture_5.tif'],
      dtype='object', name='filename')

We can also visualize the masked Rrs:

[9]:

masked_rrs_imgs_hedley, img_metadata = retrieve_imgs_and_metadata(img_dir = os.path.join(project_path, 'masked_rrs_hedley'), start=0, count=5)
print(img_metadata.index)

for j in range(len(masked_rrs_imgs_hedley[0:5])):
    fig, ax = plt.subplots(1,5, figsize=(10,3.5))
    for i in range(5):
        im = ax[i].imshow(masked_rrs_imgs_hedley[j,i],cmap='viridis', vmin=0.001, vmax=0.02)
        ax[i].set_xticks([])
        ax[i].set_yticks([])
        ax[i].set_title(band_names[i])
    cbar = fig.colorbar(im, ax=ax[i], fraction=0.046, pad=0.04)
    cbar.set_label('$R_{rs}\ (sr^{-1}$)', rotation=270, labelpad=15)
    fig.tight_layout()
    plt.show()

Index(['capture_1.tif', 'capture_2.tif', 'capture_3.tif', 'capture_4.tif',
       'capture_5.tif'],
      dtype='object', name='filename')

Let’s plot the masked Rrs spectra of the first 25 images:

[10]:

masked_rrs_imgs_hedley, img_metadata = retrieve_imgs_and_metadata(img_dir = os.path.join(project_path, 'masked_rrs_hedley'), count=25)

fig, ax = plt.subplots(1,1, figsize=(6,3))

wv = [475, 560, 668, 717, 842]
colors = plt.cm.viridis(np.linspace(0,1,len(masked_rrs_imgs_hedley)))

for i in range(len(masked_rrs_imgs_hedley)):
    plt.plot(wv, np.nanmean(masked_rrs_imgs_hedley[i,0:5,:,:],axis=(1,2)), marker = 'o', color=colors[i], label="")
    plt.xlabel('Wavelength (nm)')
    plt.ylabel('$R_{rs}\ (sr^{-1}$)')
plt.plot(wv, np.nanmean(masked_rrs_imgs_hedley[:,0:5,:,:], axis=(0,2,3)),  marker = 'o', color='black', linewidth=5, label='Mean')

plt.legend(frameon=False)

[10]:

<matplotlib.legend.Legend at 0x7f9ed1712530>

Let’s compare this Rrs spectra to Rrs spectra retrived by a satellite at the same location and time. Below is a plot of Rrs from the Ocean Land Color Imager (OLCI) on board the European satellite, Sentinel-3A. OLCI collects in a few more bands which is why there are more points on the plot. However, the shape looks similar to the MicaSense retrieved Rrs. The troughs at 475 nm and 668 nm is associated with chlorophyll absorption and the peak at 717 nm is likely due to phytoplankton particle scattering.

Lake Erie OLCI

Let’s plot the Lt, Lw, Ed spectra as well. First, we need to retrieve all the data.

[11]:

lt_imgs, img_metadata = retrieve_imgs_and_metadata(img_dir = project_path+'/lt_imgs', count=25)
lw_imgs, img_metadata = retrieve_imgs_and_metadata(img_dir = project_path+'/lw_imgs', count=25)
dls_ed = pd.read_csv(project_path+'/dls_ed.csv')
rrs_imgs_hedley, img_metadata = retrieve_imgs_and_metadata(img_dir = project_path+'/rrs_hedley', count=25)
masked_rrs_imgs_hedley, img_metadata = retrieve_imgs_and_metadata(img_dir = project_path+'/masked_rrs_hedley', count=25)

[12]:

fig, axs = plt.subplots(2,2, figsize=(10,5))

axs = axs.ravel()

wv = [475, 560, 668, 717, 842]
colors = plt.cm.viridis(np.linspace(0,1,len(lt_imgs)))

#lt
for i in range(len(lt_imgs)):
    axs[0].plot(wv, np.nanmean(lt_imgs[i,0:5,:,:],axis=(1,2)),  marker = 'o', color=colors[i], label="")
    axs[0].set_xlabel('Wavelength (nm)')
    axs[0].set_ylabel('$L_t\ (mW\ m^2\ sr^{-1}\ nm^{-1}$)')
axs[0].plot(wv, np.nanmean(lt_imgs[:,0:5,:,:], axis=(0,2,3)),  marker = 'o', color='black', linewidth=5, label='Mean')
axs[0].legend(frameon=False)

#dls ed
ed = pd.read_csv(project_path+'/dls_ed.csv')
colors = plt.cm.viridis(np.linspace(0,1,len(ed)))

for i in range(len(ed)):
    axs[1].plot(wv, ed.iloc[i,1:6],  marker = 'o', color=colors[i])
    axs[1].set_xlabel('Wavelength (nm)')
    axs[1].set_ylabel('$E_d\ (mW\ m^2\ nm^{-1}$)')
axs[1].plot(wv, ed.iloc[:,1:6].mean(axis=0),  marker = 'o', color='black', linewidth=5, label='Mean')

#rrs_imgs_hedley
colors = plt.cm.viridis(np.linspace(0,1,len(rrs_imgs_hedley)))

for i in range(len(rrs_imgs_hedley)):
    axs[2].plot(wv, np.nanmean(rrs_imgs_hedley[i,0:5,:,:],axis=(1,2)),  marker = 'o', color=colors[i], label="")
    axs[2].set_xlabel('Wavelength (nm)')
    axs[2].set_ylabel('$R_{rs}\ (sr^{-1}$)')
axs[2].plot(wv, np.nanmean(rrs_imgs_hedley[:,0:5,:,:], axis=(0,2,3)),  marker = 'o', color='black', linewidth=5, label='Mean')


#rrs_imgs_hedley_masked
colors = plt.cm.viridis(np.linspace(0,1,len(masked_rrs_imgs_hedley)))

for i in range(len(masked_rrs_imgs_hedley)):
    axs[3].plot(wv, np.nanmean(masked_rrs_imgs_hedley[i,0:5,:,:],axis=(1,2)),  marker = 'o', color=colors[i], label="")
    axs[3].set_xlabel('Wavelength (nm)')
    axs[3].set_ylabel('$R_{rs}\ (sr^{-1}$)')
axs[3].plot(wv, np.nanmean(masked_rrs_imgs_hedley[:,0:5,:,:], axis=(0,2,3)),  marker = 'o', color='black', linewidth=5, label='Mean')

fig.tight_layout()

The process_raw_to_rrs() function also saves images processed to total radiance (Lt) as stacked red-green-blue (RGB) .jpg’s in a folder called ‘lt_thumbnails’. This is an easy way to quickly view the images to see what the water surface looked like. Let’s open a few to take a look:

[13]:

def capture_path_to_int(path : str) -> int:
    return int(os.path.basename(path).split('_')[-1].split('.')[0])

lt_thumbnails = sorted(glob.glob(project_path + '/lt_thumbnails/*.jpg'), key = capture_path_to_int)[10:]

fig, axs = plt.subplots(2,3, figsize=(6,4), sharex=True, sharey=True, layout='tight')
axs = axs.ravel()

for i in range(6):
    image = mpimg.imread(lt_thumbnails[i])
    axs[i].imshow(image)
    axs[i].set_title(os.path.basename(lt_thumbnails[i]))

plt.show()

4. Convert to point samples

You might be interested in averaging values for every image, to work with point samples. This code calculates the median Rrs across bands and saves them to a Pandas dataframe with the dirname, lat, and lon.

[14]:

masked_rrs_imgs_hedley, img_metadata = retrieve_imgs_and_metadata(img_dir = project_path+'/masked_rrs_hedley')

rrs_blue = []
rrs_green = []
rrs_red = []
rrs_rededge = []
rrs_nir= []

for i in range(len(masked_rrs_imgs_hedley)):
    rrs_blue.append(np.nanmedian(masked_rrs_imgs_hedley[i,0,:,:]))
    rrs_green.append(np.nanmedian(masked_rrs_imgs_hedley[i,1,:,:]))
    rrs_red.append(np.nanmedian(masked_rrs_imgs_hedley[i,2,:,:]))
    rrs_rededge.append(np.nanmedian(masked_rrs_imgs_hedley[i,3,:,:]))
    rrs_nir.append(np.nanmedian(masked_rrs_imgs_hedley[i,4,:,:]))

df = img_metadata[['dirname', 'Latitude', 'Longitude']]
df['rrs_blue'] = rrs_blue
df['rrs_green'] = rrs_green
df['rrs_red'] = rrs_red
df['rrs_rededge'] = rrs_rededge
df['rrs_nir'] = rrs_nir
df.head()

[14]:

	dirname	Latitude	Longitude	rrs_blue	rrs_green	rrs_red	rrs_rededge	rrs_nir
filename
capture_1.tif	Lake_Erie/capture_1.tif	41.828528	-83.398762	0.007600	0.018665	0.009293	0.013961	0.004451
capture_2.tif	Lake_Erie/capture_2.tif	41.828670	-83.398764	0.007605	0.018555	0.009364	0.013934	0.004396
capture_3.tif	Lake_Erie/capture_3.tif	41.828809	-83.398760	0.008022	0.019594	0.009729	0.014719	0.004636
capture_4.tif	Lake_Erie/capture_4.tif	41.828956	-83.398761	0.007390	0.018215	0.009205	0.013672	0.004342
capture_5.tif	Lake_Erie/capture_5.tif	41.829081	-83.398763	0.007621	0.018760	0.009456	0.014053	0.004457

Easy to save as a .csv:

[15]:

df.to_csv(project_path + '/median_rrs.csv')

5. Apply bio-optical algorithms

We can also apply bio-optical algorithms to derive chlorophyll a and total suspended matter (TSM) concentrations. See the paper.md for more information on each algorithm. Here, we are applying the blended NASA chlorophyll a algorithm (Hu et al., 2019).

[16]:

masked_rrs_imgs_hedley, img_metadata = retrieve_imgs_and_metadata(img_dir = project_path+'/masked_rrs_hedley')

chl_hu_ocx_imgs = chl_hu_ocx(Rrsblue=masked_rrs_imgs_hedley[:,0,:,:],
                             Rrsgreen=masked_rrs_imgs_hedley[:,1,:,:],
                             Rrsred=masked_rrs_imgs_hedley[:,2,:,:])
print(chl_hu_ocx_imgs.shape)

(470, 928, 1227)

Let’s take a quick look at a histogram of all chl values:

[17]:

plt.hist(chl_hu_ocx_imgs.ravel(), range=[0, 30])
print(np.nanmedian(chl_hu_ocx_imgs))

10.943304

The median value looks to be around 11 mg \(m^{-3}\), which is reasonable for inland waters.

Now, we can save new .tifs processed with this algorithm.

[18]:

save_wq_imgs(main_dir=project_path, rrs_img_dir='masked_rrs_hedley',
             wq_dir_name='masked_chl_hu_ocx_imgs', wq_alg ='chl_hu_ocx')

100%|██████████| 470/470 [04:12<00:00,  1.86it/s]

Let’s visualize a few chl a images.

[37]:

chl_imgs, img_metadata = retrieve_imgs_and_metadata(img_dir = os.path.join(project_path, 'masked_chl_hu_ocx_imgs'), start=0, count=6)
print(img_metadata.index)

fig, axs = plt.subplots(2,3, figsize=(6,3), sharex=True, sharey=True, layout='tight')

axs = axs.ravel()

for i in range(6):
    im = axs[i].imshow(chl_imgs[i,0,:,:], cmap='Greens', vmin=0, vmax=20)

cbar_ax = fig.add_axes([0.99, 0.15, 0.05, 0.7])
fig.colorbar(im, cax=cbar_ax, label='Chl a (mg $m^{-3}$)')

plt.show()

Index(['capture_1.tif', 'capture_2.tif', 'capture_3.tif', 'capture_4.tif',
       'capture_5.tif', 'capture_6.tif'],
      dtype='object', name='filename')

We can also save average chl-a/TSM concentrations for each image in dataframe to use as point data:

[20]:

masked_rrs_imgs_hedley, img_metadata = retrieve_imgs_and_metadata(img_dir = project_path+'/masked_rrs_hedley')

chl_hu_ocx_imgs = chl_hu_ocx(Rrsblue=masked_rrs_imgs_hedley[:,0,:,:],
                             Rrsgreen=masked_rrs_imgs_hedley[:,1,:,:],
                             Rrsred=masked_rrs_imgs_hedley[:,2,:,:])

tsm_nechad_imgs = tsm_nechad(Rrsred=masked_rrs_imgs_hedley[:,2,:,:])

chl_hu_ocx_values = []
tsm_nechad_values = []

for i in range(len(masked_rrs_imgs_hedley)):
    chl_hu_ocx_values.append(np.nanmedian(chl_hu_ocx_imgs[i,:,:]))
    tsm_nechad_values.append(np.nanmedian(tsm_nechad_imgs[i,:,:]))

df['chl_hu_ocx'] = chl_hu_ocx_values
df['tsm_nechad'] = tsm_nechad_values

df.head()

[20]:

	dirname	Latitude	Longitude	rrs_blue	rrs_green	rrs_red	rrs_rededge	rrs_nir	chl_hu_ocx	tsm_nechad
filename
capture_1.tif	Lake_Erie/capture_1.tif	41.828528	-83.398762	0.007600	0.018665	0.009293	0.013961	0.004451	10.849049	5.088389
capture_2.tif	Lake_Erie/capture_2.tif	41.828670	-83.398764	0.007605	0.018555	0.009364	0.013934	0.004396	10.691706	5.115107
capture_3.tif	Lake_Erie/capture_3.tif	41.828809	-83.398760	0.008022	0.019594	0.009729	0.014719	0.004636	10.681633	5.251874
capture_4.tif	Lake_Erie/capture_4.tif	41.828956	-83.398761	0.007390	0.018215	0.009205	0.013672	0.004342	10.880504	5.055672
capture_5.tif	Lake_Erie/capture_5.tif	41.829081	-83.398763	0.007621	0.018760	0.009456	0.014053	0.004457	10.863681	5.149392

Easy to save as .csv:

[21]:

#save as csv
df.to_csv(project_path + '/median_rrs_and_wq.csv')

We can use this dataframe to map point samples of mean chl-a and TSM concentrations:

[38]:

fig, ax = plt.subplots(1,2, figsize=(8,3), layout='tight')
for i in range(len(df)):
    g1 = ax[0].scatter(df['Latitude'], df['Longitude'], c=df['chl_hu_ocx'], cmap='Greens', vmin=10, vmax=12)
    g2 = ax[1].scatter(df['Latitude'], df['Longitude'], c=df['tsm_nechad'], cmap='YlOrRd', vmin=5, vmax=6)

cbar = fig.colorbar(g1, ax=ax[0])
cbar.set_label('Chlorophyll a (mg $m^{-3}$)', rotation=270, labelpad=12)
cbar = fig.colorbar(g2, ax=ax[1])
cbar.set_label('TSM (mg/L)', rotation=270, labelpad=12)

plt.show()

6. Georeference and mosaic chlorophyll a images

We can georeference the images based on the sensor yaw, pitch, roll, lat, lon, and altitude. Note that georeferencing can be inaccurate if the yaw, pitch, and roll information is imprecise. Let’s look at the options in the georeference() function.

[23]:

?georeference

Signature:
georeference(
    metadata,
    input_dir,
    output_dir,
    lines=None,
    altitude=None,
    yaw=None,
    pitch=0,
    roll=0,
    axis_to_flip=1,
)
Docstring:
This function georeferences all the captures indicated in the line parameter following the specification of the other parameters such as altitude, yaw, pitch, roll, axis_to_flip

Inputs:
metadata: A Pandas dataframe of the metadata
input_dir: A string containing the directory filepath of the images to be retrieved for georeferencing.
output_dir: A string containing the directory filepath to be saved.
lines: Selection of images to be processed. Defaults to None. Example: [slice(0,10)]
altitude: sets the altitude where all captures were taken. Defaults to None which uses the altitude data saved in the metadata for each respective capture.
yaw: sets the sensor's direction angle during all captures. Defaults to None which uses the yaw angle saved in the metadata for each respective capture.
pitch: sets the sensor's pitch angle during all captures. Defaults to 0 which means the sensor was horizontal to the ground.
roll: sets the sensor's roll angle during all captures. Defaults to 0 which means the sensor was horizontal to the ground.
axis_to_flip: The axis to apply a flip. Defaults to 1.

Output: Georeferenced .tifs in output_dir
File:      /glusteruser/awindled/DroneWQ/utils.py
Type:      function

We have two options here. We can set fixed georeferencing parameters, or allow the function to use parameters saved in the metadata.

In this example, the image metadata lists the altitude around 260-265 m; however, the altitude was set at 87 m in the flight planning software. Therefore, we will set the altitude to a fixed altitude of 87 m.

If you can maintain a consistent sensor yaw angle throughout the whole flight, you should use a single fixed angle. If your sensor changes yaw angle during every transect (such as in this example), a different yaw angle should be provided in the function for each transect. The georeference() function can read in a lines parameter that can be created either manually or you can use the function compute_flight_lines() to automatically produce a list of capture indices per transect with the median yaw angle, ignoring capturing collected during turns. More information can be found in the paper.md.

It is recommended to run the georeference() function on a few captures that contain a shoreline or land if possible and plot them with a satellite basemap or pull them into a GIS software to assess the georeferencing. We have provided examples here by manually creating a new folder of rrs_hedley_subset and lt_thumbnails_subset with captures 443 to 450 containing the shoreline.

Running georeference() with the automatic compute_flight_lines() function produces a flight_lines list with yaw angle changing from 207 to 24 degrees:

[24]:

rrs_hedley_subset, img_metadata = retrieve_imgs_and_metadata(img_dir = project_path+'/rrs_hedley_subset', start=432, count=18)
img_metadata = img_metadata.reset_index()

flight_lines = compute_flight_lines(img_metadata.Yaw, 87, 0, 0, threshold=100)
flight_lines

[24]:

[{'start': 0,
  'end': 9,
  'yaw': 207.89967631978038,
  'pitch': 0,
  'roll': 0,
  'alt': 87},
 {'start': 10,
  'end': 18,
  'yaw': 25.85588214075483,
  'pitch': 0,
  'roll': 0,
  'alt': 87}]

We can edit the metadata to change file extension from’.tif’ to ‘.jpg’ and run georeference() on the lt_thumbnails_subset:

[25]:

img_metadata['filename'] = img_metadata['filename'].apply(lambda x : x.replace('.tif', '.jpg'))

input_dir = os.path.join(project_path, 'lt_thumbnails_subset')
output_dir = os.path.join(project_path, 'automatic_georeferenced_lt_thumbnails_subset')
lines = flight_lines

georeference(img_metadata, input_dir, output_dir, lines)

100%|██████████| 17/17 [00:02<00:00,  7.87it/s]

When we plot these images in QGIS, it looks like:

Auto Flightlines QGIS

You can see that using the metadata yaw angle in georeferencing doesn’t always lead to accurate georeferencing.

Running georeference() using fixed yaw angles of 90 and 270 degrees:

[26]:

flight_lines = compute_flight_lines(img_metadata.Yaw, 87, 0, 0, threshold=100)

even_yaw = 90
odd_yaw = 270
threshold = np.median(img_metadata.Yaw)
for line in flight_lines:
    line.update(yaw = even_yaw if line['yaw'] < threshold else odd_yaw)

flight_lines

[26]:

[{'start': 0, 'end': 9, 'yaw': 270, 'pitch': 0, 'roll': 0, 'alt': 87},
 {'start': 10, 'end': 18, 'yaw': 90, 'pitch': 0, 'roll': 0, 'alt': 87}]

[27]:

input_dir = os.path.join(project_path, 'lt_thumbnails_subset')
output_dir = os.path.join(project_path, 'manual_georeferenced_lt_thumbnails_subset')
lines = flight_lines

georeference(img_metadata, input_dir, output_dir, lines)

100%|██████████| 17/17 [00:01<00:00, 10.16it/s]

When we plot these images in QGIS:

Manual Flightlines QGIS

You can see that using these fixed yaw angles leads to better georeferencing where the shoreline better matches to the satellite basemap.

Therefore, we will use fixed yaw angles of 90 and 270 degrees to georeference the masked_chl_hu_ocx_imgs:

[28]:

metadata = pd.read_csv(os.path.join(project_path, 'metadata.csv'))

flight_lines = compute_flight_lines(metadata.Yaw, 87, 0, 0)
even_yaw = 90
odd_yaw = 270
threshold = np.median(metadata.Yaw)
for line in flight_lines:
    line.update(yaw = even_yaw if line['yaw'] < threshold else odd_yaw)
flight_lines[0:5]

[28]:

[{'start': 11, 'end': 17, 'yaw': 90, 'pitch': 0, 'roll': 0, 'alt': 87},
 {'start': 22, 'end': 32, 'yaw': 270, 'pitch': 0, 'roll': 0, 'alt': 87},
 {'start': 34, 'end': 42, 'yaw': 90, 'pitch': 0, 'roll': 0, 'alt': 87},
 {'start': 46, 'end': 56, 'yaw': 270, 'pitch': 0, 'roll': 0, 'alt': 87},
 {'start': 58, 'end': 66, 'yaw': 90, 'pitch': 0, 'roll': 0, 'alt': 87}]

[29]:

metadata = pd.read_csv(os.path.join(project_path, 'metadata.csv'))
input_dir = os.path.join(project_path, 'masked_chl_hu_ocx_imgs')
output_dir = os.path.join(project_path, 'georeferenced_masked_chl_hu_ocx')
lines = flight_lines

georeference(metadata, input_dir, output_dir, lines)

100%|██████████| 328/328 [01:18<00:00,  4.16it/s]

Let’s plot a few of these georeferenced images by the shoreline:

[48]:

fig, ax = plt.subplots(nrows = 1, ncols = 1, figsize=(12, 6), subplot_kw = dict(projection = Mercator()))
ax_0, mappable_0 = plot_georeferenced_data(ax=ax, filename="Lake_Erie/georeferenced_masked_chl_hu_ocx/capture_444.tif",
                                           vmin=5, vmax=20, cmap='Greens', norm=None, basemap = cx.providers.OpenStreetMap.Mapnik)

ax_0, mappable_0 = plot_georeferenced_data(ax=ax, filename="Lake_Erie/georeferenced_masked_chl_hu_ocx/capture_445.tif",
                                           vmin=5, vmax=20, cmap='Greens', norm=None, basemap = cx.providers.OpenStreetMap.Mapnik)

ax_0.set_title('Hu OCX Chl Alg')
plt.colorbar(mappable_0, label='Chlorophyll a (mg $m^{-3}$)')

[48]:

<matplotlib.colorbar.Colorbar at 0x7f9e93a15720>

We can mosaic all of the individual georeferenced images. Let’s look at the options.

[31]:

?mosaic

Signature:
mosaic(
    input_dir,
    output_dir,
    output_name,
    method='mean',
    dtype=<class 'numpy.float32'>,
    band_names=None,
)
Docstring:
This function moasics all the given rasters into a single raster file

Inputs:
input_dir: a string containing the directory filepath of images to be mosaicked
output_dir: a string containing the directory filepath to save the output
output_name: a string of the output name of mosaicked .tif
method: Method to be used when multiple captures coincide at same location. Options: 'mean', 'first', 'min', 'max'. Defaults to 'mean'.
dtype: dtype of the mosaicked raster. Defaults to np.float32.
band_names: List of band names. If it is not None, it writes one file for each band instead of one file with all the bands. Defaults to None.

Returns:
Mosaicked .tif file
File:      /glusteruser/awindled/DroneWQ/utils.py
Type:      function

We will make a mosaic from all the georeferenced chl images, calculating the mean of overlapping pixels.

[32]:

input_dir = os.path.join(project_path, 'georeferenced_masked_chl_hu_ocx')
output_dir = os.path.join(project_path, 'mosaic_chl_hu_ocx')

mosaic(input_dir, output_dir, output_name = 'mean_mosaic_chl_hu_ocx',
           method='mean', band_names = None)

100%|██████████| 328/328 [09:12<00:00,  1.69s/it]

[32]:

'Lake_Erie/mosaic_chl_hu_ocx/mean_mosaic_chl_hu_ocx.tif'

Now let’s plot the mosaicked image.

[45]:

fig, ax = plt.subplots(nrows = 1, ncols = 1, figsize=(12, 6), subplot_kw = dict(projection = Mercator()))
ax_0, mappable_0 = plot_georeferenced_data(ax=ax, filename="Lake_Erie/mosaic_chl_hu_ocx/mean_mosaic_chl_hu_ocx.tif",
                                           vmin=0, vmax=20, cmap='Greens', norm=None, basemap = cx.providers.Esri.WorldImagery)
ax_0.set_title('Hu OCX Chl')
plt.colorbar(mappable_0, label='Chlorophyll a (mg $m^{-3}$)')

[45]:

<matplotlib.colorbar.Colorbar at 0x7f9e93dd0c10>

We can also downsample or reduce the spatial resolution of the mosaic. This is helpful to reduce the file size of a large mosaic.

[34]:

?downsample

Signature:
downsample(
    input_dir,
    output_dir,
    scale_x,
    scale_y,
    method=<Resampling.average: 5>,
)
Docstring:
This function performs a downsampling to reduce the spatial resolution of the final mosaic.

Inputs:
input_dir: A string containing input directory filepath
output_dir: A string containing output directory filepath
scale_x: proportion by which the width of each file will be resized
scale_y: proportion by which the height of each file will be resized
method: the resampling method to perform. Defaults to Resampling.nearest. Please see
https://rasterio.readthedocs.io/en/stable/api/rasterio.enums.html#rasterio.enums.Resampling for other resampling methods.
File:      /glusteruser/awindled/DroneWQ/utils.py
Type:      function

We will downsample the width and height of the mosaic by 15%, using a nearest neighbor resampling algorithm.

[35]:

input_dir = os.path.join(project_path, 'mosaic_chl_hu_ocx')
output_dir = os.path.join(project_path, 'mosaic_chl_hu_ocx')

downsample(input_dir, output_dir, scale_x = 15, scale_y = 15, method = Resampling.nearest)

100%|██████████| 1/1 [00:00<00:00,  7.90it/s]

And we can plot it

[52]:

fig, ax = plt.subplots(nrows = 1, ncols = 1, figsize=(12, 6), subplot_kw = dict(projection = Mercator()))
ax_0, mappable_0 = plot_georeferenced_data(ax=ax, filename="Lake_Erie/mosaic_chl_hu_ocx/mean_mosaic_chl_hu_ocx_x_15_y_15_method_nearest.tif",
                                           vmin=5, vmax=15, cmap='Greens', norm=None, basemap = cx.providers.Esri.WorldImagery)
ax_0.set_title('Hu OCX Chl Downsampled')
plt.colorbar(mappable_0, label='Chlorophyll a (mg $m^{-3}$)')

[52]:

<matplotlib.colorbar.Colorbar at 0x7f9d08db37f0>

Some striping in the mosaic is visible due to the drone/sensor changing yaw angles every transect. Processing and mosaicking a larger dataset has the potential to reveal water quality patterns or algal blooms.

[ ]:

Demonstration of DroneWQ functions and processing code