# 3. Plotting and Numerical Python¶

## 3.1 In this session¶

In this session, we will introduce and use some packages that you will commonly use in scientific programming.

These are:

• numpy: NumPy is the fundamental package for scientific computing with Python
• matplotlib: Python 2D plotting library

We will also introduce some additional programming concepts, and set an exercise that you can do and get feedback on.

### 3.2.1 Getting Started¶

If you haven’t previously downloaded the course, type the following at the unix prompt (illustrated by berlin% here):

berlin% cd DATA
berlin% git clone https://github.com/profLewis/geogg122.git
berlin% cd geogg122/Chapter3_Scientific_Numerical_Python

If you already have a clone of the course repository (which you should have from last week):

berlin% cd DATA/geogg122
berlin% git pull
berlin% cd Chapter3_Scientific_Numerical_Python

This will update your local copy with any new notes or files. Any additional files that you yourself have created should be still where they were.

## E3.1 Exercise: listing¶

Using Python, produce a listing of the files in the subdirectory data of geogg122/Chapter3_Scientific_Numerical_Python that end with .nc and put this listing in a file called data/data.dat with each entry on a different line

## 3.2 Getting and Plotting Some Data: netCDF format¶

### 3.2.1 GlobAlbedo data¶

Before we start to use these new packages, we will start by getting some data for you to use and show you how to read it in and do some basic plotting.

Today, we will be using the ESA GlobAlbedo data.

These data come in different spatial resolutions, but here we will use the global product at 0.5 degrees.

The files are acessible via http in the directory:

http://www.globalbedo.org/GlobAlbedoXX/mosaics/YYYY/0.5/monthly

The filenames are of the pattern:

GlobAlbedo.YYYYMM.mosaic.5.nc.gz

where YYYY is the year (e.g. 2009) and MM is the month (01 is January, 12 is December).

XX is a code which is given below.

The data are in netCDF format which is a common binary data format.

# versioning (?) codes. At present these are:
XX = {1998:95,1999:95,2000:97,2001:97,2002:26,2003:66,2004:54,2005:54,\
2006:29,2007:25,2008:53,2009:56,2010:56,2011:78}

# but check on http://www.globalbedo.org by trying to order some data

year = 2009

root = 'http://www.globalbedo.org/GlobAlbedo%d/mosaics/%d/0.5/monthly/'%(XX[year],year)

# example filename : use formatting string:
# %d%02d
month = 1
url = root + '/GlobAlbedo.%d%02d.mosaic.5.nc.gz'%(year,month)

print url
http://www.globalbedo.org/GlobAlbedo56/mosaics/2009/0.5/monthly//GlobAlbedo.200901.mosaic.5.nc.gz

You could download the data youself (explained in the advanced section), but these files are available to you in this case in the directory files/data for the year 2009.

ls data/GlobAlbedo*.nc
data/GlobAlbedo.200901.mosaic.5.nc  data/GlobAlbedo.200907.mosaic.5.nc  data/GlobAlbedo.201001.mosaic.5.nc  data/GlobAlbedo.201007.mosaic.5.nc
data/GlobAlbedo.200902.mosaic.5.nc  data/GlobAlbedo.200908.mosaic.5.nc  data/GlobAlbedo.201002.mosaic.5.nc  data/GlobAlbedo.201008.mosaic.5.nc
data/GlobAlbedo.200903.mosaic.5.nc  data/GlobAlbedo.200909.mosaic.5.nc  data/GlobAlbedo.201003.mosaic.5.nc  data/GlobAlbedo.201009.mosaic.5.nc
data/GlobAlbedo.200904.mosaic.5.nc  data/GlobAlbedo.200910.mosaic.5.nc  data/GlobAlbedo.201004.mosaic.5.nc  data/GlobAlbedo.201010.mosaic.5.nc
data/GlobAlbedo.200905.mosaic.5.nc  data/GlobAlbedo.200911.mosaic.5.nc  data/GlobAlbedo.201005.mosaic.5.nc  data/GlobAlbedo.201011.mosaic.5.nc
data/GlobAlbedo.200906.mosaic.5.nc  data/GlobAlbedo.200912.mosaic.5.nc  data/GlobAlbedo.201006.mosaic.5.nc  data/GlobAlbedo.201012.mosaic.5.nc

We use the gdal module to read netCDF data:

import gdal
root = 'data/'

# example filename : use formatting string:
# %d%02d
year = 2009
month = 1
filename = root + 'GlobAlbedo.%d%02d.mosaic.5.nc'%(year,month)
print filename

g = gdal.Open(filename)
# g should now be a GDAL dataset, but if the file isn't found
# g will be none. Let's test this:
if g is None:
print "Problem opening file %s!" % filename
else:
print "File %s opened fine" % filename

subdatasets = g.GetSubDatasets()
for fname, name in subdatasets:
print name
print "\t", fname
data/GlobAlbedo.200901.mosaic.5.nc
File data/GlobAlbedo.200901.mosaic.5.nc opened fine
[360x720] DHR_VIS (32-bit floating-point)
NETCDF:"data/GlobAlbedo.200901.mosaic.5.nc":DHR_VIS
[360x720] DHR_NIR (32-bit floating-point)
NETCDF:"data/GlobAlbedo.200901.mosaic.5.nc":DHR_NIR
[360x720] DHR_SW (32-bit floating-point)
NETCDF:"data/GlobAlbedo.200901.mosaic.5.nc":DHR_SW
[360x720] BHR_VIS (32-bit floating-point)
NETCDF:"data/GlobAlbedo.200901.mosaic.5.nc":BHR_VIS
[360x720] BHR_NIR (32-bit floating-point)
NETCDF:"data/GlobAlbedo.200901.mosaic.5.nc":BHR_NIR
[360x720] BHR_SW (32-bit floating-point)
NETCDF:"data/GlobAlbedo.200901.mosaic.5.nc":BHR_SW
[360x720] DHR_sigmaVIS (32-bit floating-point)
NETCDF:"data/GlobAlbedo.200901.mosaic.5.nc":DHR_sigmaVIS
[360x720] DHR_sigmaNIR (32-bit floating-point)
NETCDF:"data/GlobAlbedo.200901.mosaic.5.nc":DHR_sigmaNIR
[360x720] DHR_sigmaSW (32-bit floating-point)
NETCDF:"data/GlobAlbedo.200901.mosaic.5.nc":DHR_sigmaSW
[360x720] BHR_sigmaVIS (32-bit floating-point)
NETCDF:"data/GlobAlbedo.200901.mosaic.5.nc":BHR_sigmaVIS
[360x720] BHR_sigmaNIR (32-bit floating-point)
NETCDF:"data/GlobAlbedo.200901.mosaic.5.nc":BHR_sigmaNIR
[360x720] BHR_sigmaSW (32-bit floating-point)
NETCDF:"data/GlobAlbedo.200901.mosaic.5.nc":BHR_sigmaSW
[360x720] Weighted_Number_of_Samples (32-bit floating-point)
NETCDF:"data/GlobAlbedo.200901.mosaic.5.nc":Weighted_Number_of_Samples
[360x720] Relative_Entropy (32-bit floating-point)
NETCDF:"data/GlobAlbedo.200901.mosaic.5.nc":Relative_Entropy
[360x720] Goodness_of_Fit (32-bit floating-point)
NETCDF:"data/GlobAlbedo.200901.mosaic.5.nc":Goodness_of_Fit
[360x720] Snow_Fraction (32-bit floating-point)
NETCDF:"data/GlobAlbedo.200901.mosaic.5.nc":Snow_Fraction
[360x720] Solar_Zenith_Angle (32-bit floating-point)
NETCDF:"data/GlobAlbedo.200901.mosaic.5.nc":Solar_Zenith_Angle

Here, we will want to access 'DHR_VIS', 'DHR_NIR' and 'DHR_SW', which are the bihemispherical reflectance (DHR: ‘black sky albedo’ – the albedo under directional illumination conditions) for visible, near infrared and total shortwave wavebands.

From the information printed in the loop above, we note that the field of information we need to access is of the form:

NETCDF:"data/GlobAlbedo.200901.mosaic.5.nc":DHR_VIS

The way we do this in gdal is to use gdal.Open(f) where f is the full data layer specification (in this case e.g. NETCDF:"data/GlobAlbedo.200901.mosaic.5.nc":DHR_VIS) to open a particular dataset within the file.

for example then:

f = 'NETCDF:"data/GlobAlbedo.200901.mosaic.5.nc":DHR_VIS'

print data
[[        nan         nan         nan ...,         nan         nan
nan]
[        nan         nan         nan ...,         nan         nan
nan]
[        nan         nan         nan ...,         nan         nan
nan]
...,
[ 0.69980866  0.69980866  0.69980866 ...,  0.70370543  0.70370543
0.70370543]
[ 0.69980866  0.69980866  0.69980866 ...,  0.70370543  0.70370543
0.70370543]
[ 0.69980866  0.69980866  0.69980866 ...,  0.70370543  0.70370543
0.70370543]]

We can make this more flexible by building the string f from the specific pieces of information we require.

In this case, it helps to use a form of template for the string we need:

# form a generic string of the form
# NETCDF:"data/GlobAlbedo.200901.mosaic.5.nc":DHR_VIS

file_template = 'NETCDF:"%s":%s'

# now make a list of the datset names we want
# so we can loop over this

selected_layers = ['DHR_VIS','DHR_NIR','DHR_SW']

# ----------------------------------

# try it out:

root = 'data/'

# example filename : use formatting string:
# %d%02d
year = 2009
month = 1
filename = root + 'GlobAlbedo.%d%02d.mosaic.5.nc'%(year,month)
print filename

# set up an empty dictionary to load the data into
data = {}

# use enumerate here to loop over
# the list selected_layers and also have

for i, layer in enumerate ( selected_layers ):
this_file = file_template % ( filename, layer )
print "Opening Layer %d: %s" % (i, this_file )
g = gdal.Open ( this_file )

# test that the opening worked
# and raise an error otherwise

if g is None:
raise IOError
print "\t>>> Read %s!" % layer
data/GlobAlbedo.200901.mosaic.5.nc
Opening Layer 0: NETCDF:"data/GlobAlbedo.200901.mosaic.5.nc":DHR_VIS
Opening Layer 1: NETCDF:"data/GlobAlbedo.200901.mosaic.5.nc":DHR_NIR
Opening Layer 2: NETCDF:"data/GlobAlbedo.200901.mosaic.5.nc":DHR_SW

If we really want the data in a single (3D) array, we could then create this from the dictionary entries:

# here, we just make a *list*
# as we will see below, a list might be a bit limited for these purposes and
# we would normally use a numpy array

albedo = [data[f] for f in selected_layers]

print len(albedo),len(albedo[0]),len(albedo[0][0])
3 360 720

### Exercise E3.2 Read Image Data into 3D List¶

Write some python code that directly reads the data layers 'DHR_VIS','DHR_NIR','DHR_SW' into a list for a given month and year.

### Exercise E3.3 Read More Image Data into a 4D List¶

You should now have some code that reads the 3 albedo datasets into a 3D list (3 x 360 x 720) for a given month and year.

Put a loop around this code to make a 4D list dataset (12 x 3 x 360 x 720) for all months in a given year.

### 3.2.3 Plotting an image¶

We can plot image data using the package pylab (from matplotlib).

If you are running this as a notebook, make sure you use:

ipython notebook --pylab=inline if you want the images to appear in the notebook.

First we have to import pylab, which we will refer to a plt (se import pylab as plt).

The basic function for plotting an image is plt.imshow:

import pylab as plt

# declare a new figure and its size
plt.figure(figsize=(10,4))

plt.imshow(albedo[0])
<matplotlib.image.AxesImage at 0x7f8124bc5c90>

Thats fine, but we might apply a few modifications to make a better plot:

import pylab as plt

# change the figure size
plt.figure(figsize=(10,4))

# use nearest neighbour interpolation
plt.imshow(albedo[0],interpolation='nearest')

# show a colour bar
plt.colorbar()
<matplotlib.colorbar.Colorbar instance at 0x7f8119ef3320>

Albedo should lie between 0 and 1, so there are clearly a few ‘funnies’ that we might flag later.

For now, we might like to plot only values between 0 and 1 (thresholding at those values).

Also if we don’t like this colour map, we can try another:

import pylab as plt

# change the figure size
plt.figure(figsize=(10,4))

# no interpolation
plt.imshow(albedo[0],interpolation='none',cmap=plt.get_cmap('binary'),vmin=0.0,vmax=1.0)

# show a colour bar
plt.colorbar()
<matplotlib.colorbar.Colorbar instance at 0x7f8119d9a200>

If we want to make use of strings to label each month, we might consider makeing a ‘month’ dictionary:

# set up a month dictionary
#
# notice how we lay this out for clarity
#
months  =    {1:'January',\
2:'February',\
3:'March',\
4:'April',\
5:'May',\
6:'June',\
7:'July',\
8:'August',\
9:'September',\
10:'October',\
11:'November',\
12:'December'}
# test it
m = 3

print 'month',m,'is',months[m]
month 3 is March

There are other (and in some ways neater) ways of achieving this, e.g.:

num_list   = range(1,13)
month_list = ['January','February','March','April','May',\
'June','July','August','September','October','November',\
'December']

print 'numbers',num_list
print 'months',month_list

# make a dictionary from 2 lists
month_dict = dict(zip(num_list,month_list))

print month_dict

# test it
m = 3

print '\n\tmonth',m,'is',month_dict[m]
numbers [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
months ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']
{1: 'January', 2: 'February', 3: 'March', 4: 'April', 5: 'May', 6: 'June', 7: 'July', 8: 'August', 9: 'September', 10: 'October', 11: 'November', 12: 'December'}

month 3 is March

But as you might expect, this is quite a common task, so we might use the python package calendar for this:

import calendar

m = 3

print '\n\tmonth',m,'is',calendar.month_name[m],'\n\n'

# or:

for m,n in enumerate(calendar.month_name):
# check its a valid string
# why do this???
if len(n) > 0:
print 'month',m,'is',n
month 3 is March

month 1 is January
month 2 is February
month 3 is March
month 4 is April
month 5 is May
month 6 is June
month 7 is July
month 8 is August
month 9 is September
month 10 is October
month 11 is November
month 12 is December

As a slight aside ... one advantage of using standard methods for accessing e.g. month names is that if we change the locale (the language settings etc.), we would expect to be able to do this in different languages automatically.

# after http://stackoverflow.com/questions/13037370/python-calendar-day-month-names-in-specific-locale
# Note how I reference where I found an example of this on the web ...

import locale
# get the default
def_locale = locale.getdefaultlocale()

# set to e.g. Galician
# see e.g.
# http://svn.python.org/projects/python/trunk/Lib/locale.py
locale.setlocale(locale.LC_ALL,'gl_ES')

for m,n in enumerate(calendar.month_name):
# check its valid string
# why do this???
if len(n) > 0:
print 'month',m,'is',n
month 1 is Xaneiro
month 2 is Febreiro
month 3 is Marzo
month 4 is Abril
month 5 is Maio
month 6 is Xu�o
month 7 is Xullo
month 8 is Agosto
month 9 is Setembro
month 10 is Outubro
month 11 is Novembro
month 12 is Decembro
# set it back again

locale.setlocale(locale.LC_ALL, def_locale)

for m,n in enumerate(calendar.month_name):
# check its valid string
# why do this???
if len(n) > 0:
print 'month',m,'is',n
month 1 is January
month 2 is February
month 3 is March
month 4 is April
month 5 is May
month 6 is June
month 7 is July
month 8 is August
month 9 is September
month 10 is October
month 11 is November
month 12 is December

If we wanted to save the plot and put a title on:

import pylab as plt
import calendar

# change the figure size
plt.figure(figsize=(10,4))
# plt.clf(): clear the figure in case anything in it before
plt.clf()

month = 1

plt.title('VIS BHR albedo for %s %d'%(calendar.month_name[month],year))
# use nearest neighbour interpolation
plt.imshow(albedo[0],interpolation='nearest',cmap=plt.get_cmap('Spectral'),vmin=0.0,vmax=1.0)

# show a colour bar
plt.colorbar()

# save the plot to a jpeg image file
plt.savefig('data/albedo.jpg')

### E3.2.1 Software¶

You can sort of make movies in pylab, but you generally have to make a system call to unix at some point, so it’s probably easier to do this all in unix with the utility convert <http://www.imagemagick.org/script/convert.php>__.

berlin% which convert
/usr/bin/convert

If this doesn’t come up with anything useful, there is probably a version in /usr/bin/convert or /usr/local/bin/convert (If you don’t have it on your local machine, install ImageMagick <http://www.imagemagick.org/script/index.php>__ which contains the command line tool convert).

To use this, e.g.:

from the unix command line:

berlin% cd ~/DATA/geogg122/Chapter3_Scientific_Numerical_Python
berlin% convert data/albedo.jpg data/albedo.gif

or from within a notebook:

!convert data/albedo.jpg data/albedo.gif

Or, more practically here, you can run a unix command directly from Python:

import os
cmd = 'convert data/albedo.jpg data/albedo.gif'
os.system(cmd)
0

This will convert the file files/data/albedo.jpg (in jpeg format) to files/data/albedo.gif (in gif format).

We can also use convert to make animated gifs, which is one way of making a movie.

### E3.2.2 Looping over a set of images¶

You have all of the code you need above to be able to read a GlobAlbedo file for a given month and waveband in Python and save a picture in jpeg format, but to recap for BHR_VIS:

import gdal
import pylab as plt
import os
import calendar

layer = 'BHR_VIS'

# form a generic string of the form
# NETCDF:"data/GlobAlbedo.200901.mosaic.5.nc":DHR_VIS

file_template = 'NETCDF:"%s":%s'

root = 'data/'

# example filename : use formatting string:
# %d%02d
year = 2009

# set the month (1-based, i.e. 1 == January)
month = 1

filename = root + 'GlobAlbedo.%d%02d.mosaic.5.nc'%(year,month)

g = gdal.Open (  file_template % ( filename, layer ) )

if g is None:
raise IOError

''' Plot the data and save as picture jpeg format '''
# make a string with the output file name
out_file = root + 'GlobAlbedo.%d%02d.jpg'%(year,month)
# plot
plt.figure(figsize=(10,4))
plt.clf()
# %9s forces the string to be 8 characters long
plt.title('VIS BHR albedo for %8s %d'%(calendar.month_name[month],year))
# use nearest neighbour interpolation
plt.imshow(data,interpolation='nearest',cmap=plt.get_cmap('Spectral'),vmin=0.0,vmax=1.0)
# show a colour bar
plt.colorbar()
plt.savefig(out_file)

# convert to gif
# set up the unix command which is of the form
# convert input output
# Here input will be out_file
# and output we can get with out_file.replace('.jpg','.gif')
# i.e. replacing where it says .jpg with .gif
cmd = 'convert %s %s'%(out_file,out_file.replace('.jpg','.gif'))
os.system(cmd)
0

Modify the code above to loop over each month, so that it generates a set of gif format files for the TOTAL SHORTWAVE ALBEDO

You should confirm that these exist, and that the file modification time is when you ran it (not when I generated the files for these notes, which is Oct 10 2013).

ls -l data/GlobAlbedo.2009??.gif
-rw-rw-r--. 1 plewis plewis 51195 Oct  7  2014 [0m[38;5;13mdata/GlobAlbedo.200901.gif[0m
-rw-rw-r--. 1 plewis plewis 45450 Oct  3 12:51 [38;5;13mdata/GlobAlbedo.200902.gif[0m
-rw-rw-r--. 1 plewis plewis 37857 Oct  3 12:51 [38;5;13mdata/GlobAlbedo.200903.gif[0m
-rw-rw-r--. 1 plewis plewis 34722 Oct  3 12:51 [38;5;13mdata/GlobAlbedo.200904.gif[0m
-rw-rw-r--. 1 plewis plewis 33199 Oct  3 12:51 [38;5;13mdata/GlobAlbedo.200905.gif[0m
-rw-rw-r--. 1 plewis plewis 34251 Oct  3 12:51 [38;5;13mdata/GlobAlbedo.200906.gif[0m
-rw-rw-r--. 1 plewis plewis 34969 Oct  3 12:51 [38;5;13mdata/GlobAlbedo.200907.gif[0m
-rw-rw-r--. 1 plewis plewis 34854 Oct  3 12:51 [38;5;13mdata/GlobAlbedo.200908.gif[0m
-rw-rw-r--. 1 plewis plewis 36803 Oct  3 12:51 [38;5;13mdata/GlobAlbedo.200909.gif[0m
-rw-rw-r--. 1 plewis plewis 38035 Oct  3 12:51 [38;5;13mdata/GlobAlbedo.200910.gif[0m
-rw-rw-r--. 1 plewis plewis 39616 Oct  3 12:51 [38;5;13mdata/GlobAlbedo.200911.gif[0m
-rw-rw-r--. 1 plewis plewis 40297 Oct  3 12:51 [38;5;13mdata/GlobAlbedo.200912.gif[0m
[m

### E3.2.3 Make the movie¶

The unix command to convert these files to an animated gif is:

convert -delay 100  -loop 0  files/data/GlobAlbedo.2009??.gif files/data/GlobAlbedo.2009.SW.gif

Run this (ideally, from within Python) to create the animated gif GlobAlbedo.2009.SW.gif

Again, confirm that you created this file (and it is not just a version you downloaded):

ls -l data/GlobAlbedo.2009.SW.gif
-rw-rw-r--. 1 plewis plewis 625640 Oct  3 12:38 [0m[38;5;13mdata/GlobAlbedo.2009.SW.gif[0m
[m

## 3.3 Numpy¶

### 3.3.1 Numpy Arrays¶

At the heart of the numpy <http://docs.scipy.org/doc/numpy/contents.html>__ module is a a powerful N-dimensional array object.

Among many other things, this allows us to do arithmetic (and other operations) directly on arrays, rather than having to loop over each element (as we did with lists for example).

Two main benefits of this are:

• the code is much easier to read
• running the code is much more efficient (because, ‘under the hood’ as it were of the programming language, we can do fast operations such as vector processing).

As our first example, let’s read the shortwave albedo data we examined above into a numpy array.

import gdal
import numpy as np

#  first, we write a method to read
# the GA type files

'''
Method to read a GlobAlbedo file
'''
file_template = 'NETCDF:"%s":%s'

filename = root + 'GlobAlbedo.%d%02d.mosaic.5.nc'%(year,month)

g = gdal.Open (  file_template % ( filename, layer ) )

if g is None:
raise IOError

# return a numpy array
return(np.array(data))

# some interesting things about numpy arrays
print "the array type is now",type(band)
print "the shape of the array is",band.shape
print "the size of the array is",band.size
print "the number of dimensions is",band.ndim
print "the data type in the array is",band.dtype
the array type is now <type 'numpy.ndarray'>
the shape of the array is (360, 720)
the size of the array is 259200
the number of dimensions is 2
the data type in the array is float32

A few things to notice here.

First, we imported the numpy module using

import numpy as np

which means that locally, we refer to this module as np.

Second, we converted from the format that was read from the netCDF file to a numpy array with:

return np.array(data)

so this is one way to convert from other data types to numpy representation (e.g. np.array([1,2,3]) for a conversion from a list).

Then we saw at the end of this code a number of commonly used methods for the numpy class that provide us with information on the shape of the array (N.B. this is (rows,cols) as you might notice from the plots we generated), the total size (number of elements), number of dimansions.

We also saw that we could access the array data type (dtype). This is very different to a list or tuple then, because a numpy array can only contain data of a single data type (but lists or tuples can have different data types in each element).

#### astype¶

We can convert between data types using astype:

a = np.array([1,2,3])
print a.dtype
int64
b = a.astype(float)
print b
print b.dtype
[ 1.  2.  3.]
float64

#### slice¶

As you would expect, we can slice in a numpy array. All that is different is that we set up a slice of each dimension, e.g.:

# see if you can work out why this is
# the size, shape and ndim it is

print band[-3:-1,10:30:5]
print band[-3:-1,10:30:5].shape
print band[-3:-1,10:30:5].ndim
[[ 0.63276225  0.6558162   0.6557712   0.65737408]
[ 0.6311323   0.6311323   0.63627392  0.65473425]]
(2, 4)
2

If you want to specify all elements in a given dimension you need to use at least one :, e.g.

# what does this mean?

print band[10:20,0]
print band[10:20,0].shape
[ nan  nan  nan  nan  nan  nan  nan  nan  nan  nan]
(10,)

#### nan, inf¶

Some of the data in the array band appear as nan (‘not a number’), which is how, in this dataset, non-valid pixels are specified.

nan is also what you would get for undefined results from arithmetic:

a = np.array([0., 1.])
b = np.array([0., 0.])
c = a/b
print c
print 'so 0./0. =',c[0]
print 'so 1./0. =',c[1]
[ nan  inf]
so 0./0. = nan
so 1./0. = inf
-c:3: RuntimeWarning: divide by zero encountered in divide
-c:3: RuntimeWarning: invalid value encountered in divide

inf here means ‘infinity’, which is what anything other than zero divided by zero is.

We can check to see if some value is nan or inf e.g:

print c
print 'is c nan?',np.isnan(c)
print 'is c inf?',np.isinf(c)
[ nan  inf]
is c nan? [ True False]
is c inf? [False  True]

Generally we would want to avoid processing array values that contained nan or inf as any arithmetic involving these is unlikely tell us more than we already know:

print 'inf + 1\t\t=',np.inf + 1
print 'inf - np.inf\t=',np.inf - np.inf
print 'nan / 2 \t\t=',np.nan / 2
inf + 1             = inf
inf - np.inf        = nan
nan / 2             = nan

Or, trying to e.g. calculate the sum of values in the array band:

print band.sum()
nan

which is not really what we wanted.

We have seen above that np.isnan(c) and np.isinf(c) result in a bool array of the same shape as the input array.

We can use these to mask out things we don’t want, e.g.

no_data = np.isnan(band)
print no_data
[[ True  True  True ...,  True  True  True]
[ True  True  True ...,  True  True  True]
[ True  True  True ...,  True  True  True]
...,
[False False False ..., False False False]
[False False False ..., False False False]
[False False False ..., False False False]]

This will be True where the data in band were nan ... but what we might often want is the opposite of this, i.e. True where the data are valid.

We might guess that we could simply type:

valid_data = ~np.isnan(band)
print valid_data
[[False False False ..., False False False]
[False False False ..., False False False]
[False False False ..., False False False]
...,
[ True  True  True ...,  True  True  True]
[ True  True  True ...,  True  True  True]
[ True  True  True ...,  True  True  True]]

where ~ means a logical NOT.

There is a special type of object in numpy called a masked array. With this array representation, operations are only performed where the data mask is False. We need to import the masked array module ma from numpy:

import numpy.ma as ma

False

When we convert the representation from a normal numpy array to a masked array using ma.array(), the default mask is False (which means no mask).

In this case, we want to set the mask to True where the data are nan. We would normally do this when setting up the masked array:

print "so the array is now\n",masked_band
so the array is now
[[-- -- -- ..., -- -- --]
[-- -- -- ..., -- -- --]
[-- -- -- ..., -- -- --]
...,
[0.651062548160553 0.651062548160553 0.651062548160553 ...,
0.6595190763473511 0.6595190763473511 0.6595190763473511]
[0.651062548160553 0.651062548160553 0.651062548160553 ...,
0.6595190763473511 0.6595190763473511 0.6595190763473511]
[0.651062548160553 0.651062548160553 0.651062548160553 ...,
0.6595190763473511 0.6595190763473511 0.6595190763473511]]

[[ True  True  True ...,  True  True  True]
[ True  True  True ...,  True  True  True]
[ True  True  True ...,  True  True  True]
...,
[False False False ..., False False False]
[False False False ..., False False False]
[False False False ..., False False False]]

### 3.3.2 Reading data from multiple files¶

#### zeros, ones, empty¶

Suppose we want to read in all 12 months of SW albedo data following on from the example above.

import gdal
import numpy as np

# define the basic reading method as above

'''
Method to read a GlobAlbedo file
'''
file_template = 'NETCDF:"%s":%s'

filename = root + 'GlobAlbedo.%d%02d.mosaic.5.nc'%(year,month)

g = gdal.Open (  file_template % ( filename, layer ) )

if g is None:
raise IOError

# return a numpy array
return(np.array(data))

# which months?
months = xrange(1,13)

# loop over month
# but right now we'll just pretend that we're
# doing that and consider what happens the first time
# we are in our loop
#
# set i as index counter
# set the variable month to be
# months[0]
i = 0

Looking at what would happen when we read the first file, we can see that we don’t actually know how large the whole dataset will be until we have read our first file.

We know that we will want dimensions of (12,nrows,ncols) (or better still, (len(months),nrows,ncols) in case we decide to change months), but until we have read the data into the variable band with:

we don’t know how many rows or columns the dataset has (well, we might do ... but we want to try to design flexible code, where if the size of the datasets changed, our code would still operate correctly!).

But, when we have first read some data, then we would be in a position to say what the shape of the whole dataset should be. At that point, we could use the method:

np.empty(shape)

to (in essence) set aside (‘allocate’) some memory to store these data. There are three main options for this:

• np.zeros(shape) : set up an array with shape defined by shape and initialise with 0
• np.ones(shape) : set up an array with shape defined by shape and initialise with 1
• np.empty(shape) : set up an array with shape defined by shape but don’t initialise (so, in theory, a little faster, but can be filled with randon junk)

we can control the dtype of the array with a dtype option. e.g.:

print '-'*30;print '1-D arrays';print '-'*30;
print 'np.zeros(1):',np.zeros(1)  # 1 : 1D array
print 'np.zeros(2):',np.zeros(2)  # 2 : 1D array

print '\n' + '-'*30;print '2- and 3-D arrays';print '-'*30;
print 'np.zeros((1,2)):\n',np.zeros((1,2))  # default dtype is float
print 'np.zeros((2,2)):\n',np.zeros((2,2))  # 2 X 2
print 'np.zeros((2,2,3)):\n',np.zeros((2,2,3))  # 2 X 2 X 3

print '\n' + '-'*30;print 'empty and ones';print '-'*30;
print 'np.empty((1,2)):\n',np.empty((1,2))  # default dtype is float
print 'np.ones((1,2)):\n',np.ones((1,2))  # default dtype is float

print '\n' + '-'*30;print 'Changing the data type';print '-'*30;
print 'np.ones((1,2),dtype=int):\n',np.ones((1,2),dtype=int)  #  dtype int
print 'np.empty((1,2),dtype=int):\n',np.empty((1,2),dtype=int)  #  dtype int
print 'np.ones((1,2),dtype=bool):\n',np.ones((1,2),dtype=bool)  #  dtype bool
print 'np.zeros((1,2),dtype=bool):\n',np.zeros((1,2),dtype=bool)  #  dtype bool
print 'np.empty((1,2),dtype=bool):\n',np.empty((1,2),dtype=bool)  #  dtype bool

print '\n' + '-'*30;print 'Note that we can have a string array';print '-'*30;
print 'np.ones((1,2),dtype=str):\n',np.ones((1,2),dtype=str)  #  dtype str

print '\n' + '-'*30;print 'Or even e.g. a list array';print '-'*30;
print 'np.ones((2,2),dtype=list):\n',np.ones((2,2),dtype=list)  #  dtype list
print 'np.empty((2,2),dtype=list):\n',np.empty((2,2),dtype=list)  #  dtype list
------------------------------
1-D arrays
------------------------------
np.zeros(1): [ 0.]
np.zeros(2): [ 0.  0.]

------------------------------
2- and 3-D arrays
------------------------------
np.zeros((1,2)):
[[ 0.  0.]]
np.zeros((2,2)):
[[ 0.  0.]
[ 0.  0.]]
np.zeros((2,2,3)):
[[[ 0.  0.  0.]
[ 0.  0.  0.]]

[[ 0.  0.  0.]
[ 0.  0.  0.]]]

------------------------------
empty and ones
------------------------------
np.empty((1,2)):
[[ 0.  0.]]
np.ones((1,2)):
[[ 1.  1.]]

------------------------------
Changing the data type
------------------------------
np.ones((1,2),dtype=int):
[[1 1]]
np.empty((1,2),dtype=int):
[[1 1]]
np.ones((1,2),dtype=bool):
[[ True  True]]
np.zeros((1,2),dtype=bool):
[[False False]]
np.empty((1,2),dtype=bool):
[[False False]]

------------------------------
Note that we can have a string array
------------------------------
np.ones((1,2),dtype=str):
[['1' '1']]

------------------------------
Or even e.g. a list array
------------------------------
np.ones((2,2),dtype=list):
[[1 1]
[1 1]]
np.empty((2,2),dtype=list):
[[None None]
[None None]]

So, once we have read one band, we could write:

import gdal
import numpy as np

# define the basic reading method as above

,filename=None):
'''
Method to read a GlobAlbedo file
'''
file_template = 'NETCDF:"%s":%s'

# what does this do???
filename = filename or root + 'GlobAlbedo.%d%02d.mosaic.5.nc'%(year,month)

g = gdal.Open (  file_template % ( filename, layer ) )

if g is None:
raise IOError

# return a numpy array
return(np.array(data))

# which months?
months = xrange(1,13)

# loop over month
# but right now we'll just pretend that we're
# doing that and consider what happens the first time
# we are in our loop
#
# set i as index counter
# set the variable month to be
# months[0]

i = 0
month=months[i]

if i == 0:  # first band read
# set up a tuple (or list) saying what shape we want the data
shape = (len(months),) + band.shape
data = np.zeros(shape)

print "I set up the array data of shape",data.shape,"with",data.ndim,"dimensions"
I set up the array data of shape (12, 360, 720) with 3 dimensions

We also use slicing when assigning (‘loading’, if you like) data into a larger array:

data[i,:,:] = band

But this will only work if what you are trying to load and where you are tryng to load it are the same shape:

data[i,:,:].shape == band.shape

which they are in this case.

Putting this all together then:

# this then is the file we want
local_file = root + 'GlobAlbedo.%d%02d.mosaic.5.nc'%(year,month)

# load the netCDF data from the file local_file

if i == 0:  # first band read
# set up a tuple (or list) saying what shape we want the data
shape = (len(months),) + band.shape
data = np.empty(shape)

# load band into the data array
data[i,::] = band

And setting the loop around this:

import numpy as np
import calendar

root = 'data/'
year = 2009

# which months?
months = xrange(1,13)

# loop over month
# use enumerate so we have an index counter
for i,month in enumerate(calendar.month_name[1:]):
# this then is the file we want
local_file = root + 'GlobAlbedo.%d%02d.mosaic.5.nc'%(year,i+1)

# load the netCDF data from the file local_file

if i == 0:  # first band read
# set up a tuple (or list) saying what shape we want the data
shape = (len(months),) + band.shape
data = np.empty(shape)

# load band into the data array
data[i,::] = band

so now we have all of the data loaded into the array data.

An alternative way of achieving this same effect is to use a list, creating an empty list before going into the loop and appending band to the list each time we read a new band.

import numpy as np
import calendar

root = 'data/'
year = 2009

# which months?
months = calendar.month_name[1:]

# empty list
data = []

# loop over month
# use enumerate so we have an index counter
for i,month in enumerate(months):
# this then is the file we want
local_file = root + 'GlobAlbedo.%d%02d.mosaic.5.nc'%(year,i+1)

# load the netCDF data from the file local_file
# append what we read to the list called data
data.append(np.array(band))

# convert data to a numpy array (its a list of arrays at the moment)
data = np.array(data)
print 'shape',data.shape
shape (12, 360, 720)

In many ways, this is a simpler and ‘cleaner’ way of setting up a 3-D array from reading multiple 2-D arrays. If the array is very large, it can be less efficient (you have to convert a huge list to an array at the end), but that is not normally something to worry about and this approach is often preferable.

In fact, if we know how many entries (in time) we are expecting, we can simply set up the storage array before going into the loop:

import gdal
import numpy as np

# define the basic reading method as above

,filename=None):
'''
Method to read a GlobAlbedo file
'''
file_template = 'NETCDF:"%s":%s'

# what does this do???
filename = filename or root + 'GlobAlbedo.%d%02d.mosaic.5.nc'%(year,month)

g = gdal.Open (  file_template % ( filename, layer ) )

if g is None:
raise IOError

# return a numpy array
return(np.array(data))

root = 'data/'
year = 2009

# which months?
months = np.array(calendar.month_name[1:])

# get the image size by reading the first image
# this is a bit wasteful, but makes things
# neater in the long run
local_file = root + 'GlobAlbedo.%d%02d.mosaic.5.nc'%(year,i+1)

# empty list of required *full* shape
data = np.zeros((len(months),)+shape)

# loop over month
# use enumerate so we have an index counter
for i,month in enumerate(months):
# this then is the file we want
local_file = root + 'GlobAlbedo.%d%02d.mosaic.5.nc'%(year,i+1)

# load the netCDF data from the file local_file

# insert into list in index i
data[i] = np.array(band)

# convert to proper 3D array
data = np.array(data)

print data.shape
(12, 360, 720)
# we could wrap all of this up into the reading method:

layers = ['BHR_VIS']):
'''
'''
file_template = 'NETCDF:"%s":%s'

if len(months) == 0:
months = xrange(1,13)

first = True
shape = None

i = 0
for year in years:
for month in months:
for layer in layers:

# what does this do???
filename = root + 'GlobAlbedo.%d%02d.mosaic.5.nc'%(year,month)

g = gdal.Open (  file_template % ( filename, layer ) )

if g is None:
raise IOError

# if first time
if first:
first = False
# get the shape
shape = data_.shape
# set up the big array to return
shape = (len(months) * len(years) * len(layers),)+shape
data = np.zeros(shape)
data[i] = data_
# increment counter
i += 1
return(np.array(data))

# which months?
months = np.array(calendar.month_name[1:])

print data.shape
(12, 360, 720)

#### sum¶

To sum data in a numpy (or masked) array, we use the function sum.

e.g.:

print data.sum(),data[0,0,0]
nan nan

Thats not very useful in this case, because we have nan in the dataset ... though if we use a masked array it will work as we expect.

np.sum here adds all of the values in the array.

We could select only values that are >= 0 with the logical statement:

not_valid = np.isnan(data)
print not_valid
[[[ True  True  True ...,  True  True  True]
[ True  True  True ...,  True  True  True]
[ True  True  True ...,  True  True  True]
...,
[False False False ..., False False False]
[False False False ..., False False False]
[False False False ..., False False False]]

[[ True  True  True ...,  True  True  True]
[ True  True  True ...,  True  True  True]
[ True  True  True ...,  True  True  True]
...,
[False False False ..., False False False]
[False False False ..., False False False]
[False False False ..., False False False]]

[[ True  True  True ...,  True  True  True]
[ True  True  True ...,  True  True  True]
[ True  True  True ...,  True  True  True]
...,
[False False False ..., False False False]
[False False False ..., False False False]
[False False False ..., False False False]]

...,
[[ True  True  True ...,  True  True  True]
[ True  True  True ...,  True  True  True]
[ True  True  True ...,  True  True  True]
...,
[False False False ..., False False False]
[False False False ..., False False False]
[False False False ..., False False False]]

[[ True  True  True ...,  True  True  True]
[ True  True  True ...,  True  True  True]
[ True  True  True ...,  True  True  True]
...,
[False False False ..., False False False]
[False False False ..., False False False]
[False False False ..., False False False]]

[[ True  True  True ...,  True  True  True]
[ True  True  True ...,  True  True  True]
[ True  True  True ...,  True  True  True]
...,
[False False False ..., False False False]
[False False False ..., False False False]
[False False False ..., False False False]]]

so not_valid here is an array of the same size and shape as data, with True where the value is not valid (nan) and False elsewhere.

We can use this to select only data elements in data which are False in not_valid (not not_valid is valid ...), or we could change this into a valid array:

valid = not_valid == False

ndata = data[valid]
print ndata
[ 0.70465302  0.77435672  0.79079205 ...,  0.68589932  0.68589932
0.68589932]

Now, the array is a 1-D array. It has fewer elements that the original data array:

print 'original data size\t:',data.size
print 'new data size\t\t:',ndata.size
print 'data shape\t\t:',data.shape
print 'new data shape\t\t:',ndata.shape
original data size  : 3110400
new data size               : 1040376
data shape          : (12, 360, 720)
new data shape              : (1040376,)

This is however, one way we can use np.sum to add up the values in data:

print data[valid].sum()
438888.418459

#### mean¶

Similarly, if we wanted the mean or standard deviation and we tried to do this on the array that has nan in it:

print data.mean(),data.std()
nan nan

but selecting only the values that are valid:

print data[valid].mean(),data[valid].std()
0.421855577655 0.382216115949

we get what we might expect.

Remember that this is for normal (i.e. not masked) numpy arrays. Hassles of this nature with nan in the dataset is one good reason to use masked arrays.

#### axis selction¶

As used above, functions such as sum, mean and std will give you the sum, mean and standard deviation over the whole array.

Often though, we only want to apply these functions over a particular dimension.

Recall that the shape of data is:

data.shape
(12, 360, 720)

The first dimension here is ‘month’ then, the second is ‘latitude’ and the third ‘longitude’.

If we wanted to know the mean albedo for each month then then, we would want to apply mean over axis 1 and 2:

In numpy, this is done by specifying axis in the function:

month_mean = data.mean(axis=(1,2))
print month_mean.shape
(12,)

Thats the correct shape, but if we look at the values:

print month_mean
[ nan  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan]

we see that we get nan ...

That’s hardly surprising, as this is the same issue we had earlier with just calling sum or mean over the whole array.

Rather than end this section with a negative result, lets consider an array without nan.

We have previously generated the array valid, which is a logical array, but we could e.g. convert this to float:

gdata = (valid).astype(float)
print gdata.sum(axis=(1,2))
print gdata.std(axis=(1,2))
print gdata.mean(axis=(1,2))
[ 86698.  86698.  86698.  86698.  86698.  86698.  86698.  86698.  86698.
86698.  86698.  86698.]
[ 0.47180942  0.47180942  0.47180942  0.47180942  0.47180942  0.47180942
0.47180942  0.47180942  0.47180942  0.47180942  0.47180942  0.47180942]
[ 0.33448302  0.33448302  0.33448302  0.33448302  0.33448302  0.33448302
0.33448302  0.33448302  0.33448302  0.33448302  0.33448302  0.33448302]

There are many useful functions in numpy for manipulating array data, from basic arithmetic through to these function here for some basic stats.

They are all very easy to use and make for clear code, but if you have masks in your dataset (as we often do with geospatial data), you need to think carefully about how you want to process in the presence of masks.

It is often a good idea when you have masks to use a masked array.

## E3.3 Exercise: 3D Masked Array¶

Taking the code above as a starting point, generate a masked array of the GlobAlbedo dataset for the year 2009.

## 3.4 Average Albedo¶

If you run the exercise above, data should now be a masked array.

If you haven’t (e.g. when going through this in the class), you can run a function in the file python/masked.py <files/python/masked.py>__ to do this now:

import sys
# look in the local python (code) directory
sys.path.insert(0,'python')

print type(data)
print data.shape
print data.ndim
(12, 360, 720)
3

Now we have the albedo dataset for one year stacked in a 3-D masked array array (assuming you sucessfully completed exercise E3.3 or ran the code above), we can do some interesting things with it with numpy (without the nan hassles above):

### 3.4.1 Annual Mean and Standard Deviation¶

# calculate the mean albedo over the year
# The axis=0 option tells numpy which
# dimension to apply the function mean() over
mean_albedo = data.mean(axis=0)

# just confirm this is the ndim and shape we expect ...
print mean_albedo.ndim
print mean_albedo.shape
2
(360, 720)
# do a plot

# change the figure size
plt.figure(figsize=(10,4))

plt.title('SW BHR mean albedo for %d'%(year))
# use nearest neighbour interpolation
plt.imshow(mean_albedo,interpolation='none',\
cmap=plt.get_cmap('Spectral'),vmin=0.0,vmax=1.0)

# show a colour bar
plt.colorbar()
<matplotlib.colorbar.Colorbar instance at 0x7f8119c9f830>
#std
std_albedo = data.std(axis=0)
plt.figure(figsize=(10,4))

plt.title('SW BHR std albedo for %d'%(year))
# use nearest neighbour interpolation
plt.imshow(std_albedo,interpolation='none',\
cmap=plt.get_cmap('Spectral'),vmin=0.0)

# show a colour bar
plt.colorbar()
<matplotlib.colorbar.Colorbar instance at 0x7f8119bfea28>

We can see form this that most of the intra-annual variation in total shortwave albedo is at high latitudes (but not Greenland or Antarctica, for instance).

Why do you think that is?

### 3.4.2 NDVI¶

It might be interesting to look at NDVI data in this same way and to examine a few more stats using numpy. Even though this is a broadband albedo dataset, we would expect NDVI to give some indication of vegetation activity.

#### min, max¶

import sys
sys.path.insert(0,'files/python')

ndvi = (nir-vis)/(nir+vis)
# set negative and ndvi > 1 (error) to 0
#ndvi[ndvi>1] = 0
#ndvi[ndvi<0] = 0
/home/plewis/envs/my_root/lib/python2.7/site-packages/numpy/ma/core.py:783: RuntimeWarning: invalid value encountered in greater_equal
return umath.absolute(a) * self.tolerance >= umath.absolute(b)
mean_ndvi = ndvi.mean(axis=0)
max_ndvi = ndvi.max(axis=0)
min_ndvi = ndvi.min(axis=0)
range_ndvi = max_ndvi - min_ndvi
std_ndvi = ndvi.std(axis=0)

# set up a dictionary
datasets = {"mean NDVI 2009":mean_ndvi,\
"max NDVI 2009":max_ndvi,\
"min NDVI 2009":min_ndvi,\
"NDVI range 2009":range_ndvi,\
"std NDVI 2009":std_ndvi}

vmax = {"mean NDVI 2009":1.0,\
"max NDVI 2009":1.,\
"min NDVI 2009":1.,\
"NDVI range 2009":1.,\
"std NDVI 2009":0.25}

# plotting loop

for k in datasets.keys():
# note the order is not maintained
# from when we set up the dictionary
plt.figure(figsize=(10,4))
plt.title(k)
plt.imshow(datasets[k],interpolation='none',\
cmap=plt.get_cmap('YlGn'),vmin=0.0,vmax=vmax[k])
plt.colorbar()

The mean and max NDVI is pretty much what you would expect (MODIS NDVI, Nov-Dec, 2007 shown here):

but again we see the large variations in NDVI in the Northern Hemisphere.

One common use for albedo data is to calculate the amount of absorbed solar radiation at the surface.

In this section, we will build a model and dataset of downwelling solar radiation.

We can build a simple model of solar radiation:

Incoming solar radiation, $$A$$, ignoring the effect of the atmosphere, clouds, etc is given by

$A = E_{0}\sin\theta$

where $$E_{0}$$ is the solar constant (given as $$1360Wm^{-2}$$), and $$\theta$$ is the solar elevation angle. The solar elevation angle is approximately given by

$\sin \theta = \cos h \cos \delta \cos \varphi + \sin \delta \sin \varphi$

where $$h$$ is the hour angle, $$\delta$$ is the solar declination angle and $$\varphi$$ is the latitude. The solar declination can be approximated by

$\delta = - \arcsin \left [ 0.39779 \cos \left ( 0.98565 \left (N + 10 \right ) + 1.914 \sin \left ( 0.98565 \left ( N - 2 \right ) \right ) \right ) \right ]$

where $$N$$ is the day of year beginning with N=0 at 00:00:00 UTC on January 1

We have the latitude associated with the dataset from:

# example filename : use formatting string:
# %d%02d
year = 2009
month = 1
filename = root + 'GlobAlbedo.%d%02d.mosaic.5.nc'%(year,month)
print filename

# read some data to get the shape

# latitude, longitude
dd = 0.5

lat,lon = np.mgrid[-90+dd/2:90+dd/2:dd,-180+dd/2:180+dd/2:dd]
print '------ latitude ------'
print lat
print '------ longitude ------'
print lon
data/GlobAlbedo.200901.mosaic.5.nc
------ latitude ------
[[-89.75 -89.75 -89.75 ..., -89.75 -89.75 -89.75]
[-89.25 -89.25 -89.25 ..., -89.25 -89.25 -89.25]
[-88.75 -88.75 -88.75 ..., -88.75 -88.75 -88.75]
...,
[ 88.75  88.75  88.75 ...,  88.75  88.75  88.75]
[ 89.25  89.25  89.25 ...,  89.25  89.25  89.25]
[ 89.75  89.75  89.75 ...,  89.75  89.75  89.75]]
------ longitude ------
[[-179.75 -179.25 -178.75 ...,  178.75  179.25  179.75]
[-179.75 -179.25 -178.75 ...,  178.75  179.25  179.75]
[-179.75 -179.25 -178.75 ...,  178.75  179.25  179.75]
...,
[-179.75 -179.25 -178.75 ...,  178.75  179.25  179.75]
[-179.75 -179.25 -178.75 ...,  178.75  179.25  179.75]
[-179.75 -179.25 -178.75 ...,  178.75  179.25  179.75]]

where np.mgrid is a numpy method that produces sets of numbers on a grid from slice information, e.g. -90:90+dlat:dlat means from -90, to (but not including) 90+dlat, in steps of dlat.

# lets make it a masked array

lat,lon = np.mgrid[-90+dd/2:90+dd/2:dd,-180+dd/2:180+dd/2:dd]

# we want a single time slice so mask[0]

print lat.shape

plt.figure(figsize=(10,4))
plt.imshow(lat)

plt.title('latitude')
plt.colorbar()
shape of vis.mask (12, 360, 720)
(360, 720) (360, 720)
(360, 720)
<matplotlib.colorbar.Colorbar instance at 0x7f8119504ea8>

for day of year N then, we have a formula for solar declination.

Approximating N for each month by an average:

# mean days per month
av_days = 365.25 / 12.
half = av_days/2.
N = np.arange(half,365.25,av_days)
print N
[  15.21875   45.65625   76.09375  106.53125  136.96875  167.40625
197.84375  228.28125  258.71875  289.15625  319.59375  350.03125]
# or using mgrid:

N = np.mgrid[half:365.25:av_days]
print N
[  15.21875   45.65625   76.09375  106.53125  136.96875  167.40625
197.84375  228.28125  258.71875  289.15625  319.59375  350.03125]

we can approximate the solar declination as:

t1 = 0.39779*np.cos( np.deg2rad ( 0.98565*(N+10) + 1.914*np.sin ( t0 ) ) )
delta = -np.arcsin ( t1 )

plt.xlabel('day of year')
plt.ylabel('solar declination angle (degrees)')
plt.xlim(0,365.25)
(0, 365.25)

Although we might think the formulae a little complicated, the translation from the formula to numpy code is really very clear and readable. In particular, by having N as an array object, we are able to apply the formulae to the array and not have to obfuscate the code with loops.

Note that the formulae above are given in degrees, but trignometric functions in most programming languages work in radians.

In numpy, we can use the functions deg2rad and rad2deg to convert between degrees and radians and vice-versa (alternatively, multiply by np.pi/180. and 180./np.pi respectively).

The trig functions cosine and sine are cos and sin respectively, and their inverses arccos and arcsin.

For our solar energy calculations, we will set the ‘hour angle’ to zero (this is solar noon), so:

h = 0.0

To estimate the solar elevation angle then, we need to include the latitude. At present, we have an array lat for latitude and N for day of year:

print 'lat:',lat.shape
print 'N:',N.shape
lat: (360, 720)
N: (12,)

what we need is lat and N in the full 3-D array shape (12,360,720).

There are various ways to do this, but a simple one is to just use np.mlist with an extra (time) dimension:

dd = 0.5
month,lat,lon = np.mgrid[0:12,-90+dd/2:90+dd/2:dd,-180+dd/2:180+dd/2:dd]
print '------ latitude ------'
print lat.shape
print '------ longitude ------'
print lon.shape
------ latitude ------
(12, 360, 720)
------ longitude ------
(12, 360, 720)

At this point, we might consider it useful to write a function. We have used an example of this above, but go into it in more detail now.

These are of the form:

def fn_name(args):
...
return value
def declination(N):
'''Calculate solar declination (in degrees)'''
t1 = 0.39779*np.cos( np.deg2rad ( 0.98565*(N+10) + 1.914*np.sin ( t0 ) ) )
delta = -np.arcsin ( t1 )

def solar_elevation(delta,h,lat):
'''solar elevation in degrees'''
sin_theta = np.cos (h)*np.cos (delta)*np.cos(lat) + np.sin ( delta)*np.sin(lat)
# create numpy arrays of the right shape
# using mgrid for simplicity

N,lat,lon = np.mgrid[half:365.25:av_days,-90+dd/2:90+dd/2:dd,-180+dd/2:180+dd/2:dd]

# zeros_like creates array of the same shape as N here (or, lat or lon)
# filled with zeros. Similarly ones_like
h2 = np.zeros_like(N) + h
# now put these things together

# copy the arrays, so we don't overwrite the data

delta = declination(N)
e0 = 1360.
sea = solar_elevation(delta,h2,lat)
# threshold at zero

# plot to visualise the data and see it looks reasonable

plt.figure(figsize=(10,4))
plt.title('Jan')
plt.colorbar()

plt.figure(figsize=(10,4))
plt.title('Jun')
plt.colorbar()
<matplotlib.colorbar.Colorbar instance at 0x7f8118d58fc8>

This sort of application really shows the power of manipulating arrays.

## E3.5 Exercise: Solar Radiation Code: A chance for you to get feedback¶

In the section above, we have developed some code to estimate solar radiation and we have packaged some of it into functions.

In a file called python/solar.py, develop some Python code that will

1. Read GlobAlbedo data DHR_SW (shortwave directional-hemispherical reflectance) into a masked array data
2. Generate a spatial Solar Radiation dataset rad of the same shape as data
3. Calculate the amount of solar radiation absorbed at the surface (W m^-2)
4. Calculate the latitudinal total absorbed radiation for each month (W m^-2)
5. Calculate the global total absorbed radiation power for each month (W)

Where appropriate, produce images, graphs and/or movies of your results.

Comment on the patterns of absorbed solar radiation that you see.

All of your code should be commented as well as possible.

It should indicate which parts of the code you have simply taken from the material above and which parts you have developed yourself.

If possible, use an ipython notebook to illustrate your code and its operation. As a miniumum, submit your code and examples of its output.

If possible, wrap your codes into functions to group things that belong together together.

You will probably want to save the arrays you generated (np.savez) so that you can show your results later.

Hints:

• All of the code you need for parts 1. and 2. can be found in the material above ... you just need to put it together
• Recall that albedo is the proportion of radiation reflected.
• Recall that in a lat/long projection the area represented by each pixel varies as a function of latitude.

We will not release model answers for this exercise at present.

Instead, we would like you to complete this exercise over the coming week and submit it by the following Monday morning. If you do this, there will be sessions run in (extended) office hours (11-12 Thursday, 4-5 Thursday, 9-10 Wednesday) when you will bve given feedback on your submission. To get the most out of these feedback sessions, it would be adviseable to request a time slot (You should be getting a doodle poll email to that effect)

We strongly recommend that you attempt this exercise on your own. Given the amount of material you are given here, this should be achievable for all of you.

This work does not form part of your formal assessment. Instead it is intended to provide you with an opportunity for feedback on how you are getting on in the course at an early stage. So, you are not required to do this. If you do not though, we cannot give yuou feedback.

Come along to the feedback session prepared to explain how your code operates and how to run it (one reason why a notebook would be useful if you can manage that).

When we have gone to lots of effort (well, lots of processing time) to do some calculation, we probably want to save the result.

In numpy, we can save one or more numpy arrays (and similar data types) with np.savez().

So, e.g. to save the arrays data and rad:

import numpy as np
# there is a problem saving masked arrays at the moment,
# so convert and save the mask

As an aside here, we note that numpy also has convenient functions for loading ASCII text (which is much simplerto use than the methods we learned last session, but you need to know how to do it from scratch if the data format is very awkward).

That said, we can deal with some awkward issues in a dataset:

Heathrow (London Airport)
Location 5078E 1767N 25m amsl
Estimated data is marked with a * after the value.
Missing data (more than 2 days missing in month) is marked by  ---.
Sunshine data taken from an automatic Kipp & Zonen sensor marked with a #, otherwise sunshine data taken from a Campbell Stokes recorder.
yyyy  mm   tmax    tmin      af    rain     sun
degC    degC    days      mm   hours
1948   1    8.9     3.3    ---     85.0    ---
1948   2    7.9     2.2    ---     26.0    ---
1948   3   14.2     3.8    ---     14.0    ---
1948   4   15.4     5.1    ---     35.0    ---
1948   5   18.1     6.9    ---     57.0    ---
1948   6   19.1    10.3    ---     67.0    ---
1948   7   21.7    12.0    ---     21.0    ---
1948   8   20.8    11.7    ---     67.0    ---
# read sun hours from the heathrow data file

filename = 'data/heathrowdata.txt'

# In this dataset, we need to know what
# to do when we get '---' in the dataset
# so we set up a
# translation dictionary
def awkward(x):
if x == '---':
return np.nan
return x

trans = {6: awkward}

# skip the first 7 rows
# use the awkward translation for column 6
# unpack gets the shape the right way around

print sun_hrs.shape
# plot decimal year and sun hours
plt.plot(sun_hrs[0]+sun_hrs[1]/12.,sun_hrs[2])
# zoom in to 1970 to 1990
plt.xlim(1970,1990)
(3, 785)
(1970, 1990)

We can save this to a text file with:

# note, we transpose it if we want the data as columns
# year month sun_hours
1.948000000000000000e+03 1.000000000000000000e+00 nan
1.948000000000000000e+03 2.000000000000000000e+00 nan
1.948000000000000000e+03 3.000000000000000000e+00 nan
1.948000000000000000e+03 4.000000000000000000e+00 nan
1.948000000000000000e+03 5.000000000000000000e+00 nan
1.948000000000000000e+03 6.000000000000000000e+00 nan
1.948000000000000000e+03 7.000000000000000000e+00 nan
1.948000000000000000e+03 8.000000000000000000e+00 nan
1.948000000000000000e+03 9.000000000000000000e+00 nan

The file data/elevation.dat look like:

2013/10/8 00:00:00 -44.2719952943
2013/10/8 00:30:00 -43.5276412785
2013/10/8 00:59:59 -41.9842746582
2013/10/8 01:30:00 -39.7226999863
2013/10/8 02:00:00 -36.8452198361
2013/10/8 02:30:00 -33.459799008
2013/10/8 03:00:00 -29.6691191507
2013/10/8 03:30:00 -25.5652187709
2013/10/8 03:59:59 -21.2281801291
2013/10/8 04:30:00 -16.7272357302

Hint: You will need to write a function that decodes column 0 and 1, for example:

def col0(x):
# interpret column 1 as a sort of
# decimal year
# NB this is very rough ...
# not all months have 30 days ...
y,m,d = x.split('/')
dec_year = y + (m-1)/12. + (d-1)/30.
return dec_year

# translation ... just for column 1 here
# you'd need to define your own for column 2
trans = {0 : col0}

If you feel comfortable with the contents of this session, or simply would like something a little more challenging, examine the notes in the advanced section.

Answers for the exercises in this session are available, as are the answers for the advanced section.

#### Previous page

← 1. Introduction to Python

#### Next page

→ Chapter 4. Geospatial data