1. Introduction to Python

The aim of this Chapter is to introduce you to some of the fundamental concepts in Python. Mainly, this is based around fundamental data types in Python (int, float, str, bool etc.) and ways to group them (tuple, list and dict).

We then learn about how to loop over groups of things, which gives us control to iterate some process.

We need to spend a little time on strings, as you will likely to quite a bit of string processing in Scientific Computing (e.g. reading/writing data to/from ASCII text files).

Although some of the examples we use are very simple to explain a concept, the more developed ones should be directly applicable to the sort of programming you are likely to need to do.

A set of exercises is developed throughout the chapter, with worked answers available to you once you have had a go yourself.

In addition, a more advanced section of the chapter is available, that goes into some more detail and complkications. This too has a set of exercises with worked examples.

1.1 Python

Python is a high level programming language that is freely available, relatively easy to learn and portable across different computing systems. In Python, you can rapidly develop solutions for the sorts of problems you might need to solve in your MSc courses and in the world beyond. Code written in Python is also easy to maintain, is (or should be) self-documented, and can easily be linked to code written in other languages.

Relevant features include:

  • it is automatically compiled and executed
  • code is portable provided you have the appropriate Python modules.
  • for compute intensive tasks, you can easily make calls to methods written in (faster) lower-level languages such as C or FORTRAN
  • there is an active user and development community, which means that new capabilities appear over time and there are many existing extensions and enhancements easily available to you.

For further background on Python, look over the material on Advanced Scientific Programming in Python and/or the software-carpentry.org and python.org web sites.

In this session, you will be introduced to some of the basic concepts in Python.

1.2 Running Python Programs

1.2.1 Requirements

For this course, we suggest you use the anaconda Python distribution (this is what is installed in the unix lab computers), though you are free to use whichever version of it you like on your own computers.

If you are intending to use these notes on your opwn computer, you will need a relatively comprehensive installation of Python (such as that from anaconda), and will also need GDAL installed for some of the work. You may also find it of value to have git installed.

We are assuming that you are new to computing in this course, but that you are aware of the basic unix material covered in the previous lecture.

2.2.2 Running Python

We will generally use the ipython interpreter for running interactive Python programs.

You will probably want to run each session and store scripts in your Data (or DATA) directory.

If you are taking this course at UCL, the notes should already have been downloaded to your DATA directory.

If so, then:

berlin% cd ~/DATA/geogg122
berlin% git reset --hard HEAD
berlin% git pull

will update the notes (for any changes I make over the sessions).

If you need to download the notes and want to run the session directly in the notebook, you will need to download the course material from github and run the notebook with e.g.:

berlin% cd ~/DATA
berlin% git clone https://github.com/profLewis/geogg122.git

to obtain the notes.

You should next check that you are using the version of Python that we intend:

berlin% which ipython
/opt/anaconda/bin/ipython

If this isn’t the version of Python that you are picking up (note the use of the unix command which here), then you can either just type the full path name:

berlin% /opt/anaconda/bin/ipython

in place of where it says ipython in these notes, or modify your shell initialisation file (~/.bashrc if you are using bash or ~/.cshrc for tcsh or csh) to include /opt/anaconda/bin early on in the PATH.

To go to the directory for this session:

berlin% cd ~/Data/geogg122/Chapter2_Python_intro
berlin% ipython notebook python101.html --pylab=inline

You quit an ipython notebook session with ^C (Control C).

To exectute (‘run’) blocks of Python code in the notebook, use ^<return> (SHIFT and RETURN keys together).

Alternatively, just run ipython:

berlin% cd ~/DATA/geogg122/Chapter2_Python_intro
berlin% ipython --pylab=inline

and type your own commands in at the prompt, following the class or the material on the webpages.

2.3 Getting Started

2.3.1 Variables, Values and Data types

The idea of variables is fundamental to any programming. You can think of this as the name of something, so it is a way of allowing us to refer to some object in the language.

What the variable is set to is called its value.

So let’s start with a variable we will call (declare to be) x.

We will give the value 1 to this variable:

x = 1

In a computing language, the sort of thing the variable can be set to is called its data type.

In the example above, the datatype is an integer number (e.g. 1, 2, 3, 4).

In ‘natural language’, we might read the example above as ‘x is one’.

This is different to:

x = 'one'

because here we have set value of the variable x to a string (i.e. some text).

A string is enclosed in quotes, e.g. "one" or 'one', or even "'one'" or '"one"'.

print "one"
print 'one'
print "'one'"
print '"one"'
one
one
'one'
"one"

This is different to:

x = 1.0

because here we have set value of the variable x to a floating point number (these are treated and stored differently to integers in computing).

This is different to:

x = True

where True is a logical or boolean datatype (something is True or False).

We have so far seen three datatypes:

  • integer (int): 32 bits long on most machines
  • (double-precision) floating point (float): (64 bits long)
  • Boolean (bool)
  • string (str)

but we will come across more (and even create our own!) as we go through the course.

type

In each of these cases above, we have used the variable x to contain these different data types. If you want to know what the data type of a variable is, you can use the method type()

print type(1);
print type(1.0);
print type('one');
print type(True);
<type 'int'>
<type 'float'>
<type 'str'>
<type 'bool'>

You can explicitly convert between data types, e.g.:

print 'int(1.1) = ',int(1.1)
print 'float(1) = ',float(1)
print 'str(1) = ',str(1)
print 'bool(1) = ',bool(1)
int(1.1) =  1
float(1) =  1.0
str(1) =  1
bool(1) =  True

but only when it makes sense:

print "converting the string '1' to an integer makes sense:",int('1')
converting the string '1' to an integer makes sense: 1
print "converting the string 'one' to an integer doesn't:",int('one')
converting the string 'one' to an integer doesn't:
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)

<ipython-input-9-20c6d9c1fc49> in <module>()
----> 1 print "converting the string 'one' to an integer doesn't:",int('one')


ValueError: invalid literal for int() with base 10: 'one'

When you get an error (such as above), you will need to learn to read the error message to work out what you did wrong.

del

You can delete a variable with del:

x = 100.
print x
x = 100.
del x

# so now if we try to do anything with the variable
# x, it should fail as x is no longer defined
print x

2.3.2 Arithmetic

Often we will want to do some arithmetic with numbers in a program, and we use the ‘normal’ (derived from C) operators for this.

Note the way this works for integers and floating point representations.

'''
    Some examples of arithmetic operations in Python

    Note how, if we mix float and int, the result is raised to float
    (as the more general form)
'''

print 10 + 100     # int addition
print 10. - 100    # float subtraction
print 1./2.        # float division
print 1/2          # int division
print 10.*20.      # float multiplication
print 2 ** 3.      # float exponent
print 8%2          # int remainder

print '========'
# demonstration of floor (//) and remainder (%)
number    = 9.5
base      = 2.0
remainder = number%base # float remainder
floor     = number//base # 'floor' operation
print number,'is',floor,'times',base,'plus',remainder

Exercise

Change the numbers in the examples above to make sure you understand these basic operations.

Try combining operations and use brackets () to check that that works as expected.

2.3.3 Assignment Operators

'''
    Assignment operators

    x = 3   assigns the value 3 to the variable x
    x += 2  adds 2 onto the value of x
            so is the same as x = x + 2
            similarly /=, *=, -=
    x %= 2  is the same as x = x % 2
    x **= 2 is the same as x = x ** 2
    x //= 2 is the same as x = x // 2

    A 'magic' trick
    ===============

    http://www.wikihow.com/Read-Someone\
        %27s-Mind-With-Math-%28Math-Trick%29

    whatever you put as myNumber, the answer is 3

    Try this with integers or floating point numbers ...
'''

# pick a number
myNumber = 34.67

# assign this to the variable x
x = myNumber

# multiply it by 2
x *= 2

# multiply this by 5
x *= 5

# divide by the original number
x /= myNumber

# subtract 7
x -= 7

# The answer will always be 3
print x

2.3.4 Logical Operators

'''
    Logical operators
'''
alive = True
dead = not alive
print 'dead or alive is',dead or alive
print 'dead and alive is',dead and alive

The result of running comparison operators will give a logical (i.e. bool) output.

Most of this should be obvious, but consider carefully how this works for string data types.

2.3.5 Comparison Operators

'''
    Related, comparison operators:

    ==  : equvalence
    !=  : not equivalent
    >   : greater than
    >=  : greater than or equal to
    <   : less than
    <=  : less than or equal to
'''

print "is one plus one equal to two?"
print 1 + 1 == 2

print "is one less than or equal to 0.999?"
print 1 <= 0.999

print "is one plus one not equal to two?"
print 1 + 1 != 2

# note the use of double quotes inside a single quoted string here
print 'is "Hello" not the same as "hello"?'
print 'Hello' != 'hello'

# note the use of single quotes inside a double quoted string here
print "is 'more' greater than 'less'?"
print "more" > "less"

print "is '100' less than '2'?"
print '100' < '2'

print "is 100 less than 2?"
print 100 < 2

# a boolean example just to see what happens
print "is True greater than False?"
print True > False

We can combine such logical statements, bracketing the terms as required:

print (1 < 2) and (True or False)
#!/usr/bin/env python

"""Exercise in logical statements

   P. Lewis p.lewis@ucl.ac.uk

   Tue  8 Oct 2013 10:11:03 BST
"""

# hunger threshold in hours
hungerThreshold = 3.0
# sleep threshold in hours
sleepThreshold = 8.0

# time since fed, in hours
timeSinceFed = 4.0
# time since sleep, in hours
timeSinceSleep = 3.0

# Note use of \ as line continuation here
# It is poor style to have code lines > 79 characters
#
# see http://www.python.org/dev/peps/pep-0008/#maximum-line-length
#
print "Tired and hungry?",(timeSinceSleep >= sleepThreshold) and \
                            (timeSinceFed >= hungerThreshold)
print "Just tired?",(timeSinceSleep >= sleepThreshold) and \
                            (not (timeSinceFed >= hungerThreshold))
print "Just hungry?",(not (timeSinceSleep >= sleepThreshold)) and \
                            (timeSinceFed >= hungerThreshold)

Exercise 2.1

The code above works fine, but the large blocks of logical tests are not very clear or readable, and contain repeated items.

Type the code into a file (or download it).

To run the code either type at the unix prompt:

berlin% python hungry.py

OR, within ipython:

In [17]: run hungry.py

Modify this block of code to be clearer by assigning the individual logical tests to variables,

e.g.

tired = timeSinceSleep >= sleepThreshold

2.4 Groups of things

2.4.1 tuples, lists and dictionaries

Very often, we will want to group items together.

There are several main mechanisms for doing this in Python, known as:

  • tuple, e.g. (1, 2, 3)
  • list, e.g. [1, 2, 3]
  • dict, e.g. {1:’one’, 2:’two’, 3:’three’}

You will notice that each of these grouping structures uses a different form of bracket.

tuple

A tuple is a group of items separated by commas.

t = 1, 2, 'three', False
print t

Note that when you declare the tuple, you don’t need to put the braces (brackets) as this is implicit.

Often though, it is a good idea to do so.

t = (1, 2, 'three', False)
print t

If there is only one element in a tuple, you must put a comma , at the end, otherwise it is not interpreted as a tuple:

t = (1)
print t,type(t)
t = (1,)
print t,type(t)

You can have an empty tuple though:

t = ()
print t,type(t)

Notice that the tuple can contain data of different types.

It can also be nested (i.e. a tuple can contain a tuple):

t = ('one', 2), 3, ((4,5),6)
print t

Some operations we can perform on tuples include:

  • length : len()
  • selection (‘slice’) : []

len

# set up a simple tuple
t = (1,2,3)
print "The length of the tuple (1,2,3) is",len(t)

# a nested example
t = (1,('2a','2b'),3)
print "The length of the tuple (1,(1,('2a','2b'),3),3) is",len(t)

slice

# select an item with []
# note the first item is 0, i.e. we start counting at 0
# Python uses a 0-based indexing system

t = ('it','is','a','truth','universally','acknowledged')

print 'item 0',t[0]
print 'item 4',t[4]
# using negative to count from the end
# so -1 is the last item, -2 the second to last etc

t = ('it','is','a','truth','universally','acknowledged')

print 'item -1',t[-1]
print 'item -3',t[-3]
# select a range (a 'slice') of items with [start:end:step]

t = ('it','is','a','truth','universally','acknowledged')

print 'items 0:1\t',t[0:1]        # 0 to 1, so only item 0
print 'items 0:2\t',t[0:2]        # 0 to 2, so items 0 and 1
print 'items :4:2\t',t[:4:2]      # :4:2 so items 0 (implicit) to 4, in steps of 2
                                  # so items 0 and 3
print 'items ::-1\t',t[::-1]      # 0 to end in steps of -1, so reverse order
print 'items 1:-1:2\t',t[1:-1:2]  # 1 to -1 in steps of 2 so items 1 and 3

# Note the use of \t in the strings above e.g. 'items 0:1\t'
# where \t is a tab character (for prettier formatting)

In effect, when we set up a tuple, we are packing some group of items together:

t = ('the', 'past', ('is', 'a', 'foreign', 'country'))

And we can similarly unpack:

a, b, c = t

print 'a:',a
print 'b:',b
print 'c:',c
t = ('As', 'Gregor', 'Samsa', 'awoke', 'one', 'morning')

a,b = t[1:4:2]
print 'a:',a
print 'b:',b

The set of operations we can perform on tuples includes:

Name

Example

Result

Meaning

len()
len((1,2,(3,4))
3

Length

+
(1,2) + (3,4,5)
(1,2,3,4,5)

Concatenate (join)

\*
(1,2) * 3
(1, 2, 1, 2, 1, 2, 1, 2)

Repetition

in
2 in (1,2,3,4)
True

Membership

index
('a','b','c','d').index('c')
2

Index

T.index(value, [start, [stop]]) -> integer – return first index of value.

Raises ValueError if the value is not present

count
('a','b','c','c','d','c').count('c')
3

Count

T.count(value) -> integer – return number of occurrences of value

min(), max()

min(('a','b','c'))

max((3,4,1))

'a'

4

Minimum, maximum

tuple()

tuple([1,2,3])

tuple('hello')

tuple({1:'one',2:'two'})

(1,2,3)

('h', 'e', 'l', 'l', 'o')

(1,2)

Convert to tuple

You will find that these basic operators work with any of the group types, so you should make sure you are aware of them.

# some examples

print "\ntuple('hello world'):\n\t",tuple('hello world')
print "\ntuple('hello world').count('o'):\n\t",tuple('hello world').count('o')
print "\n('1',)*5 + (1,)*5:\n\t",('1',)*5 + (1,)*5

# Note use of \t in string for tab and \n for newline

You cannot directly replace an element in a tuple:

t = (1,2,3,4)
t[2] = 'three'

so you would need to find another way to do this, e.g.

t = (1,2,3,4)

t = t[:2] + ('three',) + t[3:]
print t

Neither can you delete an item in a tuple:

t = (1,2,3,4)

del t[2]
print t
# again, find another way around

t = (1,2,3,4)

t = t[:2] + t[3:]
print t

string as a group

You might have noticed that a string data type str acts as a collection of individual characters, e.g.:

word = 'hello world'

print "word =\t",word,"\n"

print "tuple(word) =\t",tuple(word)
# slice
print "word[2:5] =\t",word[2:5]
# len
print "len(word) =\t",len(word)
# max (similarly min)
print "max(word) =\t",max(word)
# in (membership)
print "'w' in word =\t",'w' in word
# count
print "word.count('l')=\t",word.count('l')
# index
print "word.index('l')=\t",word.index('l')
# + (concatenation)
print "word + ' again'=\t",word + ' again'
# * (repetition)
print "word * 2=\t",word*2

This sort of consistency or operation is one of the things that makes Python a good language to program in.

list

lists or sequences are contained within square brackets []:

The operators available for tuple work in much the same way as for lists (more formally, sequences):

t = ['It', 'was', 'the', 'best', 'of', 'times']
print t

"""slicing is the same as for tuples"""
t = ['It', 'was', 'the', 'best', 'of', 'times']

print "\n----slice----"
print 't[:4:2]:\n\t',t[:4:2]       # 0th item to 4th in steps of 2, so 0,2
print 't[::-1]:\n\t',t[::-1]       # reversal


"""index"""
t = ['It', 'was', 'the', 'best', 'of', 'times']

print "\n----index----"
print 't.index("best"):\n\t',t.index('best')


"""plus"""
t = ['It', 'was'] + ['the', 'best', 'of', 'times']

print "\n----plus----"
print "t = ['It', 'was'] + ['the', 'best', 'of', 'times']\n\t",t


"""multiply"""
t = ['It', 'was', 'the'] + ['best'] * 3 + ['of', 'times']

print "\n----multiply----"
print "t = ['It', 'was', 'the'] + ['best'] * 3 + ['of', 'times']\n\t",t

But there are many other things one can do with a list, e.g.:

"""replace"""
t = ['It', 'was', 'the', 'best', 'of', 'times']

print "\n----replace----"
t[3:5] = ['New','York']
print "t[3:5] = 'New York':\n\t",t


"""index and replace"""
t = ['It', 'was', 'the', 'best', 'of', 'times']

print "\n----index and replace----"
t[t.index('best')] = 'worst'
print 't[t.index("best")] = "worst":\n\t',t


"""can delete one or more items"""
t = ['It', 'was', 'the', 'best', 'of', 'times']

print "\n----del item----"
del t[2:4]                         # delete items 2 to 4, i.e. 2, 3
                                   # i.e. 'the', 'best'
print 'del t[2:4]:\n\t',t


"""can sort"""
t = ['It', 'was', 'the', 'best', 'of', 'times']

print "\n----sort----"
t.sort()                           # sort inplace

print 't.sort():\n\t',t


"""can insert"""
t = ['It', 'was', 'the', 'best', 'of', 'times']

print "\n----insert----"
t.insert(1,'really')               # insert inplace

print "t.insert(1,'really'):\n\t",t

range

Many functions that return multiple items will make use of a list to do so.

An example of this that we will use below is `range(start,stop,step) <http://docs.python.org/2/library/functions.html#range>`__ that returns a list of integers from start to (but not including) stop in steps of step.

print range(10)
print range(1,3)
print range(-10,10,2)
# set a value 3 to the variable x
x = 3

# range(10) produces the lis
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
x in range(10)

2.5 Loops and Conditional Statements: if, for, while

So far, we have come across the ideas of variables, data types, and two ways of grouping objects together (tuples and lists).

Another fundamental aspect of any programming language is conditional statements. The simplest form of this is an if ... else ... statement:

pockets = ['phone','keys','wallet','frog']

this_item = 'nothing'

# test if something is in the list
if this_item in pockets:
    print "You do have",this_item,'in your pocket'
else:
    print "You don't have",this_item,'in your pocket'

Here,

this_item in pockets

is a membership test as we have seen above (it returns True or False).

If it’s True, the code block

print "You do have",this_item,'in your pocket'

is executed. If False, then the next condition is checked.

Note the use of indentation here (using tab or spaces) to represent the structure of the conditional statements.

Note also the use of a colon (:) to mark the end of the conditional test.

'''An if example

   Threshold the value of x at zero
'''

x = 3

# threshold at zero
# and print some information about what we did

if x < 0:
    print 'x less than 0'
    x = 0
elif x == 0:
    print 'x is zero'
else:
    print 'x is more than zero'

print 'thresholded x = ',x

The syntax of this is:

if condition1 is True:
    ...
elif condition2 is True:
    ...
else:
    ...

where condition1 and condition2 are logical tests (e.g. this_item in pockets, today == "Wednesday", x > 10 etc.) and the is True part of the syntax is implicit (i.e. you don’t need to type is True).

The word elif means else if. The tests are considered in the order they are givem so that if condition1 is not True, we examine condition2. If that is not True, we fall through to the final else block.

# nested conditional statements: If

what_you_keep = 'your head'
when_you_do_it = 'all about you are losing theirs'
whom_you_trust_when_all_men_doubt_you = 'yourself'

# first tests
if ( what_you_keep == 'your head' ) and \
    (when_you_do_it == 'all about you are losing theirs'):

    # second level tests
    if whom_you_trust_when_all_men_doubt_you == 'yourself':
        print "Yours is the Earth and everything that’s in it ..."
    else:
        print "Nearly there ..."

else:
    print "Have a look at http://www.poetryfoundation.org/poem/175772"

Exercise 2.2

Exercise 2.2 A.

A small piece of Python code that will set the variable today to be a string with the day of the week today is:

# This imports a module that we can use to access dates
from datetime import datetime

# set up a list of days of the week, starting Monday
# Note the line continuation here with \
week = ['Monday','Tuesday','Wednesday','Thursday',\
        'Friday','Saturday','Sunday']

# This part gives the day of the week
# as an integer, 0 -> Monday, 1 -> Tuesday etc.
day_number = datetime.now().weekday()

today = week[day_number]

# print item day_number in the list week
print "today is",today

Based on the example below, set up a diary for youself for the week to print out what you should be doing today, using the conditional structure ``if .. elif ... else``.

if day_number == 2:
    print "Remember to wake up early to get to the Python class at UCL"
elif day_number == 4:
    print "Remember to wake up early to go to classes at Imperial College"
else:
    print "get some sleep"

Exercise 2.2 B.

You could set up the basic calendar for the week in a list, with the first entry representing Monday, the second Tuesday etc.

my_diary = ['Spend the day practicing Python',\
            'Do some reading in the library at UCL', \
            'Remember to wake up early to get to the Python class at UCL',\
            'Spend the day practicing Python',\
            'Remember to wake up early to go to classes at Imperial College',\
            'Work at Python exercises from home',\
            'Work at Python exercises from home']

Using a list of this sort, print the diary entry for today *without* using conditional statements.

Criticise the code you develop and make suggestions for improvement.

for

Very commonly, we need to iterate or ‘loop’ over some set of items.

The basic stucture for doing this (in Python, and many other languages) is for ... in ...:

count_list = range(1,4)

# for loop
for count in count_list:
    '''print counter in loop'''
    print count

print 'blast off'

which has the syntax:

for var in list:
    ...

where the variable var is set to each of the items in list, in the order in which they appear in list.

xrange

When we have a loop, there nust be something that defines what it is we loop over. In the example above, this was a list, count_list, which here is [1, 2, 3].

In Python, we can use either use some explicit list, tuple etc. to define what we loop over, or, we can use a generator expression, which is something that returns one of its members at a time.

Normally then, instead or using range above, which involves an explicit calculation and storage of all elements of the list, we use a generator function, `xrange <http://docs.python.org/2/library/functions.html#xrange>`__, which, in essence, returns the elements ‘on demand’ as needed in the loop (and so uses less memory, though there is little real difference except for very large loops).

# for loop
for count in xrange(1,4):
    '''print counter in loop'''
    print count

print 'blast off'

If you need to force an `iterable <http://docs.python.org/2/glossary.html#term-iterable>`__ to e.g. return a list, you can convert the data type to list:

print xrange(1,4)
print list(xrange(1,4))

enumerate

Commonly, when iterating over a set of items, we also need access to a counter, telling us which item in the list we are currently on.

This is done using the function `enumerate() <http://docs.python.org/2/library/functions.html#enumerate>`__. This returns the tuple (count,item) where count is the index of item in list.

word_list = ['Call', 'me', 'Ishmael']

for i,w in enumerate(word_list):
    print 'The',i,'th','word is',w

Here the syntax:

for count,var in enumerate(list):
    ...

2.6 Strings and things

We have seen the data type str above and noted some of the operations we can use of strings.

You can look over some more detailed notes on strings at some point, but here we will now go through some other typical operations you will use in scientific computing:

2.6.1 Some basic string operations

As a recap, with some slightly more complicated examples:

word = 'hello world'

print "word \n\t=",word,"\n"

print "list(word) \n\t=",list(word)

# slice
print "word[::-1] \n\t=",word[::-1]
# len
print "len(word) \n\t=",len(word)
# min (similarly max)
print "min(word) \n\t=",min(word),'\n\t... what was printed there?'
# in (membership)
print "'n' in word \n\t=",'n' in word
# count
print "word.count('p')\n\t=",word.count('p')
# + (concatenation)
print "'hey!' + word[len('hello')::2]\n\t=",'hey!' + word[len('hello')::2]
# * (repetition)
print "word[:6]* 3\n\t=",word[:6]*3
 word
    = hello world

list(word)
    = ['h', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd']
word[::-1]
    = dlrow olleh
len(word)
    = 11
min(word)
    =
    ... what was printed there?
'n' in word
    = False
word.count('p')
    = 0
'hey!' + word[len('hello')::2]
    = hey! ol
word[:6]* 3
    = hello hello hello
# look what happens if we call index
# for something that doesn't exist in the string
print "word.index('x')=\t",word.index('x')
word.index('x')=
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)

<ipython-input-11-515f07310ca9> in <module>()
      1 # look what happens if we call index
      2 # for something that doesn't exist in the string
----> 3 print "word.index('x')=\t",word.index('x')


ValueError: substring not found
# Sometimes, we might wish to use
# the string operator find instead
print "word.find('x')=\t",word.find('x')

2.6.2 split

Suppose we have some data that are presented to us as a string with white space separating each data element, e.g.:

data = "1964 1220 1974 2470 1984 2706 1994 4812 2004 2707"

These data are total fossil fuel emissions for Zimbabwe (thousand metric tons of C) for selected years (dataset doi 10.3334/CDIAC/00001_V2011).

The even elements are the year (1964, 1974 etc.) and the odd elements (1220, 2470) the data for that year.

We can use the string operator split() to separate this into a list of strings:

data = "1964 1220 1974 2470 1984 2706 1994 4812 2004 2707"
sdata = data.split()
print sdata

We could the convert these to integer:

data = "1964 1220 1974 2470 1984 2706 1994 4812 2004 2707"
sdata = data.split()

# how many items are there?
# use len(), and divide by 2 in this case
n_items = len(sdata)

# create an empty list: years
years = []

# create an empty list: emissions
emissions = []

# loop over sdata in steps of 2
# and append years and emissions
# data as int

# xrange(0,n_items,2) because
# we want to step every 2 in this case
for i in xrange(0,n_items,2):
    years.append(int(sdata[i]))
    emissions.append(int(sdata[i+1]))
print years
print emissions

2.6.3 join

The ‘opposite’ of split is join.

This returns an iterable of the form S.join(list), where S is the separator and list is a list of strings e.g.:

str1 = 'hello'
str2 = 'world'

# joint with space
print ' '.join([str1,str2])
# joint with no space
print ''.join([str1,str2])
# join with tab
print '\t'.join([str1,str1,str2])
# join with colon :
# note what happens we pass a
# string, rather than a list
print ':'.join(str1)
# remember that it has to be a list
# of strings: years here is a list
# of integers
print ' '.join(years)
data = "1964 1220 1974 2470 1984 2706 1994 4812 2004 2707"
sdata = data.split()

for i in xrange(0,len(sdata),2):
    print ' '.join(sdata[i:i+2])

2.6.4 replace

Another useful string operator is replace, e.g.:

# change white space separation to comma separation
data = "1964 1220 1974 2470 1984 2706 1994 4812 2004 2707"
print data.replace(' ',',')

2.6.5 format

The most common way you are likely to be formatting strings is using expressions such as:

how_many = 10
how_much = "hours"

print "There are only %d things to learn.\
        \nBut\tit will take you %s to do so."%(how_many,how_much)

Using this style of string formatting, you put control characters into the string, e.g.:

"%d: hello %s"

and put a tuple of variables after the string, separated by % which are inserted into the string in the order in which they appear:

"%d: hello %s"%(10,'ten')

Note that you must get the data types correct, or you will generate an error.

The most common formatting codes are:

  • %d represents an integer
  • %s represents a string
  • %f represents a float
  • %e represents a float in exponential form

e.g.:

print "\
        integer     %d\n\
        float       %f\n\
        string      %s\n\
        exponential %e"%(3,3.1415926536,"pies are squared",3.1415926536)

2.6.6 strip

A useful method is strip() that strips off any unnecessary white space and newlines e.g.:

string = '   hello world      \t'
print '|%s|'%string
print '|%s|'%string.strip()
|   hello world             |
|hello world|

2.6.9 Getting and splitting filenames

glob

# example, with directory names

# glob unix style pattern matching for files and directories
import glob

# returns a list (or [] if empty)
# to match the pattern given
file_list = glob.glob("files/data/*.txt")
print "file_list:\n\t",file_list

# e.g. the first string in the list
this_file = file_list[0]
file_list:
    ['files/data/modis_files2a.txt', 'files/data/modis_files2b.txt', 'files/data/some_modis_files.txt', 'files/data/HadSEEP_monthly_qc.txt', 'files/data/heathrowdata.txt', 'files/data/modis_files.txt']
this_file = 'files/data/HadSEEP_monthly_qc.txt'

# split the filename on the field '/'
print "\nthis_file.split('/'):\n\t",this_file.split('/')

# so the filename is just the last element in this list
print "\nthis_file.split('/')[-1]:\n\t",this_file.split('/')[-1]
this_file.split('/'):
    ['files', 'data', 'HadSEEP_monthly_qc.txt']

this_file.split('/')[-1]:
    HadSEEP_monthly_qc.txt
# another example, with directory names

# glob unix style pattern matching for files and directories
import glob

# returns a list
file_list = glob.glob("data/*.txt")
print "file_list:\n\t",file_list

print "\nfile names:"
# loop over the list of file namnes
for this_file in file_list:
    # for each of these
    # split the filename on the field '/'
    print "\t",this_file.split('/')[-1]
file_list:
    ['data/modis_files2a.txt', 'data/modis_files2b.txt', 'data/some_modis_files.txt', 'data/HadSEEP_monthly_qc.txt', 'data/heathrowdata.txt', 'data/modis_files.txt']

file names:
    modis_files2a.txt
    modis_files2b.txt
    some_modis_files.txt
    HadSEEP_monthly_qc.txt
    heathrowdata.txt
    modis_files.txt

Exercise 2.3

The data below are fields of:

0 year
1 month
2 tmax (degC)
3 tmin (degC)
4 air frost (days)
5 rain (mm)
6 sun (hours)

for Lowestoft in the UK for the year 2012, taken from Met Office data.

data = """   2012   1    8.7    3.1      5   33.1   53.9
   2012   2    7.1    1.6     13   13.8   86.6
   2012   3   11.3    3.7      2   64.2  141.3
   2012   4   10.9    4.3      3  108.9  151.1
   2012   5   15.1    8.6      0   46.6  171.3
   2012   6   17.9   10.9      0   74.4  189.0
   2012   7   20.3   12.8      0   93.6  206.9
   2012   8   22.0   14.0      0   59.6  217.3
   2012   9   18.9    9.5      0   38.8  200.8
   2012  10   13.6    7.9      0   92.7   94.7
   2012  11   10.5    4.4      2   62.1   79.6
   2012  12    7.9    2.4      8   95.6   41.9  """

print data
2012   1    8.7    3.1      5   33.1   53.9
2012   2    7.1    1.6     13   13.8   86.6
2012   3   11.3    3.7      2   64.2  141.3
2012   4   10.9    4.3      3  108.9  151.1
2012   5   15.1    8.6      0   46.6  171.3
2012   6   17.9   10.9      0   74.4  189.0
2012   7   20.3   12.8      0   93.6  206.9
2012   8   22.0   14.0      0   59.6  217.3
2012   9   18.9    9.5      0   38.8  200.8
2012  10   13.6    7.9      0   92.7   94.7
2012  11   10.5    4.4      2   62.1   79.6
2012  12    7.9    2.4      8   95.6   41.9

You can use the Python package pylab to simply plot data on a graph:

# import the pylab module
import pylab as plt

# some e.g. x and y data
x = range(100)
y = [i**0.5 for i in x]

plt.plot(x,y)
plt.xlabel('x')
plt.ylabel('y')
plt.title('Plot of sqrt(x)')
<matplotlib.text.Text at 0x7f5aee2da890>
../_images/python101_173_1.png

Produce a plot of the number of sunshine hours for Lowestoft for the year 2012 using the data given above.

Hint: the data have newline chcracters \n at the end of each line of data, and are separated by white space within each line.

2.7 Files

To open a file that already exists for reading, we use:

fp = open(filename,'r')

where filename here is the name of a file and the 'r' argument tells us that we want to open in ‘read’ mode.

This returns a file object, fp here that we can use to read data from the file etc.

When we have finished doing what we want to do, we should close the file:

fp.close()

To read ASCII data from a file as a list of strings for each line, use:

fp.readlines()

To write ASCII text to the file, use:

fp.write("some text")

or to write a list of strings all at once:

fp.writelines(["some text\n","some more\n"])

As a first example, let’s open a file for reading from a file `files/data/elevation.dat <files/data/elevation.dat>`__ that contains a list of dates, times and solar elevation angles (degrees).

!head -10 < data/elevation.dat
2014/9/30 00:00:00 -41.1227180114
2014/9/30 00:30:00 -40.4796625312
2014/9/30 00:59:59 -39.0649015428
2014/9/30 01:30:00 -36.9463783528
2014/9/30 02:00:00 -34.2136162508
2014/9/30 02:30:00 -30.9643006022
2014/9/30 03:00:00 -27.2941667536
2014/9/30 03:30:00 -23.2912319377
2014/9/30 03:59:59 -19.0337171884
2014/9/30 04:30:00 -14.5903017653

What we are going to want to do is to create a new file which has the time, specified in decimal hours, and the solar zenith angle (i.e. 90 degrees minus the elevation) into a new file files/data/zenith.dat, but only when the Sun is above the horizon.

Let’s first concentrate on reading the data in:

filename = 'data/elevation.dat'
fp = open(filename,"r")

Now we will use readlines to return a list of strings:

sdata = fp.readlines()
print sdata
['2014/9/30 00:00:00 -41.1227180114n', '2014/9/30 00:30:00 -40.4796625312n', '2014/9/30 00:59:59 -39.0649015428n', '2014/9/30 01:30:00 -36.9463783528n', '2014/9/30 02:00:00 -34.2136162508n', '2014/9/30 02:30:00 -30.9643006022n', '2014/9/30 03:00:00 -27.2941667536n', '2014/9/30 03:30:00 -23.2912319377n', '2014/9/30 03:59:59 -19.0337171884n', '2014/9/30 04:30:00 -14.5903017653n', '2014/9/30 05:00:00 -10.0216590927n', '2014/9/30 05:30:00 -5.01932715546n', '2014/9/30 06:00:00 -0.134849371939n', '2014/9/30 06:30:00 4.09286322097n', '2014/9/30 06:59:59 8.55453445901n', '2014/9/30 07:30:00 12.938958392n', '2014/9/30 08:00:00 17.1514399965n', '2014/9/30 08:30:00 21.1181133381n', '2014/9/30 09:00:00 24.7672136192n', '2014/9/30 09:30:00 28.020436136n', '2014/9/30 09:59:59 30.7949255712n', '2014/9/30 10:30:00 33.0070257679n', '2014/9/30 11:00:00 34.579161145n', '2014/9/30 11:30:00 35.4491052641n', '2014/9/30 12:00:00 35.5794764999n', '2014/9/30 12:30:00 34.9643838142n', '2014/9/30 12:59:59 33.6305127561n', '2014/9/30 13:30:00 31.6322282163n', '2014/9/30 14:00:00 29.0427156717n', '2014/9/30 14:30:00 25.9443096378n', '2014/9/30 15:00:00 22.4206252902n', '2014/9/30 15:30:00 18.5516782953n', '2014/9/30 15:59:59 14.4111578562n', '2014/9/30 16:30:00 10.0714597095n', '2014/9/30 17:00:00 5.61372180666n', '2014/9/30 17:30:00 1.2203105187n', '2014/9/30 18:00:00 -2.38322183516n', '2014/9/30 18:30:00 -8.47981048194n', '2014/9/30 18:59:59 -13.0951972617n', '2014/9/30 19:30:00 -17.6077341991n', '2014/9/30 20:00:00 -21.9594218892n', '2014/9/30 20:30:00 -26.0849005441n', '2014/9/30 21:00:00 -29.9096032812n', '2014/9/30 21:30:00 -33.3488186792n', '2014/9/30 21:59:59 -36.3086640806n', '2014/9/30 22:30:00 -38.6902793272n', '2014/9/30 23:00:00 -40.3983491291n', '2014/9/30 23:30:00 -41.3537969714n']

We can see that each line of the file contains three fields e.g.:

print sdata[0]
2014/9/30 00:00:00 -41.1227180114

The first field is the date (year, day, month), the second is the time (hour, minute, second), and the third is the solar elevation at UCL at that time/date.

To decode each line then we can use split() e.g.

print sdata[0].split()
['2014/9/30', '00:00:00', '-41.1227180114']

what we want is the time and elevation fields, so we will make a loop to get this, but break from the loop after the first entry at the moment:

filename = 'data/elevation.dat'
fp = open(filename,"r")

for i in fp.readlines():
    data = i.split()
    time = data[1]
    elevation = float(data[2])
    print time,elevation
    break
fp.close()
00:00:00 -41.1227180114

We need to convert the time field to minutes. We can start this by splitting the string on ::

time = data[1].split(':')
print time
['00', '00', '00']

then convert these to float and add up the minutes:

time = [float(i) for i in data[1].split(':')]
hours = time[0] + time[1]/60. + time[2]/(60.*60)
print hours
0.0

we can easily convert elevation to zenith angle.

zenith = 90. - elevation
print zenith
131.122718011

Putting this together, only printing when the zenith is less than or equal to 90.:

filename = 'data/elevation.dat'
fp = open(filename,"r")

for i in fp.readlines():
    data = i.split()
    time = [float(i) for i in data[1].split(':')]
    hours = time[0] + time[1]/60. + time[2]/(60.*60)
    zenith = 90. - float(data[2])
    if zenith <= 90.:
        print hours,zenith
fp.close()
6.5 85.907136779
6.99972222222 81.445465541
7.5 77.061041608
8.0 72.8485600035
8.5 68.8818866619
9.0 65.2327863808
9.5 61.979563864
9.99972222222 59.2050744288
10.5 56.9929742321
11.0 55.420838855
11.5 54.5508947359
12.0 54.4205235001
12.5 55.0356161858
12.9997222222 56.3694872439
13.5 58.3677717837
14.0 60.9572843283
14.5 64.0556903622
15.0 67.5793747098
15.5 71.4483217047
15.9997222222 75.5888421438
16.5 79.9285402905
17.0 84.3862781933
17.5 88.7796894813

Now, writing to an output file files/data/zenith.dat:

ifilename = 'data/elevation.dat'
ofilename = 'data/zenith.dat'

ifp = open(ifilename,"r")
ofp = open(ofilename,"w")

for i in ifp.readlines():
    data = i.split()
    time = [float(i) for i in data[1].split(':')]
    hours = time[0] + time[1]/60. + time[2]/(60.*60)
    zenith = 90. - float(data[2])
    if zenith <= 90.:
        ofp.write("%.1f %.3f\n"%(hours,zenith))
ifp.close()
ofp.close()

We could check the output file from unix:

!head -10 < data/zenith.dat
6.5 85.907
7.0 81.445
7.5 77.061
8.0 72.849
8.5 68.882
9.0 65.233
9.5 61.980
10.0 59.205
10.5 56.993
11.0 55.421
!ls -l data/zenith.dat
-rw-rw-r--. 1 plewis plewis 269 Oct  7  2014 data/zenith.dat

Exercise 2.4

The text file data/modis_files.txt contains a listing of hdf format files that are in the directory /data/geospatial_19/ucfajlg/fire/Angola/MOD09 on the UCL Geography system. The contents of the file looks like (first 10 lines):

!head -10 < data/modis_files.txt
/data/geospatial_19/ucfajlg/fire/Angola/MOD09/MOD09GA.A2004001.h19v10.005.2008109063923.hdf
/data/geospatial_19/ucfajlg/fire/Angola/MOD09/MOD09GA.A2004002.h19v10.005.2008108084250.hdf
/data/geospatial_19/ucfajlg/fire/Angola/MOD09/MOD09GA.A2004003.h19v10.005.2008108054126.hdf
/data/geospatial_19/ucfajlg/fire/Angola/MOD09/MOD09GA.A2004004.h19v10.005.2008108112322.hdf
/data/geospatial_19/ucfajlg/fire/Angola/MOD09/MOD09GA.A2004005.h19v10.005.2008108173219.hdf
/data/geospatial_19/ucfajlg/fire/Angola/MOD09/MOD09GA.A2004006.h19v10.005.2008108214033.hdf
/data/geospatial_19/ucfajlg/fire/Angola/MOD09/MOD09GA.A2004007.h19v10.005.2008109081257.hdf
/data/geospatial_19/ucfajlg/fire/Angola/MOD09/MOD09GA.A2004008.h19v10.005.2008109111447.hdf
/data/geospatial_19/ucfajlg/fire/Angola/MOD09/MOD09GA.A2004009.h19v10.005.2008109211421.hdf
/data/geospatial_19/ucfajlg/fire/Angola/MOD09/MOD09GA.A2004010.h19v10.005.2008110031925.hdf

Your task is to create a new file `data/some_modis_files.txt <data/some_modis_files.txt>`__ that contains only the file names for the month of August.

You will notice that the file names have a field in them such as A2004006. This is the one you will need to concentrate on, as it specifies the year (2004 here) and the day of year (doy), (006 in this example).

There are various ways to find the day of year for a particular month / year, e,g, look on a website.

2.8 Doing Some Science

Maximum Precipitation

The Problem

We want to calculate the maximum monthly precipitation for regions of the UK for all years in the 20th Century.

The Data

We have access to monthly average precipitation data as regional totals from the UK Met Office.

These data are in ASCII format, available over the internet.

e.g. http://www.metoffice.gov.uk/hadobs/hadukp/data/monthly/HadSEEP_monthly_qc.txt

or locally as `files/data/HadSEEP_monthly_qc.txt <files/data/HadSEEP_monthly_qc.txt>`__.

for South East England.

Solving the Problem

With just the Python skills you have learned so far, you should be able to solve a problem of this nature.

Before diving into this though, you need to think through what steps you need to go through to achieve your aim?

At a ‘high’ level, this could be:

  1. Examine the data
  2. Read the data into the computer program
  3. Select which years you want
  4. Find the maximum value for each year over all months
  5. Print the results

So now we need to think about how to implement these steps.

Examine the data

For the first step, there are many ways you could do this.

You will probably want to look at the data set in a browser.

You should see that the data are ‘white space’ separated.

The first 4 lines are ‘header’ text, giving contextual information on the data.

Subsequent lines have the year in the first column, then 12 columns of monthly precipitation, then an annual total.

Read the data into the computer program

  1. You could simply save the file using ‘Save As ...’ from the browser.
  2. You could, if you really wanted, just copy and paste the data into a file on the local system.
  3. You could use the unix command wget:
berlin% mkdir -p ~/DATA/geogg122/Chapter2_Python_intro/files/data
berlin% cd ~/DATA/geogg122/Chapter2_Python_intro
berlin% wget -O data/HadSEEP_monthly_qc.txt \         http://www.metoffice.gov.uk/hadobs/hadukp/data/monthly/HadSEEP_monthly_qc.txt
  1. You could download and read the file directly from a URL within Python

Let us suppose that you had downloaded the file and saved it as files/data/HadSEEP_monthly_qc.txt.

In this case, we would use:

filename = 'data/HadSEEP_monthly_qc.txt'

fp = open(filename,'r')
raw_data = fp.readlines()
fp.close()

print raw_data[:10]
['Monthly Southeast England precipitation (mm). Daily automated values used after 1996.n', 'Wigley & Jones (J.Climatol.,1987), Gregory et al. (Int.J.Clim.,1991)n', 'Jones & Conway (Int.J.Climatol.,1997), Alexander & Jones (ASL,2001). Values may change after QC.n', 'YEAR   JAN   FEB   MAR   APR   MAY   JUN   JUL   AUG   SEP   OCT   NOV   DEC   ANNn', ' 1873  87.1  50.4  52.9  19.9  41.1  63.6  53.2  56.4  62.0  86.0  59.4  15.7  647.7n', ' 1874  46.8  44.9  15.8  48.4  24.1  49.9  28.3  43.6  79.4  96.1  63.9  52.3  593.5n', ' 1875  96.9  39.7  22.9  37.0  39.1  76.1 125.1  40.8  54.7 137.7 106.4  27.1  803.5n', ' 1876  31.8  71.9  79.5  63.6  16.5  37.2  22.3  66.3 118.2  34.1  89.0 162.9  793.3n', ' 1877 146.0  47.7  56.2  66.4  62.3  24.9  78.5  82.4  38.4  58.1 144.5  54.2  859.6n', ' 1878  39.9  44.7  34.2  76.6  96.0  46.8  42.3 133.1  35.7  72.9  94.1  40.7  757.0n']

So we have read the data in well enough, but it’s not really in a convenient format.

First, it has 4 lines at the top that we don’t want.
Second, although the data are in a list for each line, each line is

stored as a string.

Since we know about lists, we might suppose that it would be better to have each line as a list, with each ‘white space’ separated item being an element of the list.

If we have a string such as:

' 1873  87.1  50.4  52.9  19.9  41.1  63.6  53.2  56.4  62.0  86.0  59.4  15.7  647.7\n'

one way to achieve this would be to use split():

line_data = ' 1873 87.1 50.4 52.9 19.9 41.1 63.6 53.2 56.4 62.0 86.0 59.4 15.7 647.7\n'
year_data = line_data.split()
print year_data
['1873', '87.1', '50.4', '52.9', '19.9', '41.1', '63.6', '53.2', '56.4', '62.0', '86.0', '59.4', '15.7', '647.7']

Thats useful, and we could loop over each line and perform this to get line lists with the first element as the year, second as precipitation in January, etc.

But each element is still a string, and really, we want these as float.

We can convert str to float using float(), but we have to do this for each string individually.

We can do this in a loop:

line_data = ' 1873 87.1 50.4 52.9 19.9 41.1 63.6 53.2 56.4 62.0 86.0 59.4 15.7 647.7\n'
year_data = line_data.split()

for column,this_element in enumerate(year_data):
    year_data[column] = float(this_element)

print 'format now',type(year_data[0])
print year_data
format now <type 'float'>
[1873.0, 87.1, 50.4, 52.9, 19.9, 41.1, 63.6, 53.2, 56.4, 62.0, 86.0, 59.4, 15.7, 647.7]

Now we know how to convert each line of data into a list of floating point numbers.

In practice, we will see later in the course that there are simpler ways of achieving (using `numpy <http://www.numpy.org/>`__).

We should first chop off the first 4 lines of data:

required_data = raw_data[4:]

# lets check what the first line is now
print required_data[0]
1873  87.1  50.4  52.9  19.9  41.1  63.6  53.2  56.4  62.0  86.0  59.4  15.7  647.7

Putting all of this together:

# specify filename
filename = 'data/HadSEEP_monthly_qc.txt'

# read the data, chop off first 4 lines
# and store in required_data
fp = open(filename,'r')
raw_data = fp.readlines()
fp.close()
required_data = raw_data[4:]

# set up list to store data in
data = []


# loop over each line
for line_data in required_data:
    # split on white space
    year_data = line_data.split()

    # convert data to float
    for column,this_element in enumerate(year_data):
        year_data[column] = float(this_element)
    data.append(year_data)

Now we have the data read in, as floating point values, in the 2-D list called data:

print data[0]
print data[1]
[1873.0, 87.1, 50.4, 52.9, 19.9, 41.1, 63.6, 53.2, 56.4, 62.0, 86.0, 59.4, 15.7, 647.7]
[1874.0, 46.8, 44.9, 15.8, 48.4, 24.1, 49.9, 28.3, 43.6, 79.4, 96.1, 63.9, 52.3, 593.5]

and we can compare this with the original data we saw on the web or the file we downloaded to check it’s been read in correctly.

Select which years you want

We want years 1900 to 1999.

The year is stored in the first column, e.g. data[10][0], and we want columns 1 to -1 (i.e. skip the first and last column):

print data[10][0]
print data[10][1:-1]
1883.0
[60.2, 97.4, 23.6, 36.5, 48.1, 53.3, 69.7, 21.7, 100.6, 58.1, 97.8, 22.5]
c20_data = []
for line in data:
    if (line[0] >= 1900) and (line[0] < 2000):
        c20_data.append(line[1:-1])

Find the maximum value for each year over all months

We want to find the maximum value in each row of c20_data now.

If we consider row 0, we can use e.g.:

print c20_data[0]
print max(c20_data[0])
[92.8, 125.9, 23.3, 30.7, 30.0, 74.6, 31.0, 73.3, 22.3, 56.1, 72.1, 88.2]
125.9

i.e. in the year 1900 (row 0), for S.E. England, the maximum rainfall was in February.

# Aside: show which month that was
month_names = [ "Jan", "Feb", "Mar", "Apr", \
    "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec" ]

print "In South East England"
for row in xrange(0,100,10):
    year = 1900 + row
    max_precip = max(c20_data[row])
    month = c20_data[row].index(max_precip)

    print "In the year",year,"the rainiest month was",month_names[month],"with",max_precip,"mm"
In South East England
In the year 1900 the rainiest month was Feb with 125.9 mm
In the year 1910 the rainiest month was Dec with 119.4 mm
In the year 1920 the rainiest month was Jul with 116.9 mm
In the year 1930 the rainiest month was Nov with 121.0 mm
In the year 1940 the rainiest month was Nov with 203.0 mm
In the year 1950 the rainiest month was Nov with 137.9 mm
In the year 1960 the rainiest month was Oct with 167.0 mm
In the year 1970 the rainiest month was Nov with 186.4 mm
In the year 1980 the rainiest month was Oct with 110.1 mm
In the year 1990 the rainiest month was Feb with 122.3 mm
# max precip for all years

result = []
for row in xrange(100):
    result.append(max(c20_data[row]))

Exercise 2.5

That is quite an achievement, given the limited amount of programming you know so far.

If you go through this though, you will (should) see that it is really not very efficient.

For example:

  • we read all the data in and then filter out the years we want (what if the dataset were huge?)
  • we loop over the 100 years multiple times
  • we store intermediate results

For this exercise, you should look through the code we developed and try to make it more efficient.

Efficiency should not override clarity and understanding though, so make sure you can understand what is going on at each stage.

Exercise 2.6

Average Temperature

We want to calculate the long-term average temperature (tmax degC) using observational data at one or more meteorological stations. Such data are relevant to understanding climate and its dynamics.

We choose the period 1960 to 1990 (30 years average to even out natural variability).

We can obtain monthly average data for a number of UK stations from the UK Met. Office.

Not all station records are complete enough for this calculation, so we select, for example Heathrow.

Just with the Python skills you have learned so far, you should be able to solve a problem of this nature.

Before diving into this though, you need to think through what steps you need to go through to achieve your aim?

At a ‘high’ level, this could be:

  1. Get hold of the data
  2. Read the data into the computer program
  3. Select which years you want
  4. Average the data for each month over all selected years
  5. Print the results

So now we need to think about how to implement these steps.

For the first step, there are many ways you could do this.

2.9 Dictionaries

Dictionaries (type dict) are another way of grouping objects.

These are defined within curley brackets {} and are distinguished by having a 'key and value for each item.

e.g.:

a = {'one': 1, 'two': 2, 'three': 3}

# we then refer to the keys and values in the dict as:

print 'a:\n\t',a
print 'a.keys():\n\t',a.keys()     # the keys
print 'a.values():\n\t',a.values() # returns the values
print 'a.items():\n\t',a.items()   # returns a list of tuples
a:
    {'three': 3, 'two': 2, 'one': 1}
a.keys():
    ['three', 'two', 'one']
a.values():
    [3, 2, 1]
a.items():
    [('three', 3), ('two', 2), ('one', 1)]

Notice that the order they appear in is not necessarily the same as when we generated the dictionary.

We refer to specific items as e.g.:

print a['one']
1

We can loop over dictionaries in various interesting ways:

for k in a.keys():
    print a[k]
3
2
1
for k,v in a.items():
    print k,v
three 3
two 2
one 1

If you really need to process in some order, you need to sort the keys in some way:

for k in sort(a.keys()):
    print a[k]
1
3
2

though in this case this still might not be what you want, so be careful.

You can add to a list with update:

a.update({'four':4,'five':5})
print a
{'four': 4, 'three': 3, 'five': 5, 'two': 2, 'one': 1}

or for a single item:

a['six'] = 6
print a
{'six': 6, 'three': 3, 'two': 2, 'four': 4, 'five': 5, 'one': 1}

or delete items:

del a['three']
print a
{'six': 6, 'two': 2, 'four': 4, 'five': 5, 'one': 1}

These trivial examples are useful for understanding some basic operations on dictionaries, but don’t show their real power.

A good example is to consider a configuration file.

When we have to run complicated processing jobs (e.g. on stacks of satellite data), we will often control the processing of these jobs with some text that we may put in a file.

This would describe the particular conditions that one subset of the jobs would process, for example.

Following from exercise 2.4 above, we might create a configuration file with:

[TERRA]
dir  = /data/geospatial_19/ucfajlg/fire/Angola/MOD09
name = MODIS TERRA data
year = 2004
doy_start = 214
doy_end = 245
file_list = files/data/modis_files2a.txt

[AQUA]
dir  = /data/geospatial_19/ucfajlg/fire/Angola/MYD09
name = MODIS AQUA data
year = 2004
doy_start = 214
doy_end = 245
file_list = files/data/modis_files2b.txt

This information is in the file `files/data/modis.cfg <files/data/modis.cfg>`__.

We could read and parse this file ourselves:

fp = open('data/modis.cfg')

# empty dict
modis = {}
this_section = modis

# loop over each line
for line in fp.readlines():
    # strip any extra white space
    line = line.strip()
    # check that there is some text
    # and it starts with [ and ends with ]
    if len(line) and line[0] == '[' and line[-1] == ']':
        section = line[1:-1]
        modis[section] = this_section = {}
    elif len(line) and line.find("=") != -1:
        key,value = line.split("=")
        this_section[key.strip()] = value.strip()

fp.close()
print modis
{'AQUA': {'doy_end': '245', 'doy_start': '214', 'name': 'MODIS AQUA data', 'year': '2004', 'file_list': 'files/data/modis_files2b.txt', 'dir': '/data/geospatial_19/ucfajlg/fire/Angola/MYD09'}, 'TERRA': {'doy_end': '245', 'doy_start': '214', 'name': 'MODIS TERRA data', 'year': '2004', 'file_list': 'files/data/modis_files2a.txt', 'dir': '/data/geospatial_19/ucfajlg/fire/Angola/MOD09'}}

though in practice, we might choose to use the `ConfigParser <http://docs.python.org/2/library/configparser.html>`__ module:

import ConfigParser

config = ConfigParser.ConfigParser()
config.read('data/modis.cfg')

# we can convert this to a normal dictionary
modis = {}
for k in config.sections():
    modis[k] = dict(config.items(k))
print modis
{'AQUA': {'doy_end': '245', 'doy_start': '214', 'name': 'MODIS AQUA data', 'year': '2004', 'file_list': 'files/data/modis_files2b.txt', 'dir': '/data/geospatial_19/ucfajlg/fire/Angola/MYD09'}, 'TERRA': {'doy_end': '245', 'doy_start': '214', 'name': 'MODIS TERRA data', 'year': '2004', 'file_list': 'files/data/modis_files2a.txt', 'dir': '/data/geospatial_19/ucfajlg/fire/Angola/MOD09'}}

The text file data/modis_files2a.txt contains a listing of hdf format files that are in the directory /data/geospatial_19/ucfajlg/fire/Angola/MOD09 and data/modis_files2b.txt those in /data/geospatial_19/ucfajlg/fire/Angola/MYD09 on the UCL Geography system. The contents of the files looks like (first 10 lines):

!head -10 <  data/modis_files2b.txt
MYD09GA.A2004001.h19v10.005.2008035021539.hdf
MYD09GA.A2004002.h19v10.005.2008035115941.hdf
MYD09GA.A2004003.h19v10.005.2008035223215.hdf
MYD09GA.A2004004.h19v10.005.2008036154947.hdf
MYD09GA.A2004005.h19v10.005.2008036025835.hdf
MYD09GA.A2004006.h19v10.005.2008037030304.hdf
MYD09GA.A2004007.h19v10.005.2008037072048.hdf
MYD09GA.A2004008.h19v10.005.2008037155636.hdf
MYD09GA.A2004009.h19v10.005.2008037162301.hdf
MYD09GA.A2004010.h19v10.005.2008038034819.hdf

Let’s try to use the information in the configuration dictionary modis to generate a list of the files we want to process:

# first, work out how to loop over config sections
# and get the sub-dictionary

for k,v in modis.items():
    print "\nexamining section",k
    sub_dict = v
    print v
examining section AQUA
{'doy_end': '245', 'doy_start': '214', 'name': 'MODIS AQUA data', 'year': '2004', 'file_list': 'files/data/modis_files2b.txt', 'dir': '/data/geospatial_19/ucfajlg/fire/Angola/MYD09'}

examining section TERRA
{'doy_end': '245', 'doy_start': '214', 'name': 'MODIS TERRA data', 'year': '2004', 'file_list': 'files/data/modis_files2a.txt', 'dir': '/data/geospatial_19/ucfajlg/fire/Angola/MOD09'}

Next, make sure we can read the file_list:

print 'reading',sub_dict['file_list']
fp = open(sub_dict['file_list'],'r')
file_data = fp.readlines()
fp.close()

# print the first one, just to see what it looks like
count = 0
print file_data[count]
reading files/data/modis_files2a.txt
MOD09GA.A2004001.h19v10.005.2008109063923.hdf

Make sure we can get the day of year from this:

print file_data[count].split('.')
['MOD09GA', 'A2004001', 'h19v10', '005', '2008109063923', 'hdfn']
print file_data[count].split('.')[1]
A2004001
doy = int(file_data[count].split('.')[1][-3:])
print doy
1

whats the range of days we want?

doy_range = range(int(sub_dict['doy_start']),int(sub_dict['doy_end']))
print doy_range
[214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244]

Is doy in the range?

print doy in doy_range
False

So now we have all of the parts we need to create this code:

# 1. Read the configuration file
# into the dict modis
import ConfigParser

config = ConfigParser.ConfigParser()
config.read('files/data/modis.cfg')

# we can convert this to a normal dictionary
modis = {}
for k in config.sections():
    modis[k] = dict(config.items(k))

# 2. Now, loop over config sections
# and get the sub-dictionary which we call sub_dict

# 3. set up anb empty list to contain the
# files we want to process
wanted_files = []

for k,v in modis.items():

    sub_dict = v

    # 3a. Read the file list
    fp = open(sub_dict['file_list'],'r')
    file_data = fp.readlines()
    fp.close()

    # 3b. find the doy range
    doy_range = range(int(sub_dict['doy_start']),\
                          int(sub_dict['doy_end']))

    # 3c. loop over each file read from
    #     sub_dict['file_list']
    for count in xrange(len(file_data)):
        # 3d. extract doy from the file name
        this_file = file_data[count]

        doy = int(this_file.split('.')[1][-3:])

        # 3e. see if doy is in the range we want?
        if doy in doy_range:

            # 3f. put the directory on the fornt
            full_name = sub_dict['dir'] + \
                        '/' + this_file
            wanted_files.append(full_name)

print "I found %d files to process"%len(wanted_files)

# I won't print the whole list as its too long
# just the first 10
for f in wanted_files[:10]:
    print f
I found 62 files to process
/data/geospatial_19/ucfajlg/fire/Angola/MYD09/MYD09GA.A2004214.h19v10.005.2007299212915.hdf

/data/geospatial_19/ucfajlg/fire/Angola/MYD09/MYD09GA.A2004215.h19v10.005.2007300042347.hdf

/data/geospatial_19/ucfajlg/fire/Angola/MYD09/MYD09GA.A2004216.h19v10.005.2007300091257.hdf

/data/geospatial_19/ucfajlg/fire/Angola/MYD09/MYD09GA.A2004217.h19v10.005.2007300153436.hdf

/data/geospatial_19/ucfajlg/fire/Angola/MYD09/MYD09GA.A2004218.h19v10.005.2007300215826.hdf

/data/geospatial_19/ucfajlg/fire/Angola/MYD09/MYD09GA.A2004219.h19v10.005.2007302194509.hdf

/data/geospatial_19/ucfajlg/fire/Angola/MYD09/MYD09GA.A2004220.h19v10.005.2007302093547.hdf

/data/geospatial_19/ucfajlg/fire/Angola/MYD09/MYD09GA.A2004221.h19v10.005.2007302222054.hdf

/data/geospatial_19/ucfajlg/fire/Angola/MYD09/MYD09GA.A2004222.h19v10.005.2007303011606.hdf

/data/geospatial_19/ucfajlg/fire/Angola/MYD09/MYD09GA.A2004223.h19v10.005.2007303073538.hdf

Exercise 2.7

You should modify the example above to make it simpler, if you can spot any places for that (don’t make it more complicated!).

You should then modify it so that the list of files that we want to process is printed to a file.

2.10 Where we have reached

That is plenty of Python for one day. We would not expect you to be able to remember all of this the first time you go through it: you will have to work at it and go through it multiple times. Make sure that the time you spend going through codes and examples is profitable though: it is all too easy to sit and stare at a section of code for hours without learning anything ... you need to be engaged with the learning for this to happen, so take frequent breaks, write your own summary notes and do whatever you need to get to grips with the basics here.

Although we have rather crammed a lot into this session, that is for timetabling reasons more than for effectiveness. When you go through it in your own time, break it into sections and try to get to grips with each part.

We would not expect you to be able to develop codes of the sort we have developed above from scratch to start with, but as you go through these notes and do the exercises, you should start to understand something of how we build a piece of code.

Our experience is that you should be able to get to grips with these examples (or at least most of them). Don’t start by looking at some finished piece of code ... start by going through the simple examples to learn the commands and syntax, then try to put the pieces together.

Once you have learned the basic tools, you will be in a much better position to think about algorithms, i.e. how to break a problem down into smaller parts that you can solve with the Python that you know.

2.11 Answers

Answers to the exersises in this session are made available to you, though you should obviously only consult these when you are finished (or if you are very stuck).

2.12 Advanced

Once you have got to grips with the basics in this session, you might consider stretching yourself and going through the advanced section.

Consult the answers to the exercises once you have completed them, and come along to office hours if yoiu want to go through anything.