This is a tutorial on indices and slices for people new to Python. We use examples to show slicing with positive and negative indices and steps.
Many people new to Python have the same reaction to indices: “this is weird”. Indices, though, permeate Python (in lists, in strings, in conditionals) and are a potential source of errors until we get used to them. Thus we might as well bite the bullet, get them straight, and move on. First, my apologies to experienced programmers, who will find some sections of this post very basic but, after all, this is a tutorial, so we’ll go very slowly.
Slicing the Python way
We’ll discuss string slicing because that is the first thing that we all learn, but the examples will serve us equally well later, in list indexing and setting ranges.
Take, as an example, the string
a = '0123456789'
where the character at position k is the digit k (positions are counted from 0).
We slice a using:
b = a[start:stop:step]
We can slice either the variable a or the string literal itself, as in '0123456789'[start:stop:step]. The variable is just a name that refers to the string object, and in Python everything is an object (a number, a string, a function, a file).
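Here is a quick interactive check of the idea, as a minimal sketch. Slicing the variable a and slicing the literal give the same characters. The sketch also uses the built-in `slice` object, which is what the `start:stop:step` notation is shorthand for. The variable names are only illustrative.

```python
# The string from the text: the character at position k is the digit k.
a = '0123456789'

# Slicing the variable and slicing the literal give the same characters.
from_variable = a[2:6:1]
from_literal = '0123456789'[2:6:1]

# The start:stop:step notation is shorthand for a built-in slice object.
from_slice_object = a[slice(2, 6, 1)]

print(from_variable, from_literal, from_slice_object)  # 2345 2345 2345
```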
There are only three things to remember:
1. start is the first item that we want (of course).
2. stop is the first item that we do not want.
3. step, being positive or negative, defines whether we are moving forwards (from the first position of the string towards its end) or backwards (from the last position of the string towards its start).
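The three rules can be made concrete by comparing a slice with the positions produced by `range(start, stop, step)`, which follows the same convention: start included, stop excluded. The helper `slice_by_hand` is hypothetical and written only for this illustration. It assumes non-negative start and stop inside the string, so it does not reproduce how real slices treat negative or out-of-range values.

```python
def slice_by_hand(s, start, stop, step):
    """Hypothetical helper: picks s[start], s[start+step], ... and stops
    before reaching `stop`. Assumes 0 <= start < len(s) and
    0 <= stop <= len(s)."""
    return ''.join(s[i] for i in range(start, stop, step))

a = '0123456789'

# start is included, stop is excluded, step > 0 moves forwards
print(slice_by_hand(a, 2, 6, 1), a[2:6:1])    # 2345 2345

# step < 0 moves backwards; stop is still excluded
print(slice_by_hand(a, 6, 2, -1), a[6:2:-1])  # 6543 6543
```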
A caveat for when we move to languages other than Python: the definition of stop is one of the reasons slicing and indexing in Python look weird to some programmers. In languages with inclusive ranges (e.g., MATLAB, R, or Fortran), stop would be “the last item that we want”. Neither definition is better or worse than the other, but for programmers coming from such languages, Python’s is the unusual one.
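One consequence of defining stop as the first item that we do not want is that lengths and adjacent slices fit together without any +1 or -1 corrections. The short sketch below checks this on a. This is a general property of half-open ranges, shown here only for illustration.

```python
a = '0123456789'

# With step 1 and indices inside the string, the length of a[start:stop]
# is simply stop - start.
start, stop = 2, 6
print(len(a[start:stop]), stop - start)  # 4 4

# Two slices that share a boundary k fit together exactly:
# a[:k] ends just before position k, and a[k:] starts at position k.
for k in range(len(a) + 1):
    assert a[:k] + a[k:] == a
```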
Indexing and slicing with positive and negative numbers
We can denote positions in the string with positive numbers, counting from 0 at the beginning of the string:
b = "my mistress' eyes are nothing like the sun"
Here b[0] is the first character, b[10] is the last letter of "mistress", and b[41] is the last character.
We can find the length of a string with the function len(). In this case, b has 42 characters (at positions 0 to 41), so len(b) = 42. Thus, since the last character of b is b[41], len(b) is 1 more than the last position of the string.
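The same numbers can be checked directly on b. This sketch also shows that len(b) itself is not a valid index: it lies one position past the last character, so Python raises an IndexError.

```python
b = "my mistress' eyes are nothing like the sun"

print(len(b))          # 42
print(b[len(b) - 1])   # n  (the last character, b[41])

# len(b) is one position past the end, so indexing with it fails.
try:
    b[len(b)]
    past_end_is_valid = True
except IndexError:
    past_end_is_valid = False
print(past_end_is_valid)  # False
```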
Sometimes it is useful to refer to the characters of the string as seen from the end of the string. In this case, we use negative numbers and count backwards, from -1 (not from 0):
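b = "my mistress' eyes are nothing like the sun"
Counting from the end, the same three characters are b[-42] (the first character), b[-32] (the last letter of "mistress") and b[-1] (the last character).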
Since the first character, which is the last one when seen from the end of the string, is b[-42], the position that precedes it would be -len(b)-1 = -43.
Hence, in this example where len(b) = 42:
b[0] = b[-len(b)] = b[-42] = 'm'
b[len(b)-1] = b[-1] = b[41] = 'n'
b[10] = b[-32] = 's'
and, in general, for any position k from 0 to len(b)-1:
b[k] = b[-len(b)+k]
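A short loop can check the general rule for every valid k (0 to len(b)-1). The sketch also confirms that -len(b)-1, one position before the start, is not a valid index.

```python
b = "my mistress' eyes are nothing like the sun"

# b[k] and b[-len(b)+k] name the same character for every valid k.
for k in range(len(b)):
    assert b[k] == b[-len(b) + k]

print(b[0], b[-42])   # m m
print(b[10], b[-32])  # s s

# -len(b)-1 is one position before the start, so it is not a valid index.
try:
    b[-len(b) - 1]
    before_start_is_valid = True
except IndexError:
    before_start_is_valid = False
print(before_start_is_valid)  # False
```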
Ugh… this is confusing… Happily, although it is a good idea to understand the general mechanism of indices, we usually don’t need to remember any of these boundaries: that is what default values are for.
Indexing and slicing with positive steps
If step is positive, we are moving forwards; if it is omitted, it defaults to +1.
a[2:6] = '0123456789'[2:6:1] = '2345'
i.e., the first char that we want is the one at position 2 (the 2), and the first char that we do not want is the one at position 6 (the 6).
Alternatively, seeing it from the end of the string:
a[-8:-4] = '0123456789'[-8:-4:1] = '2345'
i.e., the first char that we want is the 8th from the end (the 2), and the first char that we do not want is the 4th from the end (the 6).
Hence, for any positive `step` we have the following defaults:
a = '0123456789', read forwards (|-> -> ->|):
start: 0, i.e., the first position of the string
stop: len(a), i.e., one position beyond the end of the string
a[:] = a[0:len(a):1] = '0123456789'  # a +1 step is the default
a[::2] = a[0:len(a):2] = '02468'  # all the even positions
a[1::2] = '13579'  # all the odd positions
a[::3] = '0369'  # all the multiples of 3
As long as we are starting and/or ending a slice with the start or the end of the string, we can omit them, and Python will calculate them and use them as the defaults.
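To see which values Python substitutes for an omitted start or stop, the built-in method `slice.indices(length)` is handy. It returns the concrete (start, stop, step) triple for a sequence of that length. A minimal sketch with a:

```python
a = '0123456789'

# slice(None, None, step) is what a[::step] builds; .indices(len(a))
# reports the start and stop that Python substitutes for the omitted ones.
print(slice(None, None, 1).indices(len(a)))  # (0, 10, 1)
print(slice(None, None, 2).indices(len(a)))  # (0, 10, 2)
print(slice(1, None, 2).indices(len(a)))     # (1, 10, 2)

print(a[::2], a[0:len(a):2])  # 02468 02468
```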
An unusual side effect of the designation of the string boundaries is that Python takes len(a) as meaning ‘after the end of the string’. Thus, any number equal to or larger than len(a) is equally suitable to indicate ‘after the end of the string’, e.g.,
a[:] = a[0:len(a)] = a[0:1000000] = '0123456789'
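Here is a sketch of this clamping with a few more values. Note that it applies to slices only: plain indexing such as a[1000000] raises an IndexError.

```python
a = '0123456789'

print(a[0:1000000])   # 0123456789  (stop is clamped to len(a))
print(a[5:1000])      # 56789
print(repr(a[20:]))   # ''  (a start beyond the end gives an empty string)

# .indices() shows the clamping explicitly.
print(slice(0, 1000000).indices(len(a)))  # (0, 10, 1)
```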
Indexing and slicing with negative steps
If step is negative then we are moving backwards:
a[6:2:-1] = '0123456789'[6:2:-1] = '6543'
i.e., the first char that we want is the one at position 6 (the 6), and the first char that we do not want is the one at position 2 (the 2); or, alternatively,
a[-4:-8:-1] = '0123456789'[-4:-8:-1] = '6543'
i.e., the first char that we want is the 4th from the back (the 6), and the first char that we do not want is the 8th from the back (the 2).
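One way to double-check a backwards slice is to note that a[6:2:-1] takes positions 6, 5, 4 and 3. That is the forward slice a[3:7] read in reverse. The sketch below checks the relation a[i:j:-1] == a[j+1:i+1][::-1]. It holds only for non-negative i and j inside the string, since negative values would be interpreted from the end.

```python
a = '0123456789'

backwards = a[6:2:-1]             # positions 6, 5, 4, 3
forwards_reversed = a[3:7][::-1]  # positions 3, 4, 5, 6, then reversed

print(backwards, forwards_reversed)  # 6543 6543

# The same check for every pair of non-negative positions inside a.
for i in range(len(a)):
    for j in range(len(a)):
        assert a[i:j:-1] == a[j + 1:i + 1][::-1]
```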
Notice that we can use positive or negative indices going either forwards or backwards along the string. We can even mix them:
a[6:-8:-1] = '6543'
a[-4:2:-1] = '6543'
Sometimes this mixing comes in handy to strip a given number of characters from the front or the end of a string:
url = '<a href="http://foo.com">'[9:-2] = 'http://foo.com'
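There is one pitfall with this trick. If the number of characters to drop from the end can be zero, s[front:-0] is the same as s[front:0], which is empty. The hypothetical helper below (`strip_chars` is our name, not a built-in) sidesteps this by computing stop from len(s). The example strings are only illustrative.

```python
def strip_chars(s, front, back):
    """Hypothetical helper: drop `front` characters from the start and
    `back` characters from the end of s.
    Assumes front >= 0, back >= 0 and front + back <= len(s)."""
    # len(s) - back still works when back == 0, unlike s[front:-back].
    return s[front:len(s) - back]

link = '<a href="http://foo.com">'
print(strip_chars(link, 9, 2))           # http://foo.com
print(strip_chars('0123456789', 2, 0))   # 23456789
print(repr('0123456789'[2:-0]))          # ''  (the pitfall: -0 is just 0)
```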
Still, the point is to remember that using negative indices does not mean that we are moving backwards, only that we are indexing from the end. The sign of the step variable determines if we are moving forwards or backwards.
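Here is a small sketch of the difference between the sign of the indices and the sign of step. With the default step of +1, a[-4:-8] asks to move forwards from position 6 to position 2. The result is therefore empty, not reversed.

```python
a = '0123456789'

print(a[-8:-4])        # 2345  negative indices, moving forwards
print(repr(a[-4:-8]))  # ''    still moving forwards, so nothing fits
print(a[-4:-8:-1])     # 6543  negative step, moving backwards
```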
To move backwards we need to reverse the string in our minds:
a = '0123456789', read backwards (|<- <- <-|):
start: -1, i.e., the last position of the string
stop: one position before the start of the string
a[::-1] = a[-1::-1] = '9876543210'  # and, btw, we just reversed the string
a[::-2] = a[-1:-len(a)-1:-2] = '97531'
a[::-3] = a[-1:-len(a)-1:-3] = '9630'
Again, as long as we are starting and/or ending the slice with the start or the end of the string, we can leave the `start` and/or `stop` variables out and Python will use the defaults.
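The `slice.indices()` method also reports the defaults for a negative step. There is one subtlety, which is a property of that method rather than something stated above. For the omitted stop it returns -1, a value meant for range(), where it stops after position 0. It is not meant for a slice, where -1 would mean the last character. A minimal sketch:

```python
a = '0123456789'

start, stop, step = slice(None, None, -1).indices(len(a))
print(start, stop, step)   # 9 -1 -1

# These numbers reproduce the reversed string when given to range() ...
via_range = ''.join(a[i] for i in range(start, stop, step))
print(via_range)           # 9876543210

# ... but not when put back into a slice: a[9:-1:-1] is empty,
# because in a slice -1 means "the last character", not "before the start".
print(repr(a[start:stop:step]))  # ''
```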
Not surprisingly, we now find the counterpart of the ‘unusual’ side effect of Python’s designation of the string boundaries mentioned above: -len(a)-1 means ‘before the start of the string’, a role satisfied by any number equal to or smaller than -len(a)-1, e.g.,
a[-1::-1] = a[-1:-len(a)-1:-1] = a[-1:-1000000:-1] = '9876543210'
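Here are a few more checks of the ‘before the start’ boundary with a. Leaving stop out of a backwards slice is the simplest way to run all the way down to position 0, while stop=0 stops just short of it.

```python
a = '0123456789'

print(a[-1:-1000000:-1])     # 9876543210  (stop clamped to before the start)
print(a[3::-1])              # 3210        (from position 3 back to the start)
print(a[3:-len(a) - 1:-1])   # 3210        (same slice, stop written out)
print(repr(a[3:0:-1]))       # '321'       (stop=0 excludes position 0)
```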
So now, we have mastered Python indices and should be able to understand:
'0123456789'[8:2:-2] = '864'
'0123456789'[8:-8:-2] = '864'
'0123456789'[-2:2:-2] = '864'
'0123456789'[-2:-8:-2] = '864'
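These four expressions can be checked in one go. The sketch below also converts each negative index k to its positive equivalent, k + len(a), which shows why all four describe the same slice.

```python
a = '0123456789'

for start, stop in [(8, 2), (8, -8), (-2, 2), (-2, -8)]:
    # A negative index k refers to position k + len(a).
    pos_start = start + len(a) if start < 0 else start
    pos_stop = stop + len(a) if stop < 0 else stop
    print(a[start:stop:-2], (pos_start, pos_stop))  # 864 (8, 2) each time
```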
Python – Indices and slicing
This post is an extended version of the following original post:
Title: Python-101-Unit 1 Understanding Indices and Slicing
Forum of “Introduction to Computer Science” by David Evans, offered by Udacity, presently offered as “Introduction to Python programming”
First publication date: 17 Jul, 2012, 20:11 PST
Link: http://forums.udacity.com/cs101/questions/17002/python-101-unit-1-understanding-indices-and-slicing; this post might still be reachable using a Udacity account but it is no longer public; I posted it using the screenname ‘Goldsong’.