slice() objects in Python

I’ve been using Python for a while now. I’ve generally learned it organically, figuring out what I need to know to solve the problem at hand, and moving on to the next one. This has worked out OK, but has left some gaps.

For example, while reading Writing Idiomatic Python just a few years ago, I was surprised to learn about dictionary comprehensions, set comprehensions, and generator expressions. I don’t need those things, but they sure are nice.

Lately, I’ve been flipping through Fluent Python. One thing stood out as an important part of understanding the language that I had no idea about: slice objects.

I’ve known for a while that, behind the scenes, code like this:

my_object[3]
my_object["blue"]

…gets interpreted as something more like this:

my_object.__getitem__(3)
my_object.__getitem__("blue")

This is useful, for example, if you want to write your own classes that support numeric or key-based indexes. You just provide your own implementation of __getitem__.
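As a minimal sketch of that idea, here is a made-up class (Palette is a hypothetical name, not something from a library) that accepts either a position or a name inside the square brackets:

```python
class Palette:
    """Hypothetical container supporting both numeric and key-based lookup."""

    def __init__(self, colors):
        # colors maps a name to a hex string; dicts keep insertion order
        self._colors = dict(colors)

    def __getitem__(self, key):
        if isinstance(key, int):
            # Numeric index: look the color up by position
            return list(self._colors.values())[key]
        # Anything else is treated as a color name
        return self._colors[key]


palette = Palette({"red": "#ff0000", "blue": "#0000ff"})
print(palette[1])       # same call as palette.__getitem__(1)
print(palette["blue"])  # same call as palette.__getitem__("blue")
```

Both lines print #0000ff, because each one ends up in the same __getitem__ method.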

What about this, though?

my_object[3:9:2]

That’s Python’s syntax for “slicing” a sequence. We’re asking for every second item, starting at index 3 and stopping before index 9; that is, the items at indexes 3, 5, and 7. Translated into a __getitem__ call, it looks like this:

my_object.__getitem__(slice(3, 9, 2))
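One way to see this for yourself is a throwaway class whose __getitem__ simply hands back whatever it was given. Echo is a hypothetical name used only for this sketch:

```python
class Echo:
    """Hypothetical class whose __getitem__ returns the key it receives."""

    def __getitem__(self, key):
        return key


my_object = Echo()
print(my_object[3])      # 3
print(my_object[3:9:2])  # slice(3, 9, 2)
print(my_object[:5])     # slice(None, 5, None)
```

Any part of the slice that is left out shows up as None.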

“slice” is a built-in class for representing a slice operation. It has no connection to the actual data you’re dealing with. It simply represents the abstract idea of “every other item from index 3 up to, but not including, index 9” of some (any!) sequence. The metaphor isn’t perfect, but think of it as a piece of paper with a hole cut out of it, which you can lay over any sequence and see just the items you are interested in.
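To illustrate the “piece of paper” idea, this short sketch builds one slice and lays it over two unrelated sequences. The variable names and sample data are made up for the example:

```python
every_other = slice(3, 9, 2)

letters = "abcdefghijkl"
numbers = list(range(100, 112))

# The same slice object works on any sequence
print(letters[every_other])  # dfh
print(numbers[every_other])  # [103, 105, 107]

# The slice itself only stores its three bounds
print(every_other.start, every_other.stop, every_other.step)  # 3 9 2

# indices(length) resolves the bounds against a sequence of that length
print(every_other.indices(5))  # (3, 5, 2)
```

The last call shows the stop value being clipped to fit a five-item sequence, which is the kind of bookkeeping you would otherwise have to do by hand in your own __getitem__.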

This might be an interesting language detail. Is it useful?

The canonical example seems to involve emulating named keys in fixed-width data. Let’s say we have just the data columns of this file (hours of daylight in Springfield, VA, for 2016). The first column is the day of the month, and the rest all refer to a particular month. For February 13, we would look at the 13th row, 3rd column.

By examining the file, we can figure out that the January column starts 8 characters in, and ends at character 13. Each later month’s column starts 9 characters after the previous one. It took some trial and error, but this produces all of the slices we need (ignoring the first column):

JAN, FEB, MAR, APR, MAY, JUN, JUL, AUG, SEP, OCT, NOV, DEC = [slice(x*9 + 8, x*9 + 13) for x in range(12)]

That’s not going to win any prizes for clear, readable code. It does enable you to now use those variables in slicing operations, though. Which of these is easier to read?

daylight_hours = row[MAR]

or

daylight_hours = row[26:31]
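To check that the two forms really pick out the same characters, here is a small sketch. The row is entirely made up (placeholder values like 03:00, not real daylight data) and is only assumed to follow the layout described above: an 8-character day column followed by twelve 9-character month fields.

```python
# Same slices as above: January is row[8:13], each month is 9 characters further in
MONTHS = [slice(x * 9 + 8, x * 9 + 13) for x in range(12)]
JAN, FEB, MAR, APR, MAY, JUN, JUL, AUG, SEP, OCT, NOV, DEC = MONTHS

# Placeholder values "01:00" through "12:00", one per month (not real data)
fake_values = [f"{h:02d}:00" for h in range(1, 13)]

# Day column padded to 8 characters, then each month padded to 9
row = "13".ljust(8) + "".join(v.ljust(9) for v in fake_values)

print(row[MAR])    # 03:00
print(row[26:31])  # 03:00
```

The named slice and the bare numbers give the same result; the difference is only in how much the reader has to remember.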

I think it’s useful.