Notes - Book - The Hacker’s Guide to Python By Julien Danjou
Contents
1: Starting your project
2: Modules and Libraries
3: Documentation
4: Distribution
5: Virtual Environments
6: Unit Testing
7: Methods and decorators
8: Functional Programming
9: The AST
10: Performances and optimizations
11: Scaling and architecture
12: RDBMS and ORM
13: Python 3 support strategies
14: Write less, code more
Starting your project
Project Layout
One common mistake is leaving unit tests outside the package directory. These tests should definitely be included in a sub-package of your software.
setup.py is the standard name for the Python installation script.
distutils -> Python distribution utilities
Having a functions.py file or an exceptions.py file is a terrible approach. It doesn't help at all with code organization and forces a reader to jump between files for no good reason.
Organize your code based on features, not type.
Don't create hooks/__init__.py where hooks.py would have been enough. If you create a directory, it should contain several other Python files that belong to the category/module the directory represents.
Coding style & automated checks
- Encode files using ASCII or UTF-8
- One module import per import statement and per line, at the top of the file, after comments and docstrings, grouped first by standard library, then third-party, and finally local library imports
- Name classes in CamelCase
- Suffix exceptions with Error (if applicable)
- Name functions in lowercase with words separated by underscores
- Use a leading underscore for _private attributes or methods
- Use pep8 checks. Also use pylint. If you already have a codebase, a good approach is to run them with most of the warnings disabled and fix issues one category at a time
Modules and Libraries
The import system
The sys.path variable tells Python where to look for modules to load. You can also use the PYTHONPATH environment variable for this.
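A minimal sketch of how the search path can be inspected and extended at runtime (the path `/opt/mylib` is purely illustrative):

```python
import sys

# Python searches these directories, in order, when it imports a module
print(sys.path[:3])

# Appending a directory makes the modules inside it importable
# ('/opt/mylib' is a hypothetical path)
sys.path.append('/opt/mylib')
```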
Some useful standard library modules
- atexit allows you to register functions to call when your program exits
- argparse provides functions for parsing command line arguments
- bisect provides bisection algorithms for sorted lists
- calendar provides a number of date-related functions
- codecs provides functions for encoding and decoding data
- collections provides a variety of useful data structures
- copy provides functions for copying data
- csv
- datetime
- fnmatch provides functions for matching Unix-style filename patterns
- glob provides functions for matching Unix-style path patterns
- io provides functions for handling I/O streams. In Python 3, it also contains StringIO (which is in the module of the same name in Python 2), which allows you to treat strings as files
- json
- logging
- multiprocessing
- operator
- os
- random
- re
- select provides the select() and poll() functions for creating event loops
- shutil provides access to high-level file functions
- signal provides functions for handling POSIX signals
- tempfile
- threading
- urllib
- uuid
Most of the standard library is written in Python (some performance-critical modules are implemented in C).
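As a taste of the list above, fnmatch matches Unix-style filename patterns - a minimal sketch with illustrative file names:

```python
import fnmatch

files = ['setup.py', 'README.rst', 'test_foo.py', 'test_bar.py']
# Keep only the names matching a Unix-style shell pattern
tests = fnmatch.filter(files, 'test_*.py')
print(tests)  # ['test_foo.py', 'test_bar.py']
```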
External libraries
There’s no way you can know for sure whether a library that is zealously maintained today will still be like that in a few months.
OpenStack checklist for deciding if a library is likely to be supported in the future:
- Python 3 compatibility
- Active development
- Active maintenance
- Packaged with OS distribution
It is sometimes better to write your own API - a wrapper that encapsulates your external libraries and keeps them out of your source code
Frameworks
Difference between frameworks and external libraries is that applications make use of frameworks by building on top of them: your code will extend the framework rather than vice versa. Unlike a library, which is basically an add-on you can bring in to give your code some extra oomph, a framework forms the chassis of your code: everything you do is going to build on that chassis in some way, which can be a double-edged sword
Interview with Doug Hellmann
- When creating a new application, I create some code and run it by hand, then write tests to make sure I've covered all of the edge cases after I have the basic aspect of a feature working. Creating the tests may also lead to some refactoring to make the code easier to work with.
- While designing an app, I think about how the user interface works, but for libraries, I focus on how a developer will use the API
- I have also found that writing the documentation for a library before writing any code at all gives me a way to think through the features and workflows for using it without committing to the implementation details
- I like to use namedtuple for creating small class-like data structures that just need to hold data but don't have any associated logic
- If I have more than a handful of imports, I reconsider the design of the module and think about splitting it up into a package
- Applications are collections of "glue code" holding libraries together for a specific purpose. Designing based on implementing those features as a library first and then building the application ensures that code is properly organized into logical units, which in turn makes testing simpler. It also means the features of an application are accessible through the library and can be remixed to create other applications. Failing to take this approach means the features of the application are tightly bound to the user interface, which makes them harder to modify and reuse.
- Design libraries and APIs from the top down
- Apply the Single Responsibility Principle (SRP) to each layer
- Convert filtering loops to generator expressions
- Use a dict() as a lookup table instead of a long if/elif/else block
- Functions should always return the same type of object (e.g. an empty list instead of None)
- Reduce the number of arguments to a function by combining related values into an object with either a tuple or a new class
- You may end up fighting with the framework if you try to use different patterns or idioms than it recommends
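The dict-as-lookup-table tip above can be sketched like this (the function and the rates are illustrative, not from the book):

```python
# Instead of a long if/elif/else chain...
def shipping_rate_if(country):
    if country == 'US':
        return 5
    elif country == 'FR':
        return 10
    else:
        return 15

# ...use a dict as a lookup table, with a default for unknown keys
RATES = {'US': 5, 'FR': 10}

def shipping_rate(country):
    return RATES.get(country, 15)

print(shipping_rate('FR'))  # 10
print(shipping_rate('JP'))  # 15
```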
Managing API Changes
When building an API, it’s rare to get everything right the first try. Your API will have to evolve, adding, removing or changing the features it provides.
The first thing and the most important step when modifying an API is to heavily document the change. This includes:
- documenting the new interface
- documenting that the old interface is deprecated
- documenting how to migrate to the new interface
Example
class Car(object):
    def turn_left(self):
        """Turn the car left.

        .. deprecated:: 1.1
            Use :func:`turn` instead with the direction argument set to left
        """
        self.turn(direction='left')

    def turn(self, direction):
        """Turn the car in some direction.

        :param direction: The direction to turn to.
        :type direction: str
        """
        # Write the actual code here instead
        pass
Python provides an interesting module called warnings. This module allows your code to issue various kinds of warnings, such as PendingDeprecationWarning and DeprecationWarning.
import warnings

class Car(object):
    def turn_left(self):
        """Turn the car left.

        .. deprecated:: 1.1
            Use :func:`turn` instead with the direction argument set to "left"
        """
        warnings.warn("turn_left is deprecated, use turn instead",
                      DeprecationWarning)
        self.turn(direction='left')
Run test suites with the -W error option, which transforms warnings into exceptions. This means that every time an obsolete function is called, an error is raised, and it will be easy for developers using your library to know exactly where their code needs to be fixed
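The effect of -W error can be simulated in-process with a warnings filter - a minimal sketch (turn_left here is a hypothetical deprecated function):

```python
import warnings

def turn_left():
    # Hypothetical deprecated function emitting a DeprecationWarning
    warnings.warn("turn_left is deprecated, use turn instead",
                  DeprecationWarning)

# Simulate running under `python -W error::DeprecationWarning`:
with warnings.catch_warnings():
    warnings.simplefilter("error", DeprecationWarning)
    try:
        turn_left()
        caught = False
    except DeprecationWarning:
        caught = True

print(caught)  # the warning became an exception
```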
Interview with Christophe de Vienne
- Coming up with good use cases makes it easier to design an API
- Most web frameworks assume they're running on a multi-threaded server and treat all this information as TSD (Thread-Specific Data)
- Document early and include your documentation build in continuous integration
- Use docstrings to document classes and functions in your API. Follow PEP 257
Documentation
reStructuredText or reST
Sphinx
doctest is a standard Python module which searches your documentation for code snippets and runs them to test whether they accurately reflect what your code actually does. Every paragraph starting with >>> (i.e. the primary prompt) is treated as a code snippet to test
It’s easy to end up leaving your examples unchanged as your API evolves; doctest helps you make sure this doesn’t happen
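A minimal doctest sketch - the example in the docstring is executed and checked against the expected output:

```python
def square(x):
    """Return x squared.

    >>> square(4)
    16
    """
    return x * x

if __name__ == "__main__":
    import doctest
    # Reports nothing when every example in the docstrings passes
    doctest.testmod()
```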
Documentation-Driven Development (DDD): write your documentation and examples first, and then write your code to match your documentation
Distribution
distutils
setuptools is the distribution library to use for the time being, but keep an eye out for distlib in the future
pbr (Python Build Reasonableness). Use it to write your next setup.py
Virtual Environments
To have access to your system-installed packages, enable them when creating the virtual environment by passing the --system-site-packages flag to the virtualenv command
Virtual environments are very useful for automated runs of unit test suites » tox
The -m flag runs a module, e.g. python3 -m venv
To create a virtual environment: python3 -m venv myvenv
Unit Testing
Writing code that is not tested is essentially useless, as there’s no way to conclusively prove that it works
Your tests should be stored inside a tests submodule of your application or library
Use a hierarchy in your tests that mimics the hierarchy of your module tree. This means the tests covering the code of mylib/foobar.py should be inside mylib/tests/test_foobar.py
To deliberately fail a test right away, use the fail(msg) method
import unittest

class TestFail(unittest.TestCase):
    def test_range(self):
        for x in range(5):
            if x > 4:
                self.fail("Testing manual fail")
To run a test conditionally based on the presence of a particular library, you can raise the unittest.SkipTest exception. When this exception is raised by a test, it is simply marked as having been skipped. Alternatives are the unittest.TestCase.skipTest() method and the unittest.skip decorator
class TestSkipped(unittest.TestCase):
    @unittest.skip("Do not run this")
    def test_fail(self):
        self.fail("this should not be run")

    @unittest.skipIf(mylib is None, 'mylib is not available')
    def test_mylib(self):
        self.assertEqual(mylib.foobar(), 42)

    def test_skip_at_runtime(self):
        if True:
            self.skipTest("Finally I don't want to run it")
Fixtures represent components that are set up before a test and cleaned up after the test is done
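The simplest fixtures in unittest are the setUp and tearDown hooks - a minimal sketch with illustrative data:

```python
import unittest

class TestWithFixture(unittest.TestCase):
    def setUp(self):
        # Fixture set up before each test method
        self.data = [3, 1, 2]

    def tearDown(self):
        # Clean-up runs after each test method, even on failure
        del self.data

    def test_sorted(self):
        self.assertEqual(sorted(self.data), [1, 2, 3])

# Run the test case programmatically
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestWithFixture)
result = unittest.TestResult()
suite.run(result)
print(result.wasSuccessful())
```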
Mocking
Mock objects are simulated objects that mimic the behaviour of real application objects in particular, controlled ways
Standard library » mock. In Python 3.3+, it has been merged into the Python standard library as unittest.mock

try:
    from unittest import mock
except ImportError:
    import mock
Basic Mock usage:
>>> import mock
>>> m = mock.Mock()
>>> m.some_method.return_value = 42
>>> m.some_method()
42
>>> def print_hello():
... print('hello world !')
...
>>> m.some_method.side_effect = print_hello
>>> m.some_method()
hello world !
>>> def print_hello():
... print('hello world !')
... return 43
...
>>> m.some_method.side_effect = print_hello
>>> m.some_method()
hello world !
43
>>> m.some_method.call_count
3
Even using just this set of features, you should be able to mimic a lot of your internal objects in order to fake various data scenarios
Mock uses the action/assertion pattern: this means that once your test has run, you will have to check that the actions you are mocking were correctly executed.
>>> import mock
>>> m = mock.Mock()
>>> m.some_method('foo', 'bar')
>>> m.some_method.assert_called_once_with('foo', 'bar')
>>> m.some_method.assert_called_once_with('foo', mock.ANY)
>>> m.some_method.assert_called_once_with('foo', 'baz')
Traceback (most recent call last):
    ...
AssertionError
Using mock.patch
>>> import mock
>>> import os
>>> def fake_os_unlink(path):
... raise IOError('Testing!')
...
>>> with mock.patch('os.unlink', fake_os_unlink):
... os.unlink('foobar')
...
Traceback ...
IOError: Testing!
With the mock.patch method, it's possible to change any part of an external piece of code, making it behave in the required way in order to test all conditions in your software
There is also a decorator version of mock.patch
import mock
import requests
import unittest

def get_fake_get(status_code, content):
    m = mock.Mock()
    m.status_code = status_code
    m.content = content
    def fake_get(url):
        return m
    return fake_get

class WhereIsPythonError(Exception):
    pass

def check_for_something():
    try:
        r = requests.get('http://python.org')
    except IOError:
        pass
    else:
        if r.status_code == 200:
            return 'Check successful !'
    raise WhereIsPythonError('Something bad happened')

class TestPythonError(unittest.TestCase):
    @mock.patch('requests.get', get_fake_get(404, 'Whatever'))
    def test_ioerror(self):
        self.assertRaises(WhereIsPythonError, check_for_something)
Use testscenarios to run a class of tests against a different set of scenarios generated at run-time
import mock
import requests
import testscenarios

class CustomTestError(Exception):
    pass

def check_something_online():
    r = requests.get('http://some.url')
    if r.status_code == 200:
        return 'Test data' in r.content
    raise CustomTestError('Something bad happened')

def get_fake_get(status_code, content):
    m = mock.Mock()
    m.status_code = status_code
    m.content = content
    def fake_get(url):
        return m
    return fake_get

class MyTestErrorCode(testscenarios.TestWithScenarios):
    scenarios = [
        ('Not found', dict(status=404)),
        ('Client error', dict(status=400)),
        ('Server error', dict(status=500)),
    ]

    def test_some_external_stuff(self):
        with mock.patch('requests.get',
                        get_fake_get(self.status, 'Test data string')):
            self.assertRaises(CustomTestError, check_something_online)
Construct the scenario list as a list of tuples that consists of the scenario name as the first argument, and the dictionary of attributes to be added to the test class for this scenario as the second argument
Tox
Creates a virtual environment, installs setuptools, and then installs all of the dependencies required for both your application/library runtime and its unit tests.
tox.ini
By default tox can simulate many environments: py27, py34 etc. To add an environment or to create a new one, you just need to add another section named [testenv:_envname_]
Sample tox.ini file
[tox]
envlist=py27,py34,pep8

[testenv]
deps=pytest
     -r requirements.txt
commands=pytest

[testenv:pep8]
deps=flake8
commands=flake8
To run tox in parallel use detox which runs all of the default environments for the envlist in parallel
Testing Policy
You should have a zero tolerance policy on untested code. No code should be merged unless there is a proper set of unit tests to cover it
Methods and decorators
Creating Decorators
A decorator is essentially a function that takes another function as an argument and replaces it with a new, modified function.
The primary use case for decorators is factoring out common code that needs to be called before, after or around multiple functions.
Use the functools module's update_wrapper function to copy the attributes of the wrapped function to the wrapper itself.
It can get tedious to use update_wrapper manually when creating decorators, so functools provides a decorator for decorators called wraps.
import functools

def check_is_admin(f):
    @functools.wraps(f)
    def wrapper(*args, **kwargs):
        if kwargs.get('username') != 'admin':
            raise Exception('This user is not allowed here')
        return f(*args, **kwargs)
    return wrapper
The inspect module allows us to retrieve a function’s signature and operate on it:
import functools
import inspect

def check_is_admin(f):
    @functools.wraps(f)
    def wrapper(*args, **kwargs):
        func_args = inspect.getcallargs(f, *args, **kwargs)
        if func_args.get('username') != 'admin':
            raise Exception('This user is not allowed here')
        return f(*args, **kwargs)
    return wrapper

@check_is_admin
def get_food(username, type='chocolate'):
    return type + ' nom nom nom!'
In this case, inspect.getcallargs returns {'username': 'admin', 'type': 'chocolate'}. The advantage of this approach is that our decorator doesn't have to check whether the username parameter was passed positionally or as a keyword argument: all it has to do is look for it in the dictionary
How methods work in Python
A method is a function that is stored as a class attribute
In Python 3, the concept of the unbound method has been removed entirely; trying to call a method that is not tied to any particular object raises a TypeError about the missing positional argument 'self'
If you have a reference to a method and want to find out which object it's bound to, use the method's __self__ property

>>> m = Pizza(42).get_size
>>> m.__self__
<__main__.Pizza object at 0x...>
>>> m == m.__self__.get_size
True
Static Methods
Methods which belong to a class, but don't actually operate on class instances
When we see @staticmethod, we know that the method does not depend on the state of the object
Class method
methods that are bound directly to a class rather than its instances
However you choose to access this method (by class name or object), it will be always bound to the class it is attached to, and its first argument will be the class itself (remember classes are objects too!)
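Both kinds of method can be sketched in a few lines (the Pizza class and its ingredients are illustrative):

```python
class Pizza(object):
    default_ingredients = ['cheese']

    def __init__(self, ingredients=None):
        self.ingredients = ingredients or self.default_ingredients

    @staticmethod
    def mix(a, b):
        # No self or cls: pure helper logic that belongs with the class
        return a + b

    @classmethod
    def with_extra(cls, extra):
        # cls is the class itself, even when called on a subclass
        return cls(cls.default_ingredients + [extra])

print(Pizza.mix(2, 3))                     # 5
print(Pizza.with_extra('ham').ingredients) # ['cheese', 'ham']
```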
Implement your abstract methods using Python's built-in abc module
import abc

class BasePizza(object):
    __metaclass__ = abc.ABCMeta

    @abc.abstractmethod
    def get_radius(self):
        """Method that should do something"""
It is also possible to use the @staticmethod and @classmethod decorators on top of @abstractmethod:
import abc

class BasePizza(object):
    __metaclass__ = abc.ABCMeta

    default_ingredients = ['cheese']

    @classmethod
    @abc.abstractmethod
    def get_ingredients(cls):
        """Returns the ingredient list"""
        return cls.default_ingredients

class DietPizza(BasePizza):
    def get_ingredients(self):
        return [Egg()] + super(DietPizza, self).get_ingredients()
There’s no way to force subclasses to implement abstract methods as a specific kind of method
The truth about super
Multiple inheritance is still used in many places, and especially in code where the mixin pattern is involved
A mixin is a class that inherits from two or more other classes, combining their features together
mro() » method resolution order, used to resolve attributes
super() is actually a constructor, and you instantiate a super object each time you call it. It takes either one or two arguments: the first argument is a class, and the second argument is either a subclass or an instance of the first argument. The object returned by the constructor functions as a proxy for the parent classes of the first argument.
The descriptor protocol is the mechanism in Python that allows an object that's stored as an attribute to return something other than itself. (__get__)
In Python 3, super() can be called from within a method without any arguments

class B(A):
    def foo(self):
        super().foo()
super is the standard way of accessing parent attributes in subclasses, and you should always use it. It allows cooperative calls of parent methods without any surprises, such as parent methods not being called or being called twice when using multiple inheritance
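Cooperative calls in a diamond hierarchy can be sketched like this (the class names are illustrative; a trace list is used instead of prints so the call order is visible):

```python
class Base:
    def hello(self, trace):
        trace.append("Base")

class A(Base):
    def hello(self, trace):
        trace.append("A")
        super().hello(trace)

class B(Base):
    def hello(self, trace):
        trace.append("B")
        super().hello(trace)

class C(A, B):
    def hello(self, trace):
        trace.append("C")
        super().hello(trace)

# The MRO linearizes the diamond, so each parent runs exactly once
print([cls.__name__ for cls in C.mro()])

calls = []
C().hello(calls)
print(calls)  # ['C', 'A', 'B', 'Base']
```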
Functional Programming
Functional programming allows you to write more concise and efficient code.
When you write code using functional style, your functions are designed not to have side effects: they take an input and produce an output without keeping state or modifying anything not reflected in the return value » purely functional
Generators
An object that returns a value on each call of its next() method until it raises StopIteration.
Iterator protocol
Generator functions » the yield statement
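A minimal generator sketch (countdown is an illustrative name):

```python
def countdown(n):
    # Execution pauses at each yield and resumes on the next() call
    while n > 0:
        yield n
        n -= 1

gen = countdown(3)
print(next(gen))  # 3
print(next(gen))  # 2
print(list(gen))  # [1] - consuming the rest ends with StopIteration internally
```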
To check whether a function is a generator, use inspect.isgeneratorfunction
Python 3 » inspect.getgeneratorstate returns the current state of a generator:
- GEN_CREATED - waiting to be run for the first time
- GEN_RUNNING - currently being executed by the interpreter
- GEN_SUSPENDED - waiting to be resumed by a call to next()
- GEN_CLOSED - finished running
Generators allow you to handle large data sets with minimal consumption of memory and processing cycles by generating values on the fly.
One-line generators (generator expressions) - syntax is similar to list comprehensions
>>> (x.upper() for x in ['hello', 'world'])
<generator object>
>>> gen = (x.upper() for x in ['hello', 'world'])
Using first
>>> from first import first
>>> first([0, False, None, [], (), 42])
42
>>> first([-1, 0, 1])
-1
>>> first([-1, 0, 2], key=lambda x: x > 0)
2
lambda was actually added to Python in the first place to facilitate functional programming functions such as map() and filter()
Use partial
functools.partial is typically useful as a replacement for lambda, and should be considered a superior alternative. lambda is something of an anomaly in the Python language, due to its body being limited to a single one-line expression
Use the operator module
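partial and operator often replace lambda together - a minimal sketch with illustrative data:

```python
import functools
import operator

# partial pre-fills arguments instead of a one-line lambda
add_three = functools.partial(operator.add, 3)
print(add_three(4))  # 7

# operator.itemgetter(1) replaces lambda item: item[1] as a sort key
pairs = [('b', 2), ('c', 3), ('a', 1)]
print(sorted(pairs, key=operator.itemgetter(1)))
```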
The AST
Abstract Syntax Tree: a tree representation of the abstract structure of the source code of any programming language
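Python exposes its own AST through the standard ast module - a minimal sketch parsing a one-line assignment:

```python
import ast

tree = ast.parse("x = 1 + 2")
# Dump the tree structure: an Assign node containing a BinOp
print(ast.dump(tree))

# Walk the tree to list every node type it contains
node_types = [type(node).__name__ for node in ast.walk(tree)]
print(node_types)
```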
Performances and optimizations
Data Structures
Often, there is a temptation to code your own custom data structures - this is invariably a vain, useless, doomed idea. Python almost always has better data structures and code to offer - learn to use them
The set data structures have methods which can solve many problems that would otherwise need to be addressed by writing nested for/if blocks
def has_invalid_fields(fields):
    for field in fields:
        if field not in ['foo', 'bar']:
            return True
    return False
This can be written without a loop:
def has_invalid_fields(fields):
    return bool(set(fields) - set(['foo', 'bar']))
Each time you try to access a non-existent item in your dict, the defaultdict will use the function that was passed as an argument to its constructor to build a new value, instead of raising a KeyError
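A minimal defaultdict sketch, grouping illustrative words by their first letter:

```python
import collections

# list is the factory: it builds the default value for missing keys
groups = collections.defaultdict(list)
for word in ['apple', 'avocado', 'banana']:
    groups[word[0]].append(word)  # no KeyError on first access

print(dict(groups))  # {'a': ['apple', 'avocado'], 'b': ['banana']}
```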
OrderedDict
Counter
Profiling
cProfile » standard tool for profiling
dis module » a disassembler of Python bytecode. It prints the list of bytecode instructions that are run by a function
A common bad habit is defining functions inside functions for no reason. This has a cost, as the function is going to be redefined over and over for no reason, and function calls in Python are already inefficient. The only case in which it is required to define a function within a function is when building a function closure.
Ordered list and bisect
bisect module - provides bisection algorithms
bisect.bisect(sorted_list, new_item) - retrieves the index where a new list element should be inserted, while keeping the list sorted
bisect.insort(sorted_list, new_item) - inserts the element immediately
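A minimal sketch of both calls with illustrative values:

```python
import bisect

scores = [10, 20, 30]
# Where would 25 go to keep the list sorted?
print(bisect.bisect(scores, 25))  # 2

# Insert it directly, keeping the list sorted
bisect.insort(scores, 25)
print(scores)  # [10, 20, 25, 30]
```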
Namedtuple and slots
Classes in Python can define a __slots__ attribute that lists the only attributes allowed for instances of the class. By using the __slots__ attribute, we can roughly halve memory usage - when creating a large number of simple objects, the __slots__ attribute is an effective and efficient choice.
The usage of the namedtuple class factory is almost as efficient as using an object with __slots__, the only difference being that it is compatible with the tuple class. It can therefore be passed to the many native Python functions and libraries that expect an iterable type as an argument.
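Both options can be sketched side by side (Point is an illustrative class name):

```python
import collections

class Point(object):
    __slots__ = ('x', 'y')  # no per-instance __dict__, so less memory

    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(1, 2)
# p.z = 3 would raise AttributeError: __slots__ fixes the attribute set

# namedtuple: similar footprint, but compatible with tuple
PointT = collections.namedtuple('PointT', ['x', 'y'])
q = PointT(1, 2)
print(q.x, tuple(q))  # 1 (1, 2)
```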
Memoization
caching
Python 3.3+ » functools.lru_cache decorator
import functools
import math

@functools.lru_cache(maxsize=2)
def memoized_sin(x):
    return math.sin(x)
Scaling and architecture
RDBMS and ORM
Python 3 support strategies
The only way to be sure that your code works under both Python versions is to have unit testing (use tox to simplify this)
Remember string vs unicode
Write less, code more
Context managers
Use context management protocol if you identify the following pattern:
- Call method A
- Execute some code
- Call method B
Use contextlib » contextmanager. The with statement drives the protocol through the __enter__ and __exit__ methods
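The A-code-B pattern above can be sketched with @contextmanager (a trace list is used instead of prints so the order is visible; the names are illustrative):

```python
import contextlib

trace = []

@contextlib.contextmanager
def pipeline():
    trace.append("method A")      # runs on entering the with block
    try:
        yield
    finally:
        trace.append("method B")  # runs on leaving, even after an exception

with pipeline():
    trace.append("some code")

print(trace)  # ['method A', 'some code', 'method B']
```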
Remember that the with statement supports multiple context managers, so you can write:

with open('file1', 'r') as source, open('file2', 'w') as dest:
    dest.write(source.read())