PV248 Python
Petr Ročkai
October 22, 2020
Part 1: Object Model
Objects
• the basic 'unit' of OOP
• also known as 'instances'
• they bundle data and behaviour
• provide encapsulation
• local (object) invariants
• make code re-use easier
Classes
• each (Python) object belongs to a class
• templates for objects
• calling a class creates an instance
o my_foo = Foo()
• classes themselves are also objects
Types vs Objects
• class system is a type system
• since Python 3, types are classes
• everything is dynamic in Python
o variables are not type-constrained
Poking at Classes
• you can pass classes as function parameters
• you can create classes at runtime
• and interact with existing classes:
o {}.__class__, (0).__class__
o {}.__class__.__class__
o compare type(0), etc.
o n = numbers.Number(); n.__class__
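A minimal sketch of the points above. The names make and Point are made up for illustration; type() and the numbers module are standard Python.

```python
import numbers

def make(cls, *args):
    # A class is an ordinary object, so it can be passed as a parameter.
    return cls(*args)

# Create a class at run time: type(name, bases, namespace).
Point = type("Point", (object,), {"dims": 2})

p = make(Point)
print(p.__class__, p.dims)
print({}.__class__, (0).__class__)      # the classes dict and int
print({}.__class__.__class__)           # the class of a class is type
print(type(0) is (0).__class__)         # type(x) and x.__class__ agree here
print(isinstance(0, numbers.Number))    # int counts as a Number
```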
Encapsulation
• objects hide implementation details
• classic types structure data
o objects also structure behaviour
• facilitates loose coupling
Loose Coupling
• coupling is a degree of interdependence
• more coupling makes things harder to change
o it also makes reasoning harder
• good programs are loosely coupled
• cf. modularity, composability
Polymorphism
• objects are (at least in Python) polymorphic
• different implementation, same interface
o only the interface matters for composition
• facilitates genericity and code re-use
• cf. 'duck typing'
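A small illustration of "different implementation, same interface". Duck, Robot and chorus are hypothetical names; the two classes share no base class, only a method name.

```python
class Duck:
    def speak(self):
        return "quack"

class Robot:
    def speak(self):
        return "beep"

def chorus(things):
    # Only the interface (a speak() method) matters, not the class.
    return [t.speak() for t in things]

sounds = chorus([Duck(), Robot()])
print(sounds)
```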
Generic Programming
• code re-use often saves time
o not just coding but also debugging
o re-usable code often couples loosely
• but not everything that can be re-used should be
o code can be too generic
o and too hard to read
Attributes
• data members of objects
• each instance gets its own copy
o like variables scoped to object lifetime
• they get names and values
Methods
• functions (procedures) tied to objects
• implement the behaviour of the object
• they can access the object (self)
• their signatures (usually) provide the interface
• methods are also objects
Class and Instance Methods
• methods are usually tied to instances
• recall that classes are also objects
• class methods work on the class (cls)
• static methods are just namespaced functions
• decorators @classmethod, @staticmethod
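One possible use of the two decorators, sketched with a hypothetical Temperature class: the class method acts as an alternative constructor, the static method is a plain function that happens to live in the class namespace.

```python
class Temperature:
    def __init__(self, degrees):
        self.degrees = degrees          # degrees Celsius

    @classmethod
    def from_fahrenheit(cls, f):
        # cls is the class itself, so subclasses get instances of themselves
        return cls(cls.f_to_c(f))

    @staticmethod
    def f_to_c(f):
        # no self, no cls: just a namespaced function
        return (f - 32) * 5 / 9

t = Temperature.from_fahrenheit(212)
print(t.degrees)
```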
Inheritance
[diagram: class hierarchy with shape at the top; ellipse and rectangle derive from shape; circle derives from ellipse and square from rectangle]
• class Ellipse( Shape ): ...
• usually encodes an is-a relationship
Multiple Inheritance
• more than one base class is possible
• many languages restrict this
• Python allows general M-I
o class Bat( Mammal, Winged ): pass
• 'true' M-I is somewhat rare
o typical use cases: mixins and interfaces
Mixins
• used to pull in implementation
o not part of the is-a relationship
o by convention, not enforced by the language
• common bits of functionality
o e.g. implement __gt__, __eq__ &c. using __lt__
o you only need to implement __lt__ in your class
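A sketch of such a mixin, under the assumption that __lt__ defines a total order. OrderingMixin and Version are illustrative names, not part of any library.

```python
class OrderingMixin:
    # Every comparison below is derived from __lt__ alone.
    def __gt__(self, other):
        return other < self
    def __le__(self, other):
        return not other < self
    def __ge__(self, other):
        return not self < other
    def __eq__(self, other):
        return not self < other and not other < self

class Version(OrderingMixin):
    def __init__(self, major, minor):
        self.key = (major, minor)
    def __lt__(self, other):            # the only operator written by hand
        return self.key < other.key

print(Version(1, 2) > Version(1, 1), Version(1, 0) == Version(1, 0))
```

Note that defining __eq__ this way makes the instances unhashable unless __hash__ is also provided.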
Interfaces
• realized as 'abstract' classes in Python
o just raise NotImplementedError
o document the intent in a docstring
• participates in is-a relationships
• partially displaced by duck typing
o more important in other languages (think Java)
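A minimal sketch of the convention described above, with made-up Shape and Square classes. The standard abc module offers a stricter alternative, which is not shown here.

```python
class Shape:
    """Interface: every shape must provide area()."""
    def area(self):
        raise NotImplementedError

class Square(Shape):                    # Square is-a Shape
    def __init__(self, side):
        self.side = side
    def area(self):
        return self.side ** 2

try:
    Shape().area()                      # the 'abstract' method was not overridden
    raised = False
except NotImplementedError:
    raised = True

print(Square(3).area(), raised)
```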
Composition
• attributes of objects can be other objects
o (also, everything is an object in Python)
• encodes a has-a relationship
o a circle has a center and a radius
o a circle is a shape
Constructors
• this is the __init__ method
• initializes the attributes of the instance
• can call superclass constructors explicitly
o not called automatically (unlike C++, Java)
o MySuperClass.__init__( self )
o super().__init__() (if unambiguous)
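A short illustration of what "not called automatically" means in practice. Base, Forgetful and Polite are hypothetical names.

```python
class Base:
    def __init__(self):
        self.log = ["Base"]

class Forgetful(Base):
    def __init__(self):
        self.own = 1                    # Base.__init__ never runs

class Polite(Base):
    def __init__(self):
        super().__init__()              # explicit call to the superclass
        self.log.append("Polite")

print(hasattr(Forgetful(), "log"), Polite().log)
```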
Class and Object Dictionaries
• most objects are basically dictionaries
• try e.g. foo.__dict__ (for a suitable foo)
• saying foo.x means foo.__dict__["x"]
o if that fails, type(foo).__dict__["x"] follows
o then superclasses of type(foo), according to MRO
• this is what makes monkey patching possible
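A rough demonstration of the lookup order and of monkey patching, using a made-up class Foo. The real lookup has more steps (descriptors, for instance), so this is a simplification.

```python
class Foo:
    kind = "class attribute"
    def hello(self):
        return "hello"

foo = Foo()
foo.x = 1
print(foo.__dict__)                     # only the instance attribute x
print("kind" in type(foo).__dict__)     # found one level up, in the class

# Monkey patching: rebinding a name in the class dictionary
# changes behaviour for instances that already exist.
Foo.hello = lambda self: "patched"
print(foo.hello())
```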
Writing Classes
class Person:
    def __init__( self, name ):
        self.name = name
    def greet( self ):
        print( "hello " + self.name )

p = Person( "you" )
p.greet()
Functions
• top-level functions/procedures are possible
• they are usually 'scoped' via the module system
• functions are also objects
o try print.__class__ (or type(print))
• some functions are built in (print, len, ...)
Modules in Python
• modules are just normal .py files
• import executes a file by name
o it will look into system-defined locations
o the search path includes the current directory
o they typically only define classes & functions
• import sys → lets you use sys.argv
• from sys import argv → you can write just argv
Part 2: Memory Management & Builtin Types
Memory
• most program data is stored in 'memory'
o an array of byte-addressable data storage
o address space managed by the OS
o 32 or 64 bit numbers as addresses
• typically backed by RAM
Language vs Computer
• programs use high-level concepts
o objects, procedures, closures
o values can be passed around
• the computer has a single array of bytes
o and a bunch of registers
Memory Management
• deciding where to store data
• high-level objects are stored in flat memory
o they have a given (usually fixed) size
o have limited lifetime
Memory Management Terminology
• object: an entity with an address and size
o can contain references to other objects
o not the same as language-level object
• lifetime: when is the object valid
o live: references exist to the object
o dead: the object is unreachable - garbage
Memory Management by Type
• manual: malloc and free in C
• static automatic
o e.g. stack variables in C and C++
• dynamic automatic
o pioneered by LISP, widely used
Automatic Memory Management
• static vs dynamic
o when do we make decisions about lifetime
o compile time vs run time
• safe vs unsafe
o can the program read unused memory?
Object Lifetime
• the time between malloc and free
• another view: when is the object needed
o often impossible to tell
o can be safely over-approximated
o at the expense of memory leaks
Static Automatic
• usually binds lifetime to lexical scope
• no passing references up the call stack
o may or may not be enforced
• no lexical closures
• examples: C, C++
Dynamic Automatic
• over-approximate lifetime dynamically
• usually easiest for the programmer
o until you need to debug a space leak
• reference counting, mark & sweep collectors
• examples: Java, almost every dynamic language
Reference Counting
• attach a counter to each object
• whenever a reference is made, increase
• whenever a reference is lost, decrease
• the object is dead when the counter hits 0
• fails to reclaim reference cycles
Mark and Sweep
• start from a root set (in-scope variables)
• follow references, mark every object encountered
• sweep: throw away all unmarked memory
• usually stops the program while running
• garbage is retained until the GC runs
Memory Management in CPython
• primarily based on reference counting
• optional mark & sweep collector
o enabled by default
o configure via import gc
o reclaims cycles
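The interplay of the two mechanisms can be observed from Python, at least on CPython; other implementations may free objects at different times. Node is a made-up class, and weak references are used only as probes that do not keep the objects alive.

```python
import gc
import weakref

class Node:
    def __init__(self):
        self.other = None

# No cycle: the reference count drops to 0 and the object dies at once.
a = Node()
probe = weakref.ref(a)
del a
freed_immediately = probe() is None

# A cycle: each node keeps the other's count above 0.
was_enabled = gc.isenabled()
gc.disable()                    # keep the cycle collector out of the way
x, y = Node(), Node()
x.other, y.other = y, x
probe = weakref.ref(x)
del x, y
leaked_by_refcounting = probe() is not None
gc.collect()                    # run the cycle collector by hand
freed_by_collector = probe() is None
if was_enabled:
    gc.enable()

print(freed_immediately, leaked_by_refcounting, freed_by_collector)
```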
Refcounting Advantages
• simple to implement in a 'managed' language
• reclaims objects quickly
• no need to pause the program
• easily made concurrent
Refcounting Problems
• significant memory overhead
• problems with cache locality
• bad performance for data shared between threads
• fails to reclaim cyclic structures
Data Structures
• an abstract description of data
• leaves out low-level details
• makes writing programs easier
• makes reading programs easier, too
Building Data Structures
• there are two kinds of types in python
o built-in, implemented in C
o user-defined (includes libraries)
• both kinds are based on objects
o but built-ins only look that way
Mutability
• some objects can be modified
o we say they are mutable
o otherwise, they are immutable
• immutability is an abstraction
o physical memory is always mutable
• in python, immutability is not recursive
Built-in: int
• arbitrary precision integer
o no overflows and other nasty behaviour
• it is an object, i.e. held by reference
o uniform with any other kind of object
o immutable
• both of the above make it slow
o machine integers only in C-based modules
Additional Numeric Objects
• bool: True or False
o how much is True + True?
o is 0 true? is empty string?
• numbers.Real: floating point numbers
• numbers.Complex: a pair of the above
Built-in: bytes
• a sequence of bytes (raw data)
• exists for efficiency reasons
o in the abstract is just a tuple
• models data as stored in files
o or incoming through a socket
o or as stored in raw memory
Properties of bytes
• can be indexed and iterated
o both create objects of type int
o try this sequence: id(x[1]), id(x[2])
• mutable version: bytearray
o the equivalent of C char arrays
Built-in: str
• immutable Unicode strings
o not the same as bytes
o bytes must be decoded to obtain str
o (and str encoded to obtain bytes)
• represented as utf-8 sequences in CPython
o implemented in PyCompactUnicodeObject
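A short round trip between str and bytes. The sample word and the choice of UTF-8 as the encoding are assumptions made for the example.

```python
text = "na\u00efve"                 # 5 code points, one of them non-ASCII
raw = text.encode("utf-8")          # str -> bytes
back = raw.decode("utf-8")          # bytes -> str

print(type(raw).__name__, len(text), len(raw))
print(raw[0], text[0])              # indexing bytes gives int, str gives str
```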
Built-in: tuple
• an immutable sequence type
o the number of elements is fixed
o so is the type of each element
• but elements themselves may be mutable
o x = [] then y = (x, 0)
o x.append(1) → y == ([1], 0)
• implemented as a C array of object references
Built-in: list
• a mutable version of tuple
o items can be assigned x[3] = 5
o items can be append-ed
• implemented as a dynamic array
o many operations are amortised O(1)
o insert is O(n)
Built-in: dict
• implemented as a hash table
• some of the most performance-critical code
o dictionaries appear everywhere in python
o heavily hand-tuned C code
• both keys and values are objects
Hashes and Mutability
• dictionary keys must be hashable
o this implies recursive immutability
• what would happen if a key is mutated?
o most likely the hash would change
o all hash tables with the key become invalid
o this would be very expensive to fix
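Which built-in values are accepted as keys can be checked with hash(). The helper try_hash is made up for this example.

```python
def try_hash(value):
    # hash() raises TypeError for unhashable values
    try:
        hash(value)
        return True
    except TypeError:
        return False

hashable = {
    "tuple of ints": try_hash((1, 2)),
    "frozenset": try_hash(frozenset({1, 2})),
    "list": try_hash([1, 2]),
    "tuple holding a list": try_hash(([1], 0)),
}
print(hashable)
```

The last case shows that an immutable container is only hashable if everything inside it is.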
Built-in: set
• implements the math concept of a set
• also a hash table, but with keys only
o a separate C implementation
• mutable - items can be added
o but they must be hashable
o hence cannot be changed
Built-in: frozenset
• an immutable version of set
• always hashable (since all items must be)
o can appear in set or another frozenset
o can be used as a key in dict
• the C implementation is shared with set
Efficient Objects: __slots__
• fixes the attribute names allowed in an object
• saves memory: consider 1-attribute object
o with __dict__: 56 + 112 bytes
o with __slots__: 48 bytes
• makes code faster: no need to hash anything
o more compact in memory → better cache efficiency
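A sketch of the behavioural difference, with two hypothetical classes. Exact byte counts depend on the interpreter version and platform, so they are not measured here.

```python
class WithDict:
    def __init__(self):
        self.x = 0

class WithSlots:
    __slots__ = ("x",)              # the only attribute name allowed
    def __init__(self):
        self.x = 0

d, s = WithDict(), WithSlots()
d.y = 1                             # fine: stored in d.__dict__
try:
    s.y = 1                         # no slot named y, and no __dict__
    rejected = False
except AttributeError:
    rejected = True

has_dict = hasattr(s, "__dict__")
print(rejected, has_dict)
```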
Part 3: Text, JSON and XML
Transient Data
• lives in program memory
• data structures, objects
• interpreter state
• often implicit manipulation
• more on this next week
Persistent Data
• (structured) text or binary files
• relational (SQL) databases
• object and 'flat' databases (NoSQL)
• manipulated explicitly
Persistent Storage
• 'local' file system
o stored on HDD, SSD,...
o stored somewhere in a local network
• 'remote', using an application-level protocol
o local or remote databases
o cloud storage &c.
Reading Files
• opening files: open('file.txt', 'r')
• files can be iterated
f = open( 'file.txt', 'r' )
for line in f:
    print( line )
Resource Acquisition
• plain open is prone to resource leaks
o what happens during an exception?
o holding a file open is not free
• pythonic solution: with blocks
o defined in PEP 343
o binds resources to scopes
Detour: PEP
• PEP stands for Python Enhancement Proposal
• akin to RFC documents managed by IETF
• initially formalise future changes to Python
o later serve as documentation for the same
Using with
with open('/etc/passwd', 'r') as f:
    for line in f:
        do_stuff( line )
• still safe if do_stuff raises an exception
Finalizers
• there is a __del__ method
• but it is not guaranteed to run
o it may run arbitrarily late
o or never
• not very good for resource management
Context Managers
• with has an associated protocol
• you can use with on any context manager
• which is an object with __enter__ and __exit__
• you can create your own
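A minimal hand-written context manager. Tracker is a hypothetical class that only records the order of events, to show that __exit__ runs even when the block raises.

```python
class Tracker:
    def __init__(self):
        self.events = []

    def __enter__(self):
        self.events.append("enter")
        return self                 # bound to the name after 'as'

    def __exit__(self, exc_type, exc, tb):
        self.events.append("exit")
        return False                # do not swallow the exception

t = Tracker()
try:
    with t as handle:
        handle.events.append("body")
        raise ValueError("boom")
except ValueError:
    t.events.append("caught")

print(t.events)
```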
Part 3.1: Text and Unicode
Representing Text
• ASCII: one byte = one character
o total of 128 different characters
o not very universal
• 8-bit encodings: 256 characters
• multi-byte encodings for non-Latin scripts
Unicode
• one character encoding to rule them all
• supports all extant scripts and writing systems
o and a whole bunch of dead scripts, too
• approx. 143000 code points
• collation, segmentation, comparison,...
Code Point
• basic unit of encoding characters
• letters, punctuation, symbols
• combining diacritical marks
• not the same thing as a character
• code points range from 0 to 10FFFF
Unicode Encodings
• deals with representing code points
• UCS = Universal Coded Character Set
o fixed-length encoding
o two variants: UCS-2 (16 bit) and UCS-4 (32 bit)
• UTF = Unicode Transformation Format
o variable-length encoding
o variants: UTF-8, UTF-16 and UTF-32
Grapheme
• technically 'extended grapheme cluster'
• a logical character, as expected by users
o encoded using 1 or more code points
• multiple encodings of the same grapheme
o e.g. composed vs decomposed
o U+0041 U+0300 vs U+00C0: À vs À
Segmentation
• breaking text into smaller units
o graphemes, words and sentences
• algorithms defined by the Unicode spec
o Unicode Standard Annex #29
o graphemes and words are quite reliable
o sentences not so much (too much ambiguity)
Normal Form
• Unicode defines 4 canonical (normal) forms
o NFC, NFD, NFKC, NFKD
o NFC = Normal Form Composed
o NFD = Normal Form Decomposed
• K variants = looser, lossy conversion
• all normalization is idempotent
• NFC does not give you 1 code point per grapheme
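These properties can be tried out with the standard unicodedata module. The sample strings are chosen for illustration: À in both forms, the "fi" ligature U+FB01, and a q with a combining grave accent, which has no precomposed code point.

```python
import unicodedata

decomposed = "\u0041\u0300"         # A followed by a combining grave accent
composed = "\u00c0"                 # the single code point for the same grapheme

nfc = unicodedata.normalize("NFC", decomposed)
nfd = unicodedata.normalize("NFD", composed)
twice = unicodedata.normalize("NFC", nfc)           # idempotent
lossy = unicodedata.normalize("NFKC", "\ufb01")     # the ligature becomes "fi"
no_precomposed = unicodedata.normalize("NFC", "q\u0300")

print(decomposed == composed, nfc == composed, nfd == decomposed)
print(lossy, len(no_precomposed))
```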
str vs bytes
• iterating bytes gives individual bytes
o indexing is fast - fixed-size elements
• iterating str gives code points
o slightly slower, because it uses UTF-8
o does not iterate over graphemes
• going back and forth: str.encode, bytes.decode
Python vs Unicode
• no native support for Unicode segmentation
o hence no grapheme iteration or word splitting
• convert everything into NFC and hope for the best
o unicodedata.normalize()
o will sometimes break (we'll discuss regexes in a bit)
o most people don't bother
o correctness is overrated → worse is better
Regular Expressions
• compiling: r = re.compile( r"key: (.*)" )
• matching: m = r.match( "key: some value" )
• extracting captures: print( m.group( 1 ) )
o prints some value
• substitutions: s2 = re.sub( r"\s*$", "", s1 )
o strips all trailing whitespace in s1
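The four steps above, put together into one runnable snippet. The input strings are made up.

```python
import re

r = re.compile(r"key: (.*)")        # compile once, reuse many times
m = r.match("key: some value")      # match() anchors at the start of the string
value = m.group(1)                  # the text captured by the parentheses

s1 = "trailing   \n"
s2 = re.sub(r"\s*$", "", s1)        # drop whitespace at the end

print(repr(value), repr(s2))
```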
Detour: Raw String Literals
• the r in r"..." stands for raw (not regex)
• normally, \ is magical in strings
o but \ is also magical in regexes
o nobody wants to write \\s &c.
o not to mention \\\\ to match a literal \
• not super useful outside of regexes
Detour: Other Literal Types
• byte strings: b"abc" → bytes
• formatted string literals: f"x {y}"
x = 12
print( f"x = {x}" )
• triple-quote literals: """xy"""
Regular Expressions vs Unicode
import re
s = "\u0041\u0300" # À
t = "\u00c0" # À
print( s, t )
print( re.match( "..", s ), re.match( "..", t ) )
print( re.match( r"\w+$", s ), re.match( r"\w+$", t ) )
print( re.match( "À", s ), re.match( "À", t ) )
Regexes and Normal Forms
• some of the problems can be fixed by NFC
o some go away completely (literal Unicode matching)
o some become rarer (the "." and "\w" problems)
• most text in the wild is already in NFC
o but not all of it
o case in point: filenames on macOS (NFD)
Decomposing Strings
• recall that str is immutable
• splitting: str.split(':')
o None = split on any whitespace
• split on first delimiter: partition
• better whitespace stripping: s2 = s1.strip()
o also lstrip() and rstrip()
Searching and Matching
• startswith and endswith
o often convenient shortcuts
• find and index
o generic substring search
o find returns -1 on failure, index raises ValueError
Building Strings
• format literals and str.format
• str.replace - substring search and replace
• str.join - turn lists of strings into a string
Part 3.2: Structured Text
JSON
• structured, text-based data format
• atoms: integers, strings, booleans
• objects (dictionaries), arrays (lists)
• widely used around the web &c.
• simple (compared to XML or YAML)
JSON: Example
{
    "composer": [ "Bach, Johann Sebastian" ],
    "key": "g",
    "voices": {
        "1": "oboe",
        "2": "bassoon"
    }
}
JSON: Writing
• printing JSON seems straightforward enough
• but: double quotes in strings
• strings must be properly \-escaped during output
• also pesky commas
• keeping track of indentation for human readability
• better use an existing library: import json
JSON in Python
• json.dumps = short for dump to string
• python dict/list/str/... data comes in
• a string with valid JSON comes out
Workflow
• just convert everything to dict and list
• run json.dumps or json.dump( data, file )
Python Example
import json, sys
d = {}
d["composer"] = ["Bach, Johann Sebastian"]
d["key"] = "g"
d["voices"] = { 1: "oboe", 2: "bassoon" }
json.dump( d, sys.stdout, indent=4 )
Beware: keys are always strings in JSON
Parsing JSON
• import json
• json.load is the counterpart to json.dump from above
o de-serialise data from an open file
o builds lists, dictionaries, etc.
• json.loads corresponds to json.dumps
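A round trip through json.dumps and json.loads, reusing the composer data from the earlier slide. It also shows that integer keys come back as strings.

```python
import json

d = {
    "composer": ["Bach, Johann Sebastian"],
    "key": "g",
    "voices": {1: "oboe", 2: "bassoon"},    # integer keys on the way in
}

text = json.dumps(d, indent=4)              # Python data -> JSON string
back = json.loads(text)                     # JSON string -> Python data

print(back["voices"])                       # keys are strings on the way out
print(back == d)
```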
XML
• meant as a lightweight and consistent redesign of SGML
o turned into a very complex format
• heaps of invalid XML floating around
o parsing real-world XML is a nightmare
o even valid XML is pretty challenging
XML: Example
[XML markup lost in extraction; surviving text content: Ellen Adams, 123 Maple Street, Lawnmower, 1]
XML: Another Example
<OBSAH>25 bodů</OBSAH>
[the remaining markup was lost in extraction; surviving values: 72873, 20160111104208, 395879]
XML Features
• offers extensible, rich structure
o tags, attributes, entities
o suited for structured hierarchical data
• schemas: use XML to describe XML
o allows general-purpose validators
o self-documenting to a degree
XML vs JSON
• both work best with trees
• JSON has basically no features
o basic data structures and that's it
• JSON data is ad-hoc and usually undocumented
o but: this often happens with XML anyway
XML Parsers
• DOM = Document Object Model
• SAX = Simple API for XML
• expat = fast SAX-like parser (but not SAX)
• ElementTree = DOM-like but more pythonic
XML: DOM
• read the entire XML document into memory
• exposes the AST (Abstract Syntax Tree)
• allows things like XPath and CSS selectors
• the API is somewhat clumsy in Python
XML: SAX
• event-driven XML parsing
• much more efficient than DOM
o but often harder to use
• only useful in Python for huge XML files
o otherwise just use ElementTree
XML: ElementTree
for child in root:
    print( child.tag, child.attrib )
# Order { OrderDate: "1999-10-20" }
• supports tree walking, XPath
• supports serialization too
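A self-contained version of the loop above. The small XML document is invented for this example; only the Order tag and its OrderDate attribute come from the slide.

```python
import xml.etree.ElementTree as ET

# A made-up document, just large enough to walk.
doc = '<Orders><Order OrderDate="1999-10-20"><Item>Lawnmower</Item></Order></Orders>'
root = ET.fromstring(doc)

pairs = [(child.tag, child.attrib) for child in root]       # tree walking
items = [e.text for e in root.findall("./Order/Item")]      # limited XPath
out = ET.tostring(root, encoding="unicode")                 # serialization

print(pairs, items)
```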
Part 4: Databases, SQL
NoSQL / Non-relational Databases
• umbrella term for a number of approaches
o flat key/value and column stores
o document and graph stores
• no or minimal schemas
• non-standard query languages
Key-Value Stores
• usually very fast and very simple
• completely unstructured values
• keys are often database-global
o workaround: prefixes for namespacing
o or: multiple databases
NoSQL & Python
• redis (redis-py) module (Redis is Key-Value)
• memcached (another Key-Value store)
• PyMongo for talking to MongoDB (document-oriented)
• CouchDB (another document-oriented store)
• neo4j or cayley (module pyley) for graph structures
SQL and RDBMS
• SQL = Structured Query Language
• RDBMS = Relational DataBase Management System
• SQL is to NoSQL what XML is to JSON
• heavily used and extremely reliable
SQL: Example
select name, grade from student;
select name from student where grade < 'C';
insert into student ( name, grade ) values
( 'Random X. Student', 'C' );
select * from student
join enrollment on student.id = enrollment.student
join group on group.id = enrollment.group;
SQL: Relational Data
• JSON and XML are hierarchical
o or built from functions if you like
• SQL is relational
o relations = generalized functions
o can capture more structure
o much harder to efficiently process
SQL: Data Definition
• mandatory, unlike XML or JSON
• gives the data a rather rigid structure
• tables (relations) and columns (attributes)
• static data types for columns
• additional consistency constraints
SQL: Constraints
• help ensure consistency of the data
• foreign keys: referential integrity
o ensures there are no dangling references
o but: does not prevent accidental misuse
• unique constraints
• check constraints: arbitrary consistency checks
SQL: Query Planning
• an RDBMS makes heavy use of indexing
o using B trees, hashes and similar techniques
o indices are used automatically
• all the heavy lifting is done by the backend
o highly-optimized, low-level code
o efficient handling of large data
SQL: Reliability and Flexibility
• most RDBMS give ACID guarantees
o transparently solves a lot of problems
o basically impossible with normal files
• support for schema alterations
o alter table and similar
o nearly impossible in ad-hoc systems
SQLite
• lightweight in-process SQL engine
• the entire database is in a single file
• convenient python module, sqlite3
• stepping stone for a "real" database
Other Databases
• you can talk to most SQL DBs using python
• postgresql (psycopg2,...)
• mysql / mariadb (mysql-python, mysql-connector,...)
• big & expensive: Oracle (cx_oracle), DB2 (pyDB2)
• most of those are much more reliable than SQLite
SQL Injection
sql = "SELECT * FROM t WHERE name = '" + n + "'"
• the above code is bad, never do it
• consider the following
n = "x'; drop table students --"
n = "x'; insert into passwd (user, pass) ..."
Avoiding SQL Injection
• use proper SQL-building APIs
o this takes care of escaping internally
• templates like insert ... values (?, ?)
o the ? get safely substituted by the module
o e.g. the execute method of a cursor
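A sketch of the placeholder approach with the standard sqlite3 module and an in-memory database. The table layout is an assumption that loosely follows the student examples from earlier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table student (name text, grade text)")
conn.execute("insert into student (name, grade) values (?, ?)",
             ("Random X. Student", "C"))

# A hostile value: with ? it is treated as plain data, not as SQL.
n = "x'; drop table student --"
rows = conn.execute("select * from student where name = ?", (n,)).fetchall()

# The table is still there and still holds one row.
count = conn.execute("select count(*) from student").fetchone()[0]
print(rows, count)
```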
PEP 249
• informational PEP, for library writers
• describes how database modules should behave
o ideally, all SQL modules have the same interface
o makes it easy to swap a database backend
• but: SQL itself is not 100% portable
SQL Pitfalls
• sqlite does not enforce all constraints
o you need to pragma foreign_keys = on
• no portable syntax for autoincrement keys
• not all (column) types are supported everywhere
• no portable way to get the key of last insert
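The foreign key pitfall can be reproduced in a few lines. The tables a and b are made up; the connection runs in autocommit mode (isolation_level=None) because the pragma has no effect inside an open transaction, and enforcement is switched off explicitly first so the example does not depend on how SQLite was built.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("pragma foreign_keys = off")
conn.execute("create table a (id integer primary key)")
conn.execute("create table b (a_id integer references a(id))")

# Without enforcement, a dangling reference is silently accepted.
conn.execute("insert into b values (42)")
dangling = conn.execute("select count(*) from b").fetchone()[0]

conn.execute("pragma foreign_keys = on")
try:
    conn.execute("insert into b values (43)")
    enforced = False
except sqlite3.IntegrityError:
    enforced = True

print(dangling, enforced)
```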
More Resources & Stuff to Look Up
• SQL: https://www.w3schools.com/sql/
• https://docs.python.org/3/library/sqlite3.html
• Object-Relational Mapping
• SQLAlchemy: constructing portable SQL
Part 5: Operators, Iterators and Exceptions
Callable Objects
• user-defined functions (module-level def)
• user-defined methods (instance and class)
• built-in functions and methods
• class objects
• objects with a __call__ method
User-defined Functions
• come about from a module-level def
• metadata: __doc__, __name__, __module__
• scope: __globals__, __closure__
• arguments: __defaults__, __kwdefaults__
• type annotations: __annotations__
• the code itself: __code__
Positional and Keyword Arguments
• user-defined functions have positional arguments
• and keyword arguments
o print("hello", file=sys.stderr)
o arguments are passed by name
o which style is used is up to the caller
• variadic functions: def foo(*args, **kwargs)
o args is a tuple of unmatched positional args
o kwargs is a dict of unmatched keyword args
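A small function that returns what it received, to make the matching visible. The parameter named first is added for the example.

```python
def foo(first, *args, **kwargs):
    # first takes one argument; the rest end up in args and kwargs
    return first, args, kwargs

print(foo(1, 2, 3, sep="-"))
print(foo(first=1))                 # the caller picks the keyword style
```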
Lambdas
• def functions must have a name
• lambdas provide anonymous functions
• the body must be an expression
• syntax: lambda x: print("hello", x)
• standard user-defined functions otherwise
Instance Methods
• comes about as object.method
o print(x.foo) → <bound method ...>
• combines the class, instance and function itself
• __func__ is a user-defined function object
• let bar = x.foo, then
o x.foo() → bar.__func__(bar.__self__)
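The pieces of a bound method can be inspected directly. Foo is a throwaway class for the demonstration.

```python
class Foo:
    def foo(self):
        return "called on " + type(self).__name__

x = Foo()
bar = x.foo                                     # a bound method object

print(bar.__self__ is x)                        # the instance
print(bar.__func__ is Foo.__dict__["foo"])      # the plain function
print(bar() == bar.__func__(bar.__self__))      # calling either way agrees
```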
Iterators
• objects with __next__ (since 3.x)
o iteration ends on raise StopIteration
• iterable objects provide __iter__
o sometimes, this is just return self
o any iterable can appear in for x in iterable
class FooIter:
    def __init__(self):
        self.x = 10
    def __iter__(self): return self
    def __next__(self):
        if self.x:
            self.x -= 1
        else:
            raise StopIteration
        return self.x
Generators (PEP 255)
• written as a normal function or method
• they use yield to generate a sequence
• represented as special callable objects
o exist at the C level in CPython
def foo(*lst):
    for i in lst: yield i + 1
list(foo(1, 2)) # prints [2, 3]
yield from
• calling a generator produces a generator object
• how do we call one generator from another?
• same as for x in foo(): yield x
def bar(*lst):
    yield from foo(*lst)
    yield from foo(*lst)
list(bar(1, 2)) # prints [2, 3, 2, 3]
Decorators
• written as @decor before a function definition
• decor is a regular function (def decor(f))
o f is bound to the decorated function
o the decorated function becomes the result of decor
• classes can be decorated too
• you can create decorators at runtime
o @mkdecor("moo") (mkdecor returns the decorator)
o you can stack decorators
def decor(f):
    return lambda: print("bar")
def mkdecor(s):
    return lambda g: lambda: print(s)
@decor
def foo(f): print("foo")
@mkdecor("moo")
def moo(f): print("foo")
# foo() prints "bar", moo() prints "moo"
List Comprehension
• a concise way to build lists
• combines a filter and a map
[ 2 * x for x in range(10) ]
[ x for x in range(10) if x % 2 == 1 ]
[ 2 * x for x in range(10) if x % 2 == 1 ]
[ (x, y) for x in range(3) for y in range(2) ]
Operators
• operators are (mostly) syntactic sugar
• x < y rewrites to x.__lt__(y)
• is and is not are special
o are the operands the same object?
o also the ternary (conditional) operator
Non-Operator Builtins
• len(x) → x.__len__() (length)
• abs(x) → x.__abs__() (magnitude)
• str(x) → x.__str__() (printing)
• repr(x) → x.__repr__() (printing for eval)
• bool(x) and if x: → x.__bool__()
Arithmetic
• a standard selection of operators
• / is floating point, // is integral
• += and similar are somewhat magical
o x += y → x = x.__iadd__(y) if defined
o otherwise x = x.__add__(y)
x = 7 # an int is immutable
x += 3 # works, x = 10, id(x) changes
lst = [7, 3]
lst[0] += 3 # works too, id(lst) stays same
tup = (7, 3) # a tuple is immutable
tup += (1, 1) # still works (id changes)
tup[0] += 3 # fails
Relational Operators
• operands can be of different types
• equality: !=, ==
o by default uses object identity
• ordering: <, <=, >, >= (TypeError by default)
• consistency is not enforced
Relational Consistency
• __eq__ must be an equivalence relation
• x.__ne__(y) must be the same as not x.__eq__(y)
• __lt__ must be an ordering relation
o compatible with __eq__
o consistent with each other
• each operator is separate (mixins can help)
o or perhaps a class decorator
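One class decorator that does this is functools.total_ordering from the standard library: given __eq__ and one ordering method, it fills in the rest. Money is a made-up class for the example.

```python
from functools import total_ordering

@total_ordering
class Money:
    def __init__(self, amount):
        self.amount = amount
    def __eq__(self, other):
        return self.amount == other.amount
    def __lt__(self, other):
        return self.amount < other.amount

# <=, > and >= were generated by the decorator
print(Money(1) <= Money(2), Money(3) > Money(1), Money(2) >= Money(2))
```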
Collection Operators
• in is also a membership operator (outside for)
o implemented as __contains__
• indexing and slicing operators
o del x[y] → x.__delitem__(y)
o x[y] → x.__getitem__(y)
o x[y] = z → x.__setitem__(y, z)
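A toy container that implements all four hooks. Sparse is a hypothetical class: a vector that stores only the entries which were explicitly set and reports 0 for the rest.

```python
class Sparse:
    def __init__(self):
        self.data = {}
    def __getitem__(self, i):           # v[i]
        return self.data.get(i, 0)
    def __setitem__(self, i, value):    # v[i] = value
        self.data[i] = value
    def __delitem__(self, i):           # del v[i]
        del self.data[i]
    def __contains__(self, i):          # i in v
        return i in self.data

v = Sparse()
v[3] = 5
before = (v[3], v[7], 3 in v)
del v[3]
after = (v[3], 3 in v)
print(before, after)
```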
Conditional Operator
• also known as a ternary operator
• written x if cond else y
o in C: cond ? x : y
• forms an expression, unlike if
o can e.g. appear in a lambda
o or in function arguments, &c.
Exceptions
• an exception interrupts normal control flow
• it's called an exception because it is exceptional
o never mind StopIteration
• causes methods to be interrupted
o until a matching except block is found
o also known as stack unwinding
PV248 Python
137/301
October 22, 2020
Life Without Exceptions
int fd = socket( ... );
if ( fd < 0 )
... /* handle errors */
if ( bind( fd, ... ) < 0 )
... /* handle errors */
if ( listen( fd, 5 ) < 0 )
... /* handle errors */
With Exceptions
try:
    sock = socket.socket( ... )
    sock.bind( ... )
    sock.listen( ... )
except ...:
    # handle errors
Exceptions vs Resources
x = open( "file.txt" )
# stuff
raise SomeError
• who calls x.close()?
• this would be a resource leak
Using finally
try:
    x = open( "file.txt" )
    # stuff
finally:
    x.close()
• works, but tedious and error-prone
Using with
with open( "file.txt" ) as f:
    # stuff
• with takes care of the finally and close
• with x as y sets y = x.__enter__()
o and calls x.__exit__(...) when leaving the block
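To make the protocol concrete, here is a minimal sketch of a class that implements __enter__ and __exit__ itself. Tracked is a hypothetical class made up for this illustration; it only records the order in which things happen.

```python
class Tracked:
    """Hypothetical resource that records what happened to it."""
    def __init__(self):
        self.log = []

    def __enter__(self):
        self.log.append("enter")
        return self            # this value is bound by 'as'

    def __exit__(self, exc_type, exc, tb):
        self.log.append("exit")
        return False           # do not swallow the exception

res = Tracked()
try:
    with res as r:
        r.log.append("body")
        raise ValueError("boom")
except ValueError:
    res.log.append("handled")
```

Note that __exit__ runs before the exception reaches the except clause, which is exactly the guarantee finally gives.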
The @property decorator
• attribute syntax is the preferred one in Python
• writing useless setters and getters is boring
class Foo:
    @property
    def x(self): return 2 * self.a
    @x.setter
    def x(self, v): self.a = v // 2
Part 6: Closures, Coroutines, Concurrency
Concurrency & Parallelism
• threading - thread-based parallelism
• multiprocessing
• concurrent - future-based programming
• subprocess
• sched, a general-purpose event scheduler
• queue, for sending objects between threads
Threading
• low-level thread support, module threading
• Thread objects represent actual threads
o threads provide start() and join()
o the run() method executes in a new thread
• mutexes, semaphores &c.
The Global Interpreter Lock
• memory management in CPython is not thread-safe
o Python code runs under a global lock
o pure Python code cannot use multiple cores
• C code usually runs without the lock
o this includes numpy crunching
Multiprocessing
• like threading but uses processes
• works around the GIL
o each worker process has its own interpreter
• queued/sent objects must be pickled
o see also: the pickle module
o this causes substantial overhead
o functions, classes &c. are pickled by name
Futures
• like coroutine await but for subroutines
• a Future can be waited for using f.result()
• scheduled via concurrent.futures.Executor
o Executor.map is like asyncio.gather
o Executor.submit is like asyncio.create_task
• implemented using process or thread pools
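A small sketch of submit, result and map with a thread pool follows. The square function and the pool size are arbitrary choices for the illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def square(n):
    return n * n

with ThreadPoolExecutor(max_workers=2) as pool:
    fut = pool.submit(square, 7)             # returns a Future immediately
    single = fut.result()                    # blocks until the value is ready
    many = list(pool.map(square, range(4)))  # results come back in input order
```

Swapping ThreadPoolExecutor for ProcessPoolExecutor keeps the same interface, but then arguments and results have to be picklable.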
Native Coroutines (PEP 492)
• created using async def (since Python 3.5)
• generalisation of generators
o yield from is replaced with await
o an __await__ magic method is required
• a coroutine can be suspended and resumed
Coroutine Scheduling
• coroutines need a scheduler
• one is available from asyncio.get_event_loop()
• along with many coroutine building blocks
• coroutines can actually run in parallel
o via asyncio.create_task (since 3.7)
o via asyncio.gather
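The following sketch shows two coroutines interleaving under asyncio.gather. The work coroutine, its names and the delays are made up for the illustration; asyncio.run (Python 3.7+) sets up the event loop.

```python
import asyncio

async def work(name, delay, log):
    log.append(name + " start")
    await asyncio.sleep(delay)     # suspends and lets other coroutines run
    log.append(name + " end")
    return name

async def main():
    log = []
    results = await asyncio.gather(work("a", 0.05, log), work("b", 0.01, log))
    return results, log

results, log = asyncio.run(main())
```

Both coroutines start before either finishes, yet gather returns the results in the order the coroutines were passed in.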
Async Generators (PEP 525)
• async def + yield
• semantics like simple generators
• but also allows await
• iterated with async for
o async for runs sequentially
Execution Stack
• made up of activation frames
• holds local variables
• and return addresses
• in dynamic languages, often lives in the heap
Variable Capture
• variables are captured lexically
• definitions are a dynamic / run-time construct
o a nested definition is executed
o creates a closure object
• always by reference in Python
o but can be by-value in other languages
Using Closures
• closures can be returned, stored and called
o they can be called multiple times, too
o they can capture arbitrary variables
• closures naturally retain state
• this is what makes them powerful
Objects from Closures
• so closures are essentially code + state
• wait, isn't that what an object is?
• indeed, you can implement objects using closures
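As a rough illustration of that idea, the sketch below builds a tiny counter 'object' out of two closures sharing one captured variable. The names make_counter, increment and value are invented for this example.

```python
def make_counter(start=0):
    count = start                  # captured by the nested functions

    def increment():
        nonlocal count             # rebind the captured variable
        count += 1
        return count

    def value():
        return count

    return increment, value        # the 'methods' of our object

inc, val = make_counter(10)
inc()
inc()
```

Each call to make_counter creates a fresh count, so separate counters do not share state, much like separate instances of a class.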
The Role of GC
• memory management becomes a lot more complicated
• forget C-style 'automatic' stack variables
• this is why the stack is actually in the heap
• this can go as far as forming reference cycles
Coroutines
• coroutines are a generalisation of subroutines
• they can be suspended and re-entered
• coroutines can be closures at the same time
• the code of a coroutine is like a function
• a suspended coroutine is like an activation frame
Yield
• suspends execution and 'returns' a value
• may also obtain a new value (cf. send)
• when re-entered, continue where we left off
for i in range(5): yield i
Send
• with yield, we have one-way communication
• but in many cases, we would like two-way
• a suspended coroutine is an object in Python
o with a send method which takes a value
o send re-enters the coroutine
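A minimal sketch of two-way communication with send is shown below. The running_total generator is a made-up example: each value sent in is added to a total, and the new total is yielded back.

```python
def running_total():
    total = 0
    while True:
        value = yield total        # hand out the total, wait for a new value
        total += value

gen = running_total()
first = next(gen)                  # run up to the first yield
second = gen.send(5)
third = gen.send(7)
```

The initial next is needed because a value can only be sent to a generator that is already suspended at a yield.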
Yield From and Await
• yield from is mostly a generator concept
• await basically does the same thing
o call out to another coroutine
o when it suspends, so does the entire stack
Suspending Native Coroutines
• this is not actually possible
o not with async-native syntax anyway
• you need a yield
o for that, you need a generator
o use the types.coroutine decorator
Event Loop
• not required in theory
• useful also without coroutines
• there is a synergistic effect
o event loops make coroutines easier
o coroutines make event loops easier
Part 7: Communication & HTTP with asyncio
Running Programs (the old way)
• os.system is about the simplest
o also somewhat dangerous - shell injection
o you only get the exit code
• os.popen allows you to read output of a program
o alternatively, you can send input to the program
o you can't do both (would likely deadlock anyway)
o runs the command through a shell, same as os.system
Low-level Process API
• POSIX-inherited interfaces (on POSIX systems)
• os.exec*: replace the current process
• os.fork: split the current process in two
• os.forkpty: same but with a PTY
Detour: bytes vs str
• strings (class str) represent text
o that is, a sequence of Unicode code points
• files and network connections handle data
o represented in Python as bytes
• the bytes constructor can convert from str
o e.g. b = bytes("hello", "utf8")
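The conversion works in both directions; str.encode and bytes.decode are the method forms of the same operation. The sample text below is an arbitrary choice with non-ASCII characters.

```python
text = "žluťoučký kůň"
data = text.encode("utf8")         # same result as bytes(text, "utf8")
back = data.decode("utf8")         # decoding restores the original text
```

The byte sequence is longer than the string here because each accented character takes more than one byte in UTF-8.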
Running Programs (the new way)
• you can use the subprocess module
• subprocess can handle bidirectional IO
o it also takes care of avoiding IO deadlocks
o set input to feed data to the subprocess
• internally run uses a Popen object
o if run can't do it, Popen probably can
Getting subprocess Output
• available via run since Python 3.7
• the run function returns a CompletedProcess
• it has attributes stdout and stderr
• both are bytes (byte sequences) by default
• or str if text or encoding were set
• available if you enabled capture_output
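A short sketch of run with input, capture_output and text follows (Python 3.7+). To stay portable it does not assume a sort binary; instead it starts a second Python interpreter with a one-line child program, which is purely an assumption of this example.

```python
import subprocess
import sys

# a tiny child program that sorts the lines of its standard input
child = "import sys; sys.stdout.write(''.join(sorted(sys.stdin)))"

done = subprocess.run([sys.executable, "-c", child],
                      input="x\na\nb\ny\n", capture_output=True, text=True)
```

Here done is a CompletedProcess; because text=True was given, done.stdout and done.stderr are str rather than bytes.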
Running Filters with Popen
• if you are stuck with 3.6, use Popen directly
• set stdin in the constructor to PIPE
• use the communicate method to send the input
• this gives you the outputs (as bytes)
import subprocess
from subprocess import PIPE
input = bytes("x\na\nb\ny", "utf8")
p = subprocess.Popen(["sort"], stdin=PIPE,
                     stdout=PIPE)
out = p.communicate(input=input)
# out[0] is the stdout, out[1] is None
Subprocesses with asyncio
• import asyncio.subprocess
• create_subprocess_exec, like subprocess.run
o but it returns a Process instance
o Process has a communicate async method
• can run things in background (via tasks)
o also multiple processes at once
Protocol-based asyncio subprocesses
• let loop be an implementation of the asyncio event loop
• there's subprocess_exec and subprocess_shell
o sets up pipes by default
• integrates into the asyncio transport layer (see later)
• allows you to obtain the data piece-wise
• https://docs.python.org/3/library/asyncio-protocol.html
Sockets
• the socket API comes from early BSD Unix
• socket represents a (possible) network connection
• sockets are more complicated than normal files
o establishing connections is hard
o messages get lost much more often than file data
Socket Types
• sockets can be internet or Unix domain
o internet sockets connect to other computers
o Unix sockets live in the filesystem
• sockets can be stream or datagram
o stream sockets are like files (TCP)
o you can write a continuous stream of data
o datagram sockets can send individual messages (UDP)
Sockets in Python
• the socket module is available on all major OSes
• it has a nice object-oriented API
o failures are propagated as exceptions
o buffer management is automatic
• useful if you need to do low-level networking
o hard to use in non-blocking mode
Sockets and asyncio
• asyncio provides sock_* to work with socket objects
• this makes work with non-blocking sockets a lot easier
• but your program needs to be written in async style
• only use sockets when there is no other choice
o asyncio protocols are both faster and easier to use
Hyper-Text Transfer Protocol
• originally a simple text-based, stateless protocol
• however
o SSL/TLS, cryptography (https)
o pipelining (somewhat stateful)
o cookies (somewhat stateful in a different way)
• typically between client and a front-end server
• but also as a back-end protocol (web server to app server)
Request Anatomy
• request type (see below)
• header (text-based, like e-mail)
• content
Request Types
• GET - asks the server to send a resource
• HEAD - like GET but only send back headers
• POST - send data to the server
Python and HTTP
• both client and server functionality
o import http.client
o import http.server
• TLS/SSL wrappers are also available
o import ssl
• synchronous by default
Serving Requests
• derive from BaseHTTPRequestHandler
• implement a do_GET method
• this gets called whenever the client does a GET
• also available: do_HEAD, do_POST, etc.
• pass the class (not an instance) to HTTPServer
Serving Requests (cont'd)
• HTTPServer creates a new instance of your Handler
• the BaseHTTPRequestHandler machinery runs
• it calls your do_GET etc. method
• request data is available in instance variables
o self.path, self.headers
Talking to the Client
• HTTP responses start with a response code
o self.send_response( 200, 'OK' )
• the headers follow (set at least Content-Type)
o self.send_header( 'Connection', 'close' )
• headers and the content need to be separated
o self.end_headers()
• finally, send the content by writing to self.wfile
Sending Content
• self.wfile is an open file
• it has a write() method which you can use
• sockets only accept byte sequences, not str
• use the bytes( string, encoding ) constructor
o match the encoding to your Content-Type
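Putting the last few slides together, a handler might look roughly like the sketch below. The Hello class, the reply text and the use of port 0 (let the OS pick a free port) are assumptions of this example; the server runs in a background thread only so that the same script can query it with http.client.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class Hello(BaseHTTPRequestHandler):
    def do_GET(self):
        body = bytes("hello from " + self.path, "utf8")
        self.send_response(200, "OK")
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)     # wfile only accepts bytes

    def log_message(self, fmt, *args):
        pass                       # keep the example quiet

httpd = HTTPServer(("127.0.0.1", 0), Hello)   # pass the class, not an instance
port = httpd.server_address[1]
threading.Thread(target=httpd.serve_forever, daemon=True).start()

conn = http.client.HTTPConnection("127.0.0.1", port)
conn.request("GET", "/test")
reply = conn.getresponse()
status, text = reply.status, reply.read().decode("utf8")
conn.close()

httpd.shutdown()
httpd.server_close()
```

The charset in Content-Type matches the encoding passed to bytes, as the slide recommends.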
HTTP and asyncio
• the base asyncio currently doesn't directly support HTTP
• but: you can get aiohttp from PyPI
• contains a very nice web server
o from aiohttp import web
o minimum boilerplate, fully asyncio-ready
Aside: The Python Package Index
• colloquially known as PyPI (or cheese shop)
o do not confuse with PyPy (Python in almost-Python)
• both source packages and binaries
o the latter known as wheels (PEP 427, 491)
o previously python eggs
SSL and TLS
• you want to use the ssl module for handling HTTPS
o this is especially true server-side
o aiohttp and http.server are compatible
• you need to deal with certificates (loading, checking)
• this is a rather important but complex topic
Certificate Basics
• a certificate is a cryptographically signed statement
o it ties a server to a certain public key
o the client ensures the server knows the private key
• the server loads the certificate and its private key
• the client must validate the certificate
o this is typically a lot harder to get right
SSL in Python
• start with import ssl
• almost everything happens in the SSLContext class
• get an instance from ssl.create_default_context()
o you can use wrap_socket to run an SSL handshake
o you can pass the context to aiohttp
• if httpd is a http.server.HTTPServer:
httpd.socket = ssl.wrap_socket( httpd.socket, ... )
HTTP Clients
• there's a very basic http.client
• for a more complete library, use urllib.request
• aiohttp has client functionality
• all of the above can be used with ssl
• another 3rd party module: Python Requests
Part 8: Low-level asyncio
IO at the OS Level
• often defaults to blocking
o read returns when data is available
o this is usually OK for files
• but what about network code?
o could work for a client
Threads and IO
• there may be work to do while waiting
o waiting for IO can be wasteful
• only the calling (OS) thread is blocked
o another thread may do the work
o but multiple green threads may be blocked
Non-Blocking IO
• the program calls read
o read returns immediately
o even if there was no data
• but how do we know when to read?
o we could poll
o for example call read every 30ms
Polling
• trade-off between latency and throughput
o sometimes, polling is okay
o but is often too inefficient
• alternative: IO dispatch
o useful when multiple IOs are pending
o wait only if all are blocked
select
• takes a list of file descriptors
• blocks until one of them is ready
o next read will return data immediately
• can optionally specify a timeout
• only useful for OS-level resources
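Python exposes this call as select.select. The sketch below uses a connected socket pair (an arbitrary choice for the example, so that no network is needed) to show the 'not ready' and 'ready' cases.

```python
import select
import socket

a, b = socket.socketpair()         # two connected stream sockets

# nothing has been written yet, so with a zero timeout nothing is ready
ready_before, _, _ = select.select([a], [], [], 0)

b.sendall(b"ping")
# now select reports a as readable; wait at most one second
ready_after, _, _ = select.select([a], [], [], 1.0)
data = a.recv(4)                   # returns immediately, data is waiting

a.close()
b.close()
```

The three list arguments are the descriptors to watch for reading, writing and exceptional conditions; the last argument is the optional timeout in seconds.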
Alternatives to select
• select is a rather old interface
• there are a number of more modern variants
• poll and epoll system calls
o despite the name, they do not poll
o epoll is more scalable
• kqueue and kevent on BSD systems
Synchronous vs Asynchronous
• the select family is synchronous
o you call the function
o it may wait some time
o you proceed when it returns
• OS threads are fully asynchronous
The Thorny Issue of Disks
• a file is always 'ready' for reading
• this may still take time to complete
• there is no good solution on UNIX
• POSIX AIO exists but is sparsely supported
• OS threads are an option
IO on Windows
• select is possible (but slow)
• Windows provides real asynchronous IO
o quite different from UNIX
o the IO operation is directly issued
o but the function returns immediately
• comes with a notification queue
The asyncio Event Loop
• uses the select family of syscalls
• why is it called async IO?
o select is synchronous in principle
o this is an implementation detail
o the IOs are asynchronous to each other
How Does It Work
• you must use asyncio functions for IO
• an async read does not issue an OS read
• it yields back into the event loop
• the fd is put on the select list
• the coroutine is resumed when the fd is ready
Timers
• asyncio allows you to set timers
• the event loop keeps a list of those
• and uses that to set the select timeout
o just uses the nearest timer expiry
• when a timer expires, its owner is resumed
Blocking IO vs asyncio
• all user code runs on the main thread
• you must not call any blocking IO functions
• doing so will stall the entire application
o in a server, clients will time out
o even if not, latency will suffer
DNS
• POSIX: getaddrinfo and getnameinfo
o also the older API gethostbyname
• those are all blocking functions
o and they can take a while
o but name resolution is essential
• asyncio internally uses OS threads for DNS
Signals
• signals on UNIX are very asynchronous
• interact with OS threads in a messy way
• asyncio hides all this using C code
Native Coroutines (Reminder)
• declared using async def
async def foo():
    await asyncio.sleep( 1 )
• calling foo() returns a suspended coroutine
• which you can await
o or turn it into an asyncio.Task
Tasks
• asyncio.Task is a nice wrapper around coroutines
o create with asyncio.create_task()
• can be stopped prematurely using cancel()
• has an API for asking things:
o done() tells you if the coroutine has finished
o result() gives you the result
Tasks and Exceptions
• what if a coroutine raises an exception?
• calling result will re-raise it
o i.e. it continues propagating from result()
• you can also ask directly using exception()
o returns None if the coroutine ended normally
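The Task API from the last two slides can be tried out with a sketch like the one below. The three coroutines (ok, broken, forever) are hypothetical stand-ins for a normal result, a failure and a task that has to be cancelled.

```python
import asyncio

async def ok():
    return 42

async def broken():
    raise ValueError("oops")

async def forever():
    await asyncio.sleep(3600)

async def main():
    t_ok = asyncio.create_task(ok())
    t_bad = asyncio.create_task(broken())
    t_long = asyncio.create_task(forever())
    await asyncio.wait([t_ok, t_bad])      # wait without re-raising
    t_long.cancel()                        # stop it prematurely
    await asyncio.wait([t_long])
    return (t_ok.done(), t_ok.result(), t_ok.exception(),
            type(t_bad.exception()), t_long.cancelled())

summary = asyncio.run(main())
```

Calling t_bad.result() instead of t_bad.exception() would re-raise the ValueError inside main.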
Asynchronous Context Managers
• normally, we use with for resource acquisition
o this internally uses the context manager protocol
• but sometimes you need to wait for a resource
o __enter__() is a subroutine and would block
o this won't work in async-enabled code
• we need __enter__() to be itself a coroutine
async with
• just like with but uses __aenter__(), __aexit__()
o those are async def
• the async with behaves like an await
o it will suspend if the context manager does
o the coroutine which owns the resource can continue
• mainly used for locks and semaphores
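As an illustration of the lock use case, the sketch below runs two workers that share an asyncio.Lock. The worker coroutine and its log are made up for this example.

```python
import asyncio

async def worker(name, lock, log):
    async with lock:               # may suspend here until the lock is free
        log.append(name + " in")
        await asyncio.sleep(0.01)  # the lock stays held across this await
        log.append(name + " out")

async def main():
    lock = asyncio.Lock()
    log = []
    await asyncio.gather(worker("a", lock, log), worker("b", lock, log))
    return log

log = asyncio.run(main())
```

Without the lock, the two workers would interleave at the sleep; with it, the second one waits in __aenter__ while the first is inside the block.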
Part 9: Python Pitfalls
Mixing Languages
• for many people, Python is not a first language
• some things look similar in Python and Java (C++,...)
o sometimes they do the same thing
o sometimes they do something very different
o sometimes the difference is subtle
Python vs Java: Decorators
• Java has a thing called annotations
• looks very much like a Python decorator
• in Python, decorators can drastically change meaning
• in Java, they are just passive metadata
o other code can use them for meta-programming though
Class Body Variables
class Foo:
    some_attr = 42
• in Java/C++, this is how you create instance variables
• in Python, this creates class attributes
o i.e. what C++/Java would call static attributes
Very Late Errors
if a == 2:
    priiiint("a is not 2")
• no error when loading this into python
• it even works as long as a != 2
• most languages would tell you much earlier
Very Late Errors (cont'd)
try:
    foo()
except TyyyypeError:
    print("my mistake")
• does not even complain when running the code
• you only notice when foo() raises an exception
Late Imports
if a == 2:
    import foo
    foo.say_hello()
• unless a == 2, foo is not loaded
• any syntax errors don't show up until a == 2
o it may even fail to exist
Block Scope
for i in range(10): pass
print(i) # not a NameError
• in Python, local variables are function-scoped
• in other languages, i is confined to the loop
Assignment Pitfalls
x = [ 1, 2 ]
y = x
x.append( 3 )
print( y ) # prints [ 1, 2, 3 ]
• in Python, everything is a reference
• assignment does not make copies
Equality of Iterables
• [0, 1] == [0, 1] → True (obviously)
• range(2) == range(2) → True
• list(range(2)) == [0, 1] → True
• [0, 1] == range(2) → False
Equality of bool
• if 0: print( "yes" ) → nothing
• if 1: print( "yes" ) → yes
• False == 0 → True
• True == 1 → True
• 0 is False → False
• 1 is True → False
Equality of bool (cont'd)
• if 2: print( "yes" ) → yes
• True == 2 → False
• False == 2 → False
• if '': print( "yes" ) → nothing
• if 'x': print( "yes" ) → yes
• '' == False → False
• 'x' == True → False
Mutable Default Arguments
def foo( x = [] ):
    x.append( 7 )
    return x
foo() # [ 7 ]
foo() # [ 7, 7 ]... wait, what?
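The default value is created once, when the def statement runs, and then shared by all calls. A common way around this (one idiom among several) is to use None as the default and create the list inside the function:

```python
def foo(x=None):
    if x is None:
        x = []                     # a fresh list on every call
    x.append(7)
    return x
```

With this version, repeated calls to foo() no longer see each other's appended items, while an explicitly passed list is still modified in place.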
Late Lexical Capture
f = [ lambda x : i * x for i in range( 5 ) ]
f[ 4 ]( 3 ) # 12
f[ 0 ]( 3 ) # 12 ... ?!
g = [ lambda x, i = i: i * x for i in range( 5 ) ]
g[ 4 ]( 3 ) # 12
g[ 0 ]( 3 ) # 0 ... fml
h = [ ( lambda x : i * x )( 3 ) for i in range( 5 ) ]
h # [ 0, 3, 6, 9, 12 ] ... i kid you not
Dictionary Iteration Order
• in python <= 3.5
o iteration order is arbitrary ('random' for practical purposes)
• in python 3.6
o insertion order, but only as a CPython implementation detail
• in python >= 3.7
o guaranteed to iterate in insertion order
List Multiplication
x = [ [ 1 ] * 2 ] * 3
print( x ) # [ [ 1, 1 ], [ 1, 1 ], [ 1, 1 ] ]
x[ 0 ][ 0 ] = 2
print( x ) # [ [ 2, 1 ], [ 2, 1 ], [ 2, 1 ] ]
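The outer multiplication repeats a reference to one and the same inner list. If independent rows are wanted, a list comprehension is one way to get them, since it evaluates the inner expression anew for every row:

```python
x = [[1] * 2 for _ in range(3)]    # three distinct inner lists
x[0][0] = 2                        # only the first row changes
```

The inner [1] * 2 is harmless because integers are immutable; only the sharing of the mutable inner list causes the surprise.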
Forgotten Await
import asyncio
async def foo():
    print( "hello" )
async def main():
    foo()
asyncio.run( main() )
• gives warning coroutine 'foo' was never awaited
Python vs Java: Closures
• captured variables are final in Java
• but they are mutable in Python
o and of course captured by reference
• they are whatever you tell them to be in C++
Explicit super()
• Java and C++ automatically call parent constructors
• Python does not
• you have to call them yourself
Setters and Getters
obj.attr
obj.attr = 4
• in C++ or Java, this is an assignment
• in Python, it can run arbitrary code
o this often makes getters/setters redundant
Part 10: Testing, Profiling
Why Testing
• reading programs is hard
• reasoning about programs is even harder
• testing is comparatively easy
• difference between an example and a proof
What is Testing
• based on trial runs
• the program is executed with some inputs
• the outputs or outcomes are checked
• almost always incomplete
Testing Levels
• unit testing
o individual classes
o individual functions
• functional
o system
o integration
Testing Automation
• manual testing
o still widely used
o requires human
• semi-automated
o requires human assistance
• fully automated
o can run unattended
Testing Insight
• what does the test or tester know?
• black box: nothing known about internals
• gray box: limited knowledge
• white box: 'complete' knowledge
Why Unit Testing?
• allows testing small pieces of code
• the unit is likely to be used in other code
o make sure your code works before you use it
o the less code, the easier it is to debug
• especially easier to hit all the corner cases
Unit Tests with unittest
• from unittest import TestCase
• derive your test class from TestCase
• put test code into methods named test_*
• run with python -m unittest program.py
o add -v for more verbose output
from unittest import TestCase
class TestArith(TestCase):
    def test_add(self):
        self.assertEqual(1, 4 - 3)
    def test_leq(self):
        self.assertTrue(3 <= 2 * 3)
Unit Tests with pytest
• a more pythonic alternative to unittest
o unittest is derived from JUnit
• easier to use and less boilerplate
• you can use native python assert
• easier to run, too
o just run pytest in your source repository
Test Auto-Discovery in pytest
• pytest finds your testcases for you
o no need to register anything
• put your tests in test_*.py or *_test.py
• name your testcases (functions) test_*
Fixtures in pytest
• sometimes you need the same thing in many testcases
• in unittest, you have the test class
• pytest passes fixtures as parameters
o fixtures are created by a decorator
o they are matched based on their names
import pytest
import smtplib
@pytest.fixture
def smtp_connection():
    return smtplib.SMTP("smtp.gmail.com", 587)
def test_ehlo(smtp_connection):
    response, msg = smtp_connection.ehlo()
    assert response == 250
Property Testing
• writing test inputs is tedious
• sometimes, we can generate them instead
• useful for general properties like
o idempotency (e.g. serialize + deserialize)
o invariants (output is sorted,...)
o code does not cause exceptions
Using hypothesis
• property-based testing for Python
• has strategies to generate basic data types
o int, str, dict, list, set,...
• compose built-in generators to get custom types
• integrated with pytest
import hypothesis
import hypothesis.strategies as s
@hypothesis.given(s.lists(s.integers()))
def test_sorted(x):
    assert sorted(x) == x # should fail
@hypothesis.given(x=s.integers(), y=s.integers())
def test_cancel(x, y):
    assert (x + y) - y == x # looks okay
Going Quick and Dirty
• goal: minimize time spent on testing
• manual testing usually loses
o but it has almost 0 initial investment
• if you can write a test in 5 minutes, do it
• useful for testing small scripts
Shell 101
• shell scripts are very easy to write
• they are ideal for testing IO behaviour
• easily check for exit status: set -e
• see what is going on: set -x
• use diff -u to check expected vs actual output
Shell Test Example
set -ex
python script.py < test1.in | tee out
diff -u test1.out out
python script.py < test2.in | tee out
diff -u test2.out out
Continuous Integration
• automated tests need to be executed
• with many tests, this gets tedious to do by hand
• CI builds and tests your project regularly
o every time you push some commits
o every night (e.g. more extensive tests)
CI: Travis
• runs in the cloud (CI as a service)
• trivially integrates with pytest
• virtualenv out of the box for python projects
• integrated with github
• configure in .travis.yml in your repo
CI: GitLab
• GitLab has its own CI solution (similar to travis)
• also available at FI
• runs tests when you push to your gitlab
• drop a .gitlab-ci.yml in your repository
• automatic deployment into heroku &c.
CI: Buildbot
• written in python/twisted
o basically a framework to build a custom CI tool
• self-hosted and somewhat complicated to set up
o more suited for complex projects
o much more flexible than most CI tools
• distributed design
CI: Jenkins
• another self-hosted solution, this time in Java
o widely used and well supported
• native support for python projects (including pytest)
o provides a dashboard with test result graphs &c.
o supports publishing sphinx-generated documentation
Print-based Debugging
• no need to be ashamed, everybody does it
• less painful in interpreted languages
• you can also use decorators for tracing
• never forget to clean your program up again
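A tracing decorator along those lines might look like the sketch below. The name trace and the output format are assumptions of this example; the traced function add is just a placeholder.

```python
import functools
import sys

def trace(func):
    """Hypothetical helper: print each call and its result to stderr."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        print(func.__name__, args, kwargs, "->", repr(result), file=sys.stderr)
        return result
    return wrapper

@trace
def add(a, b):
    return a + b

total = add(2, 3)
```

Cleaning up afterwards only means deleting the @trace lines, which is easier to get right than hunting for scattered print calls.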
import sys
def debug(e):
    f = sys._getframe(1)
    v = eval(e, f.f_globals, f.f_locals)
    l = f.f_code.co_filename + ':'
    l += str(f.f_lineno) + ':'
    print(l, e, '=', repr(v), file=sys.stderr)
x = 1
debug('x + 1')
The Python Debugger
• run as python -m pdb program.py
• there's a built-in help command
• next steps through the program
• break to set a breakpoint
• cont to run until end or a breakpoint
What is Profiling
• measurement of resource consumption
• essential info for optimising programs
• answers questions about bottlenecks
o where is my program spending most time?
o less often: how is memory used in the program
Why Profiling
• 'blind' optimisation is often misdirected
o it is like fixing bugs without triggering them
o program performance is hard to reason about
• tells you exactly which point is too slow
o allows for best speedup with least work
Profiling in Python
• provided as a library cProfile
o alternative: profile is slower, but more flexible
• run as python -m cProfile program.py
• outputs a list of lines/functions and their cost
• use cProfile.run() to profile a single expression
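Besides the command line, the profiler can be driven from code. The sketch below profiles a naive recursive Fibonacci (an arbitrary workload for this example) with cProfile.Profile and formats the report with pstats into a string.

```python
import cProfile
import io
import pstats

def fib_rec(n):
    return n if n < 2 else fib_rec(n - 1) + fib_rec(n - 2)

prof = cProfile.Profile()
prof.enable()
answer = fib_rec(15)
prof.disable()

out = io.StringIO()
pstats.Stats(prof, stream=out).sort_stats("tottime").print_stats(5)
report = out.getvalue()            # the same kind of table the CLI prints
```

For a quick one-off, cProfile.run("fib_rec(15)") prints a similar table directly to standard output.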
$ python -m cProfile -s time fib.py
ncalls tottime percall file:line(function)
13638/2 0.032 0.016 fib.py:1(fib_rec)
2 0.000 0.000 {builtins.print}
2 0.000 0.000 fib.py:5(fib_mem)
Part 11: Linear Algebra & Symbolic Math
Numbers in Python
• recall that numbers are objects
• a tuple of real numbers has 300% overhead
o compared to a C array of float values
o and 350% for integers
• this causes extremely poor cache use
• integers are arbitrary-precision
Math in Python
• numeric data usually means arrays
o this is inefficient in python
• we need a module written in C
o but we don't want to do that ourselves
• enter the SciPy project
o pre-made numeric and scientific packages
The SciPy Family
• numpy: data types, linear algebra
• scipy: more computational machinery
• pandas: data analysis and statistics
• matplotlib: plotting and graphing
• sympy: symbolic mathematics
Aside: External Libraries
• until now, we only used bundled packages
• for math, we will need external libraries
• you can use pip to install those
o use pip install --user
Aside: Installing numpy
• the easiest way may be with pip
o this would be pip3 on aisa
• linux distributions usually also have packages
• another option is getting the Anaconda bundle
• detailed instructions on https://scipy.org
Arrays in numpy
• compact, C-implemented data types
• flexible multi-dimensional arrays
• easy and efficient re-shaping
o typically without copying the data
Entering Data
• most data is stored in numpy.array
• can be constructed from a list
o a list of lists for 2D arrays
• or directly loaded from / stored to a file
o binary: numpy.load, numpy.save
o text: numpy.loadtxt, numpy.savetxt
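The options above can be sketched as follows. In-memory buffers are used in place of real files only to keep the example self-contained; an actual program would normally pass a file name.

```python
import io
import numpy as np

table = np.array([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])    # list of lists -> 2D array

# binary round trip
binary = io.BytesIO()
np.save(binary, table)
binary.seek(0)
from_binary = np.load(binary)

# text round trip
text = io.StringIO()
np.savetxt(text, table)
text.seek(0)
from_text = np.loadtxt(text)
```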
LAPACK and BLAS
• BLAS is a low-level vector/matrix package
• LAPACK is built on top of BLAS
o provides higher-level operations
o tuned for modern CPUs with multiple caches
• both are written in Fortran
o ATLAS and C-LAPACK are C implementations
Element-wise Functions
• the basic math function arsenal
• powers, roots, exponentials, logarithms
• trigonometric (sin, cos, tan,...)
• hyperbolic (sinh, cosh, tanh,...)
• cyclometric (arcsin, arccos, arctan,...)
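A brief illustration of element-wise application, with arbitrarily chosen sample values: each function is applied to every element of the array separately and returns an array of the same shape.

```python
import numpy as np

x = np.array([0.0, 0.25, 0.5, 1.0])

roots = np.sqrt(x)                          # square root of each element
round_trip = np.arcsin(np.sin(x))           # arcsin undoes sin on [-pi/2, pi/2]
identity = np.cosh(x)**2 - np.sinh(x)**2    # equals 1 for every element
```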
Matrix Operations in numpy
• import numpy.linalg
• multiplication, inversion, rank
• eigenvalues and eigenvectors
• linear equation solver
• pseudo-inverses, linear least squares
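A short sketch of a few of these operations on a small, made-up 2×2 system:

```python
import numpy as np
import numpy.linalg as la

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])

x = la.solve(A, b)              # solves A x = b
A_inv = la.inv(A)
rank = la.matrix_rank(A)
eigenvalues, eigenvectors = la.eig(A)   # column i pairs with eigenvalues[i]
```

For solving equations, la.solve is generally preferable to multiplying by the explicit inverse.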
Additional Linear Algebra in scipy
• import scipy.linalg
• LU, QR, polar, etc. decomposition
• matrix exponentials and logarithms
• matrix equation solvers
• special operations for banded matrices
Where is my Gaussian Elimination?
• used in lots of school linear algebra
• but not the most efficient algorithm
• a few problems with numerical stability
• not directly available in numpy
Numeric Stability
• floats are imprecise / approximate
• multiplication is not associative
• iteration amplifies the errors
0.1**2 == 0.01 # False
1 / ( 0.1**2 - 0.01 ) # 5.8·10^17
a = (0.1 * 0.1) * 10
b = 0.1 * (0.1 * 10)
1 / ( a - b ) # 7.21·10^16
LU Decomposition
• decompose matrix A into simpler factors
• PA = LU where
o P is a permutation matrix
o L is a lower triangular matrix
o U is an upper triangular matrix
• fast and numerically stable
Uses for LU
• equations, determinant, inversion,...
• e.g. $\det(A) = \det(P^{-1}) \cdot \det(L) \cdot \det(U)$
o where $\det(U) = \prod_{i=1}^{n} u_{ii}$
o and $\det(L) = \prod_{i=1}^{n} l_{ii}$
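The determinant formula can be checked numerically. This is a sketch using scipy.linalg.lu on a made-up matrix. Note that scipy returns factors such that A = P L U, which differs from the PA = LU convention above; the determinant is unaffected, because a permutation matrix and its inverse have the same determinant (+1 or -1).

```python
import numpy as np
import scipy.linalg

A = np.array([[2.0, 1.0, 1.0],
              [4.0, 3.0, 3.0],
              [8.0, 7.0, 9.0]])

# scipy convention: A = P @ L @ U
P, L, U = scipy.linalg.lu(A)

# L has a unit diagonal, so det(L) = 1 and only P and U contribute
det_from_lu = np.linalg.det(P) * np.prod(np.diag(U))
print(det_from_lu, np.linalg.det(A))
```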
Numeric Math
• float arithmetic is messy but incredibly fast
• measured data is approximate anyway
• stable algorithms exist for many things
o and are available from libraries
• we often don't care about exactness
o think computer graphics, signal analysis,...
Symbolic Math
• numeric math sucks for 'textbook' math
• there are problems where exactness matters
o pure math and theoretical physics
• incredibly slow computation
o but much cleaner interpretation
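The difference between approximate and exact arithmetic can be seen without any external library. The sketch below uses fractions.Fraction from the standard library as a stand-in for a symbolic package; it covers only rationals, not roots or symbols.

```python
from fractions import Fraction

tenth = Fraction(1, 10)

exact = tenth**2 == Fraction(1, 100)      # rationals: exactly equal
approximate = 0.1**2 == 0.01              # floats: off by a rounding error

# exact multiplication is associative
a = (tenth * tenth) * 10
b = tenth * (tenth * 10)
print(exact, approximate, a == b)
```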
Linear Algebra in sympy
• uses exact math
o e.g. arbitrary precision rationals
o and roots thereof
o and many other computable numbers
• wide repertoire of functions
o including LU, QR, etc. decompositions
Exact Rationals in sympy
from sympy import *
a = QQ( 1 ) / 10 # QQ = rationals
Matrix( [ [ sqrt( a**3 ), 0, 0 ],
[ 0, sqrt( a**3 ), 0 ],
[ 0, 0, 1 ] ] ).det()
# result: 1/1000
numpy for Comparison
import numpy as np
import numpy.linalg as la
a = 0.1
la.det( [ [ np.sqrt( a**3 ), 0, 0 ],
[ 0, np.sqrt( a**3 ), 0 ],
[ 0, 0, 1 ] ] ) # result: 0.0010000000000000002
General Solutions in Symbolic Math
from sympy import *
x = symbols( 'x' )
Matrix( [ [ x, 0, 0 ],
[ 0, 1, 0 ],
[ 0, 0, x ] ] ).det()
# result: x ** 2
Symbolic Differentiation
x = symbols( 'x' )
diff( x**2 + 2*x + log( x/2 ) )
# result: 2*x + 2 + 1/x
diff( x**2 * exp(x) )
# result: x**2 * exp( x ) + 2 * x * exp( x )
Algebraic Equations
solve( x**2 - 7 )
# result: [ -sqrt( 7 ), sqrt( 7 ) ]
solve( x**2 - exp( x ) )
# result: [ -2 * LambertW( -1/2 ) ]
solve( x**4 - x )
# result: [ 0, 1, -1/2 - sqrt(3) * I/2,
# -1/2 + sqrt(3) * I/2 ] ; I**2 = -1
Ordinary Differential Equations
f = Function( 'f' )
dsolve( f( x ).diff( x ) ) # f'(x) = 0
# result: Eq( f( x ), C1 )
dsolve( f( x ).diff( x ) - f( x ) ) # f'(x) = f(x)
# result: Eq( f( x ), C1 * exp( x ) )
dsolve( f( x ).diff( x ) + f( x ) ) # f'(x) = -f(x)
# result: Eq( f( x ), C1 * exp( -x ) )
Symbolic Integration
integrate( x**2 )
# result: x**3 / 3
integrate( log( x ) )
# result: x * log( x ) - x
integrate( cos( x ) ** 2 )
# result: x/2 + sin( x ) * cos( x ) / 2
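One way to sanity-check a symbolic integral is to differentiate it again. The following sketch does this for the three integrands above; the loop and the names antiderivative and residues are illustrative additions, not part of the original slide.

```python
from sympy import symbols, cos, log, diff, integrate, simplify

x = symbols('x')

residues = []
for expr in (x**2, log(x), cos(x)**2):
    antiderivative = integrate(expr, x)
    # differentiating the antiderivative should give back the integrand
    residues.append(simplify(diff(antiderivative, x) - expr))

print(residues)
```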
Numeric Sparse Matrices
• sparse = most elements are 0
• available in scipy.sparse
• special data types (not numpy arrays)
o do not use numpy functions on those
• less general, but more compact and faster
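A minimal sketch of a sparse matrix in CSR format, built from made-up coordinate data. The product uses the matrix's own @ operator rather than a numpy function, in line with the advice above.

```python
import numpy as np
from scipy import sparse

n = 1000
rows = np.arange(n)
cols = np.arange(n)
data = np.full(n, 2.0)

# only the n diagonal entries are stored, not all n * n elements
m = sparse.csr_matrix((data, (rows, cols)), shape=(n, n))

v = np.ones(n)
product = m @ v        # sparse matrix times dense vector
print(m.nnz, product[:3])
```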
Fourier Transform
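• continuous: $\hat{f}(\xi) = \int_{-\infty}^{\infty} f(x) \exp(-2\pi i x \xi) \, dx$
• series: $f(x) = \sum_{n=-\infty}^{\infty} c_n \exp\left(\frac{i 2\pi n x}{P}\right)$
• real series: $f(x) = \frac{a_0}{2} + \sum_{n=1}^{\infty} \left( a_n \sin\left(\frac{2\pi n x}{P}\right) + b_n \cos\left(\frac{2\pi n x}{P}\right) \right)$
o (complex) coefficients: $c_n = \frac{1}{2}(a_n - i b_n)$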
Discrete Fourier Transform
• available in numpy.fft
• goes between time and frequency domains
• a few different variants are covered
o real-valued input (for signals, rfft)
o inverse transform (ifft, irfft)
o multiple dimensions (fft2, fftn)
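A small sketch of moving between the two domains, assuming a synthetic 5 Hz sine sampled 64 times over one second (the signal is invented for illustration):

```python
import numpy as np

n = 64
t = np.arange(n) / n                      # one second sampled at 64 Hz
signal = np.sin(2 * np.pi * 5 * t)        # a pure 5 Hz tone

spectrum = np.fft.rfft(signal)            # real input -> n // 2 + 1 bins
peak = int(np.argmax(np.abs(spectrum)))   # index of the strongest frequency

restored = np.fft.irfft(spectrum, n)      # back to the time domain
print(peak)
```

With a one-second window the bin index equals the frequency in Hz, so the peak lands in bin 5.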
Polynomial Series
• the numpy.polynomial package
• Chebyshev, Hermite, Laguerre and Legendre
o arithmetic, calculus and special-purpose operations
o numeric integration using Gaussian quadrature
o fitting (polynomial regression)
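The three groups of operations can be sketched on a made-up quadratic. Gauss-Legendre quadrature is picked here as one example of Gaussian quadrature; the other polynomial families provide their own variants.

```python
import numpy as np
from numpy.polynomial import Polynomial, Chebyshev
from numpy.polynomial.legendre import leggauss

p = Polynomial([1, 2, 3])          # 1 + 2x + 3x^2, lowest degree first
dp = p.deriv()                     # 2 + 6x

# Gauss-Legendre quadrature: 3 nodes are exact up to degree 5 on [-1, 1]
nodes, weights = leggauss(3)
integral = np.sum(weights * p(nodes))    # exact value is 4

# least-squares fit of a Chebyshev series to samples of p
xs = np.linspace(-1, 1, 20)
fit = Chebyshev.fit(xs, p(xs), deg=2)
```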
Part 12: Statistics
Statistics in numpy
• a basic statistical toolkit
o averages, medians
o variance, standard deviation
o histograms
• random sampling and distributions
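A quick tour of this toolkit on a small invented data set. The seed and the distribution parameters are arbitrary choices.

```python
import numpy as np

data = np.array([2, 4, 4, 4, 5, 5, 7, 9])

mean = np.mean(data)
median = np.median(data)
variance = np.var(data)       # population variance (ddof=0 by default)
deviation = np.std(data)

# 4 equally wide bins over the interval [1, 9]
counts, edges = np.histogram(data, bins=4, range=(1, 9))

rng = np.random.default_rng(seed=0)       # seeded for repeatability
sample = rng.normal(loc=5.0, scale=2.0, size=1000)
```

Passing ddof=1 to np.var or np.std gives the sample (rather than population) estimate.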
Linear Regression
• very fast model-fitting method
o both in computational and human terms
o quick and dirty first approximation
• widely used in data interpretation
o biology and sociology statistics
o finance and economics, especially prediction
Polynomial Regression
• higher-order variant of linear regression
• can capture acceleration or deceleration
• harder to use and interpret
o also harder to compute
• usually requires a model of the data
Interpolation
• find a line or curve that approximates data
• it must pass through the data points
o this is a major difference to regression
• more dangerous than regression
o runs a serious risk of overfitting
Linear and Polynomial Regression, Interpolation
• regressions using the least squares method
o linear: numpy.linalg.lstsq
o polynomial: numpy.polyfit
• interpolation: scipy.interpolate
o e.g. piecewise cubic splines
o Lagrange interpolating polynomials
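The two regression calls can be sketched on synthetic, noise-free data, so the fitted coefficients come out (numerically) exact. Real data would of course be noisy. Interpolation with scipy.interpolate is not shown here.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0                  # noise-free line, so the fit is exact

# linear regression: solve [x 1] @ [slope, intercept] ~ y by least squares
design = np.column_stack([x, np.ones_like(x)])
(slope, intercept), *_ = np.linalg.lstsq(design, y, rcond=None)

# polynomial regression of degree 2 on samples of a parabola
coeffs = np.polyfit(x, x**2 - x, deg=2)     # highest degree first
print(slope, intercept, coeffs)
```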
Pandas: Data Analysis
• the Python equivalent of R
o works with tabular data (CSV, SQL, Excel)
o time series (also variable frequency)
o primarily works with floating-point values
• partially implemented in C and Cython
Pandas Series and DataFrame
• Series is a single sequence of numbers
• DataFrame represents tabular data
o powerful indexing operators
o index by column → series
o index by condition → filtering
Pandas Example
import pandas as pd
scores = [ ('Maxine', 12), ('John', 12),
           ('Sandra', 10) ]
cols = [ 'name', 'score' ]
df = pd.DataFrame( data=scores, columns=cols )
df['score'].max() # 12
df[ df['score'] >= 12 ] # Maxine and John