PV248 Python
Petr Ročkai
December 5, 2019
Programming vs Languages
• python is unobtrusive (by design)
• if you can program, you can program in python
• there are idiosyncrasies (of course)
• but you will mostly get by
Programming vs Jobs
• we all want to write beautiful programs
  ◦ but you didn't sleep for 2 nights
  ◦ and this thing is going into production tomorrow
• sometimes you get a chance to clean up later
  ◦ and sometimes you don't
Engineering Flowchart
• does it move?
  ◦ yes: should it? yes → no problem, no → duct tape
  ◦ no: should it? no → no problem, yes → WD 40
Python makes for decent duct tape and WD 40.
In This Course
• you will not learn to write beautiful programs
• we will try to do things with minimum effort
  ◦ perfect is the enemy of good
• ugly comes in shades
  ◦ you should always write passable code
  ◦ there is a balance to strike
... ugly, cont'd
• there are two main schools of writing software
  ◦ do the right thing
  ◦ worse is better
• https://www.jwz.org/doc/worse-is-better.html
The Right Thing
• simplicity: interface first, implementation second
• correctness: required
• consistency: required
• completeness: more important than simplicity
Worse is Better
• simplicity: implementation first
• correctness: simplicity goes first
• consistency: less important than both
• completeness: least important
Design Schools
• there are pros and cons to both
• right thing is often expensive
• worse is better often wins
• which one do you think python belongs to?
Disclaimer
• I am not a python programmer
• please don't ask sneaky language-lawyer questions
Goals
• learn to use python in practical situations
• have a look at existing packages and what they can do
• code up some cool stuff, have fun
Organisation
• the lecture and the seminars are every other week
• that's 7 lectures + 7 seminars
• there will be 6 homework assignments
• seminar attendance is semi-compulsory
Homework Grading
• grading will be fully automatic
  ◦ performed every Thursday at midnight
  ◦ starting 7 days after the assignment is given
• assignments are binary: pass/fail
• passing tests early gets you bonus points
Obtaining Points
• you can get up to
  ◦ 12 points for assignments
  ◦ 6 points for passing tests early
  ◦ 2 points for seminar attendance
  ◦ 3 points for peer reviews
  ◦ 1 point for activity in the seminar
• you need 16 points to pass
Semester Plan
1. Object and Memory Model
2. Text, JSON, SQL and Persistence
3. Advanced Language Constructs
4. Numeric & Symbolic Math, Statistics
5. Communication, HTTP, asyncio
6. Testing, Debugging, Profiling & Pitfalls
7. Quantum Computation
Part 1: Object and Memory Model
Objects
• the basic 'unit' of OOP
• they bundle data and behaviour
• provide encapsulation
• make code re-use easier
• also known as 'instances'
Classes
• templates for objects (class Foo: pass)
• each (python) object belongs to a class
• classes themselves are also objects
• calling a class creates an instance
  ◦ my_foo = Foo()
Poking at Classes
• {}.__class__
• {}.__class__.__class__
• (0).__class__
• [].__class__
• compare type(0), etc.
• n = numbers.Number(); n.__class__
Types vs Objects
• class system is a type system
• 'duck typing': quacks, walks like a duck
• since python 3, types are classes
• everything is dynamic in python
  ◦ you can create new classes at runtime
  ◦ you can pass classes as function parameters
Encapsulation
• objects hide implementation details
• classic types structure data
  ◦ objects also structure behaviour
• facilitates weak coupling
Weak Coupling
• coupling is a degree of interdependence
• more coupling makes things harder to change
  ◦ it also makes reasoning harder
• good programs are weakly coupled
• cf. modularity, composability
Polymorphism
• objects are (at least in python) polymorphic
• different implementation, same interface
• only the interface matters for composition
• facilitates genericity and code re-use
• cf. 'duck typing'
Generic Programming
• code re-use often saves time
  ◦ not just coding but also debugging
  ◦ re-usable code often couples weakly
• but not everything that can be re-used should be
  ◦ code can be too generic
  ◦ and too hard to read
Attributes
• data members of objects
• each instance gets its own copy
• like variables scoped to object lifetime
• they get names and values
Methods
• functions (procedures) tied to objects
• they can access the object (self)
• implement the behaviour of the object
• their signatures (usually) provide the interface
• methods are also objects
Class and Instance Methods
• methods are usually tied to instances
• recall that classes are also objects
• class methods work on the class (cls)
• static methods are just namespaced functions
• decorators @classmethod, @staticmethod
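The difference between the three method kinds is easiest to see side by side. A minimal sketch follows; the Temperature class and its method names are made up for illustration and are not part of the lecture.

```python
class Temperature:
    """Hypothetical example class, used only for illustration."""

    def __init__(self, celsius):
        # a regular (instance) method: receives the instance as self
        self.celsius = celsius

    @classmethod
    def from_fahrenheit(cls, f):
        # a class method: receives the class as cls, so a subclass
        # calling this gets an instance of the subclass
        return cls(cls.to_celsius(f))

    @staticmethod
    def to_celsius(f):
        # a static method: no self or cls, just a function that
        # lives in the namespace of the class
        return (f - 32) * 5 / 9


t = Temperature.from_fahrenheit(212)
print(t.celsius)
```

Class methods are a common way to provide alternative constructors, since the class body can only hold one __init__.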
Inheritance
• class Ellipse( Shape ): ...
• usually encodes an is-a relationship
Multiple Inheritance
• more than one base class is possible
• many languages restrict this
• python allows general M-I
  ◦ class Bat( Mammal, Winged ): pass
• 'true' M-I is somewhat rare
  ◦ typical use cases: mixins and interfaces
Mixins
• used to pull in implementation
  ◦ not part of the is-a relationship
  ◦ by convention, not enforced by the language
• common bits of functionality
  ◦ e.g. implement __gt__, __eq__ &c. using __lt__
  ◦ you only need to implement __lt__ in your class
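A mixin of the kind described above could look roughly like this. It is only a sketch: OrderingMixin and Version are hypothetical names, and the standard library already ships a similar helper (functools.total_ordering), which is usually the better choice in real code.

```python
class OrderingMixin:
    # Pulls in implementation only: every operator below is derived
    # from __lt__, which the host class is expected to provide.
    def __gt__(self, other):
        return other < self

    def __le__(self, other):
        return not other < self

    def __ge__(self, other):
        return not self < other

    def __eq__(self, other):
        return not self < other and not other < self


class Version(OrderingMixin):
    # hypothetical host class: only __lt__ is written by hand
    def __init__(self, major, minor):
        self.key = (major, minor)

    def __lt__(self, other):
        return self.key < other.key


print(Version(1, 2) > Version(1, 1))
print(Version(1, 0) == Version(1, 0))
```

Note that defining __eq__ without __hash__ makes the instances unhashable, which is acceptable for a sketch but matters if they should go into a set or dict.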
Interfaces
• realized as 'abstract' classes in python
  ◦ just raise NotImplementedError
  ◦ document the intent in a docstring
• participates in is-a relationships
• partially displaced by duck typing
  ◦ more important in other languages (think Java)
Composition
• attributes of objects can be other objects
  ◦ (also, everything is an object in python)
• encodes a has-a relationship
  ◦ a circle has a center and a radius
  ◦ a circle is a shape
Constructors
• this is the __init__ method
• initializes the attributes of the instance
• can call superclass constructors explicitly
  ◦ not called automatically (unlike C++, Java)
  ◦ MySuperClass.__init__( self )
  ◦ super().__init__ (if unambiguous)
Class and Object Dictionaries
• most objects are basically dictionaries
• try e.g. foo.__dict__ (for a suitable foo)
• saying foo.x means foo.__dict__["x"]
  ◦ if that fails, type(foo).__dict__["x"] follows
  ◦ then superclasses of type(foo), according to MRO
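The lookup order can be observed directly. The sketch below uses made-up classes Base and Foo; it deliberately ignores descriptors and __getattr__, which complicate the real rules.

```python
class Base:
    z = "from Base"


class Foo(Base):
    y = "from Foo"

    def __init__(self):
        self.x = "from the instance"


foo = Foo()

# only x lives in the instance dictionary
print(foo.__dict__)

# y is found in the dictionary of the class
print("y" in foo.__dict__, type(foo).__dict__["y"])

# z is found by walking the MRO up to Base
print("z" in Foo.__dict__, Base.__dict__["z"])
print(Foo.__mro__)
```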
Writing Classes
class Person:
    def __init__( self, name ):
        self.name = name
    def greet( self ):
        print( "hello " + self.name )

p = Person( "you" )
p.greet()
Modules in Python
• modules are just normal .py files
• import executes a file by name
  ◦ it will look into system-defined locations
  ◦ the search path includes the current directory
  ◦ they typically only define classes & functions
• import sys → lets you use sys.argv
• from sys import argv → you can write just argv
Functions
• top-level functions/procedures are possible
• they are usually 'scoped' via the module system
• functions are also objects
  ◦ try print.__class__ (or type(print))
• some functions are built in (print, len, ...)
Memory
• most program data is stored in 'memory'
  ◦ an array of byte-addressable data storage
  ◦ address space managed by the OS
  ◦ 32 or 64 bit numbers as addresses
• typically backed by RAM
Language vs Computer
• programs use high-level concepts
  ◦ objects, procedures, closures
  ◦ values can be passed around
• the computer has a single array of bytes
  ◦ and, well, a bunch of registers
Memory Management
• deciding where to store data
• high-level objects are stored in flat memory
  ◦ they have a given (usually fixed) size
  ◦ can contain references to other objects
  ◦ have limited lifespan
Memory Management Terminology
• object: an entity with an address and size
  ◦ not the same as language-level object
• lifetime: when is the object valid
  ◦ live: references exist to the object
  ◦ dead: the object is unreachable (garbage)
Memory Management by Type
• manual: malloc and free in C
• static automatic
  ◦ e.g. stack variables in C and C++
• dynamic automatic
  ◦ pioneered by LISP, widely used
Automatic Memory Management
• static vs dynamic
  ◦ when do we make decisions about lifetime
  ◦ compile time vs run time
• safe vs unsafe
  ◦ can the program read unused memory?
Object Lifetime
• the time between malloc and free
• another view: when is the object needed
  ◦ often impossible to tell
  ◦ can be safely over-approximated
  ◦ at the expense of memory leaks
Static Automatic
• usually binds lifetime to lexical scope
• no passing references up the call stack
  ◦ may or may not be enforced
• no lexical closures
• examples: C, C++
Dynamic Automatic
• over-approximate lifetime dynamically
• usually easiest for the programmer
  ◦ until you need to debug a space leak
• reference counting, mark & sweep collectors
• examples: Java, almost every dynamic language
Reference Counting
• attach a counter to each object
• whenever a reference is made, increase
• whenever a reference is lost, decrease
• the object is dead when the counter hits 0
• fails to reclaim reference cycles
Mark and Sweep
• start from a root set (in-scope variables)
• follow references, mark every object encountered
• sweep: throw away all unmarked memory
• usually stops the program while running
• garbage is retained until the GC runs
Memory Management in CPython
• primarily based on reference counting
• optional mark & sweep collector
  ◦ enabled by default
  ◦ configure via import gc
  ◦ reclaims cycles
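The division of labour between the two mechanisms can be demonstrated with a small reference cycle. This is a CPython-specific sketch: the Node class is made up, and a weak reference serves as a probe because it does not keep the object alive.

```python
import gc
import weakref


class Node:
    pass


gc.disable()                 # keep the cycle collector from running on its own

a, b = Node(), Node()
a.other, b.other = b, a      # a reference cycle: a -> b -> a
probe = weakref.ref(a)       # does not contribute to the reference count

del a, b                     # unreachable now, but the counters never hit 0
alive_before = probe() is not None

gc.collect()                 # an explicit run of the cycle collector
alive_after = probe() is not None

gc.enable()
print(alive_before, alive_after)
```

With plain reference counting alone, the two nodes would stay in memory until the interpreter exits.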
Refcounting Advantages
• simple to implement in a 'managed' language
• reclaims objects quickly
• no need to pause the program
• easily made concurrent
Refcounting Problems
• significant memory overhead
• problems with cache locality
• bad performance for data shared between threads
• fails to reclaim cyclic structures
Data Structures
• an abstract description of data
• leaves out low-level details
• makes writing programs easier
• makes reading programs easier, too
Building Data Structures
• there are two kinds of types in python
  ◦ built-in, implemented in C
  ◦ user-defined (includes libraries)
• both kinds are based on objects
  ◦ but built-ins only look that way
Mutability
• some objects can be modified
  ◦ we say they are mutable
  ◦ otherwise, they are immutable
• immutability is an abstraction
  ◦ physical memory is always mutable
• in python, immutability is not 'recursive'
Built-in: int
• arbitrary precision integer
  ◦ no overflows and other nasty behaviour
• it is an object, i.e. held by reference
  ◦ uniform with any other kind of object
  ◦ immutable
• both of the above make it slow
  ◦ machine integers only in C-based modules
Additional Numeric Objects
• bool: True or False
  ◦ how much is True + True?
  ◦ is 0 true? is empty string?
• numbers.Real: floating point numbers
• numbers.Complex: a pair of above
Built-in: bytes
• a sequence of bytes (raw data)
• exists for efficiency reasons
  ◦ in the abstract is just a tuple
• models data as stored in files
  ◦ or incoming through a socket
  ◦ or as stored in raw memory
Properties of bytes
• can be indexed and iterated
  ◦ both create objects of type int
  ◦ try this sequence: id(x[1]), id(x[2])
• mutable version: bytearray
  ◦ the equivalent of C char arrays
Built-in: str
• immutable Unicode strings
  ◦ not the same as bytes
  ◦ bytes must be decoded to obtain str
  ◦ (and str encoded to obtain bytes)
• stored as fixed-width units (1, 2 or 4 bytes per code point) in CPython
  ◦ implemented in PyCompactUnicodeObject, which can also cache a UTF-8 copy
Built-in: tuple
• an immutable sequence type
  ◦ the number of elements is fixed
  ◦ so is the type of each element
• but elements themselves may be mutable
  ◦ x = [] then y = (x, 0)
  ◦ x.append(1) → y == ([1], 0)
• implemented as a C array of object references
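The non-recursive nature of tuple immutability is easy to try out. The sketch below repeats the x/y example and adds two checks: slots of the tuple cannot be rebound, and a tuple holding a list is not hashable.

```python
x = []
y = (x, 0)

x.append(1)          # mutates the list that y refers to
print(y)             # the tuple 'changed' without being modified itself

try:
    y[1] = 5         # rebinding a slot of the tuple is not allowed
    rebinding_failed = False
except TypeError:
    rebinding_failed = True

try:
    hash(y)          # hashing recurses into the elements
    hashable = True
except TypeError:
    hashable = False

print(rebinding_failed, hashable)
```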
Built-in: list
• a mutable version of tuple
  ◦ items can be assigned: x[3] = 5
  ◦ items can be append-ed
• implemented as a dynamic array
  ◦ many operations are amortised O(1)
  ◦ insert is O(n)
Built-in: dict
• implemented as a hash table
• some of the most performance-critical code
  ◦ dictionaries appear everywhere in python
  ◦ heavily hand-tuned C code
• both keys and values are objects
Hashes and Mutability
• dictionary keys must be hashable
  ◦ this implies recursive immutability
• what would happen if a key is mutated?
  ◦ most likely the hash would change
  ◦ all hash tables with the key become invalid
  ◦ this would be very expensive to fix
Built-in: set
• implements the math concept of a set
• also a hash table, but with keys only
  ◦ a separate C implementation
• mutable - items can be added
  ◦ but they must be hashable
  ◦ hence cannot be changed
Built-in: frozenset
• an immutable version of set
• always hashable (since all items must be)
  ◦ can appear in set or another frozenset
  ◦ can be used as a key in dict
• the C implementation is shared with set
Efficient Objects: __slots__
• fixes the attribute names allowed in an object
• saves memory: consider 1-attribute object
  ◦ with __dict__: 56 + 112 bytes
  ◦ with __slots__: 48 bytes
• makes code faster: no need to hash anything
  ◦ more compact in memory → better cache efficiency
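The effect of __slots__ can be checked with sys.getsizeof. In the sketch below, WithDict and WithSlots are hypothetical classes; the exact byte counts depend on the CPython version and platform, so the numbers from the slide may not reproduce exactly.

```python
import sys


class WithDict:
    def __init__(self):
        self.x = 1


class WithSlots:
    __slots__ = ("x",)       # the only attribute name allowed

    def __init__(self):
        self.x = 1


d, s = WithDict(), WithSlots()

# size of the object itself, plus the separate dictionary where one exists
print(sys.getsizeof(d), sys.getsizeof(d.__dict__))
print(sys.getsizeof(s))

try:
    s.y = 2                  # not listed in __slots__
    rejected = False
except AttributeError:
    rejected = True
print(rejected)
```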
Part 2: Text, JSON, SQL and Persistence
Transient Data
• lives in program memory
• data structures, objects
• interpreter state
• often implicit manipulation
• more on this next week
Persistent Data
• (structured) text or binary files
• relational (SQL) databases
• object and 'flat' databases (NoSQL)
• manipulated explicitly
Persistent Storage
• 'local' file system
  ◦ stored on HDD, SSD, ...
  ◦ stored somewhere in a local network
• 'remote', using an application-level protocol
  ◦ local or remote databases
  ◦ cloud storage &c.
Reading Files
• opening files: open('file.txt', 'r')
• files can be iterated
f = open( 'file.txt', 'r' )
for line in f:
    print( line )
Resource Acquisition
• plain open is prone to resource leaks
  ◦ what happens during an exception?
  ◦ holding a file open is not free
• pythonic solution: with blocks
  ◦ defined in PEP 343
  ◦ binds resources to scopes
Detour2: PEP
• PEP stands for Python Enhancement Proposal
• akin to RFC documents managed by IETF
• initially formalise future changes to python
  ◦ later serve as documentation for the same
Using with
with open('/etc/passwd', 'r') as f:
    for line in f:
        do_stuff( line )
• still safe if do_stuff raises an exception
Finalizers
• there is a __del__ method
• but it is not guaranteed to run
  ◦ it may run arbitrarily late
  ◦ or never
• not very good for resource management
Context Managers
• with has an associated protocol
• you can use with on any context manager
• which is an object with __enter__ and __exit__
• you can create your own
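Writing your own context manager takes only the two methods named above. In this sketch, the Resource class and the log list are invented for illustration; the point is that __exit__ runs even when the body raises.

```python
class Resource:
    """Hypothetical resource that records what happens to it."""

    def __init__(self, log):
        self.log = log

    def __enter__(self):
        self.log.append("acquire")
        return self              # this is what 'as r' gets bound to

    def __exit__(self, exc_type, exc_value, traceback):
        self.log.append("release")
        return False             # do not swallow the exception


log = []
try:
    with Resource(log) as r:
        log.append("work")
        raise RuntimeError("boom")
except RuntimeError:
    log.append("handled")

print(log)   # ['acquire', 'work', 'release', 'handled']
```

For simple cases, contextlib.contextmanager from the standard library lets you write the same thing as a generator function.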
Representing Text
• ASCII: one byte = one character
  ◦ total of 128 different characters
  ◦ not very universal
• 8-bit encodings: 256 characters
• multi-byte encodings for non-Latin scripts
Unicode
• one character encoding to rule them all
• supports all extant scripts and writing systems
  ◦ and a whole bunch of dead scripts, too
• collation, segmentation, comparison, ...
• approx. 137000 code points
Code Point
• basic unit of encoding characters
• letters, punctuation, symbols
• combining diacritical marks
• not the same thing as a character
• code points range from 0 to 10FFFF
Unicode Encodings
• deals with representing code points
• UCS = Universal Coded Character Set
  ◦ fixed-length encoding
  ◦ two variants: UCS-2 (16 bit) and UCS-4 (32 bit)
• UTF = Unicode Transformation Format
  ◦ variable-length encoding
  ◦ variants: UTF-8, UTF-16 and UTF-32
Grapheme
• technically 'extended grapheme cluster'
• a logical character, as expected by users
  ◦ encoded using 1 or more code points
• multiple encodings of the same grapheme
  ◦ e.g. composed vs decomposed
  ◦ U+0041 U+0300 vs U+00C0: À vs À
Segmentation
• breaking text into smaller units
  ◦ graphemes, words and sentences
• algorithms defined by the Unicode spec
  ◦ Unicode Standard Annex #29
  ◦ graphemes and words are quite reliable
  ◦ sentences not so much (too much ambiguity)
Normal Form
• Unicode defines 4 canonical (normal) forms
  ◦ NFC, NFD, NFKC, NFKD
  ◦ NFC = Normal Form Composed
  ◦ NFD = Normal Form Decomposed
• K variants = looser, lossy conversion
• all normalization is idempotent
• NFC does not give you 1 code point per grapheme
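The normal forms can be explored with the standard unicodedata module. The sketch below reuses the À example; the 'q with tilde' string is an assumption chosen because, as far as I know, it has no precomposed code point, so NFC leaves it as two code points.

```python
import unicodedata

s = "\u0041\u0300"     # decomposed: A + combining grave accent
t = "\u00c0"           # composed: a single code point

print(s == t)          # False, although both render as the same grapheme

nfc = unicodedata.normalize("NFC", s)
nfd = unicodedata.normalize("NFD", t)
print(nfc == t, nfd == s)

# normalizing twice changes nothing (idempotence)
print(unicodedata.normalize("NFC", nfc) == nfc)

# one grapheme, but still two code points after NFC
stubborn = unicodedata.normalize("NFC", "q\u0303")
print(len(nfc), len(stubborn))
```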
str vs bytes
• iterating bytes gives individual bytes
  ◦ indexing is fast - fixed-size elements
• iterating str gives code points
  ◦ slightly slower, because code points can be wider than one byte
  ◦ does not iterate over graphemes
• going back and forth: str.encode, bytes.decode
Python vs Unicode
• no native support for Unicode segmentation
  ◦ hence no grapheme iteration or word splitting
• convert everything into NFC and hope for the best
  ◦ unicodedata.normalize()
  ◦ will sometimes break (we'll discuss regexes in a bit)
  ◦ most people don't bother
  ◦ correctness is overrated → worse is better
Regular Expressions
• compiling: r = re.compile( r"key: (.*)" )
• matching: m = r.match( "key: some value" )
• extracting captures: print( m.group( 1 ) )
  ◦ prints some value
• substitutions: s2 = re.sub( r"\s*$", "", s1 )
  ◦ strips all trailing whitespace in s1
Detour: Raw String Literals
• the r in r"..." stands for raw (not regex)
• normally, \ is magical in strings
  ◦ but \ is also magical in regexes
  ◦ nobody wants to write \\s &c.
  ◦ not to mention \\\\ to match a literal \
• not super useful outside of regexes
Detour2: Other Literal Types
• byte strings: b"abc" -> bytes
• formatted string literals: f"x {y}"
x = 12
print( f"x = {x}" )
• triple-quote literals: """x
  y"""
Regular Expressions vs Unicode
import re
s = "\u0041\u0300" # À (decomposed)
t = "\u00c0" # À (composed)
print( s, t )
print( re.match( "..", s ), re.match( "..", t ) )
print( re.match( r"\w+$", s ), re.match( r"\w+$", t ) )
print( re.match( "À", s ), re.match( "À", t ) )
Regexes and Normal Forms
• some of the problems can be fixed by NFC
  ◦ some go away completely (literal Unicode matching)
  ◦ some become rarer (the "." and "\w" problems)
• most text in the wild is already in NFC
  ◦ but not all of it
  ◦ case in point: filenames on macOS (NFD)
Decomposing Strings
• recall that str is immutable
• splitting: str.split(':')
  ◦ None = split on any whitespace
• split on first delimiter: partition
• better whitespace stripping: s2 = s1.strip()
  ◦ also lstrip() and rstrip()
Searching and Matching
• startswith and endswith
  ◦ often convenient shortcuts
• find ≈ index
  ◦ generic substring search
  ◦ find returns -1 on failure, index raises ValueError
Building Strings
• format literals and str.format
• str.replace - substring search and replace
• str.join - turn lists of strings into a string
JSON
• structured, text-based data format
• atoms: integers, strings, booleans
• objects (dictionaries), arrays (lists)
• widely used around the web &c.
• simple (compared to XML or YAML)
JSON: Example
{
    "composer": [ "Bach, Johann Sebastian" ],
    "key": "g",
    "voices": {
        "1": "oboe",
        "2": "bassoon"
    }
}
JSON: Writing
• printing JSON seems straightforward enough
• but: double quotes in strings
• strings must be properly \-escaped during output
• also pesky commas
• keeping track of indentation for human readability
• better use an existing library: import json
JSON in Python
• json.dumps = short for dump to string
• python dict/list/str/... data comes in
• a string with valid JSON comes out
Workflow
• just convert everything to dict and list
• run json.dumps or json.dump( data, file )
Python Example
import json, sys
d = {}
d["composer"] = ["Bach, Johann Sebastian"]
d["key"] = "g"
d["voices"] = { 1: "oboe", 2: "bassoon" }
json.dump( d, sys.stdout, indent=4 )
Beware: keys are always strings in JSON
Parsing JSON
• import json
• json.load is the counterpart to json.dump from above
  ◦ de-serialise data from an open file
  ◦ builds lists, dictionaries, etc.
• json.loads corresponds to json.dumps
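A round trip through json.dumps and json.loads shows both the convenience and the key caveat mentioned earlier. The sketch reuses the composer data from the example slide; the variable names are arbitrary.

```python
import json

d = {
    "composer": ["Bach, Johann Sebastian"],
    "key": "g",
    "voices": {1: "oboe", 2: "bassoon"},   # integer keys on the python side
}

text = json.dumps(d, indent=4)    # python data in, JSON string out
back = json.loads(text)           # JSON string in, python data out

print(back["voices"])             # the keys came back as strings
print(back == d)                  # so the round trip is not exact

# quoting and escaping is handled by the library
tricky = 'say "hi"'
print(json.dumps(tricky))
```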
XML
• meant as a lightweight and consistent redesign of SGML
  ◦ turned into a very complex format
• heaps of invalid XML floating around
  ◦ parsing real-world XML is a nightmare
  ◦ even valid XML is pretty challenging
XML: Example
<Order OrderDate="1999-10-20">
Ellen Adams 123 Maple Street
Lawnmower
</Order>
XML: Another Example
<OBSAH>25 bodů</OBSAH> 72873
20160111104208 395879
XML Features
• offers extensible, rich structure
  ◦ tags, attributes, entities
  ◦ suited for structured hierarchical data
• schemas: use XML to describe XML
  ◦ allows general-purpose validators
  ◦ self-documenting to a degree
XML vs JSON
• both work best with trees
• JSON has basically no features
  ◦ basic data structures and that's it
• JSON data is ad-hoc and usually undocumented
  ◦ but: this often happens with XML anyway
XML Parsers
• DOM = Document Object Model
• SAX = Simple API for XML
• expat = fast SAX-like parser (but not SAX)
• ElementTree = DOM-like but more pythonic
XML: DOM
• read the entire XML document into memory
• exposes the AST (Abstract Syntax Tree)
• allows things like XPath and CSS selectors
• the API is somewhat clumsy in python
XML: SAX
• event-driven XML parsing
• much more efficient than DOM
  ◦ but often harder to use
• only useful in python for huge XML files
  ◦ otherwise just use ElementTree
XML: ElementTree
for child in root:
    print( child.tag, child.attrib )
# Order {'OrderDate': '1999-10-20'}
• supports tree walking, XPath
• supports serialization too
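A self-contained version of the tree walk might look as follows. The document, including the Orders and Item element names and the second order, is made up for this sketch and is not the file used in the lecture.

```python
import xml.etree.ElementTree as ET

# hypothetical input document
doc = """<Orders>
  <Order OrderDate="1999-10-20"><Item>Lawnmower</Item></Order>
  <Order OrderDate="1999-10-22"><Item>Baby Monitor</Item></Order>
</Orders>"""

root = ET.fromstring(doc)

# tree walking: iterating an element yields its children
seen = [(child.tag, child.attrib) for child in root]
print(seen)

# a (limited) XPath subset is supported by findall
items = [item.text for item in root.findall("./Order/Item")]
print(items)

# serialization back to text
serialised = ET.tostring(root, encoding="unicode")
print(serialised)
```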
NoSQL / Non-relational Databases
• umbrella term for a number of approaches
  ◦ flat key/value and column stores
  ◦ document and graph stores
• no or minimal schemas
• non-standard query languages
Key-Value Stores
• usually very fast and very simple
• completely unstructured values
• keys are often database-global
  ◦ workaround: prefixes for namespacing
  ◦ or: multiple databases
NoSQL & Python
• redis (redis-py) module (Redis is Key-Value)
• memcached (another Key-Value store)
• PyMongo for talking to MongoDB (document-oriented)
• CouchDB (another document-oriented store)
• neo4j or cayley (module pyley) for graph structures
SQL and RDBMS
• SQL = Structured Query Language
• RDBMS = Relational DataBase Management System
• SQL is to NoSQL what XML is to JSON
• heavily used and extremely reliable
SQL: Example
select name, grade from student;
select name from student where grade < 'C';
insert into student ( name, grade ) values
( 'Random X. Student', 'C' );
select * from student
join enrollment on student.id = enrollment.student
join group on group.id = enrollment.group;
SQL: Relational Data
• JSON and XML are hierarchical
  ◦ or built from functions if you like
• SQL is relational
  ◦ relations = generalized functions
  ◦ can capture more structure
  ◦ much harder to efficiently process
SQL: Data Definition
• mandatory, unlike XML or JSON
• gives the data a rather rigid structure
• tables (relations) and columns (attributes)
• static data types for columns
• additional consistency constraints
SQL: Constraints
• help ensure consistency of the data
• foreign keys: referential integrity
  ◦ ensures there are no dangling references
  ◦ but: does not prevent accidental misuse
• unique constraints
• check constraints: arbitrary consistency checks
SQL: Query Planning
• an RDBMS makes heavy use of indexing
  ◦ using B trees, hashes and similar techniques
  ◦ indices are used automatically
• all the heavy lifting is done by the backend
  ◦ highly-optimized, low-level code
  ◦ efficient handling of large data
SQL: Reliability and Flexibility
• most RDBMS give ACID guarantees
  ◦ transparently solves a lot of problems
  ◦ basically impossible with normal files
• support for schema alterations
  ◦ alter table and similar
  ◦ nearly impossible in ad-hoc systems
SQLite
• lightweight in-process SQL engine
• the entire database is in a single file
• convenient python module, sqlite3
• stepping stone for a "real" database
Other Databases
• you can talk to most SQL DBs using python
• postgresql (psycopg2,...)
• mysql / mariadb (mysql-python, mysql-connector,...)
• big & expensive: Oracle (cx_oracle), DB2 (pyDB2)
• most of those are much more reliable than SQLite
SQL Injection
sql = "SELECT * FROM t WHERE name = '" + n + "'"
• the above code is bad, never do it
• consider the following
n = "x'; drop table students --"
n = "x'; insert into passwd (user, pass) ..."
Avoiding SQL Injection
• use proper SQL-building APIs
  ◦ this takes care of escaping internally
• templates like insert ... values (?, ?)
  ◦ the ? get safely substituted by the module
  ◦ e.g. the execute method of a cursor
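With the sqlite3 module, the ? placeholders look as follows. This is a sketch against an in-memory database; the student table layout is an assumption made for the example.

```python
import sqlite3

db = sqlite3.connect(":memory:")      # throwaway in-memory database
cur = db.cursor()
cur.execute("create table student ( name text, grade text )")

# a hostile 'name' that would break naive string concatenation
evil = "x'; drop table student --"

# values travel separately from the SQL text, so no escaping is needed
cur.execute("insert into student ( name, grade ) values ( ?, ? )",
            (evil, "C"))
db.commit()

cur.execute("select name from student where name = ?", (evil,))
rows = cur.fetchall()
print(rows)

cur.execute("select count(*) from student")
count = cur.fetchone()[0]             # the table is still there
print(count)
```

Note the trailing comma in (evil,): the parameters must be passed as a sequence, even when there is only one.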
PEP 249
• informational PEP, for library writers
• describes how database modules should behave
  ◦ ideally, all SQL modules have the same interface
  ◦ makes it easy to swap a database backend
• but: SQL itself is not 100% portable
SQL Pitfalls
• sqlite does not enforce all constraints
  ◦ you need to pragma foreign_keys = on
• no portable syntax for autoincrement keys
• not all (column) types are supported everywhere
• no portable way to get the key of last insert
More Resources & Stuff to Look Up
• SQL: https://www.w3schools.com/sql/
• https://docs.python.org/3/library/sqlite3.html
• Object-Relational Mapping
• SQLAlchemy: constructing portable SQL
Part 3: Advanced Constructs
Callable Objects
• user-defined functions (module-level def)
• user-defined methods (instance and class)
• built-in functions and methods
• class objects
• objects with a __call__ method
User-defined Functions
• come about from a module-level def
• metadata: __doc__, __name__, __module__
• scope: __globals__, __closure__
• arguments: __defaults__, __kwdefaults__
• type annotations: __annotations__
• the code itself: __code__
Positional and Keyword Arguments
• user-defined functions have positional arguments
• and keyword arguments
  ◦ print("hello", file=sys.stderr)
  ◦ arguments are passed by name
  ◦ which style is used is up to the caller
• variadic functions: def foo(*args, **kwargs)
  ◦ args is a tuple of unmatched positional args
  ◦ kwargs is a dict of unmatched keyword args
Lambdas
• def functions must have a name
• lambdas provide anonymous functions
• the body must be an expression
• syntax: lambda x: print("hello", x)
• standard user-defined functions otherwise
Instance Methods
• comes about as object.method
  ◦ print(x.foo) → <bound method ...>
• combines the class, instance and function itself
• __func__ is a user-defined function object
• let bar = x.foo, then
  ◦ x.foo() → bar.__func__(bar.__self__)
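The pieces of a bound method can be inspected directly. In this sketch, the class Foo and its method are placeholders; only __func__ and __self__ come from the text above.

```python
class Foo:
    def foo(self):
        return "foo called on " + type(self).__name__


x = Foo()
bar = x.foo                 # attribute access creates a bound method
print(bar)

# the bound method remembers the instance and the plain function
print(bar.__self__ is x)
print(bar.__func__ is Foo.__dict__["foo"])

# calling it is the same as calling the function with the instance
direct = bar()
manual = bar.__func__(bar.__self__)
print(direct == manual)
```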
Iterators
• objects with __next__ (since 3.x)
  ◦ iteration ends on raise StopIteration
• iterable objects provide __iter__
  ◦ sometimes, this is just return self
  ◦ any iterable can appear in for x in iterable
class FooIter:
    def __init__(self):
        self.x = 10
    def __iter__(self): return self
    def __next__(self):
        if self.x:
            self.x -= 1
        else:
            raise StopIteration
        return self.x
Generators (PEP 255)
• written as a normal function or method
• they use yield to generate a sequence
• represented as special callable objects
  ◦ exist at the C level in CPython
def foo(*lst):
    for i in lst: yield i + 1
list(foo(1, 2)) # prints [2, 3]
yield from
• calling a generator produces a generator object
• how do we call one generator from another?
• same as for x in foo(): yield x
def bar(*lst):
    yield from foo(*lst)
    yield from foo(*lst)
list(bar(1, 2)) # prints [2, 3, 2, 3]
Native Coroutines (PEP 492)
• created using async def (since Python 3.5)
• generalisation of generators
  ◦ yield from is replaced with await
  ◦ an __await__ magic method is required
• a coroutine can be suspended and resumed
Coroutine Scheduling
• coroutines need a scheduler
• one is available from asyncio.get_event_loop()
• along with many coroutine building blocks
• coroutines can actually run concurrently (interleaved, not on multiple cores)
  ◦ via asyncio.create_task (since 3.7)
  ◦ via asyncio.gather
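A small sketch of running two coroutines at once with asyncio.gather. The worker coroutine and its delays are invented for the example, and asyncio.run (Python 3.7+) is assumed in place of calling get_event_loop() by hand:

```python
import asyncio

async def worker(name, delay):
    # asyncio.sleep suspends this coroutine and lets the others run
    await asyncio.sleep(delay)
    return name

async def main():
    # both workers wait at the same time instead of one after the other
    return await asyncio.gather(worker("a", 0.02), worker("b", 0.01))

results = asyncio.run(main())   # gather keeps the order of its arguments
```

Even though "b" finishes first, the results come back in argument order.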
Async Generators (PEP 525)
• async def + yield
• semantics like simple generators
• but also allows await
• iterated with async for
  ◦ async for runs sequentially
Decorators
• written as @decor before a function definition
• decor is a regular function (def decor(f))
  ◦ f is bound to the decorated function
  ◦ the decorated function becomes the result of decor
• classes can be decorated too
• you can 'create' decorators at runtime
  ◦ @mkdecor("moo") (mkdecor returns the decorator)
• you can stack decorators
def decor(f):
    return lambda: print("bar")
def mkdecor(s):
    return lambda g: lambda: print(s)
@decor
def foo(f): print("foo")
@mkdecor("moo")
def moo(f): print("foo")
# foo() prints "bar", moo() prints "moo"
List Comprehension
• a concise way to build lists
• combines a filter and a map
[ 2 * x for x in range(10) ]
[ x for x in range(10) if x % 2 == 1 ]
[ 2 * x for x in range(10) if x % 2 == 1 ]
[ (x, y) for x in range(3) for y in range(2) ]
Operators
• operators are (mostly) syntactic sugar
• x < y rewrites to x.__lt__(y)
• is and is not are special
  ◦ are the operands the same object?
• also the ternary (conditional) operator
Non-Operator Builtins
• len(x) -> x.__len__() (length)
• abs(x) -> x.__abs__() (magnitude)
• str(x) -> x.__str__() (printing)
• repr(x) -> x.__repr__() (printing for eval)
• bool(x) and if x: -> x.__bool__()
Arithmetic
• a standard selection of operators
• / is floating point, // is integral
• += and similar are somewhat magical
  ◦ x += y -> x = x.__iadd__(y) if defined
  ◦ otherwise x = x.__add__(y)
x = 7 # an int is immutable
x += 3 # works, x = 10, id(x) changes
lst = [7, 3]
lst[0] += 3 # works too, id(lst) stays same
tup = (7, 3) # a tuple is immutable
tup += (1, 1) # still works (id changes)
tup[0] += 3 # fails
Relational Operators
• operands can be of different types
• equality: !=, ==
  ◦ by default uses object identity
• ordering: <, <=, >, >= (TypeError by default)
• consistency is not enforced
Relational Consistency
• __eq__ must be an equivalence relation
• x.__ne__(y) must be the same as not x.__eq__(y)
• __lt__ must be an ordering relation
  ◦ compatible with __eq__
  ◦ consistent with each other
• each operator is separate (mixins can help)
  ◦ or perhaps a class decorator
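One class decorator that fits the bill is functools.total_ordering from the standard library: given __eq__ and one ordering method, it fills in the rest. A sketch, with a made-up Version class:

```python
from functools import total_ordering

@total_ordering
class Version:
    def __init__(self, major, minor):
        self.key = (major, minor)
    def __eq__(self, other):
        return self.key == other.key
    def __lt__(self, other):
        return self.key < other.key

a, b = Version(1, 2), Version(1, 10)
# <=, > and >= were generated by the decorator, != comes from __eq__
checks = [a < b, a <= b, b > a, b >= a, a != b]
```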
Collection Operators
• in is also a membership operator (outside for)
  ◦ implemented as __contains__
• indexing and slicing operators
  ◦ del x[y] -> x.__delitem__(y)
  ◦ x[y] -> x.__getitem__(y)
  ◦ x[y] = z -> x.__setitem__(y, z)
Conditional Operator
• also known as a ternary operator
• written x if cond else y
  ◦ in C: cond ? x : y
• forms an expression, unlike if
  ◦ can e.g. appear in a lambda
  ◦ or in function arguments, &c.
Concurrency & Parallelism
• threading - thread-based parallelism
• multiprocessing
• concurrent - future-based programming
• subprocess
• sched, a general-purpose event scheduler
• queue, for sending objects between threads
Threading
• low-level thread support, module threading
• Thread objects represent actual threads
  ◦ threads provide start() and join()
  ◦ the run() method executes in a new thread
• mutexes, semaphores &c.
The Global Interpreter Lock
• memory management in CPython is not thread-safe
  ◦ Python code runs under a global lock
  ◦ pure Python code cannot use multiple cores
• C code usually runs without the lock
  ◦ this includes numpy crunching
Multiprocessing
• like threading but uses processes
• works around the GIL
  ◦ each worker process has its own interpreter
• queued/sent objects must be pickled
  ◦ see also: the pickle module
  ◦ this causes substantial overhead
  ◦ functions, classes &c. are pickled by name
Futures
• like coroutine await but for subroutines
• a Future can be waited for using f.result()
• scheduled via concurrent.futures.Executor
  ◦ Executor.map is like asyncio.gather
  ◦ Executor.submit is like asyncio.create_task
• implemented using process or thread pools
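A short sketch of submit and map, assuming a thread pool (ThreadPoolExecutor); the square function is only there to have something to run:

```python
from concurrent.futures import ThreadPoolExecutor

def square(n):
    return n * n

with ThreadPoolExecutor(max_workers=2) as pool:
    future = pool.submit(square, 7)          # returns a Future immediately
    single = future.result()                 # blocks until the result is ready
    many = list(pool.map(square, range(4)))  # results come back in input order
```

Swapping in ProcessPoolExecutor should work the same way, but then the function and its arguments must be picklable.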
Exceptions
• an exception interrupts normal control flow
• it's called an exception because it is exceptional
  ◦ never mind StopIteration
• causes methods to be interrupted
  ◦ until a matching except block is found
  ◦ also known as stack unwinding
Life Without Exceptions
int fd = socket( ... );
if ( fd < 0 )
    ... /* handle errors */
if ( bind( fd, ... ) < 0 )
    ... /* handle errors */
if ( listen( fd, 5 ) < 0 )
    ... /* handle errors */
With Exceptions
try:
    sock = socket.socket( ... )
    sock.bind( ... )
    sock.listen( ... )
except ...:
    # handle errors
Exceptions vs Resources
x = open( "file.txt" )
# stuff
raise SomeError
• who calls x.close()?
• this would be a resource leak
Using finally
try:
    x = open( "file.txt" )
    # stuff
finally:
    x.close()
• works, but tedious and error-prone
Using with
with open( "file.txt" ) as f:
    # stuff
• with takes care of the finally and close
• with x as y sets y = x.__enter__()
  ◦ and calls x.__exit__(...) when leaving the block
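To see the protocol in action, here is a sketch of a hand-written context manager. The Resource class and the log list are invented for the example:

```python
class Resource:
    def __init__(self, log):
        self.log = log
    def __enter__(self):
        self.log.append("enter")
        return self                  # this is what 'as' binds
    def __exit__(self, exc_type, exc, tb):
        self.log.append("exit")      # runs even if the block raised
        return False                 # do not swallow the exception

log = []
try:
    with Resource(log) as r:
        log.append("body")
        raise RuntimeError("oops")
except RuntimeError:
    log.append("handled")
```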
The @property decorator
• attribute syntax is the preferred one in Python
• writing useless setters and getters is boring
class Foo:
    @property
    def x(self): return 2 * self.a
    @x.setter
    def x(self, v): self.a = v // 2
Execution Stack
• made up of activation frames
• holds local variables
• and return addresses
• in dynamic languages, often lives in the heap
Variable Capture
• variables are captured lexically
• definitions are a dynamic / run-time construct
  ◦ a nested definition is executed
  ◦ creates a closure object
• always by reference in Python
  ◦ but can be by-value in other languages
Using Closures
• closures can be returned, stored and called
  ◦ they can be called multiple times, too
  ◦ they can capture arbitrary variables
• closures naturally retain state
• this is what makes them powerful
Objects from Closures
• so closures are essentially code + state
• wait, isn't that what an object is?
• indeed, you can implement objects using closures
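A minimal sketch of an 'object' built from closures: a counter whose state lives in a captured variable. The names make_counter, increment and value are made up:

```python
def make_counter(start=0):
    count = start                # the captured state
    def increment():
        nonlocal count           # rebind the captured variable
        count += 1
        return count
    def value():
        return count
    return increment, value      # the 'methods' of our object

inc, val = make_counter(10)
inc(); inc()
current = val()
```

Each call to make_counter creates a fresh count, so separate counters do not interfere.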
The Role of GC
• memory management becomes a lot more complicated
• forget C-style 'automatic' stack variables
• this is why the stack is actually in the heap
• this can go as far as forming reference cycles
Coroutines
• coroutines are a generalisation of subroutines
• they can be suspended and re-entered
• coroutines can be closures at the same time
• the code of a coroutine is like a function
• a suspended coroutine is like an activation frame
Yield
• suspends execution and 'returns' a value
• may also obtain a new value (cf. send)
• when re-entered, continue where we left off
for i in range(5): yield i
Send
• with yield, we have one-way communication
• but in many cases, we would like two-way
• a suspended coroutine is an object in Python
  ◦ with a send method which takes a value
  ◦ send re-enters the coroutine
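A sketch of two-way communication with send; the running_total generator is an invented example:

```python
def running_total():
    total = 0
    while True:
        value = yield total      # hand out the total, wait for the next value
        total += value

gen = running_total()
first = next(gen)                # run up to the first yield
second = gen.send(5)             # resume with value = 5
third = gen.send(7)              # resume with value = 7
```

The initial next() is needed because a fresh generator has no yield to receive a value yet.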
Yield From and Await
• yield from is mostly a generator concept
• await basically does the same thing
  ◦ call out to another coroutine
  ◦ when it suspends, so does the entire stack
Suspending Native Coroutines
• this is not actually possible
  ◦ not with async-native syntax anyway
• you need a yield
  ◦ for that, you need a generator
  ◦ use the types.coroutine decorator
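A sketch of the trick, driven by hand without any event loop. The helper names suspend and native are hypothetical:

```python
import types

@types.coroutine
def suspend(value):
    # a generator-based coroutine: this yield is what actually suspends
    received = yield value
    return received

async def native():
    got = await suspend("paused")
    return got * 2

coro = native()
yielded = coro.send(None)        # runs until the yield inside suspend()
try:
    coro.send(21)                # resume; the coroutine then finishes
except StopIteration as stop:
    final = stop.value           # the return value of native()
```

This is roughly the role an event loop plays: it is the code on the other end of send.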
Event Loop
• not required in theory
• useful also without coroutines
• there is a synergistic effect
  ◦ event loops make coroutines easier
  ◦ coroutines make event loops easier
Part 4: Math and Statistics
Numbers in Python
• recall that numbers are objects
• a tuple of real numbers has 300% overhead
  ◦ compared to a C array of float values
  ◦ and 350% for integers
• this causes extremely poor cache use
• integers are arbitrary-precision
Math in Python
• numeric data usually means arrays
  ◦ this is inefficient in python
• we need a module written in C
  ◦ but we don't want to do that ourselves
• enter the SciPy project
  ◦ pre-made numeric and scientific packages
The SciPy Family
• numpy: data types, linear algebra
• scipy: more computational machinery
• pandas: data analysis and statistics
• matplotlib: plotting and graphing
• sympy: symbolic mathematics
Aside: External Libraries
• until now, we only used bundled packages
• for math, we will need external libraries
• you can use pip to install those
  ◦ use pip install --user
Aside: The Python Package Index
• colloquially known as PyPI (or cheese shop)
  ◦ do not confuse with PyPy (Python in almost-Python)
• both source packages and binaries
  ◦ the latter known as wheels (PEP 427, 491)
  ◦ previously python eggs
Aside: Installing numpy
• the easiest way may be with pip
  ◦ this would be pip3 on aisa
• linux distributions usually also have packages
• another option is getting the Anaconda bundle
• detailed instructions on https://scipy.org
Arrays in numpy
• compact, C-implemented data types
• flexible multi-dimensional arrays
• easy and efficient re-shaping
  ◦ typically without copying the data
Entering Data
• most data is stored in numpy.array
• can be constructed from a list
  ◦ a list of lists for 2D arrays
• or directly loaded from / stored to a file
  ◦ binary: numpy.load, numpy.save
  ◦ text: numpy.loadtxt, numpy.savetxt
LAPACK and BLAS
• BLAS is a low-level vector/matrix package
• LAPACK is built on top of BLAS
  ◦ provides higher-level operations
  ◦ tuned for modern CPUs with multiple caches
• both are written in Fortran
  ◦ ATLAS and C-LAPACK are C implementations
Element-wise Functions
• the basic math function arsenal
• powers, roots, exponentials, logarithms
• trigonometric (sin, cos, tan,...)
• hyperbolic (sinh, cosh, tanh,...)
• cyclometric (arcsin, arccos, arctan,...)
Matrix Operations in numpy
• import numpy.linalg
• multiplication, inversion, rank
• eigenvalues and eigenvectors
• linear equation solver
• pseudo-inverses, linear least squares
Additional Linear Algebra in scipy
• import scipy.linalg
• LU, QR, polar, etc. decomposition
• matrix exponentials and logarithms
• matrix equation solvers
• special operations for banded matrices
Where is my Gaussian Elimination?
• used in lots of school linear algebra
• but not the most efficient algorithm
• a few problems with numerical stability
• not directly available in numpy
Numeric Stability
• floats are imprecise / approximate
0.1**2 == 0.01 # False
1 / ( 0.1**2 - 0.01 ) # 5.8·10^17
• multiplication is not associative
a = (0.1 * 0.1) * 10
b = 0.1 * (0.1 * 10)
1 / ( a - b ) # 7.21·10^16
• iteration amplifies the errors
LU Decomposition
• decompose matrix A into simpler factors
• PA = LU where
  ◦ P is a permutation matrix
  ◦ L is a lower triangular matrix
  ◦ U is an upper triangular matrix
• fast and numerically stable
Uses for LU
• equations, determinant, inversion,...
• as an example
  ◦ $\det(A) = \det(P^{-1}) \cdot \det(L) \cdot \det(U)$
  ◦ where $\det(U) = \prod_i u_{ii}$ and
  ◦ $\det(L) = \prod_i l_{ii}$
Numeric Math
• float arithmetic is messy but incredibly fast
• measured data is approximate anyway
• stable algorithms exist for many things
  ◦ and are available from libraries
• we often don't care about exactness
  ◦ think computer graphics, signal analysis, ...
Symbolic Math
• numeric math sucks for 'textbook' math
• there are problems where exactness matters
  ◦ pure math and theoretical physics
• incredibly slow computation
  ◦ but much cleaner interpretation
Linear Algebra in sympy
• uses exact math
  ◦ e.g. arbitrary precision rationals
  ◦ and roots thereof
  ◦ and many other computable numbers
• wide repertoire of functions
  ◦ including LU, QR, etc. decompositions
Exact Rationals in sympy
from sympy import *
a = QQ( 1 ) / 10 # QQ = rationals
Matrix( [ [ sqrt( a**3 ), 0, 0 ],
          [ 0, sqrt( a**3 ), 0 ],
          [ 0, 0, 1 ] ] ).det()
# result: 1/1000
numpy for Comparison
import numpy as np
import numpy.linalg as la
a = 0.1
la.det( [ [ np.sqrt( a**3 ), 0, 0 ],
          [ 0, np.sqrt( a**3 ), 0 ],
          [ 0, 0, 1 ] ] )
# result: 0.0010000000000000002
General Solutions in Symbolic Math
from sympy import *
x = symbols( 'x' )
Matrix( [ [ x, 0, 0 ],
          [ 0, 1, 0 ],
          [ 0, 0, x ] ] ).det()
# result: x ** 2
Symbolic Differentiation
x = symbols( 'x' )
diff( x**2 + 2*x + log( x/2 ) )
# result: 2*x + 2 + 1/x
diff( x**2 * exp(x) )
# result: x**2 * exp( x ) + 2 * x * exp( x )
Algebraic Equations
solve( x**2 - 7 )
# result: [ -sqrt( 7 ), sqrt( 7 ) ]
solve( x**2 - exp( x ) )
# result: [ -2 * LambertW( -1/2 ) ]
solve( x**4 - x )
# result: [ 0, 1, -1/2 - sqrt(3) * I/2,
#           -1/2 + sqrt(3) * I/2 ] ; I**2 = -1
Ordinary Differential Equations
f = Function( 'f' )
dsolve( f( x ).diff( x ) ) # f'(x) = 0
# result: Eq( f( x ), C1 )
dsolve( f( x ).diff( x ) - f(x) ) # f'(x) = f(x)
# result: Eq( f( x ), C1 * exp( x ) )
dsolve( f( x ).diff( x ) + f(x) ) # f'(x) = -f(x)
# result: Eq( f( x ), C1 * exp( -x ) )
Symbolic Integration
integrate( x**2 ) # result: x**3 / 3
integrate( log( x ) )
# result: x * log( x ) - x
integrate( cos( x ) ** 2 )
# result: x/2 + sin( x ) * cos( x ) / 2
Numeric Sparse Matrices
• sparse = most elements are 0
• available in scipy.sparse
• special data types (not numpy arrays)
  ◦ do not use numpy functions on those
• less general, but more compact and faster
Fourier Transform
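• continuous: $\hat{f}(\xi) = \int_{-\infty}^{\infty} f(x) \exp(-2\pi i x \xi)\,dx$
• series: $f(x) = \sum_{n=-\infty}^{\infty} c_n \exp\left(i \frac{2\pi n x}{P}\right)$
• real series:
  ◦ $f(x) = \frac{a_0}{2} + \sum_{n=1}^{\infty} \left( a_n \sin\left(\frac{2\pi n x}{P}\right) + b_n \cos\left(\frac{2\pi n x}{P}\right) \right)$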
Discrete Fourier Transform
• available in numpy.fft
• goes between time and frequency domains
• a few different variants are covered
  ◦ real-valued input (for signals, rfft)
  ◦ inverse transform (ifft, irfft)
  ◦ multiple dimensions (fft2, fftn)
Polynomial Series
• the numpy.polynomial package
• Chebyshev, Hermite, Laguerre and Legendre
  ◦ arithmetic, calculus and special-purpose operations
  ◦ numeric integration using Gaussian quadrature
  ◦ fitting (polynomial regression)
Statistics in numpy
• a basic statistical toolkit
  ◦ averages, medians
  ◦ variance, standard deviation
  ◦ histograms
• random sampling and distributions
Linear Regression
• very fast model-fitting method
  ◦ both in computational and human terms
  ◦ quick and dirty first approximation
• widely used in data interpretation
  ◦ biology and sociology statistics
  ◦ finance and economics, especially prediction
Polynomial Regression
• higher-order variant of linear regression
• can capture acceleration or deceleration
• harder to use and interpret
  ◦ also harder to compute
• usually requires a model of the data
Interpolation
• find a line or curve that approximates data
• it must pass through the data points
  ◦ this is a major difference from regression
• more dangerous than regression
  ◦ runs a serious risk of overfitting
Linear and Polynomial Regression, Interpolation
• regressions using the least squares method
  ◦ linear: numpy.linalg.lstsq
  ◦ polynomial: numpy.polyfit
• interpolation: scipy.interpolate
  ◦ e.g. piecewise cubic splines
  ◦ Lagrange interpolating polynomials
Pandas: Data Analysis
• the Python equivalent of R
  ◦ works with tabular data (CSV, SQL, Excel)
  ◦ time series (also variable frequency)
  ◦ primarily works with floating-point values
• partially implemented in C and Cython
Pandas Series and DataFrame
• Series is a single sequence of numbers
• DataFrame represents tabular data
  ◦ powerful indexing operators
  ◦ index by column -> series
  ◦ index by condition -> filtering
Pandas Example
import pandas as pd
scores = [ ('Maxine', 12), ('John', 12),
           ('Sandra', 10) ]
cols = [ 'name', 'score' ]
df = pd.DataFrame( data=scores, columns=cols )
df['score'].max() # 12
df[ df['score'] >= 12 ] # Maxine and John
Part 5: Communication, HTTP & asyncio
Running Programs (the old way)
• os.system is about the simplest
  ◦ also somewhat dangerous - shell injection
  ◦ you only get the exit code
• os.popen allows you to read output of a program
  ◦ alternatively, you can send input to the program
  ◦ you can't do both (would likely deadlock anyway)
  ◦ runs the command through a shell, same as os.system
Low-level Process API
• POSIX-inherited interfaces (on POSIX systems)
• os.exec: replace the current process
• os.fork: split the current process in two
• os.forkpty: same but with a PTY
Detour: bytes vs str
• strings (class str) represent text
  ◦ that is, a sequence of Unicode code points
• files and network connections handle data
  ◦ represented in Python as bytes
• the bytes constructor can convert from str
  ◦ e.g. b = bytes("hello", "utf8")
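A quick sketch of the round trip between str and bytes; the sample text is arbitrary, chosen only because it contains a non-ASCII character:

```python
text = "naïve"
data = bytes(text, "utf8")       # same as text.encode("utf8")
back = data.decode("utf8")       # bytes -> str again

# one code point may take more than one byte in UTF-8
lengths = (len(text), len(data))
```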
Running Programs (the new way)
• you can use the subprocess module
• subprocess can handle bidirectional IO
  ◦ it also takes care of avoiding IO deadlocks
  ◦ set input to feed data to the subprocess
• internally run uses a Popen object
  ◦ if run can't do it, Popen probably can
Getting subprocess Output
• available via run (capture_output was added in Python 3.7)
• the run function returns a CompletedProcess
• it has attributes stdout and stderr
• both are bytes (byte sequences) by default
• or str if text or encoding were set
• available if you enabled capture_output
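A sketch of capturing output with run. To stay independent of the platform, the child here is another Python interpreter (sys.executable) running a made-up one-liner, not any particular system tool:

```python
import subprocess
import sys

# the child reads its stdin and prints it back in upper case
child = [sys.executable, "-c", "import sys; print(sys.stdin.read().upper())"]

# text=True makes input, stdout and stderr str instead of bytes
done = subprocess.run(child, input="hello", capture_output=True, text=True)
output = done.stdout.strip()
```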
Running Filters with Popen
• if you are stuck with 3.6, use Popen directly
• set stdin in the constructor to PIPE
• use the communicate method to send the input
• this gives you the outputs (as bytes)
import subprocess
from subprocess import PIPE
input = bytes( "x\na\nb\ny", "utf8")
p = subprocess.Popen(["sort"], stdin=PIPE,
                     stdout=PIPE)
out = p.communicate(input=input)
# out[0] is the stdout, out[1] is None
Subprocesses with asyncio
• import asyncio.subprocess
• create_subprocess_exec, like subprocess.run
  ◦ but it returns a Process instance
  ◦ Process has a communicate async method
• can run things in background (via tasks)
  ◦ also multiple processes at once
Protocol-based asyncio subprocesses
• let loop be an implementation of the asyncio event loop
• there's subprocess_exec and subprocess_shell
  ◦ sets up pipes by default
• integrates into the asyncio transport layer (see later)
• allows you to obtain the data piece-wise
• https://docs.python.org/3/library/asyncio-protocol.html
Sockets
• the socket API comes from early BSD Unix
• socket represents a (possible) network connection
• sockets are more complicated than normal files
  ◦ establishing connections is hard
  ◦ messages get lost much more often than file data
Socket Types
• sockets can be internet or unix domain
  ◦ internet sockets connect to other computers
  ◦ Unix sockets live in the filesystem
• sockets can be stream or datagram
  ◦ stream sockets are like files (TCP)
  ◦ you can write a continuous stream of data
  ◦ datagram sockets can send individual messages (UDP)
Sockets in Python
• the socket module is available on all major OSes
• it has a nice object-oriented API
  ◦ failures are propagated as exceptions
  ◦ buffer management is automatic
• useful if you need to do low-level networking
  ◦ hard to use in non-blocking mode
Sockets and asyncio
• asyncio provides sock_* to work with socket objects
• this makes work with non-blocking sockets a lot easier
• but your program needs to be written in async style
• only use sockets when there is no other choice
  ◦ asyncio protocols are both faster and easier to use
Hyper-Text Transfer Protocol
• originally a simple text-based, stateless protocol
• however
  ◦ SSL/TLS, cryptography (https)
  ◦ pipelining (somewhat stateful)
  ◦ cookies (somewhat stateful in a different way)
• typically between client and a front-end server
• but also as a back-end protocol (web server to app server)
Request Anatomy
• request type (see below)
• header (text-based, like e-mail)
• content
Request Types
• GET - asks the server to send a resource
• HEAD - like GET but only send back headers
• POST - send data to the server
Python and HTTP
• both client and server functionality
  ◦ import http.client
  ◦ import http.server
• TLS/SSL wrappers are also available
  ◦ import ssl
• synchronous by default
Serving Requests
• derive from BaseHTTPRequestHandler
• implement a do_GET method
• this gets called whenever the client does a GET
• also available: do_HEAD, do_POST, etc.
• pass the class (not an instance) to HTTPServer
Serving Requests (cont'd)
• HTTPServer creates a new instance of your Handler
• the BaseHTTPRequestHandler machinery runs
• it calls your do_GET etc. method
• request data is available in instance variables
  ◦ self.path, self.headers
Talking to the Client
• HTTP responses start with a response code
  ◦ self.send_response( 200, 'OK' )
• the headers follow (set at least Content-Type)
  ◦ self.send_header( 'Connection', 'close' )
• headers and the content need to be separated
  ◦ self.end_headers()
• finally, send the content by writing to self.wfile
Sending Content
• self.wfile is an open file
• it has a write() method which you can use
• sockets only accept byte sequences, not str
• use the bytes( string, encoding ) constructor
  ◦ match the encoding to your Content-Type
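Putting the serving slides together, a sketch of a tiny server that answers one GET. The handler name Hello, the reply text and the use of port 0 (let the OS pick a free port) are choices made for the example; the server runs in a background thread only so the same script can also act as the client:

```python
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer

class Hello(BaseHTTPRequestHandler):
    def do_GET(self):
        body = bytes("hello from " + self.path, "utf8")
        self.send_response(200, "OK")
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)        # bytes, not str
    def log_message(self, *args):
        pass                          # keep the example quiet

httpd = HTTPServer(("127.0.0.1", 0), Hello)   # the class, not an instance
threading.Thread(target=httpd.serve_forever, daemon=True).start()

conn = HTTPConnection("127.0.0.1", httpd.server_port)
conn.request("GET", "/test")
response = conn.getresponse()
status, content = response.status, response.read()
conn.close()
httpd.shutdown()
httpd.server_close()
```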
HTTP and asyncio
• the base asyncio currently doesn't directly support HTTP
• but: you can get aiohttp from PyPI
• contains a very nice web server
  ◦ from aiohttp import web
  ◦ minimum boilerplate, fully asyncio-ready
SSL and TLS
• you want to use the ssl module for handling HTTPS
  ◦ this is especially true server-side
  ◦ aiohttp and http.server are compatible
• you need to deal with certificates (loading, checking)
• this is a rather important but complex topic
Certificate Basics
• a certificate is a cryptographically signed statement
  ◦ it ties a server to a certain public key
  ◦ the client ensures the server knows the private key
• the server loads the certificate and its private key
• the client must validate the certificate
  ◦ this is typically a lot harder to get right
SSL in Python
• start with import ssl
• almost everything happens in the SSLContext class
• get an instance from ssl.create_default_context()
  ◦ you can use wrap_socket to run an SSL handshake
  ◦ you can pass the context to aiohttp
• if httpd is a http.server.HTTPServer and ctx is an SSLContext:
httpd.socket = ctx.wrap_socket( httpd.socket, ... )
• the module-level ssl.wrap_socket was removed in Python 3.12
HTTP Clients
• there's a very basic http.client
• for a more complete library use urllib.request
• aiohttp has client functionality
• all of the above can be used with ssl
• another 3rd party module: Python Requests
IO at the OS Level
• often defaults to blocking
  ◦ read returns when data is available
  ◦ this is usually OK for files
• but what about network code?
  ◦ could work for a client
Threads and IO
• there may be work to do while waiting
  ◦ waiting for IO can be wasteful
• only the calling (OS) thread is blocked
  ◦ another thread may do the work
  ◦ but multiple green threads may be blocked
Non-Blocking IO
• the program calls read
  ◦ read returns immediately
  ◦ even if there was no data
• but how do we know when to read?
  ◦ we could poll
  ◦ for example call read every 30ms
Polling
• trade-off between latency and throughput
  ◦ sometimes, polling is okay
  ◦ but is often too inefficient
• alternative: IO dispatch
  ◦ useful when multiple IOs are pending
  ◦ wait only if all are blocked
select
• takes a list of file descriptors
• block until one of them is ready
  ◦ next read will return data immediately
• can optionally specify a timeout
• only useful for OS-level resources
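A sketch of select in Python, using the select module on a pair of connected sockets (socket.socketpair) so that no network setup is needed; the timeouts are arbitrary:

```python
import select
import socket

a, b = socket.socketpair()            # two connected sockets

# nothing was sent yet: select times out and reports nothing readable
idle, _, _ = select.select([a], [], [], 0.01)

b.sendall(b"ping")
ready, _, _ = select.select([a], [], [], 1.0)
data = ready[0].recv(4)               # will not block, data is waiting

a.close(); b.close()
```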
Alternatives to select
• select is a rather old interface
• there are a number of more modern variants
• poll and epoll system calls
  ◦ despite the name, they do not poll
  ◦ epoll is more scalable
• kqueue and kevent on BSD systems
Synchronous vs Asynchronous
• the select family is synchronous
  ◦ you call the function
  ◦ it may wait some time
  ◦ you proceed when it returns
• OS threads are fully asynchronous
The Thorny Issue of Disks
• a file is always 'ready' for reading
• this may still take time to complete
• there is no good solution on UNIX
• POSIX AIO exists but is sparsely supported
• OS threads are an option
IO on Windows
• select is possible (but slow)
• Windows provides real asynchronous IO
  ◦ quite different from UNIX
  ◦ the IO operation is directly issued
  ◦ but the function returns immediately
• comes with a notification queue
The asyncio Event Loop
• uses the select family of syscalls
• why is it called async IO?
  ◦ select is synchronous in principle
  ◦ this is an implementation detail
  ◦ the IOs are asynchronous to each other
How Does It Work
• you must use asyncio functions for IO
• an async read does not issue an OS read
• it yields back into the event loop
• the fd is put on the select list
• the coroutine is resumed when the fd is ready
Timers
• asyncio allows you to set timers
• the event loop keeps a list of those
• and uses that to set the select timeout
  ◦ just uses the nearest timer expiry
• when a timer expires, its owner is resumed
Blocking IO vs asyncio
• all user code runs on the main thread
• you must not call any blocking IO functions
• doing so will stall the entire application
  ◦ in a server, clients will time out
  ◦ even if not, latency will suffer
DNS
• POSIX: getaddrinfo and getnameinfo
  ◦ also the older API gethostbyname
• those are all blocking functions
  ◦ and they can take a while
  ◦ but name resolution is essential
• asyncio internally uses OS threads for DNS
Signals
• signals on UNIX are very asynchronous
• interact with OS threads in a messy way
• asyncio hides all this using C code
Native Coroutines (Reminder)
• declared using async def
async def foo():
    await asyncio.sleep( 1 )
• calling foo() returns a suspended coroutine
• which you can await
  ◦ or turn it into an asyncio.Task
Tasks
• asyncio.Task is a nice wrapper around coroutines
  ◦ create with asyncio.create_task()
• can be stopped prematurely using cancel()
• has an API for asking things:
  ◦ done() tells you if the coroutine has finished
  ◦ result() gives you the result
Tasks and Exceptions
• what if a coroutine raises an exception?
• calling result will re-raise it
  ◦ i.e. it continues propagating from result()
• you can also ask directly using exception()
  ◦ returns None if the coroutine ended normally
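A sketch of both ways of getting at a task's exception. The failing coroutine is made up, and asyncio.wait is used here only because it waits for the task without re-raising:

```python
import asyncio

async def fails():
    raise ValueError("boom")

async def main():
    task = asyncio.create_task(fails())
    await asyncio.wait([task])        # wait until done, without re-raising
    caught = None
    try:
        task.result()                 # re-raises the ValueError
    except ValueError as err:
        caught = str(err)
    return task.done(), type(task.exception()).__name__, caught

outcome = asyncio.run(main())
```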
Asynchronous Context Managers
• normally, we use with for resource acquisition
  ◦ this internally uses the context manager protocol
• but sometimes you need to wait for a resource
  ◦ __enter__() is a subroutine and would block
  ◦ this won't work in async-enabled code
• we need __enter__() to be itself a coroutine
async with
• just like with but uses __aenter__(), __aexit__()
  ◦ those are async def
• the async with behaves like an await
  ◦ it will suspend if the context manager does
  ◦ the coroutine which owns the resource can continue
• mainly used for locks and semaphores
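A sketch with asyncio.Lock: two workers compete for one lock, and the second one suspends in async with until the first leaves the block. Worker names and the log list are invented:

```python
import asyncio

async def worker(lock, name, log):
    async with lock:                  # may suspend here until the lock is free
        log.append(name + " in")
        await asyncio.sleep(0.01)     # others may run, but cannot enter
        log.append(name + " out")

async def main():
    lock, log = asyncio.Lock(), []
    await asyncio.gather(worker(lock, "a", log), worker(lock, "b", log))
    return log

log = asyncio.run(main())
```

Without the lock, the two "in" entries would come first, since both workers would reach the sleep before either finished.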
Part 6: Testing, Pitfalls
Mixing Languages
• for many people, Python is not a first language
• some things look similar in Python and Java (C++, ...)
  ◦ sometimes they do the same thing
  ◦ sometimes they do something very different
  ◦ sometimes the difference is subtle
Python vs Java: Decorators
• Java has a thing called annotations
• looks very much like a Python decorator
• in Python, decorators can drastically change meaning
• in Java, they are just passive metadata
  ◦ other code can use them for meta-programming though
Class Body Variables
class Foo:
    some_attr = 42
• in Java/C++, this is how you create instance variables
• in Python, this creates class attributes
  ◦ i.e. what C++/Java would call static attributes
Very Late Errors
if a == 2:
    priiiint("a is not 2")
• no error when loading this into python
• it even works as long as a != 2
• most languages would tell you much earlier
Very Late Errors (cont'd)
try:
    foo()
except TyyyypeError:
    print("my mistake")
• does not even complain when running the code
• you only notice when foo() raises an exception
Late Imports
if a == 2:
    import foo
    foo.say_hello()
• unless a == 2, foo is not loaded
• any syntax errors don't show up until a == 2
  ◦ it may even fail to exist
Block Scope
for i in range(10): pass
print(i) # not a NameError
• in Python, local variables are function-scoped
• in other languages, i is confined to the loop
Assignment Pitfalls
x = [ 1, 2 ]
y = x
x.append( 3 )
print( y ) # prints [ 1, 2, 3 ]
• in Python, everything is a reference
• assignment does not make copies
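If a copy is what you want, you have to ask for one. A small sketch comparing an alias, a shallow copy and a deep copy (the variable names are illustrative):

```python
import copy

x = [[1, 2], [3]]
alias = x                    # same object, no copy
shallow = list(x)            # new outer list, shared inner lists
deep = copy.deepcopy(x)      # fully independent

x.append([4])                # seen by alias only
x[0].append(99)              # seen by alias and shallow
```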
Equality of Iterables
• [0, 1] == [0, 1] -> True (obviously)
• range(2) == range(2) -> True
• list(range(2)) == [0, 1] -> True
• [0, 1] == range(2) -> False
Equality of bool
• if 0: print( "yes" ) -> nothing
• if 1: print( "yes" ) -> yes
• False == 0 -> True
• True == 1 -> True
• 0 is False -> False
• 1 is True -> False
Equality of bool (cont'd)
• if 2: print( "yes" ) -> yes
• True == 2 -> False
• False == 2 -> False
• if '': print( "yes" ) -> nothing
• if 'x': print( "yes" ) -> yes
• '' == False -> False
• 'x' == True -> False
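The two notions, truthiness (what if tests) and equality with True, can be tabulated side by side. A sketch over a few sample values chosen for this example:

```python
values = [0, 1, 2, "", "x", [], [0]]

for v in values:
    truthy = bool(v)           # what `if v:` tests
    equal_true = (v == True)   # only values equal to 1 pass this
    print(repr(v), truthy, equal_true)

truthy_values = [v for v in values if v]
```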
Mutable Default Arguments
def foo( x = [] ):
    x.append( 7 )
    return x
foo() # [ 7 ]
foo() # [ 7, 7 ] ... wait, what?
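The default value is evaluated once, when the def statement runs, so every call shares the same list. The usual workaround is a None sentinel, sketched here:

```python
def foo(x=None):
    if x is None:
        x = []        # fresh list on every call
    x.append(7)
    return x

first = foo()
second = foo()
```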
Late Lexical Capture
f = [ lambda x : i * x for i in range( 5 ) ]
f[ 4 ]( 3 ) # 12
f[ 0 ]( 3 ) # 12 ... ?!
g = [ lambda x, i = i: i * x for i in range( 5 ) ]
g[ 4 ]( 3 ) # 12
g[ 0 ]( 3 ) # 0 ... fml
h = [ ( lambda x : i * x )( 3 ) for i in range( 5 ) ]
h # [ 0, 3, 6, 9, 12 ] ... i kid you not
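The lambdas in f all look up i when they are called, by which time the loop has finished. Besides the default-argument trick, two other ways to bind the value early are sketched below; make_multiplier is a made-up helper.

```python
from functools import partial

def make_multiplier(i):
    # i is a parameter here, so each closure gets its own binding
    return lambda x: i * x

f = [make_multiplier(i) for i in range(5)]

# partial stores the current value of i as the first argument
g = [partial(lambda i, x: i * x, i) for i in range(5)]
```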
Dictionary Iteration Order
• in python <= 3.5
  ◦ iteration order is arbitrary and must not be relied on
• in cpython 3.6
  ◦ all dictionaries iterate in insertion order, but only as an implementation detail
• in python >= 3.7
  ◦ guaranteed to iterate in insertion order
x = [ [ 1 ] * 2 ] * 3
print( x ) # [ [ 1, 1 ], [ 1, 1 ], [ 1, 1 ] ]
x[ 0 ][ 0 ] = 2
print( x ) # [ [ 2, 1 ], [ 2, 1 ], [ 2, 1 ] ]
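Multiplying the outer list repeats a reference to the same inner list. A list comprehension builds a new row each time, as in this sketch (the variable names are illustrative):

```python
shared = [[1] * 2] * 3                      # three references to one row
independent = [[1] * 2 for _ in range(3)]   # three separate rows

shared[0][0] = 2
independent[0][0] = 2
```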
Forgotten Await
import asyncio
async def foo():
    print( "hello" )
async def main():
    foo()
asyncio.run( main() )
• gives warning: coroutine 'foo' was never awaited
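Calling foo() only creates a coroutine object; nothing runs until it is awaited or scheduled. A sketch of two ways to fix this (the list out replaces print so the effect is easy to check):

```python
import asyncio

out = []

async def foo():
    out.append("hello")

async def main():
    await foo()                         # runs foo to completion
    task = asyncio.create_task(foo())   # or: schedule it and wait later
    await task

asyncio.run(main())
```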
Python vs Java: Closures
• captured variables are final in Java
• but they are mutable in Python
  ◦ and of course captured by reference
• they are whatever you tell them to be in C++
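A sketch of a Python closure that mutates what it captured. The names make_counter, bump and peek are made up; nonlocal is needed only because the variable is rebound.

```python
def make_counter():
    count = 0
    def bump():
        nonlocal count     # rebind the captured variable
        count += 1
        return count
    def peek():
        return count       # sees the same variable, not a snapshot
    return bump, peek

bump, peek = make_counter()
bump()
bump()
```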
Explicit super()
• Java and C++ automatically call parent constructors
• Python does not
• you have to call them yourself
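What goes wrong when the call is forgotten, as a sketch with made-up classes:

```python
class Base:
    def __init__(self):
        self.base_ready = True

class Forgetful(Base):
    def __init__(self):
        self.extra = 1                # Base.__init__ never runs

class Careful(Base):
    def __init__(self):
        super().__init__()            # explicit call to the parent
        self.extra = 1
```

A subclass that does not define __init__ at all inherits the parent's, so the problem appears only when you override it.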
Setters and Getters
obj.attr
obj.attr = 4
• in C++ or Java, this is a plain field access or assignment
• in Python, it can run arbitrary code
  ◦ this often makes getters/setters redundant
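The usual mechanism behind this is property. A sketch with an invented class Celsius, where a plain-looking assignment runs a validation:

```python
class Celsius:
    def __init__(self, degrees):
        self.degrees = degrees        # goes through the setter below

    @property
    def degrees(self):
        return self._degrees

    @degrees.setter
    def degrees(self, value):
        if value < -273.15:
            raise ValueError("below absolute zero")
        self._degrees = value

t = Celsius(20)
t.degrees = 25                        # runs the setter
```

You can start with a plain attribute and turn it into a property later without changing any code that uses it.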
Why Testing
• reading programs is hard
• reasoning about programs is even harder
• testing is comparatively easy
• difference between an example and a proof
What is Testing
• based on trial runs
• the program is executed with some inputs
• the outputs or outcomes are checked
• almost always incomplete
Testing Levels
• unit testing
  ◦ individual classes
  ◦ individual functions
• functional
  ◦ system
  ◦ integration
Testing Automation
• manual testing
  ◦ still widely used
  ◦ requires a human
• semi-automated
  ◦ requires human assistance
• fully automated
  ◦ can run unattended
Testing Insight
• what does the test or tester know?
• black box: nothing known about internals
• gray box: limited knowledge
• white box: 'complete' knowledge
Why Unit Testing?
• allows testing small pieces of code
• the unit is likely to be used in other code
  ◦ make sure your code works before you use it
  ◦ the less code, the easier it is to debug
• especially easier to hit all the corner cases
Unit Tests with unittest
• from unittest import TestCase
• derive your test class from TestCase
• put test code into methods named test_*
• run with python -m unittest program.py
  ◦ add -v for more verbose output
from unittest import TestCase
class TestArith(TestCase):
    def test_add(self):
        self.assertEqual(1, 4 - 3)
    def test_leq(self):
        self.assertTrue(3 <= 2 * 3)
Unit Tests with pytest
• a more pythonic alternative to unittest
  ◦ unittest is derived from JUnit
• easier to use and less boilerplate
• you can use native python assert
• easier to run, too
  ◦ just run pytest in your source repository
Test Auto-Discovery in pytest
• pytest finds your testcases for you
  ◦ no need to register anything
• put your tests in test_*.py or *_test.py
• name your testcases (functions) test_*
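A sketch of what a discoverable test file could look like. The file name test_arith.py and the function add are assumptions made for this example. No imports or registration are needed.

```python
# test_arith.py (hypothetical file name; it matches the test_*.py pattern)

def add(a, b):
    # stand-in for the code under test
    return a + b

def test_add():
    assert add(1, 3) == 4

def test_leq():
    assert 3 <= 2 * 3
```

Running pytest in the directory that holds this file should pick up both functions.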
Fixtures in pytest
• sometimes you need the same thing in many testcases
• in unittest, you have the test class
• pytest passes fixtures as parameters
  ◦ fixtures are created by a decorator
  ◦ they are matched based on their names
import pytest
import smtplib
@pytest.fixture
def smtp_connection():
    return smtplib.SMTP("smtp.gmail.com", 587)
def test_ehlo(smtp_connection):
    response, msg = smtp_connection.ehlo()
    assert response == 250
Property Testing
• writing test inputs is tedious
• sometimes, we can generate them instead
• useful for general properties like
  ◦ idempotency (e.g. serialize + deserialize)
  ◦ invariants (output is sorted, ...)
  ◦ code does not cause exceptions
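To show the idea without any library, here is a hand-rolled sketch that generates inputs with the standard random module. This is a stand-in for a real property-testing tool, not how such tools work internally; the helper names and the choice of sorted and json as the code under test are assumptions.

```python
import json
import random

def random_list(rng, max_len=20):
    # generate one random test input
    return [rng.randint(-1000, 1000) for _ in range(rng.randint(0, max_len))]

def check_properties(runs=200, seed=0):
    rng = random.Random(seed)          # fixed seed keeps failures reproducible
    for _ in range(runs):
        xs = random_list(rng)
        ys = sorted(xs)                # must not raise
        # invariant: the output is ordered
        assert all(a <= b for a, b in zip(ys, ys[1:]))
        # idempotency: sorting again changes nothing
        assert sorted(ys) == ys
        # round trip: serialize + deserialize gives back the input
        assert json.loads(json.dumps(xs)) == xs
    return runs

checked = check_properties()
```

A dedicated library additionally shrinks failing inputs to small counterexamples, which this sketch does not attempt.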
Using hypothesis
• property-based testing for Python
• has strategies to generate basic data types
  ◦ int, str, dict, list, set, ...
• compose built-in generators to get custom types
• integrated with pytest
import hypothesis
import hypothesis.strategies as s
@hypothesis.given(s.lists(s.integers()))
def test_sorted(x):
    assert sorted(x) == x # should fail
@hypothesis.given(x=s.integers(), y=s.integers())
def test_cancel(x, y):
    assert (x + y) - y == x # looks okay
Going Quick and Dirty
• goal: minimize time spent on testing
• manual testing usually loses
  ◦ but it has almost 0 initial investment
• if you can write a test in 5 minutes, do it
• useful for testing small scripts
Shell 101
• shell scripts are very easy to write
• they are ideal for testing IO behaviour
• easily check for exit status: set -e
• see what is going on: set -x
• use diff -u to check expected vs actual output
Shell Test Example
set -ex
python script.py < test1.in | tee out
diff -u test1.out out
python script.py < test2.in | tee out
diff -u test2.out out
Continuous Integration
• automated tests need to be executed
• with many tests, this gets tedious to do by hand
• CI builds and tests your project regularly
  ◦ every time you push some commits
  ◦ every night (e.g. more extensive tests)
CI: Travis
• runs in the cloud (CI as a service)
• trivially integrates with pytest
• virtualenv out of the box for python projects
• integrated with github
• configure in .travis.yml in your repo
CI: GitLab
• GitLab has its own CI solution (similar to travis)
• also available at FI
• runs tests when you push to your gitlab
• drop a .gitlab-ci.yml in your repository
• automatic deployment into heroku &c.
CI: Buildbot
• written in python/twisted
  ◦ basically a framework to build a custom CI tool
• self-hosted and somewhat complicated to set up
  ◦ more suited for complex projects
  ◦ much more flexible than most CI tools
• distributed design
CI: Jenkins
• another self-hosted solution, this time in Java
  ◦ widely used and well supported
• native support for python projects (including pytest)
  ◦ provides a dashboard with test result graphs &c.
  ◦ supports publishing sphinx-generated documentation
Print-based Debugging
• no need to be ashamed, everybody does it
• less painful in interpreted languages
• you can also use decorators for tracing
• never forget to clean your program up again
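A sketch of a tracing decorator. The name trace is made up; it prints only positional arguments and writes to stderr so that normal output stays clean.

```python
import functools
import sys

def trace(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        # report the call after it returns, innermost calls first
        print(f"{func.__name__}{args} -> {result!r}", file=sys.stderr)
        return result
    return wrapper

@trace
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

value = fib(5)
```

Cleaning up afterwards is a matter of deleting the @trace lines.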
import sys
def debug(e):
    f = sys._getframe(1) # frame of the caller
    v = eval(e, f.f_globals, f.f_locals)
    loc = f.f_code.co_filename + ':'
    loc += str(f.f_lineno) + ':'
    print(loc, e, '=', repr(v), file=sys.stderr)
x = 1
debug('x + 1')
The Python Debugger
• run as python -m pdb program.py
• there's a built-in help command
• next steps through the program
• break to set a breakpoint
• cont to run until end or a breakpoint
What is Profiling
• measurement of resource consumption
• essential info for optimising programs
• answers questions about bottlenecks
  ◦ where is my program spending most time?
  ◦ less often: how is memory used in the program?
Why Profiling
• 'blind' optimisation is often misdirected
  ◦ it is like fixing bugs without triggering them
  ◦ program performance is hard to reason about
• tells you exactly which point is too slow
  ◦ allows for best speedup with least work
Profiling in Python
• provided as a library, cProfile
  ◦ alternative: profile is slower, but more flexible
• run as python -m cProfile program.py
• outputs a list of lines/functions and their cost
• use cProfile.run() to profile a single expression
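Profiling from inside a program, as a sketch. The function fib_rec is a made-up workload; the report is collected into a string here, but it can go to stdout just as well.

```python
import cProfile
import io
import pstats

def fib_rec(n):
    return n if n < 2 else fib_rec(n - 1) + fib_rec(n - 2)

profiler = cProfile.Profile()
profiler.enable()
fib_rec(15)                       # the code being measured
profiler.disable()

buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("tottime").print_stats(5)   # five most expensive entries
report = buffer.getvalue()
```

The one-line alternative is cProfile.run("fib_rec(15)"), which prints a similar table directly.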
$ python -m cProfile -s time fib.py
ncalls   tottime percall file:line(function)
13638/2    0.032   0.016 fib.py:1(fib_rec)
      2    0.000   0.000 {builtins.print}
      2    0.000   0.000 fib.py:5(fib_mem)