PV248 Python
Petr Ročkai
December 5, 2019

Programming vs Languages
• python is unobtrusive (by design)
• if you can program, you can program in python
• there are idiosyncrasies (of course)
• but you will mostly get by

Programming vs Jobs
• we all want to write beautiful programs
  ◦ but you didn't sleep for 2 nights
  ◦ and this thing is going into production tomorrow
• sometimes you get a chance to clean up later
  ◦ and sometimes you don't

Engineering Flowchart
• does it move? no → should it? no → no problem; yes → WD 40
• does it move? yes → should it? yes → no problem; no → duct tape
Python makes for decent duct tape and WD 40.

In This Course
• you will not learn to write beautiful programs
• we will try to do things with minimum effort
  ◦ perfect is the enemy of good
• ugly comes in shades
  ◦ you should always write passable code
  ◦ there is a balance to strike

... ugly, cont'd
• there are two main schools of writing software
  ◦ do the right thing
  ◦ worse is better
• https://www.jwz.org/doc/worse-is-better.html

The Right Thing
• simplicity: interface first, implementation second
• correctness: required
• consistency: required
• completeness: more important than simplicity

Worse is Better
• simplicity: implementation first
• correctness: simplicity goes first
• consistency: less important than both
• completeness: least important

Design Schools
• there are pros and cons to both
• right thing is often expensive
• worse is better often wins
• which one do you think python belongs to?

Disclaimer
• I am not a python programmer
• please don't ask sneaky language-lawyer questions

Goals
• learn to use python in practical situations
• have a look at existing packages and what they can do
• code up some cool stuff, have fun

Organisation
• the lecture and the seminars are every other week
• that's 7 lectures + 7 seminars
• there will be 6 homework assignments
• seminar attendance is semi-compulsory

Homework Grading
• grading will be fully automatic
  ◦ performed every Thursday at midnight
  ◦ starting 7 days after the assignment is given
• assignments are binary: pass/fail
• passing tests early gets you bonus points

Obtaining Points
• you can get up to
  ◦ 12 points for assignments
  ◦ 6 points for passing tests early
  ◦ 2 points for seminar attendance
  ◦ 3 points for peer reviews
  ◦ 1 point for activity in the seminar
• you need 16 points to pass

Semester Plan
1. Object and Memory Model
2. Text, JSON, SQL and Persistence
3. Advanced Language Constructs
4. Numeric & Symbolic Math, Statistics
5. Communication, HTTP, asyncio
6. Testing, Debugging, Profiling & Pitfalls
7. Quantum Computation

Part 1: Object and Memory Model

Objects
• the basic 'unit' of OOP
• they bundle data and behaviour
• provide encapsulation
• make code re-use easier
• also known as 'instances'

Classes
• templates for objects (class Foo: pass)
• each (python) object belongs to a class
• classes themselves are also objects
• calling a class creates an instance
  ◦ my_foo = Foo()

Poking at Classes
• {}.__class__
• {}.__class__.__class__
• (0).__class__
• [].__class__
• compare type(0), etc.
• n = numbers.Number(); n.__class__

Types vs Objects
• class system is a type system
• 'duck typing': quacks, walks like a duck
• since python 3, types are classes
• everything is dynamic in python
  ◦ you can create new classes at runtime
  ◦ you can pass classes as function parameters

Encapsulation
• objects hide implementation details
• classic types structure data
  ◦ objects also structure behaviour
• facilitates weak coupling

Weak Coupling
• coupling is a degree of interdependence
• more coupling makes things harder to change
  ◦ it also makes reasoning harder
• good programs are weakly coupled
• cf. modularity, composability

Polymorphism
• objects are (at least in python) polymorphic
• different implementation, same interface
• only the interface matters for composition
• facilitates genericity and code re-use
• cf. 'duck typing'

Generic Programming
• code re-use often saves time
  ◦ not just coding but also debugging
  ◦ re-usable code often couples weakly
• but not everything that can be re-used should be
  ◦ code can be too generic
  ◦ and too hard to read

Attributes
• data members of objects
• each instance gets its own copy
• like variables scoped to object lifetime
• they get names and values

Methods
• functions (procedures) tied to objects
• they can access the object (self)
• implement the behaviour of the object
• their signatures (usually) provide the interface
• methods are also objects

Class and Instance Methods
• methods are usually tied to instances
• recall that classes are also objects
• class methods work on the class (cls)
• static methods are just namespaced functions
• decorators @classmethod, @staticmethod

Inheritance
• class Ellipse( Shape ): ...
• usually encodes an is-a relationship

Multiple Inheritance
• more than one base class is possible
• many languages restrict this
• python allows general M-I
  ◦ class Bat( Mammal, Winged ): pass
• 'true' M-I is somewhat rare
  ◦ typical use cases: mixins and interfaces

Mixins
• used to pull in implementation
  ◦ not part of the is-a relationship
  ◦ by convention, not enforced by the language
• common bits of functionality
  ◦ e.g. implement __gt__, __eq__ &c. using __lt__
  ◦ you only need to implement __lt__ in your class

Interfaces
• realized as 'abstract' classes in python
  ◦ just raise a NotImplementedError exception
  ◦ document the intent in a docstring
• participates in is-a relationships
• partially displaced by duck typing
  ◦ more important in other languages (think Java)

Composition
• attributes of objects can be other objects
  ◦ (also, everything is an object in python)
• encodes a has-a relationship
  ◦ a circle has a center and a radius
  ◦ a circle is a shape

Constructors
• this is the __init__ method
• initializes the attributes of the instance
• can call superclass constructors explicitly
  ◦ not called automatically (unlike C++, Java)
  ◦ MySuperClass.__init__( self )
  ◦ super().__init__() (if unambiguous)

Class and Object Dictionaries
• most objects are basically dictionaries
• try e.g. foo.__dict__ (for a suitable foo)
• saying foo.x means foo.__dict__["x"]
  ◦ if that fails, type(foo).__dict__["x"] follows
  ◦ then superclasses of type(foo), according to MRO

Writing Classes
    class Person:
        def __init__( self, name ):
            self.name = name
        def greet( self ):
            print( "hello " + self.name )

    p = Person( "you" )
    p.greet()

Modules in Python
• modules are just normal .py files
• import executes a file by name
  ◦ it will look into system-defined locations
  ◦ the search path includes the current directory
  ◦ they typically only define classes & functions
• import sys → lets you use sys.argv
• from sys import argv → you can write just argv

Functions
• top-level functions/procedures are possible
• they are usually 'scoped' via the module system
• functions are also objects
  ◦ try print.__class__ (or type(print))
• some functions are built in (print, len, ...)

Memory
• most program data is stored in 'memory'
  ◦ an array of byte-addressable data storage
  ◦ address space managed by the OS
  ◦ 32 or 64 bit numbers as addresses
• typically backed by RAM

Language vs Computer
• programs use high-level concepts
  ◦ objects, procedures, closures
  ◦ values can be passed around
• the computer has a single array of bytes
  ◦ and, well, a bunch of registers

Memory Management
• deciding where to store data
• high-level objects are stored in flat memory
  ◦ they have a given (usually fixed) size
  ◦ can contain references to other objects
  ◦ have limited lifespan

Memory Management Terminology
• object: an entity with an address and size
  ◦ not the same as language-level object
• lifetime: when is the object valid
  ◦ live: references exist to the object
  ◦ dead: the object is unreachable (garbage)

Memory Management by Type
• manual: malloc and free in C
• static automatic
  ◦ e.g. stack variables in C and C++
• dynamic automatic
  ◦ pioneered by LISP, widely used

Automatic Memory Management
• static vs dynamic
  ◦ when do we make decisions about lifetime
  ◦ compile time vs run time
• safe vs unsafe
  ◦ can the program read unused memory?

Object Lifetime
• the time between malloc and free
• another view: when is the object needed
  ◦ often impossible to tell
  ◦ can be safely over-approximated
  ◦ at the expense of memory leaks

Static Automatic
• usually binds lifetime to lexical scope
• no passing references up the call stack
  ◦ may or may not be enforced
• no lexical closures
• examples: C, C++

Dynamic Automatic
• over-approximate lifetime dynamically
• usually easiest for the programmer
  ◦ until you need to debug a space leak
• reference counting, mark & sweep collectors
• examples: Java, almost every dynamic language

Reference Counting
• attach a counter to each object
• whenever a reference is made, increase
• whenever a reference is lost, decrease
• the object is dead when the counter hits 0
• fails to reclaim reference cycles

Mark and Sweep
• start from a root set (in-scope variables)
• follow references, mark every object encountered
• sweep: throw away all unmarked memory
• usually stops the program while running
• garbage is retained until the GC runs

Memory Management in CPython
• primarily based on reference counting
• optional mark & sweep collector
  ◦ enabled by default
  ◦ configure via import gc
  ◦ reclaims cycles

Refcounting Advantages
• simple to implement in a 'managed' language
• reclaims objects quickly
• no need to pause the program
• easily made concurrent

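The split of work between reference counting and the cycle collector can be observed from Python itself. The following is a minimal sketch that assumes CPython (other implementations may free objects later); the class `Node` and both helper functions are hypothetical names made up for this illustration. A weak reference is used as a probe because it does not keep its target alive.

```python
import gc
import weakref


class Node:
    """Hypothetical object that may point at another Node."""
    def __init__(self):
        self.other = None


def acyclic_is_freed_immediately():
    node = Node()
    probe = weakref.ref(node)   # weak reference: does not keep node alive
    del node                    # last reference gone, counter hits 0
    return probe() is None      # on CPython the object is already dead


def cycle_needs_collector():
    was_enabled = gc.isenabled()
    gc.disable()                # keep the automatic collector out of the way
    try:
        a, b = Node(), Node()
        a.other, b.other = b, a         # reference cycle: a -> b -> a
        probe = weakref.ref(a)
        del a, b                        # counters stay above 0 due to the cycle
        alive_after_del = probe() is not None
        gc.collect()                    # explicit run of the cycle collector
        dead_after_collect = probe() is None
        return alive_after_del, dead_after_collect
    finally:
        if was_enabled:
            gc.enable()


print(acyclic_is_freed_immediately())
print(cycle_needs_collector())
```

The first helper shows the 'reclaims objects quickly' property, the second shows why the optional collector exists: the counters alone never reach zero for the two nodes.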
Refcounting Problems
• significant memory overhead
• problems with cache locality
• bad performance for data shared between threads
• fails to reclaim cyclic structures

Data Structures
• an abstract description of data
• leaves out low-level details
• makes writing programs easier
• makes reading programs easier, too

Building Data Structures
• there are two kinds of types in python
  ◦ built-in, implemented in C
  ◦ user-defined (includes libraries)
• both kinds are based on objects
  ◦ but built-ins only look that way

Mutability
• some objects can be modified
  ◦ we say they are mutable
  ◦ otherwise, they are immutable
• immutability is an abstraction
  ◦ physical memory is always mutable
• in python, immutability is not 'recursive'

Built-in: int
• arbitrary precision integer
  ◦ no overflows and other nasty behaviour
• it is an object, i.e. held by reference
  ◦ uniform with any other kind of object
  ◦ immutable
• both of the above make it slow
  ◦ machine integers only in C-based modules

Additional Numeric Objects
• bool: True or False
  ◦ how much is True + True?
  ◦ is 0 true? is empty string?
• numbers.Real: floating point numbers
• numbers.Complex: a pair of above

Built-in: bytes
• a sequence of bytes (raw data)
• exists for efficiency reasons
  ◦ in the abstract is just a tuple
• models data as stored in files
  ◦ or incoming through a socket
  ◦ or as stored in raw memory

Properties of bytes
• can be indexed and iterated
  ◦ both create objects of type int
  ◦ try this sequence: id(x[1]), id(x[2])
• mutable version: bytearray
  ◦ the equivalent of C char arrays

Built-in: str
• immutable Unicode strings
  ◦ not the same as bytes
  ◦ bytes must be decoded to obtain str
  ◦ (and str encoded to obtain bytes)
• stored with 1, 2 or 4 bytes per code point in CPython (PEP 393)
  ◦ implemented in PyCompactUnicodeObject

Built-in: tuple
• an immutable sequence type
  ◦ the number of elements is fixed
  ◦ so is the type of each element
• but elements themselves may be mutable
  ◦ x = [] then y = (x, 0)
  ◦ x.append(1) → y == ([1], 0)
• implemented as a C array of object references

Built-in: list
• a mutable version of tuple
  ◦ items can be assigned x[3] = 5
  ◦ items can be append-ed
• implemented as a dynamic array
  ◦ many operations are amortised O(1)
  ◦ insert is O(n)

Built-in: dict
• implemented as a hash table
• some of the most performance-critical code
  ◦ dictionaries appear everywhere in python
  ◦ heavily hand-tuned C code
• both keys and values are objects

Hashes and Mutability
• dictionary keys must be hashable
  ◦ this implies recursive immutability
• what would happen if a key is mutated?
  ◦ most likely the hash would change
  ◦ all hash tables with the key become invalid
  ◦ this would be very expensive to fix

Built-in: set
• implements the math concept of a set
• also a hash table, but with keys only
  ◦ a separate C implementation
• mutable - items can be added
  ◦ but they must be hashable
  ◦ hence cannot be changed

Built-in: frozenset
• an immutable version of set
• always hashable (since all items must be)
  ◦ can appear in set or another frozenset
  ◦ can be used as a key in dict
• the C implementation is shared with set

Efficient Objects: __slots__
• fixes the attribute names allowed in an object
• saves memory: consider 1-attribute object
  ◦ with __dict__: 56 + 112 bytes
  ◦ with __slots__: 48 bytes
• makes code faster: no need to hash anything
  ◦ more compact in memory → better cache efficiency

Part 2: Text, JSON, SQL and Persistence

Transient Data
• lives in program memory
• data structures, objects
• interpreter state
• often implicit manipulation
• more on this next week

Persistent Data
• (structured) text or binary files
• relational (SQL) databases
• object and 'flat' databases (NoSQL)
• manipulated explicitly

Persistent Storage
• 'local' file system
  ◦ stored on HDD, SSD, ...
  ◦ stored somewhere in a local network
• 'remote', using an application-level protocol
  ◦ local or remote databases
  ◦ cloud storage &c.

Reading Files
• opening files: open('file.txt', 'r')
• files can be iterated
    f = open( 'file.txt', 'r' )
    for line in f:
        print( line )

Resource Acquisition
• plain open is prone to resource leaks
  ◦ what happens during an exception?
  ◦ holding a file open is not free
• pythonic solution: with blocks
  ◦ defined in PEP 343
  ◦ binds resources to scopes

Detour2: PEP
• PEP stands for Python Enhancement Proposal
• akin to RFC documents managed by IETF
• initially formalise future changes to python
  ◦ later serve as documentation for the same

Using with
    with open('/etc/passwd', 'r') as f:
        for line in f:
            do_stuff( line )
• still safe if do_stuff raises an exception

Finalizers
• there is a __del__ method
• but it is not guaranteed to run
  ◦ it may run arbitrarily late
  ◦ or never
• not very good for resource management

Context Managers
• with has an associated protocol
• you can use with on any context manager
• which is an object with __enter__ and __exit__
• you can create your own

Representing Text
• ASCII: one byte = one character
  ◦ total of 128 different characters
  ◦ not very universal
• 8-bit encodings: 256 characters
• multi-byte encodings for non-Latin scripts

Unicode
• one character encoding to rule them all
• supports all extant scripts and writing systems
  ◦ and a whole bunch of dead scripts, too
• collation, segmentation, comparison, ...
• approx. 137000 code points

Code Point
• basic unit of encoding characters
• letters, punctuation, symbols
• combining diacritical marks
• not the same thing as a character
• code points range from 0 to 10FFFF

Unicode Encodings
• deals with representing code points
• UCS = Universal Coded Character Set
  ◦ fixed-length encoding
  ◦ two variants: UCS-2 (16 bit) and UCS-4 (32 bit)
• UTF = Unicode Transformation Format
  ◦ variable-length encoding
  ◦ variants: UTF-8, UTF-16 and UTF-32

Grapheme
• technically 'extended grapheme cluster'
• a logical character, as expected by users
  ◦ encoded using 1 or more code points
• multiple encodings of the same grapheme
  ◦ e.g. composed vs decomposed
  ◦ U+0041 U+0300 vs U+00C0: À vs À

Segmentation
• breaking text into smaller units
  ◦ graphemes, words and sentences
• algorithms defined by the Unicode spec
  ◦ Unicode Standard Annex #29
  ◦ graphemes and words are quite reliable
  ◦ sentences not so much (too much ambiguity)

Normal Form
• Unicode defines 4 canonical (normal) forms
  ◦ NFC, NFD, NFKC, NFKD
  ◦ NFC = Normal Form Composed
  ◦ NFD = Normal Form Decomposed
• K variants = looser, lossy conversion
• all normalization is idempotent
• NFC does not give you 1 code point per grapheme

str vs bytes
• iterating bytes gives individual bytes
  ◦ indexing is fast - fixed-size elements
• iterating str gives code points
  ◦ slightly slower, since code points can be wider than 1 byte
  ◦ does not iterate over graphemes
• going back and forth: str.encode, bytes.decode

Python vs Unicode
• no native support for Unicode segmentation
  ◦ hence no grapheme iteration or word splitting
• convert everything into NFC and hope for the best
  ◦ unicodedata.normalize()
  ◦ will sometimes break (we'll discuss regexes in a bit)
  ◦ most people don't bother
  ◦ correctness is overrated → worse is better

Regular Expressions
• compiling: r = re.compile( r"key: (.*)" )
• matching: m = r.match( "key: some value" )
• extracting captures: print( m.group( 1 ) )
  ◦ prints some value
• substitutions: s2 = re.sub( r"\s*$", "", s1 )
  ◦ strips all trailing whitespace in s1

Detour: Raw String Literals
• the r in r"..." stands for raw (not regex)
• normally, \ is magical in strings
  ◦ but \ is also magical in regexes
  ◦ nobody wants to write \\s &c.
  ◦ not to mention \\\\ to match a literal \
• not super useful outside of regexes

Detour2: Other Literal Types
• byte strings: b"abc" → bytes
• formatted string literals: f"x {y}"
    x = 12
    print( f"x = {x}" )
• triple-quote literals: """xy"""

Regular Expressions vs Unicode
    import re
    s = "\u0041\u0300" # À (decomposed)
    t = "\u00c0" # À (composed)
    print( s, t )
    # illustrative patterns: '.' matches one code point, not one grapheme
    print( re.match( r".$", s ), re.match( r".$", t ) )
    # '\w' does not match the combining mark
    print( re.match( r"\w$", s ), re.match( r"\w$", t ) )
    # a literal composed À only matches the composed form
    print( re.match( "À", s ), re.match( "À", t ) )

Regexes and Normal Forms
• some of the problems can be fixed by NFC
  ◦ some go away completely (literal Unicode matching)
  ◦ some become rarer (the "." and "\w" problems)
• most text in the wild is already in NFC
  ◦ but not all of it
  ◦ case in point: filenames on macOS (NFD)

Decomposing Strings
• recall that str is immutable
• splitting: str.split(':')
  ◦ None = split on any whitespace
• split on first delimiter: partition
• better whitespace stripping: s2 = s1.strip()
  ◦ also lstrip() and rstrip()

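The 'normalize to NFC and hope for the best' advice, combined with the string-decomposition methods above, might look as follows. This is only a sketch: the helper names `nfc`, `is_single_word_char` and `parse_header` are invented for the example, and the 'key: value' input format is an assumption, not something the course prescribes.

```python
import re
import unicodedata

DECOMPOSED = "A\u0300"   # LATIN CAPITAL LETTER A + COMBINING GRAVE ACCENT
COMPOSED = "\u00c0"      # LATIN CAPITAL LETTER A WITH GRAVE


def nfc(text):
    """Return text in Normal Form Composed."""
    return unicodedata.normalize("NFC", text)


def is_single_word_char(text):
    """True if the whole string is matched by exactly one \\w."""
    return re.fullmatch(r"\w", text) is not None


def parse_header(line):
    """Split 'key: value' on the first colon and strip whitespace."""
    key, _, value = nfc(line).partition(":")
    return key.strip(), value.strip()


# the two spellings render the same but compare unequal until normalized
print(DECOMPOSED == COMPOSED, nfc(DECOMPOSED) == COMPOSED)
# code point counts differ: iteration and regexes see code points
print(len(DECOMPOSED), len(nfc(DECOMPOSED)))
print(is_single_word_char(DECOMPOSED), is_single_word_char(nfc(DECOMPOSED)))
print(parse_header("  title :  A\u0300 la carte "))
```

Normalizing once at the input boundary (here inside `parse_header`) keeps the rest of the program free of mixed forms, at the cost of not preserving the original byte sequence.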
Searching and Matching
• startswith and endswith
  ◦ often convenient shortcuts
• find and index
  ◦ generic substring search
  ◦ find returns -1 on failure, index raises ValueError

Building Strings
• format literals and str.format
• str.replace - substring search and replace
• str.join - turn lists of strings into a string

JSON
• structured, text-based data format
• atoms: integers, strings, booleans
• objects (dictionaries), arrays (lists)
• widely used around the web &c.
• simple (compared to XML or YAML)

JSON: Example
    {
        "composer": [ "Bach, Johann Sebastian" ],
        "key": "g",
        "voices": { "1": "oboe", "2": "bassoon" }
    }

JSON: Writing
• printing JSON seems straightforward enough
• but: double quotes in strings
• strings must be properly \-escaped during output
• also pesky commas
• keeping track of indentation for human readability
• better use an existing library: 'import json'

JSON in Python
• json.dumps = short for dump to string
• python dict/list/str/... data comes in
• a string with valid JSON comes out
Workflow
• just convert everything to dict and list
• run json.dumps or json.dump( data, file )

Python Example
    import json, sys
    d = {}
    d["composer"] = ["Bach, Johann Sebastian"]
    d["key"] = "g"
    d["voices"] = { 1: "oboe", 2: "bassoon" }
    json.dump( d, sys.stdout, indent=4 )
Beware: keys are always strings in JSON

Parsing JSON
• import json
• json.load is the counterpart to json.dump from above
  ◦ de-serialise data from an open file
  ◦ builds lists, dictionaries, etc.
• json.loads corresponds to json.dumps

XML
• meant as a lightweight and consistent redesign of SGML
  ◦ turned into a very complex format
• heaps of invalid XML floating around
  ◦ parsing real-world XML is a nightmare
  ◦ even valid XML is pretty challenging

XML: Example
    <Order OrderDate="1999-10-20">
      … Ellen Adams … 123 Maple Street … Lawnmower … 1 …

XML: Another Example
    <OBSAH>25 bodů</OBSAH> … 72873 … 20160111104208 … 395879 …

XML Features
• offers extensible, rich structure
  ◦ tags, attributes, entities
  ◦ suited for structured hierarchical data
• schemas: use XML to describe XML
  ◦ allows general-purpose validators
  ◦ self-documenting to a degree

XML vs JSON
• both work best with trees
• JSON has basically no features
  ◦ basic data structures and that's it
• JSON data is ad-hoc and usually undocumented
  ◦ but: this often happens with XML anyway

XML Parsers
• DOM = Document Object Model
• SAX = Simple API for XML
• expat = fast SAX-like parser (but not SAX)
• ElementTree = DOM-like but more pythonic

XML: DOM
• read the entire XML document into memory
• exposes the AST (Abstract Syntax Tree)
• allows things like XPath and CSS selectors
• the API is somewhat clumsy in python

XML: SAX
• event-driven XML parsing
• much more efficient than DOM
  ◦ but often harder to use
• only useful in python for huge XML files
  ◦ otherwise just use ElementTree

XML: ElementTree
    for child in root:
        print( child.tag, child.attrib )
    # Order { OrderDate: "1999-10-20" }
• supports tree walking, XPath
• supports serialization too

NoSQL / Non-relational Databases
• umbrella term for a number of approaches
  ◦ flat key/value and column stores
  ◦ document and graph stores
• no or minimal schemas
• non-standard query languages

Key-Value Stores
• usually very fast and very simple
• completely unstructured values
• keys are often database-global
  ◦ workaround: prefixes for namespacing
  ◦ or: multiple databases

NoSQL & Python
• redis (redis-py) module (Redis is Key-Value)
• memcached
(another Key-Value store)
• PyMongo for talking to MongoDB (document-oriented)
• CouchDB (another document-oriented store)
• neo4j or cayley (module pyley) for graph structures

SQL and RDBMS
• SQL = Structured Query Language
• RDBMS = Relational DataBase Management System
• SQL is to NoSQL what XML is to JSON
• heavily used and extremely reliable

SQL: Example
    select name, grade from student;
    select name from student where grade < 'C';
    insert into student ( name, grade )
        values ( 'Random X. Student', 'C' );
    select * from student
        join enrollment on student.id = enrollment.student
        join group on group.id = enrollment.group;

SQL: Relational Data
• JSON and XML are hierarchical
  ◦ or built from functions if you like
• SQL is relational
  ◦ relations = generalized functions
  ◦ can capture more structure
  ◦ much harder to efficiently process

SQL: Data Definition
• mandatory, unlike XML or JSON
• gives the data a rather rigid structure
• tables (relations) and columns (attributes)
• static data types for columns
• additional consistency constraints

SQL: Constraints
• help ensure consistency of the data
• foreign keys: referential integrity
  ◦ ensures there are no dangling references
  ◦ but: does not prevent accidental misuse
• unique constraints
• check constraints: arbitrary consistency checks

SQL: Query Planning
• an RDBMS makes heavy use of indexing
  ◦ using B trees, hashes and similar techniques
  ◦ indices are used automatically
• all the heavy lifting is done by the backend
  ◦ highly-optimized, low-level code
  ◦ efficient handling of large data

SQL: Reliability and Flexibility
• most RDBMS give ACID guarantees
  ◦ transparently solves a lot of problems
  ◦ basically impossible with normal files
• support for schema alterations
  ◦ alter table and similar
  ◦ nearly impossible in ad-hoc systems

SQLite
• lightweight in-process SQL engine
• the entire database is in a single file
• convenient python module, sqlite3
• stepping stone for a "real" database

Other Databases
• you can talk to most SQL DBs using python
• postgresql (psycopg2, ...)
• mysql / mariadb (mysql-python, mysql-connector, ...)
• big & expensive: Oracle (cx_oracle), DB2 (pyDB2)
• most of those are much more reliable than SQLite

SQL Injection
    sql = "SELECT * FROM t WHERE name = '" + n + "'"
• the above code is bad, never do it
• consider the following
    n = "x'; drop table students --"
    n = "x'; insert into passwd (user, pass) ..."

Avoiding SQL Injection
• use proper SQL-building APIs
  ◦ this takes care of escaping internally
• templates like insert ... values (?, ?)
  ◦ the ? get safely substituted by the module
  ◦ e.g. the execute method of a cursor

PEP 249
• informational PEP, for library writers
• describes how database modules should behave
  ◦ ideally, all SQL modules have the same interface
  ◦ makes it easy to swap a database backend
• but: SQL itself is not 100% portable

SQL Pitfalls
• sqlite does not enforce all constraints
  ◦ you need to pragma foreign_keys = on
• no portable syntax for autoincrement keys
• not all (column) types are supported everywhere
• no portable way to get the key of last insert

More Resources & Stuff to Look Up
• SQL: https://www.w3schools.com/sql/
• https://docs.python.org/3/library/sqlite3.html
• Object-Relational Mapping
• SQLAlchemy: constructing portable SQL

Part 3: Advanced Constructs

Callable Objects
• user-defined functions (module-level def)
• user-defined methods (instance and class)
• built-in functions and methods
• class objects
• objects with a __call__ method

User-defined Functions
• come about from a module-level def
• metadata: __doc__, __name__, __module__
• scope: __globals__, __closure__
• arguments: __defaults__, __kwdefaults__
• type annotations: __annotations__
• the code itself: __code__

Positional and Keyword Arguments
• user-defined functions have positional arguments
• and keyword arguments
  ◦ print("hello", file=sys.stderr)
  ◦ arguments are passed by name
  ◦ which style is used is up to the caller
• variadic functions: def foo(*args, **kwargs)
  ◦ args is a tuple of unmatched positional args
  ◦ kwargs is a dict of unmatched keyword args

Lambdas
• def functions must have a name
• lambdas provide anonymous functions
• the body must be an expression
• syntax: lambda x: print("hello", x)
• standard user-defined functions otherwise

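The different kinds of callables and argument passing from the last few slides can be tried out in a few lines. This is a small sketch; `describe`, `double` and `Greeter` are hypothetical names chosen for the example.

```python
def describe(first, *args, sep=", ", **kwargs):
    """Join positional and keyword arguments into one string."""
    parts = [str(first)]
    parts += [str(a) for a in args]                    # args is a tuple
    parts += [f"{k}={v}" for k, v in kwargs.items()]   # kwargs is a dict
    return sep.join(parts)


double = lambda x: 2 * x    # the body is a single expression


class Greeter:
    """Instances are callable because the class defines __call__."""
    def __init__(self, greeting):
        self.greeting = greeting

    def __call__(self, name):
        return f"{self.greeting} {name}"


print(describe(1, 2, 3, unit="cm"))      # unmatched keyword lands in kwargs
print(describe("a", "b", sep=" | "))     # sep is matched by name
print(double(21), double.__name__)       # a lambda has no name of its own
print(describe.__doc__, describe.__kwdefaults__)
print(Greeter("hello")("you"))           # class call, then instance call
```

Note that `sep` is keyword-only here (it follows `*args`), which is why its default shows up in `__kwdefaults__` and not in `__defaults__`.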
Instance Methods
• comes about as object.method
  ◦ print(x.foo) → <bound method ...>
• combines the class, instance and function itself
• __func__ is a user-defined function object
• let bar = x.foo, then
  ◦ x.foo() → bar.__func__(bar.__self__)

Iterators
• objects with __next__ (since 3.x)
  ◦ iteration ends on raise StopIteration
• iterable objects provide __iter__
  ◦ sometimes, this is just return self
  ◦ any iterable can appear in for x in iterable

    class FooIter:
        def __init__(self):
            self.x = 10
        def __iter__(self):
            return self
        def __next__(self):
            if self.x:
                self.x -= 1
            else:
                raise StopIteration
            return self.x

Generators (PEP 255)
• written as a normal function or method
• they use yield to generate a sequence
• represented as special callable objects
  ◦ exist at the C level in CPython
    def foo(*lst):
        for i in lst:
            yield i + 1
    list(foo(1, 2)) # gives [2, 3]

yield from
• calling a generator produces a generator object
• how do we call one generator from another?
• same as for x in foo(): yield x
    def bar(*lst):
        yield from foo(*lst)
        yield from foo(*lst)
    list(bar(1, 2)) # gives [2, 3, 2, 3]

Native Coroutines (PEP 492)
• created using async def (since Python 3.5)
• generalisation of generators
  ◦ yield from is replaced with await
  ◦ an __await__ magic method is required
• a coroutine can be suspended and resumed

Coroutine Scheduling
• coroutines need a scheduler
• one is available from asyncio.get_event_loop()
• along with many coroutine building blocks
• coroutines can actually run concurrently
  ◦ via asyncio.create_task (since 3.7)
  ◦ via asyncio.gather

Async Generators (PEP 525)
• async def + yield
• semantics like simple generators
• but also allows await
• iterated with async for
  ◦ async for runs sequentially

Decorators
• written as @decor before a function definition
• decor is a regular function (def decor(f))
  ◦ f is bound to the decorated function
  ◦ the decorated function becomes the result of decor
• classes can be decorated too
• you can 'create' decorators at runtime
  ◦ @mkdecor("moo") (mkdecor returns the decorator)
  ◦ you can stack decorators

    def decor(f): return lambda: print("bar")
    def mkdecor(s): return lambda g: lambda: print(s)
    @decor
    def foo(f): print("foo")
    @mkdecor("moo")
    def moo(f): print("foo")
    # foo() prints "bar", moo() prints "moo"

List Comprehension
• a concise way to build lists
• combines a filter and a map
    [ 2 * x for x in range(10) ]
    [ x for x in range(10) if x % 2 == 1 ]
    [ 2 * x for x in range(10) if x % 2 == 1 ]
    [ (x, y) for x in range(3) for y in range(2) ]

Operators
• operators are (mostly) syntactic sugar
• x < y rewrites to x.__lt__(y)
• is and is not are special
  ◦ are the operands the same object?
  ◦ also the ternary (conditional) operator

Non-Operator Builtins
• len(x) -> x.__len__() (length)
• abs(x) -> x.__abs__() (magnitude)
• str(x) -> x.__str__() (printing)
• repr(x) -> x.__repr__() (printing for eval)
• bool(x) and if x: x.__bool__()

Arithmetic
• a standard selection of operators
• / is floating point, // is integral
• += and similar are somewhat magical
  ◦ x += y -> x = x.__iadd__(y) if defined
  ◦ otherwise x = x.__add__(y)

    x = 7            # an int is immutable
    x += 3           # works, x = 10, id(x) changes
    lst = [7, 3]
    lst[0] += 3      # works too, id(lst) stays same
    tup = (7, 3)     # a tuple is immutable
    tup += (1, 1)    # still works (id changes)
    tup[0] += 3      # fails

Relational Operators
• operands can be of different types
• equality: !=, ==
  ◦ by default uses object identity
• ordering: <, <=, >, >= (TypeError by default)
• consistency is not enforced

Relational Consistency
• __eq__ must be an equivalence relation
• x.__ne__(y) must be the same as not x.__eq__(y)
• __lt__ must be an ordering relation
  ◦ compatible with __eq__
  ◦ consistent with each other
• each operator is separate (mixins can help)
  ◦ or perhaps a class decorator

Collection Operators
• in is also a membership operator (outside for)
  ◦ implemented as __contains__
• indexing and slicing operators
  ◦ del x[y] -> x.__delitem__(y)
  ◦ x[y] -> x.__getitem__(y)
  ◦ x[y] = z -> x.__setitem__(y, z)

Conditional Operator
• also known as a ternary operator
• written x if cond else y
  ◦ in C: cond ? x : y
• forms an expression, unlike if
  ◦ can e.g. appear in a lambda
  ◦ or in function arguments, &c.
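
One class decorator that fits the 'Relational Consistency' remark is functools.total_ordering from the standard library. The slides do not name a specific decorator, so treat this choice as an assumption. The Version class and the newer helper below are hypothetical; the sketch only shows how two hand-written operators can be extended to a consistent set.

```python
from functools import total_ordering

@total_ordering
class Version:
    # hypothetical example class, not part of the course material
    def __init__(self, major, minor):
        self.key = (major, minor)

    def __eq__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self.key == other.key

    def __lt__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self.key < other.key

a, b = Version(1, 2), Version(1, 3)
# <=, > and >= are derived by the decorator from __eq__ and __lt__
print(a <= b, a > b, a != b)      # True False True

# the conditional operator is an expression, so it fits inside a lambda
newer = lambda x, y: x if x >= y else y
print(newer(a, b) is b)           # True
```

Returning NotImplemented for foreign types lets Python try the reflected operation on the other operand, and raise TypeError for ordering comparisons if that fails too.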

Concurrency & Parallelism
• threading - thread-based parallelism
• multiprocessing
• concurrent - future-based programming
• subprocess
• sched, a general-purpose event scheduler
• queue, for sending objects between threads

Threading
• low-level thread support, module threading
• Thread objects represent actual threads
  ◦ threads provide start() and join()
  ◦ the run() method executes in a new thread
• mutexes, semaphores &c.

The Global Interpreter Lock
• memory management in CPython is not thread-safe
  ◦ Python code runs under a global lock
  ◦ pure Python code cannot use multiple cores
• C code usually runs without the lock
  ◦ this includes numpy crunching

Multiprocessing
• like threading but uses processes
• works around the GIL
  ◦ each worker process has its own interpreter
• queued/sent objects must be pickled
  ◦ see also: the pickle module
  ◦ this causes substantial overhead
  ◦ functions, classes &c. are pickled by name

Futures
• like coroutine await but for subroutines
• a Future can be waited for using f.result()
• scheduled via concurrent.futures.Executor
  ◦ Executor.map is like asyncio.gather
  ◦ Executor.submit is like asyncio.create_task
• implemented using process or thread pools

Exceptions
• an exception interrupts normal control flow
• it's called an exception because it is exceptional
  ◦ never mind StopIteration
• causes methods to be interrupted
  ◦ until a matching except block is found
  ◦ also known as stack unwinding

Life Without Exceptions

    int fd = socket( ... );
    if ( fd < 0 )
        ... /* handle errors */
    if ( bind( fd, ... ) < 0 )
        ... /* handle errors */
    if ( listen( fd, 5 ) < 0 )
        ... /* handle errors */

With Exceptions

    try:
        sock = socket.socket( ... )
        sock.bind( ... )
        sock.listen( ... )
    except ...:
        # handle errors

Exceptions vs Resources

    x = open( "file.txt" )
    # stuff
    raise SomeError

• who calls x.close()?
• this would be a resource leak

Using finally

    x = open( "file.txt" )
    try:
        # stuff
    finally:
        x.close()

• works, but tedious and error-prone

Using with

    with open( "file.txt" ) as f:
        # stuff

• with takes care of the finally and close
• with x as y sets y = x.__enter__()
  ◦ and calls x.__exit__(...) when leaving the block

The @property decorator
• attribute syntax is the preferred one in Python
• writing useless setters and getters is boring

    class Foo:
        @property
        def x(self):
            return 2 * self.a
        @x.setter
        def x(self, v):
            self.a = v // 2

Execution Stack
• made up of activation frames
• holds local variables
• and return addresses
• in dynamic languages, often lives in the heap

Variable Capture
• variables are captured lexically
• definitions are a dynamic / run-time construct
  ◦ a nested definition is executed
  ◦ creates a closure object
• always by reference in Python
  ◦ but can be by-value in other languages

Using Closures
• closures can be returned, stored and called
  ◦ they can be called multiple times, too
  ◦ they can capture arbitrary variables
• closures naturally retain state
• this is what makes them powerful

Objects from Closures
• so closures are essentially code + state
• wait, isn't that what an object is?
• indeed, you can implement objects using closures

The Role of GC
• memory management becomes a lot more complicated
• forget C-style 'automatic' stack variables
• this is why the stack is actually in the heap
• this can go as far as forming reference cycles

Coroutines
• coroutines are a generalisation of subroutines
• they can be suspended and re-entered
• coroutines can be closures at the same time
• the code of a coroutine is like a function
• a suspended coroutine is like an activation frame

Yield
• suspends execution and 'returns' a value
• may also obtain a new value (cf. send)
• when re-entered, continue where we left off

    for i in range(5):
        yield i

Send
• with yield, we have one-way communication
• but in many cases, we would like two-way
• a suspended coroutine is an object in Python
  ◦ with a send method which takes a value
  ◦ send re-enters the coroutine

Yield From and Await
• yield from is mostly a generator concept
• await basically does the same thing
  ◦ call out to another coroutine
  ◦ when it suspends, so does the entire stack

Suspending Native Coroutines
• this is not actually possible
  ◦ not with async-native syntax anyway
• you need a yield
  ◦ for that, you need a generator
  ◦ use the types.coroutine decorator

Event Loop
• not required in theory
• useful also without coroutines
• there is a synergistic effect
  ◦ event loops make coroutines easier
  ◦ coroutines make event loops easier

Part 4: Math and Statistics

Numbers in Python
• recall that numbers are objects
• a tuple of real numbers has 300% overhead
  ◦ compared to a C array of float values
  ◦ and 350% for integers
• this causes extremely poor cache use
• integers are arbitrary-precision

Math in Python
• numeric data usually means arrays
  ◦ this is inefficient in python
• we need a module written in C
  ◦ but we don't want to do that ourselves
• enter the SciPy project
  ◦ pre-made numeric and scientific packages

The SciPy Family
• numpy: data types, linear algebra
• scipy: more computational machinery
• pandas: data analysis and statistics
• matplotlib: plotting and graphing
• sympy: symbolic mathematics

Aside: External Libraries
• until now, we only used bundled packages
• for math, we will need external libraries
• you can use pip to install those
  ◦ use pip install --user

Aside: The Python Package Index
• colloquially known as PyPI (or cheese shop)
  ◦ do not confuse with PyPy (Python in almost-Python)
• both source packages and binaries
  ◦ the latter known as wheels (PEP 427, 491)
  ◦ previously python eggs

Aside: Installing numpy
• the easiest way may be with pip
  ◦ this would be pip3 on aisa
• linux distributions usually also have packages
• another option is getting the Anaconda bundle
• detailed instructions on https://scipy.org

Arrays in numpy
• compact, C-implemented data types
• flexible multi-dimensional arrays
• easy and efficient re-shaping
  ◦ typically without copying the data

Entering Data
• most data is stored in numpy.array
• can be constructed from a list
  ◦ a list of lists for 2D arrays
• or directly loaded from / stored to a file
  ◦ binary: numpy.load, numpy.save
  ◦ text: numpy.loadtxt, numpy.savetxt

LAPACK and BLAS
• BLAS is a low-level vector/matrix package
• LAPACK is built on top of BLAS
  ◦ provides higher-level operations
  ◦ tuned for modern CPUs with multiple caches
• both are written in Fortran
  ◦ ATLAS and C-LAPACK are C implementations

Element-wise Functions
• the basic math function arsenal
• powers, roots, exponentials, logarithms
• trigonometric (sin, cos, tan,...)
• hyperbolic (sinh, cosh, tanh,...)
• cyclometric (arcsin, arccos, arctan,...)

Matrix Operations in numpy
• import numpy.linalg
• multiplication, inversion, rank
• eigenvalues and eigenvectors
• linear equation solver
• pseudo-inverses, linear least squares

Additional Linear Algebra in scipy
• import scipy.linalg
• LU, QR, polar, etc. decomposition
• matrix exponentials and logarithms
• matrix equation solvers
• special operations for banded matrices

Where is my Gaussian Elimination?
• used in lots of school linear algebra
• but not the most efficient algorithm
• a few problems with numerical stability
• not directly available in numpy

Numeric Stability
• floats are imprecise / approximate

    0.1**2 == 0.01          # False
    1 / ( 0.1**2 - 0.01 )   # 5.8·10^17

• multiplication is not associative

    a = (0.1 * 0.1) * 10
    b = 0.1 * (0.1 * 10)
    1 / ( a - b )           # 7.21·10^16

• iteration amplifies the errors

LU Decomposition
• decompose matrix A into simpler factors
• PA = LU where
  ◦ P is a permutation matrix
  ◦ L is a lower triangular matrix
  ◦ U is an upper triangular matrix
• fast and numerically stable

Uses for LU
• equations, determinant, inversion,...
• as an example
  ◦ $\det(A) = \det(P^{-1}) \cdot \det(L) \cdot \det(U)$
  ◦ where $\det(U) = \prod_i U_{ii}$ and
  ◦ $\det(L) = \prod_i L_{ii}$

Numeric Math
• float arithmetic is messy but incredibly fast
• measured data is approximate anyway
• stable algorithms exist for many things
  ◦ and are available from libraries
• we often don't care about exactness
  ◦ think computer graphics, signal analysis,...

Symbolic Math
• numeric math sucks for 'textbook' math
• there are problems where exactness matters
  ◦ pure math and theoretical physics
• incredibly slow computation
  ◦ but much cleaner interpretation

Linear Algebra in sympy
• uses exact math
  ◦ e.g. arbitrary precision rationals
  ◦ and roots thereof
  ◦ and many other computable numbers
• wide repertoire of functions
  ◦ including LU, QR, etc. decompositions

Exact Rationals in sympy

    from sympy import *
    a = QQ( 1 ) / 10   # QQ = rationals
    Matrix( [ [ sqrt( a**3 ), 0, 0 ],
              [ 0, sqrt( a**3 ), 0 ],
              [ 0, 0, 1 ] ] ).det()
    # result: 1/1000

numpy for Comparison

    import numpy as np
    import numpy.linalg as la
    a = 0.1
    la.det( [ [ np.sqrt( a**3 ), 0, 0 ],
              [ 0, np.sqrt( a**3 ), 0 ],
              [ 0, 0, 1 ] ] )
    # result: 0.0010000000000000002

General Solutions in Symbolic Math

    from sympy import *
    x = symbols( 'x' )
    Matrix( [ [ x, 0, 0 ],
              [ 0, 1, 0 ],
              [ 0, 0, x ] ] ).det()
    # result: x ** 2

Symbolic Differentiation

    x = symbols( 'x' )
    diff( x**2 + 2*x + log( x/2 ) )
    # result: 2*x + 2 + 1/x
    diff( x**2 * exp(x) )
    # result: x**2 * exp( x ) + 2 * x * exp( x )

Algebraic Equations

    solve( x**2 - 7 )
    # result: [ -sqrt( 7 ), sqrt( 7 ) ]
    solve( x**2 - exp( x ) )
    # result: [ -2 * LambertW( -1/2 ) ]
    solve( x**4 - x )
    # result: [ 0, 1, -1/2 - sqrt(3) * I/2,
    #           -1/2 + sqrt(3) * I/2 ] ; I**2 = -1

Ordinary Differential Equations

    f = Function( 'f' )
    dsolve( f( x ).diff( x ) )           # f'(x) = 0
    # result: Eq( f( x ), C1 )
    dsolve( f( x ).diff( x ) - f(x) )    # f'(x) = f(x)
    # result: Eq( f( x ), C1 * exp( x ) )
    dsolve( f( x ).diff( x ) + f(x) )    # f'(x) = -f(x)
    # result: Eq( f( x ), C1 * exp( -x ) )

Symbolic Integration

    integrate( x**2 )            # result: x**3 / 3
    integrate( log( x ) )        # result: x * log( x ) - x
    integrate( cos( x ) ** 2 )   # result: x/2 + sin( x ) * cos( x ) / 2

Numeric Sparse Matrices
• sparse = most elements are 0
• available in scipy.sparse
• special data types (not numpy arrays)
  ◦ do not use numpy functions on those
• less general, but more compact and faster

Fourier Transform
• continuous: $\hat{f}(\xi) = \int_{-\infty}^{\infty} f(x) \exp(-2\pi i x \xi) \, dx$
• series: $f(x) = \sum_{n=-\infty}^{\infty} c_n \exp\left(i \frac{2\pi n x}{P}\right)$
• real series:
  ◦ $f(x) = \frac{a_0}{2} + \sum_{n=1}^{\infty} \left( a_n \sin\left(\frac{2\pi n x}{P}\right) + b_n \cos\left(\frac{2\pi n x}{P}\right) \right)$

Discrete Fourier Transform
• available in numpy.fft
• goes between time and frequency domains
• a few different variants are covered
  ◦ real-valued input (for signals, rfft)
  ◦ inverse transform (ifft, irfft)
  ◦ multiple dimensions (fft2, fftn)

Polynomial Series
• the numpy.polynomial package
• Chebyshev, Hermite, Laguerre and Legendre
  ◦ arithmetic, calculus and special-purpose operations
  ◦ numeric integration using Gaussian quadrature
  ◦ fitting (polynomial regression)

Statistics in numpy
• a basic statistical toolkit
  ◦ averages, medians
  ◦ variance, standard deviation
  ◦ histograms
• random sampling and distributions

Linear Regression
• very fast model-fitting method
  ◦ both in computational and human terms
  ◦ quick and dirty first approximation
• widely used in data interpretation
  ◦ biology and sociology statistics
  ◦ finance and economics, especially prediction

Polynomial Regression
• higher-order variant of linear regression
• can capture acceleration or deceleration
• harder to use and interpret
  ◦ also harder to compute
• usually requires a model of the data

Interpolation
• find a line or curve that approximates data
• it must pass through the data points
  ◦ this is a major difference from regression
• more dangerous than regression
  ◦ runs a serious risk of overfitting

Linear and Polynomial Regression, Interpolation
• regressions using the least squares method
  ◦ linear: numpy.linalg.lstsq
  ◦ polynomial: numpy.polyfit
• interpolation: scipy.interpolate
  ◦ e.g. piecewise cubic splines
  ◦ Lagrange interpolating polynomials

Pandas: Data Analysis
• the Python equivalent of R
  ◦ works with tabular data (CSV, SQL, Excel)
  ◦ time series (also variable frequency)
  ◦ primarily works with floating-point values
• partially implemented in C and Cython

Pandas Series and DataFrame
• Series is a single sequence of numbers
• DataFrame represents tabular data
  ◦ powerful indexing operators
  ◦ index by column -> series
  ◦ index by condition -> filtering

Pandas Example

    import pandas as pd
    scores = [ ('Maxine', 12), ('John', 12), ('Sandra', 10) ]
    cols = [ 'name', 'score' ]
    df = pd.DataFrame( data=scores, columns=cols )
    df['score'].max()          # 12
    df[ df['score'] >= 12 ]    # Maxine and John

Part 5: Communication, HTTP & asyncio

Running Programs (the old way)
• os.system is about the simplest
  ◦ also somewhat dangerous - shell injection
  ◦ you only get the exit code
• os.popen allows you to read output of a program
  ◦ alternatively, you can send input to the program
  ◦ you can't do both (would likely deadlock anyway)
  ◦ runs the command through a shell, same as os.system

Low-level Process API
• POSIX-inherited interfaces (on POSIX systems)
• os.exec: replace the current process
• os.fork: split the current process in two
• os.forkpty: same but with a PTY

Detour: bytes vs str
• strings (class str) represent text
  ◦ that is, a sequence of Unicode code points
• files and network connections handle data
  ◦ represented in Python as bytes
• the bytes constructor can convert from str
  ◦ e.g. b = bytes("hello", "utf8")

Running Programs (the new way)
• you can use the subprocess module
• subprocess can handle bidirectional IO
  ◦ it also takes care of avoiding IO deadlocks
  ◦ set input to feed data to the subprocess
• internally run uses a Popen object
  ◦ if run can't do it, Popen probably can

Getting subprocess Output
• available via run since Python 3.7
• the run function returns a CompletedProcess
• it has attributes stdout and stderr
• both are bytes (byte sequences) by default
• or str if text or encoding were set
• available if you enabled capture_output

Running Filters with Popen
• if you are stuck with 3.6, use Popen directly
• set stdin in the constructor to PIPE
• use the communicate method to send the input
• this gives you the outputs (as bytes)

    import subprocess
    from subprocess import PIPE
    input = bytes( "x\na\nb\ny", "utf8")
    p = subprocess.Popen(["sort"], stdin=PIPE, stdout=PIPE)
    out = p.communicate(input=input)
    # out[0] is the stdout, out[1] is None

Subprocesses with asyncio
• import asyncio.subprocess
• create_subprocess_exec, like subprocess.run
  ◦ but it returns a Process instance
  ◦ Process has a communicate async method
• can run things in background (via tasks)
  ◦ also multiple processes at once

Protocol-based asyncio subprocesses
• let loop be an implementation of the asyncio event loop
• there's subprocess_exec and subprocess_shell
  ◦ sets up pipes by default
• integrates into the asyncio transport layer (see later)
• allows you to obtain the data piece-wise
• https://docs.python.org/3/library/asyncio-protocol.html

Sockets
• the socket API comes from early BSD Unix
• socket represents a (possible) network connection
• sockets are more complicated than normal files
  ◦ establishing connections is hard
  ◦ messages get lost much more often than file data

Socket Types
• sockets can be internet or unix domain
  ◦ internet sockets connect to other computers
  ◦ Unix sockets live in the filesystem
• sockets can be stream or datagram
  ◦ stream sockets are like files (TCP)
  ◦ you can write a continuous stream of data
  ◦ datagram sockets can send individual messages (UDP)

Sockets in Python
• the socket module is available on all major OSes
• it has a nice object-oriented API
  ◦ failures are propagated as exceptions
  ◦ buffer management is automatic
• useful if you need to do low-level networking
  ◦ hard to use in non-blocking mode

Sockets and asyncio
• asyncio provides sock_* to work with socket objects
• this makes work with non-blocking sockets a lot easier
• but your program needs to be written in async style
• only use sockets when there is no other choice
  ◦ asyncio protocols are both faster and easier to use

Hyper-Text Transfer Protocol
• originally a simple text-based, stateless protocol
• however
  ◦ SSL/TLS, cryptography (https)
  ◦ pipelining (somewhat stateful)
  ◦ cookies (somewhat stateful in a different way)
• typically between client and a front-end server
• but also as a back-end protocol (web server to app server)

Request Anatomy
• request type (see below)
• header (text-based, like e-mail)
• content

Request Types
• GET - asks the server to send a resource
• HEAD - like GET but only send back headers
• POST - send data to the server

Python and HTTP
• both client and server functionality
  ◦ import http.client
  ◦ import http.server
• TLS/SSL wrappers are also available
  ◦ import ssl
• synchronous by default

Serving Requests
• derive from BaseHTTPRequestHandler
• implement a do_GET method
• this gets called whenever the client does a GET
• also available: do_HEAD, do_POST, etc.
• pass the class (not an instance) to HTTPServer

Serving Requests (cont'd)
• HTTPServer creates a new instance of your Handler
• the BaseHTTPRequestHandler machinery runs
• it calls your do_GET etc. method
• request data is available in instance variables
  ◦ self.path, self.headers

Talking to the Client
• HTTP responses start with a response code
  ◦ self.send_response( 200, 'OK' )
• the headers follow (set at least Content-Type)
  ◦ self.send_header( 'Connection', 'close' )
• headers and the content need to be separated
  ◦ self.end_headers()
• finally, send the content by writing to self.wfile

Sending Content
• self.wfile is an open file
• it has a write() method which you can use
• sockets only accept byte sequences, not str
• use the bytes( string, encoding ) constructor
  ◦ match the encoding to your Content-Type

HTTP and asyncio
• the base asyncio currently doesn't directly support HTTP
• but: you can get aiohttp from PyPI
• contains a very nice web server
  ◦ from aiohttp import web
  ◦ minimum boilerplate, fully asyncio-ready

SSL and TLS
• you want to use the ssl module for handling HTTPS
  ◦ this is especially true server-side
  ◦ aiohttp and http.server are compatible
• you need to deal with certificates (loading, checking)
• this is a rather important but complex topic

Certificate Basics
• a certificate is a cryptographically signed statement
  ◦ it ties a server to a certain public key
  ◦ the client ensures the server knows the private key
• the server loads the certificate and its private key
• the client must validate the certificate
  ◦ this is typically a lot harder to get right

SSL in Python
• start with import ssl
• almost everything happens in the SSLContext class
• get an instance from ssl.create_default_context()
  ◦ you can use wrap_socket to run an SSL handshake
  ◦ you can pass the context to aiohttp
• if httpd is a http.server.HTTPServer:

    httpd.socket = ssl.wrap_socket( httpd.socket, ... )

HTTP Clients
• there's a very basic http.client
• for a more complete library use urllib.request
• aiohttp has client functionality
• all of the above can be used with ssl
• another 3rd party module: Python Requests

IO at the OS Level
• often defaults to blocking
  ◦ read returns when data is available
  ◦ this is usually OK for files
• but what about network code?
  ◦ could work for a client

Threads and IO
• there may be work to do while waiting
  ◦ waiting for IO can be wasteful
• only the calling (OS) thread is blocked
  ◦ another thread may do the work
  ◦ but multiple green threads may be blocked

Non-Blocking IO
• the program calls read
  ◦ read returns immediately
  ◦ even if there was no data
• but how do we know when to read?
  ◦ we could poll
  ◦ for example call read every 30ms

Polling
• trade-off between latency and throughput
  ◦ sometimes, polling is okay
  ◦ but is often too inefficient
• alternative: IO dispatch
  ◦ useful when multiple IOs are pending
  ◦ wait only if all are blocked

select
• takes a list of file descriptors
• blocks until one of them is ready
  ◦ next read will return data immediately
• can optionally specify a timeout
• only useful for OS-level resources

Alternatives to select
• select is a rather old interface
• there is a number of more modern variants
• poll and epoll system calls
  ◦ despite the name, they do not poll
  ◦ epoll is more scalable
• kqueue and kevent on BSD systems

Synchronous vs Asynchronous
• the select family is synchronous
  ◦ you call the function
  ◦ it may wait some time
  ◦ you proceed when it returns
• OS threads are fully asynchronous

The Thorny Issue of Disks
• a file is always 'ready' for reading
• this may still take time to complete
• there is no good solution on UNIX
• POSIX AIO exists but is sparsely supported
• OS threads are an option

IO on Windows
• select is possible (but slow)
• Windows provides real asynchronous IO
  ◦ quite different from UNIX
  ◦ the IO operation is directly issued
  ◦ but the function returns immediately
• comes with a notification queue

The asyncio Event Loop
• uses the select family of syscalls
• why is it called async IO?
o select is synchronous in principle o this is an implementation detail o the IOs are asynchronous to each other PV248 Python 246/306 December 5, 2019 How Does It Work • you must use asyncio functions for 10 • an async read does not issue an OS read • it yields back into the event loop • the fd is put on the select list • the coroutine is resumed when the fd is ready PV248 Python 247/306 December 5, 2019 Timers • asyncio allows you to set timers • the event loop keeps a list of those • and uses that to set the select timeout o just uses the nearest timer expiry • when a timer expires, its owner is resumed PV248 Python 248/306 December 5, 2019 Blocking 10 vs asyncio • all user code runs on the main thread • you must not call any blocking 10 functions • doing so will stall the entire application o in a server, clients will time out o even if not, latency will suffer PV248 Python 249/306 December 5, 2019 DNS • POSIX: getaddrinfo and getnameinfo © also the older API gethostbyname • those are all blocking functions © and they can take a while © but name resolution is essential • asyncio internally uses OS threads for DNS PV248 Python 250/306 December 5, 2019 Signals • signals on UNIX are very asynchronous • interact with OS threads in a messy way • asyncio hides all this using C code PV248 Python 251/306 December 5, 2019 Native Coroutines (Reminder) • delared using async def async def foo(): await asyncio.sleep( 1 ) • calling foo() returns a suspended coroutine • which you can await © or turn it into an asyncio. Task PV248 Python 252/306 December 5, 2019 Tasks • asyncio. Task is a nice wrapper around coroutines o create with asyncio.create_task() • can be stopped prematurely using cancel () • has an API for asking things: o done() tells you if the coroutine has finished o resultQ gives you the result PV248 Python 253/306 December 5, 2019 Tasks and Exceptions • what if a coroutine raises an exception? • calling result will re-raise it o i.e. 
it continues propagating from result()
• you can also ask directly using exception()
  ◦ returns None if the coroutine ended normally

Asynchronous Context Managers

• normally, we use with for resource acquisition
  ◦ this internally uses the context manager protocol
• but sometimes you need to wait for a resource
  ◦ __enter__() is a subroutine and would block
  ◦ this won't work in async-enabled code
• we need __enter__() to be itself a coroutine

async with

• just like with, but uses __aenter__(), __aexit__()
  ◦ those are async def
• the async with behaves like an await
  ◦ it will suspend if the context manager does
  ◦ the coroutine which owns the resource can continue
• mainly used for locks and semaphores

Part 6: Testing, Pitfalls

Mixing Languages

• for many people, Python is not a first language
• some things look similar in Python and Java (C++, ...)
  ◦ sometimes they do the same thing
  ◦ sometimes they do something very different
  ◦ sometimes the difference is subtle

Python vs Java: Decorators

• Java has a thing called annotations
• looks very much like a Python decorator
• in Python, decorators can drastically change meaning
• in Java, they are just passive metadata
  ◦ other code can use them for meta-programming though

Class Body Variables

    class Foo:
        some_attr = 42

• in Java/C++, this is how you create instance variables
• in Python, this creates class attributes
  ◦ i.e. what C++/Java would call static attributes

Very Late Errors

    if a == 2:
        priiiint("a is not 2")

• no error when loading this into python
• it even works as long as a != 2
• most languages would tell you much earlier

Very Late Errors (cont'd)

    try:
        foo()
    except TyyyypeError:
        print("my mistake")

• does not even complain when running the code
• you only notice when foo() raises an exception

Late Imports

    if a == 2:
        import foo
        foo.say_hello()

• unless a == 2, foo is not loaded
• any syntax errors don't show up until a == 2
  ◦ it may even fail to exist

Block Scope

    for i in range(10): pass
    print(i)  # not a NameError

• in Python, local variables are function-scoped
• in other languages, i is confined to the loop

Assignment Pitfalls

    x = [ 1, 2 ]
    y = x
    x.append( 3 )
    print( y )  # prints [ 1, 2, 3 ]

• in Python, everything is a reference
• assignment does not make copies

Equality of Iterables

• [0, 1] == [0, 1] → True (obviously)
• range(2) == range(2) → True
• list(range(2)) == [0, 1] → True
• [0, 1] == range(2) → False

Equality of bool

• if 0: print( "yes" ) → nothing
• if 1: print( "yes" ) → yes
• False == 0 → True
• True == 1 → True
• 0 is False → False
• 1 is True → False

Equality of bool (cont'd)

• if 2: print( "yes" ) → yes
• True == 2 → False
• False == 2 → False
• if '': print( "yes" ) → nothing
• if 'x': print( "yes" ) → yes
• '' == False → False
• 'x' == True → False

Mutable Default Arguments

    def foo( x = [] ):
        x.append( 7 )
        return x

    foo()  # [ 7 ]
    foo()  # [ 7, 7 ]... wait, what?

Late Lexical Capture

    f = [ lambda x : i * x for i in range( 5 ) ]
    f[ 4 ]( 3 )  # 12
    f[ 0 ]( 3 )  # 12 ... ?!
    g = [ lambda x, i = i: i * x for i in range( 5 ) ]
    g[ 4 ]( 3 )  # 12
    g[ 0 ]( 3 )  # 0 ... fml
    h = [ ( lambda x : i * x )( 3 ) for i in range( 5 ) ]
    h  # [ 0, 3, 6, 9, 12 ] ... i kid you not

Dictionary Iteration Order

• in python <= 3.5
  ◦ small dictionaries may appear to iterate in insertion order
  ◦ in general, dictionaries iterate in 'random' order
• in python 3.6
  ◦ all dictionaries in insertion order, but only as an implementation detail
• in python >= 3.7
  ◦ guaranteed to iterate in insertion order

    x = [ [ 1 ] * 2 ] * 3
    print( x )  # [ [ 1, 1 ], [ 1, 1 ], [ 1, 1 ] ]
    x[ 0 ][ 0 ] = 2
    print( x )  # [ [ 2, 1 ], [ 2, 1 ], [ 2, 1 ] ]

Forgotten Await

    import asyncio

    async def foo():
        print( "hello" )

    async def main():
        foo()

    asyncio.run( main() )

• gives warning: coroutine 'foo' was never awaited

Python vs Java: Closures

• captured variables are final in Java
• but they are mutable in Python
  ◦ and of course captured by reference
• they are whatever you tell them to be in C++

Explicit super()

• Java and C++ automatically call parent constructors
• Python does not
• you have to call them yourself

Setters and Getters

    obj.attr
    obj.attr = 4

• in C++ or Java, this is an assignment
• in Python, it can run arbitrary code
  ◦ this often makes getters/setters redundant

Why Testing

• reading programs is hard
• reasoning about programs is even harder
• testing is comparatively easy
• difference between an example and a proof

What is Testing

• based on trial runs
• the program is executed with some inputs
• the outputs or outcomes are checked
• almost always incomplete

Testing Levels

• unit testing
  ◦ individual classes
  ◦ individual functions
• functional
  ◦ system
  ◦ integration

Testing Automation

• manual testing
  ◦ still widely used
  ◦ requires human
• semi-automated
  ◦ requires human assistance
• fully automated
  ◦ can run unattended
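
Example: super() and Properties

A minimal sketch tying together the 'Explicit super()' and 'Setters and Getters' slides above. The class names Base and Temperature, the log attribute and the absolute-zero check are made up for illustration; only super() and the built-in property decorator come from Python itself.

```python
class Base:
    def __init__(self):
        self.log = []  # state set up by the parent constructor


class Temperature(Base):
    def __init__(self, celsius):
        super().__init__()      # not called automatically, unlike Java/C++
        self.celsius = celsius  # plain-looking assignment, runs the setter

    @property
    def celsius(self):
        # runs on every read of obj.celsius
        return self._celsius

    @celsius.setter
    def celsius(self, value):
        # runs on every obj.celsius = value
        if value < -273.15:
            raise ValueError("below absolute zero")
        self.log.append(value)
        self._celsius = value


t = Temperature(20)
t.celsius = 25
print(t.celsius, t.log)  # 25 [20, 25]
```

• callers keep writing t.celsius = 25, so validation can be added later without changing the interface
• leaving out the super().__init__() call would make the first assignment fail, since self.log would not exist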
Testing Insight

• what does the test or tester know?
• black box: nothing known about internals
• gray box: limited knowledge
• white box: 'complete' knowledge

Why Unit Testing?

• allows testing small pieces of code
• the unit is likely to be used in other code
  ◦ make sure your code works before you use it
  ◦ the less code, the easier it is to debug
• especially easier to hit all the corner cases

Unit Tests with unittest

• from unittest import TestCase
• derive your test class from TestCase
• put test code into methods named test_*
• run with python -m unittest program.py
  ◦ add -v for more verbose output

    from unittest import TestCase

    class TestArith(TestCase):
        def test_add(self):
            self.assertEqual(1, 4 - 3)
        def test_leq(self):
            self.assertTrue(3 <= 2 * 3)

Unit Tests with pytest

• a more pythonic alternative to unittest
  ◦ unittest is derived from JUnit
• easier to use and less boilerplate
• you can use native python assert
• easier to run, too
  ◦ just run pytest in your source repository

Test Auto-Discovery in pytest

• pytest finds your testcases for you
  ◦ no need to register anything
• put your tests in test_*.py or *_test.py
• name your testcases (functions) test_*

Fixtures in pytest

• sometimes you need the same thing in many testcases
• in unittest, you have the test class
• pytest passes fixtures as parameters
  ◦ fixtures are created by a decorator
  ◦ they are matched based on their names

    import pytest
    import smtplib

    @pytest.fixture
    def smtp_connection():
        return smtplib.SMTP("smtp.gmail.com", 587)

    def test_ehlo(smtp_connection):
        response, msg = smtp_connection.ehlo()
        assert response == 250

Property Testing

• writing test inputs is tedious
• sometimes, we can generate them instead
• useful for general properties like
  ◦ idempotency (e.g. serialize + deserialize)
  ◦ invariants (output is sorted, ...)
  ◦ code does not cause exceptions

Using hypothesis

• property-based testing for Python
• has strategies to generate basic data types
  ◦ int, str, dict, list, set, ...
• compose built-in generators to get custom types
• integrated with pytest

    import hypothesis
    import hypothesis.strategies as s

    @hypothesis.given(s.lists(s.integers()))
    def test_sorted(x):
        assert sorted(x) == x  # should fail

    @hypothesis.given(x=s.integers(), y=s.integers())
    def test_cancel(x, y):
        assert (x + y) - y == x  # looks okay

Going Quick and Dirty

• goal: minimize time spent on testing
• manual testing usually loses
  ◦ but it has almost 0 initial investment
• if you can write a test in 5 minutes, do it
• useful for testing small scripts

Shell 101

• shell scripts are very easy to write
• they are ideal for testing IO behaviour
• easily check for exit status: set -e
• see what is going on: set -x
• use diff -u to check expected vs actual output

Shell Test Example

    set -ex
    python script.py < test1.in | tee out
    diff -u test1.out out
    python script.py < test2.in | tee out
    diff -u test2.out out

Continuous Integration

• automated tests need to be executed
• with many tests, this gets tedious to do by hand
• CI builds and tests your project regularly
  ◦ every time you push some commits
  ◦ every night (e.g. more extensive tests)

CI: Travis

• runs in the cloud (CI as a service)
• trivially integrates with pytest
• virtualenv out of the box for python projects
• integrated with github
• configure in .travis.yml in your repo

CI: GitLab

• GitLab has its own CI solution (similar to travis)
• also available at FI
• runs tests when you push to your gitlab
• drop a .gitlab-ci.yml in your repository
• automatic deployment into heroku &c.

CI: Buildbot

• written in python/twisted
  ◦ basically a framework to build a custom CI tool
• self-hosted and somewhat complicated to set up
  ◦ more suited for complex projects
  ◦ much more flexible than most CI tools
• distributed design

CI: Jenkins

• another self-hosted solution, this time in Java
  ◦ widely used and well supported
• native support for python projects (including pytest)
  ◦ provides a dashboard with test result graphs &c.
  ◦ supports publishing sphinx-generated documentation

Print-based Debugging

• no need to be ashamed, everybody does it
• less painful in interpreted languages
• you can also use decorators for tracing
• never forget to clean your program up again

    import sys

    def debug(e):
        f = sys._getframe(1)  # the caller's stack frame
        v = eval(e, f.f_globals, f.f_locals)
        l = f.f_code.co_filename + ':'
        l += str(f.f_lineno) + ':'
        print(l, e, '=', repr(v), file=sys.stderr)

    x = 1
    debug('x + 1')

The Python Debugger

• run as python -m pdb program.py
• there's a built-in help command
• next steps through the program
• break to set a breakpoint
• cont to run until end or a breakpoint

What is Profiling

• measurement of resource consumption
• essential info for optimising programs
• answers questions about bottlenecks
  ◦ where is my program spending most time?
  ◦ less often: how is memory used in the program

Why Profiling

• 'blind' optimisation is often misdirected
  ◦ it is like fixing bugs without triggering them
  ◦ program performance is hard to reason about
• tells you exactly which point is too slow
  ◦ allows for best speedup with least work

Profiling in Python

• provided as a library, cProfile
  ◦ alternative: profile is slower, but more flexible
• run as python -m cProfile program.py
• outputs a list of lines/functions and their cost
• use cProfile.run() to profile a single expression

    $ python -m cProfile -s time fib.py
    ncalls   tottime percall file:line(function)
    13638/2    0.032   0.016 fib.py:1(fib_rec)
          2    0.000   0.000 {builtins.print}
          2    0.000   0.000 fib.py:5(fib_mem)
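
Example: Profiling a Single Call

A rough sketch of profiling one call from inside a program, as an alternative to running python -m cProfile on the whole script. The body of fib_rec is only a guess at what fib.py might contain, and profile_call is a hypothetical helper; cProfile.Profile.runcall and pstats.Stats are part of the standard library.

```python
import cProfile
import io
import pstats


def fib_rec(n):
    # naive exponential-time recursion, only here to give the profiler work
    return n if n < 2 else fib_rec(n - 1) + fib_rec(n - 2)


def profile_call(func, *args):
    # run func(*args) under the profiler and return (result, report text)
    prof = cProfile.Profile()
    result = prof.runcall(func, *args)
    out = io.StringIO()
    stats = pstats.Stats(prof, stream=out)
    # sort by time spent in the function itself, show the top 5 entries
    stats.sort_stats("tottime").print_stats(5)
    return result, out.getvalue()


result, report = profile_call(fib_rec, 15)
print(report)
```

• the report has the same columns as the command-line output above
• collecting the text in a StringIO keeps it out of the program's normal output, so it can be logged or dropped as needed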