PV248 Python 1/9 October 22, 2020 PV248 Python Petr Ročkai and Zuzana Baranová Part A: Introduction This document is a collection of exercises and commented examples of source code (in Python). All of the source code included here is also available as source files which you can edit and directly execute. Each chapter corresponds to a single week of the semester. The correspondence between exercises and the content of the lectures is somewhat loose, especially at the start of the semester. The course assumes that you are intuitively familiar with common programming concepts like classes, objects, higher-order functions and function closures (which can be stored in variables). However, you do not need a detailed theoretical understanding of the concepts. NB. The exercise part of this document may be incomplete. Please always refer to the source files that you obtained via pv248 update on aisa as the authoritative version. We are working on also improving this document, please be patient. Part A.1: Course Overview Welcome to PV248 Programming in Python. In a normal semester, the course consists of lectures, seminars and assignments. This is not a normal semester: the course will be entirely online, and your primary source of information will be this collection of code examples and exercises. There are a few lecture recordings from previous years, but only one or so is in English. Since this is a programming subject, most of the coursework – and grading – will center around actual programming. There will be 2 types of programs that you will write in this seminar: tiny programs for weekly exercises (15-20 minutes each) and small programs for homework (a few hundred lines). Writing programs is hard and this course won’t be entirely easy either. You will need to put in effort to pass the subject. Hopefully, you will have learned something by the end of it. Further details on the organisation of this course are in this directory: • grading.txt – what is graded and how; what you need to pass, • homework.txt – general guidelines that govern assignments, • reviews.txt – writing and receiving peer reviews, • advisors.txt – whom to talk to and how when you need help. Study materials for each week are in directories 01 through 13. Start by reading intro.txt. Assignments are in directories hw1 through hw6 and will be made available according to the schedule shown in grading.txt. Part A.2: Grading To pass the subject, you need to collect a total of 18 points (by any means). The points can be obtained as follows (these are upper limits): • 12 points for homework (6 assignments, 2 points each) • 9 points for weekly exercises, • 6 points for finishing your homework early, • 3 points for peer review. You need to pass the 18 point mark by 17th of February, one week after the last deadline of the last homework (this gives you some space to collect the remaining points via peer review). A.2.1 Homework There will be 6 assignments, one every two weeks. There will be 8 deadlines for each of them, one week apart and each deadline gives you one chance to pass the automated test suite. If you pass on the first or second deadline, you get 1 extra point for the assignment. For the third and fourth deadlines, the bonus is reduced to 0.5 point. Afterwards, you only get the baseline 2 points. The deadline schedule is as follows: given try 1 try 2 try 3 try 4 3 points 2.5 points hw1 7.10. 14.10. 21.10. 28.10. 4.11. hw2 21.10. 28.10. 4.11. 11.11. 18.11. hw3 4.11. 11.11. 18.11. 25.11. 2.12. hw4 18.11. 25.11. 2.12. 9.12. 16.12. hw5 2.12. 9.12. 16.12. 23.12. 30.12. hw6 16.12. 23.12. 30.12. 6.1. 13.1. given try 5 try 6 try 7 try 8 2 points hw1 7.10. 11.11. 18.11. 25.11. 2.12. hw2 21.10. 25.11. 2.12. 9.12. 16.12. hw3 4.11. 9.12. 16.12. 23.12. 30.12. hw4 18.11. 23.12. 30.12. 6.1. 13.1. hw5 2.12. 6.1. 13.1. 20.1. 27.1. hw6 16.12. 20.1. 27.1. 3.2. 10.2. The test suite is strictly binary: you either pass or you fail. More details and guidelines are in homework.txt. A.2.2 Weekly Exercises Besides homework assignments, the main source of points will be weekly exercises. Like with homework, you are not required to do any of these (except to get sufficient points to pass the course). How you split points between homework and the weekly exercises is up to you. Each week, you will be able to submit a fixed subset of the exercises given to you (i.e. we will select usually 2, sometimes perhaps 3 exercises, which you can submit and get the point). Each week, you will be able to get up to one point (so in theory, 12 points are available, but the maximum you can earn this way is capped at 9). The point will be split between the exercises, i.e. it will be possible to earn fractional points in a given week, too. If bonuses are present in an exercise, those are not required in submissions (nor they are rewarded with points). The exercises have test cases enclosed: it is sufficient to pass those test cases to earn the associated points. The deadlines to earn points are as follows (you will have 2 weeks to solve each set): chapter given deadline 01 7.10. 21.10. 02 14.10. 28.10. 03 21.10. 4.11. 04 28.10. 11.11. 05 4.11. 18.11. 06 11.11. 25.11. 07 18.11. 2.12. 08 25.11. 9.12. 09 2.12. 16.12. 10 9.12. 23.12. 11 16.12. 30.12. 12 23.12. 6.1. A.2.3 Peer Review Reading code is an important skill – sometimes PV248 Python 2/9 October 22, 2020 more so than writing it. While the space to practice reading code in this subject is limited, you will still be able to earn a few points doing just that. The rules for peer review are as follows: • only homework is eligible for reviews (not the weekly exercises), • you can submit any code (even completely broken) for peer review, • to write a review for any given submission, you must have already passed the respective assignment yourself, • there are no deadlines for requesting or providing peer reviews (other than the deadline on passing the subject), • writing a review is worth 0.3 points and you can write at most 10. It is okay to point out correctness problems during peer reviews, with the expectation that this might help the recipient pass the assignment. This is the only allowed form of cooperation (more on that below). A.2.4 Plagiarism Copying someone else’s work or letting someone else copy yours will earn you -6 points per instance. You are also responsible for keeping your solutions private. If you only use the pv248 command on aisa, it will make your ~/pv248 directory inaccessible to anyone else (this also applies to school-provided UNIX workstations). Keep it that way. If you work on your solution using other computers, make sure they are secure. Do not publish your solutions anywhere (on the internet or otherwise). All parties in a copying incident will be treated equally. No cooperation is allowed (not even design-level discussion about how to solve the exercise) on homework and on weekly exercises which you submit. If you want to study with your classmates, that is okay – but only cooperate on exercises which are not going to be submitted by either party. Part A.3: Homework The general principles outlined here apply to all assignments. The first and most important rule is, use your brain – the specifications are not exhaustive and sometimes leave room for different interpretations. Do your best to apply the most sensible one. Do not try to find loopholes (all you are likely to get is failed tests). Technically correct is not the best kind of correct. Think about pre- and postconditions. Aim for weakest preconditions that still allow you to guarantee the postconditions required by the assignment. If your preconditions are too strong (i.e. you disallow inputs that are not ruled out by the spec) you will likely fail the tests. Do not print anything that you are not specifically directed to. Programs which print garbage (i.e. anything that wasn’t specified) will fail tests. You can use the standard library. Third-party libraries are not allowed, unless specified as part of the assignment. Make sure that your classes and methods use the correct spelling, and that you accept and/or return the correct types. In most cases, either the ‘syntax’ or the ‘sanity’ test suite will catch problems of this kind, but we cannot guarantee that it always will – do not rely on it. If you don’t get everything right the first time around, do not despair. The expectation is that most of the time, you will pass in the second or third week. In the real world, the first delivered version of your product will rarely be perfect, or even acceptable, despite your best effort to fulfill every customer requirement. Only very small programs can be realistically written completely correctly in one go. If you strongly disagree with a test outcome and you believe you adhered to the specification and resolved any ambiguities in a sensible fashion, please use the online chat or the discussion forum in the IS to discuss the issue (see advisors.txt for details). A.3.1 Submitting Solutions The easiest way to submit a solution is this: $ ssh aisa.fi.muni.cz $ cd ~/pv248/hw1 $ pv248 submit If you prefer to work in some other directory, you may need to specify which homework you wish to submit, like this: pv248 submit hw1. The number of times you submit is not limited (but see also below). NB. Only the files listed in the assignment will be submitted and evaluated. Please put your entire solution into existing files. You can check the status of your submissions by issuing the following command: $ pv248 status In case you already submitted a solution, but later changed it, you can see the differences between your most recent submitted version and your current version by issuing: $ pv248 diff The lines starting with - have been removed since the submission, those with + have been added and those with neither are common to both versions. A.3.2 Evaluation There are three sets of automated tests which are executed on the solutions you submit: • The first set is called syntax and runs immediately after you submit. Only 2 checks are performed: the code can be loaded (no syntax errors) and passes mypy. • The next step is sanity and runs every midnight. Its main role is to check that your program meets basic semantic requirements, e.g. that it recognizes correct inputs and produces correctly formatted outputs. The ‘sanity’ test suite is for your information only and does not guarantee that your solution will be accepted. The ‘sanity’ test suite is only executed if you passed ‘syntax’. • Finally the verity test suite covers most of the specified functionality and runs once a week – every Wednesday at midnight, right after the deadline. If you pass the verity suite, the assignment is considered complete and you are awarded the corresponding number of points. The verity suite will not run unless the code passes ‘sanity’. If you pass on the first or the second run of the full test suite (7 or 14 days after the assignment is given), you are entitled to a bonus point. If you pass at one of the next 2 attempts, you are entitled to half a bonus point. After that, you have 4 more attempts to get it right. See grading.txt for more details. Only the most recent submission is evaluated, and each submission is evaluated at most once in the ‘sanity’ and once in the ‘verity’ mode. You will find your latest evaluation results in the IS in notepads (one per assignment). Part A.4: Advisors It is hard to anticipate what problems you will run into while programming, and which concepts you will find hard to understand. Normally, those issues would be resolved in the seminar, but this semester, we won’t have that luxury. Instead, we will do our best to give you extended text materials and examples, so that you can resolve as many issues as possible on your own. Of course, that will sometimes fail: for that reason, you will be able to interactively ask for help online. Unfortunately, as much as we would like to, we cannot provide help 24/7 – there will instead be a few slots in which one of the teachers will be specifically available. You can ask questions at other times, and we will provide a ‘best effort’ service: if someone is available, they may answer the question, but please do not rely on this. For this, we will use the online chat available at https://lounge.fi.muni.cz – use your faculty login and password to get in, and join the channel (room) ##pv248 (double sharp). The schedule PV248 Python 3/9 October 22, 2020 is as follows: day start end person Tue 18:00 20:00 Petr Ročkai Thu 16:00 18:00 Vladimír Štill The other option is of course the discussion forum in IS, where you can ask questions, though this is not nearly as interactive, and the delay can be considerable (please be patient). Please also note that the online chat is meant for programming discussion: if you have questions about organisation or technical issues, use the discussion forum instead. Since exercises won’t be published until Wednesday, the first session will be held on Thursday, 8th of October. Part 1: Python Intro There are two sets of exercises in the first week (exercises within each set are related). The first set is an evaluator of simple expressions in reverse polish notation (files prefixed rpn_) and the other is about planar analytic geometry (simple geometric objects, their attributes, transformations on them and interactions between them; these files are prefixed geom_). Each of the two blocks is split into three exercises. One thing that you will need but might not be familiar with is variadic functions: see varargs.py for an introduction. The order in which the exercises were meant to be solved is this: 1. rpn_un.py 2. rpn_bin.py (can be submitted) 3. rpn_gen.py 4. geom_types.py 5. geom_intersect.py (can be submitted) 6. geom_dist.py It is okay to flip the two blocks, but the exercises within each block largely build on each other and cannot be as easily skipped or re- ordered. Part 1.1: Exercises 1.1.1 [rpn_un] In the first (short) series of exercises, we will implement a simple RPN (Reverse Polish Notation) evaluator. The entry point will be a single function, with the following prototype: def rpn_eval( rpn ): pass The rpn argument is a list with two kinds of objects in it: numbers (of type int, float or similar) and operators (for simplicity, these will be of type str). To evaluate an RPN expression, we will need a stack (which can be represented using a list, which has useful append and pop methods). Implement the following unary operators: neg (for negation, i.e. unary minus) and recip (for reciprocal, i.e. the multiplicative inverse). The result of rpn_eval should be the stack at the end of the computation. Below are a few test cases to check the implementation works as expected. You are free to add your own test cases. When you are done, you can continue with rpn_bin.py. 1.1.2 [rpn_bin] The second exercise is rather simple: take the RPN evaluator from the previous exercise, and extend it with the following binary operators: +, -, *, /, **. On top of that, add two ‘greedy’ operators, sum and prod, which reduce the entire content of the stack to a single number. Note that we write the stack with ‘top’ to the right, and operators take arguments from left to right in this ordering (i.e. the top of the stack is the right argument of binary operators). This is important for non-commutative operators. This exercise is one of the two which you can submit this week, and is worth 0.5 points. def rpn_eval( rpn ): pass Some test cases are included below. Write a few more test cases to convince yourself that your code works correctly. If you didn’t see it yet, you should make a short detour to varargs.py before you come back to the last round of RPNs, in rpn_gen.py. 1.1.3 [rpn_gen] Let’s generalize the code. Until now, we had a fixed set of operators hard-coded in the evaluator. Let’s instead turn our evaluator into an object which can be extended by the user with additional operators. The class should have an evaluate method which takes a list like before. On top of that, it should also have an add_op(name, arity, f) method, where name is the string that describes / names the operator, arity is the number of operands it expects and f is a function which implements it. The function f should take as many arguments as arity specifies. class Evaluator: def __init__( self ): pass def add_op( self, name, arity, f ): pass def evaluate( self, rpn ): pass def example(): e = Evaluator() e.add_op( '*', 2, lambda x, y: x * y ) e.add_op( '+', 2, lambda x, y: x + y ) print( e.evaluate( [ 1, 2, '+', 7, '*' ] ) ) # expect [21] Bonus 1: Allow arity = 0 to mean ‘greedy’. The function passed to add_op in this case must accept any number of arguments. bonus_1 = True # enable / disable tests for bonus 1 Bonus 2: Can you implement Evaluator in such a way that it does not require the arity argument in add_op()? How portable among different Python implementations do you think this is? As usual, write a few test cases to convince yourself that your code works (in addition to the ones already provided). Be sure to check that operators with arities 1 and 3 work, for instance. Then, you can continue to geom_types.py. 1.1.4 [geom_types] The second set of exercises will deal with planar analytic geometry. First define classes Point and Vector (tests expect the attributes to be named x and y): class Point: def __init__( self, x, y ): pass def __sub__( self, other ): # self - other pass # compute a vector def translated( self, vec ): pass # compute a new point class Vector: def __init__( self, x, y ): pass def length( self ): pass def dot( self, other ): # dot product PV248 Python 4/9 October 22, 2020 pass def angle( self, other ): # in radians pass Let us define a line next. Whether you use a point and a vector or two points is up to you (the constructor should take two points). Whichever you choose, make both representations available using methods (point_point and point_vector, both returning a 2-tuple). The points returned should be the same as those passed to the constructor, and the vector should be the vector from the first point to the second point. Apart from the above methods, also implement an equality operator for two lines (__eq__), which will be called when two lines are compared using ==. In Python 2, you were also expected to implement its counterpart, __ne__ (which stands for ’not equal’), but Python 3 defines __ne__ automatically, by negating the result of __eq__. class Line: def __eq__( self, other ): if not isinstance( other, Line ): return False pass # continue the implementation def translated( self, vec ): pass def point_point( self ): pass def point_vector( self ): pass The Segment class is a finite version of the same. class Segment: def length( self ): pass def translated( self, vec ): pass def point_point( self ): pass And finally a circle, using a center (a Point) and a radius (a float). class Circle: def __init__( self, c, r ): pass def center( self ): pass def radius( self ): pass def translated( self, vec ): pass As always, write a few test cases to check that your code works. Please make sure that your implementation is finished before consulting tests; specifically, try to avoid reverse-engineering the tests to find out how to write your program. 1.1.5 [geom_intersect] We first import all the classes from the previous exercise, since we will want to use them. from geom_types import * We will want to compute intersection points of a few object type combinations. We will start with lines, which are the simplest. You can find closed-form general solutions for all the problems in this exercise on the internet. Use them. This exercise is the second that you can submit. You will need to include geom_types.py as well, but the points are all attached to this exercise (i.e. submitting geom_types.py alone will not earn you any points). Line-line intersect either returns a list of points, or a Line, if the two lines are coincident. def intersect_line_line( p, q ): pass A variation. Re-use the line-line case. def intersect_line_segment( p, s ): pass Intersecting lines with circles is a little more tricky. Checking e.g. MathWorld sounds like a good idea. It might be helpful to translate both objects so that the circle is centered at the origin. The function returns a list of points. def intersect_line_circle( p, c ): pass It’s probably quite obvious that users won’t like the above API. Let’s make a single intersect() that will work on anything (that we know how to intersect, anyway). You can use type( a ) to find the type of object a. You can compare types for equality, too: type( a ) == Circle will do what you think it should. def intersect( a, b ): pass Test cases follow. Note that the tests use line equality which you implemented in geom_types. The last exercise for this week can be found in geom_dist.py. 1.1.6 [geom_dist] In case there are no intersections, it makes sense to ask about distances of two objects. In this case, it also makes sense to include points, and we will start with those: def distance_point_point( a, b ): pass def distance_point_line( a, p ): pass If we already have the point-line distance, it’s easy to also find the distance of two parallel lines: def distance_line_line( p, q ): pass Circles vs points are rather easy, too: def distance_point_circle( a, c ): pass A similar idea works for circles and lines. Note that if they intersect, we set the distance to 0. def distance_line_circle( p, c ): pass And finally, let’s do the friendly dispatch function: def distance( a, b ): pass Part 2: Data Structures In the second week, there is one set of 3 related and another set of 3 standalone exercises. The first set is a mock ‘reimplement a legacy system’ story. If something appears to be stupid, blame the original authors. They were overworked programmers too, probably fresh out PV248 Python 5/9 October 22, 2020 of school. The first set contains: 1. ts3_escape.py 2. ts3_normalize.py (this one can be submitted) 3. ts3_render.py [tbd] The second set contains exercises which make use of the basic built-in classes, like dict, set, list, str and so on: • merge.py (this one can be submitted) • rewrite.py – fun with rewrite systems • magic.py – identifying file types Part 2.1: Exercises 2.1.1 [ts3_escape] Big Corp has an in-house knowledge base / information filing system. It does many things, as legacy systems are prone to, and many of them are somewhat idiosyncratic. Either because the relevant standards did not exist at the time, or the responsible programmer didn’t like the standard, so they rolled their own. The system has become impossible to maintain, but the databases contain a vast amount of information and are in active use. The system will be rewritten from scratch, but will stay backward-compatible with all the existing formats. You are on the team doing the rewrite (we are really sorry to hear this, honest). The system stores structured documents, and one of its features is that it can format those documents using templates. However, the template system got a little out of hand (they always do, don’t they) and among other things, it is recursive. Each piece of information inserted into the template is itself treated as a template and can have other pieces of the document substituted. A template looks like this: template_1 = '''The product ‘${product}’ is made by ${manufacturer} in ${country}. The production uses these rare-earth metals: #{ingredients.rare_earth_metals} and these toxic substances: #{ingredients.toxic}.'''; The system does not treat $ and # specially, unless they are followed by a left brace. This is a rare combination, but it turns out it sometimes appears in documents. To mitigate this, the sequences $${ and ##{ are interpreted as literal ${ and #{. At some point, the authors of the system realized that they need to write literal $${ into a document. So they came up with the scheme that when a string of 2 or more $ is followed by a left brace, one of the $ is removed and the rest is passed through. Same with #. Your first task is to write functions which escape and un-escape strings using the scheme explained above. The template component of the system is known simply as ‘template system 3’, so the functions will be called ts3_escape and ts3_unescape. Return the altered string. If the string passed to ts3_unescape contains the sequence #{ or ${, throw RuntimeError, since such string could not have been returned from ts3_escape. Once you are done, continue to ts3_normalize.py. def ts3_escape( string ): pass def ts3_unescape( string ): pass def test_main(): pairs = [ ( "", "" ), ( "aa", "aa" ), ( "$", "$" ), ( "{", "{" ), ( "${", "$${" ), ( "$${", "$$${" ), ( "$$ {", "$$ {" ), ( "${$", "$${$" ), ( "$$${", "$$$${" ), ( "ab${ nabc$$${{tsk$$$${asd${${}}}aa$a{", "ab$${ nabc$$$${{tsk$$$$${asd$${$${}}}aa$a{" ), ( "#", "#" ), ( "#{", "##{" ), ( "#{{", "##{{" ), ( "####{", "#####{" ), ( "#}{", "#}{" ), ( "$#{", "$##{" ), ( "#${", "#$${" ), ( "$#{}${##}##{$}%${##${$$${{${", "$##{}$${##}###{$}%$${##$${$$$${{$${" ) ] for unescaped, escaped in pairs: err = "ts3_escape( '{}' ) did not match '{}'".format( unescaped, escaped ) assert ts3_escape( unescaped ) == escaped, err err = "ts3_unescape( '{}' ) did not match '{}'".format( escaped, unescaped ) assert ts3_unescape( escaped ) == unescaped, err 2.1.2 [ts3_normalize] Eventually, we will want to replicate the actual substitution into the templates. This will be done by the ts3_render function. However, somewhat surprisingly, that function will only take one argument, which is the structured document to be converted into a string. Recall that the template system is recursive: before ts3_render, another function, ts3_combine combines the document and the templates into a single tree-like structure. One of your less fortunate colleagues is doing that one. This structure has 5 types of nodes: lists, maps, templates (strings), documents (also strings) and integers. In the original system there are more types (like decimal numbers, booleans and so on) but it has been decided to add those later. Many documents only make use of the above 5. A somewhat unfortunate quirk of the system is that there are multiple types of nodes represented using strings. The way the original system dealt with this is by prefixing each string by its type: $document$ (with a trailing space!) and $template$ . Those prefixes are stored in the database. To make matters worse, there are strings with no prefix: earlier versions looked for ${ and #{ sequences in the string, and if it found some, treated the string as a template, and as a document otherwise. The team has rightly decided that this is stupid. You drew the short straw and now you are responsible for function ts3_normalize, which takes the above slightly baroque structure and sorts the strings into two distinct types, which are represented using Python classes. Someone else will deal with converting the database ‘later’. class Document: pass class Template: pass Each of the above classes should have an attribute called text, which is a string and contains only the actual text, without the funny prefixes. The lists, maps and integers fortunately arrive as Python list, dict and int into this function. Return the altered tree: the strings substituted for their respective types. def ts3_normalize( tree ): pass PV248 Python 6/9 October 22, 2020 2.1.3 [ts3_render] At this point, we have a structure made of dict, list, Template, Document and int instances. The lists and maps can be arbitrarily nested. Within templates, the substitutions give dot-separated paths into this tree-like structure. If the top-level object is a map, the first component of a path is a string which matches a key of that map. The first component is then chopped off, the value corresponding to the matched key is picked as a new root and the process is repeated recursively. If the current root is a list and the path component is a number, the number is used as an index into the list. If a dict meets a number in the path (we will only deal with string keys), or a list meets a string, raise a RuntimeError and let someone else deal with the problem later. The ${path} substitution performs scalar rendering, while #{path} substitution performs composite rendering. Scalar rendering resolves the path to an object, and depending on its type, performs the following: • if it is a Document, replace the ${...} with the text of the document; the pasted text is excluded from further processing, • if it is a Template, the ${...} is replaced with the text of the template; occurrences of ${...} and #{...} within the pasted text are further processed, • if it is an int, it is formatted and the resulting string replaces the ${...}, • if it is a list, the length of the list is formatted as if it was an int, and finally, • if it is a dict, .default is appended to the path and the substitution is retried. Composite rendering using #{...} is similar, but: • a dict is rendered as a comma-separated (with a space) list of its values, after the keys are sorted alphabetically, where each value is rendered as a scalar, • a list is likewise rendered as a comma-separated list of its values as scalars, • everything else is an error: raise a RuntimeError for now, someone else will fix that later. The top-level entity passed to ts3_render must always be a dict. The starting template is expected to be in the key ’$template’ of that dict. Remember that ##{...} and $${...} must remain untouched. If you encounter nested templates while parsing the path, e.g. ${abc${d}}, throw an error (but see also bonus 2 below). def ts3_render(tree): pass Bonus 1: It turns out that the original system had a bug, where a template could look like this: ${foo.bar}.baz} – if ${foo.bar} referenced a template and that template ended with ${quux (notice all the oddly unbalanced brackets!), the system would then paste the strings to get ${quux.baz} and proceed to perform that substitution. The real clincher is that template authors started to use this as a feature, and now we are stuck with it. Replicate this functionality. However, make sure that this does not happen when the first part of the pasted substitution comes from a document! PS: The original bug would still do the substitution if the second part was a document and not a template. Feel free to replicate that part of the bug too. As far as anyone knows, the variant with template + document is not abused in the wild, so it is also okay to fix it. Bonus 2: If you encounter nested templates while parsing the path, first process the innermost substitutions, resolve the inside path and append the path to the outer one, then continue resolving the outer path. Example: ${path${inner.tpl}}, first resolve inner.tpl, append the result after `path`, then continue parsing. If the inner.tpl path leads to a document with text ".outside.2", the outer path is "path.outside.2". from ts3_normalize import ts3_normalize 2.1.4 [merge] Write a function merge_dict which takes these 3 argu- ments: • a dict instance, in which some keys are deemed equivalent: the goal of merge_dict is to create a new dictionary, where all equivalent keys have been merged; keys which are not equivalent to anything else are left alone • a list of set instances, where each set describes one set of equivalent keys (the sets are pairwise disjoint), and finally, • a function combine which takes a list of values (not a set, because we may care about duplicates): merge_dict will pass, for each set of equivalent keys, all the values corresponding to those keys into combine. In the output dictionary, create a single key for each equivalent set: • the key is the smallest of the keys from the set which were actually present in the input dict, • the value is the result of calling combine on the list of values associated with all the equivalent keys in the input dict. Do not modify the input dictionary. def merge_dict(dict_in, equiv, combine): pass 2.1.5 [rewrite] Write a function is_generated which checks whether word final can be generated by a rewrite system described by rules starting from word initial. The rewrite system is given as a dict, where keys are str and values are each a list of str. The key is the left-hand side of a rule (see below) while the list gives all possible right-hand sides to go with it. The initial string and the string to be generated (final) are given as str instances. def is_generated( rules, initial, final ): pass A rewrite system is like a grammar, but does not distinguish between terminals and non-terminals. There are only letters, and the rules say that a given substring can be rewritten to some other substring. For instance, consider the rules: 1. x → xx (any x in the string can be doubled) 2. xx → xyz 3. xx → xyx Starting from xy, a possible derivation would be: 1. use rule 1 to obtain xxy 2. use rule 2 to obtain xyzxy 3. use rule 1 again to obtain xyzxxy All of the words which appear above are said to be generated by the rewrite system. More formally, a word is generated by the system if it can be obtained by applying a finite sequence of rules. Each rule can be applied in an arbitrary position (i.e. wherever you like). In this exercise, the right side of a rule is always strictly longer than the left side (this reduces the power of the system considerably and makes the exercise much easier to solve). 2.1.6 [magic] Write function identify which takes rules, a list of rules, and data, a bytes object to be identified. It then tries to apply each rule and return the identifier associated with the first matching rule, or None if no rules match. Each rule is a tuple with 2 components: • name, a string to be returned if the rule matches, • a list of patterns, where each pattern is a tuple with: 0. offset, an integer, a. bits, a bytes object, b. mask, another bytes object, c. positivity, a bool. PV248 Python 7/9 October 22, 2020 The mask and the pattern must have the same length. A rule matches the data if all of its patterns match. A pattern match is decided by comparing the slice of data at the given offset to the ‘bits’ field of the pattern, after both the slice and the bits have been bitwise-anded with the mask. The pattern matches iff: • the bits and slice compare equal and positivity is True, or • they compare inequal and positivity is False. def identify(rules, data): pass Part 3: Text, JSON The first set of exercises is general text processing, while the second deals with more structured data (JSON and a bit of CSV). 1. grep.py 2. rfc822.py 3. multi822.py – submit for 0.5 points 4. report.py (needs report.json) 5. elements.py (needs elements.csv) 6. json_flatten.py Part 3.1: Exercises 3.1.1 [grep] The goal of this exercise is to write a simple program that works like UNIX grep. We will start by writing a procedure which takes 2 arguments, a string representation of a regex and a filename. It will print the lines of the file that match the regular expression (in the same order as they appear in the file). Prefix the line with its line number like so: 43: This line matched a regex, Hint: check out the enumerate built-in. def grep( regex, filename ): pass 3.1.2 [rfc822] In this exercise, we will parse a format that is based on rfc 822 headers, though our implementation will only handle the simplest cases. The format looks like this: From: Petr Ročkai To: Random X. Student Subject: PV248 and so on and so forth (for your convenience, the above example can be also found in the file `rfc822.txt`). In real e-mail (and in HTTP), each header entry may span multiple lines, but we will not deal with that. Our goal is to create a dict where the keys are the individual header fields and the corresponding values are the strings coming after the colon. In this iteration, assume that each header is unique. def parse_rfc822( filename ): pass When done, go on to multi822.py. 3.1.3 [multi822] Building on the previous exercise, extend the parser in the following way: if the given field is unique, keep its associated value as a string. However, if a certain field appears multiple times, turn the value into a list. The right-hand-side strings should be listed in the order of appearance. def parse_multirfc822( filename ): pass 3.1.4 [report] The goal here is to load the file `report.json` which contains a report about a bug in a C program, and print out a simple stack trace. You will be interested in the key `active stack` (near the end of the file) and its format. The output will be plain text: for each stack frame, print a single line in this format: function_name at source.c:32 In the next exercise, we will try to write some JSON instead: `ele- ments.py`. import json # go for `load` (via io) or `loads` (via strings) 3.1.5 [elements] In this exercise, we will read in a CSV (commaseparated values) file and produce a JSON file. The input is in `elements.csv` and each row describes a single chemical element. The columns are, in order, the atomic number, the symbol (shorthand) and the full name of the element. Generate a JSON file which will consist of a list of objects, where each object will have attributes ’atomic number’, ’symbol’ and ’name’. The first of these will be a number and the latter two will be strings. Name the output file identically to the input file, except for the extension (`.json`). Note that the first line of the CSV file is a header. import csv # we want csv.reader import json # and json.dumps def csv_to_json(filename): pass 3.1.6 [flatten] In this exercise, your task is to write a function that flattens json data. Flattening works as follows: The result is a single-level (flat) json with key-value pairs, the keys representing the former structure of data. We could use any (unique) separator to indicate the nested structure, which would allow unflattening without loss of information. We will use the dollar sign ’$’. If you do encounter ’$’ in the original data, replace it with two dollars ’$$’. Assume that there are no keys composed entirely of numbers. Example: { 'student': { 'Joe': { 'full name': 'Joe Peppy', 'address': 'Clinical Street 7', 'aliases': ['Joey', 'MataMata'] } } } Flattened: { 'student$Joe$full name': 'Joe Peppy', 'student$Joe$address': 'Clinical Street 7', 'student$Joe$aliases$0': 'Joey', 'student$Joe$aliases$1': 'MataMata' } The simplest way to go about it is to use recursion. def flatten( data ): pass Part 4: SQL This week, we will look at some basic SQL. First, we will import and ex- port JSON data to/from an SQL database and perform simple searches. PV248 Python 8/9 October 22, 2020 The schema for all three exercises is in books.sql. 1. book_import.py – import data into a database 2. book_export.py – the converse 3. book_query.py – use SQL to search for things The second part will build up a very simple CRUD-type application (CRUD = Create Read Update Delete). The interface will be through Python objects. The subject will be shopping lists. 4. list_make.py – creating shopping lists 5. list_search.py – searching and reading 6. list_update.py – update and delete Part 5: Operators, Iterators, Decorators and Exceptions This week, we will look at some of the more advanced language features of Python. The first couple of exercises will be about iterators and generators: 1. iter.py – simple iterators 2. flat.py – a simple generator We will then move on to operator overloading: 3. poly.py – polynomials 4. mod.py – finite rings of integers mod N (/n) Finally, the last will cover exceptions and decorators: 5. with.py – context managers vs exceptions 6. trace.py – decorators part 1 7. noexcept.py – decorators and exceptions Part 6: Closures and Coroutines There will be two sets of exercises, one for closures and another for coroutines. The former is rather general, since closures work more or less the same in all languages. The latter, however,, will mainly look at low-level aspects of how coroutines are implemented in Python. 1. tbd 2. tbd 3. tbd 4. gen.py 5. interleave.py 6. tbd Part 7: asyncio Basics This week, we will get acquainted with asyncio, the framework for writing asynchronous programs in Python, based on native coroutines and non-blocking IO. Part 8: More asyncio While the lecture this weeks goes into low-level details of asyncio, we will stick with the high-level interface. We will write some simple servers and clients using sockets. Part 9: Programming Exercises This week will consist of a few generic exercises, since the lecture does not cover a specific topic. Part 10: Testing This week will cover hypothesis, a rather useful tool for testing Python code. Hypothesis is a property-based testing system: unlike traditional unit testing, we do not specify exact inputs. Instead, we provide a description of an entire class of inputs; hypothesis then randomly samples the space of all inputs in that class, invoking our test cases for each such sample. We will look at two types of programs to use hypothesis with, first some floating-point linear math: 1. inner.py – properties of the inner vector product 2. cross.py – same but cross product 3. tbd And some classic computer science problems: 4. sort.py – sorting algorithms 5. bsearch.py – binary search 6. heap.py – binary heaps Part 11: Numeric Math Since Python is often used as a driver for numeric algorithms, with numpy as the backend, we will try to explore some of its basic functions. The first part will focus on linear algebra. 1. linear.py – warm up 2. volume.py – volume of n-dimensional polyhedra 3. tbd While in the second we will look at signal processing. 4. histogram.py – drawing histograms 5. sig.py – generating signals 6. dft.py – analysing signals PV248 Python 9/9 October 22, 2020 Part 12: Statistics TBD.