Saturday, March 29, 2014

Week 11: Sorting and Efficiency

    Over the last two weeks, we learned about sorting and efficiency. For efficiency, the main concern is running time complexity. To compare different algorithms, we first have to agree on what we are measuring. For sorting algorithms, one natural measure is the length of the list, and we also have to decide what counts as one 'step'. For example, in the insertion sort algorithm we may count one assignment statement as a step, or one comparison as a step. The running time of an algorithm can therefore come out differently depending on the choice: we might get 5n^2 + 3 under one counting scheme and n^2 + log n under another. That is why big-oh notation is so useful: it provides an asymptotic upper bound for an algorithm. In the previous example, both 5n^2 + 3 and n^2 + log n belong to O(n^2), which greatly reduces the trouble of comparing two algorithms' running times. Below I will write down some of the sorting algorithms I have learned.
    In CSC108, I learned three O(n^2) algorithms: bubble sort, insertion sort and selection sort. These two weeks I learned some more efficient sorting algorithms: quicksort (O(n log n) on average), merge sort (O(n log n)), counting sort (O(n), but with preconditions on the input) and Timsort (O(n log n), used in Python's built-in sorting). For detailed information about these algorithms, see these links: http://interactivepython.org/courselib/static/pythonds/SortSearch/sorting.html
http://en.wikipedia.org/wiki/Timsort
http://en.wikipedia.org/wiki/Counting_sort
    The common characteristic of bubble sort, insertion sort and selection sort is that they process items one by one, making up to n passes over a list of length n, so it makes sense that they share the same asymptotic upper bound. Specifically, bubble sort repeatedly compares neighbouring pairs and swaps them if necessary, insertion sort repeatedly inserts the next element into the sorted part of the list, and selection sort repeatedly picks the smallest remaining element and appends it to the result. Quicksort, merge sort and Timsort are all 'divide and conquer' algorithms. Basically, they split the list into two halves and recurse until they reach the base case. As with binary search, repeatedly halving a list of length n gives about log n levels, and each level does roughly n work, so overall the running time is about n log n.
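    To make the divide-and-conquer idea concrete, here is a minimal merge sort sketch. It is my own version written from the description above, not the code from lecture.

def merge_sort(lst):
    """Return a new sorted list with the items of lst."""
    if len(lst) <= 1:                  # base case: 0 or 1 items are already sorted
        return lst[:]
    mid = len(lst) // 2
    left = merge_sort(lst[:mid])       # sort each half recursively
    right = merge_sort(lst[mid:])
    merged = []                        # merge the two sorted halves in linear time
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged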
    I found a good website for learning sorting algorithms through my classmate's SLOG (http://lbg940131.blogspot.ca/): http://bigocheatsheet.com/. I strongly recommend it. I also recommend that SLOG itself; its analysis is sound and in depth.

Sunday, March 23, 2014

Week9: Binary Search Tree Data Structure

   
       In this week's lecture and lab, we concentrated on the binary search tree data structure. A binary search tree is a binary tree in which every value in the left subtree is less than the value at the root, which in turn is less than every value in the right subtree, and the same property holds in every subtree. Also, there are no duplicate values in a binary search tree. With these properties, searching becomes more efficient.
    To find a value in a binary search tree, we compare it with the root value. If it is greater than the root value, we search the right subtree recursively; if it is less, we search the left subtree recursively, with appropriate base cases (an empty tree means the value is not there, and a match means we found it).
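    Here is a minimal sketch of that search. The BSTNode class and the function name are my own, so the lecture code may look different.

class BSTNode:
    """A node in a binary search tree."""
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right


def contains(node, data):
    """Return True iff data is in the BST rooted at node."""
    if node is None:               # base case: empty tree
        return False
    if data == node.data:          # base case: found it
        return True
    if data < node.data:           # smaller values live in the left subtree
        return contains(node.left, data)
    return contains(node.right, data)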
  
 
    Insertion is similar: we compare the new value with the root value. If the root is None, we create a new binary search tree node. If the value is greater than the root, we insert into the right subtree, and if it is less, we insert into the left subtree. Eventually we reach an external (empty) position where the new node is attached to the binary search tree.
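    Continuing with the hypothetical BSTNode class from the search sketch above, insertion might look like this, returning the (possibly new) root of the subtree:

def insert(node, data):
    """Insert data into the BST rooted at node and return the root."""
    if node is None:                        # reached an empty spot: attach here
        return BSTNode(data)
    if data < node.data:
        node.left = insert(node.left, data)
    elif data > node.data:
        node.right = insert(node.right, data)
    # if data == node.data, do nothing: a BST has no duplicates
    return node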
 
 
    Deletion is a little more complicated. There are roughly three cases: deleting a node with no children (a leaf), deleting a node with only one child, and deleting a node with two children.
    Algorithm for _delete (a code sketch follows the steps):
       1. If this node is None, return it (there is nothing to delete)
       2. If data is less than node.data, delete it from the left child and return this node
       3. If data is more than node.data, delete it from the right child and return this node
       4. If the node with data has fewer than two children, and you know one is None, return the other one
       5. If the node with data has two non-None children, replace its data with the largest value in
           the left subtree, delete that value from the left subtree, and return this node
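    Here is a minimal sketch of those steps, again using the hypothetical BSTNode class from above rather than the actual class from lecture.

def delete(node, data):
    """Delete data from the BST rooted at node and return the new root."""
    if node is None:                          # step 1: nothing to delete
        return None
    if data < node.data:                      # step 2: look in the left subtree
        node.left = delete(node.left, data)
        return node
    if data > node.data:                      # step 3: look in the right subtree
        node.right = delete(node.right, data)
        return node
    # this node holds data
    if node.left is None:                     # step 4: at most one non-None child
        return node.right
    if node.right is None:
        return node.left
    largest = node.left                       # step 5: largest value in the left subtree
    while largest.right is not None:
        largest = largest.right
    node.data = largest.data
    node.left = delete(node.left, largest.data)
    return node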
 
 
    In this week's lab, we wrote a function count_less for a binary search tree in several different ways. Recursion is a good way to deal with binary search trees, and looping works as well.
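    For instance, a recursive version might look like the sketch below. I am assuming count_less(node, value) should count the items strictly less than value, which may not match the lab's exact specification.

def count_less(node, value):
    """Return the number of items in the BST rooted at node that are
    strictly less than value."""
    if node is None:
        return 0
    if node.data < value:
        # this node counts, and smaller items may sit in either subtree
        return 1 + count_less(node.left, value) + count_less(node.right, value)
    # node.data >= value, so everything in the right subtree is too big
    return count_less(node.left, value)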
    Wikipedia has more information about binary search trees that I can look into.



Sunday, March 9, 2014

Week8: Linked List Data Structure

   
    In these two weeks' lectures, we learned another data structure, the linked list. Linked lists are made up of nodes, where each node contains an object and a reference to a linked list. So the linked list has a recursive definition: it is either empty, or a node that contains an object and a reference to another linked list. If the reference points back to the list itself, we get an infinite list.
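    A minimal sketch of that recursive definition, with the attribute names 'head' and 'rest' that I use below (the class from lecture may differ):

class LinkedList:
    """A linked list node: an object (head) plus a reference to the rest of
    the list (rest), which is another LinkedList or None for the empty list."""
    def __init__(self, head, rest=None):
        self.head = head    # the object stored in this node
        self.rest = rest    # the rest of the list, or None at the end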
    We use linked lists to implement some abstract data types. In these two weeks' labs, we implemented the ADT queue and stack using a linked list instead of the Python built-in list we used weeks ago. We found it hard to implement the queue's 'enqueue' and 'dequeue' methods without the convenient list methods. The basic approach to the problem is to store two attributes: a reference to where you add new linked list elements, and a reference to where you remove the oldest linked list element. A stack is easier to implement because its methods, push and pop, both deal with the very last item.
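    Here is a sketch of that two-reference idea for a queue, built on the LinkedList node above. It is my own version, not the lab solution, but both operations run in constant time.

class Queue:
    """A FIFO queue backed by a linked list, keeping references to both
    the front (for dequeue) and the back (for enqueue)."""
    def __init__(self):
        self.front = None    # oldest node, removed first
        self.back = None     # newest node, where we append

    def enqueue(self, obj):
        """Add obj to the back of the queue."""
        new_node = LinkedList(obj)
        if self.back is None:              # queue was empty
            self.front = new_node
        else:
            self.back.rest = new_node      # link the old back to the new node
        self.back = new_node

    def dequeue(self):
        """Remove and return the object at the front of the queue."""
        obj = self.front.head
        self.front = self.front.rest
        if self.front is None:             # queue is now empty
            self.back = None
        return obj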
    Linked list methods: Python lists have a lot of methods, so we can write corresponding methods in class LinkedList. Recursion is used heavily in writing these methods to deal with indexes. In this process I also learned that __contains__, __len__, __getitem__, __setitem__ and __delitem__ each have an equivalent way to call them in Python (the in operator, len(), indexing, index assignment and del); see the lab handout for more details. In addition, when I talked to a TA, we noticed that methods like 'prepend' and 'decapitate' raise the question of what happens to memory that is left without any reference to it. For example, if we implement decapitate on a linked list, the rest becomes the new head and rest.rest becomes the new rest, but the previous head is left with no reference, so it becomes 'garbage'. When I googled it, I found this is handled by 'garbage collection', which has both advantages and disadvantages.
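    For illustration, here is a rough decapitate written against the hypothetical LinkedList node above; the comments mark where the dropped data becomes garbage.

def decapitate(lst):
    """Remove the first item of a non-empty linked list, in place."""
    lst.head = lst.rest.head    # the second item becomes the new head
    lst.rest = lst.rest.rest    # and the rest of the rest becomes the new rest
    # the old head value and the node that used to be lst.rest now have no
    # reference from the list, so Python's garbage collector can reclaim them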

    Some useful reading materials:    http://openbookproject.net/thinkcs/python/english3e/linked_lists.html
http://en.wikipedia.org/wiki/Linked_list
http://en.wikipedia.org/wiki/Garbage_collection_(computer_science)

Sunday, March 2, 2014

Week 7: Recursion

    We have been working with recursion since the fourth week of this course, in lectures, labs, exercises and assignments. Here I will write down some uses of recursion, including some examples I encountered during my study. Even now I cannot confidently say I have mastered all the ideas of recursion, but at least part of them. I write them down here to remind myself in the future.
    The basic idea of recursion is that we define a base case for a function and then build up from it step by step to get whatever we want. To write a recursive function, the very first thing is to think about the base case. Sometimes the base case is not written as a separate branch, but more often we write it explicitly. Next is the recursive step: we simply assume the function already works correctly on smaller inputs, and then use those results to produce the final answer.
    Here are some examples.
   

    This example is from the lecture, and it was my first time feeling the power of recursion. It returns the depth of a nested list. The problem seems difficult to tackle, but if we assume the function is already done for the smaller pieces, things get much simpler. For the base case, when L is not a list, it should return 0. Otherwise we use a list comprehension and the built-in function 'max' to get the maximum depth among the elements of L, and add one to get the result. It is not hard when we look back.
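    The lecture code is not reproduced here, but a minimal sketch following that description would be:

def depth(L):
    """Return the nesting depth of L: 0 if L is not a list, otherwise
    1 plus the maximum depth of its elements."""
    if not isinstance(L, list):                    # base case: not a list at all
        return 0
    return 1 + max([depth(x) for x in L] + [0])    # + [0] handles an empty list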
    Next example: this function finds the height of a Tree. Tree is a data structure I only recently learned, and using recursion to deal with Tree problems is quite helpful.
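    Again the original code is not shown here; the sketch below captures the idea, assuming a Tree class with a value and a list of children (my own names).

class Tree:
    """A tree with a value and a (possibly empty) list of child Trees."""
    def __init__(self, value, children=None):
        self.value = value
        self.children = children if children is not None else []


def height(t):
    """Return the height of Tree t: 1 for a leaf, otherwise 1 plus
    the height of its tallest child."""
    return 1 + max([height(c) for c in t.children] + [0])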
    In addition, Assignment 1 uses a lot of recursion, and so do some examples in the reading material: http://openbookproject.net/thinkcs/python/english3e/recursion.html. They may be helpful in the future.
    In week 7's lab, we used recursion to write some methods of the linked list ADT.

Monday, February 17, 2014

Week 6: Binary Tree and Usage of Some Built-in Functions

   This week, I learned something about recursive structures. A binary tree is a kind of recursive structure. Generally, it has one node, a left subtree and a right subtree, and each subtree can be either empty or another tree; that is where the recursive structure comes from. A tree-list is one way to represent a tree, and it can also be represented as a class. To traverse a tree, we have pre-order (node, left, right), in-order (left, node, right) and post-order (left, right, node). I did not see the usefulness until I went through some examples of tree structures in the reading material. For instance, we can use an expression tree to represent a mathematical expression like '(3 + 7) * 9'. We can also use a binary tree to handle a series of recursive questions: let each node be a question, read the right subtree if the answer is 'yes' and the left subtree if the answer is 'no', and create further questions recursively. See How to Think Like a Computer Scientist - Trees; it is very interesting. I have also scanned the handout for Assignment 2, which is about recursive structures as well. In my opinion, translating a regular expression into a tree representation is quite helpful when dealing with problems like matching strings.
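    To illustrate the expression-tree idea, here is a small sketch of my own (not from lecture) that builds the tree for (3 + 7) * 9 and evaluates it in a post-order fashion, children before node:

class BinaryTree:
    """A binary tree node with a value and optional left/right subtrees."""
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def evaluate(t):
    """Evaluate an expression tree whose leaves are numbers and whose
    internal nodes are '+' or '*'."""
    if t.left is None and t.right is None:    # leaf: just a number
        return t.value
    left_val = evaluate(t.left)               # evaluate children first
    right_val = evaluate(t.right)
    return left_val + right_val if t.value == '+' else left_val * right_val


# (3 + 7) * 9
expr = BinaryTree('*', BinaryTree('+', BinaryTree(3), BinaryTree(7)), BinaryTree(9))
print(evaluate(expr))    # 90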
    Another thing I picked up this week is the use of some built-in functions. They make code shorter and the program more efficient. See Built-in Functions.
    1. all(iterable)
        Return True if all elements of the iterable are true (or if the iterable is empty).
    2. any(iterable)
        Return True if any element of the iterable is true. If the iterable is empty, return False.
    3. filter(function, iterable)
        Construct an iterator from those elements of iterable for which function returns true. iterable may be either a sequence, a container which supports iteration, or an iterator. If function is None, the identity function is assumed, that is, all elements of iterable that are false are removed.
    4. map(function, iterable, ...)
        Return an iterator that applies function to every item of iterable, yielding the results. If additional iterable arguments are passed, function must take that many arguments and is applied to the items from all iterables in parallel. With multiple iterables, the iterator stops when the shortest iterable is exhausted. For cases where the function inputs are already arranged into argument tuples, see itertools.starmap().
    5. repr(object)
        Return a string containing a printable representation of an object.
    6. zip(*iterables)
        Return an iterator of tuples, where the i-th tuple contains the i-th element from each of the argument sequences or iterables. The iterator stops when the shortest input iterable is exhausted. With a single iterable argument, it returns an iterator of 1-tuples. With no arguments, it returns an empty iterator.
                         >>> x = [1, 2, 3]
                         >>> y = [4, 5, 6]
                         >>> zipped = zip(x, y)
                         >>> list(zipped)
                         [(1, 4), (2, 5), (3, 6)]
                         >>> x2, y2 = zip(*zip(x, y))
                         >>> x == list(x2) and y == list(y2)
                         True
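    And a quick demo of a few of the others, just to see their typical shapes (my own example, run in the interpreter):

                         >>> nums = [1, 2, 3, 4, 5]
                         >>> list(filter(lambda n: n % 2 == 0, nums))
                         [2, 4]
                         >>> list(map(lambda n: n * n, nums))
                         [1, 4, 9, 16, 25]
                         >>> all(n > 0 for n in nums)
                         True
                         >>> any(n > 4 for n in nums)
                         True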
 
    We wrote some equivalent code during the lab without using the built-in functions. But using built-in functions, as well as built-in modules, is of great importance when programming in Python.
 
Reference: http://openbookproject.net/thinkcs/python/english3e/trees.html
                  http://docs.python.org/3.3/library/functions.html
 
 
 
 

Thursday, February 6, 2014

Week 5: Name Problems in Python

    At the beginning of learning to write functions, I found local variables very confusing. Through a period of study and this week's topic, I gained a deeper understanding of how names work. Below I write down some points to remind myself.
    As I learned in Wednesday's lecture, every name in Python is bound to a value, and that value is a reference to an object. Names defined at the top level of the current module ('__main__') are global names and are visible to the whole module. Names assigned inside a function are local names, visible only inside that function; outside the function, the name is not defined. Inside a function we may use the keyword 'global' to make a name refer to the global one; otherwise an assigned name is treated as local by default. When we define a function inside another function, we may use the keyword 'nonlocal' to reach the name in the enclosing function. The namespace example from the Python docs, copied below, is a great help in understanding this. Sometimes names that look the same are totally different: without 'global', 'nonlocal' or other keywords, Python resolves a name in the local scope first. Python also has some built-in names. When we write our own code, we should avoid shadowing global names or built-in names; things can get messy if we use them.
    Regarding names in class methods, a child class sometimes overrides its superclass. When we call a method on a child class instance that also appears in its superclass, Python runs the child class version, no matter where the call happens (even inside a method of the superclass); a small sketch of this appears after the scope example below. Recursive functions are sometimes a little hard to trace, because in the stack frames (as seen in the visualizer) the same names appear many times and the frames repeat again and again. So in general, do not trace recursive functions too far.

def scope_test():
    def do_local():
        spam = "local spam"
    def do_nonlocal():
        nonlocal spam
        spam = "nonlocal spam"
    def do_global():
        global spam
        spam = "global spam"
    spam = "test spam"
    do_local()
    print("After local assignment:", spam)
    do_nonlocal()
    print("After nonlocal assignment:", spam)
    do_global()
    print("After global assignment:", spam)

scope_test()
print("In global scope:", spam)
# From http://docs.python.org/3.3/tutorial/classes.html#scopes-and-namespaces-example
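
    And here is a small sketch of the overriding behaviour mentioned above (my own example): even though greet() is called from inside the superclass method introduce(), the child's version runs.

class Animal:
    def greet(self):
        return "some generic sound"

    def introduce(self):
        # self.greet() is resolved on the actual object's class, so a
        # subclass's greet() is used even from inside this superclass method
        return "I say: " + self.greet()


class Dog(Animal):
    def greet(self):    # overrides Animal.greet
        return "woof"


print(Dog().introduce())    # I say: woof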

   

Saturday, February 1, 2014

Week4: Use of Unittest

    In this week's lab, we used the Python test module 'unittest' to test the methods of the class we wrote. Being accustomed to doctest in the past, I found unittest unfamiliar. Here I will summarize some basic uses of unittest to remind myself in the future.
    To test a module, we write a new module that imports unittest and the module we want to test. To get started, we write a subclass of unittest.TestCase (or unittest.FunctionTestCase). The method setUp() runs automatically before each test, and likewise tearDown() runs automatically after each test. To test a method or a function, we write methods whose names start with the four lowercase letters 'test', so that the framework discovers and runs them by itself. In each test method we call one of the assert* methods, such as assertEqual() to verify an expected result, assertTrue() to check a condition, and assertRaises() to check that an exception is raised. In particular, assertRaises() is what I found most uncomfortable because it is something I had never used; it is called as assertRaises(exception, callable, *args, **kwds). In lab time we spent a long while dealing with the raises test. At last, we call unittest.main() to run all the subclasses of unittest.TestCase.
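    Here is a minimal sketch of that structure, using a made-up Stack class purely for illustration.

import unittest


class Stack:
    """A tiny stack, here only so there is something to test."""
    def __init__(self):
        self._items = []

    def push(self, obj):
        self._items.append(obj)

    def pop(self):
        if not self._items:
            raise Exception('pop from an empty stack')
        return self._items.pop()


class TestStack(unittest.TestCase):
    def setUp(self):
        """Runs automatically before every test method: start with a fresh stack."""
        self.stack = Stack()

    def test_push_then_pop(self):
        self.stack.push(148)
        self.assertEqual(self.stack.pop(), 148)

    def test_pop_empty_raises(self):
        # assertRaises(exception, callable, *args, **kwds)
        self.assertRaises(Exception, self.stack.pop)


if __name__ == '__main__':
    unittest.main()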
    The most important thing in testing a module is the choice of test cases. Poorly designed cases may not find important bugs in a module, which costs more time later. The general considerations include size, dichotomies, boundaries, order, etc.
    I find unittest extremely useful because, unlike doctest, it can comfortably contain many more test cases. Doctest also has shortcomings such as dealing with the newline character '\n', which I ran into in an exercise; this does not happen in unittest.

Reference: http://docs.python.org/3.1/library/unittest.html