[June 31 - July 07]

First Evaluation is here!

I’m officially at the middle of my Google Summer of Code 2024 program, but as I had started early, from the first week of May itself, I’ve completed a huge chunk of my GSoC project. The mid-term evaluation is coming up, which is from July 8 t July 12, and hence, here is all that I have achieved:

My Additions:

I have 3745 lines of code merged, and I am the #2 contributor by number of lines of code added.

  • Version changes: sphinx and myst_nb: PR #553
  • C++ backend for Class Node: PR #556
  • C++ backend for Class TreeNode: PR #556
  • C++ backend for Class ArrayForTrees: PR #556
  • C++ backend for Class Binary Trees: PR #556
  • C++ backend for Class Binary Search Trees: PR #556
  • C++ backend for Class Binary Tree Traversals: PR #556
    • Depth First Search
    • Pre-order traversal
    • Post-order traversal
    • In-order traversal
    • Out-order traversal
    • Breadth First Search
  • C++ backend for Class Self Balancing Binary Trees: PR #559
  • C++ backend for Red Black Trees: PR #560
  • C++ backend for Binary Indexed Trees: PR #561
  • C++ backend for Splay Trees: PR #562

Achievements:

The C++ backend provides 8-10 times faster code execution than the Python backend. Here’s a benchmark test:

def test_BinarySearchTree(**kwargs):
    cpp = Backend.CPP
    repeat = 1
    number = 1

    size = int(os.environ.get("PYDATASTRUCTS_BENCHMARK_SIZE", "1000"))
    size = kwargs.get("size", size)

    BST = BinarySearchTree
    b1 = BST(backend=Backend.PYTHON)
    b2 = BST(backend=Backend.CPP)

    def f(backend, tree):
        for node in range(-1000,1000):
            tree.insert(node, node)
    def g(backend, tree):
        for node in range(-1000, 1000):
            tree.search(node)
    def h(backend, tree):
        for node in range(2000):
            tree.delete(node)

    kwds_dict_PY = {"backend": Backend.PYTHON, "tree":b1}
    kwds_dict_CPP = {"backend": Backend.CPP, "tree":b2}

    timer_python = timeit.Timer(functools.partial(f, **kwds_dict_PY))
    python_insert = min(timer_python.repeat(repeat, number))

    timer_cpp = timeit.Timer(functools.partial(f, **kwds_dict_CPP))
    cpp_insert = min(timer_cpp.repeat(repeat, number))
    assert cpp_insert < python_insert

    timer_python = timeit.Timer(functools.partial(g, **kwds_dict_PY))
    python_search = min(timer_python.repeat(repeat, number))

    timer_cpp = timeit.Timer(functools.partial(g, **kwds_dict_CPP))
    cpp_search = min(timer_cpp.repeat(repeat, number))
    assert cpp_search < python_search

    timer_python = timeit.Timer(functools.partial(h, **kwds_dict_PY))
    python_delete = min(timer_python.repeat(repeat, number))

    timer_cpp = timeit.Timer(functools.partial(h, **kwds_dict_CPP))
    cpp_delete = min(timer_cpp.repeat(repeat, number))
    assert cpp_delete < python_delete

    print("Python Time:")
    print("insert(): ",python_insert,"s")
    print("search(): ",python_search,"s")
    print("delete(): ",python_delete,"s")
    python_total = python_insert+python_search+python_delete
    print("Total Python time: ", python_total,"s\n")

    print("C++ Time:")
    print("insert(): ",cpp_insert,"s")
    print("search(): ",cpp_search,"s")
    print("delete(): ",cpp_delete,"s")
    cpp_total = cpp_insert+cpp_search+cpp_delete
    print("Total C++ time: ", cpp_total,"s\n")

    print("C++ backend is",round(python_total/cpp_total),"times faster!")
    
test_BinarySearchTree()

Output:

Python Time:
insert():  7.025027044000126 s
search():  3.316694295000161 s
delete():  3.271200282000109 s
Total Python time:  13.612921621000396 s

C++ Time:
insert():  0.9178718929999832 s
search():  0.43746472500015443 s
delete():  0.7311443830001281 s
Total C++ time:  2.0864810010002657 s

C++ backend is 7 times faster!

Coding info

This week, I finished coding up the C++ backend for AVL Trees. The following methods were implemented:

  • AVLTree__balance_delete()
  • AVLTree_delete()
  • AVLTree_rank()
  • AVLTree_select()

I have created a lightweight testing feature that runs all the test files excluding benchmark tests. This is useful to test locally, when the benchmark tests become too computationally heavy for the contributor’s machine to handle. I also modified one GitHub action to run only benchmark tests, saving both time and memory.

Relevant PRs:

PR for C++ backend for AVL Trees: #564

In the following week, I will implement the C++ backend for Cartesian Trees.

Learnings/Difficulties

No major difficulties this week, just a few random segmentation faults, which were not critical and were solved easily. Everything going smoothly, I’m enjoying my Google Summer of Code project!


Thanks to my mentor, Gagandeep Singh, for his support throughout.

See you again after a productive week! 😊