Program Issues

Before proposing changes to the curriculum, I’d like to share what I see as the issues in our program. The problems I highlight are informed by my experiences as a student and as a TA, and by what I’ve seen in other universities’ curricula, including CMU, Waterloo, UoT, and UIUC.

I believe that our program lacks a strong foundations, both in theory/mathematics as well as computing. We do not measure up to the depth of algorithms, proofs, and theoretical computer science present at other universities, both in the US and in Canada.
1. Students have no idea how their languages work (compiler, linker, make files, build systems, etc).
  1. Anecdotally, many students feel that they don’t know how compilers and linkers work, what header files are, etc.
  2. Waterloo spends an entire course on this + code generation + parsing CS 241: Foundations of Sequential Programs
2. We lack a solid systems foundation
  1. Operating systems at UBC does not cover how unix operating systems work, their design patterns, or how to use them.
    1. A standard undergraduate textbook in OS shows that we don’t even scratch the surface.
      1. OSTEP covers design decisions for the API, scheduling, greater depth on process isolation, malloc implementation, segmentation, physical memory management, greater depth in virtual memory management, how concurrency structures are implemented in the OS, the threading API, more in depth details on IO and filesystems, journalling for crash recovery.
      2. Stanford implements full filesystem, journalling system, a shell, 2 exercises on concurrency, and implementing virtual memory management. https://web.stanford.edu/class/archive/cs/cs111/cs111.1244/
      3. UIUC implements shell, malloc, 2 concurrency exercises, a filesystem, and a client/server assignment using sockets.
  2. Concurrency is introduced in 213, but never delivered on in 313.
    1. Students never create thread safe datastructures.
    2. Students do not know how to take advantage of processes and threads to improve their programs (ex building map reduce or parallelized make in UIUC)
3. Our theory of computation education is insufficient
  1. Theory of computation is required material
    1. UoT CSC236H1: Introduction to the Theory of Computation
      1. Covers Formal language theory
    2. Waterloo CS 245: Logic and Computation
      1. Propositional logic, first order logic, automated theorem proving, formal deduction in first order logic, turing machines, decidability, Peano arithmetic, Goedel’s Incompleteness Theorem, program verification
    3. UIUC
    4. CMU
4. Our proofs education is insufficient
  1. The proofs gap in 2nd year between cpsc 121 and 320 is detrimental to any mathematical maturity that may have been built in 121
  2. Other Canadian universities insist on honours, proof-based math
    1. Waterloo
    2. UoT CS specialist
  3. Significant group of students do not feel that they have enough experience writing math proofs. (40%)
  4. CPSC 121 is insufficient proofs course
    1. Shares time with computer architecture
    2. Does not introduce full range of mathematical proof such as set proof (image and pre-image), relations, functions, injectivitiy/surjectivity, and cardinality
    3. Much shallower than Math 220
    4. Much shallower than UIUC’s discrete structures CS 173
    5. We are missing out on full Theory of Computation course
A computer science program is meant to deliver a core set of skills and knowledge to every student that successfully completes it. Fundamentally the curricular decision making comes down to balancing trade offs between the depth and breadth of content, the difficulty, and the flexibility of the program. The expected outcomes of the major depend on which of these we prioritize. UBC’s CS program provides excellent flexibility since it requires few courses, and it allows students to tailor their university experience to their interests. Students can decide how challenging the program is with the flexibility to manage their schedules and because core courses offer an appropriate level of difficulty. However, I argue that our CS program undervalues depth and breadth of foundational knowledge, and that this hurts students more than it helps them. Indeed, lacking the foundations can make catching up an uphill battle. I personally felt that I had to do so much catching up in systems and my programming skills to even compete with other students who’ve been given those foundations. My rule of thumb is that the decisions should be made to open doors for students, not close them off. To keep doors open, the program should deliver on the level of knowledge that is expected of new graduates from both academia and from industry. Currently, a UBC student simply can’t compete in today’s software engineering market or graduate school without doing a lot of work on their own. This leaves behind students who are less connected, less knowledgeable, and less wealthy. I believe it is disingenuous to present this as a complete CS education when it does not meet the level of competitors. Furthermore I believe it is possible to achieve this level of foundations without making our program impossibly difficult. I will first show how our program doesn’t achieve a comparable level of knowledge to other programs in terms of mathematical maturity, theory of computation, systems, and in programming. I’ll show what other universities do to teach each of these fields. Finally, I provide my opinion on how the program should proceed in the long term. While these changes may take years to implement, I think that identifying these shortcomings can inform the choices we make about courses today.

Mathematical Background

I believe mathematical maturity should be a core learning goal of any CS program. All graduates should be comfortable writing and reading proofs. Proofs are an exercise in technical communication and argumentation. We must convince the reader of a result, citing our reasoning. Even if a graduate never writes a formal proof, the structured and logical approach of proofs is widely applicable in an engineer’s duties. Moreover, graduates should be able to apply proof techniques to CS areas such as algorithms, compilers, or distributed systems. If graduates do not reach this level of expertise with math, we close doors to the fields that demand it. Personally, I’ve grown a lot from my extra proofs background and I attribute most of my problem solving skills to proof courses I’ve taken. I’ve even made use of proof skills during two of my internships. The first, at Monad Labs, where we were expected to read and apply the (mathematically dense) literature on the Three Generals Problem, Byzantine Fault Tolerant algorithms, and the FLP impossibility result. The second is when I worked at Huawei as an NLP research intern, where I was expected to read papers in machine learning. These were great experiences, and I want our students to have a shot at work like this!

Let’s take a look at what UBC CS has so far. Our degree teaches proofs in CPSC 121, 221, and 320. Despite that, I believe we don’t set students up to reach mathematical maturity. Both CPSC 121 and 221 spend at most 50% of the course getting students to practice proofs. The first half of CPSC 121 covers predicate logic and circuits, and only begins proof techniques on module 7 out of 11. Notably, some of the written proofs in CPSC 121 have been replaced with fill in the blank or drag and drop proofs!$^0$ CPSC 221 alternates homework each week, with only half the homework letting students exercise their proof skills. Looking forward to 3rd year, CPSC 320 is a decently heavy proof-based course. Our students graduate with at most 7 credits worth of proofs experience. As someone who’s taken MATH 220, 223, and 320, and having TAed CPSC 320, CS students lack experience working with proofs. A large portion of students feel the same way, with 40% of respondents on the survey claiming that they do not have enough experience writing proofs. TODO: Provide survey data on notion

Let us consider what other universities do. Both Waterloo and UoT require that their students take 6 credits of calculus with proofs in their first year$^1$. Right off the bat, their students get 2 courses worth of proofs exercises. At UBC, this would be equivalent to taking MATH 120 and 121. In addition to the proof-based calculus, UoT supplements them with 3 credits worth of proofs exercises in their 9 credit full year first year course: CSC110Y1: Foundations of Computer Science I / CSC111H1: Foundations of Computer Science II $^2$. It covers predicate logic, first order logic, number theory, function specification, correctness proofs, and runtime analysis, correctness of cryptographic algorithms (Diffie-Helman), induction/recurrences, proofs on graph algorithms. Quite the busy first year! Waterloo on the other hand adds an additional Math requirement in their Math 1[34]5: Algebra, which introduces proof techniques, set theory, induction, relations, injective/surjective functions, number theory, among many other topics. The closest comparison at UBC would be a combination of MATH 220 and a number theory course. All in all, students at UoT and Waterloo come out of their 1st year with nearly 9 credits worth of proof experience! In addition, both UoT and Waterloo provide a proof-heavy 2nd year course (CSC236H1: Introduction to the Theory of Computation, CS 245: Logic and Computation respectively), and a 3rd year algorithms course just like us (CSC373H1: Algorithm Design, Analysis & Complexity and CS 341: Algorithms respectively). After a total of 15 credits of working with proofs in their program, UoT and Waterloo students can confidently tackle proofs that come up later in their careers, whether in algorithms, theory of computation, distributed systems, formal methods, compilers, or just convincing a coworker that their code is correct.

Much like programming, proofs demand time on the saddle. I think the heavy proofs emphasis in first year is essential to building proof confidence in students. Taking inspiration from both UoT and Waterloo, UBC CS should aim to require MATH 120 and MATH 121 in first year. In addition, I think one course that focuses entirely on proofs is essential. This should introduce Number Theory, introduce all the basic tools of mathematics like sets, relations, functions and their properties, and hopefully tie these ideas back to computer science with algorithms and cryptography proofs. There are two routes we can take: The first option is to move CPSC 121 to be a proof-heavy course that covers the above content. This means the other content in CPSC 121 should go elsewhere. Stay tuned for that! The second option is to adopt MATH 220 as a first year course, since it is already a solid and well tested introduction to proofs course. One drawback to this approach is that its content may overlap too heavily with MATH 120/121. Either ways, both of these options will give students ample experience coming out of first year and sets them up for success. However, building on top of CPSC 121 would be a preferable route.

Footnotes:

0: A drag and drop proof constitutes a proof that has been split up into several boxes. Given a question, the student is expected to order the boxes to create a correct proof.

1: UoT requires this for their CS specialist program, which is the closest program to UBC’s current CS program

2: This is a 9 credit course which historically was 3 separate courses: 2 Intro to programming, and 1 discrete math course. About a third of this content is proof exercises.

Theory of Computation

UBC’s CS program covers theory of computation (TOC) in CPSC 121 and CPSC 320. In CPSC 121, we discuss regular expressions, finite automata (both deterministic and nondeterministic) and briefly touch on Turing machines. All of this content is covered in a total of 2 lectures in CPSC 121 before proofs are introduced. In CPSC 320, we teach complexity classes and reductions to prove that problems belong to certain classes. While we cover some of the basics of TOC, we teach a fraction of the content that is covered elsewhere. I’ll start off by highlighting what other universities cover, then make an argument as to why we should incorporate more theory of computation in our curriculum, and even add a required 2nd year course. Let’s take a tour of Theory of Computation in Waterloo and UoT.

Starting with Waterloo, they cover TOC concepts in both CS 245: Logic and Computation and CS 241: Foundations of Sequential Programs. Waterloo’s CS 241 goes over the basics of programming language and compiler construction. Motivated by compilers, it covers DFAs and NFAs, Regular languages, and context-free languages, and context-aware languages. Additionally, they apply these concepts to understand both bottom-up and top-down parsing, code generation, and ultimately explain the internals of a C-style compiler. Not only does Waterloo expand on the theory of languages and grammars, they also show the motivation to study this field and apply it to compiler construction. Additionally, Waterloo’s CS 245: Logic and Computation goes further than UBC on predicate and first order logics by also studying the concept of proof systems and formal verifiers. They showcase algorithms for formal deduction of predicate and first order logic (ie given a set of truths, prove or disprove a logical expression). This course also introduces the concepts of soundness and completeness and leads into discussions on automated theorem solvers and unification. The course finishes off by proving Goedel’s Incompleteness Theorem, discussing Turing machines, decidability, and the Halting problem. Waterloo definitely delivers content beyond the realm of TOC, but what it does deliver leaves students equipped to conquer any future encounters with computability, logic, and programming languages.

UoT certainly doesn’t cover as much as Waterloo, but still exceeds UBC’s content through their TOC course CSC236H1: Introduction to the Theory of Computation. This course goes into more depth on induction, recurrences, algorithm complexity than CPSC 121. It also covers Formal Language Theory theory for 3 weeks. They show the equivalence of DFAs, NFAs, and regular expressions, formally define regular and non-regular languages, and introduce the Pumping Lemma. While they miss out on Turing machines, computability, halting problem, and Goedel’s Incompleteness Theorem, their students still get more rigorous study of automata and languages than we do.

We notice that both Waterloo and UoT cover more content on formal languages theory, and notably do so in their 2nd year courses. Formal languages, automata, and computation models are foundational for compilers, language theory, and algorithms. With their extra language theory background, UoT and Waterloo students are better equipped to handle problems involving code generation and parsing, which show up all the time in computing. Waterloo students go so far as understanding how their entire C compiler is built through the lens of formal languages theory. Speaking personally, I’ve had to do parsing and code generation for both my internship at Monad labs, as well as my research project at Systopia. Finite automata are such useful tools that help model many problems in distrubuted systems, networking, programming languages (as shown above), and even operating systems. Furthermore, turing machines, decidability, and the Halting problem are fundamental to notions of computability. In our current program, all students should know that solving an NP-hard problem in polynomial time is an exercise in futility (or an exercise in winning a Turing award). Why shouldn’t they also know that solving the halting problem is yet another exercise in futility?

As Joey Eremondi from stack exchange puts it:

You want a compiler that finds the fastest possible machine code for a given program? Actually the halting problem.

You have JavaScript, with some variables at a high security level and some at a low security level. You want to make sure that an attacker can't get at the high security information. This is also just the halting problem.

You have a parser for your programming language. You change it, but you want to make sure it still parses all the programs it used to. Actually the halting problem.

This is where I align more with Waterloo’s approach to their curriculum where they consider Turing machines and all their results to be essential content. While UoT covers Formal Language Theory, I think they could go further to incorporate these fundamental results in computing. Here at UBC we can choose

I think we should introduce a 2nd year TOC course is a practical one. I’ve shown above that our mathematical maturity is lacking. Part of the reason is that currently, a student in UBC CS just doesn’t practice their proof skills in 2nd year. A 2nd year course on TOC helps bridge the mathematics introduced in 1st year with the mathematics expected in 3rd year. The survey results show that 40% of students don’t feel that they get enough proof experience. I believe a proofs-heavy TOC course would remedy this problem and build a strong mathematical maturity among UBC students.

Systems

TODO
Course topics should be cohesive and focused.
1. 121 tries to cover proofs, computer architecture, theory of computation. This leads to a bad base in all of them.
  1. Math 220 covers more proofs.
  2. UoT and Waterloo have entire theory of computation and computer organization courses..
  3. 58% of students want a dedicated Computer architecture course.
2. 313 tries to cover computer architecture and operating systems. Our students come out not knowing enough operating systems.
  1. Students learn much less than UoT, Waterloo, and UIUC on operating systems.
  2. Students do not understand the role of the operating system as application programmers, nor do they understand the abstractions provided by a Unix OS, or the reasons for the isolation mechanisms.
  3. Students don’t know how to relate the caching knowledge they learn to software applications. When are index and offset bits necessary, when is data locality important, what does a software cache look like? Questions I was asked on the eve of the final.
3. 121 covers a mini x86, 213 does SM213, 313 does y86, 436a does arm. So much time wasted.
4. 213 and 221 both independently have to teach memory management and pointers to students.

On curriculum structure: Say no to cramming, say yes to planning

Course and curriculum design at UBC has seen a lot of change over the years. The curriculum’s evolution has uncovered structural pitfalls that we should seek to avoid moving forward. The first issue I see is that we cram content into courses. In trying to deliver all the foundations, we have courses that try to deliver too much in too little time. I will dissect both CPSC 121 and CPSC 313 to show how we have crammed content into them, and how it creates cracks in students’ foundation. The second structural issue I see is the lack of holistic curriculum planning. I show how this manifests in our courses, how other universities deliver coherent curricula, and lessons we can learn.

Good course content should aim for students to have a deep understanding to the level where they can apply the topic to solve problems and analyze it rigorously. Knowledge and skills do not persist when the exposure is cursory, not motivated or applied, and doesn’t cover design decisions and trade offs. It also helps to highlight connections to existing knowledge. As instructors, we should structure our courses to support this long term, densely connected knowledge. Courses with a lot of crammed content simply cannot deliver this ideal, and I’ll show why. Lets start with CPSC 121. It aims to introduce students to proof techniques, automata, regular languages, logic, algorithm analysis, digital logic, basic CPU design, and assembly. This course is trying to cover a lot of ground given that it is the only introduction to proof that students get, this is half the theory of computation students get, and this is half the digital logic and computer architecture students get. I reiterate. This is a single course. This content would be taught in 2 or 3 courses elsewhere. I’ve already shown that we are severely under prepared on mathematics, theory of computation, and computer architecture. It is clear to me that the reason is we’ve chosen to cram the content into a single course. Let’s consider an example. We teach the idea of pipelining and assembly in CPSC 121 through the paper computer lab. Despite that, we have to teach it CPU design from scratch in CPSC 313. Why has nothing stuck? The reason is that pipelining and assembly sections of CPSC 121 were covered in a single lab. What insight can a student gather if they look at material for a day and move on? The nuance, motivation, impact, and design decisions are entirely lost because it is impossible to deliver that in a single lab. They didn’t get to apply their new knowledge on different problems, nor did they get to consider trade offs in design. Similarly, students spend a total of 2 lectures on automata, turing machines, and regular languages. I guarantee that the average student does not understand the significance of computability and languages after only two lectures. They haven’t proven any results and they’ve barely applied their knowledge to solving tough problems. This level of knowledge does not stick. On the other hand, 313 covers CPU design, caching, filesystems, and virtual memory, and kernel basics. 313 is equivalent to teaching a computer architecture course and an operating systems course in one. Because of this, 313 is forced to rush much of the OS content in the end, leading to so much missed systems content. As a 313 TA, my students felt that they did not understand the role of the kernel, how core OS systems like virtual memory work, how OSs affect their programs, or how to use OS abstractions to their advantage. How are students completing a 3rd year OS course, and cannot explain or use an OS? What was the point of learning about virtual memory if students never learned how to leverage it, or what isolation properties it provides? What was the point of learning about processes when students never learned how and when to use them to parallelize their workloads? I and many of my former students had promptly forgotten the content from 313 upon completing the course. This shows that it is fundamentally broken.

Can a course incorporate several areas into one? Absolutely! In fact, UoT’s entire first year is a 9 credit course that combines 6 credits of intro to programming, and 3 credits of proof CSC110Y1: Foundations of Computer Science I CSC111H1: Foundations of Computer Science II. Waterloo combines parts of compilers and theory of computation in their CS 241: Foundations of Sequential Programs. The common thread is that combining a variety of topics is requires that you give each topic its due time, and that topics reinforce and tie in to each other. There was no logical thread between the OS and the computer architecture sections of CPSC 313. There was no tie in between the proofs of CPSC 121 and the computer architecture and the predicate logic.

How did CPSC 121 and 313 come to exist, and why are they still around? Having spoken to professors, these courses resulted from a merge of computer architecture and discrete mathematics many years ago. Today, they remain a sticky fixture on our curriculum due to the department’s resource constraints, lack of manpower, and the mountains of bureaucracy that must go through the department and faculty of science to make any changes to the curriculum. Whenever UBC CS sought to make a change to the curriculum, it was just easier and cheaper to shuffle topics around our current course structure, which lead to this mess. We are effectively trying to to fit a complete CS education that would take 10 required courses in Waterloo or UoT, and fit it into our 8 required courses. If we are to make changes to our curriculum and keep this 8 course structure, we must trim down content. The cramming must stop so students can get a good foundation in most areas. Below I propose one approach to handling the curriculum that both maintains the current course structure, and teaches students enough about important areas of CS.

Programming assignments should promote open ended design.
Commandline tools, git, etc should be a core part of the curriculum.
The software engineering track tries to teach programming, higher level abstractions, and software engineering processes all at once. In doing so concurrently, it fails to deliver on all.