CS3223 Database Systems Implementation

Module: CS3223 Database Systems Implementation

Semester taken: AY 2020/21 Semester 2

Lecturer: Prof Tan Kian-Lee

Tutor: Prof Tan Kian-Lee

Textbook: Database Management Systems, 3rd ed., Raghu Ramakrishnan & Johannes Gehrke, 2003. McGraw-Hill

What it is about

This module is a follow-up to CS2102 but focuses more on how database management systems are designed and built. Students will learn topics such as storage management, algorithms for querying data, concurrency control, etc. The knowledge gained will be immediately applied through a group project to implement a simple database management system.

Assessment components

  • Gradiance Assignments: 5%

  • Mid-term test: 20-25%

  • Tutorial participation: 5%

  • Project: 35-40%

  • Final exam: 30%

Comments

The knowledge from CS2102 is barely useful in this module, as the focus is on how database management systems are implemented rather than on writing SQL queries to obtain data. You only need a basic understanding of what a SQL query means; the main concern is how to retrieve the correct set of data from the database, which is stored in a layout that you determine (covered under the storage management topic).

The content in this module overlaps with CS2106 in terms of storage management and with CS2040 in terms of the hash and B+ tree data structures. You will also learn some additional algorithms that are used for processing JOIN statements. It is overall a tough but exciting module, as you will learn how modern database management systems are designed. Even though you may not build new database management systems in your day-to-day work, it is still important to know how they work, so that you can design your databases in a better and more optimised way.
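To give a flavour of what a join algorithm looks like, here is a minimal in-memory sketch of a hash join, one of the standard textbook approaches to processing a JOIN. It is only an illustration and is not taken from the module's project code: the class name HashJoinSketch, the method join, and the representation of a row as an int array are all assumptions made for this example. A real system would also have to deal with disk pages and limited buffer space, which this sketch ignores.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the module's project code.
class HashJoinSketch {

    // Equi-join: pairs every left row with every right row where
    // left[leftKey] == right[rightKey]. Rows are plain int arrays.
    static List<int[]> join(List<int[]> left, int leftKey,
                            List<int[]> right, int rightKey) {
        // Build phase: group the left rows by their join key.
        Map<Integer, List<int[]>> buckets = new HashMap<>();
        for (int[] l : left) {
            buckets.computeIfAbsent(l[leftKey], k -> new ArrayList<>()).add(l);
        }

        // Probe phase: look up each right row's key in the hash table.
        List<int[]> out = new ArrayList<>();
        for (int[] r : right) {
            List<int[]> matches = buckets.get(r[rightKey]);
            if (matches == null) {
                continue; // no left row shares this key
            }
            for (int[] l : matches) {
                // Output row = left columns followed by right columns.
                int[] joined = new int[l.length + r.length];
                System.arraycopy(l, 0, joined, 0, l.length);
                System.arraycopy(r, 0, joined, l.length, r.length);
                out.add(joined);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<int[]> left = List.of(new int[]{1, 10}, new int[]{2, 20}, new int[]{2, 21});
        List<int[]> right = List.of(new int[]{2, 200}, new int[]{3, 300});
        for (int[] row : join(left, 0, right, 0)) {
            System.out.println(java.util.Arrays.toString(row));
        }
    }
}
```

The build side is hashed once and the probe side is scanned once, so the work is roughly proportional to the size of both inputs plus the size of the output, instead of comparing every pair of rows as a naive nested loop would.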

Lectures

There is a 2-hour lecture per week, and Prof Tan uploads his slides onto the course website (which is publicly viewable). This semester, the lecture was early in the morning at 8AM on Zoom, which of course I did not attend. The lecture sessions are recorded, so it is possible to view them at a later time, but I heard that Prof Tan regularly stops the recording to talk about things beyond the course material and sometimes offers more detailed explanations, which can be quite helpful in understanding the complex material. I regretted not attending the live lectures and definitely recommend that you do.

There are Gradiance assignments to be completed every now and then, which serve as a good recap of the lecture content. They come in the form of MCQs and are randomised from a test bank. You have unlimited retries, and an assignment counts towards the grade only if you score full marks. Revise your lecture content before attempting them and use this opportunity to address any misunderstandings you have about the material.

Tutorials

There is a 1-hour tutorial per week, and I was fortunate to get Prof Tan as my tutor. However, he has a habit of not giving the answers at all, and expects students to be the ones coming up with them. If students give the wrong answer, he will correct them, but if the whole class remains silent, he would rather leave the questions unanswered, even until the end of the tutorial. If you are used to being spoon-fed answers, then you probably will not have a good time here.

For each tutorial, some students will be selected to present their answers to the class, and the session will be spent going through the questions using the students' answers. If you are chosen to present a certain question for that tutorial, be sure to prepare a detailed answer along with an explanation of how you arrived at it. Tutorial participation marks come from attendance, so be sure to turn up for every lesson.

Project

This is the toughest component of the entire module, as you will have limited time to write code in a brownfield project that was written almost two decades ago. You will work in teams of 3 people, and will be given a simple database management system with only the query processing implemented. Your task during the semester is to implement the other features required, such as the index, ORDER BY, JOINs, aggregate functions, DISTINCT, etc.

For this semester, we were supposed to finish implementing everything by Week 8, which many people were unable to do, as midterms had just finished. We were eventually given a week's extension, but even then it was still a mad rush, as it is difficult to test the code. There was a lack of structure and guidance for the project, which we raised as feedback to Prof Tan.

The project is written in Java, and I recommend starting early as it takes some time to understand the codebase and how the query processing is done. Write clean and well-commented code, as it will help you a lot in the latter half of the project.

Mid-term and Final Test

The mid-term test was an open-book test that was held physically, with 20 questions to be done in 75 minutes. You are only required to give the final answer, and some of the questions are MCQs. The questions are harder than the Gradiance assignment questions and are similar in standard to the tutorial questions.

The final exam had the same format as the mid-term test for Section A, with 21 questions (although 1 question was nullified). There was a new Section B, which had long-answer and fill-in-the-blanks questions. The exam was 2 hours long and focused on material covered after the recess week.

Be sure to practise as many questions as you can so that you are sure of the concepts, as many of the questions require you to understand the deeper meaning behind them.

Other information

Assignment workload: There are multiple Gradiance assignments to complete throughout the semester, plus some tutorial preparation to be done if you are a presenter.

Project workload: Very high, as it is a brownfield project and requires you to implement algorithms and data structures in a certain manner.

Readings: None

Recommended if: You are a Computer Science student interested in database-related work in the future. I would not recommend this module to students from other majors (even data science students), as the content would hardly be useful for them.

Rating: 4.0/5. Interesting content but can be a little tough to understand sometimes. Project can be quite tough to do.

Expected grade: A

Actual grade: A- (did not do too well for mid-term and final tests)
