Luigi's guide to writing Master's theses (in Data Science)

Luigi Acerbi, University of Helsinki, Finland
Last edited: 25 May 2022

This page contains a set of personal guidelines, suggestions and advice on how to write a Master's thesis. It is not specifically about how to do research during the project (although I might write another guide on that at some point).

I am writing these recommendations primarily for my students from the Master's Programme in Data Science at the University of Helsinki, but many points are likely to apply to related programmes and other institutions. In fact, most of this guide applies to scientific writing in general (e.g., articles, PhD theses).

Disclaimer: There are loads of better materials elsewhere online; this page is mostly a collection of advice I realized I was repeating to multiple students, so I decided to put it in writing in a single place. Many of these points are not absolute rules, but my own sometimes-idiosyncratic opinions and personal recommendations: always double-check with your thesis advisor.

Before you start:

  • Programme instructions: Carefully read all the instructions about Master's theses provided by the Programme, which you can find at this link (select your programme from the menu).
  • Planning and deadlines: Writing the Master's thesis will take at the very minimum a full month, probably more, so plan accordingly. Be aware of the deadlines for submitting your thesis for review, and bring this up well in advance with your supervisor if you want to graduate at a given time. Generally, your reviewers will need at least a month to read and review the thesis, and the second reviewer (who is typically not directly involved in the thesis work) needs to be contacted well in advance. On top of that, there is the extra time needed to submit the thesis to the steering committee for approval. See the programme instructions above.
  • Start writing before you start writing: Even before officially starting the writing period, write down what you do (e.g., in a scrap LaTeX document). Write partial results, derivations, notes, etc. - no need to be particularly organized at this stage. Having bits and pieces of your work already written down will make your life much easier later.
  • Look up other theses: Nothing better than learning by example. Since almost all University of Helsinki theses are published in the Helka database, it is easy to look at other theses. For example, this query will display all theses from the Master's programme in Data Science.

Workflow:

  • Supervisor feedback: When writing the thesis, it is wise to finish one or two chapters early on (e.g., Introduction and Background) and share them with your supervisor, so that they can point out major issues (citations, grammar, structure...) before these spread all over the thesis. More generally, agree early on with your thesis supervisor on how feedback on the thesis will work.
  • Meeting notes: After each meeting with your supervisor, especially if you do not meet that often, it is a very good habit to write a brief recap as a bullet-point list. Briefly summarize comments/clarifications about the work discussed during the meeting, and list the action points you need to work on next. Send the recap to your supervisor (via email or on Slack). This is a very good way to keep track of progress and to clear up potential misunderstandings.

General thesis:

  • Layout: If you are in the hard or exact sciences, do not even think of writing your thesis in anything but LaTeX. Use the LaTeX template provided by the University of Helsinki (or your institution). If you are an MSc Data Science student, you can find the template in the Moodle course Data Science MSc Thesis. Overleaf is an easy entry point to LaTeX; the University of Helsinki has a premium license for it, accessible by signing in with your institution credentials.
  • Length: While there is no set length requirement, a typical Master's thesis will be between 40 and 60 pages. This is just a broad guideline; of course, nobody will complain if you solve the Riemann hypothesis in 10 pages. Longer theses are also possible, but think about whether all the content is needed; if you really think that everything is essential, at least consider moving some of it to one or more Appendices.
  • Structure: While titles can change, a typical thesis will have: a Summary/Abstract; an Introduction chapter; a Preliminaries/Background chapter covering the literature review, with the background theory and tools used in your thesis (cover only what you need for your work, no need to write a full textbook or to show off knowledge of unrelated topics); likely a Methods chapter explaining in more detail what you actually did in the thesis (e.g., describing your model(s), your data, your theory); a Results chapter showing your method applied to the data; and a final Discussion chapter summarizing the thesis and its conclusions. If needed, you could also have an Appendix for extra material. Of course, these are just generic guidelines - depending on your thesis work, you might have two chapters with results, or no Methods chapter, etc.
  • Consistency: The Master's thesis is a unified scholarly work, so pay particular attention to the consistency of notation, figures, tables, naming conventions, etc. across sections and chapters (see below for more examples).

Content:

  • Level of detail: Generally speaking, the thesis is about reporting what you did in a scientific way. Finding the right level of detail can be tricky, but try to be both informative and brief. You do not need to write every single detail - the thesis is not a diary of what you did. On the other hand, you need to provide enough information for the reader to figure out what you actually worked on and what you obtained.
  • Target readership: The ideal target reader for the thesis is a peer Master's student, i.e., someone from your programme who may have taken somewhat different courses from yours and ended up working on a completely different project for the thesis. So, when writing the thesis, think carefully about what you can take for granted (e.g., you can safely assume that the reader knows what a real number is, what NumPy is, but also what K-means is), and what you may have to explain (e.g., you might have to write at least a paragraph or a short section on what a Gaussian process is). As a rule of thumb, anything that you did not know before starting the thesis should be explained.
  • Negative results: Most explorations and attempts in science do not work, and the same is true for data science and machine learning. Luckily, the Master's thesis is not a NeurIPS submission, so there can be plenty of merit in exploring and reporting "negative" results. It is totally fine, and in fact quite normal, to report negative results in a Master's thesis (i.e., things that did not work as planned), but try to keep a scientific approach. If the proposed method did not work, can you explain (with evidence) or at least hypothesize why? Would you have a proposal on what could be done to fix it, had there been more time?

Equations:

  • Text format: Ensure that function names are not written in italic, e.g. in LaTeX use "\exp" and "\log" and not "exp" and "log". Similarly, use "\text{}" as needed for textual elements that are not variables. For example, use $\hat{\theta}_\text{MAP}$ to denote the maximum-a-posteriori estimate, as opposed to $\hat{\theta}_{MAP}$. We do not want "MAP" to be in italic, since it is not a variable name.
  • Notation: Even for relatively common notation, explain the notation you are using (unless it is truly basic). This is a must when different conventions exist for the same object. For example, "$\mathcal{N}\left(x; \mu, \sigma^2\right)$ denotes the probability density function of a normal distribution with mean $\mu$ and variance $\sigma^2$."
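As an illustration of the two points above, here is a minimal, compilable LaTeX sketch. It assumes the amsmath package (which provides \text{}); the symbol $\mathcal{D}$ for the observed data is introduced only for this example.

```latex
\documentclass{article}
\usepackage{amsmath} % provides \text{} for upright text inside math mode

\begin{document}

% Avoid: "log" and "MAP" are typeset in italic, as if they were products of variables
% $\hat{\theta}_{MAP} = \arg\max_\theta log p(\theta \mid \mathcal{D})$

% Prefer: \log is an upright operator name, and "MAP" is a text label, not a variable
Let $\mathcal{D}$ denote the observed data. The maximum-a-posteriori estimate is
\begin{equation}
  \hat{\theta}_{\text{MAP}} = \arg\max_\theta \log p(\theta \mid \mathcal{D}).
\end{equation}

% Define the notation explicitly the first time it is used
Here, $\mathcal{N}\left(x; \mu, \sigma^2\right)$ denotes the probability density
function of a normal distribution with mean $\mu$ and variance $\sigma^2$,
\begin{equation}
  \mathcal{N}\left(x; \mu, \sigma^2\right)
  = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right).
\end{equation}

\end{document}
```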

Figures:

  • Basic presentation: Check that all axes are labelled, that there is a legend if needed, and that the figure is readable (e.g., the font size is large enough). Figures should look pretty.
  • Captions: Figure captions should be brief but informative. Briefly describe what the axes are and what is represented in the figure (or in each panel). For example, if there is a color map, what does it represent?
    Describe what the reader should look at and give a brief takeaway message if possible: why is this figure here, and what does it show that is of interest?
  • Link from the text: Figures are somewhat independent from the text, in that they should mostly be readable stand-alone. However, figures should always be referred to from the text, ideally just before or just after the figure (e.g., "As shown in Figure 1, [...]").
  • Consistency: Check for consistency across figures. For example, font size, color map, naming of axes, ordering of variables, etc. should be as consistent as possible for figures in the same work (here, in the same thesis). Additional consistency, if possible without harming presentation, is a bonus (e.g., consistent axis limits across figures).
  • License: You are allowed to include figures that are in the public domain or released under a permissive license such as CC BY 4.0. Just make sure to state somewhere (e.g., in the caption) the source and its license. In most cases, consider adapting the figure to your purposes rather than just copy-pasting it (e.g., you might not need all the details from the original figure, or you might have to modify something to keep it consistent with the rest of your thesis).
  • Format: If you can, render figures as vector graphics such as pdf or svg, rather than as bitmaps (png or jpg), to make them sharper and, for most plots, smaller in file size.
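A minimal sketch of a figure that follows these points: it is referenced from the text with \ref, has a caption describing axes, colors and the takeaway, and is included as a vector pdf. It assumes the graphicx package; the file name figures/test-error.pdf, the label and the caption content are hypothetical placeholders (the file must exist for the snippet to compile).

```latex
\documentclass{article}
\usepackage{graphicx} % provides \includegraphics

\begin{document}

% Refer to the figure from the text, close to where it appears.
As shown in Figure~\ref{fig:test-error}, [...]

\begin{figure}[t]
  \centering
  % Vector graphics (pdf) rather than bitmaps (png/jpg); hypothetical file name.
  \includegraphics[width=0.8\textwidth]{figures/test-error.pdf}
  % Caption: what the axes and colors represent, then the takeaway message.
  \caption{Test error (y-axis) as a function of the number of training
    samples (x-axis, log scale) for each method (colors, see legend).
    [One-sentence takeaway: what the reader should notice.]}
  % \label must come after \caption, otherwise \ref picks up the wrong number.
  \label{fig:test-error}
\end{figure}

\end{document}
```

The same caption-then-label pattern applies to tables in a table environment.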

Tables:

  • General comments: What is written above for Figures generally applies to Tables too. Make sure that the tables are well presented and readable, that the caption is explanatory, that the layout is consistent, etc.
  • Figure or Table?: Consider whether the content of a table could be better conveyed by a figure (and vice versa: a very cluttered figure could become a neat table).

References:

  • How many: The Master's thesis is a piece of scholarly work, so we expect to see appropriate citations to the literature (especially in the introductory and preliminary parts, but also later). Again, there is no set requirement for the number of citations, but if your thesis cites fewer than ten articles / conference papers / books, you should probably do a bit more literature review, or be more careful to cite the papers related to the methods you are using.
  • Format: There are many different bibliography options to choose from in LaTeX. I recommend against the default number-only citation format, which is uninformative and hard to parse for humans (e.g., what's citation [31]?). Instead, use an "(Author(s), Year)" citation format or, alternatively, the alphanumeric [authors' initials + year] format, e.g. "[ABC22]". Which one is more popular depends on the community. If you use the author-year format (e.g., with the natbib package), be sure to use \citet{} for textual citations ("Author (Year) showed...") and \citep{} for parenthetical ones ("... (Author, Year)").
  • Bibliography check: Double- and triple-check your formatted bibliography as generated by LaTeX. You will likely be using BibTeX for your bibliography, so check that your .bib files are correct and that the references appear correctly in the bibliography. It is very common for .bib entries taken from, e.g., Google Scholar to have missing parts (name of the journal, page numbers, even authors), or to refer to an earlier arXiv preprint of a paper that has since been published in a journal or conference. It is your job as a scholar to ensure that your bibliography is correct and up to date. So be sure to read through the generated PDF to at least spot glaring omissions, and put effort into polishing the bibliography.
  • Capitalization: As part of the above check, make sure that words in article titles are capitalized correctly (e.g., "Bayesian" should be capitalized, not "bayesian"). Many bibliography styles convert titles to lowercase; in a .bib entry, you can force BibTeX to keep the capitalization by enclosing letters in curly braces. For example, you can write "title={{V}ariational {B}ayesian {M}onte {C}arlo}" to ensure proper capitalization.
  • Reference management software: To keep track of references used during the thesis work and writing, it might be useful to use some reference management software (beyond a .bib file), such as Zotero or Paperpile.
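The following self-contained sketch puts the Format and Capitalization points together: author-year citations via the natbib package with the plainnat style (one common choice among several), the difference between \citet and \citep, and braces protecting capitalization in a .bib entry. The citation key is arbitrary, and the .bib file is written inline with filecontents only to keep the example in one file; as noted above, always double-check entries against the published version.

```latex
% Write a small .bib file next to the document (normally this is a separate refs.bib).
\begin{filecontents*}{refs.bib}
@inproceedings{acerbi2018vbmc,
  author    = {Acerbi, Luigi},
  title     = {{V}ariational {B}ayesian {M}onte {C}arlo},
  booktitle = {Advances in Neural Information Processing Systems},
  volume    = {31},
  year      = {2018}
}
\end{filecontents*}

\documentclass{article}
\usepackage[round]{natbib} % author-year citations with round parentheses

\begin{document}

% \citet for textual citations, rendered as "Acerbi (2018)"
\citet{acerbi2018vbmc} proposed [...].

% \citep for parenthetical citations, rendered as "(Acerbi, 2018)"
[...] \citep{acerbi2018vbmc}.

\bibliographystyle{plainnat}
\bibliography{refs}

\end{document}
```

Compile with pdflatex, then bibtex, then pdflatex twice, so that the citations and the bibliography are resolved.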

Miscellanea:

  • Footnotes: Unless stated otherwise, you can use footnotes in your thesis, which can be a good way to add side information without cluttering the main text. Use them sparingly and wisely.
  • Spell-checking: Wherever you write your .tex files, find a way to run a spell-checker at least at the end, when you are polishing the text. For example, Overleaf has a built-in spell-checker. A spell-checker should catch the most obvious mistakes. Google's spell-checking (e.g., in Google Docs and Gmail) is also very good. A notorious point for some non-native English speakers, including myself, is the usage of articles (e.g., "the" or "a"), which can seem quite arbitrary, but modern spell-checkers (at least Google's) can spot this and recommend when an article should be added or removed.
  • Perspective: While the thesis is an important final step of your Master's studies, keep in mind that it is a Master's thesis and not a doctoral dissertation: make sure that the scope is appropriate, and do not spend too many months on it. If you plan to continue with research, consider applying for a PhD position, which could sometimes build on top of your Master's work (if not directly, at least in terms of the experience gained).

Acknowledgments:

Thanks to Antti Honkela and Marlon Tobaben for useful comments and suggestions.