1 Description

  • This course is part of the first semester of M1 ETIC/MPE. It aims to give students fundamental programming concepts in R, including data structures, libraries, reusable functions, and efficient code. The course also covers how to report the results of a data analysis using Markdown and how to host source code on a platform such as GitHub.

 

  • Each session will be divided into two parts. In the first part, we will go through the course material using this site as a support. In the second part, students will be asked to solve exercises and present their approaches to others. The aim is to familiarize students with presenting and explaining their code. More importantly, it lets students compare their different approaches and better understand how to improve their code.

 

This course is structured as follows:

  • Chapter ‘Basics’ goes through the simplest operations we can perform. It covers the different types of objects available in R, the use of control flow, and the creation of functions.

     

  • Chapter ‘Arrays/Vectors’ introduces the use of vectors and matrices to perform operations. It highlights the benefits of vectorization and briefly discusses how to deal with sparse matrices.

     

  • Chapter ‘Data Analysis’ covers the use of tibbles with dplyr and of data.table. It covers basic operations on data frames such as slicing, filtering, grouping, and merging, as well as reading and writing data.

     

  • Chapter ‘Regex’ gives an overview of regular expressions and string manipulation. It draws on the first two chapters by applying regular expressions in a very basic web scraping exercise.

     

  • Chapter ‘Best Practices’ is about following coding conventions so that others can read your code more easily. It also explains how to share your code and make it accessible via GitHub, and how to produce reports or feed a blog with R Markdown.

     

2 Exams

There are two grades in this course:

 

  • A project to be handed in by groups of 2-3. The topic of the project is entirely open, but here are some guidelines:

    • The project must involve data processing. After defining the question you want to address, select the type of data to use (cross-sectional data, emails, time series, articles, websites, maps, simulated databases, Kaggle, UCI, etc.).

    • The goal is to create a tool to answer your question. The tool can be almost anything; it could be something you’ve always dreamed of automating.

    • The tool developed must be documented and accessible via the GitHub platform. The final version of the project must be on GitHub by 08/12/2024 before midnight. No commits after this date will be taken into account. The slides for the presentation must be created in Markdown.

 

  • An oral presentation of the project on 11/12/2024. Students will present the project, discussing both the strengths and weaknesses of their code and the approaches they used during the tool’s development. Students who ask questions during others’ presentations will receive bonus points towards their oral grade.