This lecture is structured as follows:
There are two different grades in this course:
A project to be handed in by groups of 2-3 students. The project topic is entirely open, but here are some guidelines:
The project must involve data processing. After defining the question you want to address, select the type of data to use (cross-sectional data, emails, time series, articles, websites, maps, simulated databases, Kaggle, UCI, etc.).
The goal is to create a tool that answers your question. The tool can be almost anything; it could be something you’ve always dreamed of automating.
The tool you develop must be documented and accessible on GitHub. The final version of the project must be on GitHub by 08/12/2024 before midnight. Commits made after this deadline will not be taken into account. The slides for the presentation must be written in Markdown.
Here are some links that you can use to download and install the software that we will need.
Hadley Wickham. R for Data Science: Import, Tidy, Transform, Visualize, and Model Data
Zed A. Shaw. Learn Python the Hard Way: A Very Simple Introduction to the Terrifyingly Beautiful World of Computers and Code
Colin Gillespie and Robin Lovelace. Efficient R Programming
https://www.anotherbookondatascience.com/
https://www.business-science.io/business/2018/10/08/python-and-r.html
https://www.practicaldatascience.org/html/vars_v_objects.html
https://learnanalyticshere.wordpress.com/2015/05/14/clash-of-the-titans-r-vs-python/
https://www.statmethods.net/input/datatypes.html
https://www.datacamp.com/community/tutorials/r-tutorial-apply-family#as
https://towardsdatascience.com/getting-started-with-git-and-github-6fcd0f2d4ac6
https://docs.oracle.com/javase/tutorial/java/data/characters.html
http://eric.univ-lyon2.fr/~ricco/tanagra/fichiers/fr_Tanagra_R_Python_Data_Perfs.pdf
https://juba.github.io/tidyverse/06-tidyverse.html
https://atrebas.github.io/post/2019-03-03-datatable-dplyr/
https://cran.r-project.org/web/packages/data.table/vignettes/datatable-sd-usage.html
To be completed.