About the Course


Course description

Welcome to the Cultural Industry & Data Analytics course (a.k.a. Data Science 101), designed to equip you with the essential skills to analyze, visualize, and communicate data effectively. Over the course of 15 weeks, you will delve into the fundamentals of data science, master the power of R programming, and learn how to create interactive visualizations and websites to showcase your findings.

Throughout the course, you will learn how to import, manipulate, and explore data using R and the tidyverse. You will gain hands-on experience with data cleaning, transformation, and aggregation techniques. Additionally, you’ll dive deep into data visualization with ggplot2 and learn how to create advanced, interactive plots using Shiny and plotly.

By the end of the course, you will have completed a data science project that demonstrates your ability to analyze, visualize, and communicate complex data insights. You will also learn the importance of collaboration, version control, and reproducible research in data science projects. With a solid understanding of the concepts and tools covered, you will be well-prepared to apply your skills in various real-world applications.


Weekly Design

| Week | Date | Pre-class | Class | PBL | Note |
|------|------|-----------|-------|-----|------|
| 1 | 03/06/2024 | | Course intro | | |
| 2 | 03/13/2024 | Variable & Vector | Basic Syntax (1) | | |
| 3 | 03/20/2024 | Array | Basic Syntax (2) | Problem description | |
| 4 | 03/27/2024 | Data.frame & List | Basic Syntax (3) | Data introduction | |
| 5 | 04/03/2024 | Data import, export, filter | Data Manipulation | Team arrangement | |
| 6 | 04/10/2024 | Repetition, Function | Data Exploration (1) (Recorded Lecture) | | Public holiday (N.A. elections) |
| 7 | 04/17/2024 | Missing values, Outliers | Data Exploration (2) | Team meeting #1 | |
| 8 | 04/24/2024 | Data viz intro | Data Visualization (1) | Team meeting #2 | |
| 9 | 05/01/2024 | Data viz Practice | Data Visualization (2) | Team meeting #3 | |
| 10 | 05/08/2024 | | QZ | | |
| 11 | 05/15/2024 | <No class> | Public Holiday | | Public holiday (Buddha’s) |
| 12 | 05/22/2024 | Shiny Intro | Interactive Web: Shiny | Team meeting #4 | |
| 13 | 05/29/2024 | Git, GitHub Basic | Version Control and Collaboration | Team meeting #5 | |
| 14 | 06/05/2024 | Quarto Intro | Reproducible Research | Team meeting #6 | |
| 15 | 06/12/2024 | <Team meetings> | <Team meetings> | | |
| 16 | 06/19/2024 | Project Presentation | Project Presentation | | |


Syllabus

Week 1: Introduction to Data Science and R

  • Course Orientation

  • Introduction to R and RStudio

  • What is Data Science?

Week 2: Basic Syntax (1)

  • R syntax and basic operations

  • Data types and structures in R

Week 3: Basic Syntax (2)

  • Data types and structures in R: Array

Week 4: Basic Syntax (3)

  • Data types and structures in R: Data.frame & List

Week 5: Data Manipulation

  • Data import & export

  • Data filtering

Week 6: Data Exploration (1): Recorded lecture

  • Repetition

  • Function

  • Introduction to tidyverse

  • Data cleaning with dplyr and tidyr

  • Data filtering and aggregation

  • Data transformation with dplyr

Week 7: Data Exploration (2)

  • Missing values & Outliers
  • Descriptive statistics

  • Grouping and summarizing data

  • Joining datasets

  • Exploratory data analysis (EDA)

Week 8: Data Visualization (1)

  • Introduction to ggplot2 for data visualization

  • Grammar of graphics with ggplot2

  • Customizing plots with themes and scales

  • Adding labels, titles, and legends

  • Creating different types of plots (scatter plots, bar plots, etc.)

Week 9: Data Visualization (2)

  • Advanced ggplot2 techniques

  • Visualizing distributions and relationships

  • Faceting and multi-panel plots

  • Plotting time series data

  • Interactive plots with plotly or ggplotly


Week 10: Mid-term QZ


Week 11: Officially No Class (Public Holiday)


Week 12: Interactive Web: Shiny

  • What is Shiny?

  • Creating Shiny apps with R

  • Adding interactivity to data visualizations

Week 13: Version Control and Collaboration

  • Introduction to Git and GitHub

  • Collaborating with others using version control

  • Best practices for organizing and documenting data science projects

  • Working with AI (feat. ChatGPT)

Week 14: Reproducible Research

  • Introduction to Quarto

  • Creating a Quarto website with R Markdown

  • Customizing the website layout and design

  • Publishing and sharing your Quarto website


Week 15: Project Consultation

Week 16: Project Presentation


Course management


  • Lecturer: Changjun Lee (Associate Professor in SKKU School of Convergence)

    • changjunlee@skku.edu
  • TA: Ye Seo Lim (Master's Student, SKKU Immersive Media Engineering)

    • ivisy6952@g.skku.edu
  • Time:

    • (1h): Flipped learning content

    • (2h): Wed 09:00 ~ 10:50

  • Location: International Hall High-Tech e+ Lecture Room (9B312)


The course consists of Pre-class, Class, and a PBL project

  • Pre-class

    • Students are required to watch the lecturer’s recorded lecture (or other assigned videos) before the offline (or Zoom live-streamed) class and study the material on their own.

    • The videos cover data science concepts and the programming language.

    • Occasionally, students are required to submit Discussions so that their level of understanding can be checked.

  • Class

    • The lecturer summarizes the pre-class lecture and explains it in more detail.

      • The lecturer asks students about the pre-class content to check whether they studied it on their own.

      • It is OK to answer incorrectly, but if you cannot answer at all, this will be reflected in your pre-class discussion score.

    • Students practice with more advanced code.

    • A quiz is given in class to check the level of understanding.

  • PBL project

    • Students form teams that meet the following conditions.

      • 4 to 5 members per team

      • Background diversity: a team should not consist of members from a single major

      • Exception: allowed if the team can justify it with sufficient reasons

    • Datasets will be provided. Each team chooses the data it wants to explore based on its interests.

    • Teams can request a Zoom meeting with the lecturer if needed.


Final outputs (an example; not limited to these)

  • Prepare (or collect) data

  • Explore data (descriptive stats)

  • Set your hypotheses (or research questions)

  • Visualize data to confirm your hypotheses or answer your RQs

  • Explain your findings

  • Extend your findings to implications


Textbooks for the course

  • R4DS: R for Data Science (written by Hadley Wickham and Garrett Grolemund)
    • is an excellent resource for learning data science using R, covering data manipulation, visualization, and modeling with R. The book is available as a free online resource.
  • RC2E: R Cookbook (written by JD Long and Paul Teetor)
    • is a comprehensive resource for data scientists, statisticians, and programmers who want to explore the capabilities of R programming for data analysis and visualization.
  • RGC: R Graphics Cookbook (written by Winston Chang)
    • is a practical guide that provides more than 150 recipes to help you generate high-quality graphs quickly, without having to comb through all the details of R’s graphing systems.
  • MDR: Statistical Inference via Data Science (ModernDive) (written by Chester Ismay and Albert Y. Kim)
    • is a comprehensive textbook that provides an accessible and hands-on approach to learning the fundamental concepts of statistical inference and data analysis using the R programming language.
  • ISR: Introductory Statistics with R (written by Peter Dalgaard)
    • is a great resource for learning basic statistics with a focus on R programming. This book covers a wide range of statistical concepts, from descriptive statistic


Score

See Course intro in Week 1

  • Attendance & Participation (10 %)

  • Pre-class Discussion Submission (10 %)

  • QZ (40 %)

  • Project (40 %)
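
As a rough illustration of how these weights combine, the following R sketch computes a weighted total. The component scores are hypothetical values, and the 0 to 100 scale for each component is an assumption not stated in this syllabus; the actual grading details are given in the Week 1 course intro.

```r
# Hypothetical component scores on an assumed 0-100 scale (illustrative only).
scores <- c(attendance = 100, discussion = 80, quiz = 75, project = 90)

# Weights taken from the list above, written as proportions.
weights <- c(attendance = 0.10, discussion = 0.10, quiz = 0.40, project = 0.40)

# Sanity check: the weights should add up to 1 (i.e., 100 %).
stopifnot(isTRUE(all.equal(sum(weights), 1)))

# Weighted total: multiply each score by its weight, then sum.
# Indexing by name keeps scores and weights aligned even if their order differs.
total <- sum(scores[names(weights)] * weights)
print(total)
```

With these illustrative numbers the total is 10 + 8 + 30 + 36 = 84. Because the quiz and the project together carry 80 % of the weight, they dominate the result.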


Communication

  • Notices & Questions

    • Please join the Kakao open-chat room

      • https://open.kakao.com/o/gNpAFOcg

      • When you join, please set your name exactly as it appears on the attendance sheet.

  • Personal counseling (scholarships, recommendation letters, etc.)