PUBH 7006 Data Management and Programming for Epidemiology

Credit Points 10

Legacy Code 401179

Coordinator Sandro Martins Sperandei

Description Modern epidemiology deals with ever-increasing volumes of data and complexity of analysis. This subject aims to equip students with effective practices for managing data and program code and for ensuring the security of their data. Students will be taught the fundamentals of managing code and data in a revision control system, as well as good programming practices and techniques that can form the basis of a robust, repeatable and test-driven research methodology. Programming instruction and exercises will use the SAS and R languages, and SQL databases.

School Medicine

Discipline Epidemiology

Student Contribution Band HECS Band 2 10cp

Level Postgraduate Coursework Level 7 subject

Co-requisite(s) HLTH 7008


Students must be enrolled in a postgraduate program.

Assumed Knowledge

High school mathematics (arithmetic, formulas and algebra, reading graphs). Basic computer competency and basic programming skills.

Learning Outcomes

On successful completion of this subject, students should be able to:

  1. Explain key concepts and rationale for revision control systems
  2. Use basic Git commands with local and remote repositories as part of personal and collaborative workflows
  3. Explain key concepts of data types, variables, working memory and storage
  4. Write simple programs that use loops and conditional statements, and simple functions and/or macros that accept parameters
  5. Explain key concepts of relational databases
  6. Write simple SQL queries to select and subset data, join two tables, and use indexes to improve efficiency
  7. Read data into R from text, CSV and spreadsheet files, from an SQL database, and from online data sources
  8. Describe strategies and techniques for data checking and cleaning, and demonstrate ability to use some of these techniques in simple data manipulation tasks
  9. Describe strategies and techniques for detecting logic and other programming errors, and demonstrate ability to use these techniques in the context of a small data preparation and analysis project
  10. Explain key concepts of information security, describe strategies for ensuring that data is stored and transmitted securely, and demonstrate ability to use basic encryption technologies safely

Subject Content

1. Introduction to basic computing concepts and methods
2. Revision control and source code management 1: personal workflows
3. Revision control and source code management 2: collaborative workflows
4. Essential programming review 1: data types, storage, loops, conditionals
5. Essential programming review 2: subroutines (and macros) and functions
6. Essential database review: tables, rows and columns, SELECT, WHERE, indexes, basic JOIN
7. Reading data in: from files, spreadsheets, databases, the web
8. Data cleaning and preparation: tools and strategies
9. Robust techniques to protect against (and detect) programming logic errors when manipulating data
10. Introduction to information security: principles and good practices for keeping data safe and preserving confidentiality


The following table summarises the standard assessment tasks for this subject. Please note this is a guide only. Assessment tasks are regularly updated; where there is a difference, your Learning Guide takes precedence.

Item | Length | Percent | Threshold | Individual/Group Task
Essay: Two page report regarding computing concepts and methods. | 1,000 words | 20 | N | Individual
Applied project: Small data manipulation and analysis project | 12 hours | 30 | N | Individual
Applied project: Research Project Report | 18 hours | 50 | N | Individual

Teaching Periods




Subject Contact Sandro Martins Sperandei

Parramatta - Victoria Rd