Data Cleaning

Set up

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(readxl)
library(plotly)

Attaching package: 'plotly'

The following object is masked from 'package:ggplot2':

    last_plot

The following object is masked from 'package:stats':

    filter

The following object is masked from 'package:graphics':

    layout

Data import

df <- read_excel("data/abuse_cases.xlsx") |>
  glimpse()
Rows: 1,100
Columns: 8
$ `年份/Year`                                <dbl> 2005, 2005, 2005, 2005, 200…
$ 有否向警方舉報事件                         <chr> "不適用", "不適用", "不適用…
$ `Incident Being or Not Reported to Police` <chr> "N/A", "N/A", "N/A", "N/A",…
$ 類別                                       <chr> "虐待長者性質", "虐待長者性…
$ Category                                   <chr> "Type of Elder Abuse", "Typ…
$ 項目                                       <chr> "身體虐待", " 精神虐待", " …
$ Item                                       <chr> "Physical abuse", "Psycholo…
$ `個案數字/No. of Cases`                    <dbl> 390, 26, 3, 87, 1, 1, 0, 20…

Data cleaning

Change the column names and keep only the columns needed

df_clean <- df |>
  rename("year" = "年份/Year",
         "question" = "Category",
         "answer" = "Item",
         "n_case" = "個案數字/No. of Cases") |> 
  select(year, question, answer, n_case) |>
  glimpse()
Rows: 1,100
Columns: 4
$ year     <dbl> 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2…
$ question <chr> "Type of Elder Abuse", "Type of Elder Abuse", "Type of Elder …
$ answer   <chr> "Physical abuse", "Psychological abuse", "Neglect", "Financia…
$ n_case   <dbl> 390, 26, 3, 87, 1, 1, 0, 20, 200, 15, 1, 41, 1, 1, 0, 6, 190,…

Simplify the column names of the nursing home dataset

df_home <- read_excel("data/nursing_home.xlsx") |>
  rename("n1" = "Subsidised Service-Subvented and Self-financing Home Operated by NGO",
         "n2" = "Subsidised Service-Contract Home",
         "n3" = "Subsidised Service-Private Home Participating in Enhanced Bought Place Scheme-EA1",
         "n4" = "Subsidised Service-Private Home Participating in Enhanced Bought Place Scheme-EA2",
         "n5" = "Non-Subsidised Service-Home Operated by NGO",
         "n6" = "Non-Subsidised Service-Contract Home",
         "n7" = "Non-Subsidised Service-Private Home") |>
  select(district, n1, n2, n3, n4, n5, n6, n7)

Save the cleaned data

# save the df_cleaned to RDS file
saveRDS(df_clean, "data/abuse_cleaned.rds")

# save the df_home to RDS file
saveRDS(df_home, "data/nursing_home.rds")