Kai's Data Science Blog

I find fun in data, hope you do too!

The Beginning of the End: What Is the Expected Notification Rate of the Polio Surveillance Indicator?

This post presents the research proposal for my MSc thesis: “Predicting the rate of Acute Flaccid Paralysis in different settings to support polio surveillance and elimination”. You will learn why the non-polio Acute Flaccid Paralysis (NPAFP) rate matters for polio eradication. The current challenge is interpreting AFP indicator values that exceed the targeted levels. Potential data sources and modelling methods are proposed to fill this knowledge gap. Aim and Objectives. Research Aim: To investigate and model the mechanisms underlying the notification of the key polio surveillance indicator, the NPAFP rate, in endemic and outbreak settings.
2024-05-03

World Immunisation Week 2024

For people working in vaccination, the world awareness day is not a single day: it’s a week, running from 24 to 30 April each year. This year, WHO celebrates 50 years of the “Expanded Programme on Immunization (EPI)”. At LSHTM’s Vaccine Centre (VaC), our World Immunisation Week (WIW) topic this year is “responding to outbreaks”. We want to highlight the collective action needed to protect people from outbreaks and vaccine-preventable diseases.
2024-04-22

Equity vs. Equality: Modelling Vaccine Distribution Strategies for Outbreak Response Decision-Making

This group project is a collective effort, with contributions from my teammates Minn Thit Aung, Polly Nightingale, Simon Kent, and Xavier Dunn, listed alphabetically. This post aims to: determine the basic reproduction number (R0) of a hypothetical outbreak; assess the impact of various vaccination and school-closure strategies on cumulative cases and on peak cases and timing; provide instructions for constructing an SEIR model using Berkeley Madonna; and offer simple R code for converting a ggplot into a GIF. Setting the Scene: Many countries have recently experienced the first wave of an influenza pandemic caused by the strain HuNz.
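
For readers who want a feel for the SEIR model before opening the post, here is a minimal sketch in Python (the post itself uses Berkeley Madonna and R). The rates `beta`, `sigma`, and `gamma`, the population size, and the forward Euler integration are all illustrative assumptions, not the values or the solver used in the group project. For this simple model without births or deaths, R0 is `beta / gamma`.

```python
def basic_reproduction_number(beta, gamma):
    """R0 for a basic SEIR model without demography (assumed form)."""
    return beta / gamma


def simulate_seir(beta, sigma, gamma, population,
                  initial_infectious=1.0, days=200, dt=0.1):
    """Integrate a basic SEIR model with forward Euler steps.

    beta:  transmission rate per day (hypothetical value supplied by caller)
    sigma: rate of leaving the exposed class (1 / latent period)
    gamma: recovery rate (1 / infectious period)
    Returns a list of (day, S, E, I, R) tuples, one per time step.
    """
    s = population - initial_infectious
    e, i, r = 0.0, initial_infectious, 0.0
    trajectory = [(0.0, s, e, i, r)]
    steps = int(round(days / dt))
    for step in range(1, steps + 1):
        new_exposed = beta * s * i / population * dt   # S -> E
        new_infectious = sigma * e * dt                # E -> I
        new_recovered = gamma * i * dt                 # I -> R
        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - new_recovered
        r += new_recovered
        trajectory.append((step * dt, s, e, i, r))
    return trajectory


if __name__ == "__main__":
    # Hypothetical parameters chosen only to produce an outbreak.
    run = simulate_seir(beta=0.5, sigma=0.5, gamma=0.25, population=1000)
    peak_day, _, _, peak_i, _ = max(run, key=lambda row: row[3])
    print("R0:", basic_reproduction_number(0.5, 0.25))
    print("peak infectious:", round(peak_i, 1), "on day", round(peak_day, 1))
```

A vaccination scenario could be approximated by moving part of the initial susceptible population straight into the recovered class, and a school closure by lowering `beta` over a time window; both are simplifications of what the post explores.
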
2024-03-30

Survival Analysis in Electronic Health Records Data

How do you apply time-to-event analysis to compare the impact of different prescriptions on death? This article examines the survival functions of two prescriptions using Kaplan-Meier and Cox models in an electronic health records (EHR) setting. EHR data are powerful real-world data. They lend themselves to time-to-event analysis because they record sequential visits to primary and secondary care services. Take the UK’s OpenSAFELY, for instance: this secure, transparent, and open-source platform provides a Trusted Research Environment (TRE) for National Health Service (NHS) EHR data analysis, which supported urgent research into the COVID-19 emergency.
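
As a rough illustration of the Kaplan-Meier estimator mentioned above, here is a small pure-Python sketch. It is not the code from the post; the follow-up times and event flags in the usage example are made-up values for two hypothetical prescription groups.

```python
def kaplan_meier(durations, events):
    """Kaplan-Meier survival estimate.

    durations: follow-up time for each patient
    events:    1 if death was observed at that time, 0 if censored
    Returns [(time, survival_probability)] at each distinct death time.
    """
    pairs = sorted(zip(durations, events))
    at_risk = len(pairs)
    survival = 1.0
    curve = []
    idx = 0
    while idx < len(pairs):
        t = pairs[idx][0]
        deaths = 0
        leaving = 0
        # Group everyone whose follow-up ends at time t.
        while idx < len(pairs) and pairs[idx][0] == t:
            deaths += pairs[idx][1]
            leaving += 1
            idx += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set.
        at_risk -= leaving
    return curve


if __name__ == "__main__":
    # Made-up follow-up data (days) for two hypothetical prescriptions.
    drug_a = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
    drug_b = kaplan_meier([2, 3, 5, 6, 6], [0, 1, 0, 1, 0])
    print("drug A:", drug_a)
    print("drug B:", drug_b)
```

Plotting the two step curves side by side gives the usual visual comparison; a Cox model would then add a hazard ratio adjusted for covariates.
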
2024-03-23

Predicting Remission Status in Healthcare: A Comparative Analysis of Lasso, Random Forest, and AdaBoost Machine Learning Models

In this article, we will explore the following topics: using a regularized method (Lasso) for predictive variable selection; tuning hyperparameters for tree-based methods; employing the weighted sum of weak learners for a boosted classifier; and comparing prediction performance and predictor importance. Basic Methods: Imagine you possess a dataset comprising 30 biomarker variables with 5000+. How would you use it to predict a patient’s remission status, i.e. remission or active disease? One common approach that may cross your mind is logistic regression, as illustrated below:
2024-02-10

Minimizing data linkage error in an ETL pipeline using R: an intersection of MIMIC III and ODK database

What can you learn from this article? Understand the concepts of data linkage, especially deterministic linkage. Address linkage error when combining MIMIC III (served from a PostgreSQL database) with an ODK database. Employ R to design the Extract, Transform, and Load (ETL) pipeline. Use a Quarto document to generate a report in PDF format. Concepts of Data Linkage: In a data scientist’s typical day, merging or joining tables is an inevitable task.
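
To make deterministic linkage concrete, here is a minimal sketch in Python (the post itself uses R). It links two record sets on an exact match of a normalised key and separates out the two usual sources of linkage error: records with no match and records with more than one candidate. The field name `subject_id` and the sample records are assumptions for illustration, not the schema used in the post.

```python
def normalise_key(value):
    """Trim whitespace and unify case so trivial differences do not block a match."""
    return str(value).strip().upper()


def deterministic_link(left_records, right_records, key):
    """Link records whose normalised key values agree exactly.

    Returns (matches, unmatched, ambiguous):
      matches   - (left, right) pairs with exactly one candidate
      unmatched - left records with no candidate (possible missed links)
      ambiguous - left records with several candidates (possible false links)
    """
    right_index = {}
    for record in right_records:
        right_index.setdefault(normalise_key(record[key]), []).append(record)

    matches, unmatched, ambiguous = [], [], []
    for record in left_records:
        candidates = right_index.get(normalise_key(record[key]), [])
        if len(candidates) == 1:
            matches.append((record, candidates[0]))
        elif not candidates:
            unmatched.append(record)
        else:
            ambiguous.append(record)
    return matches, unmatched, ambiguous


if __name__ == "__main__":
    # Hypothetical records standing in for the two source databases.
    survey = [{"subject_id": " a1 "}, {"subject_id": "B2"}, {"subject_id": "c3"}]
    clinical = [
        {"subject_id": "A1", "age": 70},
        {"subject_id": "c3", "age": 50},
        {"subject_id": "C3 ", "age": 51},
    ]
    matches, unmatched, ambiguous = deterministic_link(survey, clinical, "subject_id")
    print(len(matches), "matched;", len(unmatched), "unmatched;", len(ambiguous), "ambiguous")
```

Reporting the unmatched and ambiguous counts at every ETL run is one simple way to keep linkage error visible instead of letting a join silently drop or duplicate rows.
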
2024-01-06

Assessing the Non-inferiority of the Single-dose Human Papillomavirus Vaccination Schedule: A Hypothetical Multi-country Cohort Study

What can you learn from this article? An experimental cohort study design for policy evaluation, in which countries have moved from a two-dose to a single-dose schedule for national HPV vaccination programmes since 2023. Introduction: Since the first licensing of the Human Papillomavirus (HPV) vaccine in 2006, evidence has been emerging that single-dose schedules provide efficacy comparable to the conventional regimens, i.e. two or three doses. In 2022, a review by the World Health Organization (WHO) Strategic Advisory Group of Experts on Immunization (SAGE) concluded that a single-dose HPV vaccine delivers solid protection against HPV.
2023-12-15

Optimizing Data Protection: Leveraging Information Governance Principles from GDPR into Research Planning

This article delivers two key insights: the advantages of applying UK GDPR in research planning, and five approaches to safeguarding data protection in lung cancer patient interviews. Setting the Scene: The need to assess Quality of Life (QoL) in patients with lung cancer undergoing chemotherapy has been increasing. After treatment, patients may experience breathlessness or fatigue, along with potential challenges in their daily and occupational functioning. These side effects of chemotherapy consistently rank among patients’ common complaints.
2023-10-30

Navigating Short-Term Accommodation Search in London

This year, London home rental prices hit a record high as demand outstripped supply, with the average tenant now being asked to pay a whopping £2,500 a month for a new let. Within this rental market, people seeking short-term accommodation face even greater challenges. My roommates and I were looking for a house or flat to co-rent on a 6-month tenancy, and we put in significant effort throughout the entire month of August, searching and making inquiries about over 170 properties before finally settling down.
2023-09-09

A Case Study of Foreign Information Manipulation and Interference: United States Asked Taiwan to Develop Weaponized Biological Agents in P4 Lab

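Did you know that Taiwan is the country most heavily targeted by foreign disinformation in the world, and has topped that ranking for 10 consecutive years? This article takes Taiwan's biological laboratories as an example to expose one technique of foreign information manipulation and interference.

The Rumour's Past

Since the COVID-19 pandemic began in 2020, a rumour has circulated on what was then Twitter (now X) that biological laboratories set up by the United States around the world were being exposed one after another, including the detailed names, addresses, and contact information of 10 laboratories in Taiwan. The claim has already been debunked by Cofacts and MyGoPen, which pointed out that “the United States set up laboratories in Taiwan to make biological weapons or study large-scale epidemics” is a conspiracy theory intended to incite and divide.

The predecessor of this Taiwan rumour was disinformation from the Russia-Ukraine war. Russia had accused the United States of running biological laboratories in Ukraine and of hurriedly destroying deadly pathogens once the war broke out. In reality, the United States had been supporting public health and biological threat reduction programmes in Ukraine since before the war, not developing biological weapons, and the destruction was only meant to keep laboratory samples from falling into the hands of Russian forces. The conspiracy theory has evidently expanded its territory, stretching the vast web of Russia-Ukraine war rumours over Taiwan.

The Rumour's Present

In July this year, the United Daily News published an exclusive report that the United States had asked Taiwan to set up a P4 laboratory to develop biological warfare agents. Former legislator Joanna Lei (雷倩) also said on a CTi News programme that, because Taiwanese people can represent the DNA of all Chinese people, it is dangerous for the United States to develop biological warfare agents here.

The Ministry of National Defense, the National Security Bureau, the Ministry of Foreign Affairs, and the American Institute in Taiwan immediately stated that the report was untrue: Taiwan has signed the Biological and Toxin Weapons Convention and therefore does not develop biological warfare agents. The Taipei District Prosecutors Office subsequently determined on 5 September that the so-called “Nanhai Work Meeting minutes” (南海工作會議紀錄) were forged, noting among other things their use of many mainland-Chinese expressions. Even so, the clip of Lei voicing scepticism towards the United States was reshared and remixed and, amplified by the verified Weibo influencer Yuyuan Tantian (玉渊谭天, in fact a self-media brand under China's central state broadcaster), reached 1.35 million video views on Weibo:

“Who exactly is really being sacrificed?” (Yuyuan Tantian, “Exclusive: The Distribution of Taiwan's Biological Laboratories Revealed”, 2022/8/16)

Yuyuan Tantian not only insinuated that the people of Taiwan are the biggest victims of collusion between the US and Taiwanese governments (because most of Taiwan's laboratories at P2 level or above store viruses in densely populated areas), but also intimidated the Chinese public with the threat of viruses from Taiwan (most Taiwanese laboratories are in the western half of the island, close to China), and attached vivid-looking evidence (the claim that the increased budget of the Ministry of National Defense's Institute of Preventive Medicine was for building a P4 laboratory, along with aerial photographs purportedly showing the institute before and after groundbreaking). The techniques of conspiracy theory, US-scepticism, and exaggerated description are plain to see.

Some abnormal activity was also found in Traditional Chinese communities on Facebook. An account named Zhou Xiaotong (周曉彤) reposted a short TikTok video claiming that the United States wanted to build a P4 laboratory in Taiwan, but the account itself is suspicious because its profile picture appears to be stolen. The post was then shared by Li Qian (李倩, an account that is likewise inauthentic) to 15 groups opposed to the pan-Green camp, with identically copied content, which is clearly not the pattern of behaviour of an ordinary person.

From the material above, traces of Foreign Information Manipulation and Interference (FIMI) can be seen, linking four elements: issue, actor, motive (interest), and action. Although the Weibo influencer also used hashtag hijacking to attract attention, the incident gained relatively little traction on Taiwan's mainstream social platforms (at most 50,000 views for the programme clip that CTi News uploaded to YouTube), and the discussion did not last.

Interestingly, in August this year US disease control authorities uncovered an illegal biological laboratory in California containing laboratory mice, chemicals, and biological materials carrying infectious agents. The illegal laboratory is suspected of links to China and set off a national security scare in the United States. It also serves as a real-world comparison case for the “Taiwan biolab” rumour.

Conclusion

An investigation using the SCOTCH framework found that media on both sides of the Taiwan Strait and non-state actors used social platforms including YouTube, Facebook, X (Twitter), Weibo, and TikTok. They cited the Taiwanese outlet's exclusive report and maps and imagery related to Taiwan's laboratories, used hashtag hijacking to attract attention, and copy-pasted shared posts into shared-interest Facebook groups.

The aim was to plant, among the Chinese public and among Taiwanese who oppose the Democratic Progressive Party (DPP), the rumour that the United States urged Taiwan to build a P4 laboratory to develop biological warfare agents. The rumour contains some truth but is laced with false information or over-interpretation, so as to instil the US-sceptic narrative that the DPP government sacrifices public safety. This not only undermines the Taiwanese public's trust in national biosafety policy but also makes the Chinese public feel a biochemical threat from Taiwan.

The tools and analytical methods used in this article come courtesy of a course by Doublethink Lab (台灣民主實驗室). This article is also published on Medium.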
2023-08-31