Covid-19-2020-04-18

0.2 Data Downloading and Early Processing

We work with data provided by Protezione Civile Italiana, publicly downloadable at https://datastudio.google.com/u/0/reporting/91350339-2c97-49b5-92b8-965996530f00/page/RdlHB

To download data in an elementary way, move the cursor to the top right corner of the figure you are interested, three vertical small dots will appear. Click on the three dots to select the data file. If you are interested in retrieving data from a particular Italian Region, select it on the map at the beginning of the downloading procedure.

Be aware that the download procedure always produces files with the name “Coronavirus Italia_Dati Protezione Civile_Serie temporali.xxx”, where xxx is the extension of your choice. Therefore, it might be a good idea to rename them on or after downloading.

We download .csv files and rename them as “CoViD_19_Italy_Time_Series_X_Y_Z.csv” where X Y Z are for different date and type specifications (see below). In particular, we consider first Total Cases (linear) and Tests data.

After setting the work directory by the command setwd(“C:/Users/User Name/Documents/My Documents/…/…/…”), we import data in R in a data frame object from “CoViD_19_Italy_Time_Series_X_Y_Z.csv” files.

DF_Total <- read.csv("CoViD_19_Italy_Time_Series_04_15_Total.csv", header=TRUE)
DF_Daily <- read.csv("CoViD_19_Italy_Time_Series_04_15_Daily.csv", header=TRUE)
DF_Tests <- read.csv("CoViD_19_Italy_Time_Series_04_15_Tests.csv", header=TRUE)

We quickly chek the format of imported data and their structure.

class(DF_Total)

## [1] "data.frame"

head(DF_Total)

##          Date Total.Cases Deaths Recovered
## 1 19 feb 2020           3      0         0
## 2 20 feb 2020           4      0         0
## 3 21 feb 2020          21      1         1
## 4 22 feb 2020          79      2         1
## 5 23 feb 2020         157      3         1
## 6 24 feb 2020         229      7         1

class(DF_Daily)

## [1] "data.frame"

head(DF_Daily)

##          Date Daily.Cases Daily.Deaths Daily.Recovered
## 1 19 feb 2020          NA           NA              NA
## 2 20 feb 2020           0            0               0
## 3 21 feb 2020          15            1               1
## 4 22 feb 2020          54            1               0
## 5 23 feb 2020          75            1               0
## 6 24 feb 2020          72            4               0

class(DF_Tests)

## [1] "data.frame"

head(DF_Tests)

##          Date Daily.Tests Daily.Tests..weekly.average.
## 1 19 feb 2020          NA                           NA
## 2 20 feb 2020          NA                           NA
## 3 21 feb 2020          NA                           NA
## 4 22 feb 2020          NA                           NA
## 5 23 feb 2020          NA                           NA
## 6 24 feb 2020          NA                           NA

We change the data format of the column “Date” in the data frame DF_Total to simplify the syntax of the scripts.

class(DF_Total$Date)

## [1] "factor"

DF_Total$Date <- as.factor(as.Date(DF_Total$Date, format="%d %b %Y"))
head(DF_Total$Date)

## [1] 2020-02-19 2020-02-20 2020-02-21 2020-02-22 2020-02-23 2020-02-24
## 57 Levels: 2020-02-19 2020-02-20 2020-02-21 2020-02-22 ... 2020-04-15

We add the column “Daily.Tests” of the data frame DF_Tests to the data frame DF_Total (to this we need the library “tibble”), after changing the “NA” (Not Available) data to zeroes. This will be useful in what follows for computational procedures.

TTests <- DF_Tests$Daily.Tests
TTests[sapply(TTests, is.na)] <- 0
# library(tibble)
DF_Total <- add_column(DF_Total, TTests, .after="Recovered")

We slightly modify the names of some columns in the data frame DF_Total (we need the library “tidyverse”) and check the result.

colnames(DF_Total)

## [1] "Date"        "Total.Cases" "Deaths"      "Recovered"   "TTests"

# library(tidyverse)
DF_Total <-  rename(DF_Total, TCases=Total.Cases, TDeaths=Deaths, TRec=Recovered)
head(DF_Total)

##         Date TCases TDeaths TRec TTests
## 1 2020-02-19      3       0    0      0
## 2 2020-02-20      4       0    0      0
## 3 2020-02-21     21       1    1      0
## 4 2020-02-22     79       2    1      0
## 5 2020-02-23    157       3    1      0
## 6 2020-02-24    229       7    1      0

0.3 Early Data Plots

We plot the figures of Total Cases, Total Deaths and Total Recovered against Dates (we need the library tidyverse). We consider both a scatter plot and a line plot. In the first of the following chuncks, we make a copy DF of the DF_Total data frame and in terms of this copy we set some auxiliary variables and vectors.

DF <- DF_Total
First_Day <- as.character(DF$Date[1])
Last_DaY <- as.character(DF$Date[length(DF$Date)])
title_content <- paste("Università di Roma Tor Vergata - Master in Data Science - Scatter Plots of CoViD-19 National Total Cases from", First_Day, "to", Last_DaY)
subtitle_content <- "Dati Protezione Civile Italiana - https://datastudio.google.com/u/0/reporting/91350339-2c97-49b5-92b8-965996530f00/page/RdlHB"
x_labs <- format(as.Date(DF$Date),format="%m-%d")
y_breaks <- c(0, 10000, 20000, 30000, 40000, 50000, 60000, 70000, 80000, 90000,
              100000, 110000, 120000, 130000, 140000, 150000, 160000, 170000)
y_labs <- c("0", "10,000", "20,000", "30,000", "40,000", "50.000", "60.000", "70.000", "80.000", "90.000", 
            "100.000", "110.000", "120.000", "130.000", "140.000", "150.000", "160.000", "170.000")
colors <- c("Total Cases"="black", "Total Deaths"="red","Total Recovered"="blue")

In the second chunck we build the object DF_SP, which is a list containing the istructions to draw the data scatter plot, thereby we draw the plot.

# library(ggplot2)
DF_SP <- ggplot(DF, aes(Date)) + 
  geom_point(alpha=1, size=1, aes(y=TRec, color="Total Recovered")) +
  geom_point(alpha=1, size=1, aes(y=TDeaths, color="Total Deaths")) +
  geom_point(alpha=1, size=1, aes(y=TCases, color="Total Cases")) +
  scale_x_discrete(name="Date",
                   breaks=waiver(),
                   labels=x_labs) +
  scale_y_continuous(name="Value",
                     breaks=y_breaks, 
                     labels=waiver(),
                     sec.axis=sec_axis(~., breaks=y_breaks, labels=y_labs)) +
  ggtitle(title_content) +
  labs(subtitle=subtitle_content) +
  scale_color_manual(name="Legend", values=colors) +
  theme(plot.title=element_text(hjust=0.5), plot.subtitle=element_text(hjust=0.5),
        axis.text.x = element_text(angle=90, vjust=1),
        axis.text.y.left = element_blank(),
        legend.position=c(0.075,0.82))
plot(DF_SP)

In the third chunck we build the object DF_LP, which is a list containing the istructions to draw the data line plot, thereby we draw the plot.

DF_LP <- ggplot(DF, aes(Date)) + 
  geom_line(alpha=1, size=1, aes(y=TRec, color="Total Recovered"), group=1) +
  geom_line(alpha=1, size=1, aes(y=TDeaths, color="Total Deaths"), group=1) +
  geom_line(alpha=1, size=1, aes(y=TCases, color="Total Cases"), group=1) +
  scale_x_discrete(name="Date",
                   breaks=waiver(),
                   labels=x_labs) +
  scale_y_continuous(name="Value",
                     breaks=y_breaks, 
                     labels=waiver(),
                     sec.axis= sec_axis(~., breaks=y_breaks, labels=y_labs)) +
  ggtitle(title_content) +
  labs(subtitle=subtitle_content) +
  scale_color_manual(name="Legend", values=colors) +
  theme(plot.title=element_text(hjust = 0.5), plot.subtitle=element_text(hjust =  0.5),
        axis.text.x = element_text(angle=90, vjust=1), 
        axis.text.y.left = element_blank(),
        legend.position=c(0.075,0.82))
plot(DF_LP)

We can print the plots as .png or .pdf files by the commands

file_name = paste("National Total Cases Till - ", "04-05", " - SP.png", sep="")
png(file_name, width=1600, height=800, res=120)
print(DF_SP)
dev.off()

## png 
##   2

file_name = paste("National Total Cases Till - ", "04-05", " - LP.png", sep="")
png(file_name, width=1600, height=800, res=120)
print(DF_SP)
dev.off()

## png 
##   2

file_name = paste("National Total Cases Till - ", "04-05", " - SP.pdf", sep="")
pdf(file_name, width=12, height=7)
print(DF_SP)
dev.off()

## png 
##   2

file_name = paste("National Total Cases Till - ", "04-05", " - LP.pdf", sep="")
pdf(file_name, width=12, height=7)
print(DF_SP)
dev.off()

## png 
##   2

It is well known (see e.g. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5331683/, see also https://doi.org/10.1016/j.idm.2019.12.009) that the total number of infected people in epidemic outbreaks is modelled against time by a logistic function (e.g. see https://en.wikipedia.org/wiki/Logistic_function). This figure is then represented by the function \[\begin{equation} f\left(t;t_{0},\kappa,\Lambda\right)\overset{\text{def}}{=}\frac{\Lambda}{1+e^{-\kappa (t-t_{0})}} =\frac{\Lambda e^{\kappa(t-t_{0})}}{1+e^{\kappa(t-t_{0})}}, \tag{1} \end{equation}\] where the parameter \(\Lambda\) represents the maximum number of infected cases, \(\kappa\) represents the growth rate in total cases, that is the stepness of the graph \(\Gamma_f\) of the function and \(t_0\) is the instant at which the curve \(\Gamma_f\) changes from convex to concave. The derivative of the function (1) is given by \[\begin{equation} \frac{d}{dt}f\left(t;t_{0},\kappa,\Lambda\right) =\frac{\Lambda\kappa e^{-\kappa(t-t_{0})}}{\left(1+e^{-\kappa(t-t_{0})}\right)^{2}} =\frac{\Lambda\kappa e^{\kappa(t-t_{0})}}{\left(1+e^{\kappa(t-t_{0})}\right)^{2}} \tag{2} \end{equation}\] The second derivative is \[\begin{equation} \frac{d^{2}}{dt^{2}}f\left(t;t_{0},\kappa,\Lambda\right) =-\frac{\Lambda\kappa^{2}e^{-\kappa\left(t-t_{0}\right)}\left(1-e^{-\kappa\left(t-t_{0}\right)}\right)} {\left(1+e^{-\kappa(t-t_{0})}\right)^{3}} =\frac{\Lambda\kappa^{2}e^{\kappa\left(t-t_{0}\right)}\left(1-e^{\kappa\left(t-t_{0}\right)}\right)} {\left(1+e^{\kappa(t-t_{0})}\right)^{3}} \tag{3} \end{equation}\] The structure of the first and second derivative will turn useful in our analysis.

We plot the graphs \(\Gamma_{f}\), \(\Gamma_{\frac{d}{dt}f}\), and \(\Gamma_{\frac{d^{2}}{dt^{2}}f}\) of the logistic function and its first and second derivative for the parameter values \(\Lambda=150.000\), \(\kappa=0.5\), and \(t_{0}=50\). To this we define first a data frame Lgst_DF having for columns the values taken by the variable t, the logistic function f, and the first and second derivative of the logistic function D1_f and D2_f, respectively.

Lambda <- 150000
kappa <- 0.5
t <- c(1:100)
t_0 <- 50
f <- I(Lambda/(1+exp(-kappa*(t-t_0))))
D1_f <- I((Lambda*kappa*exp(-kappa*(t-t_0)))/(1+exp(-kappa*(t-t_0))^2))
D2_f <- I((-Lambda*(kappa^2)*exp(-kappa*(t-t_0))*(1-exp(-kappa*(t-t_0))))/((1+exp(-kappa*(t-t_0)))^3))
Lgst_DF <- data.frame(t,f,D1_f,D2_f)
head(Lgst_DF)

##   t            f         D1_f         D2_f
## 1 1 3.434602.... 1.717301.... 8.586505....
## 2 2 5.662701.... 2.831350.... 1.415675....
## 3 3 9.336216.... 4.668108.... 2.334054....
## 4 4 1.539281.... 7.696409.... 3.848204....
## 5 5 2.537846.... 1.268923.... 6.344617....
## 6 6 4.184202.... 2.092101.... 1.046050....

Hence, we plot the graphs. Note that we use two different plots because the function and derivative values are scattered over ranges of rather different width.

DF <- Lgst_DF
First_Day <- as.character(DF$t[1])
Last_DaY <- as.character(DF$t[length(DF$t)])
title_content <- paste("Università di Roma Tor Vergata - Master in Data Science",
                       "\nLine Plots of a logistic function and its first and second derivative for t=", First_Day, " to t=", Last_DaY, sep="")
subtitle_content <- expression(paste("Parameter Values ", Lambda, "=150.000, ", kappa, "=0.5, ", t[0], "=50."))
leg_ord_1 <- c("logistic function")
colors_1 <- c("logistic function"="black")
DF_LP_1 <- ggplot(DF, aes(x=t)) + 
  geom_line(alpha=1, size=1, aes(y=f, color="logistic function"), group=1) +
  scale_x_continuous(name="",
                     breaks=waiver(),
                     labels=waiver()) +
  scale_y_continuous(name="function values",
                     breaks=waiver(), 
                     labels=waiver(),
                     sec.axis=sec_axis(~., breaks=waiver(), labels=waiver())) +
  ggtitle(title_content) +
  labs(subtitle=subtitle_content) +
  scale_color_manual(name="Legend", values=colors_1, breaks=leg_ord_1) +
  theme(plot.title=element_text(hjust = 0.5), plot.subtitle=element_text(hjust =  0.5),
        axis.text.y.left=element_blank(), legend.position="right")
leg_ord_2 <- c("logistic first deriv.", "logistic sec. deriv.")
colors_2 <- c("logistic first deriv."="green","logistic sec. deriv."="magenta")
DF_LP_2 <- ggplot(DF, aes(x=t)) + 
  geom_line(alpha=1, size=1, aes(y=D1_f, color="logistic first deriv."), group=1) +
  geom_line(alpha=1, size=1, aes(y=D2_f, color="logistic sec. deriv."), group=1) +
  scale_x_continuous(name="t-values",
                     breaks=waiver(),
                     labels=waiver()) +
  scale_y_continuous(name="function values",
                     breaks=waiver(), 
                     labels=waiver(),
                     sec.axis= sec_axis(~., breaks=waiver(), labels=waiver())) +
  scale_color_manual(name="Legend", values=colors_2, breaks=leg_ord_2) +
  theme(axis.text.y.left = element_blank(), legend.position="right")
ggarrange(DF_LP_1, DF_LP_2, nrow=2)

We can print the plots as .png or .pdf files by the commands

file_name = paste("Logist Function and Derivatives - LP.png", sep="")
png(file_name, width=1600, height=800, res=120)
print(DF_SP)
dev.off()

## png 
##   2

file_name = paste("Logist Function and Derivatives - LP.pdf", sep="")
png(file_name, width=1600, height=800, res=120)
print(DF_SP)
dev.off()

## png 
##   2

We can compare the shape of the graph of the logistic function to the data plots in the above figures. It also seems interesting to try to compare the shape of the graph of the derivatives of the logistic function to appropriate data plots. The derivative of a function \(f(t)\) at some point \(t\) is defined as \[ \lim_{\Delta t\rightarrow 0} \frac{f(t+\Delta t)-f(t)}{\Delta t}. \] Clearly we cannot exctract the limit as \(\Delta t\rightarrow 0\) from daily spaced data, but we can think on the first order changes in daily data as a proxy for the first derivative of the logistic function and second order changes as a proxy for the second derivative. As matter of fact this is a standard procedure. Therefore we build the vector of the first and secon order changes in daily data and we add to our data frame.

u <- DF_Total$TCases[-1]   # vector of the total cases without the first entry
v <- DF_Total$TCases[-length(DF_Total$TCases)]  # vector of the total cases without the last entry
D1 <- u-v  # first order change vector
show(D1)

##  [1]    1   17   58   78   72   93   78  250  238  240  566  342  466  587
## [15]  769  778 1247 1492 1797  977 2313 2651 2547 3497 3590 3233 3526 4207
## [29] 5322 5986 6557 5560 4789 5249 5210 6153 5959 5974 5217 4050 4053 4782
## [43] 4668 4585 4805 4316 3599 3039 3836 4204 3951 4694 4092 3153 2972 2667

D1_bis <- diff(DF_Total$TCases, differences=1)  # direct generation of the first order change vector
show(D1_bis)

##  [1]    1   17   58   78   72   93   78  250  238  240  566  342  466  587
## [15]  769  778 1247 1492 1797  977 2313 2651 2547 3497 3590 3233 3526 4207
## [29] 5322 5986 6557 5560 4789 5249 5210 6153 5959 5974 5217 4050 4053 4782
## [43] 4668 4585 4805 4316 3599 3039 3836 4204 3951 4694 4092 3153 2972 2667

u <- D1[-1] # first order change vector without the first entry
v <- D1[-length(D1)]  # first order change vector without the last entry
D2 <- u-v  # second order change vector
show(D2)

##  [1]    16    41    20    -6    21   -15   172   -12     2   326  -224
## [12]   124   121   182     9   469   245   305  -820  1336   338  -104
## [23]   950    93  -357   293   681  1115   664   571  -997  -771   460
## [34]   -39   943  -194    15  -757 -1167     3   729  -114   -83   220
## [45]  -489  -717  -560   797   368  -253   743  -602  -939  -181  -305

D2_bis <- diff(DF_Total$TCases, differences=2)  # direct generation of the second order change vector
show(D2_bis)

##  [1]    16    41    20    -6    21   -15   172   -12     2   326  -224
## [12]   124   121   182     9   469   245   305  -820  1336   338  -104
## [23]   950    93  -357   293   681  1115   664   571  -997  -771   460
## [34]   -39   943  -194    15  -757 -1167     3   729  -114   -83   220
## [45]  -489  -717  -560   797   368  -253   743  -602  -939  -181  -305

D1 <- c(NA,D1)  # D1 extension by one NA entry to fit the number of rows in DF_Total
D2 <- c(NA,NA,D2) # D2 extension by two NA entries to fit the number of rows in DF_Total
DF_Total <- add_column(DF_Total, TCases_D1=D1, TCases_D2=D2, .before="TDeaths")  # addition of vectors D1 and D2 to DF_Total

Note that the construction of the first and second order changes can be made “by hands” or, more quickly, by applying the command diff(x, differences=n), where x is the vector to be differenced and n is the difference order. It is also worth noting that the vector D1 coincides with the vector Daily.Cases of the dataframe DF_Daily except the first four numeric entries.

show(DF_Daily$Daily.Cases)

##  [1]   NA    0   15   54   75   72   93   78  250  238  240  566  342  466
## [15]  587  769  778 1247 1492 1797  977 2313 2651 2547 3497 3590 3233 3526
## [29] 4207 5322 5986 6557 5560 4789 5249 5210 6153 5959 5974 5217 4050 4053
## [43] 4782 4668 4585 4805 4316 3599 3039 3836 4204 3951 4694 4092 3153 2972
## [57] 2667

In former releases of the figure Daily.Cases by Protezione Civile the concidence was without exception, as it should be. We have no idea why the first four entries of the vector Daily.Cases have been slightly modified.

To draw the scatter plot of the first and second order change jointly with a Locally Estimated Scatterplot Smoothing (LOESS) curves we add a column Index, just reporting the corresponding row number, to the dataframe DF_Total. Moreover, comparing to the total case scatter plot, we change the aesthetic x variable from x=Date to x=Index.

DF_Total <- add_column(DF_Total, Index=1:nrow(DF_Total), .before=1)  # addition of an Index column to DF_Total for plotting purpouses
DF <- DF_Total
First_Day <- as.character(DF$Date[3])
Last_DaY <- as.character(DF$Date[length(DF$Date)])
title_content <- paste("Università di Roma Tor Vergata - Master in Data Science - Scatter Plots of CoViD-19 National Total Case Changes from", First_Day, "to", Last_DaY)
subtitle_content <- "Dati Protezione Civile Italiana - https://datastudio.google.com/u/0/reporting/91350339-2c97-49b5-92b8-965996530f00/page/RdlHB"
x_breaks <- DF$Index
x_labs <- format(as.Date(DF$Date),format="%m-%d")
colors <- c("Total Case First Order Changes"="blue", "Total Case F.O. Changes LOESS Curve"="green",
            "Total Case Second Order Changes"="red", "Total Case S.O. Changes LOESS Curve"="magenta")
leg_ord <- c("Total Case First Order Changes", "Total Case F.O. Changes LOESS Curve",
             "Total Case Second Order Changes", "Total Case S.O. Changes LOESS Curve")
DF_SP <- ggplot(DF, aes(x=Index)) + 
  geom_hline(yintercept = 0, linetype="solid", color="black", size=0.5) +
  geom_point(alpha=1, size=1, aes(y=TCases_D1, color="Total Case First Order Changes"), na.rm=TRUE) +
  geom_smooth(method="loess", formula=y~x, se=FALSE, level = 0.95, fullrange=TRUE, alpha=1, size=1, 
              aes(y=TCases_D1, color="Total Case F.O. Changes LOESS Curve"), na.rm=TRUE) +
  geom_point(alpha=1, size=1, aes(y=TCases_D2, color="Total Case Second Order Changes"), na.rm=TRUE) +
  geom_smooth(method = "loess", formula=y~x, se=FALSE, level = 0.95, fullrange=TRUE, alpha=1, size=1, 
              aes(y=TCases_D2, color="Total Case S.O. Changes LOESS Curve"), na.rm=TRUE) +
  scale_x_continuous(name="Date", 
                    breaks=x_breaks, 
                    labels=x_labs) +
  scale_y_continuous(name="Value",
                     breaks=y_breaks, 
                     labels=waiver(),
                     sec.axis=sec_axis(~., breaks=waiver(), labels=waiver())) +
  ggtitle(title_content) +
  labs(subtitle=subtitle_content) +
  scale_color_manual(name="Legend", values=colors, breaks=leg_ord) +
  theme(plot.title=element_text(hjust=0.5), plot.subtitle=element_text(hjust=0.5),
        axis.text.x = element_text(angle=90, vjust=1),
        axis.text.y.left = element_blank(),
        legend.position=c(0.15,0.8))
plot(DF_SP)

Before tackling the problem of the parameter estimation in light of the data, we present some further properties of the logistic function. We have

\[\begin{equation} \ln\left(f\left(t;t_{0},\kappa,\Lambda\right)\right) =\log\left(\Lambda\right)-\ln\left(1+e^{-\kappa(t-t_{0})}\right) \tag{4} \end{equation}\] In addition, \[\begin{equation} \frac{d}{dt}\log\left(f\left(t;t_{0},\kappa,\Lambda\right)\right) =\frac{\kappa e^{-\kappa(t-t_{0})}}{1+e^{-\kappa(t-t_{0})}} =\frac{\kappa}{1+e^{\kappa(t-t_{0})}} \tag{5} \end{equation}\] and \[\begin{equation} \frac{d^{2}}{dt^{2}}\log\left(f\left(t;t_{0},\kappa,\Lambda\right)\right) =-\frac{\kappa^{2}e^{\kappa(t-t_{0})}}{\left(1+e^{\kappa(t-t_{0})}\right)^{2}} =-\frac{\kappa^{2}e^{-\kappa(t-t_{0})}}{\left(1+e^{-\kappa(t-t_{0})}\right)^{2}} \tag{6} \end{equation}\] Combining (2) and (6), it follows \[\begin{equation} \kappa \frac{d}{dx}f\left(t;t_{0},\kappa,\Lambda\right) +\Lambda\frac{d^{2}}{dt^{2}}\ln\left(f\left(t;t_{0},\kappa,\Lambda)\right)\right) =0 \tag{7} \end{equation}\]

Also in this case we plot the graphs \(\Gamma_{log(f)}\), \(\Gamma_{\frac{d}{dt}log(f)}\), and \(\Gamma_{\frac{d^{2}}{dt^{2}}log(f)}\) of the natural logarithm of the logistic function and its first and second derivative for the parameter values \(\Lambda=150.000\), \(\kappa=0.5\), and \(t_{0}=50\). To this, we define first a data frame Ln_Lgst_DF having for columns the values taken by the variable \(t\), the log-logistic function ln_f, and the first and second derivative of the logistic function D1_ln_f and D2_ln_f, respectively.

Lambda <- 150000
kappa <- 0.5
t <- c(1:100)
t_0 <- 50
ln_f <- I(log(Lambda)-log(1+exp(-kappa*(t-t_0))))
D1_ln_f <- I(kappa/(1+exp(kappa*(t-t_0))))
D2_ln_f <- I((kappa^2)*exp(kappa*(t-t_0))/((1+exp(kappa*(t-t_0)))^2))
Ln_Lgst_DF <- data.frame(t,ln_f,D1_ln_f,D2_ln_f)
head(Ln_Lgst_DF)

##   t         ln_f      D1_ln_f      D2_ln_f
## 1 1 -12.5816.... 0.499999.... 5.724337....
## 2 2 -12.0816.... 0.499999.... 9.437836....
## 3 3 -11.5816.... 0.499999.... 1.556036....
## 4 4 -11.0816.... 0.499999.... 2.565469....
## 5 5 -10.5816.... 0.499999.... 4.229744....
## 6 6 -10.0816.... 0.499999.... 6.973670....

Hence, we plot the graphs. Note that we use again two different plots because the function and derivative values are scattered over ranges of rather different width

DF <- Ln_Lgst_DF
First_Day <- as.character(DF$t[1])
Last_DaY <- as.character(DF$t[length(DF$t)])
title_content <- paste("Università di Roma Tor Vergata - Master in Data Science",
                      "\nLine Plots of the logarithm of a logistic function and its first and second derivative for t=", First_Day, " to t=", Last_DaY, sep="")
subtitle_content <- expression(paste("Parameter Values ", Lambda, "=150.000, ", kappa, "=0.5, ", t[0], "=50."))
leg_ord_1 <- c("log-logistic func.")
colors_1 <- c("log-logistic func."="black")
DF_LP_1 <- ggplot(DF, aes(x=t)) + 
  geom_line(alpha=1, size=1, aes(y=ln_f, color="log-logistic func."), group=1) +
  scale_x_continuous(name="",
                     breaks=waiver(),
                     labels=waiver()) +
  scale_y_continuous(name="function values",
                     breaks=waiver(), 
                     labels=waiver(),
                     sec.axis=sec_axis(~., breaks=waiver(), labels=waiver())) +
  ggtitle(title_content) +
  labs(subtitle=subtitle_content) +
  scale_color_manual(name="Legend", values=colors_1, breaks=leg_ord_1) +
  theme(plot.title=element_text(hjust = 0.5), plot.subtitle=element_text(hjust =  0.5),
        axis.text.y.left=element_blank(), legend.position="right")
leg_ord_2 <- c("log-logis. first deriv.", "log-logis. sec. deriv.")
colors_2 <- c("log-logis. first deriv."="green","log-logis. sec. deriv."="magenta")
DF_LP_2 <- ggplot(DF, aes(x=t)) + 
  geom_line(alpha=1, size=1, aes(y=D1_ln_f, color="log-logis. first deriv."), group=1) +
  geom_line(alpha=1, size=1, aes(y=D2_ln_f, color="log-logis. sec. deriv."), group=1) +
  scale_x_continuous(name="t-values",
                     breaks=waiver(),
                     labels=waiver()) +
  scale_y_continuous(name="function values",
                     breaks=waiver(), 
                     labels=waiver(),
                     sec.axis= sec_axis(~., breaks=waiver(), labels=waiver())) +
  scale_color_manual(name="Legend", values=colors_2, breaks=leg_ord_2) +
  theme(axis.text.y.left = element_blank(), legend.position="right")
ggarrange(DF_LP_1, DF_LP_2, nrow=2)

As above, we consider the first and second order changes in logarithms of the tocal cases as a proxy for the first and second derivatives of the log-logistic function, respectively. Therefore, we build the vector of the logarithms of the total cases and the vectors of the first and secon order changes in logarithms. Then we add these vectors to our data frame.

u <- log(DF_Total$TCases)   # vector of the logarithm of the total cases
v <- diff(u, differences=1)  # first order logarithm change vector
z <- diff(u, differences=2)  # second order logarithm change vector
DF_Total <- add_column(DF_Total, Tcases_Log=u, Tcases_Log_D1=c(NA,v), Tcases_Log_D2=c(NA,NA,z), .before="TDeaths") # addition of vectors Tcases_Log and Tcases_Log_D1 to Tcases_Log_D2 before the coulmn TDeaths

In the end of this section, we plot the logarihm of the total cases and the first and second order changes in the logarithm.

The logarithms of the total cases.

DF <- DF_Total
First_Day <- as.character(DF$Date[1])
Last_DaY <- as.character(DF$Date[length(DF$Date)])
title_content <- paste("Università di Roma Tor Vergata - Master in Data Science - Scatter Plots of CoViD-19 Logarithm of National Total Cases from", First_Day, "to", Last_DaY)
subtitle_content <- "Dati Protezione Civile Italiana - https://datastudio.google.com/u/0/reporting/91350339-2c97-49b5-92b8-965996530f00/page/RdlHB"
x_breaks <- DF$Index
x_labs <- format(as.Date(DF$Date),format="%m-%d")
colors <- c("Total Case Logarithm"="red", "Total Case Logarithm LOESS Curve"="black")
leg_ord <- c("Total Case Logarithm", "Total Case Logarithm LOESS Curve")
DF_SP <- ggplot(DF, aes(x=Index)) + 
  geom_hline(yintercept = 0, linetype="solid", color="black", size=0.5) +
  geom_point(alpha=1, size=1, aes(y=Tcases_Log, color="Total Case Logarithm"), na.rm=TRUE) +
  geom_smooth(method="loess", formula=y~x, se=FALSE, level = 0.95, fullrange=TRUE, alpha=1, size=1, 
              aes(y=Tcases_Log, color="Total Case Logarithm LOESS Curve"), na.rm=TRUE) +
  scale_x_continuous(name="", 
                     breaks=x_breaks, 
                     labels=x_labs) +
  scale_y_continuous(name="Value",
                     breaks=waiver(), 
                     labels=waiver(),
                     sec.axis=sec_axis(~., breaks=waiver(), labels=waiver())) +
  ggtitle(title_content) +
  labs(subtitle=subtitle_content) +
  scale_color_manual(name="Legend", values=colors, breaks=leg_ord) +
  theme(plot.title=element_text(hjust=0.5), plot.subtitle=element_text(hjust=0.5),
        axis.text.x = element_text(angle=90, vjust=1),
        axis.text.y.left = element_blank(),
        legend.position=c(0.15,0.8))
plot(DF_SP)

The first order changes in logarithms.

First_Day <- as.character(DF$Date[2])
Last_DaY <- as.character(DF$Date[length(DF$Date)])
title_content <- paste("Università di Roma Tor Vergata - Master in Data Science - Scatter Plots of CoViD-19 Logarithm of National Total Cases First Order Changes from", First_Day, "to", Last_DaY)
subtitle_content <- "Dati Protezione Civile Italiana - https://datastudio.google.com/u/0/reporting/91350339-2c97-49b5-92b8-965996530f00/page/RdlHB"
x_breaks <- DF$Index
x_labs <- format(as.Date(DF$Date),format="%m-%d")
colors <- c("Log-Total Case F.O. Changes"="blue", "Log-Total Case F.O. Changes LOESS Curve"="green")
leg_ord <- c("Log-Total Case F.O. Changes", "Log-Total Case F.O. Changes LOESS Curve")
DF_SP_1 <- ggplot(DF, aes(x=Index)) + 
  geom_hline(yintercept = 0, linetype="solid", color="black", size=0.5) +
  geom_point(alpha=1, size=1, aes(y=Tcases_Log_D1, color="Log-Total Case F.O. Changes"), na.rm=TRUE) +
  geom_smooth(method="loess", formula=y~x, se=FALSE, level = 0.95, fullrange=TRUE, alpha=1, size=1, 
              aes(y=Tcases_Log_D1, color="Log-Total Case F.O. Changes LOESS Curve"), na.rm=TRUE) +
  scale_x_continuous(name="", 
                     breaks=x_breaks, 
                     labels=x_labs) +
  scale_y_continuous(name="Value",
                     breaks=waiver(), 
                     labels=waiver(),
                     sec.axis=sec_axis(~., breaks=waiver(), labels=waiver())) +
  ggtitle(title_content) +
  labs(subtitle=subtitle_content) +
  scale_color_manual(name="Legend", values=colors, breaks=leg_ord) +
  theme(plot.title=element_text(hjust=0.5), 
        plot.subtitle=element_text(hjust=0.5),
        axis.text.x = element_text(angle=90, vjust=1),
        axis.text.y.left = element_blank(),
        legend.position=c(0.85,0.85))
plot(DF_SP_1)

The secon order changes in logarithms.

First_Day <- as.character(DF$Date[3])
Last_DaY <- as.character(DF$Date[length(DF$Date)])
title_content <- paste("Università di Roma Tor Vergata - Master in Data Science - Scatter Plots of CoViD-19 Logarithm of National Total Cases Second Order Changes from", First_Day, "to", Last_DaY)
subtitle_content <- "Dati Protezione Civile Italiana - https://datastudio.google.com/u/0/reporting/91350339-2c97-49b5-92b8-965996530f00/page/RdlHB"
x_breaks <- DF$Index
x_labs <- format(as.Date(DF$Date),format="%m-%d")
colors <- c("Log-Total Case S.O. Changes"="red", "Log-Total Case S.O. Changes LOESS Curve"="magenta")
leg_ord <- c("Log-Total Case S.O. Changes", "Log-Total Case S.O. Changes LOESS Curve")
DF_SP_2 <- ggplot(DF, aes(x=Index)) + 
  geom_hline(yintercept = 0, linetype="solid", color="black", size=0.5) +
  geom_point(alpha=1, size=1, aes(y=Tcases_Log_D2, color="Log-Total Case S.O. Changes"), na.rm=TRUE) +
  geom_smooth(method = "loess", formula=y~x, se=FALSE, level = 0.95, fullrange=TRUE, alpha=1, size=1, 
              aes(y=Tcases_Log_D2, color="Log-Total Case S.O. Changes LOESS Curve"), na.rm=TRUE) +
  scale_x_continuous(name="Date", 
                     breaks=x_breaks, 
                     labels=x_labs) +
  scale_y_continuous(name="Value",
                     breaks=waiver(), 
                     labels=waiver(),
                     sec.axis=sec_axis(~., breaks=waiver(), labels=waiver())) +
  ggtitle(title_content) +
  labs(subtitle=subtitle_content) +
  scale_color_manual(name="Legend", values=colors, breaks=leg_ord) +
  theme(plot.title=element_text(hjust=0.5), 
        # plot.subtitle=element_text(hjust=0.5),
        axis.text.x = element_text(angle=90, vjust=1),
        axis.text.y.left = element_blank(),
        legend.position=c(0.85,0.85))
plot(DF_SP_2)

0.4 Estimation of the Pararameters of Logistic Function

To be continued

0.5 References

0.5.1 Books:

Kleiber, Christian, Zeileis, Achim - Applied Econometrics with R Use R! - Springer, 2008

Shumway, Robert H., Stoffer, David S. - Time Series Analysis and Its Applications (with R Examples) 4th Edition Springer Texts in Statistics - Springer, 2017 https://www.stat.pitt.edu/stoffer/tsa4/tsa4.pdf

0.5.2 Websites:

Hyndman, Robin J,, Athanasopoulos, George - Forecasting: Principles and Practice Monash Univeristy, Australia https://otexts.com/fpp2/

Covid-19-2020-04-18

R. Monte - docente di Time Series - monte@uniroma.it
Dipartimento d’Ingegneria Civile e Ingegneria Informatica - Master in Data Science

18/04/2020

0.1 Preface

0.2 Data Downloading and Early Processing

0.3 Early Data Plots

0.4 Estimation of the Pararameters of Logistic Function

0.5 References

0.5.1 Books:

0.5.2 Websites:

Covid-19-2020-04-18

R. Monte - docente di Time Series - monte@uniroma.it Dipartimento d’Ingegneria Civile e Ingegneria Informatica - Master in Data Science

18/04/2020

0.1 Preface

0.2 Data Downloading and Early Processing

0.3 Early Data Plots

0.4 Estimation of the Pararameters of Logistic Function

0.5 References

0.5.1 Books:

0.5.2 Websites:

R. Monte - docente di Time Series - monte@uniroma.it
Dipartimento d’Ingegneria Civile e Ingegneria Informatica - Master in Data Science