The following graphs show a forecast model for Covid-19 daily new cases and new deaths in . These forecasts are computed using a machine learning model that uses all of the Covid-19 case data provided by Oxford University [1,2]. No time-of-year information is included in the model; only localized daily historical trends are used in each feature record. This prevents day-biasing of the results, such as the model predicting that Covid outbreaks will occur more often on a Monday rather than after 26 days of a specific statistical trend. Our model uses these statistical trends to accurately forecast Covid-19 morbidity and mortality.
The immunity factor (IF), shown in the graphs as a black line starting from the lower left, is a measurement of how much exposure the population has had. The IF is computed as (ConfirmedCases + VaccinationDoses - ConfirmedDeaths) divided by the population. Reaching an IF of 1.5 seems to be a critical step in controlling the morbidity of Covid-19. This model does not look at individual vaccinations. Rather, it considers only the number of vaccine doses administered. Having one dose is good, and having two is better. The notion of "fully vaccinated" is not used in this learning model.
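As a minimal sketch, the IF formula above can be written as a small function. The function name, argument names, and the example numbers are hypothetical and chosen only for illustration; they are not taken from the model's code or data.

```python
def immunity_factor(confirmed_cases, vaccination_doses, confirmed_deaths, population):
    """Immunity factor as described in the text:
    (ConfirmedCases + VaccinationDoses - ConfirmedDeaths) / population.
    All four inputs are cumulative totals for one region."""
    if population <= 0:
        raise ValueError("population must be positive")
    return (confirmed_cases + vaccination_doses - confirmed_deaths) / population


# Hypothetical region (numbers invented for illustration only).
example_if = immunity_factor(
    confirmed_cases=1_000_000,
    vaccination_doses=14_500_000,
    confirmed_deaths=20_000,
    population=10_000_000,
)
print(round(example_if, 3))
```

Because the formula counts doses rather than people, the IF can exceed 1.0 once a population has received more than one dose per person on average, which is why a threshold such as 1.5 is reachable.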
The forecast must be validated against the historical truth. The following graphs show the current forecast superimposed over the forecast from 10/12/2021. The faded graph is the past prediction; the current forecast and the truth are bright. The vertical dotted lines mark the "today" date of each forecast. Use these graphs to decide, for yourself, how accurate the forecast has become.
In some of the graphs, the scale no longer matches that of the historical forecast. In these cases you will have to use your innate mental abilities to rescale the historical forecast properly. The scaling mismatch occurs when the current forecast predicts larger values than the historical forecast did.
The following graph compares the truth data from 10/12/2021 to the prediction that started on 10/12/2021. The top of the graph shows New Daily Cases, and the bottom shows New Daily Deaths. The middle graph shows the relative cumulative cases, both truth and predicted. Green depicts the prediction and red depicts the truth. The relative basis is zero starting on 10/12/2021. The 14-day zero-lag exponential moving average (see Wikipedia) is shown as the black line in the top and bottom graphs.
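For readers unfamiliar with the zero-lag exponential moving average, here is a minimal sketch of one common formulation. It is an assumption that the graphs use this exact variant: the lag of (period - 1) // 2 days, the smoothing factor 2 / (period + 1), and the handling of the first few days are standard textbook choices, not details confirmed by this page.

```python
def zlema(values, period=14):
    """Zero-lag exponential moving average (one common formulation).

    Each value is first "de-lagged" by adding back the change seen over
    the last `lag` days, then smoothed with an ordinary EMA.
    """
    if period < 1:
        raise ValueError("period must be at least 1")
    lag = (period - 1) // 2          # 6 days for a 14-day period
    alpha = 2.0 / (period + 1)       # standard EMA smoothing factor
    smoothed = []
    ema = None
    for i, x in enumerate(values):
        # Until `lag` days of history exist, apply no de-lag correction.
        past = values[i - lag] if i >= lag else x
        adjusted = x + (x - past)
        ema = adjusted if ema is None else alpha * adjusted + (1 - alpha) * ema
        smoothed.append(ema)
    return smoothed


# A steadily rising series: the zero-lag average stays close to the latest value.
ramp = [float(day) for day in range(200)]
print(ramp[-1], zlema(ramp)[-1])
```

On a steady trend, a plain 14-day EMA trails the data by several days, while the de-lagged version tracks it closely. That is the reason to prefer it when the goal is to see where the daily counts are now rather than where they were a week ago.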
The lines in the middle graph are the daily running explained variance of the prediction. The first few days of the explained variance are undefined because there isn't enough data to compute a good variance, so the lines start 4 days after 10/12/2021. Look at how the lines diverge when the prediction pulls away from the truth, and then converge again when the prediction aligns with the truth. Ideally the explained variance would tend towards 1.0 (a perfect match), but the truth data can have wide variance (Monday reporting trends), which makes the prediction very wrong in some cases and very good in others, so the variance becomes balanced by the negative and positive differences from the truth. When there is a large negative divergence, the scale of the graph compresses the [0,1] range of the explained variance.
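A running explained variance of this kind can be sketched as follows. This is an illustration under stated assumptions, not the code behind the graphs: it uses the usual definition 1 - Var(truth - prediction) / Var(truth) over all days up to the current one, and the `min_points` parameter is a hypothetical stand-in for the "not enough data yet" cutoff.

```python
from statistics import pvariance


def running_explained_variance(truth, predicted, min_points=4):
    """Explained variance recomputed each day over all days so far.

    Returns a list with one entry per day; entries are None while the
    score is undefined (too few points, or the truth has zero variance).
    """
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    scores = []
    for n in range(1, len(truth) + 1):
        y = truth[:n]
        if n < min_points:
            scores.append(None)
            continue
        var_y = pvariance(y)
        if var_y == 0:
            scores.append(None)
            continue
        residuals = [t - p for t, p in zip(y, predicted[:n])]
        scores.append(1.0 - pvariance(residuals) / var_y)
    return scores


# Hypothetical daily counts, invented for illustration.
truth = [1, 2, 3, 4, 5]
print(running_explained_variance(truth, [1, 2, 3, 4, 5]))   # perfect match
print(running_explained_variance(truth, [5, 4, 3, 2, 1]))   # opposite trend
```

A perfect prediction scores 1.0, and a prediction that moves opposite to the truth scores well below zero, which is the negative divergence described above. Note that under this definition a prediction that is off by a constant amount every day still scores 1.0, because only the variance of the error is measured.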
The middle region of this graph is a stacked graph. The predicted confirmed cases are stacked atop the truth, as if the truth were the zero line. This can be confusing to read and may not convey the importance of this comparison. Even if the prediction does not match the truth perfectly, if the two have the same variance then the stacked areas should be identical in size over any given period. The running daily R² indicates when the graphs are similar and when they are not. As the R² diverges negatively, the green area should differ from the red area. As the R² approaches 1.0, the difference between the prediction and the truth converges to zero.
"Even a stopped clock is right twice a day" - Marie von Ebner-Eschenbach
It's easy to get excited about the numbers matching, when they match. Looking at the model closely, you can easily see where the prediction diverges from the truth. In many regions the model over-predicts the case load, which is largely due to the lack of emigration information in the model.
The true value of this model is its ability to forecast future trends. Look for upward movement and peaks; these show where the forecast expects the infection to go. There are instabilities in the model, which are apparent where sharp up-ticks occur and give rise to "super waves." These predictions are incorrect and should be ignored. The instabilities are often caused by model confusion, such as poor reporting of truth data, or by new states that the model has not previously encountered.
At any given moment in this Covid forecast model there is a horizon of validity equal to twice the model period, followed by a protracted period of instability. For the 29-day model, for instance, the validity period is 58 days. For the 11-day model, the validity period is 22 days. Even though the forecast predicts out to April 2022, only consider the forecast within the horizon of validity.
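The horizon arithmetic can be sketched in a few lines. The function names are hypothetical, and two assumptions are made for the example: the horizon is counted from the forecast's "today" date, and 10/12/2021 is read as October 12, 2021.

```python
from datetime import date, timedelta


def validity_horizon_days(model_period_days):
    """Horizon of validity: twice the model period, per the text."""
    return 2 * model_period_days


def last_valid_date(forecast_start, model_period_days):
    """Last date worth reading from a forecast that starts on forecast_start."""
    return forecast_start + timedelta(days=validity_horizon_days(model_period_days))


start = date(2021, 10, 12)  # assumed reading of 10/12/2021
for period in (29, 11):
    print(period, validity_horizon_days(period), last_valid_date(start, period))
```

Anything the graphs show beyond the printed date for a given model falls in the unstable region and should be disregarded.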