The following graphs are a forecast model for Covid-19 daily new-cases and new-deaths in .
These forecasts are computed by a machine learning model that uses all of the Covid-19 case data provided
by Oxford University^{1,2}. No time-of-year information is included in the model; only localized daily-historical
trends are used in each feature record. This prevents day-biasing of the results, such as the model predicting that Covid-19 outbreaks occur more
often on a Monday rather than after 26 days of a specific statistical trend. The model relies on these statistical trends to forecast
Covid-19 morbidity and mortality.

The *immunity factor* (IF), shown in the graphs as a black line starting from the lower left, is
a measurement of how much exposure the population has had. The IF is computed as (ConfirmedCases + VaccinationDoses - ConfirmedDeaths) divided by
the population. Reaching an IF of 1.5 seems to be a critical step in controlling the morbidity of Covid-19. This model does not look at individual vaccinations. Rather, it
considers only the number of vaccine doses administered: having 1 dose is good, and having 2 is better. The notion of "fully vaccinated" is not used
in this learning model.
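
As a minimal sketch of the IF formula above, the following Python function computes the ratio directly. The function name, parameter names, and the sample numbers are illustrative assumptions and do not come from the model's code or data.

```python
def immunity_factor(confirmed_cases: int, vaccination_doses: int,
                    confirmed_deaths: int, population: int) -> float:
    """(ConfirmedCases + VaccinationDoses - ConfirmedDeaths) / population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return (confirmed_cases + vaccination_doses - confirmed_deaths) / population


# Made-up numbers for a hypothetical region of 1,000,000 people.
if_value = immunity_factor(
    confirmed_cases=120_000,
    vaccination_doses=1_400_000,
    confirmed_deaths=2_000,
    population=1_000_000,
)
print(if_value)  # 1.518
```

Because doses are counted rather than people, the IF can exceed 1.0, which is why a threshold such as 1.5 is reachable.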

This graph shows the daily new cases at the top, the daily new deaths at the bottom,
and the vaccination with immunity factor in the middle. The green lines are the forecast, and the
red lines are the historical *truth*. This is raw data: no smoothing or manipulation is performed
on the data for training or analysis. The PCA model does all of the smoothing we need for this forecast.

These two graphs are the new cases (left) and new deaths (right) on a per-day basis. The green dots are the
predictions, and the red dots are the known (historical) values. The date runs along the X axis (horizontal) of
the graph. The R2 value shown in the title of the graph is the explained variance^{wikipedia,scikit} of the predicted values as compared to
the known values (when known). The *truth* is known from 8/1/2021 up until yesterday. The forecast also starts
on 8/1/2021, so that its predictive performance can be measured against that known data.
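
For readers who want to see what an explained-variance score measures, here is a small sketch. It assumes the common definition 1 - Var(truth - prediction) / Var(truth), which is the one scikit-learn's `explained_variance_score` uses. Whether the graphs here are produced with exactly that function is an assumption, and the daily counts below are made up.

```python
from statistics import pvariance


def explained_variance(truth, predicted):
    """Return 1 - Var(truth - predicted) / Var(truth)."""
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    residuals = [t - p for t, p in zip(truth, predicted)]
    truth_variance = pvariance(truth)
    if truth_variance == 0:
        raise ValueError("truth must vary for the score to be defined")
    return 1.0 - pvariance(residuals) / truth_variance


truth = [100, 120, 150, 130, 110]  # made-up daily counts

exact = explained_variance(truth, truth)                        # perfect match
shifted = explained_variance(truth, [t + 20 for t in truth])    # constant offset
noisy = explained_variance(truth, [110, 115, 140, 140, 100])    # imperfect fit
```

With this definition a prediction that is off by a constant amount every day still scores 1.0, because only the variance of the residual matters, not its average.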

The validation graphs, again for new cases on the left, and new deaths on the right, show how well the learning model can predict the historical data. Only 80% of the historical data was used for training the model. The hold-out data is selected randomly from the ordered vector of inputs. The training data is chosen as the complement set of that hold-out data, and is shuffled before being used in the training. What you are seeing here is the result of a randomized n-day window of data fed to the training model followed by all of the data fed back into that model. The R2 (explained variance) is shown in the legends, multiplied by 10,000 for readability.
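
The hold-out procedure described above can be sketched as follows. This is an illustration under assumptions, not the model's actual code: the function name, the fixed seed, and the use of integer stand-ins for feature records are all hypothetical.

```python
import random


def split_holdout(records, holdout_fraction=0.2, seed=0):
    """Pick a random hold-out set; train on the shuffled complement."""
    rng = random.Random(seed)
    n_holdout = round(len(records) * holdout_fraction)
    # Choose hold-out positions at random from the ordered inputs.
    holdout_positions = set(rng.sample(range(len(records)), n_holdout))
    holdout = [r for i, r in enumerate(records) if i in holdout_positions]
    # The training set is everything that was not held out.
    train = [r for i, r in enumerate(records) if i not in holdout_positions]
    rng.shuffle(train)  # shuffle before training
    return train, holdout


records = list(range(100))  # stand-ins for 100 ordered feature records
train, holdout = split_holdout(records)
print(len(train), len(holdout))  # 80 20
```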

The MAE is the Mean Absolute Error. In these graphs, the MAE Pct is the measure
of the residual as a ratio to the MAE of the prediction. Small values mean good agreement
with the *truth*, and large values mean divergence.
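
As a reference point, the plain MAE is the average of the absolute differences between the truth and the prediction. The sketch below shows only that standard quantity with made-up numbers; it does not attempt to reproduce the MAE Pct normalization used in the graphs.

```python
def mean_absolute_error(truth, predicted):
    """Average of |truth - predicted| over all days."""
    if len(truth) != len(predicted) or not truth:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(truth, predicted)) / len(truth)


# Made-up daily counts: absolute errors are 10, 5, 10, 10.
mae = mean_absolute_error([100, 120, 150, 130], [110, 115, 140, 140])
print(mae)  # 8.75
```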

The forecast must be validated against the historical truth. The following graphs are the current
forecast superimposed over the forecast from 10/12/2021. The faded graph is the past prediction, and the current forecast and
truth are bright. The vertical dotted lines depict the *today* day of that forecast. Use these graphs
to decide, for yourself, how accurate the forecast has become.

In some of the graphs, the scale no longer matches with the historical forecast. In these cases you will have to use your innate mental abilities to scale the historical forecast properly. The scaling mismatch occurs when the future forecast has larger predictions than the historical predictions.

The following graph compares the truth data from 10/12/2021 to the prediction that started on 10/12/2021. The top
of the graph shows New Daily Cases, and the bottom shows New Daily Deaths. The middle graph shows the relative cumulative cases, both truth
and predicted. The green color is used to depict the prediction and the red color is the truth. The relative basis is zero starting
on 10/12/2021. The 14-day zero-lag exponential moving average^{wikipedia}
is shown as the black line in the top and bottom graphs.
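
A zero-lag exponential moving average can be sketched as below. It follows the usual formulation: first "de-lag" each value by adding the change since `lag` days ago, with lag = (period - 1) / 2, then apply an ordinary EMA with alpha = 2 / (period + 1). Rounding the lag down to a whole day and seeding the early days with the first value are assumptions of this sketch, not details confirmed for these graphs.

```python
def zero_lag_ema(values, period=14):
    """Zero-lag EMA: an EMA applied to de-lagged data."""
    lag = (period - 1) // 2          # 6 days for a 14-day period (rounded down)
    alpha = 2.0 / (period + 1)       # standard EMA smoothing factor
    smoothed = []
    ema = None
    for i, x in enumerate(values):
        # Before `lag` days of history exist, fall back to the first value.
        lagged = values[i - lag] if i >= lag else values[0]
        delagged = x + (x - lagged)
        ema = delagged if ema is None else alpha * delagged + (1 - alpha) * ema
        smoothed.append(ema)
    return smoothed


flat = zero_lag_ema([5.0] * 30)                       # a flat series stays flat
ramp = zero_lag_ema([float(d) for d in range(30)])    # a steadily rising series
```

The de-lagging step is what lets the line track recent rises and falls more closely than a plain 14-day EMA would.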

The lines in the middle graph are the daily-running explained variance of the prediction. The first few days of the explained variance are undefined
because there isn't enough data to compute a good variance, so the lines start 4 days after 10/12/2021. Look at how the lines diverge when the
prediction pulls away from the truth, and then converge back to zero when the prediction aligns with the truth.
Ideally the explained variance would tend towards 1.0 (for a perfect match), but
the truth data can have wide variance (Monday reporting trends), which makes the prediction very wrong in some cases but very good in others, and so
the variance becomes *balanced* by the negative/positive difference from the truth. The scale of the explained variance will mute (visually compress) the [0,1] range
when there is a large negative divergence.
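
One way to picture a daily-running explained variance is to recompute the score each day over all of the data seen so far. The sketch below does that, leaving the first 4 days undefined. Treating "running" as an expanding window, the `warmup_days` parameter, and the sample numbers are all assumptions made for illustration.

```python
from statistics import pvariance


def running_explained_variance(truth, predicted, warmup_days=4):
    """Score each day using all data from the start through that day."""
    scores = []
    for day in range(len(truth)):
        if day < warmup_days:
            scores.append(None)  # not enough data for a meaningful variance
            continue
        t = truth[: day + 1]
        p = predicted[: day + 1]
        residuals = [a - b for a, b in zip(t, p)]
        truth_variance = pvariance(t)
        if truth_variance == 0:
            scores.append(None)  # undefined when the truth does not vary
        else:
            scores.append(1.0 - pvariance(residuals) / truth_variance)
    return scores


truth = [100, 120, 150, 130, 110, 90, 95]        # made-up daily counts
predicted = [100, 120, 150, 130, 110, 140, 160]  # drifts away on the last 2 days
scores = running_explained_variance(truth, predicted)
```

In this made-up series the score is 1.0 while the prediction matches, drops on the first bad day, and turns negative once the prediction has pulled well away from the truth.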

The middle region of this graph is a stacked graph. The predicted *confirmed cases* is stacked atop the truth as if the truth were the zero. This can be
confusing to read and may not convey the importance of this comparison. Even if the prediction does not match the truth perfectly, if the two have the same variance
then the stacked graphs should be identical in size on any given period. The running daily R2 will indicate when the graphs are similar, and when they are not.
As the R2 diverges negatively, the green graph area will differ from the red graph area. As the R2 approaches 1.0, the difference between the
prediction and the truth converges to zero.

"Even a stopped clock is right twice a day" - Marie von Ebner-Eschenbach

It's easy to get excited about the numbers matching, when they match. Looking at the model closely, you can easily see where the prediction diverges from the truth. In many regions the model over-predicts the case load, which is largely due to the lack of emigration information in the model.

The true value of this model is its ability to forecast future trends. Look for upward movement and peaks; these show where the forecast expects infections to rise. There are instabilities in the model, which are apparent where sharp up-ticks occur and give rise to "super waves." These predictions are incorrect and should be ignored. The instabilities are often caused by model confusion, such as poor reporting of truth data, or new states that the model was not built to consider.

At any given moment in this Covid-19 forecast model there is a 2X-period horizon of validity, followed by a protracted period of instability. For the 29-day model, for instance, the validity period is 58 days. For the 11-day model, the validity period is 22 days. Even though the forecast predicts out to April 2022, only consider the horizon of validity.
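
The horizon arithmetic can be sketched in a few lines. The function name is hypothetical, and counting the horizon from the forecast's *today* date is an assumption of this example.

```python
from datetime import date, timedelta


def validity_cutoff(forecast_start: date, window_days: int) -> date:
    """Last date inside the 2X-period horizon of validity."""
    return forecast_start + timedelta(days=2 * window_days)


cutoff_29 = validity_cutoff(date(2021, 10, 12), 29)  # 58 days ahead
cutoff_11 = validity_cutoff(date(2021, 10, 12), 11)  # 22 days ahead
print(cutoff_29, cutoff_11)  # 2021-12-09 2021-11-03
```

Under these assumptions, anything a forecast from 10/12/2021 shows past those dates falls outside the horizon of validity.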