When calculating the stats on, say, January 1st 2024, and using a group
age of, say, between 20 and 24 years, we were considering that everyone
born between 2000 and 2004 had between 20 and 24 years. This wasn't
accurate, since most people born in 2004 would have 19 years at that
point, and most people born in 1999 would have 24 years.
When we first started caching the stats, generating them was a process
that took several minutes, so we never expired the cache.
However, there have been cases where we run into issues where the stats
shown on the screen were outdated. That's why we introduced a task to
manually expire the cache.
But now, generating the stats only takes a few seconds, so we can
automatically expire them every day, remove all the logic needed to
manually expire them, and get rid of most of the issues related to the
cache being outdated.
We're expiring them every day because it's the same day we were doing in
public stats (which we removed in commit 631b48f58), only we're using
`expires_at:` to set the expiration time, in order to simplify the code.
Note that, in the test, we're using `travel_to(time)` so the test passes
even when it starts an instant before midnight. We aren't using
`:with_frozen_time` because, in similar cases (although not in this
case, but I'm not sure whether that's intentional), `travel_to` shows
this error:
> Calling `travel_to` with a block, when we have previously already made
> a call to `travel_to`, can lead to confusing time stubbing.
We were calculating the age stats based on the age of the users who
participated... at the moment where we were calculating the stats. That
means that, if 20 years ago, 1000 people who were 16 years old
participated, they would be shown as having 36 years in the stats.
Instead, we want to show the stats at the time when the process took
place, so we're implementing a `participation_date` method.
Note that, for polls, we could actually use the `age` column in the
`poll_voters` table. However, doing so would be harder, would only work
for polls but not for budgets, and it wouldn't be statistically very
relevant, since the stats are shown by age groups, and only a small
percentage of people would change their age group (and only to the
nearest one) between the time they participate and the time the process
ends.
We might use the `poll_voters` table in the future, though, since we
have a similar issue with geozones and genders, and using the
information in `poll_voters` would solve it as well (only for polls,
though).
Also note that we're using the `ends_at` dates because some people but
be too young to vote when a process starts but old enough to vote when
the process ends.
Finally, note that we might need to change the way we calculate the
participation date for a budget, since some budgets might not enabled
every phase. Not sure how stats work in that scenario (even before these
changes).
For the HashAlignment rule, we're using the default `key` style (keys
are aligned and values aren't) instead of the `table` style (both keys
and values are aligned) because, even if we used both in the
application, we used the `key` style a lot more. Furthermore, the
`table` style looks strange in places where there are both very long and
very short keys and sometimes we weren't even consistent with the
`table` style, aligning some keys without aligning other keys.
Ideally we could align hashes to "either key or table", so developers
can decide whether keeping the symmetry of the code is worth it in a
case-per-case basis, but Rubocop doesn't allow this option.
CONSUL doesn't implement blank votes via web; the comment was based on
the code used in Madrid, which was actually very complex.
And the concept of "all city" was also specific to Madrid. Poll
questions aren't associated to a geozone, so the geozone will depend on
the poll they're associated to.
In Rails 5.1, calling `travel_to` inside another `travel_to` block will
result in a RuntimeError:
> Calling `travel_to` with a block, when we have previously already made
> a call to `travel_to`, can lead to confusing time stubbing.
Due to technical issues, sometimes users voted in booths and their vote
couldn't be added to the database. So we're including them in the users
with no demographic data.
We need a way to manually expire the cache for a budget or poll without
expiring the cache of every budget or poll.
Using the `updated_at` column would be dangerous because most of the
times we update a budget or a poll, we don't need to regenerate their
stats.
We've considered adding a `stats_updated_at` column to each of these
tables. However, in that case we would also need to add a similar column
in the future to every process type whose stats we want to generate.
If users participated and were hidden after participating, we should
still count them in the participants stats.
In the tests, we set users' `hidden_at` attribute before they vote.
Although in real life they would vote first and then they would be
hidden, I've written the tests like this for the sake of simplicity.
It will make it far easier to call other methods on the stats object,
and we're already caching the methods.
We had to remove the view fragment caching because the stats object
isn't as easy to cache. The good thing about it is the view will
automatically be updated when we change logic regarding which stats to
show, and the methods taking long to execute are cached in the model.
For now we think showing them would be showing too much data and it
would be a bit confusing.
I've been tempted to just remove the view and keep the methods in the
model in case they're used by other institutions using CONSUL. However,
it's probably better to wait until we're asked to re-implement them, and
in the meantime we don't maintain code nobody uses. The code wasn't that
great to start with (I know it because I wrote it).
While we already had "one test to rule all stats", testing each method
individually makes reading, adding and changing tests easier.
Note we need to make all methods being tested public. We could also test
them using methods like `stats.generate[:total_valid_votes]` instead of
`stats.total_valid_votes`, but then the tests would be more difficult to
read.
Adding the option to assign a poll to a poll recount factory meant we
didn't need to create so much data.
Also note we're removing the `create(:poll_voter, origin: "booth")`
code, since it isn't used in the stats calculations.