This code might be slightly slower because it performs one query per
field in the form, but I didn't notice any differences on my development
machine, and the code is now much easier to understand.
Due to technical issues, sometimes users voted in booths and their vote
couldn't be added to the database. So we're including them in the users
with no demographic data.
Using SQL's `select` instead of converting the records to a ruby array
increases performance dramatically when there are thousands of records.
For a poll with 200000 voters, calculating stats took more than 7
minutes, and now it takes less than 2 minutes.
These methods are only used while stats are being generated; once stats
are generated, they aren't used anymore. So there's no need to store
them using the Dalli cache.
Furthermore, there are polls (and even budgets) with hundreds of
thousands of participants. Calculating stats for them takes a very long
time because we can't store all those records in the Dalli cache.
However, since these records aren't used once the stats are generated,
we can store them in an instance variable while we generate the stats,
speeding up the process.
We need a way to manually expire the cache for a budget or poll without
expiring the cache of every budget or poll.
Using the `updated_at` column would be dangerous because most of the
times we update a budget or a poll, we don't need to regenerate their
stats.
We've considered adding a `stats_updated_at` column to each of these
tables. However, in that case we would also need to add a similar column
in the future to every process type whose stats we want to generate.
If users participated and were hidden after participating, we should
still count them in the participants stats.
In the tests, we set users' `hidden_at` attribute before they vote.
Although in real life they would vote first and then they would be
hidden, I've written the tests like this for the sake of simplicity.
This implementation is a bit more robust because we don't have to change
any of the "or_later?" methods if we add or remove a new phase.
We could also use metaprogramming to reduce code duplication in these
methods. So far, I've decided to keep the code simple since the
duplication seems reasonable.
As the Rails guides say:
> All scope methods will return an ActiveRecord::Relation object
That means `find_by_kind` will return a relation when nothing is found;
the expected behaviour is to return `nil`, like all `find_by` methods
do.
Using scopes also means strange things happen when we try to chain
scopes like `phases.published.drafting`. With scopes, the `drafting`
part would be ignored and all published phases would be returned.
Even if this class looks very simple now, we're trying a few things
related to these stats. Having a class for it makes future changes
easier and, if there weren't any future changes, at least it makes
current experiments easier.
Note we keep the method `participants_by_geozone` to return a hash
because we're caching the stats and storing GeozoneStats objects would
need a lot more memory and we would get an error.
The code is easier to read now, it returns the same results it used to
return, and performance-wise it's probably the same thing, but if it's
not, we'll trust Rails will do optimizations that we don't when we
manually pluck the IDs.
It is way more efficient because we're caching the result of that
method, and this way we only store each voter once in the cache. We were
storing many voters several times and then we were filtering them with
`uniq`.
So if we don't have information regarding gender, age or geozone, stats
regarding those topics will not be shown.
Note we're using `spec/models/statisticable_spec.rb` because having the
same file in `spec/models/concerns` caused the tests to be executed
twice.
Also note the implementation behind the `gender?`, `age?` and `geozone?`
methods is a bit primitive. We might need to make it more robust in the
future.
It will make it far easier to call other methods on the stats object,
and we're already caching the methods.
We had to remove the view fragment caching because the stats object
isn't as easy to cache. The good thing about it is the view will
automatically be updated when we change logic regarding which stats to
show, and the methods taking long to execute are cached in the model.
For now we think showing them would be showing too much data and it
would be a bit confusing.
I've been tempted to just remove the view and keep the methods in the
model in case they're used by other institutions using CONSUL. However,
it's probably better to wait until we're asked to re-implement them, and
in the meantime we don't maintain code nobody uses. The code wasn't that
great to start with (I know it because I wrote it).
We were expecting `balloters` to include `poll_ballot_voters` (that's
why we're substracting them to calculate web participants), but reality
has proven `poll_ballot_voters` aren't included in `balloters`.
We didn't use metaprogramming from the start because the
`null_percentage_web` method had a particular behaviour.
However, the behaviour (due to a typo) didn't really matter because
there are no null web votes, and so the `null_percentage_web` is always
zero.
While we already had "one test to rule all stats", testing each method
individually makes reading, adding and changing tests easier.
Note we need to make all methods being tested public. We could also test
them using methods like `stats.generate[:total_valid_votes]` instead of
`stats.total_valid_votes`, but then the tests would be more difficult to
read.