Commit Graph

45 Commits

Author SHA1 Message Date
Javi Martín
048bdb2e9e Add and apply Rails/OrderArguments rubocop rule
This rule was introduced in rubocop-rails 2.33. We were following it
most of the time.
2025-11-05 11:51:23 +01:00
Julian Herrero
16f844595d Don't use the cache in admin budget stats
In commit e51e03446, we started using the same code to show stats in the
public area and in the admin area. However, in doing so we introduced a
bug, since stats in the public area are only shown after a certain part
of the process has finished, meaning the stats appearing on the page
never change (in theory), so it's perfectly fine to cache them. However,
in the admin area stats can be accessed while the process is still
ongoing, so caching the stats will lead to the wrong results being
displayed.

We've thought about expiring the cache when new supports or ballot lines
are added; however, that means the methods calculating the stats for the
supporting phase would expire when supports are added/removed but the
methods calculating the stats for the voting phase would expire when
ballot lines are added/removed. It gets even more complex because the
`headings` method calculates stats for both the supporting and the
voting phases.

So, since loading stats in the admin section is fast even without the
cache because they only load very basic statistics, we're taking the
simple approach of disabling the cache in this case, so everything works
the same way it did before commit e51e03446.

Co-authored-by: Javi Martín <javim@elretirao.net>
2024-05-20 16:19:41 +02:00
Javi Martín
a4461a1a56 Expire the stats cache once per day
When we first started caching the stats, generating them was a process
that took several minutes, so we never expired the cache.

However, there have been cases where we run into issues where the stats
shown on the screen were outdated. That's why we introduced a task to
manually expire the cache.

But now, generating the stats only takes a few seconds, so we can
automatically expire them every day, remove all the logic needed to
manually expire them, and get rid of most of the issues related to the
cache being outdated.

We're expiring them every day because it's the same day we were doing in
public stats (which we removed in commit 631b48f58), only we're using
`expires_at:` to set the expiration time, in order to simplify the code.

Note that, in the test, we're using `travel_to(time)` so the test passes
even when it starts an instant before midnight. We aren't using
`:with_frozen_time` because, in similar cases (although not in this
case, but I'm not sure whether that's intentional), `travel_to` shows
this error:

> Calling `travel_to` with a block, when we have previously already made
> a call to `travel_to`, can lead to confusing time stubbing.
2024-05-17 20:11:16 +02:00
Javi Martín
a5646fcdb3 Remove Cron job to generate stats
Since now generating stats (assuming the results aren't in the cache)
only takes a few seconds even when there are a hundred thousand
participants, as opposed to the several minutes it took to generate them
when we introduced the Cron job, we can simply generate the stats during
the first request to the stats page.

Note that, in order to avoid creating a temporary table when the stats
are cached, we're making sure we only create this table when we need to.
Otherwise, we could spend up to 1 second on every request to the stats
page creating a table that isn't going to be used.

Also note we're using an instance variable to check whether we're
creating a table; I tried to use `table_exists?`, but it didn't work. I
wonder whether `table_exists?` doesn't detect temporary tables.
2024-05-17 16:08:08 +02:00
Javi Martín
653848fc4e Extract method to get the stats key in stats
This way we remove a bit of duplication and it'll be easier to change
the `stats_cache` method.
2024-05-17 16:08:08 +02:00
Javi Martín
62cd4c8d7b Add indices to stats temporary tables
Since we're doing many queries to get stats for each age group and each
geozone, testing shows these indices make stats calculation about 25%
faster on processes with 100,000 participants.
2024-05-17 16:08:08 +02:00
Javi Martín
80dcbfc23c Improve performance generating stats
Debugging shows that the bottleneck in the stats calculation is the
number of times we're querying the users table using the same array of
IDs in the `where` condition but each time combined with other
conditions.

So we're inserting the results of querying the users table with the
array of IDs in a temporary table and using this temporary table for the
other calculations. When querying this temporary table, there's no need
to filter for IDs anymore.

For budget stats, the `generate` method is now about 10-20 times faster
for a budget with 20,000 participants. For budgets with only a few dozen
participants, there's no significant difference in performance.

I thought about modifying the `participants` method and use the
temporary table there. The problem, however, is that in this case it
isn't clear when to drop the temporary table, and we could end up with
thousands of temporary tables in the database if we don't do it right.
Creating and dropping the temporary table in the same transaction, on
the other hand, guarantees that won't be the case.

Note there's no risk of duplicate tables since they're created and
dropped inside a transaction, so we're always using the same table name
for the same resource. We're adding a test that fails with a
`PG::DuplicateTable: ERROR:  relation "participants__1"` error if we
don't use a transaction.
2024-05-17 16:08:04 +02:00
Javi Martín
6f0c27c0fb Remove unused code in statisticable concern
This code isn't used since commit e3063cd24f.
2024-05-17 16:07:26 +02:00
Javi Martín
bcc9fd97f5 Revert "Extract class to manage GeozoneStats"
Back in commit 383909e16, we said:

> Even if this class looks very simple now, we're trying a few things
> related to these stats. Having a class for it makes future changes
> easier and, if there weren't any future changes, at least it makes
> current experiments easier.

Since there haven't been any changes in the last 5 years and we've found
cases where using the GeozoneStats class results in a slightly worse
performance, we're removing this class. The code is now a bit easier to
read, and is consistent with the way we calculate participants by age.

This reverts commit 383909e16.
2024-05-17 16:07:26 +02:00
Javi Martín
1d85a63e7c Calculate age stats based on the participation date
We were calculating the age stats based on the age of the users who
participated... at the moment where we were calculating the stats. That
means that, if 20 years ago, 1000 people who were 16 years old
participated, they would be shown as having 36 years in the stats.

Instead, we want to show the stats at the time when the process took
place, so we're implementing a `participation_date` method.

Note that, for polls, we could actually use the `age` column in the
`poll_voters` table. However, doing so would be harder, would only work
for polls but not for budgets, and it wouldn't be statistically very
relevant, since the stats are shown by age groups, and only a small
percentage of people would change their age group (and only to the
nearest one) between the time they participate and the time the process
ends.

We might use the `poll_voters` table in the future, though, since we
have a similar issue with geozones and genders, and using the
information in `poll_voters` would solve it as well (only for polls,
though).

Also note that we're using the `ends_at` dates because some people but
be too young to vote when a process starts but old enough to vote when
the process ends.

Finally, note that we might need to change the way we calculate the
participation date for a budget, since some budgets might not enabled
every phase. Not sure how stats work in that scenario (even before these
changes).
2024-05-13 15:42:37 +02:00
Javi Martín
0aee568977 Add and apply Rails/RedundantActiveRecordAllMethod
This rule was introduced in rubocop-rails 2.21.0.
2023-11-20 14:22:12 +01:00
Javi Martín
97aca0cf95 Add and apply rules for multi-line arrays
We were already applying these rules in most cases.

Note we aren't enabling the `MultilineArrayLineBreaks` rule because
we've got places with many elements whire it isn't clear whether
having one element per line would make the code more readable.
2023-08-18 14:56:16 +02:00
Javi Martín
4a851c0d82 Add and apply Style/MapToHash rubocop rule
This rule was added in Rubocop 1.24.0. Applying it slightly simplifies
the code.
2022-10-19 14:26:49 +02:00
Javi Martín
8b5cca746c Apply rubocop rules to freeze constants
Added by popular demand among our team members.
2019-10-26 13:21:36 +02:00
Javi Martín
47b2c42a1d Apply IndentationConsistency rubocop rule 2019-09-10 20:02:15 +02:00
Javi Martín
45a3d8daf0 Add option to enable advanced stats 2019-05-22 11:50:03 +02:00
Javi Martín
483ebffb47 Fix statisticable concern definition
Methods defined inside "included" cannot be called using `super` from
a class including the module.
2019-05-21 13:50:19 +02:00
Javi Martín
aa0e813970 Use ruby cache for stats helper methods
These methods are only used while stats are being generated; once stats
are generated, they aren't used anymore. So there's no need to store
them using the Dalli cache.

Furthermore, there are polls (and even budgets) with hundreds of
thousands of participants. Calculating stats for them takes a very long
time because we can't store all those records in the Dalli cache.

However, since these records aren't used once the stats are generated,
we can store them in an instance variable while we generate the stats,
speeding up the process.
2019-05-21 13:50:19 +02:00
Javi Martín
7c0e499eee Add table to store stats versions
We need a way to manually expire the cache for a budget or poll without
expiring the cache of every budget or poll.

Using the `updated_at` column would be dangerous because most of the
times we update a budget or a poll, we don't need to regenerate their
stats.

We've considered adding a `stats_updated_at` column to each of these
tables. However, in that case we would also need to add a similar column
in the future to every process type whose stats we want to generate.
2019-05-21 13:50:19 +02:00
Javi Martín
9335c51cfc Include hidden users in stats
If users participated and were hidden after participating, we should
still count them in the participants stats.

In the tests, we set users' `hidden_at` attribute before they vote.
Although in real life they would vote first and then they would be
hidden, I've written the tests like this for the sake of simplicity.
2019-05-21 13:50:19 +02:00
Javi Martín
1f4707facd Extract method to get stats participant_ids
This way we can share the `participants` method between budget and poll
stats.
2019-05-21 13:50:19 +02:00
Javi Martín
ae4cd06c24 Include no geozone in no demographic data 2019-05-21 13:50:18 +02:00
Javi Martín
383909e16c Extract class to manage GeozoneStats
Even if this class looks very simple now, we're trying a few things
related to these stats. Having a class for it makes future changes
easier and, if there weren't any future changes, at least it makes
current experiments easier.

Note we keep the method `participants_by_geozone` to return a hash
because we're caching the stats and storing GeozoneStats objects would
need a lot more memory and we would get an error.
2019-05-21 13:50:18 +02:00
Javi Martín
558070d530 Remove geozone participation percentage
We currently don't store geozone population.
2019-05-21 13:50:18 +02:00
Javi Martín
793bfed372 Display only existing stats
So if we don't have information regarding gender, age or geozone, stats
regarding those topics will not be shown.

Note we're using `spec/models/statisticable_spec.rb` because having the
same file in `spec/models/concerns` caused the tests to be executed
twice.

Also note the implementation behind the `gender?`, `age?` and `geozone?`
methods is a bit primitive. We might need to make it more robust in the
future.
2019-05-21 13:50:18 +02:00
Javi Martín
76c7827cf4 Use stats objects instead of hashes
It will make it far easier to call other methods on the stats object,
and we're already caching the methods.

We had to remove the view fragment caching because the stats object
isn't as easy to cache. The good thing about it is the view will
automatically be updated when we change logic regarding which stats to
show, and the methods taking long to execute are cached in the model.
2019-05-21 13:50:18 +02:00
Javi Martín
e3063cd24f Remove complex poll stats
For now we think showing them would be showing too much data and it
would be a bit confusing.

I've been tempted to just remove the view and keep the methods in the
model in case they're used by other institutions using CONSUL. However,
it's probably better to wait until we're asked to re-implement them, and
in the meantime we don't maintain code nobody uses. The code wasn't that
great to start with (I know it because I wrote it).
2019-05-21 13:50:17 +02:00
Javi Martín
8f69113233 Add poll stats by geozone and channel 2019-05-21 13:50:17 +02:00
Javi Martín
202fb44008 Add poll stats by age and channel 2019-05-21 13:50:17 +02:00
Javi Martín
7b408a4b88 Add poll stats by gender and channel 2019-05-21 13:50:17 +02:00
Javi Martín
90fe746d27 Add geozone stats to polls 2019-05-21 13:50:16 +02:00
Javi Martín
a552645e7f Add tests to poll stats
While we already had "one test to rule all stats", testing each method
individually makes reading, adding and changing tests easier.

Note we need to make all methods being tested public. We could also test
them using methods like `stats.generate[:total_valid_votes]` instead of
`stats.total_valid_votes`, but then the tests would be more difficult to
read.
2019-05-21 13:50:16 +02:00
Javi Martín
4d520a3a47 Rename age_groups method
The name was confusing because it seemed to return a list of age groups.
2019-05-21 13:50:16 +02:00
Javi Martín
88daaee9fe Simplify code 2019-05-21 13:50:16 +02:00
Javi Martín
9a01ff5323 Refactor age groups method
We try to make the method return data which is easier to handle in the
view.
2019-05-21 13:50:15 +02:00
Javi Martín
c1b76a7ebf Simplify age groups method 2019-05-21 13:50:15 +02:00
Javi Martín
c2489e3209 Increase number of age groups
We would now like to differenciate between 70-year-old people and
90-year-old people.
2019-05-21 13:50:15 +02:00
Javi Martín
e4a032ee68 Split common and specific stats methods 2019-05-21 13:49:42 +02:00
Javi Martín
5d2f5d1d81 Move gender and age methods to a common concern
These are generic methods which only depend on the participants.
2019-05-21 13:49:42 +02:00
Javi Martín
04c920c27d Simplify calculate percentage method 2019-05-21 13:49:42 +02:00
Javi Martín
ccaa2e1a77 Remove duplication to calculate percentage 2019-05-21 13:49:42 +02:00
Javi Martín
188278296c Simplify the way we cache stats 2019-05-21 13:48:54 +02:00
Javi Martín
d627215af4 Use symbols for method names 2019-05-21 13:27:03 +02:00
Javi Martín
313ffb589b Share method to generate stats 2019-05-21 13:27:03 +02:00
Javi Martín
62a97f9003 Add a common concern for budget and poll stats 2019-05-21 13:27:03 +02:00