This way it'll be easier to change it while checking we haven't broken
existing behavior.
While writing the tests, I noticed we were sometimes storing a symbol in
the session while sometimes we were storing a string. So we're adding a
`to_s` call so we always store a string in the session.
This parameter isn't used since commit b86c0d3c3, which deleted a file
that wasn't used since commit 146c09adb. Further proof that this code
wasn't used is the fact that the `enable_translation_style` method,
which this code called, was removed in commit 5ada97544.
When calculating the stats on, say, January 1st 2024, and using a group
age of, say, between 20 and 24 years, we were considering that everyone
born between 2000 and 2004 had between 20 and 24 years. This wasn't
accurate, since most people born in 2004 would have 19 years at that
point, and most people born in 1999 would have 24 years.
The age of the participants should refer to the time where the voting
took place, and not the time when the budget was created.
Note that now the tests pass even when the budget is created in a year
but the balloting phase takes place in a different year. By default,
each budget phase lasts for one month, so when we create a budget in a
test, its balloting phase takes place a few months after the date when
the budget is created. Since the `between_ages` scope uses the date of
the balloting phase as a reference, and the test was using the date when
the budget was created as a reference, the test failed depending on
whether those dates took place in the same year or not.
In commit e51e03446, we started using the same code to show stats in the
public area and in the admin area. However, in doing so we introduced a
bug, since stats in the public area are only shown after a certain part
of the process has finished, meaning the stats appearing on the page
never change (in theory), so it's perfectly fine to cache them. However,
in the admin area stats can be accessed while the process is still
ongoing, so caching the stats will lead to the wrong results being
displayed.
We've thought about expiring the cache when new supports or ballot lines
are added; however, that means the methods calculating the stats for the
supporting phase would expire when supports are added/removed but the
methods calculating the stats for the voting phase would expire when
ballot lines are added/removed. It gets even more complex because the
`headings` method calculates stats for both the supporting and the
voting phases.
So, since loading stats in the admin section is fast even without the
cache because they only load very basic statistics, we're taking the
simple approach of disabling the cache in this case, so everything works
the same way it did before commit e51e03446.
Co-authored-by: Javi Martín <javim@elretirao.net>
When we first started caching the stats, generating them was a process
that took several minutes, so we never expired the cache.
However, there have been cases where we run into issues where the stats
shown on the screen were outdated. That's why we introduced a task to
manually expire the cache.
But now, generating the stats only takes a few seconds, so we can
automatically expire them every day, remove all the logic needed to
manually expire them, and get rid of most of the issues related to the
cache being outdated.
We're expiring them every day because it's the same day we were doing in
public stats (which we removed in commit 631b48f58), only we're using
`expires_at:` to set the expiration time, in order to simplify the code.
Note that, in the test, we're using `travel_to(time)` so the test passes
even when it starts an instant before midnight. We aren't using
`:with_frozen_time` because, in similar cases (although not in this
case, but I'm not sure whether that's intentional), `travel_to` shows
this error:
> Calling `travel_to` with a block, when we have previously already made
> a call to `travel_to`, can lead to confusing time stubbing.
Since now generating stats (assuming the results aren't in the cache)
only takes a few seconds even when there are a hundred thousand
participants, as opposed to the several minutes it took to generate them
when we introduced the Cron job, we can simply generate the stats during
the first request to the stats page.
Note that, in order to avoid creating a temporary table when the stats
are cached, we're making sure we only create this table when we need to.
Otherwise, we could spend up to 1 second on every request to the stats
page creating a table that isn't going to be used.
Also note we're using an instance variable to check whether we're
creating a table; I tried to use `table_exists?`, but it didn't work. I
wonder whether `table_exists?` doesn't detect temporary tables.
Since we're doing many queries to get stats for each age group and each
geozone, testing shows these indices make stats calculation about 25%
faster on processes with 100,000 participants.
Debugging shows that the bottleneck in the stats calculation is the
number of times we're querying the users table using the same array of
IDs in the `where` condition but each time combined with other
conditions.
So we're inserting the results of querying the users table with the
array of IDs in a temporary table and using this temporary table for the
other calculations. When querying this temporary table, there's no need
to filter for IDs anymore.
For budget stats, the `generate` method is now about 10-20 times faster
for a budget with 20,000 participants. For budgets with only a few dozen
participants, there's no significant difference in performance.
I thought about modifying the `participants` method and use the
temporary table there. The problem, however, is that in this case it
isn't clear when to drop the temporary table, and we could end up with
thousands of temporary tables in the database if we don't do it right.
Creating and dropping the temporary table in the same transaction, on
the other hand, guarantees that won't be the case.
Note there's no risk of duplicate tables since they're created and
dropped inside a transaction, so we're always using the same table name
for the same resource. We're adding a test that fails with a
`PG::DuplicateTable: ERROR: relation "participants__1"` error if we
don't use a transaction.
Back in commit 383909e16, we said:
> Even if this class looks very simple now, we're trying a few things
> related to these stats. Having a class for it makes future changes
> easier and, if there weren't any future changes, at least it makes
> current experiments easier.
Since there haven't been any changes in the last 5 years and we've found
cases where using the GeozoneStats class results in a slightly worse
performance, we're removing this class. The code is now a bit easier to
read, and is consistent with the way we calculate participants by age.
This reverts commit 383909e16.
Using `>` and `<` for dates is a bit confusing because it usually
requires mentally translating `<` into "before" and `>` into "after",
meaning it takes a few seconds to check which is the starting date and
which is the ending date. When using a range, it's easier to see which
is which.
Note that we were using `>` and `<`, but now we're using the equivalent
of `>=` and `<=`. This makes sense IMHO, since the beginning of the year
and the end of the year should also be included in this case.
We were calculating the age stats based on the age of the users who
participated... at the moment where we were calculating the stats. That
means that, if 20 years ago, 1000 people who were 16 years old
participated, they would be shown as having 36 years in the stats.
Instead, we want to show the stats at the time when the process took
place, so we're implementing a `participation_date` method.
Note that, for polls, we could actually use the `age` column in the
`poll_voters` table. However, doing so would be harder, would only work
for polls but not for budgets, and it wouldn't be statistically very
relevant, since the stats are shown by age groups, and only a small
percentage of people would change their age group (and only to the
nearest one) between the time they participate and the time the process
ends.
We might use the `poll_voters` table in the future, though, since we
have a similar issue with geozones and genders, and using the
information in `poll_voters` would solve it as well (only for polls,
though).
Also note that we're using the `ends_at` dates because some people but
be too young to vote when a process starts but old enough to vote when
the process ends.
Finally, note that we might need to change the way we calculate the
participation date for a budget, since some budgets might not enabled
every phase. Not sure how stats work in that scenario (even before these
changes).
We were already testing it as part of the statisticable concern, but
it's easier to understand when we have specific tests for it as well.
Note that we're using `eq` insteab of `be` because it's what we use most
often in the test. For consistency, we're also changing the tesst in
budget stats so it uses `eq`.
According to the README [1]:
> To mask previously collected IPs, use:
> Ahoy::Visit.find_each do |visit|
> visit.update_column :ip, Ahoy.mask_ip(visit.ip)
> end
We're adapting the code with our version, since we use the `Visit` model
instead of the `Ahoy::Visit` model.
[1] https://github.com/ankane/ahoy/blob/v5.0.2/README.md#ip-masking
As mentioned in Ahoy's README [1]:
> Ahoy provides a number of options to help with GDPR compliance.
> Update config/initializers/ahoy.rb with:
>
> class Ahoy::Store < Ahoy::DatabaseStore
> def authenticate(data)
> # disables automatic linking of visits and users
> end
> end
>
> Ahoy.mask_ips = true
> Ahoy.cookies = :none
As also mentioned in the README:
> If Ahoy was installed before v5, add an index before making this
> change.
> (...)
> For Active Record, create a migration with:
> add_index :ahoy_visits, [:visitor_token, :started_at]
However, the `visitor_token` doesn't exist in our table, since we
generated the `visits` table when Ahoy used the `visitor_id` column. So
we're using this column for the index.
Note we also need to change the `visit` method, since otherwise we get
an exception [2]. As mentioned on the issue reporting the exception:
> you'll need to copy the latest version of that method and adapt it to
> your model. I believe you'll want to replace:
>
> where(visit_token: ahoy.visit_token) with
> where(id: ensure_uuid(ahoy.visit_token))
>
> where(visitor_token: ahoy.visitor_token) with
> where(visitor_id: ensure_uuid(ahoy.visitor_token))
So we're copying the latest version of that method and changing it
accordingly.
[1] https://github.com/ankane/ahoy/blob/v5.0.2/README.md
[2] Issue 549 in https://github.com/ankane/ahoy
This was added in commit 02f19aa4b, before we started tracking events.
I don't think we ever used it; in any case, we now use the `Ahoy::Chart`
class to deal with the stats Ahoy used to generate.
This page isn't linked from anywhere and most Consul Democracy
installations don't even know it exists, so it's useless for most
people.
If we ever bring it back, we should at least add a link pointing to this
page.
The test was supposed to be testing notifications without a link, but it
was adding a link. It was only passing because the expectations was
checking whether there was a link whose text was a URL, but what we
want to check here is whether the URL is present in the `href`
attribute.
We're about to delete the page showing public stats and, since this link
was using the `/stats` path, we're fixing it.
Now that we've moved the logic to generate the events data to a model,
and we've got access to the model in the component rendering the chart,
we can render the data inside the chart instead of doing an extra AJAX
request to get the same data.
Originally, this was problaby done this way so the page wouldn't take
several seconds to load while preparing the data for the chart when
there are thousands of dates being displayed. With an AJAX call, the
page would load as fast as usual, and then the chart would render after
a few seconds. However, we can have an even better performance
improvement in this scenario if we use a Set instead of an Array. The
method `Array#include?`, which we were calling for every date in the
data, is much slower that `Set#merge`. So now both the page and the
chart load as fast as expected.
We could also use something like:
```
def add
(...)
shared_keys.push(*collection.keys)
end
def build
(...)
shared_keys.uniq.each do |k|
(...)
end
def shared_keys
@shared_keys ||= []
end
```
Or other approaches to avoid using `Array#include?`. The performance
would be similar to the one we get when using `Set`. We're using a `Set`
because it makes more obvious that `shared_keys` is supposed to contain
unique elements.
We've had some tests failing in the past due to these AJAX requests
being triggered automatically during the tests and no expectations
checking the requests have finished, so now we're reducing the amount of
flaky tests.
We were always displaying the event names in English.
Note we're changing the `user_supported_budgets` key because it didn't
make much sense; the investments are supported, and not the budgets.
We're also adding "created" to most of the event names in order to make
the texts more explicit, since not all the events refer to created data.
Note we're delegating the `t` method because i18n-tasks doesn't detect
code like `ApplicationController.helpers.t` and so reports we aren't
using the `admin.stats.graph` translations.
We were tracking some events with Ahoy, but in an inconsistent way. For
example, we were tracking when a debate was created, but (probably
accidentally) we were only tracking proposals when they were created
from the management section. For budget investments and their supports,
we weren't using Ahoy events but checking their database tables instead.
And we were only using ahoy events for the charts; for the other stats,
we were using the real data.
While we could actually fix these issues and start tracking events
correctly, existing production data would remain broken because we
didn't track a certain event when it happened. And, besides, why should
we bother, for instance, to track when a debate is created, when we can
instead access that information in the debates table?
There are probably some features related to tracking an event and their
visits, but we weren't using them, and we were storing more user data
than we needed to.
So we're removing the track events, allowing us to simplify the code and
make it more consistent. We aren't removing the `ahoy_events` table in
case existing Consul Democracy installations use it, but we'll remove it
after releasing version 2.2.0 and adding a warning in the release notes.
This change fixes the proposal created chart, since we were only
tracking proposals created in the management section, and opens the
possibility to add more charts in the future using data we didn't track
with Ahoy.
Also note the "Level 2 user Graph" test wasn't testing the graph, so
we're changing it in order to test it. We're also moving it next to the
other graphs test and, since we were tracking the event when we were
confirming the phone, we're renaming to "Level 3 users".
Finally, note that, since we were tracking events when something was
created, we're including the `with_hidden` scope. This is also
consistent with the other stats shown in the admin section as well as
the public stats.
The only way to use campaigns is to manually insert them in the
database, which IMHO isn't very practical.
We're going to change every piece of code that generates an Ahoy event
and, in this case, the easiest way to change the Campaing model so it
doesn't use Ahoy events is to simply remove it.
Note we're keeping the database tables until we release a new version,
just in case some Consul Democracy installations are using them. We'll
inform in the release notes that we'll remove the campaigns table after
the release, so existing installations using the `campaigns` table can
move the data somewhere else before we remove the table.