When calculating the stats on, say, January 1st 2024, and using a group
age of, say, between 20 and 24 years, we were considering that everyone
born between 2000 and 2004 had between 20 and 24 years. This wasn't
accurate, since most people born in 2004 would have 19 years at that
point, and most people born in 1999 would have 24 years.
In commit e51e03446, we started using the same code to show stats in the
public area and in the admin area. However, in doing so we introduced a
bug, since stats in the public area are only shown after a certain part
of the process has finished, meaning the stats appearing on the page
never change (in theory), so it's perfectly fine to cache them. However,
in the admin area stats can be accessed while the process is still
ongoing, so caching the stats will lead to the wrong results being
displayed.
We've thought about expiring the cache when new supports or ballot lines
are added; however, that means the methods calculating the stats for the
supporting phase would expire when supports are added/removed but the
methods calculating the stats for the voting phase would expire when
ballot lines are added/removed. It gets even more complex because the
`headings` method calculates stats for both the supporting and the
voting phases.
So, since loading stats in the admin section is fast even without the
cache because they only load very basic statistics, we're taking the
simple approach of disabling the cache in this case, so everything works
the same way it did before commit e51e03446.
Co-authored-by: Javi Martín <javim@elretirao.net>
When we first started caching the stats, generating them was a process
that took several minutes, so we never expired the cache.
However, there have been cases where we run into issues where the stats
shown on the screen were outdated. That's why we introduced a task to
manually expire the cache.
But now, generating the stats only takes a few seconds, so we can
automatically expire them every day, remove all the logic needed to
manually expire them, and get rid of most of the issues related to the
cache being outdated.
We're expiring them every day because it's the same day we were doing in
public stats (which we removed in commit 631b48f58), only we're using
`expires_at:` to set the expiration time, in order to simplify the code.
Note that, in the test, we're using `travel_to(time)` so the test passes
even when it starts an instant before midnight. We aren't using
`:with_frozen_time` because, in similar cases (although not in this
case, but I'm not sure whether that's intentional), `travel_to` shows
this error:
> Calling `travel_to` with a block, when we have previously already made
> a call to `travel_to`, can lead to confusing time stubbing.
Since now generating stats (assuming the results aren't in the cache)
only takes a few seconds even when there are a hundred thousand
participants, as opposed to the several minutes it took to generate them
when we introduced the Cron job, we can simply generate the stats during
the first request to the stats page.
Note that, in order to avoid creating a temporary table when the stats
are cached, we're making sure we only create this table when we need to.
Otherwise, we could spend up to 1 second on every request to the stats
page creating a table that isn't going to be used.
Also note we're using an instance variable to check whether we're
creating a table; I tried to use `table_exists?`, but it didn't work. I
wonder whether `table_exists?` doesn't detect temporary tables.
Since we're doing many queries to get stats for each age group and each
geozone, testing shows these indices make stats calculation about 25%
faster on processes with 100,000 participants.
Debugging shows that the bottleneck in the stats calculation is the
number of times we're querying the users table using the same array of
IDs in the `where` condition but each time combined with other
conditions.
So we're inserting the results of querying the users table with the
array of IDs in a temporary table and using this temporary table for the
other calculations. When querying this temporary table, there's no need
to filter for IDs anymore.
For budget stats, the `generate` method is now about 10-20 times faster
for a budget with 20,000 participants. For budgets with only a few dozen
participants, there's no significant difference in performance.
I thought about modifying the `participants` method and use the
temporary table there. The problem, however, is that in this case it
isn't clear when to drop the temporary table, and we could end up with
thousands of temporary tables in the database if we don't do it right.
Creating and dropping the temporary table in the same transaction, on
the other hand, guarantees that won't be the case.
Note there's no risk of duplicate tables since they're created and
dropped inside a transaction, so we're always using the same table name
for the same resource. We're adding a test that fails with a
`PG::DuplicateTable: ERROR: relation "participants__1"` error if we
don't use a transaction.
Back in commit 383909e16, we said:
> Even if this class looks very simple now, we're trying a few things
> related to these stats. Having a class for it makes future changes
> easier and, if there weren't any future changes, at least it makes
> current experiments easier.
Since there haven't been any changes in the last 5 years and we've found
cases where using the GeozoneStats class results in a slightly worse
performance, we're removing this class. The code is now a bit easier to
read, and is consistent with the way we calculate participants by age.
This reverts commit 383909e16.
Using `>` and `<` for dates is a bit confusing because it usually
requires mentally translating `<` into "before" and `>` into "after",
meaning it takes a few seconds to check which is the starting date and
which is the ending date. When using a range, it's easier to see which
is which.
Note that we were using `>` and `<`, but now we're using the equivalent
of `>=` and `<=`. This makes sense IMHO, since the beginning of the year
and the end of the year should also be included in this case.
We were calculating the age stats based on the age of the users who
participated... at the moment where we were calculating the stats. That
means that, if 20 years ago, 1000 people who were 16 years old
participated, they would be shown as having 36 years in the stats.
Instead, we want to show the stats at the time when the process took
place, so we're implementing a `participation_date` method.
Note that, for polls, we could actually use the `age` column in the
`poll_voters` table. However, doing so would be harder, would only work
for polls but not for budgets, and it wouldn't be statistically very
relevant, since the stats are shown by age groups, and only a small
percentage of people would change their age group (and only to the
nearest one) between the time they participate and the time the process
ends.
We might use the `poll_voters` table in the future, though, since we
have a similar issue with geozones and genders, and using the
information in `poll_voters` would solve it as well (only for polls,
though).
Also note that we're using the `ends_at` dates because some people but
be too young to vote when a process starts but old enough to vote when
the process ends.
Finally, note that we might need to change the way we calculate the
participation date for a budget, since some budgets might not enabled
every phase. Not sure how stats work in that scenario (even before these
changes).
This was added in commit 02f19aa4b, before we started tracking events.
I don't think we ever used it; in any case, we now use the `Ahoy::Chart`
class to deal with the stats Ahoy used to generate.
This page isn't linked from anywhere and most Consul Democracy
installations don't even know it exists, so it's useless for most
people.
If we ever bring it back, we should at least add a link pointing to this
page.
Now that we've moved the logic to generate the events data to a model,
and we've got access to the model in the component rendering the chart,
we can render the data inside the chart instead of doing an extra AJAX
request to get the same data.
Originally, this was problaby done this way so the page wouldn't take
several seconds to load while preparing the data for the chart when
there are thousands of dates being displayed. With an AJAX call, the
page would load as fast as usual, and then the chart would render after
a few seconds. However, we can have an even better performance
improvement in this scenario if we use a Set instead of an Array. The
method `Array#include?`, which we were calling for every date in the
data, is much slower that `Set#merge`. So now both the page and the
chart load as fast as expected.
We could also use something like:
```
def add
(...)
shared_keys.push(*collection.keys)
end
def build
(...)
shared_keys.uniq.each do |k|
(...)
end
def shared_keys
@shared_keys ||= []
end
```
Or other approaches to avoid using `Array#include?`. The performance
would be similar to the one we get when using `Set`. We're using a `Set`
because it makes more obvious that `shared_keys` is supposed to contain
unique elements.
We've had some tests failing in the past due to these AJAX requests
being triggered automatically during the tests and no expectations
checking the requests have finished, so now we're reducing the amount of
flaky tests.
We were always displaying the event names in English.
Note we're changing the `user_supported_budgets` key because it didn't
make much sense; the investments are supported, and not the budgets.
We're also adding "created" to most of the event names in order to make
the texts more explicit, since not all the events refer to created data.
Note we're delegating the `t` method because i18n-tasks doesn't detect
code like `ApplicationController.helpers.t` and so reports we aren't
using the `admin.stats.graph` translations.
We were tracking some events with Ahoy, but in an inconsistent way. For
example, we were tracking when a debate was created, but (probably
accidentally) we were only tracking proposals when they were created
from the management section. For budget investments and their supports,
we weren't using Ahoy events but checking their database tables instead.
And we were only using ahoy events for the charts; for the other stats,
we were using the real data.
While we could actually fix these issues and start tracking events
correctly, existing production data would remain broken because we
didn't track a certain event when it happened. And, besides, why should
we bother, for instance, to track when a debate is created, when we can
instead access that information in the debates table?
There are probably some features related to tracking an event and their
visits, but we weren't using them, and we were storing more user data
than we needed to.
So we're removing the track events, allowing us to simplify the code and
make it more consistent. We aren't removing the `ahoy_events` table in
case existing Consul Democracy installations use it, but we'll remove it
after releasing version 2.2.0 and adding a warning in the release notes.
This change fixes the proposal created chart, since we were only
tracking proposals created in the management section, and opens the
possibility to add more charts in the future using data we didn't track
with Ahoy.
Also note the "Level 2 user Graph" test wasn't testing the graph, so
we're changing it in order to test it. We're also moving it next to the
other graphs test and, since we were tracking the event when we were
confirming the phone, we're renaming to "Level 3 users".
Finally, note that, since we were tracking events when something was
created, we're including the `with_hidden` scope. This is also
consistent with the other stats shown in the admin section as well as
the public stats.
The only way to use campaigns is to manually insert them in the
database, which IMHO isn't very practical.
We're going to change every piece of code that generates an Ahoy event
and, in this case, the easiest way to change the Campaing model so it
doesn't use Ahoy events is to simply remove it.
Note we're keeping the database tables until we release a new version,
just in case some Consul Democracy installations are using them. We'll
inform in the release notes that we'll remove the campaigns table after
the release, so existing installations using the `campaigns` table can
move the data somewhere else before we remove the table.
We moved this file to `app/lib/` in commit cb477149c so it would be in
the autoload_paths. However, this class is loaded by ActiveStorage, with
the following method:
```
def resolve(class_name)
require "active_storage/service/#{class_name.to_s.underscore}_service"
ActiveStorage::Service.const_get(:"#{class_name.camelize}Service")
rescue LoadError
raise "Missing service adapter for #{class_name.inspect}"
end
``
So this file needs to be in the $LOAD_PATH, or else ActiveStorage won't
be able to load it when we disable the `add_autoload_paths_to_load_path`
option, which is the default in Rails 7.1 [1].
Moving it to the `lib` folder solves the issue; as mentioned in the
guide to upgrade to Rails 7.1 [2]:
> The lib directory is not affected by this flag, it is added to
> $LOAD_PATH always.
However, we were also referencing this class in the `Tenant` model,
meaning we needed to autoload it as well somehow. So, instead of
directly referencing this class, we're using `respond_to?` in the Tenant
model.
We're changing the test so it fails when the code calls
`is_a?(ActiveStorage::Service::TenantDiskService)`. We need to change
the active storage configurations in the test because, otherwise, the
moment `ActiveStorage::Blob` is loaded, the `TenantDiskService` class is
also loaded, meaning the test will pass when using `is_a?`.
Note that, since this class isn't in the autoload paths anymore, we need
to add a `require` in the tests. We could add an initializer to require
it; we're not doing it in order to be consistent with what ActiveStorage
does: it only loads the service that's going to be used in the current
Rails environment. If somebody changed their production environment in
order to use (for example), S3, and we added an initializer to require
the TenantDiskService, we would still load the TenantDiskService even if
it isn't going to be used.
[1] https://guides.rubyonrails.org/v7.1/configuring.html#config-add-autoload-paths-to-load-path
[2] https://guides.rubyonrails.org/v7.1/upgrading_ruby_on_rails.html#autoloaded-paths-are-no-longer-in-$load-path
This way we simplify the code a little bit and we create a method unique
to the `TenantDiskService` class, which can be used to check whether
we're using this class without using `is_a?` or similar.
The config.file_watcher option still exists but it's no longer included
in the default environtment file. Since we don't use it, we're removing
it.
The config.assets.assets.debug option is no longer true by default [1],
so it isn't included anymore.
The config.active_support.deprecation option is now omitted on
production in favor of config.active_support.report_deprecations, which
is false by default. I think it's OK to keep it this way, since we check
deprecations in the development and test environments but never on
production environments.
As mentioned in the Rails upgrade guide, sprockets-rails is no longer a
rails dependency and we need to explicitly include it in our Gemfile.
The behavior of queries trying to find an invalid enum value has changed
[2], so we're updating the tests accordingly.
The `favicon_link_tag` method has removed the deprecated `shortcut`
link type [3], so we're updating the tests accordingly.
The method `raw_filter` in ActiveSupport callbacks has been renamed to
`filter` [4], so we're updating the code accordingly.
[1] https://github.com/rails/rails/commit/adec7e7ba87e3
[2] https://github.com/rails/rails/commit/b68f0954
[3] Pull request 43850 in https://github.com/rails/rails
[4] Pull request 41598 in https://github.com/rails/rails
While using `require_dependency` to load original Consul Democracy code
from custom code works with the classic autoloader, this method was
never meant to be used this way. With zeitwerk, the code (apparently)
works in the test, development and production environments, but there's
one important gotcha: changing any `.rb` file in development will
require restarting the rails server since the application will crash
when reloading.
Quoting zeitwerk's author Xavier Noria, whom we've contacted while
looking for a solution:
> With the classic autoloader, when the Setting constant is autoloaded,
> the autoloader searched the autoload paths, found setting.rb in
> app/models/custom, and loaded it. With zeitwerk, the autoloader scans
> the folders in order and defines an autoload (Module#autoload) in
> Object so Ruby autoloads Setting with app/models/custom/settings.rb.
> Later, when app/models/setting.rb is found, it's ignored since there's
> already an autoload for Setting.
>
> That means the first file is managed by the autoloaders, while the
> second is not.
>
> So require_dependency worked, but it was pure luck, since the purpose
> of require_dependency is forcing the load of files managed by the
> autoloaders and, as we've seen, app/models/settings.rb isn't one of
> them.
>
> With your current pattern for custom files, the best solution is using
> Kernel#load.
So we're using `load` instead of `require_dependency`. Note that, with
`load`, we need to add the `.rb` extension to the required file, and we
no longer have to convert the Pathname to a string with `to_s`.
We were getting a few errors when trying out Zeitwerk:
```
expected file lib/sms_api.rb to define constant SmsApi
expected file app/components/layout/common_html_attributes_component.rb
to define constant Layout::CommonHtmlAttributesComponent
```
In these cases, we aren't using an inflection because we also define the
`Verification::SmsController` and a few migrations containing `Html` in
their class name, and none of them would work if we defined the
inflection.
We were also getting an error regarding classes containing WYSIWYG in
its name:
```
NameError: uninitialized constant WYSIWYGSanitizer
Did you mean? WysiwygSanitizer
```
In this case, adding the acronym is easier, since we never use "Wysiwyg"
in the code but we use "WYSIWYG" in many places.
This monkey-patch doesn't seem to be working with Zeitwerk, and we were
only using it in one place, so the easiest way to solve the problem is
to remove it.
Note that, in the process, we're changing the operation so `* 100`
appears before the division, so it's consistent with other places where
we do similar things (like the `supports_percentage` method in the
proposals helper).
We were using generic names like `args` and `options` which don't really
add anything to `*` or `**` because Ruby required us to.
That's no longer the case in Ruby 3.2, so we can simplify the code a
bit.
Note that the `budget` parameter was added to the `delete_path` method
so it works in the tests; on production, it worked because this
component is only rendered on pages which already have the `budget`
parameter.
Co-authored-by: Javi Martín <javim@elretirao.net>
Since we've changed these scopes in the previous commit because of the
new rubocop version, we're also making them consistent with the other
scopes in the same file.
Note that we keep :created_at order as complement to new :order field.
We do this so that current installations will not notice any change in the
sorting of their cards when upgrading, as the default "order" field will always
be 1, so it will continue to sort by the "created_at".
So now we know where to use the `where.missing` method which was
introduced in Rails 6.1.
Note this rule didn't detect all cases where the new method can be used.
Before this change, two important things depend on the format of each key,
where to render it in the administration panel and which kind of interface
to use for each setting. Following this strategy led us to a very complex
code, very difficult to maintain or modify. So, we do not want to depend
on the setting key structure anymore to decide how or where to render each
setting.
With this commit, we get rid of the key format-based rules. Now we render
each setting explicitly passing to it the type and the tab where it belongs.
This way it won't be possible to browse all user URLs by just going to
/users/1, /users/2, /users/3, ... and collect usernames, which might not
be desirable in some cases.
Note we could use the username as a URL parameter and just find the user
with `@user = User.find_by!(id: id, username: username)`, but since
usernames might contain strange characters, this might lead to
strange/ugly URLs.
Finally, note we're using `username.to_s` in order to cover the case
where the username is `nil` (as is the case with erased users).