nairobi

Author	SHA1	Message	Date
Javi Martín	048bdb2e9e	Add and apply Rails/OrderArguments rubocop rule This rule was introduced in rubocop-rails 2.33. We were following it most of the time.	2025-11-05 11:51:23 +01:00
Julian Herrero	16f844595d	Don't use the cache in admin budget stats In commit `e51e03446`, we started using the same code to show stats in the public area and in the admin area. However, in doing so we introduced a bug, since stats in the public area are only shown after a certain part of the process has finished, meaning the stats appearing on the page never change (in theory), so it's perfectly fine to cache them. However, in the admin area stats can be accessed while the process is still ongoing, so caching the stats will lead to the wrong results being displayed. We've thought about expiring the cache when new supports or ballot lines are added; however, that means the methods calculating the stats for the supporting phase would expire when supports are added/removed but the methods calculating the stats for the voting phase would expire when ballot lines are added/removed. It gets even more complex because the `headings` method calculates stats for both the supporting and the voting phases. So, since loading stats in the admin section is fast even without the cache because they only load very basic statistics, we're taking the simple approach of disabling the cache in this case, so everything works the same way it did before commit `e51e03446`. Co-authored-by: Javi Martín <javim@elretirao.net>	2024-05-20 16:19:41 +02:00
Javi Martín	a4461a1a56	Expire the stats cache once per day When we first started caching the stats, generating them was a process that took several minutes, so we never expired the cache. However, there have been cases where we ran into issues because the stats shown on the screen were outdated. That's why we introduced a task to manually expire the cache. But now, generating the stats only takes a few seconds, so we can automatically expire them every day, remove all the logic needed to manually expire them, and get rid of most of the issues related to the cache being outdated. We're expiring them every day because it's the same thing we were doing in public stats (which we removed in commit `631b48f58`), only we're using `expires_at:` to set the expiration time, in order to simplify the code. Note that, in the test, we're using `travel_to(time)` so the test passes even when it starts an instant before midnight. We aren't using `:with_frozen_time` because, in similar cases (although not in this case, but I'm not sure whether that's intentional), `travel_to` shows this error: > Calling `travel_to` with a block, when we have previously already made > a call to `travel_to`, can lead to confusing time stubbing.	2024-05-17 20:11:16 +02:00
Javi Martín	a5646fcdb3	Remove Cron job to generate stats Since now generating stats (assuming the results aren't in the cache) only takes a few seconds even when there are a hundred thousand participants, as opposed to the several minutes it took to generate them when we introduced the Cron job, we can simply generate the stats during the first request to the stats page. Note that, in order to avoid creating a temporary table when the stats are cached, we're making sure we only create this table when we need to. Otherwise, we could spend up to 1 second on every request to the stats page creating a table that isn't going to be used. Also note we're using an instance variable to check whether we're creating a table; I tried to use `table_exists?`, but it didn't work. I wonder whether `table_exists?` doesn't detect temporary tables.	2024-05-17 16:08:08 +02:00
Javi Martín	653848fc4e	Extract method to get the stats key in stats This way we remove a bit of duplication and it'll be easier to change the `stats_cache` method.	2024-05-17 16:08:08 +02:00
Javi Martín	62cd4c8d7b	Add indices to stats temporary tables Since we're doing many queries to get stats for each age group and each geozone, testing shows these indices make stats calculation about 25% faster on processes with 100,000 participants.	2024-05-17 16:08:08 +02:00
Javi Martín	80dcbfc23c	Improve performance generating stats Debugging shows that the bottleneck in the stats calculation is the number of times we're querying the users table using the same array of IDs in the `where` condition but each time combined with other conditions. So we're inserting the results of querying the users table with the array of IDs in a temporary table and using this temporary table for the other calculations. When querying this temporary table, there's no need to filter for IDs anymore. For budget stats, the `generate` method is now about 10-20 times faster for a budget with 20,000 participants. For budgets with only a few dozen participants, there's no significant difference in performance. I thought about modifying the `participants` method and using the temporary table there. The problem, however, is that in this case it isn't clear when to drop the temporary table, and we could end up with thousands of temporary tables in the database if we don't do it right. Creating and dropping the temporary table in the same transaction, on the other hand, guarantees that won't be the case. Note there's no risk of duplicate tables since they're created and dropped inside a transaction, so we're always using the same table name for the same resource. We're adding a test that fails with a `PG::DuplicateTable: ERROR: relation "participants__1"` error if we don't use a transaction.	2024-05-17 16:08:04 +02:00
Javi Martín	6f0c27c0fb	Remove unused code in statisticable concern This code isn't used since commit `e3063cd24f`.	2024-05-17 16:07:26 +02:00
Javi Martín	bcc9fd97f5	Revert "Extract class to manage GeozoneStats" Back in commit `383909e16`, we said: > Even if this class looks very simple now, we're trying a few things > related to these stats. Having a class for it makes future changes > easier and, if there weren't any future changes, at least it makes > current experiments easier. Since there haven't been any changes in the last 5 years and we've found cases where using the GeozoneStats class results in a slightly worse performance, we're removing this class. The code is now a bit easier to read, and is consistent with the way we calculate participants by age. This reverts commit `383909e16`.	2024-05-17 16:07:26 +02:00
Javi Martín	1d85a63e7c	Calculate age stats based on the participation date We were calculating the age stats based on the age of the users who participated... at the moment when we were calculating the stats. That means that, if 20 years ago, 1000 people who were 16 years old participated, they would be shown as being 36 years old in the stats. Instead, we want to show the stats at the time when the process took place, so we're implementing a `participation_date` method. Note that, for polls, we could actually use the `age` column in the `poll_voters` table. However, doing so would be harder, would only work for polls but not for budgets, and it wouldn't be statistically very relevant, since the stats are shown by age groups, and only a small percentage of people would change their age group (and only to the nearest one) between the time they participate and the time the process ends. We might use the `poll_voters` table in the future, though, since we have a similar issue with geozones and genders, and using the information in `poll_voters` would solve it as well (only for polls, though). Also note that we're using the `ends_at` dates because some people might be too young to vote when a process starts but old enough to vote when the process ends. Finally, note that we might need to change the way we calculate the participation date for a budget, since some budgets might not enable every phase. Not sure how stats work in that scenario (even before these changes).	2024-05-13 15:42:37 +02:00
Javi Martín	0aee568977	Add and apply Rails/RedundantActiveRecordAllMethod This rule was introduced in rubocop-rails 2.21.0.	2023-11-20 14:22:12 +01:00
Javi Martín	97aca0cf95	Add and apply rules for multi-line arrays We were already applying these rules in most cases. Note we aren't enabling the `MultilineArrayLineBreaks` rule because we've got places with many elements where it isn't clear whether having one element per line would make the code more readable.	2023-08-18 14:56:16 +02:00
Javi Martín	4a851c0d82	Add and apply Style/MapToHash rubocop rule This rule was added in Rubocop 1.24.0. Applying it slightly simplifies the code.	2022-10-19 14:26:49 +02:00
Javi Martín	8b5cca746c	Apply rubocop rules to freeze constants Added by popular demand among our team members.	2019-10-26 13:21:36 +02:00
Javi Martín	47b2c42a1d	Apply `IndentationConsistency` rubocop rule	2019-09-10 20:02:15 +02:00
Javi Martín	45a3d8daf0	Add option to enable advanced stats	2019-05-22 11:50:03 +02:00
Javi Martín	483ebffb47	Fix statisticable concern definition Methods defined inside "included" cannot be called using `super` from a class including the module.	2019-05-21 13:50:19 +02:00
Javi Martín	aa0e813970	Use ruby cache for stats helper methods These methods are only used while stats are being generated; once stats are generated, they aren't used anymore. So there's no need to store them using the Dalli cache. Furthermore, there are polls (and even budgets) with hundreds of thousands of participants. Calculating stats for them takes a very long time because we can't store all those records in the Dalli cache. However, since these records aren't used once the stats are generated, we can store them in an instance variable while we generate the stats, speeding up the process.	2019-05-21 13:50:19 +02:00
Javi Martín	7c0e499eee	Add table to store stats versions We need a way to manually expire the cache for a budget or poll without expiring the cache of every budget or poll. Using the `updated_at` column would be dangerous because most of the time we update a budget or a poll, we don't need to regenerate their stats. We've considered adding a `stats_updated_at` column to each of these tables. However, in that case we would also need to add a similar column in the future to every process type whose stats we want to generate.	2019-05-21 13:50:19 +02:00
Javi Martín	9335c51cfc	Include hidden users in stats If users participated and were hidden after participating, we should still count them in the participants stats. In the tests, we set users' `hidden_at` attribute before they vote. Although in real life they would vote first and then they would be hidden, I've written the tests like this for the sake of simplicity.	2019-05-21 13:50:19 +02:00
Javi Martín	1f4707facd	Extract method to get stats participant_ids This way we can share the `participants` method between budget and poll stats.	2019-05-21 13:50:19 +02:00
Javi Martín	ae4cd06c24	Include no geozone in no demographic data	2019-05-21 13:50:18 +02:00
Javi Martín	383909e16c	Extract class to manage GeozoneStats Even if this class looks very simple now, we're trying a few things related to these stats. Having a class for it makes future changes easier and, if there weren't any future changes, at least it makes current experiments easier. Note we keep the method `participants_by_geozone` to return a hash because we're caching the stats and storing GeozoneStats objects would need a lot more memory and we would get an error.	2019-05-21 13:50:18 +02:00
Javi Martín	558070d530	Remove geozone participation percentage We currently don't store geozone population.	2019-05-21 13:50:18 +02:00
Javi Martín	793bfed372	Display only existing stats So if we don't have information regarding gender, age or geozone, stats regarding those topics will not be shown. Note we're using `spec/models/statisticable_spec.rb` because having the same file in `spec/models/concerns` caused the tests to be executed twice. Also note the implementation behind the `gender?`, `age?` and `geozone?` methods is a bit primitive. We might need to make it more robust in the future.	2019-05-21 13:50:18 +02:00
Javi Martín	76c7827cf4	Use stats objects instead of hashes It will make it far easier to call other methods on the stats object, and we're already caching the methods. We had to remove the view fragment caching because the stats object isn't as easy to cache. The good thing about it is the view will automatically be updated when we change logic regarding which stats to show, and the methods taking long to execute are cached in the model.	2019-05-21 13:50:18 +02:00
Javi Martín	e3063cd24f	Remove complex poll stats For now we think showing them would be showing too much data and it would be a bit confusing. I've been tempted to just remove the view and keep the methods in the model in case they're used by other institutions using CONSUL. However, it's probably better to wait until we're asked to re-implement them, and in the meantime we don't maintain code nobody uses. The code wasn't that great to start with (I know it because I wrote it).	2019-05-21 13:50:17 +02:00
Javi Martín	8f69113233	Add poll stats by geozone and channel	2019-05-21 13:50:17 +02:00
Javi Martín	202fb44008	Add poll stats by age and channel	2019-05-21 13:50:17 +02:00
Javi Martín	7b408a4b88	Add poll stats by gender and channel	2019-05-21 13:50:17 +02:00
Javi Martín	90fe746d27	Add geozone stats to polls	2019-05-21 13:50:16 +02:00
Javi Martín	a552645e7f	Add tests to poll stats While we already had "one test to rule all stats", testing each method individually makes reading, adding and changing tests easier. Note we need to make all methods being tested public. We could also test them using methods like `stats.generate[:total_valid_votes]` instead of `stats.total_valid_votes`, but then the tests would be more difficult to read.	2019-05-21 13:50:16 +02:00
Javi Martín	4d520a3a47	Rename `age_groups` method The name was confusing because it seemed to return a list of age groups.	2019-05-21 13:50:16 +02:00
Javi Martín	88daaee9fe	Simplify code	2019-05-21 13:50:16 +02:00
Javi Martín	9a01ff5323	Refactor age groups method We try to make the method return data which is easier to handle in the view.	2019-05-21 13:50:15 +02:00
Javi Martín	c1b76a7ebf	Simplify age groups method	2019-05-21 13:50:15 +02:00
Javi Martín	c2489e3209	Increase number of age groups We would now like to differentiate between 70-year-old people and 90-year-old people.	2019-05-21 13:50:15 +02:00
Javi Martín	e4a032ee68	Split common and specific stats methods	2019-05-21 13:49:42 +02:00
Javi Martín	5d2f5d1d81	Move gender and age methods to a common concern These are generic methods which only depend on the participants.	2019-05-21 13:49:42 +02:00
Javi Martín	04c920c27d	Simplify calculate percentage method	2019-05-21 13:49:42 +02:00
Javi Martín	ccaa2e1a77	Remove duplication to calculate percentage	2019-05-21 13:49:42 +02:00
Javi Martín	188278296c	Simplify the way we cache stats	2019-05-21 13:48:54 +02:00
Javi Martín	d627215af4	Use symbols for method names	2019-05-21 13:27:03 +02:00
Javi Martín	313ffb589b	Share method to generate stats	2019-05-21 13:27:03 +02:00
Javi Martín	62a97f9003	Add a common concern for budget and poll stats	2019-05-21 13:27:03 +02:00
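Commit 1d85a63e7c changes age stats to use a participant's age on the participation date (the process's `ends_at` date) rather than their age when the stats are calculated. The plain-Ruby sketch below illustrates that idea without ActiveRecord. It is not the project's implementation. `age_at`, `age_group` and `participants_by_age_group` are hypothetical helpers, and the five-year buckets are an assumption, not CONSUL's actual age groups.

```ruby
require "date"

# Hypothetical helper (not CONSUL's code): whole years between date_of_birth
# and date, counting the birthday only once it has been reached in that year.
def age_at(date_of_birth, date)
  age = date.year - date_of_birth.year
  birthday_reached = date.month > date_of_birth.month ||
    (date.month == date_of_birth.month && date.day >= date_of_birth.day)
  birthday_reached ? age : age - 1
end

# Assumed bucketing: five-year groups such as "15 - 19" or "35 - 39".
def age_group(age)
  lower = age - (age % 5)
  "#{lower} - #{lower + 4}"
end

# Counts participants per age group using their age on participation_date
# (for example, the process's ends_at date) instead of their age today.
def participants_by_age_group(dates_of_birth, participation_date)
  dates_of_birth.map { |dob| age_group(age_at(dob, participation_date)) }.tally
end
```

For someone born on 2004-05-01, `age_at` returns 16 for a process ending on 2020-06-30. Evaluated on 2040-06-30, it returns 36, which matches the 16-versus-36 example in the commit message.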

45 Commits