In order to remove metadata from PDF documents we will use the
exiftool_vendored gem.
The following line:
Exiftool.new(attachment_path, "-overwrite_original -all:all=")
Overwrites the original file with another file without metadata.
So far this is the best solution we have found to perform this
metadata deletion.
When using Exiftool an exception is thrown, so we added a rescue
to handle it. Here is a task created where this problem is discussed
in issue 28 in the https://github.com/exiftool-rb/exiftool.rb/ repository.
We'll wait to see if this will be fixed in future versions.
This fixes a few issues we've had for years.
First, when attaching an image and then sending a form with validation
errors, the image preview would not be rendered when the form was
displayed once again. Now it's rendered as expected.
Second, when attaching an image, removing it, and attaching a new
one, browsers were displaying the image preview of the first one. That's
because Paperclip generated the same URL from both files (as they both
had the same hash data and prefix). Browsers usually cache images and
render the cached image when getting the same URL.
Since now we're storing each image in a different Blob, the images have
different URLs and so the preview of the second one is correctly
displayed.
Finally, when users downloaded a document, they were getting files with
a very long hexadecimal hash as filename. Now they get the original
filename.
In order to migrate existing files from Paperclip to ActiveStorage, we
need Paperclip to find out the files associated to existing database
records. So we can't simply replace Paperclip with ActiveStorage.
That's why it's usually recommended [1] to first run the migration and
then replace Paperclip with ActiveStorage using two consecutive
deployments.
However, in our case we can't rely on two consecutive deployments
because we have to make an easy process so existing CONSUL installations
don't run into any issues. We can't just release version 1.4.0 and 1.5.0
and day and ask everyone to upgrade twice on the same day.
Instead, we're following a different plan:
* We're going to provide a Rake task (which will require Paperclip) to
migrate existing files
* We still use Paperclip to generate link and image tags
* New files are handled using both Paperclip and ActiveStorage; that
way, when we make the switch, we won't have to migrate them, and in
the meantime they'll be accessible thanks to Paperclip
* After we make the switch, we'll update the `name` column in the active
storage attachments tables in order to remove the `storage_` prefix
Regarding our handling of new files, the exception are cached
attachments. Since those attachments are temporary files used while
submitting a form and we have to delete them afterwards, we're only
handling them with Paperclip. We'll handle these ones in version 1.5.0.
Note the task creating the dev seeds was failing after these changes
with an `ActiveStorage::IntegrityError` exception because we were
opening some files without closing them. If the same file was attached
twice, it failed the second time.
We're solving it by closing the files with `File.open` and a block. Even
though we didn't get any errors, we're doing the same thing in the
`Attachable` concern because it's a good practice to close files after
we're done with them.
Also note we have to change the CKEditor Active Storage code so it's
compatible with Paperclip. In this case, I haven't been able to write a
test to confirm the attachment exists; I was getting the same
`ActiveStorage::IntegrityError` mentioned above.
Finally, we're updating the site customization image controller to use
`update` so the image and the attachment are updated within the same
transaction. This is also what we do in most controllers.
[1] https://www.youtube.com/watch?v=tZ_WNUytO9o
We were using helper methods inside the model; we might as well include
them in the model and use them from anywhere else.
Note we're using a different logic for images and documents methods.
That's because for images the logic was defined in the helper methods,
but for documents the logic is defined in the Documentable concern. In
the past, different documentable classes allowed different content
types, while imageable classes have always allowed the same content
types.
I'm not sure which method is better; for now, I'm leaving it the way it
was (except for the fact that we're removing the helper methods).
The same way it's done for images.
We were converting the number of megabytes to bytes and then converting
it to megabytes again. Instead, we can leave it as it is and only
convert it to bytes when necessary (only one place).
Our previous system to delete cached attachments didn't work for
documents because the `custom_hash_data` is different for files created
from a file and files created from cached attachments.
When creating a document attachment, the name of the file is taken into
account to calculate the hash. Let's say the original file name is
"logo.pdf", and the generated hash is "123456". The cached attachment
will be "123456.pdf", so the generated hash using the cached attachment
will be something different, like "28af3". So the file that will be
removed will be "28af3.pdf", and not "123456.pdf", which will still be
present.
Furthermore, there are times where users choose a file and then they
close the browser or go to a different page. In those cases, we weren't
deleting the cached attachments either.
So we're adding a rake task to delete these files once a day. This way
we can simplify the logic we were using to destroy cached attachments.
Note there's related a bug in documents: when editing a record (for
example, a proposal), if the title of the document changes, its hash
changes, and so it will be impossible to generate a link to that
document. Changing the way this hash is generated is not an option
because it would break links to existing files. We'll try to fix it when
moving to Active Storage.
We were very inconsistent regarding these rules.
Personally I prefer no empty lines around blocks, clases, etc... as
recommended by the Ruby style guide [1], and they're the default values
in rubocop, so those are the settings I'm applying.
The exception is the `private` access modifier, since we were leaving
empty lines around it most of the time. That's the default rubocop rule
as well. Personally I don't have a strong preference about this one.
[1] https://rubystyle.guide/#empty-lines-around-bodies
Metrics/LineLength: Line is too long.
RSpec/InstanceVariable: Use let instead of an instance variable.
Layout/TrailingBlankLines: Final newline missing.
Style/StringLiterals: Prefer double-quoted strings.
DEPRECATION WARNING: ActiveModel::Errors#[]= is deprecated and will be
removed in Rails 5.1. Use model.errors.add(:attachment, "content type
image/png does not match any of accepted content types pdf") instead.
(called from validate_attachment_content_type at
/home/travis/build/consul/consul/app/models/document.rb:92)