I’ve described the actual workaround at the very bottom; if you are in a hurry, jump straight there.
As of 2019-03-22, when I’m writing this entry, I can still see this bug. Although I’ve opened an Issue and a Pull Request for it, the Ansible team hasn’t responded yet, so it may not be fixed anytime soon…
Besides, I have a reproducer for it; you can check it out here: https://github.com/issei-m/ansible-s3-sync-bug-reproducer
What Ansible is
I believe most readers of this entry already know what it is, but let me explain briefly: it’s an infrastructure management tool where you define the desired server state (e.g. “the foo package must be installed”) and Ansible converges the server to it.
Admittedly, I don’t use it much these days because of more modern execution environments such as containers and serverless, but I still do sometimes.
s3_sync module
As the name suggests, it’s an Ansible module that uploads files to your S3 bucket.
It doesn’t just upload files, though: it also checks for differences between local and remote, so you can avoid a futile upload of an unchanged file.
Here is an example definition:
tasks:
  - name: copy file to s3
    s3_sync:
      bucket: "{{ bucket_name }}"
      file_root: "./s3"
      key_prefix: "" # Default
      mode: push
      file_change_strategy: date_size # Default
With this definition, all local files in the ./s3 directory are synced to the specified S3 bucket.
As I explained above, file_change_strategy is the option that specifies the strategy (the logic) used to determine whether a file has changed.
date_size is the default strategy. The documentation claims: “it will upload if file sizes don’t match or if local file modified date is newer than s3’s version”. That’s clearly an advantage from a performance perspective, because no files are uploaded when nothing has changed.
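In other words, the documented decision boils down to something like this (a minimal sketch in my own words, not the module’s actual code):

# A minimal sketch of the decision the documentation describes for date_size.
# Note: this is a paraphrase for illustration, not the module's actual code.
def should_upload(local_size, remote_size, local_modified_epoch, remote_modified_epoch):
    # Upload if the sizes differ OR the local file is newer than the S3 version.
    return local_size != remote_size or local_modified_epoch > remote_modified_epoch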
Unfortunately, however, as the title implies, the bug I want to share with you today lives exactly in date_size.
Bug details
Let’s see what happens, following the reproducer I mentioned above.
Clone the repository and execute the following command; the local file s3/target.txt will be uploaded to the specified S3 bucket as /target.txt:
$ ansible-playbook playbook.yml --extra-vars 'bucket_name=your-bucket-name'
(!) Replace your-bucket-name with your own bucket name.
As the result shows, the file has been uploaded successfully:
TASK [copy file to s3] *********************************************************************************************************************************************************************************
changed: [localhost]
The content of the file is Test!!:
$ aws s3 cp s3://your-bucket-name/target.txt -
Test!!
Looks good.
Okay, let’s also see what happens if we execute the same command again, after changing the content to Yeah!!:
$ echo 'Yeah!!' > s3/target.txt
$ ansible-playbook playbook.yml --extra-vars 'bucket_name=your-bucket-name'
...
TASK [copy file to s3] *********************************************************************************************************************************************************************************
ok: [localhost]
...
Contrary to what we expected, it reports that nothing has changed. And indeed, you can confirm nothing changed on the S3 side:
$ aws s3 cp s3://your-bucket-name/target.txt -
Test!!
However, you would be able to get the change through if you updated the content to Yeah!!!, which is 1 byte longer than the original. (I omit the execution result, though.)
Apparently, the file isn’t uploaded whenever the local and remote file sizes match.
Updating a file without changing its size happens quite often, particularly with config files (e.g. tweaking a numeric value), so this is inconvenient. Besides, the documentation claims the file is uploaded when the local file is newer than the remote one, even if the sizes are the same. Something must be wrong here.
Looking at the implementation
As of now, the code in question on the default branch devel looks like this:
if local_modified_epoch <= remote_modified_epoch or local_size == remote_size:
    entry['skip_flag'] = True
It says: “if the last-modified date on the S3 side is the same or newer, or both files have the same size, then the upload will be skipped”.
That’s definitely a bug.
Since the default value of entry['skip_flag'] is False, a correct fix is to combine the two checks with and instead of or:

if local_modified_epoch <= remote_modified_epoch and local_size == remote_size:
    entry['skip_flag'] = True
Or, equivalently:
entry['skip_flag'] = not (local_size != remote_size or local_modified_epoch > remote_modified_epoch)
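To see the difference concretely, here is a quick standalone check using illustrative values that mirror the reproducer (same file size, local copy newer):

# Counterexample from the reproducer: the content changed from "Test!!" to
# "Yeah!!", so the size stays at 7 bytes (including the trailing newline)
# while the local file is newer than the remote one.
local_size, remote_size = 7, 7
local_modified_epoch, remote_modified_epoch = 2000, 1000

buggy_skip = local_modified_epoch <= remote_modified_epoch or local_size == remote_size
fixed_skip = local_modified_epoch <= remote_modified_epoch and local_size == remote_size

print(buggy_skip)  # True  -> the upload is wrongly skipped
print(fixed_skip)  # False -> the upload happens, as the documentation promises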
I guess the cause of this bug is that the desired behavior is described using an or, which is also how the documentation phrases it (“upload if sizes don’t match or if the local file is newer”), while skip_flag expresses the negation of that condition. By De Morgan’s laws, negating an or turns it into an and, so and is right here.
(And in my opinion, it’s better to avoid using a negative name for a boolean value)
Workaround
To be honest, the best outcome would be for the issue above to get fixed, but for now you can use a workaround: file_change_strategy: checksum.
With this strategy, the module checks whether the local and remote files are identical by comparing an MD5 hash of the content. Fortunately, as of now, this code path appears to be bug-free, so it works as expected.
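Applied to the example definition from earlier, only the strategy line changes:

tasks:
  - name: copy file to s3
    s3_sync:
      bucket: "{{ bucket_name }}"
      file_root: "./s3"
      mode: push
      file_change_strategy: checksum # instead of the default date_size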
Worried about the performance impact? It’s not a big deal.
On the remote side, the logic uses the S3 object’s ETag metadata, which already holds the MD5 of the content, so you don’t need to worry about the remote file’s size.
The local side is fine as well, because files so huge that computing an MD5 takes significant time are rare in practice.
So you can use this strategy with no worry.
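If you want to verify the ETag point yourself, here is a minimal sketch of what the comparison boils down to (my own illustrative check, assuming boto3 is installed and the object was uploaded in a single part; a multipart upload’s ETag is not a plain MD5):

import hashlib

import boto3

s3 = boto3.client('s3')

# Remote side: for single-part uploads, the ETag is the MD5 of the content,
# so nothing needs to be downloaded no matter how large the object is.
remote_etag = s3.head_object(Bucket='your-bucket-name', Key='target.txt')['ETag'].strip('"')

# Local side: compute the MD5 of the file's content.
with open('s3/target.txt', 'rb') as f:
    local_md5 = hashlib.md5(f.read()).hexdigest()

print(remote_etag == local_md5)  # True when local and remote contents match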
Conclusion
- As of 2019-03-22, the file_change_strategy: date_size strategy is buggy: a file whose size matches the remote one is never re-uploaded
- You can use the checksum strategy instead
- You should avoid using a negative name for a boolean value (IMO)