Unreliable coverage report uploads

Description

I’m seeing frequent network errors when attempting to upload a coverage report to codecov.io. Typically one or two test environments fail per CI run, and which ones fail is random.

==> Uploading
    .url https://codecov.io
    .query slug=lukeburden%2Fdjango-zengo&commit=8ab188e0d06fad32df380ba18e4a2d0b68111a57&job=579.0&flags=py34dj111&branch=master&package=py2.0.15&service=circleci&build=579.0
    Pinging Codecov...
Error: HTTPSConnectionPool(host='codecov.io', port=443): Max retries exceeded with url: /upload/v4?slug=lukeburden%2Fdjango-zengo&commit=8ab188e0d06fad32df380ba18e4a2d0b68111a57&job=579.0&flags=py34dj111&branch=master&package=py2.0.15&service=circleci&build=579.0 (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7fd7d7e72fd0>: Failed to establish a new connection: [Errno 110] Connection timed out',))

https://circleci.com/gh/lukeburden/django-zengo/579
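The `[Errno 110] Connection timed out` in the log above is raised while opening the TCP connection, before any HTTP request is sent. One way to check whether a CI runner can reach the upload host at all is a quick probe with Python's standard `socket` module. This is a minimal sketch: `can_connect` is a hypothetical helper, and the 10-second timeout is an arbitrary choice.

```python
import socket


def can_connect(host: str, port: int = 443, timeout: float = 10.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout`.

    Hypothetical diagnostic helper, not part of any Codecov tool.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts
        # such as errno 110 (ETIMEDOUT); socket.timeout is an OSError.
        return False


if __name__ == "__main__":
    print(can_connect("codecov.io"))
```

Running this in the same CI step that fails helps separate "the host is unreachable from this runner" from "the upload request itself is rejected".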

Repository

Steps to Reproduce

  1. Run CI for the above-mentioned project

Expected behavior:
All environments should pass, as they do locally. Which one fails is random and appears to depend on the network reliability of Codecov's upload API.

Actual behavior:
One or more environments fail per CI run.

Flakiness? [Does this happen all the time or only sometimes?]
Definitely flaky: which one fails is seemingly random.

Versions

Using python codecov 2.0.15

There are a few issues filed about this against the Python codecov uploader client.

The solution (beyond fixing the Python uploader) seems to be switching to the Codecov bash uploader.


As @kapilt said, @lukeburden, please try the bash uploader. That will help us narrow down whether the issue is in the Python uploader or something lower down, such as the coverage report itself.

Thanks, I’ll try that next time it hits a flaky patch.


A follow-up on this: the bash uploader does prove more reliable, simply because it has a retry mechanism.

This is a nice work-around, but it seems to me there are still some reliability issues with the receiving endpoint that need to be addressed.

We’ve dug into this issue a bit more and think we have a handle on it. The fix is in testing, I’ll let you know when it’s deployed.

In case it is helpful: I have custom uploaders that use curl and the PowerShell Net.WebClient, and I run into this issue on at least one of the CI matrix environments in every build.

Out of roughly 30 uploads, at least one or two usually fail, and that is after I added code to retry the upload up to 6 times with a 5-second wait between attempts.
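A retry loop like the one described for the custom uploaders can be sketched in Python. This is a minimal sketch, not Codecov's code: `with_retries` is a hypothetical helper, and its defaults (6 attempts, 5 seconds apart) simply mirror the numbers in the post above. It treats any `OSError` as transient, which covers `ConnectionError`, `TimeoutError` and socket errors; the `requests` library's exceptions also derive from `IOError`, an alias of `OSError`.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    action: Callable[[], T],
    attempts: int = 6,
    delay_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `action` until it succeeds or `attempts` is exhausted.

    Hypothetical helper: any OSError (ConnectionError, TimeoutError,
    socket errors such as errno 110) is treated as transient.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except OSError:
            if attempt == attempts:
                raise  # out of attempts: re-raise the last network error
            sleep(delay_seconds)  # fixed wait between attempts
    raise AssertionError("unreachable")


if __name__ == "__main__":
    failures = iter([True, True, False])  # fail twice, then succeed

    def fake_upload() -> str:
        # Stand-in for the real upload call (hypothetical).
        if next(failures):
            raise ConnectionError("[Errno 110] Connection timed out")
        return "uploaded"

    print(with_retries(fake_upload, delay_seconds=0.1))
```

Passing `sleep` as a parameter keeps the helper testable. A real uploader might also grow the delay between attempts (exponential backoff) so that retries from many CI jobs do not arrive in lockstep.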

I'm getting this same issue. It started about 12 hours ago.
Trying the npm codecov package gives me the wrong paths for some reason.

@tayyab-anwar can you provide a build URL?

Where do I find that if the upload failed?

Sorry, I meant a link to the build in your CI/CD, @tayyab-anwar.

I use Google Cloud Build which is not in the supported CI list.
I install the package using pip and then call it in the root directory of my project. I don’t think the build would help here.
This is the GCB build ID 31928549-ba45-40eb-a538-b94466680101.

I’m still not sure if I understand you correctly, however.

Hi @tayyab-anwar, fair enough. Could you show the Codecov output then if possible and a commit SHA?

The bash uploader is insecure. With Python I can use the poetry.lock file to pin package hashes; with the bash uploader I'm asking for a supply chain attack.

@earonesty this appears to be off-topic. The bash uploader has been deprecated as of early this year. The current version of the uploader has explicit instructions on how to do sha* checksums, and the wrappers all automatically do integrity checks.
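A checksum integrity check of the kind mentioned above comes down to hashing the downloaded binary and comparing it with a published SHA-256 digest. A minimal Python sketch follows; the helper names are hypothetical, and it assumes the expected digest arrives as a `sha256sum`-style line ("<digest>  <filename>") obtained from a source an attacker cannot also modify. It is not a replacement for the uploader's own documented verification steps.

```python
import hashlib
import hmac
from typing import Tuple


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_sha256sum_line(line: str) -> Tuple[str, str]:
    """Split a sha256sum-style line "<digest>  <name>" into (digest, name)."""
    digest, _, name = line.strip().partition(" ")
    # sha256sum marks binary mode with a leading '*' on the file name.
    return digest.lower(), name.strip().lstrip("*")


def verify_sha256(path: str, expected_hex: str) -> bool:
    """True if the file's SHA-256 matches `expected_hex` (case-insensitive)."""
    actual = sha256_of_file(path)
    # Constant-time comparison; both arguments are ASCII hex strings.
    return hmac.compare_digest(actual, expected_hex.strip().lower())
```

A CI step would refuse to run the uploader whenever `verify_sha256` returns False.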