Project

General

Profile

Feature #11109

Bug #7161: Support more than 24 HTTP mirrors

Have DAVE build the ISO URL using our mirrors pool configuration

Added by intrigeri almost 3 years ago. Updated about 2 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
-
Category:
Installation
Target version:
Start date:
01/09/2015
Due date:
04/15/2016
% Done:

100%

QA Check:
Feature Branch:
451f/download-and-verify-extension:master
Type of work:
Code
Blueprint:
Starter:
Affected tool:
ISO Verification Extension

Description

This is the implementation side of #10284, and we hade some good initial discussion about why and how to do it: https://mailman.boum.org/pipermail/tails-dev/2016-February/010262.html. See in particular Giorgio's reply, and the bits about reloading the JSON configuration every time the UI page is reloaded (we might need to have Cache-Control and/or ETag headers tweaked server-side to limit the load).


Subtasks

Feature #8641: Have the mirror pool dispatcher script deployed on tails.b.oResolved


Related issues

Blocked by Tails - Feature #10284: Design how the IA+DAVE will support the new mirror pool system Resolved 09/26/2015
Blocks Tails - Feature #11284: Update the DNS round-robin mirror pool to only include a few fast and reliable HTTP mirrors Resolved 03/25/2016
Blocks Tails - Bug #11300: Have stable version of DAVE out of development channel Resolved 04/01/2016
Blocks Tails - Feature #8643: Clean up the remainers of the old mirror pool setup Resolved 01/09/2015 04/15/2016

History

#1 Updated by intrigeri almost 3 years ago

  • Blocked by Feature #10284: Design how the IA+DAVE will support the new mirror pool system added

#2 Updated by intrigeri over 2 years ago

  • Description updated (diff)

#3 Updated by intrigeri over 2 years ago

  • Blocks Feature #11284: Update the DNS round-robin mirror pool to only include a few fast and reliable HTTP mirrors added

#4 Updated by intrigeri over 2 years ago

  • Status changed from Confirmed to In Progress
  • Feature Branch set to 451f/download-and-verify-extension:master

#5 Updated by intrigeri over 2 years ago

  • Blocked by Feature #8635: Make each mirror provide a unique virtualhost name added

#6 Updated by intrigeri over 2 years ago

The URL used for a new download is now built using our mirror pool (example) config, using our mirror-dispatcher.js library.

Next WIP step: support resuming an existing download (that may have been started using another mirror than the one we would like to use this time).

And then we'll want ma1 and co6 to look at our proposed changes.

Note that this can't be published before enough mirrors have a unique hostname and we've published the resulting mirrors.json.

#7 Updated by intrigeri over 2 years ago

  • Blocked by deleted (Feature #8635: Make each mirror provide a unique virtualhost name)

#8 Updated by intrigeri over 2 years ago

  • Blocked by Feature #11384: Record the current state of our mirror pool in JSON added

#9 Updated by intrigeri over 2 years ago

  • Target version changed from Tails_2.3 to Tails_2.4

2.3 is out => postponing. Bumping priority (via the subtask) since this is one of the last remaining things that blocks actual deployment.

#10 Updated by u over 2 years ago

I've spent some time figuring out how this works exactly. Here are my observations, I might lack complete understanding about how the cache works in DAVE (I'll send an email to ma1 to ask):

We actually modify the blob.url with a url_prefix from our mirrors.

Then, if the user closes the browser or loses the connection, she can resume the download using DAVE.
Our code does not remodify blob.url a second time. blob.url's value is cached.

From my current understanding that means that we actually do not have to take care of reverifying that the download matches one of our mirrors. Unless.. unless one of the mirrors disappears. I need to build a test case for this to see what happens then.

I'd be glad about your input @intrigeri and @ma1, but i'll take some more time to re-verify these observations again. myself in the meantime.

#11 Updated by intrigeri over 2 years ago

From my current understanding that means that we actually do not have to take care of reverifying that the download matches one of our mirrors. Unless.. unless one of the mirrors disappears.

Yes, IIRC this was exactly our analysis the last time we worked on this. Mirrors may be disabled at any time, so indeed we have to handle this case.

#12 Updated by intrigeri over 2 years ago

  • Blocked by deleted (Feature #11384: Record the current state of our mirror pool in JSON)

#13 Updated by intrigeri over 2 years ago

  • Tracker changed from Bug to Feature

#14 Updated by intrigeri over 2 years ago

IIRC, a few weeks ago we tentatively set an internal & informal deadline to the end of May, for having a working PoC that handles interrupted downloads. So I guess we should now update this timeline. Of course, it's fine if this requires a couple more weeks (except that I won't be available anymore in case help is needed).

#15 Updated by anonym over 2 years ago

  • Target version changed from Tails_2.4 to Tails_2.5

#16 Updated by sajolida over 2 years ago

  • Blocks Bug #11300: Have stable version of DAVE out of development channel added

#17 Updated by u over 2 years ago

Hi,

current status is git commit 30f5e82b926cf2f5088eef21beb071ef6095300e:

  • 1) DAVE uses mirror pool and replaces the dl.amnesia.boum.org by a URL from our mirror list in JSON format.
  • 2) When a download is paused, and the page reloaded, instead of doing again 1) we check all current downloads from the list against our mirrors and replace the URL to download from if we find a match in there so that the download can be resumed correctly. (that's code & functionality which has been added during June 2016 by me.)
Still unhandled case:
  • if a mirror disappears, we'll get a download error. In that case we want to first try to download from another mirror and not throw an error immediately.

#18 Updated by u over 2 years ago

Unfortunately i'm currently unable to push my modifications :/

#19 Updated by u over 2 years ago

I pushed and also merged recent changes by ma1 into our master branch so that he can already review this. I'll send an email to him now.

#20 Updated by u over 2 years ago

So here is how I imagine the last missing scenario. But I might miss something here and am very open for ideas you people might have:

The last missing bit would be to assign any URL from the mirror pool if a download fails. We cannot directly detect if a mirror disappears, because the JSON file is cached as far as I understand. That means that the user will simply be presented with a failed download.

The problem here is that when my download fails, as a user, I still will be presented with the same download link as before. But I want the download link to be replaced automatically, without me even noticing.

That means, I need to detect if there's a download error, then either automatically retry or present a link to the user asking her if she wants to try another mirror.

What do you think?

#21 Updated by intrigeri over 2 years ago

So here is how I imagine the last missing scenario. But I might miss something here and am very open for ideas you people might have:

Done in the email thread where you raised similar issues :)

#22 Updated by ma1 over 2 years ago

I did review the changes in downloader.js and they look good. However I could not find the lib/mirror-dispatcher.js file you mentioned in your email. Is it yet to be integrated in the extension tree?

#23 Updated by intrigeri over 2 years ago

However I could not find the lib/mirror-dispatcher.js file you mentioned in your email. Is it yet to be integrated in the extension tree?

It lives there: https://git-tails.immerda.ch/mirror-pool-dispatcher/

IMO we should copy it into the extension tree as part of the release process, but not add it to Git (to avoid confusion and the temptation to modify it here). Fair enough?

#24 Updated by ma1 over 2 years ago

intrigeri wrote:

It lives there: https://git-tails.immerda.ch/mirror-pool-dispatcher/

IMO we should copy it into the extension tree as part of the release process, but not add it to Git (to avoid confusion and the temptation to modify it here). Fair enough?

Sounds fine, yes.

#25 Updated by intrigeri over 2 years ago

  • Blocks Feature #8643: Clean up the remainers of the old mirror pool setup added

#26 Updated by u over 2 years ago

  • Target version changed from Tails_2.5 to Tails_2.6

#30 Updated by u over 2 years ago

Committed some more modifications which choose a random mirror if the resuming of an existing download fails.
I still want to test this a bit more and modify the current behaviour of always choosing a random mirror on resume.

#31 Updated by intrigeri over 2 years ago

u wrote:

Committed some more modifications which choose a random mirror if the resuming of an existing download fails.

:)

#32 Updated by anonym about 2 years ago

  • Target version changed from Tails_2.6 to Tails_2.7

#33 Updated by u about 2 years ago

fyi waiting for review, but earliest expected date for that to happen is in 2 weeks.

#34 Updated by intrigeri about 2 years ago

I started with a quick code review of 192314f..626719f. Many thanks for the HACKING documentation!

tryAnotherMirror:

  • It seems to me that let tmp_url = download.source.url.split('/tails/'); assumes too much about how download.source.url looks like, and as a result is not robust enough: e.g. if its value is https://example.com/tails/pub/tails/stable/whatever.iso, then tmp_url[1] will be pub, while it should be stable/whatever.iso. If I got it right, what we want here is to drop the previously used mirror's url_prefix. If we still have access to the previous content of mirrors.json (at the time when we picked the previous mirror), then we can loop over values of url_prefix for all mirrors found in there, pick the first value of url_prefix that is a prefix of download.source.url, remove that from the beginning of the string and we get something equivalent to tmp_url[1] (granted, that's not 100% bullet proof either, in case we have mirrors with overlapping prefixes, but I think that's very unlikely to happen); else, if we don't have access anymore at this point to the previous content of mirrors.json, then I guess we need to save somehow the url_prefix of the mirror we picked while starting a download (can we do that?), so that we can simply remove it from the beginning of the string in tryAnotherMirror. If all these options are not feasible or too hard to implement, then I suggest we simply add a check in validate-config (mirror-pool.git) to detect cases that would not work due to the current implementation's limitations (at first glance, strictly more that one occurrence of /tails/ in url_prefix is the condition we have to check for); plus, adding a comment in tryAnotherMirror that explains the limitation, and the workaround we put in place; that should be easy to do and good enough for all practical purposes, right? BTW, it seems that testurl.split('/tails/') might have exactly the same problem, but I'll let you check.
  • This function currently can pick a disabled mirror, no? I'm worried that we have two different implementations of picking a random mirror, one in mirror-dispatcher.js (that honors weight), that's used at initial download startup, and another one in tryAnotherMirror. I'm sure I don't have to elaborate about why such duplication is problematic, especially since we already see it create a bug (not ignoring disabled mirrors). I think we should fix that by having mirror-dispatcher.js provide whatever API you need in tryAnotherMirror; that (possibly new) function would use getRandomMirrorUrlPrefix behind the hood just like transformURL does, so that we have only one single implementation of the "pick a random mirror taking into account a bunch of criteria" logics.

The documentation of the compareUrlToMirrors function is outdated, now that it returns something else than a boolean. BTW, if not done yet, please consider renaming that function to better express its current behaviour.

Next step for me: read u's email messages on this topic, and test. Hopefully doable later today, or tomorrow. Worst case, by September 27..

#35 Updated by u about 2 years ago

intrigeri wrote:

I started with a quick code review of 192314f..626719f.

Thanks!

  • This function currently can pick a disabled mirror, no? I'm worried that we have two different implementations of picking a random mirror, one in mirror-dispatcher.js (that honors weight), that's used at initial download startup, and another one in tryAnotherMirror. I'm sure I don't have to elaborate about why such duplication is problematic, especially since we already see it create a bug (not ignoring disabled mirrors). I think we should fix that by having mirror-dispatcher.js provide whatever API you need in tryAnotherMirror; that (possibly new) function would use getRandomMirrorUrlPrefix behind the hood just like transformURL does, so that we have only one single implementation of the "pick a random mirror taking into account a bunch of criteria" logics.

No it cannot pick a disabled mirror.

You might have missed line 371-375:


371           for ( let i = 0 ; i < mirrors.mirrors.length; i++) {
372             if(mirrors.mirrors[i].weight != 0) {
373               mirror_list.push(mirrors.mirrors[i].url_prefix);
374             }
375           }

I only add mirrors to the list which are not disabled. tryAnotherMirror chooses a mirror from this list only.

However, I agree that a cleaner way would be to implement this in mirror-dispatcher.js instead.

#36 Updated by u about 2 years ago

intrigeri wrote:

tryAnotherMirror:

  • It seems to me that let tmp_url = download.source.url.split('/tails/'); assumes too much about how download.source.url looks like, and as a result is not robust enough: e.g. if its value is https://example.com/tails/pub/tails/stable/whatever.iso, then tmp_url[1] will be pub, while it should be stable/whatever.iso. If I got it right, what we want here is to drop the previously used mirror's url_prefix. If we still have access to the previous content of mirrors.json (at the time when we picked the previous mirror), then we can loop over values of url_prefix for all mirrors found in there, pick the first value of url_prefix that is a prefix of download.source.url, remove that from the beginning of the string and we get something equivalent to tmp_url[1] (granted, that's not 100% bullet proof either, in case we have mirrors with overlapping prefixes, but I think that's very unlikely to happen); else, if we don't have access anymore at this point to the previous content of mirrors.json, then I guess we need to save somehow the url_prefix of the mirror we picked while starting a download (can we do that?), so that we can simply remove it from the beginning of the string in tryAnotherMirror. If all these options are not feasible or too hard to implement, then I suggest we simply add a check in validate-config (mirror-pool.git) to detect cases that would not work due to the current implementation's limitations (at first glance, strictly more that one occurrence of /tails/ in url_prefix is the condition we have to check for); plus, adding a comment in tryAnotherMirror that explains the limitation, and the workaround we put in place; that should be easy to do and good enough for all practical purposes, right? BTW, it seems that testurl.split('/tails/') might have exactly the same problem, but I'll let you check.

I'll check that.

The documentation of the compareUrlToMirrors function is outdated, now that it returns something else than a boolean. BTW, if not done yet, please consider renaming that function to better express its current behaviour.

I'll rewrite that part.

#37 Updated by intrigeri about 2 years ago

No it cannot pick a disabled mirror.

You might have missed line 371-375:

Absolutely! Indeed, I've missed it. Cool :)

However, I agree that a cleaner way would be to implement this in mirror-dispatcher.js instead.

Glad we're on the same page. Do you want to track this here on a subtask? (IMO a subtask would make it easier to not forget it, but that's your call.)

#38 Updated by ma1 about 2 years ago

Sorry for the delay.

Since intrigeri is already reviewing this code, I'll start fixing bug #11796 and come back when you're done here.

BTW, regarding tryAnotherMirror(), don't we actually want the path slice after the last occurrence of '/tails/'?
Therefore, wouldn't

// returns the URL slice after the last "/tails/", or an empty string if no "/tails/" is found
function mirrorSuffix(downloadUrl) {
  let slices = downloadUrl.split("/tails/");
  return slices.length > 1 ? slices.pop() : '';
}

be a reusable replacement for the tmp_url[] stuff?

#39 Updated by intrigeri about 2 years ago

Since intrigeri is already reviewing this code, [..]

Disclaimer: I'm an absolute beginner at JS, and even more so when it comes to Firefox add-ons.

#40 Updated by ma1 about 2 years ago

Following up on an email exchange about tryAnotherMirror() to be called when resuming a failed (rather than paused) download: the tailsReconnectionFailure() function is meant to return true only if a certain very Tails-specific and weird networking behavior is triggered when you disconnect and reconnect a network interface.
That's why it's never called under the conditions described in the HACKING document.

I suggest to modify the resume() method as follows:

resume() {
    reset(status.phase);
    if (curDownload) {
      if(curDownload.error && curDownload.error.becauseSourceFailed) {
        curDownload = tryAnotherMirror(curDownload);
      }
      curDownload.start();
    }
  }

Probably even better, but I'd leave the decision to intrigeri because is about design, would be, rather than trying just one mirror per manual resume attempt, automatically (but still randomly) pick one URL after the other from a copy of the mirror list (except the current one) until either the download correctly starts or the array is empty.

#41 Updated by u about 2 years ago

  • Assignee changed from u to ma1
  • QA Check set to Info Needed

Hi Ma1,

thanks that sounds much more like what I wanted to do :))
I'll test that soonish and will let you know.

I'll also rework on the url prefixes issue you both mentioned.

Did you see any possible security issues with the code we wrote?

Cheers!
u.

#42 Updated by intrigeri about 2 years ago

Probably even better, but I'd leave the decision to intrigeri because is about design, would be, rather than trying just one mirror per manual resume attempt, automatically (but still randomly) pick one URL after the other from a copy of the mirror list (except the current one) until either the download correctly starts or the array is empty.

That would work, but I'm very unsure that handling this case is worth the effort. u, what do you think?

#43 Updated by intrigeri about 2 years ago

You might have missed line 371-375:


> 371           for ( let i = 0 ; i < mirrors.mirrors.length; i++) {
> 372             if(mirrors.mirrors[i].weight != 0) {
> 373               mirror_list.push(mirrors.mirrors[i].url_prefix);
> 374             }
> 375           }

> 

I only add mirrors to the list which are not disabled. tryAnotherMirror chooses a mirror from this list only.

However, I agree that a cleaner way would be to implement this in mirror-dispatcher.js instead.

And by the way, one afterthought: once we reuse the existing implementation from mirror-dispatcher.js, presumably the "if weight != 0" check quoted above can disappear, since it's itself duplicating logic that our mirror selection algorithm already implements :)

#44 Updated by intrigeri about 2 years ago

BTW, regarding tryAnotherMirror(), don't we actually want the path slice after the last occurrence of '/tails/'?

This would work currently, but only thanks to (currently valid) assumptions wrt. the paths of stuff we store on mirrors (i.e. we don't have any /tails/ further in the URL). I'd rather not make such assumptions since we cannot enforce them from our (mirror team) perspective, and any change made on the other side (release management, misc development work) can break our stuff without us necessarily being informed.

So I prefer the solution I've proposed, i.e. "we simply add a check in validate-config (mirror-pool.git) to detect cases that would not work due to the current implementation's limitations (at first glance, strictly more that one occurrence of /tails/ in url_prefix is the condition we have to check for); plus, adding a comment in tryAnotherMirror that explains the limitation, and the workaround we put in place". It has the advantage that it does check before deploying changes if they're going to introduce a case that our code does not support, and would prevent us from actually pushing such breakage.

#45 Updated by intrigeri about 2 years ago

u: FWIW I'm starting to find it hard to track what's left to do and who's working on what, since we're mixing various things (some dev work remaining on your plate + some review by ma1 + some testing by myself that I'll postpone until the few dev tasks are done) one one single ticket that can only have 1 assignee. It's not a big problem if only I am affected, I can totally live with it, but if it's affecting you too at some point, then I suggest creating a few subtasks to track stuff. My 2 cts :)

#46 Updated by ma1 about 2 years ago

  • Assignee changed from ma1 to u
  • QA Check changed from Info Needed to Dev Needed

CODE REVIEW

File: tails-download-and-verify/downloader.js

Pending my patch for Bug #11796 and the changes I suggested in comment #40 above, it looks fine for me.
Optional: some very minor nits about formatting, like missing spaces in "if (" and "for (", or non-idiomatic EcmaScript, like iterating by index rather than "for (... of ...)".

I couldn't see any security issue introduced by your mirror pool handling code, anyway.

File: mirror-pool-dispatcher/lib/js/mirror-dispatcher.js

< 151 var url_pattern = new RegExp('^(http|https):\/\/[a-z0-9\-_]+(\.[a-z0-9\-_]+)+([a-z0-9\-_\.,@\?^=%&;:/~\+#]*[a-z0-9\-\_#@\?^=%&;/~\+])?$', 'i');

Nit: wouldn't
> 151 var url_pattern = /^https?:\/\/[a-z0-9-_]+(.[a-z0-9-_]+)+([a-z0-9-_.,@?^=%&;:/~+#]*[a-z0-9-_#@?^=%&;/~+])?$/i

be easier to read and maintain?

< 222 Object.assign(exports, transformURL());

This is unlikely to work as expected: it calls transformURL() without arguments, causing it to throw because mirrors is undefined.
Maybe

> 222 Object.assign(exports, {transformURL});

As you can see, no big issue :)
Really good job, thank you!

#47 Updated by u about 2 years ago

ma1 wrote:

CODE REVIEW

Thank you very much!

File: tails-download-and-verify/downloader.js

Pending my patch for Bug #11796 and the changes I suggested in comment #40 above, it looks fine for me.
Optional: some very minor nits about formatting, like missing spaces in "if (" and "for (", or

I changed that.

non-idiomatic EcmaScript, like iterating by index rather than "for (... of ...)".

For now I left this as is.

I couldn't see any security issue introduced by your mirror pool handling code, anyway.

That's really good news!

File: mirror-pool-dispatcher/lib/js/mirror-dispatcher.js
[...]
Nit: wouldn't
[...]
be easier to read and maintain?

For now we also have http mirrors, sor that part needs to be kept. As for the simplification, I need to dive into it more in detail to find out if that would work. Thanks for suggesting it.

This is unlikely to work as expected: it calls transformURL() without arguments, causing it to throw because mirrors is undefined.

Ok, I'll modify this.

#48 Updated by u about 2 years ago

Hi,

I suggest to modify the resume() method as follows:
[...]

I've finally tried this but it does not work. However, I think that that's not due to the code, but rather to the fact that FF seems to use its own DNS cache and simply continues to download from the mirror we previously selected even though it's not supposed to reach it, because I've modified /etc/hosts to point this URL to 127.0.0.1.

#49 Updated by intrigeri about 2 years ago

u wrote:

ma1 wrote:

File: mirror-pool-dispatcher/lib/js/mirror-dispatcher.js
[...]
Nit: wouldn't
[...]
be easier to read and maintain?

For now we also have http mirrors, sor that part needs to be kept.

The regexp proposed by ma1 still supports http mirrors, AFAICT.

#50 Updated by intrigeri about 2 years ago

Trying to sum up what's left to do:

  • pause + resume while the initially picked mirror has gone away: not a blocker
  • unchecked assumptions in download.source.url.split in tryAnotherMirror
  • check if testurl.split('/tails/') is affected by similar unchecked assumptions
  • de-duplicate logics for picking a random mirror, by having mirror-dispatcher.js provide whatever API we need for tryAnotherMirror; and then:
  • drop "if weight != 0" check (that's a duplicate that will be useless once we're reusing our existing code)
  • outdated compareUrlToMirrors doc
  • potentially buggy Object.assign(exports, transformURL()); (beware, we need to support Jessie's nodejs, Stretch's nodejs, and various browsers)

#51 Updated by intrigeri about 2 years ago

intrigeri wrote:

  • potentially buggy Object.assign(exports, transformURL()); (beware, we need to support Jessie's nodejs, Stretch's nodejs, and various browsers)

u and I agree that it's strange that this works at all. This change was successfuly tested on DAVE, and in sid & jessie's nodejs:

--- a/lib/js/mirror-dispatcher.js
+++ b/lib/js/mirror-dispatcher.js
@@ -216,12 +216,5 @@ function transformURL(url, fallback_download_url_prefix, mirrors) {

 if(typeof exports !== 'undefined') {
   // Now we know that we're not running in the browser
-  try {
-    // This works in DAVE but not in node.js
-    Object.assign(exports, transformURL());
-  }
-  catch (e) {
-    // Fallback for Jessie's node.js
-    module.exports.transformURL = transformURL;
-  }
+  module.exports.transformURL = transformURL;
 }

Next step is to test it on our download pages that don't use DAVE, e.g. a call for testing a Tails RC, in FF, Chromium & IE.

#52 Updated by intrigeri about 2 years ago

  • unchecked assumptions in download.source.url.split in tryAnotherMirror
  • de-duplicate logics for picking a random mirror, by having mirror-dispatcher.js provide whatever API we need for tryAnotherMirror; and then:

Pushed an early draft that tries to fix both issues, to the 451f/de-duplicate+robustness branch.

#53 Updated by intrigeri about 2 years ago

Another code style/design issue that I've missed earlier: tryAnotherMirror uses the global blob variable. I think it should get the data it needs from blob (url and fallback_download_url_prefix, at 1st glance) as arguments.

#54 Updated by u about 2 years ago

Concerning resuming: our code works but when we try to resume downloading from another mirror if the previous one were not reachable anymore, we check if the download is failing because the source is not reachable.

if (curDownload.error && curDownload.error.becauseSourceFailed) {
   curDownload = tryAnotherMirror(curDownload);
}

This results in some errors:


NS_ERROR_NOT_AVAILABLE: Component returned failure code: 0x80040111 (NS_ERROR_NOT_AVAILABLE) [nsIHttpChannel.responseStatus] downloader.js line 50 > eval:141:0
NS_ERROR_NOT_AVAILABLE: Component returned failure code: 0x80040111 (NS_ERROR_NOT_AVAILABLE) [nsIChannel.contentLength] downloader.js line 50 > eval:219:0

But it might actually be that we're hit by this bug: https://bugzilla.mozilla.org/show_bug.cgi?id=1275289

These errors could indeed be the result of Mozilla's built-in error handler https://developer.mozilla.org/en-US/docs/Mozilla/JavaScript_code_modules/Downloads.jsm/DownloadError (this is the browser component which returns the becauseSourceFailed information). It assumes that there should be a responseStatus, but on failed downloads there is none. So instead of being able to detect if the source failed, we get hit by an error and cannot resume the download from another mirror.

A fix is available for FF51 https://hg.mozilla.org/mozilla-central/rev/0fc7e282d053.

So I'll indeed stop loosing time to make this work.

#55 Updated by u about 2 years ago

  • Assignee changed from u to intrigeri
  • QA Check changed from Dev Needed to Ready for QA

All other issues have been fixed and pushed (last commit: b17fe88ffd6f76c1bc4d88c92ca96dfc74fae945 in my DAVE repo.)

I also verified that we don't choose mirrors with weight=0: I verified that when DAVE selects mirrors, DAVE always only uses the transformURL function which itself uses getRandomMirrorURL which uses our weight logic. Beware I also modified the mirror-pool-dispatcher script. It looks like we oversaw something crucial: the selection of a random mirror with weight 0 was still possible. I've fixed and commented that change in 4b4d1d2b187477612e08995eaa6ccc1f4157d2fe. (Pushed to my repo and main repo.)

Please test all that and once you're done, I'll ask Giorgio to build & upload the extension.

#56 Updated by u about 2 years ago

  • potentially buggy Object.assign(exports, transformURL()); (beware, we need to support Jessie's nodejs, Stretch's nodejs, and various browsers)

u and I agree that it's strange that this works at all. This change was successfuly tested on DAVE, and in sid & jessie's nodejs:

Next step is to test it on our download pages that don't use DAVE, e.g. a call for testing a Tails RC, in FF, Chromium & IE.

  • DAVE does not use the object and I've tested that the current code works.
  • We've tested the code in Jessie's and Stretch's nodejs, it works.
  • The browser itself is not supposed to execute this code and the same line was present in the code beforehand.

#57 Updated by u about 2 years ago

Trying to sum up what has been done:

  • pause + resume while the initially picked mirror has gone away: not a blocker

=> Mozilla bug ?

  • unchecked assumptions in download.source.url.split in tryAnotherMirror

=> modified & tested using url_suffix

  • check if testurl.split('/tails/') is affected by similar unchecked assumptions

=> modified & tested using url_suffix

  • de-duplicate logics for picking a random mirror, by having mirror-dispatcher.js provide whatever API we need for tryAnotherMirror; and then:
  • drop "if weight != 0" check (that's a duplicate that will be useless once we're reusing our existing code)

=> done, we use transformURL

  • outdated compareUrlToMirrors doc

=> modified

  • potentially buggy Object.assign(exports, transformURL()); (beware, we need to support Jessie's nodejs, Stretch's nodejs, and various browsers)

=> Modified in mirror-dispatcher.js.

#58 Updated by u about 2 years ago

@intrigeri: As for changing the pattern in isValidURL(url) in the mirror dispatcher script, I think you're more firm with regexp than me and I'd like your opinion on modifying the regexp because I fear that I will certainly miss something.

#59 Updated by intrigeri about 2 years ago

Beware: I also modified the mirror-pool-dispatcher script. It looks like we oversaw something crucial: the selection of a random mirror with weight 0 was still possible. I've fixed and commented that change in 4b4d1d2b187477612e08995eaa6ccc1f4157d2fe. (Pushed to my repo and main repo.)

Good catch! The fix seems OK. I've now pushed this (+ some other improvements and one regression fix) to the real main repo.

#60 Updated by intrigeri about 2 years ago

@intrigeri: As for changing the pattern in isValidURL(url) in the mirror dispatcher script, I think you're more firm with regexp than me and I'd like your opinion on modifying the regexp because I fear that I will certainly miss something.

I've looked both at the current regexp and the one proposed by Giorgio, and both have bugs (assuming that my knowledge of Perl REs applies to those):

  • some of them are common to both versions, e.g. including a dash '-' in the middle of a square-bracket character class does not work, since that char has a special meaning (range separator) in this context, unless it's 1st or last in the character class;
  • some of them are fixed in Giorgio's version, e.g. escaping of chars in square-bracket character classes that don't need to be escaped in this context;
  • some of them are introduced in Giorgio's version, e.g. not escaping "." in (.[a-z0-9-_]+), while I think it really should be.

So to improve any of those versions, quite some work would be needed (and a bunch of well-chosen automated tests, if you ask me).

Now, I've looked at the context in which we use this RE, and basically we control all the bits we feed through it, either because they come from well-known links we add ourselves to our website, or because we manually add url_prefix to mirrors.json. So I'm not super worried about the need for this regexp to be perfect and support all corner cases correctly. Besides, I see the task of creating & maintaining our own URL validation RE pretty hopeless a task, and I'd rather not spend any time on it.

Therefore, I propose that we keep the current RE as-is, and create a new ticket (not blocking anything) about finding a better, well-maintained regexp (ideally in a library we can include) and replacing ours with it. What do you think?

#61 Updated by intrigeri about 2 years ago

  • Assignee changed from intrigeri to u
  • QA Check changed from Ready for QA to Dev Needed

I've pushed some trivial improvements to your master branch of DAVE (as I have no repo of mine, sorry!). I've reviewed all code changes and they do look good. I've not tested anything though, but my understanding is that you have tested all cases that matter (direct download without DAVE in FF and Chromium, download with DAVE) already, so there's nothing left for me to test, right?

The only concern I have left (aside of the RE thing discussed above) is that it seems to me that this code:

        if (blob.mirrors) {
          if (curDownload && compareUrlToMirrors(curDownload.source.url, blob.mirrors, blob.url_suffix) !== false) {
            blob.url = curDownload.source.url;
          }
        }

… implicitly relies on the assumption that blob.mirrors only contains active (weight > 0) mirrors, which is not the case anymore. So if for some reason curDownload.source.url matches a disabled mirror's URL prefix, then we're going to try and continue the download using that disabled mirror, which is likely to fail (we usually disable broken mirrors). IMO compareUrlToMirrors's semantics should be adjusted to only return a true value if the tested URL matches an active mirror's URL prefix. AFAICT the other use case of compareUrlToMirrors (in tryAnotherMirror) should be fine with this updated semantics: worst case we don't delete an inactive mirror from the list, but this doesn't matter since the only use of the mirrors list in tryAnotherMirror is passing it to transformURL, which is designed to deal with inactive mirrors anyway. What do you think?

#62 Updated by u about 2 years ago

intrigeri wrote:

I've pushed some trivial improvements to your master branch of DAVE (as I have no repo of mine, sorry!). I've reviewed all code changes and they do look good. I've not tested anything though, but my understanding is that you have tested all cases that matter (direct download without DAVE in FF and Chromium, download with DAVE) already, so there's nothing left for me to test, right?

Correct, everything was tested.

The only concern I have left (aside of the RE thing discussed above) is that it seems to me that this code:

[...]

… implicitly relies on the assumption that blob.mirrors only contains active (weight > 0) mirrors, which is not the case anymore. So if for some reason curDownload.source.url matches a disabled mirror's URL prefix, then we're going to try and continue the download using that disabled mirror, which is likely to fail (we usually disable broken mirrors). IMO compareUrlToMirrors's semantics should be adjusted to only return a true value if the tested URL matches an active mirror's URL prefix. AFAICT the other use case of compareUrlToMirrors (in tryAnotherMirror) should be fine with this updated semantics: worst case we don't delete an inactive mirror from the list, but this doesn't matter since the only use of the mirrors list in tryAnotherMirror is passing it to transformURL, which is designed to deal with inactive mirrors anyway. What do you think?

I agree with this reasoning.

#63 Updated by u about 2 years ago

  • Assignee changed from u to intrigeri
  • QA Check changed from Dev Needed to Ready for QA

I've reupdated that function now. If you're available to do a last check that this fix is fine, that would be cool.

#64 Updated by intrigeri about 2 years ago

  • Assignee changed from intrigeri to ma1

I've reupdated that function now. If you're available to do a last check that this fix is fine, that would be cool.

Looks good (untested, I assume you did and Giorgio will too anyway).

I'm reassigning to Giorgio who should, I guess, do the last review and then release.

#65 Updated by sajolida about 2 years ago

  • Status changed from In Progress to Resolved
  • Assignee deleted (ma1)
  • QA Check deleted (Ready for QA)

The release was done so I'm closing this.

#66 Updated by intrigeri about 2 years ago

Hi Giorgio! Thanks for manually importing u's work via https://git-tails.immerda.ch/ma1/download-and-verify-extension/commit/?id=f024097253ead33b1a2d8397829307dff08def79, but it's problematic to do it this way (in an unrelated new commit), since Git still believes that her work has not been merged. So I've merged (as in git merge) u's branch into yours, and pushed to your repo, and now both your master branch and u's are in the exact same state (wrt. both the content of the repo, and the history thereof).

Also available in: Atom PDF