Closed
Bug 983948
Opened 10 years ago
Closed 10 years ago
Linux debug mochitest-browser-chrome tests are taking much longer to complete & frequently timing out after excessive GCing
Categories
(DevTools :: General, defect)
Tracking
(Not tracked)
RESOLVED
WONTFIX
People
(Reporter: KWierso, Assigned: miker)
References
(Blocks 2 open bugs)
Details
(Keywords: intermittent-failure)
Attachments
(2 files)
Linux debug bc runs have been failing frequently (but not constantly) all day today. We've been starring most of them as intermittent and retriggering them, but we should probably figure out why they're failing. Not sure why this is only affecting Linux debug and not Linux64 debug. The earliest instance I see on mozilla-inbound is https://tbpl.mozilla.org/php/getParsedLog.php?id=36145293&tree=Mozilla-Inbound from Tomcat's merge from m-c back to inbound. I haven't tracked down where this actually started (and I'm retriggering the pushes prior to the merge to hopefully make sure it did come from the merge). https://tbpl.mozilla.org/php/getParsedLog.php?id=36145293&tree=Mozilla-Inbound https://tbpl.mozilla.org/php/getParsedLog.php?id=36150556&tree=Mozilla-Inbound https://tbpl.mozilla.org/php/getParsedLog.php?id=36153768&tree=Mozilla-Inbound https://tbpl.mozilla.org/php/getParsedLog.php?id=36157278&tree=Mozilla-Inbound https://tbpl.mozilla.org/php/getParsedLog.php?id=36164105&tree=Mozilla-Inbound https://tbpl.mozilla.org/php/getParsedLog.php?id=36168057&tree=Mozilla-Inbound https://tbpl.mozilla.org/php/getParsedLog.php?id=36169082&tree=Mozilla-Inbound https://tbpl.mozilla.org/php/getParsedLog.php?id=36178922&tree=Mozilla-Inbound https://tbpl.mozilla.org/php/getParsedLog.php?id=36167205&tree=Mozilla-Inbound https://tbpl.mozilla.org/php/getParsedLog.php?id=36177299&tree=Mozilla-Inbound https://tbpl.mozilla.org/php/getParsedLog.php?id=36181546&tree=Mozilla-Inbound https://tbpl.mozilla.org/php/getParsedLog.php?id=36171967&tree=Mozilla-Inbound https://tbpl.mozilla.org/php/getParsedLog.php?id=36174847&tree=Mozilla-Inbound https://tbpl.mozilla.org/php/getParsedLog.php?id=36175835&tree=Mozilla-Inbound https://tbpl.mozilla.org/php/getParsedLog.php?id=36177256&tree=Mozilla-Inbound https://tbpl.mozilla.org/php/getParsedLog.php?id=36186007&tree=Mozilla-Inbound https://tbpl.mozilla.org/php/getParsedLog.php?id=36180387&tree=Mozilla-Inbound https://tbpl.mozilla.org/php/getParsedLog.php?id=36176803&tree=Mozilla-Inbound https://tbpl.mozilla.org/php/getParsedLog.php?id=36183463&tree=Mozilla-Inbound https://tbpl.mozilla.org/php/getParsedLog.php?id=36185567&tree=Mozilla-Inbound https://tbpl.mozilla.org/php/getParsedLog.php?id=36186956&tree=Mozilla-Inbound
Reporter | ||
Comment 1•10 years ago
|
||
They all seem to be failing during or soon after the tests in content/browser/toolkit/mozapps/extensions/test/xpinstall/ run.
Reporter | ||
Comment 2•10 years ago
|
||
Odd, it started happening with about the same frequency on fx-team right after that same merge from m-c, but I'm not seeing it happen on m-c itself... Retriggering some stuff there.
Comment 3•10 years ago
|
||
Bug 772484 was a recent landing that touched those tests, FWIW.
Comment 4•10 years ago
|
||
Those tests are just the victim of being the last tests in the suite: if someone takes it from being 7 minutes away from maxtime to being 1 minute over maxtime, then those are what will be running at maxtime. You don't actually need to retrigger, so much as just look at how long the (approximately) passing runs took. Thursday morning on fx-team they were taking around 157/158 minutes, or put more accurately around 9 or 10 minutes less than the suite's maxtime. Bug 663778 landed, and took it to 30 seconds under, or 30 seconds over, maxtime. Top Men are working on figuring out what we can do in the meantime, and what we can do to speed those tests back up, but we can live with it until bug 983941 catches a reconfig on Monday and we get our new pair of pants with a 68 inch waist.
Blocks: 663778
Updated•10 years ago
|
Component: Add-ons Manager → Developer Tools
Product: Toolkit → Firefox
Version: 28 Branch → Trunk
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment 6•10 years ago
|
||
And lest you think the move to AWS slaves and running browser-chrome in three chunks will save you, https://tbpl.mozilla.org/php/getParsedLog.php?id=36195513&tree=Mozilla-Inbound is one of several where your chunk has exceeded its 4800 second share.
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Assignee | ||
Comment 70•10 years ago
|
||
Disabling the highlighter and comparing reveals that debugging on OSX shows an average slowdown of only 0.03 seconds for tests that use the highlighter. Looks like I will need to look into this on a Linux box... wondering if Linux SVG performance could be the culprit here.
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Assignee | ||
Comment 75•10 years ago
|
||
Interestingly, the worst affected test appears to be browser/toolkit/mozapps/extensions/test/browser/test-window/browser_bug608316.js, which shows an increase of 963%. In fact, 5 out of the worst 10 offenders are test-window tests all showing at least 500% slowdown although this could be down to server load. In bc runs where we reach the global timeout the DevTools tests are taking around 2.7 minutes longer to run and the rest are taking 5.36 minutes longer. To put this in context our tests are reaching the global timeout of 2.75 hours. I suspect that these small increases on a busy server are simply the straw that broke the camels back. Of course, some tests are timing out so these should be split up and we can optimimize the highlighter, look at the worst offending tests etc. so I will continue working on this.
Assignee | ||
Comment 76•10 years ago
|
||
(In reply to Michael Ratcliffe [:miker] [:mratcliffe] from comment #75) > Created attachment 8392804 [details] > Slow Test Analysis.ods > > Interestingly, the worst affected test appears to be > browser/toolkit/mozapps/extensions/test/browser/test-window/ > browser_bug608316.js, which shows an increase of 963%. In fact, 5 out of the > worst 10 offenders are test-window tests all showing at least 500% slowdown > although this could be down to server load. > Turns out it is just GC cleaning up old docshells and windows. I think the best solution since the camels back is breaking would be to create a mochitest-dt testsuite and move the devtools tests into there... would anybody be opposed to us doing this?
Comment 77•10 years ago
|
||
Are we sure we haven't regressed GC? eg There are a bunch of new linux browser-chrome intermittents similar to bug 984843, which show timeouts during excessive GCing.
Assignee | ||
Comment 78•10 years ago
|
||
(In reply to Ed Morley [:edmorley UTC+0] from comment #77) > Are we sure we haven't regressed GC? > > eg There are a bunch of new linux browser-chrome intermittents similar to > bug 984843, which show timeouts during excessive GCing. You are right... GC does appear to be far more aggressive even when running single tests. This wouldn't be related to the devtools changes but we should take a look.
Comment 79•10 years ago
|
||
Example newly filed linux browser-chrome timeout intermittents that seem to be caused by GC: Bug 984843 Bug 984854 Bug 984856 Bug 985348 Bug 984666 Bug 984661 Bug 984659 Bug 984657 Bug 951477 Bug 985239 Bug 984851
Updated•10 years ago
|
Summary: Intermittent Linux debug m-bc runs failing with | command timed out: 9900 seconds elapsed, attempting to kill | During or after xpinstall tests → Linux debug mochitest-browser-chrome tests are taking much longer to complete & frequently timing out after excessive GCing
Updated•10 years ago
|
Severity: normal → critical
Comment 80•10 years ago
|
||
One thing to note is that the work to force extra GCs that happened in bug 938945 doesn't run in BC, at least in my local testing. If we run the GC more often, they will hopefully take less time. I have a slightly more sophisticated approach to this in bug 949607.
Comment 81•10 years ago
|
||
Pushed https://hg.mozilla.org/mozilla-central/rev/51505a8f2a44 to temporarily increase the default timeout from 30 seconds to 45 seconds, which I expect will take this from grotesque (I was surprised that my try push only got one green out of five on linux32 debug, but when I counted back on mozilla-inbound we'd only gotten one green out of the last thirty runs) to merely awful. I think Critical severely underestimates the severity of this bug: I pushed that because we were seriously talking about doing one of closing all trunk trees, or hiding browser-chrome, or disabling wide swathes of tests, on Friday.
Comment 82•10 years ago
|
||
(In reply to Phil Ringnalda (:philor) from comment #81) > I think Critical severely underestimates the severity of this bug: I pushed > that because we were seriously talking about doing one of closing all trunk > trees, or hiding browser-chrome, or disabling wide swathes of tests, on > Friday. Those last two ("disable/hide lots of tests") aren't really an option - if the tree should be closed the tree should be closed.
Comment 83•10 years ago
|
||
What comment 76 describes is going to happen if the GC isn't getting enough of a chance to run. When it finally does manage to trigger, then whatever it triggers during will take a really long time because there's so much to clean up, but it will probably cause whatever test it is running during to time out. Doing something like bug 938945, but during BC, should help. I'm not sure what the right place would be to trigger occasional GCs in the BC test harness is, though.
Comment 84•10 years ago
|
||
60 seconds, https://tbpl.mozilla.org/?tree=Try&rev=d7b7a16a08de, doesn't smell significantly better (as I suspected from the number of timeout-threshold failures that were finishing between 30 and 40 seconds).
Assignee | ||
Comment 85•10 years ago
|
||
I took another look at this and GC is cleaning windows and docshells from previous tests causing other tests to timeout. Would it not make sense to force GC when all windows are closed between each test? This way the GC wouldn't cause test timeouts.
Comment 86•10 years ago
|
||
Yes, you basically want to do something like this: https://hg.mozilla.org/mozilla-central/rev/c3d8ec0e53c7 eg every X window closes, call DOMWindowUtils.garbageCollect(), which will trigger the GC and CC. I don't know what the proper place is for BC.
Assignee | ||
Updated•10 years ago
|
Assignee: nobody → mratcliffe
Assignee | ||
Comment 87•10 years ago
|
||
GCing between every 2 or 3 tests is not enough during tests that open lots of windows as some of them tend to get GCd during the next test causing timeouts. This ridiculously simple patch CCs & GCs between tests so that GC is less likely to eat into actual test time. I am not sure why tab was sometimes not defined in tabbrowser.xml but a added the check there as a "Tab is undefined" error occurred with this patch and didn't without it. https://tbpl.mozilla.org/?tree=Try&rev=2c82f57975dd
Attachment #8398521 -
Flags: review?(rcampbell)
Comment 88•10 years ago
|
||
Comment on attachment 8398521 [details] [diff] [review] GC between each test Review of attachment 8398521 [details] [diff] [review]: ----------------------------------------------------------------- tabbrowser thing is surprising. Might want a browser peer to take a look.
Attachment #8398521 -
Flags: review?(rcampbell)
Attachment #8398521 -
Flags: review?(gijskruitbosch+bugs)
Attachment #8398521 -
Flags: review+
Comment 89•10 years ago
|
||
Comment on attachment 8398521 [details] [diff] [review] GC between each test Review of attachment 8398521 [details] [diff] [review]: ----------------------------------------------------------------- rs=me to get our trees in a happier place, but please file a followup bug so we can investigate and get stacks when there's no tab. We need to figure out how that's even being called with a null tooltip node, only because we're suddenly GC'ing more...
Attachment #8398521 -
Flags: review?(gijskruitbosch+bugs) → review+
Comment 90•10 years ago
|
||
Running this on cedar to see how it plays with mochitest-dt and the chunked browser-chrome suites. https://tbpl.mozilla.org/?tree=Cedar&rev=8c337a10e8d3
Assignee | ||
Comment 91•10 years ago
|
||
(In reply to :Gijs Kruitbosch (no internet 29 Mar - 6 Apr) from comment #89) > Comment on attachment 8398521 [details] [diff] [review] > GC between each test > > Review of attachment 8398521 [details] [diff] [review]: > ----------------------------------------------------------------- > > rs=me to get our trees in a happier place, but please file a followup bug so > we can investigate and get stacks when there's no tab. We need to figure out > how that's even being called with a null tooltip node, only because we're > suddenly GC'ing more... Bug 989362 logged.
Comment 92•10 years ago
|
||
https://tbpl.mozilla.org/php/getParsedLog.php?id=37062973&tree=Mozilla-Inbound
Comment 93•10 years ago
|
||
https://tbpl.mozilla.org/php/getParsedLog.php?id=37119116&tree=Mozilla-Inbound
Comment 94•10 years ago
|
||
Do we know why we suddenly need to gc more? Making us gc between each test is just wallpapering over the issue unless the increase in need to gc is expected?
Comment 95•10 years ago
|
||
Bug 991311 could be related, I guess.
Comment 96•10 years ago
|
||
Comment on attachment 8398521 [details] [diff] [review] GC between each test >diff --git a/browser/base/content/tabbrowser.xml b/browser/base/content/tabbrowser.xml > <method name="createTooltip"> > var tab = document.tooltipNode; >- if (tab.localName != "tab") { >+ if (!tab || tab.localName != "tab") { Perhaps worth asking one of the Neil's about this, it sounds related to http://hg.mozilla.org/mozilla-central/annotate/5fa70bd90a8b/layout/xul/nsXULPopupManager.cpp#l1429. (createTooltip is called from a tab tooltip's popupshowing handler)
Comment 97•10 years ago
|
||
Oops, I'll comment in bug 989362.
Assignee | ||
Comment 98•10 years ago
|
||
(In reply to Ed Morley [:edmorley UTC+0] from comment #94) > Do we know why we suddenly need to gc more? > > Making us gc between each test is just wallpapering over the issue unless > the increase in need to gc is expected? This is nothing new for devtools tests. The increase is probably because we now set dom.send_after_paint_to_content to false for devtools mochitests. We discovered that by doing this leak detection and GC were improved, I suppose because we are no longer flooded by mozafterpaint events. The down side is that GC is more likely to kick in part way through a test.
Comment 99•10 years ago
|
||
Ah I see, that's helpful thank you.
Comment 100•10 years ago
|
||
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #90) > Running this on cedar to see how it plays with mochitest-dt and the chunked > browser-chrome suites. > https://tbpl.mozilla.org/?tree=Cedar&rev=8c337a10e8d3 We ended up reverting this on Cedar and didn't notice any major differences in test failures. So let's drop this to save some runtime.
Comment 101•10 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/52da0fc935ce
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 31
Comment 102•10 years ago
|
||
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #101) > https://hg.mozilla.org/mozilla-central/rev/52da0fc935ce This was meant for bug 984930; ignore.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 103•10 years ago
|
||
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #100) > (In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #90) > > Running this on cedar to see how it plays with mochitest-dt and the chunked > > browser-chrome suites. > > https://tbpl.mozilla.org/?tree=Cedar&rev=8c337a10e8d3 > > We ended up reverting this on Cedar and didn't notice any major differences > in test failures. So let's drop this to save some runtime. So I'm sure I'm understanding; you're suggesting wontfixing this yeah?
Target Milestone: Firefox 31 → ---
Comment 104•10 years ago
|
||
Yeah, I'm not sure if this bug has anything actionable to be done at this point.
Updated•10 years ago
|
Status: REOPENED → RESOLVED
Closed: 10 years ago → 10 years ago
Resolution: --- → WONTFIX
Updated•6 years ago
|
Product: Firefox → DevTools
You need to log in
before you can comment on or make changes to this bug.
Description
•