From 26eb4cd6c7c7793ee76c73b66621f63daff51953 Mon Sep 17 00:00:00 2001 From: Chris Wilson Date: Wed, 20 Jun 2018 14:59:29 +0100 Subject: drm/i915: Disable bh around call to tasklet MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The guc submission backends expects to only be run from (at least) softirq context, but during our intel_engine_is_idle() check we would call into the tasklet to make sure it was flushed. As this could occur from process context, occasionally we would be caught out using a wait_for_atomic() not from an atomic context: [ 59.939091] WARN_ON_ONCE((1) && !(preempt_count() != 0)) [ 59.939142] WARNING: CPU: 1 PID: 2901 at drivers/gpu/drm/i915/intel_guc_submission.c:615 guc_submission_tasklet+0x784/0xa90 [i915] [ 59.939143] Modules linked in: vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic i915 x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul snd_hda_intel crc32_pclmul snd_hda_codec ghash_clmulni_intel snd_hwdep snd_hda_core e1000e snd_pcm mei_me mei prime_numbers [ 59.939164] CPU: 1 PID: 2901 Comm: gem_exec_schedu Tainted: G U W 4.18.0-rc1-g93475d62c730-drmtip_67+ #1 [ 59.939165] Hardware name: System manufacturer System Product Name/Z170M-PLUS, BIOS 3610 03/29/2018 [ 59.939188] RIP: 0010:guc_submission_tasklet+0x784/0xa90 [i915] [ 59.939189] Code: fc ff ff 80 3d 2f 87 11 00 00 0f 85 80 fb ff ff 48 c7 c6 f8 49 40 c0 48 c7 c7 80 41 3e c0 c6 05 14 87 11 00 01 e8 2c ea d6 d3 <0f> 0b e9 5f fb ff ff 8b 46 38 89 cf 31 c7 83 e7 c0 75 08 39 c1 0f [ 59.939253] RSP: 0018:ffffaafe08a03c10 EFLAGS: 00010286 [ 59.939255] RAX: 0000000000000000 RBX: ffff8f9112c246f0 RCX: 0000000000000001 [ 59.939256] RDX: 0000000080000001 RSI: ffffffff95086d8e RDI: 00000000ffffffff [ 59.939257] RBP: ffff8f9112c24680 R08: 000000009517be77 R09: 0000000000000000 [ 59.939258] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8f9112c24700 [ 59.939259] R13: ffff8f9112c24700 R14: 0000000000000000 R15: ffff8f9112c242a8 [ 59.939260] FS: 00007fc2cc7e5980(0000) GS:ffff8f9136c40000(0000) knlGS:0000000000000000 [ 59.939261] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 59.939262] CR2: 00007fc2cc815040 CR3: 000000021f10e003 CR4: 00000000003606e0 [ 59.939263] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 59.939264] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 59.939265] Call Trace: [ 59.939288] ? intel_engine_is_idle+0x64/0x160 [i915] [ 59.939323] ? intel_engine_dump+0x638/0x890 [i915] [ 59.939327] ? seq_printf+0x49/0x70 [ 59.939353] ? i915_engine_info+0xc8/0x100 [i915] [ 59.939356] ? drm_get_color_range_name+0x20/0x20 [ 59.939361] ? seq_read+0xf1/0x470 [ 59.939365] ? trace_hardirqs_on_caller+0xe0/0x1b0 [ 59.939370] ? full_proxy_read+0x51/0x80 [ 59.939389] ? __vfs_read+0x31/0x170 [ 59.939395] ? do_sys_open+0x13b/0x240 [ 59.939398] ? rcu_read_lock_sched_held+0x6f/0x80 [ 59.939401] ? vfs_read+0x9e/0x140 [ 59.939404] ? ksys_read+0x50/0xc0 [ 59.939409] ? do_syscall_64+0x55/0x190 [ 59.939412] ? entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 59.939420] irq event stamp: 552834 [ 59.939422] hardirqs last enabled at (552833): [] console_unlock+0x3fc/0x600 [ 59.939425] hardirqs last disabled at (552834): [] error_entry+0x7c/0x100 [ 59.939451] softirqs last enabled at (552614): [] i915_request_add+0x2e3/0x7b0 [i915] [ 59.939470] softirqs last disabled at (552604): [] i915_request_add+0x25b/0x7b0 [i915] Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106977 Fixes: dd0cf235d81f ("drm/i915: Speed up idle detection by kicking the tasklets") Signed-off-by: Chris Wilson Cc: Tvrtko Ursulin Cc: Mika Kuoppala Cc: Michal Wajdeczko Cc: MichaƂ Winiarski Cc: Michel Thierry Reviewed-by: Michel Thierry Link: https://patchwork.freedesktop.org/patch/msgid/20180620135929.23956-1-chris@chris-wilson.co.uk --- drivers/gpu/drm/i915/intel_engine_cs.c | 2 ++ 1 file changed, 2 insertions(+) (limited to 'drivers/gpu/drm/i915/intel_engine_cs.c') diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c index 32bf3a408d46..d3264bd6e9dc 100644 --- a/drivers/gpu/drm/i915/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/intel_engine_cs.c @@ -1000,10 +1000,12 @@ bool intel_engine_is_idle(struct intel_engine_cs *engine) if (READ_ONCE(engine->execlists.active)) { struct intel_engine_execlists *execlists = &engine->execlists; + local_bh_disable(); if (tasklet_trylock(&execlists->tasklet)) { execlists->tasklet.func(execlists->tasklet.data); tasklet_unlock(&execlists->tasklet); } + local_bh_enable(); if (READ_ONCE(execlists->active)) return false; -- cgit v1.2.3 From bc4237ec8deaaee5f75d1afa91a19bfe6f948c6f Mon Sep 17 00:00:00 2001 From: Chris Wilson Date: Thu, 28 Jun 2018 21:12:07 +0100 Subject: drm/i915/execlists: Unify CSB access pointers Following the removal of the last workarounds, the only CSB mmio access is for the old vGPU interface. The mmio registers presented by vGPU do not require forcewake and can be treated as ordinary volatile memory, i.e. they behave just like the HWSP access just at a different location. We can reduce the CSB access to a set of read/write/buffer pointers and treat the various paths identically and not worry about forcewake. (Forcewake is nightmare for worstcase latency, and we want to process this all with irqsoff -- no latency allowed!) v2: Comments, comments, comments. Well, 2 bonus comments. Signed-off-by: Chris Wilson Cc: Tvrtko Ursulin Reviewed-by: Tvrtko Ursulin Link: https://patchwork.freedesktop.org/patch/msgid/20180628201211.13837-5-chris@chris-wilson.co.uk --- drivers/gpu/drm/i915/intel_engine_cs.c | 12 ------------ 1 file changed, 12 deletions(-) (limited to 'drivers/gpu/drm/i915/intel_engine_cs.c') diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c index d3264bd6e9dc..7209c22798e6 100644 --- a/drivers/gpu/drm/i915/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/intel_engine_cs.c @@ -25,7 +25,6 @@ #include #include "i915_drv.h" -#include "i915_vgpu.h" #include "intel_ringbuffer.h" #include "intel_lrc.h" @@ -456,21 +455,10 @@ static void intel_engine_init_batch_pool(struct intel_engine_cs *engine) i915_gem_batch_pool_init(&engine->batch_pool, engine); } -static bool csb_force_mmio(struct drm_i915_private *i915) -{ - /* Older GVT emulation depends upon intercepting CSB mmio */ - if (intel_vgpu_active(i915) && !intel_vgpu_has_hwsp_emulation(i915)) - return true; - - return false; -} - static void intel_engine_init_execlist(struct intel_engine_cs *engine) { struct intel_engine_execlists * const execlists = &engine->execlists; - execlists->csb_use_mmio = csb_force_mmio(engine->i915); - execlists->port_mask = 1; BUILD_BUG_ON_NOT_POWER_OF_2(execlists_num_ports(execlists)); GEM_BUG_ON(execlists_num_ports(execlists) > EXECLIST_MAX_PORTS); -- cgit v1.2.3 From fd8526e509020ed30298ab57d03edc97bef83962 Mon Sep 17 00:00:00 2001 From: Chris Wilson Date: Thu, 28 Jun 2018 21:12:10 +0100 Subject: drm/i915/execlists: Trust the CSB Now that we use the CSB stored in the CPU friendly HWSP, we do not need to track interrupts for when the mmio CSB registers are valid and can just check where we read up to last from the cached HWSP. This means we can forgo the atomic bit tracking from interrupt, and in the next patch it means we can check the CSB at any time. v2: Change the splitting inside reset_prepare, we only want to lose testing the interrupt in this patch, the next patch requires the change in locking Signed-off-by: Chris Wilson Cc: Tvrtko Ursulin Reviewed-by: Tvrtko Ursulin Link: https://patchwork.freedesktop.org/patch/msgid/20180628201211.13837-8-chris@chris-wilson.co.uk --- drivers/gpu/drm/i915/intel_engine_cs.c | 8 ++------ 1 file changed, 2 insertions(+), 6 deletions(-) (limited to 'drivers/gpu/drm/i915/intel_engine_cs.c') diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c index 7209c22798e6..ace93958689e 100644 --- a/drivers/gpu/drm/i915/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/intel_engine_cs.c @@ -1353,12 +1353,10 @@ static void intel_engine_print_registers(const struct intel_engine_cs *engine, ptr = I915_READ(RING_CONTEXT_STATUS_PTR(engine)); read = GEN8_CSB_READ_PTR(ptr); write = GEN8_CSB_WRITE_PTR(ptr); - drm_printf(m, "\tExeclist CSB read %d [%d cached], write %d [%d from hws], interrupt posted? %s, tasklet queued? %s (%s)\n", + drm_printf(m, "\tExeclist CSB read %d [%d cached], write %d [%d from hws], tasklet queued? %s (%s)\n", read, execlists->csb_head, write, intel_read_status_page(engine, intel_hws_csb_write_index(engine->i915)), - yesno(test_bit(ENGINE_IRQ_EXECLIST, - &engine->irq_posted)), yesno(test_bit(TASKLET_STATE_SCHED, &engine->execlists.tasklet.state)), enableddisabled(!atomic_read(&engine->execlists.tasklet.count))); @@ -1570,11 +1568,9 @@ void intel_engine_dump(struct intel_engine_cs *engine, spin_unlock(&b->rb_lock); local_irq_restore(flags); - drm_printf(m, "IRQ? 0x%lx (breadcrumbs? %s) (execlists? %s)\n", + drm_printf(m, "IRQ? 0x%lx (breadcrumbs? %s)\n", engine->irq_posted, yesno(test_bit(ENGINE_IRQ_BREADCRUMB, - &engine->irq_posted)), - yesno(test_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted))); drm_printf(m, "HWSP:\n"); -- cgit v1.2.3 From 9512f985c32d45ab439a210981e0d73071da395a Mon Sep 17 00:00:00 2001 From: Chris Wilson Date: Thu, 28 Jun 2018 21:12:11 +0100 Subject: drm/i915/execlists: Direct submission of new requests (avoid tasklet/ksoftirqd) Back in commit 27af5eea54d1 ("drm/i915: Move execlists irq handler to a bottom half"), we came to the conclusion that running our CSB processing and ELSP submission from inside the irq handler was a bad idea. A really bad idea as we could impose nearly 1s latency on other users of the system, on average! Deferring our work to a tasklet allowed us to do the processing with irqs enabled, reducing the impact to an average of about 50us. We have since eradicated the use of forcewaked mmio from inside the CSB processing and ELSP submission, bringing the impact down to around 5us (on Kabylake); an order of magnitude better than our measurements 2 years ago on Broadwell and only about 2x worse on average than the gem_syslatency on an unladen system. In this iteration of the tasklet-vs-direct submission debate, we seek a compromise where by we submit new requests immediately to the HW but defer processing the CS interrupt onto a tasklet. We gain the advantage of low-latency and ksoftirqd avoidance when waking up the HW, while avoiding the system-wide starvation of our CS irq-storms. Comparing the impact on the maximum latency observed (that is the time stolen from an RT process) over a 120s interval, repeated several times (using gem_syslatency, similar to RT's cyclictest) while the system is fully laden with i915 nops, we see that direct submission an actually improve the worse case. Maximum latency in microseconds of a third party RT thread (gem_syslatency -t 120 -f 2) x Always using tasklets (a couple of >1000us outliers removed) + Only using tasklets from CS irq, direct submission of requests +------------------------------------------------------------------------+ | + | | + | | + | | + + | | + + + | | + + + + x x x | | +++ + + + x x x x x x | | +++ + ++ + + *x x x x x x | | +++ + ++ + * *x x * x x x | | + +++ + ++ * * +*xxx * x x xx | | * +++ + ++++* *x+**xx+ * x x xxxx x | | **x++++*++**+*x*x****x+ * +x xx xxxx x x | |x* ******+***************++*+***xxxxxx* xx*x xxx + x+| | |__________MA___________| | | |______M__A________| | +------------------------------------------------------------------------+ N Min Max Median Avg Stddev x 118 91 186 124 125.28814 16.279137 + 120 92 187 109 112.00833 13.458617 Difference at 95.0% confidence -13.2798 +/- 3.79219 -10.5994% +/- 3.02677% (Student's t, pooled s = 14.9237) However the mean latency is adversely affected: Mean latency in microseconds of a third party RT thread (gem_syslatency -t 120 -f 1) x Always using tasklets + Only using tasklets from CS irq, direct submission of requests +------------------------------------------------------------------------+ | xxxxxx + ++ | | xxxxxx + ++ | | xxxxxx + +++ ++ | | xxxxxxx +++++ ++ | | xxxxxxx +++++ ++ | | xxxxxxx +++++ +++ | | xxxxxxx + ++++++++++ | | xxxxxxxx ++ ++++++++++ | | xxxxxxxx ++ ++++++++++ | | xxxxxxxxxx +++++++++++++++ | | xxxxxxxxxxx x +++++++++++++++ | |x xxxxxxxxxxxxx x + + ++++++++++++++++++ +| | |__A__| | | |____A___| | +------------------------------------------------------------------------+ N Min Max Median Avg Stddev x 120 3.506 3.727 3.631 3.6321417 0.02773109 + 120 3.834 4.149 4.039 4.0375167 0.041221676 Difference at 95.0% confidence 0.405375 +/- 0.00888913 11.1608% +/- 0.244735% (Student's t, pooled s = 0.03513) However, since the mean latency corresponds to the amount of irqsoff processing we have to do for a CS interrupt, we only need to speed that up to benefit not just system latency but our own throughput. v2: Remember to defer submissions when under reset. v4: Only use direct submission for new requests v5: Be aware that with mixing direct tasklet evaluation and deferred tasklets, we may end up idling before running the deferred tasklet. v6: Remove the redudant likely() from tasklet_is_enabled(), restrict the annotation to reset_in_progress(). v7: Take the full timeline.lock when enabling perf_pmu stats as the tasklet is no longer a valid guard. A consequence is that the stats are now only valid for engines also using the timeline.lock to process state. Testcase: igt/gem_exec_latency/*rthog* References: 27af5eea54d1 ("drm/i915: Move execlists irq handler to a bottom half") Suggested-by: Tvrtko Ursulin Signed-off-by: Chris Wilson Cc: Tvrtko Ursulin Reviewed-by: Tvrtko Ursulin Link: https://patchwork.freedesktop.org/patch/msgid/20180628201211.13837-9-chris@chris-wilson.co.uk --- drivers/gpu/drm/i915/intel_engine_cs.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) (limited to 'drivers/gpu/drm/i915/intel_engine_cs.c') diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c index ace93958689e..01862884a436 100644 --- a/drivers/gpu/drm/i915/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/intel_engine_cs.c @@ -1619,8 +1619,8 @@ int intel_enable_engine_stats(struct intel_engine_cs *engine) if (!intel_engine_supports_stats(engine)) return -ENODEV; - tasklet_disable(&execlists->tasklet); - write_seqlock_irqsave(&engine->stats.lock, flags); + spin_lock_irqsave(&engine->timeline.lock, flags); + write_seqlock(&engine->stats.lock); if (unlikely(engine->stats.enabled == ~0)) { err = -EBUSY; @@ -1644,8 +1644,8 @@ int intel_enable_engine_stats(struct intel_engine_cs *engine) } unlock: - write_sequnlock_irqrestore(&engine->stats.lock, flags); - tasklet_enable(&execlists->tasklet); + write_sequnlock(&engine->stats.lock); + spin_unlock_irqrestore(&engine->timeline.lock, flags); return err; } -- cgit v1.2.3 From f0d759f038dcced7dfb756dc54c224f09ad062a2 Mon Sep 17 00:00:00 2001 From: "Gustavo A. R. Silva" Date: Thu, 28 Jun 2018 17:35:41 -0500 Subject: drm/i915: Mark expected switch fall-throughs In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. Addresses-Coverity-ID: 141432 Addresses-Coverity-ID: 141433 Addresses-Coverity-ID: 141434 Addresses-Coverity-ID: 141435 Addresses-Coverity-ID: 141436 Addresses-Coverity-ID: 1357360 Addresses-Coverity-ID: 1357403 Addresses-Coverity-ID: 1357433 Addresses-Coverity-ID: 1392622 Addresses-Coverity-ID: 1415273 Addresses-Coverity-ID: 1435752 Addresses-Coverity-ID: 1441500 Addresses-Coverity-ID: 1454596 Signed-off-by: Gustavo A. R. Silva Signed-off-by: Jani Nikula Link: https://patchwork.freedesktop.org/patch/msgid/20180628223541.GA17665@embeddedor.com --- drivers/gpu/drm/i915/intel_engine_cs.c | 1 + 1 file changed, 1 insertion(+) (limited to 'drivers/gpu/drm/i915/intel_engine_cs.c') diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c index 01862884a436..478c928912c4 100644 --- a/drivers/gpu/drm/i915/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/intel_engine_cs.c @@ -229,6 +229,7 @@ __intel_engine_context_size(struct drm_i915_private *dev_priv, u8 class) break; default: MISSING_CASE(class); + /* fall through */ case VIDEO_DECODE_CLASS: case VIDEO_ENHANCEMENT_CLASS: case COPY_ENGINE_CLASS: -- cgit v1.2.3 From 481827b441674b7f1d030c223decdb56266ff398 Mon Sep 17 00:00:00 2001 From: Chris Wilson Date: Fri, 6 Jul 2018 11:14:41 +0100 Subject: drm/i915: Record logical context support in driver caps Avoid looking at the magical engines[RCS] to decide if the HW and driver supports logical contexts, and instead record that knowledge during initialisation. Signed-off-by: Chris Wilson Cc: Matthew Auld Reviewed-by: Matthew Auld Link: https://patchwork.freedesktop.org/patch/msgid/20180706101442.21279-1-chris@chris-wilson.co.uk --- drivers/gpu/drm/i915/intel_engine_cs.c | 2 ++ 1 file changed, 2 insertions(+) (limited to 'drivers/gpu/drm/i915/intel_engine_cs.c') diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c index 478c928912c4..e2f562853aee 100644 --- a/drivers/gpu/drm/i915/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/intel_engine_cs.c @@ -302,6 +302,8 @@ intel_engine_setup(struct drm_i915_private *dev_priv, engine->class); if (WARN_ON(engine->context_size > BIT(20))) engine->context_size = 0; + if (engine->context_size) + DRIVER_CAPS(dev_priv)->has_logical_contexts = true; /* Nothing to do here, execute in order of dependencies */ engine->schedule = NULL; -- cgit v1.2.3 From 890fd185d53037d05d10a0825950c4b038e39d4a Mon Sep 17 00:00:00 2001 From: Chris Wilson Date: Fri, 6 Jul 2018 22:07:10 +0100 Subject: drm/i915: Replace nested subclassing with explicit subclasses In the next patch, we will want a third distinct class of timeline that may overlap with the current pair of client and engine timeline classes. Rather than use the ad hoc markup of SINGLE_DEPTH_NESTING, initialise the different timeline classes with an explicit subclass. Signed-off-by: Chris Wilson Reviewed-by: Tvrtko Ursulin Link: https://patchwork.freedesktop.org/patch/msgid/20180706210710.16251-1-chris@chris-wilson.co.uk --- drivers/gpu/drm/i915/intel_engine_cs.c | 1 + 1 file changed, 1 insertion(+) (limited to 'drivers/gpu/drm/i915/intel_engine_cs.c') diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c index e2f562853aee..0ac497275a51 100644 --- a/drivers/gpu/drm/i915/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/intel_engine_cs.c @@ -483,6 +483,7 @@ static void intel_engine_init_execlist(struct intel_engine_cs *engine) void intel_engine_setup_common(struct intel_engine_cs *engine) { i915_timeline_init(engine->i915, &engine->timeline, engine->name); + lockdep_set_subclass(&engine->timeline.lock, TIMELINE_ENGINE); intel_engine_init_execlist(engine); intel_engine_init_hangcheck(engine); -- cgit v1.2.3