2025-03-03

Contents

1: Problem Description
2: Locating the Cause
3: Root Cause
4: Avoiding the Problem
5: Remaining Symptom
6: Further Changes
7: Summary

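Starting from a certain kernel version, the RK3588 platform has an intermittent problem once the hardware cursor is enabled. In some scenarios the cursor disappears and cannot be recovered without a reboot. From this symptom we can conclude that the root cause lies in the kernel. This post describes the problem and explains how to fix it and how to avoid it.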

1: Problem Description

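On the RK3588 platform, customers often report that the cursor disappears from time to time. It appears to be related to plugging or unplugging the monitor, suspend/resume, and switching the monitor off and on. Once the cursor is gone, there is no way to bring it back short of rebooting the system.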

2: Locating the Cause

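From the problem description:

"Only a reboot recovers it" indicates that the problem is in the kernel. "Plugging/unplugging the monitor, suspend/resume, switching the monitor off and on" indicates that it is related to the VOP (Video Output Processor, the display controller on RK platforms).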

The VOP driver on RK platforms lives mainly in drivers/gpu/drm/rockchip/rockchip_drm_vop2.c, so that is the file to debug.

3: Root Cause

Note this commit:

```
Author: Andy Yan <andy.yan@rock-chips.com>
Date:   Fri Feb 25 20:31:08 2022 +0800

    drm/rockchip: vop2: A workaround for PD_CLUSTER0 off

    The internal PD of VOP2 on rk3588 take effect immediately for power
    up and take effect by vsync for power down. And the PD_CLUSTER0 is a
    parent PD of PD_CLUSTER1/2/3, we may have this use case:
    Cluster0 is attached to VP0 for HDMI output,
    Cluster1 is attached to VP1 for MIPI DSI,
    When we enable Cluster1 on VP1, we should enable PD_CLUSTER0 as it is
    the parent PD, event though HDMI is plugout, VP1 is disabled, the PD
    of Cluster0 should keep power on.

    When system go to suspend:
    (1) Power down PD of Cluster1 before VP1 standby(the power down is
        take effect by vsync)
    (2) Power down PD of Cluster0

    But we have problem at step (2), Cluster0 is attached to VP0. bus VP0
    is in standby mode, as it is never used or hdmi plugout. So there is
    no vsync, the power down will never take effect.

    According to IC designer: We must power down all internal PD of VOP
    before we power down the global PD_VOP.

    So we get this workaround:
    We we found a VP is in standby mode when we want power down a PD is
    attached to it, we release the VP from standby mode, than it will run
    a default timing and generate vsync. Than we can power down the PD by
    this vsync. After all this is done, we standby the VP at last.

    Signed-off-by: Andy Yan <andy.yan@rock-chips.com>
    Change-Id: Ib9be8628f07d783c6bc3b7678c5eebfc63aabe1c
```

This shows that RK once added a workaround to fix a different problem.

Now look at the following functions:

```c
static void vop2_power_domain_get(struct vop2_power_domain *pd)
{
	if (pd->parent)
		vop2_power_domain_get(pd->parent);

	spin_lock(&pd->lock);
	if (pd->ref_count == 0) {
		if (pd->vop2->data->delayed_pd)
			cancel_delayed_work(&pd->power_off_work);
		vop2_power_domain_on(pd);
	}
	pd->ref_count++;
	spin_unlock(&pd->lock);
}

static void vop2_power_domain_put(struct vop2_power_domain *pd)
{
	spin_lock(&pd->lock);

	/*
	 * For a nested power domain(PD_Cluster0 is the parent of PD_CLuster1/2/3)
	 * the parent power domain must be enabled before child power domain
	 * is on.
	 *
	 * So we may met this condition: Cluster0 is not on a activated VP,
	 * but PD_Cluster0 must enabled as one of the child PD_CLUSTER1/2/3 is enabled.
	 * when all child PD is disabled, we want disable the parent
	 * PD(PD_CLUSTER0), but as module CLUSTER0 is not attcthed on a activated VP,
	 * the turn off operation(which is take effect by vsync) will never take effect.
	 * so we will see a "wait pd0 off timeout" log when we turn on PD_CLUSTER0 next time.
	 *
	 * So we have a check here
	 */
	if (--pd->ref_count == 0 && vop2_power_domain_can_off_by_vsync(pd)) {
		if (pd->vop2->data->delayed_pd)
			schedule_delayed_work(&pd->power_off_work,
					      msecs_to_jiffies(2500));
		else
			vop2_power_domain_off(pd);
	}

	spin_unlock(&pd->lock);

	if (pd->parent)
		vop2_power_domain_put(pd->parent);
}
```
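To see how these two functions interact, here is a minimal user-space sketch of the same nested reference-counting pattern. The `struct pd_model` type and its helpers are hypothetical stand-ins, not driver code. Locking is omitted (the driver uses `pd->lock`), and so are the delayed power-off and the `vop2_power_domain_can_off_by_vsync()` check. The `powered` field simply stands in for the hardware state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a power domain; not the driver's real struct. */
struct pd_model {
	struct pd_model *parent; /* e.g. PD_CLUSTER0 for PD_CLUSTER1/2/3 */
	int ref_count;           /* number of active users */
	bool powered;            /* stands in for the hardware power state */
};

static void pd_model_get(struct pd_model *pd)
{
	if (pd->parent)
		pd_model_get(pd->parent); /* parent must be on before the child */
	if (pd->ref_count == 0)
		pd->powered = true;       /* only the first user powers it on */
	pd->ref_count++;
}

static void pd_model_put(struct pd_model *pd)
{
	assert(pd->ref_count > 0);        /* model sanity check only */
	if (--pd->ref_count == 0)
		pd->powered = false;      /* only the last user powers it off */
	if (pd->parent)
		pd_model_put(pd->parent); /* release the parent after the child */
}
```

With Cluster1 modelled as a child of Cluster0, getting Cluster1 also powers Cluster0 on. Cluster0 then stays on for as long as any child still holds a reference.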

Note this comment:

```diff
+	/*
+	 * @lock: protect power up/down procedure.
+	 * power on take effect immediately,
+	 * power down take effect by vsync.
+	 * we must check power_domain_status register
+	 * to make sure the power domain is down before
+	 * send a power on request.
+	 *
+	 */
```

This shows that RK must guarantee a window is powered on only after it has actually finished powering off.
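The asymmetry described in that comment (power-on is immediate, power-off is latched by vsync) can be sketched as a small model. This is illustrative only: `struct pd_hw`, `pd_hw_request_off()`, `pd_hw_vsync()` and `pd_hw_try_on()` are hypothetical names. If the video port is in standby, `pd_hw_vsync()` is never called and the off request never completes. Every later power-on attempt then fails, which corresponds to the "wait pd0 off timeout" case mentioned in the driver comment.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: power-on is immediate, power-off is only a request
 * that takes effect on the next vsync. */
struct pd_hw {
	bool on;          /* what a status register would report */
	bool off_pending; /* off requested, waiting for vsync */
};

static void pd_hw_request_off(struct pd_hw *pd)
{
	pd->off_pending = true;
}

/* Called once per frame, but only while the video port is running. */
static void pd_hw_vsync(struct pd_hw *pd)
{
	if (pd->off_pending) {
		pd->on = false;
		pd->off_pending = false;
	}
}

/* Refuses to power on while an off request is still pending; the driver
 * instead waits on the status register and eventually logs a timeout. */
static bool pd_hw_try_on(struct pd_hw *pd)
{
	if (pd->off_pending)
		return false;
	pd->on = true;
	return true;
}
```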

So pay attention to the reference-count variable it uses:

pd->ref_count

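In other words, RK uses a reference count to track whether a VOP window's power domain is in use. As vop2_power_domain_get() shows, vop2_power_domain_on() is called only when the count is 0. If the count is non-zero, the window is assumed to be on already, so the driver does not power it on again.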

This is where RK's commit "drm/rockchip: vop2: A workaround for PD_CLUSTER0 off" goes wrong.

Note the following statement from the commit message, and the corresponding code:

According to IC designer: We must power down all internal PD of VOP before we power down the global PD_VOP.

The code in vop2_power_domain_off_by_disabled_vp() is as follows:

```c
if (vp) {
	ret = clk_prepare_enable(vp->dclk);
	if (ret < 0)
		DRM_DEV_ERROR(vop2->dev, "failed to enable dclk for video port%d - %d\n",
			      vp->id, ret);
	crtc = &vp->rockchip_crtc.crtc;
	VOP_MODULE_SET(vop2, vp, standby, 0);
	vop2_power_domain_off(pd);
	vop2_cfg_done(crtc);
	vop2_wait_power_domain_off(pd);

	reinit_completion(&vp->dsp_hold_completion);
	vop2_dsp_hold_valid_irq_enable(crtc);
	VOP_MODULE_SET(vop2, vp, standby, 1);
	ret = wait_for_completion_timeout(&vp->dsp_hold_completion,
					  msecs_to_jiffies(50));
	if (!ret)
		DRM_DEV_INFO(vop2->dev, "wait for vp%d dsp_hold timeout\n", vp->id);

	vop2_dsp_hold_valid_irq_disable(crtc);
	clk_disable_unprepare(vp->dclk);
}
```
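The ordering in that snippet can be modelled as four steps. First leave standby so the VP generates vsync, then request the power-off, then wait for it to take effect, and finally return to standby. The types and functions below are hypothetical, and the comments map each step to the driver call it loosely corresponds to. Like the driver code, nothing in this sequence touches a reference count.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the workaround's sequence; names are illustrative. */
struct vp_model {
	bool standby; /* a VP in standby produces no vsync */
};

struct pd_state {
	bool on;
	bool off_pending;
};

/* One frame period: a pending power-off latches only if the VP is running. */
static void frame_tick(const struct vp_model *vp, struct pd_state *pd)
{
	if (!vp->standby && pd->off_pending) {
		pd->on = false;
		pd->off_pending = false;
	}
}

/* Returns true if the power domain went down within max_frames. */
static bool power_off_via_disabled_vp(struct vp_model *vp, struct pd_state *pd,
				      int max_frames)
{
	vp->standby = false;    /* ~ VOP_MODULE_SET(vop2, vp, standby, 0) */
	pd->off_pending = true; /* ~ vop2_power_domain_off() + vop2_cfg_done() */
	for (int i = 0; i < max_frames && pd->off_pending; i++)
		frame_tick(vp, pd); /* ~ vop2_wait_power_domain_off() */
	vp->standby = true;     /* ~ VOP_MODULE_SET(vop2, vp, standby, 1) */
	return !pd->on;
}
```

While the VP stays in standby, `frame_tick()` never completes the request. Once the VP is briefly released from standby, the power-off completes within a frame.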

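This is where the hole is. The workaround patch actively powers off the window's power domain, but it does not reset the reference count to zero when doing so. After a window has been powered off abnormally by this workaround, pd->ref_count is still non-zero. The next vop2_power_domain_get() therefore only increments the count and never calls vop2_power_domain_on(), so the window is very hard to turn back on.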

The fix is as follows:

```diff
diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_vop2.c b/drivers/gpu/drm/rockchip/rockchip_drm_vop2.c
index 996afa881289..e648a23bcc9f 100644
--- a/drivers/gpu/drm/rockchip/rockchip_drm_vop2.c
+++ b/drivers/gpu/drm/rockchip/rockchip_drm_vop2.c
@@ -3454,6 +3454,7 @@ static void vop2_power_domain_off_by_disabled_vp(struct vop2_power_domain *pd)
 		crtc = &vp->rockchip_crtc.crtc;
 		VOP_MODULE_SET(vop2, vp, standby, 0);
 		vop2_power_domain_off(pd);
+		pd->ref_count = 0;
 		vop2_cfg_done(crtc);
 		vop2_wait_power_domain_off(pd);
```
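A small model makes the effect of this one-line fix visible. It is a sketch under simplifying assumptions (no parent domain, no locking, no vsync latency), and `struct pd_rc` and its helpers are hypothetical. `pd_rc_force_off()` plays the role of the workaround path, and its `reset_count` flag switches between the original and the patched behaviour.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; not the driver's real types. */
struct pd_rc {
	int ref_count;
	bool on;
};

/* Same gating as vop2_power_domain_get(): power on only when count is 0. */
static void pd_rc_get(struct pd_rc *pd)
{
	if (pd->ref_count == 0)
		pd->on = true;
	pd->ref_count++;
}

/* The workaround path: powers the domain off behind the counter's back.
 * reset_count = true mimics the patched line "pd->ref_count = 0;". */
static void pd_rc_force_off(struct pd_rc *pd, bool reset_count)
{
	pd->on = false;
	if (reset_count)
		pd->ref_count = 0;
}
```

Without the patch, the count stays at 1 after the forced power-off. The next get only bumps it to 2 and the domain stays off. With the reset, the next get sees 0 and powers the domain back on.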

4: Avoiding the Problem

We now know that the disappearing cursor is tied to an RK workaround. A closer look at that patch reveals the following:

```c
if (pd->data->id == VOP2_PD_CLUSTER0 || pd->data->id == VOP2_PD_CLUSTER1 ||
    pd->data->id == VOP2_PD_CLUSTER2 || pd->data->id == VOP2_PD_CLUSTER3) {
	phys_id = ffs(pd->data->module_id_mask) - 1;
	win = vop2_find_win_by_phys_id(vop2, phys_id);
	vp_id = ffs(win->vp_mask) - 1;
	vp = &vop2->vps[vp_id];
} else {
	DRM_DEV_ERROR(vop2->dev, "unexpected power on pd%d\n",
		      ffs(pd->data->id) - 1);
}
```

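In other words, the workaround only takes effect for the Cluster0/1/2/3 power domains. On the kernel side, it is therefore enough to configure the cursor so that it does not use a Cluster layer, as follows: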

```dts
&vp0 {
	rockchip,plane-mask = <(1 << ROCKCHIP_VOP2_CLUSTER0 | 1 << ROCKCHIP_VOP2_ESMART0)>;
	rockchip,primary-plane = <ROCKCHIP_VOP2_ESMART0>;
	cursor-win-id = <ROCKCHIP_VOP2_ESMART1>;
};
```

Esmart1 is used here as an example; any non-Cluster layer works.
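The sketch below shows why a non-Cluster cursor escapes the workaround. Only Cluster power domains resolve to a video port. For any other domain, the `if` in the patch falls through to the error branch, so `vp` is presumably left NULL (assuming it is initialized that way) and the `if (vp)` body is skipped. The enum, masks and function name are hypothetical. `ffs()` is the POSIX function from `<strings.h>`: it returns the 1-based index of the lowest set bit, or 0 if no bit is set.

```c
#include <assert.h>
#include <strings.h> /* ffs() */

/* Hypothetical IDs; the real VOP2_PD_* values and masks come from the driver. */
enum pd_kind { PD_CLUSTER0, PD_CLUSTER1, PD_CLUSTER2, PD_CLUSTER3, PD_ESMART };

/* Returns the VP index the workaround would act on, or -1 when it would be
 * skipped. win_vp_mask[phys_id] is the VP mask of window phys_id. */
static int workaround_vp_for_pd(enum pd_kind kind, int module_id_mask,
				const int *win_vp_mask)
{
	if (kind != PD_CLUSTER0 && kind != PD_CLUSTER1 &&
	    kind != PD_CLUSTER2 && kind != PD_CLUSTER3)
		return -1;                        /* driver: "unexpected power on pd" */
	int phys_id = ffs(module_id_mask) - 1;    /* lowest set bit -> window id */
	if (phys_id < 0)
		return -1;                        /* empty mask: no window */
	return ffs(win_vp_mask[phys_id]) - 1;     /* lowest set bit -> VP id */
}
```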

5: Remaining Symptom

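If the problem is fixed by resetting the reference count, the cursor layer can still occasionally be turned off, but it is turned back on at the next drm_atomic_check_only(). The resulting symptom is: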

The cursor occasionally disappears (on monitor hot-plug or suspend/resume), but it reappears as soon as the mouse is moved.

Given how small the impact is, this is not treated as a bug and needs no further change.

With the avoidance approach (a non-Cluster cursor layer), even this effect goes away.

6: Further Changes

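If the remaining symptom still needs a better user experience, further customization is possible. When the plane is the cursor plane and it is bound to a CRTC, force it visible rather than invisible. Personally, I don't think this is really necessary.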

```diff
@@ -4131,6 +4132,8 @@ static int vop2_plane_atomic_check(struct drm_plane *plane, struct drm_plane_sta
 		plane->name, state->src_x >> 16, state->src_y >> 16, state->src_w >> 16,
 		state->src_h >> 16, state->crtc_x, state->crtc_y, state->crtc_w,
 		state->crtc_h);
+	if (plane == crtc->cursor)
+		state->visible = true;
 	return 0;
 }
```
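A more defensive variant of the same idea is sketched below as an assumption, not a tested patch. It forces visibility only when the plane state actually has a CRTC and a framebuffer. The structs are trimmed, hypothetical stand-ins for the DRM types. In mainline DRM, `drm_atomic_helper_check_plane_state()` clears `visible` when there is no framebuffer or the plane is clipped away completely. Overriding it unconditionally may therefore enable a plane with nothing valid to scan out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily trimmed stand-ins for drm_crtc / drm_plane_state. */
struct crtc_m {
	const void *cursor;  /* the CRTC's cursor plane */
};

struct plane_state_m {
	struct crtc_m *crtc; /* NULL when the plane is being disabled */
	const void *fb;      /* NULL when there is nothing to scan out */
	bool visible;
};

/* Force the cursor plane visible only when it is bound to a CRTC and has a
 * framebuffer, so a genuinely disabled cursor stays disabled. */
static void force_cursor_visible(const void *plane, struct plane_state_m *st)
{
	if (st->crtc && st->fb && plane == st->crtc->cursor)
		st->visible = true;
}
```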

This keeps the cursor window from being turned off abnormally by other corner cases.

7: Summary

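This problem was introduced by RK while fixing a different problem, and it comes down to reference counting. The kernel uses reference counts in many places, and they have to be handled with care. If a fix somewhere changes the underlying state without updating the reference count, the whole reference-counting logic stops working and new bugs appear.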
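One cheap safeguard in the spirit of this lesson is a consistency check between the counter and the hardware. The sketch below is hypothetical: `struct pd_dbg` and `pd_dbg_consistent()` are not driver functions. In a real driver the status would come from the power-domain status register, and the report would go through a kernel warning rather than `fprintf()`. Only "count > 0 but hardware off" is flagged. A count of 0 with the hardware still on is normal while a vsync-latched or delayed power-off is pending.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical debug view of one power domain. */
struct pd_dbg {
	const char *name;
	int ref_count;
	bool hw_on; /* would be read from a power-domain status register */
};

/* Returns false (and reports) when the domain is counted as in use but the
 * hardware is off, which is exactly the state the workaround left behind. */
static bool pd_dbg_consistent(const struct pd_dbg *pd)
{
	if (pd->ref_count > 0 && !pd->hw_on) {
		fprintf(stderr, "%s: ref_count=%d but power domain is off\n",
			pd->name, pd->ref_count);
		return false;
	}
	return true;
}
```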