I. Overview

When null or an empty string appears in a MongoDB query condition, the query optimizer can pick the wrong index, effectively bypassing the right one; the resulting slow queries can cascade into a service outage.

MongoDB caches query execution plans in an LRU cache. The cache can be invalidated in a number of situations, after which a plan is re-selected and cached for subsequent queries of the same shape to reuse.

The service ran normally for a long time. When the cached plan was evicted, a query that happened to carry a null value caused the wrong plan to be chosen and cached, after which even well-formed queries became slow.

II. Incident Timeline

1. Trigger

The user-center team received reports from a downstream service that third-party login/registration was frequently failing with Dubbo invocation timeouts:

org.apache.dubbo.rpc.rpcexception: failed to invoke the method loginwiththird in the service weli.wormhole.rpc.user.center.api.iloginneedphoneservice. tried 1 times of the providers [10.65.5.0:11090] (1/4) from the registry node1.zk.all.platform.wtc.hwhosts.com:2181 on the consumer 10.65.1.81 using the dubbo version 2.7.3-snapshot. last error is: loginwiththird_2 failed and fallback failed.
at org.apache.dubbo.rpc.cluster.support.failoverclusterinvoker.doinvoke(failoverclusterinvoker.java:113)
at org.apache.dubbo.rpc.cluster.support.abstractclusterinvoker.invoke(abstractclusterinvoker.java:248)
at org.apache.dubbo.rpc.cluster.support.wrapper.mockclusterinvoker.invoke(mockclusterinvoker.java:78)
at org.apache.dubbo.rpc.proxy.invokerinvocationhandler.invoke(invokerinvocationhandler.java:55)
at org.apache.dubbo.common.bytecode.proxy9.loginwiththird(proxy9.java)
at weli.peanut.user.service.loginservice.loginandregister(loginservice.java:684)
at weli.peanut.user.service.loginservice.loginbythirdpartyv2(loginservice.java:629)
at weli.peanut.web.controller.api.userlogincontroller.lambda$loginbythirdpartyv2$0(userlogincontroller.java:113)
at weli.peanut.common.controller.basecontroller.resultforactionwithcat(basecontroller.java:71)
at weli.peanut.web.controller.api.userlogincontroller.loginbythirdpartyv2(userlogincontroller.java:113)
at sun.reflect.generatedmethodaccessor2017.invoke(unknown source)
at sun.reflect.delegatingmethodaccessorimpl.invoke(delegatingmethodaccessorimpl.java:43)
at java.lang.reflect.method.invoke(method.java:498)
at org.springframework.web.method.support.invocablehandlermethod.doinvoke(invocablehandlermethod.java:205)
at org.springframework.web.method.support.invocablehandlermethod.invokeforrequest(invocablehandlermethod.java:133)
at org.springframework.web.servlet.mvc.method.annotation.servletinvocablehandlermethod.invokeandhandle(servletinvocablehandlermethod.java:116)
at org.springframework.web.servlet.mvc.method.annotation.requestmappinghandleradapter.invokehandlermethod(requestmappinghandleradapter.java:827)
at org.springframework.web.servlet.mvc.method.annotation.requestmappinghandleradapter.handleinternal(requestmappinghandleradapter.java:738)
at org.springframework.web.servlet.mvc.method.abstracthandlermethodadapter.handle(abstracthandlermethodadapter.java:85)
at org.springframework.web.servlet.dispatcherservlet.dodispatch(dispatcherservlet.java:963)
at org.springframework.web.servlet.dispatcherservlet.doservice(dispatcherservlet.java:897)
at org.springframework.web.servlet.frameworkservlet.processrequest(frameworkservlet.java:970)
at org.springframework.web.servlet.frameworkservlet.dopost(frameworkservlet.java:872)
at javax.servlet.http.httpservlet.service(httpservlet.java:646)
at org.springframework.web.servlet.frameworkservlet.service(frameworkservlet.java:846)
at javax.servlet.http.httpservlet.service(httpservlet.java:727)
at org.apache.catalina.core.applicationfilterchain.internaldofilter(applicationfilterchain.java:303)
at org.apache.catalina.core.applicationfilterchain.dofilter(applicationfilterchain.java:208)
at org.apache.tomcat.websocket.server.wsfilter.dofilter(wsfilter.java:52)
at org.apache.catalina.core.applicationfilterchain.internaldofilter(applicationfilterchain.java:241)
at org.apache.catalina.core.applicationfilterchain.dofilter(applicationfilterchain.java:208)
at org.springframework.web.filter.characterencodingfilter.dofilterinternal(characterencodingfilter.java:197)
at org.springframework.web.filter.onceperrequestfilter.dofilter(onceperrequestfilter.java:107)
at org.apache.catalina.core.applicationfilterchain.internaldofilter(applicationfilterchain.java:241)
at org.apache.catalina.core.applicationfilterchain.dofilter(applicationfilterchain.java:208)
at org.apache.catalina.core.standardwrappervalve.invoke(standardwrappervalve.java:220)
at org.apache.catalina.core.standardcontextvalve.invoke(standardcontextvalve.java:122)
at org.apache.catalina.authenticator.authenticatorbase.invoke(authenticatorbase.java:501)
at org.apache.catalina.core.standardhostvalve.invoke(standardhostvalve.java:170)
at org.apache.catalina.valves.errorreportvalve.invoke(errorreportvalve.java:98)
at org.apache.catalina.core.standardenginevalve.invoke(standardenginevalve.java:116)
at org.apache.catalina.connector.coyoteadapter.service(coyoteadapter.java:408)
at org.apache.coyote.http11.abstracthttp11processor.process(abstracthttp11processor.java:1040)
at org.apache.coyote.abstractprotocol$abstractconnectionhandler.process(abstractprotocol.java:607)
at org.apache.tomcat.util.net.nioendpoint$socketprocessor.dorun(nioendpoint.java:1721)
at org.apache.tomcat.util.net.nioendpoint$socketprocessor.run(nioendpoint.java:1679)
at java.util.concurrent.threadpoolexecutor.runworker(threadpoolexecutor.java:1149)
at java.util.concurrent.threadpoolexecutor$worker.run(threadpoolexecutor.java:624)
at java.lang.thread.run(thread.java:748)
caused by: com.netflix.hystrix.exception.hystrixruntimeexception: loginwiththird_2 failed and fallback failed.
at com.netflix.hystrix.abstractcommand$22.call(abstractcommand.java:818)
at com.netflix.hystrix.abstractcommand$22.call(abstractcommand.java:793)
at rx.internal.operators.operatoronerrorresumenextviafunction$4.onerror(operatoronerrorresumenextviafunction.java:140)
at rx.internal.operators.onsubscribedooneach$dooneachsubscriber.onerror(onsubscribedooneach.java:87)
at rx.internal.operators.onsubscribedooneach$dooneachsubscriber.onerror(onsubscribedooneach.java:87)
at com.netflix.hystrix.abstractcommand$deprecatedonfallbackhookapplication$1.onerror(abstractcommand.java:1454)
at com.netflix.hystrix.abstractcommand$fallbackhookapplication$1.onerror(abstractcommand.java:1379)
at rx.internal.operators.onsubscribedooneach$dooneachsubscriber.onerror(onsubscribedooneach.java:87)
at rx.observers.subscribers$5.onerror(subscribers.java:230)
at rx.internal.operators.onsubscribethrow.call(onsubscribethrow.java:44)
at rx.internal.operators.onsubscribethrow.call(onsubscribethrow.java:28)
at rx.observable.unsafesubscribe(observable.java:10151)
at rx.internal.operators.onsubscribedefer.call(onsubscribedefer.java:51)
at rx.internal.operators.onsubscribedefer.call(onsubscribedefer.java:35)
at rx.observable.unsafesubscribe(observable.java:10151)
at rx.internal.operators.onsubscribedooneach.call(onsubscribedooneach.java:41)
at rx.internal.operators.onsubscribedooneach.call(onsubscribedooneach.java:30)
at rx.internal.operators.onsubscribelift.call(onsubscribelift.java:48)
at rx.internal.operators.onsubscribelift.call(onsubscribelift.java:30)
at rx.internal.operators.onsubscribelift.call(onsubscribelift.java:48)
at rx.internal.operators.onsubscribelift.call(onsubscribelift.java:30)
at rx.observable.unsafesubscribe(observable.java:10151)
at rx.internal.operators.onsubscribedooneach.call(onsubscribedooneach.java:41)
at rx.internal.operators.onsubscribedooneach.call(onsubscribedooneach.java:30)
at rx.observable.unsafesubscribe(observable.java:10151)
at rx.internal.operators.onsubscribedooneach.call(onsubscribedooneach.java:41)
at rx.internal.operators.onsubscribedooneach.call(onsubscribedooneach.java:30)
at rx.internal.operators.onsubscribelift.call(onsubscribelift.java:48)
at rx.internal.operators.onsubscribelift.call(onsubscribelift.java:30)
at rx.observable.unsafesubscribe(observable.java:10151)
at rx.internal.operators.onsubscribedooneach.call(onsubscribedooneach.java:41)
at rx.internal.operators.onsubscribedooneach.call(onsubscribedooneach.java:30)
at rx.internal.operators.onsubscribelift.call(onsubscribelift.java:48)
at rx.internal.operators.onsubscribelift.call(onsubscribelift.java:30)
at rx.observable.unsafesubscribe(observable.java:10151)
at rx.internal.operators.operatoronerrorresumenextviafunction$4.onerror(operatoronerrorresumenextviafunction.java:142)
at rx.internal.operators.onsubscribedooneach$dooneachsubscriber.onerror(onsubscribedooneach.java:87)
at rx.internal.operators.onsubscribedooneach$dooneachsubscriber.onerror(onsubscribedooneach.java:87)
at rx.internal.operators.operatorsubscribeon$1$1.onerror(operatorsubscribeon.java:59)
at rx.observers.subscribers$5.onerror(subscribers.java:230)
at rx.internal.operators.onsubscribedooneach$dooneachsubscriber.onerror(onsubscribedooneach.java:87)
at rx.observers.subscribers$5.onerror(subscribers.java:230)
at com.netflix.hystrix.abstractcommand$deprecatedonrunhookapplication$1.onerror(abstractcommand.java:1413)
at com.netflix.hystrix.abstractcommand$executionhookapplication$1.onerror(abstractcommand.java:1344)
at rx.observers.subscribers$5.onerror(subscribers.java:230)
at rx.observers.subscribers$5.onerror(subscribers.java:230)
at rx.internal.operators.onsubscribethrow.call(onsubscribethrow.java:44)
at rx.internal.operators.onsubscribethrow.call(onsubscribethrow.java:28)
at rx.observable.unsafesubscribe(observable.java:10151)
at rx.internal.operators.onsubscribedefer.call(onsubscribedefer.java:51)
at rx.internal.operators.onsubscribedefer.call(onsubscribedefer.java:35)
at rx.internal.operators.onsubscribelift.call(onsubscribelift.java:48)
at rx.internal.operators.onsubscribelift.call(onsubscribelift.java:30)
at rx.internal.operators.onsubscribelift.call(onsubscribelift.java:48)
at rx.internal.operators.onsubscribelift.call(onsubscribelift.java:30)
at rx.internal.operators.onsubscribelift.call(onsubscribelift.java:48)
at rx.internal.operators.onsubscribelift.call(onsubscribelift.java:30)
at rx.observable.unsafesubscribe(observable.java:10151)
at rx.internal.operators.onsubscribedefer.call(onsubscribedefer.java:51)
at rx.internal.operators.onsubscribedefer.call(onsubscribedefer.java:35)
at rx.observable.unsafesubscribe(observable.java:10151)
at rx.internal.operators.onsubscribedooneach.call(onsubscribedooneach.java:41)
at rx.internal.operators.onsubscribedooneach.call(onsubscribedooneach.java:30)
at rx.internal.operators.onsubscribelift.call(onsubscribelift.java:48)
at rx.internal.operators.onsubscribelift.call(onsubscribelift.java:30)
at rx.observable.unsafesubscribe(observable.java:10151)
at rx.internal.operators.operatorsubscribeon$1.call(operatorsubscribeon.java:94)
at com.netflix.hystrix.strategy.concurrency.hystrixcontexscheduleraction$1.call(hystrixcontexscheduleraction.java:56)
at com.netflix.hystrix.strategy.concurrency.hystrixcontexscheduleraction$1.call(hystrixcontexscheduleraction.java:47)
at com.netflix.hystrix.strategy.concurrency.hystrixcontexscheduleraction.call(hystrixcontexscheduleraction.java:69)
at rx.internal.schedulers.scheduledaction.run(scheduledaction.java:55)
at java.util.concurrent.executors$runnableadapter.call(executors.java:511)
at java.util.concurrent.futuretask.run(futuretask.java:266)
... 3 more
caused by: java.lang.runtimeexception: org.apache.dubbo.rpc.rpcexception: invoke remote method timeout. method: loginwiththird, provider: dubbo://10.65.5.0:11090/weli.wormhole.rpc.user.center.api.iloginneedphoneservice?anyhost=true&application=peanut-admin&bean.name=weli.wormhole.rpc.user.center.api.iloginneedphoneservice&check=false&deprecated=false&dubbo=2.0.2&dynamic=true&generic=false&interface=weli.wormhole.rpc.user.center.api.iloginneedphoneservice&lazy=false&loadbalance=leastactive&methods=logout,sendsms,loginwithvisitor,loginwiththird,loginwithphone,loginwiththirdbindphone&pid=1&register=true&register.ip=10.65.1.81&release=2.7.3-snapshot&remote.application=wormhole-api&retries=0&revision=1.1.11-snapshot&side=consumer&sticky=false&timeout=5000&timestamp=1647332146309, cause: org.apache.dubbo.remoting.timeoutexception: waiting server-side response timeout by scan timer. start time: 2022-03-15 20:33:31.937, end time: 2022-03-15 20:33:36.968, client elapsed: 0 ms, server elapsed: 5031 ms, timeout: 5000 ms, request: request [id=690506, version=2.0.2, twoway=true, event=false, broken=false, data=rpcinvocation [methodname=loginwiththird, parametertypes=[class weli.wormhole.rpc.user.center.model.loginwiththirdrequest, class weli.wormhole.rpc.user.center.model.logincommonparam], arguments=[loginwiththirdrequest(accesstoken=54_y3n-ysttdkoklr_x-mbjrgkakroldkfx3pv23zaqhmbvmt3zvlifp9mwodhsueckgnnjgc4ap_baox-2zsssfchfnm5wfjbjdyiqh-scjfq, openid=ofrsy6xrapjv9auga9l2kuib0-fk, unionid=null, code=null, type=wx, nickname=null, headimgurl=null, sex=null, skipphonebind=1), logincommonparam(channel=vivo, ip=null, vername=null, platform=null, deviceid=null, imei=null, app=peanut, subapp=peanut, dfid=due_59hd2l9bbguknr3jv2djnw00irtu8z0erfvfxzu5aeqytdlcqmd1a05sm0p2mkrqblcwmglsdfu4ejblc2h1, idfa=null, mac=null, oaid=null, androidid=null)], attachments={path=weli.wormhole.rpc.user.center.api.iloginneedphoneservice, interface=weli.wormhole.rpc.user.center.api.iloginneedphoneservice, 
version=0.0.0, timeout=5000}]], channel: /10.65.1.81:35450 -> /10.65.5.0:11090
at suishen.esb.hystrix.dubbo.dubbohystrixcommand.run(dubbohystrixcommand.java:93)
at suishen.esb.hystrix.dubbo.dubbohystrixcommand.run(dubbohystrixcommand.java:12)
at com.netflix.hystrix.hystrixcommand$2.call(hystrixcommand.java:301)
at com.netflix.hystrix.hystrixcommand$2.call(hystrixcommand.java:297)
at rx.internal.operators.onsubscribedefer.call(onsubscribedefer.java:46)
... 26 more
caused by: org.apache.dubbo.rpc.rpcexception: invoke remote method timeout. method: loginwiththird, provider: dubbo://10.65.5.0:11090/weli.wormhole.rpc.user.center.api.iloginneedphoneservice?anyhost=true&application=peanut-admin&bean.name=weli.wormhole.rpc.user.center.api.iloginneedphoneservice&check=false&deprecated=false&dubbo=2.0.2&dynamic=true&generic=false&interface=weli.wormhole.rpc.user.center.api.iloginneedphoneservice&lazy=false&loadbalance=leastactive&methods=logout,sendsms,loginwithvisitor,loginwiththird,loginwithphone,loginwiththirdbindphone&pid=1&register=true&register.ip=10.65.1.81&release=2.7.3-snapshot&remote.application=wormhole-api&retries=0&revision=1.1.11-snapshot&side=consumer&sticky=false&timeout=5000&timestamp=1647332146309, cause: org.apache.dubbo.remoting.timeoutexception: waiting server-side response timeout by scan timer. start time: 2022-03-15 20:33:31.937, end time: 2022-03-15 20:33:36.968, client elapsed: 0 ms, server elapsed: 5031 ms, timeout: 5000 ms, request: request [id=690506, version=2.0.2, twoway=true, event=false, broken=false, data=rpcinvocation [methodname=loginwiththird, parametertypes=[class weli.wormhole.rpc.user.center.model.loginwiththirdrequest, class weli.wormhole.rpc.user.center.model.logincommonparam], arguments=[loginwiththirdrequest(accesstoken=54_y3n-ysttdkoklr_x-mbjrgkakroldkfx3pv23zaqhmbvmt3zvlifp9mwodhsueckgnnjgc4ap_baox-2zsssfchfnm5wfjbjdyiqh-scjfq, openid=ofrsy6xrapjv9auga9l2kuib0-fk, unionid=null, code=null, type=wx, nickname=null, headimgurl=null, sex=null, skipphonebind=1), logincommonparam(channel=vivo, ip=null, vername=null, platform=null, deviceid=null, imei=null, app=peanut, subapp=peanut, dfid=due_59hd2l9bbguknr3jv2djnw00irtu8z0erfvfxzu5aeqytdlcqmd1a05sm0p2mkrqblcwmglsdfu4ejblc2h1, idfa=null, mac=null, oaid=null, androidid=null)], attachments={path=weli.wormhole.rpc.user.center.api.iloginneedphoneservice, interface=weli.wormhole.rpc.user.center.api.iloginneedphoneservice, version=0.0.0, 
timeout=5000}]], channel: /10.65.1.81:35450 -> /10.65.5.0:11090
at org.apache.dubbo.rpc.protocol.asynctosyncinvoker.invoke(asynctosyncinvoker.java:63)
at org.apache.dubbo.rpc.listener.listenerinvokerwrapper.invoke(listenerinvokerwrapper.java:78)
at com.alibaba.dubbo.rpc.invoker$compatibleinvoker.invoke(invoker.java:55)
at suishen.esb.hystrix.dubbo.dubbohystrixcommand.run(dubbohystrixcommand.java:84)
... 30 more
caused by: java.util.concurrent.executionexception: org.apache.dubbo.remoting.timeoutexception: waiting server-side response timeout by scan timer. start time: 2022-03-15 20:33:31.937, end time: 2022-03-15 20:33:36.968, client elapsed: 0 ms, server elapsed: 5031 ms, timeout: 5000 ms, request: request [id=690506, version=2.0.2, twoway=true, event=false, broken=false, data=rpcinvocation [methodname=loginwiththird, parametertypes=[class weli.wormhole.rpc.user.center.model.loginwiththirdrequest, class weli.wormhole.rpc.user.center.model.logincommonparam], arguments=[loginwiththirdrequest(accesstoken=54_y3n-ysttdkoklr_x-mbjrgkakroldkfx3pv23zaqhmbvmt3zvlifp9mwodhsueckgnnjgc4ap_baox-2zsssfchfnm5wfjbjdyiqh-scjfq, openid=ofrsy6xrapjv9auga9l2kuib0-fk, unionid=null, code=null, type=wx, nickname=null, headimgurl=null, sex=null, skipphonebind=1), logincommonparam(channel=vivo, ip=null, vername=null, platform=null, deviceid=null, imei=null, app=peanut, subapp=peanut, dfid=due_59hd2l9bbguknr3jv2djnw00irtu8z0erfvfxzu5aeqytdlcqmd1a05sm0p2mkrqblcwmglsdfu4ejblc2h1, idfa=null, mac=null, oaid=null, androidid=null)], attachments={path=weli.wormhole.rpc.user.center.api.iloginneedphoneservice, interface=weli.wormhole.rpc.user.center.api.iloginneedphoneservice, version=0.0.0, timeout=5000}]], channel: /10.65.1.81:35450 -> /10.65.5.0:11090
at java.util.concurrent.completablefuture.reportget(completablefuture.java:357)
at java.util.concurrent.completablefuture.get(completablefuture.java:1915)
at org.apache.dubbo.rpc.protocol.asynctosyncinvoker.invoke(asynctosyncinvoker.java:56)
... 33 more
caused by: org.apache.dubbo.remoting.timeoutexception: waiting server-side response timeout by scan timer. start time: 2022-03-15 20:33:31.937, end time: 2022-03-15 20:33:36.968, client elapsed: 0 ms, server elapsed: 5031 ms, timeout: 5000 ms, request: request [id=690506, version=2.0.2, twoway=true, event=false, broken=false, data=rpcinvocation [methodname=loginwiththird, parametertypes=[class weli.wormhole.rpc.user.center.model.loginwiththirdrequest, class weli.wormhole.rpc.user.center.model.logincommonparam], arguments=[loginwiththirdrequest(accesstoken=54_y3n-ysttdkoklr_x-mbjrgkakroldkfx3pv23zaqhmbvmt3zvlifp9mwodhsueckgnnjgc4ap_baox-2zsssfchfnm5wfjbjdyiqh-scjfq, openid=ofrsy6xrapjv9auga9l2kuib0-fk, unionid=null, code=null, type=wx, nickname=null, headimgurl=null, sex=null, skipphonebind=1), logincommonparam(channel=vivo, ip=null, vername=null, platform=null, deviceid=null, imei=null, app=peanut, subapp=peanut, dfid=due_59hd2l9bbguknr3jv2djnw00irtu8z0erfvfxzu5aeqytdlcqmd1a05sm0p2mkrqblcwmglsdfu4ejblc2h1, idfa=null, mac=null, oaid=null, androidid=null)], attachments={path=weli.wormhole.rpc.user.center.api.iloginneedphoneservice, interface=weli.wormhole.rpc.user.center.api.iloginneedphoneservice, version=0.0.0, timeout=5000}]], channel: /10.65.1.81:35450 -> /10.65.5.0:11090
at org.apache.dubbo.remoting.exchange.support.defaultfuture.doreceived(defaultfuture.java:189)
at org.apache.dubbo.remoting.exchange.support.defaultfuture.received(defaultfuture.java:153)
at org.apache.dubbo.remoting.exchange.support.defaultfuture$timeoutchecktask.run(defaultfuture.java:252)
at org.apache.dubbo.common.timer.hashedwheeltimer$hashedwheeltimeout.expire(hashedwheeltimer.java:648)
at org.apache.dubbo.common.timer.hashedwheeltimer$hashedwheelbucket.expiretimeouts(hashedwheeltimer.java:727)
at org.apache.dubbo.common.timer.hashedwheeltimer$worker.run(hashedwheeltimer.java:449)
... 1 more

2. Investigation

a. Checked the user-center CAT monitoring to confirm whether the user center was receiving the requests and responding normally.

Current state: (monitoring screenshot omitted)

Normal state: (monitoring screenshot omitted)

Comparing the two:

The user center did receive the requests, so a client-side calling problem can be ruled out.

The user center's response times were far above normal, pointing to a problem inside the user center's own business processing.

b. Checked the Kibana logs: no error logs were found.

c. Used the Arthas diagnostic tool to measure how long each part of the business call chain was taking.

This pinpointed the cause: a slow MongoDB query was making the whole request time out.

d. Analyzed the MongoDB query and inspected the indexes.

public User getUserByDfid(String dfid, String app) {
    Criteria criteria = Criteria.where("userbindinfo.dfid").is(dfid).and("app").is(app).and("status").is(1);
    return this.mongoTemplate.findOne(Query.query(criteria), User.class);
}
db.getCollection('wormhole_user').getIndexes();
[
    {
        "v" : 1,
        "key" : {
            "app" : 1
        },
        "name" : "app",
        "ns" : "wormhole.wormhole_user",
        "background" : true
    },
    {
        "v" : 1,
        "key" : {
            "userbindinfo.dfid" : 1
        },
        "name" : "userbindinfo.dfid",
        "ns" : "wormhole.wormhole_user",
        "background" : true
    },
    ... (other indexes omitted as irrelevant)
]

The "userbindinfo.dfid" field is indexed, and the query itself shows nothing obviously wrong.

e. Checked the MongoDB slow-query log in the Huawei Cloud console.

{
"op": "query",
"ns": "wormhole.wormhole_user",
"command": {
"find": "wormhole_user",
"filter": {
"userbindinfo.dfid": "duqmn3qlplrwc8pne1jot_9sdrrzazt4cuc5rfvrbu4zuuxwtfj3qzhqtmuxam9uxzltrfjyemf6vdrjvwm1c2h1",
"app": "maybe",
"status": 1
},
"ntoreturn": -1
},
"keysexamined": 1870039,
"docsexamined": 1870039,
"cursorexhausted": true,
"numyield": 14836,
"nreturned": 0,
"locks": {
"global": {
"acquirecount": {
"r": 14837
}
},
"database": {
"acquirecount": {
"r": 14837
}
},
"collection": {
"acquirecount": {
"r": 14837
}
}
},
"responselength": 36,
"millis": 14545,
"plansummary": "ixscan { app: 1 }",
"execstats": {
"stage": "cached_plan",
"nreturned": 0,
"executiontimemillisestimate": 14552,
"works": 1,
"advanced": 0,
"needtime": 0,
"needyield": 0,
"savestate": 14836,
"restorestate": 14836,
"iseof": 1,
"invalidates": 0,
"inputstage": {
"stage": "limit",
"nreturned": 0,
"executiontimemillisestimate": 14145,
"works": 1870040,
"advanced": 0,
"needtime": 1870039,
"needyield": 0,
"savestate": 14836,
"restorestate": 14836,
"iseof": 1,
"invalidates": 0,
"limitamount": 1,
"inputstage": {
"stage": "fetch",
"filter": {
"$and": [
{
"status": {
"$eq": 1
}
},
{
"userbindinfo.dfid": {
"$eq": "duqmn3qlplrwc8pne1jot_9sdrrzazt4cuc5rfvrbu4zuuxwtfj3qzhqtmuxam9uxzltrfjyemf6vdrjvwm1c2h1"
}
}
]
},
"nreturned": 0,
"executiontimemillisestimate": 14117,
"works": 1870040,
"advanced": 0,
"needtime": 1870039,
"needyield": 0,
"savestate": 14836,
"restorestate": 14836,
"iseof": 1,
"invalidates": 0,
"docsexamined": 1870039,
"alreadyhasobj": 0,
"inputstage": {
"stage": "ixscan",
"nreturned": 1870039,
"executiontimemillisestimate": 931,
"works": 1870040,
"advanced": 1870039,
"needtime": 0,
"needyield": 0,
"savestate": 14836,
"restorestate": 14836,
"iseof": 1,
"invalidates": 0,
"keypattern": {
"app": 1
},
"indexname": "app",
"ismultikey": false,
"multikeypaths": {
"app": []
},
"isunique": false,
"issparse": false,
"ispartial": false,
"indexversion": 1,
"direction": "forward",
"indexbounds": {
"app": [
"["maybe", "maybe"]"
]
},
"keysexamined": 1870039,
"seeks": 1,
"dupstested": 0,
"dupsdropped": 0,
"seeninvalidated": 0
}
}
}
},
"ts": {
"$date": 1647359086553
},
"client": "10.65.5.0",
"allusers": [
{
"user": "mongosiud",
"db": "wormhole"
}
],
"user": "mongosiud@wormhole"
}

The slow log shows that MongoDB did not use the "userbindinfo.dfid" index for this query; it scanned the "app" index instead.

Our initial read: the existing single-field indexes did not match the query well enough, so MongoDB failed to pick the right one.

f. Added a compound index to improve the match. Query times returned to normal and the interface recovered.

db.getCollection("wormhole_user").createIndex({
    "userbindinfo.dfid": 1,
    "app": 1,
    "status": 1
}, {
    name: "idx_user_app_userbindinfo.dfid",
    background: true
});
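To sanity-check the fix, the same query shape can be re-run through explain() once the index build finishes. This is a verification sketch (the dfid value is a placeholder); with a healthy plan, the winning input stage should name the new compound index and examine at most a handful of keys rather than ~1.87 million:

```javascript
// Re-run the problematic query shape and inspect the chosen plan.
db.getCollection("wormhole_user").find({
    "userbindinfo.dfid": "some-dfid-value",  // placeholder value
    "app": "maybe",
    "status": 1
}).explain("executionStats");
// The winning plan should now use "idx_user_app_userbindinfo.dfid",
// with totalKeysExamined near 0 instead of in the millions.
```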

3. Root-cause analysis

a. Analysis with explain()

db.getCollection('wormhole_user').find({"userbindinfo.dfid": "duqtmv8lqcinjzwtuyjuantkbri5_yz3yhd9rfvxvg1wogxry0luanpxdhv5snvbbnrryljpnv9zwjn5sgq5c2h1", "app": "maybe", "status": 1}).explain();
{
"queryplanner" : {
"plannerversion" : 1,
"namespace" : "wormhole.wormhole_user",
"indexfilterset" : false,
"parsedquery" : {
"$and" : [ 
{
"app" : {
"$eq" : "maybe"
}
}, 
{
"status" : {
"$eq" : 1.0
}
}, 
{
"userbindinfo.dfid" : {
"$eq" : "duqtmv8lqcinjzwtuyjuantkbri5_yz3yhd9rfvxvg1wogxry0luanpxdhv5snvbbnrryljpnv9zwjn5sgq5c2h1"
}
}
]
},
"winningplan" : {
"stage" : "fetch",
"filter" : {
"$and" : [ 
{
"app" : {
"$eq" : "maybe"
}
}, 
{
"status" : {
"$eq" : 1.0
}
}
]
},
"inputstage" : {
"stage" : "ixscan",
"keypattern" : {
"userbindinfo.dfid" : 1
},
"indexname" : "userbindinfo.dfid",
"ismultikey" : false,
"multikeypaths" : {
"userbindinfo.dfid" : []
},
"isunique" : false,
"issparse" : false,
"ispartial" : false,
"indexversion" : 1,
"direction" : "forward",
"indexbounds" : {
"userbindinfo.dfid" : [ 
"[\"duqtmv8lqcinjzwtuyjuantkbri5_yz3yhd9rfvxvg1wogxry0luanpxdhv5snvbbnrryljpnv9zwjn5sgq5c2h1\", \"duqtmv8lqcinjzwtuyjuantkbri5_yz3yhd9rfvxvg1wogxry0luanpxdhv5snvbbnrryljpnv9zwjn5sgq5c2h1\"]"
]
}
}
},
"rejectedplans" : [ 
{
"stage" : "fetch",
"filter" : {
"$and" : [ 
{
"status" : {
"$eq" : 1.0
}
}, 
{
"userbindinfo.dfid" : {
"$eq" : "duqtmv8lqcinjzwtuyjuantkbri5_yz3yhd9rfvxvg1wogxry0luanpxdhv5snvbbnrryljpnv9zwjn5sgq5c2h1"
}
}
]
},
"inputstage" : {
"stage" : "ixscan",
"keypattern" : {
"app" : 1
},
"indexname" : "app",
"ismultikey" : false,
"multikeypaths" : {
"app" : []
},
"isunique" : false,
"issparse" : false,
"ispartial" : false,
"indexversion" : 1,
"direction" : "forward",
"indexbounds" : {
"app" : [ 
"[\"maybe\", \"maybe\"]"
]
}
}
}, 
{
"stage" : "fetch",
"filter" : {
"$and" : [ 
{
"app" : {
"$eq" : "maybe"
}
}, 
{
"userbindinfo.dfid" : {
"$eq" : "duqtmv8lqcinjzwtuyjuantkbri5_yz3yhd9rfvxvg1wogxry0luanpxdhv5snvbbnrryljpnv9zwjn5sgq5c2h1"
}
}, 
{
"status" : {
"$eq" : 1.0
}
}
]
},
"inputstage" : {
"stage" : "and_sorted",
"inputstages" : [ 
{
"stage" : "ixscan",
"keypattern" : {
"app" : 1
},
"indexname" : "app",
"ismultikey" : false,
"multikeypaths" : {
"app" : []
},
"isunique" : false,
"issparse" : false,
"ispartial" : false,
"indexversion" : 1,
"direction" : "forward",
"indexbounds" : {
"app" : [ 
"[\"maybe\", \"maybe\"]"
]
}
}, 
{
"stage" : "ixscan",
"keypattern" : {
"userbindinfo.dfid" : 1
},
"indexname" : "userbindinfo.dfid",
"ismultikey" : false,
"multikeypaths" : {
"userbindinfo.dfid" : []
},
"isunique" : false,
"issparse" : false,
"ispartial" : false,
"indexversion" : 1,
"direction" : "forward",
"indexbounds" : {
"userbindinfo.dfid" : [ 
"[\"duqtmv8lqcinjzwtuyjuantkbri5_yz3yhd9rfvxvg1wogxry0luanpxdhv5snvbbnrryljpnv9zwjn5sgq5c2h1\", \"duqtmv8lqcinjzwtuyjuantkbri5_yz3yhd9rfvxvg1wogxry0luanpxdhv5snvbbnrryljpnv9zwjn5sgq5c2h1\"]"
]
}
}
]
}
}
]
},
"serverinfo" : {
"host" : "host-192-168-10-163",
"port" : 27017,
"version" : "4.0.3",
"gitversion" : "0ead8608f9d151a199b05117bcc79ccb8d5f44a0"
},
"ok" : 1.0,
"operationtime" : timestamp(1647486920, 2),
"$clustertime" : {
"clustertime" : timestamp(1647486920, 2),
"signature" : {
"hash" : { "$binary" : "yltatlk6mvpnwdt7ptgwnst/9z4=", "$type" : "00" },
"keyid" : numberlong(7031512438860152835)
}
}
}
  • This local explain() picks the correct plan ("userbindinfo.dfid" wins), which explains why the service had been healthy for such a long stretch;
  • the plan recorded in the slow log, however, is the wrong one, so something must have changed which plan MongoDB selected.

b. Consulting the documentation

  • The MongoDB query optimizer caches the most efficient query plan; the associated plan cache entry is then used for subsequent queries with the same query shape.
  • The plan cache is flushed when:
    • MongoDB restarts;
    • indexes or the collection are created, dropped, or otherwise modified;
    • the least-recently-used (LRU) replacement mechanism evicts the least recently accessed entries.

Conclusion: the service was healthy while the cached plan was in use. At some point the cache entry was invalidated and MongoDB re-planned; it happened to pick the wrong plan and cache it, so every subsequent query of that shape became slow.
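The MongoDB 4.0 shell used here exposes the plan cache directly, so the cached shapes could have been inspected, and the bad entry evicted, without creating a new index. A sketch (newer servers replace these helpers with the $planCacheStats aggregation stage):

```javascript
// List query shapes that currently have cached plans on this collection.
db.getCollection("wormhole_user").getPlanCache().listQueryShapes();

// Show the cached plans for the problematic shape.
db.getCollection("wormhole_user").getPlanCache().getPlansByQuery({
    "userbindinfo.dfid": "x", "app": "maybe", "status": 1
});

// Drop every cached plan for the collection, forcing a re-plan on the next query.
db.getCollection("wormhole_user").getPlanCache().clear();
```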

c. Finding the first slow-log entry

{
"op":"query",
"ns":"wormhole.wormhole_user",
"command":{
"find":"wormhole_user",
"filter":{
"userbindinfo.dfid":null,
"app":"cybercat",
"status":1
},
"ntoreturn":-1
},
"keysexamined":872550,
"docsexamined":872550,
"frommultiplanner":true,
"replanned":true,
"cursorexhausted":true,
"numyield":13785,
"nreturned":0,
"locks":{
"global":{
"acquirecount":{
"r":13786
}
},
"database":{
"acquirecount":{
"r":13786
}
},
"collection":{
"acquirecount":{
"r":13786
}
}
},
"responselength":36,
"millis":27864,
"plansummary":"ixscan { app: 1 }",
"execstats":{
"stage":"limit",
"nreturned":0,
"executiontimemillisestimate":26216,
"works":872551,
"advanced":0,
"needtime":872550,
"needyield":0,
"savestate":13785,
"restorestate":13785,
"iseof":1,
"invalidates":0,
"limitamount":1,
"inputstage":{
"stage":"fetch",
"filter":{
"$and":[
{
"status":{
"$eq":1
}
},
{
"userbindinfo.dfid":{
"$eq":null
}
}
]
},
"nreturned":0,
"executiontimemillisestimate":26164,
"works":872551,
"advanced":0,
"needtime":872550,
"needyield":0,
"savestate":13785,
"restorestate":13785,
"iseof":1,
"invalidates":0,
"docsexamined":872550,
"alreadyhasobj":0,
"inputstage":{
"stage":"ixscan",
"nreturned":872550,
"executiontimemillisestimate":249,
"works":872551,
"advanced":872550,
"needtime":0,
"needyield":0,
"savestate":13785,
"restorestate":13785,
"iseof":1,
"invalidates":0,
"keypattern":{
"app":1
},
"indexname":"app",
"ismultikey":false,
"multikeypaths":{
"app":[
]
},
"isunique":false,
"issparse":false,
"ispartial":false,
"indexversion":1,
"direction":"forward",
"indexbounds":{
"app":[
"["cybercat", "cybercat"]"
]
},
"keysexamined":872550,
"seeks":1,
"dupstested":0,
"dupsdropped":0,
"seeninvalidated":0
}
}
},
"ts":{
"$date":1647309521700
},
"client":"10.65.1.25",
"allusers":[
{
"user":"mongosiud",
"db":"wormhole"
}
],
"user":"mongosiud@wormhole"
}

The filter in this first slow entry contains a null value for "userbindinfo.dfid" (note also "frommultiplanner": true and "replanned": true). Reproducing the analysis locally:

db.getCollection('wormhole_user').find({"userbindinfo.dfid": null, "app": "maybe", "status": 1}).explain();
{
"queryplanner" : {
"plannerversion" : 1,
"namespace" : "wormhole.wormhole_user",
"indexfilterset" : false,
"parsedquery" : {
"$and" : [ 
{
"app" : {
"$eq" : "maybe"
}
}, 
{
"status" : {
"$eq" : 1.0
}
}, 
{
"userbindinfo.dfid" : {
"$eq" : null
}
}
]
},
"winningplan" : {
"stage" : "fetch",
"filter" : {
"$and" : [ 
{
"status" : {
"$eq" : 1.0
}
}, 
{
"userbindinfo.dfid" : {
"$eq" : null
}
}
]
},
"inputstage" : {
"stage" : "ixscan",
"keypattern" : {
"app" : 1
},
"indexname" : "app",
"ismultikey" : false,
"multikeypaths" : {
"app" : []
},
"isunique" : false,
"issparse" : false,
"ispartial" : false,
"indexversion" : 1,
"direction" : "forward",
"indexbounds" : {
"app" : [ 
"[\"maybe\", \"maybe\"]"
]
}
}
},
"rejectedplans" : [ 
{
"stage" : "fetch",
"filter" : {
"$and" : [ 
{
"userbindinfo.dfid" : {
"$eq" : null
}
}, 
{
"app" : {
"$eq" : "maybe"
}
}, 
{
"status" : {
"$eq" : 1.0
}
}
]
},
"inputstage" : {
"stage" : "ixscan",
"keypattern" : {
"userbindinfo.dfid" : 1
},
"indexname" : "userbindinfo.dfid",
"ismultikey" : false,
"multikeypaths" : {
"userbindinfo.dfid" : []
},
"isunique" : false,
"issparse" : false,
"ispartial" : false,
"indexversion" : 1,
"direction" : "forward",
"indexbounds" : {
"userbindinfo.dfid" : [ 
"[undefined, undefined]", 
"[null, null]"
]
}
}
}
]
},
"serverinfo" : {
"host" : "host-192-168-10-163",
"port" : 27017,
"version" : "4.0.3",
"gitversion" : "0ead8608f9d151a199b05117bcc79ccb8d5f44a0"
},
"ok" : 1.0,
"operationtime" : timestamp(1647489666, 1),
"$clustertime" : {
"clustertime" : timestamp(1647489666, 1),
"signature" : {
"hash" : { "$binary" : "1kji3aoz2qbowtklbknl9smwlzw=", "$type" : "00" },
"keyid" : numberlong(7031512438860152835)
}
}
}

The local result matches the behavior seen in production: with a null dfid, the winning plan scans the "app" index.
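Part of why the null predicate behaves so differently: in MongoDB, an equality match on null matches both documents where the field is explicitly null and documents where the field is missing altogether, which is exactly why the rejected plan's index bounds span both [undefined, undefined] and [null, null]. If the two cases ever need to be distinguished, $type and $exists can do it (a sketch):

```javascript
// Matches explicit nulls AND documents with no userbindinfo.dfid at all.
db.getCollection("wormhole_user").find({ "userbindinfo.dfid": null });

// Matches only documents where the field exists and is explicitly null
// (BSON type 10 is the null type).
db.getCollection("wormhole_user").find({ "userbindinfo.dfid": { $type: 10 } });

// Matches only documents where the field is missing.
db.getCollection("wormhole_user").find({ "userbindinfo.dfid": { $exists: false } });
```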

4. Retrospective

During the emergency fix we simply created a new index. Index creation is itself one of the events that invalidates the plan cache, so a plan was re-selected and cached, which is why the service recovered.

III. Takeaways

  • Avoid queries on null or empty values;
  • Use explain() to analyze as many different query shapes as possible;
  • Use MongoDB's planCache facilities to inspect the state of the plan cache;
  • Use hint() to steer the query toward a specific index.
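The first and last points can be sketched together in mongo-shell terms (the names here are illustrative, mirroring the user-lookup query from the investigation; the real guard belongs in the Java service layer):

```javascript
// Guard: refuse to run the lookup when the key value is null or empty,
// so a null-shaped query can never poison the plan cache.
function shouldQuery(dfid) {
    return dfid !== null && dfid !== undefined && dfid !== "";
}

// Hypothetical lookup that short-circuits on bad input and pins the index.
function findUserByDfid(db, dfid, app) {
    if (!shouldQuery(dfid)) {
        return null;  // skip the query entirely
    }
    var cursor = db.getCollection("wormhole_user")
        .find({ "userbindinfo.dfid": dfid, "app": app, "status": 1 })
        .hint("idx_user_app_userbindinfo.dfid")  // hint() bypasses plan selection
        .limit(1);
    return cursor.hasNext() ? cursor.next() : null;
}
```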
