I. Overview
When a MongoDB query condition contains null (or an empty string), the intended index may effectively be bypassed: the query optimizer can select the wrong query plan, producing slow queries that degrade the service.
MongoDB caches the execution plan it selects and reuses it for later queries of the same shape; the cache entry can be invalidated in several situations, after which a plan is selected again and re-cached.
So a service can run normally for a long time, and then, if the cached plan happens to be invalidated just as a null-valued query arrives, the wrong plan gets selected and cached, and even subsequent well-formed queries become slow.
II. Incident timeline
1. Trigger
The user-center team received reports from a downstream business that third-party login/registration calls were frequently timing out over Dubbo:
```
org.apache.dubbo.rpc.RpcException: Failed to invoke the method loginWithThird in the service weli.wormhole.rpc.user.center.api.ILoginNeedPhoneService. Tried 1 times of the providers [10.65.5.0:11090] (1/4) from the registry node1.zk.all.platform.wtc.hwhosts.com:2181 on the consumer 10.65.1.81 using the dubbo version 2.7.3-SNAPSHOT. Last error is: loginWithThird_2 failed and fallback failed.
    at org.apache.dubbo.rpc.cluster.support.FailoverClusterInvoker.doInvoke(FailoverClusterInvoker.java:113)
    at org.apache.dubbo.rpc.cluster.support.AbstractClusterInvoker.invoke(AbstractClusterInvoker.java:248)
    at org.apache.dubbo.rpc.cluster.support.wrapper.MockClusterInvoker.invoke(MockClusterInvoker.java:78)
    at org.apache.dubbo.rpc.proxy.InvokerInvocationHandler.invoke(InvokerInvocationHandler.java:55)
    at org.apache.dubbo.common.bytecode.proxy9.loginWithThird(proxy9.java)
    at weli.peanut.user.service.LoginService.loginAndRegister(LoginService.java:684)
    at weli.peanut.user.service.LoginService.loginByThirdPartyV2(LoginService.java:629)
    at weli.peanut.web.controller.api.UserLoginController.lambda$loginByThirdPartyV2$0(UserLoginController.java:113)
    at weli.peanut.common.controller.BaseController.resultForActionWithCat(BaseController.java:71)
    at weli.peanut.web.controller.api.UserLoginController.loginByThirdPartyV2(UserLoginController.java:113)
    ... (Spring MVC / Tomcat dispatch frames trimmed)
Caused by: com.netflix.hystrix.exception.HystrixRuntimeException: loginWithThird_2 failed and fallback failed.
    at com.netflix.hystrix.AbstractCommand$22.call(AbstractCommand.java:818)
    ... (RxJava / Hystrix operator frames trimmed)
Caused by: java.lang.RuntimeException: org.apache.dubbo.rpc.RpcException: Invoke remote method timeout. method: loginWithThird, provider: dubbo://10.65.5.0:11090/weli.wormhole.rpc.user.center.api.ILoginNeedPhoneService?application=peanut-admin&interface=weli.wormhole.rpc.user.center.api.ILoginNeedPhoneService&loadbalance=leastactive&retries=0&side=consumer&timeout=5000&timestamp=1647332146309 (other URL parameters trimmed), cause: org.apache.dubbo.remoting.TimeoutException: Waiting server-side response timeout by scan timer. start time: 2022-03-15 20:33:31.937, end time: 2022-03-15 20:33:36.968, client elapsed: 0 ms, server elapsed: 5031 ms, timeout: 5000 ms, request: Request [id=690506, version=2.0.2, twoway=true, data=RpcInvocation [methodName=loginWithThird, arguments=[LoginWithThirdRequest(accessToken=54_y3n-ysttdkoklr_x-mbjrgkakroldkfx3pv23zaqhmbvmt3zvlifp9mwodhsueckgnnjgc4ap_baox-2zsssfchfnm5wfjbjdyiqh-scjfq, openid=ofrsy6xrapjv9auga9l2kuib0-fk, unionid=null, code=null, type=wx, nickname=null, headImgUrl=null, sex=null, skipPhoneBind=1), LoginCommonParam(channel=vivo, ip=null, verName=null, platform=null, deviceId=null, imei=null, app=peanut, subApp=peanut, dfid=due_59hd2l9bbguknr3jv2djnw00irtu8z0erfvfxzu5aeqytdlcqmd1a05sm0p2mkrqblcwmglsdfu4ejblc2h1, idfa=null, mac=null, oaid=null, androidId=null)]], channel: /10.65.1.81:35450 -> /10.65.5.0:11090
    at suishen.esb.hystrix.dubbo.DubboHystrixCommand.run(DubboHystrixCommand.java:93)
    ... 26 more
Caused by: org.apache.dubbo.rpc.RpcException: Invoke remote method timeout. (same provider and cause as above; duplicated detail trimmed)
    at org.apache.dubbo.rpc.protocol.AsyncToSyncInvoker.invoke(AsyncToSyncInvoker.java:63)
    ... 30 more
Caused by: java.util.concurrent.ExecutionException: org.apache.dubbo.remoting.TimeoutException: Waiting server-side response timeout by scan timer. (detail trimmed)
    at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
    at org.apache.dubbo.rpc.protocol.AsyncToSyncInvoker.invoke(AsyncToSyncInvoker.java:56)
    ... 33 more
Caused by: org.apache.dubbo.remoting.TimeoutException: Waiting server-side response timeout by scan timer. (detail trimmed)
    at org.apache.dubbo.remoting.exchange.support.DefaultFuture.doReceived(DefaultFuture.java:189)
    at org.apache.dubbo.remoting.exchange.support.DefaultFuture$TimeoutCheckTask.run(DefaultFuture.java:252)
    ... 1 more
```
2. Investigation
a. Checked the user center's CAT monitoring to confirm whether it had received the requests and responded normally.
Current state: (monitoring screenshot omitted)
Normal state: (monitoring screenshot omitted)
Comparison:
- the user center was receiving the requests normally, which rules out a problem on the calling (consumer) side;
- its current response times were far above normal, pointing to a problem inside the user center's own business processing.
b. Checked the Kibana logs: no error logs were found.
c. Used the Arthas diagnostic tool to measure the time spent in each part of the business call chain,
and traced the cause to a slow MongoDB query that was pushing overall processing past the timeout.
d. Analyzed the MongoDB query and inspected the collection's indexes:
```java
public User getUserByDfid(String dfid, String app) {
    Criteria criteria = Criteria.where("userbindinfo.dfid").is(dfid)
            .and("app").is(app)
            .and("status").is(1);
    return this.mongoTemplate.findOne(Query.query(criteria), User.class);
}
```
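For readers more used to the shell, the Criteria above builds roughly the following filter (a sketch; the placeholder values are illustrative, not from the incident):

```javascript
// Shell equivalent of the filter built by getUserByDfid:
db.getCollection('wormhole_user').findOne({
    "userbindinfo.dfid": "<dfid>",   // placeholder device fingerprint id
    "app": "<app>",                  // placeholder app name
    "status": 1
});
```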
```javascript
db.getCollection('wormhole_user').getIndexes();
[
    { "v": 1, "key": { "app": 1 }, "name": "app", "ns": "wormhole.wormhole_user", "background": true },
    { "v": 1, "key": { "userbindinfo.dfid": 1 }, "name": "userbindinfo.dfid", "ns": "wormhole.wormhole_user", "background": true },
    // ... (indexes not relevant here omitted)
]
```
The "userbindinfo.dfid" field is indexed, and the query itself shows no obvious problem.
e. Checked the MongoDB slow-query log in the Huawei Cloud console:
```javascript
{
    "op": "query",
    "ns": "wormhole.wormhole_user",
    "command": {
        "find": "wormhole_user",
        "filter": {
            "userbindinfo.dfid": "duqmn3qlplrwc8pne1jot_9sdrrzazt4cuc5rfvrbu4zuuxwtfj3qzhqtmuxam9uxzltrfjyemf6vdrjvwm1c2h1",
            "app": "maybe",
            "status": 1
        },
        "ntoreturn": -1
    },
    "keysExamined": 1870039,
    "docsExamined": 1870039,
    "cursorExhausted": true,
    "numYield": 14836,
    "nreturned": 0,
    "responseLength": 36,
    "millis": 14545,
    "planSummary": "IXSCAN { app: 1 }",
    "execStats": {
        "stage": "CACHED_PLAN",
        "nReturned": 0,
        "executionTimeMillisEstimate": 14552,
        "inputStage": {
            "stage": "LIMIT",
            "limitAmount": 1,
            "inputStage": {
                "stage": "FETCH",
                "filter": { "$and": [ { "status": { "$eq": 1 } }, { "userbindinfo.dfid": { "$eq": "duqmn3qlplrwc8pne1jot_9sdrrzazt4cuc5rfvrbu4zuuxwtfj3qzhqtmuxam9uxzltrfjyemf6vdrjvwm1c2h1" } } ] },
                "docsExamined": 1870039,
                "inputStage": {
                    "stage": "IXSCAN",
                    "nReturned": 1870039,
                    "keyPattern": { "app": 1 },
                    "indexName": "app",
                    "direction": "forward",
                    "indexBounds": { "app": [ "[\"maybe\", \"maybe\"]" ] },
                    "keysExamined": 1870039,
                    "seeks": 1
                }
            }
        }
    },
    // locks / yield counters and per-stage works counters trimmed
    "ts": { "$date": 1647359086553 },
    "client": "10.65.5.0",
    "user": "mongosiud@wormhole"
}
```
The slow log shows that MongoDB did not use "userbindinfo.dfid" as the index at all; it used the "app" index instead, examining all 1,870,039 index keys (and fetching as many documents) only to return 0 results in 14.5 s.
Our initial read was that the existing indexes did not match the query well enough, so MongoDB failed to use the right one.
f. Added a compound index to improve the match. Query times recovered and the interface returned to normal:
```javascript
db.getCollection("wormhole_user").createIndex(
    { "userbindinfo.dfid": 1, "app": 1, "status": 1 },
    { name: "idx_user_app_userbindinfo.dfid", background: true }
);
```
3. Root-cause analysis
a. Analysis with explain()
```javascript
db.getCollection('wormhole_user').find({
    "userbindinfo.dfid": "duqtmv8lqcinjzwtuyjuantkbri5_yz3yhd9rfvxvg1wogxry0luanpxdhv5snvbbnrryljpnv9zwjn5sgq5c2h1",
    "app": "maybe",
    "status": 1
}).explain();
{
    "queryPlanner": {
        "plannerVersion": 1,
        "namespace": "wormhole.wormhole_user",
        "indexFilterSet": false,
        // parsedQuery trimmed (same three equality predicates as the filter above)
        "winningPlan": {
            "stage": "FETCH",
            "filter": { "$and": [ { "app": { "$eq": "maybe" } }, { "status": { "$eq": 1.0 } } ] },
            "inputStage": {
                "stage": "IXSCAN",
                "keyPattern": { "userbindinfo.dfid": 1 },
                "indexName": "userbindinfo.dfid",
                "direction": "forward",
                "indexBounds": { "userbindinfo.dfid": [ "[\"duqtmv8lqcinjzwtuyjuantkbri5_yz3yhd9rfvxvg1wogxry0luanpxdhv5snvbbnrryljpnv9zwjn5sgq5c2h1\", \"duqtmv8lqcinjzwtuyjuantkbri5_yz3yhd9rfvxvg1wogxry0luanpxdhv5snvbbnrryljpnv9zwjn5sgq5c2h1\"]" ] }
            }
        },
        "rejectedPlans": [
            // 1) FETCH over an IXSCAN of { app: 1 }
            // 2) FETCH over an AND_SORTED index intersection of the { app: 1 }
            //    and { userbindinfo.dfid: 1 } scans
            // (full plan documents trimmed)
        ]
    },
    "serverInfo": { "host": "host-192-168-10-163", "port": 27017, "version": "4.0.3" },
    "ok": 1.0
}
```
- Run this way, MongoDB selects the correct plan (an IXSCAN of "userbindinfo.dfid"), which explains why the service had been healthy for such a long stretch beforehand;
- yet the plan recorded in the slow log was the wrong one, so something must have caused MongoDB's plan selection to change.
b. What the documentation says
- The MongoDB query optimizer caches the most efficient query plan; the associated plan-cache entry is reused for subsequent queries with the same query shape.
- The plan cache is flushed when:
  - mongod restarts;
  - indexes or collections are created, dropped, or otherwise modified;
  - the least-recently-used (LRU) replacement policy evicts the least recently accessed entries.
Conclusion: while the good cached plan was in use, the service was healthy. At some point the cache entry was invalidated and MongoDB re-planned; because the query that triggered re-planning happened to carry a null value, the wrong plan was selected and cached, and every subsequent query of that shape then went slow.
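The plan cache can also be inspected directly from the shell. A sketch using the PlanCache helpers available on this server version (4.0); the placeholder values are illustrative:

```javascript
// List the query shapes that currently have cached plans:
db.getCollection('wormhole_user').getPlanCache().listQueryShapes();

// Show the cached plans for the shape of our query:
db.getCollection('wormhole_user').getPlanCache().getPlansByQuery(
    { "userbindinfo.dfid": "<dfid>", "app": "<app>", "status": 1 }
);
```

Note these shell helpers were replaced by the $planCacheStats aggregation stage in newer MongoDB versions.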
c. Finding the first slow-log entry
```javascript
{
    "op": "query",
    "ns": "wormhole.wormhole_user",
    "command": {
        "find": "wormhole_user",
        "filter": { "userbindinfo.dfid": null, "app": "cybercat", "status": 1 },
        "ntoreturn": -1
    },
    "keysExamined": 872550,
    "docsExamined": 872550,
    "fromMultiPlanner": true,
    "replanned": true,
    "cursorExhausted": true,
    "numYield": 13785,
    "nreturned": 0,
    "responseLength": 36,
    "millis": 27864,
    "planSummary": "IXSCAN { app: 1 }",
    "execStats": {
        "stage": "LIMIT",
        "nReturned": 0,
        "executionTimeMillisEstimate": 26216,
        "limitAmount": 1,
        "inputStage": {
            "stage": "FETCH",
            "filter": { "$and": [ { "status": { "$eq": 1 } }, { "userbindinfo.dfid": { "$eq": null } } ] },
            "docsExamined": 872550,
            "inputStage": {
                "stage": "IXSCAN",
                "nReturned": 872550,
                "keyPattern": { "app": 1 },
                "indexName": "app",
                "direction": "forward",
                "indexBounds": { "app": [ "[\"cybercat\", \"cybercat\"]" ] },
                "keysExamined": 872550,
                "seeks": 1
            }
        }
    },
    // locks / yield counters trimmed
    "ts": { "$date": 1647309521700 },
    "client": "10.65.1.25",
    "user": "mongosiud@wormhole"
}
```
The filter contains a null value, and the entry carries "fromMultiPlanner": true and "replanned": true, i.e. this is exactly the query that triggered re-planning. Reproducing the analysis locally:
```javascript
db.getCollection('wormhole_user').find({ "userbindinfo.dfid": null, "app": "maybe", "status": 1 }).explain();
{
    "queryPlanner": {
        "plannerVersion": 1,
        "namespace": "wormhole.wormhole_user",
        "indexFilterSet": false,
        "parsedQuery": { "$and": [ { "app": { "$eq": "maybe" } }, { "status": { "$eq": 1.0 } }, { "userbindinfo.dfid": { "$eq": null } } ] },
        "winningPlan": {
            "stage": "FETCH",
            "filter": { "$and": [ { "status": { "$eq": 1.0 } }, { "userbindinfo.dfid": { "$eq": null } } ] },
            "inputStage": {
                "stage": "IXSCAN",
                "keyPattern": { "app": 1 },
                "indexName": "app",
                "direction": "forward",
                "indexBounds": { "app": [ "[\"maybe\", \"maybe\"]" ] }
            }
        },
        "rejectedPlans": [
            {
                "stage": "FETCH",
                "filter": { "$and": [ { "userbindinfo.dfid": { "$eq": null } }, { "app": { "$eq": "maybe" } }, { "status": { "$eq": 1.0 } } ] },
                "inputStage": {
                    "stage": "IXSCAN",
                    "keyPattern": { "userbindinfo.dfid": 1 },
                    "indexName": "userbindinfo.dfid",
                    "direction": "forward",
                    "indexBounds": { "userbindinfo.dfid": [ "[undefined, undefined]", "[null, null]" ] }
                }
            }
        ]
    },
    "serverInfo": { "host": "host-192-168-10-163", "port": 27017, "version": "4.0.3" },
    "ok": 1.0
}
```
The explain output matches the behavior seen in production: with a null dfid, the planner prefers the "app" index.
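A minimal application-side guard avoids ever sending the null-equality filter to MongoDB in the first place. A sketch (the findUserByDfid wrapper and its collection parameter are hypothetical, not from the original service):

```javascript
// Hypothetical guard: refuse to query when dfid is missing, since a
// null/empty dfid can never identify a real user binding.
function findUserByDfid(collection, dfid, app) {
    if (dfid === null || dfid === undefined || dfid === "") {
        // Short-circuit instead of sending {"userbindinfo.dfid": null}.
        return null;
    }
    return collection.findOne({ "userbindinfo.dfid": dfid, "app": app, "status": 1 });
}
```

The same idea applies on the Java side: validate dfid before building the Criteria.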
4. Retrospective
During the emergency fix we simply created a new index. Creating the index itself invalidated the plan cache, which forced MongoDB to re-plan and cache a fresh, correct plan; that is why the service recovered.
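With hindsight, an alternative mitigation that avoids adding an index would have been to drop the bad cache entry so the optimizer re-plans on the next query. A sketch using the pre-4.4 shell helpers:

```javascript
// Clear all cached plans for the collection; the next query re-runs plan selection.
db.getCollection('wormhole_user').getPlanCache().clear();

// Or clear only the offending query shape:
db.getCollection('wormhole_user').getPlanCache().clearPlansByQuery(
    { "userbindinfo.dfid": null, "app": "cybercat", "status": 1 }
);
```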
III. Takeaways
- Avoid queries on null or empty values;
- use explain() to analyze the different query shapes your code can generate;
- use MongoDB's PlanCache facilities to inspect the state of the plan cache;
- use hint() to steer a query toward a specific index when needed.
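For example, hint() can pin the lookup from this incident to the dfid index regardless of what the planner would otherwise choose (a sketch; placeholder values are illustrative):

```javascript
db.getCollection('wormhole_user')
    .find({ "userbindinfo.dfid": "<dfid>", "app": "<app>", "status": 1 })
    .hint({ "userbindinfo.dfid": 1 })   // force the single-field dfid index
    .explain();
```

The trade-off is that a hinted query no longer benefits from re-planning if a genuinely better index appears later.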