How ANRs Are Triggered

🔗 Reposted from: https://www.cnblogs.com/huansky/p/14954020.html

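1. Overview

Every Android developer has probably run into an ANR. Why do ANRs happen, and what does the system do once one occurs? This article walks through the answers in detail.

ANR (Application Not Responding) means an application has stopped responding. The Android system requires certain events to be completed within a fixed time window; if the app fails to respond within that window, or takes too long to respond, an ANR results. Typically a dialog then appears telling the user that xxx is not responding, and the user can choose to keep waiting or Force Close.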

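So which scenarios cause an ANR?

  • Service Timeout: for example, a foreground service that does not finish executing within 20s;
  • BroadcastQueue Timeout: for example, a foreground broadcast that is not handled within 10s;
  • ContentProvider Timeout: a content provider whose publish step exceeds the 10s timeout;
  • InputDispatching Timeout: input event dispatch that takes longer than 5s, covering both key and touch events.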

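Triggering an ANR can be broken into three steps: planting the bomb, defusing the bomb, and detonating the bomb.

Planting the bomb means sending a delayed message (the bomb);

Defusing the bomb means cancelling that delayed message (the bomb) so that it never fires;

Detonating the bomb means the delay has elapsed and the delayed message is handled (the bomb goes off).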
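
The three steps can be sketched in plain Java with a toy message queue driven by a manual clock. This is only a model of the pattern, not Android code: FakeHandler, Msg, timesOut and TIMEOUT_MSG are hypothetical names invented for the illustration, and the real system uses android.os.Handler.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Toy model of the plant / defuse / detonate pattern; not Android code. */
class TimeoutBombDemo {
    /** A delayed message: "what" identifies it, "when" is its deadline in ms. */
    static final class Msg {
        final int what;
        final long when;
        Msg(int what, long when) { this.what = what; this.when = when; }
    }

    /** Hypothetical stand-in for a Handler's queue, driven by a manual clock. */
    static final class FakeHandler {
        private final List<Msg> queue = new ArrayList<>();
        final List<Integer> detonated = new ArrayList<>();

        // Plant: enqueue a message that fires at an absolute time.
        void sendMessageAtTime(int what, long when) { queue.add(new Msg(what, when)); }

        // Defuse: drop every pending message with this id.
        void removeMessages(int what) { queue.removeIf(m -> m.what == what); }

        // Detonate: deliver every message whose deadline has passed.
        void advanceTo(long now) {
            Iterator<Msg> it = queue.iterator();
            while (it.hasNext()) {
                Msg m = it.next();
                if (m.when <= now) { detonated.add(m.what); it.remove(); }
            }
        }
    }

    static final int TIMEOUT_MSG = 1;

    /** Returns true if the timeout fired for work that completes at finishedAt. */
    static boolean timesOut(long timeoutMs, long finishedAt) {
        FakeHandler h = new FakeHandler();
        h.sendMessageAtTime(TIMEOUT_MSG, timeoutMs); // plant at t = 0
        h.advanceTo(finishedAt);                     // time passes while the work runs
        h.removeMessages(TIMEOUT_MSG);               // defuse once the work completes
        return h.detonated.contains(TIMEOUT_MSG);
    }

    public static void main(String[] args) {
        System.out.println(timesOut(20_000, 3_000));  // false: defused in time
        System.out.println(timesOut(20_000, 25_000)); // true: the bomb went off first
    }
}
```

The manual clock keeps the example deterministic; a real Handler delivers messages from its Looper based on uptime.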

2. Service

First, a flowchart of the service startup process:

[Image: service startup flowchart]

Service Timeout is triggered when AMS.MainHandler, which runs on the "ActivityManager" thread, receives the SERVICE_TIMEOUT_MSG message.

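Services fall into two categories:

  • For foreground services, the timeout is SERVICE_TIMEOUT = 20s;
  • For background services, the timeout is SERVICE_BACKGROUND_TIMEOUT = 200s.

The field ProcessRecord.execServicesFg determines whether the service is started in the foreground.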

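2.1 Planting the Bomb

While the service's process attaches to the system_server process, realStartServiceLocked() is called, and that is where the bomb is planted.

Let's start with realStartServiceLocked, one of the methods on the service startup path: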

// ActiveServices.java
private final void realStartServiceLocked(ServiceRecord r, ProcessRecord app, boolean execInFg) throws RemoteException {
    ...
    //send the delayed message (SERVICE_TIMEOUT_MSG)
    bumpServiceExecutingLocked(r, execInFg, "create");
    try {
        ...
        //eventually runs the service's onCreate() method
        app.thread.scheduleCreateService(r, r.serviceInfo,
                mAm.compatibilityInfoForPackageLocked(r.serviceInfo.applicationInfo),
                app.repProcState);
    } catch (DeadObjectException e) {
        mAm.appDiedLocked(app);
        throw e;
    } finally {
        ...
    }
}

private final void bumpServiceExecutingLocked(ServiceRecord r, boolean fg, String why) {
    ... 
    scheduleServiceTimeoutLocked(r.app);
}

void scheduleServiceTimeoutLocked(ProcessRecord proc) {
    if (proc.executingServices.size() == 0 || proc.thread == null) {
        return;
    }
    long now = SystemClock.uptimeMillis();
    Message msg = mAm.mHandler.obtainMessage(
            ActivityManagerService.SERVICE_TIMEOUT_MSG);
    msg.obj = proc;
  
    //if SERVICE_TIMEOUT_MSG has still not been removed when the timeout expires, the service timeout flow runs
    mAm.mHandler.sendMessageAtTime(msg,
        proc.execServicesFg ? (now+SERVICE_TIMEOUT) : (now+ SERVICE_BACKGROUND_TIMEOUT));
}

AS.realStartServiceLocked, which starts the service, sends a delayed timeout message, and the delay differs for foreground and background services:

// How long we wait for a service to finish executing. 20s
    static final int SERVICE_TIMEOUT = 20*1000;

    // How long we wait for a service to finish executing. 200s
    static final int SERVICE_BACKGROUND_TIMEOUT = SERVICE_TIMEOUT * 10;
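
As a quick worked example of the arithmetic in scheduleServiceTimeoutLocked, the sketch below recomputes the deadline in plain Java. The class name ServiceDeadlineDemo and the helper timeoutDeadline are made up for illustration; only the two constants and the ternary expression come from the excerpts above.

```java
/** Recomputes the service timeout deadline; a plain-Java illustration only. */
class ServiceDeadlineDemo {
    // Values copied from the ActiveServices excerpt above.
    static final int SERVICE_TIMEOUT = 20 * 1000;
    static final int SERVICE_BACKGROUND_TIMEOUT = SERVICE_TIMEOUT * 10;

    /** Hypothetical helper: absolute time at which the timeout message would fire. */
    static long timeoutDeadline(long now, boolean execServicesFg) {
        return now + (execServicesFg ? SERVICE_TIMEOUT : SERVICE_BACKGROUND_TIMEOUT);
    }

    public static void main(String[] args) {
        long now = 1_000; // pretend uptime in milliseconds
        System.out.println(timeoutDeadline(now, true));  // 21000
        System.out.println(timeoutDeadline(now, false)); // 201000
    }
}
```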

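2.2 Defusing the Bomb

Calling AS.realStartServiceLocked() plants a bomb that explodes if startup does not complete in time. So when is the fuse pulled? After a chain of Binder calls, execution reaches handleCreateService() on the main thread of the target process.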

// ActivityThread (note: ApplicationThread is its inner class)
private void handleCreateService(CreateServiceData data) {
        ...
        java.lang.ClassLoader cl = packageInfo.getClassLoader();
        Service service = (Service) cl.loadClass(data.info.name).newInstance();
        ...

        try {
            //create the ContextImpl object
            ContextImpl context = ContextImpl.createAppContext(this, packageInfo);
            context.setOuterContext(service);
            //create the Application object
            Application app = packageInfo.makeApplication(false, mInstrumentation);
            service.attach(context, this, data.info.name, data.token, app,
                    ActivityManagerNative.getDefault());
            //call the service's onCreate() method
            service.onCreate();
      
            //report back to system_server that execution has finished
            ActivityManagerNative.getDefault().serviceDoneExecuting(
                    data.token, SERVICE_DONE_EXECUTING_ANON, 0, 0);
        } catch (Exception e) {
            ...
        }
    }

This step creates the target service object and calls back its onCreate() method; then, after several more calls, control returns to system_server, which executes serviceDoneExecuting.

// ActiveServices
private void serviceDoneExecutingLocked(ServiceRecord r, boolean inDestroying, boolean finishing) {
    ...
    if (r.executeNesting <= 0) {
        if (r.app != null) {
            r.app.execServicesFg = false;
            r.app.executingServices.remove(r);
            if (r.app.executingServices.size() == 0) {
                //no service is currently executing in this service's process
                mAm.mHandler.removeMessages(ActivityManagerService.SERVICE_TIMEOUT_MSG, r.app);
        ...
    }
    ...
}
// How long we wait for a service to finish executing.
static final int SERVICE_TIMEOUT = 20*1000;

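Once the service has finished starting, this method removes the service timeout message SERVICE_TIMEOUT_MSG, which for a foreground service had been scheduled 20s out.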

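2.3 Detonating the Bomb

The previous sections covered planting and defusing the bomb. If the bomb is defused before the countdown ends, it never gets the chance to explode. But things do not always go to plan: in some extreme cases the bomb cannot be defused in time and it goes off, which results in an ANR in the app. Let's look at the scene of the explosion.

The system_server process has a Handler thread; when the countdown ends, the Handler on that thread receives the SERVICE_TIMEOUT_MSG message: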

// ActivityManagerService.java ::MainHandler
 final class MainHandler extends Handler {
        public MainHandler(Looper looper) {
            super(looper, null, true);
        }

        @Override
        public void handleMessage(Message msg) {
            switch (msg.what) {
                ......
                case SERVICE_TIMEOUT_MSG: {
                    mServices.serviceTimeout((ProcessRecord)msg.obj);
                } break;
            }
        }
 }

Once the delay has elapsed the message is handled. Here is the handling logic:

void serviceTimeout(ProcessRecord proc) {
    String anrMessage = null;

    synchronized(mAm) {
        if (proc.executingServices.size() == 0 || proc.thread == null) {
            return;
        }
        final long now = SystemClock.uptimeMillis();
        final long maxTime =  now -
                (proc.execServicesFg ? SERVICE_TIMEOUT : SERVICE_BACKGROUND_TIMEOUT);
        ServiceRecord timeout = null;
        long nextTime = 0;
        for (int i=proc.executingServices.size()-1; i>=0; i--) {
            // get a service that is currently executing in this process
            ServiceRecord sr = proc.executingServices.valueAt(i);
            if (sr.executingStart < maxTime) {
                timeout = sr;
                break;
            }
            if (sr.executingStart > nextTime) {
                nextTime = sr.executingStart;
            }
        }
        if (timeout != null && mAm.mLruProcesses.contains(proc)) {
            Slog.w(TAG, "Timeout executing service: " + timeout);
            StringWriter sw = new StringWriter();
            PrintWriter pw = new FastPrintWriter(sw, false, 1024);
            pw.println(timeout);
            timeout.dump(pw, " ");
            pw.close();
            mLastAnrDump = sw.toString();
            mAm.mHandler.removeCallbacks(mLastAnrDumpClearer);
            mAm.mHandler.postDelayed(mLastAnrDumpClearer, LAST_ANR_LIFETIME_DURATION_MSECS);
            anrMessage = "executing service " + timeout.shortName;
        }
    }

    if (anrMessage != null) {
        //a service has timed out, so call appNotResponding
        mAm.appNotResponding(proc, null, null, false, anrMessage);
    }
}

Here anrMessage has the form "executing service [info from the timed-out ServiceRecord]".
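
The scan inside serviceTimeout can be tried out in isolation. The sketch below is a plain-Java approximation under simplifying assumptions: each executing service is reduced to its executingStart timestamp, and findTimedOut is a hypothetical helper rather than an AOSP method. It mirrors only the cutoff computation and the backwards loop from the excerpt.

```java
import java.util.Arrays;
import java.util.List;

/** Plain-Java approximation of the scan in serviceTimeout(); not AOSP code. */
class ServiceTimeoutScanDemo {
    static final long SERVICE_TIMEOUT = 20_000;
    static final long SERVICE_BACKGROUND_TIMEOUT = SERVICE_TIMEOUT * 10;

    /**
     * Scans from the last entry to the first and returns the index of the first
     * service found that started before the cutoff, or -1 if none has timed out.
     */
    static int findTimedOut(List<Long> executingStarts, long now, boolean execServicesFg) {
        // A service that started before maxTime has been executing longer than the timeout.
        long maxTime = now - (execServicesFg ? SERVICE_TIMEOUT : SERVICE_BACKGROUND_TIMEOUT);
        for (int i = executingStarts.size() - 1; i >= 0; i--) {
            if (executingStarts.get(i) < maxTime) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<Long> starts = Arrays.asList(5_000L, 90_000L);
        // At now = 100000 the foreground cutoff is 80000, so only index 0 is overdue.
        System.out.println(findTimedOut(starts, 100_000, true));  // 0
        // The background cutoff is 200s back, so nothing is overdue yet.
        System.out.println(findTimedOut(starts, 100_000, false)); // -1
    }
}
```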

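2.4 Foreground vs. Background Services

The system allows 20s for a foreground service to start and 200s for a background service. How does it tell the two apart? Here is the core logic in ActiveServices: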

ComponentName startServiceLocked(...) {
    final boolean callerFg;
    if (caller != null) {
        final ProcessRecord callerApp = mAm.getRecordForAppLocked(caller);
        callerFg = callerApp.setSchedGroup != ProcessList.SCHED_GROUP_BACKGROUND;
    } else {
        callerFg = true;
    }
    ...
    ComponentName cmp = startServiceInnerLocked(smap, service, r, callerFg, addToStarting);
    return cmp;
}

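During startService, the scheduling group of the calling process, callerApp, decides whether the service being started counts as foreground or background. If the caller's scheduling group is not ProcessList.SCHED_GROUP_BACKGROUND (the background group), the service is treated as a foreground service; otherwise it is a background service. The result is recorded in the ServiceRecord field createdFromFg.

Which processes belong to the SCHED_GROUP_BACKGROUND scheduling group? Scheduling groups fall roughly into top, foreground, and background. The algorithms for process priority (Adj) and scheduling group (SCHED_GROUP) are fairly complex, but the relationship can be approximated as follows: a process with Adj equal to 0 belongs to the top group, a process with Adj equal to 100 or 200 belongs to the foreground group, and a process with Adj greater than 200 belongs to the background group. The meaning of each Adj value is given in the table below. In short, processes with Adj > 200 are essentially invisible to the user and mostly do background work, so background services get a longer timeout threshold. Because background services belong to the background scheduling group, they are also allocated fewer CPU time slices than foreground services, which belong to the foreground scheduling group.

[Image: table of Adj values and their meanings]

Strictly speaking, a foreground service here is a service started by a process in the foreground scheduling group. This differs from what is usually called an fg-service, which is a service with a foreground notification attached.

One caveat: if the log shows Reason: executing service com.example.baidu/.AnrService, the service itself is not necessarily the slow part. For example, if the app performs a time-consuming operation on the main thread right after starting the service, the service's onCreate or onStartCommand cannot run, and once the timeout expires an ANR is still reported.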

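3. BroadcastReceiver

BroadcastReceiver Timeout is triggered when BroadcastQueue.BroadcastHandler, which runs on the "ActivityManager" thread, receives the BROADCAST_TIMEOUT_MSG message.

There are two broadcast queues, the foreground queue and the background queue:

  • For foreground broadcasts, the timeout is BROADCAST_FG_TIMEOUT = 10s;
  • For background broadcasts, the timeout is BROADCAST_BG_TIMEOUT = 60s.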

3.1 Planting the Bomb

First, the logic for sending a broadcast:

// ActivityManagerService.java
public final int broadcastIntent(IApplicationThread caller,
            Intent intent, String resolvedType, IIntentReceiver resultTo,
            int resultCode, String resultData, Bundle resultExtras,
            String[] requiredPermissions, int appOp, Bundle bOptions,
            boolean serialized, boolean sticky, int userId) {
        enforceNotIsolatedCaller("broadcastIntent");
        synchronized(this) {
            // validate the broadcast
            intent = verifyBroadcastLocked(intent);
            // get the record of the process sending the broadcast
            final ProcessRecord callerApp = getRecordForAppLocked(caller);
            final int callingPid = Binder.getCallingPid();
            final int callingUid = Binder.getCallingUid();
            final long origId = Binder.clearCallingIdentity();
            try {
                return broadcastIntentLocked(callerApp,
                        callerApp != null ? callerApp.info.packageName : null,
                        intent, resolvedType, resultTo, resultCode, resultData, resultExtras,
                        requiredPermissions, appOp, bOptions, serialized, sticky,
                        callingPid, callingUid, callingUid, callingPid, userId);
            } finally {
                Binder.restoreCallingIdentity(origId);
            }
        }
    }

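broadcastIntent() takes two boolean parameters, serialized and sticky, which together determine whether the broadcast is a normal broadcast, an ordered broadcast, or a sticky broadcast: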

Type                  serialized  sticky
sendBroadcast         false       false
sendOrderedBroadcast  true        false
sendStickyBroadcast   false       true
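
The table can be read as a small lookup. The sketch below encodes exactly the three rows listed; kindOf is a hypothetical helper, and the combination the table does not list (both flags true) is deliberately reported as uncovered rather than guessed at.

```java
/** Encodes the serialized/sticky table above as a lookup; illustration only. */
class BroadcastKindDemo {
    /** Hypothetical helper: maps the two flags to the send method named in the table. */
    static String kindOf(boolean serialized, boolean sticky) {
        if (!serialized && !sticky) return "sendBroadcast";
        if (serialized && !sticky) return "sendOrderedBroadcast";
        if (!serialized && sticky) return "sendStickyBroadcast";
        return "not covered by the table"; // serialized && sticky
    }

    public static void main(String[] args) {
        System.out.println(kindOf(false, false)); // sendBroadcast
        System.out.println(kindOf(true, false));  // sendOrderedBroadcast
        System.out.println(kindOf(false, true));  // sendStickyBroadcast
    }
}
```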

With sending covered, next comes receiving broadcasts.

Once a broadcast has been sent, it has to be stored in a queue for processing.

// ActivityManagerService
  public ActivityManagerService(Context systemContext, ActivityTaskManagerService atm) {
    // ...... three queues are created to hold the different kinds of broadcasts
        mFgBroadcastQueue = new BroadcastQueue(this, mHandler,
                "foreground", foreConstants, false);
        mBgBroadcastQueue = new BroadcastQueue(this, mHandler,
                "background", backConstants, true);
        mOffloadBroadcastQueue = new BroadcastQueue(this, mHandler,
                "offload", offloadConstants, true);
        mBroadcastQueues[0] = mFgBroadcastQueue;
        mBroadcastQueues[1] = mBgBroadcastQueue;
        mBroadcastQueues[2] = mOffloadBroadcastQueue;
  
}

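The AMS constructor sorts broadcasts into three categories (foreground, background, and offload) and stores the three queues together in an array. The handler passed in is the MainHandler, which runs on the "ActivityManager" thread; it is passed in only so that its Looper can be obtained.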

BroadcastQueue(ActivityManagerService service, Handler handler,
            String name, BroadcastConstants constants, boolean allowDelayBehindServices) {
        mService = service;
        // the broadcast handler is created on the Looper of the handler passed in from AMS
        mHandler = new BroadcastHandler(handler.getLooper());
        mQueueName = name;
        mDelayBehindServices = allowDelayBehindServices;
        mConstants = constants;
        mDispatcher = new BroadcastDispatcher(this, mConstants, mHandler, mService);
    }

Next, the logic that handles broadcasts:

private final class BroadcastHandler extends Handler {
        public BroadcastHandler(Looper looper) {
            super(looper, null, true);
        }

        @Override
        public void handleMessage(Message msg) {
            switch (msg.what) {
                case BROADCAST_INTENT_MSG: {
                    if (DEBUG_BROADCAST) Slog.v(
                            TAG_BROADCAST, "Received BROADCAST_INTENT_MSG ["
                            + mQueueName + "]");
                    // start processing the broadcast
                    processNextBroadcast(true);
                } break;
                case BROADCAST_TIMEOUT_MSG: {
                    synchronized (mService) {
                        broadcastTimeoutLocked(true);
                    }
                } break;
            }
        }
    }

As shown, broadcasts are processed by calling processNextBroadcast.

final void processNextBroadcast(boolean fromMsg) {
    synchronized(mService) {
        //part1: handle parallel broadcasts
        while (mParallelBroadcasts.size() > 0) {
            r = mParallelBroadcasts.remove(0);
            r.dispatchTime = SystemClock.uptimeMillis();
            r.dispatchClockTime = System.currentTimeMillis();
            final int N = r.receivers.size();
            for (int i=0; i<N; i++) {
                Object target = r.receivers.get(i);
                //deliver the broadcast to registered receivers
                deliverToRegisteredReceiverLocked(r, (BroadcastFilter)target, false);
            }
            addBroadcastToHistoryLocked(r); //add the broadcast to the history statistics
        }

        //part2: handle the current ordered broadcast
        do {
            if (mOrderedBroadcasts.size() == 0) {
                mService.scheduleAppGcsLocked(); //no more broadcasts waiting to be processed
                if (looped) {
                    mService.updateOomAdjLocked();
                }
                return;
            }
            r = mOrderedBroadcasts.get(0); //get the first ordered (serialized) broadcast
            boolean forceReceive = false;
            int numReceivers = (r.receivers != null) ? r.receivers.size() : 0;
            if (mService.mProcessesReady && r.dispatchTime > 0) {
                long now = SystemClock.uptimeMillis();
                if ((numReceivers > 0) && (now > r.dispatchTime + (2*mTimeoutPeriod*numReceivers))) {
                    broadcastTimeoutLocked(false); //the broadcast has timed out, so forcibly finish it
                }
            }
            ...
            if (r.receivers == null || r.nextReceiver >= numReceivers
                    || r.resultAbort || forceReceive) {
                if (r.resultTo != null) {
                    //handle the broadcast message; this ends up calling onReceive()
                    performReceiveLocked(r.callerApp, r.resultTo,
                        new Intent(r.intent), r.resultCode,
                        r.resultData, r.resultExtras, false, false, r.userId);
                }

                cancelBroadcastTimeoutLocked(); //cancel the BROADCAST_TIMEOUT_MSG message
                addBroadcastToHistoryLocked(r);
                mOrderedBroadcasts.remove(0);
                continue;
            }
        } while (r == null);

        //part3: get the next receiver
        r.receiverTime = SystemClock.uptimeMillis();
        if (recIdx == 0) {
            r.dispatchTime = r.receiverTime;
            r.dispatchClockTime = System.currentTimeMillis();
        }
        if (!mPendingBroadcastTimeoutMessage) {
            long timeoutTime = r.receiverTime + mTimeoutPeriod;
            setBroadcastTimeoutLocked(timeoutTime); //schedule the delayed broadcast timeout message
        }

        //part4: handle the next ordered broadcast
        ProcessRecord app = mService.getProcessRecordLocked(targetProcess,
                info.activityInfo.applicationInfo.uid, false);
        if (app != null && app.thread != null) {
            app.addPackage(info.activityInfo.packageName,
                    info.activityInfo.applicationInfo.versionCode, mService.mProcessStats);
            processCurBroadcastLocked(r, app); //[handle the ordered broadcast]
            return;
            ...
        }

        //the receiver's process has not been started yet, so create it
        if ((r.curApp=mService.startProcessLocked(targetProcess,
                info.activityInfo.applicationInfo, true,
                r.intent.getFlags() | Intent.FLAG_FROM_BACKGROUND,
                "broadcast", r.curComponent,
                (r.intent.getFlags()&Intent.FLAG_RECEIVER_BOOT_UPGRADE) != 0, false, false))
                        == null) {
            ...
            return;
        }
    }
}
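
The excerpt contains two distinct time checks: a per-receiver deadline armed in part3 (receiverTime + mTimeoutPeriod) and a whole-broadcast limit tested in part2 (dispatchTime + 2 * mTimeoutPeriod * numReceivers). The sketch below reproduces just that arithmetic in plain Java so the numbers can be checked by hand. The class and method names are hypothetical, and the mProcessesReady condition from the original is left out for brevity.

```java
/** Reproduces the two broadcast timeout formulas; a sketch, not AOSP code. */
class BroadcastTimeoutMathDemo {
    /** Part3: the absolute time at which the timeout message for one receiver fires. */
    static long receiverDeadline(long receiverTime, long timeoutPeriod) {
        return receiverTime + timeoutPeriod;
    }

    /** Part2: true once the whole record has run longer than 2 * timeout * receivers. */
    static boolean wholeBroadcastOverdue(long now, long dispatchTime,
                                         long timeoutPeriod, int numReceivers) {
        return dispatchTime > 0
                && numReceivers > 0
                && now > dispatchTime + 2 * timeoutPeriod * numReceivers;
    }

    public static void main(String[] args) {
        long timeout = 10_000; // the default TIMEOUT shown in BroadcastConstants
        System.out.println(receiverDeadline(5_000, timeout));                // 15000
        // Three receivers dispatched at t = 1000 give a limit of 1000 + 60000 = 61000.
        System.out.println(wholeBroadcastOverdue(61_000, 1_000, timeout, 3)); // false
        System.out.println(wholeBroadcastOverdue(61_001, 1_000, timeout, 3)); // true
    }
}
```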

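The timing of broadcast timeout handling is as follows:

  1. First, in part3, setBroadcastTimeoutLocked(timeoutTime) schedules the broadcast timeout message;
  2. Then, in part2, the outcome depends on how processing went:
    • If the receivers have taken too long, broadcastTimeoutLocked(false) is called, which detonates the bomb;
    • If the broadcast has finished executing, cancelBroadcastTimeoutLocked is called, which defuses the bomb.

// BroadcastQueue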
final void setBroadcastTimeoutLocked(long timeoutTime) {
    if (! mPendingBroadcastTimeoutMessage) {
        Message msg = mHandler.obtainMessage(BROADCAST_TIMEOUT_MSG, this);
        mHandler.sendMessageAtTime(msg, timeoutTime);
        mPendingBroadcastTimeoutMessage = true;
    }
}

This schedules the delayed message BROADCAST_TIMEOUT_MSG: if the broadcast has still not been handled mTimeoutPeriod from now, the broadcast timeout flow begins.

// BroadcastConstants.java 

   private static final long DEFAULT_TIMEOUT = 10_000;
    // Timeout period for this broadcast queue
    public long TIMEOUT = DEFAULT_TIMEOUT;
    // Unspecified fields retain their current value rather than revert to default, so the timeout can be overridden
    TIMEOUT = mParser.getLong(KEY_TIMEOUT, TIMEOUT);

As for the actual value, the default timeout is 10s.

3.2 Defusing the Bomb

The timeout mechanism for broadcasts is much the same as for services:

// Cancel the timeout
final void cancelBroadcastTimeoutLocked() {
        if (mPendingBroadcastTimeoutMessage) {
            // remove the message
            mHandler.removeMessages(BROADCAST_TIMEOUT_MSG, this);
            mPendingBroadcastTimeoutMessage = false;
        }
    }

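Removing the broadcast timeout message BROADCAST_TIMEOUT_MSG defuses the bomb.

3.3 Detonating the Bomb

Now for the detonation logic. The handler implementation in BroadcastQueue was covered above, so let's go straight to the timeout handling: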

//fromMsg = true
final void broadcastTimeoutLocked(boolean fromMsg) {
    if (fromMsg) {
        mPendingBroadcastTimeoutMessage = false;
    }

    if (mOrderedBroadcasts.size() == 0) {
        return;
    }

    long now = SystemClock.uptimeMillis();
    BroadcastRecord r = mOrderedBroadcasts.get(0);
    if (fromMsg) {
        if (mService.mDidDexOpt) {
            mService.mDidDexOpt = false;
            long timeoutTime = SystemClock.uptimeMillis() + mTimeoutPeriod;
            setBroadcastTimeoutLocked(timeoutTime);
            return;
        }
  
        if (!mService.mProcessesReady) {
            return; //while the system is not yet ready, broadcast processing has no timeouts
        }

        long timeoutTime = r.receiverTime + mTimeoutPeriod;
        if (timeoutTime > now) {
            //the receiver currently executing has not timed out, so reschedule the broadcast timeout
            setBroadcastTimeoutLocked(timeoutTime);
            return;
        }
    }

    BroadcastRecord br = mOrderedBroadcasts.get(0);
    if (br.state == BroadcastRecord.WAITING_SERVICES) {
        //the broadcast has been handled but must wait for started services to finish; after waiting long enough, move on to the next broadcast
        br.curComponent = null;
        br.state = BroadcastRecord.IDLE;
        processNextBroadcast(false);
        return;
    }

    r.receiverTime = now;
    //increment the ANR count of the current BroadcastRecord
    r.anrCount++;

    if (r.nextReceiver <= 0) {
        return;
    }
    ...
  
    Object curReceiver = r.receivers.get(r.nextReceiver-1);
    //look up the app process
    if (curReceiver instanceof BroadcastFilter) {
        BroadcastFilter bf = (BroadcastFilter)curReceiver;
        if (bf.receiverList.pid != 0
                && bf.receiverList.pid != ActivityManagerService.MY_PID) {
            synchronized (mService.mPidsSelfLocked) {
                app = mService.mPidsSelfLocked.get(
                        bf.receiverList.pid);
            }
        }
    } else {
        app = r.curApp;
    }

    if (app != null) {
        anrMessage = "Broadcast of " + r.intent.toString();
    }

    if (mPendingBroadcast == r) {
        mPendingBroadcast = null;
    }

    //move on to the next broadcast receiver
    finishReceiverLocked(r, r.resultCode, r.resultData,
            r.resultExtras, r.resultAbort, false);
    scheduleBroadcastsLocked();

    if (anrMessage != null) {
        // post the ANR, carrying the ANR process info and the ANR message
        mHandler.post(new AppNotResponding(app, anrMessage));
    }
}
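
One detail in broadcastTimeoutLocked is easy to miss: when the timeout message arrives, the method recomputes the deadline from the current receiver's receiverTime and only proceeds to the ANR path if that deadline has really passed. As the excerpts read, this matters because setBroadcastTimeoutLocked does nothing while a message is already pending, so the pending message may carry an earlier receiver's deadline. The sketch below models that single decision in plain Java; onTimeoutMessage and the ANR sentinel are hypothetical names, and everything else in the method is omitted.

```java
/** Models the re-arm-or-ANR decision taken when the timeout message arrives. */
class BroadcastTimeoutDecisionDemo {
    /** Sentinel meaning "the current receiver really timed out". */
    static final long ANR = -1;

    /**
     * Returns the time at which the timeout should be re-armed if the current
     * receiver still has time left, or ANR if its deadline has passed.
     */
    static long onTimeoutMessage(long now, long receiverTime, long timeoutPeriod) {
        long timeoutTime = receiverTime + timeoutPeriod;
        if (timeoutTime > now) {
            return timeoutTime; // not overdue yet: push the deadline back
        }
        return ANR;
    }

    public static void main(String[] args) {
        long timeout = 10_000;
        // A message armed at t = 0 fires at t = 10000, but the current receiver
        // only started at t = 4000, so the deadline moves to 14000.
        System.out.println(onTimeoutMessage(10_000, 4_000, timeout)); // 14000
        // The receiver is still running at t = 14000, so this time it is an ANR.
        System.out.println(onTimeoutMessage(14_000, 4_000, timeout)); // -1
    }
}
```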