Flutter Test Pyramid: Layers, Coverage, and E2E with Appium

Test pyramid distribution, code coverage, handling native dialogs in integration tests, why Appium over Maestro, and E2E options: CI emulator + Slack, Firebase Test Lab, BrowserStack.


Introduction

The test pyramid (many unit tests at the base, integration tests in the middle, and a few E2E tests at the top) keeps feedback fast and limits E2E tests to critical flows.

In Flutter, widget tests are often treated as a separate layer. This article covers layer distribution, code coverage, handling native dialogs in integration tests, why Appium over Maestro, and options for running E2E and sharing its results (screenshots, Slack notifications; optionally Firebase Test Lab or BrowserStack).


Test pyramid: layers and distribution

Unit (60–70%)

Providers, services, models; isolated with mocks. Fastest layer.
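
For illustration, a unit test for a small service might look like the sketch below; AuthApi, FakeAuthApi, and AuthService are hypothetical names, and the hand-written fake stands in for the role a mock would play.

```dart
import 'package:flutter_test/flutter_test.dart';

// Hypothetical dependency and service, used only to illustrate the unit layer.
abstract class AuthApi {
  Future<bool> login(String user, String password);
}

class FakeAuthApi implements AuthApi {
  @override
  Future<bool> login(String user, String password) async => user == 'demo';
}

class AuthService {
  AuthService(this.api);
  final AuthApi api;

  Future<String> signIn(String user, String password) async =>
      await api.login(user, password) ? 'signed-in' : 'failed';
}

void main() {
  test('signIn succeeds for a known user', () async {
    // The real API is replaced with a fake, keeping the test fast and isolated.
    final service = AuthService(FakeAuthApi());
    expect(await service.signIn('demo', 'secret'), 'signed-in');
  });

  test('signIn fails for an unknown user', () async {
    final service = AuthService(FakeAuthApi());
    expect(await service.signIn('other', 'secret'), 'failed');
  });
}
```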

Widget (20–30%)

Screens and components; tested with testWidgets, finders, and semantics. E2E element lookup can reuse the same semantics labels.
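
As a sketch, the widget test below exercises a hypothetical counter screen; the Semantics label increment_button is an invented identifier, and the same label is what an E2E selector could later target.

```dart
import 'package:flutter/material.dart';
import 'package:flutter_test/flutter_test.dart';

// Hypothetical counter screen, used only to illustrate the widget layer.
class CounterScreen extends StatefulWidget {
  const CounterScreen({super.key});
  @override
  State<CounterScreen> createState() => _CounterScreenState();
}

class _CounterScreenState extends State<CounterScreen> {
  int count = 0;
  @override
  Widget build(BuildContext context) {
    return Scaffold(
      body: Center(child: Text('Count: $count')),
      floatingActionButton: Semantics(
        // Invented label; E2E selectors can target the same identifier.
        label: 'increment_button',
        child: FloatingActionButton(
          onPressed: () => setState(() => count++),
          child: const Icon(Icons.add),
        ),
      ),
    );
  }
}

void main() {
  testWidgets('tapping the button increments the counter', (tester) async {
    await tester.pumpWidget(const MaterialApp(home: CounterScreen()));

    expect(find.text('Count: 0'), findsOneWidget);

    await tester.tap(find.bySemanticsLabel('increment_button'));
    await tester.pump();

    expect(find.text('Count: 1'), findsOneWidget);
  });
}
```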

Integration (5–10%)

Runs the full app on a real device or emulator to cover user flows. Native dialog handling is discussed below.
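
A minimal integration_test sketch is shown below; the package name example_tests, its main() entry point, and the 'Welcome' text on the first screen are assumptions to adjust for your app.

```dart
import 'package:flutter_test/flutter_test.dart';
import 'package:integration_test/integration_test.dart';

// Assumed package name and entry point; adjust to your app.
import 'package:example_tests/main.dart' as app;

void main() {
  IntegrationTestWidgetsFlutterBinding.ensureInitialized();

  // Run with: flutter test integration_test
  testWidgets('app launches and reaches the home screen', (tester) async {
    app.main();
    await tester.pumpAndSettle();

    // 'Welcome' is an assumed text on the first screen of the example app.
    expect(find.text('Welcome'), findsOneWidget);
  });
}
```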

E2E (1–5%)

Runs against the real APK/IPA, with automation driven from outside the app. Tools: Appium, Detox, Maestro, etc. This article uses Appium (WebdriverIO) as the example.

Ratios are a guideline, not a rule: put most logic in unit tests, UI behaviour in widget tests, critical flows in integration tests, and keep E2E to a small set of system-level checks.


Code coverage

In Flutter, code coverage is measured with unit and widget tests: flutter test --coverage produces an LCOV report (coverage/lcov.info); integration and E2E tests are not included in that report.

Most critical code in the lib/ directory should be covered by unit and widget tests; a common target is 80%+ line/statement coverage. The threshold can be raised or set per critical module depending on project risk.


Integration: Native dialogs

In integration tests the app runs on a real device, so native system dialogs (permissions, location, notifications) may appear. These cannot be reliably controlled directly from the Flutter integration test environment, and their timing and text vary by device and OS version. Automation may fail to find the “Allow” / “Deny” button or miss the moment the dialog appears; tests become flaky and non-deterministic.

How to handle it:

  • Skip in test mode: The app should not trigger real permission prompts when integration tests run; a test-only flag (e.g. skipPermissionRequests) skips permission steps so the flow continues without dialogs.
  • Mock / fake data: Use SharedPreferences, test flags, or dependency overrides so the app behaves as if permission was already granted; then the native dialog never appears.
  • Handle separately in E2E: If real dialogs appear in E2E, use a helper on the Appium/WebdriverIO side to tap the dialog (permission text / button selectors); in Flutter integration tests, prefer skip/mock where possible.

That keeps integration flows deterministic; native dialog behaviour can be covered in separate E2E or manual scenarios.
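
One possible wiring for the skip flag and the “already granted” fake state is sketched below; the --dart-define name, the ensureLocationPermission helper, and the SharedPreferences key are hypothetical.

```dart
import 'package:flutter_test/flutter_test.dart';
import 'package:integration_test/integration_test.dart';
import 'package:shared_preferences/shared_preferences.dart';

// Hypothetical test-only flag, e.g. passed as:
//   flutter test integration_test --dart-define=SKIP_PERMISSION_REQUESTS=true
const bool skipPermissionRequests =
    bool.fromEnvironment('SKIP_PERMISSION_REQUESTS', defaultValue: false);

// Hypothetical app-side guard around the real permission request.
Future<void> ensureLocationPermission(
    Future<void> Function() requestPermission) async {
  if (skipPermissionRequests) return; // no native dialog in test mode
  await requestPermission();
}

void main() {
  IntegrationTestWidgetsFlutterBinding.ensureInitialized();

  testWidgets('flow continues without a native permission dialog',
      (tester) async {
    // Alternative to the flag: fake the "already granted" state the app reads.
    SharedPreferences.setMockInitialValues({'location_granted': true});

    var requested = false;
    await ensureLocationPermission(() async {
      requested = true; // would show the real prompt in production
    });

    // With the flag set, the native prompt is never triggered.
    expect(requested, !skipPermissionRequests);
    // app.main() and the rest of the user flow would follow here.
  });
}
```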


Maestro vs Appium: Why Appium?

Common E2E options include Maestro (YAML-based, easy setup, Flutter-focused flows) and Appium (WebDriver protocol, language-agnostic, many client libraries). This article uses Appium; in short, the reasons:

  • One stack across multiple platforms: The same suite (WebdriverIO + Mocha) can run on Android and iOS, on a local emulator or on cloud devices (BrowserStack, Firebase Test Lab); switching is done via driver config (capabilities, env).
  • CI integration: Screenshots, video recordings, reports, and Slack notifications fit into existing tooling (GitHub Actions, artifacts, webhooks); results and visual evidence can be collected and shared automatically.
  • Maturity and control: With Appium + WebdriverIO, selectors, timeouts, retries, and reporting are all in code; Maestro uses less code and more YAML; however, for complex scenarios and debugging, Appium provides finer control.

Maestro is a good fit for quick prototypes or Flutter-only E2E; here, multi-environment (local + cloud), result artifacts, and Slack notifications were priorities, so Appium was chosen.


E2E tests: Options by use case

These three options are not sequential steps to move through when one isn’t enough; they are different choices for different use cases. For each: what it does, why, and when (if at all) to use it:

1. Run emulator in CI/CD and send results to Slack

What: A workflow (e.g. test.yml) starts an Android emulator, runs the Appium E2E suite, captures screenshots and recordings, and optionally sends them to Slack as messages or attachments.

Why: Provides fast feedback on every run and allows E2E results to be viewed directly in Slack as well as in CI artifacts.

When / if needed: On every PR or commit, or via manual/trigger; use it when you want to share pass/fail and visual evidence (screenshots, video) in Slack.

Secrets:

  • SLACK_WEBHOOK_URL — Incoming webhook; the workflow sends an “E2E Tests Failed” message with a run link when the job fails, and when the secret is set, the E2E hooks send a test-summary message after the run.
  • SLACK_BOT_TOKEN, SLACK_CHANNEL_ID — Bot token and channel ID; workflow step “Upload Test Results to Slack” posts a header and per-test result messages, and uploads screenshots/videos into each test’s thread.

2. Using Firebase Test Lab

What: The APK is uploaded to Firebase Test Lab; Robo or instrumented tests run on real or virtual devices. A separate workflow (e.g. test-firebase.yml) handles upload and result retrieval.

Why: Real-device testing in the Google ecosystem; FTL reports and results stored in a bucket.

When / if needed: Periodic (e.g. nightly) or manual runs; use when you want real devices and FTL reports.

Secrets:

  • GCP_SA_KEY — Full contents of the service account JSON; used as credentials_json in the workflow.
  • GCP_PROJECT_ID — GCP / Firebase project ID; used for bucket name and gcloud config.

Roles:

  • Firebase Test Lab Admin (roles/cloudtestservice.testAdmin) — At project level; run tests and access results.
  • Storage Object Creator or Storage Object Admin (roles/storage.objectCreator / roles/storage.objectAdmin) — On the custom bucket (gs://ftl-results-<GCP_PROJECT_ID>); for writing results.

Billing must be enabled for the project. The bucket must be created and the storage role assigned to the service account once.


3. Using BrowserStack

What: The same Appium E2E suite runs on BrowserStack App Automate on cloud devices. The workflow (e.g. test-browserstack.yml) builds the APK or accepts it from outside; tests run on the chosen device/OS.

Why: Wide device matrix (brand, model, OS version); device provisioning and maintenance are on BrowserStack; the same test code is validated on different devices.

When / if needed: When you want E2E validation on specific device or OS combinations; run manually or via trigger.

Secrets:

  • BROWSERSTACK_USERNAME — BrowserStack account username; for API and App Automate access.
  • BROWSERSTACK_ACCESS_KEY — Account access key; used with BROWSERSTACK_USERNAME for authentication.

Usage:

The APK is built in the workflow or provided via apk_url (workflow input); it is uploaded to BrowserStack and run on the selected device.

The same E2E test code (e2e/) is used; when BROWSERSTACK_USERNAME / BROWSERSTACK_ACCESS_KEY are set in the driver config, Appium connects to the BrowserStack hub and runs on the cloud device with the chosen capabilities (device, OS).


Summary

Choose based on your use case: CI emulator + Slack notifications for fast feedback; Firebase Test Lab for real device testing and reports; BrowserStack for device matrix coverage using the same test code.


Thank you for reading; I hope it was helpful. If you have any questions, feel free to ask on LinkedIn.

You can access the source code from the link below:

https://github.com/hasankarli/example_tests