Computer use agents can interact with websites the same way a person would: by opening a browser, inspecting the page, and taking the next best action from what they see. In this lesson, you'll build a browser automation agent that searches Airbnb, extracts structured listing data, and identifies the cheapest stay in Stockholm.
The lesson combines Browser-Use for AI-driven navigation, Playwright and Chrome DevTools Protocol (CDP) for browser control, Azure OpenAI for vision-enabled reasoning, and Pydantic for structured extraction.
This lesson will cover:
- Understanding when computer use agents are a better fit than API-only automation
- Combining Browser-Use with Playwright and CDP for reliable browser lifecycle management
- Using Azure OpenAI vision and structured Pydantic output to extract listing data from dynamic web pages
- Deciding when to use an agent-first, actor-first, or hybrid browser automation workflow
After completing this lesson, you will know how to:
- Configure Browser-Use with Azure OpenAI and Playwright
- Build a browser automation workflow that navigates a real website and handles dynamic UI elements
- Extract typed results from visible page content and turn them into downstream business logic
- Choose between agent and actor patterns based on how predictable the browser task is
This lesson includes one notebook tutorial:
- 15-browser-user.ipynb: Launches a Chrome session over CDP, searches Airbnb for Stockholm listings, extracts prices with Browser-Use vision, and returns the cheapest option as structured data.
Prerequisites:
- Python 3.12+
- Azure OpenAI deployment configured in your environment
- Chrome or Chromium installed locally
- Playwright dependencies installed
- Basic familiarity with async Python
Install the packages used in the notebook:
pip install browser_use playwright python-dotenv
playwright install chromium

Set the Azure OpenAI environment variables used by the notebook:
AZURE_OPENAI_ENDPOINT=...
AZURE_OPENAI_API_KEY=...
AZURE_OPENAI_CHAT_DEPLOYMENT_NAME=...
# Optional: defaults to the latest API version when omitted
AZURE_OPENAI_API_VERSION=...

The notebook demonstrates a hybrid browser automation workflow:
- Chrome starts with CDP enabled so both Playwright and Browser-Use can share the same browser session.
- A Browser-Use agent handles open-ended navigation tasks such as opening Airbnb, dismissing pop-ups, and searching for Stockholm.
- The active page is inspected with a structured Pydantic schema to extract listing titles, nightly prices, ratings, and URLs.
- Python logic compares the extracted listings and highlights the cheapest result.
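The first step in the list above, starting Chrome with CDP enabled so that more than one client can attach to it, might look roughly like the sketch below. The port `9222`, the profile directory, and the helper names `build_chrome_command`, `launch_chrome`, and `connect_playwright` are assumptions made for illustration and are not taken from the notebook. Browser-Use can be pointed at the same CDP endpoint, but the exact parameter depends on the installed version, so it is left out here.

```python
import subprocess


def build_chrome_command(
    chrome_path: str,
    port: int = 9222,                       # assumed port, pick any free one
    profile_dir: str = "/tmp/cdp-profile",  # assumed throwaway profile location
) -> list[str]:
    """Return the argv for starting Chrome with remote debugging enabled."""
    return [
        chrome_path,
        f"--remote-debugging-port={port}",  # exposes the CDP endpoint
        f"--user-data-dir={profile_dir}",   # keeps automation separate from your own profile
        "--no-first-run",
    ]


def launch_chrome(chrome_path: str, port: int = 9222) -> subprocess.Popen:
    """Start Chrome in the background and return the process handle."""
    return subprocess.Popen(build_chrome_command(chrome_path, port))


async def connect_playwright(port: int = 9222):
    """Attach Playwright to the already-running Chrome over CDP."""
    # Imported lazily so the helpers above work without Playwright installed.
    from playwright.async_api import async_playwright

    playwright = await async_playwright().start()
    browser = await playwright.chromium.connect_over_cdp(f"http://localhost:{port}")
    return playwright, browser
```

Because the browser is started once and clients only attach to it, closing a Playwright connection does not have to end the session that the agent is using.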
This approach keeps the flexible, vision-based reasoning that Browser-Use is good at while still giving you deterministic browser control when you need it.
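As a minimal sketch of the extraction and comparison steps, the schema below models the four fields mentioned above and picks the lowest nightly price. The class names, field names, and sample values are hypothetical; the notebook's actual schema may differ.

```python
from typing import Optional

from pydantic import BaseModel, Field


class Listing(BaseModel):
    """One listing as read from the visible page (field names are assumed)."""
    title: str
    price_per_night: float = Field(gt=0, description="Nightly price as shown on the page")
    rating: Optional[float] = None  # new listings may have no rating yet
    url: str


class ListingResults(BaseModel):
    """Container the extraction step is asked to fill."""
    listings: list[Listing]


def cheapest(results: ListingResults) -> Optional[Listing]:
    """Return the listing with the lowest nightly price, or None if empty."""
    if not results.listings:
        return None
    return min(results.listings, key=lambda item: item.price_per_night)


# Made-up sample data standing in for what the agent would extract.
sample = ListingResults(listings=[
    {"title": "Hypothetical loft", "price_per_night": 120.0, "rating": 4.8,
     "url": "https://example.com/a"},
    {"title": "Hypothetical studio", "price_per_night": 95.5,
     "url": "https://example.com/b"},
])
```

Calling `cheapest(sample)` returns the studio entry. A listing with a missing or non-positive price fails validation when it is constructed, so it never reaches the comparison.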
Best practices:
- Start with an agent for exploration and dynamic navigation.
- Switch to direct page control when the interaction becomes predictable.
- Use structured output models so extracted data is validated and type-safe.
- Add short delays after actions that trigger visible UI changes, so the page has settled before the next step reads it.
- Capture screenshots while iterating so failures are easier to debug.
- Expect websites to change and design fallback strategies for pop-ups and layout shifts.
- Blend agent and actor patterns to get both flexibility and precision.
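Two of the points above, pausing after UI-changing actions and planning fallbacks for pop-ups, can be combined in one small helper. The sketch below uses only `asyncio`; the name `try_strategies`, the `settle_seconds` parameter, and the two stand-in coroutines are hypothetical and would be replaced by real page interactions.

```python
import asyncio
from typing import Awaitable, Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


async def try_strategies(
    strategies: Sequence[Callable[[], Awaitable[T]]],
    settle_seconds: float = 1.0,
) -> Optional[T]:
    """Run each strategy in order and return the first result that succeeds."""
    for strategy in strategies:
        name = getattr(strategy, "__name__", repr(strategy))
        try:
            result = await strategy()
        except Exception as exc:  # broad on purpose: any failure moves to the next fallback
            print(f"{name} failed: {exc}")
            continue
        # Give the page time to settle before the caller inspects it again.
        await asyncio.sleep(settle_seconds)
        return result
    return None


# Stand-ins for real interactions such as clicking a button or pressing a key.
async def click_accept_button() -> str:
    raise TimeoutError("button not found")


async def press_escape() -> str:
    return "dismissed with Escape"
```

Running `asyncio.run(try_strategies([click_accept_button, press_escape]))` falls through the failing click and returns the result of the second strategy. If every strategy fails, the helper returns `None` so the caller can decide whether to hand control back to the agent.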
Example use cases:
- Travel booking and price monitoring
- E-commerce price comparison and availability checks
- Structured extraction from dynamic websites
- Vision-aware UI testing and verification
- Website monitoring and alerting
- Intelligent form filling across multi-step flows
