## The Challenge

Measuring the behavior of people interacting with interfaces is a critical part of developing products such as websites. There are two major types of usability tests: formative tests and summative tests. Summative tests can be described as evaluating “the usability of an application using metrics.” They can be further broken down into benchmark and comparative tests. The main goal of a benchmark test is to describe the usability of a product relative to a goal. It can also be used as a baseline for any future design changes. A comparative test compares two different designs of the same product, or two competing products.

For this project I assessed the usability of the Mercedes Benz website by means of one **benchmark test** and one **comparative test**.

Disclaimer: This project is not sponsored by or affiliated with Mercedes Benz. It was done purely for academic purposes.

**Benchmark Test:**

If a user knows what car they want, they are going to want to find out whether any are in stock at their closest dealership. Users were given the task of finding out how many Mercedes Benz AMG S65 Sedans there are at the dealership closest to the zip code 91101. If users got the numerical value correct (which is 5), they were assigned a value of 1. If users entered anything other than 5, they were assigned a 0. The goal of the test is to find out whether more than 70% of users are able to complete the task. We are therefore testing the following hypotheses:

H0: p = 0.7 versus H1: p > 0.7

The best point estimate of p, the proportion of the population with a certain characteristic, is the sample proportion:

p̂ = x / n

where x is the number of users in the sample who completed the task and n is the sample size.

For this task I used Amazon Mechanical Turk to obtain a random sample of 50 potential users and found that 30 of the 50 were able to successfully complete the task, so p̂ = 30/50 = 0.6. To determine whether a sample proportion of 0.6 is statistically significant, we build a probability model, because a second random sample of 50 users would likely produce a different sample proportion. Since *np*(1 - *p*) = 50 (0.7) (1 - 0.7) = 10.5 ≥ 10 and the sample size (*n* = 50) is sufficiently small relative to the population size (if we assume there are at least N = 1,000 users), we can use a normal model to describe the variability in p̂. The mean of the distribution of p̂ is μp̂ = 0.7, since we assume the statement in the null hypothesis is true, and the standard deviation of the distribution of p̂ is σp̂ = √(p (1 - p) / n) = √(0.7 (1 - 0.7) / 50) ≈ 0.065.
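The two conditions and the standard deviation above can be checked with a few lines of Python. This is a minimal sketch, not part of the original analysis: the function name `normal_model_check` is hypothetical, and the 5% rule used for the sample-versus-population condition is a common textbook rule of thumb assumed here, since the text only requires the sample to be "sufficiently" small.

```python
import math


def normal_model_check(p0: float, n: int, population: int) -> dict:
    """Check the usual conditions for a normal model of the sample proportion."""
    spread = n * p0 * (1 - p0)
    return {
        "np(1-p)": spread,
        # Condition 1: n * p * (1 - p) should be at least 10.
        "large_enough": spread >= 10,
        # Condition 2 (assumed 5% rule): n is at most 5% of the population,
        # written with integers to avoid floating-point comparison issues.
        "small_vs_population": 20 * n <= population,
        # Standard deviation of p-hat under the null hypothesis.
        "std_dev": math.sqrt(p0 * (1 - p0) / n),
    }


# Values from the benchmark test: p0 = 0.7, n = 50, assumed N = 1,000.
check = normal_model_check(p0=0.7, n=50, population=1000)
for name, value in check.items():
    print(f"{name}: {value}")
```

With a smaller assumed population the second condition would fail, which is why the assumption of at least 1,000 users matters.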

The level of significance is α = 0.05 so that we only have a 5% chance of making a Type I error. We want to know if it is unusual to obtain a sample proportion of 0.6 or more from a population whose proportion is assumed to be 0.7.

The test statistic is:

z0 = (p̂ - p0) / √(p0 (1 - p0) / n)

= (0.6 - 0.7) / √(0.7 (1 - 0.7) / 50) = -0.1 / 0.0648 = -1.54

Because this is a right-tailed test, we determine the critical value at the α = 0.05 level of significance to be z0.05 = 1.645. Because the test statistic z0 = -1.54 is less than the critical value of 1.645, we do not reject the null hypothesis.

There is not sufficient evidence at the α = 0.05 level of significance to conclude that more than 70% of users are able to successfully complete the assigned task.
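The benchmark calculation can be reproduced in Python using only the standard library. This is a sketch under the same assumptions as the text (right-tailed test, normal model); the function name `one_proportion_z_test` is hypothetical, and the p-value is an extra quantity that the analysis above does not report.

```python
import math
from statistics import NormalDist


def one_proportion_z_test(successes: int, n: int, p0: float, alpha: float = 0.05):
    """Right-tailed one-proportion z-test of H0: p = p0 versus H1: p > p0."""
    p_hat = successes / n
    std_dev = math.sqrt(p0 * (1 - p0) / n)   # computed under H0
    z = (p_hat - p0) / std_dev
    critical = NormalDist().inv_cdf(1 - alpha)  # z_alpha for the right tail
    p_value = 1 - NormalDist().cdf(z)           # P(Z >= z) under H0
    return z, critical, p_value, z > critical


# 30 of 50 users completed the task; the goal is 70%.
z, critical, p_value, reject = one_proportion_z_test(successes=30, n=50, p0=0.7)
print(f"z = {z:.2f}, critical value = {critical:.3f}")
print(f"p-value = {p_value:.3f}, reject H0: {reject}")
```

Because the sample proportion falls below the hypothesized 0.7, the statistic is negative and can never reach a right-tail critical value, whatever the sample size.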

Conclusion:

It is up to individual companies to set their own minimum task completion rate. For example, if Mercedes Benz wants at least 70% of all users to be able to successfully complete this task, they will need to continue to iterate and make changes to their website until the goal is met. Another way to look at it is to use this statistic as a benchmark for future website edits.

**Comparative Test:**

It can be useful to measure your own website against those of competitors offering similar products. Using Amazon Mechanical Turk, I collected the time required to find the safety features of 2 vehicles with similar price points (the Mercedes Benz C250 Coupe and the BMW 428i Coupe) from 34 random potential users on the Mercedes Benz website and the BMW website. The goal of the test is to confirm whether the Mercedes Benz website is easier to navigate (as far as finding safety features is concerned). Half of the users were required to complete the task on the Mercedes Benz website first and the other half were required to complete it on the BMW website first.
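Splitting the users so that each website is visited first by half of them is a form of counterbalancing. The text does not say how the two halves were chosen, so the sketch below assumes random assignment; the function name `counterbalance`, the user IDs, and the seed are all hypothetical.

```python
import random


def counterbalance(user_ids, seed: int = 0) -> dict:
    """Randomly assign half of the users to each website order."""
    rng = random.Random(seed)      # fixed seed so the assignment is repeatable
    shuffled = list(user_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {
        uid: "Mercedes Benz first" if i < half else "BMW first"
        for i, uid in enumerate(shuffled)
    }


# 34 participants, identified here by placeholder IDs 1..34.
orders = counterbalance(range(1, 35))
mercedes_first = sum(1 for o in orders.values() if o == "Mercedes Benz first")
print(f"Mercedes Benz first: {mercedes_first}, BMW first: {len(orders) - mercedes_first}")
```

Alternating the order keeps any learning effect from the first attempt from favoring one website over the other.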

This is a matched-pairs design because the variable is measured on the same set of users for both websites. For each user, we first compute the difference di = Xi - Yi between the time taken on the Mercedes Benz website (Xi) and the time taken on the BMW website (Yi). If the time taken on the Mercedes Benz website is less than the time taken on the BMW website, we can expect the Xi - Yi values to be negative. We are testing the hypotheses:

H0: μd = 0 versus H1: μd < 0 with an α = 0.05 level of significance.

The sample mean is d-bar = -35.3235 seconds and the sample standard deviation is sd = 79.1855 seconds.
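For readers who want to see how d-bar and sd are obtained from paired timings, here is a minimal sketch. The study's raw timings are not included in this write-up, so the three pairs below are invented placeholder values used only to show the mechanics; the function name `summarize_differences` is also hypothetical.

```python
from statistics import mean, stdev


def summarize_differences(x_times, y_times):
    """Return (d_bar, s_d) for the paired differences d_i = x_i - y_i."""
    if len(x_times) != len(y_times):
        raise ValueError("Each user needs one timing per website.")
    diffs = [x - y for x, y in zip(x_times, y_times)]
    # stdev() is the sample standard deviation (n - 1 in the denominator).
    return mean(diffs), stdev(diffs)


# Placeholder timings in seconds, NOT the study's data.
mercedes_times = [60, 45, 90]
bmw_times = [80, 70, 85]

d_bar, s_d = summarize_differences(mercedes_times, bmw_times)
print(f"d-bar = {d_bar:.4f} s, sd = {s_d:.4f} s")
```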

The test statistic is:

t0 = d-bar / (sd / √n) = -35.3235 / (79.1855 / √34) = -2.601

Because this is a left-tailed test, we determine the critical value at the α = 0.05 level of significance with 34 - 1 = 33 degrees of freedom to be -t0.05 = -1.692.

Because the test statistic t0 = -2.601 lies in the critical region, we reject the null hypothesis. There is sufficient evidence at the α = 0.05 level of significance to conclude that the time taken to find safety features on the Mercedes Benz C250 Coupe is less than the time taken to find safety features on the BMW 428i Coupe.
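The paired test can be re-derived from the reported summary statistics with a short script. This is a sketch that uses only the standard library, so the critical value is taken from the text rather than computed from a t distribution; the function name `paired_t_statistic` is hypothetical.

```python
import math


def paired_t_statistic(mean_diff: float, sd_diff: float, n: int) -> float:
    """t statistic for H0: mu_d = 0 from the summary of paired differences."""
    return mean_diff / (sd_diff / math.sqrt(n))


# Summary statistics reported above (differences are Mercedes Benz minus BMW).
t0 = paired_t_statistic(mean_diff=-35.3235, sd_diff=79.1855, n=34)

# Left-tailed critical value for alpha = 0.05 and 33 degrees of freedom,
# copied from the text above.
T_CRITICAL = -1.692
reject = t0 < T_CRITICAL

print(f"t0 = {t0:.3f}, critical value = {T_CRITICAL}, reject H0: {reject}")
```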

Conclusion:

The results suggest that it is quicker to navigate to and find safety features on the Mercedes Benz website than on the BMW website. While speed is not everything, it is a vital part of any product. We can continue to test the website against other similar websites to confirm that the process is sufficiently quick. When prioritizing requirements for website updates, I would label any potential changes to the workflow of finding safety features as low priority, since it seems to be working well.