How To Download Files With Selenium And Why You Shouldn’t

This entry was posted on Wednesday, 25 July, 2012

In this blog post I will try to make you think about why you are performing automated file download tests, and I will provide some Java code that lets you perform file downloads in a cross-platform way without resorting to hacks like AutoIT.

First things first, don’t do file download tests!

Let’s start off with a scenario. You and the BA have talked to the product owner, who wants to give the users some cool functionality that enables them to download some PDFs with useful information in them. Everybody agrees that this is easy to implement and that you can very quickly run an exploratory test to check that it works by clicking on the download link in your browser. It downloads the file, you open it in a PDF reader, all looks good and everybody is happy.

Now comes the tricky bit, you are asked to automate this scenario. After all, we want the build to go red on the CI server if some changes the developers make break your shiny new PDF download functionality. So, you load up Selenium and start replicating the actions you would take if you were playing the scenario out manually:

  • You load the page with the download link.
  • You find the <a> element on the page.
  • You click on it…

You have just fallen into a trap: Selenium can’t deal with OS level dialogues, so as soon as you click on the download link your test stops. You do not pass go and you don’t collect £200.

You go and have a look at the Selenium mailing lists and see lots of posts about AutoIT or maybe a post about a Java robot class and start looking at implementing one of these to interact with your OS level dialogue box…

STOP RIGHT THERE!

Now is the time to take a step backward and work out exactly what you want to test.

Do you really need to download that file?

I’m guessing your initial reaction is “Yes, I do. I need to make sure that the download functionality continues to work”. Sounds pretty reasonable so far; let’s go a little further down this rabbit hole:

  • How many files are you planning to download?
  • How big are these files?
  • Do you have disk space to hold all of these files?
  • Do you have network capacity to continually download these files?
  • What are you planning to do with the downloaded file?

The last question is where people usually stop and realise that they aren’t actually planning to do anything with the downloaded file. They are just planning to download the file, and as long as a file has been downloaded they are happy that the test has passed. Now ask yourself: do you really need to download a file to perform this test? All you are actually doing is checking that when you click on a link you get a valid response from the server. You aren’t checking that you can download the file; you are checking for broken links. This is a worthwhile test, but it doesn’t require you to actually download anything. So let’s put AutoIT back in its little box and give you some code that can check to see if the link is valid.

Checking that links are valid

It’s actually pretty simple: all you need to do is find the link on the page, extract the URL from its href attribute and then check whether sending an HTTP GET request to that URL results in a valid response. To do this I have a URLStatusChecker class and a supporting RequestMethod enum:

package com.lazerycode.selenium.urlstatuschecker;
 
import org.apache.http.client.methods.*;
 
public enum RequestMethod {
    OPTIONS(new HttpOptions()),
    GET(new HttpGet()),
    HEAD(new HttpHead()),
    POST(new HttpPost()),
    PUT(new HttpPut()),
    DELETE(new HttpDelete()),
    TRACE(new HttpTrace());
 
    private final HttpRequestBase requestMethod;
 
    RequestMethod(HttpRequestBase requestMethod) {
        this.requestMethod = requestMethod;
    }
 
    public HttpRequestBase getRequestMethod() {
        return this.requestMethod;
    }
}
 
package com.lazerycode.selenium.urlstatuschecker;
 
import org.apache.http.HttpResponse;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpRequestBase;
import org.apache.http.client.params.ClientPNames;
import org.apache.http.client.protocol.ClientContext;
import org.apache.http.impl.client.BasicCookieStore;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.impl.cookie.BasicClientCookie;
import org.apache.http.params.HttpParams;
import org.apache.http.protocol.BasicHttpContext;
import org.apache.log4j.Logger;
import org.openqa.selenium.Cookie;
import org.openqa.selenium.WebDriver;
 
import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;
import java.util.Set;
 
public class URLStatusChecker {
 
    private static final Logger LOG = Logger.getLogger(URLStatusChecker.class);
    private URI linkToCheck;
    private WebDriver driver;
    private boolean mimicWebDriverCookieState = true;
    private boolean followRedirects = false;
    private RequestMethod httpRequestMethod = RequestMethod.GET;
 
    public URLStatusChecker(WebDriver driverObject) throws MalformedURLException, URISyntaxException {
        this.driver = driverObject;
    }
 
    /**
     * Specify a URL that you want to perform an HTTP Status Check upon
     *
     * @param linkToCheck
     * @throws MalformedURLException
     * @throws URISyntaxException
     */
    public void setURIToCheck(String linkToCheck) throws MalformedURLException, URISyntaxException {
        this.linkToCheck = new URI(linkToCheck);
    }
 
    /**
     * Specify a URL that you want to perform an HTTP Status Check upon
     *
     * @param linkToCheck
     * @throws MalformedURLException
     */
    public void setURIToCheck(URI linkToCheck) throws MalformedURLException {
        this.linkToCheck = linkToCheck;
    }
 
    /**
     * Specify a URL that you want to perform an HTTP Status Check upon
     *
     * @param linkToCheck
     */
    public void setURIToCheck(URL linkToCheck) throws URISyntaxException {
        this.linkToCheck = linkToCheck.toURI();
    }
 
    /**
     * Set the HTTP Request Method (Defaults to 'GET')
     *
     * @param requestMethod
     */
    public void setHTTPRequestMethod(RequestMethod requestMethod) {
        this.httpRequestMethod = requestMethod;
    }
 
    /**
     * Should redirects be followed before returning status code?
     * If set to true a 302 will not be returned, instead you will get the status code after the redirect has been followed
     * DEFAULT: false
     *
     * @param value
     */
    public void followRedirects(Boolean value) {
        this.followRedirects = value;
    }
 
    /**
     * Perform an HTTP Status check and return the response code
     *
     * @return
     * @throws IOException
     */
    public int getHTTPStatusCode() throws IOException {
 
        HttpClient client = new DefaultHttpClient();
        BasicHttpContext localContext = new BasicHttpContext();
 
        LOG.info("Mimic WebDriver cookie state: " + this.mimicWebDriverCookieState);
        if (this.mimicWebDriverCookieState) {
            localContext.setAttribute(ClientContext.COOKIE_STORE, mimicCookieState(this.driver.manage().getCookies()));
        }
        HttpRequestBase requestMethod = this.httpRequestMethod.getRequestMethod();
        requestMethod.setURI(this.linkToCheck);
        HttpParams httpRequestParameters = requestMethod.getParams();
        httpRequestParameters.setParameter(ClientPNames.HANDLE_REDIRECTS, this.followRedirects);
        requestMethod.setParams(httpRequestParameters);
 
        LOG.info("Sending " + requestMethod.getMethod() + " request for: " + requestMethod.getURI());
        HttpResponse response = client.execute(requestMethod, localContext);
        LOG.info("HTTP " + requestMethod.getMethod() + " request status: " + response.getStatusLine().getStatusCode());
 
        return response.getStatusLine().getStatusCode();
    }
 
    /**
     * Mimic the cookie state of WebDriver (Defaults to true)
     * This will enable you to access files that are only available when logged in.
     * If set to false the connection will be made as an anonymous user
     *
     * @param value
     */
    public void mimicWebDriverCookieState(boolean value) {
        this.mimicWebDriverCookieState = value;
    }
 
    /**
     * Load in all the cookies WebDriver currently knows about so that we can mimic the browser cookie state
     *
     * @param seleniumCookieSet
     * @return
     */
    private BasicCookieStore mimicCookieState(Set<Cookie> seleniumCookieSet) {
        BasicCookieStore mimicWebDriverCookieStore = new BasicCookieStore();
        for (Cookie seleniumCookie : seleniumCookieSet) {
            BasicClientCookie duplicateCookie = new BasicClientCookie(seleniumCookie.getName(), seleniumCookie.getValue());
            duplicateCookie.setDomain(seleniumCookie.getDomain());
            duplicateCookie.setSecure(seleniumCookie.isSecure());
            duplicateCookie.setExpiryDate(seleniumCookie.getExpiry());
            duplicateCookie.setPath(seleniumCookie.getPath());
            mimicWebDriverCookieStore.addCookie(duplicateCookie);
        }
 
        return mimicWebDriverCookieStore;
    }
}

This will take a URL supplied to it and then return an HTTP status code. If the file is there I would expect a 200 (OK) or maybe even a 302 (Redirect). If it’s not there, I would expect a 404 (Not Found) or, if things really went badly, a 500 (Server Error). It’s up to you to define which HTTP status code is a pass or a fail; the above code will simply tell you what the HTTP status code is. The above code is a little more complex than just performing an HTTP GET: it also mirrors your WebDriver session so that you can access the same resources as the user you are currently logged in as.
To use it you would simply do the following:

@Test
public void statusCode404FromString() throws Exception {
    urlChecker.setURIToCheck(webServerURL + ":" + webServerPort + "/doesNotExist.html");
    urlChecker.setHTTPRequestMethod(RequestMethod.GET);
    assertThat(urlChecker.getHTTPStatusCode(), is(equalTo(404)));
}
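
Because the pass/fail decision is left to you, it can help to write that policy down in one place. The following is a minimal sketch that uses only the JDK rather than the URLStatusChecker class above. The class name `LinkStatusPolicy` and the choice to accept every 2xx and 3xx response are assumptions made for illustration, not part of the original code.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper, not part of the original code: one place to define
// which HTTP status codes count as a healthy link.
class LinkStatusPolicy {

    // Assumption: any 2xx (success) or 3xx (redirect) response is a pass,
    // while 4xx and 5xx responses are failures. Adjust to suit your application.
    static boolean isAcceptable(int statusCode) {
        return statusCode >= 200 && statusCode < 400;
    }

    // Sends a GET request without following redirects and returns the raw
    // status code, so a 302 is reported as a 302.
    static int fetchStatus(URL link) throws IOException {
        HttpURLConnection connection = (HttpURLConnection) link.openConnection();
        try {
            connection.setRequestMethod("GET");
            connection.setInstanceFollowRedirects(false);
            return connection.getResponseCode();
        } finally {
            connection.disconnect();
        }
    }
}
```

A test could then assert `LinkStatusPolicy.isAcceptable(statusCode)` instead of comparing against a single hard-coded value.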

That’s nice but I really do want to download the file

I know that there are some people who really do want to download the actual file and perform checks on it, so how should we do it?

Everybody raves about AutoIT, that’s a good solution right?

Well, no, actually it’s not. AutoIT only works on Windows, so you can kiss goodbye to your cross platform testing. AutoIT also looks for a specific window name, so you will need an AutoIT script for every different download dialogue that you trigger. If you are not calling it programmatically but leaving it running in the background, it will automatically click on every download box that appears, not just the ones you want to interact with during your testing. Oh, did I also mention that you are going to have problems renaming the file you download?

OK that doesn’t sound so good, how about a Java robot class? Lots of people talk about them as well

That’s better; it can be cross platform compliant and you can rename files that you download, but it still has issues. With a Java robot class you will be either blindly clicking at a specific location, or trying to send keystrokes to the pop up dialogue in the hope that it is in the state you expect it to be in. This again means that you are probably going to have to have different robot classes for different operating systems and maybe for different browsers. It’s a lot of work and not guaranteed to be successful.

So what do I do then? Forget about it? Use Sikuli?

There is another option. You can use the information provided by Selenium to programmatically download the file and completely bypass the OS level dialogue:

package com.lazerycode.selenium.filedownloader;
 
import org.apache.commons.io.FileUtils;
import org.apache.http.HttpResponse;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.client.params.ClientPNames;
import org.apache.http.client.protocol.ClientContext;
import org.apache.http.impl.client.BasicCookieStore;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.impl.cookie.BasicClientCookie;
import org.apache.http.params.HttpParams;
import org.apache.http.protocol.BasicHttpContext;
import org.apache.log4j.Logger;
import org.openqa.selenium.Cookie;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
 
import java.io.File;
import java.io.IOException;
import java.net.URISyntaxException;
import java.net.URL;
import java.util.Set;
 
public class FileDownloader {
 
    private static final Logger LOG = Logger.getLogger(FileDownloader.class);
    private WebDriver driver;
    private String localDownloadPath = System.getProperty("java.io.tmpdir");
    private boolean followRedirects = true;
    private boolean mimicWebDriverCookieState = true;
    private int httpStatusOfLastDownloadAttempt = 0;
 
    public FileDownloader(WebDriver driverObject) {
        this.driver = driverObject;
    }
 
    /**
     * Specify if the FileDownloader class should follow redirects when trying to download a file
     *
     * @param value
     */
    public void followRedirectsWhenDownloading(boolean value) {
        this.followRedirects = value;
    }
 
    /**
     * Get the current location that files will be downloaded to.
     *
     * @return The filepath that the file will be downloaded to.
     */
    public String localDownloadPath() {
        return this.localDownloadPath;
    }
 
    /**
     * Set the path that files will be downloaded to.
     *
     * @param filePath The filepath that the file will be downloaded to.
     */
    public void localDownloadPath(String filePath) {
        this.localDownloadPath = filePath;
    }
 
    /**
     * Download the file specified in the href attribute of a WebElement
     *
     * @param element
     * @return
     * @throws Exception
     */
    public String downloadFile(WebElement element) throws Exception {
        return downloader(element, "href");
    }
 
    /**
     * Download the image specified in the src attribute of a WebElement
     *
     * @param element
     * @return
     * @throws Exception
     */
    public String downloadImage(WebElement element) throws Exception {
        return downloader(element, "src");
    }
 
    /**
     * Gets the HTTP status code of the last download file attempt
     *
     * @return
     */
    public int getHTTPStatusOfLastDownloadAttempt() {
        return this.httpStatusOfLastDownloadAttempt;
    }
 
    /**
     * Mimic the cookie state of WebDriver (Defaults to true)
     * This will enable you to access files that are only available when logged in.
     * If set to false the connection will be made as an anonymous user
     *
     * @param value
     */
    public void mimicWebDriverCookieState(boolean value) {
        this.mimicWebDriverCookieState = value;
    }
 
    /**
     * Load in all the cookies WebDriver currently knows about so that we can mimic the browser cookie state
     *
     * @param seleniumCookieSet
     * @return
     */
    private BasicCookieStore mimicCookieState(Set<Cookie> seleniumCookieSet) {
        BasicCookieStore mimicWebDriverCookieStore = new BasicCookieStore();
        for (Cookie seleniumCookie : seleniumCookieSet) {
            BasicClientCookie duplicateCookie = new BasicClientCookie(seleniumCookie.getName(), seleniumCookie.getValue());
            duplicateCookie.setDomain(seleniumCookie.getDomain());
            duplicateCookie.setSecure(seleniumCookie.isSecure());
            duplicateCookie.setExpiryDate(seleniumCookie.getExpiry());
            duplicateCookie.setPath(seleniumCookie.getPath());
            mimicWebDriverCookieStore.addCookie(duplicateCookie);
        }
 
        return mimicWebDriverCookieStore;
    }
 
    /**
     * Perform the file/image download.
     *
     * @param element
     * @param attribute
     * @return
     * @throws IOException
     * @throws NullPointerException
     */
    private String downloader(WebElement element, String attribute) throws IOException, NullPointerException, URISyntaxException {
        String fileToDownloadLocation = element.getAttribute(attribute);
        if (fileToDownloadLocation.trim().equals("")) throw new NullPointerException("The element you have specified does not link to anything!");
 
        URL fileToDownload = new URL(fileToDownloadLocation);
        // File(parent, child) adds the path separator that java.io.tmpdir may lack
        File downloadedFile = new File(this.localDownloadPath, fileToDownload.getFile().replaceFirst("/|\\\\", ""));
        if (downloadedFile.canWrite() == false) downloadedFile.setWritable(true);
 
        HttpClient client = new DefaultHttpClient();
        BasicHttpContext localContext = new BasicHttpContext();
 
        LOG.info("Mimic WebDriver cookie state: " + this.mimicWebDriverCookieState);
        if (this.mimicWebDriverCookieState) {
            localContext.setAttribute(ClientContext.COOKIE_STORE, mimicCookieState(this.driver.manage().getCookies()));
        }
 
        HttpGet httpget = new HttpGet(fileToDownload.toURI());
        HttpParams httpRequestParameters = httpget.getParams();
        httpRequestParameters.setParameter(ClientPNames.HANDLE_REDIRECTS, this.followRedirects);
        httpget.setParams(httpRequestParameters);
 
        LOG.info("Sending GET request for: " + httpget.getURI());
        HttpResponse response = client.execute(httpget, localContext);
        this.httpStatusOfLastDownloadAttempt = response.getStatusLine().getStatusCode();
        LOG.info("HTTP GET request status: " + this.httpStatusOfLastDownloadAttempt);
        LOG.info("Downloading file: " + downloadedFile.getName());
        FileUtils.copyInputStreamToFile(response.getEntity().getContent(), downloadedFile);
        response.getEntity().getContent().close();
 
        String downloadedFileAbsolutePath = downloadedFile.getAbsolutePath();
        LOG.info("File downloaded to '" + downloadedFileAbsolutePath + "'");
 
        return downloadedFileAbsolutePath;
    }
 
}

The above code will mimic your current WebDriver session and programmatically download your file to the system temp directory, where you can perform further checks upon it (it tells you where it downloaded the file to). It’s relatively simple to use; a couple of basic examples are shown below:

@Test
public void downloadAFile() throws Exception {
    FileDownloader downloadTestFile = new FileDownloader(driver);
    driver.get("http://www.localhost.com/downloadTest.html");
    WebElement downloadLink = driver.findElement(By.id("fileToDownload"));
    String downloadedFileAbsoluteLocation = downloadTestFile.downloadFile(downloadLink);
 
    assertThat(new File(downloadedFileAbsoluteLocation).exists(), is(equalTo(true)));
    assertThat(downloadTestFile.getHTTPStatusOfLastDownloadAttempt(), is(equalTo(200)));
}
 
@Test
public void downloadAnImage() throws Exception {
    FileDownloader downloadTestFile = new FileDownloader(driver);
    driver.get("http://www.localhost.com/downloadTest.html");
    WebElement image = driver.findElement(By.id("ebselenImage"));
    String downloadedImageAbsoluteLocation = downloadTestFile.downloadImage(image);
 
    assertThat(new File(downloadedImageAbsoluteLocation).exists(), is(equalTo(true)));
    assertThat(downloadTestFile.getHTTPStatusOfLastDownloadAttempt(), is(equalTo(200)));
}
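
One detail worth some care when saving a download is how the local file name is derived from the link. A URL can carry a query string, and `java.io.tmpdir` does not always end with a path separator. The following is a sketch using only the JDK, built around a hypothetical `DownloadTarget` helper that is not part of the FileDownloader above, showing one way the target file could be worked out. The fallback name `download` is an assumption.

```java
import java.io.File;
import java.net.URI;

// Hypothetical helper, not part of the original FileDownloader.
class DownloadTarget {

    // Returns the last path segment of the link, ignoring any query string.
    // Assumption: links with no usable file name fall back to "download".
    static String fileNameFrom(URI source) {
        String path = source.getPath();            // path only, no "?query"
        if (path == null) {
            path = "";                             // opaque URIs have no path
        }
        String name = path.substring(path.lastIndexOf('/') + 1);
        return name.isEmpty() ? "download" : name;
    }

    // Joins directory and file name with File(parent, child), which inserts
    // the separator when the directory does not already end with one.
    static File resolveIn(String directory, URI source) {
        return new File(directory, fileNameFrom(source));
    }
}
```

For example, `DownloadTarget.resolveIn(System.getProperty("java.io.tmpdir"), URI.create(element.getAttribute("href")))` would give a file inside the temp directory regardless of whether the property has a trailing separator.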

But that’s not the same as clicking on a link and downloading the file…


Well, actually it is. When you click on the link your browser sends an HTTP GET request over to the webserver and then downloads the file to a temporary location, then it hands the file over to the operating system, which pops up a dialogue asking you where you really want to save it. All you are doing is taking the browser and the operating system out of the equation. Let’s face it, if the browser’s download mechanism doesn’t work, there isn’t much you can do about it anyway (apart from raising a bug with the browser vendor).
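
To make that concrete, the request in question is just a GET that carries the session cookies. The sketch below builds the `Cookie` request header by hand using only the JDK. The `CookieHeader` class is hypothetical, and it assumes you have already copied the cookie names and values out of `driver.manage().getCookies()` into a map.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not part of the original code.
class CookieHeader {

    // Formats cookies as "name=value; name2=value2", the layout of the
    // Cookie request header.
    static String from(Map<String, String> cookies) {
        StringJoiner header = new StringJoiner("; ");
        for (Map.Entry<String, String> cookie : cookies.entrySet()) {
            header.add(cookie.getKey() + "=" + cookie.getValue());
        }
        return header.toString();
    }

    // Sends a GET with the given cookies and, on a 200 response, streams the
    // body to the target path. Returns the HTTP status code either way.
    static int download(URL source, Map<String, String> cookies, Path target) throws IOException {
        HttpURLConnection connection = (HttpURLConnection) source.openConnection();
        try {
            if (!cookies.isEmpty()) {
                connection.setRequestProperty("Cookie", from(cookies));
            }
            int status = connection.getResponseCode();
            if (status == HttpURLConnection.HTTP_OK) {
                try (InputStream body = connection.getInputStream()) {
                    Files.copy(body, target, StandardCopyOption.REPLACE_EXISTING);
                }
            }
            return status;
        } finally {
            connection.disconnect();
        }
    }
}
```

Nothing here involves the browser or an OS dialogue, which is the point: the server sees a GET with the same cookies either way.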

I’ve tried it and it works, but how do I know I have the right file?

The simplest and most obvious way to check that the file is correct is to compare it to a known good copy of the file. If the file we have downloaded matches the original file, it must be the correct file. No doubt you are now thinking “But hang on a moment, that means I need to keep a copy of every file that I download and some of them are massive…”. Not quite. There is another option: you can just store a hash of the known good file. Taking an unsalted MD5/SHA1 hash of a file will always produce the same hash for the same file, so all you need to do is take a hash of the file you have downloaded and compare it to a known good hash of the file. If the hash doesn’t match you can fail the test and then examine the file manually later to find out what went wrong.
The final bit of code I have to offer is a class that will perform a hash check for you:

package com.lazerycode.selenium.filedownloader;
 
public enum HashType {
    MD5,
    SHA1;
}
 
package com.lazerycode.selenium.filedownloader;
 
import org.apache.commons.codec.digest.DigestUtils;
import org.apache.log4j.Logger;
 
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
 
public class CheckFileHash {
 
    private static final Logger LOG = Logger.getLogger(CheckFileHash.class);
    private HashType typeOfHash = null;
    private String expectedFileHash = null;
    private File fileToCheck = null;
 
    /**
     * The File to perform a Hash check upon
     *
     * @param fileToCheck
     * @throws FileNotFoundException
     */
    public void fileToCheck(File fileToCheck) throws FileNotFoundException {
        if (!fileToCheck.exists()) throw new FileNotFoundException(fileToCheck + " does not exist!");
 
        this.fileToCheck = fileToCheck;
    }
 
    /**
     * Hash details used to perform the Hash check
     *
     * @param hash
     * @param hashType
     */
    public void hashDetails(String hash, HashType hashType) {
        this.expectedFileHash = hash;
        this.typeOfHash = hashType;
    }
 
    /**
     * Performs a hash check on a File.
     *
     * @return
     * @throws IOException
     */
    public boolean hasAValidHash() throws IOException {
        if (this.fileToCheck == null) throw new FileNotFoundException("File to check has not been set!");
        if (this.expectedFileHash == null || this.typeOfHash == null) throw new NullPointerException("Hash details have not been set!");
 
        String actualFileHash = "";
        boolean isHashValid = false;
 
        // try-with-resources closes the stream once the hash has been computed
        try (FileInputStream fileStream = new FileInputStream(this.fileToCheck)) {
            switch (this.typeOfHash) {
                case MD5:
                    actualFileHash = DigestUtils.md5Hex(fileStream);
                    break;
                case SHA1:
                    actualFileHash = DigestUtils.shaHex(fileStream);
                    break;
            }
        }
        isHashValid = this.expectedFileHash.equals(actualFileHash);
 
        LOG.info("Filename = '" + this.fileToCheck.getName() + "'");
        LOG.info("Expected Hash = '" + this.expectedFileHash + "'");
        LOG.info("Actual Hash = '" + actualFileHash + "'");
 
        return isHashValid;
    }
 
}

You can then use it in a test like the one below:

private final URL testFile = this.getClass().getResource("/download.zip");
 
@Test
public void checkValidMD5Hash() throws Exception {
    CheckFileHash fileToCheck = new CheckFileHash();
    fileToCheck.fileToCheck(new File(testFile.toURI()));
    fileToCheck.hashDetails("def3a66650822363f9e0ae6b9fbdbd6f", MD5);
    assertThat(fileToCheck.hasAValidHash(), is(equalTo(true)));
}
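
If you would rather not depend on Commons Codec, the same check can be sketched with the JDK’s `MessageDigest`. The `FileHasher` class below is hypothetical and not part of the original code. It reads the data in chunks so that a large download is never held in memory all at once, and it accepts any algorithm name the JDK supports, for example `MD5` or `SHA-1`.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of the original code.
class FileHasher {

    // Reads the stream in 8 KB chunks and returns the digest as lowercase hex.
    static String hexDigest(InputStream data, String algorithm)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance(algorithm);
        byte[] buffer = new byte[8192];
        int read;
        while ((read = data.read(buffer)) != -1) {
            digest.update(buffer, 0, read);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    // Convenience overload that opens the file and closes it when done.
    static String hexDigest(Path file, String algorithm)
            throws IOException, NoSuchAlgorithmException {
        try (InputStream in = Files.newInputStream(file)) {
            return hexDigest(in, algorithm);
        }
    }
}
```

Running `FileHasher.hexDigest(path, "MD5")` once against a known good copy gives you the value to store as the expected hash in your test.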

Hopefully I have managed to make you think twice about downloading files in your automated tests and provided a good cross platform/cross browser solution that will remove the need to add in yet another application to your test framework. The code above is a snapshot in time and will continue to be tweaked and updated as I’m made aware of problems, or think of better ways to do things.

One thing you may have noticed is that the code performing a status check and the code performing a file download are very similar. I’m aware of this, but I wanted to keep them separate to reinforce the idea that you do not need to do file downloads. I will be merging both parts together in the near future.

If you want to have a look at the latest revision, it’s available on GitHub as part of https://github.com/Ardesco/Powder-Monkey.

All feedback appreciated :)

72 Responses to “How To Download Files With Selenium And Why You Shouldn’t”

  1. AD

    Great post , will try out. Is there anyway we can upload operation (File / image) without using AutoIT or Sikuli?

    • Ardesco

      Yup the Selenium devs thought about this ages ago. You can just use

      driver.findElement(By.id("foo")).sendKeys("AbsoluteFileLocation");

      If the element you are sending text to is a

      <input type="file">

      it will upload the file.

      • AD

        I tried that but it does not work in my case though the element is an .

        • Ardesco

          Looks like your HTML got stripped out :)

          • AD

            :-) yeah looks like

  2. Shockwavenn

    Also you can set custom Browser Profile to download files to custom directory without confirmation dialog. I use it in Firefox\Chrome, not sure if it works for another browsers.

    • Ardesco

      You could, but then what are you actually going to do with the downloaded file? Was there any point in downloading it in the first place?

      Remember the action of downloading a file in itself is a pretty pointless test as I’ve tried to make clear in this blog post.

      • Shockwavenn

        Even if I don’t need this file, it much easy to set custom profile – it’s only 2 lines of code (at least on Ruby).
        But maybe it’s just my prejudgement, because I actually NEED content of those files

        • Ardesco

          The advantage of the method that I have suggested above is that you can programmatically download any file, rename it if required and then pick it up in code and do things with it.

          Some problems I can see with your solution:

          • You have to create a download profile for each browser (if you are not doing this programmatically you then have the maintenance of ensuring that profile exists on every test machine)
          • If you download a file with a filename that already exists the file will likely be saved with a filename you are not expecting, how do you know which file to check?
          • You don’t know if the file was successfully downloaded
          • if the file doesn’t exist in the download folder do you know why? Was it a 404(bad?) or a 503(expected?)

            The main reason for doing things the way I have suggested above is control, you have it and it enables you to do more.

            Obviously this is a Java implementation but the theory behind it can apply to any programming language.

          • Shockwavenn

            “You have to create a download profile for each browser (if you are not doing this programmatically you then have the maintenance of ensuring that profile exists on every test machine)”
            I didn’t even know that I could create a browser profile for Selenium manually; I always create it programmatically

            “If you download a file with a filename that already exists the file will likely be saved with a filename you are not expecting, how do you know which file to check?”
            I clean default download directory for each test and copy download file to another location, if I need this file for some reason

            “You don’t know if the file was successfully downloaded”
            I have a timeout for the file to appear in the download directory; if it doesn’t appear in this time I assume that the download failed

            “Was it a 404(bad?) or a 503(expected?)”
            Nothing to say here, my method can’t catch those errors in elegant way

          • MIchael

            The problem with this method is if you have to actually “click” on the WebElement to get a download started (as in my case). I’m going to be looking into the profile thing.

          • Ardesco

            Why would you need to physically click on the WebElement? This would imply you have some sort of JavaScript action required to perform a download, which IMHO is not a good idea because you are preventing people who do not have JavaScript enabled from accessing the download.

            If this is your scenario it should be fairly trivial to add a JavaScript handler that can fire JavaScript events across. I’m assuming all the JavaScript does is build a valid URL which is then passed to the browser to use for download purposes (JavaScript is client side, so unless you are using something like NodeJS it’s unlikely to be doing anything on the server). At the end of the day all we need to do is collect the URL of the file on the remote server (something that every downloadable link will have). Getting the download link may be slightly more complicated if there is some fancy JavaScript stuff obfuscating the URL, but it will be possible to get it at some point.

        • amilacp

          I want to download an xml file from a site. The code I’m using in c#. But still I get the open or save dialog and I can’t proceed beyond. Can you please send me your code?
          FirefoxProfileManager manager = new FirefoxProfileManager();
          var profiles = manager.ExistingProfiles;
          FirefoxProfile profile = new FirefoxProfile(profiles.FirstOrDefault());

          profile.SetPreference("browser.download.folderList", 1);
          profile.SetPreference("browser.download.manager.showWhenStarting", false);
          profile.SetPreference("browser.helperApps.neverAsk.saveToDisk", "application/xml");

          • Ardesco

            I would suggest having a look at the code in the article and writing a .NET equivalent :)

  3. Thanks for making a different branch of this Java downloader code. This version seems easier than (your) other one:

    https://github.com/Ardesco/Ebselen/blob/master/ebselen-core/src/main/java/com/lazerycode/ebselen/customhandlers/FileDownloader.java

    Also, another good use of this programmatic download is if you have content that downloads/displays inline in browser rather than save to file (unless to modify browser to treat as file download), such as PDF files, movies, Flash, and stuff that is returned to browser with header “Content-Disposition: inline;…” For such files rendered with their native app (Adobe PDF, Excel, Word, etc.) you’ll not be able to do any validation with Selenium and have to resort to using AutoIt, etc. or modify browser profile to force most MIME types to download to file. This method would be another alternative (if you can extract the URL of the inline content).

    Also, I think it would be nice to have a file downloader version that just takes in a URL rather than require a WebElement. In case you have to extract URL differently than from an element directly.

    • Ardesco

      The other one was an earlier cut and an initial implementation. This one I have broken out as a standalone class with the idea that it can be pulled into any project.

      The future idea is to make Powder_Monkey a series of useful utilities that you can import into any framework (obviously assuming I get enough time to keep adding useful things). At that point I’ll probably add it into Ebselen and remove any matching implementations.

      As for the limitations, they are intentional. In my mind the only things you should be downloading in your tests are things that the end user would download, to my mind this is explicit download links and images dotted around the page. I don’t see any point in adding something that takes a random URL and downloads a file, it is unlikely you would need the cookie/session information held in the browser to access the content and Apache commons already has this as a one liner:

      copyURLToFile(java.net.URL source, java.io.File destination, int connectionTimeout, int readTimeout) throws java.io.IOException

      Of course if you don’t agree with my (possibly blinkered) view you are quite free to take the existing implementation and tweak it a bit to make it do what you want :). Hopefully the code is written well enough for the intention to be fairly obvious.

      • Thanks for response. I do intend to modify the code for my use as best I can. The Apache commons method looks useful, but not for my case.

        I see what you’re saying about the general (or simple) case. But in today’s web apps that make use of AJAX, REST APIs, etc. you don’t always get simple web elements that store the target file link/URL in an href or src attribute. The element could be a div tag or other such tag rendered pretty with CSS/JavaScript, and whose actual value is generated “onclick”. Your only choice then is to either click on it and somehow get the dynamically generated link from the resulting action, or figure out the correct JavaScript code to execute to return the desired link without having to click the element (which would otherwise result in a file download or, say, a PDF popup window). And the resulting link may also require the existing session to work (you can’t just call the URL by itself).

        In those cases, at least how I see it, a URL input to the file downloader could be useful, and we leave it to the user for how they will extract the URL off the tricky element, since that’s likely web app implementation specific.
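A minimal sketch of what such a URL-based downloader could look like, using only `java.net.HttpURLConnection` from the JDK rather than the HttpClient code discussed in the post. The class and method names (`UrlDownloadSketch`, `cookieHeader`, `download`) are hypothetical, and the cookie map is assumed to have been filled from `driver.manage().getCookies()` so that the existing browser session is reused.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not part of the library discussed in this thread.
final class UrlDownloadSketch {

    // Builds the value of a Cookie request header, e.g. "a=1; b=2".
    static String cookieHeader(Map<String, String> cookies) {
        StringJoiner joiner = new StringJoiner("; ");
        for (Map.Entry<String, String> cookie : cookies.entrySet()) {
            joiner.add(cookie.getKey() + "=" + cookie.getValue());
        }
        return joiner.toString();
    }

    // Downloads an http(s) URL to a temp file, sending the supplied session cookies.
    static Path download(URL source, Map<String, String> cookies) throws IOException {
        HttpURLConnection connection = (HttpURLConnection) source.openConnection();
        try {
            if (!cookies.isEmpty()) {
                connection.setRequestProperty("Cookie", cookieHeader(cookies));
            }
            int status = connection.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("Unexpected HTTP status " + status);
            }
            Path target = Files.createTempFile("download", ".tmp");
            try (InputStream body = connection.getInputStream()) {
                Files.copy(body, target, StandardCopyOption.REPLACE_EXISTING);
            }
            return target;
        } finally {
            connection.disconnect();
        }
    }
}
```

How the URL is extracted from the tricky element is left to the caller, since that part is specific to the web app under test.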

  4. create the popup using the “window.createPopup()”,i use webdriver ,how to get the popup

    • Ardesco

      I’m not sure I understand what you are asking…

  5. Dmitry

    Sounds like your approach won’t work with an https connection. Am I missing something?

    • Ardesco

      HttpClient supports https so in theory it should work (although I haven’t tested it).

      At the end of the day an https request is nothing special, it’s just a secure connection between you and the server you are talking to. It’s not “safe”, just private.

  6. Dmitry

    It is working fine. It was my code problem. Thank you.

  7. G H

    Who says selenium is only used for testing?
    What about a web crawler whose entire purpose is to download files en masse?
    What trap? Skipping the file download dialogue is easy.

    You make a few too many blanket statements.

    • Ardesco

      I haven’t said Selenium is only for testing. To be totally clear, my position is that Selenium is *not* a testing tool, it is a Browser Automation Tool.

      Now the above being said, my post was specifically aimed at people using Selenium for testing purposes (I thought I made that pretty clear in the first paragraph). It is my position that people who are downloading files for testing purposes should not just be skipping the file download dialogue and hoping that files appear somewhere (It’s a lot more work to scan a directory waiting for a fully downloaded file to appear and check it is correct than just skipping the file download dialogue). They should be testing that download links are valid and if they are downloading files they should be testing that the file is the expected file (If they don’t do this the test was pointless).

      Now as to your suggestion of using Selenium as a spider: if you want to do that and just download every file you come across, then skipping the file download dialogue does make more sense, but I would argue that you are using the wrong tool for the job. Why use a proper web crawler? Because:

      * It will be faster
      * It is less likely to get caught in a spider trap (you would have to write your own code to do this with Selenium and the calculations can get quite complicated)
      * Selenium is really not the right tool for the job. Selenium is a browser automation tool designed to mimic a human, and in my opinion you should really use the right tool for the job.

      I’m happy to discuss the blanket statements if you point out what they are and why you aren’t happy with them.

  8. Antonio

    Very useful, thank you for sharing.

  9. Great post. Thought I’d share this bit of info:

    Unfortunately, I had problems trying to integrate the code into our test framework when I initially tried it. For anyone who, like me, might have issues using it OR doesn’t need the full feature set of this solution/library, you can build simpler Java code to download a file. I have a code snippet as an example. The current code uses Java’s built-in java.net.URLConnection class for downloading, but if you check the revision history, earlier code has an example using Apache Commons HttpClient 3.1. The snippet hasn’t been fully tested, but should mostly work and is a generalized version of code I wrote for my organization’s custom test framework.

    https://gist.github.com/4411221

    • Ardesco

      Out of interest what problems did you have?

      • It’s been a while so I forget the exact problem encountered (error message or failure behavior), but from what I recall, after integrating your code (with no Java compile errors), executing the code to download a file failed to work. It just didn’t save to disk: either the save part failed or, more likely, the HTTP request portion failed (improper request or incomplete response). There wasn’t enough info to indicate the source of the problem and I didn’t have time to debug further at the time. When it came time to revisit this, I didn’t want the hassle of trying to integrate the code into our framework again (my earlier attempt was throwaway code), and my team was leaning toward avoiding third party libraries or code (via JAR references), so I worked on building our own wrapper rather than retrofitting your classes into our framework (and Java package naming convention).

        If you still improve on your code here, what would make it better for others to use is a truly standalone library that could be packaged as JAR and not need retrofitting into one’s own framework, where your library truly encapsulates everything and user only needs to pass in WebDriver instances and get a Java file object (or string file path) reference to downloaded file. Something like this perhaps:

        import yourlibrarypackage…;
        Downloader client = new Downloader(driver); //or pass in driver.manage().getCookies(), etc.
        String downloadedFilePath = client.downloadFile(url);
        //could be File object instead of string path to file, etc.
        //now user can do whatever they want with file
        //and manually delete it as needed afterwards

        that kind of library interface is what I ended up creating from scratch using the Java native HTTP classes.

        • Ardesco

          My long term plan would be to package up https://github.com/Ardesco/Powder-Monkey and stick it in maven central, there just doesn’t seem to be enough in there right now to be that useful as a standalone package. It would be nice to say download a basic selenium framework template, add this jar and off you go.

  10. Erzsebet (XDelphiGrl)

    Hello!

    Thanks for outlining your file download test philosophy. I’m new to automated testing, and was about to head down the Robot/AutoIt path, even though that had a definite code smell. Thanks, too, for making your code available on GitHub. I’ve enjoyed working with it. I do have one question, though: have you been able to pass cookies to Internet Explorer? I have code that works to test file downloading on Firefox and Chrome, but fails on IE. A Wireshark sniff revealed that the getHTTPStatusCode method is not passing any cookies when it makes the GET request. Have you seen this behavior? If so, any suggestions as to what will fix it?

    Thanks!

    -erzsebet (XDelphiGrl on GitHub)

    • Ardesco

      The code doesn’t pass cookies to the browser, it reads the cookies from the browser and then uses them to make a request to the server hosting the website. It’s worked on my machine so far (how many times have you heard that) but that doesn’t mean there are no flaws. The different drivers deal with cookies in slightly different ways, can you see the cookies in the IE JavaScript console if you type document.cookie? If not Selenium cannot see them either and is not able to collect them and reuse them when trying to download the file/image directly. This is normally caused by server side cookies.

      • Erzsebet (XDelphiGrl)

        You are correct; the problem is at the Selenium IE driver level. When I use the IE driver, the cookie domain is always null when I inspect the cookie Selenium is using. However, checking document.cookie in the IE JS console does show the cookie, and when I view cookie details, it is fully described (name, value, expiry, AND the all-important domain). Somehow, between the browser and the IE webdriver, the domain and expiry are nulled. In the case of the application I’m testing, the server rejects cookies that do not have a domain, causing my file download to fail – though not through any fault of your code!

        Thanks for clarifying how your code is working. I should have looked at the source again before writing my comment. :-)

        Hope Monday is treating you well!

        • Ardesco

          In that case if you know the domain you could always explicitly set it for IE to get around the problem, not the most elegant of solutions but it should get things working for you :)

          • Erzsebet (XDelphiGrl)

            That’s my plan! :) Thanks!
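One way the workaround above might look, sketched with the JDK’s `java.net.HttpCookie` instead of the HttpClient `BasicClientCookie` used in the post’s code. The class and method names are hypothetical; the idea is simply to fall back to the host of the page under test when the driver reports a null domain.

```java
import java.net.HttpCookie;
import java.net.URI;

// Hypothetical helper illustrating the "set the domain explicitly" workaround.
final class CookieDomainFallback {

    // Copies a cookie, substituting the host of the page under test
    // when the driver did not report a domain.
    static HttpCookie withDomain(String name, String value, String reportedDomain, URI pageUnderTest) {
        HttpCookie cookie = new HttpCookie(name, value);
        cookie.setDomain(reportedDomain != null ? reportedDomain : pageUnderTest.getHost());
        return cookie;
    }
}
```

The same null check could be applied to the `setDomain` call in the cookie-mimicking method, assuming the expected domain is known for the environment being tested.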

  11. Rexy

    Useful post!!!
    I used this to download flat files, but I have a different requirement.
    I have to select a record and click on a download button (this button uses a .js script to download the file), which downloads a zip file whose content should be verified.
    I really don’t know how to customize this code.
    Please help.

    • Ardesco

      Couldn’t tell you from the information supplied so far. I would guess that the JavaScript returns a URL which is then downloaded. You could write something that calls that JavaScript, or ask your devs to add a test support package to the JavaScript that would allow you to make a call that gives you the URL of the file the JavaScript would normally download.

  12. Renato

    Hi, thanks for such a great post.
    I couldn’t help but notice that all your examples involve a GET request.
    Would you please give an example with a POST request?
    One other question: is it possible to access the Download File dialog through WebDriver, just like it is done with Alert dialogs?

    • Ardesco

      To do a post just modify the request method e.g.

      urlChecker.setHTTPRequestMethod(RequestMethod.POST);

      No you can’t access the download dialogue through WebDriver (which is why this blog entry exists). A download dialogue is an OS level dialogue and something that WebDriver is unable to interact with.

  13. user

    Thanks for the good post. But I ran into 2 issues here.
    I implemented the second approach, where I download the file and:
    assertThat(new File(downloadedFileAbsoluteLocation).exists(), is(equalTo(true)));
    assertThat(downloadTestFile.getHTTPStatusOfLastDownloadAttempt(), is(equalTo(200)));

    1. Issue:
    In the line File downloadedFile = new File(this.localDownloadPath + fileToDownload.getFile().replaceFirst("/|\\\\", ""));
    there is a bug when my fileToDownload.getFile() returns “/script/myFileName”,
    because it tries to create the folder /tmp + script = /tmpscript and fails because it can’t create it.

    2. Issue: the download starts, but the application redirects you to a separate page with a message that some error has happened. In this case the HTTP response is 200 and the HTML page is saved as the file.

    To summarize, only the last variant can work fine :)
    Anyway thanks for the good job.

    • Ardesco

      I’ve refactored the code and it now should just download the file to your temp directory and return you a file object.

      This should be more reliable and makes the code much simpler.
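For anyone hitting the /tmpscript problem in older copies of the code, a minimal sketch of one way to avoid it: build the target with `Path.resolve` instead of string concatenation, and keep only the last segment of the URL path. `DownloadPathSketch` and `targetFor` are hypothetical names, not the refactored library code.

```java
import java.net.URI;
import java.nio.file.Path;

// Hypothetical helper, shown only to illustrate the path handling.
final class DownloadPathSketch {

    // Resolves the file name from the URL path against the download directory,
    // so "/script/myFileName" in "/tmp" becomes "/tmp/myFileName".
    static Path targetFor(Path downloadDirectory, URI fileToDownload) {
        String urlPath = fileToDownload.getPath();
        String fileName = urlPath.substring(urlPath.lastIndexOf('/') + 1);
        if (fileName.isEmpty()) {
            fileName = "download"; // URL ended in "/", fall back to a generic name
        }
        return downloadDirectory.resolve(fileName);
    }
}
```

Dropping the intermediate folders is a design choice made here for simplicity; it assumes file names do not collide within a single test.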

      • user

        Awesome!

        BTW, I noticed that hash comparison doesn’t work for ZIP files because the ZIP archive includes timestamps. So even when the content is the same, the hash will be different.

        Do you have any solution for this? I’m thinking about unzipping the file and then verifying the hash of its contents. Maybe you have a better idea?

        Thanks.

        • Ardesco

          Unzip is certainly an option, I’m assuming that the files included in the zip are deterministic.

          If you are working with files that aren’t deterministic it’s very hard to validate them with any automation technique, files like this are either going to have to be manually validated or checked using fuzzy matching techniques that are prone to degrees of error. It becomes a question of just how close is close enough to give you a good degree of confidence.
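The unzip option could be sketched roughly as below: rather than hashing the archive itself, hash the bytes of each entry and compare the resulting map with a known good one. This assumes, as noted above, that the entries themselves are deterministic. `ZipContentHasher` is a hypothetical name and SHA-256 is an arbitrary choice of digest.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper: hashes entry contents so archive timestamps are ignored.
final class ZipContentHasher {

    // Returns entry name -> SHA-256 hex digest of that entry's uncompressed bytes.
    static Map<String, String> hashEntries(InputStream zipData) {
        Map<String, String> hashes = new TreeMap<>();
        byte[] buffer = new byte[8192];
        try (ZipInputStream zip = new ZipInputStream(zipData)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue;
                }
                MessageDigest digest = MessageDigest.getInstance("SHA-256");
                int read;
                while ((read = zip.read(buffer)) != -1) {
                    digest.update(buffer, 0, read);
                }
                hashes.put(entry.getName(), toHex(digest.digest()));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
        return hashes;
    }

    private static String toHex(byte[] bytes) {
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}
```

Two archives with identical entries but different modification times should then produce equal maps, even though a hash of the raw zip bytes would differ.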

  14. Hi,
    I tried to use a code in testing my application. I have a link that is used for downloading a file and look like this:
    Last Downloaded File
    Could you please tell me if your code should work on it?
    Thanks,
    Iulian.

  15. Iulian

    Hi again,
    The href part looks like this:
    href="javascript:WebForm_DoPostBackWithOptions(new WebForm_PostBackOptions("ctl00$c$hullsList$lnkLastDownloadedFile", "", true, "", "", false, true))"

    Thanks again,
    Iulian.

    • Ardesco

      In this case you would need to emulate the post performed by JavaScript, it will require some understanding of what the JavaScript is actually doing
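As a rough illustration only: ASP.NET Web Forms postbacks are normally submitted as a form POST carrying hidden fields such as `__EVENTTARGET`, `__EVENTARGUMENT` and `__VIEWSTATE`. The sketch below builds such a form body; the helper names are hypothetical, and the exact set of fields your page needs is an assumption that should be confirmed by watching the real request in the browser’s network tools.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper; requires Java 10+ for URLEncoder.encode(String, Charset).
final class PostBackFormSketch {

    // Assumed minimal set of postback fields; real pages may need more
    // (for example __EVENTVALIDATION), read from the page's hidden inputs.
    static Map<String, String> postBackFields(String eventTarget, String viewState) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("__EVENTTARGET", eventTarget);
        fields.put("__EVENTARGUMENT", "");
        fields.put("__VIEWSTATE", viewState);
        return fields;
    }

    // Encodes the fields as an application/x-www-form-urlencoded body.
    static String formBody(Map<String, String> fields) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> field : fields.entrySet()) {
            body.add(URLEncoder.encode(field.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(field.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }
}
```

The resulting string would be sent as the body of a POST to the page’s own URL, with the Content-Type set to application/x-www-form-urlencoded and the session cookies copied from WebDriver.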

  16. Amit Kapoor

    Ardesco

    I implemented your method, and excellent job by you, but can you explain all the methods in detail, like:
    1.) private BasicCookieStore mimicCookieState(Set<Cookie> seleniumCookieSet) {
            BasicCookieStore mimicWebDriverCookieStore = new BasicCookieStore();
            for (Cookie seleniumCookie : seleniumCookieSet) {
                BasicClientCookie duplicateCookie = new BasicClientCookie(seleniumCookie.getName(), seleniumCookie.getValue());
                duplicateCookie.setDomain(seleniumCookie.getDomain());
                duplicateCookie.setSecure(seleniumCookie.isSecure());
                duplicateCookie.setExpiryDate(seleniumCookie.getExpiry());
                duplicateCookie.setPath(seleniumCookie.getPath());
                mimicWebDriverCookieStore.addCookie(duplicateCookie);
            }

            return mimicWebDriverCookieStore;
        }

    Why do we need this? (My guess: to maintain the session.)

    And please add another method to delete the downloaded file.

    • Ardesco

      That method mimics your WebDriver cookie state so that if you need to be logged into the website you are testing to access content you can still download files.

      You shouldn’t need any methods added to delete the files, that is functionality already available to you in core Java:

      File downloadedFile = new File(<fileLocation>);
      downloadedFile.delete();
      

      • Amit Kapoor

        What if the browser’s cookies are disabled?
        And what about protected PDFs and Word documents? They cannot be viewed through this.

        • Ardesco

          Well if browser cookies are disabled you won’t be able to log in anyway. As for protected documents, who cares? It won’t stop you taking a file hash and comparing it to a known good one.

  17. santhosh

    I am getting the error “Type mismatch: cannot convert from element type Object to Cookie” on the line
    for (Cookie seleniumCookie : seleniumCookieSet)

  18. santhosh

    In URLStatusChecker.java

  19. Priyanka

    Can you please tell me how I should implement this in Selenium test code? As I’m so new to testing, could someone kindly help me with the implementation part?

  20. stubbe

    Great blogpost, thank you.

    The filedownloader is not working for me in IntelliJ with Selenium2 (FileDownloader downloadTestFile = new FileDownloader(driver) – it cannot only take “driver” as an argument), so how can I use the “package com.lazerycode.selenium.urlstatuschecker;” instead?

    I have these two in my build.gradle:
    testCompile 'com.lazerycode.jmeter:jmeter-maven-plugin:1.8.1'
    testCompile 'com.lazerycode.selenium:driver-binary-downloader-maven-plugin:0.9.2'

    • Ardesco

      Why are you adding maven plugins to a build.gradle file?
      Why have you added the jmeter-maven-plugin?

  21. Thanks for this post! It’s been very helpful and a great resource to point people to.

    I don’t do much work in Java, so I ended up porting a lot of the practices outlined in this post to Ruby.

    You can find the posts here:
    * Browser specific configuration: http://elemental-selenium.com/tips/2-download-a-file
    * Using an HTTP library: http://elemental-selenium.com/tips/8-download-a-file-revisited
    * Using an HTTP library for secure files: http://elemental-selenium.com/tips/15-download-secure-files

  22. Salman

    Hi,
    Sorry I am a bit confused. Which code do we use to check the HTTP status code as there are two
    package com.lazerycode.selenium.urlstatuschecker;
    right at the top under “Checking that links are valid”
    I am using Eclipse for Selenium JAVA Webdriver. Could someone let me know how to use the code written above under “Checking that links are valid”. Many thanks.

    • Ardesco

      Not sure I understand your question, there is an enum, a Class and a test. All three work together.

  23. Lucky

    Hi,

    In my case the href link returns a hash( # ).
    Also we use UCM for content management so the download is handled by UCM and not java script.
    Could you guide me how do I go about testing the download in this case please?

    • Ardesco

      In that case it’s likely there is some JavaScript in the background that is supplying the browser a URL to download content from. At the end of the day the browser is always going to need to redirect to a URL where the file is available, you need to capture this URL so that you can use something else to download the file from it.

  24. Ben

    I actually really do need my test to look at a downloaded log file, to verify it contains the correct information. Unfortunately, the site I’m testing is using HTTPOnly cookies and we only support IE, so getting the cookies necessary to set up the HTTPClient doesn’t seem possible…

  25. duda

    Hi,
    I’m trying to download txt file. Unfortunately I can’t provide link to website.
    in html:
    Export properties
    The solution works perfectly with FF, but in the case of IE 10 I get the full HTML code written into the file. What am I doing wrong?

    • duda

      sorry, in html

  26. Nick

    Hi,
    Although I do tend to agree with most of what you have said, there are occasions where I need to check the contents of the file and using a hash won’t work because the file contains dynamic data, e.g. references, Ids, dates etc.
    How do you test for this sort of scenario?

  27. Jasper

    Hi Ardesco,

    Great example, thanks! I am facing the same problems you so accurately describe (already have a lot of tests that download files, but now we want to implement cross-browser testing on virtual machines.. Problem!).

    I tried to implement your code but the following imports you use are deprecated. Do you perhaps have an updated example with the new (not deprecated) imports? I’m just a Java beginner and I can’t figure out how to get it to work. Thanks in advance!

    import org.apache.http.client.params.ClientPNames;
    import org.apache.http.client.protocol.ClientContext;
    import org.apache.http.impl.client.DefaultHttpClient;
    import org.apache.http.params.HttpParams;

    • Ardesco

      I need to update it I guess; when I get time I’ll tweak what I have on GitHub.

  28. junkew

    I partly agree with what you wrote, but there are still cases where you should be able to handle the dialog boxes of your browser.

    AutoIT is not dependent on titles and can be a good help if you really need it for clicking away unwanted dialogs or handling something like a save dialog.

    And AutoIT can handle all browsers directly (but it is not OS independent).
    http://www.autoitscript.com/forum/topic/153520-iuiautomation-ms-framework-automate-chrome-ff-ie/ could help out a lot by testing your whole GUI thru AutoIT depending on what your needs are

    • Ardesco

      If you want to use AutoIt to test your GUI go for it, but to my mind you still shouldn’t try and bundle it into Selenium to click a save dialogue. If you are just clicking it away, what are you actually checking?

Trackbacks/Pingbacks

  1. A Smattering of Selenium #105 « Official Selenium Blog
  2. Selenium Best Practices « Don't Make the Same Mistake Twice
  3. Optimizing Selenium tests with HTTP requests « autumnator
  4. Selenium file download by code and request for more platform options « autumnator
  5. how to handle download popup window using selenium webdriver
