find-mirror - find the best mirror in a list of mirrors


find-mirror [options] [files...]


  --count=N          repeat measurement N times per host
  --debug            debugging output
  --domains=LIST     comma-separated list of patterns
  --extract[=TYPE]   print mirrors, but do not contact them
                     TYPE can be "urls", "hosts", or empty.
  --help             show this help message
  --ignore-case      case-insensitive pattern matching
  --jobs=N           fork N simultaneous children
  --method           one of: ping,echo,connect
    --ping           same as --method=ping [default]
    --echo           same as --method=echo
    --connect        same as --method=connect
  --pattern=PATTERN  use custom perl regex to extract urls
  --relaxed          match anything that looks like a host
  --top[=N]          find the best N hosts
  --verbose          verbose output
  --version          report version and exit


    $  find-mirror mirrors.html
    $  lynx -dump | find-mirror -j 4

The latest version can be found at:


find-mirror is a utility to extract and rank addresses and urls by reachability and data rate. It is meant to be used when you are presented with a list of links to mirrors, ftp sites, etc., and you need to select one (or more) of them.

Since it extracts urls directly from html (and from any arbitrary text), you can find your mirror with very little effort.


Repeat the measurement N times per host. For a ping, it's the number of successfully sent and received responses. For html download, it's the number of download attempts. Try --count=3 for a nice measurement. Any higher than 3 is a waste of time. The type of measurement is specified by the --method option.

Print lots of debugging info. Automatically enables --verbose.

Specify a comma-separated list of perl regexes to match the top-level domain. This is useful if you only want a mirror in your country, and it can be determined by the top-level domain.

Do not specify the initial . character for the domain. Example:

    find-mirror --domains=com,net,edu

For more powerful control over the url matching, see option --pattern.
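The following is a minimal sketch of the kind of filtering --domains describes, written in Python rather than perl. The function name `filter_by_tld`, the comma splitting, and the full match against the last dot-separated label are assumptions for illustration, not find-mirror's actual code.

```python
import re

def filter_by_tld(hosts, domains, ignore_case=False):
    """Keep hosts whose last dot-separated label matches one of the patterns.

    `domains` is a comma-separated string such as "com,net,edu".
    Assumption: patterns themselves contain no commas.
    """
    flags = re.IGNORECASE if ignore_case else 0
    patterns = [re.compile(p, flags) for p in domains.split(",") if p]
    kept = []
    for host in hosts:
        tld = host.rsplit(".", 1)[-1]  # text after the last dot
        if any(p.fullmatch(tld) for p in patterns):
            kept.append(host)
    return kept
```

Passing `ignore_case=True` corresponds to combining --domains with --ignore-case.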

Extract and print the urls and hosts, but do not contact them. The optional TYPE can be one of:

urls: Prints the full urls. This is the default.

hosts: Prints only the host.

Show the help message and exit.

Perform case-insensitive pattern matching. This option only makes sense if you also specify your own pattern with the --domains option or the --pattern option.

You can achieve the same results with perl's ugly (?i:...) syntax, e.g.:

    $ find-mirror -i --domains=com,net,org
    $ find-mirror --domains='(?i:com|net|org)'  # same thing

This causes find-mirror to work in parallel. If N is greater than zero, then find-mirror forks N processes. If N is zero, then find-mirror forks a process for each host. The default is N=1.

This option can greatly reduce execution time.

Caution: Do not specify --jobs=0 unless you know that the list of mirrors is short. Otherwise, you may create so many processes that your measurements are adversely affected (e.g., by processing latency or network flooding) and your system is overloaded.
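The effect of --jobs can be illustrated with a bounded worker pool. This sketch uses Python threads as a stand-in for forked processes; `measure_all` and its `measure` callback are hypothetical names, and the handling of N=0 follows the description above.

```python
from concurrent.futures import ThreadPoolExecutor

def measure_all(hosts, measure, jobs=1):
    """Run measure(host) for every host with at most `jobs` workers.

    jobs=0 means one worker per host, as described for --jobs=0.
    Returns a dict mapping host -> result.
    """
    hosts = list(hosts)
    if not hosts:
        return {}
    workers = len(hosts) if jobs == 0 else jobs
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(measure, hosts))  # map() keeps input order
    return dict(zip(hosts, results))
```

With a long list, `jobs=0` starts one worker per host, which is the overload situation the caution above warns about.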

Specifies how the data rate to a host is to be determined. None of these methods is guaranteed to determine the best mirror. In fact, some may incorrectly discard a mirror that might be the best one (e.g., if a firewall prevents packets from reaching the mirror). The default is ping.

METHOD can be one of:

ping: Uses the system ping command. This is the fastest measurement, but doesn't tell you much about the download rate to the host. The number of packets is specified by the --count option. See ping(1) for details.

echo: Sends UDP packets to the host's echo server, and waits for the responses. The problem with this is that the echo server may be disabled or blocked. The number of packets is specified by the --count option.

connect: Measures the TCP connection establishment time to port 80 or 21 of the remote host. The measurement is taken from the first successful connection attempt. The number of connect/teardown cycles is specified by the --count option.
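The TCP connect measurement can be sketched as follows. This is an illustration under stated assumptions, not find-mirror's implementation: the name `connect_time`, the 5 second timeout, trying port 80 before port 21, and averaging the successful attempts are all choices made for the example.

```python
import socket
import time

def connect_time(host, ports=(80, 21), count=1, timeout=5.0):
    """Return the mean TCP connect time in seconds, or None if nothing connected.

    Each of the `count` attempts tries the ports in order and records
    the first one that accepts the connection.
    """
    samples = []
    for _ in range(count):
        for port in ports:
            start = time.monotonic()
            try:
                with socket.create_connection((host, port), timeout=timeout):
                    samples.append(time.monotonic() - start)
                break  # first successful port wins for this attempt
            except OSError:
                continue  # refused, unreachable, or timed out: try next port
    return sum(samples) / len(samples) if samples else None
```

A result of None covers the firewall case mentioned above: the host may be a good mirror and still be unmeasurable with this method.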

Choose the method depending upon how much time you have on your hands, and how accurate you need the results to be. Here are the methods in order of overall execution time, from quickest to slowest.

Same as --method=connect

Same as --method=echo

Same as --method=ping [default]

Use a custom perl regex to extract urls. PATTERN must be a valid perl regex. If it matches successfully, the url must be in $& and the host address in $1. The default is something like this (but not exactly):

    (?: (?:ftp|http)://      # protocol specifier
      (\w+(\.\w+)+(\.\w+))   # hostname in $1
      (:\d+)?/(\S+)?)        # optional port and path
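To show how such a pattern yields both the url and the host, here is a sketch using Python's `re` module in verbose mode. The pattern is a simplified variant written for this example (it stops the path at quotes and angle brackets so that it behaves on raw html, and `\w` does not match hyphenated host names); `URL_RE` and `extract` are hypothetical names.

```python
import re

URL_RE = re.compile(r"""
    (?:ftp|http)://        # protocol specifier
    (\w+(?:\.\w+)+)        # hostname, captured as group 1
    (?::\d+)?              # optional port
    (?:/[^\s"'<>]*)?       # optional path
""", re.VERBOSE)

def extract(text):
    """Return (url, host) pairs in order of first appearance, without duplicates."""
    seen = set()
    pairs = []
    for m in URL_RE.finditer(text):
        url, host = m.group(0), m.group(1)  # whole match and first group
        if url not in seen:
            seen.add(url)
            pairs.append((url, host))
    return pairs
```

Here `m.group(0)` plays the role of perl's $& and `m.group(1)` the role of $1.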

This option tells find-mirror to relax the pattern matching rules when looking for hosts. Instead of trying to match an entire url, it finds anything that looks like a host.

Finds only the top N hosts. This is useful for measuring a huge list of mirrors, when you only really want the top five. It reduces the execution time as compared to measuring the entire list.
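Selecting the best N hosts from a set of measurements can be sketched as below. This only illustrates the selection step; how find-mirror shortens the measurement itself is not described here, and `top_hosts`, the host-to-seconds mapping, and the default of one host are assumptions.

```python
import heapq

def top_hosts(timings, n=1):
    """Return the n hosts with the smallest measured time, fastest first.

    `timings` maps host -> seconds, or None when the host could not be measured.
    """
    reachable = [(t, host) for host, t in timings.items() if t is not None]
    return [host for t, host in heapq.nsmallest(n, reachable)]
```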

Prints lots of information about what it is doing. If you really want to see a mess, try --debug.

Report version and exit.



See BUGS file for details.


John Millaway <>
