NAME

find-mirror - find the best mirror in a list of mirrors

SYNOPSIS

find-mirror [options] [files...]

Options:

  --count=N          repeat measurement N times per host
  --debug            debugging output
  --domains=LIST     comma-separated list of patterns
  --extract[=TYPE]   print mirrors, but do not contact them
                     TYPE can be "urls", "hosts", or empty.
  --help             show this help message
  --ignore-case      case-insensitive pattern matching
  --jobs=N           fork N simultaneous children
  --method=METHOD    one of: ping, echo, connect
    --ping           same as --method=ping [default]
    --echo           same as --method=echo
    --connect        same as --method=connect
  --pattern=PATTERN  use custom perl regex to extract urls
  --relaxed          match anything that looks like a host
  --top[=N]          find the best N hosts
  --verbose          verbose output
  --version          report version and exit

Examples:

    $  find-mirror mirrors.html
    $  lynx -dump http://foo.com/mirrors.html | find-mirror -j 4

The latest version can be found at: http://sourceforge.net/projects/find-mirror/

DESCRIPTION

find-mirror is a utility to extract and rank addresses and urls by reachability and data rate. It is meant to be used when you are presented with a list of links to mirrors, ftp sites, etc., and you need to select one (or more) of them.

Since it extracts urls directly from html (and from any arbitrary text), you can find your mirror with very little effort.

OPTIONS

--count=N

Repeat the measurement N times per host. The type of measurement is specified by the --method option. For a ping, N is the number of packets successfully sent and answered. For an html download, it is the number of download attempts. Try --count=3 for a nice measurement; anything higher than 3 is a waste of time.

--debug

Print lots of debugging info. Automatically enables --verbose.

--domains=LIST

Specify a comma-separated list of perl regexes to match the top-level domain. This is useful if you only want a mirror in your country, and it can be determined by the top-level domain.

Do not specify the initial . character for the domain. Example:

    find-mirror --domains=com,net,edu

For more powerful control over the url matching, see option --pattern.
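
As a rough illustration of the matching described above, here is a minimal Python sketch. It is not find-mirror's code (find-mirror uses perl regexes). It assumes each pattern in LIST is matched at the end of the host name, right after a dot. The function name filter_by_domain and the example.* hosts are made up for the example.

```python
import re

def filter_by_domain(hosts, domain_list, ignore_case=False):
    """Keep hosts whose name ends in a dot followed by one of the patterns."""
    flags = re.IGNORECASE if ignore_case else 0
    # Split the comma-separated LIST into individual regexes.
    patterns = [p for p in domain_list.split(",") if p]
    # Assumption: every pattern is anchored to the end of the host name,
    # immediately after a literal dot.
    combined = re.compile(r"\.(?:" + "|".join(patterns) + r")$", flags)
    return [h for h in hosts if combined.search(h)]

hosts = ["ftp.cpan.org", "mirror.example.com", "ftp.example.edu", "ftp.example.co.uk"]
print(filter_by_domain(hosts, "org,edu"))
```

Because LIST is split on commas, a pattern that needs alternatives has to use | inside a group rather than a comma.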

--extract[=TYPE]

Extract and print the urls and hosts, but do not contact them. The optional TYPE can be one of:

urls

Prints the full urls, e.g. ftp://ftp.cpan.org/pub/foo/bar

This is the default.

hosts

Prints only the host, e.g., ftp.cpan.org

--help

Show the help message and exit.

--ignore-case

Perform case-insensitive pattern matching. This option only makes sense if you also specify your own pattern with the --domains option or the --pattern option.

You can achieve the same result with perl's ugly (?i:...) syntax, e.g.:

    $ find-mirror -i --domains=com,net,org
    $ find-mirror --domains='(?i:com|net|org)'  # same thing

--jobs=N

This causes find-mirror to work in parallel. If N is greater than zero, then find-mirror forks N processes. If N is zero, then find-mirror forks a process for each host. The default is N=1.

This option can greatly reduce execution time.

Caution: Do not specify --jobs=0 unless you know that the list of mirrors is short. Otherwise, you may create so many processes that your system is overloaded and your measurements are adversely affected (e.g., by processing latency or network flooding).
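
The effect of N can be sketched in Python as follows. This is not find-mirror's implementation: it uses a thread pool where find-mirror forks processes, and measure_all and fake_measure are hypothetical names chosen for the example.

```python
from concurrent.futures import ThreadPoolExecutor

def measure_all(hosts, measure, jobs=1):
    """Run measure(host) for every host with at most `jobs` running at once.

    jobs=0 means one worker per host, mirroring the meaning of --jobs=0.
    """
    if not hosts:
        return {}
    workers = len(hosts) if jobs == 0 else jobs
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results line up with hosts.
        return dict(zip(hosts, pool.map(measure, hosts)))

def fake_measure(host):
    # Hypothetical stand-in for a real measurement: longer names are "slower".
    return len(host) / 1000.0

print(measure_all(["ftp.cpan.org", "mirror.example.com"], fake_measure, jobs=4))
```

With jobs=0 the number of workers grows with the list, which is why a long list of mirrors can overload the system.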

--method=METHOD

Specifies how the data rate to a host is to be determined. None of these methods is guaranteed to determine the best mirror. In fact, some may incorrectly discard a mirror that might be the best one (e.g., if a firewall prevents packets from reaching the mirror). The default is ping.

METHOD can be one of:

ping: Uses the system ping command. This is the fastest measurement, but doesn't tell you much about the download rate to the host. The number of packets is specified by the --count option. See ping(1) for details.

echo: Sends UDP packets to the host's echo server, and waits for the responses. The problem with this is that the echo server may be disabled or blocked. The number of packets is specified by the --count option.

connect: Measures the TCP connection establishment time to port 80 or 21 of the remote host. The measurement is taken from the first successful connection attempt. The number of connection/teardowns is specified by the --count option.

Choose the method depending upon how much time you have on your hands, and how accurate you need the results to be. Here are the methods in order of overall execution time, from quickest to slowest:

    ping
    echo
    connect
    download
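
The connect method can be approximated by the following Python sketch. How find-mirror combines the --count samples is not stated here, so taking the fastest one is an assumption, as are the names connect_time and measure_connect and the 5 second timeout.

```python
import socket
import time

def connect_time(host, port, timeout=5.0):
    """Return seconds needed to establish a TCP connection, or None on failure."""
    start = time.monotonic()
    try:
        # The connection is closed again as soon as the with block exits.
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        return None

def measure_connect(host, ports=(80, 21), count=3, timeout=5.0, connect=connect_time):
    """Time `count` connect/teardown cycles on the first port that answers.

    Returns the fastest observed time (an assumption of this sketch),
    or None if no port accepts a connection.
    """
    for port in ports:
        samples = [connect(host, port, timeout) for _ in range(count)]
        good = [s for s in samples if s is not None]
        if good:
            return min(good)
    return None
```

Calling measure_connect("ftp.cpan.org", count=3) would open three connections to port 80 and, only if all of them fail, three more to port 21. A host behind a firewall that drops these connections returns None, even though it might be the fastest mirror.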

--connect

Same as --method=connect

--echo

Same as --method=echo

--ping

Same as --method=ping [default]

--pattern=PATTERN

Use a custom perl regex to extract urls. PATTERN must be a valid perl regex. If it matches successfully, the url must be in $& and the host address in $1. The default is something like this (but not exactly):

    (?:ftp|http)://              # protocol specifier
    (\w+(?:\.\w+)+(?:\.\w+))     # hostname in $1
    (?::\d+)?/(?:\S+)?           # optional port and path
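
The contract between the whole match and the first capture group can be sketched in Python, where m.group(0) plays the role of perl's $& and m.group(1) the role of $1. The pattern below is illustrative only and is not find-mirror's built-in default. URL_PATTERN, extract, and the sample text are made up for the example.

```python
import re

# Illustrative pattern only; it is not find-mirror's built-in default.
URL_PATTERN = re.compile(
    r"""
    (?:ftp|http)://          # protocol specifier
    (\w[\w-]*(?:\.[\w-]+)+)  # host name, captured as group 1
    (?::\d+)?                # optional port
    (?:/[^\s"'<>]*)?         # optional path
    """,
    re.VERBOSE,
)

def extract(text, pattern=URL_PATTERN):
    """Return (url, host) pairs: the whole match and capture group 1."""
    return [(m.group(0), m.group(1)) for m in pattern.finditer(text)]

sample = '<a href="ftp://ftp.cpan.org/pub/foo/bar">CPAN</a> or http://mirror.example.com:8080/'
for url, host in extract(sample):
    print(host, url)
```

Every group other than the host is non-capturing, so the host is always group 1 no matter which optional parts are present.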

--relaxed

This option tells find-mirror to relax the pattern matching rules when looking for hosts. Instead of trying to match an entire url, it finds anything that looks like a host.

--top[=N]

Finds only the top N hosts. This is useful for measuring a huge list of mirrors, when you only really want the top five. It reduces the execution time as compared to measuring the entire list.
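
How find-mirror cuts the measurement short is not described here. The Python sketch below only illustrates the final selection step, assuming each host already has a measured time and that unreachable hosts are recorded as None. The name top_hosts, the default n=1, and the timings are invented for the example.

```python
import heapq

def top_hosts(timings, n=1):
    """Return the n hosts with the smallest measured time, best first.

    `timings` maps host -> seconds, with None for hosts that never answered.
    """
    reachable = [(t, host) for host, t in timings.items() if t is not None]
    return [host for t, host in heapq.nsmallest(n, reachable)]

timings = {
    "ftp.cpan.org": 0.120,
    "mirror.example.com": 0.045,
    "ftp.example.edu": None,      # unreachable, so it is dropped
    "ftp.example.net": 0.300,
}
print(top_hosts(timings, n=2))
```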

--verbose

Prints lots of information about what it is doing. If you really want to see a mess, try --debug.

--version

Report version and exit.

EXAMPLES

Basic usage (input can be any arbitrary text):

    $ find-mirror  mirrors.html
    $ find-mirror  list1.dat list2.dat
    $ find-mirror < mirrors.txt

Piped from lynx or wget:

    $ wget -q -O- http://foo.com/mirrors/ | find-mirror
    $ lynx -dump  http://foo.com/mirrors/ | find-mirror   # same thing

In parallel using the -j option (think `GNU make'):
```
    $ find-mirror -j 4  mirrors.html
```

Just extract urls, don't contact them:

    $ find-mirror --extract  < mirrors.html

Use different methods to rank the mirrors (native ping, echo response, connection establishment):

    $ find-mirror --ping    < mirrors.html
    $ find-mirror --echo    < mirrors.html
    $ find-mirror --connect < mirrors.html
    $ find-mirror --http    < mirrors.html

Match anything that looks like a hostname (instead of full urls):
```
    $ find-mirror --relaxed  < mirrors.html
```

Match only .org and .edu domains:

    $ find-mirror --domains='org,edu' < mirrors.html

Match only Pennsylvania, New Jersey, and New York, USA domains:

    $ find-mirror --domains='(pa|nj|ny)\.us' < mirrors.html

Match against a custom pattern:

    $ find-mirror --pattern='(ftp(\.[[:alnum:]-]+)+\.kernel\.org)' < mirrors.html

BUGS

See the BUGS file for details.

AUTHORS

John Millaway <john43@temple.edu>