find-mirror - find the best mirror in a list of mirrors
find-mirror [options] [files...]
Options:
--count=N repeat measurement N times per host
--debug debugging output
--domains=LIST comma-separated list of patterns
--extract[=TYPE] print mirrors, but do not contact them
TYPE can be "urls", "hosts", or empty.
--help show this help message
--ignore-case case-insensitive pattern matching
--jobs=N fork N simultaneous children
--method=METHOD one of: ping, echo, connect
--ping same as --method=ping [default]
--echo same as --method=echo
--connect same as --method=connect
--pattern=PATTERN use custom perl regex to extract urls
--relaxed match anything that looks like a host
--top[=N] find the best N hosts
--verbose verbose output
--version report version and exit
Examples:
$ find-mirror mirrors.html
$ lynx -dump http://foo.com/mirrors.html | find-mirror -j 4
The latest version can be found at:
http://sourceforge.net/projects/find-mirror/
find-mirror is a utility to extract addresses and urls and rank them by reachability
and data rate. It is meant to be used when you are presented with a list of links to
mirrors, ftp sites, etc., and you need to select one (or more) of them.
Since it extracts urls directly from html (or from any arbitrary text), you
can find your mirror with very little effort.
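The rank-by-measurement idea can be sketched in a few lines of Python. This is not find-mirror's own (Perl) code: `rank_hosts` and the `measure` callback are hypothetical, and the sketch assumes that a lower measurement (e.g. a round-trip time in seconds) is better and that `None` marks an unreachable host, which is simply dropped.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def rank_hosts(hosts: Iterable[str],
               measure: Callable[[str], Optional[float]]) -> List[Tuple[str, float]]:
    """Measure each host once and return (host, time) pairs, fastest first.

    Hosts for which `measure` returns None (unreachable) are dropped.
    """
    results = []
    for host in hosts:
        t = measure(host)
        if t is not None:
            results.append((host, t))
    # Sort by measured time; the sort is stable, so ties keep input order.
    results.sort(key=lambda pair: pair[1])
    return results


if __name__ == "__main__":
    fake_rtt = {"ftp.a.org": 0.120, "ftp.b.net": None, "ftp.c.edu": 0.045}
    for host, t in rank_hosts(fake_rtt, fake_rtt.get):
        print(f"{t:.3f}  {host}")
```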
- --count=N
-
Repeat the measurement N times per host. For ping, it is the number of
responses successfully sent and received. For an html download, it is the number of
download attempts. --count=3 gives a reasonable measurement; values higher than 3
are generally a waste of time. The type of measurement is specified by the --method option.
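How the N samples are combined is not specified here. The following Python sketch assumes that failed attempts are discarded and the successful ones are averaged; `repeat_measure` is a hypothetical helper, not part of find-mirror.

```python
from statistics import mean
from typing import Callable, Optional


def repeat_measure(measure: Callable[[], Optional[float]],
                   count: int = 3) -> Optional[float]:
    """Call `measure` `count` times and average the successful samples.

    `measure` returns a time in seconds, or None if the attempt failed.
    Returns None if every attempt failed.
    """
    samples = [t for t in (measure() for _ in range(count)) if t is not None]
    return mean(samples) if samples else None
```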
- --debug
-
Print lots of debugging info. Automatically enables --verbose.
- --domains=LIST
-
Specify a comma-separated list of perl regexes to match the top-level
domain. This is useful if you only want a mirror in your country, and
it can be determined by the top-level domain.
Do not include the leading '.' of the domain. Example:
find-mirror --domains=com,net,edu
For more powerful control over the url matching, see option --pattern.
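A minimal Python sketch of how a --domains list could be turned into a single host matcher. It assumes each entry is matched against the end of the hostname, right after a dot (consistent with omitting the leading '.'); that anchoring and the helper name `domain_filter` are assumptions, not find-mirror's documented behavior.

```python
import re


def domain_filter(domains: str) -> "re.Pattern[str]":
    # Split the comma-separated list, wrap each regex in its own group so
    # alternations inside an entry stay isolated, and require the whole
    # alternation to sit after a dot at the end of the hostname.
    parts = [p for p in domains.split(",") if p]
    alternation = "|".join(f"(?:{p})" for p in parts)
    return re.compile(r"\.(?:" + alternation + r")$")


matcher = domain_filter("com,net,edu")
print(bool(matcher.search("mirror.example.edu")))  # True
print(bool(matcher.search("ftp.cpan.org")))        # False
```

Because the list is split on commas, a single entry that needs alternatives must use '|' inside the regex rather than commas.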
- --extract[=TYPE]
-
Extract and print the urls and hosts, but do not contact them.
The optional TYPE can be one of:
- urls
-
Prints the full urls, e.g. ftp://ftp.cpan.org/pub/foo/bar
This is the default.
- hosts
-
Prints only the host, e.g., ftp.cpan.org
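The difference between the two TYPEs can be illustrated in Python with the standard `urllib.parse.urlsplit`. The helper `extract` is hypothetical, and printing each host only once is an assumption about the hosts output.

```python
from typing import Iterable, List
from urllib.parse import urlsplit


def extract(urls: Iterable[str], type_: str = "urls") -> List[str]:
    # Mimics --extract / --extract=urls / --extract=hosts on a url list.
    urls = list(urls)
    if type_ in ("", "urls"):
        return urls
    if type_ == "hosts":
        seen, hosts = set(), []
        for url in urls:
            host = urlsplit(url).hostname  # drops scheme, port, and path
            if host and host not in seen:  # report each host once (assumption)
                seen.add(host)
                hosts.append(host)
        return hosts
    raise ValueError(f"unknown TYPE: {type_!r}")


print(extract(["ftp://ftp.cpan.org/pub/foo/bar"], "hosts"))  # ['ftp.cpan.org']
```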
- --help
-
Show the help message and exit.
- --ignore-case
-
Perform case-insensitive pattern matching. This option only makes sense if
you also specify your own pattern with the --domains option or the --pattern option.
You can achieve the same result with perl's ugly (?i:...) syntax, for example:
$ find-mirror -i --domains=com,net,org
$ find-mirror --domains='(?i:com|net|org)' # same thing
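Python's `re` module (3.6 and later) supports the same scoped-flag syntax as Perl, so the equivalence above can be checked directly. Anchoring the pattern after the last dot of the hostname is an assumption made for illustration.

```python
import re

hosts = ["ftp.Example.COM", "mirror.example.org", "www.example.de"]

# Like -i --domains=com,net,org: the flag applies to the whole regex.
flagged = re.compile(r"\.(?:com|net|org)$", re.IGNORECASE)

# Like --domains='(?i:com|net|org)': the flag is scoped to the group.
inline = re.compile(r"\.(?i:com|net|org)$")

matches_flagged = [h for h in hosts if flagged.search(h)]
matches_inline = [h for h in hosts if inline.search(h)]
print(matches_flagged)  # ['ftp.Example.COM', 'mirror.example.org']
assert matches_flagged == matches_inline
```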
- --jobs=N
-
This causes find-mirror to work in parallel. If N is greater than zero,
then find-mirror forks N processes. If N is zero, then find-mirror forks
a process for each host. The default is N=1.
This option can greatly reduce execution time.
Caution: Do not specify --jobs=0 unless you know that the list of mirrors
is short. Otherwise, you may create so many processes that your measurements are
adversely affected (e.g., by processing latency or network flooding) and
your system is overloaded.
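A sketch of the --jobs=N scheduling rule in Python. find-mirror forks processes; this sketch swaps in a thread pool (`concurrent.futures.ThreadPoolExecutor`), which is adequate for network-bound measurements. `measure_all` and the `measure` callback are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional


def measure_all(hosts: List[str],
                measure: Callable[[str], Optional[float]],
                jobs: int = 1) -> Dict[str, Optional[float]]:
    if jobs < 0:
        raise ValueError("jobs must be >= 0")
    # jobs > 0: at most `jobs` measurements run at once.
    # jobs == 0: one worker per host (beware of long host lists).
    workers = len(hosts) if jobs == 0 else jobs
    if workers == 0:  # only possible with an empty list and jobs == 0
        return {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so they line up with `hosts`.
        return dict(zip(hosts, pool.map(measure, hosts)))
```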
- --method=METHOD
-
Specifies how the data rate to a host is to be determined.
None of these methods is guaranteed to find the best
mirror. In fact, some may incorrectly discard a mirror that might be
the best one (e.g., if a firewall prevents packets from reaching the
mirror). The default is ping.
METHOD can be one of:
- ping
-
Use the system ping command. This is the fastest measurement, but doesn't
tell you much about the download rate to the host. The number of
packets is specified by the --count option. See ping(1) for details.
- echo
-
Sends UDP packets to the host's echo server, and waits for the responses.
The problem with this is that the echo server may be disabled or blocked.
The number of packets is specified by the --count option.
- connect
-
Measures the TCP connection establishment time to port 80 or 21 of
the remote host. The measurement is taken from the first
successful connection attempt. The number of connect/teardown cycles
is specified by the --count option.
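The connect method can be approximated in Python with `socket.create_connection`. This is a sketch, not find-mirror's code: `connect_time` is a hypothetical helper, the port is passed in by the caller, and the measured time includes DNS resolution as well as the TCP handshake.

```python
import socket
import time
from typing import Optional


def connect_time(host: str, port: int, timeout: float = 5.0) -> Optional[float]:
    """Seconds needed to open a TCP connection (closed right after), or None."""
    start = time.perf_counter()
    try:
        # Resolves the name, then performs the TCP three-way handshake.
        with socket.create_connection((host, port), timeout=timeout):
            elapsed = time.perf_counter() - start
        # Leaving the `with` block closes the socket (the teardown).
    except OSError:  # refused, unreachable, timed out, or name not found
        return None
    return elapsed
```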
Choose the method depending upon how much time you have on your hands and how
accurate you need the results to be. Here are the methods in order of overall
execution time, from quickest to slowest:
ping
echo
connect
download
- --connect
-
Same as --method=connect
- --echo
-
Same as --method=echo
- --ping
-
Same as --method=ping [default]
- --pattern=PATTERN
-
Use a custom perl regex to extract urls. PATTERN must be a valid perl
regex. If it matches successfully, the url must be in $& and the host
address in $1. The default is something like this (but not exactly):
(?:ftp|http)://            # protocol specifier
(\w+(?:\.\w+)+\.\w+)       # hostname in $1
(?::\d+)?/(?:\S+)?         # optional port and path
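For comparison, a rough Python counterpart of such a pattern, using `re.VERBOSE` in place of Perl's /x; `group(0)` plays the role of $& and `group(1)` that of $1. It deliberately differs from the default shown above: it accepts hyphens and two-label hostnames, makes the path optional, and stops the path at quote and angle-bracket characters so that html attributes are not swallowed.

```python
import re

# Rough Python counterpart of the documented default; not find-mirror's regex.
URL_RE = re.compile(r"""
    (?:ftp|http)://             # protocol specifier
    ([\w-]+(?:\.[\w-]+)+)       # hostname -> group(1), like Perl's $1
    (?::\d+)?                   # optional port
    (?:/[^\s"'<>]*)?            # optional path, stopping at quotes/brackets
""", re.VERBOSE)

m = URL_RE.search('Mirror: <a href="ftp://ftp.cpan.org/pub/foo/bar">CPAN</a>')
print(m.group(0))  # ftp://ftp.cpan.org/pub/foo/bar   (like Perl's $&)
print(m.group(1))  # ftp.cpan.org                     (like Perl's $1)
```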
- --relaxed
-
This option tells find-mirror to relax the pattern matching rules
when looking for hosts. Instead of trying to match an entire
url, it finds anything that looks like a host.
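One possible "looks like a host" pattern, sketched in Python; the regex `HOST_RE` is an assumption, not the one find-mirror uses. Like any relaxed match, it also picks up false positives such as file names (mirrors.html would match).

```python
import re

# Hypothetical relaxed host pattern: one or more dot-terminated labels of
# letters, digits, and inner hyphens, followed by an alphabetic TLD.
HOST_RE = re.compile(
    r"\b(?:[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?\.)+[A-Za-z]{2,}\b")

text = "Try ftp.kernel.org or mirror.example.de (see www.example.com/list)."
print(HOST_RE.findall(text))
# ['ftp.kernel.org', 'mirror.example.de', 'www.example.com']
```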
- --top[=N]
-
Finds only the top N hosts. This is useful for measuring a huge list of
mirrors, when you only really want the top five. It reduces the
execution time as compared to measuring the entire list.
- --verbose
-
Prints lots of information about what it is doing. If you really want to
see a mess, try --debug.
- --version
-
Report version and exit.
-
Basic usage (input can be any arbitrary text):
$ find-mirror mirrors.html
$ find-mirror list1.dat list2.dat
$ find-mirror < mirrors.txt
-
Piped from lynx or wget:
$ wget -q -O- http://foo.com/mirrors/ | find-mirror
$ lynx -dump http://foo.com/mirrors/ | find-mirror # same thing
-
In parallel using the -j option (think `GNU make'):
$ find-mirror -j 4 mirrors.html
-
Just extract urls, don't contact them:
$ find-mirror --extract < mirrors.html
-
Use different methods to rank the mirrors (native ping, echo response, connection establishment):
$ find-mirror --ping < mirrors.html
$ find-mirror --echo < mirrors.html
$ find-mirror --connect < mirrors.html
$ find-mirror --http < mirrors.html
-
Match anything that looks like a hostname (instead of full urls):
$ find-mirror --relaxed < mirrors.html
-
Match only .org and .edu domains:
$ find-mirror --domains='org,edu' < mirrors.html
-
Match only Pennsylvania, New Jersey, and New York, USA domains:
$ find-mirror --domains='(pa|nj|ny)\.us' < mirrors.html
-
Match against a custom pattern:
$ find-mirror --pattern='(ftp(\.[[:alnum:]-]+)+\.kernel\.org)' < mirrors.html
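The kernel.org pattern can be tried out in Python before handing it to find-mirror. Python's `re` module has no POSIX classes such as [[:alnum:]], so the sketch spells the class out as [A-Za-z0-9-]. Note that the pattern requires at least one label between "ftp" and "kernel.org", so the bare ftp.kernel.org is not matched.

```python
import re

# [[:alnum:]-] written out, since Python's re lacks POSIX bracket classes.
KERNEL_RE = re.compile(r"(ftp(?:\.[A-Za-z0-9-]+)+\.kernel\.org)")

text = "ftp.us.kernel.org ftp.kernel.org ftp.de.kernel.org"
# With one capturing group, findall() returns group 1 (the host, like $1).
print(KERNEL_RE.findall(text))
# ['ftp.us.kernel.org', 'ftp.de.kernel.org']
```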
See the BUGS file for details.
John Millaway <john43@temple.edu>