bdownload package

bdownload.download module

class bdownload.download.BDownloader(max_workers=None, max_parallel_downloads=5, workers_per_download=4, min_split_size=1048576, chunk_size=102400, proxy=None, cookies=None, user_agent=None, logger=None, progress='mill', num_pools=20, pool_maxsize=20, pool_block=True, request_timeout=None, request_retries=None, status_forcelist=None, resumption_retries=None, continuation=True, referrer=None, check_certificate=True, ca_certificate=None, certificate=None, auth=None, netrc=None, headers=None)[source]

Bases: object

The class for executing and managing download jobs.

The context of the current downloading job is structured as:

ctx = {
    "total_size": 2000,  # total size of all the to-be-downloaded files, may be inaccurate due to chunked transfer encoding
    "accurate": True,  # Is `total_size` accurate?
    "last_progress": 0,  # the overall progress, in bytes, from last run loaded when resuming from interruption
    "downloaded": 0,  # newly accumulated bytes from this run of downloads, which are updated on completion of every worker thread
    "orig_path_urls": [('file1', 'url1  url2    url3'), ('file2', 'url4 url5    url6')],  # originally added downloads,
        # which don't necessarily correspond to `files` e.g. due to duplicate or interruption
    "file_cnt": 2,  # number of current downloading files, i.e. `alt_files`
    "alt_files": [("full_path_to_file1", `ctx_file1_obj`), ("full_path_to_file2", `ctx_file2_obj`)],  # flattened `files`
        # with the exception of the succeeded on addition
    "active_files": [("full_path_to_file1", `ctx_file1_obj`)],  # scheduled, in-processing file downloads
    "active_downloads": 1,  # number of in-processing file downloads
    "next_download": 1,  # index of the next file to schedule for download
    "poll_changed": False,  # Have the polled files' states changed?
    "files":{
        "full_path_to_file1":{
            "length": 2000,  # 0 means 'unknown', i.e. file size can't be pre-determined through any one of provided URLs
            "progress": 0,  # `SUCCEEDED` downloaded bytes: initialized to 0, set to the last progress when
                            # resuming and updated on completion (SUCCEEDED only!) of every task (`Future`)
            "last_progress": 0,  # CONSTANT: the loaded progress of last run upon resuming from interruption
            "downloaded": 0,  # downloaded bytes: initialized to 0, and updated on completion (SUCCEEDED, FAILED)
                              # of every task (`Future`)
            "stdout": False,  # standard output
            "resumable": True,
            "resuming_from_intr": False,  # Are we resuming from keyboard interruption?
            "download_state": "inprocess",
            "cancelled_on_exception": False,
            "orig_path_url": ('file1', 'url1    url2    url3'),  # (path, url) as a subparameter passed to :meth:`downloads`
            "path_url": ('full_path_to_file1', 'url1    url2    url3'),  # (full_pathname, active_URLs)
            "urls":{"url1":{"auth": None, "auth_header": {"Authorization": "Basic dXNlcjpwYXNz"}, "accept_ranges": "bytes", "refcnt": 1, "interrupted": 2, "succeeded": -5},
                    "url2":{"auth": None, "auth_header": {"Authorization": "Digest username='user',realm=..."}, "accept_ranges": "none", "refcnt": 0, "interrupted": 0, "succeeded": 0},
                    "url3":{"auth": None, "auth_header": {}, "accept_ranges": "bytes", "refcnt": 1, "interrupted": 0, "succeeded": -2}},
            "alt_ranges": [("bytes=1000-1999", `ctx_range2_obj`)],  # task ranges stack
            "worker_ranges": [("bytes=0-999", `ctx_range1_obj`)],  # active range downloading tasks
            "active_workers": 1,  # number of active worker threads on ranges downloading of the file
            "ranges_succeeded": 0,  # number of ranges successfully downloaded
            "ranges":{
                "bytes=0-999": {
                    "start": 0,  # start byte position
                    "end": 999,  # end byte position, None for 'unknown', see above
                    "offset": 0,  # current pointer position relative to 'start'(i.e. 0)
                    "last_offset": 0,  # the last pointer position where the range task failed and was rescheduled in this run
                    "start_time": 0,
                    "rt_dl_speed": 0,  # x seconds interval
                    "download_state": "inprocess",
                    "future": future1,
                    "url": [url1],
                    "alt_urls": {}
                },
                "bytes=1000-1999": {
                    "start":1000,
                    "end":1999,
                    "offset": 0,  # current pointer position relative to 'start'(i.e. 1000)
                    "last_offset": 0,  # the last pointer position where the range task failed and was rescheduled in this run
                    "start_time": 0,
                    "rt_dl_speed": 0,  # x seconds interval
                    "download_state": "inprocess",
                    "future": future2,
                    "url": [url3],
                    "alt_urls": {}
                }
            }
        },
        "full_path_to_file2":{
        }
    }
}
CANCELLED = 'cancelled'
FAILED = 'failed'
INPROCESS = 'inprocess'
INPROCESS_EXT = '.bdl'
PENDING = 'pending'
PROGRESS_BS_BAR = 'bar'
PROGRESS_BS_MILL = 'mill'
PROGRESS_BS_NONE = 'none'
RESUM_PARTS_EXT = '.bdl.par'
STD_OUT = '-'
SUCCEEDED = 'succeeded'
__init__(max_workers=None, max_parallel_downloads=5, workers_per_download=4, min_split_size=1048576, chunk_size=102400, proxy=None, cookies=None, user_agent=None, logger=None, progress='mill', num_pools=20, pool_maxsize=20, pool_block=True, request_timeout=None, request_retries=None, status_forcelist=None, resumption_retries=None, continuation=True, referrer=None, check_certificate=True, ca_certificate=None, certificate=None, auth=None, netrc=None, headers=None)[source]

Create and initialize a BDownloader object.

Parameters:
  • max_workers (int) – The max_workers parameter specifies the number of parallel downloading threads. If set to None, it defaults to the number of processors multiplied by 5.

  • max_parallel_downloads (int) – max_parallel_downloads limits the number of files downloading concurrently. It has a default value of 5.

  • workers_per_download (int) – workers_per_download sets the maximum number of worker threads for every file downloading job, which defaults to 4.

  • min_split_size (int) – min_split_size denotes the size in bytes of file pieces split to be downloaded in parallel, which defaults to 1024*1024 bytes (i.e. 1MB).

  • chunk_size (int) – The chunk_size parameter specifies the chunk size in bytes of every http range request, which will take a default value of 1024*100 (i.e. 100KB) if not provided.

  • proxy (str) – The proxy supports both HTTP and SOCKS proxies in the form of 'http://[user:pass@]host:port' and 'socks5://[user:pass@]host:port', respectively.

  • cookies (str, dict or CookieJar) – If cookies needs to be set, it must either take the form of 'cookie_key=cookie_value', with multiple pairs separated by whitespace and/or semicolon if applicable, e.g. 'key1=val1 key2=val2;key3=val3', be packed into a dict, or be an instance of CookieJar, i.e. cookielib.CookieJar for Python27, http.cookiejar.CookieJar for Python3.x or RequestsCookieJar from requests.

  • user_agent (str) – When user_agent is not given, it will default to 'bdownload/VERSION', with VERSION being replaced by the package’s version number.

  • logger (logging.Logger) – The logger parameter specifies an event logger. If logger is not None, it must be an object of class logging.Logger or of its customized subclass. Otherwise, it will use a default module-level logger returned by logging.getLogger(__name__).

  • progress (str) – progress determines the style of the progress bar displayed while downloading files. Possible values are 'mill', 'bar' and 'none'. 'mill' is the default. To disable this feature, e.g. while scripting or multi-instanced, set it to 'none'.

  • num_pools (int) – The num_pools parameter has the same meaning as num_pools in urllib3.PoolManager and will eventually be passed to it. Specifically, num_pools specifies the number of connection pools to cache.

  • pool_maxsize (int) – pool_maxsize will be passed to the underlying requests.adapters.HTTPAdapter. It specifies the maximum number of connections to save that can be reused in the urllib3 connection pool.

  • pool_block (bool) – pool_block specifies whether the urllib3 connection pool should block the call or create new connections when no free connection is available.

  • request_timeout (float or 2-tuple of float) – The request_timeout parameter specifies the timeouts for the internal requests session, either as a single float that applies to both the connect and the read timeouts, or as a (connect, read) tuple. If set to None, it will take a default value of RequestsSessionWrapper.TIMEOUT.

  • request_retries (int) –

    request_retries specifies the maximum number of retry attempts allowed on exceptions and interested status codes(i.e. status_forcelist) for the builtin Retry logic of urllib3. It will default to URLLIB3_BUILTIN_RETRIES_ON_EXCEPTION if not given.

    Notes

    There are two retry mechanisms that jointly determine the total retries of a request. One is the above-mentioned Retry logic that is built into urllib3, and the other is the extended high-level retry factor that is meant to complement the builtin retry mechanism. The total number of retries is bounded by the following formula:

    request_retries * (_requests_extended_retries_factor + 1)

    See retry_requests(), RequestsSessionWrapper and requests_retry_session() for more details on the retry mechanisms.

  • status_forcelist (set of int) – status_forcelist specifies a set of HTTP status codes that a retry should be enforced on. The default set of status codes shall be URLLIB3_RETRY_STATUS_CODES if not given.

  • resumption_retries (int) – The resumption_retries parameter specifies the maximum allowable number of retries on error at resuming the interrupted download while streaming the request content. The default value of it is REQUESTS_RETRIES_ON_STREAM_EXCEPTION when not provided.

  • continuation (bool) – The continuation parameter specifies whether to resume, if possible, files that were only partially downloaded before, e.g. when the downloads were terminated by the user pressing Ctrl-C. When not present, it defaults to True.

  • referrer (str) – referrer specifies an HTTP request header Referer that applies to all downloads. If set to '*', the request URL shall be used as the referrer per download.

  • check_certificate (bool) – The check_certificate parameter specifies whether to verify the server’s TLS certificate or not. It defaults to True.

  • ca_certificate (str) – The ca_certificate parameter specifies a path to the preferred CA bundle file (.pem) or directory with certificates in PEM format of trusted CAs. If set to a path to a directory, the directory must have been processed using the c_rehash utility supplied with OpenSSL, according to requests. NB the cert files in the directory each only contain one CA certificate.

  • certificate (str or tuple) – certificate specifies a client certificate. It has the same meaning as that of cert in requests.request().

  • auth (tuple or requests.auth.AuthBase) –

    The auth parameter sets a (user, pass) tuple or Auth handler to enable Basic/Digest/Custom HTTP Authentication. It will be passed down to the underlying requests.Session instance as the default authentication.

    Warning

    The auth will be applied to all the downloads for HTTP Authentication. To avoid leaking credentials, don’t use this parameter unless all of the downloads need the authentication. Instead, use the netrc parameter for fine-grained control over HTTP Authentication.

  • netrc (dict) – netrc specifies a dictionary of 'machine': (login, password) (or 'machine': requests.auth.AuthBase) for HTTP Authentication, similar to the .netrc file format in spirit.

  • headers (dict) – headers specifies extra HTTP headers, standard or custom, for use in all of the requests made by the session. The headers take precedence over the ones specified by other parameters, e.g. user_agent, if conflict happens.

Raises:

ValueError – Raised when the cookies is of the str type and not in valid format.

_backup_resumption_ctx(the_file, ctx_file)[source]

Back up the necessary context of the unsuccessful download for resuming later.

Parameters:
  • the_file (str) – The full path name of the file being downloaded.

  • ctx_file (dict) – The download context of the file the_file.

Returns:

The resumption context for the file the_file.

Return type:

dict

_build_ctx(path_urls)[source]

Build the context for downloading the file(s).

Parameters:

path_urls (list of tuple) – Paths and URLs for the file(s) to download, see downloads() for details.

Returns:

A 6-tuple of lists '(active, active_orig, failed, failed_orig, existing, existing_orig)', where the lists active and active_orig contain the active (path, url)’s, converted and original respectively; failed and failed_orig contain the same (path, url)’s that are not downloadable; existing and existing_orig contain the downloads whose desired files already exist out there.

Raises:

BDownloaderException – Raised when the termination or cancellation flag has been set.

_build_ctx_internal(path_name, url)[source]

The helper method that actually does the build of the downloading context of the file.

Parameters:
  • path_name (str) – The full path name of the file to download.

  • url (str) – The URL referencing the target file.

Returns:

A 3-tuple '(downloadable, (path, url), (orig_path, orig_url))', where the downloadable indicates whether the desired file is downloadable, unavailable or existing by True, False or None respectively, (path, url) denotes the converted full pathname and the URL that consists only of active URLs, and (orig_path, orig_url) denotes the originally input pathname and URL.

Return type:

tuple

Raises:

BDownloaderException – Raised when the termination or cancellation flag has been set.

_calc_completed()[source]

Calculate the already downloaded bytes of the files.

Returns:

The size in bytes of the downloaded pieces.

Return type:

int

_cancel_all_on_interrupted()[source]

Cancel all the pending tasks when receiving the SIGINT signal or the QUIT command.

_finalize_on_interrupted_py2()[source]

When interrupted under Python2.x, perform state transitions manually and act accordingly.

_get_alt_urls(path_name)[source]

Get alternative URLs from the multiple sources of the file to resume downloading from.

Parameters:

path_name (str) – The full path name of the file to be downloaded.

Returns:

The alternative source URLs sorted by descending succeeded downloads, then by ascending interrupted and references.

Return type:

list
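The ordering described above can be illustrated with a small sketch. It is not the package's implementation: the helper name `rank_alt_urls` is hypothetical, and the sketch assumes (based only on the negative values in the sample context, e.g. `"succeeded": -5`) that the `succeeded` counter is stored negated, so a plain ascending sort puts the most successful source first.

```python
# Per-URL counters copied from the sample context of BDownloader.
url_stats = {
    "url1": {"refcnt": 1, "interrupted": 2, "succeeded": -5},
    "url2": {"refcnt": 0, "interrupted": 0, "succeeded": 0},
    "url3": {"refcnt": 1, "interrupted": 0, "succeeded": -2},
}


def rank_alt_urls(stats):
    """Hypothetical helper: order URLs by (succeeded, interrupted, refcnt).

    ASSUMPTION: `succeeded` is stored as a negated count, so ascending
    order on it corresponds to descending successful downloads.
    """
    return sorted(
        stats,
        key=lambda u: (stats[u]["succeeded"], stats[u]["interrupted"], stats[u]["refcnt"]),
    )


ranked = rank_alt_urls(url_stats)
```

If the counter were stored as a positive number instead, the first key component would need to be negated.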

static _get_fname_from_hdr(content_disposition)[source]

Get the file name from the HTTP response header.

Parameters:

content_disposition (str) – Content of the Content-Disposition field of the response header.

Returns:

The extracted file name.

Return type:

str
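One possible way to extract a file name from a Content-Disposition value is to lean on the standard library's header parser. This is only a sketch under that assumption; the function name `fname_from_content_disposition` is hypothetical and the package may parse the header differently.

```python
from email.message import Message


def fname_from_content_disposition(content_disposition):
    """Hypothetical helper: return the `filename` parameter, or None."""
    msg = Message()
    # Let the stdlib parse the header parameters for us.
    msg["Content-Disposition"] = content_disposition
    return msg.get_filename()


name = fname_from_content_disposition('attachment; filename="bar.tar.bz2"')
```

A header without a `filename` parameter, such as a bare `inline`, yields `None` in this sketch.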

static _get_fname_from_url(url)[source]

Generate a file name from the download URL.

Parameters:

url (str) – A URL referencing the intended file.

Returns:

The automatically generated file name.

Return type:

str
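A minimal sketch of deriving a file name from a URL follows. It assumes the name is simply the last path component, percent-decoded; the helper name and the `fallback` value are hypothetical, and the real method may apply different rules. The sketch targets Python 3 only.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def fname_from_url_sketch(url, fallback="download"):
    """Hypothetical helper: use the last path component of the URL.

    ASSUMPTION: `fallback` is an illustrative default for URLs whose
    path has no final component; it is not taken from the package.
    """
    path = urlsplit(url).path          # drops the query string and fragment
    name = posixpath.basename(unquote(path))
    return name or fallback


simple = fname_from_url_sketch("https://foo.cc/bar.tar.bz2")
```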

_get_remote_file_multipart(path_name, req_range)[source]

The worker thread body for downloading an assigned piece of a file.

Parameters:
  • path_name (str) – The full path name of the file to be downloaded.

  • req_range (str) – A chunk of the file path_name as a range request of the form 'bytes=start-end'.

Returns:

None.

Raises:
  • BDownloaderException – Raised when connect timeouts, read timeouts, failed connections or bad status codes occurred and the retries have been exhausted.

  • EnvironmentError – Raised when file operations failed.

_get_remote_file_singlewhole(path_name, req_range)[source]

The worker thread body for downloading the whole of a file, as opposed to _get_remote_file_multipart().

Parameters:
  • path_name (str) – The full path name of the file to be downloaded.

  • req_range (str) – The whole chunk of the file path_name as a mock range request of the form 'bytes=0-None'.

Returns:

None.

Raises:
  • BDownloaderException – Raised when connect timeouts, read timeouts, failed connections or bad status codes occurred and the retries have been exhausted.

  • EnvironmentError – Raised when file operations failed.

_is_all_done()[source]

Check if all the tasks have completed.

Returns:

True if all the Futures have been done, meaning that all the files have finished downloading, whether successfully or not; False otherwise.

Return type:

bool

_is_download_resumable(path_name)[source]

Check if the current download of the file can be resumed from the point of last interruption through retrying.

Parameters:

path_name (str) – The full path name of the file being downloaded.

Returns:

True if the server accepts range requests for the file, otherwise False.

Return type:

bool

_is_parallel_downloadable(path_name)[source]

Check if the file can be downloaded in parallel, i.e. using multi-threads to download the file pieces simultaneously.

Parameters:

path_name (str) – The full path name of the file to be downloaded.

Returns:

True if the file length is known and the server accepts its range requests, otherwise False.

Return type:

bool

_load_resumption_ctx(the_file, ctx_file)[source]

Load from the resumption parts file to restore the download context.

Parameters:
  • the_file (str) – The full path name of the file to download.

  • ctx_file (dict) – The download context of the file the_file.

Returns:

A 2-tuple (is_resuming, resumption_ctx), where is_resuming indicates whether the download is resuming from last interruption, and if this is the case (True), resumption_ctx holds the successfully loaded resumption context.

Return type:

(bool, dict)

_mgmnt_task()[source]

The management thread body.

This thread manages the downloading process of the whole job queue, currently including state management only. When all the tasks have been done, it signals the waiting thread and exits immediately.

Returns:

None.

_on_cancelled(the_file, ctx_file)[source]

When transitioning to the CANCELLED state, remove the empty, obsolete files.

_on_failed(the_file, ctx_file)[source]

When transitioning to the FAILED state, save the resumption ctx or remove the intermediate files.

_on_succeeded(the_file, ctx_file)[source]

When transitioning to the SUCCEEDED state, convert from in-process to finished file and do the cleanup.

_pick_file_url(path_name)[source]

Select one URL from the multiple sources of the file to download from.

Parameters:

path_name (str) – The full path name of the file to be downloaded.

Yields:

list – A list of URL(s) to download the file from using a strategy of Round Robin.
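The Round Robin strategy can be sketched with `itertools.cycle`. This is an illustration only: the generator name is hypothetical, and it takes the TAB-separated URL string directly rather than looking the sources up by `path_name` as the real method does.

```python
import itertools


def round_robin_urls(url_field):
    """Hypothetical generator: yield one-element URL lists in rotation."""
    urls = [u for u in url_field.split("\t") if u]
    for url in itertools.cycle(urls):
        yield [url]


picker = round_robin_urls("url1\turl2\turl3")
first_five = [next(picker) for _ in range(5)]
```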

_progress_task()[source]

The thread body for showing the progress of the downloading tasks.

Returns:

None.

_rename_existing_file(full_pathname)[source]

Rename the file or directory with the given pathname if present.

Parameters:

full_pathname (str) – The full path name of the file to check for duplicate.

_result()[source]

Return both the succeeded and failed downloads when all done or interrupted by user.

Returns:

Same as that returned by wait_for_all().

Return type:

tuple of list

_schedule_dl_tasks(path_name, num_tasks)[source]

Arrange the range downloading tasks of the file and assign them to the thread pool executor.

Parameters:
  • path_name (str) – The full path name of the file being scheduled for.

  • num_tasks (int) – The number of the range tasks requested to allocate.

Returns:

The (re-)scheduled range tasks and their corresponding download contexts.

Return type:

list of tuple

_schedule_file_download(the_file, ctx_file)[source]

Remove the succeeded range tasks, reassign the failed and arrange new for the file downloading.

_schedule_files_downloads()[source]

Remove the completed tasks from the files downloading queue and submit new file task assignments.

_state_mgmnt()[source]

Perform the state-related operations of file downloading.

This method updates the download status of the files and their related chunks when the associated worker threads have completed, whether they finished without error, raised an exception or were cancelled intentionally.

Returns:

None.

static _topmost_missing_dir(path)[source]

Find the topmost non-existent directory for a given path.

Parameters:

path (str) – A path to the directory to save the downloaded file in.

Returns:

The uppermost directory that is missing from the path.

Return type:

str
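The idea of walking up a path until an existing ancestor is found can be sketched as below. The function name is hypothetical, and returning `None` when nothing is missing is an assumption of this sketch, not documented behavior.

```python
import os
import tempfile


def topmost_missing_dir_sketch(path):
    """Hypothetical helper: return the uppermost non-existent directory.

    ASSUMPTION: returns None when `path` already exists.
    """
    path = os.path.abspath(path)
    missing = None
    while not os.path.exists(path):
        missing = path
        parent = os.path.dirname(path)
        if parent == path:  # reached the filesystem root
            break
        path = parent
    return missing


base = tempfile.mkdtemp()                      # an existing directory
target = os.path.join(base, "a", "b", "c")     # none of a/b/c exist yet
first_missing = topmost_missing_dir_sketch(target)
```

Knowing the topmost missing directory lets a caller remove exactly the directories it created if a download later fails.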

_wait_py2()[source]

Wait for all the jobs to be done on Python 2.x.

_wait_py3()[source]

Wait for all the jobs to be done on Python 3.x and newer.

static calc_req_ranges(req_len, split_size, req_start=0)[source]

Split the request req_len into chunks of the size split_size starting from the point req_start.

Parameters:
  • req_len (int) – The length of the request to split.

  • split_size (int) – The size of each split chunk.

  • req_start (int) – The start position to split from.

Returns:

The list of ranges in the form of 2-tuple '(start, end)'.

Return type:

list of tuple
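As a rough illustration of the splitting, the sketch below produces inclusive `(start, end)` pairs, matching the `'bytes=0-999'` style keys shown in the sample context. The function name is hypothetical, and how the real method treats a short final chunk is an assumption here (the sketch simply keeps it as its own range).

```python
def calc_req_ranges_sketch(req_len, split_size, req_start=0):
    """Hypothetical re-implementation: split into inclusive (start, end) ranges."""
    req_end = req_start + req_len  # exclusive upper bound
    ranges = []
    for start in range(req_start, req_end, split_size):
        end = min(start + split_size, req_end) - 1  # inclusive end position
        ranges.append((start, end))
    return ranges


ranges = calc_req_ranges_sketch(2000, 1000)
# Format each pair as an HTTP Range header value.
headers = ["bytes={}-{}".format(start, end) for start, end in ranges]
```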

cancel(keyboard_interrupt=True)[source]

Cancel all the download jobs.

Parameters:

keyboard_interrupt (bool) – Specifies whether or not the user hit the interrupt key (e.g. Ctrl-C).

Returns:

None.

close()[source]

Shut down and perform the cleanup.

Returns:

None.

download(path_name, url)[source]

Submit a single downloading job to the downloading queue.

This method is simply a wrapper of the method downloads().

Parameters:
  • path_name (str) – The full path name of the file to be downloaded.

  • url (str) – The URL referencing the target file.

Returns:

None.

Raises:

BDownloaderException – Same as in downloads().

Notes

The limitation on the method and the path_name parameter herein is the same as in downloads().

downloads(path_urls)[source]

Submit multiple downloading jobs at a time to the downloading queue.

Parameters:

path_urls (list of tuples) – path_urls accepts a list of tuples of the form (path, url), where path should be a pathname, optionally prefixed with absolute or relative paths, and url should be a URL string, which may consist of multiple TAB-separated URLs pointing to the same file. Note that a single dash '-' specifies the path reserved for the standard output. A valid path_urls, for example, could be [('/opt/files/bar.tar.bz2', 'https://foo.cc/bar.tar.bz2'), ('./sanguoshuowen.pdf', 'https://bar.cc/sanguoshuowen.pdf\thttps://foo.cc/sanguoshuowen.pdf'), ('/to/be/created/', 'https://flash.jiefang.rmy/lc-cl/gaozhuang/chelsia/rockspeaker.tar.gz'), ('/path/to/existing-dir', 'https://ghosthat.bar/foo/puretonecone81.xz\thttps://tpot.horn/foo/pure tonecone81.xz\thttps://hawkhill.bar/foo/puretonecone81.xz')].

Returns:

None.

Raises:

BDownloaderException – Raised when the downloads were interrupted, e.g. by calling cancel() in a SIGINT signal handler, in the process of submitting the download requests.

Notes

The method is not thread-safe, which means it should not be called at the same time in multiple threads with one instance.

When multi-instanced (e.g. one instance per thread), the file paths specified in one instance should not overlap those in another to avoid potential race conditions. File loss may occur, for example, if a failed download task in one instance tries to delete a directory that is being accessed by some download tasks in other instances. However, this limitation doesn’t apply to the file paths specified within the same instance.
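A typical call sequence might look like the sketch below. The helpers `join_mirrors` and `run_downloads` are hypothetical names introduced for illustration; only `BDownloader`, `downloads()`, `wait_for_all()` and `close()` come from this reference. Calling `run_downloads` requires the bdownload package and network access, so the import is kept inside the function.

```python
def join_mirrors(path, urls):
    """Hypothetical helper: build one (path, url) tuple from several mirrors.

    Multiple URLs for the same file are joined with TAB characters.
    """
    return (path, "\t".join(urls))


def run_downloads(path_urls):
    """Hypothetical wrapper: submit the jobs and wait for them to finish."""
    from bdownload.download import BDownloader  # needs bdownload installed

    downloader = BDownloader(progress="none")  # no progress bar when scripting
    try:
        downloader.downloads(path_urls)
        succeeded, failed = downloader.wait_for_all()
    finally:
        downloader.close()
    return succeeded, failed


job = join_mirrors(
    "./sanguoshuowen.pdf",
    ["https://bar.cc/sanguoshuowen.pdf", "https://foo.cc/sanguoshuowen.pdf"],
)
```

Because `downloads()` is not thread-safe, a wrapper like this should be called from one thread per instance.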

static list_split(li, chunk_size=5)[source]

Break a list into chunks.

Parameters:
  • li (list) – The list to split.

  • chunk_size (int) – The size of the resultant chunk list.

Yields:

list – The next chunk of the split list li.
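The chunking behavior can be approximated with a short generator. This is a sketch with a hypothetical name, assuming the last chunk may be shorter than `chunk_size`.

```python
def list_split_sketch(li, chunk_size=5):
    """Hypothetical re-implementation: yield consecutive slices of `li`."""
    for i in range(0, len(li), chunk_size):
        yield li[i:i + chunk_size]


chunks = list(list_split_sketch(list(range(12))))
```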

progress_all()[source]

Get the coarse-grained, overall progress of the downloads.

Returns:

The 3-tuple of the form (completed_bytes, total_bytes, is_accurate). completed_bytes is updated on a chunk basis from the worker threads by the management task. If is_accurate is False then total_bytes is inaccurate, i.e. some downloads have undetermined sizes, which also means completed_bytes may be greater than the total_bytes; otherwise, total_bytes is the exact sum of sizes of all the downloads. Note that total_bytes (and is_accurate) may vary during the phase of submitting the downloads.

Return type:

tuple
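A caller might interpret the returned 3-tuple along these lines. The function `describe_progress` is a hypothetical helper, and showing `--` for an unknown total merely borrows the placeholder used by `MillProgress.NULL_EXPECTED_DISP`.

```python
def describe_progress(completed_bytes, total_bytes, is_accurate):
    """Hypothetical helper: render the tuple returned by progress_all()."""
    if is_accurate and total_bytes > 0:
        percent = 100.0 * completed_bytes / total_bytes
        return "{:,d}/{:,d} bytes ({:.1f}%)".format(completed_bytes, total_bytes, percent)
    # The total is unreliable, so a percentage would be misleading.
    return "{:,d}/-- bytes".format(completed_bytes)


known = describe_progress(500, 2000, True)
unknown = describe_progress(2500, 2000, False)
```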

raise_on_interrupted()[source]

Raise a customized exception signaling that the downloads have been terminated by the user.

Raises:

BDownloaderException – Raised when the termination or cancellation flag has been set.

result()[source]

Return the final download status.

Returns:

0 for success, and -1 for failure.

Return type:

int

results()[source]

Get both the succeeded and failed downloads when all done or interrupted by user.

Returns:

Same as that returned by wait_for_all().

Return type:

tuple of list

wait_for_all()[source]

Wait for all the downloading jobs to complete.

Returns:

A 2-tuple of lists '(succeeded, failed)'. The first list succeeded contains the originally passed (path, url)s that finished successfully, while the second list failed contains the raised and cancelled ones.

Return type:

tuple of list

exception bdownload.download.BDownloaderException[source]

Bases: Exception

The exception indicating that an error occurred while executing the download tasks.

bdownload.download.COOKIE_STR_REGEX = re.compile('^\\s*(?:[^,; =]+=[^,; ]+\\s*(?:$|\\s+|;\\s*))+\\s*$')

A compiled regular expression object used to match the cookie string in the form of key/value pairs.

See also BDownloader.__init__() for more details about cookies.

Type:

regex
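To see what the pattern accepts, the sketch below recompiles it from the value shown above and uses it to validate a cookie string before splitting it into a dict. The helper `parse_cookie_string` is hypothetical and is not how the package builds its CookieJar.

```python
import re

# Pattern copied from the documented value of COOKIE_STR_REGEX.
COOKIE_STR_REGEX = re.compile(r'^\s*(?:[^,; =]+=[^,; ]+\s*(?:$|\s+|;\s*))+\s*$')


def parse_cookie_string(cookie_str):
    """Hypothetical helper: validate the string, then split it into a dict."""
    if not COOKIE_STR_REGEX.match(cookie_str):
        raise ValueError("invalid cookie string: {!r}".format(cookie_str))
    # Pairs are separated by whitespace and/or semicolons.
    pairs = [p for p in re.split(r'[;\s]+', cookie_str.strip()) if p]
    return dict(p.split('=', 1) for p in pairs)


cookies = parse_cookie_string('key1=val1 key2=val2;key3=val3')
```

Strings without an `=` sign, or with comma-separated pairs, do not match the pattern and raise `ValueError` in this sketch.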

class bdownload.download.HTTPBasicAuthEx(username, password)[source]

Bases: HTTPBasicAuth

Attaches HTTP Basic Authentication to the given Request object.

This class is adapted from requests.auth.HTTPBasicAuth and requests.auth.HTTPDigestAuth, with added support for handling Unauthorized request on the response.

handle_401(r, **kwargs)[source]

Takes the given response and tries basic-auth, if needed.

bdownload.download.HTTP_HEADER_REGEX = re.compile('^\\s*[a-zA-Z0-9_-]+:\\s*[a-zA-Z0-9_ :;.,\\\\/"\\\'?!(){}[\\]@<>=\\-+*#$&`|~^%]*$')

A compiled regular expression object used to validate the HTTP request header in the 'name: value' format.

Refer to https://developers.cloudflare.com/rules/transform/request-header-modification/reference/header-format.

Type:

regex

class bdownload.download.MillProgress(label='', hide=None, expected_size=None, every=1, eta_tag='eta:', elapsed_tag='elapsed:')[source]

Bases: object

Print a mill while progressing.

This class is adapted from clint.textui.progress, with added support for unknown expected_size.

ETA_INTERVAL = 1
ETA_SMA_WINDOW = 9
MILL_CHARS = ['|', '/', '-', '\\']
MILL_TEMPLATE = '{}  {}  {:,d}/{:<}  {}  {} {}\r'
NULL_EXPECTED_DISP = '--'
NULL_EXPECTED_WIDTH = 2
STREAM = <_io.TextIOWrapper name='<stderr>' mode='w' encoding='UTF-8'>
done()[source]
static format_time(seconds)[source]
mill_char(progress)[source]
show(progress, count=None)[source]
bdownload.download.PICKLE_PROTOCOL_NUMBER = 2

The highest pickle protocol number valid for both Python 2.x and Python 3.x.

Type:

int
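Protocol 2 is the newest pickle format that Python 2 can still read, which is why it suits data shared between both interpreter lines. The round trip below is a generic sketch; the dictionary is made-up sample data and the sketch makes no claim about what the package actually pickles.

```python
import pickle

PICKLE_PROTOCOL_NUMBER = 2  # value documented above

# Made-up sample data, loosely shaped like a per-file context.
sample_ctx = {"length": 2000, "progress": 1000, "resumable": True}

blob = pickle.dumps(sample_ctx, protocol=PICKLE_PROTOCOL_NUMBER)
restored = pickle.loads(blob)
```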

bdownload.download.REQUESTS_EXTENDED_RETRIES_FACTOR = 1

Default value of the retries factor _requests_extended_retries_factor.

Type:

int

bdownload.download.REQUESTS_RETRIES_ON_STREAM_EXCEPTION = 10

Default number of retries on exceptions raised while streaming the request content.

Type:

int

bdownload.download.RETRY_BACKOFF_FACTOR = 0.1

Default retry backoff factor.

Type:

float

bdownload.download.RETRY_EXEMPT_STATUS_CODES = frozenset({401, 407, 511})

Default status codes that should not be retried on before they have been handled.

Type:

set

class bdownload.download.RequestsSessionWrapper(timeout=None, proxy=None, cookies=None, user_agent=None, referrer=None, verify=True, cert=None, headers=None, auth=None, requester_cb=None)[source]

Bases: Session

Subclass of the requests.Session class with extended retry-on-exception behavior for the get method.

Note

The retry mechanism here is independent from that built into urllib3 (see _requests_extended_retries_factor and retry_requests()). That is, the decorated retry attempts will be triggered whenever the get method raises a requests.RequestException or receives a bad status code, regardless of whether or not the builtin Retry of urllib3 is enabled. Nevertheless, they together determine the number of the total retries. See requests_retry_session() for more information about their cooperation.

TIMEOUT = (3.05, 6)

Default timeouts: the connect timeout defaults to 3.05 seconds, and the read timeout to 6 seconds.

Type:

tuple

__init__(timeout=None, proxy=None, cookies=None, user_agent=None, referrer=None, verify=True, cert=None, headers=None, auth=None, requester_cb=None)[source]

Initialize the Session instance.

The HTTP header User-Agent of the session is set to a default value of bdownload/VERSION, if not provided, with VERSION being replaced by the package’s version number.

Parameters:
  • timeout (float or 2-tuple of float) – Timeout value(s) as a float or (connect, read) tuple for both the connect and the read timeouts, respectively. If set to None, 0 or (), whether the whole or any item thereof, it will take a default value from TIMEOUT, accordingly.

  • proxy (str) – Same as for BDownloader.__init__().

  • cookies (str, dict or CookieJar) – Same as for BDownloader.__init__().

  • user_agent (str) – Same as for BDownloader.__init__().

  • referrer (str) – Same as for BDownloader.__init__().

  • verify (bool or str) – Same as for requests.request().

  • cert (str or tuple) – Same as for requests.request().

  • headers (dict) – Same meaning as in BDownloader.__init__().

  • auth (tuple or requests.auth.AuthBase) – Same meaning as in BDownloader.__init__().

  • requester_cb (func) – The callback function provided by the downloader that uses the instantiated session object as the HTTP(S) requester. It will get called when making an HTTP GET request.
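The fallback rule described for `timeout` can be sketched as follows. The function `normalize_timeout` is a hypothetical helper written from the parameter description alone, not the class's actual code.

```python
TIMEOUT = (3.05, 6)  # documented default (connect, read) timeouts


def normalize_timeout(timeout):
    """Hypothetical helper: fill in defaults for missing timeout values."""
    if not timeout:  # None, 0 or ()
        return TIMEOUT
    if isinstance(timeout, (int, float)):
        return (timeout, timeout)  # one value covers connect and read
    connect, read = timeout
    # Fall back item by item when either entry is None or 0.
    return (connect or TIMEOUT[0], read or TIMEOUT[1])


defaults = normalize_timeout(None)
```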

static _build_cookiejar_from_kvp(key_values)[source]

Build a CookieJar from cookies in the form of key/value pairs.

Parameters:

key_values (str) – The cookies must take the form of 'cookie_key=cookie_value', with multiple pairs separated by whitespace and/or semicolon if applicable, e.g. 'key1=val1 key2=val2; key3=val3'.

Returns:

The built CookieJar for requests sessions.

Return type:

requests.cookies.RequestsCookieJar

Raises:

ValueError – Raised when the cookies string key_values is not in valid format.

get(url, **kwargs)[source]

Wrapper around requests.Session’s get method decorated with the retry_requests() decorator.

Parameters:
  • url – URL for the file to download from.

  • **kwargs – Same arguments as that requests.Session.get takes.

Returns:

The response to the HTTP GET request.

Return type:

requests.Response

Raises:
  • BDownloaderException – Raised when the termination or cancellation flag has been set, for example, if RequestsSessionWrapper.requester_cb is initialized to BDownloader.raise_on_interrupted().

  • requests.RequestException – Raised when any of requests’s exceptions occurred or bad status codes were received and retries have been exhausted.

  • ExceptionByRequesterCB – Same exception(s) as that raised by RequestsSessionWrapper.requester_cb, if any.

bdownload.download.URLLIB3_BUILTIN_RETRIES_ON_EXCEPTION = 1

Default number of retries on exception set through urllib3’s Retry mechanism.

Type:

int

bdownload.download.URLLIB3_RETRY_STATUS_CODES = frozenset({413, 429, 500, 502, 503, 504})

Default status codes to retry on intended for the underlying urllib3.

Type:

frozenset

bdownload.download._cpu_count()[source]

A simple wrapper around cpu_count() that suppresses the NotImplementedError.

Returns:

The number of CPUs in the system, or 0 if it cannot be determined.

Return type:

int
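A minimal sketch of such a wrapper, assuming the wrapped function is `multiprocessing.cpu_count()` (the documentation does not say which `cpu_count()` is used) and using the hypothetical name `safe_cpu_count`:

```python
import multiprocessing


def safe_cpu_count():
    """Return the CPU count, or 0 when the platform cannot report it."""
    try:
        return multiprocessing.cpu_count()
    except NotImplementedError:
        # Some platforms cannot determine the count; fall back to 0.
        return 0


cpus = safe_cpu_count()
```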

bdownload.download._requests_extended_retries_factor = 1

Number of retries that complements and extends the builtin Retry mechanism of urllib3.

This global variable is meant for the decorator retry_requests(), and its value can be modified through the module level function set_requests_retries_factor(). It is initialized to REQUESTS_EXTENDED_RETRIES_FACTOR by default, and usually you don’t want to change it.

Together with urllib3’s builtin retry logic, they determine the total number of the retries on exceptions and bad status codes at requests for downloading. For more details on the retry mechanisms, see requests_retry_session().

Notes

Don’t confuse these two retry mechanisms with the retries on failed connections while streaming the request content.

Type:

int

bdownload.download.requests_retry_session(builtin_retries=None, backoff_factor=0.1, status_forcelist=None, session=None, num_pools=20, pool_maxsize=20, pool_block=True, **kwargs)[source]

Create a session object of the class RequestsSessionWrapper by default.

Aside from the retry mechanism implemented by the wrapper decorator, the created session also leverages the built-in retries bound to urllib3. When both of them are enabled, they cooperate to determine the total number of retry attempts. The worst-case number of retries is given by the following formula:

builtin_retries * (_requests_extended_retries_factor + 1)

which applies to all the exceptions and those status codes that fall into the status_forcelist. For other status codes, the maximum retries shall be _requests_extended_retries_factor.
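As a worked example of the formula above, the following sketch evaluates it for the documented defaults (URLLIB3_BUILTIN_RETRIES_ON_EXCEPTION = 1 and _requests_extended_retries_factor = 1) and for one other combination. The helper name `worst_case_retries` is hypothetical and exists only for this illustration.

```python
def worst_case_retries(builtin_retries, extended_retries_factor):
    """Evaluate builtin_retries * (_requests_extended_retries_factor + 1)."""
    return builtin_retries * (extended_retries_factor + 1)


default_case = worst_case_retries(1, 1)  # documented defaults -> 2
custom_case = worst_case_retries(3, 2)   # 3 * (2 + 1) -> 9
```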

Parameters:
  • builtin_retries (int) – Maximum number of retry attempts allowed on errors and interested status codes, which will apply to the retry logic of the underlying urllib3. If set to None or 0, it will default to URLLIB3_BUILTIN_RETRIES_ON_EXCEPTION.

  • backoff_factor (float) – The backoff factor to apply between retries.

  • status_forcelist (set of int) – A set of HTTP status codes that a retry should be enforced on. The default status forcelist shall be URLLIB3_RETRY_STATUS_CODES if not given.

  • session (requests.Session) – An instance of the class requests.Session or its customized subclass. When not provided, a RequestsSessionWrapper instance is created by default.

  • num_pools (int) – The number of connection pools to cache, which has the same meaning as num_pools in urllib3.PoolManager and will eventually be passed to it.

  • pool_maxsize (int) – The maximum number of connections to save that can be reused in the urllib3 connection pool, which will be passed to the underlying requests.adapters.HTTPAdapter.

  • pool_block (bool) – Whether the connection pool should block or create more connections when there are no free connections available.

  • **kwargs – Same arguments as those that RequestsSessionWrapper.__init__() takes.

Returns:

The session instance with retry capability.

Return type:

requests.Session

bdownload.download.retry_requests(exceptions, status_exemptlist=frozenset({401, 407, 511}), backoff_factor=0.1, logger=None)[source]

A decorator that retries calling the wrapped requests’ function using an exponential backoff on exception.

The retry attempt will be activated in the event of exceptions being caught and for all the bad status codes (i.e. codes ranging from 400 to 600) except the ones in status_exemptlist.

Parameters:
  • exceptions (Exception or tuple of Exceptions) – The exceptions to check against.

  • status_exemptlist (set of int) – A set of HTTP status codes for which the retry should be avoided.

  • backoff_factor (float) – The backoff factor to apply between retries.

  • logger (logging.Logger) – An event logger.

Returns:

The wrapper function.

Raises:

exceptions – Re-raises the last caught exception when retries are exhausted.

Notes

This function has an external dependency on the global variable _requests_extended_retries_factor, whose value can be changed through the function set_requests_retries_factor(). Also, it should be greater than 0, thus allowing the decorated method to retry at least once to cover the edge cases of exceptions and bad status codes.
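The general shape of such a decorator can be sketched as follows. This is not the library's implementation. It handles only exceptions (not status codes), assumes a delay of `backoff_factor * 2 ** attempt` between retries, and uses a hypothetical module global `_retries_factor` in place of `_requests_extended_retries_factor`.

```python
import functools
import time

# Hypothetical stand-in for the module-level `_requests_extended_retries_factor`.
_retries_factor = 2


def retry_sketch(exceptions, backoff_factor=0.1, sleep=time.sleep):
    """Retry the wrapped callable on `exceptions` with exponential backoff."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            attempt = 0
            while True:
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt >= _retries_factor:
                        raise  # retries exhausted: re-raise the last exception
                    sleep(backoff_factor * (2 ** attempt))
                    attempt += 1
        return wrapper
    return decorator


# Usage: fail twice, then succeed on the third call.
calls = []


@retry_sketch(ValueError, backoff_factor=0, sleep=lambda seconds: None)
def flaky():
    calls.append(1)
    if len(calls) < 3:
        raise ValueError("transient failure")
    return "ok"


result = flaky()
```

Reading the retry count from a module global at call time means a later change to the factor affects functions that are already decorated. This matches the dependency described in the note above.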

bdownload.download.set_requests_retries_factor(retries)[source]

Set the retries factor for the decorator retry_requests().

Parameters:

retries (int) – Number of retries when a decorated method of requests raised an exception or returned any bad status code. It should take a value of at least 1, or else nothing changes.

Returns:

None.

bdownload.download.unquote_unicode(string)[source]

Unquote a percent-encoded string.

Parameters:

string (str) – A %xx- and %uxxxx- encoded string.

Returns:

The unquoted unicode string.

Return type:

str
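One way to decode both escape styles in a single pass is sketched below for Python 3. The name `unquote_unicode_sketch` is hypothetical. It assumes that `%xx` sequences are UTF-8 bytes and that each `%uxxxx` names one code point (surrogate pairs are not combined), and the library's own handling may differ.

```python
import re
from urllib.parse import unquote

# Match either one %uXXXX escape or a run of standard %XX escapes.
_ESCAPE_RE = re.compile(r"%u([0-9a-fA-F]{4})|((?:%[0-9a-fA-F]{2})+)")


def _decode(match):
    if match.group(1) is not None:
        # Non-standard %uXXXX: the four hex digits are a code point.
        return chr(int(match.group(1), 16))
    # Standard percent-encoding: decode the byte run as UTF-8.
    return unquote(match.group(2))


def unquote_unicode_sketch(string):
    """Decode %xx and %uxxxx escapes in `string`."""
    return _ESCAPE_RE.sub(_decode, string)


decoded = unquote_unicode_sketch("caf%C3%A9%20%u20AC")
```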

bdownload.cli module

This module provides the entry point main for the command line utility bdownload.

bdownload.cli._cmd_quit_handler(bdownloader, signum, frame)[source]

The handler for the signals SIGTERM, SIGABRT, SIGHUP and SIGBREAK.

Parameters:
  • bdownloader (BDownloader) – The BDownloader instance acting as the file downloader.

  • signum – The signal number being one of the possible values as signal.SIGTERM, signal.SIGABRT, signal.SIGHUP, or signal.SIGBREAK.

  • frame – The current stack frame when the signal is received.

bdownload.cli._dec_raw_tab_separated_urls(url)[source]

Decode a raw URL string that may consist of multiple escaped TAB-separated URLs.

Parameters:

url (str) – URL for the file to be downloaded, which might be a TAB-separated composite of URLs pointing to the same file.

Returns:

Decoded URL.

Return type:

str

Raises:

ArgumentTypeError – Raised when url contains URL(s) that don’t conform to the format “http[s]://[user:pass@]foo.bar[*]”.

Examples

Examples of the parameter url include:
  • 'https://fakewebsite-01.com/downloads/soulbody4ct.pdf\thttps://fakesite.com/archives/soulbody4ct.pdf'

  • 'https://fakewebsite-01.com/downloads/ipcress.docx\thttps://fakewebsite-02.com/archives/ipcress.docx'

  • 'https://tianchengren:öp€nsasimi@i.louder.ss\thttps://fangxun.xiaoqing.sunmoon.xue'
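A rough sketch of the decoding step, under stated assumptions: the escaped separator arrives from the command line as the two characters `\` and `t`, and a loose `http[s]://` check stands in for the stricter format quoted above. The function name and the regular expression are hypothetical, not the library's own.

```python
import argparse
import re

# Loose stand-in for the documented URL format; the real check is stricter.
_URL_RE = re.compile(r"^https?://\S+$", re.IGNORECASE)


def decode_tab_separated_urls(url):
    """Turn escaped '\\t' separators into real TABs and validate each URL."""
    decoded = url.replace("\\t", "\t")
    for part in decoded.split("\t"):
        if not _URL_RE.match(part.strip()):
            raise argparse.ArgumentTypeError("invalid URL: %r" % part)
    return decoded


raw = r"https://a.example/f.pdf\thttps://b.example/f.pdf"
decoded_urls = decode_tab_separated_urls(raw)
```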

bdownload.cli._interrupt_handler(bdownloader, signum, frame)[source]

The handler for the signals SIGINT and SIGQUIT.

Parameters:
  • bdownloader (BDownloader) – The BDownloader instance acting as the file downloader.

  • signum – The signal number being either signal.SIGINT or signal.SIGQUIT.

  • frame – The current stack frame when the signal is received.

bdownload.cli._load_cookies(cookies)[source]

Load cookie(s) either from a Netscape cookie file or a string.

Parameters:

cookies (str) –

Cookies either in the form of a string (maybe whitespace- and/or semicolon- separated) like “cookie_key=cookie_value cookie_key2=cookie_value2; cookie_key3=cookie_value3”, or a file, e.g. named “cookies.txt”, in the Netscape cookie file format.

Note

The option -D DIR does not apply to the cookie file.

Returns:

A CookieJar or a validated cookies string.

Return type:

cookielib.MozillaCookieJar or str

Raises:

ArgumentTypeError – Raised when an exception occurs while loading the cookie file, or when the cookies string is not in a valid format.

bdownload.cli._normalize_bytes_num(bytes_num)[source]

Normalize and convert the integer number string expressed in the unit Byte.

Parameters:

bytes_num (str) – The integer number string that may be suffixed with a quantity of ‘K’ or ‘M’, where ‘K’ indicates multiples of 1024 and ‘M’ means multiples of 1024*1024.

Returns:

Normalized integer number.

Return type:

int

Raises:

ArgumentTypeError – Raised when the passed bytes_num is neither a plain decimal integer string nor a suffixed one.
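The conversion described above can be sketched as an argparse-style type function. The name `normalize_bytes_num` and the choice to accept only uppercase suffixes are assumptions made for this illustration.

```python
import argparse
import re

# Multipliers for the supported suffixes; no suffix means plain bytes.
_UNITS = {"": 1, "K": 1024, "M": 1024 * 1024}


def normalize_bytes_num(bytes_num):
    """Convert '100', '10K' or '2M' to an integer number of bytes."""
    match = re.match(r"^(\d+)([KM]?)$", bytes_num.strip())
    if not match:
        raise argparse.ArgumentTypeError("invalid byte count: %r" % bytes_num)
    number, unit = match.groups()
    return int(number) * _UNITS[unit]


chunk = normalize_bytes_num("100K")
```

Raising `argparse.ArgumentTypeError` lets such a function be passed as the `type=` argument of `ArgumentParser.add_argument()`, so invalid values are reported as normal usage errors.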

bdownload.cli._validate_http_header(header)[source]

Validate and normalize the HTTP request header.

bdownload.cli._win32_utf8_argv()[source]

Use kernel32.GetCommandLineW and shell32.CommandLineToArgvW to get sys.argv as a list of UTF-8 strings.

Versions 2.5 and older of Python don’t support Unicode (“mon€y röcks” for example) in sys.argv on Windows, with the underlying Windows API instead replacing multi-byte characters with ‘?’.

Returns:

Command-line arguments. A list of utf-8 strings for success, None on failure.

Return type:

list of str

bdownload.cli.ignore_termination_signals()[source]

Cause the process not to respond to termination signals.

bdownload.cli.install_signal_handlers(bdownloader)[source]

Install handlers for termination signals.

Parameters:

bdownloader (BDownloader) – The BDownloader instance acting as the file downloader.

bdownload.cli.main()[source]

Collect the command-line arguments from sys.argv, parse and do the downloading as specified.

bdownload.utils module

bdownload.utils.get_latest_tag_github(owner, repo, key, **kwargs)[source]

Get the latest tag/version of a GitHub repository.

Parameters:
Returns:

The name of the latest tag.

Return type:

str

Raises:

exception – Same exception as that raised by bdownload.download.RequestsSessionWrapper.get().

bdownload.utils.update_cacert()[source]

Update certifi to the latest version of the certificate authority (CA) bundle on Python 2.7.