bdownload package
bdownload.download module
- class bdownload.download.BDownloader(max_workers=None, max_parallel_downloads=5, workers_per_download=4, min_split_size=1048576, chunk_size=102400, proxy=None, cookies=None, user_agent=None, logger=None, progress='mill', num_pools=20, pool_maxsize=20, pool_block=True, request_timeout=None, request_retries=None, status_forcelist=None, resumption_retries=None, continuation=True, referrer=None, check_certificate=True, ca_certificate=None, certificate=None, auth=None, netrc=None, headers=None)
Bases: object
The class for executing and managing download jobs.
The context of the current downloading job is structured as:
    ctx = {
        "total_size": 2000,  # total size of all the to-be-downloaded files, maybe inaccurate due to chunked transfer encoding
        "accurate": True,  # Is `total_size` accurate?
        "last_progress": 0,  # the overall progress, in bytes, from last run loaded when resuming from interruption
        "downloaded": 0,  # newly accumulated bytes from this run of downloads, which are updated on completion of every worker thread
        # originally added downloads, which don't necessarily correspond to `files` e.g. due to duplicate or interruption
        "orig_path_urls": [('file1', 'url1 url2 url3'), ('file2', 'url4 url5 url6')],
        "file_cnt": 2,  # number of current downloading files, i.e. `alt_files`
        # flattened `files` with the exception of the succeeded on addition
        "alt_files": [("full_path_to_file1", `ctx_file1_obj`), ("full_path_to_file2", `ctx_file2_obj`)],
        "active_files": [("full_path_to_file1", `ctx_file1_obj`)],  # scheduled, in-processing file downloads
        "active_downloads": 1,  # number of in-processing file downloads
        "next_download": 1,  # index to the next to schedule to download file
        "poll_changed": False,  # Have the polled files' states changed?
        "files": {
            "full_path_to_file1": {
                "length": 2000,  # 0 means 'unknown', i.e. file size can't be pre-determined through any one of provided URLs
                # `SUCCEEDED` downloaded bytes: initialized to 0, set to the last progress when
                # resuming and updated on completion (SUCCEEDED only!) of every task (`Future`)
                "progress": 0,
                "last_progress": 0,  # CONSTANT: the loaded progress of last run upon resuming from interruption
                # downloaded bytes: initialized to 0, and updated on completion (SUCCEEDED, FAILED)
                # of every task (`Future`)
                "downloaded": 0,
                "stdout": False,  # standard output
                "resumable": True,
                "resuming_from_intr": False,  # Are we resuming from keyboard interruption?
                "download_state": "inprocess",
                "cancelled_on_exception": False,
                "orig_path_url": ('file1', 'url1 url2 url3'),  # (path, url) as a subparameter passed to :meth:`downloads`
                "path_url": ('full_path_to_file1', 'url1 url2 url3'),  # (full_pathname, active_URLs)
                "urls": {
                    "url1": {"auth": None, "auth_header": {"Authorization": "Basic dXNlcjpwYXNz"}, "accept_ranges": "bytes", "refcnt": 1, "interrupted": 2, "succeeded": -5},
                    "url2": {"auth": None, "auth_header": {"Authorization": "Digest username='user',realm=..."}, "accept_ranges": "none", "refcnt": 0, "interrupted": 0, "succeeded": 0},
                    "url3": {"auth": None, "auth_header": {}, "accept_ranges": "bytes", "refcnt": 1, "interrupted": 0, "succeeded": -2}
                },
                "alt_ranges": [("bytes=1000-1999", `ctx_range2_obj`)],  # task ranges stack
                "worker_ranges": [("bytes=0-999", `ctx_range1_obj`)],  # active range downloading tasks
                "active_workers": 1,  # number of active worker threads on ranges downloading of the file
                "ranges_succeeded": 0,  # number of ranges successfully downloaded
                "ranges": {
                    "bytes=0-999": {
                        "start": 0,  # start byte position
                        "end": 999,  # end byte position, None for 'unknown', see above
                        "offset": 0,  # current pointer position relative to 'start' (i.e. 0)
                        "last_offset": 0,  # the last pointer position where the range task failed and was rescheduled in this run
                        "start_time": 0,
                        "rt_dl_speed": 0,  # x seconds interval
                        "download_state": "inprocess",
                        "future": future1,
                        "url": [url1],
                        "alt_urls": {}
                    },
                    "bytes=1000-1999": {
                        "start": 1000,
                        "end": 1999,
                        "offset": 0,  # current pointer position relative to 'start' (i.e. 1000)
                        "last_offset": 0,  # the last pointer position where the range task failed and was rescheduled in this run
                        "start_time": 0,
                        "rt_dl_speed": 0,  # x seconds interval
                        "download_state": "inprocess",
                        "future": future2,
                        "url": [url3],
                        "alt_urls": {}
                    }
                }
            },
            "full_path_to_file2": {
            }
        }
    }
- CANCELLED = 'cancelled'
- FAILED = 'failed'
- INPROCESS = 'inprocess'
- INPROCESS_EXT = '.bdl'
- PENDING = 'pending'
- PROGRESS_BS_BAR = 'bar'
- PROGRESS_BS_MILL = 'mill'
- PROGRESS_BS_NONE = 'none'
- RESUM_PARTS_EXT = '.bdl.par'
- STD_OUT = '-'
- SUCCEEDED = 'succeeded'
- __init__(max_workers=None, max_parallel_downloads=5, workers_per_download=4, min_split_size=1048576, chunk_size=102400, proxy=None, cookies=None, user_agent=None, logger=None, progress='mill', num_pools=20, pool_maxsize=20, pool_block=True, request_timeout=None, request_retries=None, status_forcelist=None, resumption_retries=None, continuation=True, referrer=None, check_certificate=True, ca_certificate=None, certificate=None, auth=None, netrc=None, headers=None)
Create and initialize a BDownloader object.
- Parameters:
max_workers (int) – The max_workers parameter specifies the number of the parallel downloading threads, whose default value is determined by #num_of_processor * 5 if set to None.
max_parallel_downloads (int) – max_parallel_downloads limits the number of files downloading concurrently. It has a default value of 5.
workers_per_download (int) – workers_per_download sets the maximum number of worker threads for every file downloading job, which defaults to 4.
min_split_size (int) – min_split_size denotes the size in bytes of file pieces split to be downloaded in parallel, which defaults to 1024*1024 bytes (i.e. 1MB).
chunk_size (int) – The chunk_size parameter specifies the chunk size in bytes of every http range request, which will take a default value of 1024*100 (i.e. 100KB) if not provided.
proxy (str) – The proxy supports both HTTP and SOCKS proxies in the form of 'http://[user:pass@]host:port' and 'socks5://[user:pass@]host:port', respectively.
cookies (str, dict or CookieJar) – If cookies needs to be set, it must either take the form of 'cookie_key=cookie_value', with multiple pairs separated by whitespace and/or semicolon if applicable, e.g. 'key1=val1 key2=val2;key3=val3', be packed into a dict, or be an instance of CookieJar, i.e. cookielib.CookieJar for Python 2.7, http.cookiejar.CookieJar for Python 3.x or RequestsCookieJar from requests.
user_agent (str) – When user_agent is not given, it will default to 'bdownload/VERSION', with VERSION being replaced by the package’s version number.
logger (logging.Logger) – The logger parameter specifies an event logger. If logger is not None, it must be an object of class logging.Logger or of its customized subclass. Otherwise, it will use a default module-level logger returned by logging.getLogger(__name__).
progress (str) – progress determines the style of the progress bar displayed while downloading files. Possible values are 'mill', 'bar' and 'none'. 'mill' is the default. To disable this feature, e.g. while scripting or multi-instanced, set it to 'none'.
num_pools (int) – The num_pools parameter has the same meaning as num_pools in urllib3.PoolManager and will eventually be passed to it. Specifically, num_pools specifies the number of connection pools to cache.
pool_maxsize (int) – pool_maxsize will be passed to the underlying requests.adapters.HTTPAdapter. It specifies the maximum number of connections to save that can be reused in the urllib3 connection pool.
pool_block (bool) – pool_block specifies whether the urllib3 connection pool should block the call or create new connections when there are no free connections available.
request_timeout (float or 2-tuple of float) – The request_timeout parameter specifies the timeouts for the internal requests session. The timeout value(s) as a float or (connect, read) tuple is intended for both the connect and the read timeouts, respectively. If set to None, it will take a default value of RequestsSessionWrapper.TIMEOUT.
request_retries (int) – request_retries specifies the maximum number of retry attempts allowed on exceptions and on the status codes of interest (i.e. status_forcelist) for the builtin Retry logic of urllib3. It will default to URLLIB3_BUILTIN_RETRIES_ON_EXCEPTION if not given.
Notes
There are two retry mechanisms that jointly determine the total retries of a request. One is the above-mentioned Retry logic that is built into urllib3, and the other is the extended high-level retry factor that is meant to complement the builtin retry mechanism. The total retries is bounded by the following formula:
request_retries * (_requests_extended_retries_factor + 1)
See retry_requests(), RequestsSessionWrapper and requests_retry_session() for more details on the retry mechanisms.
status_forcelist (set of int) – status_forcelist specifies a set of HTTP status codes that a retry should be enforced on. The default set of status codes shall be URLLIB3_RETRY_STATUS_CODES if not given.
resumption_retries (int) – The resumption_retries parameter specifies the maximum allowable number of retries on error at resuming the interrupted download while streaming the request content. Its default value is REQUESTS_RETRIES_ON_STREAM_EXCEPTION when not provided.
continuation (bool) – The continuation parameter specifies whether to resume, if possible, the files partially downloaded in an earlier run, e.g. when the downloads had been terminated by the user by pressing Ctrl-C. When not present, it will default to True.
referrer (str) – referrer specifies an HTTP request header Referer that applies to all downloads. If set to '*', the request URL shall be used as the referrer per download.
check_certificate (bool) – The check_certificate parameter specifies whether to verify the server’s TLS certificate or not. It defaults to True.
ca_certificate (str) – The ca_certificate parameter specifies a path to the preferred CA bundle file (.pem) or directory with certificates in PEM format of trusted CAs. If set to a path to a directory, the directory must have been processed using the c_rehash utility supplied with OpenSSL, according to requests. NB the cert files in the directory each only contain one CA certificate.
certificate (str or tuple) – certificate specifies a client certificate. It has the same meaning as that of cert in requests.request().
auth (tuple or requests.auth.AuthBase) – The auth parameter sets a (user, pass) tuple or Auth handler to enable Basic/Digest/Custom HTTP Authentication. It will be passed down to the underlying requests.Session instance as the default authentication.
Warning
The auth will be applied to all the downloads for HTTP Authentication. Don’t use this parameter, if not all of the downloads need the authentication, to avoid leaking credentials. Instead, use the netrc parameter for fine-grained control over HTTP Authentication.
netrc (dict) – netrc specifies a dictionary of 'machine': (login, password) (or 'machine': requests.auth.AuthBase) for HTTP Authentication, similar to the .netrc file format in spirit.
headers (dict) – headers specifies extra HTTP headers, standard or custom, for use in all of the requests made by the session. The headers take precedence over the ones specified by other parameters, e.g. user_agent, if they conflict.
- Raises:
ValueError – Raised when the cookies is of the str type and not in valid format.
- _backup_resumption_ctx(the_file, ctx_file)
Back up the necessary context of the unsuccessful download for resuming later.
- _build_ctx(path_urls)
Build the context for downloading the file(s).
- Parameters:
path_urls (list of tuple) – Paths and URLs for the file(s) to download, see downloads() for details.
- Returns:
A 6-tuple of lists '(active, active_orig, failed, failed_orig, existing, existing_orig)', where the lists active and active_orig contain the active (path, url)’s, converted and original respectively; failed and failed_orig likewise contain the (path, url)’s that are not downloadable; existing and existing_orig contain the downloads whose desired files already exist.
- Raises:
BDownloaderException – Raised when the termination or cancellation flag has been set.
- _build_ctx_internal(path_name, url)
The helper method that actually does the build of the downloading context of the file.
- Parameters:
- Returns:
A 3-tuple '(downloadable, (path, url), (orig_path, orig_url))', where downloadable indicates whether the desired file is downloadable, unavailable or existing by True, False or None respectively, (path, url) denotes the converted full pathname and the URL that consists only of active URLs, and (orig_path, orig_url) denotes the originally input pathname and URL.
- Return type:
- Raises:
BDownloaderException – Raised when the termination or cancellation flag has been set.
- _calc_completed()
Calculate the already downloaded bytes of the files.
- Returns:
The size in bytes of the downloaded pieces.
- Return type:
- _cancel_all_on_interrupted()
Cancel all the pending tasks when receiving the SIGINT signal or the QUIT command.
- _finalize_on_interrupted_py2()
When interrupted under Python 2.x, perform state transitions manually and act accordingly.
- _get_alt_urls(path_name)
Get alternative URLs from the multiple sources of the file to resume downloading from.
- static _get_fname_from_hdr(content_disposition)
Get the file name from the HTTP response header.
- Parameters:
content_disposition (str) – Content of the Content-Disposition field of the response header.
- Returns:
The extracted file name.
- Return type:
References
- _get_remote_file_multipart(path_name, req_range)
The worker thread body for downloading an assigned piece of a file.
- Parameters:
- Returns:
None.
- Raises:
BDownloaderException – Raised when connect timeouts, read timeouts, failed connections or bad status codes occurred and the retries are exhausted.
EnvironmentError – Raised when file operations failed.
- _get_remote_file_singlewhole(path_name, req_range)
The worker thread body for downloading the whole of a file, as opposed to _get_remote_file_multipart().
- Parameters:
- Returns:
None.
- Raises:
BDownloaderException – Raised when connect timeouts, read timeouts, failed connections or bad status codes occurred and the retries are exhausted.
EnvironmentError – Raised when file operations failed.
- _is_all_done()
Check if all the tasks have completed.
- Returns:
True if all the Futures have been done, meaning that all the files have finished downloading, whether successfully or not; False otherwise.
- Return type:
- _is_download_resumable(path_name)
Check if the current download of the file can be resumed from the point of last interruption through retrying.
- _is_parallel_downloadable(path_name)
Check if the file can be downloaded in parallel, i.e. using multiple threads to download the file pieces simultaneously.
- _load_resumption_ctx(the_file, ctx_file)
Load from the resumption parts file to restore the download context.
- Parameters:
- Returns:
A 2-tuple (is_resuming, resumption_ctx), where is_resuming indicates whether the download is resuming from last interruption, and if this is the case (True), resumption_ctx holds the successfully loaded resumption context.
- Return type:
- _mgmnt_task()
The management thread body.
This thread manages the downloading process of the whole job queue, currently including state management only. When all the tasks have been done, it signals the waiting thread and exits immediately.
- Returns:
None.
- _on_cancelled(the_file, ctx_file)
When transitioning to the CANCELLED state, remove the empty, obsolete files.
- _on_failed(the_file, ctx_file)
When transitioning to the FAILED state, save the resumption ctx or remove the intermediate files.
- _on_succeeded(the_file, ctx_file)
When transitioning to the SUCCEEDED state, convert from in-process to finished file and do the cleanup.
- _pick_file_url(path_name)
Select one URL from the multiple sources of the file to download from.
- Parameters:
path_name (str) – The full path name of the file to be downloaded.
- Yields:
list – A list of URL(s) to download the file from using a strategy of Round Robin.
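As an illustration of the Round Robin strategy that _pick_file_url() is described as using, here is a minimal sketch of a generator that cycles through the mirror URLs of one file. The name pick_url_round_robin and the choice to yield a one-element list are assumptions made for this example; the library's actual generator is not shown in this reference and may differ.

```python
import itertools


def pick_url_round_robin(urls):
    """Yield a one-element list holding the next URL, cycling through `urls` forever.

    Hypothetical helper: it only illustrates Round Robin selection and is not
    the library's implementation.
    """
    for url in itertools.cycle(urls):
        yield [url]


# Three mirrors of the same file are visited in order, then the cycle restarts.
picker = pick_url_round_robin(["url1", "url2", "url3"])
first_four = [next(picker) for _ in range(4)]
```

Because the generator never terminates on its own, the caller decides how many picks to take, for example one per scheduled range task.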
- _progress_task()
The thread body for showing the progress of the downloading tasks.
- Returns:
None.
- _rename_existing_file(full_pathname)
Rename the file or directory with the given pathname if present.
- Parameters:
full_pathname (str) – The full path name of the file to check for duplicate.
- _result()
Return both the succeeded and failed downloads when all done or interrupted by user.
- Returns:
Same as that returned by wait_for_all().
- Return type:
tuple of list
- _schedule_dl_tasks(path_name, num_tasks)
Arrange the range downloading tasks of the file and assign them to the thread pool executor.
- _schedule_file_download(the_file, ctx_file)
Remove the succeeded range tasks, reassign the failed ones and arrange new ones for the file downloading.
- _schedule_files_downloads()
Remove the completed tasks from the files downloading queue and submit new file task assignments.
- _state_mgmnt()
Perform the state-related operations of file downloading.
This method updates the download status of the files and their related chunks when the associated worker threads have completed, whether they finished without error, raised an exception or were cancelled intentionally.
- Returns:
None.
- static _topmost_missing_dir(path)
Find the topmost non-existent directory for a given path.
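To make "topmost non-existent directory" concrete, the sketch below walks a path upwards until it reaches a component that exists and returns the last missing one it saw. The function topmost_missing_dir and its return value of None for a fully existing path are assumptions for illustration; the behaviour of the library's _topmost_missing_dir() in such edge cases is not documented here.

```python
import os
import tempfile


def topmost_missing_dir(path):
    """Return the highest ancestor of `path` (or `path` itself) that does not exist.

    Hypothetical helper; returns None when `path` already exists.
    """
    path = os.path.abspath(path)
    missing = None
    while not os.path.exists(path):
        missing = path
        parent = os.path.dirname(path)
        if parent == path:  # reached the filesystem root
            break
        path = parent
    return missing


# Under a fresh temporary directory, 'a/b/c' is missing from 'a' downwards.
base = os.path.abspath(tempfile.mkdtemp())
missing = topmost_missing_dir(os.path.join(base, "a", "b", "c"))
existing = topmost_missing_dir(base)
```

Knowing this directory is useful for cleanup: removing it after a failed download also removes every directory that was created beneath it for that download.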
- static calc_req_ranges(req_len, split_size, req_start=0)
Split the request req_len into chunks of the size split_size starting from the point req_start.
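The splitting described for calc_req_ranges() can be sketched as follows. The return format of the real method is not given in this reference, so the helper split_ranges below, which returns inclusive (start, end) tuples, is an assumption; the "bytes=start-end" rendering mirrors the range keys shown in the download context above.

```python
def split_ranges(req_len, split_size, req_start=0):
    """Split `req_len` bytes starting at `req_start` into inclusive (start, end) ranges.

    Hypothetical helper illustrating the idea; not the library's implementation.
    """
    if req_len < 0 or split_size <= 0:
        raise ValueError("req_len must be >= 0 and split_size must be > 0")
    stop = req_start + req_len  # one past the last requested byte
    ranges = []
    for start in range(req_start, stop, split_size):
        end = min(start + split_size, stop) - 1  # the last chunk may be shorter
        ranges.append((start, end))
    return ranges


def to_range_header(start, end):
    """Render an inclusive byte range as an HTTP Range header value."""
    return "bytes={}-{}".format(start, end)


two_even = split_ranges(2000, 1000)
with_offset = split_ranges(2500, 1000, req_start=100)
headers = [to_range_header(s, e) for s, e in two_even]
```

Inclusive end positions are used because HTTP byte ranges are inclusive on both ends, so a 2000-byte file split at 1000 bytes gives bytes=0-999 and bytes=1000-1999.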
- cancel(keyboard_interrupt=True)
Cancel all the download jobs.
- Parameters:
keyboard_interrupt (bool) – Specifies whether or not the user hit the interrupt key (e.g. Ctrl-C).
- Returns:
None.
- download(path_name, url)
Submit a single downloading job to the downloading queue.
This method is simply a wrapper of the method downloads().
- Parameters:
- Returns:
None.
- Raises:
BDownloaderException – Same as in downloads().
Notes
The limitation on the method and the path_name parameter herein is the same as in downloads().
- downloads(path_urls)
Submit multiple downloading jobs at a time to the downloading queue.
- Parameters:
path_urls (list of tuples) – path_urls accepts a list of tuples of the form (path, url), where path should be a pathname, optionally prefixed with absolute or relative paths, and url should be a URL string, which may consist of multiple TAB-separated URLs pointing to the same file. Note that a single dash '-' specifies the path reserved for the standard output. A valid path_urls, for example, could be:
[('/opt/files/bar.tar.bz2', 'https://foo.cc/bar.tar.bz2'),
 ('./sanguoshuowen.pdf', 'https://bar.cc/sanguoshuowen.pdf\thttps://foo.cc/sanguoshuowen.pdf'),
 ('/to/be/created/', 'https://flash.jiefang.rmy/lc-cl/gaozhuang/chelsia/rockspeaker.tar.gz'),
 ('/path/to/existing-dir', 'https://ghosthat.bar/foo/puretonecone81.xz\thttps://tpot.horn/foo/puretonecone81.xz\thttps://hawkhill.bar/foo/puretonecone81.xz')]
- Returns:
None.
- Raises:
BDownloaderException – Raised when the downloads were interrupted, e.g. by calling cancel() in a SIGINT signal handler, in the process of submitting the download requests.
Notes
The method is not thread-safe, which means it should not be called at the same time in multiple threads with one instance.
When multi-instanced (e.g. one instance per thread), the file paths specified in one instance should not overlap those in another to avoid potential race conditions. File loss may occur, for example, if a failed download task in one instance tries to delete a directory that is being accessed by some download tasks in other instances. However, this limitation doesn’t apply to the file paths specified in the same instance.
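The following sketch shows one way the path_urls argument of downloads() might be assembled and submitted. The helpers make_path_url and run_downloads are hypothetical names introduced for this example; only BDownloader, downloads(), wait_for_all() and result() come from this reference, and any cleanup the class may require lies outside what this section documents. run_downloads is defined but not called, since it needs the bdownload package and network access.

```python
def make_path_url(path, *urls):
    """Build one (path, url) tuple; mirror URLs of the same file are TAB-joined."""
    if not urls:
        raise ValueError("at least one URL is required")
    return (path, "\t".join(urls))


def run_downloads(path_urls):
    """Submit the jobs from a single thread and wait for all of them to finish."""
    # Imported lazily so that make_path_url can be used without the package installed.
    from bdownload.download import BDownloader

    downloader = BDownloader(max_parallel_downloads=2, progress="none")
    downloader.downloads(path_urls)
    succeeded, failed = downloader.wait_for_all()
    return succeeded, failed, downloader.result()  # result() is 0 on success, -1 on failure


path_urls = [
    make_path_url("/opt/files/bar.tar.bz2", "https://foo.cc/bar.tar.bz2"),
    make_path_url(
        "./sanguoshuowen.pdf",
        "https://bar.cc/sanguoshuowen.pdf",
        "https://foo.cc/sanguoshuowen.pdf",
    ),
]
```

Joining the mirrors in a helper keeps the TAB separator in one place, which avoids accidentally separating URLs with spaces.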
- progress_all()
Get the coarse-grained, overall progress of the downloads.
- Returns:
The 3-tuple of the form (completed_bytes, total_bytes, is_accurate). completed_bytes is updated on a chunk basis from the worker threads by the management task. If is_accurate is False then total_bytes is inaccurate, i.e. some downloads have undetermined sizes, which also means completed_bytes may be greater than the total_bytes; otherwise, total_bytes is the exact sum of sizes of all the downloads. Note that total_bytes (and is_accurate) may vary during the phase of submitting the downloads.
- Return type:
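A caller that polls progress_all() has to treat total_bytes with care whenever is_accurate is False. The helper below is a minimal sketch of that handling; format_progress is a hypothetical name, and the '--' placeholder for an unknown total is an illustrative choice borrowed from the NULL_EXPECTED_DISP constant of MillProgress.

```python
def format_progress(completed_bytes, total_bytes, is_accurate):
    """Render the (completed_bytes, total_bytes, is_accurate) tuple as a short status string."""
    if is_accurate and total_bytes > 0:
        percent = 100.0 * completed_bytes / total_bytes
        return "{:,d}/{:,d} bytes ({:.1f}%)".format(completed_bytes, total_bytes, percent)
    # With an inaccurate total, completed_bytes may exceed total_bytes,
    # so no percentage is shown.
    return "{:,d}/-- bytes".format(completed_bytes)


accurate_line = format_progress(500, 2000, True)
inaccurate_line = format_progress(3000, 2000, False)
```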
- raise_on_interrupted()
Raise a customized exception signaling that the downloads have been terminated by the user.
- Raises:
BDownloaderException – Raised when the termination or cancellation flag has been set.
- result()
Return the final download status.
- Returns:
0 for success, and -1 for failure.
- Return type:
- results()
Get both the succeeded and failed downloads when all done or interrupted by user.
- Returns:
Same as that returned by wait_for_all().
- Return type:
tuple of list
- wait_for_all()
Wait for all the downloading jobs to complete.
- Returns:
A 2-tuple of lists '(succeeded, failed)'. The first list succeeded contains the originally passed (path, url)s that finished successfully, while the second list failed contains the raised and cancelled ones.
- Return type:
tuple of list
- exception bdownload.download.BDownloaderException
Bases: Exception
The exception indicating that an error occurred while executing the download tasks.
- bdownload.download.COOKIE_STR_REGEX = re.compile('^\\s*(?:[^,; =]+=[^,; ]+\\s*(?:$|\\s+|;\\s*))+\\s*$')
A compiled regular expression object used to match the cookie string in the form of key/value pairs.
See also BDownloader.__init__() for more details about cookies.
- Type:
regex
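To show what the COOKIE_STR_REGEX pattern accepts, the sketch below recompiles the same pattern locally and uses it to validate a cookie string before splitting it into a dict. The functions is_valid_cookie_str and parse_cookie_str are hypothetical; how the library itself turns a validated string into a CookieJar is not shown in this reference.

```python
import re

# Same pattern as documented for COOKIE_STR_REGEX, recompiled here so the
# example runs without the bdownload package.
COOKIE_STR_REGEX = re.compile(r'^\s*(?:[^,; =]+=[^,; ]+\s*(?:$|\s+|;\s*))+\s*$')


def is_valid_cookie_str(cookies):
    """Return True if `cookies` is a run of key=value pairs separated by whitespace and/or ';'."""
    return COOKIE_STR_REGEX.match(cookies) is not None


def parse_cookie_str(cookies):
    """Split a validated cookie string into a dict (illustrative parsing only)."""
    if not is_valid_cookie_str(cookies):
        raise ValueError("cookies string is not in valid format: {!r}".format(cookies))
    pairs = re.split(r'[;\s]+', cookies.strip().strip(';'))
    return dict(pair.split('=', 1) for pair in pairs if pair)


parsed = parse_cookie_str('key1=val1 key2=val2;key3=val3')
```

Commas are rejected by the pattern in both keys and values, so a string such as 'a=b, c=d' does not validate.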
- class bdownload.download.HTTPBasicAuthEx(username, password)
Bases: HTTPBasicAuth
Attaches HTTP Basic Authentication to the given Request object.
This class is adapted from requests.auth.HTTPBasicAuth and requests.auth.HTTPDigestAuth, with added support for handling Unauthorized request on the response.
- bdownload.download.HTTP_HEADER_REGEX = re.compile('^\\s*[a-zA-Z0-9_-]+:\\s*[a-zA-Z0-9_ :;.,\\\\/"\\\'?!(){}[\\]@<>=\\-+*#$&`|~^%]*$')
A compiled regular expression object used to validate the HTTP request header in the 'name: value' format.
- Type:
regex
- class bdownload.download.MillProgress(label='', hide=None, expected_size=None, every=1, eta_tag='eta:', elapsed_tag='elapsed:')
Bases: object
Print a mill while progressing.
This class is adapted from clint.textui.progress, with added support for unknown expected_size.
- ETA_INTERVAL = 1
- ETA_SMA_WINDOW = 9
- MILL_CHARS = ['|', '/', '-', '\\']
- MILL_TEMPLATE = '{} {} {:,d}/{:<} {} {} {}\r'
- NULL_EXPECTED_DISP = '--'
- NULL_EXPECTED_WIDTH = 2
- STREAM = <_io.TextIOWrapper name='<stderr>' mode='w' encoding='UTF-8'>
- bdownload.download.PICKLE_PROTOCOL_NUMBER = 2
The highest pickle protocol number valid for both Python 2.x and Python 3.x.
- Type:
- bdownload.download.REQUESTS_EXTENDED_RETRIES_FACTOR = 1
Default number of retries factor for _requests_extended_retries_factor.
- Type:
- bdownload.download.REQUESTS_RETRIES_ON_STREAM_EXCEPTION = 10
Default number of retries on exceptions raised while streaming the request content.
- Type:
- bdownload.download.RETRY_EXEMPT_STATUS_CODES = frozenset({401, 407, 511})
Default status codes that should not be retried on before they are handled.
- Type:
- class bdownload.download.RequestsSessionWrapper(timeout=None, proxy=None, cookies=None, user_agent=None, referrer=None, verify=True, cert=None, headers=None, auth=None, requester_cb=None)
Bases: Session
Subclass of the requests.Session class with extended retry-on-exception behavior for the get method.
Note
The retry mechanism here is independent from that built into urllib3 (see _requests_extended_retries_factor and retry_requests()). That is, the decorated retry attempts will be triggered whenever the get method raises some requests.RequestException or receives any bad status code, regardless of whether or not the builtin Retry of urllib3 is enabled. Nevertheless, they together determine the number of the total retries. See requests_retry_session() for more information about their cooperation.
- TIMEOUT = (3.05, 6)
Default timeouts: the connect timeout value defaults to 3.05 seconds, and the read timeout to 6 seconds.
- __init__(timeout=None, proxy=None, cookies=None, user_agent=None, referrer=None, verify=True, cert=None, headers=None, auth=None, requester_cb=None)
Initialize the Session instance.
The HTTP header User-Agent of the session is set to a default value of bdownload/VERSION, if not provided, with VERSION being replaced by the package’s version number.
- Parameters:
timeout (float or 2-tuple of float) – Timeout value(s) as a float or (connect, read) tuple for both the connect and the read timeouts, respectively. If set to None, 0 or (), whether the whole or any item thereof, it will take a default value from TIMEOUT, accordingly.
proxy (str) – Same as for BDownloader.__init__().
cookies (str, dict or CookieJar) – Same as for BDownloader.__init__().
user_agent (str) – Same as for BDownloader.__init__().
referrer (str) – Same as for BDownloader.__init__().
headers (dict) – Same meaning as in BDownloader.__init__().
auth (tuple or requests.auth.AuthBase) – Same meaning as in BDownloader.__init__().
requester_cb (func) – The callback function provided by the downloader that uses the instantiated session object as the HTTP(S) requester. It will get called when making an HTTP GET request.
- static _build_cookiejar_from_kvp(key_values)
Build a CookieJar from cookies in the form of key/value pairs.
- Parameters:
key_values (str) – The cookies must take the form of 'cookie_key=cookie_value', with multiple pairs separated by whitespace and/or semicolon if applicable, e.g. 'key1=val1 key2=val2; key3=val3'.
- Returns:
The built CookieJar for requests sessions.
- Return type:
requests.cookies.RequestsCookieJar
- Raises:
ValueError – Raised when the cookies string key_values is not in valid format.
- get(url, **kwargs)
Wrapper around requests.Session’s get method decorated with the retry_requests() decorator.
- Parameters:
url – URL for the file to download from.
**kwargs – Same arguments as that requests.Session.get takes.
- Returns:
The response to the HTTP GET request.
- Return type:
requests.Response
- Raises:
BDownloaderException – Raised when the termination or cancellation flag has been set, for example, if RequestsSessionWrapper.requester_cb is initialized to BDownloader.raise_on_interrupted().
requests.RequestException – Raised when any of requests’s exceptions occurred or bad status codes were received and retries have been exhausted.
ExceptionByRequesterCB – Same exception(s) as that raised by RequestsSessionWrapper.requester_cb, if any.
- bdownload.download.URLLIB3_BUILTIN_RETRIES_ON_EXCEPTION = 1
Default number of retries on exception set through urllib3’s Retry mechanism.
- Type:
- bdownload.download.URLLIB3_RETRY_STATUS_CODES = frozenset({413, 429, 500, 502, 503, 504})
Default status codes to retry on intended for the underlying urllib3.
- Type:
- bdownload.download._cpu_count()
A simple wrapper around the cpu_count() for escaping the NotImplementedError.
- Returns:
The number of CPUs in the system. Return 0 if not obtained.
- Return type:
- bdownload.download._requests_extended_retries_factor = 1
Number of retries that complements and extends the builtin Retry mechanism of urllib3.
This global variable is meant for the decorator retry_requests(), and its value can be modified through the module level function set_requests_retries_factor(). It is initialized to REQUESTS_EXTENDED_RETRIES_FACTOR by default, and usually you don’t want to change it.
Together with urllib3’s builtin retry logic, it determines the total number of the retries on exceptions and bad status codes at requests for downloading. For more details on the retry mechanisms, see requests_retry_session().
Notes
Don’t mix these two retry mechanisms up with the retries at failed connections while streaming the request content.
- Type:
- bdownload.download.requests_retry_session(builtin_retries=None, backoff_factor=0.1, status_forcelist=None, session=None, num_pools=20, pool_maxsize=20, pool_block=True, **kwargs)
Create a session object of the class RequestsSessionWrapper by default.
Aside from the retry mechanism implemented by the wrapper decorator, the created session also leverages the built-in retries bound to urllib3. When both of them are enabled, they cooperate to determine the total retry attempts. The worst-case retries is determined using the following formula:
builtin_retries * (_requests_extended_retries_factor + 1)
which applies to all the exceptions and those status codes that fall into the status_forcelist. For other status codes, the maximum retries shall be _requests_extended_retries_factor.
- Parameters:
builtin_retries (int) – Maximum number of retry attempts allowed on errors and on the status codes of interest, which will apply to the retry logic of the underlying urllib3. If set to None or 0, it will default to URLLIB3_BUILTIN_RETRIES_ON_EXCEPTION.
backoff_factor (float) – The backoff factor to apply between retries.
status_forcelist (set of int) – A set of HTTP status codes that a retry should be enforced on. The default status forcelist shall be URLLIB3_RETRY_STATUS_CODES if not given.
session (requests.Session) – An instance of the class requests.Session or its customized subclass. When not provided, a RequestsSessionWrapper instance is created by default.
num_pools (int) – The number of connection pools to cache, which has the same meaning as num_pools in urllib3.PoolManager and will eventually be passed to it.
pool_maxsize (int) – The maximum number of connections to save that can be reused in the urllib3 connection pool, which will be passed to the underlying requests.adapters.HTTPAdapter.
pool_block (bool) – Whether the connection pool should block or create more connections when there are no free connections available.
**kwargs – Same arguments as that RequestsSessionWrapper.__init__() takes.
- Returns:
The session instance with retry capability.
- Return type:
requests.Session
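The worst-case retry formula given for requests_retry_session() can be worked through with the documented defaults. The function worst_case_retries below is a hypothetical helper that only restates the arithmetic from the description; it does not call into the library.

```python
# Documented module defaults, copied here so the example is self-contained.
URLLIB3_BUILTIN_RETRIES_ON_EXCEPTION = 1
REQUESTS_EXTENDED_RETRIES_FACTOR = 1


def worst_case_retries(builtin_retries, extended_retries_factor, in_forcelist=True):
    """Upper bound on retries, following the formula in the description.

    `in_forcelist` is True for exceptions and for status codes listed in
    status_forcelist; other bad status codes are bounded by the extended
    factor alone.
    """
    if in_forcelist:
        return builtin_retries * (extended_retries_factor + 1)
    return extended_retries_factor


default_bound = worst_case_retries(
    URLLIB3_BUILTIN_RETRIES_ON_EXCEPTION, REQUESTS_EXTENDED_RETRIES_FACTOR
)
other_status_bound = worst_case_retries(
    URLLIB3_BUILTIN_RETRIES_ON_EXCEPTION, REQUESTS_EXTENDED_RETRIES_FACTOR, in_forcelist=False
)
larger_bound = worst_case_retries(3, 2)
```

With the defaults the bound is 1 * (1 + 1) = 2, while builtin_retries=3 and a factor of 2 give 3 * (2 + 1) = 9.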
- bdownload.download.retry_requests(exceptions, status_exemptlist=frozenset({401, 407, 511}), backoff_factor=0.1, logger=None)
A decorator that retries calling the wrapped requests’ function using an exponential backoff on exception.
The retry attempt will be activated in the event of exceptions being caught and for all the bad status codes (i.e. codes ranging from 400 to 600) except the ones in status_exemptlist.
- Parameters:
exceptions (Exception or tuple of Exceptions) – The exceptions to check against.
status_exemptlist (set of int) – A set of HTTP status codes for which the retry should be avoided.
backoff_factor (float) – The backoff factor to apply between retries.
logger (logging.Logger) – An event logger.
- Returns:
The wrapper function.
- Raises:
exceptions – Re-raise the last caught exception when retries are exhausted.
Notes
This function has an external dependency on the global variable _requests_extended_retries_factor, whose value can be changed through the function set_requests_retries_factor(). Also, it should be greater than 0, thus allowing the decorated method to retry at least once to cover the edge cases of exceptions and bad status codes.
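For readers unfamiliar with the pattern, here is a generic sketch of a retry decorator with exponential backoff, the technique retry_requests() is described as using. It is not the library's code: retry_with_backoff, its retries argument, the injectable sleep function and the delay formula backoff_factor * 2 ** attempt are all assumptions, and status-code handling is left out.

```python
import functools
import time


def retry_with_backoff(exceptions, retries=1, backoff_factor=0.1, sleep=time.sleep):
    """Retry the wrapped callable on `exceptions`, doubling the delay after each failure."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(retries + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == retries:
                        raise  # retries exhausted: re-raise the last exception
                    sleep(backoff_factor * (2 ** attempt))
        return wrapper
    return decorator


# Demo: the function fails twice, then succeeds; delays are recorded instead of slept.
delays = []
calls = {"count": 0}


@retry_with_backoff(ValueError, retries=2, backoff_factor=0.1, sleep=delays.append)
def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ValueError("transient failure")
    return "ok"


result = flaky()
```

Passing the sleep function as a parameter keeps the example fast and makes the backoff schedule easy to inspect.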
- bdownload.download.set_requests_retries_factor(retries)
Set the retries factor for the decorator retry_requests().
- Parameters:
retries (int) – Number of retries when a decorated method of requests raised an exception or returned any bad status code. It should take a value of at least 1, or else nothing changes.
- Returns:
None.
bdownload.cli module
This module provides the entry point main for the command line utility bdownload.
- bdownload.cli._cmd_quit_handler(bdownloader, signum, frame)[source]¶
The handler for the signals SIGTERM, SIGABRT, SIGHUP and SIGBREAK.
- Parameters:
bdownloader (BDownloader) – The BDownloader instance acting as the file downloader.
signum – The signal number, one of signal.SIGTERM, signal.SIGABRT, signal.SIGHUP, or signal.SIGBREAK.
frame – The current stack frame when the signal is received.
- bdownload.cli._dec_raw_tab_separated_urls(url)[source]¶
Decode a raw URL string that may consist of multiple escaped TAB-separated URLs.
- Parameters:
url (str) – URL for the file to be downloaded, which might be TAB-separated composite URL pointing to the same file.
- Returns:
Decoded URL.
- Return type:
- Raises:
ArgumentTypeError – Raised when url contains URL(s) that don’t conform to the format “http[s]://[user:pass@]foo.bar[*]”.
Examples
- Examples of the parameter url include:
'https://fakewebsite-01.com/downloads/soulbody4ct.pdf\thttps://fakesite.com/archives/soulbody4ct.pdf'
'https://fakewebsite-01.com/downloads/ipcress.docx\thttps://fakewebsite-02.com/archives/ipcress.docx'
'https://tianchengren:öp€nsasimi@i.louder.ss\thttps://fangxun.xiaoqing.sunmoon.xue'
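As a rough sketch of the decoding and validation described above (not the library's code), the escaped two-character sequence `\t` can be turned into a real TAB and each part checked against the documented URL shape. The function name `decode_tab_separated_urls` and the regular expression are assumptions made for this example; the real validation may be stricter or looser.

```python
import re
from argparse import ArgumentTypeError

# Assumed pattern for "http[s]://[user:pass@]foo.bar[*]":
# scheme, optional credentials, then a host containing at least one dot.
_URL_RE = re.compile(r"^https?://(?:[^\s@/]+:[^\s@/]+@)?[^\s@/]+\.[^\s@/]+")


def decode_tab_separated_urls(raw):
    """Return `raw` with escaped TABs decoded, after validating every URL."""
    # On the command line "\t" arrives as backslash + "t", not as a TAB.
    decoded = raw.replace("\\t", "\t")
    for url in decoded.split("\t"):
        if not _URL_RE.match(url):
            raise ArgumentTypeError("invalid URL: %r" % url)
    return decoded


urls = decode_tab_separated_urls(
    r"https://fakewebsite-01.com/downloads/ipcress.docx"
    r"\thttps://fakewebsite-02.com/archives/ipcress.docx"
)
mirrors = urls.split("\t")  # two alternative sources for the same file
```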
- bdownload.cli._interrupt_handler(bdownloader, signum, frame)[source]¶
The handler for the signals SIGINT and SIGQUIT.
- Parameters:
bdownloader (BDownloader) – The BDownloader instance acting as the file downloader.
signum – The signal number, either signal.SIGINT or signal.SIGQUIT.
frame – The current stack frame when the signal is received.
- bdownload.cli._load_cookies(cookies)[source]¶
Load cookie(s) either from a Netscape cookie file or a string.
- Parameters:
cookies (str) –
Cookies either in the form of a string (possibly whitespace- and/or semicolon-separated) like “cookie_key=cookie_value cookie_key2=cookie_value2; cookie_key3=cookie_value3”, or a file, e.g. named “cookies.txt”, in the Netscape cookie file format.
Note
The option -D DIR does not apply to the cookie file.
- Returns:
A CookieJar or a validated cookies string.
- Return type:
cookielib.MozillaCookieJar or str
- Raises:
ArgumentTypeError – Raised when an exception occurs while loading the cookies file or the cookies string is not in a valid format.
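The two input forms could be handled along these lines. This is a hedged sketch, not the library's implementation: the name `load_cookies`, the rule "an existing file path means a cookie file", and the normalization of the string to `"; "`-separated pairs are all assumptions. It uses Python 3's `http.cookiejar`, the successor of the `cookielib` module named above.

```python
import os
import re
from argparse import ArgumentTypeError
from http.cookiejar import LoadError, MozillaCookieJar

# Assumed shape of one cookie: "key=value" without whitespace or semicolons.
_PAIR_RE = re.compile(r"^[^=\s;]+=[^\s;]*$")


def load_cookies(cookies):
    """Return a MozillaCookieJar for a cookie file, else a validated string."""
    if os.path.isfile(cookies):
        jar = MozillaCookieJar(cookies)
        try:
            jar.load()  # parses the Netscape cookie file format
        except (LoadError, OSError) as exc:
            raise ArgumentTypeError("cannot load cookie file: %s" % exc)
        return jar
    # Split on any run of whitespace and/or semicolons.
    pairs = [p for p in re.split(r"[;\s]+", cookies.strip()) if p]
    if not pairs or not all(_PAIR_RE.match(p) for p in pairs):
        raise ArgumentTypeError("invalid cookie string: %r" % cookies)
    return "; ".join(pairs)
```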
- bdownload.cli._normalize_bytes_num(bytes_num)[source]¶
Normalize and convert an integer number string expressed in bytes.
- Parameters:
bytes_num (str) – The integer number string, which may carry a suffix of ‘K’ or ‘M’, where ‘K’ indicates multiples of 1024 and ‘M’ means multiples of 1024*1024.
- Returns:
Normalized integer number.
- Return type:
- Raises:
ArgumentTypeError – Raised when the passed bytes_num is neither a plain decimal integer string nor a suffixed one.
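The suffix arithmetic can be illustrated with a short sketch. The function name `normalize_bytes_num` (without the leading underscore) and the choice to accept only upper-case suffixes are assumptions made for this example, not a statement about the library's behaviour.

```python
import re
from argparse import ArgumentTypeError

_MULTIPLIERS = {"": 1, "K": 1024, "M": 1024 * 1024}


def normalize_bytes_num(bytes_num):
    """Convert strings such as '512', '100K' or '1M' to a number of bytes."""
    match = re.fullmatch(r"(\d+)([KM]?)", bytes_num.strip())
    if match is None:
        raise ArgumentTypeError("invalid byte count: %r" % bytes_num)
    number, suffix = match.groups()
    return int(number) * _MULTIPLIERS[suffix]
```

Under these assumptions `'100K'` and `'1M'` map to 102400 and 1048576, which are the `chunk_size` and `min_split_size` defaults shown in the `BDownloader` signature.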
- bdownload.cli._validate_http_header(header)[source]¶
Validate and normalize the HTTP request header.
- bdownload.cli._win32_utf8_argv()[source]¶
Use kernel32.GetCommandLineW and shell32.CommandLineToArgvW to get sys.argv as a list of UTF-8 strings.
Versions 2.5 and older of Python don’t support Unicode (“mon€y röcks” for example) in sys.argv on Windows, with the underlying Windows API instead replacing multi-byte characters with ‘?’.
- Returns:
Command-line arguments. A list of UTF-8 strings on success, None on failure.
- Return type:
list of str
- bdownload.cli.ignore_termination_signals()[source]¶
Cause the process not to respond to termination signals.
- bdownload.cli.install_signal_handlers(bdownloader)[source]¶
Install handlers for termination signals.
- Parameters:
bdownloader (BDownloader) – The BDownloader instance acting as the file downloader.
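The handlers above take the downloader as an extra first argument, whereas `signal.signal()` calls a handler with only `(signum, frame)`. One plausible way to bridge the two, shown here as a sketch under assumptions, is `functools.partial`. The `FakeDownloader` class and its `cancel` method are hypothetical stand-ins, and the real handlers may do something different.

```python
import functools
import signal


class FakeDownloader(object):
    """Hypothetical stand-in for BDownloader; `cancel` is an assumed name."""

    def __init__(self):
        self.cancelled = False

    def cancel(self):
        self.cancelled = True


def quit_handler(downloader, signum, frame):
    # The downloader is bound up front; signum and frame are supplied
    # by the interpreter when the signal arrives.
    downloader.cancel()


def bind_quit_handlers(downloader):
    """Map each termination signal available on this platform to a handler."""
    handlers = {}
    for name in ("SIGTERM", "SIGABRT", "SIGHUP", "SIGBREAK"):
        # SIGHUP is missing on Windows and SIGBREAK exists only there.
        signum = getattr(signal, name, None)
        if signum is not None:
            handlers[signum] = functools.partial(quit_handler, downloader)
    return handlers


def install_quit_handlers(downloader):
    # signal.signal() may only be called from the main thread.
    for signum, handler in bind_quit_handlers(downloader).items():
        signal.signal(signum, handler)
```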
bdownload.utils module¶
- bdownload.utils.get_latest_tag_github(owner, repo, key, **kwargs)[source]¶
Get the latest tag/version of a GitHub repository.
- Parameters:
owner (str) – The account owner of the repository.
repo (str) – The name of the repository.
key (func) – A function for extracting comparison key from each tag/version.
**kwargs – Same arguments as those of bdownload.download.RequestsSessionWrapper.__init__().
- Returns:
The name of the latest tag.
- Return type:
- Raises:
exception – Same exception as that raised by bdownload.download.RequestsSessionWrapper.get().
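A `key` function for version-like tags might look like the sketch below. It assumes that the function receives each tag name as a string and that the tag with the greatest key is treated as the latest; neither detail is confirmed by the text above. The name `version_key` and the sample tags are made up for illustration.

```python
import re


def version_key(tag):
    """Turn a tag such as 'v1.10.2' into the comparable tuple (1, 10, 2)."""
    return tuple(int(part) for part in re.findall(r"\d+", tag))


# Hypothetical tag names; plain string comparison would rank 'v0.9.0' highest.
tags = ["v0.9.0", "v0.10.1", "v0.2.5"]
latest = max(tags, key=version_key)
```

Comparing integer tuples rather than raw strings avoids ranking `v0.9.0` above `v0.10.1`.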