Commit 519d3cc

[Executor] Make retry count configurable in RetryingFunctionExecutor (#1444)

* [RetryingFunctionExecutor] Allow adding retries in lithops config

1 parent d458d24 commit 519d3cc

11 files changed

Lines changed: 159 additions & 103 deletions

CHANGELOG.md

Lines changed: 1 addition & 0 deletions

@@ -10,6 +10,7 @@
 ### Fixed
 - [Localhost] Fix shutil.Error caused by existing __pycache__ directory when copying files in the runner
+- [Executor] Make retry count configurable in RetryingFunctionExecutor

 ## [v3.6.1]

CONTRIBUTING.md

Lines changed: 2 additions & 2 deletions

@@ -16,12 +16,12 @@ To contribute a patch:
 1. Break your work into small, single-purpose patches if possible. It's much
    harder to merge in a large change with a lot of disjoint features.
 2. Submit the patch as a GitHub pull request against the master branch.
-3. Make sure that your code passes the functional tests. See the [Functional testing](#functional-testing) section below.
+3. Make sure that your code passes the tests.
 4. Make sure that your code passes the linter. Install `flake8` with `pip3 install flake8` and run the next command until you don't see any linting error:
    ```bash
    flake8 lithops --count --max-line-length=180 --statistics --ignore W605,W503
    ```
-6. Add new unit tests for your code.
+5. Add new tests for your code.

 Testing

config/README.md

Lines changed: 16 additions & 15 deletions

@@ -144,18 +144,19 @@ if __name__ == '__main__':

 ## Summary of configuration keys for Lithops

-|Group|Key|Default|Mandatory|Additional info|
-|---|---|---|---|---|
-|lithops | backend | aws_lambda | no | Compute backend implementation. `localhost` is the default if no config or config file is provided|
-|lithops | storage | aws_s3 | no | Storage backend implementation. `localhost` is the default if no config or config file is provided|
-|lithops | data_cleaner | True | no |If set to True, then the cleaner will automatically delete all the temporary data that was written into `storage_bucket/lithops.jobs`|
-|lithops | monitoring | storage | no | Monitoring system implementation. One of: **storage** or **rabbitmq** |
-|lithops | monitoring_interval | 2 | no | Monitoring check interval in seconds in case of **storage** monitoring |
-|lithops | data_limit | 4 | no | Max (iter)data size (in MB). Set to False for unlimited size |
-|lithops | execution_timeout | 1800 | no | Functions will be automatically killed if they exceed this execution time (in seconds). Alternatively, it can be set in the `call_async()`, `map()` or `map_reduce()` calls using the `timeout` parameter.|
-|lithops | include_modules | [] | no | Explicitly pickle these dependencies. All required dependencies are pickled if default empty list. No one dependency is pickled if it is explicitly set to None |
-|lithops | exclude_modules | [] | no | Explicitly keep these modules from pickled dependencies. It is not taken into account if you set include_modules |
-|lithops | log_level | INFO |no | Logging level. One of: WARNING, INFO, DEBUG, ERROR, CRITICAL, Set to None to disable logging |
-|lithops | log_format | "%(asctime)s [%(levelname)s] %(name)s -- %(message)s" |no | Logging format string |
-|lithops | log_stream | ext://sys.stderr |no | Logging stream. eg.: ext://sys.stderr, ext://sys.stdout|
-|lithops | log_filename | |no | Path to a file. log_filename has preference over log_stream. |
+| Group | Key | Default | Mandatory | Additional info |
+|---------|---------------------|--------------|-----------|--------------------------------------------------------------------------------------------------|
+| lithops | backend | aws_lambda | no | Compute backend implementation. `localhost` is the default if no config or config file is provided. |
+| lithops | storage | aws_s3 | no | Storage backend implementation. `localhost` is the default if no config or config file is provided. |
+| lithops | data_cleaner | True | no | If True, automatically deletes temporary data written to `storage_bucket/lithops.jobs`. |
+| lithops | monitoring | storage | no | Monitoring system implementation. Options: **storage** or **rabbitmq**. |
+| lithops | monitoring_interval | 2 | no | Interval in seconds for monitoring checks when using **storage** monitoring. |
+| lithops | data_limit | 4 | no | Maximum size (in MB) for iterator data chunks. Set to False for unlimited size. |
+| lithops | execution_timeout | 1800 | no | Maximum execution time in seconds for functions. Functions exceeding this time are terminated. Can also be set per call via the `timeout` parameter. |
+| lithops | include_modules | [] | no | List of dependencies to explicitly include for pickling. If empty, all required dependencies are included. If set to None, no dependencies are included. |
+| lithops | exclude_modules | [] | no | List of dependencies to exclude from pickling. Ignored if `include_modules` is set. |
+| lithops | log_level | INFO | no | Logging level. Options: WARNING, INFO, DEBUG, ERROR, CRITICAL. Set to None to disable logging. |
+| lithops | log_format | "%(asctime)s [%(levelname)s] %(name)s -- %(message)s" | no | Format string for log messages. |
+| lithops | log_stream | ext://sys.stderr | no | Logging output stream, e.g., ext://sys.stderr or ext://sys.stdout. |
+| lithops | log_filename | (empty) | no | File path for logging output. Overrides `log_stream` if set. |
+| lithops | retries | 0 | no | Number of retries for failed function invocations when using the `RetryingFunctionExecutor`. Default is 0. Can be overridden per API call. |
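Based on the keys in this table, the new `retries` setting could be enabled in the Lithops config file like this (a minimal sketch with illustrative values only; the key names come from the table above):

```yaml
lithops:
    backend: aws_lambda
    storage: aws_s3
    log_level: INFO
    retries: 2   # re-invoke failed functions up to 2 extra times (new in this commit)
```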

docs/index.rst

Lines changed: 1 addition & 1 deletion

@@ -73,7 +73,7 @@ To start using Lithops:

    pip install lithops

-2. Configure your cloud credentials (see the full guide in :doc:`/config`)
+2. Configure your cloud credentials (see the :doc:`Configuration Guide <source/configuration>`)

 3. Write and run your first parallel job:
docs/source/api_futures.rst

Lines changed: 6 additions & 6 deletions

@@ -7,7 +7,8 @@ The core abstraction in Lithops is the **executor**, responsible for orchestrati

 To get started, you typically import `lithops` and create an executor instance to run your code. Lithops provides a flexible set of executors to suit different needs.

-### Primary Executors
+Primary Executors
+-----------------

 * **FunctionExecutor** (`lithops.FunctionExecutor()`):
   The main, generic executor that automatically selects its execution mode based on the provided configuration.

@@ -17,7 +18,8 @@ To get started, you typically import `lithops` and create an executor instance t
   A robust wrapper around `FunctionExecutor` that transparently handles retries on failed tasks.
   It supports all features of `FunctionExecutor` with added automatic retry logic, improving fault tolerance and reliability for unstable or transient failure-prone environments.

-### Secondary Executors
+Secondary Executors
+-------------------

 For more specialized use cases, Lithops also provides explicit executors for each execution mode:

@@ -30,14 +32,12 @@ For more specialized use cases, Lithops also provides explicit executors for eac

 * **StandaloneExecutor** (`lithops.StandaloneExecutor()`):
   Runs jobs on standalone compute backends such as clusters or virtual machines, suitable for long-running or resource-heavy tasks.

----

-### Configuration and Initialization
+Configuration and Initialization
+================================

 By default, executors load configuration from the Lithops configuration file (e.g., `lithops_config.yaml`). You can also supply configuration parameters programmatically via a Python dictionary when creating an executor instance. Parameters passed explicitly override those in the config file, allowing for flexible customization on the fly.

----
-
 This layered executor design lets Lithops provide a powerful, unified API for parallel function execution — from local development to multi-cloud production deployments with fault tolerance and retries built-in.
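The retry semantics this commit makes configurable can be sketched in plain Python. This is a hypothetical, simplified model: the real `RetryingFunctionExecutor` resubmits failed `FunctionExecutor` futures and reads the retry budget from the `retries` config key, rather than calling functions directly as below.

```python
def run_with_retries(func, args=(), retries=0):
    """Invoke func(*args), re-invoking on failure up to `retries` extra times.

    Simplified illustration of a retry budget; the real executor
    resubmits failed futures instead of calling functions directly.
    """
    last_exc = None
    for attempt in range(retries + 1):  # first attempt plus `retries` re-tries
        try:
            return func(*args)
        except Exception as exc:
            last_exc = exc
    raise last_exc  # budget exhausted: surface the last failure


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky():
        # Fails twice, then succeeds: needs retries >= 2 to return a result.
        calls["n"] += 1
        if calls["n"] < 3:
            raise RuntimeError("transient failure")
        return 42

    print(run_with_retries(flaky, retries=2))  # succeeds on the third attempt
```

With `retries=0` the flaky function above would simply propagate its first exception, which matches the default shown in the config table.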

docs/source/comparing_lithops.rst

Lines changed: 23 additions & 39 deletions

@@ -1,77 +1,61 @@
-Comparing Lithops with other distributed computing frameworks
+Comparing Lithops with Other Distributed Computing Frameworks
 =============================================================

-In a nutshell, Lithops differs from other distributed computing frameworks in that Lithops leverages serverless
-functions to compute massively parallel computations.
+Lithops introduces a novel approach to distributed computing by leveraging **serverless functions** for massively parallel computations. Unlike traditional frameworks that require managing a cluster of nodes, Lithops utilizes Function-as-a-Service (FaaS) platforms to dynamically scale execution resources — down to zero when idle and massively up when needed.

-In addition, Lithops provides a simple and easy-to-use interface to access and process data stored in Object Storage
-from your serverless functions.
-
-Moreover, Lithops abstract design allows seamless portability between clouds and FaaS services, avoiding vendor
-lock-in.
+In addition, Lithops offers a simple and consistent programming interface to transparently process data stored in **Object Storage** from within serverless functions. Its **modular and cloud-agnostic architecture** enables seamless portability across different cloud providers and FaaS platforms, effectively avoiding vendor lock-in.

 PyWren
 ------

-.. image:: https://www.faasification.com/assets/img/tools/pywren-logo-big.png
-   :align: center
-   :width: 250
+`PyWren <http://pywren.io/>`_ is the precursor to Lithops. Initially designed to run exclusively on AWS Lambda using a Conda runtime and supporting only Python 2.7, it served as a proof of concept for using serverless functions in scientific computing.

+In 2018, the Lithops team forked PyWren to adapt it for **IBM Cloud Functions**, which offered a Docker-based runtime. This evolution also introduced support for **Object Storage as a primary data source** and opened the door to more advanced use cases such as Big Data analytics.

-`PyWren <http://pywren.io/>`_ is Lithops' "father" project. PyWren was only designed to run in AWS Lambda with a
-Conda environment and only supported Python 2.7. In 2018, Lithops' creators forked PyWren and adapted it to IBM Cloud
-Functions, which, in contrast, uses a Docker runtime. The authors also explored new usages for PyWren, like processing Big Data from
-Object Storage. Then, in September 2020, IBM PyWren authors decided that the project had evolved enough to no longer be
-considered a simple fork of PyWren for IBM Cloud and became Lithops. With this change, the project would no longer be
-tied to the old PyWren model and could move to more modern features such as multi-cloud support or the transparent
-multiprocessing interface.
+By September 2020, the IBM PyWren fork had diverged significantly. The maintainers rebranded the project as **Lithops**, reflecting its broader goals — including multi-cloud compatibility, improved developer experience, and support for modern Python environments and distributed computing patterns.

-You can read more about PyWren IBM Cloud at the Middleware'18 industry paper `Serverless Data Analytics in the IBM Cloud <https://dl.acm.org/doi/10.1145/3284028.3284029>`_.
+For more details, refer to the Middleware'18 industry paper:
+`Serverless Data Analytics in the IBM Cloud <https://dl.acm.org/doi/10.1145/3284028.3284029>`_.

 Ray and Dask
 ------------

-.. image:: https://warehouse-camo.ingress.cmh1.psfhosted.org/98ae79911b7a91517ba16ef2dc7dc3b972214820/68747470733a2f2f6769746875622e636f6d2f7261792d70726f6a6563742f7261792f7261772f6d61737465722f646f632f736f757263652f696d616765732f7261795f6865616465725f6c6f676f2e706e67
-   :align: center
+.. image:: https://github.com/ray-project/ray/raw/master/doc/source/images/ray_logo.png
    :width: 250
-
 .. image:: https://docs.dask.org/en/stable/_images/dask_horizontal.svg
-   :align: center
    :width: 250


-In comparison with Lithops, both `Ray <https://ray.io/>`_ and `Dask <https://dask.org/>`_ leverage a cluster of nodes for distributed computing, while Lithops
-mainly leverages serverless functions. This restraint makes Ray much less flexible than Lithops in terms of scalability.
+`Ray <https://ray.io/>`_ and `Dask <https://dask.org/>`_ are distributed computing frameworks designed to operate on a **predefined cluster of nodes** (typically virtual machines). In contrast, Lithops relies on **serverless runtimes**, which allows for *elastic and fine-grained scaling* — including scaling to zero — with no idle infrastructure costs.

-Although Dask and Ray can scale and adapt the resources to the amount of computation needed, they don't scale to zero since
-they must keep a "head node" or "master" that controls the cluster and must be kept up.
+While Ray and Dask provide dynamic task scheduling and can autoscale within an IaaS environment, they always require a **centralized "head node" or controller** to manage the cluster, making them less suitable for ephemeral and cost-efficient cloud-native computing.

-In any case, the capacity and scalability of Ray or Dask in IaaS using virtual machines is not comparable to that of serverless functions.
+Additionally, the performance and elasticity of Ray and Dask in IaaS environments are not directly comparable to Lithops' **fully serverless model**, which benefits from the near-infinite parallelism offered by cloud functions.

 PySpark
 -------

 .. image:: https://upload.wikimedia.org/wikipedia/commons/thumb/f/f3/Apache_Spark_logo.svg/2560px-Apache_Spark_logo.svg.png
-   :align: center
    :width: 250

+`PySpark <https://spark.apache.org/docs/latest/api/python/>`_ is the Python interface for Apache Spark, a well-established distributed computing engine. Spark is typically deployed on a **static cluster of machines**, either on-premises or in cloud environments using HDFS or cloud-native file systems.

-Much like Ray or Dask, PySpark is a distributed computing framework that uses cluster technologies. PySpark provides Python bindings for Spark.
-Spark is designed to work with a fixed-size node cluster, and it is typically used to process data from on-prem HDFS
-and analyze it using SparkSQL and Spark DataFrame.
-
+PySpark is optimized for **batch analytics** using DataFrames and SparkSQL, but it lacks native integration with FaaS models. Its operational model is not inherently elastic and requires continuous management of a Spark cluster, which may not align with modern, fully managed, or serverless computing paradigms.

 Serverless Framework
 --------------------

 .. image:: https://cdn.diegooo.com/media/20210606183353/serverless-framework-icon.png
-   :align: center
    :width: 250

+`Serverless Framework <https://www.serverless.com/>`_ is a deployment toolchain designed primarily for **building and deploying serverless web applications**, especially on AWS, GCP, and Azure. It is widely used to manage HTTP APIs, event-driven services, and infrastructure-as-code (IaC) for cloud-native apps.

-Serverless Framework is a tool to develop serverless applications (mainly NodeJS) and deploy them seamlessly on AWS, GCP
-or Azure.
+Although both Lithops and Serverless Framework leverage **serverless functions**, their objectives are fundamentally different:
+
+- **Serverless Framework** focuses on application deployment (e.g., microservices, REST APIs).
+- **Lithops** targets **parallel and data-intensive workloads**, enabling large-scale execution of Python functions over scientific datasets, data lakes, and unstructured data in object storage.
+
+Summary
+-------

-Although both Serverless Framework and Lithops use serverless functions, their objective is completely different:
-Serverless Framework aims to provide an easy-to-use tool to develop applications related to web services, like HTTP APIs,
-while Lithops aims to develop applications related to highly parallel scientific computation and Big Data processing.
+Lithops stands out as a **cloud-native, serverless-first framework** purpose-built for **parallel computing, data analytics, and scientific workloads**. By abstracting away infrastructure management and providing built-in object storage integration, it delivers a unique balance of **simplicity**, **performance**, and **multi-cloud compatibility** — distinguishing it from traditional cluster-based frameworks and generic serverless tools alike.

docs/source/compute_config/kubernetes_rabbitmq.md

Lines changed: 5 additions & 5 deletions

@@ -4,7 +4,7 @@

 All of these changes are **ideal** for pipelines where launching **hundreds of parallel tasks as quickly as possible** is a critical requirement, in a fixed size heterogeneous cluster.

-### Changes of K8s RabbitMQ
+## Changes of K8s RabbitMQ

 * **Utilization of RabbitMQ:** Within this architecture, RabbitMQ is employed to launch group invocations in a single call, avoiding the need for multiple calls for each function execution. Additionally, it enables data exchange between the client and running pods, bypassing the Storage Backend as an intermediary, which is slower. This accelerates and streamlines communication significantly.

@@ -74,25 +74,25 @@ All of these tests consist of running 225 functions on a 2-node cluster, each wi

 In this scenario, it is evident that the invocation time is consistently reduced by a factor of **up to 5x** on cold start and **up to 7x** on warm start. This represents a significant enhancement for parallel function execution.

-#### Plot 1: Kubernetes K8s original.
+- Plot 1: Kubernetes K8s original.

 *Elapsed time = 16,9 sec.*

 ![Kubernetes K8s original plot](../images/plots_kubernetes/k8s_original_histogram.png)

-#### Plot 2: Kubernetes K8s original with master on Warm Start.
+- Plot 2: Kubernetes K8s original with master on Warm Start.

 *Elapsed time = 8,1 sec.*

 ![Kubernetes K8s original with Warm Start plot](../images/plots_kubernetes/k8s_original_warm_start_histogram.png)

-#### Plot 3: Kubernetes K8s RabbitMQ.
+- Plot 3: Kubernetes K8s RabbitMQ.

 *Elapsed time = 8 sec.*

 ![Kubernetes K8s RabbitMQ plot](../images/plots_kubernetes/rabbitmq_histogram.png)

-#### Plot 4: Kubernetes K8s RabbitMQ with workers on Warm Start.
+- Plot 4: Kubernetes K8s RabbitMQ with workers on Warm Start.

 *Elapsed time = 5,9 sec.*

docs/source/contributing.rst

Lines changed: 36 additions & 21 deletions

@@ -18,24 +18,39 @@ To contribute a patch

 1. Break your work into small, single-purpose patches if possible. It's much
    harder to merge in a large change with a lot of disjoint features.
 2. Submit the patch as a GitHub pull request against the master branch.
-3. Make sure that your code passes the unit tests.
-4. Make sure that your code passes the linter.
-5. Add new unit tests for your code.
-
-
-Unit testing
-------------
-
-To test that all is working as expected, run either:
-
-.. code::
-
-   $ lithops test
-
-
-.. code::
-
-   $ python3 -m lithops.tests.tests_main
-
-
-Please follow the guidelines in :ref:`testing` for more details.
+3. Make sure that your code passes the tests.
+4. Make sure that your code passes the linter. Install `flake8` with `pip3 install flake8` and run the next command until you don't see any linting error:
+   ```bash
+   flake8 lithops --count --max-line-length=180 --statistics --ignore W605,W503
+   ```
+5. Add new tests for your code.
+
+
+Testing
+-------
+
+To test that all is working as expected, you must install `pytest`, navigate to the tests folder `lithops/tests/`, and execute:
+```bash
+pytest -v
+```
+
+If you made changes to a specific backend, please run tests on that backend.
+For example, if you made changes to the AWS Lambda backend, execute the tests with:
+```bash
+pytest -v --backend aws_lambda --storage aws_s3
+```
+
+You can list all the available tests using:
+```bash
+pytest --collect-only
+```
+
+To run a specific test or group of tests, use the `-k` parameter, for example:
+```bash
+pytest -v --backend localhost --storage localhost -k test_map
+```
+
+To view all the Lithops logs during the tests, and in DEBUG mode, execute:
+```bash
+pytest -o log_cli=true --log-cli-level=DEBUG --backend localhost --storage localhost
+```
