|
| 1 | +# khiops-python Coding Guidelines |
| 2 | + |
| 3 | +## Quick Start |
| 4 | +```bash |
| 5 | +# Make sure you are a member of the pyKhiops GitHub project and that you have set up the SSH keys |
| 6 | +# for Git |
| 7 | + |
| 8 | +# Clone the repo |
| 9 | +git clone ssh://github.com/khiopsml/khiops-python |
| 10 | + |
| 11 | +# Set up the commit message template |
| 12 | +git config commit.template .git_commit_template |
| 13 | + |
| 14 | +# Install pre-commit and the pre-commit scripts |
| 15 | +pip install -U pre-commit |
| 16 | +pre-commit install |
| 17 | + |
| 18 | +# To run all tests |
| 19 | +python -m unittest |
| 20 | + |
| 21 | +# To run all test suites in a particular test file, for instance: |
| 22 | +python -m unittest tests/test_core.py |
| 23 | + |
| 24 | +# To run a particular test suite, for instance: |
| 25 | +python -m unittest tests.test_sklearn.KhiopsSklearnParameterPassingTests |
| 26 | +``` |
| 27 | + |
| 28 | +## Coding Style |
| 29 | + |
| 30 | +### Language |
| 31 | +`khiops-python` is coded in English (comments included). |
| 32 | + |
| 33 | +### Style Enforcement: Tools & Git Hooks |
| 34 | +The coding style of pyKhiops is almost PEP8 compliant. To enforce these guidelines we use three |
| 35 | +tools: |
| 36 | +- [Black](https://github.com/psf/black) |
| 37 | +- [isort](https://pycqa.github.io/isort/) |
| 38 | +- [Pylint](https://www.pylint.org/) |
| 39 | + |
| 40 | +We strongly recommend to automate the execution of these tools with [pre-commit] which setups Git |
| 41 | +hooks that run before each `git commit`. To create the hook scripts in your local repository copy |
| 42 | +execute at the root of the repo: |
| 43 | +```bash |
| 44 | +pip install pre-commit |
| 45 | +pre-commit install |
| 46 | +``` |
| 47 | +The `pre-commit` configuration is located at [./.pre-commit-config.yaml](./.pre-commit-config.yaml). |
| 48 | +If either `black` or `isort` fail the commit will be aborted and you must execute a `git add` on the |
| 49 | +files corrected by these tools and then re-commit. As for `pylint`, it is configured with a score |
| 50 | +threshold so that only code with serious errors will abort the commit. |
| 51 | + |
| 52 | +[pre-commit]: https://pre-commit.com |
| 53 | + |
| 54 | +### Manually executing the tools |
| 55 | +It is useful to manually run Black as the layout of the formatted code sometimes suggests |
| 56 | +improvements. Note that Black does not solve all issues (eg. PEP8 variable names, docstring line |
| 57 | +length), so you should check the issues reported by `pylint` and address all *errors* (code E). |
| 58 | + |
| 59 | +You can address other less serious issues reported by a full run of Pylint only if you have the |
| 60 | +time. The important point **is not to be a slave** of the linter. |
| 61 | + |
| 62 | +#### A Note on Line Length |
| 63 | + |
| 64 | +Black will shorten the lines to 88 chars, **except for long literal strings**. Since we enforce the |
| 65 | +line-length with `pylint` you must fix this by hand or it will pass neither the pre-commit nor the |
| 66 | +CI/CD workflows. To quickly find the lines to fix execute: |
| 67 | + |
| 68 | +``` |
| 69 | +pylint --disable=all --enable=line-too-long |
| 70 | +``` |
| 71 | + |
| 72 | +### Paragraph-Oriented Programming Style |
| 73 | +We encourage a programming style that increases readability at the cost of some verbosity. The idea |
| 74 | +is to create paragraphs of code, which consist in a header and body: |
| 75 | +- The body is the code to be executed. |
| 76 | +- The header is a comment describing what the code does at a high level of abstraction. |
| 77 | + |
| 78 | +Each paragraph is separated by a single empty line, except when it starts an indented code snippet |
| 79 | +(notably `if` expressions) or when it follows a docstring that has the same indentation level. |
| 80 | + |
| 81 | +Most of the code should go into paragraphs; exceptions are: |
| 82 | +- return statements |
| 83 | +- loop variable assignments (usually on `while` loops) |
| 84 | +- very short and obvious methods: the docstring suffices as header |
| 85 | + |
| 86 | +Additionally, the number of paragraphs should be kept as small as possible, because technically |
| 87 | +**commenting every line** conforms to this style. |
| 88 | + |
| 89 | +The advantages of this approach are: |
| 90 | +- It allows to quickly skim through the code while understanding the big picture |
| 91 | +- It forces to code coherent blocks of code |
| 92 | +- It forces to understand what you are coding |
| 93 | + |
| 94 | +and the disadvantages: |
| 95 | +- Comments need maintenance, especially when refactoring |
| 96 | +- Sometimes single lines of obvious code must be commented |
| 97 | + |
| 98 | +#### Example |
| 99 | +```python |
| 100 | +def value_count(values): |
| 101 | + """Prints the counts of each unique value in an array""" |
| 102 | + # Initialize the counts dictionary |
| 103 | + counts = {} |
| 104 | + |
| 105 | + # Count the unique occurrences in values |
| 106 | + for value in values: |
| 107 | + # If the value count exists: update it |
| 108 | + if value in counts: |
| 109 | + counts[value] += 1 |
| 110 | + |
| 111 | + # Otherwise create a new count of 1 |
| 112 | + else: |
| 113 | + counts[value] = 1 |
| 114 | + |
| 115 | + # Print the counts |
| 116 | + for value, count in counts.items(): |
| 117 | + print(f"{value}: {count}") |
| 118 | +``` |
| 119 | +Note that the first paragraph could be fused into the second. These kinds of decisions are rather |
| 120 | +subjective and code review may help to resolve them. |
| 121 | + |
| 122 | +### Writing Documentation |
| 123 | +See the documentation practices and tools [here](./doc/README.md). |
| 124 | + |
| 125 | +## Git & GitHub |
| 126 | +### Main Guiding Principles |
| 127 | +> The commit history must be the cleanest possible |
| 128 | +
|
| 129 | +> The `dev` branch must pass all tests |
| 130 | +
|
| 131 | +### Commits |
| 132 | +#### Commit Message |
| 133 | +The commit message title should answer the question *What do these changes do?* |
| 134 | +Respect the following format when writing it: |
| 135 | +- use English in imperative form (eg. "Add new feature", "Fix old bug") |
| 136 | +- start with upper-case |
| 137 | +- respect the git commit message length if possible |
| 138 | +- put no periods at the end and avoid any other punctuation |
| 139 | +- do not use Markdown |
| 140 | + |
| 141 | +The optional commit description should answer the questions *How are these changes made?* and *Why |
| 142 | +are these changes made?*. Use preferably bullet points to write this part. The description can |
| 143 | +eventually answer more thoroughly *What do these changes do?* but try create commits that are |
| 144 | +"atomic" enough so that their title suffices to answer this question. |
| 145 | + |
| 146 | +#### Rewriting History |
| 147 | +Rewriting commit history is only allowed in feature branches and `dev`. Rewriting the history in |
| 148 | +`dev` is very limited: it should be only done to serious problems in the history. It is forbidden to |
| 149 | +rebase before the last merge commit. |
| 150 | + |
| 151 | +A usual application of history rewriting is to eliminate commits named "Fix previous commit". In |
| 152 | +this case use `rebase` to rewrite the history as follows: |
| 153 | +```bash |
| 154 | +git rebase -i <REF-TO-FIX> # Usually REF-TO-FIX is either "develop", "HEAD~2" or a specific hash |
| 155 | +``` |
| 156 | +to interactively move and squash fix commits with the `fixup` or `squash` operator. Usually this |
| 157 | +operation will make your feature branch diverge from `origin` (GitHub repo), so a `git push --force` |
| 158 | +is necessary to update it. |
| 159 | + |
| 160 | +More generally, rewriting allows to obtain a commit history that is the cleanest possible. See [Git |
| 161 | +Rewriting History](https://git-scm.com/book/en/v2/Git-Tools-Rewriting-History) for more details. |
| 162 | + |
| 163 | +**Warning:** Do not rebase on a commit before the branching point of your feature branch as it may |
| 164 | +screw up your local repository by erasing merge commits. |
| 165 | + |
| 166 | +### Branches and Development Workflow |
| 167 | +We use feature branches that branch/merge from/to `dev`. The `main` branch is only used for final |
| 168 | +releases. |
| 169 | + |
| 170 | +The rules for the branches are: |
| 171 | +- Feature branches: You can commit and rewrite the branch at will. It should pass the CI/CD workflows |
| 172 | + before merging to `dev`. Note that only short tests are run in feature branches. |
| 173 | +- `dev` branch: You should avoid directly commit and rewrite this branch (and only maintainers |
| 174 | + can do it). This branch may not succeed all the CI/CD workflows since short and long tests are run. |
| 175 | + You may release beta versions from this branch. |
| 176 | +- `main` branch: Only merges from `dev` are accepted. All CI/CD workflows must succeed (this |
| 177 | + includes the long tests). |
| 178 | + |
| 179 | +As for the use of feature branches, GitHub makes it easy to implement the following workflow: |
| 180 | +- Create an issue for the particular feature/fix/etc. |
| 181 | +- Create a branch for the issue by clicking the "Create a branch" link in the issue's right column. |
| 182 | +- Develop, commit changes locally. Don't hesitate to rewrite the branch history if it is clearer. |
| 183 | + This will create a feature branch from `dev` with a name starting with the issue number followed |
| 184 | + by the issue title. |
| 185 | +- Once you have some commits (`WIP`s or otherwise), you may push your branch and create a pull |
| 186 | + request (PR) for the issue. |
| 187 | +- pull `dev` and rebase on it if there are new changes (see [below](#rebasing-on-develop)) |
| 188 | +- Push your feature branch to GitHub (if necessary, use `push --force` to rewrite the feature branch |
| 189 | + history) and ensure that it passes the CI/CD jobs. |
| 190 | +- Ask the PR be reviewed by another team member: |
| 191 | + - The reviewer creates comment threads with the issues. |
| 192 | + - Discuss the issues and/or make fixes to your code to address them; push the new code to GitHub. |
| 193 | + - The reviewer closes the thread once the issue is settled. |
| 194 | +- Merge once all review threads are closed and the reviewer LGTM-ed the PR. |
| 195 | + |
| 196 | +#### Branch Naming |
| 197 | +Use the automatic branch names that GitHub creates from an issue. Name issues succinctly so the |
| 198 | +branches have short names. If you name branches by yourself try informative and succinct. |
| 199 | + |
| 200 | +### Pull Requests |
| 201 | +This project's GitHub repository is configured to create merge commits for PR and to remove the |
| 202 | +feature branches by default after merging. This strategy has the following pros and cons: |
| 203 | + |
| 204 | +_Pros:_ |
| 205 | +- The merges of feature branches are tracked in the history. |
| 206 | +- The repository branch set is small and clean with no *stale* ones. |
| 207 | + |
| 208 | +_Cons:_ |
| 209 | +- Extra commits in history. |
| 210 | +- Non-linear `dev` and `main` history. |
| 211 | + |
| 212 | +### Rebasing on `dev` |
| 213 | +It may happen that the `dev` branch was updated while you are developing your feature branch. |
| 214 | +To avoid extra merge commits do the following to update your local copy of your feature branch: |
| 215 | +```bash |
| 216 | +# on my-feat-branch |
| 217 | +git stash # only when you have non-committed changes |
| 218 | +git switch dev |
| 219 | +git pull |
| 220 | +git switch my-feat-branch |
| 221 | +git rebase dev |
| 222 | +# fix any conflicts you may have |
| 223 | +git stash pop # only when you have non-committed changes |
| 224 | +``` |
| 225 | + |
| 226 | +## Dependencies |
| 227 | +### Package dependencies |
| 228 | +We should strive to minimize external package dependencies to minimize installation problems. The |
| 229 | +current dependency policy is: |
| 230 | +- `pykhiops.core` should only depend on python built-in modules. |
| 231 | +- `pykhiops.sklearn` should only depend on python built-in modules and the following mainstream |
| 232 | +data-science packages: |
| 233 | + - [Scikit-learn](https://scikit-learn.org/stable/) |
| 234 | + - [Pandas](https://pandas.pydata.org/) |
| 235 | + |
| 236 | +Note that these 4 packages already have a sizable number of dependencies. We should discuss |
| 237 | +thoroughly the pros and cons of any new external package dependency before adding it. |
| 238 | + |
| 239 | +### Development/Build dependencies |
| 240 | +For development dependencies (eg. `black`, `isort`, `sphinx`, `wrapt`, `furo`) we can be more |
| 241 | +carefree while still trying to not add too many dependencies. |
| 242 | + |
| 243 | +## Versioning |
| 244 | +We follow a non-standard `MAJOR.MINOR.PATCH.INCREMENT[-PRE_RELEASE]` versioning convention. The |
| 245 | +first three numbers `MAJOR.MINOR.PATCH` are the latest Khiops version that is compatible with the |
| 246 | +package. The number `INCREMENT` indicates the evolution of `khiops-python` followed by an optional |
| 247 | +`[-PRE_RELEASE` version for alpha, beta and release candidate releases (eg. `-b2`). |
| 248 | + |
| 249 | +## Releases |
| 250 | +The following is the check list to be done upon a major release: |
| 251 | +- Merge the `dev` branch into `main` |
| 252 | +- Tag the merge commit with the release version |
| 253 | +- Rebase the `dev` branch onto `main` |
| 254 | + - This is necessary to include the merge commit into master to calculate intermediary versions |
| 255 | + with versioneer |
| 256 | +- Make a GitHub Release (TBD) |
0 commit comments