Skip to content

Commit 3f58477

Browse files
Merge pull request #148 from KhiopsML/145-prepare-release-10200
Prepare Release 10.2.0.0
2 parents 9b95df7 + 26c5771 commit 3f58477

75 files changed

Lines changed: 536 additions & 185 deletions

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

.gitignore

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -59,5 +59,6 @@ doc/samples/my_classifier
5959
doc/core/generated
6060
doc/sklearn/generated
6161
doc/internal/generated
62+
doc/tools/generated
6263
doc/_build/
6364
doc/khiops-python-tutorial

CHANGELOG.md

Lines changed: 42 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -1,10 +1,42 @@
11
# Changelog
22

33
**Notes:**
4-
- Versioning: The versioning scheme depends on the Khiops version supported (first 2 digits) and a
5-
pyKhiops correlative (third digit). So 10.0.4 = 4th version developed that supports Khiops 10.0.
4+
- Versioning: The versioning scheme depends on the Khiops version supported (first 3 digits) and
5+
a `khiops-python` correlative (4th digit).
6+
- Example: 10.2.1.4 is the 5th version that supports khiops 10.2.1.
67
- Internals: Changes in *Internals* sections are unlikely to be of interest for data scientists.
78

9+
## 10.2.0.1 - 2023-02-15
10+
Note: This release marks the open sourcing of Khiops:
11+
- The `khiops-python` replaces the old `pykhiops` package. We recommend to uninstall `pykhiops`
12+
before installing `khiops-python`. More information at the [Khiops website][khiops].
13+
- `khiops-python` uses a new four digit versioning convention.
14+
- The `khiops` conda package is available for many environments. See the [Khiops website][khiops]
15+
for more information.
16+
17+
### Added
18+
- General:
19+
- `khiops-python` is now available with conda `khiops` package. This package bundles
20+
`khiops-python` and the Khiops binaries so no system-wide Khiops installation is needed. More
21+
information at the [Khiops website][khiops].
22+
- `sklearn`
23+
- Estimator classes can now be trained from Numpy arrays in single-table mode.
24+
- `core`
25+
- `stdout_file_path` and `stderr_file_path` parameters for `khiops.core.api` functions. These
26+
parameters allow to save the stdout/stderr output of the Khiops execution.
27+
28+
### Changed
29+
- `sklearn`
30+
- Estimator classes now comply with scikit-learn standards.
31+
- `core`
32+
- The JSON initialization of `AnalysisResults`, `CoclusteringResults` and its component classes
33+
is coherent with the empty initialization.
34+
35+
### Fixed
36+
- `core`
37+
- Wrong default discretization and grouping methods in `train_predictor` and `train_recoder`.
38+
- `KhiopsDockerRunner` checking the existence `shared_dir` on remote paths.
39+
840
## 10.1.3 - 2023-06-14
941

1042
### Added
@@ -21,11 +53,11 @@
2153
- User-provided scenario prologue is now taken into account into the tasks.
2254

2355
### Changed
56+
- General:
57+
- License has been updated to BSD-3 Clear.
2458
- `sklearn`:
2559
- ``auto_sort`` replaces ``internal_sort`` to control input table sorting in estimators.
2660
- The multi-table documentation has been streamlined to be more precise and clearer.
27-
- General:
28-
- License has been updated to BSD-3 Clear.
2961

3062
### Deprecated
3163
- `sklearn`:
@@ -74,14 +106,14 @@
74106
- `pk-status` script to check the pyKhiops installation.
75107

76108
### Fixed
109+
- General:
110+
- Code samples scripts not being installed: They are located in `<pykhiops_install_dir>/samples`.
77111
- `sklearn`
78112
- `KhiopsCoclustering` raising an exception instead of a warning when no informative coclustering
79113
was found.
80114
- `internal_sort` parameter being ignored in `KhiopsCoclustering`.
81115
- `core`
82116
- `detect_format` failing when the Khiops log had extra output.
83-
- General:
84-
- Code samples scripts not being installed: They are located in `<pykhiops_install_dir>/samples`.
85117

86118
## 10.1.0 - 2022-09-20
87119

@@ -160,6 +192,8 @@
160192
transformations for Khiops.
161193

162194
### Changed
195+
- General:
196+
- Improved documentation completeness and layout.
163197
- `sklearn`
164198
- Estimators do not depend anymore on local files. This fixes many issues including those related
165199
to serialization.
@@ -168,8 +202,6 @@
168202
- Functions `deploy_coclustering` and `deploy_model_for_metrics` are moved from `core.api` to
169203
`core.helpers`. The latter module will contain non-basic functionality, whereas `core.api` will
170204
contain only the official Khiops API.
171-
- Other:
172-
- Improved documentation completeness and layout.
173205

174206
### Deprecated
175207
- `sklearn`:
@@ -353,3 +385,5 @@ Internals:
353385

354386
### Added
355387
- Sources (first commit)
388+
389+
[khiops]: https://khiops.org

CONTRIBUTING.md

Lines changed: 256 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,256 @@
1+
# khiops-python Coding Guidelines
2+
3+
## Quick Start
4+
```bash
5+
# Make sure you are a member of the pyKhiops GitHub project and that you have set up the SSH keys
6+
# for Git
7+
8+
# Clone the repo
9+
git clone ssh://github.com/khiopsml/khiops-python
10+
11+
# Set up the commit message template
12+
git config commit.template .git_commit_template
13+
14+
# Install pre-commit and the pre-commit scripts
15+
pip install -U pre-commit
16+
pre-commit install
17+
18+
# To run all tests
19+
python -m unittest
20+
21+
# To run all test suites in a particular test file, for instance:
22+
python -m unittest tests/test_core.py
23+
24+
# To run a particular test suite, for instance:
25+
python -m unittest tests.test_sklearn.KhiopsSklearnParameterPassingTests
26+
```
27+
28+
## Coding Style
29+
30+
### Language
31+
`khiops-python` is coded in English (comments included).
32+
33+
### Style Enforcement: Tools & Git Hooks
34+
The coding style of pyKhiops is almost PEP8 compliant. To enforce these guidelines we use three
35+
tools:
36+
- [Black](https://github.com/psf/black)
37+
- [isort](https://pycqa.github.io/isort/)
38+
- [Pylint](https://www.pylint.org/)
39+
40+
We strongly recommend to automate the execution of these tools with [pre-commit] which setups Git
41+
hooks that run before each `git commit`. To create the hook scripts in your local repository copy
42+
execute at the root of the repo:
43+
```bash
44+
pip install pre-commit
45+
pre-commit install
46+
```
47+
The `pre-commit` configuration is located at [./.pre-commit-config.yaml](./.pre-commit-config.yaml).
48+
If either `black` or `isort` fail the commit will be aborted and you must execute a `git add` on the
49+
files corrected by these tools and then re-commit. As for `pylint`, it is configured with a score
50+
threshold so that only code with serious errors will abort the commit.
51+
52+
[pre-commit]: https://pre-commit.com
53+
54+
### Manually executing the tools
55+
It is useful to manually run Black as the layout of the formatted code sometimes suggests
56+
improvements. Note that Black does not solve all issues (eg. PEP8 variable names, docstring line
57+
length), so you should check the issues reported by `pylint` and address all *errors* (code E).
58+
59+
You can address other less serious issues reported by a full run of Pylint only if you have the
60+
time. The important point **is not to be a slave** of the linter.
61+
62+
#### A Note on Line Length
63+
64+
Black will shorten the lines to 88 chars, **except for long literal strings**. Since we enforce the
65+
line-length with `pylint` you must fix this by hand or it will pass neither the pre-commit nor the
66+
CI/CD workflows. To quickly find the lines to fix execute:
67+
68+
```
69+
pylint --disable=all --enable=line-too-long
70+
```
71+
72+
### Paragraph-Oriented Programming Style
73+
We encourage a programming style that increases readability at the cost of some verbosity. The idea
74+
is to create paragraphs of code, which consist in a header and body:
75+
- The body is the code to be executed.
76+
- The header is a comment describing what the code does at a high level of abstraction.
77+
78+
Each paragraph is separated by a single empty line, except when it starts an indented code snippet
79+
(notably `if` expressions) or when it follows a docstring that has the same indentation level.
80+
81+
Most of the code should go into paragraphs; exceptions are:
82+
- return statements
83+
- loop variable assignments (usually on `while` loops)
84+
- very short and obvious methods: the docstring suffices as header
85+
86+
Additionally, the number of paragraphs should be kept as small as possible, because technically
87+
**commenting every line** conforms to this style.
88+
89+
The advantages of this approach are:
90+
- It allows to quickly skim through the code while understanding the big picture
91+
- It forces to code coherent blocks of code
92+
- It forces to understand what you are coding
93+
94+
and the disadvantages:
95+
- Comments need maintenance, especially when refactoring
96+
- Sometimes single lines of obvious code must be commented
97+
98+
#### Example
99+
```python
100+
def value_count(values):
101+
"""Prints the counts of each unique value in an array"""
102+
# Initialize the counts dictionary
103+
counts = {}
104+
105+
# Count the unique occurrences in values
106+
for value in values:
107+
# If the value count exists: update it
108+
if value in counts:
109+
counts[value] += 1
110+
111+
# Otherwise create a new count of 1
112+
else:
113+
counts[value] = 1
114+
115+
# Print the counts
116+
for value, count in counts.items():
117+
print(f"{value}: {count}")
118+
```
119+
Note that the first paragraph could be fused into the second. These kinds of decisions are rather
120+
subjective and code review may help to resolve them.
121+
122+
### Writing Documentation
123+
See the documentation practices and tools [here](./doc/README.md).
124+
125+
## Git & GitHub
126+
### Main Guiding Principles
127+
> The commit history must be the cleanest possible
128+
129+
> The `dev` branch must pass all tests
130+
131+
### Commits
132+
#### Commit Message
133+
The commit message title should answer the question *What do these changes do?*
134+
Respect the following format when writing it:
135+
- use English in imperative form (eg. "Add new feature", "Fix old bug")
136+
- start with upper-case
137+
- respect the git commit message length if possible
138+
- put no periods at the end and avoid any other punctuation
139+
- do not use Markdown
140+
141+
The optional commit description should answer the questions *How are these changes made?* and *Why
142+
are these changes made?*. Use preferably bullet points to write this part. The description can
143+
eventually answer more thoroughly *What do these changes do?* but try create commits that are
144+
"atomic" enough so that their title suffices to answer this question.
145+
146+
#### Rewriting History
147+
Rewriting commit history is only allowed in feature branches and `dev`. Rewriting the history in
148+
`dev` is very limited: it should be only done to serious problems in the history. It is forbidden to
149+
rebase before the last merge commit.
150+
151+
A usual application of history rewriting is to eliminate commits named "Fix previous commit". In
152+
this case use `rebase` to rewrite the history as follows:
153+
```bash
154+
git rebase -i <REF-TO-FIX> # Usually REF-TO-FIX is either "develop", "HEAD~2" or a specific hash
155+
```
156+
to interactively move and squash fix commits with the `fixup` or `squash` operator. Usually this
157+
operation will make your feature branch diverge from `origin` (GitHub repo), so a `git push --force`
158+
is necessary to update it.
159+
160+
More generally, rewriting allows to obtain a commit history that is the cleanest possible. See [Git
161+
Rewriting History](https://git-scm.com/book/en/v2/Git-Tools-Rewriting-History) for more details.
162+
163+
**Warning:** Do not rebase on a commit before the branching point of your feature branch as it may
164+
screw up your local repository by erasing merge commits.
165+
166+
### Branches and Development Workflow
167+
We use feature branches that branch/merge from/to `dev`. The `main` branch is only used for final
168+
releases.
169+
170+
The rules for the branches are:
171+
- Feature branches: You can commit and rewrite the branch at will. It should pass the CI/CD workflows
172+
before merging to `dev`. Note that only short tests are run in feature branches.
173+
- `dev` branch: You should avoid directly commit and rewrite this branch (and only maintainers
174+
can do it). This branch may not succeed all the CI/CD workflows since short and long tests are run.
175+
You may release beta versions from this branch.
176+
- `main` branch: Only merges from `dev` are accepted. All CI/CD workflows must succeed (this
177+
includes the long tests).
178+
179+
As for the use of feature branches, GitHub makes it easy to implement the following workflow:
180+
- Create an issue for the particular feature/fix/etc.
181+
- Create a branch for the issue by clicking the "Create a branch" link in the issue's right column.
182+
- Develop, commit changes locally. Don't hesitate to rewrite the branch history if it is clearer.
183+
This will create a feature branch from `dev` with a name starting with the issue number followed
184+
by the issue title.
185+
- Once you have some commits (`WIP`s or otherwise), you may push your branch and create a pull
186+
request (PR) for the issue.
187+
- pull `dev` and rebase on it if there are new changes (see [below](#rebasing-on-develop))
188+
- Push your feature branch to GitHub (if necessary, use `push --force` to rewrite the feature branch
189+
history) and ensure that it passes the CI/CD jobs.
190+
- Ask the PR be reviewed by another team member:
191+
- The reviewer creates comment threads with the issues.
192+
- Discuss the issues and/or make fixes to your code to address them; push the new code to GitHub.
193+
- The reviewer closes the thread once the issue is settled.
194+
- Merge once all review threads are closed and the reviewer LGTM-ed the PR.
195+
196+
#### Branch Naming
197+
Use the automatic branch names that GitHub creates from an issue. Name issues succinctly so the
198+
branches have short names. If you name branches by yourself try informative and succinct.
199+
200+
### Pull Requests
201+
This project's GitHub repository is configured to create merge commits for PR and to remove the
202+
feature branches by default after merging. This strategy has the following pros and cons:
203+
204+
_Pros:_
205+
- The merges of feature branches are tracked in the history.
206+
- The repository branch set is small and clean with no *stale* ones.
207+
208+
_Cons:_
209+
- Extra commits in history.
210+
- Non-linear `dev` and `main` history.
211+
212+
### Rebasing on `dev`
213+
It may happen that the `dev` branch was updated while you are developing your feature branch.
214+
To avoid extra merge commits do the following to update your local copy of your feature branch:
215+
```bash
216+
# on my-feat-branch
217+
git stash # only when you have non-committed changes
218+
git switch dev
219+
git pull
220+
git switch my-feat-branch
221+
git rebase dev
222+
# fix any conflicts you may have
223+
git stash pop # only when you have non-committed changes
224+
```
225+
226+
## Dependencies
227+
### Package dependencies
228+
We should strive to minimize external package dependencies to minimize installation problems. The
229+
current dependency policy is:
230+
- `pykhiops.core` should only depend on python built-in modules.
231+
- `pykhiops.sklearn` should only depend on python built-in modules and the following mainstream
232+
data-science packages:
233+
- [Scikit-learn](https://scikit-learn.org/stable/)
234+
- [Pandas](https://pandas.pydata.org/)
235+
236+
Note that these 4 packages already have a sizable number of dependencies. We should discuss
237+
thoroughly the pros and cons of any new external package dependency before adding it.
238+
239+
### Development/Build dependencies
240+
For development dependencies (eg. `black`, `isort`, `sphinx`, `wrapt`, `furo`) we can be more
241+
carefree while still trying to not add too many dependencies.
242+
243+
## Versioning
244+
We follow a non-standard `MAJOR.MINOR.PATCH.INCREMENT[-PRE_RELEASE]` versioning convention. The
245+
first three numbers `MAJOR.MINOR.PATCH` are the latest Khiops version that is compatible with the
246+
package. The number `INCREMENT` indicates the evolution of `khiops-python` followed by an optional
247+
`[-PRE_RELEASE` version for alpha, beta and release candidate releases (eg. `-b2`).
248+
249+
## Releases
250+
The following is the check list to be done upon a major release:
251+
- Merge the `dev` branch into `main`
252+
- Tag the merge commit with the release version
253+
- Rebase the `dev` branch onto `main`
254+
- This is necessary to include the merge commit into master to calculate intermediary versions
255+
with versioneer
256+
- Make a GitHub Release (TBD)

LICENSE.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,6 @@
11
The Clear BSD License
22

3-
Copyright (c) 2023 Orange S.A.
3+
Copyright (c) 2024 Orange S.A.
44
All rights reserved.
55

66
Redistribution and use in source and binary forms, with or without

0 commit comments

Comments
 (0)