Projects to optimize CPython 3.6
See also Projects to optimize CPython 3.7.
Complete or almost complete projects
- MERGED: Wordcode
- New bytecode format that allows fetching the opcode and oparg in a single 16-bit operation.
- FAT Python: PEP 509, PEP 510, PEP 511, fat and
- Owner: Victor Stinner.
- Speed-up: unknown :-(
- CPython build options for out-of-the box performance
- Owner: Alecsandru Patrascu
- Speed-up: unknown.
- MERGED: Change PyMem_Malloc to use PyObject_Malloc allocator?
- Owner: Victor Stinner
- Speed-up: up to 6% faster on fastpickle of perf.py (up to 22% faster on unpickle_list, according to Intel's run of perf.py).
- Speedup method calls 1.2x
- Globals / builtins cache
- Owner: Antoine Pitrou
- Speedup: 35% faster on a microbenchmark (LOAD_GLOBAL)
- ceval: Optimize list[int] (subscript) operation similarly to CPython 2.7
- Owners: Yury Selivanov, Zach Byrne
- Speed-up: up to 30% faster on a microbenchmark.
- Free list for single-digits ints
- Owners: Serhiy Storchaka, Yury Selivanov
- Speedup: up to 18% faster on a microbenchmark.
- Faster bit ops for single-digit positive longs
- Owner: Yury Selivanov
- Speedup: between 30% and 55% faster on a microbenchmark
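The speed-ups quoted for the items above come from microbenchmarks. Here is a minimal sketch of what such a measurement can look like for global lookups, using only the standard `timeit` and `dis` modules. The names `COUNTER`, `read_global`, `read_local`, `opnames` and `measure` are hypothetical, and this is not the benchmark used in the issues listed above:

```python
import dis
import timeit

COUNTER = 1  # module-level name, read through LOAD_GLOBAL


def read_global():
    # Each reference to COUNTER is a global lookup.
    return COUNTER + COUNTER


def read_local(counter=COUNTER):
    # The default argument binds the value to a local variable.
    return counter + counter


def opnames(func):
    """Return the set of opcode names used by func."""
    return {ins.opname for ins in dis.get_instructions(func)}


def measure(number=100_000):
    """Time both variants; absolute numbers depend on machine and build."""
    return {
        "global": timeit.timeit(read_global, number=number),
        "local": timeit.timeit(read_local, number=number),
    }


if __name__ == "__main__":
    print("uses LOAD_GLOBAL:", "LOAD_GLOBAL" in opnames(read_global))
    for name, seconds in measure().items():
        print(f"{name}: {seconds:.4f} s")
```

Timings from a loop this small are noisy, so the relative difference between the two variants is only indicative and should be checked over several runs.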
Closed:
- [CLOSED, REJECTED] ceval.c: implement fast path for integers with a single digit
- Owners: many authors :-)
- Speedup: up to 26% faster on a microbenchmark; the effect on macrobenchmarks is unclear. It is also unclear whether types other than int and float are slowed down.
- co_stacksize is calculated from unoptimized code
- FASTCALL: avoid creation of temporary tuple/dict when calling C and Python functions
- Add a new _PyObject_FastCall() function which avoids the creation of a tuple or dict for arguments
- Tuple creation is too slow
- C implementation of functools.lru_cache
- Change bytecode to optimize MAKE_FUNCTION, maybe also CALL_FUNCTION:
- See also the optimization on CALL_FUNCTION with keyword parameters, but it requires FAT Python: https://bugs.python.org/issue26802#msg263775
- More efficient and/or more compact bytecode?
- New peephole optimizer written in pure Python: bytecode.peephole_opt; requires PEP 511.
- Speed-up: probably negligible, and the Python optimizer is much slower than the C optimizer.
- INCA: Inline Caching meets Quickening in Python 3.3
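To illustrate the wordcode format listed among the merged projects, here is a rough sketch that splits a bytecode string into (opcode, oparg) pairs by reading one 16-bit word at a time. It assumes the layout used by CPython 3.6 and later, where each instruction is an opcode byte followed by an oparg byte, and it fixes little-endian byte order for determinism. The function name `decode_wordcode` is hypothetical; the interpreter itself does this in C, not through `struct`:

```python
import struct


def decode_wordcode(code):
    """Split bytecode into (opcode, oparg) pairs, one per 16-bit word."""
    if len(code) % 2:
        raise ValueError("wordcode length must be a multiple of 2")
    pairs = []
    # '<H' reads one unsigned 16-bit little-endian word at a time.
    for (word,) in struct.iter_unpack("<H", code):
        opcode = word & 0xFF  # low byte: first byte of the instruction
        oparg = word >> 8     # high byte: second byte of the instruction
        pairs.append((opcode, oparg))
    return pairs


if __name__ == "__main__":
    def sample(x):
        return x + 1

    # Opcode numbers differ between CPython versions, so the printed
    # values are only meaningful for the interpreter that produced them.
    for opcode, oparg in decode_wordcode(sample.__code__.co_code):
        print(opcode, oparg)
```

Recent CPython versions may interleave additional entries in `co_code`; in this sketch they would simply appear as extra words.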