feat: replace eager submodule imports with lazy loading#5110
dgandhi62 wants to merge 18 commits into
Replace eager "from . import <submodule>" statements in generated __init__.py files with a PEP 562 lazy loading mechanism using module-level __getattr__ and __dir__. This defers submodule imports until first access, dramatically reducing initial import time for large libraries like aws-cdk-lib.

Generated modules now emit:
- "import importlib as _importlib" (only when submodules exist)
- a _SUBMODULES set listing the submodule short names in sorted order
- __getattr__ that lazily imports and caches submodules
- __dir__ that returns [*__all__, *_SUBMODULES]

Assembly-loading modules are unaffected (they never have child submodules, as enforced by an existing assert in addPythonModule).

All access patterns remain backwards-compatible:
- import aws_cdk.aws_s3 (Python resolves directly)
- from aws_cdk import aws_s3 (triggers __getattr__)
- aws_cdk.aws_s3 (triggers __getattr__)
- from aws_cdk import * (triggers __getattr__ for each __all__ entry)
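A minimal, runnable sketch of the pattern described in this commit message, assuming a hypothetical package `demo_cdk` with three stand-in submodules written to a temporary directory. It is not actual jsii-pacmak output; it only mirrors the emitted items listed above and exercises the four access patterns. One deliberate deviation: `__dir__` here takes a set union, because this sketch's `__all__` already names the submodules and plain concatenation would list them twice in `dir()`.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Hypothetical generated __init__.py for "demo_cdk", modeled on the commit
# message (not actual jsii-pacmak output).
INIT_PY = textwrap.dedent('''
    import importlib as _importlib

    __all__ = ["Widget", "aws_s3", "aws_sns", "aws_sqs"]

    class Widget:
        pass

    # Submodule short names, written out in sorted order.
    _SUBMODULES = {"aws_s3", "aws_sns", "aws_sqs"}

    def __getattr__(name):
        # PEP 562: called only when normal module attribute lookup fails.
        if name in _SUBMODULES:
            module = _importlib.import_module(f"{__name__}.{name}")
            globals()[name] = module  # cache so later lookups skip __getattr__
            return module
        raise AttributeError(f"module {__name__!r} has no attribute {name!r}")

    def __dir__() -> "list[str]":
        # Union rather than concatenation: __all__ already names the submodules.
        return sorted({*__all__, *_SUBMODULES})
''')

root = Path(tempfile.mkdtemp())
pkg = root / "demo_cdk"
pkg.mkdir()
(pkg / "__init__.py").write_text(INIT_PY)
for sub in ("aws_s3", "aws_sns", "aws_sqs"):
    (pkg / f"{sub}.py").write_text(f"NAME = {sub!r}\n")
sys.path.insert(0, str(root))
importlib.invalidate_caches()

import demo_cdk

# Importing the package itself loads no submodules.
loaded_at_start = sorted(m for m in sys.modules if m.startswith("demo_cdk."))

import demo_cdk.aws_sns            # 1. resolved directly by the import system
s3 = demo_cdk.aws_s3               # 2. attribute access -> __getattr__
from demo_cdk import aws_sqs       # 3. from-import -> __getattr__
star_ns = {}
exec("from demo_cdk import *", star_ns)  # 4. star import fetches each __all__ name

listing = dir(demo_cdk)
```

Because `globals()` inside `__getattr__` is the package's own namespace, each submodule is resolved through `__getattr__` at most once; after that it is an ordinary module attribute.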
Add typing.TYPE_CHECKING guard with explicit submodule imports so pyright can statically see names listed in __all__. Quote the __dir__ return annotation to avoid reportIndexIssue when pyright evaluates with pythonVersion < 3.9.
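A sketch of why the guard costs nothing at runtime, assuming a hypothetical package `demo_guarded`. The `from . import aws_s3 as aws_s3` spelling inside the guard is an assumption about what the generator emits; the point is only that the interpreter skips the block, so the submodule stays unloaded until first attribute access, and the quoted `"list[str]"` annotation is stored as a plain string.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Hypothetical __init__.py for "demo_guarded": a TYPE_CHECKING block for
# static analyzers plus the lazy __getattr__ for the interpreter.
INIT_PY = textwrap.dedent('''
    import importlib as _importlib
    import typing as _typing

    __all__ = ["aws_s3"]
    _SUBMODULES = {"aws_s3"}

    if _typing.TYPE_CHECKING:
        # Seen by pyright/mypy; never executed, since TYPE_CHECKING is False at runtime.
        from . import aws_s3 as aws_s3

    def __getattr__(name):
        if name in _SUBMODULES:
            return _importlib.import_module(f"{__name__}.{name}")
        raise AttributeError(f"module {__name__!r} has no attribute {name!r}")

    def __dir__() -> "list[str]":  # quoted: never evaluated at runtime
        return sorted({*__all__, *_SUBMODULES})
''')

root = Path(tempfile.mkdtemp())
(root / "demo_guarded").mkdir()
(root / "demo_guarded" / "__init__.py").write_text(INIT_PY)
(root / "demo_guarded" / "aws_s3.py").write_text("LOADED = True\n")
sys.path.insert(0, str(root))
importlib.invalidate_caches()

import demo_guarded

loaded_after_import = "demo_guarded.aws_s3" in sys.modules  # guard was skipped
loaded_value = demo_guarded.aws_s3.LOADED                    # via __getattr__
loaded_after_access = "demo_guarded.aws_s3" in sys.modules
```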
- Add typing.TYPE_CHECKING guard with explicit submodule imports so pyright can statically see names in __all__
- Quote __dir__ return annotation to avoid reportIndexIssue
- Install __getattr__/__dir__ on the public module post-publication so attribute access works through the publication barrier
- Add on-demand type resolution in _reference_map.py to import submodules when the jsii kernel returns unknown types
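An in-memory sketch of the "publication barrier" and the post-publication install. `fake_publish` is a hypothetical stand-in for `publication.publish()` that mimics only the behavior this PR describes (swap a fresh `ModuleType` into `sys.modules`, copy just the `__all__` names); the real `publication` library may differ in details. The module name `demo_pub` and the helpers `_lazy_getattr`/`_lazy_dir` are illustrative, and a freshly built `ModuleType` stands in for the real `import_module` call.

```python
import sys
import types

def fake_publish(private: types.ModuleType) -> types.ModuleType:
    """Hypothetical stand-in for publication.publish(): a new module object
    that receives only the names listed in __all__."""
    public = types.ModuleType(private.__name__, private.__doc__)
    for name in private.__all__:
        if name in vars(private):
            setattr(public, name, vars(private)[name])
    public.__all__ = list(private.__all__)
    sys.modules[private.__name__] = public
    return public

private = types.ModuleType("demo_pub")
private.__all__ = ["Widget", "aws_s3"]
private.Widget = type("Widget", (), {})
sys.modules["demo_pub"] = private

public = fake_publish(private)

_SUBMODULES = {"aws_s3"}
created = []

def _lazy_getattr(name):
    # Stand-in for _importlib.import_module(f"demo_pub.{name}"): build the
    # submodule once and cache it on whichever module sys.modules now holds.
    if name in _SUBMODULES:
        module = types.ModuleType(f"demo_pub.{name}")
        created.append(name)
        sys.modules[module.__name__] = module
        setattr(sys.modules["demo_pub"], name, module)
        return module
    raise AttributeError(f"module 'demo_pub' has no attribute {name!r}")

def _lazy_dir() -> "list[str]":
    return sorted({*public.__all__, *_SUBMODULES})

# Defined on the private module only: the public module cannot see it.
setattr(private, "__getattr__", _lazy_getattr)
before_install = hasattr(public, "aws_s3")

# Install on the public module. setattr() rather than
# `public.__getattr__ = ...` avoids mypy's [method-assign] error.
setattr(public, "__getattr__", _lazy_getattr)
setattr(public, "__dir__", _lazy_dir)
first = public.aws_s3
second = public.aws_s3
listing = dir(public)
```

PEP 562 looks `__getattr__` and `__dir__` up in the module object's own `__dict__`, which is why a function living on the private module has no effect once `sys.modules` points at the public one.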
lives in an unloaded submodule), this function triggers the import so that
the type self-registers with the runtime.

The FQN format is: ``assembly_name.submodule.path.TypeName``
Unfortunately I don't think jsii FQNs map cleanly onto Python import paths like this.
Pretty sure that the author of a jsii module can map whatever submodule they want to whatever Python module path. Plus, there are types-in-types, and the last parts of the FQN might not be modules.
Here's an interesting question: CAN we deterministically find a Python type given a jsii FQN? (All the information should be in the assembly)
Because if we can, we can fully get rid of the registering-types-by-fqns-on-startup business that we have going on here!
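A sketch of the on-demand fallback from the diff above, written to sidestep the concern that FQNs don't map cleanly onto Python import paths: instead of deriving a module path from the FQN string, it consults an explicit table from jsii submodule FQN to Python module name (the kind of information the reviewer expects the assembly to contain). `_FQN_TO_MODULE`, `resolve_type`, `register`, and the `demo_types` package are all hypothetical; this is not the jsii runtime's `_reference_map.py`.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

root = Path(tempfile.mkdtemp())
pkg = root / "demo_types"
pkg.mkdir()
# Root module owns the registry and, like a lazy __init__.py, imports no submodules.
(pkg / "__init__.py").write_text(textwrap.dedent('''
    _TYPES = {}

    def register(fqn):
        def decorator(cls):
            _TYPES[fqn] = cls
            return cls
        return decorator
'''))
# Submodule: its classes self-register when (and only when) it is imported.
(pkg / "storage.py").write_text(textwrap.dedent('''
    from demo_types import register

    @register("demo-lib.storage.Bucket")
    class Bucket:
        pass
'''))
sys.path.insert(0, str(root))
importlib.invalidate_caches()

import demo_types

# Hypothetical table: jsii submodule FQN -> Python module that defines its types.
_FQN_TO_MODULE = {"demo-lib.storage": "demo_types.storage"}

def resolve_type(fqn: str) -> type:
    cls = demo_types._TYPES.get(fqn)
    if cls is not None:
        return cls
    # Longest matching submodule prefix, so nested type names still resolve.
    candidates = [p for p in _FQN_TO_MODULE if fqn.startswith(p + ".")]
    if candidates:
        importlib.import_module(_FQN_TO_MODULE[max(candidates, key=len)])
        cls = demo_types._TYPES.get(fqn)
        if cls is not None:
            return cls
    raise KeyError(f"Unknown type: {fqn}")

registered_before = "demo-lib.storage.Bucket" in demo_types._TYPES
Bucket = resolve_type("demo-lib.storage.Bucket")
```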
These tests assert that the generated Python code looks just-so. They are very brittle: as soon as we change anything about the implementation, these tests will break.
Instead, I'd rather test behavior: a Python library generated with this lazy lookup has the following behavior: XYZ (and in fact, probably just "it works the same as it did before" might be good enough 😉 )
Problem
`import aws_cdk` is slow in Python CDK apps because the generated `__init__.py` eagerly imports all ~300 child submodules via `from . import <submodule>`. This fixed cost hits every `cdk synth`, `cdk deploy`, Lambda cold start, and IDE analysis, regardless of how many services the app actually uses.
Solution
Replace the eager import block with PEP 562 lazy loading using module-level `__getattr__` and `__dir__`.
Before (generated code)
After (generated code)
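As a rough illustration of the before/after difference, the sketch below builds two hypothetical packages (`demo_eager`, `demo_lazy`) with the same three stand-in submodules: one uses the eager `from . import <submodule>` block, the other a PEP 562 `__getattr__`. It then records which submodules each one has loaded right after `import`.

```python
import importlib
import sys
import tempfile
from pathlib import Path

SUBMODULES = ["aws_lambda", "aws_s3", "aws_sqs"]

# Before: one eager import per child submodule.
EAGER_INIT = "".join(f"from . import {name}\n" for name in SUBMODULES)

# After: submodules are imported on first attribute access.
LAZY_INIT = '''
import importlib as _importlib

_SUBMODULES = {SUBMODULE_SET}

def __getattr__(name):
    if name in _SUBMODULES:
        return _importlib.import_module(f"{__name__}.{name}")
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
'''.replace("{SUBMODULE_SET}", "{" + ", ".join(repr(n) for n in SUBMODULES) + "}")

def make_package(root: Path, name: str, init_source: str) -> None:
    pkg = root / name
    pkg.mkdir()
    (pkg / "__init__.py").write_text(init_source)
    for sub in SUBMODULES:
        (pkg / f"{sub}.py").write_text("VALUE = 1\n")

def loaded_submodules(pkg_name: str) -> list:
    return sorted(m for m in sys.modules if m.startswith(pkg_name + "."))

root = Path(tempfile.mkdtemp())
make_package(root, "demo_eager", EAGER_INIT)
make_package(root, "demo_lazy", LAZY_INIT)
sys.path.insert(0, str(root))
importlib.invalidate_caches()

import demo_eager
import demo_lazy

eager_loaded = loaded_submodules("demo_eager")  # all three, paid up front
lazy_loaded = loaded_submodules("demo_lazy")    # none yet

_ = demo_lazy.aws_s3  # first access imports just this one
lazy_after_access = loaded_submodules("demo_lazy")
```

With ~300 submodules, as in `aws_cdk`, the eager variant pays for every one of them on each import. On a real package, `python -X importtime` is one way to see where that time goes.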
Issues Encountered and Resolved after the Design Doc
Five issues were discovered during implementation. Each required a specific fix beyond the core lazy loading pattern:
Issue 1: pyright rejects `list[str]` return type on `__dir__`
Problem: The pyright test configures `pythonVersion = "3.8"`. With that setting, `list[str]` is invalid as a runtime annotation (the builtin `list` wasn't subscriptable until Python 3.9).
Fix: Quote the return type (`-> "list[str]"`) so it's a forward reference that is not evaluated at runtime.
Issue 2: pyright flags submodule names in `__all__` as undefined
Problem: Pyright performs static analysis. It can't see that `__getattr__` will resolve submodule names at runtime, so it reports `reportUnsupportedDunderAll` for every submodule listed in `__all__`.
Fix: Emit a `typing.TYPE_CHECKING` guard with explicit re-exports. `TYPE_CHECKING` is `True` for static analyzers but `False` at runtime, so the guard adds no cost to lazy loading.
Issue 3: `publication.publish()` breaks `__getattr__` on the public module
Problem: `publication.publish()` replaces the module in `sys.modules` with a new `ModuleType` object that only copies names from `__all__`. It does NOT copy `__getattr__` or `__dir__`. Since our lazy loading code is defined after `publication.publish()`, it lives on the original (now-private) module object.
Fix: After defining `__getattr__` and `__dir__`, explicitly install them on the public module. Defining them before `publish()` would not help (from my understanding), because `publish()` only copies names from `__all__`, so it would not transfer `__getattr__` either.
Issue 4: jsii runtime can't resolve types from unloaded submodules
Problem: With eager imports, all types were registered at import time. With lazy loading, if the jsii kernel returns a type from a submodule that hasn't been imported yet (e.g., a callback returns an object whose type lives in `cdk16625.donotimport`), the runtime raises `Unknown type`.
Fix: Added on-demand type resolution in `_reference_map.py`. When a type FQN isn't found in the registries, the runtime:
… `JSIIAssembly.load()`)
Issue 5: mypy rejects direct assignment to `__getattr__` on a module
Problem: `_sys.modules[__name__].__getattr__ = __getattr__` triggers mypy's `Cannot assign to a method [method-assign]` because mypy treats `__getattr__` as a special method on `ModuleType`.
Fix: Use `setattr()` instead of direct assignment. It has the same runtime effect and bypasses mypy's check.
Testing
`lazy-imports.test.ts` covering all generated code patterns
… `cdk16625.donotimport`) to verify on-demand type resolution
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.