GreenCodeAnalyzer is a static code analysis tool that identifies energy-inefficient patterns in Python code and suggests optimizations to reduce energy consumption. By shifting energy efficiency concerns “left” (earlier) in the development process, it helps developers make more sustainable coding decisions from the start.
Features
- Static Energy Analysis: Analyzes Python code without executing it to detect potential energy hotspots
- Visual Code Annotations: VS Code extension that provides visual feedback with highlighted energy smells
- Optimization Suggestions: Provides specific recommendations to make code more energy-efficient
- Multiple Rule Detection: Covers various energy-inefficient patterns common in data science and ML code
Installation & Usage
Installing the Extension from VS Code Marketplace
You can install the GreenCodeAnalyzer extension directly from the VS Code Marketplace:
- Open VS Code
- Go to the Extensions view by clicking on the Extensions icon in the Activity Bar on the side of the window, or press `Ctrl+Shift+X`
- Search for “GreenCodeAnalyzer” and click on the install button.
Alternatively, you can install it from the VS Code Marketplace website.
Using the Extension
Once installed, you can analyze your Python code for energy inefficiencies:
- Open a Python file in VS Code
- Use one of the following methods to run the analyzer:
  - Press `Ctrl+Shift+P` to open the Command Palette, then type and select “GreenCodeAnalyzer: Run Analyzer”
  - Right-click in the editor and select “Run GreenCodeAnalyzer” from the context menu
The analysis results will appear as decorations in your code editor, highlighting potential energy inefficiencies with suggestions for improvement.
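For example, a file like the following hypothetical snippet (an illustration only, not part of the extension) contains two of the patterns listed under Supported Rules, an element-wise operation inside a for loop and Pandas chain indexing, so both lines would be candidates for annotation:

```python
import numpy as np
import pandas as pd

arr = np.arange(1_000_000)
df = pd.DataFrame({"one": [1.0, 2.0]}, index=["two", "three"])

# Energy smell: element-wise operation performed in a Python-level loop
# instead of a single vectorized call such as `arr + 1`.
result = np.empty_like(arr)
for i in range(len(arr)):
    result[i] = arr[i] + 1

# Energy smell: chain indexing, which Pandas resolves as two separate lookups.
value = df["one"]["two"]
```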
To clear the analysis markers:
- Press `Ctrl+Shift+P` and select “GreenCodeAnalyzer: Clear Gutters”
- Or right-click and select “Clear GreenCodeAnalyzer Gutters”
Interpreting Results
Each detected code smell includes:
- A description of the energy inefficiency
- A specific recommendation for optimization
By integrating seamlessly with the VS Code interface, this extension ensures that developers can quickly identify and fix inefficient code without leaving their workflow.
Supported Rules
| Rule | Description | Libraries | Impact | Optimization | Source |
|---|---|---|---|---|---|
| Batch Matrix Multiplication | When performing multiple matrix multiplications on a batch of matrices, use optimized batch operations rather than separate operations in loops. | NumPy, PyTorch, TensorFlow | GPUs thrive on parallel operations over large batches. Small, sequential operations waste cycles and keep the hardware active longer than necessary. Batch matrix multiplication instead leverages vectorized execution. | NumPy: `numpy.matmul(batch, matrices)`<br>PyTorch: `torch.bmm(batch_matrices1, batch_matrices2)`<br>TensorFlow: `tf.linalg.matmul(batch_matrices1, batch_matrices2)` | |
| Blocking Data Loader | Avoid data loading strategies that stall GPU execution (e.g., single-process or sequential data loading). | PyTorch | If the `DataLoader` is set up without sufficient concurrency (`num_workers=0`) or uses blocking I/O, the GPU may remain idle while waiting for data. Asynchronous data loading keeps the GPU busy more consistently, reducing overall epoch time and energy. | Use `num_workers > 0` in `DataLoader`. Consider enabling `pin_memory=True` if data is often loaded from CPU memory to the GPU. For advanced scenarios, use background threads or prefetch queues. | Azzoug, A. (n.d.). GreenPyData. GitHub. https://github.com/AghilesAzzoug/GreenPyData |
| Broadcasting | Normally, when you want to perform operations like addition and multiplication, you need to ensure that the operands’ shapes match. Tiling can be used to match shapes, but it stores intermediate results. | TensorFlow | Broadcasting performs implicit tiling, which makes the code shorter and more memory-efficient since the result of the tiling operation does not need to be stored. | `a = tf.constant([[1., 2.], [3., 4.]]); c = a + tf.constant([1., 2.])` (the 1-D tensor is broadcast across the rows of `a` without explicit tiling) | Hynn, S. (n.d.). Broadcasting feature not used. DSLinter. https://hynn01.github.io/dslinter/posts/codesmells/18-broadcasting-feature-not-used/<br>Kiani, V. (n.d.). EffectiveTensorflow. GitHub. https://github.com/vahidk/EffectiveTensorflow?tab=readme-ov-file#broadcast |
| Calculating Gradients | When performing inference (i.e., a forward pass without training or backpropagation), PyTorch tracks operations for autograd by default; TensorFlow tracks them when explicitly instructed to. | PyTorch, TensorFlow | Autograd graph tracking increases memory usage and computational cost. Disabling it during inference leads to faster execution, lower energy consumption, and reduced VRAM usage, which is particularly beneficial for GPUs and large models. | PyTorch: `output = model(input)` tracks gradients by default; wrap inference in `with torch.no_grad(): output = model(input)`.<br>TensorFlow: avoid `with tf.GradientTape():` blocks when gradients are not needed for inference. | PyTorch. (n.d.). torch.no_grad. https://pytorch.org/docs/stable/generated/torch.no_grad.html<br>Stack Overflow. (2022, April 15). What is the purpose of with torch.no_grad? https://stackoverflow.com/questions/72504734/what-is-the-purpose-of-with-torch-no-grad |
| Chain Indexing | Chain indexing refers to expressions like `df["one"]["two"]`, which Pandas treats as two operations: first `df["one"]` is called, then `["two"]`. | Pandas | Performing many calls leads to excessive memory allocations and CPU-intensive Python interpreter overhead, which can result in slow and energy-consuming code. | `df.loc[:, ("one", "two")]` performs only a single call. | Hynn, S. (n.d.). Chain indexing. DSLinter. https://hynn01.github.io/dslinter/posts/codesmells/12-chain-indexing/ |
| Conditional Operations | Applying a conditional operation to an array, tensor, or dataframe element by element inside for loops. | NumPy, Pandas, PyTorch, TensorFlow | Doing these operations in for loops leads to inefficient branching and repeated calculations. | NumPy: `arr = np.where(arr > 5, arr, 0)`<br>Pandas: `df['column'].where(df['column'] > 5, 0)`<br>PyTorch: `torch.where(tensor > 5, tensor, torch.zeros_like(tensor))`<br>TensorFlow: `tf.where(tensor > 5, tensor, tf.zeros_like(tensor))` | Hynn, S. (n.d.). Unnecessary iteration. DSLinter. https://hynn01.github.io/dslinter/posts/codesmells/2-unnecessary-iteration/ |
| Data Parallelization | Refrain from wrapping models in `torch.nn.DataParallel` when `torch.nn.parallel.DistributedDataParallel` (DDP) is superior, even on a single node with multiple GPUs. | PyTorch | `DataParallel` uses a single process to manage multiple GPU replicas, which can result in significant overhead, especially for gradient synchronization on large models. `DistributedDataParallel` creates one process per GPU (or process group) and provides more efficient communication backends (e.g., NCCL), typically yielding better throughput and lower CPU overhead, and therefore lower energy consumption. | Wrap your model with `DistributedDataParallel` (DDP), even on a single node, rather than using `DataParallel`. | Azzoug, A. (n.d.). GreenPyData. GitHub. https://github.com/AghilesAzzoug/GreenPyData |
| Element-Wise Operations | Performing an element-wise operation on an array, tensor, or dataframe inside for loops. | NumPy, PyTorch, TensorFlow | Doing these operations in for loops leads to inefficient branching and repeated calculations. | Element-wise: `arr + 1`, `np.add(arr1, arr2)`, `df['column'] + 1`, `tensor + 1`, `torch.add(tensor1, tensor2)`, `tf.add(tensor1, tensor2)`<br>Mapping: `np.vectorize(func)(arr)`, `df['column'].apply(func)`, `torch.vmap(func)(tensor)`, `tf.map_fn(func, tensor)`, `tf.vectorized_map(func, tensor)` | Hynn, S. (n.d.). Unnecessary iteration. DSLinter. https://hynn01.github.io/dslinter/posts/codesmells/2-unnecessary-iteration/ |
| Excessive GPU Transfers | Frequently moving data between CPU and GPU (e.g., calling `.cpu()` and then `.cuda()` repeatedly) without necessity. | PyTorch | This frequent transfer of data produces high overhead. | Keep tensors in GPU memory throughout operations, or batch transfers to minimize overhead. | Balaprakash, P., et al. (2019). Adaptive Methods for Real-Time Transfer Learning on GPUs. IEEE Transactions on Parallel and Distributed Systems. |
| Excessive Training | Continuing to train a model beyond the point where validation metrics stop improving. | PyTorch, TensorFlow, SciKit-Learn | Overtraining wastes GPU/CPU cycles with diminishing returns. | Implement early stopping or define a convergence criterion to halt training once metrics plateau. | Caruana, R., et al. (2001). Overfitting in neural nets: Backpropagation, conjugate gradient, and early stopping. Advances in Neural Information Processing Systems. |
| Filter Operations | Performing a filter operation on an array, tensor, or dataframe inside for loops. | NumPy, Pandas, PyTorch, TensorFlow | Boolean indexing or masking allows efficient filtering, avoiding iterative checks on each element. | NumPy: `arr[arr > 5]`, `arr[np.logical_and(arr > 0, arr < 10)]`<br>Pandas: `df[df['column'] > 5]`<br>PyTorch: `tensor[tensor > 5]`, `tensor[torch.logical_and(tensor > 0, tensor < 10)]`, `torch.masked_select(tensor, tensor > 5)`<br>TensorFlow: `tf.boolean_mask(tensor, tensor > 5)` | Hynn, S. (n.d.). Unnecessary iteration. DSLinter. https://hynn01.github.io/dslinter/posts/codesmells/2-unnecessary-iteration/ |
| Ignoring Inplace Ops | Failing to use the in-place variants of PyTorch operations (e.g., `add_`, `mul_`, `relu_`) leads to additional memory allocations and higher overhead. | PyTorch | PyTorch (and most deep learning frameworks) stores tensors and gradients in memory. Creating new tensors for every operation triggers more frequent memory allocations, which consume additional CPU/GPU cycles and can cause extra garbage collection. This overhead translates to higher energy usage. | Use in-place operations (`op_()`) where they do not break gradient flow or produce unexpected side effects. | Paszke, A., et al. (2019). PyTorch: An Imperative Style, High-Performance Deep Learning Library. Advances in Neural Information Processing Systems. |
| Inefficient Caching of Common Arrays | Recreating the same arrays or tensors (e.g., repeating `np.arange(0, n)` in a loop) instead of storing or caching them. | NumPy, PyTorch, TensorFlow | Repeated creation consumes CPU/GPU cycles and memory, increasing energy usage. | Cache repeated arrays or use partial function application to avoid the overhead of repeated creation. | Breshears, C. (2015). The Art of Concurrency: A Thread Monkey’s Guide to Writing Parallel Applications. O’Reilly Media. |
| Inefficient Data Loader Transfer | Refrain from using standard (pageable) CPU memory for large data loads when transferring to the GPU. | PyTorch | When transferring data from CPU to GPU, pinned (page-locked) memory can speed up and streamline transfers in CUDA. Non-pinned memory can cause additional overhead, stalling the GPU. | Enable `pin_memory=True` in the PyTorch `DataLoader`, which can significantly reduce latency for GPU-bound training. | Azzoug, A. (n.d.). GreenPyData. GitHub. https://github.com/AghilesAzzoug/GreenPyData |
| Inefficient Data Frame Join | Performing repeated join operations on large DataFrames without indexing or merging strategies. | Pandas | Large repeated joins or merges can be extremely expensive, inflating CPU time and memory usage. | Use indices, sort-merge strategies, or carefully plan merges to reduce overhead. | McKinney, W. (2017). Python for Data Analysis. O’Reilly Media. |
| Inefficient Iterrows | Using `iterrows` in Pandas to manipulate data row by row is a frequent habit in data analysis, despite being much slower than vectorized alternatives. | Pandas | Row-by-row iteration incurs Python overhead, slowing execution and increasing energy use. | Use vectorized methods for data manipulation. | |
| Large Batch Size Memory Swapping | Setting a batch size in PyTorch or TensorFlow that is too large for GPU memory, forcing frequent memory swaps or fallback to the CPU. | PyTorch, TensorFlow | Memory swapping drastically slows performance and increases energy usage. | Find an optimal batch size through experiments; use gradient accumulation if large effective batch sizes are required. | Brown, T. B., et al. (2020). Language Models are Few-Shot Learners. Advances in Neural Information Processing Systems. |
| Recomputing GroupBy Results | Calling `.groupby()` multiple times on the same data with identical keys for similar aggregated statistics. | Pandas | Each `.groupby()` operation is expensive; re-running the same computation consumes extra CPU cycles. | Compute all required statistics in a single pass (e.g., `agg(...)`) or store intermediate results for reuse. | McKinney, W. (2017). Python for Data Analysis. O’Reilly Media. |
| Reduction Operations | Performing a reduction operation on an array, tensor, or dataframe inside for loops. | NumPy, Pandas, PyTorch, TensorFlow | Reduction operations that compute sums, means, or other aggregates are slow in Python-level loops and are already heavily optimized in these libraries. | NumPy: `np.sum`, `np.min`, `np.max`<br>Pandas: `df['column'].sum()`, `df['column'].mean()`, `df.agg('sum')`<br>PyTorch: `torch.sum(tensor)`, `torch.mean(tensor)`, `torch.max(tensor)`<br>TensorFlow: `tf.reduce_sum(tensor)`, `tf.reduce_mean(tensor)`, `tf.reduce_max(tensor)` | Hynn, S. (n.d.). Unnecessary iteration. DSLinter. https://hynn01.github.io/dslinter/posts/codesmells/2-unnecessary-iteration/<br>Kiani, V. (n.d.). EffectiveTensorflow. GitHub. https://github.com/vahidk/EffectiveTensorflow?tab=readme-ov-file#overload |
| Redundant Model Re-Fitting | Continuously calling `.fit()` on the same dataset multiple times without any changes in hyperparameters or data. | SciKit-Learn | Each `.fit()` call recreates internal data structures, incurring CPU/memory overhead. | Re-use fitted models, or use `partial_fit` if an iterative approach is needed. | Pedregosa, F., et al. (2011). Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research. |
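To make a couple of these recommendations concrete, the sketch below is a minimal, illustrative example (the `torch.nn.Linear` model and the array sizes are arbitrary stand-ins, not part of the analyzer) showing a loop-based reduction rewritten as a vectorized NumPy call and an inference pass wrapped in `torch.no_grad()`:

```python
import numpy as np
import torch

# Reduction operation: replace the Python-level loop with a single
# vectorized call that runs in optimized native code.
arr = np.random.rand(1_000_000)

total = 0.0
for x in arr:                     # energy smell: per-element iteration
    total += x * 2.0

total_vectorized = np.sum(arr * 2.0)   # single vectorized pass

# Calculating gradients: disable autograd tracking during inference so
# PyTorch does not build a computation graph it will never use.
model = torch.nn.Linear(8, 2)
batch = torch.randn(32, 8)

with torch.no_grad():             # more efficient for inference
    output = model(batch)
```

Both variants of the reduction produce the same numerical result; the vectorized call and the `no_grad()` block simply avoid per-element interpreter overhead and unnecessary autograd bookkeeping.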
Known Issues
- The extension only works with Python files.
- Some rules may produce false positives, depending on the context of your code.
This extension is a powerful tool for developers looking to improve the efficiency of their Python code and make more sustainable software decisions.