Commit 2cbc21f

Rangeet Pan authored and committed
working on code examples
1 parent d25bc1e commit 2cbc21f

6 files changed

Lines changed: 434 additions & 0 deletions
Lines changed: 259 additions & 0 deletions
@@ -0,0 +1,259 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "source": [
    "Code summarization, or code explanation, is the task of translating code written in a programming language into natural language. It has several\n",
    "benefits, such as helping readers understand code without studying every implementation detail and producing documentation that eases maintenance. To do this, one needs to\n",
    "understand how the code is structured and use that knowledge to generate the summary with an AI-based approach. In this\n",
    "example, we use a large language model (LLM), specifically Granite 8B, an open-source model built by IBM. We show how a developer can use\n",
    "CLDK to expose various parts of the code through API calls, without implementing time-intensive program analyses from scratch."
   ],
   "metadata": {
    "collapsed": false
   },
   "id": "6ad70b81e8957fc0"
  },
  {
   "cell_type": "markdown",
   "source": [
    "Step 1: Add all the necessary imports."
   ],
   "metadata": {
    "collapsed": false
   },
   "id": "15555404790e1411"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "import os\n",
    "from pathlib import Path\n",
    "import ollama\n",
    "from cldk import CLDK\n",
    "from cldk.analysis import AnalysisLevel"
   ],
   "metadata": {
    "collapsed": false
   },
   "id": "8e8e5de7e5c68020"
  },
  {
   "cell_type": "markdown",
   "source": [
    "Step 2: Formulate the LLM prompt. The prompt can be tailored to various needs. In this case, we show a simple example that generates a summary for each\n",
    "method in a Java class."
   ],
   "metadata": {
    "collapsed": false
   },
   "id": "ffc4ee9a6d27acc2"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "def format_inst(code, focal_method, focal_class, language):\n",
    "    \"\"\"\n",
    "    Format the instruction for the given focal method and class.\n",
    "    \"\"\"\n",
    "    inst = f\"Question: Can you write a brief summary for the method `{focal_method}` in the class `{focal_class}` below?\\n\"\n",
    "\n",
    "    inst += \"\\n\"\n",
    "    inst += f\"```{language}\\n\"\n",
    "    inst += code\n",
    "    inst += \"```\" if code.endswith(\"\\n\") else \"\\n```\"\n",
    "    inst += \"\\n\"\n",
    "    return inst"
   ],
   "metadata": {
    "collapsed": false
   },
   "id": "9e23523c71636727"
  },
  {
   "cell_type": "markdown",
   "source": [],
   "metadata": {
    "collapsed": false
   },
   "id": "a4e9cb4e4f00b25c"
  },
  {
   "cell_type": "markdown",
   "source": [
    "Step 3: Create a function to call the LLM. There are various ways to do this; for illustrative purposes, we use ollama, a library for communicating with locally downloaded models."
   ],
   "metadata": {
    "collapsed": false
   },
   "id": "dd8439be222b5caa"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "def prompt_ollama(message: str, model_id: str = \"granite-code:8b-instruct\") -> str:\n",
    "    \"\"\"Prompt local model on Ollama\"\"\"\n",
    "    response_object = ollama.generate(model=model_id, prompt=message)\n",
    "    return response_object[\"response\"]"
   ],
   "metadata": {
    "collapsed": false
   },
   "id": "62807e0cbf985ae6"
  },
  {
   "cell_type": "markdown",
   "source": [
    "Step 4: Create a CLDK object and specify the programming language of the source code."
   ],
   "metadata": {
    "collapsed": false
   },
   "id": "1022e86e38e12767"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "# Create a new instance of the CLDK class\n",
    "cldk = CLDK(language=\"java\")"
   ],
   "metadata": {
    "collapsed": false
   },
   "id": "a2c8bbe4e3244f60"
  },
  {
   "cell_type": "markdown",
   "source": [
    "Step 5: CLDK supports several analysis engines: Codeanalyzer (built using WALA and JavaParser), Tree-sitter, and CodeQL (planned). Codeanalyzer is\n",
    "the default analysis engine. CLDK also supports different analysis levels: (a) symbol table, (b) call graph, (c) program dependency graph, and\n",
    "(d) system dependency graph. The analysis level can be selected using the ```AnalysisLevel``` enum. In this example, we summarize all the methods\n",
    "of an application. To select the application location, set the environment variable ```JAVA_APP_PATH```."
   ],
   "metadata": {
    "collapsed": false
   },
   "id": "23dd4a6e5d5cb0c5"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "# Create an analysis object over the Java application located at JAVA_APP_PATH\n",
    "analysis = cldk.analysis(project_path=os.environ[\"JAVA_APP_PATH\"], analysis_level=AnalysisLevel.symbol_table)"
   ],
   "metadata": {
    "collapsed": false
   },
   "id": "fdd09f5e77d4a68a"
  },
  {
   "cell_type": "markdown",
   "source": [
    "Step 6: Iterate over all the class files and create the prompt. In this case, we want to provide a customized Java class in the prompt. For instance,\n",
    "\n",
    "```\n",
    "package com.ibm.org;\n",
    "import A.B.C.D;\n",
    "...\n",
    "public class Foo {\n",
    "    // code comment\n",
    "    public void bar() {\n",
    "        int a;\n",
    "        a = baz();\n",
    "        // do something\n",
    "    }\n",
    "    private int baz()\n",
    "    {\n",
    "        // do something\n",
    "    }\n",
    "    public String dummy(String a)\n",
    "    {\n",
    "        // do something\n",
    "    }\n",
    "}\n",
    "```\n",
    "Given the above class, suppose we want to generate a summary for the ```bar``` method. To help the model understand what it does, we include the callee of this method, ```baz``` in this case, in the prompt. We also remove imports, comments, and so on. All of this is done with a single call to the ```sanitize_focal_class``` API, which uses Tree-sitter to analyze the code. Once the input code has been sanitized, we call ```format_inst``` to create the LLM prompt, which is then passed to ```prompt_ollama``` to generate the summary."
   ],
   "metadata": {
    "collapsed": false
   },
   "id": "f148325e92781e13"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "# Iterate over all the files in the project\n",
    "for file_path, class_file in analysis.get_symbol_table().items():\n",
    "    class_file_path = Path(file_path).absolute().resolve()\n",
    "    # Read the source code of the whole class file once per file\n",
    "    code_body = class_file_path.read_text()\n",
    "\n",
    "    # Iterate over all the classes in the file\n",
    "    for type_name, type_declaration in class_file.type_declarations.items():\n",
    "        # Iterate over all the methods in the class\n",
    "        for method in type_declaration.callable_declarations.values():\n",
    "            # Initialize the treesitter utils for the class file content\n",
    "            tree_sitter_utils = cldk.tree_sitter_utils(source_code=code_body)\n",
    "\n",
    "            # Sanitize the class for analysis\n",
    "            sanitized_class = tree_sitter_utils.sanitize_focal_class(method.declaration)\n",
    "\n",
    "            # Format the instruction for the given focal method and class\n",
    "            instruction = format_inst(\n",
    "                code=sanitized_class,\n",
    "                focal_method=method.declaration,\n",
    "                focal_class=type_name,\n",
    "                language=\"java\"\n",
    "            )\n",
    "\n",
    "            # Prompt the local model on Ollama (Granite 8B, as named in the introduction)\n",
    "            llm_output = prompt_ollama(\n",
    "                message=instruction,\n",
    "                model_id=\"granite-code:8b-instruct\",\n",
    "            )\n",
    "\n",
    "            # Print the instruction and LLM output\n",
    "            print(f\"Instruction:\\n{instruction}\")\n",
    "            print(f\"LLM Output:\\n{llm_output}\")"
   ],
   "metadata": {
    "collapsed": false
   },
   "id": "462ef7dceae367ad"
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
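The prompt that `format_inst` builds in the notebook above can be previewed without CLDK or Ollama installed. The following is a minimal sketch and not part of the commit: the helper is re-declared so the snippet is self-contained, and the `FENCE` constant and the `sanitized_class` string are hypothetical stand-ins. The real output of `sanitize_focal_class` may look different.

```python
# Three backticks, built programmatically so this snippet nests cleanly in Markdown.
FENCE = "`" * 3


def format_inst(code: str, focal_method: str, focal_class: str, language: str) -> str:
    """Build the summarization prompt, mirroring the notebook's helper."""
    inst = (
        f"Question: Can you write a brief summary for the method "
        f"`{focal_method}` in the class `{focal_class}` below?\n"
    )
    inst += "\n"
    inst += f"{FENCE}{language}\n"
    inst += code
    # Close the fence on its own line whether or not the code ends with a newline.
    inst += FENCE if code.endswith("\n") else "\n" + FENCE
    inst += "\n"
    return inst


# Hypothetical sanitized class: imports and comments removed, only the
# focal method `bar` and its callee `baz` kept.
sanitized_class = (
    "public class Foo {\n"
    "    public void bar() {\n"
    "        int a;\n"
    "        a = baz();\n"
    "    }\n"
    "    private int baz() {\n"
    "        return 0;\n"
    "    }\n"
    "}\n"
)

prompt = format_inst(sanitized_class, focal_method="bar", focal_class="Foo", language="java")
print(prompt)
```

Printing the prompt before sending it to a model is a cheap way to check that the fenced block is closed correctly and that only the intended methods are included.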
Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "initial_id",
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "initial_id",
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "initial_id",
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
