GraphGPT is a tool that potentially translates every administrative command, that is expressed via natural, unstructured language, into action.
Microsoft Graph is an API that connects most of the Microsoft ecosystem, from Domain Management with Active Directory, handling SharePoint content to managing Security Apps.
Source of image and more information: Microsoft Graph overview - Microsoft Graph | Microsoft Learn
Structured and unstructured information
The words we use and the texts we write have a structure that we, as humans, can understand. When a word is switched up or a sentence is spoken differently, that is usually no problem. But when we want to tell a computer what to do all we could do so far is agree on a very specific structure for this information, such as XML, JSON or CSV for example.
This meant putting information in a specific order, format and schema, which is tedious for us, who express ourselves more freely, but was necessary, as the computing counterpart did not get what we mean by "turn on the faucet and put the temperature to 40 degree please" without extensive machine learning for this one, very specific task.
This now changed with OpenAIs AIs. For this project, the GPT-4 Model will be the bridge between us wordy humans and the narrow-minded API of Microsoft Graph.
Human in the loop
One aspect of AI Security is "don't let it do things unchecked", at least yet. Since we are giving an LLM influence over possible Write actions in a sensitive environment, this needs to be double-checked, aborted or allowed to continue by one of us.
The GraphGPT system is designed to turn an unstructured task into a structured list of URLs and parameters, allowing a human to fill in gaps of missing information, check the information that was extracted and influence the actual call that is to be made.
Here is what could happen otherwise:
Authentication and authorization
Some Graph Endpoints are made for usage by people, while others are purely system managed. People use chats, emails and calendars, daemons can manage devices, files, databases, apps and more.
To allow both, users and machines to perform tasks on Graph with their respective names next to their actions for identification, GraphGPT is designed to either use Tokens, which can come from an OAUTH Workflow finished by a person, or use the identity of the system the service is running on, like managed identity, EnvironmentCredentials and others.
Source of image and more information on Azure Identity: Azure Identity client library for .NET | Azure SDK for Net (windows.net)
Performing the actual Graph Call
The gist of getting things done with Graph API comes down to HTTP REST Calls. There are libraries to facilitate graph access into other languages, but using HTTP is a standard that can be easily understood by GPT-4, and also can be run at runtime without recompiling.
As an example, to enable a user account, the method would be "PATCH", since that is the HTTP Verb to change an existing entry of something. GPT-4 luckily knows this and can tell us in a structured way when something is to be added, to be updated, to be deleted or to be retrieved. The next part is the URL, which usually contains one or more dynamic parts like parameters. GPT-4 can identify those and compile a list of parameters that need to be filled, even providing examples and extracting default values from an initial prompt.
Lastly, there is the Body of the HTTP call, which is almost always a JSON object. This too can have parameters that need to be filled in.
After the human in the loop chose the parameters he wants and double-checks the strategy of GPT-4, we ask the model to fill in the parameters into the actual call, using a Token to make the call work and direct it to the right environment.
Directing GPT-4 do to the right things
Most of us probably had contact with ChatGPT first, before GPT-4 was available later. To make GPT-4 give out structured JSON Data, we simply tell him "You are a JSON AI, you only answer with JSON". When provided with a Schema or example afterwards, that is what he does.
The next important step is to ask for an explanation. This seems to be some needed "thinking time" for GPT-4 to write better ideas in JSON format.
Demo and Code
This video shows the basic concept of a console application as a demo client. Realistically this is not how a deployed cloud-native service would be used. Usages could be:
Called from a Bot, like an MS Teams Framework Bot
Called from a Workflow
Called from an E-Mail
Called from a Voice-to-text service
More technical info and the source code can be found here:
Feel free to contribute, as this is by no means finished.