AutoGPT: Does it Live up to the Hype?
I was working out an outline for a proof of concept: how to generate vulnerability remediation content for certain situations. A close friend and former colleague turned me on to AutoGPT.
I built it that weekend, and at first I thought I was hooked.
Look at this demo. Who wouldn't be hooked? I suggest you pause to view it.
For the best experience, maximize the video and change the playback speed to 50% so you can read and understand what is happening.
If you get it, you get it. I know my former fellow BigFixers, HCLites, and IBMers will catch on immediately.
If you don't understand what is happening: AutoGPT is extrapolating from its goals and spawning commands without additional instruction. To generate similar output with ChatGPT, you'd have to send a long series of chained prompts, and that's before considering that ChatGPT can't browse the web for you, at least not yet (at the time of this blog post, ChatGPT does have a browser mode that is currently in alpha).
The First Test for AutoGPT
My first test: recreate some of the output from the demo video.
I told AutoGPT that its role was to be a BigFix_Expert-AI, that it knew everything about creating custom content, and that it loved to blog about the subject.
Its goals:
Grow a LinkedIn following by creating ten interesting and valuable posts.
Grow its blog audience with ten valuable articles going deep into how it authored unique and useful BigFix content.
Ensure that all content is fresh, not referring to BigFix versions 9.x and older, and only looking for content opportunities from the past six months (i.e. don't solve a forum.bigfix.com problem from earlier than Fall 2022).
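For readers who want to picture how a role and a set of goals like these end up in front of the model, here is a minimal sketch in Python. It is an illustration only, not AutoGPT's actual code: the AgentProfile class, its field names, and the to_prompt layout are all assumptions made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class AgentProfile:
    """Hypothetical container for an agent's name, role, and goals."""

    name: str
    role: str
    goals: list[str] = field(default_factory=list)

    def to_prompt(self) -> str:
        """Render the profile as a plain-text preamble with numbered goals."""
        lines = [f"You are {self.name}, {self.role}", "", "GOALS:"]
        lines += [f"{i}. {goal}" for i, goal in enumerate(self.goals, start=1)]
        return "\n".join(lines)


# The role and goals below paraphrase the ones described in this post.
profile = AgentProfile(
    name="BigFix_Expert-AI",
    role="an AI that knows everything about creating custom BigFix content "
    "and loves to blog about the subject.",
    goals=[
        "Grow a LinkedIn following by creating ten interesting and valuable posts.",
        "Grow the blog audience with ten valuable articles on authoring BigFix content.",
        "Keep all content fresh: nothing about BigFix 9.x or older, and only "
        "opportunities from the past six months.",
    ],
)

print(profile.to_prompt())
```

The point of the sketch is only that the agent starts from a short, fixed description and has to work out every intermediate step on its own.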
The outcome:
It started off really well. It decided that it needed to research forum.bigfix.com and it looked at Substack posts, LinkedIn posts, and it reviewed the BigFix documentation on HCL's website.
It attempted (and failed) to clone the BigFix Developer GitHub repo, so I pulled it to a local folder that AutoGPT could write to. (Note that I didn't do this in Docker: previous AutoGPT attempts crashed and burned, which caused the Docker environment to vaporize, so I couldn't review any files left behind.)
The net: AutoGPT did pen a few blog posts, but it took 10 hours (and about $1.35 in GPT tokens) and the posts were unremarkable. I've gotten much better content with a few well-crafted chained prompts in ChatGPT 3.5.
What IS remarkable is the fact that AutoGPT didn't learn from any of its activities.
It tried and failed to clone the BigFix Developer repo 20 times, despite my reminding it that the repo was already cloned locally.
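To make "learning from its activities" concrete, here is a rough Python sketch of the kind of bookkeeping that would stop an agent from repeating a command that has already failed. This is not how AutoGPT works internally. The FailureMemory class, the "clone_repository" command label, the fake_clone stand-in, and the placeholder URL are all hypothetical names invented for this example.

```python
from collections import Counter


class FailureMemory:
    """Hypothetical record of (command, argument) pairs that have failed."""

    def __init__(self, max_attempts: int = 2) -> None:
        self.max_attempts = max_attempts
        self._failures: Counter = Counter()

    def record_failure(self, command: str, argument: str) -> None:
        """Count one more failure for this exact command and argument."""
        self._failures[(command, argument)] += 1

    def should_skip(self, command: str, argument: str) -> bool:
        """Return True once the pair has failed max_attempts times."""
        return self._failures[(command, argument)] >= self.max_attempts


def fake_clone(url: str) -> bool:
    """Stand-in for a clone attempt that always fails (no network access)."""
    return False


memory = FailureMemory(max_attempts=2)
repo = "https://example.invalid/bigfix-developer-repo.git"  # placeholder URL

attempts = 0
for _ in range(20):  # the agent "wants" to try 20 times
    if memory.should_skip("clone_repository", repo):
        break  # give up and move on to a different plan
    attempts += 1
    if not fake_clone(repo):
        memory.record_failure("clone_repository", repo)

print(attempts)
```

With a cap of two attempts, the loop stops after the second failure instead of burning through all 20 tries.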