IBM Research has been at the forefront of exploring the potential of artificial intelligence for simplifying the development and deployment of code. A significant milestone came in 2021 with the introduction of CodeNet, a large dataset comprising roughly 500 million lines of code across more than 50 programming languages, paired with coding problems and their descriptions. The dataset was envisioned as training material for future AI agents capable of translating code from older languages to the modern ones used in enterprise systems. Such agents could also assist developers in debugging code and even generate code from plain-English instructions.
Large language models (LLMs) trained on code have been a game-changer in the software development landscape. These models are increasingly being integrated into development environments to enhance the efficiency of human programmers. Moreover, LLM-based agents are showing promise in autonomously handling complex coding tasks. To fully leverage the potential of code LLMs, a diverse set of capabilities is required, including code generation, bug fixing, code explanation, documentation, and repository maintenance.
The remarkable advancements in LLM technology over recent years have enabled IBM to translate this vision into reality, leading to the IBM Watsonx Code Assistant (WCA) product line, which includes WCA for Ansible Lightspeed for IT automation and WCA for IBM Z for application modernization. WCA for IBM Z combines automated tooling with IBM's proprietary 20-billion-parameter Granite large language code model to help enterprises convert monolithic COBOL applications into optimized services for IBM Z.
IBM's focus has been on enhancing developer productivity by reducing the time spent troubleshooting code errors and integrating legacy systems with modern applications. As part of this commitment, IBM has open-sourced four variations of the IBM Granite code model. These models are designed to assist with code generation tasks and were trained on code samples from 116 programming languages. The Granite code model family spans sizes from 3 to 34 billion parameters and includes both base and instruction-following variants, catering to a wide range of application modernization needs.
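Because the models are released openly, developers can experiment with them using standard tooling. The sketch below shows one plausible way to generate a code completion with the Hugging Face `transformers` library; the repository ID used (`ibm-granite/granite-3b-code-base`) is an assumption based on IBM's naming and should be confirmed against the actual published checkpoints.

```python
# Minimal sketch: code completion with an open Granite code model via the
# Hugging Face transformers library. The MODEL_ID below is an assumed
# repository name; check the ibm-granite organization on Hugging Face for
# the exact checkpoint identifiers.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "ibm-granite/granite-3b-code-base"  # assumed smallest base variant


def complete_code(prompt: str, max_new_tokens: int = 64) -> str:
    """Return the model's completion for a given code prompt."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)


if __name__ == "__main__":
    # Downloads the checkpoint on first run; the base variant continues the
    # prompt rather than following instructions.
    print(complete_code("def fibonacci(n):"))
```

The base variants simply continue a prompt, so they suit editor-style completion; the instruction-following variants are the better fit for plain-English requests such as "write a function that parses a CSV file."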
By releasing a series of decoder-only Granite code models, IBM is giving developers powerful tools for tackling complex coding challenges. The range of model sizes makes the family suitable for everything from intricate application modernization projects to memory-constrained deployments. IBM's commitment to equipping developers with cutting-edge AI tools underscores its dedication to driving innovation and efficiency in software development.