Top Platforms to Help You Start Building AI in 2023

Authors
  • Nathan Peper

AI Can Be Overwhelming

AI is an overloaded buzzword that you can't avoid these days, but it's for good reason. Across industries, the adoption of sensors to monitor objects and environments is on the rise, increasing volumes of data are being stored, compute capabilities are evolving, and network connectivity is not only improving but also expanding to new areas. These foundational components have matured to the point that data science researchers and practitioners have been able to show significant, tangible business value in recent years. This has resulted in an explosion of new tools and companies formed to solve different pain points in the AI application lifecycle.

With all of the buzzwords, terminology, innovation, and rapid changes in the industry, it may feel pointless to try to learn how to build something of your own. You're not alone, and that's why I hope this overview can help you learn about and get started with some of the foundational tools and platforms that are helping to democratize AI.

Start Learning to Code in Python

Don't overthink it. Just start by learning Python (unless you're in a company or part of an industry that has a strong preference for something like R). There are a number of different programming languages out there, but Python is massively popular, has a large community, and is extremely user-friendly, all of which will pay huge dividends as you run into issues learning how to code and search the depths of the internet for helpful answers. Just as important, once you really begin to learn how to use one programming language, the barriers to learning another language are MUCH lower.

Now you may be thinking, "What about all of these no-code or low-code platforms or the text-to-code capabilities I'm hearing about with OpenAI's ChatGPT, GitHub Copilot, or Amazon CodeWhisperer?" First, I personally don't believe the hype of all these no-code and low-code platforms. If it's no- or low-code, the code has been abstracted away from you, which takes away your control and creates vendor lock-in. You'll still have to learn a new skill to use the platform, with all of the clicking, dragging, and menu options. And once the trial period runs out for that new, critical capability you just invested so much time in building on a proprietary platform, you'll have to start paying the bill just to keep it running. On the other hand, text-to-code sounds great in theory, but in practice you still have to learn how to read and troubleshoot code. These tools are better suited to people who already understand programming and want to speed up their development timelines.

Python is extremely user-friendly, and the best way to learn it is by actually building things. Sign up for an in-person or online course, watch video tutorials, or just find a site that helps you build small applications (I was hooked by Automate The Boring Stuff). If you want to get started on your local computer, one thing to consider right away is exactly what to install and how to install it. The most popular and easiest path for beginners is to just install Anaconda. It's a great data science and machine learning platform that handles a lot of the complexity for you. As you get comfortable, I HIGHLY recommend learning more about how to use the libraries and channels that are optimized for the system you're working on. My work laptop has an Intel CPU, so I can get a massive performance boost by using Intel-optimized packages. Some of these come as the defaults from Anaconda, but if you take a few minutes to learn how to install packages from a specific channel, you'll gain access to many, many more optimized packages. But most importantly, to learn how to code you have to actually follow along and build with code!
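Once something is installed, a quick sanity check can save a lot of confusion about which Python you're actually running. The snippet below is only a minimal sketch that uses the standard library: it prints the interpreter's version and location, then reports whether a few common data science packages can be found. The package list is my own illustrative assumption, so swap in whatever your course or project needs.

```python
import importlib.util
import platform
import sys

# Hypothetical starter list (an assumption, not a required set of packages).
PACKAGES = ["numpy", "pandas", "matplotlib", "sklearn"]


def installed_packages(names):
    """Map each top-level package name to True if Python can find it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# sys.executable shows which Python is running (for example, Anaconda's).
print("Python", platform.python_version(), "running from", sys.executable)
for name, found in installed_packages(PACKAGES).items():
    print(f"{name:12} {'installed' if found else 'missing'}")
```

If a package shows up as missing, that's your cue to install it with your package manager of choice and run the check again.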

Get Familiar with Jupyter

If you're embarking on this journey, you **WILL** have to get familiar with some form or variation of Jupyter - Jupyter Notebooks, JupyterLab, or JupyterHub. Project Jupyter is a non-profit, open-source project that promises to always be 100% open-source software, free for all to use. Jupyter Notebooks is the OG web application that took off in popularity across the community. JupyterLab is the next-generation interactive development environment for notebooks, code, and data, which is where all of the current focus and support resides. Because Project Jupyter is massively popular and open-source, if you learn the basics of working within this environment, you'll easily be able to handle all of the service platforms listed below. While they are not all exactly the same, they are all based on Project Jupyter with their own integrations and add-ons.

You can, and definitely should, use JupyterLab as your interactive interface for learning Python. Depending on your level of experience, your administrative rights on the computer you're using, and the type of computer you're working on, the barriers to entry are probably lowest with a hosted notebook service, which abstracts away some of the complexities of the underlying hardware and operating system, especially when it comes to Linux and Windows.
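Part of the reason notebooks move so easily between these tools is that a notebook file (.ipynb) is just JSON text. As a rough illustration, and assuming the version 4 notebook layout, the sketch below builds a tiny notebook with one Markdown cell and one code cell using only the standard library. The cell contents are made up for the example.

```python
import json

# A minimal notebook in the nbformat 4 layout (assumed here):
# one Markdown cell followed by one code cell.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# My first notebook"]},
        {
            "cell_type": "code",
            "execution_count": None,  # None means the cell has not been run yet
            "metadata": {},
            "outputs": [],
            "source": ["print('Hello, Jupyter!')"],
        },
    ],
}

# Serialize to the JSON text that would be stored in an .ipynb file.
text = json.dumps(notebook, indent=1)
print(text)
```

Saving that output to a file ending in .ipynb should give you something you can open in JupyterLab, although in practice you'll almost always let the notebook interface create these files for you.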

Remote versus Local Development

If you've just decided to begin the journey of getting hands-on with AI, I would highly recommend that you reduce as much of the complexity as possible and use a browser-based, remote (cloud) managed service. Stick with one service and focus on learning Python while building in and interacting with Jupyter. Most of these services will also allow you to interact with the command line/terminal, which will be a natural progression as you learn to do more advanced work and need specific libraries and packages to solve the problem you're working on.

As you get more comfortable, consider learning how to install an Integrated Development Environment (IDE) on your local computer. From there, learn how to connect it to the remote, managed service you're using, or how to pull the entire project down and continue developing on your local system. There are a lot of options out there, but I'd recommend VS Code or PyCharm. Atom, Spyder, and Sublime are other great names you might hear in the industry, but I'd start with two of the biggest and most supported.

Learning how to connect local and remote assets will most likely be another painful experience, but as you troubleshoot and search for answers you'll learn a lot about the intricacies of various hardware and operating system configurations, package and library dependencies, networking, IP addresses and port configurations, firewalls and proxy settings, SSH and keys, etc.
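When a connection fails, one of the first things worth checking is whether anything is listening on the port you expect. Here is a small sketch, using only Python's standard library, that tests whether a TCP port is reachable. The helper name `port_is_open` is my own, and port 8888 is Jupyter's usual default, so your service or SSH tunnel may well use a different one.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


# 8888 is Jupyter's default port (an assumption about your setup).
print("Jupyter reachable on localhost:8888?", port_is_open("localhost", 8888))
```

A result of False doesn't tell you why the connection failed, but it does narrow things down to the server, the tunnel, or a firewall before you start digging into IDE settings.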

One painful thing that I ran into quickly is that the most popular operating systems used in cloud computing and AI are Linux-based, so if you're learning on a Windows-based computer you're going to run into a number of issues. One way to help reduce this friction is to learn about Windows Subsystem for Linux (WSL). If you're not familiar, you'll thank me later.
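If you're ever unsure which environment your code is actually running in, Python can tell you. The sketch below classifies the current platform with the standard library. The WSL check is a common heuristic rather than an official API: it assumes the WSL kernel's release string contains the word "microsoft". The function name `describe_platform` is also just mine.

```python
import platform


def describe_platform(system=None, release=None):
    """Classify the OS as 'Windows', 'macOS', 'Linux (WSL)', or 'Linux'."""
    system = system or platform.system()
    release = release or platform.release()
    if system == "Windows":
        return "Windows"
    if system == "Darwin":
        return "macOS"
    if system == "Linux":
        # Heuristic (assumption): WSL kernels include "microsoft" in the release string.
        return "Linux (WSL)" if "microsoft" in release.lower() else "Linux"
    return system or "unknown"


print("This code is running on:", describe_platform())
```

Running this from a regular Windows terminal and then from a WSL terminal is a quick way to see that they really are two different environments, each with its own Python and packages.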

I'm not going to lie, this journey requires a lot of learning to get through, but you're joining a very open and collaborative community. They ask and answer questions publicly in a number of forums that you'll easily find searching online. There definitely isn't a best, linear learning path to follow. But if you want to avoid the hassles of overcoming setup and infrastructure issues, I think the general process outlined above should really help speed up the learning process for you.

Now For the Top Platforms

Google Colab

Google's Colaboratory, or "Colab" for short, is Google Research's freemium hosted Jupyter notebook service. It allows anyone to write and execute arbitrary Python code (and other languages) through the browser. It also allows you to save and share your Jupyter Notebooks with others. All notebooks have access to CPUs, and you can also access GPUs and TPUs, although availability is limited and varies based on where you fall within the freemium plan. Not only does it have a number of the most popular libraries pre-installed, such as NumPy, Pandas, Matplotlib, PyTorch, TensorFlow, and Keras, but users can also install additional libraries, such as Intel's OpenVINO or optimized versions of PyTorch and TensorFlow. Additionally, while Colab does not use persistent storage between sessions, you can save your files to just about any location, and it already has some integrations to make connecting to various Google products easier.
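Because the hardware behind a hosted notebook varies by plan and session, it can help to check what you were actually given. The snippet below is a hedged sketch, not an official Colab recipe: it uses a widely used heuristic (the `google.colab` module being loaded) to guess whether it's running in Colab, then asks PyTorch or TensorFlow, whichever is installed, whether a GPU is visible. Both function names are my own.

```python
import importlib.util
import sys


def running_in_colab():
    """Heuristic (assumption): Colab kernels load the google.colab module at startup."""
    return "google.colab" in sys.modules


def gpu_report():
    """Describe GPU availability using whichever framework is installed, if any."""
    if importlib.util.find_spec("torch") is not None:
        import torch
        return f"PyTorch sees a CUDA GPU: {torch.cuda.is_available()}"
    if importlib.util.find_spec("tensorflow") is not None:
        import tensorflow as tf
        return f"TensorFlow sees GPUs: {len(tf.config.list_physical_devices('GPU'))}"
    return "Neither PyTorch nor TensorFlow is installed here"


print("Running in Colab:", running_in_colab())
print(gpu_report())
```

The same check works in most of the other hosted services below, which is handy when you're comparing what each free tier really provides.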

Kaggle Kernels

Kaggle Kernels are integrated notebooks that allow users to write, run, and share their code with the Kaggle community. Kaggle is a Google subsidiary and community platform for data scientists and AI enthusiasts. It lets you collaborate with others, find and publish datasets, use Kaggle Kernels with integrated, but time-limited, GPUs, and compete with other data scientists to solve data science problems through "Kaggle Competitions." Kaggle provides a maximum of 30 hours per week of GPU and 20 hours per week of TPU. Kaggle Kernel environments come preinstalled with a number of common libraries, and users can install new ones as needed. Additionally, Kaggle supports session persistence for variables and files as long as the user enables the feature. Because Kaggle is more of a community platform, there are also a number of learning resources to help you along your learning journey.

GitHub Codespaces

GitHub Codespaces are development environments hosted in the Microsoft Azure Cloud. While this isn't a purpose-built platform for AI, the platform provides Jupyter templates that allow you to launch and connect to a web-based VS Code session that is running a notebook. The free tier provides access to 2-core or 4-core CPU machines with 120 core hours per month, which works out to 60 free hours on a 2-core machine or 30 free hours on a 4-core machine. While GPU access is not available for free, you can request additional resources for a fee. This is still a great option for those looking to get started with AI, work within VS Code, and learn more about version control.
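The core-hour math is simple, but it's worth seeing once since many platforms bill this way: usable hours are the core-hour allowance divided by the number of cores on the machine. Here's a tiny sketch using the free-tier figure quoted above. The function name is just for illustration.

```python
def free_hours(core_hours_per_month, cores):
    """Wall-clock hours available when a machine with `cores` cores draws from the quota."""
    return core_hours_per_month / cores


FREE_CORE_HOURS = 120  # monthly free-tier allowance quoted above

for cores in (2, 4):
    print(f"{cores}-core machine: {free_hours(FREE_CORE_HOURS, cores):.0f} hours per month")
```

In other words, picking the bigger machine burns through the same allowance twice as fast, so the smaller one is usually the better choice while you're learning.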

Datalore

Datalore by JetBrains is another great hosted notebook service, with a Jupyter-like interface, for people who are learning and building AI. It provides a free tier with 2 vCPUs, and the user interface and existing integrations make it very easy to get started. Higher CPU core counts and access to GPUs are also available for a price if you need to upgrade. Because this platform is provided by the creators of the PyCharm IDE, it could be the ideal platform to try out if you're already familiar with or are planning on using PyCharm in the future.

Paperspace Gradient

Paperspace Gradient Notebooks is a hosted notebook service with a free tier, and it's easy to set up. The free tier also includes access to a GPU, but the notebooks are public and auto-shutdown after a maximum of 6 hours. This is a common practice in freemium plans to prevent users from consuming all of the resources on large model training jobs. Due to the popularity of the platform and the limited number of free resources, users can also run into scenarios where the free tier is "Out of Capacity," just like I did at the time of writing this section.

Deepnote

Deepnote is a hosted notebook service that is targeted toward team collaboration. It provides a free tier to get started, is easy to set up, and has a number of built-in integrations and connection guides for many popular services. The free tier allows up to 3 editors to collaborate on the same project, up to 5 projects at a time, and unlimited basic machines with 5GB RAM and 2 vCPUs.

CoCalc

Collaborative Calculation and Data Science (CoCalc) is a managed service provided by SageMath, Inc. It is cloud-based collaborative software oriented toward research, teaching, and scientific publishing. It follows a freemium model and provides access to its implementation of JupyterLab with real-time synchronization and collaboration. The free tier is very limited in features compared to most, but it is not time restricted and can scale up to larger instances if needed.

IBM Cloud Pak for Data as a Service - Watson Studio

IBM Watson Studio is a service provided as part of the Cloud Pak for Data as a Service. It provides a free tier to get started with IBM's hosted notebooks and code-free tools. While it does present a more complex environment for developing AI applications, IBM has done a great job of integrating and providing flexibility for all of the various considerations of a unified solution. It may be a bit overwhelming for beginners, especially when starting from a blank slate, but IBM also provides industry-relevant templates to show users how all of the pieces can come together and be managed.

Binder

Binder is a great resource that's focused on enabling you to share your project once you're done, using just a URL or your GitHub repository. It's free, part of Project Jupyter, and allows users to interact with your notebooks in a live environment. However, it runs on very minimal resources that cannot be increased. It also containerizes your code and does not allow you to save directly back to the file system or repo; you can only interact with it. While this may seem limiting, it is a great way to easily share your completed work with others in a minimal, interactive environment, but it's obviously not suited for serious development.

Cloud Service Provider Hosted Notebooks

There are also numerous variations of notebook services that you can launch with nearly all of the Cloud Service Providers (CSPs), but they're not as simple as most of the sign-up, click, and deploy services above. However, if you're already familiar with their services or are ready to take on learning how to configure, build, and deploy cloud services, it's definitely worth looking into AWS SageMaker, Azure Notebooks, GCP Notebooks, etc. CSPs are rapidly growing, evolving, and building new products and services, which means your setup could require updates or break along the way. That said, all of these CSPs offer free credits and incentives to get started with their platforms. So don't rule them out completely, especially as you continue to learn and get more comfortable working with these tools.