# Serverless AI in Nebius AI Cloud

Serverless AI is a Nebius AI Cloud service for running containerized AI workloads without creating or operating virtual machines or clusters. To run a workload in Serverless AI, you choose how to deploy it (as an interactive *endpoint* or as a non-interactive *job*), specify the path to your container, and select the computing and storage resources that the workload requires.

Serverless AI handles resource provisioning, lifecycle management, and usage-based, per-second billing, so you can focus on interacting with the workload and getting results from it. Under the hood, endpoints and jobs run on [Compute containers over VMs](/compute/virtual-machines/containers). To catch and handle errors or unexpected outcomes, you can use the observability and debugging tools that Serverless AI provides.

## Endpoints and jobs

You can deploy your workload as an *endpoint* that listens for requests and returns results immediately, or as a *job* that runs in the background and exits after completing its task. The following table compares endpoints and jobs at a glance:

|                             | **Endpoint**                                                              | **Job**                                                                                                                             |
| --------------------------- | ------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------- |
| **Workflow**                | Interactive, listens for requests until you terminate it                  | Non-interactive, terminates upon task completion or timeout                                                                         |
| **Stop/start**              | Yes                                                                       | No                                                                                                                                  |
| **Public URL for requests** | Yes                                                                       | No                                                                                                                                  |
| **Typical lifetime**        | Hours to days                                                             | Minutes to days                                                                                                                     |
| **Use cases**               | Persistent workloads: serving and A/B-testing models, real-time inference | Batch workloads: pre-processing data, training and fine-tuning models, batch inference and model evaluation, scientific simulations |
| **Guides**                  | [Getting started with endpoints](./quickstart/endpoints)                  | [Getting started with jobs](./quickstart/jobs)                                                                                      |

## Observability and debugging

Each Serverless AI endpoint and job has a status that indicates its current stage in the lifecycle. If your endpoint or job fails, you can view its logs.

All endpoints and jobs also provide a wide range of GPU and vCPU utilization metrics, sourced from the Compute service and visualized in the web console. For more details, see [Monitoring endpoints and jobs](./monitoring).

## Pricing and quotas

Serverless AI follows Compute billing and quota rules. Billing is usage-based: the service charges you per second for the computing and storage resources that you allocate to endpoints and jobs. Only active endpoints and jobs are billed and count towards quotas. This can help you avoid unnecessary costs compared to always-on infrastructure.
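To make the per-second model concrete, here is a minimal Python sketch of the arithmetic. It is an illustration only, not a Nebius pricing calculator: the `Allocation` class, the `cost_for_active_time` function, and all rates are hypothetical, and the sketch assumes that an endpoint or job is billed only for the seconds it is active, as described above. Use the actual rates from the pricing page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Allocation:
    """Resources allocated to one endpoint or job (hypothetical model)."""
    compute_hourly_rate: float  # hypothetical price of the compute resources, per hour
    storage_hourly_rate: float  # hypothetical price of the storage resources, per hour


def cost_for_active_time(alloc: Allocation, active_seconds: int) -> float:
    """Estimate the cost of a workload that is active for `active_seconds`.

    Assumes per-second granularity: the hourly rate is prorated to seconds,
    and time when the workload is not active costs nothing.
    """
    if active_seconds < 0:
        raise ValueError("active_seconds must be non-negative")
    hourly_total = alloc.compute_hourly_rate + alloc.storage_hourly_rate
    return hourly_total * active_seconds / 3600


# Example with made-up rates: a job that is active for 90 minutes,
# compared with keeping the same resources allocated for 24 hours.
alloc = Allocation(compute_hourly_rate=2.0, storage_hourly_rate=0.0)
job_cost = cost_for_active_time(alloc, active_seconds=90 * 60)
always_on_cost = cost_for_active_time(alloc, active_seconds=24 * 3600)
print(f"90-minute job: {job_cost:.2f}, 24 hours always-on: {always_on_cost:.2f}")
```

The gap between the two figures is the saving that usage-based billing can offer for workloads that do not need to run around the clock.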

For more details, see [Pricing and quotas](./pricing-quotas).
