Deploy instagrapi to AWS Lambda: cold starts, sessions, timeouts
Maintained by the instagrapi contributors · Library on GitHub
AWS Lambda is a tempting home for instagrapi: pay-per-invoke pricing, automatic scaling, no server to keep alive, and a deployment story that fits inside a single CI step. The first prototype almost always works on the first try — package instagrapi into a layer, write a handler that calls cl.login() and cl.user_info_by_username(), wire it to API Gateway, and the response comes back in two seconds. The trouble starts on the third or fourth invocation. The function gets a challenge_required it never saw locally, then a please_wait_a_few_minutes, then the account stops responding for an hour. The cold-start environment that made the prototype look elegant is the same environment that throws away every piece of state Instagram uses to recognize the account, on every invocation that does not happen to land on a warm container.
Three Lambda properties bite instagrapi specifically. Cold starts mean a fresh container with a fresh device fingerprint each time the function scales out, which Instagram reads as a brand-new phone signing in to the account. The 15-minute hard timeout caps any single invocation, which makes longer scrapes — follower walks, full-feed pulls, paginated comment threads — impossible to complete in one call. And the AWS egress IP ranges are on Instagram’s pre-flagged list, so even a perfectly persisted session logging in from us-east-1 looks like a credential takeover from a known datacenter range. None of these is fatal on its own. All three together are why most “instagrapi on Lambda” projects ship once, fail by the end of the week, and quietly migrate to a long-running container. This page walks the Lambda-specific patterns that keep the function alive: externalize the session to DynamoDB or S3, route every call through a residential proxy, and choose Step Functions or Fargate the moment the workload outgrows fifteen minutes.
Setup
The packaging shape that scales is a Lambda layer for instagrapi and its dependencies, a thin handler package for the application code, an IAM role with read/write access to the DynamoDB session table, and environment variables for credentials and proxy URL. Use the Python 3.11 or 3.12 runtime — the current instagrapi requires Python 3.10 or newer, and the older 3.9 runtime needs an instagrapi version pinned below 2.5.0, which is not where you want a new deployment to start.
# Build the layer locally (Docker is the easiest reproducible path)
docker run --rm -v "$PWD":/var/task public.ecr.aws/sam/build-python3.11 \
    pip install instagrapi -t python/
zip -r instagrapi-layer.zip python/
aws lambda publish-layer-version \
    --layer-name instagrapi \
    --zip-file fileb://instagrapi-layer.zip \
    --compatible-runtimes python3.11
The layer is published once and reused across every function that needs instagrapi; the handler .zip itself stays under a megabyte and deploys in seconds. The DynamoDB table needs only a partition key on username — every other attribute is opaque JSON the handler reads and writes verbatim. The IAM role gets dynamodb:GetItem and dynamodb:PutItem scoped to that one table; the function does not need broader access, and the smaller permission surface helps when the same role ends up reused across other Lambdas later.
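That scoped role can be written as a minimal inline policy. A sketch follows — the account id and table name are placeholders, not values from this guide, so substitute your own ARN:

```python
import json

# Placeholder ARN -- substitute your own account id and table name.
TABLE_ARN = "arn:aws:dynamodb:us-east-1:123456789012:table/instagrapi-sessions"

# Least-privilege policy for the handler role: only the two DynamoDB
# calls the handler actually makes, scoped to the single session table.
session_table_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["dynamodb:GetItem", "dynamodb:PutItem"],
            "Resource": TABLE_ARN,
        }
    ],
}

print(json.dumps(session_table_policy, indent=2))
```

Attaching this as an inline policy (rather than a managed one) keeps the permission tied to the role it belongs to, which matters once the role count grows.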
Working example
A complete invocation loads the session blob from DynamoDB before login(), pins the residential proxy in the same setup step, runs the IG call, and writes the updated session back before the response returns. The order matters: proxy before login, session load before login, session dump before return. Skipping any of those three is what produces the drift between a “working locally” prototype and a Lambda that fails on the first cold start in production.
# handler.py
import json
import os

import boto3
from instagrapi import Client

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table(os.environ['SESSION_TABLE'])
USERNAME = os.environ['IG_USERNAME']

def lambda_handler(event, context):
    # Load the persisted session blob before login().
    response = table.get_item(Key={'username': USERNAME})
    settings = response.get('Item', {}).get('settings')

    cl = Client()
    # Order matters: proxy first, then settings, then login.
    cl.set_proxy(os.environ['RESIDENTIAL_PROXY_URL'])
    if settings:
        cl.set_settings(json.loads(settings))
    cl.login(USERNAME, os.environ['IG_PASSWORD'])

    user = cl.user_info_by_username(event['username'])

    # Write the (possibly rotated) session back before returning.
    table.put_item(Item={
        'username': USERNAME,
        'settings': json.dumps(cl.get_settings()),
    })
    return {'statusCode': 200, 'body': json.dumps({'pk': user.pk, 'name': user.full_name})}
Wired through API Gateway, the request shape is a GET /lookup?username=instagram and the response is a JSON document with the IG user id and full name. A warm-container invocation completes in roughly 250 ms, most of which is the IG call itself; a cold start adds the layer load and the DynamoDB round-trip, which puts the worst case at around 2 seconds. The handler is also safe to call concurrently from multiple API Gateway requests as long as every concurrent invocation reads, mutates, and writes its own copy of the session — Lambda’s per-instance isolation keeps the Client objects separate, and DynamoDB’s last-write-wins semantics are acceptable for read-heavy workloads. Write-heavy patterns (uploads, follow/unfollow loops) need the conditional-write pattern from the Deep dive section.
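One wiring detail worth a guard: an API Gateway proxy integration delivers query-string values under queryStringParameters, not at the top level of the event, so a handler that reads event['username'] directly works for test invokes but not for the GET /lookup route. A small adapter covers both shapes — a sketch, and extract_username is our name, not an AWS or instagrapi helper:

```python
def extract_username(event):
    """Accept both direct invokes ({'username': ...}) and API Gateway
    proxy events, where query params arrive under queryStringParameters
    (which can be None when the query string is empty)."""
    if "username" in event:
        return event["username"]
    params = event.get("queryStringParameters") or {}
    return params.get("username")
```

Call it at the top of the handler and return a 400 when it yields None, rather than letting a KeyError surface as a 502 from API Gateway.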
Production caveats
The three Lambda-specific patterns that break instagrapi deployments are listed below in roughly the order a team encounters them. Each one is solvable, none of the solutions are obvious from the AWS docs, and missing any one of them gives you the “works once, breaks forever” failure mode that drives most teams off Lambda inside the first week.
1. Cold-start fingerprint = challenge_required
Every cold start gives the function a fresh execution environment, which means a fresh in-memory Client with a freshly generated device fingerprint. Instagram reads a fresh fingerprint from a different IP as a new phone signing in to the account, and the response is a challenge_required on the first call. Externalize the session to DynamoDB (or S3 for larger blobs, or ElastiCache Redis for sub-millisecond reads), load it before login() on every invocation, and write the updated settings back at the end of the handler. Critical detail: save the session AFTER the IG call, not only at the start. Mid-handler, Instagram may have rotated cookies or issued a new CSRF token, and the only way the next invocation inherits those rotations is if the handler dumps state on its way out.
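The load-on-entry, save-on-exit discipline above is easier to keep when it lives in two small helpers instead of inline handler code. A sketch, assuming the same single-partition-key table as the working example (the helper names are ours):

```python
import json

def load_settings(table, username):
    # Returns the persisted instagrapi settings dict, or None on first run.
    item = table.get_item(Key={"username": username}).get("Item")
    return json.loads(item["settings"]) if item and "settings" in item else None

def save_settings(table, username, settings):
    # Persist AFTER the IG call so rotated cookies and CSRF tokens survive
    # into the next invocation.
    table.put_item(Item={"username": username, "settings": json.dumps(settings)})
```

The handler then becomes load_settings → set_settings → login → IG call → save_settings(cl.get_settings()), and the same pair works unchanged against a boto3 Table or any stub with the same two methods.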
2. 15-minute hard timeout
Lambda’s per-invocation hard ceiling is fifteen minutes. A scrape of a single user’s first hundred followers fits comfortably; a walk of fifty thousand followers does not. The right shape for long-running work is to chunk the workload and chain invocations through Step Functions: one Lambda per page of results, the state machine handles retries and pagination tokens, the session blob lives in DynamoDB and is read by every chunk. Workloads that do not chunk cleanly — uploads of large videos, long-running listeners — belong on ECS or Fargate, where a single process can hold an instagrapi Client open for hours and the per-account fingerprint stays warm in memory the entire time. Forcing a long workload into Lambda by making the handler retry on timeout is the worst version: every retry is a new cold start, and Instagram sees a parade of fresh-fingerprint logins for the same account inside an hour.
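The chunk-and-chain shape can be sketched as a single Step Functions task that processes a bounded number of pages and hands the cursor back to the state machine. The names here are illustrative — fetch_page stands in for whatever paginated instagrapi call the workload makes, and the state dict is what the state machine passes between steps:

```python
def scrape_chunk(state, fetch_page, pages_per_invocation=5):
    """One state-machine step: process a bounded number of pages, then
    return the cursor so the next invocation can resume. fetch_page(cursor)
    must return (items, next_cursor), with next_cursor=None when exhausted."""
    cursor = state.get("cursor")
    collected = []
    for _ in range(pages_per_invocation):
        items, cursor = fetch_page(cursor)
        collected.extend(items)
        if cursor is None:
            break
    return {"cursor": cursor, "items": collected, "done": cursor is None}
```

The state machine loops on done == False with a Choice state, and every step reloads the shared session from DynamoDB before its pages, so the fingerprint stays continuous across the whole walk.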
3. AWS egress IPs pre-flagged
Every major cloud’s egress range is on Instagram’s pre-flagged list, and us-east-1 is the most-flagged of the lot. A perfectly persisted session loading from DynamoDB and logging in from a fresh AWS IP still trips the risk model. Pin one residential proxy per account and set_proxy() it before login() on every cold start — the proxy URL belongs in the environment variables next to the credentials, and a rotation strategy that swaps proxies between invocations is worse than a single sticky address (impossible-travel signals fire faster than IP-rotation signals fool anyone). See the proxy setup guide for the proxy-side patterns. NAT Gateway with a single Elastic IP is not a substitute; the IP is still in an AWS range, just one specific address inside it.
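When one function serves several accounts, the "one sticky proxy per account" rule is easiest to enforce with a deterministic mapping rather than per-account config. A sketch — sticky_proxy is our helper, not part of instagrapi, and it assumes a pool of residential proxy URLs you already hold:

```python
import hashlib

def sticky_proxy(username, proxy_pool):
    """Deterministically map an account to one proxy URL so the egress IP
    never changes between invocations, with no per-account config to keep
    in sync. Same username + same pool always yields the same proxy."""
    digest = hashlib.sha256(username.encode()).digest()
    return proxy_pool[int.from_bytes(digest[:4], "big") % len(proxy_pool)]
```

Note the caveat: the mapping only stays stable while the pool's ordering does, so grow the pool by appending, never by reshuffling.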
Fix in instagrapi
Four steps, in order. Each one assumes the previous one is already in place; skipping the order is how teams end up with a Lambda that fails for three different reasons at the same time and is hard to debug as a result.
- Externalize the session to DynamoDB on every handler invocation. The blob is a few kilobytes of JSON. Read it on entry, write it on exit, and treat the local execution environment as throwaway. See the session persistence guide for storage-side patterns that work across both DynamoDB and S3 backends. A single DynamoDB table per account family is the cheapest version that scales to dozens of accounts with no operational overhead beyond the IAM role.
- Use Step Functions for any workload longer than ten minutes. The 15-minute Lambda ceiling is hard, and “ten minutes” is the right working budget once you account for the cold start, the DynamoDB round-trip, and a margin for an unusually slow IG response. Step Functions chunk the work into sub-ten-minute invocations, and the state machine handles retries with exponential backoff against the same DynamoDB session. Anything that resists clean chunking — long uploads, sustained listeners — belongs on ECS or Fargate, not on Lambda with a wishful timeout.
- Pin one residential proxy URL per account, stored alongside the credentials. Set the proxy before login() in every handler. Keep the same egress IP for the lifetime of the account; rotation between invocations is worse than sticky. The proxy URL belongs in RESIDENTIAL_PROXY_URL next to IG_USERNAME and IG_PASSWORD, and all three should come from AWS Secrets Manager (or Parameter Store with SecureString) rather than plain environment variables in the Lambda configuration — environment variables are visible to anyone with lambda:GetFunctionConfiguration, which is wider access than the credentials warrant.
- Make retries idempotent in case of a mid-handler timeout. A Lambda that times out mid-IG-call leaves the session in an undefined state — Instagram saw the request, the handler did not see the response, and DynamoDB still holds the pre-call settings. Wrap the handler in a structure that can re-run safely: idempotent IG operations (reads, idempotent upserts in your own database), explicit dedupe keys for any write that touches Instagram (uploads, follow/unfollow), and a CloudWatch alarm on the function’s Duration p99 so a slow IG response trips an alert before it trips the timeout.
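The dedupe-key idea from the last step can be sketched in a few lines. The names here are illustrative, and the in-memory set stands in for what would be a DynamoDB conditional put on the key in production:

```python
import hashlib
import json

def dedupe_key(account, action, payload):
    """Stable key for a write-side IG action, so a retried invocation can
    detect that the operation already ran before repeating it."""
    body = json.dumps(
        {"account": account, "action": action, "payload": payload},
        sort_keys=True,
    )
    return hashlib.sha256(body.encode()).hexdigest()

def run_once(seen, key, operation):
    # 'seen' stands in for a DynamoDB conditional put on the key; in a real
    # deployment the membership check and the insert must be one atomic write.
    if key in seen:
        return "skipped"
    result = operation()
    seen.add(key)
    return result
```

In the real version the key lives in its own DynamoDB item with a TTL, and the conditional put that records it is the same attribute_not_exists pattern shown in the Deep dive below.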
Deep dive
The serverless replacement for a Celery beat schedule is EventBridge plus a scheduled Lambda. A rule that fires every fifteen minutes can run a periodic IG sync without any of the always-on cost of a Celery worker, and the same DynamoDB session table the API-Gateway handler writes to is read by the scheduled handler — both are calling the same account, so they must share state, and the conditional-write pattern shown below prevents a race where the API call and the scheduled call both update the session at once. For write-heavy concurrent workloads, replace the unconditional put_item with a ConditionExpression that fails on a version attribute mismatch, then retry the read+write on the conditional-check failure; that pattern keeps the session monotonically consistent even when several Lambdas finish at the same wall-clock instant.
from botocore.exceptions import ClientError

class SessionVersionConflict(Exception):
    """Another writer bumped the session version first; re-read and retry."""

def save_session(cl, current_version):
    try:
        table.put_item(
            Item={
                'username': USERNAME,
                'settings': json.dumps(cl.get_settings()),
                'version': current_version + 1,
            },
            # Succeed only if nobody has bumped the version since we read it.
            ConditionExpression='attribute_not_exists(version) OR version = :v',
            ExpressionAttributeValues={':v': current_version},
        )
    except ClientError as e:
        if e.response['Error']['Code'] == 'ConditionalCheckFailedException':
            # Another Lambda wrote first — re-read and retry from the caller.
            raise SessionVersionConflict()
        raise
That optimistic-concurrency loop — read the version, write conditionally, re-read and retry on conflict — is the same shape as the task-token coordination patterns used around Step Functions, and it is the cheapest way to get effectively-once session updates out of a platform that only guarantees at-least-once invocation.
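The caller side of that loop can be sketched as well. ConflictError here is an illustrative stand-in for the SessionVersionConflict raised above, and read_state/conditional_write are placeholders for the DynamoDB read and the conditional put:

```python
class ConflictError(Exception):
    """Stands in for the conditional-check failure on the session write."""

def save_with_retry(read_state, conditional_write, max_attempts=3):
    """Optimistic-concurrency driver: re-read the current settings and
    version, attempt the conditional write, and retry after each conflict.
    Returns the version that was successfully written."""
    for _ in range(max_attempts):
        settings, version = read_state()
        try:
            conditional_write(settings, version)
            return version + 1
        except ConflictError:
            continue  # another Lambda won the race; re-read and retry
    raise RuntimeError(f"session write lost the race {max_attempts} times")
```

Keep max_attempts small: if three writers are colliding that often on one account's session, the real problem is concurrency on the account, not the retry budget.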
Related integrations
- Run instagrapi in Docker: stateless containers, secrets, sessions Containerize an instagrapi service: handle stateless restarts, externalize session storage, manage IG credentials safely, and avoid datacenter IP blocks.
- Deploy instagrapi in FastAPI: async, aiograpi, background tasks FastAPI with instagrapi: why sync calls block the event loop, when to switch to aiograpi, and how to wrap blocking IG actions in BackgroundTasks.
- Run instagrapi tasks in Celery: queues, retries, rate limits Wrap instagrapi calls in Celery tasks: per-task session reuse, exponential backoff for please_wait_a_few_minutes, and distributed rate-limit budgets.
Related errors
- challenge_required: how to fix the Instagram challenge in instagrapi (Python) Fix instagrapi's challenge_required error in Python: SMS/email codes, TOTP fallback, and why session reuse stops it from coming back.
- login_required: how to recover the instagrapi session in Python Fix instagrapi's login_required error: when sessions die mid-script, what causes the cookie to expire, and how to recover without re-triggering 2FA.
Related guides
- Persisting instagrapi sessions: file, Redis, and Postgres patterns Reuse instagrapi login sessions across runs and processes: dump_settings, load_settings, and storing the session blob in Redis or Postgres.
- Handling instagrapi 2FA and challenge_required errors in Python Resolve instagrapi 2FA prompts and challenge_required errors: SMS, email, and TOTP flows with working callback handlers.
Frequently asked
Why does instagrapi get challenge_required on every Lambda cold start?
Each cold start is a fresh container with a fresh device fingerprint. Instagram sees a new device + new IP and triggers the challenge. The fix is to externalize the session — load it from DynamoDB or S3 at handler startup, save updated cookies back at the end.
Can a Lambda function run a long instagrapi scrape?
AWS Lambda's hard 15-minute timeout means: yes for short fetches (single user info, post details), no for follower walks of thousands of accounts. Long scrapes need Step Functions chaining or ECS/Fargate.
What's the right Lambda runtime for instagrapi?
Python 3.11 or 3.12. Use an AWS Lambda layer for instagrapi and its dependencies to keep the handler package small. The current instagrapi requires Python 3.10+; the older 3.9 runtime requires pinning instagrapi below 2.5.0.
Will my Lambda IP trigger Instagram blocks?
AWS IPs are pre-flagged. Either route through a residential proxy (set_proxy on every cold start), or use HikerAPI which runs on managed residential pools.
Skip the infra?
Managed Instagram API — same endpoints, sessions and proxies handled.
Try HikerAPI → Full comparison