How to Automatically Analyze Your GitHub Repo's Architecture (AST Graph Method)

Most architecture documentation lies. It shows you the system as it was designed — not as it actually runs. The codebase tells the truth.

GitHub repository architecture analysis is the process of programmatically examining your actual code to reconstruct the real architecture — the one your services are running in production. This post explains how AST-based analysis works, what it reveals that diagrams hide, and how to run it against your own repository.

Why Architecture Documents Are Wrong

When Sudarshan first ran our Deep Scanner against a 2-year-old production SaaS codebase, the architecture diagram showed a clean 3-tier system: API → Service Layer → Database.

The AST analysis showed something different:

14 places where the API directly queried the database (no service layer)
3 circular dependencies between modules
API keys hardcoded in .js files committed to the repo (not in .env)
No rate limiting on any public endpoints
Direct database connection from 2 Lambda functions (bypassing the API entirely)

The architecture diagram was aspirational. The code was the truth.

What Is AST Analysis?

An Abstract Syntax Tree (AST) is a tree representation of source code that discards surface detail such as whitespace and punctuation and keeps the program's structure. When you parse a JavaScript file into an AST, you can traverse it to find:

Every import/require statement (reveals module dependencies)
Every function call (reveals cross-module coupling)
Every database query (reveals data access patterns)
Every HTTP client call (reveals external service dependencies)
Every environment variable reference (reveals secret management practices)
Every hardcoded string (reveals potential security issues)

When you run this analysis across an entire repository, you can reconstruct a dependency graph — a map of how every module connects to every other module.
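As a rough illustration of the graph-building step, the sketch below assumes an earlier AST pass has already produced a list of import specifiers per file. The file names, the importsByFile shape, and the buildDependencyGraph helper are all hypothetical, and resolved paths are left without file extensions for brevity.

```javascript
const path = require('node:path');

// Hypothetical output of an AST pass: each file mapped to the module
// specifiers found in its import/require statements.
const importsByFile = {
  'src/routes/users.js': ['../database/connection', 'express'],
  'src/services/orders.js': ['../database/connection', './billing'],
  'src/services/billing.js': ['stripe'],
};

// Build an adjacency list: file -> set of local modules it depends on.
// Specifiers that do not start with "." are treated as external packages.
function buildDependencyGraph(imports) {
  const graph = new Map();
  for (const [file, specifiers] of Object.entries(imports)) {
    const deps = new Set();
    for (const spec of specifiers) {
      if (!spec.startsWith('.')) continue; // skip npm packages
      // Resolve the specifier relative to the importing file's directory.
      deps.add(path.posix.join(path.posix.dirname(file), spec));
    }
    graph.set(file, deps);
  }
  return graph;
}

const graph = buildDependencyGraph(importsByFile);
```

Here both the route file and the order service end up with an edge to src/database/connection, which is exactly the kind of shared access a diagram tends to omit.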

How AST Traversal Reveals Architecture

Here's a conceptual example of what AST traversal looks like for a Node.js/Express API:

// What the code looks like
const db = require('../database/connection');

router.get('/users/:id', async (req, res) => {
  const user = await db.query('SELECT * FROM users WHERE id = $1', [req.params.id]);
  res.json(user);
});

An AST traversal of this file extracts:

Import node: ../database/connection → this route directly accesses the DB
SQL literal: SELECT * FROM users WHERE id = $1 → raw SQL inside a route handler (parameterized, so not injectable as written, but data access is happening outside the service layer)
No auth middleware: No authenticate or authorize function call before the handler

Multiply this across 200 files and you get a far more accurate picture of your actual architecture than any hand-drawn diagram.
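To make the traversal concrete, here is a minimal sketch of the import-extraction step. To stay dependency-free it walks a hand-written tree in the ESTree shape that JavaScript parsers such as acorn emit; in practice a parser would produce the tree from the file. The walk and findRequires helpers are illustrative names, not part of any library.

```javascript
// Hand-written ESTree-shaped AST for:
//   const db = require('../database/connection');
const ast = {
  type: 'Program',
  body: [{
    type: 'VariableDeclaration',
    kind: 'const',
    declarations: [{
      type: 'VariableDeclarator',
      id: { type: 'Identifier', name: 'db' },
      init: {
        type: 'CallExpression',
        callee: { type: 'Identifier', name: 'require' },
        arguments: [{ type: 'Literal', value: '../database/connection' }],
      },
    }],
  }],
};

// Depth-first walk that calls visit() on every object with a "type" field.
function walk(node, visit) {
  if (Array.isArray(node)) {
    node.forEach((child) => walk(child, visit));
  } else if (node && typeof node === 'object') {
    if (typeof node.type === 'string') visit(node);
    Object.values(node).forEach((child) => walk(child, visit));
  }
}

// Collect the string argument of every require('...') call.
function findRequires(tree) {
  const found = [];
  walk(tree, (node) => {
    if (
      node.type === 'CallExpression' &&
      node.callee.type === 'Identifier' &&
      node.callee.name === 'require' &&
      node.arguments.length === 1 &&
      node.arguments[0].type === 'Literal' &&
      typeof node.arguments[0].value === 'string'
    ) {
      found.push(node.arguments[0].value);
    }
  });
  return found;
}

const requires = findRequires(ast);
```

The same walker can be pointed at other node types (string literals, member expressions on process.env) by swapping the predicate inside the visitor.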

The 8 Most Common Architectural Flaws Found in Real Repos

After running architecture analysis on dozens of production repositories, these are the patterns that appear most frequently:

1. Circular Dependencies

What it looks like:

user-service → order-service → notification-service → user-service

Why it's dangerous: Circular dependencies mean you can't independently deploy, test, or reason about any single service. A change in user-service can have unpredictable cascading effects.

Fix: Break the cycle with a shared message queue. Services emit events instead of calling each other directly.
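Once a dependency graph exists, a cycle like the one above can be found with a depth-first search. The sketch below is one common way to do it, not necessarily how any particular tool does; the graph literal and the billing-service entry are made up for illustration.

```javascript
// Hypothetical dependency graph containing the cycle from the example above.
const dependencies = {
  'user-service': ['order-service'],
  'order-service': ['notification-service'],
  'notification-service': ['user-service'],
  'billing-service': ['user-service'],
};

// Depth-first search with three node states: unvisited, visiting, done.
// Reaching a "visiting" node again means the current path loops back on itself.
function findCycle(graph) {
  const state = new Map(); // node -> 'visiting' | 'done'
  const stack = [];

  function visit(node) {
    if (state.get(node) === 'done') return null;
    if (state.get(node) === 'visiting') {
      // Slice the path from the first occurrence of node to close the loop.
      return [...stack.slice(stack.indexOf(node)), node];
    }
    state.set(node, 'visiting');
    stack.push(node);
    for (const dep of graph[node] || []) {
      const cycle = visit(dep);
      if (cycle) return cycle;
    }
    stack.pop();
    state.set(node, 'done');
    return null;
  }

  for (const node of Object.keys(graph)) {
    const cycle = visit(node);
    if (cycle) return cycle;
  }
  return null; // no cycle found
}

const cycle = findCycle(dependencies);
```

This returns only the first cycle it encounters; reporting all of them would need a strongly-connected-components pass instead.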

2. Missing API Gateway Layer

What it looks like:

Frontend → Direct Lambda invocations via AWS SDK

Why it's dangerous: No centralized rate limiting, no unified authentication, no request logging, no circuit breaking. Every Lambda function is independently handling concerns that belong in a gateway.

Fix: Route all external traffic through API Gateway or ALB. Lambda functions should only be invoked by the gateway, never directly from clients.

3. No Rate Limiting on Public Endpoints

What it looks like:

router.post('/api/query', authMiddleware, async (req, res) => {
  // No rate limiting middleware
  const result = await expensiveAICall(req.body.prompt);
  res.json(result);
});

Why it's dangerous: Any authenticated user can make unlimited expensive API calls. Without rate limiting, a single bad actor (or a client bug) can exhaust your AI API budget in minutes.

Fix: Apply Redis-based rate limiting middleware upstream of any expensive operation.
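A minimal sketch of what such middleware might look like follows. To stay runnable without dependencies it keeps counters in an in-memory Map, which is only a single-process stand-in; with Redis the counter would typically be maintained with INCR and EXPIRE so that all instances share it. The createRateLimiter name, the per-IP key, and the limits are assumptions.

```javascript
// Fixed-window rate limiter as Express-style middleware.
// The Map is a single-process stand-in for a shared store such as Redis,
// and it never evicts old keys, so it is not suitable for production as-is.
function createRateLimiter({ limit, windowMs, now = Date.now }) {
  const windows = new Map(); // key -> { start, count }

  return function rateLimit(req, res, next) {
    const key = req.ip; // assumption: limit per client IP; a user ID also works
    const current = now();
    let entry = windows.get(key);
    if (!entry || current - entry.start >= windowMs) {
      // First request from this key, or the previous window has expired.
      entry = { start: current, count: 0 };
      windows.set(key, entry);
    }
    entry.count += 1;
    if (entry.count > limit) {
      res.status(429).json({ error: 'Too many requests' });
      return;
    }
    next();
  };
}

// Hypothetical limit: 10 requests per minute per client.
const limiter = createRateLimiter({ limit: 10, windowMs: 60_000 });
// router.post('/api/query', authMiddleware, limiter, handler);
```

Placing the limiter after authMiddleware but before the handler means rejected requests never reach the expensive call.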

4. Hardcoded API Keys in Source

What it looks like:

const stripe = require('stripe')('sk_live_4xABCDEFGHIJKLMNOP');

Why it's dangerous: Git history is effectively permanent. Even if you remove the key in a later commit, it remains in every commit made while it was present, and in every clone and fork. Anyone who can read the repo can read the key.

Fix: Load all secrets from environment variables, backed by AWS Secrets Manager or SSM Parameter Store in production. Treat any key that was ever committed as compromised and rotate it.
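One way to make the environment-variable rule hard to forget is a small helper that fails at startup when a secret is missing. This is a sketch; requireEnv and the STRIPE_SECRET_KEY variable name are assumptions, not an established convention.

```javascript
// Read a required secret from the environment and fail fast if it is missing.
// The env parameter defaults to process.env and exists mainly for testing.
function requireEnv(name, env = process.env) {
  const value = env[name];
  if (value === undefined || value === '') {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Usage (STRIPE_SECRET_KEY is an assumed variable name):
// const stripe = require('stripe')(requireEnv('STRIPE_SECRET_KEY'));
```

Failing at startup is a deliberate choice here: a missing key surfaces during deployment rather than on the first payment request.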

5. Direct Database Access from Multiple Services

What it looks like:

User Service → PostgreSQL
Order Service → PostgreSQL (same tables)
Reporting Service → PostgreSQL (same tables)

Why it's dangerous: Multiple services sharing a database schema means schema changes require coordinated multi-service deployments. One service can accidentally corrupt data that another service owns.

Fix: Each service owns its own tables. Cross-service data access must go through the owning service's API.
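Shared tables are straightforward to flag once you know which tables each service touches. The sketch below assumes that mapping has already been extracted (for example from SQL literals found during AST traversal); the service names, table names, and findSharedTables helper are hypothetical.

```javascript
// Hypothetical result of an AST pass: tables referenced by each service's SQL.
const tablesByService = {
  'user-service': ['users', 'sessions'],
  'order-service': ['orders', 'users'],
  'reporting-service': ['orders', 'users'],
};

// Invert the mapping and keep only tables touched by more than one service.
function findSharedTables(mapping) {
  const servicesByTable = new Map();
  for (const [service, tables] of Object.entries(mapping)) {
    for (const table of tables) {
      if (!servicesByTable.has(table)) servicesByTable.set(table, []);
      servicesByTable.get(table).push(service);
    }
  }
  // Each entry is [tableName, [services...]].
  return [...servicesByTable].filter(([, services]) => services.length > 1);
}

const shared = findSharedTables(tablesByService);
```

Each table in the result is a candidate for assigning a single owning service and moving the other readers behind that service's API.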

6. Missing Retry and Circuit Breaker Logic

What it looks like:

const result = await fetch('https://external-api.com/endpoint');
// No timeout, no retry, no fallback

Why it's dangerous: External services fail. Without retry logic, a single failed external call causes a user-visible error. Without circuit breakers, a slow external service causes your entire request queue to pile up.

Fix: Use a resilience library (e.g., cockatiel for Node.js) to apply timeouts, retries with exponential backoff, and circuit breakers to all external HTTP calls.
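For readers who want to see the mechanics before adopting a library, here is a hand-rolled sketch of retry with exponential backoff plus a per-attempt timeout. It does not use cockatiel's API and does not implement a circuit breaker; withRetry, fetchWithRetry, and every option value are assumptions chosen for illustration. It relies on the global fetch and AbortSignal.timeout available in recent Node.js releases.

```javascript
// Retry an async operation with exponential backoff.
// Option values are illustrative defaults, not recommendations.
async function withRetry(operation, { retries = 3, baseDelayMs = 100, sleep } = {}) {
  const wait = sleep || ((ms) => new Promise((resolve) => setTimeout(resolve, ms)));
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await operation(attempt);
    } catch (error) {
      lastError = error;
      if (attempt < retries) {
        // Delay doubles each time: baseDelayMs, 2x, 4x, ...
        await wait(baseDelayMs * 2 ** attempt);
      }
    }
  }
  throw lastError; // every attempt failed
}

// Give each attempt its own timeout so a hung call cannot block forever.
// Note: this retries on any failure; real code should retry only
// idempotent requests and transient errors.
function fetchWithRetry(url) {
  return withRetry(async () => {
    const response = await fetch(url, { signal: AbortSignal.timeout(2000) });
    if (!response.ok) throw new Error(`HTTP ${response.status}`);
    return response;
  });
}
```

A circuit breaker would sit one level above this, counting consecutive failures and short-circuiting calls while the dependency is known to be down.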

7. Unbounded Database Queries

What it looks like:

const allOrders = await db.query('SELECT * FROM orders WHERE user_id = $1', [userId]);

Why it's dangerous: This query works fine with 10 orders per user. At 10,000 orders per user, it loads everything into memory, causes GC pressure, and could OOM the service.

Fix: All list queries must have LIMIT and use cursor-based pagination. Never query without bounds.
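A bounded version of the query above might be built like this. It is a sketch of keyset (cursor) pagination that assumes orders has a positive, monotonically increasing integer id; the status and created_at columns, the page-size cap, and the buildOrdersPageQuery helper are all hypothetical.

```javascript
// Build a bounded, cursor-paginated query for one user's orders.
// Table and column names (orders, user_id, id, status, created_at) are assumptions.
const MAX_PAGE_SIZE = 100;

function buildOrdersPageQuery(userId, { afterId = 0, limit = 50 } = {}) {
  // Clamp the requested page size to the range 1..MAX_PAGE_SIZE.
  const pageSize = Math.min(Math.max(1, limit), MAX_PAGE_SIZE);
  return {
    text:
      'SELECT id, status, created_at FROM orders ' +
      'WHERE user_id = $1 AND id > $2 ORDER BY id LIMIT $3',
    values: [userId, afterId, pageSize],
  };
}

// The caller passes the last id of the previous page as the next cursor.
const firstPage = buildOrdersPageQuery(42);
const nextPage = buildOrdersPageQuery(42, { afterId: 1050, limit: 500 });
```

Filtering on id > cursor rather than using OFFSET keeps each page query cheap no matter how deep into the list the client has paged.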

8. No Health Check Endpoints

What it looks like:

GET /health → 404

Why it's dangerous: Load balancers and orchestrators (ECS, Kubernetes) need health endpoints to detect and replace failed instances. Without them, traffic routes to dead instances.

Fix: Every service must implement GET /health that checks its dependencies (database connectivity, cache connectivity) and returns 200 with {"status": "healthy"} when they pass, or a non-2xx status when they fail.
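The dependency-check part can be sketched as a small framework-agnostic function. The checkHealth name, the response shape beyond the status field, and the choice of 503 for failures are assumptions; db and cache in the usage comment are hypothetical clients.

```javascript
// Run each dependency check and summarize the result.
// `checks` maps a dependency name to a function that throws or rejects on failure.
async function checkHealth(checks) {
  const dependencies = {};
  let healthy = true;
  for (const [name, check] of Object.entries(checks)) {
    try {
      await check();
      dependencies[name] = 'ok';
    } catch (error) {
      dependencies[name] = 'failed';
      healthy = false;
    }
  }
  return {
    statusCode: healthy ? 200 : 503,
    body: { status: healthy ? 'healthy' : 'unhealthy', dependencies },
  };
}

// Usage with Express (db and cache are hypothetical clients):
// router.get('/health', async (req, res) => {
//   const { statusCode, body } = await checkHealth({
//     database: () => db.query('SELECT 1'),
//     cache: () => cache.ping(),
//   });
//   res.status(statusCode).json(body);
// });
```

Returning a non-2xx status on failure matters because load balancers generally act on the status code, not the response body.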

How Deep Scanner Works

SudarshanAI's Deep Scanner automates this entire analysis process. You paste a GitHub repository URL, and the engine:

Clones the repository
Runs AST analysis across all source files
Builds a dependency graph
Identifies the patterns above and additional security vulnerabilities
Generates a full architecture map of the repository

The output shows you both the actual architecture (derived from AST) and the ideal architecture for your system's scale — with specific recommendations for closing the gap.

For a typical 50,000-line repository, the analysis runs in 30-60 seconds.

Running Analysis on Your Own Repository

To get started:

Navigate to Scanner Mode
Paste your GitHub repository URL (public repos only in guest mode)
Review the generated architecture map and security findings
Export the recommendations report

The analysis is free for one run without an account — no setup required.
