How to Automatically Analyze Your GitHub Repo's Architecture (AST Graph Method)
Learn how AST-based analysis reveals your repository's true architecture, including circular dependencies, missing API gateways, hardcoded secrets, and five other critical flaws found in production repos.
Most architecture documentation lies. It shows you the system as it was designed — not as it actually runs. The codebase tells the truth.
GitHub repository architecture analysis is the process of programmatically examining your actual code to reconstruct the real architecture — the one your services are running in production. This post explains how AST-based analysis works, what it reveals that diagrams hide, and how to run it against your own repository.
Why Architecture Documents Are Wrong
When Sudarshan first ran our Deep Scanner against a 2-year-old production SaaS codebase, the architecture diagram showed a clean 3-tier system: API → Service Layer → Database.
The AST analysis showed something different:
- 14 places where the API directly queried the database (no service layer)
- 3 circular dependencies between modules
- API keys hardcoded in .js files committed to the repo (not in .env)
- No rate limiting on any public endpoints
- Direct database connection from 2 Lambda functions (bypassing the API entirely)
The architectural diagram was aspirational. The code was the truth.
What Is AST Analysis?
An Abstract Syntax Tree (AST) is a tree-based representation of source code that drops surface details (whitespace, punctuation, comments) and exposes the code's syntactic structure: declarations, calls, literals, and how they nest. When you parse a JavaScript file into an AST, you can traverse it to find:
- Every import/require statement (reveals module dependencies)
- Every function call (reveals cross-module coupling)
- Every database query (reveals data access patterns)
- Every HTTP client call (reveals external service dependencies)
- Every environment variable reference (reveals how secrets and configuration are managed)
- Every hardcoded string (reveals potential security issues)
When you run this analysis across an entire repository, you can reconstruct a dependency graph — a map of how every module connects to every other module.
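As a rough illustration of the first bullet, here is a minimal sketch of collecting `require` calls from an AST. It assumes the tree follows the ESTree node shapes that common JavaScript parsers emit. The tree below is written by hand for a single `require` call so the example runs without a parser, and `walk` and `collectRequires` are hypothetical helper names, not part of any tool described here.

```javascript
// Minimal tree walker: visits every object that has a string `type` field.
function walk(node, visit) {
  if (node === null || typeof node !== 'object') return;
  if (Array.isArray(node)) {
    node.forEach((child) => walk(child, visit));
    return;
  }
  if (typeof node.type === 'string') visit(node);
  for (const key of Object.keys(node)) walk(node[key], visit);
}

// Collect the string argument of every `require('...')` call in the tree.
function collectRequires(ast) {
  const specifiers = [];
  walk(ast, (node) => {
    if (
      node.type === 'CallExpression' &&
      node.callee.type === 'Identifier' &&
      node.callee.name === 'require' &&
      node.arguments.length === 1 &&
      node.arguments[0].type === 'Literal' &&
      typeof node.arguments[0].value === 'string'
    ) {
      specifiers.push(node.arguments[0].value);
    }
  });
  return specifiers;
}

// Hand-written ESTree-shaped tree for:
//   const db = require('../database/connection');
// In a real run this object would come from a parser.
const sampleAst = {
  type: 'Program',
  body: [
    {
      type: 'VariableDeclaration',
      kind: 'const',
      declarations: [
        {
          type: 'VariableDeclarator',
          id: { type: 'Identifier', name: 'db' },
          init: {
            type: 'CallExpression',
            callee: { type: 'Identifier', name: 'require' },
            arguments: [{ type: 'Literal', value: '../database/connection' }],
          },
        },
      ],
    },
  ],
};

console.log(collectRequires(sampleAst)); // [ '../database/connection' ]
```

Running the same collector over every file and recording one edge per (file, specifier) pair is one way to build the dependency graph described above.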
How AST Traversal Reveals Architecture
Here's a conceptual example of what AST traversal looks like for a Node.js/Express API:
// What the code looks like
const db = require('../database/connection');
router.get('/users/:id', async (req, res) => {
  const user = await db.query('SELECT * FROM users WHERE id = $1', [req.params.id]);
  res.json(user);
});
An AST traversal of this file extracts:
- Import node: ../database/connection → this route directly accesses the DB
- SQL literal: SELECT * FROM users WHERE id = $1 → raw SQL written in the route handler (the query is parameterized, so it is not injectable as written, but data access is living in the route layer)
- No auth middleware: no authenticate or authorize function call before the handler
Multiply this across 200 files and you get a complete picture of your actual architecture.
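The "no auth middleware" finding can be sketched as a check on route registrations. This is a heuristic under stated assumptions: the tree is ESTree-shaped and hand-written here, any `.get(...)`/`.post(...)` style call with a string first argument is treated as a route, and the middleware names `authenticate` and `authorize` are the ones mentioned above. `findUnauthenticatedRoutes` is a hypothetical name.

```javascript
const HTTP_METHODS = new Set(['get', 'post', 'put', 'patch', 'delete']);
const AUTH_NAMES = new Set(['authenticate', 'authorize']);

// Minimal tree walker: visits every object that has a string `type` field.
function walk(node, visit) {
  if (node === null || typeof node !== 'object') return;
  if (Array.isArray(node)) {
    node.forEach((child) => walk(child, visit));
    return;
  }
  if (typeof node.type === 'string') visit(node);
  for (const key of Object.keys(node)) walk(node[key], visit);
}

// Name of a middleware argument: `authenticate` or `authorize('admin')`.
function middlewareName(arg) {
  if (arg.type === 'Identifier') return arg.name;
  if (arg.type === 'CallExpression' && arg.callee.type === 'Identifier') {
    return arg.callee.name;
  }
  return null;
}

// Report every route registration with no auth middleware among its handlers.
function findUnauthenticatedRoutes(ast) {
  const findings = [];
  walk(ast, (node) => {
    if (node.type !== 'CallExpression') return;
    const callee = node.callee;
    if (callee.type !== 'MemberExpression') return;
    if (callee.property.type !== 'Identifier') return;
    if (!HTTP_METHODS.has(callee.property.name)) return;
    const [pathArg, ...handlers] = node.arguments;
    if (!pathArg || pathArg.type !== 'Literal') return;
    const hasAuth = handlers.some((h) => AUTH_NAMES.has(middlewareName(h)));
    if (!hasAuth) {
      findings.push({ method: callee.property.name.toUpperCase(), path: pathArg.value });
    }
  });
  return findings;
}

// Builds a hand-written tree for: router.<method>(path, ...handlers)
function routeCall(method, path, handlers) {
  return {
    type: 'ExpressionStatement',
    expression: {
      type: 'CallExpression',
      callee: {
        type: 'MemberExpression',
        object: { type: 'Identifier', name: 'router' },
        property: { type: 'Identifier', name: method },
      },
      arguments: [{ type: 'Literal', value: path }, ...handlers],
    },
  };
}

const handler = { type: 'ArrowFunctionExpression', params: [], body: { type: 'BlockStatement', body: [] } };
const routesAst = {
  type: 'Program',
  body: [
    routeCall('get', '/users/:id', [handler]),
    routeCall('get', '/me', [{ type: 'Identifier', name: 'authenticate' }, handler]),
  ],
};

console.log(findUnauthenticatedRoutes(routesAst)); // [ { method: 'GET', path: '/users/:id' } ]
```

A check like this misses auth applied globally (for example via an app-level middleware), so its findings are candidates for review rather than confirmed flaws.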
The 8 Most Common Architectural Flaws Found in Real Repos
After running architecture analysis on dozens of production repositories, these are the patterns that appear most frequently:
1. Circular Dependencies
What it looks like:
user-service → order-service → notification-service → user-service
Why it's dangerous: Circular dependencies mean you can't independently deploy, test, or reason about any single service. A change in user-service can have unpredictable cascading effects.
Fix: Introduce a shared message queue. Services emit events; they don't call each other directly.
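Once a dependency graph exists, finding a cycle like the one above is a standard depth-first search. The sketch below assumes the graph is a plain object mapping each module to the modules it depends on; `findCycle` is a hypothetical name, and it returns only the first cycle it encounters.

```javascript
// Dependency graph as adjacency lists: module -> modules it depends on.
// Returns the first cycle found as a path that ends where it started, or null.
function findCycle(graph) {
  const VISITING = 1;
  const DONE = 2;
  const state = new Map();
  const stack = [];

  function visit(node) {
    state.set(node, VISITING);
    stack.push(node);
    for (const dep of graph[node] || []) {
      if (state.get(dep) === VISITING) {
        // Back edge: the cycle is the stack from `dep` onward, closed with `dep`.
        return [...stack.slice(stack.indexOf(dep)), dep];
      }
      if (!state.has(dep)) {
        const cycle = visit(dep);
        if (cycle) return cycle;
      }
    }
    stack.pop();
    state.set(node, DONE);
    return null;
  }

  for (const node of Object.keys(graph)) {
    if (!state.has(node)) {
      const cycle = visit(node);
      if (cycle) return cycle;
    }
  }
  return null;
}

const serviceGraph = {
  'user-service': ['order-service'],
  'order-service': ['notification-service'],
  'notification-service': ['user-service'],
};

console.log(findCycle(serviceGraph).join(' → '));
// user-service → order-service → notification-service → user-service
```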
2. Missing API Gateway Layer
What it looks like:
Frontend → Direct Lambda invocations via AWS SDK
Why it's dangerous: No centralized rate limiting, no unified authentication, no request logging, no circuit breaking. Every Lambda function is independently handling concerns that belong in a gateway.
Fix: Route all external traffic through API Gateway or ALB. Lambda functions should only be invoked by the gateway, never directly from clients.
3. No Rate Limiting on Public Endpoints
What it looks like:
router.post('/api/query', authMiddleware, async (req, res) => {
  // No rate limiting middleware
  const result = await expensiveAICall(req.body.prompt);
  res.json(result);
});
Why it's dangerous: Any authenticated user can make unlimited expensive API calls. Without rate limiting, a single bad actor (or a client bug) can exhaust your AI API budget in minutes.
Fix: Apply Redis-based rate limiting middleware upstream of any expensive operation.
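To show the shape of such middleware, here is a minimal fixed-window limiter. Note the swap: it keeps counters in an in-process Map instead of Redis, so it only works for a single instance and is meant as a sketch of the logic, not a production setup. `createRateLimiter` and the `req.userId` field are assumptions for illustration; the `(req, res, next)` signature is the usual Express-style middleware shape.

```javascript
// Fixed-window rate limiter with in-memory counters (single process only).
// `now` is injectable so the window logic can be exercised without waiting.
function createRateLimiter({ limit, windowMs, now = Date.now }) {
  const windows = new Map(); // key -> { start, count }

  return function rateLimit(req, res, next) {
    // Assumption: an earlier auth middleware set `req.userId`; fall back to IP.
    const key = req.userId || req.ip;
    const t = now();
    let current = windows.get(key);
    if (!current || t - current.start >= windowMs) {
      current = { start: t, count: 0 };
      windows.set(key, current);
    }
    current.count += 1;
    if (current.count > limit) {
      const retryAfterSec = Math.ceil((current.start + windowMs - t) / 1000);
      res.statusCode = 429;
      res.setHeader('Retry-After', String(retryAfterSec));
      res.end('Too Many Requests');
      return;
    }
    next();
  };
}

// Usage sketch, placing the limiter before the expensive handler:
// router.post('/api/query', authMiddleware,
//   createRateLimiter({ limit: 20, windowMs: 60_000 }), handler);
```

With multiple instances the counters need to live in shared storage, which is where Redis comes in. The Map here also never evicts old keys, which a real deployment would need to handle.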
4. Hardcoded API Keys in Source
What it looks like:
const stripe = require('stripe')('sk_live_4xABCDEFGHIJKLMNOP');
Why it's dangerous: Removing the key in a later commit does not remove it from Git history. It remains in every earlier commit, so anyone who can access the repo can recover the key.
Fix: Rotate any key that has ever been committed, then load all secrets from environment variables. Use AWS Secrets Manager or SSM Parameter Store for production.
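A simple way to catch this class of problem is to scan source text for well-known key prefixes. The sketch below is a line-based regex scan, which is cruder than inspecting string literals in an AST, and the two patterns are illustrative rather than exhaustive. `scanForSecrets` and the `STRIPE_SECRET_KEY` variable name are assumptions for the example.

```javascript
// Illustrative patterns only; real scanners ship far larger rule sets.
const SECRET_PATTERNS = [
  { rule: 'stripe-live-secret-key', regex: /sk_live_[0-9a-zA-Z]{10,}/ },
  { rule: 'aws-access-key-id', regex: /AKIA[0-9A-Z]{16}/ },
];

// Returns one finding per (line, rule) match. The matched value itself is
// deliberately left out of the report so the report does not leak the secret.
function scanForSecrets(source) {
  const findings = [];
  source.split('\n').forEach((text, index) => {
    for (const { rule, regex } of SECRET_PATTERNS) {
      if (regex.test(text)) findings.push({ line: index + 1, rule });
    }
  });
  return findings;
}

const sampleSource = [
  "const stripe = require('stripe')('sk_live_4xABCDEFGHIJKLMNOP');",
  "const safe = require('stripe')(process.env.STRIPE_SECRET_KEY);",
].join('\n');

console.log(scanForSecrets(sampleSource));
// [ { line: 1, rule: 'stripe-live-secret-key' } ]
```

The second sample line shows the intended replacement: the key is read from the environment at startup, so nothing sensitive lands in the repository.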
5. Direct Database Access from Multiple Services
What it looks like:
User Service → PostgreSQL
Order Service → PostgreSQL (same tables)
Reporting Service → PostgreSQL (same tables)
Why it's dangerous: Multiple services sharing a database schema means schema changes require coordinated multi-service deployments. One service can accidentally corrupt data that another service owns.
Fix: Each service owns its own tables. Cross-service data access must go through the owning service's API.
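If the analysis records which tables each service queries, shared ownership falls out of a simple grouping. In this sketch the input map is hypothetical data of the kind a SQL-literal pass might produce, and `findSharedTables` is an illustrative name.

```javascript
// Input: service name -> tables that service was seen querying.
// Output: table -> sorted list of services, for tables touched by 2+ services.
function findSharedTables(tablesByService) {
  const owners = new Map(); // table -> Set of services
  for (const [service, tables] of Object.entries(tablesByService)) {
    for (const table of tables) {
      if (!owners.has(table)) owners.set(table, new Set());
      owners.get(table).add(service);
    }
  }
  const result = {};
  for (const [table, services] of owners) {
    if (services.size > 1) result[table] = [...services].sort();
  }
  return result;
}

// Hypothetical scan result for three services.
const sharedTables = findSharedTables({
  'user-service': ['users'],
  'order-service': ['orders', 'users'],
  'reporting-service': ['orders', 'users'],
});

console.log(sharedTables);
```

Each table in the result is a candidate for assigning a single owning service and moving the other accesses behind that service's API.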
6. Missing Retry and Circuit Breaker Logic
What it looks like:
const result = await fetch('https://external-api.com/endpoint');
// No timeout, no retry, no fallback
Why it's dangerous: External services fail. Without retry logic, a single failed external call causes a user-visible error. Without circuit breakers, a slow external service causes your entire request queue to pile up.
Fix: Use a resilience library (e.g., cockatiel for Node.js) with exponential backoff and circuit breakers on all external HTTP calls.
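For readers who want to see the mechanics before adopting a library, here is a hand-rolled sketch of retry with exponential backoff and a per-attempt timeout. It is not the cockatiel API, it omits the circuit breaker and jitter, and `withRetry` and `backoffDelay` are hypothetical names. It assumes a runtime with `AbortController` (current Node.js versions have it built in).

```javascript
// Delay before the next retry (attempt is 0-based): base * 2^attempt, capped.
function backoffDelay(attempt, baseMs = 200, maxMs = 5000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Calls `fn(signal)` up to `retries + 1` times. Each attempt gets its own
// AbortController, which is aborted after `timeoutMs`. The timeout only has
// an effect if `fn` passes the signal on to something that honors it.
async function withRetry(fn, { retries = 3, timeoutMs = 2000, baseMs = 200 } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt += 1) {
    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), timeoutMs);
    try {
      return await fn(controller.signal);
    } catch (err) {
      lastError = err;
    } finally {
      clearTimeout(timer);
    }
    // Wait before the next attempt, but not after the final one.
    if (attempt < retries) await sleep(backoffDelay(attempt, baseMs));
  }
  throw lastError;
}

// Usage sketch (not executed here because it needs network access):
// const res = await withRetry((signal) =>
//   fetch('https://external-api.com/endpoint', { signal }));
```

This retries on every error, which is only safe for idempotent requests. A library adds the pieces left out here, such as retrying only on selected errors and opening a circuit after repeated failures.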
7. Unbounded Database Queries
What it looks like:
const allOrders = await db.query('SELECT * FROM orders WHERE user_id = $1', [userId]);
Why it's dangerous: This query works fine with 10 orders per user. At 10,000 orders per user, it loads everything into memory, causes GC pressure, and could OOM the service.
Fix: All list queries must have LIMIT and use cursor-based pagination. Never query without bounds.
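A bounded version of the query above might look like the following keyset (cursor-based) sketch. It assumes `orders` has a monotonically increasing integer `id` column, which the original snippet does not state, and the helper names and the 200-row cap are illustrative choices.

```javascript
// Builds a bounded, cursor-based page query for one user's orders.
// Assumption: `orders.id` is an increasing integer usable as the cursor.
function buildOrdersPageQuery({ userId, afterId = 0, pageSize = 50 }) {
  const limit = Math.min(Math.max(1, pageSize), 200); // clamp to 1..200
  return {
    text: 'SELECT * FROM orders WHERE user_id = $1 AND id > $2 ORDER BY id ASC LIMIT $3',
    values: [userId, afterId, limit],
  };
}

// The cursor for the next page is the id of the last row returned,
// or null when the page came back empty.
function nextCursor(rows) {
  return rows.length > 0 ? rows[rows.length - 1].id : null;
}

const firstPage = buildOrdersPageQuery({ userId: 42 });
console.log(firstPage.values); // [ 42, 0, 50 ]

// Usage sketch with the `db.query(sql, params)` form used above:
// const rows = await db.query(firstPage.text, firstPage.values);
// const second = buildOrdersPageQuery({ userId: 42, afterId: nextCursor(rows) });
```

Unlike OFFSET-based paging, the `id > $2` condition lets the database seek directly to the next page, so the cost stays flat as a user's order count grows.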
8. No Health Check Endpoints
What it looks like:
GET /health → 404
Why it's dangerous: Load balancers and orchestrators (ECS, Kubernetes) need health endpoints to detect and replace failed instances. Without them, traffic routes to dead instances.
Fix: Every service must implement GET /health returning {"status": "healthy"} with proper dependency checks (database connectivity, cache connectivity).
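The dependency checks can be sketched as a function that runs a set of probes with a timeout and reports an overall status. The probe functions, the `checkHealth` name, and the choice of 503 for the unhealthy case are assumptions for illustration; each probe would wrap something like a cheap database or cache ping.

```javascript
// Runs each probe with a timeout and summarizes the results.
// `checks` maps a dependency name to an async function that throws on failure.
async function checkHealth(checks, timeoutMs = 1000) {
  const entries = await Promise.all(
    Object.entries(checks).map(async ([name, probe]) => {
      let timer;
      const timeout = new Promise((_, reject) => {
        timer = setTimeout(() => reject(new Error('timeout')), timeoutMs);
      });
      try {
        await Promise.race([probe(), timeout]);
        return [name, 'ok'];
      } catch (err) {
        return [name, `failed: ${err.message}`];
      } finally {
        clearTimeout(timer);
      }
    })
  );
  const healthy = entries.every(([, status]) => status === 'ok');
  return {
    httpStatus: healthy ? 200 : 503,
    body: { status: healthy ? 'healthy' : 'unhealthy', checks: Object.fromEntries(entries) },
  };
}

// Usage sketch in an Express-style route (probe bodies are placeholders):
// router.get('/health', async (req, res) => {
//   const { httpStatus, body } = await checkHealth({
//     database: () => db.query('SELECT 1'),
//   });
//   res.status(httpStatus).json(body);
// });
```

Returning a non-200 status when a dependency is down is what lets a load balancer or orchestrator take the instance out of rotation.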
How Deep Scanner Works
SudarshanAI's Deep Scanner automates this entire analysis process. You paste a GitHub repository URL, and the engine:
- Clones the repository
- Runs AST analysis across all source files
- Builds a dependency graph
- Identifies the patterns above and additional security vulnerabilities
- Generates a full architecture map of the repository
The output shows you both the actual architecture (derived from AST) and the ideal architecture for your system's scale — with specific recommendations for closing the gap.
For a typical 50,000-line repository, the analysis runs in 30-60 seconds.
Running Analysis on Your Own Repository
To get started:
- Navigate to Scanner Mode
- Paste your GitHub repository URL (public repos only in guest mode)
- Review the generated architecture map and security findings
- Export the recommendations report
The analysis is free for one run without an account — no setup required.