Monitoring API Data Changes in Python
Overview
When building applications that rely on external data sources, detecting changes in API responses is a common requirement. Unlike file system monitoring, where tools like watchdog can track changes, monitoring API data requires a different approach. This guide explores a robust solution for monitoring API data changes using Python libraries.
Core Libraries
The monitoring solution relies on three key libraries:
| Library | Purpose | Function |
|---|---|---|
| requests | HTTP client | Fetches data from API endpoints |
| deepdiff | Object comparison | Detects changes between API responses |
| polling2 | Scheduling | Polls APIs at intervals to check for changes |
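All three libraries are available on PyPI and can be installed with pip, for example: pip install requests deepdiff polling2.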
requests
The requests library is the standard for making HTTP requests in Python:
import requests

def fetch_data(url):
    response = requests.get(url)
    response.raise_for_status()  # Raises an exception for HTTP errors
    return response.json()
deepdiff
The deepdiff library specializes in detecting differences between complex data structures:
from deepdiff import DeepDiff

# Compare two API responses
diff = DeepDiff(previous_data, current_data, ignore_order=True)

if diff:
    # Changes detected
    print(f"Changes found: {diff}")
polling2
The polling2 library simplifies periodic checking:
import polling2

# Poll every 60 seconds until a condition is met
polling2.poll(
    check_for_changes,
    step=60,
    poll_forever=True  # No timeout; keep polling until the check succeeds
)
Monitoring Process
The API monitoring workflow follows these steps:
- Initial Fetch: Retrieve the initial data from the API
- Data Storage: Store this data for future comparison
- Periodic Polling: Fetch new data at regular intervals
- Comparison: Compare new data with previous data
- Action: Respond to detected changes
Complete Implementation
Here's a complete solution that ties everything together:
import requests
from deepdiff import DeepDiff
import polling2

# Configuration
API_URL = "https://api.example.com/users"

def fetch_data():
    """Fetch data from the API."""
    response = requests.get(API_URL)
    response.raise_for_status()
    return response.json()

def check_for_changes():
    """Check if there are any changes in the API data."""
    global previous_data
    current_data = fetch_data()
    diff = DeepDiff(previous_data, current_data, ignore_order=True)
    if diff:
        print("Changes detected:", diff)
        # Update the reference data
        previous_data = current_data
        return True
    return False

# Fetch initial data
previous_data = fetch_data()
print("Initial data fetched. Monitoring for changes...")
# Start polling
try:
    while True:
        # poll() returns as soon as check_for_changes reports a change,
        # so loop to keep monitoring indefinitely
        polling2.poll(
            check_for_changes,   # Function to call on each poll
            step=60,             # Check every 60 seconds
            poll_forever=True    # No timeout or maximum number of tries
        )
except KeyboardInterrupt:
    print("Monitoring stopped.")
Advanced Implementation Using APScheduler
Since polling2 is not actively maintained, here's an alternative implementation using APScheduler:
from apscheduler.schedulers.blocking import BlockingScheduler
import requests
from deepdiff import DeepDiff

# Configuration
API_URL = "https://api.example.com/users"
previous_data = None

def fetch_data():
    """Fetch data from the API."""
    response = requests.get(API_URL)
    response.raise_for_status()
    return response.json()

def check_for_changes():
    """Check if there are any changes in the API data."""
    global previous_data
    current_data = fetch_data()
    diff = DeepDiff(previous_data, current_data, ignore_order=True)
    if diff:
        print("Changes detected:", diff)
        previous_data = current_data

# Initialize with first data fetch
previous_data = fetch_data()
print("Initial data fetched. Monitoring for changes...")

# Configure and start the scheduler
scheduler = BlockingScheduler()
scheduler.add_job(check_for_changes, 'interval', seconds=60)

try:
    scheduler.start()
except KeyboardInterrupt:
    scheduler.shutdown()
    print("Monitoring stopped.")
Understanding DeepDiff Output
When changes are detected, deepdiff provides detailed information about what changed:
Changes detected: {
    'values_changed': {
        "root['users'][0]['name']": {
            'new_value': 'Jane',
            'old_value': 'John'
        }
    }
}
This output shows that in the first user object, the name changed from "John" to "Jane".
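The result also behaves like a dictionary keyed by change type, so individual changes can be handled programmatically instead of just printed. A minimal sketch based on the output format above:

from deepdiff import DeepDiff

diff = DeepDiff(previous_data, current_data, ignore_order=True)

# Each change type ('values_changed', 'dictionary_item_added', ...) is a key in the result
for path, change in diff.get('values_changed', {}).items():
    print(f"{path}: {change['old_value']} -> {change['new_value']}")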
Optimization Strategies
For production environments, consider these optimizations:
1. Efficient Data Storage
import json
import redis

# Connect to Redis (localhost:6379 by default)
r = redis.Redis()

def store_previous_data(data):
    r.set('previous_api_data', json.dumps(data))

def get_previous_data():
    data = r.get('previous_api_data')
    return json.loads(data) if data else None
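With these helpers, the change check can read and write its reference data through Redis instead of a module-level variable, so monitoring state survives restarts. A sketch, assuming a Redis server is reachable with the default connection settings and reusing the no-argument fetch_data() from the complete implementation (check_for_changes_persistent is a name introduced here for illustration):

def check_for_changes_persistent():
    """Compare the latest API response against the snapshot stored in Redis."""
    current_data = fetch_data()
    previous = get_previous_data()
    if previous is None:
        # First run: just record the initial snapshot
        store_previous_data(current_data)
        return
    diff = DeepDiff(previous, current_data, ignore_order=True)
    if diff:
        print("Changes detected:", diff)
        store_previous_data(current_data)  # Persist the new reference snapshot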
2. Error Handling with Exponential Backoff
from tenacity import retry, stop_after_attempt, wait_exponential

@retry(stop=stop_after_attempt(5), wait=wait_exponential(multiplier=1, min=4, max=60))
def fetch_data_with_retry():
    """Fetch data with retry logic."""
    response = requests.get(API_URL)
    response.raise_for_status()
    return response.json()
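By default, tenacity retries on any exception. If only transient network failures should trigger the backoff, the retry condition can be narrowed with retry_if_exception_type; a variant of the function above:

from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_exponential
import requests

@retry(
    retry=retry_if_exception_type(requests.exceptions.RequestException),  # Network and HTTP errors only
    stop=stop_after_attempt(5),
    wait=wait_exponential(multiplier=1, min=4, max=60)
)
def fetch_data_with_retry():
    """Fetch data, retrying only on requests-level failures."""
    response = requests.get(API_URL)
    response.raise_for_status()
    return response.json()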
3. Monitoring Multiple Endpoints
endpoints = {
    "users": "https://api.example.com/users",
    "products": "https://api.example.com/products"
}

previous_data = {endpoint: None for endpoint in endpoints}

def check_endpoint(name, url):
    current_data = fetch_data(url)  # The URL-taking fetch_data() helper shown earlier
    diff = DeepDiff(previous_data[name], current_data, ignore_order=True)
    if diff:
        print(f"Changes detected in {name}:", diff)
        previous_data[name] = current_data
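A single scheduled job can then walk every endpoint on each interval; check_all_endpoints below is a helper name introduced here for illustration:

def check_all_endpoints():
    for name, url in endpoints.items():
        check_endpoint(name, url)

# For example, with the APScheduler setup shown earlier:
# scheduler.add_job(check_all_endpoints, 'interval', seconds=60)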
Practical Considerations
When implementing API monitoring, keep these factors in mind:
- API Rate Limits: Adjust polling intervals to respect API limits
- Data Size: Filter responses to reduce comparison overhead (see the example after this list)
- Webhooks: When available, use webhooks instead of polling
- Persistence: Store state in a database for monitoring across restarts
- Alerting: Send notifications when critical changes are detected
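On the data-size point, one way to cut comparison overhead and noise is to exclude fields that change on every response, such as timestamps or request IDs, from the diff. DeepDiff supports this through its exclude_paths argument; the field names below are illustrative:

from deepdiff import DeepDiff

# Ignore volatile fields that would otherwise report a change on every poll
diff = DeepDiff(
    previous_data,
    current_data,
    ignore_order=True,
    exclude_paths=["root['generated_at']", "root['request_id']"]
)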
Summary
Monitoring API data changes in Python requires three key components:
- Fetching data with the requests library
- Detecting differences with deepdiff
- Periodic checking with polling2 or APScheduler
This pattern works well for APIs without real-time notification features and can be optimized for production environments with additional error handling, persistence, and efficiency improvements.