From 0d032b7a7cec11b50c86be3306f240d7f2ccc1f1 Mon Sep 17 00:00:00 2001
From: Chad MILLER <chad@cornsilk.net>
Date: Mon, 20 May 2019 08:55:45 -0700
Subject: [PATCH] Remove download-csv description from README

---
 README.md | 26 +++++++-------------------
 1 file changed, 7 insertions(+), 19 deletions(-)

diff --git a/README.md b/README.md
index 7f49a31..34e52cf 100644
--- a/README.md
+++ b/README.md
@@ -19,16 +19,13 @@ reporting easier.
 
 Operational overview:
 
-1. Python script `load_data.py`:
+1. Django management command `load_data`:
  * downloads a zip clinical trials registry data from ClinicalTrials.gov
- * converts the XML to JSON
- * uploads it to BigQuery
- * runs SQL to transform it to tabular format including fields to
-   indentify ACTs and their lateness
- * downloads SQL as a CSV file
+ * transforms XML into a CSV file
+ * then runs all of step 2 (`process_data`)
 
 2. Django management command `process_data`:
-  * imports CSV file into Django models
+  * imports an existing CSV file into Django models
   * precomputes aggregate statistics and turns these into rankings
   * handles other metadata (in particular, hiding trials that are no
     longer ACTs)
@@ -42,12 +39,9 @@ loaded into a staging database / website.
 A separate command copies new data from staging to production
 (following moderation).
 
-Much complex logic has been expressed in SQL, which makes it hard to read
-and test.  This is a legacy of splitting the development between
-academics with the domain expertise (and who could use SQL to
-prototype) and software engineers.  Now the project has been running
-for a while and new development interations are less frequent, a useful
-project would be as much of this logic to Python.
+In the past, the import process computed and filtered data in SQL on
+BigQuery, with an intermediate XML-to-JSON conversion, but that is
+largely gone.  You may still see leftovers of it in the code.
 
 Similarly, the only reason step (1) exists is to create a CSV which
 can be imported to the database.  That CSV is useful in its own right
@@ -56,12 +50,6 @@ intermediate formats that could legitimately be dropped in a
 refactored solution (and the CSV could be generated directly from the
 database).
 
-The historic reason for the XML -> JSON route is because BigQuery
-includes a number of useful JSON functions which can be manipulated by
-people competent in SQL. At the time of writing, there
-is [an open issue](https://github.com/ebmdatalab/clinicaltrials-act-tracker/issues/121) with
-some ideas about refactoring this process.
-
 Static Pages
 ============