Steampipe

open-source analysis system built on postgresql to let you treat many APIs like SQL tables https://steampipe.io/

Tool to try for Doing SQL Data Analysis As A PM?

Jun07'2022

  • I'm waiting for some SQL access and work, and not sure if it's even possible/safe (small team). So this looks interesting....
  • do brew tap turbot/tap then brew install steampipe then steampipe -v
  • then steampipe plugin install csv
  • where do I put the CSV file? I'll guess right in ~/.steampipe - ah, by default it searches within the CWD. So cd over to where the CSV really wants to be.
  • then do steampipe query → get > prompt
  • select count(*) from subscriptions;Error: relation "subscriptions" does not exist (SQLSTATE 42P01)
  • the .inspect steps work fine.
  • select count(*) from csv.Subscriptions;Error: relation "csv.subscriptions" does not exist (SQLSTATE 42P01)
  • ah, Mac case-sensitivity issueselect count(*) from "Subscriptions"; (the actual filename is Subscriptions.csv) works!
  • select date_trunc('month', subscriptions.created_at) as create_month, count(*) from "Subscriptions" group by create_month order by create_month;Error: missing FROM-clause entry for table "subscriptions" (SQLSTATE 42P01)
  • try simpler select distinct "Subscriptions.status" from "Subscriptions";Error: column "Subscriptions.status" does not exist (SQLSTATE 42703)
  • select distinct subscriptions.status from "Subscriptions";Error: missing FROM-clause entry for table "subscriptions" (SQLSTATE 42P01)
  • hmm note that column header is actually subscriptions.status
  • aha the answer is select distinct("subscriptions.status") from "Subscriptions"; → works!
  • select date_trunc('month', "subscriptions.created_at") as create_month, count(*) from "Subscriptions" group by create_month order by create_month;Error: function date_trunc(unknown, text) does not exist (SQLSTATE 42883) grrr
  • maybe because the date fields in this file look like 03-Jul-2022 00:00
  • Jon Udell zoomed with me - ah, Steampipe is generating the schema when it reads the file, and it treats every field as string/text, so I have to include type-conversions within each query.

Jun08

  • ah, the real issue is that all my fields have <null> as value!
  • maybe because file has Windows line-breaks?
  • changing line-endings to Unix made no difference. Changing to Mac, no difference.
  • does it automatically re-parse the file? Jon said something about it being a "file-watcher", which implies it's automatic....
  • tried quitting steampipe CLI then re-launching. Now get Error: failed to start plugin 'csv': failed to parse file header /Users/<path>/Subscriptions.csv: parse error on line 1, column 3432: extraneous or missing " in quoted-field (SQLSTATE HV000)
  • that was with Mac line-endings. So quit again, change file to Unix, re-launch. Now getting successful parse but null fields again.
  • pass to https://csvlint.io/ → no errors, 9 warnings of columns with inconsistent data.... check first column, it's a phone-number field - some have a +1 at the beginning others don't (therefore "look like" numbers?).... check other columns, they're also phone number columns
    • also it notes that the "standard" is CRLF (Windows line-ending), so I'll guess I've leave that alone
  • also, in browsing file, notice some rows that have a CR in the middle of a text field (e.g. address separators), wonder what that's doing
  • download the "Standardized CSV" from https://csvlint.io/, compare files in BBEdit, see the standardized file messed up UTF-8 characters like curly quotes and accented chars. But lint didn't do anything to the mid-field CR issues.
  • re-download raw file, leave as Windows. Use regex to fix the mid-field-CR cases. Parse again, get null values again.
  • stripped file down to 4 records, still no good.
  • stripped down further to first 3 columns, still no good.
  • Jon played around, realized issue was column-names with period in them! Once I munged the header, even the date-trunc started working!
select date_trunc('month', to_date("subscriptions__created_at", 'DD-Mon-YYYY')) as create_month, count(*) from "Subscriptions" group by create_month order by create_month;
  • though there are some TimeZone games being played, because the date-trunc values I get look like 2022-03-31T19:00:00-05:00 - is that Steampipe or Postgresql?
    • Jon notes this is Postgresql, and the solution is to use date_trunc('month', to_date('07-Jun-2022 00:00', 'DD-Mon-YYYY')::timestamp without time zone)
    • heh old notes: Postgresql Timezone Problem

Jun13

  • have a new file - AC_DealsWon.csv exported from ActiveCampaign CRM; put it in different directory, cd over there then launch steampipe query again
  • try a count(*) query → Error: relation "AC_DealsWon" does not exist (SQLSTATE 42P01)
    • look at file - not every cell has quote marks, and there some semi-weird-looking cases...
  • import into gsheet, export out to new CSV, try querying that → same issue
    • look at file - wow even fewer quote-marks than the original!
  • submit the gsheet-export to CSVLint.io → 834 errors for inconsistent line breaks! Download "standardized"
    • the gsheet-export has Windows line-endings; the original file has unix
  • submit the original file to CSVLint.io - 1 error for some weird whitespace at the start of the file, which I don't see when I open in BBEdit. Try to download "standardized" but webapp fails
  • I'm suspicious that some of the headers have a ? at the end, so going to remove those from the original file... also removed some other weird characters from header (incl commas). And save as Windows-line-end
  • steampipe → same fail; CSVLint → still whitespace error
  • Jon says he can work with my 2-line file. So I take that same set and it still fails!
  • inspect csv → list which doesn't include any of the files I tried today
  • copy my file to directory where I worked the other day → still fails
  • how about I retry old work? (Note that I've renamed the container directory for all this work since.) So I cd into that old (but renamed folder):
> select count(*) from "reportSubscriptions";
Error: failed to start plugin 'csv': failed to parse file header /Users/billseitz/documents/career/CommunityPhone/data/AC_DealsWon.csv: parse error on line 1, column 4: bare " in non-quoted-field (SQLSTATE HV000)

Jun23: we recently released CSV plugin v0.3.2, which should fix the issue with null values for header columns with .s in them. If you still run into issues with these column names, please let us know. Just do steampipe plugin update csv - will do brew upgrade steampipe at same time because there's an update at that level, too.


Edited:    |       |    Search Twitter for discussion