Steampipe
open-source analysis system built on postgresql to let you treat many APIs like SQL tables https://steampipe.io/
Tool to try for Doing SQL Data Analysis As A PM?
code https://github.com/turbot/steampipe
Jun07'2022
- I'm waiting for some SQL access and work, and not sure if it's even possible/safe (small team). So this looks interesting....
- do
brew tap turbot/tap
thenbrew install steampipe
thensteampipe -v
- then
steampipe plugin install csv
- where do I put the CSV file? I'll guess right in
~/.steampipe
- ah, by default it searches within the CWD. So cd over to where the CSV really wants to be. - then do
steampipe query
→ get>
prompt select count(*) from subscriptions;
→Error: relation "subscriptions" does not exist (SQLSTATE 42P01)
- the
.inspect
steps work fine. select count(*) from csv.Subscriptions;
→Error: relation "csv.subscriptions" does not exist (SQLSTATE 42P01)
- ah, Mac case-sensitivity issue →
select count(*) from "Subscriptions";
(the actual filename isSubscriptions.csv
) works! select date_trunc('month', subscriptions.created_at) as create_month, count(*) from "Subscriptions" group by create_month order by create_month;
→Error: missing FROM-clause entry for table "subscriptions" (SQLSTATE 42P01)
- try simpler
select distinct "Subscriptions.status" from "Subscriptions";
→Error: column "Subscriptions.status" does not exist (SQLSTATE 42703)
select distinct subscriptions.status from "Subscriptions";
→Error: missing FROM-clause entry for table "subscriptions" (SQLSTATE 42P01)
- hmm note that column header is actually
subscriptions.status
- aha the answer is
select distinct("subscriptions.status") from "Subscriptions";
→ works! select date_trunc('month', "subscriptions.created_at") as create_month, count(*) from "Subscriptions" group by create_month order by create_month;
→Error: function date_trunc(unknown, text) does not exist (SQLSTATE 42883)
grrr- maybe because the date fields in this file look like
03-Jul-2022 00:00
- Jon Udell zoomed with me - ah, Steampipe is generating the schema when it reads the file, and it treats every field as string/text, so I have to include type-conversions within each query.
Jun08
- ah, the real issue is that all my fields have
<null>
as value! - maybe because file has Windows line-breaks?
- changing line-endings to Unix made no difference. Changing to Mac, no difference.
- does it automatically re-parse the file? Jon said something about it being a "file-watcher", which implies it's automatic....
- tried quitting steampipe CLI then re-launching. Now get
Error: failed to start plugin 'csv': failed to parse file header /Users/<path>/Subscriptions.csv: parse error on line 1, column 3432: extraneous or missing " in quoted-field (SQLSTATE HV000)
- that was with Mac line-endings. So quit again, change file to Unix, re-launch. Now getting successful parse but null fields again.
- pass to https://csvlint.io/ → no errors, 9 warnings of columns with inconsistent data.... check first column, it's a phone-number field - some have a
+1
at the beginning others don't (therefore "look like" numbers?).... check other columns, they're also phone number columns- also it notes that the "standard" is CRLF (Windows line-ending), so I'll guess I've leave that alone
- also, in browsing file, notice some rows that have a CR in the middle of a text field (e.g. address separators), wonder what that's doing
- download the "Standardized CSV" from https://csvlint.io/, compare files in BBEdit, see the standardized file messed up UTF-8 characters like curly quotes and accented chars. But lint didn't do anything to the mid-field CR issues.
- re-download raw file, leave as Windows. Use regex to fix the mid-field-CR cases. Parse again, get null values again.
- stripped file down to 4 records, still no good.
- stripped down further to first 3 columns, still no good.
- Jon played around, realized issue was column-names with period in them! Once I munged the header, even the date-trunc started working!
select date_trunc('month', to_date("subscriptions__created_at", 'DD-Mon-YYYY')) as create_month, count(*) from "Subscriptions" group by create_month order by create_month;
- though there are some TimeZone games being played, because the date-trunc values I get look like
2022-03-31T19:00:00-05:00
- is that Steampipe or Postgresql?- Jon notes this is Postgresql, and the solution is to use
date_trunc('month', to_date('07-Jun-2022 00:00', 'DD-Mon-YYYY')::timestamp without time zone)
- heh old notes: Postgresql Timezone Problem
- Jon notes this is Postgresql, and the solution is to use
Jun13
- have a new file -
AC_DealsWon.csv
exported from ActiveCampaign CRM; put it in different directory,cd
over there then launchsteampipe query
again - try a
count(*)
query →Error: relation "AC_DealsWon" does not exist (SQLSTATE 42P01)
- look at file - not every cell has quote marks, and there some semi-weird-looking cases...
- import into gsheet, export out to new CSV, try querying that → same issue
- look at file - wow even fewer quote-marks than the original!
- submit the gsheet-export to
CSVLint.io
→ 834 errors for inconsistent line breaks! Download "standardized"- the gsheet-export has Windows line-endings; the original file has unix
- submit the original file to
CSVLint.io
- 1 error for some weird whitespace at the start of the file, which I don't see when I open in BBEdit. Try to download "standardized" but webapp fails - I'm suspicious that some of the headers have a
?
at the end, so going to remove those from the original file... also removed some other weird characters from header (incl commas). And save as Windows-line-end - steampipe → same fail; CSVLint → still whitespace error
- Jon says he can work with my 2-line file. So I take that same set and it still fails!
inspect csv
→ list which doesn't include any of the files I tried today- copy my file to directory where I worked the other day → still fails
- how about I retry old work? (Note that I've renamed the container directory for all this work since.) So I cd into that old (but renamed folder):
> select count(*) from "reportSubscriptions";
Error: failed to start plugin 'csv': failed to parse file header /Users/billseitz/documents/career/CommunityPhone/data/AC_DealsWon.csv: parse error on line 1, column 4: bare " in non-quoted-field (SQLSTATE HV000)
Jun23: we recently released CSV plugin v0.3.2, which should fix the issue with null values for header columns with .
s in them. If you still run into issues with these column names, please let us know. Just do steampipe plugin update csv
- will do brew upgrade steampipe
at same time because there's an update at that level, too.
Edited: | Tweet this! | Search Twitter for discussion