napkin-backend-bigquery-aeda-2.0.0
Safe Haskell: None
Language: GHC2024

Napkin.Run.BigQuery.AEDA

Documentation

prettyAEDAStats :: forall {k} (backend :: k). HasBackendQueryStats backend => QueryStats backend -> LogStr

retrieve :: MonadNapkin BigQuery m => Relation -> UTCTime -> LogEnv -> Bool -> GoogleEnv -> Text -> Labels -> [Relation -> UTCTime -> [Value] -> Maybe Query] -> [(Name, OrderDir)] -> (Query -> Query) -> [[Value]] -> m ([Map Text Value], QueryStats BigQuery)

The retrieve function fetches stats and histograms in order to create local CSV output. It takes a list of filter functions that determine which stats to retrieve, a list of (Name, OrderDir) tuples that define the ordering of the result, and a query updater (Query -> Query) that modifies each query before it runs (specifically, to limit it in the case of categorical retrieval).

report :: MonadNapkin BigQuery m => Relation -> UTCTime -> Ref Table -> LogEnv -> GoogleEnv -> Labels -> BQProjectId -> [Relation -> UTCTime -> [Value] -> Maybe Query] -> Name -> Maybe Clustering -> [[Value]] -> m (QueryStats BigQuery)

Creates or appends to a report table with the gathered stats summarizing a particular table. Takes as arguments filter functions that determine which stats to gather, a name to give to the report table, and an optional Clustering value (Maybe Clustering) that is applied to the CreateTableAs step when clustering is wanted. The number of columns processed per query can be limited with limitColumns.

limitColumns :: (MonadIO m, Monoid b) => Maybe Int -> [[Value]] -> ([[Value]] -> m b) -> m b

Statistics are gathered by creating a base query for each column in a table and then combining these queries with UNION. For tables with more than 40 columns, BigQuery is likely to run into resource limits. To avoid this, limitColumns limits the number of columns that are unioned together in a single query. The default is DefaultQueryLimit, but the user can override it.
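
The batching idea described above can be sketched as follows. This is a minimal illustration, not the library's implementation: limitColumnsSketch, chunksOf' and defaultLimit are hypothetical names, the value 40 is an assumed stand-in for DefaultQueryLimit (whose actual value is not stated here), and plain strings stand in for the [Value] column descriptions.

```haskell
import Control.Monad.IO.Class (MonadIO, liftIO)
import Data.Maybe (fromMaybe)

-- Assumed stand-in for DefaultQueryLimit; the real value may differ.
defaultLimit :: Int
defaultLimit = 40

-- Split a list into consecutive chunks of at most n elements.
-- A non-positive n leaves the list in a single chunk.
chunksOf' :: Int -> [a] -> [[a]]
chunksOf' n xs
  | null xs   = []
  | n <= 0    = [xs]
  | otherwise = let (batch, rest) = splitAt n xs
                in batch : chunksOf' n rest

-- Hypothetical sketch with the same shape as limitColumns: run the action
-- once per batch of columns and combine the results with the Monoid instance.
limitColumnsSketch
  :: (MonadIO m, Monoid b) => Maybe Int -> [a] -> ([a] -> m b) -> m b
limitColumnsSketch mLimit cols run = do
  let n = fromMaybe defaultLimit mLimit
  results <- mapM run (chunksOf' n cols)
  pure (mconcat results)

main :: IO ()
main = do
  let cols = map (\i -> "col" ++ show i) [1 .. 7 :: Int]
  sizes <- limitColumnsSketch (Just 3) cols $ \batch -> do
    -- In the real function this is where the unioned query would run.
    liftIO (putStrLn ("querying " ++ show (length batch) ++ " columns"))
    pure [length batch]
  print sizes  -- prints [3,3,1]
```

Requiring a Monoid result is what lets the per-batch outputs (for example, query statistics) be merged into one value, so the caller does not need to know how many batches were run.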